Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

8316961: Fallback implementations for 64-bit Atomic::{add,xchg} on 32-bit platforms #16252

Closed

Conversation

shipilev
Copy link
Member

@shipilev shipilev commented Oct 18, 2023

See the bug for rationale. Looks like there is enough infrastructure to achieve what we want without significant fan-out. I checked all atomic_*.hpp headers for unimplemented PlatformAdd<8> and PlatformXchg<8>, and only these seem to be affected.

Unfortunately, we cannot test these apart from the existing gtest.

Additional testing:


Progress

  • Change must be properly reviewed (1 review required, with at least 1 Reviewer)
  • Change must not contain extraneous whitespace
  • Commit message must refer to an issue

Issue

  • JDK-8316961: Fallback implementations for 64-bit Atomic::{add,xchg} on 32-bit platforms (Enhancement - P4)

Reviewers

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk.git pull/16252/head:pull/16252
$ git checkout pull/16252

Update a local copy of the PR:
$ git checkout pull/16252
$ git pull https://git.openjdk.org/jdk.git pull/16252/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 16252

View PR using the GUI difftool:
$ git pr show -t 16252

Using diff file

Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/16252.diff

Webrev

Link to Webrev Comment

Sorry, something went wrong.

Verified

This commit was signed with the committer’s verified signature.
renovate-bot Mend Renovate
@bridgekeeper
Copy link

bridgekeeper bot commented Oct 18, 2023

👋 Welcome back shade! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

@openjdk openjdk bot added the rfr Pull request is ready for review label Oct 18, 2023
@openjdk
Copy link

openjdk bot commented Oct 18, 2023

@shipilev The following label will be automatically applied to this pull request:

  • hotspot-runtime

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing list. If you would like to change these labels, use the /label pull request command.

@openjdk openjdk bot added the hotspot-runtime hotspot-runtime-dev@openjdk.org label Oct 18, 2023
@mlbridge
Copy link

mlbridge bot commented Oct 18, 2023

Webrevs

@shipilev
Copy link
Member Author

This is actually pretty generic area.

/label remove hotspot-runtime
/label add hotspot

@openjdk openjdk bot removed the hotspot-runtime hotspot-runtime-dev@openjdk.org label Oct 18, 2023
@openjdk
Copy link

openjdk bot commented Oct 18, 2023

@shipilev
The hotspot-runtime label was successfully removed.

@openjdk openjdk bot added the hotspot hotspot-dev@openjdk.org label Oct 18, 2023
@openjdk
Copy link

openjdk bot commented Oct 18, 2023

@shipilev
The hotspot label was successfully added.

Copy link
Contributor

@fisk fisk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Having monotonic counters tends to require 64 bits. I have pulled my hair out so many times that we can't have atomic monotonic counters because of the lack of 32 bit platform support for it. The fix looks great, thanks for fixing!

@openjdk
Copy link

openjdk bot commented Oct 18, 2023

@shipilev This change now passes all automated pre-integration checks.

ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details.

After integration, the commit message for the final commit will be:

8316961: Fallback implementations for 64-bit Atomic::{add,xchg} on 32-bit platforms

Reviewed-by: eosterlund, dholmes, kbarrett, simonis

You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed.

At the time when this comment was updated there had been 43 new commits pushed to the master branch:

  • 1b15011: 8318476: Add resource consumption note to BigInteger and BigDecimal
  • 5ba9705: 8318485: Narrow klass shift should be zero if encoding range extends to 0x1_0000_0000
  • 8d9a4b4: 8317678: Fix up hashCode() for ZipFile.Source.Key
  • 69c0ae2: 8318124: JFR: Rewrite instrumentation to use Class-File API
  • c1aeac7: 8318445: More broken bailout chains in C2
  • d888b26: 8318071: IgnoreUnrecognizedVMOptions flag still causes failure in ArchiveHeapTestClass
  • bea2d48: 8312475: org.jline.util.PumpReader signed byte problem
  • 9f767aa: 8318109: Writing JFR records while a CHT has taken its lock asserts in rank checking
  • bd22d23: 8318027: Support alternative name to jdk.internal.vm.compiler
  • c2efd77: 8295795: hsdis does not build with binutils 2.39+
  • ... and 33 more: https://git.openjdk.org/jdk/compare/b07da3ae15dc820d596484d51d972404fed67fb1...master

As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details.

➡️ To integrate this PR with the above commit message to the master branch, type /integrate in a new comment.

@openjdk openjdk bot added the ready Pull request is ready to be integrated label Oct 18, 2023
Copy link
Member

@dholmes-ora dholmes-ora left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good! Thanks for fixing. I'm a bit surprised this wasn't done from the start ... makes me wonder how we missed it.

@shipilev
Copy link
Member Author

shipilev commented Oct 19, 2023

ARM32 comes back with a massive wrinkle: even though PlatformCmpxchg<8> is defined for it, it would assert later, either in reorder_cmpxchg_long_func that checks VM_Version::supports_cx8(). I guess in that case we would need to go a do a lock, like Access API does it with AccessLocker. (I was hoping it does not come to that, but...)

EDIT: Wait, something is off here. I am running on RPi 4 that is definitely ARM v7+, which should support this. I think the gtest does not initialize VM_Version or something...

@shipilev
Copy link
Member Author

shipilev commented Oct 19, 2023

EDIT: Wait, something is off here. I am running on RPi 4 that is definitely ARM v7+, which should support this. I think the gtest does not initialize VM_Version or something...

Oh, I see, that's the test bug: #16269

The contract for Atomic requires that PlatformCmpxchg<8> is implemented on all platforms:

  // Platform-specific implementation of cmpxchg.  Support for sizes
  // of 1, 4, and 8 are required.  ...
 ...
  template<size_t byte_size> struct PlatformCmpxchg;

ARM32 does not implement this for all types of machines, as per comment here:

/*
 * Atomic long operations on 32-bit ARM
 * ARM v7 supports LDREXD/STREXD synchronization instructions so no problem.
 * ARM < v7 does not have explicit 64 atomic load/store capability.
 * However, gcc emits LDRD/STRD instructions on v5te and LDM/STM on v5t
 * when loading/storing 64 bits.
 * For non-MP machines (which is all we support for ARM < v7)
 * under current Linux distros these instructions appear atomic.
 * See section A3.5.3 of ARM Architecture Reference Manual for ARM v7.
 * Also, for cmpxchg64, if ARM < v7 we check for cmpxchg64 support in the
 * Linux kernel using _kuser_helper_version. See entry-armv.S in the Linux
 * kernel source or kernel_user_helpers.txt in Linux Doc.
 */

I guess we still have a problem for ARM < v7 without kernel support, which would either assert in atomic_linux_arm.hpp, or stop later in the stub. I would argue that is a violation of Atomic contract that requires implementing PlatformCmpxchg<8> one way or the other.

Given how even current direct uses of PlatformCmpxchg<8> in those configs would fail, I don't see this PR introduces regressions: using PlatformXchg<8> and PlatformAdd<8> would break on those platform either before or after this change, and thus we can proceed even with this change. (I would pull in TEST_VM change once committed.)

Agree, @dholmes-ora?

@dean-long
Copy link
Member

I guess on arm32 without 64-bit atomic support, the gtest operations on int64 would crash, because the _LP64 guard was removed.
It looks like C2 already requires 64-bit atomic support, so older arm32 without that would not be able to run C2.
For those older arm32, the atomic support could use AccessLocker I guess, but I think C2 would still fail unless it was changed to also use AccessLocker.

@dholmes-ora
Copy link
Member

I'm afraid I've lost track of the history here. We had supports_cx8 to deal with exactly this problem. If a platform could not support 64-bit cmpxchg (and thus implicitly any 64-bit r-m-w operation) then it has to fall back to locking.

Verified

This commit was created on GitHub.com and signed with GitHub’s verified signature.
@shipilev
Copy link
Member Author

shipilev commented Oct 20, 2023

All right, hear me out. I am trying to briefly recap where we are and where we are going from here.

From the perspective of caller, the Atomic contract is ambiguous. On one hand, it says to check for supports_cx8():

  // Atomic operations on int64 types are not available on all 32-bit
  // platforms. If atomic ops on int64 are defined here they must only
  // be used from code that verifies they are available at runtime and
  // can provide an alternative action if not - see supports_cx8() for
  // a means to test availability.

...and on the other hand, it requires platforms to implement 8-byte CAS:

  // Platform-specific implementation of cmpxchg.  Support for sizes
  // of 1, 4, and 8 are required.
  ...
  template<size_t byte_size> struct PlatformCmpxchg;

This PR implicitly relies on second assumption. I think the only platform left that needs the supports_cx8() is ARM <= 6. The fact that current ARM code does not do consistent support for 64-bit atomics for all platforms is regrettable. To recap, while ARM 7 works fine, the situation with ARM <=6 is more complicated:

Before this PR:

  • Atomic unit test would pass
  • Attempt to use 8-byte cmpxchg would fail at runtime (either assert or stop in stub)
  • Attempt to use 8-byte add or xchg would fail to link

After this PR:

  • Atomic unit test would pass (now that it checks for supports_cx8())
  • Attempt to use 8-byte cmpxchg would fail at runtime (either assert or stop in stub)
  • Attempt to use 8-byte add or xchg would fail at runtime (either assert or stop in stub)

So, I don't think this qualifies as regression for ARM <= 6: we are trading the link-time failure for all ARM platforms for clean runtime failure on ARM <= 6 platform, if we violate the supports_cx8() contract. What we gain with this PR, is that we are able to do 64-bit atomics when supports_cx8() is actually true on x86_32 and ARM v7, even if we don't check it.

I dabbled a little in trying to implement 8-byte CAS (locked) implementation for ARM <= 6, and I think it would require some rewrites. There are cases, for example, when we are doing creepy non-atomic stuff when !os:is_MP if no hardware support is available. If we go locking route, I think we would also need to check if the atomic load/stores need to be covered by the same lock.

Once/if we fix ARM <= 6, that would eliminate the need for supports_cx8() altogether, which we can then phase out completely. And I think that is a good thing, because frankly it should not be an Atomic caller responsibility to handle obscure platforms in 2023, and as we see from current uses, it is not applied consistently in current code anyway.

Does all this make sense? Am I missing something? Should we proceed with this change?

@dholmes-ora
Copy link
Member

...and on the other hand, it requires platforms to implement 8-byte CAS:

Surely the supports_cx8 issue must have been raised when the template atomics were put in place. I need to go back and see what was said then, to see how we got in this mess.

I can accept the current proposal doesn't make the mess worse, it just shone a light on the fact there is a mess. If ARMv6 is broken then maybe it needs to be dropped? (Not in this PR of course.)

@dholmes-ora
Copy link
Member

From a related discussion in JDK-8246770 I was under the impression that the kernel helper would be used to provide this support on ARMv6 - see code in ./cpu/arm/vm_version_arm_32.cpp.

@dholmes-ora
Copy link
Member

Okay part of the (my) confusion here is that supports_cx8 is mainly intended for protecting Java long/double variables - see the descriptions and mechanics in ./share/oops/accessBackend.*. For C++ atomic ops we've typically steered the VM code away from needing 64-bit support if it doesn't natively exist as all accesses to such variables needs to work in with whatever locking scheme is used.

@shipilev
Copy link
Member Author

Yes, it is confusing. Atomic docs still mention supports_cx8 as the pre-check for using 64-bit atomics, which gets extra confusing if it was only intended to serve as availability check for Java accesses:

// Atomic operations on int64 types are not available on all 32-bit
// platforms. If atomic ops on int64 are defined here they must only
// be used from code that verifies they are available at runtime and
// can provide an alternative action if not - see supports_cx8() for
// a means to test availability.

So I would like to go forward with this PR. The only question that remains for me is whether to protect the gtest cases with supports_cx8 (satisfying the Atomic contract), or drop the checks there, thus giving the early warning for platform maintainers that they are actually expected to implement 64-bit atomics regardless.

@fisk
Copy link
Contributor

fisk commented Oct 23, 2023

Yes, it is confusing. Atomic docs still mention supports_cx8 as the pre-check for using 64-bit atomics, which gets extra confusing if it was only intended to serve as availability check for Java accesses:

// Atomic operations on int64 types are not available on all 32-bit
// platforms. If atomic ops on int64 are defined here they must only
// be used from code that verifies they are available at runtime and
// can provide an alternative action if not - see supports_cx8() for
// a means to test availability.

So I would like to go forward with this PR. The only question that remains for me is whether to protect the gtest cases with supports_cx8 (satisfying the Atomic contract), or drop the checks there, thus giving the early warning for platform maintainers that they are actually expected to implement 64-bit atomics regardless.

It would be nice if Atomic used a PlatformMutex or something internally, for platforms that don't have supports_cx8, so we could just assume that this stuff works across all platforms, just maybe "a bit" slower on some platforms. If we wanted to be really fancy, we could have an array that is say 8 * number of CPUs large, where addresses are hashed to a PlatformMutex to reduce contention a bit. But perhaps that's more than you bargained for with this PR!

Copy link

@kimbarrett kimbarrett left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Throughout, Atomic::PlatformMumble is intended to be used in the implementation of
Atomic::Mumble, and isn't intended to be used elsewhere. I think intent is even adhered
to currently. This proposed change violates that in a bunch of places.

@simonis
Copy link
Member

simonis commented Oct 23, 2023

Throughout, Atomic::PlatformMumble is intended to be used in the implementation of Atomic::Mumble, and isn't intended to be used elsewhere. I think intent is even adhered to currently. This proposed change violates that in a bunch of places.

@kimbarrett, do you suggest to use Atomic::cmpxchg() instead of PlatformCmpxchg() in Atomic::XchgUsingCmpxchg<byte_size>::operator() and Atomic::AddUsingCmpxchg::fetch_then_add()? I think that would be reasonable. Otherwise I don't understand your comment as this PR follows the same pattern you've used when introducing atomic bitset functions. Or am I missing something?

@kimbarrett
Copy link

Throughout, Atomic::PlatformMumble is intended to be used in the implementation of Atomic::Mumble, and isn't intended to be used elsewhere. I think intent is even adhered to currently. This proposed change violates that in a bunch of places.

@kimbarrett, do you suggest to use Atomic::cmpxchg() instead of PlatformCmpxchg() in Atomic::XchgUsingCmpxchg<byte_size>::operator() and Atomic::AddUsingCmpxchg::fetch_then_add()? I think that would be reasonable. Otherwise I don't understand your comment as this PR follows the same pattern you've used when introducing atomic bitset functions. Or am I missing something?

Yes, and Atomic::load instead of PlatformLoad. So actually follow the model of the atomic bitset functions.

@shipilev
Copy link
Member Author

It would be nice if Atomic used a PlatformMutex or something internally, for platforms that don't have supports_cx8, so we could just assume that this stuff works across all platforms, just maybe "a bit" slower on some platforms.

Yes, providing the locked implementation was my initial intent as well, but now that I looked into it, it seems to be too much hassle for little, if any, gain. In other words, we can already assume 8-byte atomics work on all platforms we build for. Even Zero -- which runs on obscurest platforms imaginable -- goes into GCC built-ins for PlatformCmpxchg<8>, PlatformXchg<8> and PlatformAdd<8>! So there is already a better path than locking.

Note that I only found ARMv6 "issue" due to a test bug. PlatformXchg<8> is actually implemented for ARM v6 with the help of kernel helpers. So if we do locking/gcc-builtins implementation for the case where kernel helpers are not available, it would require some massaging for how ARM32 platform atomics are arranged. From which I hope to weasel out in this PR :)

@shipilev
Copy link
Member Author

Yes, and Atomic::load instead of PlatformLoad. So actually follow the model of the atomic bitset functions.

Right. Done in new commit. The previous version was from the time I did these fallbacks straight in platform atomic files. But in the shared code, we should definitely go for the non-platform methods. I am re-running tests, but expect no trouble.

Copy link
Member

@simonis simonis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, looks good to me.

Copy link

@kimbarrett kimbarrett left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do I understand correctly that all platforms support cas64 other than arm32,
and even it does so if using a non-ancient hardware version or the kernel
helper? If so, I'd be inclined to just say it must always be supported and be
done with the supports_cx8 stuff around the Atomic class. So update the
comments for Atomic accordingly, and remove the supports_cx8 checks in the
tests. But I'm okay with the current version and leaving that for followup.

Regarding the comment "inconsistency" for Atomic, I read those as being a
general comment about 64bit operations not necessarily being supported, but
that cas64 is an exception and must be supported. I'm guessing we didn't do
what's in this PR and provide cas64-based implementations of other operations
due to also needing load64/store64, though I suppose one could even implement
those with cas64, horrid as that might be. The templatizing effort just
maintained the status quo in this regard.

@shipilev
Copy link
Member Author

Do I understand correctly that all platforms support cas64 other than arm32, and even it does so if using a non-ancient hardware version or the kernel helper?

That is my understanding after doing this work, yes.

@dholmes-ora
Copy link
Member

Note you cannot just provide a lock-based implementation of cmpxchg8 and think all is well with the world. All accesses to variables that could be updated by such a cmpxchg8 would have to be accessed in a way that is consistent/safe with the use of the lock (e.g. a raw read could see a partial update if done concurrently with a locked-cmpxchg8). Even on the Java side (via Unsafe) this is not fully in place but that issue is recognized and documented.

@dholmes-ora
Copy link
Member

I'm tempted to say the VM should simply fail to start if supports_cx8 is not true - the kernel helper has been around for many years for ARMv6 so I can't imagine there would be platforms in use now that don't support it.

@shipilev
Copy link
Member Author

I'm tempted to say the VM should simply fail to start if supports_cx8 is not true - the kernel helper has been around for many years for ARMv6 so I can't imagine there would be platforms in use now that don't support it.

Same. That would also remove any need for lockers in Access API, AFAICS. Let's try that in a separate PR and see what happens?

@shipilev
Copy link
Member Author

If there are no other comments/complaints, I am going to integrate this soon. Last call!

@dholmes-ora
Copy link
Member

Okay. I will file a RFE to make supports_cx8 a requirement and open a discussion about that.

@shipilev
Copy link
Member Author

Thanks all!

/integrate

@openjdk
Copy link

openjdk bot commented Oct 25, 2023

Going to push as commit ba7d08b.
Since your change was applied there have been 74 commits pushed to the master branch:

  • d7205e6: 8318102: macos10.14 check in CSystemColors can be removed.
  • 5ce718e: 8318100: Remove redundant check for Metal support
  • f1dfdc1: 8311877: [macos] Add CLI options to provide signing identity directly to codesign and productbuild
  • 9c819fd: 8318051: Duration.between uses exceptions for control flow
  • 1ddf826: 8316964: Security tools should not call System.exit
  • 1f2a80b: 8318306: java/util/Arrays/Sorting.java fails with "Array is not sorted at 8228-th position: 8251.0 and 8153.0"
  • 1165037: 8318569: Add getter methods for Locale and Patterns in ListFormat
  • 6f35274: 8318702: G1: Fix nonstandard indentation in g1HeapTransition.cpp
  • e272098: 8318160: javac does not reject private method reference with type-variable receiver
  • 54c613a: 8318693: Fix rendering for code blocks nested under list items in building.md
  • ... and 64 more: https://git.openjdk.org/jdk/compare/b07da3ae15dc820d596484d51d972404fed67fb1...master

Your commit was automatically rebased without conflicts.

@openjdk openjdk bot added the integrated Pull request has been integrated label Oct 25, 2023
@openjdk openjdk bot closed this Oct 25, 2023
@openjdk openjdk bot removed ready Pull request is ready to be integrated rfr Pull request is ready for review labels Oct 25, 2023
@openjdk
Copy link

openjdk bot commented Oct 25, 2023

@shipilev Pushed as commit ba7d08b.

💡 You may see a message that your pull request was closed with unmerged commits. This can be safely ignored.

@shipilev shipilev deleted the JDK-8316961-fallback-atomics branch November 28, 2023 14:54
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
hotspot hotspot-dev@openjdk.org integrated Pull request has been integrated
Development

Successfully merging this pull request may close these issues.

None yet

6 participants