8316961: Fallback implementations for 64-bit Atomic::{add,xchg} on 32-bit platforms #16252

shipilev · 2023-10-18T18:58:58Z

See the bug for rationale. Looks like there is enough infrastructure to achieve what we want without significant fan-out. I checked all atomic_*.hpp headers for unimplemented PlatformAdd<8> and PlatformXchg<8>, and only these seem to be affected.

Unfortunately, we cannot test these apart from the existing gtest.

Additional testing:

linux-x86-server-fastdebug, atomic tests pass
linux-arm-server-fastdebug, atomic tests pass (with 8318525: Atomic gtest should run as TEST_VM to access VM capabilities #16269 applied)

Progress

Change must be properly reviewed (1 review required, with at least 1 Reviewer)
Change must not contain extraneous whitespace
Commit message must refer to an issue

Issue

JDK-8316961: Fallback implementations for 64-bit Atomic::{add,xchg} on 32-bit platforms (Enhancement - P4)

Reviewers

Erik Österlund (@fisk - Reviewer) ⚠️ Review applies to 66a65ae3
David Holmes (@dholmes-ora - Reviewer) ⚠️ Review applies to 66a65ae3
Kim Barrett (@kimbarrett - Reviewer)
Volker Simonis (@simonis - Reviewer)

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk.git pull/16252/head:pull/16252
$ git checkout pull/16252

Update a local copy of the PR:
$ git checkout pull/16252
$ git pull https://git.openjdk.org/jdk.git pull/16252/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 16252

View PR using the GUI difftool:
$ git pr show -t 16252

Using diff file

Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/16252.diff

Webrev

Link to Webrev Comment

bridgekeeper · 2023-10-18T19:00:03Z

👋 Welcome back shade! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

openjdk · 2023-10-18T19:01:09Z

@shipilev The following label will be automatically applied to this pull request:

hotspot-runtime

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing list. If you would like to change these labels, use the /label pull request command.

mlbridge · 2023-10-18T19:04:45Z

Webrevs

shipilev · 2023-10-18T19:19:57Z

This is actually pretty generic area.

/label remove hotspot-runtime
/label add hotspot

openjdk · 2023-10-18T19:21:34Z

@shipilev
The hotspot-runtime label was successfully removed.

openjdk · 2023-10-18T19:21:54Z

@shipilev
The hotspot label was successfully added.

fisk

Having monotonic counters tends to require 64 bits. I have pulled my hair out so many times that we can't have atomic monotonic counters because of the lack of 32 bit platform support for it. The fix looks great, thanks for fixing!

openjdk · 2023-10-18T20:59:51Z

@shipilev This change now passes all automated pre-integration checks.

ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details.

After integration, the commit message for the final commit will be:

8316961: Fallback implementations for 64-bit Atomic::{add,xchg} on 32-bit platforms

Reviewed-by: eosterlund, dholmes, kbarrett, simonis

You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed.

At the time when this comment was updated there had been 43 new commits pushed to the master branch:

1b15011: 8318476: Add resource consumption note to BigInteger and BigDecimal
5ba9705: 8318485: Narrow klass shift should be zero if encoding range extends to 0x1_0000_0000
8d9a4b4: 8317678: Fix up hashCode() for ZipFile.Source.Key
69c0ae2: 8318124: JFR: Rewrite instrumentation to use Class-File API
c1aeac7: 8318445: More broken bailout chains in C2
d888b26: 8318071: IgnoreUnrecognizedVMOptions flag still causes failure in ArchiveHeapTestClass
bea2d48: 8312475: org.jline.util.PumpReader signed byte problem
9f767aa: 8318109: Writing JFR records while a CHT has taken its lock asserts in rank checking
bd22d23: 8318027: Support alternative name to jdk.internal.vm.compiler
c2efd77: 8295795: hsdis does not build with binutils 2.39+
... and 33 more: https://git.openjdk.org/jdk/compare/b07da3ae15dc820d596484d51d972404fed67fb1...master

As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details.

➡️ To integrate this PR with the above commit message to the master branch, type /integrate in a new comment.

dholmes-ora

Looks good! Thanks for fixing. I'm a bit surprised this wasn't done from the start ... makes me wonder how we missed it.

shipilev · 2023-10-19T13:32:49Z

ARM32 comes back with a massive wrinkle: even though PlatformCmpxchg<8> is defined for it, it would assert later, either in reorder_cmpxchg_long_func that checks VM_Version::supports_cx8(). I guess in that case we would need to go a do a lock, like Access API does it with AccessLocker. (I was hoping it does not come to that, but...)

EDIT: Wait, something is off here. I am running on RPi 4 that is definitely ARM v7+, which should support this. I think the gtest does not initialize VM_Version or something...

shipilev · 2023-10-19T15:41:35Z

EDIT: Wait, something is off here. I am running on RPi 4 that is definitely ARM v7+, which should support this. I think the gtest does not initialize VM_Version or something...

Oh, I see, that's the test bug: #16269

The contract for Atomic requires that PlatformCmpxchg<8> is implemented on all platforms:

  // Platform-specific implementation of cmpxchg.  Support for sizes
  // of 1, 4, and 8 are required.  ...
 ...
  template<size_t byte_size> struct PlatformCmpxchg;

ARM32 does not implement this for all types of machines, as per comment here:

/*
 * Atomic long operations on 32-bit ARM
 * ARM v7 supports LDREXD/STREXD synchronization instructions so no problem.
 * ARM < v7 does not have explicit 64 atomic load/store capability.
 * However, gcc emits LDRD/STRD instructions on v5te and LDM/STM on v5t
 * when loading/storing 64 bits.
 * For non-MP machines (which is all we support for ARM < v7)
 * under current Linux distros these instructions appear atomic.
 * See section A3.5.3 of ARM Architecture Reference Manual for ARM v7.
 * Also, for cmpxchg64, if ARM < v7 we check for cmpxchg64 support in the
 * Linux kernel using _kuser_helper_version. See entry-armv.S in the Linux
 * kernel source or kernel_user_helpers.txt in Linux Doc.
 */

I guess we still have a problem for ARM < v7 without kernel support, which would either assert in atomic_linux_arm.hpp, or stop later in the stub. I would argue that is a violation of Atomic contract that requires implementing PlatformCmpxchg<8> one way or the other.

Given how even current direct uses of PlatformCmpxchg<8> in those configs would fail, I don't see this PR introduces regressions: using PlatformXchg<8> and PlatformAdd<8> would break on those platform either before or after this change, and thus we can proceed even with this change. (I would pull in TEST_VM change once committed.)

Agree, @dholmes-ora?

dean-long · 2023-10-19T19:33:14Z

I guess on arm32 without 64-bit atomic support, the gtest operations on int64 would crash, because the _LP64 guard was removed.
It looks like C2 already requires 64-bit atomic support, so older arm32 without that would not be able to run C2.
For those older arm32, the atomic support could use AccessLocker I guess, but I think C2 would still fail unless it was changed to also use AccessLocker.

dholmes-ora · 2023-10-20T06:09:57Z

I'm afraid I've lost track of the history here. We had supports_cx8 to deal with exactly this problem. If a platform could not support 64-bit cmpxchg (and thus implicitly any 64-bit r-m-w operation) then it has to fall back to locking.

shipilev · 2023-10-20T12:00:50Z

All right, hear me out. I am trying to briefly recap where we are and where we are going from here.

From the perspective of caller, the Atomic contract is ambiguous. On one hand, it says to check for supports_cx8():

  // Atomic operations on int64 types are not available on all 32-bit
  // platforms. If atomic ops on int64 are defined here they must only
  // be used from code that verifies they are available at runtime and
  // can provide an alternative action if not - see supports_cx8() for
  // a means to test availability.

...and on the other hand, it requires platforms to implement 8-byte CAS:

  // Platform-specific implementation of cmpxchg.  Support for sizes
  // of 1, 4, and 8 are required.
  ...
  template<size_t byte_size> struct PlatformCmpxchg;

This PR implicitly relies on second assumption. I think the only platform left that needs the supports_cx8() is ARM <= 6. The fact that current ARM code does not do consistent support for 64-bit atomics for all platforms is regrettable. To recap, while ARM 7 works fine, the situation with ARM <=6 is more complicated:

Before this PR:

Atomic unit test would pass
Attempt to use 8-byte cmpxchg would fail at runtime (either assert or stop in stub)
Attempt to use 8-byte add or xchg would fail to link

After this PR:

Atomic unit test would pass (now that it checks for supports_cx8())
Attempt to use 8-byte cmpxchg would fail at runtime (either assert or stop in stub)
Attempt to use 8-byte add or xchg would fail at runtime (either assert or stop in stub)

So, I don't think this qualifies as regression for ARM <= 6: we are trading the link-time failure for all ARM platforms for clean runtime failure on ARM <= 6 platform, if we violate the supports_cx8() contract. What we gain with this PR, is that we are able to do 64-bit atomics when supports_cx8() is actually true on x86_32 and ARM v7, even if we don't check it.

I dabbled a little in trying to implement 8-byte CAS (locked) implementation for ARM <= 6, and I think it would require some rewrites. There are cases, for example, when we are doing creepy non-atomic stuff when !os:is_MP if no hardware support is available. If we go locking route, I think we would also need to check if the atomic load/stores need to be covered by the same lock.

Once/if we fix ARM <= 6, that would eliminate the need for supports_cx8() altogether, which we can then phase out completely. And I think that is a good thing, because frankly it should not be an Atomic caller responsibility to handle obscure platforms in 2023, and as we see from current uses, it is not applied consistently in current code anyway.

Does all this make sense? Am I missing something? Should we proceed with this change?

dholmes-ora · 2023-10-23T02:42:04Z

...and on the other hand, it requires platforms to implement 8-byte CAS:

Surely the supports_cx8 issue must have been raised when the template atomics were put in place. I need to go back and see what was said then, to see how we got in this mess.

I can accept the current proposal doesn't make the mess worse, it just shone a light on the fact there is a mess. If ARMv6 is broken then maybe it needs to be dropped? (Not in this PR of course.)

dholmes-ora · 2023-10-23T04:57:30Z

From a related discussion in JDK-8246770 I was under the impression that the kernel helper would be used to provide this support on ARMv6 - see code in ./cpu/arm/vm_version_arm_32.cpp.

dholmes-ora · 2023-10-23T05:40:22Z

Okay part of the (my) confusion here is that supports_cx8 is mainly intended for protecting Java long/double variables - see the descriptions and mechanics in ./share/oops/accessBackend.*. For C++ atomic ops we've typically steered the VM code away from needing 64-bit support if it doesn't natively exist as all accesses to such variables needs to work in with whatever locking scheme is used.

shipilev · 2023-10-23T09:10:50Z

Yes, it is confusing. Atomic docs still mention supports_cx8 as the pre-check for using 64-bit atomics, which gets extra confusing if it was only intended to serve as availability check for Java accesses:

jdk/src/hotspot/share/runtime/atomic.hpp

Lines 58 to 62 in 729f4c5

    
           // Atomic operations on int64 types are not available on all 32-bit 
        
           // platforms. If atomic ops on int64 are defined here they must only 
        
           // be used from code that verifies they are available at runtime and 
        
           // can provide an alternative action if not - see supports_cx8() for 
        
           // a means to test availability.

So I would like to go forward with this PR. The only question that remains for me is whether to protect the gtest cases with supports_cx8 (satisfying the Atomic contract), or drop the checks there, thus giving the early warning for platform maintainers that they are actually expected to implement 64-bit atomics regardless.

fisk · 2023-10-23T09:40:47Z

Yes, it is confusing. Atomic docs still mention supports_cx8 as the pre-check for using 64-bit atomics, which gets extra confusing if it was only intended to serve as availability check for Java accesses:

jdk/src/hotspot/share/runtime/atomic.hpp

Lines 58 to 62 in 729f4c5

// Atomic operations on int64 types are not available on all 32-bit

// platforms. If atomic ops on int64 are defined here they must only

// be used from code that verifies they are available at runtime and

// can provide an alternative action if not - see supports_cx8() for

// a means to test availability.

So I would like to go forward with this PR. The only question that remains for me is whether to protect the gtest cases with supports_cx8 (satisfying the Atomic contract), or drop the checks there, thus giving the early warning for platform maintainers that they are actually expected to implement 64-bit atomics regardless.

It would be nice if Atomic used a PlatformMutex or something internally, for platforms that don't have supports_cx8, so we could just assume that this stuff works across all platforms, just maybe "a bit" slower on some platforms. If we wanted to be really fancy, we could have an array that is say 8 * number of CPUs large, where addresses are hashed to a PlatformMutex to reduce contention a bit. But perhaps that's more than you bargained for with this PR!

kimbarrett

Throughout, Atomic::PlatformMumble is intended to be used in the implementation of
Atomic::Mumble, and isn't intended to be used elsewhere. I think intent is even adhered
to currently. This proposed change violates that in a bunch of places.

simonis · 2023-10-23T10:38:44Z

Throughout, Atomic::PlatformMumble is intended to be used in the implementation of Atomic::Mumble, and isn't intended to be used elsewhere. I think intent is even adhered to currently. This proposed change violates that in a bunch of places.

@kimbarrett, do you suggest to use Atomic::cmpxchg() instead of PlatformCmpxchg() in Atomic::XchgUsingCmpxchg<byte_size>::operator() and Atomic::AddUsingCmpxchg::fetch_then_add()? I think that would be reasonable. Otherwise I don't understand your comment as this PR follows the same pattern you've used when introducing atomic bitset functions. Or am I missing something?

kimbarrett · 2023-10-23T10:48:55Z

Throughout, Atomic::PlatformMumble is intended to be used in the implementation of Atomic::Mumble, and isn't intended to be used elsewhere. I think intent is even adhered to currently. This proposed change violates that in a bunch of places.

@kimbarrett, do you suggest to use Atomic::cmpxchg() instead of PlatformCmpxchg() in Atomic::XchgUsingCmpxchg<byte_size>::operator() and Atomic::AddUsingCmpxchg::fetch_then_add()? I think that would be reasonable. Otherwise I don't understand your comment as this PR follows the same pattern you've used when introducing atomic bitset functions. Or am I missing something?

Yes, and Atomic::load instead of PlatformLoad. So actually follow the model of the atomic bitset functions.

shipilev · 2023-10-23T14:26:00Z

It would be nice if Atomic used a PlatformMutex or something internally, for platforms that don't have supports_cx8, so we could just assume that this stuff works across all platforms, just maybe "a bit" slower on some platforms.

Yes, providing the locked implementation was my initial intent as well, but now that I looked into it, it seems to be too much hassle for little, if any, gain. In other words, we can already assume 8-byte atomics work on all platforms we build for. Even Zero -- which runs on obscurest platforms imaginable -- goes into GCC built-ins for PlatformCmpxchg<8>, PlatformXchg<8> and PlatformAdd<8>! So there is already a better path than locking.

Note that I only found ARMv6 "issue" due to a test bug. PlatformXchg<8> is actually implemented for ARM v6 with the help of kernel helpers. So if we do locking/gcc-builtins implementation for the case where kernel helpers are not available, it would require some massaging for how ARM32 platform atomics are arranged. From which I hope to weasel out in this PR :)

shipilev · 2023-10-23T14:27:15Z

Yes, and Atomic::load instead of PlatformLoad. So actually follow the model of the atomic bitset functions.

Right. Done in new commit. The previous version was from the time I did these fallbacks straight in platform atomic files. But in the shared code, we should definitely go for the non-platform methods. I am re-running tests, but expect no trouble.

simonis

Thanks, looks good to me.

kimbarrett

Do I understand correctly that all platforms support cas64 other than arm32,
and even it does so if using a non-ancient hardware version or the kernel
helper? If so, I'd be inclined to just say it must always be supported and be
done with the supports_cx8 stuff around the Atomic class. So update the
comments for Atomic accordingly, and remove the supports_cx8 checks in the
tests. But I'm okay with the current version and leaving that for followup.

Regarding the comment "inconsistency" for Atomic, I read those as being a
general comment about 64bit operations not necessarily being supported, but
that cas64 is an exception and must be supported. I'm guessing we didn't do
what's in this PR and provide cas64-based implementations of other operations
due to also needing load64/store64, though I suppose one could even implement
those with cas64, horrid as that might be. The templatizing effort just
maintained the status quo in this regard.

shipilev · 2023-10-24T06:46:19Z

Do I understand correctly that all platforms support cas64 other than arm32, and even it does so if using a non-ancient hardware version or the kernel helper?

That is my understanding after doing this work, yes.

dholmes-ora · 2023-10-24T09:49:38Z

Note you cannot just provide a lock-based implementation of cmpxchg8 and think all is well with the world. All accesses to variables that could be updated by such a cmpxchg8 would have to be accessed in a way that is consistent/safe with the use of the lock (e.g. a raw read could see a partial update if done concurrently with a locked-cmpxchg8). Even on the Java side (via Unsafe) this is not fully in place but that issue is recognized and documented.

dholmes-ora · 2023-10-24T09:52:07Z

I'm tempted to say the VM should simply fail to start if supports_cx8 is not true - the kernel helper has been around for many years for ARMv6 so I can't imagine there would be platforms in use now that don't support it.

shipilev · 2023-10-24T09:58:04Z

I'm tempted to say the VM should simply fail to start if supports_cx8 is not true - the kernel helper has been around for many years for ARMv6 so I can't imagine there would be platforms in use now that don't support it.

Same. That would also remove any need for lockers in Access API, AFAICS. Let's try that in a separate PR and see what happens?

shipilev · 2023-10-24T14:36:17Z

If there are no other comments/complaints, I am going to integrate this soon. Last call!

dholmes-ora · 2023-10-25T04:08:30Z

Okay. I will file a RFE to make supports_cx8 a requirement and open a discussion about that.

shipilev · 2023-10-25T08:29:29Z

Thanks all!

/integrate

openjdk · 2023-10-25T08:30:02Z

Going to push as commit ba7d08b.
Since your change was applied there have been 74 commits pushed to the master branch:

d7205e6: 8318102: macos10.14 check in CSystemColors can be removed.
5ce718e: 8318100: Remove redundant check for Metal support
f1dfdc1: 8311877: [macos] Add CLI options to provide signing identity directly to codesign and productbuild
9c819fd: 8318051: Duration.between uses exceptions for control flow
1ddf826: 8316964: Security tools should not call System.exit
1f2a80b: 8318306: java/util/Arrays/Sorting.java fails with "Array is not sorted at 8228-th position: 8251.0 and 8153.0"
1165037: 8318569: Add getter methods for Locale and Patterns in ListFormat
6f35274: 8318702: G1: Fix nonstandard indentation in g1HeapTransition.cpp
e272098: 8318160: javac does not reject private method reference with type-variable receiver
54c613a: 8318693: Fix rendering for code blocks nested under list items in building.md
... and 64 more: https://git.openjdk.org/jdk/compare/b07da3ae15dc820d596484d51d972404fed67fb1...master

Your commit was automatically rebased without conflicts.

openjdk · 2023-10-25T08:30:12Z

@shipilev Pushed as commit ba7d08b.

💡 You may see a message that your pull request was closed with unmerged commits. This can be safely ignored.

Work

Verified

This commit was signed with the committer’s verified signature.

renovate-bot Mend Renovate

GPG key ID: A1020E7F5FFD673D

Verified
Learn about vigilant mode

66a65ae

shipilev mentioned this pull request Oct 18, 2023

8315149: Add hsperf counters for CPU time of internal GC threads #15082

Closed

3 tasks

openjdk bot added the rfr label Oct 18, 2023

openjdk bot added the hotspot-runtime label Oct 18, 2023

openjdk bot removed the hotspot-runtime label Oct 18, 2023

openjdk bot added the hotspot label Oct 18, 2023

fisk approved these changes Oct 18, 2023

View reviewed changes

openjdk bot added the ready label Oct 18, 2023

dholmes-ora approved these changes Oct 19, 2023

View reviewed changes

shipilev added 2 commits October 20, 2023 12:50

Merge branch 'master' into JDK-8316961-fallback-atomics

Verified

This commit was created on GitHub.com and signed with GitHub’s verified signature.

GPG key ID: B5690EEEBB952194

Verified
Learn about vigilant mode

f2c523a

Protect 64-bit tests with supports_cx8()

e7677da

kimbarrett suggested changes Oct 23, 2023

View reviewed changes

Use public methods instead of Platform*

bbc6420

simonis approved these changes Oct 23, 2023

View reviewed changes

kimbarrett approved these changes Oct 24, 2023

View reviewed changes

openjdk bot added the integrated label Oct 25, 2023

openjdk bot closed this Oct 25, 2023

openjdk bot removed ready rfr labels Oct 25, 2023

shipilev deleted the JDK-8316961-fallback-atomics branch November 28, 2023 14:54

8316961: Fallback implementations for 64-bit Atomic::{add,xchg} on 32-bit platforms #16252

8316961: Fallback implementations for 64-bit Atomic::{add,xchg} on 32-bit platforms #16252

Conversation

shipilev commented Oct 18, 2023 • edited by openjdk bot Loading

Progress

Issue

Reviewers

Reviewing

Webrev

bridgekeeper bot commented Oct 18, 2023

openjdk bot commented Oct 18, 2023

mlbridge bot commented Oct 18, 2023 • edited Loading

Webrevs

shipilev commented Oct 18, 2023

openjdk bot commented Oct 18, 2023

openjdk bot commented Oct 18, 2023

fisk left a comment

Choose a reason for hiding this comment

openjdk bot commented Oct 18, 2023 • edited Loading

dholmes-ora left a comment

Choose a reason for hiding this comment

shipilev commented Oct 19, 2023 • edited Loading

shipilev commented Oct 19, 2023 • edited Loading

dean-long commented Oct 19, 2023

dholmes-ora commented Oct 20, 2023

shipilev commented Oct 20, 2023 • edited Loading

dholmes-ora commented Oct 23, 2023

dholmes-ora commented Oct 23, 2023

dholmes-ora commented Oct 23, 2023

shipilev commented Oct 23, 2023

fisk commented Oct 23, 2023

kimbarrett left a comment

Choose a reason for hiding this comment

simonis commented Oct 23, 2023

kimbarrett commented Oct 23, 2023

shipilev commented Oct 23, 2023

shipilev commented Oct 23, 2023

simonis left a comment

Choose a reason for hiding this comment

kimbarrett left a comment

Choose a reason for hiding this comment

shipilev commented Oct 24, 2023

dholmes-ora commented Oct 24, 2023

dholmes-ora commented Oct 24, 2023

shipilev commented Oct 24, 2023

shipilev commented Oct 24, 2023

dholmes-ora commented Oct 25, 2023

shipilev commented Oct 25, 2023

openjdk bot commented Oct 25, 2023

openjdk bot commented Oct 25, 2023

shipilev commented Oct 18, 2023 •

edited by openjdk bot

Loading

mlbridge bot commented Oct 18, 2023 •

edited

Loading

openjdk bot commented Oct 18, 2023 •

edited

Loading

shipilev commented Oct 19, 2023 •

edited

Loading

shipilev commented Oct 19, 2023 •

edited

Loading

shipilev commented Oct 20, 2023 •

edited

Loading