-
Notifications
You must be signed in to change notification settings - Fork 5.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
8283276: java/io/ObjectStreamClass/ObjectStreamClassCaching.java fails with various GCs #9533
Conversation
👋 Welcome back shade! A progress list of the required criteria for merging this PR into |
Still waiting for revies. @rkennke, maybe? :) |
WhiteBox has a number of methods that allow better control over GC, e.g. fullGC(), concurrentGCRunToIdle(), etc. Would that help? See for example: test/hotspot/jtreg/gc/TestReferenceRefersTo.java |
I think the test relies on assumption that |
Well, WB GC control can run concurrent GC, and wait until certain control points are reached or conc cycle is finished. An example that does that and checks referents get cleared is: test/hotspot/jtreg/gc/TestReferenceClearDuringReferenceProcessing.java TBH, not certain how GC policies play into that, and how well WB is supported in each GC. Might be worth trying? Other that that, how does removinch System.gc() help make the test more reliable? |
When you call |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good, then. Thanks!
@shipilev This change now passes all automated pre-integration checks. ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details. After integration, the commit message for the final commit will be:
You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed. At the time when this comment was updated there had been 91 new commits pushed to the
As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details. ➡️ To integrate this PR with the above commit message to the |
Thanks! Any other reviews, please? |
Just a note that the failing test is not about checking that cache is cleared after plain System.gc(), but about checking that cache is NOT cleared after System.gc() when there was no real memory pressure. The cache uses SoftReference(s). So it appears that some gc(s) do clear SoftReference(s) on System.gc(). Does WhiteBox GC util have a means to provoke gc without SoftReference processing? |
As I said above, there seem to be no such method. It would require hacking into each GC policy to prevent weakrefs clearing. |
...my above claims might appear strange to some reader as the test clearly wraps a cached ObjectStreamClass instance with a WeakReference and then checks whether this WeakReference is cleared or not. So an explanation is in order... The internal cache uses SoftReference(s). Since the cache is encapsulated, the test does not check it directly but uses a trick. It wraps the cached instance with a WeakReference. Since the instance is already wrapped with SoftReference in the cache and there is no strong reference holding it, the instance can be considered Softly Reachable. And we know that GC must clear all XxxReference(s) regardless of their type pointing to the same referent atomically. By detecting in the test that a WeakReference is cleared, we assume that the internal SoftReference is cleared too. If this was not the case, GC could be considdered to have a bug and is not by the spec. So what we are observing now is either System.gc() clearing SoftReference or a GC bug. I don't belive it is a GC bug though. |
Just to be on the same page... It is not about weakref clearing but about softref clearing. The test assumes that System.gc() would not clear a softref on the 1st call at least. |
Contemplating how this test could be fixed without using WhiteBox gc testing... The test could wrap the cached instance with a WeakReference as it does now (ref1) and then have a second WeakReference that would wrap an instance of "new Object()", (ref2)... Then the test coul gradually allocate more and more heap in a loop, checking both WeakReference(s) as it goes. When ref2 is cleared but ref1 is not, the test would succeed. Any other outcome ( such as both refs being cleared at the same point) could be considered a failure. This would assume that GCs would 1st clear XxxReferences of weakly reachable referents and only much later those of softly reachable. I hope this property is universal to all GCs although it is not guaranteed by the spec. |
...I can take this over, unless you want to do it, Aleksey? |
I find it dubious to try and guess what GCs would do with non-strong refs, but feel free. Don't reassign the bug yet, just see how messy that would be? |
On the other hand, this test is in tier2, so it makes lots of testing with other GCs not clean. I would like to have this fix in, and then do any followups that might make the test more targeted. |
By removing System.gc() you effectively make the test a NO-OP. It then basically tests just that newly constructed SoftReference is not cleared in the next moment after construction. I would then rather just disable the test until it is fixed properly... |
@shipilev you said that the test fails with some other GCs (Parallel, Shenandoah, ...). Who is responsible to select the GC algorithm? What I'm asking is whether one should explicitly enumerate several @Runs of the test with explicit selection of different GCs (which ones?) or is this a matter of some external test infrastructure that runs all the tests in the suite and pre-selects different GCs for them? |
I managed to construct/fix test that passes on G1, Shenandoah and ZGC... The combination of JVM options: -Xmx10m -XX:SoftRefLRUPolicyMSPerMB=1 ... makes ZGC and Shenandoah very aggressive and practically causes SoftReferences to be treated equally as WeakReferences. By increasing the max. heap size a bit and using defaults for SoftRefLRUPolicyMSPerMB (I believe 1000 ms/MB for G1 and 200 ms/MB for Shenandoah and ZGC), G1 and Shenandoah were happy with the changed test. For ZGC I also had to make sure that the CacheEffectiveness test is executed as 1st test in VM instance, followed by CacheReleaseUnderMemoryPressure. |
So how do we proceed @shipilev ? I open another issue or are you willing to accept my changes into your issue? |
I prefer to commit the simpler (mine) version first, and the follow up with any more sophisticated attempt to fix it.
Yes, but not really. There is still a 100ms sleep and reference processing involved, which somewhat verifies that the cache is not blown away immediately/accidentally.
CIs routinely test existing suites with different GCs. The PR body contains a sample reproducer how that happens. If you are doing the separate |
Just a 100ms sleep means hardly any chance that reference processing will be involved as VM practically does nothing at that time to trigger GC. If you are referring to this: // to trigger any ReferenceQueue processing...
lookupObjectStreamClass(AnotherTestClass.class); ...then the comment is perhaps misleading. Looking up another class just caches another entry and at the same time processes any enqueued References to remove stale cache entries. But for a Reference to be enqueued into a ReferenceQueue, GC threads must 1st clear it and link it into a "pending" list, ReferenceHandler java thread must then unlink it from the "pending" list and enqueue it into Reference's associated ReferenceQueue and only then can the request for another class process it from that ReferenceQueue. But the test is not observing that 3rd part (cleanup of stale cache entries). It observes the 1st part (atomic clearing of all XxxReferences that refer to the same referent) which is performed by GC. So this is hardly going to happen as the sole trigger for that (since System.gc() was removed) remains allocation of new objects and not enough may be allocated to trigger GC. But if your purpose was to shut up the failing test, then this is one way to do it.
Thanks for the hint. I'll add that. |
What I was trying to say above is that without System.gc() the test would succeed even if caching was not actually there and lookup always constructed new ObjectStreamClass instance. There's not enough allocation activity to clear the WeakReference constructed in the test with the ObjectStreamClass referent even if that referent was only weakly reachable... |
All right, fine. This bug takes too much time. Take the bug, but problem-list the test first. |
Test appears to pass fine with G1. But it fails with other GCs, for example Parallel, Shenandoah, etc, it fails:
I believe this is because
System.gc()
is not that reliable about what happens with weak references. As seen with other GCs, they can clear the weakrefs on Full GC. In fact, the test fails with G1 if we do a second System.gc() in this test. So the test itself is flaky. The fix is to avoid doingSystem.gc()
altogether in that subtest. The test is still retained to see that reference is not cleared for a while.Additional testing:
-XX:+UseSerialGC
, 100 repetitions-XX:+UseParallelGC
, 100 repetitions-XX:+UseG1GC
, 100 repetitions-XX:+UseShenandoahGC
, 100 repetitions-XX:+UseZGC
, 100 repetitionsProgress
Issue
Reviewers
Reviewing
Using
git
Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk pull/9533/head:pull/9533
$ git checkout pull/9533
Update a local copy of the PR:
$ git checkout pull/9533
$ git pull https://git.openjdk.org/jdk pull/9533/head
Using Skara CLI tools
Checkout this PR locally:
$ git pr checkout 9533
View PR using the GUI difftool:
$ git pr show -t 9533
Using diff file
Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/9533.diff