-
Notifications
You must be signed in to change notification settings - Fork 37
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Pack old evacuations tightly #145
Conversation
This reverts commit 2132a6d.
Reviewed-by: kdnilsen
This reduces likelihood of evacuation failures and delays the need for a subsequent old evacuation.
The reason to enforce a promotion budget is to make more memory available for loaning to young-gen allocations during evacuation and update-refs phases of gc. A second reason to enforce a promotion budget is to assure that old-gen memory does not get so full that there is no memory available into which to copy objects during old-gen evacuations. With this patch, if it is available, we set the promotion budget large enough to hold all of the live objects contained within the young-gen collection set. Manually resolved merge conflicts and brought in some additional context from the development branch.
Reset promoted expended at start of concurrent GC rather than end. This corrects a bug that manifests when concurrent GC degenerates and we don't finish in the normal way. Following a degenerated GC, we were failing to reset promoted expended. Fix up the remediation code that executes when PLAB allocation fails. We had previously failed to provide any special handling for threads that do not have PLABs so objects encountered by such a thread were always evacuated to young so they could be promoted in a subsequent GC pass. While it was relatively harmless to leave these rare objects in young-gen memory, a more significant impact was that this event would be logged as promotion failure and this would trigger immediate execution of an old-gen collection, even when old-gen memory is plentiful. Now, if a thread that does not have a PLAB needs to promote an object, we use a shared allocation to obtain memory for the object's copy. There seems to be a single JVM thread that is set up without a PLAB and this thread occassionally stumbles onto objects that need to be promoted. Be more generous in the computation of promotion reserve. The miserly budget previously set aside was causing promotions to fail, resulting in significant increase in attempts at shared-memory allocations and causing older objects to accumulate in young-gen. During subsequent GC passes, these objects would once again be likely to fail to promote, causing a a repeat of the problems just described. Insert code to account for round-off errors that occur in calculations and enforcement of promotion budgets. The previous failure to account for these round-off errors was resulting in wraparound arithmetic on the size_t type, causing promotion budget to be extremely large in some rare cases. This would result in depletion of old-gen space to the extent that subsequent evacuations of old-gen could not be performed. Add logging reports when promotions fail. Allow memory set aside to fulfill the minimum evacuation reserve in old-gen to be loaned to young-gen if the memory is not needed in the current pass to hold old-gen evacuations.
1. Correct off-by-one error in promotion eligibility test 2. Disable the post-compaction overwriting of region affiliations which was causing regions that had previously been FREE and are now holding compacted old-gen data to be mislabeled as YOUNG. In some cases, post-compaction relabeling of OLD regions caused the total live memory in YOUNG to exceed its capacity, thus rejecting further allocation requests, and immediately triggering another degenerated or Full GC.
When a tenure-aged region is added to the collection set, we know that the entirety of its live data is intended to be promoted. Account for this knowledge in the budgeting of the young-gen and old-gen evacuation budgets.
Should be >= rather than >.
Reviewed-by: kdnilsen
Reviewed-by: kdnilsen
My first attempt to fix the unwanted relabeling of OLD regions to YOUNG required several invocations of ShenandahHeapRegion::make_regular_bypass() from outside of the heap lock. This violated protocol, resulting in upheaval.
Report the amount of memory available in old and young at ends of each GC.
This change makes the code more clear and easier to maintain.
Unnamed threads are unable to query the current value of GCId. In the rare event that an unnamed thread experiences a promotion failure, the log message needs to be handled differently than the normal case.
When promotion memory runs short, be more frugal in how we respond. Also, fixup some comments and add some instrumentation.
Also some new instrumentation, which will be removed before upstreaming final patch.
…tly' into pack-old-evacuations-tightly
👋 Welcome back kdnilsen! A progress list of the required criteria for merging this PR into |
|
size_t new_cset = cur_cset + r->get_live_data_bytes(); | ||
|
||
size_t new_cset; | ||
if (is_generational && (r->age() >= InitialTenuringThreshold)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are we changing region affiliation en masse?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We don't "promote" regions by relabeling region affiliation. But I found that the computed budget for promotion reserve is way too low in the case that entire regions have reached tenure age. So I made an adjustment to compute larger promotion reserves when we know that all live data within a collection set region is known to be promoted.
Webrevs
|
@@ -153,6 +153,8 @@ class ShenandoahHeuristics : public CHeapObj<mtGC> { | |||
|
|||
virtual void record_requested_gc(); | |||
|
|||
virtual size_t prioritize_aged_regions(size_t old_available, size_t num_regions, bool preselected_regions[]); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe call this select_aged_regions
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
will make this change.
// set (because it has less live data and thus can fit within the evacuation limits even though it has less | ||
// garbage). | ||
|
||
size_t young_evacuation_reserve = (young_generation->soft_max_capacity() * ShenandoahEvacReserve) / 100; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we try to align this more closely with how ShenandoahFreeSet
behaves? It uses max_capacity
, for example. It could also tell us exactly how much it really reserved.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will change this.
@@ -991,6 +1017,15 @@ void ShenandoahHeap::retire_plab(PLAB* plab) { | |||
} | |||
} | |||
|
|||
void ShenandoahHeap::retire_plab(PLAB* plab) { | |||
if (!mode()->is_generational()) { | |||
plab->retire(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we have plabs in the other modes?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
PLABs only exist in generational mode
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I should fix this. sorry
@kdnilsen This change now passes all automated pre-integration checks. ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details. After integration, the commit message for the final commit will be:
You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed. At the time when this comment was updated there had been no new commits pushed to the ➡️ To integrate this PR with the above commit message to the |
/integrate |
Going to push as commit bd13ba3. |
This pull request includes several improvements to reduce the likelihood of evacuation failures and to reduce the frequency of old-gen collections. Key contributions:
This commit has demonstrated fairly significant benefits on certain workloads. For specjbb2015, we got a 17% improvement in critical jops, from 3093.847 to 3622. For an extremem 2020 workload,
we improved p99.999 for customer preparation processing by more than four fold, from 607,107 microseconds to 142,286 microseconds, decreasing the number of concurrent GCs from 99 to 37, old GCs from 12 to 7, full GCs from 3 to 0.
Progress
Reviewers
Reviewing
Using
git
Checkout this PR locally:
$ git fetch https://git.openjdk.org/shenandoah pull/145/head:pull/145
$ git checkout pull/145
Update a local copy of the PR:
$ git checkout pull/145
$ git pull https://git.openjdk.org/shenandoah pull/145/head
Using Skara CLI tools
Checkout this PR locally:
$ git pr checkout 145
View PR using the GUI difftool:
$ git pr show -t 145
Using diff file
Download this PR as a diff file:
https://git.openjdk.org/shenandoah/pull/145.diff