Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Pack old evacuations tightly #145

Closed
wants to merge 38 commits into from

Conversation

kdnilsen
Copy link
Contributor

@kdnilsen kdnilsen commented Jun 24, 2022

This pull request includes several improvements to reduce the likelihood of evacuation failures and to reduce the frequency of old-gen collections. Key contributions:

  1. If a PLAB allocation fails, resize the the thread's preferred PLAB size to minimum size and retry the allocation request
  2. Place a limit on how large the preferred PLAB size grows for individual threads. Otherwise a small number of threads can consume the full promotion budget, leaving nothing left for the PLABs required by other threads.
  3. If a PLAB promotion fails, and the remaining words within the PLAB is greater than minimum size, that means this is a request to promote a relatively large object. In this case, do not retire the PLAB. Instead, use a shared allocation to promote this large object.
  4. If a PLAB promotion fails and the remaining words is less than the PLAB minimum size, try to replace the PLAB and then retry the PLAB allocation. If this still fails, evacuate the object to young-gen using a GCLAB. Don't try to promote with a shared allocation because that is too "expensive".

This commit has demonstrated fairly significant benefits on certain workloads. For specjbb2015, we got a 17% improvement in critical jops, from 3093.847 to 3622. For an extremem 2020 workload,
we improved p99.999 for customer preparation processing by more than four fold, from 607,107 microseconds to 142,286 microseconds, decreasing the number of concurrent GCs from 99 to 37, old GCs from 12 to 7, full GCs from 3 to 0.


Progress

  • Change must not contain extraneous whitespace
  • Change must be properly reviewed (1 review required, with at least 1 Committer)

Reviewers

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.org/shenandoah pull/145/head:pull/145
$ git checkout pull/145

Update a local copy of the PR:
$ git checkout pull/145
$ git pull https://git.openjdk.org/shenandoah pull/145/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 145

View PR using the GUI difftool:
$ git pr show -t 145

Using diff file

Download this PR as a diff file:
https://git.openjdk.org/shenandoah/pull/145.diff

Sorry, something went wrong.

kdnilsen and others added 30 commits March 9, 2022 00:10
This reduces likelihood of evacuation failures and delays the need for a
subsequent old evacuation.
The reason to enforce a promotion budget is to make more memory
available for loaning to young-gen allocations during evacuation and
update-refs phases of gc.  A second reason to enforce a promotion budget
is to assure that old-gen memory does not get so full that there is no
memory available into which to copy objects during old-gen evacuations.

With this patch, if it is available, we set the promotion budget large
enough to hold all of the live objects contained within the young-gen
collection set.

Manually resolved merge conflicts and brought in some additional context
from the development branch.
Reset promoted	expended at start of concurrent	GC rather than end.  This corrects a bug that manifests when concurrent
GC degenerates and we don't finish in the normal way.  Following a degenerated GC, we were failing to reset promoted
expended.

Fix up the remediation code that executes when PLAB allocation fails.  We had previously failed	to provide any special
handling for threads that do not have PLABs so objects encountered by such a thread were always evacuated to young so
they could be promoted in a subsequent GC pass.	 While it was relatively harmless to leave these rare objects in young-gen
memory, a more	significant impact was that this event would be logged as promotion failure and this would trigger immediate
execution of an	old-gen	collection, even when old-gen memory is	plentiful.  Now, if a thread that does not have a PLAB
needs to promote an object, we use a shared allocation to obtain memory for the object's copy.  There seems to be a single
JVM thread that is set up without a PLAB and this thread occassionally stumbles onto objects that need to be promoted.

Be more	generous in the	computation of promotion reserve.  The miserly budget previously set aside was causing
promotions to fail, resulting in significant increase in attempts at shared-memory allocations and causing older
objects	to accumulate in young-gen.  During subsequent GC passes, these	objects	would once again be likely to fail
to promote, causing a a	repeat of the problems just described.

Insert code to account for round-off errors that occur in calculations and enforcement of promotion budgets.  The
previous failure to account for	these round-off	errors was resulting in	wraparound arithmetic on the size_t type,
causing	promotion budget to be extremely large in some rare cases.  This would result in depletion of old-gen space
to the extent that subsequent evacuations of old-gen could not be performed.

Add logging reports when promotions fail.

Allow memory set aside to fulfill the minimum evacuation reserve in old-gen to be loaned to young-gen if the memory
is not needed in the current pass to hold old-gen evacuations.
1. Correct off-by-one error in promotion eligibility test
 2. Disable the post-compaction overwriting of region affiliations which was causing regions that had previously
    been FREE and are now holding compacted old-gen data to be mislabeled as YOUNG.  In some cases, post-compaction
    relabeling of OLD regions caused the total live memory in YOUNG to exceed its capacity, thus rejecting further
    allocation requests, and immediately triggering another degenerated or Full GC.
When a tenure-aged region is added to the collection set, we know that
the entirety of its live data is intended to be promoted.  Account for
this knowledge in the budgeting of the young-gen and old-gen evacuation
budgets.
Should be >= rather than >.
My first attempt to fix the unwanted relabeling of OLD regions to YOUNG
required several invocations of
ShenandahHeapRegion::make_regular_bypass() from outside of the heap
lock.  This violated protocol, resulting in upheaval.
Report the amount of memory available in old and young at ends of each
GC.
This change makes the code more clear and easier to maintain.
Unnamed threads are unable to query the current value of GCId.  In the
rare event that an unnamed thread experiences a promotion failure, the
log message needs to be handled differently than the normal case.
kdnilsen added 6 commits June 19, 2022 14:47
When promotion memory runs short, be more frugal in how we respond.
Also, fixup some comments and add some instrumentation.
Also some new instrumentation, which will be removed before upstreaming
final patch.
@bridgekeeper
Copy link

bridgekeeper bot commented Jun 24, 2022

👋 Welcome back kdnilsen! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

@openjdk
Copy link

openjdk bot commented Jun 24, 2022

⚠️ @kdnilsen This pull request contains merges that bring in commits not present in the target repository. Since this is not a "merge style" pull request, these changes will be squashed when this pull request in integrated. If this is your intention, then please ignore this message. If you want to preserve the commit structure, you must change the title of this pull request to Merge <project>:<branch> where <project> is the name of another project in the OpenJDK organization (for example Merge jdk:master).

size_t new_cset = cur_cset + r->get_live_data_bytes();

size_t new_cset;
if (is_generational && (r->age() >= InitialTenuringThreshold)) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are we changing region affiliation en masse?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't "promote" regions by relabeling region affiliation. But I found that the computed budget for promotion reserve is way too low in the case that entire regions have reached tenure age. So I made an adjustment to compute larger promotion reserves when we know that all live data within a collection set region is known to be promoted.

@openjdk openjdk bot added the rfr Pull request is ready for review label Jun 24, 2022
@mlbridge
Copy link

mlbridge bot commented Jun 24, 2022

Webrevs

@@ -153,6 +153,8 @@ class ShenandoahHeuristics : public CHeapObj<mtGC> {

virtual void record_requested_gc();

virtual size_t prioritize_aged_regions(size_t old_available, size_t num_regions, bool preselected_regions[]);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe call this select_aged_regions?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

will make this change.

// set (because it has less live data and thus can fit within the evacuation limits even though it has less
// garbage).

size_t young_evacuation_reserve = (young_generation->soft_max_capacity() * ShenandoahEvacReserve) / 100;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we try to align this more closely with how ShenandoahFreeSet behaves? It uses max_capacity, for example. It could also tell us exactly how much it really reserved.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will change this.

@@ -991,6 +1017,15 @@ void ShenandoahHeap::retire_plab(PLAB* plab) {
}
}

void ShenandoahHeap::retire_plab(PLAB* plab) {
if (!mode()->is_generational()) {
plab->retire();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we have plabs in the other modes?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PLABs only exist in generational mode

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I should fix this. sorry

@openjdk
Copy link

openjdk bot commented Jun 24, 2022

@kdnilsen This change now passes all automated pre-integration checks.

ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details.

After integration, the commit message for the final commit will be:

Pack old evacuations tightly

Reviewed-by: wkemper

You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed.

At the time when this comment was updated there had been no new commits pushed to the master branch. If another commit should be pushed before you perform the /integrate command, your PR will be automatically rebased. If you prefer to avoid any potential automatic rebasing, please check the documentation for the /integrate command for further details.

➡️ To integrate this PR with the above commit message to the master branch, type /integrate in a new comment.

@openjdk openjdk bot added the ready Pull request is ready to be integrated label Jun 24, 2022
@kdnilsen
Copy link
Contributor Author

/integrate

@openjdk
Copy link

openjdk bot commented Jun 24, 2022

Going to push as commit bd13ba3.

@openjdk openjdk bot added the integrated Pull request has been integrated label Jun 24, 2022
@openjdk openjdk bot closed this Jun 24, 2022
@openjdk openjdk bot removed ready Pull request is ready to be integrated rfr Pull request is ready for review labels Jun 24, 2022
@openjdk
Copy link

openjdk bot commented Jun 24, 2022

@kdnilsen Pushed as commit bd13ba3.

💡 You may see a message that your pull request was closed with unmerged commits. This can be safely ignored.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
integrated Pull request has been integrated
Development

Successfully merging this pull request may close these issues.

None yet

2 participants