8292296: Use multiple threads to process ParallelGC deferred updates #10313

nick-arm · 2022-09-16T16:09:26Z

This is a follow-up to an initial patch I posted a while back to hotspot-gc-dev:

https://mail.openjdk.org/pipermail/hotspot-gc-dev/2022-August/039905.html

The problem here is that some applications, including SPECjbb, spend a lot of time in the "Deferred Updates" stage of parallel compaction if they happen to generate a lot of objects that cross region boundaries.

The patch above parallelises the existing serial processing of deferred updates, which runs on the main VM thread. However, I think we can solve this in a simpler way: each GC worker thread keeps a private list of the deferred objects it encounters during compaction and then, once all regions have been compacted, processes its private list of deferred updates.

We know that compaction_with_stealing_work() won't return until all regions have been compacted, because otherwise terminator->offer_termination() would return false and the worker thread would attempt to steal tasks from another thread.
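To make the idea concrete, here is a minimal standalone sketch of the scheme described above. It is not the HotSpot code: `Region`, `compact_with_private_deferred_lists`, the shared atomic region counter (used in place of task queues and stealing) and the spin-wait (used in place of the terminator) are all hypothetical simplifications chosen for illustration.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical stand-in for a heap region; not HotSpot's region data.
struct Region {
  bool has_deferred_object = false;  // an object crosses into this region
  bool compacted = false;
  bool deferred_updated = false;
};

// Toy "compaction": returns how many deferred objects were processed.
std::size_t compact_with_private_deferred_lists(std::vector<Region>& regions,
                                                unsigned num_workers) {
  std::atomic<std::size_t> next_region{0};    // assumption: replaces queues + stealing
  std::atomic<unsigned> workers_finished{0};  // assumption: replaces the terminator
  std::atomic<std::size_t> processed{0};

  auto worker = [&]() {
    std::vector<Region*> deferred;  // private to this worker, so no locking
    for (;;) {
      std::size_t i = next_region.fetch_add(1);
      if (i >= regions.size()) break;
      regions[i].compacted = true;
      if (regions[i].has_deferred_object) {
        deferred.push_back(&regions[i]);  // at most one entry per region
      }
    }
    // Wait until every worker has run out of regions, i.e. all regions
    // have been compacted, before touching any deferred object.
    workers_finished.fetch_add(1);
    while (workers_finished.load() < num_workers) std::this_thread::yield();

    // Each worker drains only its own list; there is no balancing.
    for (Region* r : deferred) {
      r->deferred_updated = true;
      processed.fetch_add(1);
    }
  };

  std::vector<std::thread> threads;
  for (unsigned w = 0; w < num_workers; ++w) threads.emplace_back(worker);
  for (auto& t : threads) t.join();
  return processed.load();
}
```

In this sketch no separate parallel phase is started for the deferred objects: the same worker threads simply continue with their private lists once compaction has finished, which is the property the description relies on.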

The advantage of this approach over a separate parallel deferred updates step is that we don't have to worry about adding heuristics for when and how many worker threads to start up, which has the potential to cause regressions in some cases. Processing the deferred objects on the worker thread shouldn't be any slower than the existing serial scan on the VM thread, even if all the deferred objects end up on the queue of one thread (there's no attempt to balance or work-steal between threads). We also avoid having to scan each region for deferred objects in the common case where there are none in a space.

The new per-thread deferred objects list is dynamically allocated but its size is bounded by the number of 512k heap regions as we will push at most one pointer per region.
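As a rough worked example of that bound, the arithmetic can be sketched as below. The helper names and the 8-byte pointer size are assumptions for illustration and are not taken from the patch; only the 512k region size and the one-pointer-per-region limit come from the description.

```cpp
#include <cassert>
#include <cstdint>

// Region size quoted in the description (512k).
constexpr std::uint64_t kRegionBytes = 512 * 1024;

// Upper bound on list entries: at most one pointer per heap region.
constexpr std::uint64_t max_deferred_entries(std::uint64_t heap_bytes) {
  return (heap_bytes + kRegionBytes - 1) / kRegionBytes;  // round up
}

// Worst-case memory if every region's entry ended up on a single list.
// pointer_bytes = 8 is an assumption (64-bit pointers).
constexpr std::uint64_t max_deferred_list_bytes(std::uint64_t heap_bytes,
                                                std::uint64_t pointer_bytes = 8) {
  return max_deferred_entries(heap_bytes) * pointer_bytes;
}
```

Under these assumptions a 32 GiB heap has 65536 regions, so even if every deferred object landed on one worker's list, that list would hold at most 65536 pointers, or 512 KiB.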

With SPECjbb on AWS c7g.16xlarge I see median full GC pause times drop by around 20%, with a corresponding ~1% increase in critical-jOPS averaged over several runs. On the "derby" benchmark from SPECjvm I also see an improvement in median full GC pause times of around 11%. I tried a variety of other benchmarks from DaCapo and SPECjvm but couldn't see any other significant effect: it seems quite dependent on the type and size of objects allocated.

Tested tier1-3 with -XX:+UseParallelGC.

Progress

Change must be properly reviewed (1 review required, with at least 1 Reviewer)
Change must not contain extraneous whitespace
Commit message must refer to an issue

Issue

JDK-8292296: Use multiple threads to process ParallelGC deferred updates

Reviewers

Thomas Schatzl (@tschatzl - Reviewer)
Albert Mingkun Yang (@albertnetymk - Reviewer)

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk pull/10313/head:pull/10313
$ git checkout pull/10313

Update a local copy of the PR:
$ git checkout pull/10313
$ git pull https://git.openjdk.org/jdk pull/10313/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 10313

View PR using the GUI difftool:
$ git pr show -t 10313

Using diff file

Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/10313.diff


bridgekeeper · 2022-09-16T16:11:38Z

👋 Welcome back ngasson! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

openjdk · 2022-09-16T16:13:51Z

@nick-arm The following label will be automatically applied to this pull request:

hotspot-gc

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing list. If you would like to change these labels, use the /label pull request command.

mlbridge · 2022-09-16T16:16:34Z

Webrevs

01: Full - Incremental (a4a24531)
00: Full (6a99b848)

src/hotspot/share/gc/parallel/psParallelCompact.cpp

tschatzl

Lgtm sans the assert issue Albert mentioned. I agree that this is a much nicer solution than adding heuristics again.

openjdk · 2022-09-20T12:17:05Z

@nick-arm This change now passes all automated pre-integration checks.

ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details.

After integration, the commit message for the final commit will be:

8292296: Use multiple threads to process ParallelGC deferred updates

Reviewed-by: tschatzl, ayang

You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed.

At the time when this comment was updated there had been 54 new commits pushed to the master branch:

e9401e6: 8293364: IGV: Refactor Action in EditorTopComponent and fix minor bugs
844a95b: 8292892: Javadoc index descriptions are not deterministic
8d1dd6a: 8294076: Improve ant detection in idea.sh
4e7cb15: 8293480: IGV: Update Bytecode and ControlFlow Component immediately when opening a new graph
8ecdaa6: 8294000: Filler array klass should be in jdk/vm/internal, not in java/vm/internal
379f309: 8287217: C2: PhaseCCP: remove not visited nodes, prevent type inconsistency
12e3510: 8293798: Fix test bugs due to incompatibility with -XX:+AlwaysIncrementalInline
cb72f80: 8293978: Duplicate simple loop back-edge will crash the vm
cddd6de: 8279941: sun/security/pkcs11/Signature/TestDSAKeyLength.java fails when NSS version detection fails
21008ca: 8285383: vmTestbase/nsk/jvmti/scenarios/hotswap/HS204/hs204t001/hs204t001.java failed with "exit code: 96"
... and 44 more: https://git.openjdk.org/jdk/compare/dfb9c0663370fc8335caf06ca6f0cb4dac95ce2d...master

As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details.

➡️ To integrate this PR with the above commit message to the master branch, type /integrate in a new comment.

albertnetymk · 2022-09-21T09:38:19Z

src/hotspot/share/gc/parallel/psParallelCompact.cpp

  }
+
+  cm->update_contents(cast_to_oop(addr));
+  assert(oopDesc::is_oop(cast_to_oop(addr)), "Expected an oop at " PTR_FORMAT, p2i(cast_to_oop(addr)));


I believe this can be moved up a bit, e.g. between L2601 and L2602.

Never mind; the assert should be after the obj body is properly updated. It's good as is.

nick-arm · 2022-09-22T08:35:08Z

Thanks for the reviews! Any more comments or is this change ok to integrate now?

tschatzl · 2022-09-22T09:02:15Z

I'd say, ship it... 🚢

nick-arm · 2022-09-22T10:14:06Z

/integrate

openjdk · 2022-09-22T10:16:45Z

Going to push as commit 3fa6778.
Since your change was applied there have been 74 commits pushed to the master branch:

800e68d: 8292044: HttpClient doesn't handle 102 or 103 properly
83abfa5: 8255670: Improve C2's detection of modified nodes
5652030: 8292376: A few Swing methods use inheritDoc on exceptions which are not inherited
03f287d: 8293995: Problem list sun/tools/jstatd/TestJstatdRmiPort.java on all platforms because of 8293577
d5bee4a: 8294086: RISC-V: Cleanup InstructionMark usages in the backend
47f233a: 8292202: modules_do is called without Module_lock
742bc04: 8294100: RISC-V: Move rt_call and xxx_move from SharedRuntime to MacroAssembler
2283c32: 8294149: JMH 1.34 and later requires jopt-simple 5.0.4
9f90eb0: 8294062: Improve parsing performance of j.l.c.MethodTypeDesc
c6be2cd: 8293156: Dcmd VM.classloaders fails to print the full hierarchy
... and 64 more: https://git.openjdk.org/jdk/compare/dfb9c0663370fc8335caf06ca6f0cb4dac95ce2d...master

Your commit was automatically rebased without conflicts.

openjdk · 2022-09-22T10:17:09Z

@nick-arm Pushed as commit 3fa6778.

💡 You may see a message that your pull request was closed with unmerged commits. This can be safely ignored.

openjdk bot added the rfr Pull request is ready for review label Sep 16, 2022

openjdk bot added the hotspot-gc hotspot-gc-dev@openjdk.org label Sep 16, 2022

albertnetymk reviewed Sep 19, 2022


tschatzl approved these changes Sep 20, 2022


openjdk bot added the ready Pull request is ready to be integrated label Sep 20, 2022

Make assert more strict

a4a2453

tschatzl approved these changes Sep 21, 2022


albertnetymk approved these changes Sep 21, 2022


openjdk bot added the integrated Pull request has been integrated label Sep 22, 2022

openjdk bot closed this Sep 22, 2022

openjdk bot removed ready Pull request is ready to be integrated rfr Pull request is ready for review labels Sep 22, 2022
