Skip to content

8346920: Serial: Support allocation in old generation before GC #22899

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 3 commits into from

Conversation

albertnetymk
Copy link
Member

@albertnetymk albertnetymk commented Jan 2, 2025

This PR introduces a new strategy to determine whether an allocation should be attempted in the old generation or if a GC cycle should be initiated, based on the GCTimeRatio. With this change, the benchmark attached to the ticket now completes in ~13 GC, a significant improvement compared to the >1000 GC observed previously.

Test: tier1-3


Progress

  • Change must be properly reviewed (1 review required, with at least 1 Reviewer)
  • Change must not contain extraneous whitespace
  • Commit message must refer to an issue

Integration blocker

 ⚠️ Title mismatch between PR and JBS for issue JDK-8346920

Issue

  • JDK-8346920: Serial: Support allocation in old generation when heap is almost full (Enhancement - P4) ⚠️ Title mismatch between PR and JBS.

Reviewers

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk.git pull/22899/head:pull/22899
$ git checkout pull/22899

Update a local copy of the PR:
$ git checkout pull/22899
$ git pull https://git.openjdk.org/jdk.git pull/22899/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 22899

View PR using the GUI difftool:
$ git pr show -t 22899

Using diff file

Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/22899.diff

Using Webrev

Link to Webrev Comment

@bridgekeeper
Copy link

bridgekeeper bot commented Jan 2, 2025

👋 Welcome back ayang! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

@openjdk
Copy link

openjdk bot commented Jan 2, 2025

@albertnetymk This change is no longer ready for integration - check the PR body for details.

@openjdk openjdk bot changed the title 8346920 8346920: Serial: Support allocation in old generation before GC Jan 2, 2025
@openjdk openjdk bot added the rfr Pull request is ready for review label Jan 2, 2025
@openjdk
Copy link

openjdk bot commented Jan 2, 2025

@albertnetymk The following label will be automatically applied to this pull request:

  • hotspot-gc

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing list. If you would like to change these labels, use the /label pull request command.

@openjdk openjdk bot added the hotspot-gc hotspot-gc-dev@openjdk.org label Jan 2, 2025
@mlbridge
Copy link

mlbridge bot commented Jan 2, 2025

Webrevs

Copy link
Member

@lgxbslgx lgxbslgx left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice improvement.

Comment on lines -332 to +334
// Note that only large objects get a shot at being
// allocated in later generations.
bool first_only = !should_try_older_generation_allocation(size);

bool first_only = !should_try_older_generation_allocation(size)
&& is_long_enough_from_prev_gc_pause_end();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the first_only actually young_only? It means that the allocation is only attempted in young generation?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's true. I believe Serial used to support >2 generations, so some generic names are left as is. Fixing it requires updating in multiple places, such as the method name older_generation, arg in attempt_allocation, etc. How about we do that invasive but pure refactoring in its own PR?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, such refactoring should be finished in other issues.


Ticks prev_gc_pause_end;
Tickspan gc_pause;
if (full_gc_pause_end < young_gc_pause_end ) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A unnecessary space after young_gc_pause_end.

@@ -50,6 +50,7 @@ class STWGCTimer;

class DefNewGeneration: public Generation {
friend class VMStructs;
friend class SerialHeap;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not really know whether it is a good way. Maybe we can export a method DefNewGeneration::gc_timer.

Copy link
Member

@lgxbslgx lgxbslgx left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good.

@openjdk openjdk bot added the ready Pull request is ready to be integrated label Jan 7, 2025
@tschatzl
Copy link
Contributor

As far as I understand, the test program expands that ArrayList with new Short objects, allocating until OOM.

The problem to be solved in this case is how quickly the collector goes OOM; it also is related to the C2 compiler doing reallocation of C2 stack-allocated objects (when I ran the test with -XX:TieredStopAtLevel=3 Serial GC also quickly OOMEd reproducably).

Current code, since the typical allocation here is small, prefers doing a GC pause to allocation in old gen because the allocation is small (that Short object). The GC pause finds out that it should do a full gc only because its predictions indicate that the young gc is "unsafe" anyway (it's not "unsafe" as in not able to do it, but likely to end up in an evacuation failure).

The reason why the application will fail faster is that it allows many more increments of that list array without GC. Without the change, as the that array gets larger, the frequency of the GCs will get larger without noticable more promotion going on as the amount of possible reallocations of list per GC gets smaller. And it will do full gcs all the time because of the predictions indicating an unsafe young gc (which is correct as the list array will be > to-space quickly).

(It would be nice if the title and description of the CR somehow reflected this - it reads like there is no support of direct allocation in old gen for Serial GC which is in fact not the case).

Incidentally, Parallel GC has the same issue, running hundreds of full gcs before terminating (I did not check how many it takes, and probably never also waited for Serial GC).

As far as I understand the change, this change prevents "back-to-back" GCs, opting for allocating everything into the old generation (if not full) instead. I wonder if that has not negative impact (more full gcs) for applications that legitimately allocate lots of (short-living) objects as this will fill up the old generation (much) faster than before. Particularly applications that use relatively small heap for that.
It seems like this behavior may be advantageous for startup when supposedly initializing/loading data structures too (just fill up the whole heap and compact completely).

Did you run some other benchmarks with this change? (Maybe the smaller dacapo benchmarks at "reasonably" sized heaps will show something?)

I'm actually a bit conflicted about this change. It seems to fix a (rare?) policy issue, but at the same time adds a new mechanic (use of GCTimeRatio) for Serial GC.
So for now I would like to wait for measurements on potential impact on "more realistic" applications.

Hth,
Thomas

@mlbridge
Copy link

mlbridge bot commented Jan 20, 2025

Mailing list message from Kirk Pepperdine on hotspot-gc-dev:

Hi,

One of the more important goals when tuning generational collectors and especially when tuning the the serial and parallel is to minimize the number of transients that end up in tenured. With that in mind I?m nervous about allocating in tenured to avoid a collection cycle. First, the probability that these (prematurely promoted) allocations are transit is likely > 90%. More over, transients in tenured tend to create zombies. Zombies put further pressure on tenure. Each of these ?conditions? will increase the frequency of full collections and fulls are way more costly than young collections.

IME, techniques that have been tried in the past to avoid collections end up being more expensive than simply triggering the collection when it is needed. Worst case, I?d allocate in the active survivor space as that should be about 50% empty by default. At least that space will be collected by the next young gen thus avoiding the problems that come with having transients in tenured.

Kind regards,
Kirk Pepperdine

@albertnetymk
Copy link
Member Author

As far as I understand the change, this change prevents "back-to-back" GCs, opting for allocating everything into the old generation (if not full) instead.

I would not say this patch prefers allocation-in-old-gen over running gc. The behavior is governed by GCTimeRatio. One can get back the original behavior with a small GCTimeRatio.

Did you run some other benchmarks with this change? (Maybe the smaller dacapo benchmarks at "reasonably" sized heaps will show something?)

Tried dacapo bms but no perf diff using 2x-live-size heap sizes. (This patch mostly affects the back-to-back gc case, which should be rare in properly configured system/bms.)

It seems to fix a (rare?) policy issue, but at the same time adds a new mechanic (use of GCTimeRatio) for Serial GC.

Currently, there is no way to tune Serial based on app-vs-gc time, and this patch essentially adds that support.

@mlbridge
Copy link

mlbridge bot commented Jan 20, 2025

Mailing list message from Kirk Pepperdine on hotspot-gc-dev:

Hi,

On Jan 20, 2025, at 9:05?PM, Albert Mingkun Yang <ayang at openjdk.org> wrote:

On Mon, 20 Jan 2025 14:55:05 GMT, Thomas Schatzl <tschatzl at openjdk.org> wrote:

As far as I understand the change, this change prevents "back-to-back" GCs, opting for allocating everything into the old generation (if not full) instead.

I would not say this patch prefers allocation-in-old-gen over running gc. The behavior is governed by `GCTimeRatio`. One can get back the original behavior with a small `GCTimeRatio`.

Did you run some other benchmarks with this change? (Maybe the smaller dacapo benchmarks at "reasonably" sized heaps will show something?)

Tried dacapo bms but no perf diff using 2x-live-size heap sizes. (This patch mostly affects the back-to-back gc case, which should be rare in properly configured system/bms.)

It seems to fix a (rare?) policy issue, but at the same time adds a new mechanic (use of GCTimeRatio) for Serial GC.

Currently, there is no way to tune Serial based on app-vs-gc time, and this patch essentially adds that support.

This is counter to my experience. If young gen is properly tuned, GC overhead is generally minimalized. IME, this is the third most optimal way to control GC overhead. Using GCTimeRatio is way down on that list given the current implementations.

Kind regards,
Kirk

@albertnetymk
Copy link
Member Author

If young gen is properly tuned, GC overhead is generally minimalized.

That approach is inflexible and requires trial and error, as it depends on knowing the allocation rate. The GC percentage (GCTimeRatio) provides a more direct way to express that requirement or intention.

Using GCTimeRatio is way down on that list given the current implementations.

Serial doesn't support GCTimeRatio; this PR aims to add support for it in Serial. In the long run, as I envision it, once Serial implements auto-heap-resizing, MaxGCPauseMillis and GCTimeRatio will become the two primary flags for controlling GC.

@openjdk openjdk bot removed the ready Pull request is ready to be integrated label Jan 23, 2025
@albertnetymk
Copy link
Member Author

Superseded by #23270, where it doesn't use GCTimeRatio and affects only tight-heap configurations.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
hotspot-gc hotspot-gc-dev@openjdk.org rfr Pull request is ready for review
Development

Successfully merging this pull request may close these issues.

None yet

3 participants