Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

JDK-8315478: GenShen: Tolerate round-off errors in preselected promotion budget #317

Closed
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
21 changes: 20 additions & 1 deletion src/hotspot/share/gc/shenandoah/shenandoahGeneration.cpp
Original file line number Diff line number Diff line change
@@ -394,7 +394,19 @@ void ShenandoahGeneration::adjust_evacuation_budgets(ShenandoahHeap* heap, Shena
// and promotion reserves. Try shrinking OLD now in case that gives us a bit more runway for mutator allocations during
// evac and update phases.
size_t old_consumed = old_evacuated_committed + young_advance_promoted_reserve_used;
assert(old_available >= old_consumed, "Cannot consume more than is available");

if (old_available < old_consumed) {
// This can happen due to round-off errors when adding the results of truncated integer arithmetic.
// We've already truncated old_evacuated_committed. Truncate young_advance_promoted_reserve_used here.
assert(young_advance_promoted_reserve_used <= (33 * (old_available - old_evacuated_committed)) / 32,
"Round-off errors should be less than 3.125%%, committed: " SIZE_FORMAT ", reserved: " SIZE_FORMAT,
young_advance_promoted_reserve_used, old_available - old_evacuated_committed);
young_advance_promoted_reserve_used = old_available - old_evacuated_committed;
old_consumed = old_evacuated_committed + young_advance_promoted_reserve_used;
}

assert(old_available >= old_consumed, "Cannot consume (" SIZE_FORMAT ") more than is available (" SIZE_FORMAT ")",
old_consumed, old_available);
Comment on lines +397 to +409
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So you are just adding debugging code to catch the issue more easily? (as well as the change further below)

I don't see any change in the control flow or the state here.

Is the intention to add this change and the one below to aid debugging of a very difficult to reproduce problem?

Copy link
Member

@ysramakrishna ysramakrishna Sep 2, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah, never mind, I see that you are adjusting some values there, which I missed in a first reading of the code.

I don't understand this logic well enough to review this change, so please go ahead an land this with the review you have from William.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

   old_consumed = old_evacuated_committed + old_available - old_evacuated_committed
                             = old_available

So effectively this is saying if old_available < old_consumed, make old_available = old_consumed so the subsequent assertion is satisfied. You are asserting that this can only happen if the difference is because of round-off errors.

I do have a question about this, though. Can we avoid the round-off errors in the arithmetic entirely, by doing rounding in the "safe direction" (e.g. rounding down or up) at each step where rounding is needed, rather than relying on catching the condition and adjusting the values to re-establish the invariants?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be possible to do all round-off arithmetic more conservatively, to avoid the need for this check. The "problem" arises when the sum of individual round-off errors is different than the single round-off error for the sum of promotions. My preference would be to fix this in a separate PR if we choose to do so.

size_t excess_old = old_available - old_consumed;
size_t unaffiliated_old_regions = old_generation->free_unaffiliated_regions();
size_t unaffiliated_old = unaffiliated_old_regions * region_size_bytes;
@@ -599,6 +611,8 @@ size_t ShenandoahGeneration::select_aged_regions(size_t old_available, size_t nu
// Sort in increasing order according to live data bytes. Note that candidates represents the number of regions
// that qualify to be promoted by evacuation.
if (candidates > 0) {
size_t selected_regions = 0;
size_t selected_live = 0;
QuickSort::sort<AgedRegionData>(sorted_regions, candidates, compare_by_aged_live, false);
for (size_t i = 0; i < candidates; i++) {
size_t region_live_data = sorted_regions[i]._live_data;
@@ -607,6 +621,8 @@ size_t ShenandoahGeneration::select_aged_regions(size_t old_available, size_t nu
ShenandoahHeapRegion* region = sorted_regions[i]._region;
old_consumed += promotion_need;
candidate_regions_for_promotion_by_copy[region->index()] = true;
selected_regions++;
selected_live += region_live_data;
} else {
// We rejected this promotable region from the collection set because we had no room to hold its copy.
// Add this region to promo potential for next GC.
@@ -615,6 +631,9 @@ size_t ShenandoahGeneration::select_aged_regions(size_t old_available, size_t nu
// We keep going even if one region is excluded from selection because we need to accumulate all eligible
// regions that are not preselected into promo_potential
}
log_info(gc)("Preselected " SIZE_FORMAT " regions containing " SIZE_FORMAT " live bytes,"
" consuming: " SIZE_FORMAT " of budgeted: " SIZE_FORMAT,
selected_regions, selected_live, old_consumed, old_available);
}
heap->set_pad_for_promote_in_place(promote_in_place_pad);
heap->set_promotion_potential(promo_potential);