Commit 977200a

Author: Y. Srinivas Ramakrishna
Committed: Aug 26, 2024
8338780: GenShen: Fix up some comments
Reviewed-by: kdnilsen
1 parent 12af9f6 commit 977200a

File tree

4 files changed: +42 −80 lines changed

‎src/hotspot/share/gc/shenandoah/heuristics/shenandoahAdaptiveHeuristics.cpp

+29 −29

@@ -22,8 +22,8 @@
  * questions.
  *
  */
-#include "precompiled.hpp"
 
+#include "precompiled.hpp"
 
 #include "gc/shared/gcCause.hpp"
 #include "gc/shenandoah/heuristics/shenandoahHeuristics.hpp"

@@ -205,6 +205,34 @@ static double saturate(double value, double min, double max) {
   return MAX2(MIN2(value, max), min);
 }
 
+// Rationale:
+// The idea is that there is an average allocation rate and there are occasional abnormal bursts (or spikes) of
+// allocations that exceed the average allocation rate. What do these spikes look like?
+//
+// 1. At certain phase changes, we may discard large amounts of data and replace it with large numbers of newly
+//    allocated objects. This "spike" looks more like a phase change. We were in steady state at M bytes/sec
+//    allocation rate and now we're in a "reinitialization phase" that looks like N bytes/sec. We need the "spike"
+//    accommodation to give us enough runway to recalibrate our "average allocation rate".
+//
+// 2. The typical workload changes. "Suddenly", our typical workload of N TPS increases to N+delta TPS. This means
+//    our average allocation rate needs to be adjusted. Once again, we need the "spike" accommodation to give us
+//    enough runway to recalibrate our "average allocation rate".
+//
+// 3. Though there is an "average" allocation rate, a given workload's demand for allocation may be very bursty. We
+//    allocate a bunch of LABs during the 5 ms that follow completion of a GC, then we perform no more allocations for
+//    the next 150 ms. It seems we want the "spike" to represent the maximum divergence from average within the
+//    period of time between consecutive evaluations of the should_start_gc() service. Here's the thinking:
+//
+//    a) Between now and the next time I ask whether should_start_gc(), we might experience a spike representing
+//       the anticipated burst of allocations. If that would put us over budget, then we should start GC immediately.
+//    b) Between now and the anticipated depletion of the allocation pool, there may be two or more bursts of
+//       allocations. If there is more than one of these bursts, we can "approximate" that these will be separated
+//       by spans of time with very little or no allocation, so the "average" allocation rate should be a suitable
+//       approximation of how this will behave.
+//
+// For cases 1 and 2, we need to "quickly" recalibrate the average allocation rate whenever we detect a change
+// in operation mode. We want some way to decide that the average rate has changed, while keeping average
+// allocation rate computation independent.
 bool ShenandoahAdaptiveHeuristics::should_start_gc() {
   size_t capacity = _space_info->soft_max_capacity();
   size_t available = _space_info->soft_available();

@@ -238,34 +266,6 @@ bool ShenandoahAdaptiveHeuristics::should_start_gc() {
       return true;
     }
   }
-  // Rationale:
-  // The idea is that there is an average allocation rate and there are occasional abnormal bursts (or spikes) of
-  // allocations that exceed the average allocation rate. What do these spikes look like?
-  //
-  // 1. At certain phase changes, we may discard large amounts of data and replace it with large numbers of newly
-  //    allocated objects. This "spike" looks more like a phase change. We were in steady state at M bytes/sec
-  //    allocation rate and now we're in a "reinitialization phase" that looks like N bytes/sec. We need the "spike"
-  //    accommodation to give us enough runway to recalibrate our "average allocation rate".
-  //
-  // 2. The typical workload changes. "Suddenly", our typical workload of N TPS increases to N+delta TPS. This means
-  //    our average allocation rate needs to be adjusted. Once again, we need the "spike" accomodation to give us
-  //    enough runway to recalibrate our "average allocation rate".
-  //
-  // 3. Though there is an "average" allocation rate, a given workload's demand for allocation may be very bursty. We
-  //    allocate a bunch of LABs during the 5 ms that follow completion of a GC, then we perform no more allocations for
-  //    the next 150 ms. It seems we want the "spike" to represent the maximum divergence from average within the
-  //    period of time between consecutive evaluation of the should_start_gc() service. Here's the thinking:
-  //
-  //    a) Between now and the next time I ask whether should_start_gc(), we might experience a spike representing
-  //       the anticipated burst of allocations. If that would put us over budget, then we should start GC immediately.
-  //    b) Between now and the anticipated depletion of allocation pool, there may be two or more bursts of allocations.
-  //       If there are more than one of these bursts, we can "approximate" that these will be separated by spans of
-  //       time with very little or no allocations so the "average" allocation rate should be a suitable approximation
-  //       of how this will behave.
-  //
-  // For cases 1 and 2, we need to "quickly" recalibrate the average allocation rate whenever we detect a change
-  // in operation mode. We want some way to decide that the average rate has changed. Make average allocation rate
-  // computations an independent effort.
   // Check if allocation headroom is still okay. This also factors in:
   //   1. Some space to absorb allocation spikes (ShenandoahAllocSpikeFactor)
   //   2. Accumulated penalties from Degenerated and Full GC

‎src/hotspot/share/gc/shenandoah/heuristics/shenandoahAdaptiveHeuristics.hpp

+4

@@ -140,6 +140,10 @@ class ShenandoahAdaptiveHeuristics : public ShenandoahHeuristics {
   // source of feedback to adjust trigger parameters.
   TruncatedSeq _available;
 
+  // A conservative minimum threshold of free space that we'll try to maintain when possible.
+  // For example, we might trigger a concurrent gc if we are likely to drop below
+  // this threshold, or we might consider this when dynamically resizing generations
+  // in the generational case. Controlled by global flag ShenandoahMinFreeThreshold.
   size_t min_free_threshold();
 };

‎src/hotspot/share/gc/shenandoah/heuristics/shenandoahGlobalHeuristics.cpp

+2 −31

@@ -40,36 +40,10 @@ ShenandoahGlobalHeuristics::ShenandoahGlobalHeuristics(ShenandoahGlobalGeneratio
 void ShenandoahGlobalHeuristics::choose_collection_set_from_regiondata(ShenandoahCollectionSet* cset,
                                                                        RegionData* data, size_t size,
                                                                        size_t actual_free) {
-  // The logic for cset selection in adaptive is as follows:
-  //
-  // 1. We cannot get cset larger than available free space. Otherwise we guarantee OOME
-  //    during evacuation, and thus guarantee full GC. In practice, we also want to let
-  //    application to allocate something. This is why we limit CSet to some fraction of
-  //    available space. In non-overloaded heap, max_cset would contain all plausible candidates
-  //    over garbage threshold.
-  //
-  // 2. We should not get cset too low so that free threshold would not be met right
-  //    after the cycle. Otherwise we get back-to-back cycles for no reason if heap is
-  //    too fragmented. In non-overloaded non-fragmented heap min_garbage would be around zero.
-  //
-  // Therefore, we start by sorting the regions by garbage. Then we unconditionally add the best candidates
-  // before we meet min_garbage. Then we add all candidates that fit with a garbage threshold before
-  // we hit max_cset. When max_cset is hit, we terminate the cset selection. Note that in this scheme,
-  // ShenandoahGarbageThreshold is the soft threshold which would be ignored until min_garbage is hit.
-
-  // In generational mode, the sort order within the data array is not strictly descending amounts of garbage. In
-  // particular, regions that have reached tenure age will be sorted into this array before younger regions that contain
-  // more garbage. This represents one of the reasons why we keep looking at regions even after we decide, for example,
-  // to exclude one of the regions because it might require evacuation of too much live data.
-
-
-
   // Better select garbage-first regions
   QuickSort::sort<RegionData>(data, (int) size, compare_by_garbage);
 
-  size_t cur_young_garbage = add_preselected_regions_to_collection_set(cset, data, size);
-
-  choose_global_collection_set(cset, data, size, actual_free, cur_young_garbage);
+  choose_global_collection_set(cset, data, size, actual_free, 0 /* cur_young_garbage */);
 
   log_cset_composition(cset);
 }

@@ -126,10 +100,7 @@ void ShenandoahGlobalHeuristics::choose_global_collection_set(ShenandoahCollecti
 
   for (size_t idx = 0; idx < size; idx++) {
     ShenandoahHeapRegion* r = data[idx].get_region();
-    if (cset->is_preselected(r->index())) {
-      fatal("There should be no preselected regions during GLOBAL GC");
-      continue;
-    }
+    assert(!cset->is_preselected(r->index()), "There should be no preselected regions during GLOBAL GC");
     bool add_region = false;
     if (r->is_old() || (r->age() >= tenuring_threshold)) {
       size_t new_cset = old_cur_cset + r->get_live_data_bytes();

‎src/hotspot/share/gc/shenandoah/heuristics/shenandoahYoungHeuristics.cpp

+7 −20

@@ -42,27 +42,14 @@ ShenandoahYoungHeuristics::ShenandoahYoungHeuristics(ShenandoahYoungGeneration*
 void ShenandoahYoungHeuristics::choose_collection_set_from_regiondata(ShenandoahCollectionSet* cset,
                                                                      RegionData* data, size_t size,
                                                                      size_t actual_free) {
-  // The logic for cset selection in adaptive is as follows:
+  // See comments in ShenandoahAdaptiveHeuristics::choose_collection_set_from_regiondata():
+  // we do the same here, but with the following adjustments for generational mode:
   //
-  // 1. We cannot get cset larger than available free space. Otherwise we guarantee OOME
-  //    during evacuation, and thus guarantee full GC. In practice, we also want to let
-  //    application to allocate something. This is why we limit CSet to some fraction of
-  //    available space. In non-overloaded heap, max_cset would contain all plausible candidates
-  //    over garbage threshold.
-  //
-  // 2. We should not get cset too low so that free threshold would not be met right
-  //    after the cycle. Otherwise we get back-to-back cycles for no reason if heap is
-  //    too fragmented. In non-overloaded non-fragmented heap min_garbage would be around zero.
-  //
-  // Therefore, we start by sorting the regions by garbage. Then we unconditionally add the best candidates
-  // before we meet min_garbage. Then we add all candidates that fit with a garbage threshold before
-  // we hit max_cset. When max_cset is hit, we terminate the cset selection. Note that in this scheme,
-  // ShenandoahGarbageThreshold is the soft threshold which would be ignored until min_garbage is hit.
-
-  // In generational mode, the sort order within the data array is not strictly descending amounts of garbage. In
-  // particular, regions that have reached tenure age will be sorted into this array before younger regions that contain
-  // more garbage. This represents one of the reasons why we keep looking at regions even after we decide, for example,
-  // to exclude one of the regions because it might require evacuation of too much live data.
+  // In generational mode, the sort order within the data array is not strictly descending amounts
+  // of garbage. In particular, regions that have reached tenure age will be sorted into this
+  // array before younger regions that typically contain more garbage. This is one reason why,
+  // for example, we continue examining regions even after rejecting a region that has
+  // more live data than we can evacuate.
 
   // Better select garbage-first regions
   QuickSort::sort<RegionData>(data, (int) size, compare_by_garbage);

2 commit comments


earthling-amzn (Contributor) commented on Sep 16, 2024

/backport shenandoah-jdk21u

openjdk[bot] commented on Sep 16, 2024


@earthling-amzn the backport was successfully created on the branch backport-earthling-amzn-977200a0-master in my personal fork of openjdk/shenandoah-jdk21u. To create a pull request with this backport targeting openjdk/shenandoah-jdk21u:master, just click the following link:

➡️ Create pull request

The title of the pull request is automatically filled in correctly and below you find a suggestion for the pull request body:

Hi all,

This pull request contains a backport of commit 977200a0 from the openjdk/shenandoah repository.

The commit being backported was authored by Y. Srinivas Ramakrishna on 26 Aug 2024 and was reviewed by Kelvin Nilsen.

Thanks!

If you need to update the source branch of the pull then run the following commands in a local clone of your personal fork of openjdk/shenandoah-jdk21u:

$ git fetch https://github.com/openjdk-bots/shenandoah-jdk21u.git backport-earthling-amzn-977200a0-master:backport-earthling-amzn-977200a0-master
$ git checkout backport-earthling-amzn-977200a0-master
# make changes
$ git add paths/to/changed/files
$ git commit --message 'Describe additional changes made'
$ git push https://github.com/openjdk-bots/shenandoah-jdk21u.git backport-earthling-amzn-977200a0-master