8332587: RISC-V: secondary_super_cache does not scale well #19320

zifeihan · 2024-05-21T08:31:53Z

Implementation of subtype checking JDK-8180450 for linux-riscv64.
This optimization depends on availability of the Zbb extension which has the cpop instruction.

Correctness testing:

Run tier1-3, hotspot:tier4 tests on SOPHON SG2042 (release)
Run tier1-3 tests with -XX:+UseRVV on qemu 8.1.0 (fastdebug)
Run all of tier1 with -XX:+VerifySecondarySupers

JMH tested on Banana Pi BPI-F3 board (has Zbb) and Enable UseZbb, Not Enable UseZba, UseZbs

Original:

Benchmark                             Mode  Cnt    Score   Error  Units
SecondarySuperCacheHits.test  avgt   15  11.375 ± 0.071  ns/op
SecondarySuperCacheInterContention.test     avgt   15  646.087 ± 32.587  ns/op
SecondarySuperCacheInterContention.test:t1  avgt   15  600.090 ± 83.779  ns/op
SecondarySuperCacheInterContention.test:t2  avgt   15  692.084 ± 73.218  ns/op
SecondarySupersLookup.testNegative00  avgt   15   16.420 ± 0.239  ns/op
SecondarySupersLookup.testNegative01  avgt   15   18.307 ± 0.260  ns/op
SecondarySupersLookup.testNegative02  avgt   15   21.695 ± 0.458  ns/op
SecondarySupersLookup.testNegative03  avgt   15   24.855 ± 0.664  ns/op
SecondarySupersLookup.testNegative04  avgt   15   27.305 ± 0.522  ns/op
SecondarySupersLookup.testNegative05  avgt   15   29.719 ± 0.385  ns/op
SecondarySupersLookup.testNegative06  avgt   15   32.231 ± 0.498  ns/op
SecondarySupersLookup.testNegative07  avgt   15   33.747 ± 0.603  ns/op
SecondarySupersLookup.testNegative08  avgt   15   35.856 ± 0.629  ns/op
SecondarySupersLookup.testNegative09  avgt   15   37.077 ± 0.546  ns/op
SecondarySupersLookup.testNegative10  avgt   15   39.408 ± 0.465  ns/op
SecondarySupersLookup.testNegative16  avgt   15   51.041 ± 0.547  ns/op
SecondarySupersLookup.testNegative20  avgt   15   58.722 ± 0.922  ns/op
SecondarySupersLookup.testNegative30  avgt   15   77.310 ± 0.654  ns/op
SecondarySupersLookup.testNegative32  avgt   15   81.116 ± 0.854  ns/op
SecondarySupersLookup.testNegative40  avgt   15   96.311 ± 0.840  ns/op
SecondarySupersLookup.testNegative50  avgt   15  115.427 ± 0.838  ns/op
SecondarySupersLookup.testNegative55  avgt   15  124.371 ± 1.076  ns/op
SecondarySupersLookup.testNegative56  avgt   15  126.796 ± 0.916  ns/op
SecondarySupersLookup.testNegative57  avgt   15  127.952 ± 1.202  ns/op
SecondarySupersLookup.testNegative58  avgt   15  131.956 ± 4.515  ns/op
SecondarySupersLookup.testNegative59  avgt   15  131.858 ± 1.066  ns/op
SecondarySupersLookup.testNegative60  avgt   15  134.294 ± 0.888  ns/op
SecondarySupersLookup.testNegative61  avgt   15  135.462 ± 1.037  ns/op
SecondarySupersLookup.testNegative62  avgt   15  137.805 ± 0.999  ns/op
SecondarySupersLookup.testNegative63  avgt   15  139.335 ± 1.164  ns/op
SecondarySupersLookup.testNegative64  avgt   15  141.401 ± 0.947  ns/op
SecondarySupersLookup.testPositive01  avgt   15   10.731 ± 0.152  ns/op
SecondarySupersLookup.testPositive02  avgt   15   10.726 ± 0.142  ns/op
SecondarySupersLookup.testPositive03  avgt   15   10.728 ± 0.145  ns/op
SecondarySupersLookup.testPositive04  avgt   15   10.730 ± 0.149  ns/op
SecondarySupersLookup.testPositive05  avgt   15   10.730 ± 0.151  ns/op
SecondarySupersLookup.testPositive06  avgt   15   10.730 ± 0.150  ns/op
SecondarySupersLookup.testPositive07  avgt   15   10.731 ± 0.148  ns/op
SecondarySupersLookup.testPositive08  avgt   15   10.730 ± 0.150  ns/op
SecondarySupersLookup.testPositive09  avgt   15   10.730 ± 0.151  ns/op
SecondarySupersLookup.testPositive10  avgt   15   10.734 ± 0.156  ns/op
SecondarySupersLookup.testPositive16  avgt   15   10.742 ± 0.160  ns/op
SecondarySupersLookup.testPositive20  avgt   15   10.730 ± 0.150  ns/op
SecondarySupersLookup.testPositive30  avgt   15   10.735 ± 0.156  ns/op
SecondarySupersLookup.testPositive32  avgt   15   10.731 ± 0.147  ns/op
SecondarySupersLookup.testPositive40  avgt   15   10.744 ± 0.179  ns/op
SecondarySupersLookup.testPositive50  avgt   15   10.733 ± 0.152  ns/op
SecondarySupersLookup.testPositive60  avgt   15   10.748 ± 0.189  ns/op
SecondarySupersLookup.testPositive63  avgt   15   10.726 ± 0.142  ns/op
SecondarySupersLookup.testPositive64  avgt   15   10.733 ± 0.155  ns/op
TypePollution.instanceOfInterfaceSwitchLinearNoSCC          avgt   12  51592.312 ± 6662.713  ns/op
TypePollution.instanceOfInterfaceSwitchLinearSCC            avgt   12  50294.723 ±  371.807  ns/op
TypePollution.instanceOfInterfaceSwitchTableNoSCC           avgt   12  53752.017 ±  346.287  ns/op
TypePollution.instanceOfInterfaceSwitchTableSCC             avgt   12  50053.321 ±  642.562  ns/op
TypePollution.parallelInstanceOfInterfaceSwitchLinearNoSCC  avgt   12   1830.935 ±  262.496  ms/op
TypePollution.parallelInstanceOfInterfaceSwitchLinearSCC    avgt   12   1745.503 ±  201.047  ms/op
TypePollution.parallelInstanceOfInterfaceSwitchTableNoSCC   avgt   12   1794.322 ±  283.656  ms/op
TypePollution.parallelInstanceOfInterfaceSwitchTableSCC     avgt   12   1808.875 ±  235.126  ms/op

With patch:

Benchmark                             Mode  Cnt    Score   Error  Units
SecondarySuperCacheHits.test  avgt   15  11.382 ± 0.027  ns/op
SecondarySuperCacheInterContention.test     avgt   15  621.291 ± 40.619  ns/op
SecondarySuperCacheInterContention.test:t1  avgt   15  635.382 ± 58.347  ns/op
SecondarySuperCacheInterContention.test:t2  avgt   15  607.200 ± 65.490  ns/op
SecondarySupersLookup.testNegative00  avgt   15   13.275 ± 0.223  ns/op
SecondarySupersLookup.testNegative01  avgt   15   13.264 ± 0.201  ns/op
SecondarySupersLookup.testNegative02  avgt   15   13.261 ± 0.194  ns/op
SecondarySupersLookup.testNegative03  avgt   15   13.271 ± 0.210  ns/op
SecondarySupersLookup.testNegative04  avgt   15   13.265 ± 0.201  ns/op
SecondarySupersLookup.testNegative05  avgt   15   13.258 ± 0.191  ns/op
SecondarySupersLookup.testNegative06  avgt   15   13.280 ± 0.225  ns/op
SecondarySupersLookup.testNegative07  avgt   15   13.268 ± 0.201  ns/op
SecondarySupersLookup.testNegative08  avgt   15   13.266 ± 0.202  ns/op
SecondarySupersLookup.testNegative09  avgt   15   13.261 ± 0.196  ns/op
SecondarySupersLookup.testNegative10  avgt   15   13.268 ± 0.198  ns/op
SecondarySupersLookup.testNegative16  avgt   15   13.268 ± 0.205  ns/op
SecondarySupersLookup.testNegative20  avgt   15   13.284 ± 0.231  ns/op
SecondarySupersLookup.testNegative30  avgt   15   13.281 ± 0.226  ns/op
SecondarySupersLookup.testNegative32  avgt   15   13.273 ± 0.215  ns/op
SecondarySupersLookup.testNegative40  avgt   15   13.287 ± 0.233  ns/op
SecondarySupersLookup.testNegative50  avgt   15   13.292 ± 0.242  ns/op
SecondarySupersLookup.testNegative55  avgt   15   53.064 ± 0.757  ns/op
SecondarySupersLookup.testNegative56  avgt   15   53.052 ± 0.767  ns/op
SecondarySupersLookup.testNegative57  avgt   15   53.068 ± 0.803  ns/op
SecondarySupersLookup.testNegative58  avgt   15   53.076 ± 0.776  ns/op
SecondarySupersLookup.testNegative59  avgt   15   53.095 ± 0.846  ns/op
SecondarySupersLookup.testNegative60  avgt   15   75.106 ± 1.033  ns/op
SecondarySupersLookup.testNegative61  avgt   15   76.832 ± 4.047  ns/op
SecondarySupersLookup.testNegative62  avgt   15   75.085 ± 1.010  ns/op
SecondarySupersLookup.testNegative63  avgt   15  153.709 ± 0.893  ns/op
SecondarySupersLookup.testNegative64  avgt   15  155.623 ± 0.922  ns/op
SecondarySupersLookup.testPositive01  avgt   15   10.727 ± 0.145  ns/op
SecondarySupersLookup.testPositive02  avgt   15   10.734 ± 0.157  ns/op
SecondarySupersLookup.testPositive03  avgt   15   10.731 ± 0.151  ns/op
SecondarySupersLookup.testPositive04  avgt   15   10.733 ± 0.156  ns/op
SecondarySupersLookup.testPositive05  avgt   15   10.742 ± 0.168  ns/op
SecondarySupersLookup.testPositive06  avgt   15   10.729 ± 0.148  ns/op
SecondarySupersLookup.testPositive07  avgt   15   10.738 ± 0.163  ns/op
SecondarySupersLookup.testPositive08  avgt   15   10.736 ± 0.159  ns/op
SecondarySupersLookup.testPositive09  avgt   15   10.735 ± 0.158  ns/op
SecondarySupersLookup.testPositive10  avgt   15   10.730 ± 0.150  ns/op
SecondarySupersLookup.testPositive16  avgt   15   10.734 ± 0.157  ns/op
SecondarySupersLookup.testPositive20  avgt   15   10.731 ± 0.149  ns/op
SecondarySupersLookup.testPositive30  avgt   15   10.733 ± 0.156  ns/op
SecondarySupersLookup.testPositive32  avgt   15   10.729 ± 0.149  ns/op
SecondarySupersLookup.testPositive40  avgt   15   10.732 ± 0.154  ns/op
SecondarySupersLookup.testPositive50  avgt   15   10.735 ± 0.154  ns/op
SecondarySupersLookup.testPositive60  avgt   15   10.736 ± 0.157  ns/op
SecondarySupersLookup.testPositive63  avgt   15   10.733 ± 0.155  ns/op
SecondarySupersLookup.testPositive64  avgt   15   10.729 ± 0.149  ns/op
TypePollution.instanceOfInterfaceSwitchLinearNoSCC          avgt   12  52942.830 ±  851.030  ns/op
TypePollution.instanceOfInterfaceSwitchLinearSCC            avgt   12  49901.169 ±  663.837  ns/op
TypePollution.instanceOfInterfaceSwitchTableNoSCC           avgt   12  41443.364 ± 1045.602  ns/op
TypePollution.instanceOfInterfaceSwitchTableSCC             avgt   12  38050.773 ± 1301.636  ns/op
TypePollution.parallelInstanceOfInterfaceSwitchLinearNoSCC  avgt   12   1634.976 ±    5.730  ms/op
TypePollution.parallelInstanceOfInterfaceSwitchLinearSCC    avgt   12   1613.755 ±  199.612  ms/op
TypePollution.parallelInstanceOfInterfaceSwitchTableNoSCC   avgt   12   1979.354 ±  207.235  ms/op
TypePollution.parallelInstanceOfInterfaceSwitchTableSCC     avgt   12   1525.198 ±  167.922  ms/op

Progress

Change must be properly reviewed (1 review required, with at least 1 Reviewer)
Change must not contain extraneous whitespace
Commit message must refer to an issue

Issue

JDK-8332587: RISC-V: secondary_super_cache does not scale well (Enhancement - P4)

Reviewers

Hamlin Li (@Hamlin-Li - Reviewer) ⚠️ Review applies to cd656692
Fei Yang (@RealFYang - Reviewer)

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk.git pull/19320/head:pull/19320
$ git checkout pull/19320

Update a local copy of the PR:
$ git checkout pull/19320
$ git pull https://git.openjdk.org/jdk.git pull/19320/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 19320

View PR using the GUI difftool:
$ git pr show -t 19320

Using diff file

Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/19320.diff

Webrev

Link to Webrev Comment

bridgekeeper · 2024-05-21T08:32:49Z

👋 Welcome back gcao! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

openjdk · 2024-05-21T08:33:46Z

@zifeihan This change now passes all automated pre-integration checks.

ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details.

After integration, the commit message for the final commit will be:

8332587: RISC-V: secondary_super_cache does not scale well

Reviewed-by: mli, fyang

You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed.

At the time when this comment was updated there had been 18 new commits pushed to the master branch:

5cad0b4: 8322708: Global HTML attributes are not allowed
6420846: 8334396: RISC-V: verify perf of ReverseBytesI/L
c6f3bf4: 8334026: Provide a diagnostic PrintMemoryMapAtExit switch on Linux
cabd104: 8334164: The fix for JDK-8322811 should use _filename.is_set() rather than strcmp()
d7dad50: 8334544: C2: wrong control assigned in PhaseIdealLoop::clone_assertion_predicate_for_unswitched_loops()
ff30240: 8334239: Introduce macro for ubsan method/function exclusions
2d4185f: 8332717: ZGC: Division by zero in heuristics
fad6644: 8333754: Add a Test against ECDSA and ECDH NIST Test vector
b211929: 8334570: Problem list gc/TestAlwaysPreTouchBehavior.java
4e58d8c: 8309821: Link to hidden classes section in Class specification for Class::isHidden
... and 8 more: https://git.openjdk.org/jdk/compare/48621ae193ef70b2fae4dcb7ddc524f349beb131...master

As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details.

As you do not have Committer status in this project an existing Committer must agree to sponsor your change. Possible candidates are the reviewers of this PR (@Hamlin-Li, @RealFYang) but any other Committer may sponsor as well.

➡️ To flag this PR as ready for integration with the above commit message, type /integrate in a new comment. (Afterwards, your sponsor types /sponsor in a new comment to perform the integration).

openjdk · 2024-05-21T08:34:15Z

@zifeihan The following label will be automatically applied to this pull request:

hotspot

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing list. If you would like to change these labels, use the /label pull request command.

mlbridge · 2024-06-03T06:47:10Z

Webrevs

Hamlin-Li · 2024-06-03T08:27:50Z

Hey, is there performance comparison data when zbb is not supported? Just want to know if it can also get performance gain in that situation, then this optimization is a more generic one.

The reason for ask this question is that seems JDK-8180450 is an optimization for cache line, it should or could bring benefit even if zbb is not supported?

zifeihan · 2024-06-03T10:57:54Z

Hi, the following is an implementation using scalar assembly when zbb is not available.

diff --git a/src/hotspot/cpu/riscv/macroAssembler_riscv.cpp b/src/hotspot/cpu/riscv/macroAssembler_riscv.cpp
index ce6696c18a8..93e1045d2d4 100644
--- a/src/hotspot/cpu/riscv/macroAssembler_riscv.cpp
+++ b/src/hotspot/cpu/riscv/macroAssembler_riscv.cpp
@@ -3481,6 +3481,36 @@ void MacroAssembler::check_klass_subtype_slow_path(Register sub_klass,
   bind(L_fallthrough);
 }
 
+void MacroAssembler::population_count(Register dst, Register src,
+                                      Register tmp1, Register tmp2) {
+
+  if (UsePopCountInstruction) {
+    cpop(dst, src);
+  } else {
+    assert_different_registers(src, tmp1, tmp2);
+    assert_different_registers(dst, tmp1, tmp2);
+    Label loop, done;
+
+    mv(tmp1, src);
+    // dst = 0;
+    // while(tmp1 != 0) {
+    //   dst++;
+    //   tmp1 &= (tmp1 - 1);
+    // }
+    mv(dst, zr);
+    beqz(tmp1, done);
+    {
+      bind(loop);
+      addi(dst, dst, 1);
+      mv(tmp2, tmp1);
+      addi(tmp2, tmp2, -1);
+      andr(tmp1, tmp1, tmp2);
+      bnez(tmp1, loop);
+    }
+    bind(done);
+  }
+}
+
 // Ensure that the inline code and the stub are using the same registers.
 #define LOOKUP_SECONDARY_SUPERS_TABLE_REGISTERS                     \
 do {                                                                \
@@ -3533,7 +3563,7 @@ bool MacroAssembler::lookup_secondary_supers_table(Register r_sub_klass,
   // Get the first array index that can contain super_klass into r_array_index.
   if (bit != 0) {
     slli(r_array_index, r_bitmap, (Klass::SECONDARY_SUPERS_TABLE_MASK - bit));
-    cpop(r_array_index, r_array_index);
+    population_count(r_array_index, r_array_index, t0, tmp1);
   } else {
     mv(r_array_index, (u1)1);
   }
diff --git a/src/hotspot/cpu/riscv/macroAssembler_riscv.hpp b/src/hotspot/cpu/riscv/macroAssembler_riscv.hpp
index 2575d5ea2ff..3e4930d5605 100644
--- a/src/hotspot/cpu/riscv/macroAssembler_riscv.hpp
+++ b/src/hotspot/cpu/riscv/macroAssembler_riscv.hpp
@@ -322,6 +322,8 @@ class MacroAssembler: public Assembler {
                                      Label* L_success,
                                      Label* L_failure);
 
+  void population_count(Register dst, Register src, Register tmp1, Register tmp2);
+
   // As above, but with a constant super_klass.
   // The result is in Register result, not the condition codes.
   bool lookup_secondary_supers_table(Register r_sub_klass,

JMH tested on HiFive Unmatched(has not Zbb):

Original
Benchmark                             Mode  Cnt    Score    Error  Units                                                                                                                           [14/1836]
SecondarySupersLookup.testNegative00  avgt   15   28.625 ±  7.158  ns/op
SecondarySupersLookup.testNegative01  avgt   15   33.860 ±  6.312  ns/op
SecondarySupersLookup.testNegative02  avgt   15   30.887 ±  4.773  ns/op
SecondarySupersLookup.testNegative03  avgt   15   39.477 ±  6.945  ns/op
SecondarySupersLookup.testNegative04  avgt   15   34.976 ±  3.080  ns/op
SecondarySupersLookup.testNegative05  avgt   15   42.025 ±  8.324  ns/op
SecondarySupersLookup.testNegative06  avgt   15   49.359 ±  8.480  ns/op
SecondarySupersLookup.testNegative07  avgt   15   49.996 ± 11.841  ns/op
SecondarySupersLookup.testNegative08  avgt   15   58.468 ±  8.485  ns/op
SecondarySupersLookup.testNegative09  avgt   15   57.198 ± 10.803  ns/op
SecondarySupersLookup.testNegative10  avgt   15   63.531 ±  5.595  ns/op
SecondarySupersLookup.testNegative16  avgt   15   73.716 ±  9.231  ns/op
SecondarySupersLookup.testNegative20  avgt   15   88.823 ± 16.179  ns/op
SecondarySupersLookup.testNegative30  avgt   15  118.832 ± 18.866  ns/op
SecondarySupersLookup.testNegative32  avgt   15  126.538 ± 23.139  ns/op
SecondarySupersLookup.testNegative40  avgt   15  149.722 ± 31.675  ns/op
SecondarySupersLookup.testNegative50  avgt   15  186.958 ± 39.203  ns/op
SecondarySupersLookup.testNegative55  avgt   15  193.787 ± 29.629  ns/op
SecondarySupersLookup.testNegative56  avgt   15  204.451 ± 34.491  ns/op
SecondarySupersLookup.testNegative57  avgt   15  204.104 ± 27.130  ns/op
SecondarySupersLookup.testNegative58  avgt   15  207.017 ± 31.201  ns/op
SecondarySupersLookup.testNegative59  avgt   15  219.159 ± 33.664  ns/op
SecondarySupersLookup.testNegative60  avgt   15  208.726 ± 27.195  ns/op
SecondarySupersLookup.testNegative61  avgt   15  214.557 ± 30.992  ns/op
SecondarySupersLookup.testNegative62  avgt   15  212.104 ± 30.843  ns/op
SecondarySupersLookup.testNegative63  avgt   15  227.805 ± 39.706  ns/op
SecondarySupersLookup.testNegative64  avgt   15  229.951 ± 42.039  ns/op
SecondarySupersLookup.testPositive01  avgt   15   18.498 ±  4.687  ns/op
SecondarySupersLookup.testPositive02  avgt   15   20.130 ±  4.955  ns/op
SecondarySupersLookup.testPositive03  avgt   15   18.576 ±  4.383  ns/op
SecondarySupersLookup.testPositive04  avgt   15   19.202 ±  4.554  ns/op
SecondarySupersLookup.testPositive05  avgt   15   18.923 ±  4.730  ns/op
SecondarySupersLookup.testPositive06  avgt   15   20.494 ±  6.282  ns/op
SecondarySupersLookup.testPositive07  avgt   15   17.679 ±  2.386  ns/op
SecondarySupersLookup.testPositive08  avgt   15   19.396 ±  7.047  ns/op
SecondarySupersLookup.testPositive09  avgt   15   18.163 ±  2.950  ns/op
SecondarySupersLookup.testPositive10  avgt   15   21.135 ±  5.552  ns/op
SecondarySupersLookup.testPositive16  avgt   15   20.117 ±  4.606  ns/op
SecondarySupersLookup.testPositive20  avgt   15   21.209 ±  5.800  ns/op
SecondarySupersLookup.testPositive30  avgt   15   21.388 ±  6.792  ns/op
SecondarySupersLookup.testPositive32  avgt   15   19.720 ±  4.559  ns/op
SecondarySupersLookup.testPositive40  avgt   15   17.354 ±  2.707  ns/op
SecondarySupersLookup.testPositive50  avgt   15   20.825 ±  6.062  ns/op
SecondarySupersLookup.testPositive60  avgt   15   19.910 ±  5.621  ns/op
SecondarySupersLookup.testPositive63  avgt   15   18.989 ±  3.156  ns/op
SecondarySupersLookup.testPositive64  avgt   15   20.298 ±  5.357  ns/op
Finished running test 'micro:vm.lang.SecondarySupersLookup'

With patch:
Benchmark                             Mode  Cnt    Score    Error  Units
SecondarySupersLookup.testNegative00  avgt   15   21.124 ±  1.601  ns/op
SecondarySupersLookup.testNegative01  avgt   15   25.788 ±  3.713  ns/op
SecondarySupersLookup.testNegative02  avgt   15   25.501 ±  5.616  ns/op
SecondarySupersLookup.testNegative03  avgt   15   22.800 ±  6.454  ns/op
SecondarySupersLookup.testNegative04  avgt   15   21.790 ±  2.629  ns/op
SecondarySupersLookup.testNegative05  avgt   15   25.485 ±  6.082  ns/op
SecondarySupersLookup.testNegative06  avgt   15   24.801 ±  5.387  ns/op
SecondarySupersLookup.testNegative07  avgt   15   24.425 ±  4.686  ns/op
SecondarySupersLookup.testNegative08  avgt   15   24.486 ±  4.044  ns/op
SecondarySupersLookup.testNegative09  avgt   15   23.810 ±  3.838  ns/op
SecondarySupersLookup.testNegative10  avgt   15   25.085 ±  3.756  ns/op
SecondarySupersLookup.testNegative16  avgt   15   22.018 ±  2.924  ns/op
SecondarySupersLookup.testNegative20  avgt   15   23.161 ±  3.271  ns/op
SecondarySupersLookup.testNegative30  avgt   15   23.705 ±  4.669  ns/op
SecondarySupersLookup.testNegative32  avgt   15   25.048 ±  7.125  ns/op
SecondarySupersLookup.testNegative40  avgt   15   24.661 ±  3.541  ns/op
SecondarySupersLookup.testNegative50  avgt   15   22.918 ±  2.879  ns/op
SecondarySupersLookup.testNegative55  avgt   15  250.982 ± 10.224  ns/op
SecondarySupersLookup.testNegative56  avgt   15  251.020 ±  8.432  ns/op
SecondarySupersLookup.testNegative57  avgt   15  255.998 ±  9.054  ns/op
SecondarySupersLookup.testNegative58  avgt   15  257.347 ± 11.340  ns/op
SecondarySupersLookup.testNegative59  avgt   15  277.727 ± 10.007  ns/op
SecondarySupersLookup.testNegative60  avgt   15  304.818 ± 12.092  ns/op
SecondarySupersLookup.testNegative61  avgt   15  308.956 ± 13.060  ns/op
SecondarySupersLookup.testNegative62  avgt   15  309.804 ± 14.715  ns/op
SecondarySupersLookup.testNegative63  avgt   15  416.021 ±  8.051  ns/op
SecondarySupersLookup.testNegative64  avgt   15  425.190 ± 10.966  ns/op
SecondarySupersLookup.testPositive01  avgt   15   18.369 ±  4.490  ns/op
SecondarySupersLookup.testPositive02  avgt   15   21.595 ±  6.626  ns/op
SecondarySupersLookup.testPositive03  avgt   15   19.327 ±  4.973  ns/op
SecondarySupersLookup.testPositive04  avgt   15   19.636 ±  4.759  ns/op
SecondarySupersLookup.testPositive05  avgt   15   17.055 ±  2.329  ns/op
SecondarySupersLookup.testPositive06  avgt   15   18.712 ±  3.333  ns/op
SecondarySupersLookup.testPositive07  avgt   15   20.508 ±  4.213  ns/op
SecondarySupersLookup.testPositive08  avgt   15   19.208 ±  3.761  ns/op
SecondarySupersLookup.testPositive09  avgt   15   18.061 ±  3.619  ns/op
SecondarySupersLookup.testPositive10  avgt   15   17.519 ±  3.322  ns/op
SecondarySupersLookup.testPositive16  avgt   15   19.099 ±  4.358  ns/op
SecondarySupersLookup.testPositive20  avgt   15   20.731 ±  5.230  ns/op
SecondarySupersLookup.testPositive30  avgt   15   18.048 ±  2.994  ns/op
SecondarySupersLookup.testPositive32  avgt   15   18.817 ±  3.856  ns/op
SecondarySupersLookup.testPositive40  avgt   15   17.165 ±  2.536  ns/op
SecondarySupersLookup.testPositive50  avgt   15   20.060 ±  4.473  ns/op
SecondarySupersLookup.testPositive60  avgt   15   17.296 ±  2.411  ns/op
SecondarySupersLookup.testPositive63  avgt   15   19.313 ±  4.133  ns/op
SecondarySupersLookup.testPositive64  avgt   15   21.258 ±  4.909  ns/op
Finished running test 'micro:vm.lang.SecondarySupersLookup'

JMH tested on LicheePi 4A(has not Zbb):

Original
Benchmark                             Mode  Cnt    Score   Error  Units
SecondarySupersLookup.testNegative00  avgt   15   26.297 ± 0.941  ns/op
SecondarySupersLookup.testNegative01  avgt   15   26.345 ± 1.404  ns/op
SecondarySupersLookup.testNegative02  avgt   15   27.191 ± 1.421  ns/op
SecondarySupersLookup.testNegative03  avgt   15   28.546 ± 1.475  ns/op
SecondarySupersLookup.testNegative04  avgt   15   28.785 ± 1.329  ns/op
SecondarySupersLookup.testNegative05  avgt   15   30.524 ± 1.816  ns/op
SecondarySupersLookup.testNegative06  avgt   15   30.284 ± 0.984  ns/op
SecondarySupersLookup.testNegative07  avgt   15   31.594 ± 1.093  ns/op
SecondarySupersLookup.testNegative08  avgt   15   32.778 ± 1.269  ns/op
SecondarySupersLookup.testNegative09  avgt   15   33.549 ± 0.913  ns/op
SecondarySupersLookup.testNegative10  avgt   15   35.468 ± 1.643  ns/op
SecondarySupersLookup.testNegative16  avgt   15   68.065 ± 1.890  ns/op
SecondarySupersLookup.testNegative20  avgt   15   58.098 ± 2.704  ns/op
SecondarySupersLookup.testNegative30  avgt   15   70.944 ± 2.929  ns/op
SecondarySupersLookup.testNegative32  avgt   15   76.134 ± 3.719  ns/op
SecondarySupersLookup.testNegative40  avgt   15   89.092 ± 4.396  ns/op
SecondarySupersLookup.testNegative50  avgt   15  105.226 ± 4.877  ns/op
SecondarySupersLookup.testNegative55  avgt   15  115.744 ± 6.281  ns/op
SecondarySupersLookup.testNegative56  avgt   15  119.860 ± 5.618  ns/op
SecondarySupersLookup.testNegative57  avgt   15  117.818 ± 5.497  ns/op
SecondarySupersLookup.testNegative58  avgt   15  121.410 ± 6.781  ns/op
SecondarySupersLookup.testNegative59  avgt   15  124.500 ± 7.016  ns/op
SecondarySupersLookup.testNegative60  avgt   15  125.322 ± 5.241  ns/op
SecondarySupersLookup.testNegative61  avgt   15  129.009 ± 4.680  ns/op
SecondarySupersLookup.testNegative62  avgt   15  126.704 ± 5.917  ns/op
SecondarySupersLookup.testNegative63  avgt   15  131.529 ± 5.247  ns/op
SecondarySupersLookup.testNegative64  avgt   15  134.511 ± 4.925  ns/op
SecondarySupersLookup.testPositive01  avgt   15   22.386 ± 0.680  ns/op
SecondarySupersLookup.testPositive02  avgt   15   21.655 ± 0.492  ns/op
SecondarySupersLookup.testPositive03  avgt   15   22.123 ± 0.671  ns/op
SecondarySupersLookup.testPositive04  avgt   15   22.050 ± 0.610  ns/op
SecondarySupersLookup.testPositive05  avgt   15   22.048 ± 0.614  ns/op
SecondarySupersLookup.testPositive06  avgt   15   21.850 ± 0.597  ns/op
SecondarySupersLookup.testPositive07  avgt   15   21.844 ± 0.619  ns/op
SecondarySupersLookup.testPositive08  avgt   15   21.832 ± 0.601  ns/op
SecondarySupersLookup.testPositive09  avgt   15   21.743 ± 0.527  ns/op
SecondarySupersLookup.testPositive10  avgt   15   22.037 ± 0.609  ns/op
SecondarySupersLookup.testPositive16  avgt   15   22.300 ± 0.502  ns/op
SecondarySupersLookup.testPositive20  avgt   15   21.607 ± 0.498  ns/op
SecondarySupersLookup.testPositive30  avgt   15   21.836 ± 0.602  ns/op
SecondarySupersLookup.testPositive32  avgt   15   21.629 ± 0.484  ns/op
SecondarySupersLookup.testPositive40  avgt   15   21.850 ± 0.621  ns/op
SecondarySupersLookup.testPositive50  avgt   15   22.478 ± 0.130  ns/op
SecondarySupersLookup.testPositive60  avgt   15   22.058 ± 0.617  ns/op
SecondarySupersLookup.testPositive63  avgt   15   21.828 ± 0.596  ns/op
SecondarySupersLookup.testPositive64  avgt   15   22.077 ± 0.603  ns/op
Finished running test 'micro:vm.lang.SecondarySupersLookup'

With patch:
Benchmark                             Mode  Cnt    Score    Error  Units
SecondarySupersLookup.testNegative00  avgt   15   24.305 ±  1.269  ns/op
SecondarySupersLookup.testNegative01  avgt   15   23.552 ±  1.214  ns/op
SecondarySupersLookup.testNegative02  avgt   15   22.720 ±  0.771  ns/op
SecondarySupersLookup.testNegative03  avgt   15   22.713 ±  0.834  ns/op
SecondarySupersLookup.testNegative04  avgt   15   22.924 ±  0.684  ns/op
SecondarySupersLookup.testNegative05  avgt   15   22.614 ±  0.604  ns/op
SecondarySupersLookup.testNegative06  avgt   15   22.387 ±  0.641  ns/op
SecondarySupersLookup.testNegative07  avgt   15   22.201 ±  0.502  ns/op
SecondarySupersLookup.testNegative08  avgt   15   22.391 ±  0.606  ns/op
SecondarySupersLookup.testNegative09  avgt   15   22.462 ±  0.617  ns/op
SecondarySupersLookup.testNegative10  avgt   15   22.525 ±  0.202  ns/op
SecondarySupersLookup.testNegative16  avgt   15   22.439 ±  0.616  ns/op
SecondarySupersLookup.testNegative20  avgt   15   22.963 ±  0.298  ns/op
SecondarySupersLookup.testNegative30  avgt   15   22.642 ±  0.621  ns/op
SecondarySupersLookup.testNegative32  avgt   15   22.306 ±  0.670  ns/op
SecondarySupersLookup.testNegative40  avgt   15   22.663 ±  0.644  ns/op
SecondarySupersLookup.testNegative50  avgt   15   22.001 ±  0.238  ns/op
SecondarySupersLookup.testNegative55  avgt   15  128.558 ±  5.735  ns/op
SecondarySupersLookup.testNegative56  avgt   15  128.633 ±  4.893  ns/op
SecondarySupersLookup.testNegative57  avgt   15  129.143 ±  5.955  ns/op
SecondarySupersLookup.testNegative58  avgt   15  132.434 ±  6.478  ns/op
SecondarySupersLookup.testNegative59  avgt   15  130.243 ±  5.901  ns/op
SecondarySupersLookup.testNegative60  avgt   15  163.505 ±  8.278  ns/op
SecondarySupersLookup.testNegative61  avgt   15  163.934 ±  9.008  ns/op
SecondarySupersLookup.testNegative62  avgt   15  162.247 ±  6.238  ns/op
SecondarySupersLookup.testNegative63  avgt   15  213.133 ±  9.582  ns/op
SecondarySupersLookup.testNegative64  avgt   15  214.724 ± 11.562  ns/op
SecondarySupersLookup.testPositive01  avgt   15   21.622 ±  0.482  ns/op
SecondarySupersLookup.testPositive02  avgt   15   21.842 ±  0.602  ns/op
SecondarySupersLookup.testPositive03  avgt   15   22.274 ±  0.516  ns/op
SecondarySupersLookup.testPositive04  avgt   15   21.833 ±  0.632  ns/op
SecondarySupersLookup.testPositive05  avgt   15   21.842 ±  0.603  ns/op
SecondarySupersLookup.testPositive06  avgt   15   21.630 ±  0.527  ns/op
SecondarySupersLookup.testPositive07  avgt   15   22.054 ±  0.581  ns/op
SecondarySupersLookup.testPositive08  avgt   15   21.872 ±  0.613  ns/op
SecondarySupersLookup.testPositive09  avgt   15   21.839 ±  0.604  ns/op
SecondarySupersLookup.testPositive10  avgt   15   21.619 ±  0.494  ns/op
SecondarySupersLookup.testPositive16  avgt   15   21.624 ±  0.509  ns/op
SecondarySupersLookup.testPositive20  avgt   15   21.828 ±  0.595  ns/op
SecondarySupersLookup.testPositive30  avgt   15   21.861 ±  0.617  ns/op
SecondarySupersLookup.testPositive32  avgt   15   22.141 ±  0.609  ns/op
SecondarySupersLookup.testPositive40  avgt   15   21.632 ±  0.485  ns/op
SecondarySupersLookup.testPositive50  avgt   15   21.856 ±  0.597  ns/op
SecondarySupersLookup.testPositive60  avgt   15   22.068 ±  0.610  ns/op
SecondarySupersLookup.testPositive63  avgt   15   21.647 ±  0.496  ns/op
SecondarySupersLookup.testPositive64  avgt   15   21.847 ±  0.595  ns/op
Finished running test 'micro:vm.lang.SecondarySupersLookup'

With the above test data, we see that there is a performance decrease from testNegative55 to testNegative64 when zbb is not available. This is because a loop is needed to count the number of 1 when Zbb is not available. Therefore, the current patch is only optimized when Zbb is available.

zifeihan · 2024-06-03T11:00:20Z

Hey, is there performance comparison data when zbb is not supported? Just want to know if it can also get performance gain in that situation, then this optimization is a more generic one.

The reason for ask this question is that seems JDK-8180450 is an optimization for cache line, it should or could bring benefit even if zbb is not supported?

Hi, Thanks for the review. Sorry for being late. I've implemented the scalar register version before, but tested it and had data performance decrease, so the scalar implementation is not included in this Patch.

Hamlin-Li

Thanks for explanation. Seems this patch will also use some instructions in zba/zbs extension. Will they also impact the final performance?

Also has some minor comments first.

src/hotspot/cpu/riscv/vm_version_riscv.hpp

src/hotspot/cpu/riscv/riscv.ad

src/hotspot/cpu/riscv/macroAssembler_riscv.cpp

zifeihan · 2024-06-03T11:33:01Z

Thanks for explanation. Seems this patch will also use some instructions in zba/zbs extension. Will they also impact the final performance?

Also has some minor comments first.

I'm sorry for the Banana Pi BPI-F3 board data I forgot to specify that Banana Pi BPI-F3 board doesn't enable zba, zbb, zbs by default, here zbb is enabled manually by me. So the Banana Pi BPI-F3 board JMH data here is the performance JMH data without zba,zbs enabled.

Hamlin-Li

Some more comments.

src/hotspot/cpu/riscv/macroAssembler_riscv.cpp

zifeihan · 2024-06-04T02:16:52Z

@Hamlin-Li : Hi Hamlin, I've fixed all of the above comments. I've enabled Zba, Zbb, Zbs and tested JMH before and the performance has improved a bit compared to just enabling Zbb, I'll be re-testing the JMH with Zba, Zbb,Zbs enabled today and I'll update here when I'm done with it.

zifeihan · 2024-06-04T07:54:08Z

JMH tested on Banana Pi BPI-F3 board (has Zba,Zbb,Zbs) and Enable UseZba, UseZbb, UseZbs
Original(not with patch):

Benchmark                             Mode  Cnt    Score   Error  Units
SecondarySupersLookup.testNegative00  avgt   15   16.405 ± 0.221  ns/op                                                                SecondarySupersLookup.testNegative01  avgt   15   18.276 ± 0.251  ns/op
SecondarySupersLookup.testNegative02  avgt   15   21.519 ± 0.314  ns/op
SecondarySupersLookup.testNegative03  avgt   15   24.434 ± 0.370  ns/op                                                                SecondarySupersLookup.testNegative04  avgt   15   27.246 ± 0.441  ns/op                                                                SecondarySupersLookup.testNegative05  avgt   15   29.652 ± 0.397  ns/op
SecondarySupersLookup.testNegative06  avgt   15   32.049 ± 0.533  ns/op
SecondarySupersLookup.testNegative07  avgt   15   33.568 ± 0.501  ns/op
SecondarySupersLookup.testNegative08  avgt   15   35.593 ± 0.606  ns/op
SecondarySupersLookup.testNegative09  avgt   15   37.220 ± 0.407  ns/op
SecondarySupersLookup.testNegative10  avgt   15   39.522 ± 0.511  ns/op
SecondarySupersLookup.testNegative16  avgt   15   51.146 ± 0.667  ns/op
SecondarySupersLookup.testNegative20  avgt   15   58.404 ± 0.654  ns/op
SecondarySupersLookup.testNegative30  avgt   15   77.190 ± 0.796  ns/op
SecondarySupersLookup.testNegative32  avgt   15   81.144 ± 0.761  ns/op
SecondarySupersLookup.testNegative40  avgt   15   96.018 ± 0.733  ns/op
SecondarySupersLookup.testNegative50  avgt   15  115.170 ± 0.876  ns/op
SecondarySupersLookup.testNegative55  avgt   15  125.827 ± 4.322  ns/op
SecondarySupersLookup.testNegative56  avgt   15  126.151 ± 0.979  ns/op
SecondarySupersLookup.testNegative57  avgt   15  129.638 ± 4.326  ns/op
SecondarySupersLookup.testNegative58  avgt   15  131.960 ± 4.584  ns/op
SecondarySupersLookup.testNegative59  avgt   15  131.403 ± 1.051  ns/op
SecondarySupersLookup.testNegative60  avgt   15  133.660 ± 0.888  ns/op
SecondarySupersLookup.testNegative61  avgt   15  137.293 ± 4.852  ns/op
SecondarySupersLookup.testNegative62  avgt   15  137.476 ± 1.081  ns/op
SecondarySupersLookup.testNegative63  avgt   15  139.028 ± 1.026  ns/op
SecondarySupersLookup.testNegative64  avgt   15  143.545 ± 5.011  ns/op
SecondarySupersLookup.testPositive01  avgt   15   10.734 ± 0.156  ns/op
SecondarySupersLookup.testPositive02  avgt   15   10.727 ± 0.145  ns/op
SecondarySupersLookup.testPositive03  avgt   15   10.729 ± 0.149  ns/op
SecondarySupersLookup.testPositive04  avgt   15   10.724 ± 0.140  ns/op
SecondarySupersLookup.testPositive05  avgt   15   10.730 ± 0.152  ns/op
SecondarySupersLookup.testPositive06  avgt   15   10.726 ± 0.143  ns/op
SecondarySupersLookup.testPositive07  avgt   15   10.731 ± 0.151  ns/op
SecondarySupersLookup.testPositive08  avgt   15   10.735 ± 0.158  ns/op
SecondarySupersLookup.testPositive09  avgt   15   10.731 ± 0.152  ns/op
SecondarySupersLookup.testPositive10  avgt   15   10.728 ± 0.147  ns/op
SecondarySupersLookup.testPositive16  avgt   15   10.732 ± 0.153  ns/op
SecondarySupersLookup.testPositive20  avgt   15   10.727 ± 0.145  ns/op
SecondarySupersLookup.testPositive30  avgt   15   10.749 ± 0.170  ns/op
SecondarySupersLookup.testPositive32  avgt   15   10.732 ± 0.153  ns/op
SecondarySupersLookup.testPositive40  avgt   15   10.732 ± 0.153  ns/op
SecondarySupersLookup.testPositive50  avgt   15   10.732 ± 0.153  ns/op
SecondarySupersLookup.testPositive60  avgt   15   10.733 ± 0.155  ns/op
SecondarySupersLookup.testPositive63  avgt   15   10.733 ± 0.153  ns/op
SecondarySupersLookup.testPositive64  avgt   15   10.735 ± 0.158  ns/op
Finished running test 'micro:vm.lang.SecondarySupersLookup'

With patch:

Benchmark                             Mode  Cnt    Score   Error  Units
SecondarySupersLookup.testNegative00  avgt   15   13.270 ± 0.201  ns/op
SecondarySupersLookup.testNegative01  avgt   15   13.261 ± 0.194  ns/op
SecondarySupersLookup.testNegative02  avgt   15   13.265 ± 0.198  ns/op
SecondarySupersLookup.testNegative03  avgt   15   13.274 ± 0.222  ns/op
SecondarySupersLookup.testNegative04  avgt   15   13.262 ± 0.197  ns/op
SecondarySupersLookup.testNegative05  avgt   15   13.264 ± 0.200  ns/op
SecondarySupersLookup.testNegative06  avgt   15   13.259 ± 0.193  ns/op
SecondarySupersLookup.testNegative07  avgt   15   13.261 ± 0.196  ns/op
SecondarySupersLookup.testNegative08  avgt   15   13.255 ± 0.186  ns/op
SecondarySupersLookup.testNegative09  avgt   15   13.267 ± 0.200  ns/op
SecondarySupersLookup.testNegative10  avgt   15   13.265 ± 0.200  ns/op
SecondarySupersLookup.testNegative16  avgt   15   13.277 ± 0.220  ns/op
SecondarySupersLookup.testNegative20  avgt   15   13.270 ± 0.209  ns/op
SecondarySupersLookup.testNegative30  avgt   15   13.279 ± 0.223  ns/op
SecondarySupersLookup.testNegative32  avgt   15   13.284 ± 0.232  ns/op
SecondarySupersLookup.testNegative40  avgt   15   13.288 ± 0.237  ns/op
SecondarySupersLookup.testNegative50  avgt   15   13.290 ± 0.241  ns/op
SecondarySupersLookup.testNegative55  avgt   15   51.179 ± 0.761  ns/op
SecondarySupersLookup.testNegative56  avgt   15   51.175 ± 0.763  ns/op
SecondarySupersLookup.testNegative57  avgt   15   51.550 ± 1.070  ns/op
SecondarySupersLookup.testNegative58  avgt   15   51.182 ± 0.737  ns/op
SecondarySupersLookup.testNegative59  avgt   15   51.169 ± 0.773  ns/op
SecondarySupersLookup.testNegative60  avgt   15   74.605 ± 1.445  ns/op
SecondarySupersLookup.testNegative61  avgt   15   74.434 ± 1.006  ns/op
SecondarySupersLookup.testNegative62  avgt   15   74.587 ± 1.078  ns/op
SecondarySupersLookup.testNegative63  avgt   15  155.881 ± 0.856  ns/op
SecondarySupersLookup.testNegative64  avgt   15  158.028 ± 5.778  ns/op
SecondarySupersLookup.testPositive01  avgt   15   10.744 ± 0.176  ns/op
SecondarySupersLookup.testPositive02  avgt   15   10.731 ± 0.151  ns/op
SecondarySupersLookup.testPositive03  avgt   15   10.727 ± 0.146  ns/op
SecondarySupersLookup.testPositive04  avgt   15   10.732 ± 0.153  ns/op
SecondarySupersLookup.testPositive05  avgt   15   10.728 ± 0.147  ns/op
SecondarySupersLookup.testPositive06  avgt   15   10.731 ± 0.151  ns/op
SecondarySupersLookup.testPositive07  avgt   15   10.725 ± 0.143  ns/op
SecondarySupersLookup.testPositive08  avgt   15   10.730 ± 0.148  ns/op
SecondarySupersLookup.testPositive09  avgt   15   10.734 ± 0.156  ns/op
SecondarySupersLookup.testPositive10  avgt   15   10.726 ± 0.144  ns/op
SecondarySupersLookup.testPositive16  avgt   15   10.727 ± 0.146  ns/op
SecondarySupersLookup.testPositive20  avgt   15   10.731 ± 0.152  ns/op
SecondarySupersLookup.testPositive30  avgt   15   10.729 ± 0.150  ns/op
SecondarySupersLookup.testPositive32  avgt   15   10.728 ± 0.147  ns/op
SecondarySupersLookup.testPositive40  avgt   15   10.745 ± 0.181  ns/op
SecondarySupersLookup.testPositive50  avgt   15   10.732 ± 0.153  ns/op
SecondarySupersLookup.testPositive60  avgt   15   10.729 ± 0.148  ns/op
SecondarySupersLookup.testPositive63  avgt   15   10.734 ± 0.158  ns/op
SecondarySupersLookup.testPositive64  avgt   15   10.732 ± 0.154  ns/op
Finished running test 'micro:vm.lang.SecondarySupersLookup'

Hamlin-Li · 2024-06-04T08:55:24Z

There are a bit regression in cases of testNegative63/64, although these might be rare cases or not very common cases, but it's worth to have a try to improve it if possible.
I guess it's related to the implementation for the cases when bitmap is full. When it's full, before go to repne_scan, there're some instructions to execute. I wonder if it will help to have another "bitmap full test" just after "bitmap false test" (which is test_bit(t0, r_bitmap, bit);). But I'm not sure if it's feasible, maybe worth a try.

zifeihan · 2024-06-07T09:45:29Z

There are a bit regression in cases of testNegative63/64, although these might be rare cases or not very common cases, but it's worth to have a try to improve it if possible. I guess it's related to the implementation for the cases when bitmap is full. When it's full, before go to repne_scan, there're some instructions to execute. I wonder if it will help to have another "bitmap full test" just after "bitmap false test" (which is test_bit(t0, r_bitmap, bit);). But I'm not sure if it's feasible, maybe worth a try.

Hi, Sorry for being late. Thanks for the suggestion, I gave it a try.

diff --git a/src/hotspot/cpu/riscv/macroAssembler_riscv.cpp b/src/hotspot/cpu/riscv/macroAssembler_riscv.cpp
index 61e8016db4a..af1649c061f 100644
--- a/src/hotspot/cpu/riscv/macroAssembler_riscv.cpp
+++ b/src/hotspot/cpu/riscv/macroAssembler_riscv.cpp
@@ -3664,6 +3664,32 @@ bool MacroAssembler::lookup_secondary_supers_table(Register r_sub_klass,
   test_bit(t0, r_bitmap, bit);
   beqz(t0, L_fallthrough);
 
+  // We will consult the secondary-super array.
+  ld(r_array_base, Address(r_sub_klass, in_bytes(Klass::secondary_supers_offset())));
+
+  mv(tmp3, r_bitmap);
+  if (bit != 0) {
+    ror_imm(tmp3, tmp3, bit);
+  }
+
+  Label skip;
+  addi(t0, tmp3, (u1)1);
+  bnez(t0, skip);
+
+    // Load the array length.
+  lwu(r_array_length, Address(r_array_base, Array<Klass*>::length_offset_in_bytes()));
+  // And adjust the array base to point to the data.
+  // NB! Effectively increments current slot index by 1.
+  assert(Array<Klass*>::base_offset_in_bytes() == wordSize, "");
+  addi(r_array_base, r_array_base, Array<Klass*>::base_offset_in_bytes());
+  
+  repne_scan(r_array_base, r_super_klass, r_array_length, t0);
+  bne(r_super_klass, t0, L_fallthrough);
+  mv(result, zr);
+  beqz(result, L_fallthrough);
+  
+  bind(skip);
+
   // Get the first array index that can contain super_klass into r_array_index.
   if (bit != 0) {
     slli(r_array_index, r_bitmap, (Klass::SECONDARY_SUPERS_TABLE_MASK - bit));
@@ -3672,9 +3698,6 @@ bool MacroAssembler::lookup_secondary_supers_table(Register r_sub_klass,
     mv(r_array_index, (u1)1);
   }
 
-  // We will consult the secondary-super array.
-  ld(r_array_base, Address(r_sub_klass, in_bytes(Klass::secondary_supers_offset())));
-
   // The value i in r_array_index is >= 1, so even though r_array_base
   // points to the length, we don't need to adjust it to point to the data.
   assert(Array<Klass*>::base_offset_in_bytes() == wordSize, "Adjust this code");

JMH tested on Banana Pi BPI-F3 board (has Zbb) and Enable UseZbb, Not Enable UseZba, UseZbs

Benchmark                             Mode  Cnt    Score   Error  Units
SecondarySupersLookup.testNegative00  avgt   15   13.262 ± 0.197  ns/op
SecondarySupersLookup.testNegative01  avgt   15   13.273 ± 0.222  ns/op
SecondarySupersLookup.testNegative02  avgt   15   13.264 ± 0.199  ns/op
SecondarySupersLookup.testNegative03  avgt   15   13.275 ± 0.222  ns/op
SecondarySupersLookup.testNegative04  avgt   15   13.264 ± 0.198  ns/op
SecondarySupersLookup.testNegative05  avgt   15   13.259 ± 0.192  ns/op
SecondarySupersLookup.testNegative06  avgt   15   13.260 ± 0.195  ns/op
SecondarySupersLookup.testNegative07  avgt   15   13.275 ± 0.221  ns/op
SecondarySupersLookup.testNegative08  avgt   15   13.261 ± 0.196  ns/op
SecondarySupersLookup.testNegative09  avgt   15   13.267 ± 0.201  ns/op
SecondarySupersLookup.testNegative10  avgt   15   13.272 ± 0.211  ns/op
SecondarySupersLookup.testNegative16  avgt   15   13.271 ± 0.200  ns/op
SecondarySupersLookup.testNegative20  avgt   15   13.271 ± 0.210  ns/op
SecondarySupersLookup.testNegative30  avgt   15   13.277 ± 0.219  ns/op
SecondarySupersLookup.testNegative32  avgt   15   13.280 ± 0.224  ns/op
SecondarySupersLookup.testNegative40  avgt   15   13.285 ± 0.232  ns/op
SecondarySupersLookup.testNegative50  avgt   15   13.288 ± 0.237  ns/op
SecondarySupersLookup.testNegative55  avgt   15   54.940 ± 0.771  ns/op
SecondarySupersLookup.testNegative56  avgt   15   54.934 ± 0.798  ns/op
SecondarySupersLookup.testNegative57  avgt   15   54.909 ± 0.766  ns/op
SecondarySupersLookup.testNegative58  avgt   15   54.679 ± 0.830  ns/op
SecondarySupersLookup.testNegative59  avgt   15   54.941 ± 0.819  ns/op
SecondarySupersLookup.testNegative60  avgt   15   76.957 ± 0.945  ns/op
SecondarySupersLookup.testNegative61  avgt   15   76.956 ± 1.007  ns/op
SecondarySupersLookup.testNegative62  avgt   15   76.938 ± 0.976  ns/op
SecondarySupersLookup.testNegative63  avgt   15  138.371 ± 1.192  ns/op
SecondarySupersLookup.testNegative64  avgt   15  140.137 ± 1.077  ns/op
SecondarySupersLookup.testPositive01  avgt   15   10.734 ± 0.149  ns/op
SecondarySupersLookup.testPositive02  avgt   15   10.730 ± 0.150  ns/op
SecondarySupersLookup.testPositive03  avgt   15   10.727 ± 0.146  ns/op
SecondarySupersLookup.testPositive04  avgt   15   10.735 ± 0.157  ns/op
SecondarySupersLookup.testPositive05  avgt   15   10.730 ± 0.149  ns/op
SecondarySupersLookup.testPositive06  avgt   15   10.735 ± 0.155  ns/op
SecondarySupersLookup.testPositive07  avgt   15   10.730 ± 0.149  ns/op
SecondarySupersLookup.testPositive08  avgt   15   10.730 ± 0.149  ns/op
SecondarySupersLookup.testPositive09  avgt   15   10.731 ± 0.151  ns/op
SecondarySupersLookup.testPositive10  avgt   15   10.728 ± 0.148  ns/op
SecondarySupersLookup.testPositive16  avgt   15   10.733 ± 0.151  ns/op
SecondarySupersLookup.testPositive20  avgt   15   10.728 ± 0.146  ns/op
SecondarySupersLookup.testPositive30  avgt   15   10.737 ± 0.162  ns/op
SecondarySupersLookup.testPositive32  avgt   15   10.733 ± 0.154  ns/op
SecondarySupersLookup.testPositive40  avgt   15   10.729 ± 0.148  ns/op
SecondarySupersLookup.testPositive50  avgt   15   10.732 ± 0.151  ns/op
SecondarySupersLookup.testPositive60  avgt   15   10.729 ± 0.148  ns/op
SecondarySupersLookup.testPositive63  avgt   15   10.730 ± 0.150  ns/op
SecondarySupersLookup.testPositive64  avgt   15   10.735 ± 0.158  ns/op
Finished running test 'micro:vm.lang.SecondarySupersLookup'

After the fix it got better with the cases of testNegative63/64. As said above, this case is very rare, Is there a need for this additional complexity?

Hamlin-Li · 2024-06-10T18:32:05Z

Thanks for updating!

With the fix, although it improves the perf for testNegative63/64, but seems it brings some regression for testNegative55-62, in this sense the fix should not be taken.
I'll take another look, sorry for long waiting.

src/hotspot/cpu/riscv/macroAssembler_riscv.cpp

Hamlin-Li

Thanks for your patience.
I was thinking to jump to L_bitmap_full in lookup_secondary_supers_table_slow_path, in this way I guess it might address the performance issue when bitmap full, and not introduce regression in other cases. But I'm not sure how much complication will be brought into the implementation. So, let's skip this rare case optmization.

Some minor comment, otherwise looks good.

src/hotspot/cpu/riscv/stubGenerator_riscv.cpp

theRealAph · 2024-06-14T13:37:03Z

There are a bit regression in cases of testNegative63/64, although these might be rare cases or not very common cases, but it's worth to have a try to improve it if possible. I guess it's related to the implementation for the cases when bitmap is full. When it's full, before go to repne_scan, there're some instructions to execute. I wonder if it will help to have another "bitmap full test" just after "bitmap false test" (which is test_bit(t0, r_bitmap, bit);). But I'm not sure if it's feasible, maybe worth a try.

So many superinterfaces is very rate. So rare, in fact, that it may never have happened in production Java code.

theRealAph · 2024-06-14T13:48:16Z

It's worth running the "before" test with -XX:-UseSecondarySupersCache. This gives you a much better idea of the cost when you don't get any hit on the one-element secondary supers cache.

zifeihan · 2024-06-15T06:56:33Z

It's worth running the "before" test with -XX:-UseSecondarySupersCache. This gives you a much better idea of the cost when you don't get any hit on the one-element secondary supers cache.

JMH tested on Banana Pi BPI-F3 board (has Zbb) and Enable UseZbb, and use -XX:-UseSecondarySupersCache to disable UseSecondarySupersCache
Original(not with patch):

Benchmark                             Mode  Cnt    Score   Error  Units                     [81/1889]
SecondarySupersLookup.testNegative00  avgt   15   15.153 ± 0.219  ns/op
SecondarySupersLookup.testNegative01  avgt   15   17.029 ± 0.287  ns/op
SecondarySupersLookup.testNegative02  avgt   15   20.242 ± 0.353  ns/op
SecondarySupersLookup.testNegative03  avgt   15   23.737 ± 0.515  ns/op
SecondarySupersLookup.testNegative04  avgt   15   26.097 ± 0.477  ns/op
SecondarySupersLookup.testNegative05  avgt   15   28.460 ± 0.546  ns/op
SecondarySupersLookup.testNegative06  avgt   15   31.025 ± 0.622  ns/op
SecondarySupersLookup.testNegative07  avgt   15   32.070 ± 0.518  ns/op
SecondarySupersLookup.testNegative08  avgt   15   34.656 ± 0.586  ns/op
SecondarySupersLookup.testNegative09  avgt   15   36.140 ± 0.622  ns/op
SecondarySupersLookup.testNegative10  avgt   15   38.304 ± 0.577  ns/op
SecondarySupersLookup.testNegative16  avgt   15   49.672 ± 0.726  ns/op
SecondarySupersLookup.testNegative20  avgt   15   57.241 ± 0.709  ns/op
SecondarySupersLookup.testNegative30  avgt   15   76.189 ± 0.804  ns/op
SecondarySupersLookup.testNegative32  avgt   15   79.821 ± 0.809  ns/op
SecondarySupersLookup.testNegative40  avgt   15   95.006 ± 0.999  ns/op
SecondarySupersLookup.testNegative50  avgt   15  113.808 ± 0.933  ns/op
SecondarySupersLookup.testNegative55  avgt   15  122.897 ± 1.237  ns/op
SecondarySupersLookup.testNegative56  avgt   15  125.180 ± 1.005  ns/op
SecondarySupersLookup.testNegative57  avgt   15  126.606 ± 0.925  ns/op
SecondarySupersLookup.testNegative58  avgt   15  128.890 ± 0.809  ns/op
SecondarySupersLookup.testNegative59  avgt   15  130.382 ± 1.092  ns/op
SecondarySupersLookup.testNegative60  avgt   15  132.426 ± 1.045  ns/op
SecondarySupersLookup.testNegative61  avgt   15  133.953 ± 1.062  ns/op
SecondarySupersLookup.testNegative62  avgt   15  136.156 ± 0.974  ns/op
SecondarySupersLookup.testNegative63  avgt   15  137.958 ± 1.172  ns/op
SecondarySupersLookup.testNegative64  avgt   15  142.439 ± 4.703  ns/op
SecondarySupersLookup.testPositive01  avgt   15   17.030 ± 0.218  ns/op
SecondarySupersLookup.testPositive02  avgt   15   20.688 ± 0.793  ns/op
SecondarySupersLookup.testPositive03  avgt   15   24.253 ± 0.566  ns/op
SecondarySupersLookup.testPositive04  avgt   15   27.154 ± 0.495  ns/op
SecondarySupersLookup.testPositive05  avgt   15   28.596 ± 0.892  ns/op
SecondarySupersLookup.testPositive06  avgt   15   30.846 ± 0.304  ns/op
SecondarySupersLookup.testPositive07  avgt   15   32.960 ± 0.564  ns/op
SecondarySupersLookup.testPositive08  avgt   15   34.706 ± 0.377  ns/op
SecondarySupersLookup.testPositive09  avgt   15   36.615 ± 0.453  ns/op
SecondarySupersLookup.testPositive10  avgt   15   38.760 ± 0.462  ns/op
SecondarySupersLookup.testPositive16  avgt   15   49.848 ± 0.489  ns/op
SecondarySupersLookup.testPositive20  avgt   15   57.419 ± 0.467  ns/op
SecondarySupersLookup.testPositive30  avgt   15   76.303 ± 0.530  ns/op
SecondarySupersLookup.testPositive32  avgt   15   79.537 ± 0.323  ns/op
SecondarySupersLookup.testPositive40  avgt   15   94.493 ± 0.565  ns/op
SecondarySupersLookup.testPositive50  avgt   15  113.451 ± 1.932  ns/op
SecondarySupersLookup.testPositive60  avgt   15  133.233 ± 4.262  ns/op
SecondarySupersLookup.testPositive63  avgt   15  137.090 ± 2.027  ns/op
SecondarySupersLookup.testPositive64  avgt   15  138.784 ± 1.650  ns/op
Finished running test 'micro:vm.lang.SecondarySupersLookup'

With patch

Benchmark                             Mode  Cnt    Score   Error  Units
SecondarySupersLookup.testNegative00  avgt   15   12.621 ± 0.172  ns/op
SecondarySupersLookup.testNegative01  avgt   15   12.614 ± 0.163  ns/op
SecondarySupersLookup.testNegative02  avgt   15   12.619 ± 0.169  ns/op
SecondarySupersLookup.testNegative03  avgt   15   12.621 ± 0.171  ns/op
SecondarySupersLookup.testNegative04  avgt   15   12.617 ± 0.167  ns/op
SecondarySupersLookup.testNegative05  avgt   15   12.618 ± 0.167  ns/op
SecondarySupersLookup.testNegative06  avgt   15   12.626 ± 0.180  ns/op
SecondarySupersLookup.testNegative07  avgt   15   12.621 ± 0.175  ns/op
SecondarySupersLookup.testNegative08  avgt   15   12.623 ± 0.176  ns/op
SecondarySupersLookup.testNegative09  avgt   15   12.626 ± 0.177  ns/op
SecondarySupersLookup.testNegative10  avgt   15   12.625 ± 0.181  ns/op
SecondarySupersLookup.testNegative16  avgt   15   12.625 ± 0.178  ns/op
SecondarySupersLookup.testNegative20  avgt   15   12.624 ± 0.178  ns/op
SecondarySupersLookup.testNegative30  avgt   15   12.635 ± 0.195  ns/op
SecondarySupersLookup.testNegative32  avgt   15   12.638 ± 0.200  ns/op
SecondarySupersLookup.testNegative40  avgt   15   12.644 ± 0.209  ns/op
SecondarySupersLookup.testNegative50  avgt   15   12.646 ± 0.212  ns/op
SecondarySupersLookup.testNegative55  avgt   15   51.029 ± 0.833  ns/op
SecondarySupersLookup.testNegative56  avgt   15   51.484 ± 1.074  ns/op
SecondarySupersLookup.testNegative57  avgt   15   51.170 ± 0.731  ns/op
SecondarySupersLookup.testNegative58  avgt   15   51.775 ± 1.573  ns/op
SecondarySupersLookup.testNegative59  avgt   15   51.000 ± 0.919  ns/op
SecondarySupersLookup.testNegative60  avgt   15   73.169 ± 0.950  ns/op
SecondarySupersLookup.testNegative61  avgt   15   73.537 ± 1.235  ns/op
SecondarySupersLookup.testNegative62  avgt   15   75.116 ± 4.232  ns/op
SecondarySupersLookup.testNegative63  avgt   15  153.908 ± 1.126  ns/op
SecondarySupersLookup.testNegative64  avgt   15  155.937 ± 1.101  ns/op
SecondarySupersLookup.testPositive01  avgt   15   17.656 ± 0.220  ns/op
SecondarySupersLookup.testPositive02  avgt   15   17.012 ± 0.192  ns/op
SecondarySupersLookup.testPositive03  avgt   15   17.643 ± 0.200  ns/op
SecondarySupersLookup.testPositive04  avgt   15   17.640 ± 0.196  ns/op
SecondarySupersLookup.testPositive05  avgt   15   17.018 ± 0.203  ns/op
SecondarySupersLookup.testPositive06  avgt   15   17.643 ± 0.201  ns/op
SecondarySupersLookup.testPositive07  avgt   15   17.648 ± 0.208  ns/op
SecondarySupersLookup.testPositive08  avgt   15   17.652 ± 0.214  ns/op
SecondarySupersLookup.testPositive09  avgt   15   17.650 ± 0.207  ns/op
SecondarySupersLookup.testPositive10  avgt   15   17.026 ± 0.213  ns/op
SecondarySupersLookup.testPositive16  avgt   15   17.649 ± 0.209  ns/op
SecondarySupersLookup.testPositive20  avgt   15   17.652 ± 0.216  ns/op
SecondarySupersLookup.testPositive30  avgt   15   37.097 ± 0.301  ns/op
SecondarySupersLookup.testPositive32  avgt   15   37.351 ± 0.459  ns/op
SecondarySupersLookup.testPositive40  avgt   15   42.951 ± 1.256  ns/op
SecondarySupersLookup.testPositive50  avgt   15   17.642 ± 0.194  ns/op
SecondarySupersLookup.testPositive60  avgt   15   37.099 ± 0.300  ns/op
SecondarySupersLookup.testPositive63  avgt   15  153.592 ± 6.942  ns/op
SecondarySupersLookup.testPositive64  avgt   15  153.214 ± 2.070  ns/op
Finished running test 'micro:vm.lang.SecondarySupersLookup'

@theRealAph : Hi, Can you take a look? Thanks

theRealAph · 2024-06-15T08:32:28Z

It's worth running the "before" test with -XX:-UseSecondarySupersCache. This gives you a much better idea of the cost when you don't get any hit on the one-element secondary supers cache.

@theRealAph : Hi, Can you take a look? Thanks

That all looks like it's working as expected.

RealFYang

LGTM.

src/hotspot/cpu/riscv/vm_version_riscv.cpp

zifeihan · 2024-06-19T17:14:33Z

@Hamlin-Li @RealFYang @theRealAph : In the case of scalar register implementations, as discussed above, Such huge numbers of secondary supers don't occur in real-world code., can we add scalar register implementations to this PR as well?

Hamlin-Li · 2024-06-19T17:42:54Z

@Hamlin-Li @RealFYang @theRealAph : In the case of scalar register implementations, as discussed above, Such huge numbers of secondary supers don't occur in real-world code., can we add scalar register implementations to this PR as well?

I suggest to add scalar version (i.e. use population_count instead when zbb is not supported).

src/hotspot/cpu/riscv/macroAssembler_riscv.cpp

zifeihan · 2024-06-20T13:42:11Z

Thanks all for the review.
/integrate

openjdk · 2024-06-20T13:42:54Z

@zifeihan
Your change (at version 0e81d27) is now ready to be sponsored by a Committer.

Hamlin-Li · 2024-06-20T13:44:21Z

/sponsor

openjdk · 2024-06-20T13:45:32Z

Going to push as commit 001d686.
Since your change was applied there have been 18 commits pushed to the master branch:

5cad0b4: 8322708: Global HTML attributes are not allowed
6420846: 8334396: RISC-V: verify perf of ReverseBytesI/L
c6f3bf4: 8334026: Provide a diagnostic PrintMemoryMapAtExit switch on Linux
cabd104: 8334164: The fix for JDK-8322811 should use _filename.is_set() rather than strcmp()
d7dad50: 8334544: C2: wrong control assigned in PhaseIdealLoop::clone_assertion_predicate_for_unswitched_loops()
ff30240: 8334239: Introduce macro for ubsan method/function exclusions
2d4185f: 8332717: ZGC: Division by zero in heuristics
fad6644: 8333754: Add a Test against ECDSA and ECDH NIST Test vector
b211929: 8334570: Problem list gc/TestAlwaysPreTouchBehavior.java
4e58d8c: 8309821: Link to hidden classes section in Class specification for Class::isHidden
... and 8 more: https://git.openjdk.org/jdk/compare/48621ae193ef70b2fae4dcb7ddc524f349beb131...master

Your commit was automatically rebased without conflicts.

openjdk · 2024-06-20T13:45:39Z

@Hamlin-Li @zifeihan Pushed as commit 001d686.

💡 You may see a message that your pull request was closed with unmerged commits. This can be safely ignored.

theRealAph · 2024-06-24T08:14:25Z

src/hotspot/cpu/riscv/macroAssembler_riscv.cpp

+
+    // Check for wraparound.
+    Label skip;
+    bge(r_array_length, r_array_index, skip);


Are you sure this test is correct? I would have thought it would be bgt. If length == index, then you must set index to 0. I would have expected this to fail testing.

Good catch! Yes, I agree that we should use bgt here.

Are you sure this test is correct? I would have thought it would be bgt. If length == index, then you must set index to 0. I would have expected this to fail testing.

Thanks, I will fix it like blt(r_array_index,r_array_length,skip); Linked JBS: https://bugs.openjdk.org/browse/JDK-8334843.

openjdk bot added the hotspot label May 21, 2024

8332587: RISC-V: secondary_super_cache does not scale well

Loading
Loading status checks…

8062db4

zifeihan force-pushed the JDK-8332587 branch from cd41d3a to 8062db4 Compare May 28, 2024 10:17

zifeihan added 3 commits May 28, 2024 21:24

Merge remote-tracking branch 'upstream/master' into JDK-8332587

Loading
Loading status checks…

3686ecd

Fix client VM build

Loading
Loading status checks…

fa844a1

Merge remote-tracking branch 'upstream/master' into JDK-8332587

Loading
Loading status checks…

53d3432

zifeihan marked this pull request as ready for review June 3, 2024 06:43

openjdk bot added the rfr label Jun 3, 2024

Hamlin-Li reviewed Jun 3, 2024

View reviewed changes

Fix for Hamlin comment

Loading
Loading status checks…

c911032

openjdk bot removed the rfr label Jun 4, 2024

Fix Code format

Loading
Loading status checks…

0c7c9f5

openjdk bot added the rfr label Jun 4, 2024

Merge remote-tracking branch 'upstream/master' into JDK-8332587

d5ebf5c

RealFYang reviewed Jun 11, 2024

View reviewed changes

src/hotspot/cpu/riscv/macroAssembler_riscv.cpp Outdated Show resolved Hide resolved

Polish Code Comment

2b8c1e0

Hamlin-Li reviewed Jun 14, 2024

View reviewed changes

src/hotspot/cpu/riscv/stubGenerator_riscv.cpp Show resolved Hide resolved

Put "secondary super table" generate code inside COMPILER2 macro

Loading
Loading status checks…

ec01d64

Hamlin-Li approved these changes Jun 17, 2024

View reviewed changes

openjdk bot added the ready label Jun 17, 2024

RealFYang approved these changes Jun 19, 2024

View reviewed changes

Merge remote-tracking branch 'upstream/master' into JDK-8332587

Loading
Loading status checks…

cd65669

theRealAph reviewed Jun 19, 2024

View reviewed changes

src/hotspot/cpu/riscv/vm_version_riscv.cpp Outdated Show resolved Hide resolved

Add population_count for scalar version

Loading
Loading status checks…

4c829c8

RealFYang suggested changes Jun 20, 2024

View reviewed changes

src/hotspot/cpu/riscv/macroAssembler_riscv.cpp Show resolved Hide resolved

src/hotspot/cpu/riscv/macroAssembler_riscv.cpp Outdated Show resolved Hide resolved

src/hotspot/cpu/riscv/macroAssembler_riscv.cpp Outdated Show resolved Hide resolved

Add comment and fix population_count

Loading
Loading status checks…

0e81d27

RealFYang approved these changes Jun 20, 2024

View reviewed changes

openjdk bot added the sponsor label Jun 20, 2024

openjdk bot added the integrated label Jun 20, 2024

openjdk bot closed this Jun 20, 2024

openjdk bot removed ready rfr sponsor labels Jun 20, 2024

theRealAph reviewed Jun 24, 2024

View reviewed changes

zifeihan deleted the JDK-8332587 branch June 26, 2024 03:30

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

8332587: RISC-V: secondary_super_cache does not scale well #19320

8332587: RISC-V: secondary_super_cache does not scale well #19320

zifeihan commented May 21, 2024 •

edited by openjdk bot

Loading

bridgekeeper bot commented May 21, 2024

openjdk bot commented May 21, 2024 •

edited

Loading

openjdk bot commented May 21, 2024

mlbridge bot commented Jun 3, 2024 •

edited

Loading

Hamlin-Li commented Jun 3, 2024 •

edited

Loading

zifeihan commented Jun 3, 2024

zifeihan commented Jun 3, 2024

Hamlin-Li left a comment

zifeihan commented Jun 3, 2024

Hamlin-Li left a comment

zifeihan commented Jun 4, 2024

zifeihan commented Jun 4, 2024 •

edited

Loading

Hamlin-Li commented Jun 4, 2024 •

edited

Loading

zifeihan commented Jun 7, 2024 •

edited

Loading

Hamlin-Li commented Jun 10, 2024

Hamlin-Li left a comment

theRealAph commented Jun 14, 2024

theRealAph commented Jun 14, 2024

zifeihan commented Jun 15, 2024

theRealAph commented Jun 15, 2024

RealFYang left a comment

zifeihan commented Jun 19, 2024

Hamlin-Li commented Jun 19, 2024

zifeihan commented Jun 20, 2024

openjdk bot commented Jun 20, 2024

Hamlin-Li commented Jun 20, 2024

openjdk bot commented Jun 20, 2024

openjdk bot commented Jun 20, 2024

theRealAph Jun 24, 2024

RealFYang Jun 24, 2024

zifeihan Jun 24, 2024

8332587: RISC-V: secondary_super_cache does not scale well #19320

8332587: RISC-V: secondary_super_cache does not scale well #19320

Conversation

zifeihan commented May 21, 2024 • edited by openjdk bot Loading

Correctness testing:

JMH tested on Banana Pi BPI-F3 board (has Zbb) and Enable UseZbb, Not Enable UseZba, UseZbs

Progress

Issue

Reviewers

Reviewing

Webrev

bridgekeeper bot commented May 21, 2024

openjdk bot commented May 21, 2024 • edited Loading

openjdk bot commented May 21, 2024

mlbridge bot commented Jun 3, 2024 • edited Loading

Webrevs

Hamlin-Li commented Jun 3, 2024 • edited Loading

zifeihan commented Jun 3, 2024

zifeihan commented Jun 3, 2024

Hamlin-Li left a comment

Choose a reason for hiding this comment

zifeihan commented Jun 3, 2024

Hamlin-Li left a comment

Choose a reason for hiding this comment

zifeihan commented Jun 4, 2024

zifeihan commented Jun 4, 2024 • edited Loading

Hamlin-Li commented Jun 4, 2024 • edited Loading

zifeihan commented Jun 7, 2024 • edited Loading

JMH tested on Banana Pi BPI-F3 board (has Zbb) and Enable UseZbb, Not Enable UseZba, UseZbs

Hamlin-Li commented Jun 10, 2024

Hamlin-Li left a comment

Choose a reason for hiding this comment

theRealAph commented Jun 14, 2024

theRealAph commented Jun 14, 2024

zifeihan commented Jun 15, 2024

theRealAph commented Jun 15, 2024

RealFYang left a comment

Choose a reason for hiding this comment

zifeihan commented Jun 19, 2024

Hamlin-Li commented Jun 19, 2024

zifeihan commented Jun 20, 2024

openjdk bot commented Jun 20, 2024

Hamlin-Li commented Jun 20, 2024

openjdk bot commented Jun 20, 2024

openjdk bot commented Jun 20, 2024

theRealAph Jun 24, 2024

Choose a reason for hiding this comment

RealFYang Jun 24, 2024

Choose a reason for hiding this comment

zifeihan Jun 24, 2024

Choose a reason for hiding this comment

zifeihan commented May 21, 2024 •

edited by openjdk bot

Loading

openjdk bot commented May 21, 2024 •

edited

Loading

mlbridge bot commented Jun 3, 2024 •

edited

Loading

Hamlin-Li commented Jun 3, 2024 •

edited

Loading

zifeihan commented Jun 4, 2024 •

edited

Loading

Hamlin-Li commented Jun 4, 2024 •

edited

Loading

zifeihan commented Jun 7, 2024 •

edited

Loading