Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

8332587: RISC-V: secondary_super_cache does not scale well #19320

Closed
wants to merge 16 commits into from

Conversation

zifeihan
Copy link
Member

@zifeihan zifeihan commented May 21, 2024

Implementation of subtype checking JDK-8180450 for linux-riscv64.
This optimization depends on availability of the Zbb extension which has the cpop instruction.

Correctness testing:

  • Run tier1-3, hotspot:tier4 tests on SOPHON SG2042 (release)
  • Run tier1-3 tests with -XX:+UseRVV on qemu 8.1.0 (fastdebug)
  • Run all of tier1 with -XX:+VerifySecondarySupers

JMH tested on Banana Pi BPI-F3 board (has Zbb) and Enable UseZbb, Not Enable UseZba, UseZbs

Original:

Benchmark                             Mode  Cnt    Score   Error  Units
SecondarySuperCacheHits.test  avgt   15  11.375 ± 0.071  ns/op
SecondarySuperCacheInterContention.test     avgt   15  646.087 ± 32.587  ns/op
SecondarySuperCacheInterContention.test:t1  avgt   15  600.090 ± 83.779  ns/op
SecondarySuperCacheInterContention.test:t2  avgt   15  692.084 ± 73.218  ns/op
SecondarySupersLookup.testNegative00  avgt   15   16.420 ± 0.239  ns/op
SecondarySupersLookup.testNegative01  avgt   15   18.307 ± 0.260  ns/op
SecondarySupersLookup.testNegative02  avgt   15   21.695 ± 0.458  ns/op
SecondarySupersLookup.testNegative03  avgt   15   24.855 ± 0.664  ns/op
SecondarySupersLookup.testNegative04  avgt   15   27.305 ± 0.522  ns/op
SecondarySupersLookup.testNegative05  avgt   15   29.719 ± 0.385  ns/op
SecondarySupersLookup.testNegative06  avgt   15   32.231 ± 0.498  ns/op
SecondarySupersLookup.testNegative07  avgt   15   33.747 ± 0.603  ns/op
SecondarySupersLookup.testNegative08  avgt   15   35.856 ± 0.629  ns/op
SecondarySupersLookup.testNegative09  avgt   15   37.077 ± 0.546  ns/op
SecondarySupersLookup.testNegative10  avgt   15   39.408 ± 0.465  ns/op
SecondarySupersLookup.testNegative16  avgt   15   51.041 ± 0.547  ns/op
SecondarySupersLookup.testNegative20  avgt   15   58.722 ± 0.922  ns/op
SecondarySupersLookup.testNegative30  avgt   15   77.310 ± 0.654  ns/op
SecondarySupersLookup.testNegative32  avgt   15   81.116 ± 0.854  ns/op
SecondarySupersLookup.testNegative40  avgt   15   96.311 ± 0.840  ns/op
SecondarySupersLookup.testNegative50  avgt   15  115.427 ± 0.838  ns/op
SecondarySupersLookup.testNegative55  avgt   15  124.371 ± 1.076  ns/op
SecondarySupersLookup.testNegative56  avgt   15  126.796 ± 0.916  ns/op
SecondarySupersLookup.testNegative57  avgt   15  127.952 ± 1.202  ns/op
SecondarySupersLookup.testNegative58  avgt   15  131.956 ± 4.515  ns/op
SecondarySupersLookup.testNegative59  avgt   15  131.858 ± 1.066  ns/op
SecondarySupersLookup.testNegative60  avgt   15  134.294 ± 0.888  ns/op
SecondarySupersLookup.testNegative61  avgt   15  135.462 ± 1.037  ns/op
SecondarySupersLookup.testNegative62  avgt   15  137.805 ± 0.999  ns/op
SecondarySupersLookup.testNegative63  avgt   15  139.335 ± 1.164  ns/op
SecondarySupersLookup.testNegative64  avgt   15  141.401 ± 0.947  ns/op
SecondarySupersLookup.testPositive01  avgt   15   10.731 ± 0.152  ns/op
SecondarySupersLookup.testPositive02  avgt   15   10.726 ± 0.142  ns/op
SecondarySupersLookup.testPositive03  avgt   15   10.728 ± 0.145  ns/op
SecondarySupersLookup.testPositive04  avgt   15   10.730 ± 0.149  ns/op
SecondarySupersLookup.testPositive05  avgt   15   10.730 ± 0.151  ns/op
SecondarySupersLookup.testPositive06  avgt   15   10.730 ± 0.150  ns/op
SecondarySupersLookup.testPositive07  avgt   15   10.731 ± 0.148  ns/op
SecondarySupersLookup.testPositive08  avgt   15   10.730 ± 0.150  ns/op
SecondarySupersLookup.testPositive09  avgt   15   10.730 ± 0.151  ns/op
SecondarySupersLookup.testPositive10  avgt   15   10.734 ± 0.156  ns/op
SecondarySupersLookup.testPositive16  avgt   15   10.742 ± 0.160  ns/op
SecondarySupersLookup.testPositive20  avgt   15   10.730 ± 0.150  ns/op
SecondarySupersLookup.testPositive30  avgt   15   10.735 ± 0.156  ns/op
SecondarySupersLookup.testPositive32  avgt   15   10.731 ± 0.147  ns/op
SecondarySupersLookup.testPositive40  avgt   15   10.744 ± 0.179  ns/op
SecondarySupersLookup.testPositive50  avgt   15   10.733 ± 0.152  ns/op
SecondarySupersLookup.testPositive60  avgt   15   10.748 ± 0.189  ns/op
SecondarySupersLookup.testPositive63  avgt   15   10.726 ± 0.142  ns/op
SecondarySupersLookup.testPositive64  avgt   15   10.733 ± 0.155  ns/op
TypePollution.instanceOfInterfaceSwitchLinearNoSCC          avgt   12  51592.312 ± 6662.713  ns/op
TypePollution.instanceOfInterfaceSwitchLinearSCC            avgt   12  50294.723 ±  371.807  ns/op
TypePollution.instanceOfInterfaceSwitchTableNoSCC           avgt   12  53752.017 ±  346.287  ns/op
TypePollution.instanceOfInterfaceSwitchTableSCC             avgt   12  50053.321 ±  642.562  ns/op
TypePollution.parallelInstanceOfInterfaceSwitchLinearNoSCC  avgt   12   1830.935 ±  262.496  ms/op
TypePollution.parallelInstanceOfInterfaceSwitchLinearSCC    avgt   12   1745.503 ±  201.047  ms/op
TypePollution.parallelInstanceOfInterfaceSwitchTableNoSCC   avgt   12   1794.322 ±  283.656  ms/op
TypePollution.parallelInstanceOfInterfaceSwitchTableSCC     avgt   12   1808.875 ±  235.126  ms/op

With patch:

Benchmark                             Mode  Cnt    Score   Error  Units
SecondarySuperCacheHits.test  avgt   15  11.382 ± 0.027  ns/op
SecondarySuperCacheInterContention.test     avgt   15  621.291 ± 40.619  ns/op
SecondarySuperCacheInterContention.test:t1  avgt   15  635.382 ± 58.347  ns/op
SecondarySuperCacheInterContention.test:t2  avgt   15  607.200 ± 65.490  ns/op
SecondarySupersLookup.testNegative00  avgt   15   13.275 ± 0.223  ns/op
SecondarySupersLookup.testNegative01  avgt   15   13.264 ± 0.201  ns/op
SecondarySupersLookup.testNegative02  avgt   15   13.261 ± 0.194  ns/op
SecondarySupersLookup.testNegative03  avgt   15   13.271 ± 0.210  ns/op
SecondarySupersLookup.testNegative04  avgt   15   13.265 ± 0.201  ns/op
SecondarySupersLookup.testNegative05  avgt   15   13.258 ± 0.191  ns/op
SecondarySupersLookup.testNegative06  avgt   15   13.280 ± 0.225  ns/op
SecondarySupersLookup.testNegative07  avgt   15   13.268 ± 0.201  ns/op
SecondarySupersLookup.testNegative08  avgt   15   13.266 ± 0.202  ns/op
SecondarySupersLookup.testNegative09  avgt   15   13.261 ± 0.196  ns/op
SecondarySupersLookup.testNegative10  avgt   15   13.268 ± 0.198  ns/op
SecondarySupersLookup.testNegative16  avgt   15   13.268 ± 0.205  ns/op
SecondarySupersLookup.testNegative20  avgt   15   13.284 ± 0.231  ns/op
SecondarySupersLookup.testNegative30  avgt   15   13.281 ± 0.226  ns/op
SecondarySupersLookup.testNegative32  avgt   15   13.273 ± 0.215  ns/op
SecondarySupersLookup.testNegative40  avgt   15   13.287 ± 0.233  ns/op
SecondarySupersLookup.testNegative50  avgt   15   13.292 ± 0.242  ns/op
SecondarySupersLookup.testNegative55  avgt   15   53.064 ± 0.757  ns/op
SecondarySupersLookup.testNegative56  avgt   15   53.052 ± 0.767  ns/op
SecondarySupersLookup.testNegative57  avgt   15   53.068 ± 0.803  ns/op
SecondarySupersLookup.testNegative58  avgt   15   53.076 ± 0.776  ns/op
SecondarySupersLookup.testNegative59  avgt   15   53.095 ± 0.846  ns/op
SecondarySupersLookup.testNegative60  avgt   15   75.106 ± 1.033  ns/op
SecondarySupersLookup.testNegative61  avgt   15   76.832 ± 4.047  ns/op
SecondarySupersLookup.testNegative62  avgt   15   75.085 ± 1.010  ns/op
SecondarySupersLookup.testNegative63  avgt   15  153.709 ± 0.893  ns/op
SecondarySupersLookup.testNegative64  avgt   15  155.623 ± 0.922  ns/op
SecondarySupersLookup.testPositive01  avgt   15   10.727 ± 0.145  ns/op
SecondarySupersLookup.testPositive02  avgt   15   10.734 ± 0.157  ns/op
SecondarySupersLookup.testPositive03  avgt   15   10.731 ± 0.151  ns/op
SecondarySupersLookup.testPositive04  avgt   15   10.733 ± 0.156  ns/op
SecondarySupersLookup.testPositive05  avgt   15   10.742 ± 0.168  ns/op
SecondarySupersLookup.testPositive06  avgt   15   10.729 ± 0.148  ns/op
SecondarySupersLookup.testPositive07  avgt   15   10.738 ± 0.163  ns/op
SecondarySupersLookup.testPositive08  avgt   15   10.736 ± 0.159  ns/op
SecondarySupersLookup.testPositive09  avgt   15   10.735 ± 0.158  ns/op
SecondarySupersLookup.testPositive10  avgt   15   10.730 ± 0.150  ns/op
SecondarySupersLookup.testPositive16  avgt   15   10.734 ± 0.157  ns/op
SecondarySupersLookup.testPositive20  avgt   15   10.731 ± 0.149  ns/op
SecondarySupersLookup.testPositive30  avgt   15   10.733 ± 0.156  ns/op
SecondarySupersLookup.testPositive32  avgt   15   10.729 ± 0.149  ns/op
SecondarySupersLookup.testPositive40  avgt   15   10.732 ± 0.154  ns/op
SecondarySupersLookup.testPositive50  avgt   15   10.735 ± 0.154  ns/op
SecondarySupersLookup.testPositive60  avgt   15   10.736 ± 0.157  ns/op
SecondarySupersLookup.testPositive63  avgt   15   10.733 ± 0.155  ns/op
SecondarySupersLookup.testPositive64  avgt   15   10.729 ± 0.149  ns/op
TypePollution.instanceOfInterfaceSwitchLinearNoSCC          avgt   12  52942.830 ±  851.030  ns/op
TypePollution.instanceOfInterfaceSwitchLinearSCC            avgt   12  49901.169 ±  663.837  ns/op
TypePollution.instanceOfInterfaceSwitchTableNoSCC           avgt   12  41443.364 ± 1045.602  ns/op
TypePollution.instanceOfInterfaceSwitchTableSCC             avgt   12  38050.773 ± 1301.636  ns/op
TypePollution.parallelInstanceOfInterfaceSwitchLinearNoSCC  avgt   12   1634.976 ±    5.730  ms/op
TypePollution.parallelInstanceOfInterfaceSwitchLinearSCC    avgt   12   1613.755 ±  199.612  ms/op
TypePollution.parallelInstanceOfInterfaceSwitchTableNoSCC   avgt   12   1979.354 ±  207.235  ms/op
TypePollution.parallelInstanceOfInterfaceSwitchTableSCC     avgt   12   1525.198 ±  167.922  ms/op

Progress

  • Change must be properly reviewed (1 review required, with at least 1 Reviewer)
  • Change must not contain extraneous whitespace
  • Commit message must refer to an issue

Issue

  • JDK-8332587: RISC-V: secondary_super_cache does not scale well (Enhancement - P4)

Reviewers

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk.git pull/19320/head:pull/19320
$ git checkout pull/19320

Update a local copy of the PR:
$ git checkout pull/19320
$ git pull https://git.openjdk.org/jdk.git pull/19320/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 19320

View PR using the GUI difftool:
$ git pr show -t 19320

Using diff file

Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/19320.diff

Webrev

Link to Webrev Comment

Sorry, something went wrong.

@bridgekeeper
Copy link

bridgekeeper bot commented May 21, 2024

👋 Welcome back gcao! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

@openjdk
Copy link

openjdk bot commented May 21, 2024

@zifeihan This change now passes all automated pre-integration checks.

ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details.

After integration, the commit message for the final commit will be:

8332587: RISC-V: secondary_super_cache does not scale well

Reviewed-by: mli, fyang

You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed.

At the time when this comment was updated there had been 18 new commits pushed to the master branch:

  • 5cad0b4: 8322708: Global HTML attributes are not allowed
  • 6420846: 8334396: RISC-V: verify perf of ReverseBytesI/L
  • c6f3bf4: 8334026: Provide a diagnostic PrintMemoryMapAtExit switch on Linux
  • cabd104: 8334164: The fix for JDK-8322811 should use _filename.is_set() rather than strcmp()
  • d7dad50: 8334544: C2: wrong control assigned in PhaseIdealLoop::clone_assertion_predicate_for_unswitched_loops()
  • ff30240: 8334239: Introduce macro for ubsan method/function exclusions
  • 2d4185f: 8332717: ZGC: Division by zero in heuristics
  • fad6644: 8333754: Add a Test against ECDSA and ECDH NIST Test vector
  • b211929: 8334570: Problem list gc/TestAlwaysPreTouchBehavior.java
  • 4e58d8c: 8309821: Link to hidden classes section in Class specification for Class::isHidden
  • ... and 8 more: https://git.openjdk.org/jdk/compare/48621ae193ef70b2fae4dcb7ddc524f349beb131...master

As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details.

As you do not have Committer status in this project an existing Committer must agree to sponsor your change. Possible candidates are the reviewers of this PR (@Hamlin-Li, @RealFYang) but any other Committer may sponsor as well.

➡️ To flag this PR as ready for integration with the above commit message, type /integrate in a new comment. (Afterwards, your sponsor types /sponsor in a new comment to perform the integration).

@openjdk
Copy link

openjdk bot commented May 21, 2024

@zifeihan The following label will be automatically applied to this pull request:

  • hotspot

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing list. If you would like to change these labels, use the /label pull request command.

@openjdk openjdk bot added the hotspot hotspot-dev@openjdk.org label May 21, 2024
zifeihan added 3 commits May 28, 2024 21:24
@zifeihan zifeihan marked this pull request as ready for review June 3, 2024 06:43
@openjdk openjdk bot added the rfr Pull request is ready for review label Jun 3, 2024
@mlbridge
Copy link

mlbridge bot commented Jun 3, 2024

@Hamlin-Li
Copy link

Hamlin-Li commented Jun 3, 2024

Hey, is there performance comparison data when zbb is not supported? Just want to know if it can also get performance gain in that situation, then this optimization is a more generic one.

The reason for ask this question is that seems JDK-8180450 is an optimization for cache line, it should or could bring benefit even if zbb is not supported?

@zifeihan
Copy link
Member Author

zifeihan commented Jun 3, 2024

Hi, the following is an implementation using scalar assembly when zbb is not available.

diff --git a/src/hotspot/cpu/riscv/macroAssembler_riscv.cpp b/src/hotspot/cpu/riscv/macroAssembler_riscv.cpp
index ce6696c18a8..93e1045d2d4 100644
--- a/src/hotspot/cpu/riscv/macroAssembler_riscv.cpp
+++ b/src/hotspot/cpu/riscv/macroAssembler_riscv.cpp
@@ -3481,6 +3481,36 @@ void MacroAssembler::check_klass_subtype_slow_path(Register sub_klass,
   bind(L_fallthrough);
 }
 
+void MacroAssembler::population_count(Register dst, Register src,
+                                      Register tmp1, Register tmp2) {
+
+  if (UsePopCountInstruction) {
+    cpop(dst, src);
+  } else {
+    assert_different_registers(src, tmp1, tmp2);
+    assert_different_registers(dst, tmp1, tmp2);
+    Label loop, done;
+
+    mv(tmp1, src);
+    // dst = 0;
+    // while(tmp1 != 0) {
+    //   dst++;
+    //   tmp1 &= (tmp1 - 1);
+    // }
+    mv(dst, zr);
+    beqz(tmp1, done);
+    {
+      bind(loop);
+      addi(dst, dst, 1);
+      mv(tmp2, tmp1);
+      addi(tmp2, tmp2, -1);
+      andr(tmp1, tmp1, tmp2);
+      bnez(tmp1, loop);
+    }
+    bind(done);
+  }
+}
+
 // Ensure that the inline code and the stub are using the same registers.
 #define LOOKUP_SECONDARY_SUPERS_TABLE_REGISTERS                     \
 do {                                                                \
@@ -3533,7 +3563,7 @@ bool MacroAssembler::lookup_secondary_supers_table(Register r_sub_klass,
   // Get the first array index that can contain super_klass into r_array_index.
   if (bit != 0) {
     slli(r_array_index, r_bitmap, (Klass::SECONDARY_SUPERS_TABLE_MASK - bit));
-    cpop(r_array_index, r_array_index);
+    population_count(r_array_index, r_array_index, t0, tmp1);
   } else {
     mv(r_array_index, (u1)1);
   }
diff --git a/src/hotspot/cpu/riscv/macroAssembler_riscv.hpp b/src/hotspot/cpu/riscv/macroAssembler_riscv.hpp
index 2575d5ea2ff..3e4930d5605 100644
--- a/src/hotspot/cpu/riscv/macroAssembler_riscv.hpp
+++ b/src/hotspot/cpu/riscv/macroAssembler_riscv.hpp
@@ -322,6 +322,8 @@ class MacroAssembler: public Assembler {
                                      Label* L_success,
                                      Label* L_failure);
 
+  void population_count(Register dst, Register src, Register tmp1, Register tmp2);
+
   // As above, but with a constant super_klass.
   // The result is in Register result, not the condition codes.
   bool lookup_secondary_supers_table(Register r_sub_klass,

JMH tested on HiFive Unmatched(has not Zbb):

Original
Benchmark                             Mode  Cnt    Score    Error  Units                                                                                                                           [14/1836]
SecondarySupersLookup.testNegative00  avgt   15   28.625 ±  7.158  ns/op
SecondarySupersLookup.testNegative01  avgt   15   33.860 ±  6.312  ns/op
SecondarySupersLookup.testNegative02  avgt   15   30.887 ±  4.773  ns/op
SecondarySupersLookup.testNegative03  avgt   15   39.477 ±  6.945  ns/op
SecondarySupersLookup.testNegative04  avgt   15   34.976 ±  3.080  ns/op
SecondarySupersLookup.testNegative05  avgt   15   42.025 ±  8.324  ns/op
SecondarySupersLookup.testNegative06  avgt   15   49.359 ±  8.480  ns/op
SecondarySupersLookup.testNegative07  avgt   15   49.996 ± 11.841  ns/op
SecondarySupersLookup.testNegative08  avgt   15   58.468 ±  8.485  ns/op
SecondarySupersLookup.testNegative09  avgt   15   57.198 ± 10.803  ns/op
SecondarySupersLookup.testNegative10  avgt   15   63.531 ±  5.595  ns/op
SecondarySupersLookup.testNegative16  avgt   15   73.716 ±  9.231  ns/op
SecondarySupersLookup.testNegative20  avgt   15   88.823 ± 16.179  ns/op
SecondarySupersLookup.testNegative30  avgt   15  118.832 ± 18.866  ns/op
SecondarySupersLookup.testNegative32  avgt   15  126.538 ± 23.139  ns/op
SecondarySupersLookup.testNegative40  avgt   15  149.722 ± 31.675  ns/op
SecondarySupersLookup.testNegative50  avgt   15  186.958 ± 39.203  ns/op
SecondarySupersLookup.testNegative55  avgt   15  193.787 ± 29.629  ns/op
SecondarySupersLookup.testNegative56  avgt   15  204.451 ± 34.491  ns/op
SecondarySupersLookup.testNegative57  avgt   15  204.104 ± 27.130  ns/op
SecondarySupersLookup.testNegative58  avgt   15  207.017 ± 31.201  ns/op
SecondarySupersLookup.testNegative59  avgt   15  219.159 ± 33.664  ns/op
SecondarySupersLookup.testNegative60  avgt   15  208.726 ± 27.195  ns/op
SecondarySupersLookup.testNegative61  avgt   15  214.557 ± 30.992  ns/op
SecondarySupersLookup.testNegative62  avgt   15  212.104 ± 30.843  ns/op
SecondarySupersLookup.testNegative63  avgt   15  227.805 ± 39.706  ns/op
SecondarySupersLookup.testNegative64  avgt   15  229.951 ± 42.039  ns/op
SecondarySupersLookup.testPositive01  avgt   15   18.498 ±  4.687  ns/op
SecondarySupersLookup.testPositive02  avgt   15   20.130 ±  4.955  ns/op
SecondarySupersLookup.testPositive03  avgt   15   18.576 ±  4.383  ns/op
SecondarySupersLookup.testPositive04  avgt   15   19.202 ±  4.554  ns/op
SecondarySupersLookup.testPositive05  avgt   15   18.923 ±  4.730  ns/op
SecondarySupersLookup.testPositive06  avgt   15   20.494 ±  6.282  ns/op
SecondarySupersLookup.testPositive07  avgt   15   17.679 ±  2.386  ns/op
SecondarySupersLookup.testPositive08  avgt   15   19.396 ±  7.047  ns/op
SecondarySupersLookup.testPositive09  avgt   15   18.163 ±  2.950  ns/op
SecondarySupersLookup.testPositive10  avgt   15   21.135 ±  5.552  ns/op
SecondarySupersLookup.testPositive16  avgt   15   20.117 ±  4.606  ns/op
SecondarySupersLookup.testPositive20  avgt   15   21.209 ±  5.800  ns/op
SecondarySupersLookup.testPositive30  avgt   15   21.388 ±  6.792  ns/op
SecondarySupersLookup.testPositive32  avgt   15   19.720 ±  4.559  ns/op
SecondarySupersLookup.testPositive40  avgt   15   17.354 ±  2.707  ns/op
SecondarySupersLookup.testPositive50  avgt   15   20.825 ±  6.062  ns/op
SecondarySupersLookup.testPositive60  avgt   15   19.910 ±  5.621  ns/op
SecondarySupersLookup.testPositive63  avgt   15   18.989 ±  3.156  ns/op
SecondarySupersLookup.testPositive64  avgt   15   20.298 ±  5.357  ns/op
Finished running test 'micro:vm.lang.SecondarySupersLookup'

With patch:
Benchmark                             Mode  Cnt    Score    Error  Units
SecondarySupersLookup.testNegative00  avgt   15   21.124 ±  1.601  ns/op
SecondarySupersLookup.testNegative01  avgt   15   25.788 ±  3.713  ns/op
SecondarySupersLookup.testNegative02  avgt   15   25.501 ±  5.616  ns/op
SecondarySupersLookup.testNegative03  avgt   15   22.800 ±  6.454  ns/op
SecondarySupersLookup.testNegative04  avgt   15   21.790 ±  2.629  ns/op
SecondarySupersLookup.testNegative05  avgt   15   25.485 ±  6.082  ns/op
SecondarySupersLookup.testNegative06  avgt   15   24.801 ±  5.387  ns/op
SecondarySupersLookup.testNegative07  avgt   15   24.425 ±  4.686  ns/op
SecondarySupersLookup.testNegative08  avgt   15   24.486 ±  4.044  ns/op
SecondarySupersLookup.testNegative09  avgt   15   23.810 ±  3.838  ns/op
SecondarySupersLookup.testNegative10  avgt   15   25.085 ±  3.756  ns/op
SecondarySupersLookup.testNegative16  avgt   15   22.018 ±  2.924  ns/op
SecondarySupersLookup.testNegative20  avgt   15   23.161 ±  3.271  ns/op
SecondarySupersLookup.testNegative30  avgt   15   23.705 ±  4.669  ns/op
SecondarySupersLookup.testNegative32  avgt   15   25.048 ±  7.125  ns/op
SecondarySupersLookup.testNegative40  avgt   15   24.661 ±  3.541  ns/op
SecondarySupersLookup.testNegative50  avgt   15   22.918 ±  2.879  ns/op
SecondarySupersLookup.testNegative55  avgt   15  250.982 ± 10.224  ns/op
SecondarySupersLookup.testNegative56  avgt   15  251.020 ±  8.432  ns/op
SecondarySupersLookup.testNegative57  avgt   15  255.998 ±  9.054  ns/op
SecondarySupersLookup.testNegative58  avgt   15  257.347 ± 11.340  ns/op
SecondarySupersLookup.testNegative59  avgt   15  277.727 ± 10.007  ns/op
SecondarySupersLookup.testNegative60  avgt   15  304.818 ± 12.092  ns/op
SecondarySupersLookup.testNegative61  avgt   15  308.956 ± 13.060  ns/op
SecondarySupersLookup.testNegative62  avgt   15  309.804 ± 14.715  ns/op
SecondarySupersLookup.testNegative63  avgt   15  416.021 ±  8.051  ns/op
SecondarySupersLookup.testNegative64  avgt   15  425.190 ± 10.966  ns/op
SecondarySupersLookup.testPositive01  avgt   15   18.369 ±  4.490  ns/op
SecondarySupersLookup.testPositive02  avgt   15   21.595 ±  6.626  ns/op
SecondarySupersLookup.testPositive03  avgt   15   19.327 ±  4.973  ns/op
SecondarySupersLookup.testPositive04  avgt   15   19.636 ±  4.759  ns/op
SecondarySupersLookup.testPositive05  avgt   15   17.055 ±  2.329  ns/op
SecondarySupersLookup.testPositive06  avgt   15   18.712 ±  3.333  ns/op
SecondarySupersLookup.testPositive07  avgt   15   20.508 ±  4.213  ns/op
SecondarySupersLookup.testPositive08  avgt   15   19.208 ±  3.761  ns/op
SecondarySupersLookup.testPositive09  avgt   15   18.061 ±  3.619  ns/op
SecondarySupersLookup.testPositive10  avgt   15   17.519 ±  3.322  ns/op
SecondarySupersLookup.testPositive16  avgt   15   19.099 ±  4.358  ns/op
SecondarySupersLookup.testPositive20  avgt   15   20.731 ±  5.230  ns/op
SecondarySupersLookup.testPositive30  avgt   15   18.048 ±  2.994  ns/op
SecondarySupersLookup.testPositive32  avgt   15   18.817 ±  3.856  ns/op
SecondarySupersLookup.testPositive40  avgt   15   17.165 ±  2.536  ns/op
SecondarySupersLookup.testPositive50  avgt   15   20.060 ±  4.473  ns/op
SecondarySupersLookup.testPositive60  avgt   15   17.296 ±  2.411  ns/op
SecondarySupersLookup.testPositive63  avgt   15   19.313 ±  4.133  ns/op
SecondarySupersLookup.testPositive64  avgt   15   21.258 ±  4.909  ns/op
Finished running test 'micro:vm.lang.SecondarySupersLookup'

JMH tested on LicheePi 4A(has not Zbb):

Original
Benchmark                             Mode  Cnt    Score   Error  Units
SecondarySupersLookup.testNegative00  avgt   15   26.297 ± 0.941  ns/op
SecondarySupersLookup.testNegative01  avgt   15   26.345 ± 1.404  ns/op
SecondarySupersLookup.testNegative02  avgt   15   27.191 ± 1.421  ns/op
SecondarySupersLookup.testNegative03  avgt   15   28.546 ± 1.475  ns/op
SecondarySupersLookup.testNegative04  avgt   15   28.785 ± 1.329  ns/op
SecondarySupersLookup.testNegative05  avgt   15   30.524 ± 1.816  ns/op
SecondarySupersLookup.testNegative06  avgt   15   30.284 ± 0.984  ns/op
SecondarySupersLookup.testNegative07  avgt   15   31.594 ± 1.093  ns/op
SecondarySupersLookup.testNegative08  avgt   15   32.778 ± 1.269  ns/op
SecondarySupersLookup.testNegative09  avgt   15   33.549 ± 0.913  ns/op
SecondarySupersLookup.testNegative10  avgt   15   35.468 ± 1.643  ns/op
SecondarySupersLookup.testNegative16  avgt   15   68.065 ± 1.890  ns/op
SecondarySupersLookup.testNegative20  avgt   15   58.098 ± 2.704  ns/op
SecondarySupersLookup.testNegative30  avgt   15   70.944 ± 2.929  ns/op
SecondarySupersLookup.testNegative32  avgt   15   76.134 ± 3.719  ns/op
SecondarySupersLookup.testNegative40  avgt   15   89.092 ± 4.396  ns/op
SecondarySupersLookup.testNegative50  avgt   15  105.226 ± 4.877  ns/op
SecondarySupersLookup.testNegative55  avgt   15  115.744 ± 6.281  ns/op
SecondarySupersLookup.testNegative56  avgt   15  119.860 ± 5.618  ns/op
SecondarySupersLookup.testNegative57  avgt   15  117.818 ± 5.497  ns/op
SecondarySupersLookup.testNegative58  avgt   15  121.410 ± 6.781  ns/op
SecondarySupersLookup.testNegative59  avgt   15  124.500 ± 7.016  ns/op
SecondarySupersLookup.testNegative60  avgt   15  125.322 ± 5.241  ns/op
SecondarySupersLookup.testNegative61  avgt   15  129.009 ± 4.680  ns/op
SecondarySupersLookup.testNegative62  avgt   15  126.704 ± 5.917  ns/op
SecondarySupersLookup.testNegative63  avgt   15  131.529 ± 5.247  ns/op
SecondarySupersLookup.testNegative64  avgt   15  134.511 ± 4.925  ns/op
SecondarySupersLookup.testPositive01  avgt   15   22.386 ± 0.680  ns/op
SecondarySupersLookup.testPositive02  avgt   15   21.655 ± 0.492  ns/op
SecondarySupersLookup.testPositive03  avgt   15   22.123 ± 0.671  ns/op
SecondarySupersLookup.testPositive04  avgt   15   22.050 ± 0.610  ns/op
SecondarySupersLookup.testPositive05  avgt   15   22.048 ± 0.614  ns/op
SecondarySupersLookup.testPositive06  avgt   15   21.850 ± 0.597  ns/op
SecondarySupersLookup.testPositive07  avgt   15   21.844 ± 0.619  ns/op
SecondarySupersLookup.testPositive08  avgt   15   21.832 ± 0.601  ns/op
SecondarySupersLookup.testPositive09  avgt   15   21.743 ± 0.527  ns/op
SecondarySupersLookup.testPositive10  avgt   15   22.037 ± 0.609  ns/op
SecondarySupersLookup.testPositive16  avgt   15   22.300 ± 0.502  ns/op
SecondarySupersLookup.testPositive20  avgt   15   21.607 ± 0.498  ns/op
SecondarySupersLookup.testPositive30  avgt   15   21.836 ± 0.602  ns/op
SecondarySupersLookup.testPositive32  avgt   15   21.629 ± 0.484  ns/op
SecondarySupersLookup.testPositive40  avgt   15   21.850 ± 0.621  ns/op
SecondarySupersLookup.testPositive50  avgt   15   22.478 ± 0.130  ns/op
SecondarySupersLookup.testPositive60  avgt   15   22.058 ± 0.617  ns/op
SecondarySupersLookup.testPositive63  avgt   15   21.828 ± 0.596  ns/op
SecondarySupersLookup.testPositive64  avgt   15   22.077 ± 0.603  ns/op
Finished running test 'micro:vm.lang.SecondarySupersLookup'

With patch:
Benchmark                             Mode  Cnt    Score    Error  Units
SecondarySupersLookup.testNegative00  avgt   15   24.305 ±  1.269  ns/op
SecondarySupersLookup.testNegative01  avgt   15   23.552 ±  1.214  ns/op
SecondarySupersLookup.testNegative02  avgt   15   22.720 ±  0.771  ns/op
SecondarySupersLookup.testNegative03  avgt   15   22.713 ±  0.834  ns/op
SecondarySupersLookup.testNegative04  avgt   15   22.924 ±  0.684  ns/op
SecondarySupersLookup.testNegative05  avgt   15   22.614 ±  0.604  ns/op
SecondarySupersLookup.testNegative06  avgt   15   22.387 ±  0.641  ns/op
SecondarySupersLookup.testNegative07  avgt   15   22.201 ±  0.502  ns/op
SecondarySupersLookup.testNegative08  avgt   15   22.391 ±  0.606  ns/op
SecondarySupersLookup.testNegative09  avgt   15   22.462 ±  0.617  ns/op
SecondarySupersLookup.testNegative10  avgt   15   22.525 ±  0.202  ns/op
SecondarySupersLookup.testNegative16  avgt   15   22.439 ±  0.616  ns/op
SecondarySupersLookup.testNegative20  avgt   15   22.963 ±  0.298  ns/op
SecondarySupersLookup.testNegative30  avgt   15   22.642 ±  0.621  ns/op
SecondarySupersLookup.testNegative32  avgt   15   22.306 ±  0.670  ns/op
SecondarySupersLookup.testNegative40  avgt   15   22.663 ±  0.644  ns/op
SecondarySupersLookup.testNegative50  avgt   15   22.001 ±  0.238  ns/op
SecondarySupersLookup.testNegative55  avgt   15  128.558 ±  5.735  ns/op
SecondarySupersLookup.testNegative56  avgt   15  128.633 ±  4.893  ns/op
SecondarySupersLookup.testNegative57  avgt   15  129.143 ±  5.955  ns/op
SecondarySupersLookup.testNegative58  avgt   15  132.434 ±  6.478  ns/op
SecondarySupersLookup.testNegative59  avgt   15  130.243 ±  5.901  ns/op
SecondarySupersLookup.testNegative60  avgt   15  163.505 ±  8.278  ns/op
SecondarySupersLookup.testNegative61  avgt   15  163.934 ±  9.008  ns/op
SecondarySupersLookup.testNegative62  avgt   15  162.247 ±  6.238  ns/op
SecondarySupersLookup.testNegative63  avgt   15  213.133 ±  9.582  ns/op
SecondarySupersLookup.testNegative64  avgt   15  214.724 ± 11.562  ns/op
SecondarySupersLookup.testPositive01  avgt   15   21.622 ±  0.482  ns/op
SecondarySupersLookup.testPositive02  avgt   15   21.842 ±  0.602  ns/op
SecondarySupersLookup.testPositive03  avgt   15   22.274 ±  0.516  ns/op
SecondarySupersLookup.testPositive04  avgt   15   21.833 ±  0.632  ns/op
SecondarySupersLookup.testPositive05  avgt   15   21.842 ±  0.603  ns/op
SecondarySupersLookup.testPositive06  avgt   15   21.630 ±  0.527  ns/op
SecondarySupersLookup.testPositive07  avgt   15   22.054 ±  0.581  ns/op
SecondarySupersLookup.testPositive08  avgt   15   21.872 ±  0.613  ns/op
SecondarySupersLookup.testPositive09  avgt   15   21.839 ±  0.604  ns/op
SecondarySupersLookup.testPositive10  avgt   15   21.619 ±  0.494  ns/op
SecondarySupersLookup.testPositive16  avgt   15   21.624 ±  0.509  ns/op
SecondarySupersLookup.testPositive20  avgt   15   21.828 ±  0.595  ns/op
SecondarySupersLookup.testPositive30  avgt   15   21.861 ±  0.617  ns/op
SecondarySupersLookup.testPositive32  avgt   15   22.141 ±  0.609  ns/op
SecondarySupersLookup.testPositive40  avgt   15   21.632 ±  0.485  ns/op
SecondarySupersLookup.testPositive50  avgt   15   21.856 ±  0.597  ns/op
SecondarySupersLookup.testPositive60  avgt   15   22.068 ±  0.610  ns/op
SecondarySupersLookup.testPositive63  avgt   15   21.647 ±  0.496  ns/op
SecondarySupersLookup.testPositive64  avgt   15   21.847 ±  0.595  ns/op
Finished running test 'micro:vm.lang.SecondarySupersLookup'

With the above test data, we see that there is a performance decrease from testNegative55 to testNegative64 when zbb is not available. This is because a loop is needed to count the number of 1 when Zbb is not available. Therefore, the current patch is only optimized when Zbb is available.

@zifeihan
Copy link
Member Author

zifeihan commented Jun 3, 2024

Hey, is there performance comparison data when zbb is not supported? Just want to know if it can also get performance gain in that situation, then this optimization is a more generic one.

The reason for ask this question is that seems JDK-8180450 is an optimization for cache line, it should or could bring benefit even if zbb is not supported?

Hi, Thanks for the review. Sorry for being late. I've implemented the scalar register version before, but tested it and had data performance decrease, so the scalar implementation is not included in this Patch.

Copy link

@Hamlin-Li Hamlin-Li left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for explanation. Seems this patch will also use some instructions in zba/zbs extension. Will they also impact the final performance?

Also has some minor comments first.

@zifeihan
Copy link
Member Author

zifeihan commented Jun 3, 2024

Thanks for explanation. Seems this patch will also use some instructions in zba/zbs extension. Will they also impact the final performance?

Also has some minor comments first.

I'm sorry for the Banana Pi BPI-F3 board data I forgot to specify that Banana Pi BPI-F3 board doesn't enable zba, zbb, zbs by default, here zbb is enabled manually by me. So the Banana Pi BPI-F3 board JMH data here is the performance JMH data without zba,zbs enabled.

Copy link

@Hamlin-Li Hamlin-Li left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some more comments.

@openjdk openjdk bot removed the rfr Pull request is ready for review label Jun 4, 2024
@openjdk openjdk bot added the rfr Pull request is ready for review label Jun 4, 2024
@zifeihan
Copy link
Member Author

zifeihan commented Jun 4, 2024

@Hamlin-Li : Hi Hamlin, I've fixed all of the above comments. I've enabled Zba, Zbb, Zbs and tested JMH before and the performance has improved a bit compared to just enabling Zbb, I'll be re-testing the JMH with Zba, Zbb,Zbs enabled today and I'll update here when I'm done with it.

@zifeihan
Copy link
Member Author

zifeihan commented Jun 4, 2024

JMH tested on Banana Pi BPI-F3 board (has Zba,Zbb,Zbs) and Enable UseZba, UseZbb, UseZbs
Original(not with patch):

Benchmark                             Mode  Cnt    Score   Error  Units
SecondarySupersLookup.testNegative00  avgt   15   16.405 ± 0.221  ns/op                                                                SecondarySupersLookup.testNegative01  avgt   15   18.276 ± 0.251  ns/op
SecondarySupersLookup.testNegative02  avgt   15   21.519 ± 0.314  ns/op
SecondarySupersLookup.testNegative03  avgt   15   24.434 ± 0.370  ns/op                                                                SecondarySupersLookup.testNegative04  avgt   15   27.246 ± 0.441  ns/op                                                                SecondarySupersLookup.testNegative05  avgt   15   29.652 ± 0.397  ns/op
SecondarySupersLookup.testNegative06  avgt   15   32.049 ± 0.533  ns/op
SecondarySupersLookup.testNegative07  avgt   15   33.568 ± 0.501  ns/op
SecondarySupersLookup.testNegative08  avgt   15   35.593 ± 0.606  ns/op
SecondarySupersLookup.testNegative09  avgt   15   37.220 ± 0.407  ns/op
SecondarySupersLookup.testNegative10  avgt   15   39.522 ± 0.511  ns/op
SecondarySupersLookup.testNegative16  avgt   15   51.146 ± 0.667  ns/op
SecondarySupersLookup.testNegative20  avgt   15   58.404 ± 0.654  ns/op
SecondarySupersLookup.testNegative30  avgt   15   77.190 ± 0.796  ns/op
SecondarySupersLookup.testNegative32  avgt   15   81.144 ± 0.761  ns/op
SecondarySupersLookup.testNegative40  avgt   15   96.018 ± 0.733  ns/op
SecondarySupersLookup.testNegative50  avgt   15  115.170 ± 0.876  ns/op
SecondarySupersLookup.testNegative55  avgt   15  125.827 ± 4.322  ns/op
SecondarySupersLookup.testNegative56  avgt   15  126.151 ± 0.979  ns/op
SecondarySupersLookup.testNegative57  avgt   15  129.638 ± 4.326  ns/op
SecondarySupersLookup.testNegative58  avgt   15  131.960 ± 4.584  ns/op
SecondarySupersLookup.testNegative59  avgt   15  131.403 ± 1.051  ns/op
SecondarySupersLookup.testNegative60  avgt   15  133.660 ± 0.888  ns/op
SecondarySupersLookup.testNegative61  avgt   15  137.293 ± 4.852  ns/op
SecondarySupersLookup.testNegative62  avgt   15  137.476 ± 1.081  ns/op
SecondarySupersLookup.testNegative63  avgt   15  139.028 ± 1.026  ns/op
SecondarySupersLookup.testNegative64  avgt   15  143.545 ± 5.011  ns/op
SecondarySupersLookup.testPositive01  avgt   15   10.734 ± 0.156  ns/op
SecondarySupersLookup.testPositive02  avgt   15   10.727 ± 0.145  ns/op
SecondarySupersLookup.testPositive03  avgt   15   10.729 ± 0.149  ns/op
SecondarySupersLookup.testPositive04  avgt   15   10.724 ± 0.140  ns/op
SecondarySupersLookup.testPositive05  avgt   15   10.730 ± 0.152  ns/op
SecondarySupersLookup.testPositive06  avgt   15   10.726 ± 0.143  ns/op
SecondarySupersLookup.testPositive07  avgt   15   10.731 ± 0.151  ns/op
SecondarySupersLookup.testPositive08  avgt   15   10.735 ± 0.158  ns/op
SecondarySupersLookup.testPositive09  avgt   15   10.731 ± 0.152  ns/op
SecondarySupersLookup.testPositive10  avgt   15   10.728 ± 0.147  ns/op
SecondarySupersLookup.testPositive16  avgt   15   10.732 ± 0.153  ns/op
SecondarySupersLookup.testPositive20  avgt   15   10.727 ± 0.145  ns/op
SecondarySupersLookup.testPositive30  avgt   15   10.749 ± 0.170  ns/op
SecondarySupersLookup.testPositive32  avgt   15   10.732 ± 0.153  ns/op
SecondarySupersLookup.testPositive40  avgt   15   10.732 ± 0.153  ns/op
SecondarySupersLookup.testPositive50  avgt   15   10.732 ± 0.153  ns/op
SecondarySupersLookup.testPositive60  avgt   15   10.733 ± 0.155  ns/op
SecondarySupersLookup.testPositive63  avgt   15   10.733 ± 0.153  ns/op
SecondarySupersLookup.testPositive64  avgt   15   10.735 ± 0.158  ns/op
Finished running test 'micro:vm.lang.SecondarySupersLookup'

With patch:

Benchmark                             Mode  Cnt    Score   Error  Units
SecondarySupersLookup.testNegative00  avgt   15   13.270 ± 0.201  ns/op
SecondarySupersLookup.testNegative01  avgt   15   13.261 ± 0.194  ns/op
SecondarySupersLookup.testNegative02  avgt   15   13.265 ± 0.198  ns/op
SecondarySupersLookup.testNegative03  avgt   15   13.274 ± 0.222  ns/op
SecondarySupersLookup.testNegative04  avgt   15   13.262 ± 0.197  ns/op
SecondarySupersLookup.testNegative05  avgt   15   13.264 ± 0.200  ns/op
SecondarySupersLookup.testNegative06  avgt   15   13.259 ± 0.193  ns/op
SecondarySupersLookup.testNegative07  avgt   15   13.261 ± 0.196  ns/op
SecondarySupersLookup.testNegative08  avgt   15   13.255 ± 0.186  ns/op
SecondarySupersLookup.testNegative09  avgt   15   13.267 ± 0.200  ns/op
SecondarySupersLookup.testNegative10  avgt   15   13.265 ± 0.200  ns/op
SecondarySupersLookup.testNegative16  avgt   15   13.277 ± 0.220  ns/op
SecondarySupersLookup.testNegative20  avgt   15   13.270 ± 0.209  ns/op
SecondarySupersLookup.testNegative30  avgt   15   13.279 ± 0.223  ns/op
SecondarySupersLookup.testNegative32  avgt   15   13.284 ± 0.232  ns/op
SecondarySupersLookup.testNegative40  avgt   15   13.288 ± 0.237  ns/op
SecondarySupersLookup.testNegative50  avgt   15   13.290 ± 0.241  ns/op
SecondarySupersLookup.testNegative55  avgt   15   51.179 ± 0.761  ns/op
SecondarySupersLookup.testNegative56  avgt   15   51.175 ± 0.763  ns/op
SecondarySupersLookup.testNegative57  avgt   15   51.550 ± 1.070  ns/op
SecondarySupersLookup.testNegative58  avgt   15   51.182 ± 0.737  ns/op
SecondarySupersLookup.testNegative59  avgt   15   51.169 ± 0.773  ns/op
SecondarySupersLookup.testNegative60  avgt   15   74.605 ± 1.445  ns/op
SecondarySupersLookup.testNegative61  avgt   15   74.434 ± 1.006  ns/op
SecondarySupersLookup.testNegative62  avgt   15   74.587 ± 1.078  ns/op
SecondarySupersLookup.testNegative63  avgt   15  155.881 ± 0.856  ns/op
SecondarySupersLookup.testNegative64  avgt   15  158.028 ± 5.778  ns/op
SecondarySupersLookup.testPositive01  avgt   15   10.744 ± 0.176  ns/op
SecondarySupersLookup.testPositive02  avgt   15   10.731 ± 0.151  ns/op
SecondarySupersLookup.testPositive03  avgt   15   10.727 ± 0.146  ns/op
SecondarySupersLookup.testPositive04  avgt   15   10.732 ± 0.153  ns/op
SecondarySupersLookup.testPositive05  avgt   15   10.728 ± 0.147  ns/op
SecondarySupersLookup.testPositive06  avgt   15   10.731 ± 0.151  ns/op
SecondarySupersLookup.testPositive07  avgt   15   10.725 ± 0.143  ns/op
SecondarySupersLookup.testPositive08  avgt   15   10.730 ± 0.148  ns/op
SecondarySupersLookup.testPositive09  avgt   15   10.734 ± 0.156  ns/op
SecondarySupersLookup.testPositive10  avgt   15   10.726 ± 0.144  ns/op
SecondarySupersLookup.testPositive16  avgt   15   10.727 ± 0.146  ns/op
SecondarySupersLookup.testPositive20  avgt   15   10.731 ± 0.152  ns/op
SecondarySupersLookup.testPositive30  avgt   15   10.729 ± 0.150  ns/op
SecondarySupersLookup.testPositive32  avgt   15   10.728 ± 0.147  ns/op
SecondarySupersLookup.testPositive40  avgt   15   10.745 ± 0.181  ns/op
SecondarySupersLookup.testPositive50  avgt   15   10.732 ± 0.153  ns/op
SecondarySupersLookup.testPositive60  avgt   15   10.729 ± 0.148  ns/op
SecondarySupersLookup.testPositive63  avgt   15   10.734 ± 0.158  ns/op
SecondarySupersLookup.testPositive64  avgt   15   10.732 ± 0.154  ns/op
Finished running test 'micro:vm.lang.SecondarySupersLookup'

@Hamlin-Li
Copy link

Hamlin-Li commented Jun 4, 2024

There are a bit regression in cases of testNegative63/64, although these might be rare cases or not very common cases, but it's worth to have a try to improve it if possible.
I guess it's related to the implementation for the cases when bitmap is full. When it's full, before go to repne_scan, there're some instructions to execute. I wonder if it will help to have another "bitmap full test" just after "bitmap false test" (which is test_bit(t0, r_bitmap, bit);). But I'm not sure if it's feasible, maybe worth a try.

@zifeihan
Copy link
Member Author

zifeihan commented Jun 7, 2024

There are a bit regression in cases of testNegative63/64, although these might be rare cases or not very common cases, but it's worth to have a try to improve it if possible. I guess it's related to the implementation for the cases when bitmap is full. When it's full, before go to repne_scan, there're some instructions to execute. I wonder if it will help to have another "bitmap full test" just after "bitmap false test" (which is test_bit(t0, r_bitmap, bit);). But I'm not sure if it's feasible, maybe worth a try.

Hi, Sorry for being late. Thanks for the suggestion, I gave it a try.

diff --git a/src/hotspot/cpu/riscv/macroAssembler_riscv.cpp b/src/hotspot/cpu/riscv/macroAssembler_riscv.cpp
index 61e8016db4a..af1649c061f 100644
--- a/src/hotspot/cpu/riscv/macroAssembler_riscv.cpp
+++ b/src/hotspot/cpu/riscv/macroAssembler_riscv.cpp
@@ -3664,6 +3664,32 @@ bool MacroAssembler::lookup_secondary_supers_table(Register r_sub_klass,
   test_bit(t0, r_bitmap, bit);
   beqz(t0, L_fallthrough);
 
+  // We will consult the secondary-super array.
+  ld(r_array_base, Address(r_sub_klass, in_bytes(Klass::secondary_supers_offset())));
+
+  mv(tmp3, r_bitmap);
+  if (bit != 0) {
+    ror_imm(tmp3, tmp3, bit);
+  }
+
+  Label skip;
+  addi(t0, tmp3, (u1)1);
+  bnez(t0, skip);
+
+    // Load the array length.
+  lwu(r_array_length, Address(r_array_base, Array<Klass*>::length_offset_in_bytes()));
+  // And adjust the array base to point to the data.
+  // NB! Effectively increments current slot index by 1.
+  assert(Array<Klass*>::base_offset_in_bytes() == wordSize, "");
+  addi(r_array_base, r_array_base, Array<Klass*>::base_offset_in_bytes());
+  
+  repne_scan(r_array_base, r_super_klass, r_array_length, t0);
+  bne(r_super_klass, t0, L_fallthrough);
+  mv(result, zr);
+  beqz(result, L_fallthrough);
+  
+  bind(skip);
+
   // Get the first array index that can contain super_klass into r_array_index.
   if (bit != 0) {
     slli(r_array_index, r_bitmap, (Klass::SECONDARY_SUPERS_TABLE_MASK - bit));
@@ -3672,9 +3698,6 @@ bool MacroAssembler::lookup_secondary_supers_table(Register r_sub_klass,
     mv(r_array_index, (u1)1);
   }
 
-  // We will consult the secondary-super array.
-  ld(r_array_base, Address(r_sub_klass, in_bytes(Klass::secondary_supers_offset())));
-
   // The value i in r_array_index is >= 1, so even though r_array_base
   // points to the length, we don't need to adjust it to point to the data.
   assert(Array<Klass*>::base_offset_in_bytes() == wordSize, "Adjust this code");

JMH tested on Banana Pi BPI-F3 board (has Zbb) and Enable UseZbb, Not Enable UseZba, UseZbs

Benchmark                             Mode  Cnt    Score   Error  Units
SecondarySupersLookup.testNegative00  avgt   15   13.262 ± 0.197  ns/op
SecondarySupersLookup.testNegative01  avgt   15   13.273 ± 0.222  ns/op
SecondarySupersLookup.testNegative02  avgt   15   13.264 ± 0.199  ns/op
SecondarySupersLookup.testNegative03  avgt   15   13.275 ± 0.222  ns/op
SecondarySupersLookup.testNegative04  avgt   15   13.264 ± 0.198  ns/op
SecondarySupersLookup.testNegative05  avgt   15   13.259 ± 0.192  ns/op
SecondarySupersLookup.testNegative06  avgt   15   13.260 ± 0.195  ns/op
SecondarySupersLookup.testNegative07  avgt   15   13.275 ± 0.221  ns/op
SecondarySupersLookup.testNegative08  avgt   15   13.261 ± 0.196  ns/op
SecondarySupersLookup.testNegative09  avgt   15   13.267 ± 0.201  ns/op
SecondarySupersLookup.testNegative10  avgt   15   13.272 ± 0.211  ns/op
SecondarySupersLookup.testNegative16  avgt   15   13.271 ± 0.200  ns/op
SecondarySupersLookup.testNegative20  avgt   15   13.271 ± 0.210  ns/op
SecondarySupersLookup.testNegative30  avgt   15   13.277 ± 0.219  ns/op
SecondarySupersLookup.testNegative32  avgt   15   13.280 ± 0.224  ns/op
SecondarySupersLookup.testNegative40  avgt   15   13.285 ± 0.232  ns/op
SecondarySupersLookup.testNegative50  avgt   15   13.288 ± 0.237  ns/op
SecondarySupersLookup.testNegative55  avgt   15   54.940 ± 0.771  ns/op
SecondarySupersLookup.testNegative56  avgt   15   54.934 ± 0.798  ns/op
SecondarySupersLookup.testNegative57  avgt   15   54.909 ± 0.766  ns/op
SecondarySupersLookup.testNegative58  avgt   15   54.679 ± 0.830  ns/op
SecondarySupersLookup.testNegative59  avgt   15   54.941 ± 0.819  ns/op
SecondarySupersLookup.testNegative60  avgt   15   76.957 ± 0.945  ns/op
SecondarySupersLookup.testNegative61  avgt   15   76.956 ± 1.007  ns/op
SecondarySupersLookup.testNegative62  avgt   15   76.938 ± 0.976  ns/op
SecondarySupersLookup.testNegative63  avgt   15  138.371 ± 1.192  ns/op
SecondarySupersLookup.testNegative64  avgt   15  140.137 ± 1.077  ns/op
SecondarySupersLookup.testPositive01  avgt   15   10.734 ± 0.149  ns/op
SecondarySupersLookup.testPositive02  avgt   15   10.730 ± 0.150  ns/op
SecondarySupersLookup.testPositive03  avgt   15   10.727 ± 0.146  ns/op
SecondarySupersLookup.testPositive04  avgt   15   10.735 ± 0.157  ns/op
SecondarySupersLookup.testPositive05  avgt   15   10.730 ± 0.149  ns/op
SecondarySupersLookup.testPositive06  avgt   15   10.735 ± 0.155  ns/op
SecondarySupersLookup.testPositive07  avgt   15   10.730 ± 0.149  ns/op
SecondarySupersLookup.testPositive08  avgt   15   10.730 ± 0.149  ns/op
SecondarySupersLookup.testPositive09  avgt   15   10.731 ± 0.151  ns/op
SecondarySupersLookup.testPositive10  avgt   15   10.728 ± 0.148  ns/op
SecondarySupersLookup.testPositive16  avgt   15   10.733 ± 0.151  ns/op
SecondarySupersLookup.testPositive20  avgt   15   10.728 ± 0.146  ns/op
SecondarySupersLookup.testPositive30  avgt   15   10.737 ± 0.162  ns/op
SecondarySupersLookup.testPositive32  avgt   15   10.733 ± 0.154  ns/op
SecondarySupersLookup.testPositive40  avgt   15   10.729 ± 0.148  ns/op
SecondarySupersLookup.testPositive50  avgt   15   10.732 ± 0.151  ns/op
SecondarySupersLookup.testPositive60  avgt   15   10.729 ± 0.148  ns/op
SecondarySupersLookup.testPositive63  avgt   15   10.730 ± 0.150  ns/op
SecondarySupersLookup.testPositive64  avgt   15   10.735 ± 0.158  ns/op
Finished running test 'micro:vm.lang.SecondarySupersLookup'

After the fix it got better with the cases of testNegative63/64. As said above, this case is very rare, Is there a need for this additional complexity?

@Hamlin-Li
Copy link

Thanks for updating!

With the fix, although it improves the perf for testNegative63/64, but seems it brings some regression for testNegative55-62, in this sense the fix should not be taken.
I'll take another look, sorry for long waiting.

Copy link

@Hamlin-Li Hamlin-Li left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for your patience.
I was thinking to jump to L_bitmap_full in lookup_secondary_supers_table_slow_path, in this way I guess it might address the performance issue when bitmap full, and not introduce regression in other cases. But I'm not sure how much complication will be brought into the implementation. So, let's skip this rare case optmization.

Some minor comment, otherwise looks good.

@theRealAph
Copy link
Contributor

There are a bit regression in cases of testNegative63/64, although these might be rare cases or not very common cases, but it's worth to have a try to improve it if possible. I guess it's related to the implementation for the cases when bitmap is full. When it's full, before go to repne_scan, there're some instructions to execute. I wonder if it will help to have another "bitmap full test" just after "bitmap false test" (which is test_bit(t0, r_bitmap, bit);). But I'm not sure if it's feasible, maybe worth a try.

So many superinterfaces is very rate. So rare, in fact, that it may never have happened in production Java code.

@theRealAph
Copy link
Contributor

It's worth running the "before" test with -XX:-UseSecondarySupersCache. This gives you a much better idea of the cost when you don't get any hit on the one-element secondary supers cache.

@zifeihan
Copy link
Member Author

It's worth running the "before" test with -XX:-UseSecondarySupersCache. This gives you a much better idea of the cost when you don't get any hit on the one-element secondary supers cache.

JMH tested on Banana Pi BPI-F3 board (has Zbb) and Enable UseZbb, and use -XX:-UseSecondarySupersCache to disable UseSecondarySupersCache
Original(not with patch):

Benchmark                             Mode  Cnt    Score   Error  Units                     [81/1889]
SecondarySupersLookup.testNegative00  avgt   15   15.153 ± 0.219  ns/op
SecondarySupersLookup.testNegative01  avgt   15   17.029 ± 0.287  ns/op
SecondarySupersLookup.testNegative02  avgt   15   20.242 ± 0.353  ns/op
SecondarySupersLookup.testNegative03  avgt   15   23.737 ± 0.515  ns/op
SecondarySupersLookup.testNegative04  avgt   15   26.097 ± 0.477  ns/op
SecondarySupersLookup.testNegative05  avgt   15   28.460 ± 0.546  ns/op
SecondarySupersLookup.testNegative06  avgt   15   31.025 ± 0.622  ns/op
SecondarySupersLookup.testNegative07  avgt   15   32.070 ± 0.518  ns/op
SecondarySupersLookup.testNegative08  avgt   15   34.656 ± 0.586  ns/op
SecondarySupersLookup.testNegative09  avgt   15   36.140 ± 0.622  ns/op
SecondarySupersLookup.testNegative10  avgt   15   38.304 ± 0.577  ns/op
SecondarySupersLookup.testNegative16  avgt   15   49.672 ± 0.726  ns/op
SecondarySupersLookup.testNegative20  avgt   15   57.241 ± 0.709  ns/op
SecondarySupersLookup.testNegative30  avgt   15   76.189 ± 0.804  ns/op
SecondarySupersLookup.testNegative32  avgt   15   79.821 ± 0.809  ns/op
SecondarySupersLookup.testNegative40  avgt   15   95.006 ± 0.999  ns/op
SecondarySupersLookup.testNegative50  avgt   15  113.808 ± 0.933  ns/op
SecondarySupersLookup.testNegative55  avgt   15  122.897 ± 1.237  ns/op
SecondarySupersLookup.testNegative56  avgt   15  125.180 ± 1.005  ns/op
SecondarySupersLookup.testNegative57  avgt   15  126.606 ± 0.925  ns/op
SecondarySupersLookup.testNegative58  avgt   15  128.890 ± 0.809  ns/op
SecondarySupersLookup.testNegative59  avgt   15  130.382 ± 1.092  ns/op
SecondarySupersLookup.testNegative60  avgt   15  132.426 ± 1.045  ns/op
SecondarySupersLookup.testNegative61  avgt   15  133.953 ± 1.062  ns/op
SecondarySupersLookup.testNegative62  avgt   15  136.156 ± 0.974  ns/op
SecondarySupersLookup.testNegative63  avgt   15  137.958 ± 1.172  ns/op
SecondarySupersLookup.testNegative64  avgt   15  142.439 ± 4.703  ns/op
SecondarySupersLookup.testPositive01  avgt   15   17.030 ± 0.218  ns/op
SecondarySupersLookup.testPositive02  avgt   15   20.688 ± 0.793  ns/op
SecondarySupersLookup.testPositive03  avgt   15   24.253 ± 0.566  ns/op
SecondarySupersLookup.testPositive04  avgt   15   27.154 ± 0.495  ns/op
SecondarySupersLookup.testPositive05  avgt   15   28.596 ± 0.892  ns/op
SecondarySupersLookup.testPositive06  avgt   15   30.846 ± 0.304  ns/op
SecondarySupersLookup.testPositive07  avgt   15   32.960 ± 0.564  ns/op
SecondarySupersLookup.testPositive08  avgt   15   34.706 ± 0.377  ns/op
SecondarySupersLookup.testPositive09  avgt   15   36.615 ± 0.453  ns/op
SecondarySupersLookup.testPositive10  avgt   15   38.760 ± 0.462  ns/op
SecondarySupersLookup.testPositive16  avgt   15   49.848 ± 0.489  ns/op
SecondarySupersLookup.testPositive20  avgt   15   57.419 ± 0.467  ns/op
SecondarySupersLookup.testPositive30  avgt   15   76.303 ± 0.530  ns/op
SecondarySupersLookup.testPositive32  avgt   15   79.537 ± 0.323  ns/op
SecondarySupersLookup.testPositive40  avgt   15   94.493 ± 0.565  ns/op
SecondarySupersLookup.testPositive50  avgt   15  113.451 ± 1.932  ns/op
SecondarySupersLookup.testPositive60  avgt   15  133.233 ± 4.262  ns/op
SecondarySupersLookup.testPositive63  avgt   15  137.090 ± 2.027  ns/op
SecondarySupersLookup.testPositive64  avgt   15  138.784 ± 1.650  ns/op
Finished running test 'micro:vm.lang.SecondarySupersLookup'

With patch

Benchmark                             Mode  Cnt    Score   Error  Units
SecondarySupersLookup.testNegative00  avgt   15   12.621 ± 0.172  ns/op
SecondarySupersLookup.testNegative01  avgt   15   12.614 ± 0.163  ns/op
SecondarySupersLookup.testNegative02  avgt   15   12.619 ± 0.169  ns/op
SecondarySupersLookup.testNegative03  avgt   15   12.621 ± 0.171  ns/op
SecondarySupersLookup.testNegative04  avgt   15   12.617 ± 0.167  ns/op
SecondarySupersLookup.testNegative05  avgt   15   12.618 ± 0.167  ns/op
SecondarySupersLookup.testNegative06  avgt   15   12.626 ± 0.180  ns/op
SecondarySupersLookup.testNegative07  avgt   15   12.621 ± 0.175  ns/op
SecondarySupersLookup.testNegative08  avgt   15   12.623 ± 0.176  ns/op
SecondarySupersLookup.testNegative09  avgt   15   12.626 ± 0.177  ns/op
SecondarySupersLookup.testNegative10  avgt   15   12.625 ± 0.181  ns/op
SecondarySupersLookup.testNegative16  avgt   15   12.625 ± 0.178  ns/op
SecondarySupersLookup.testNegative20  avgt   15   12.624 ± 0.178  ns/op
SecondarySupersLookup.testNegative30  avgt   15   12.635 ± 0.195  ns/op
SecondarySupersLookup.testNegative32  avgt   15   12.638 ± 0.200  ns/op
SecondarySupersLookup.testNegative40  avgt   15   12.644 ± 0.209  ns/op
SecondarySupersLookup.testNegative50  avgt   15   12.646 ± 0.212  ns/op
SecondarySupersLookup.testNegative55  avgt   15   51.029 ± 0.833  ns/op
SecondarySupersLookup.testNegative56  avgt   15   51.484 ± 1.074  ns/op
SecondarySupersLookup.testNegative57  avgt   15   51.170 ± 0.731  ns/op
SecondarySupersLookup.testNegative58  avgt   15   51.775 ± 1.573  ns/op
SecondarySupersLookup.testNegative59  avgt   15   51.000 ± 0.919  ns/op
SecondarySupersLookup.testNegative60  avgt   15   73.169 ± 0.950  ns/op
SecondarySupersLookup.testNegative61  avgt   15   73.537 ± 1.235  ns/op
SecondarySupersLookup.testNegative62  avgt   15   75.116 ± 4.232  ns/op
SecondarySupersLookup.testNegative63  avgt   15  153.908 ± 1.126  ns/op
SecondarySupersLookup.testNegative64  avgt   15  155.937 ± 1.101  ns/op
SecondarySupersLookup.testPositive01  avgt   15   17.656 ± 0.220  ns/op
SecondarySupersLookup.testPositive02  avgt   15   17.012 ± 0.192  ns/op
SecondarySupersLookup.testPositive03  avgt   15   17.643 ± 0.200  ns/op
SecondarySupersLookup.testPositive04  avgt   15   17.640 ± 0.196  ns/op
SecondarySupersLookup.testPositive05  avgt   15   17.018 ± 0.203  ns/op
SecondarySupersLookup.testPositive06  avgt   15   17.643 ± 0.201  ns/op
SecondarySupersLookup.testPositive07  avgt   15   17.648 ± 0.208  ns/op
SecondarySupersLookup.testPositive08  avgt   15   17.652 ± 0.214  ns/op
SecondarySupersLookup.testPositive09  avgt   15   17.650 ± 0.207  ns/op
SecondarySupersLookup.testPositive10  avgt   15   17.026 ± 0.213  ns/op
SecondarySupersLookup.testPositive16  avgt   15   17.649 ± 0.209  ns/op
SecondarySupersLookup.testPositive20  avgt   15   17.652 ± 0.216  ns/op
SecondarySupersLookup.testPositive30  avgt   15   37.097 ± 0.301  ns/op
SecondarySupersLookup.testPositive32  avgt   15   37.351 ± 0.459  ns/op
SecondarySupersLookup.testPositive40  avgt   15   42.951 ± 1.256  ns/op
SecondarySupersLookup.testPositive50  avgt   15   17.642 ± 0.194  ns/op
SecondarySupersLookup.testPositive60  avgt   15   37.099 ± 0.300  ns/op
SecondarySupersLookup.testPositive63  avgt   15  153.592 ± 6.942  ns/op
SecondarySupersLookup.testPositive64  avgt   15  153.214 ± 2.070  ns/op
Finished running test 'micro:vm.lang.SecondarySupersLookup'

@theRealAph : Hi, Can you take a look? Thanks

@theRealAph
Copy link
Contributor

It's worth running the "before" test with -XX:-UseSecondarySupersCache. This gives you a much better idea of the cost when you don't get any hit on the one-element secondary supers cache.

@theRealAph : Hi, Can you take a look? Thanks

That all looks like it's working as expected.

@openjdk openjdk bot added the ready Pull request is ready to be integrated label Jun 17, 2024
Copy link
Member

@RealFYang RealFYang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

@zifeihan
Copy link
Member Author

@Hamlin-Li @RealFYang @theRealAph : In the case of scalar register implementations, as discussed above, Such huge numbers of secondary supers don't occur in real-world code., can we add scalar register implementations to this PR as well?

@Hamlin-Li
Copy link

@Hamlin-Li @RealFYang @theRealAph : In the case of scalar register implementations, as discussed above, Such huge numbers of secondary supers don't occur in real-world code., can we add scalar register implementations to this PR as well?

I suggest to add scalar version (i.e. use population_count instead when zbb is not supported).

@zifeihan
Copy link
Member Author

Thanks all for the review.
/integrate

@openjdk openjdk bot added the sponsor Pull request is ready to be sponsored label Jun 20, 2024
@openjdk
Copy link

openjdk bot commented Jun 20, 2024

@zifeihan
Your change (at version 0e81d27) is now ready to be sponsored by a Committer.

@Hamlin-Li
Copy link

/sponsor

@openjdk
Copy link

openjdk bot commented Jun 20, 2024

Going to push as commit 001d686.
Since your change was applied there have been 18 commits pushed to the master branch:

  • 5cad0b4: 8322708: Global HTML attributes are not allowed
  • 6420846: 8334396: RISC-V: verify perf of ReverseBytesI/L
  • c6f3bf4: 8334026: Provide a diagnostic PrintMemoryMapAtExit switch on Linux
  • cabd104: 8334164: The fix for JDK-8322811 should use _filename.is_set() rather than strcmp()
  • d7dad50: 8334544: C2: wrong control assigned in PhaseIdealLoop::clone_assertion_predicate_for_unswitched_loops()
  • ff30240: 8334239: Introduce macro for ubsan method/function exclusions
  • 2d4185f: 8332717: ZGC: Division by zero in heuristics
  • fad6644: 8333754: Add a Test against ECDSA and ECDH NIST Test vector
  • b211929: 8334570: Problem list gc/TestAlwaysPreTouchBehavior.java
  • 4e58d8c: 8309821: Link to hidden classes section in Class specification for Class::isHidden
  • ... and 8 more: https://git.openjdk.org/jdk/compare/48621ae193ef70b2fae4dcb7ddc524f349beb131...master

Your commit was automatically rebased without conflicts.

@openjdk openjdk bot added the integrated Pull request has been integrated label Jun 20, 2024
@openjdk openjdk bot closed this Jun 20, 2024
@openjdk openjdk bot removed ready Pull request is ready to be integrated rfr Pull request is ready for review sponsor Pull request is ready to be sponsored labels Jun 20, 2024
@openjdk
Copy link

openjdk bot commented Jun 20, 2024

@Hamlin-Li @zifeihan Pushed as commit 001d686.

💡 You may see a message that your pull request was closed with unmerged commits. This can be safely ignored.


// Check for wraparound.
Label skip;
bge(r_array_length, r_array_index, skip);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you sure this test is correct? I would have thought it would be bgt. If length == index, then you must set index to 0. I would have expected this to fail testing.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch! Yes, I agree that we should use bgt here.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you sure this test is correct? I would have thought it would be bgt. If length == index, then you must set index to 0. I would have expected this to fail testing.

Thanks, I will fix it like blt(r_array_index,r_array_length,skip); Linked JBS: https://bugs.openjdk.org/browse/JDK-8334843.

@zifeihan zifeihan deleted the JDK-8332587 branch June 26, 2024 03:30
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
hotspot hotspot-dev@openjdk.org integrated Pull request has been integrated
Development

Successfully merging this pull request may close these issues.

None yet

4 participants