-
Notifications
You must be signed in to change notification settings - Fork 5.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
8292289: [vectorapi] Improve the implementation of VectorTestNode #9855
Conversation
👋 Welcome back merykitty! A progress list of the required criteria for merging this PR into |
@merykitty The following label will be automatically applied to this pull request:
When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing list. If you would like to change these labels, use the /label pull request command. |
/label hotspot-compiler |
@merykitty |
Webrevs
|
Please rebase this PR and resolve the conflicts. Then I can have an internal tests for the aarch64 change parts. Thanks a lot! |
Similar to what has been discussed in #10192 , a mask of 512 bits is not matched into an XMMRegister, I have removed the untaken code paths. |
@merykitty This pull request has been inactive for more than 4 weeks and will be automatically closed if another 4 weeks passes without any activity. To avoid this, simply add a new comment to the pull request. Feel free to ask for assistance if you need help with progressing this pull request towards integration! |
May I have another review for this PR, please? Thank you very much. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm running some tests on AArch64 platform (both Neon and SVE).
@shqking Thanks a lot for your reviews, I have addressed those in the commits |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My test passed.
Performance testing.
Here shows the data on AArch64 Neon.
Before After
Benchmark (prefix) (size) Mode Cnt Score Error Score Error Units Change
ArrayMismatchBenchmark.mismatchVectorByte 0.5 9 thrpt 5 188228.938 ± 1304.492 188036.510 ± 1566.468 ops/ms -0.1%
ArrayMismatchBenchmark.mismatchVectorByte 0.5 257 thrpt 5 50714.740 ± 251.973 52319.498 ± 73.201 ops/ms 3.2%
ArrayMismatchBenchmark.mismatchVectorByte 0.5 100000 thrpt 5 188.442 ± 0.829 226.975 ± 1.196 ops/ms 20.4%
ArrayMismatchBenchmark.mismatchVectorByte 1.0 9 thrpt 5 107464.833 ± 7.967 107461.853 ± 17.613 ops/ms 0.0%
ArrayMismatchBenchmark.mismatchVectorByte 1.0 257 thrpt 5 27873.228 ± 298.765 28854.655 ± 108.520 ops/ms 3.5%
ArrayMismatchBenchmark.mismatchVectorByte 1.0 100000 thrpt 5 91.318 ± 0.032 90.234 ± 0.049 ops/ms -1.2%
ArrayMismatchBenchmark.mismatchVectorDouble 0.5 9 thrpt 5 104424.609 ± 35.394 111375.725 ± 336.651 ops/ms 6.7%
ArrayMismatchBenchmark.mismatchVectorDouble 0.5 257 thrpt 5 9466.861 ± 46.815 9523.362 ± 12.216 ops/ms 0.6%
ArrayMismatchBenchmark.mismatchVectorDouble 0.5 100000 thrpt 5 21.572 ± 0.054 22.462 ± 0.273 ops/ms 4.1%
ArrayMismatchBenchmark.mismatchVectorDouble 1.0 9 thrpt 5 65201.598 ± 1202.724 70891.579 ± 576.866 ops/ms 8.7%
ArrayMismatchBenchmark.mismatchVectorDouble 1.0 257 thrpt 5 3931.683 ± 0.432 4241.834 ± 0.531 ops/ms 7.9%
ArrayMismatchBenchmark.mismatchVectorDouble 1.0 100000 thrpt 5 9.641 ± 0.005 10.209 ± 0.007 ops/ms 5.9%
ArrayMismatchBenchmark.mismatchVectorInt 0.5 9 thrpt 5 112517.132 ± 1266.658 117607.730 ± 658.935 ops/ms 4.5%
ArrayMismatchBenchmark.mismatchVectorInt 0.5 257 thrpt 5 14627.711 ± 135.210 19549.735 ± 169.208 ops/ms 33.6%
ArrayMismatchBenchmark.mismatchVectorInt 0.5 100000 thrpt 5 40.599 ± 0.116 45.500 ± 0.105 ops/ms 12.1%
ArrayMismatchBenchmark.mismatchVectorInt 1.0 9 thrpt 5 86951.685 ± 770.519 88705.394 ± 112.681 ops/ms 2.0%
ArrayMismatchBenchmark.mismatchVectorInt 1.0 257 thrpt 5 8229.636 ± 6.450 9400.670 ± 0.555 ops/ms 14.2%
ArrayMismatchBenchmark.mismatchVectorInt 1.0 100000 thrpt 5 20.437 ± 0.032 25.996 ± 0.142 ops/ms 27.2%
ArrayMismatchBenchmark.mismatchVectorLong 0.5 9 thrpt 5 98752.429 ± 8.477 106053.731 ± 168.779 ops/ms 7.4%
ArrayMismatchBenchmark.mismatchVectorLong 0.5 257 thrpt 5 9486.680 ± 113.035 9888.039 ± 10.357 ops/ms 4.2%
ArrayMismatchBenchmark.mismatchVectorLong 0.5 100000 thrpt 5 22.884 ± 0.118 22.469 ± 0.096 ops/ms -1.8%
ArrayMismatchBenchmark.mismatchVectorLong 1.0 9 thrpt 5 72150.373 ± 441.019 71092.863 ± 746.174 ops/ms -1.5%
ArrayMismatchBenchmark.mismatchVectorLong 1.0 257 thrpt 5 4604.599 ± 42.457 4690.037 ± 7.356 ops/ms 1.9%
ArrayMismatchBenchmark.mismatchVectorLong 1.0 100000 thrpt 5 11.147 ± 0.013 11.423 ± 0.012 ops/ms 2.5%
Here shows the data on AArch64 256-bit SVE.
Before After
Benchmark (prefix) (size) Mode Cnt Score Error Score Error Units Change
ArrayMismatchBenchmark.mismatchVectorByte 0.5 9 thrpt 5 332188.434 ± 1867.441 326994.114 ± 9458.795 ops/ms -1.6%
ArrayMismatchBenchmark.mismatchVectorByte 0.5 257 thrpt 5 107444.966 ± 5050.526 100516.133 ± 1436.484 ops/ms -6.4%
ArrayMismatchBenchmark.mismatchVectorByte 0.5 100000 thrpt 5 440.107 ± 0.135 460.557 ± 0.276 ops/ms 4.6%
ArrayMismatchBenchmark.mismatchVectorByte 1.0 9 thrpt 5 194751.414 ± 1218.965 196489.976 ± 70.422 ops/ms 0.9%
ArrayMismatchBenchmark.mismatchVectorByte 1.0 257 thrpt 5 68305.755 ± 102.463 71301.912 ± 214.791 ops/ms 4.4%
ArrayMismatchBenchmark.mismatchVectorByte 1.0 100000 thrpt 5 213.639 ± 0.310 212.501 ± 0.200 ops/ms -0.5%
ArrayMismatchBenchmark.mismatchVectorDouble 0.5 9 thrpt 5 184926.046 ± 1429.361 197673.463 ± 2065.066 ops/ms 6.9%
ArrayMismatchBenchmark.mismatchVectorDouble 0.5 257 thrpt 5 27664.974 ± 211.233 30272.798 ± 122.976 ops/ms 9.4%
ArrayMismatchBenchmark.mismatchVectorDouble 0.5 100000 thrpt 5 82.780 ± 0.078 72.316 ± 0.121 ops/ms -12.6%
ArrayMismatchBenchmark.mismatchVectorDouble 1.0 9 thrpt 5 133433.039 ± 23.047 138097.066 ± 321.764 ops/ms 3.5%
ArrayMismatchBenchmark.mismatchVectorDouble 1.0 257 thrpt 5 9332.847 ± 47.940 9679.395 ± 15.648 ops/ms 3.7%
ArrayMismatchBenchmark.mismatchVectorDouble 1.0 100000 thrpt 5 25.563 ± 0.010 29.525 ± 1.410 ops/ms 15.5%
ArrayMismatchBenchmark.mismatchVectorInt 0.5 9 thrpt 5 409670.146 ± 15888.302 385940.625 ± 6430.431 ops/ms -5.8%
ArrayMismatchBenchmark.mismatchVectorInt 0.5 257 thrpt 5 36565.150 ± 1295.056 39837.700 ± 82.828 ops/ms 8.9%
ArrayMismatchBenchmark.mismatchVectorInt 0.5 100000 thrpt 5 115.997 ± 0.986 112.612 ± 0.280 ops/ms -2.9%
ArrayMismatchBenchmark.mismatchVectorInt 1.0 9 thrpt 5 153095.509 ± 760.043 159605.937 ± 114.691 ops/ms 4.3%
ArrayMismatchBenchmark.mismatchVectorInt 1.0 257 thrpt 5 20747.445 ± 28.624 21301.590 ± 64.918 ops/ms 2.7%
ArrayMismatchBenchmark.mismatchVectorInt 1.0 100000 thrpt 5 52.865 ± 0.033 53.757 ± 0.134 ops/ms 1.7%
ArrayMismatchBenchmark.mismatchVectorLong 0.5 9 thrpt 5 177529.884 ± 145.103 178435.461 ± 2410.473 ops/ms 0.5%
ArrayMismatchBenchmark.mismatchVectorLong 0.5 257 thrpt 5 20538.232 ± 7.532 20563.490 ± 53.205 ops/ms 0.1%
ArrayMismatchBenchmark.mismatchVectorLong 0.5 100000 thrpt 5 50.875 ± 0.736 52.826 ± 0.058 ops/ms 3.8%
ArrayMismatchBenchmark.mismatchVectorLong 1.0 9 thrpt 5 135797.506 ± 333.638 138437.942 ± 97.186 ops/ms 1.9%
ArrayMismatchBenchmark.mismatchVectorLong 1.0 257 thrpt 5 10561.460 ± 74.946 10337.813 ± 39.726 ops/ms -2.1%
ArrayMismatchBenchmark.mismatchVectorLong 1.0 100000 thrpt 5 26.027 ± 0.020 26.224 ± 0.046 ops/ms 0.8%
I think the performance is acceptable.
Jtreg testing
- on AARCH64 Neon, I ran tier1~3.
- on AArch64 SVE, I ran the cases under the following directories
"test/hotspot/jtreg/compiler/vectorapi/"
"test/jdk/jdk/incubator/vector/"
"test/hotspot/jtreg/compiler/vectorization/"
Besides the CMOVE_I issue in TestVectorTest.java
as I mentioned before, all other test cases passed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would need to run testing again and also do our performance testing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Regular testing passed. I am waiting performance results.
Rerunning DaCapo and Renaissance which shows some variations. |
@vnkozlov I believe |
Yes, but you have changes in general code related to Anyway, I am approving this changes based on data I got. |
@vnkozlov Ah I get it, thanks a lot for your reviews, can I merge the patch now? |
Yes, you can integrate. |
Thank everyone for your kind reviews and suggestions. |
Going to push as commit 3dfadee.
Your commit was automatically rebased without conflicts. |
@merykitty Pushed as commit 3dfadee. 💡 You may see a message that your pull request was closed with unmerged commits. This can be safely ignored. |
This patch modifies the node generation of
VectorSupport::test
to emit aCMoveINode
, which is picked up byBoolNode::Ideal(PhaseGVN*, bool)
to connect theVectorTestNode
directly to theBoolNode
, removing the redundant operations of materialising the test result in a GP register and do aCmpI
to get back the flags. As a result,VectorMask<T>::alltrue
is compiled into machine codes:instead of:
The results of
jdk.incubator.vector.ArrayMismatchBenchmark
shows noticeable improvements:I have not been able to conduct throughout testing on AVX512 and Aarch64 so any help would be invaluable. Thank you very much.
Progress
Issue
Reviewers
Reviewing
Using
git
Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk pull/9855/head:pull/9855
$ git checkout pull/9855
Update a local copy of the PR:
$ git checkout pull/9855
$ git pull https://git.openjdk.org/jdk pull/9855/head
Using Skara CLI tools
Checkout this PR locally:
$ git pr checkout 9855
View PR using the GUI difftool:
$ git pr show -t 9855
Using diff file
Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/9855.diff