Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

8345298: RISC-V: Add riscv backend for Float16 operations - scalar #23844

Open
wants to merge 12 commits into
base: master
Choose a base branch
from

Conversation

Hamlin-Li
Copy link

@Hamlin-Li Hamlin-Li commented Feb 28, 2025

Hi,
Can you help to review this patch?
It's an implementation of #22754 on riscv.

Performance

data

Benchmark (vectorDim) Mode Cnt Score -master Error Score - patch Error Units Improvement (master/patch)
Float16OperationsBenchmark.absBenchmark 256 avgt 10 219.564 0.076 219.597 0.081 ns/op 1
Float16OperationsBenchmark.absBenchmark 512 avgt 10 358.873 0.575 355.011 0.07 ns/op 1.011
Float16OperationsBenchmark.absBenchmark 1024 avgt 10 582.361 0.189 581.832 0.006 ns/op 1.001
Float16OperationsBenchmark.absBenchmark 2048 avgt 10 1035.633 0.239 1034.854 0.284 ns/op 1.001
Float16OperationsBenchmark.addBenchmark 256 avgt 10 4951.702 0.194 2593.835 0.066 ns/op 1.909
Float16OperationsBenchmark.addBenchmark 512 avgt 10 9867.909 0.314 5167.568 0.162 ns/op 1.91
Float16OperationsBenchmark.addBenchmark 1024 avgt 10 21324.318 1.651 10016.456 1.07 ns/op 2.129
Float16OperationsBenchmark.addBenchmark 2048 avgt 10 42618.969 3.877 19985.662 1.233 ns/op 2.132
Float16OperationsBenchmark.cosineSimilarityDequantizedFP16 256 avgt 10 2811.45 0.441 2701.419 140.699 ns/op 1.041
Float16OperationsBenchmark.cosineSimilarityDequantizedFP16 512 avgt 10 5568.561 0.654 5577.598 1.123 ns/op 0.998
Float16OperationsBenchmark.cosineSimilarityDequantizedFP16 1024 avgt 10 11109.108 1.7 11095.644 0.644 ns/op 1.001
Float16OperationsBenchmark.cosineSimilarityDequantizedFP16 2048 avgt 10 20017.095 0.778 21560.165 0.515 ns/op 0.928
Float16OperationsBenchmark.cosineSimilarityDoubleRoundingFP16 256 avgt 10 20864.303 23.768 1345.192 0.274 ns/op 15.51
Float16OperationsBenchmark.cosineSimilarityDoubleRoundingFP16 512 avgt 10 43596.262 102.075 2580.035 0.397 ns/op 16.898
Float16OperationsBenchmark.cosineSimilarityDoubleRoundingFP16 1024 avgt 10 91565.818 250.761 5191.12 64.598 ns/op 17.639
Float16OperationsBenchmark.cosineSimilarityDoubleRoundingFP16 2048 avgt 10 185679.064 617.369 10343 20.025 ns/op 17.952
Float16OperationsBenchmark.cosineSimilaritySingleRoundingFP16 256 avgt 10 23473.486 0.498 1856.59 0.344 ns/op 12.643
Float16OperationsBenchmark.cosineSimilaritySingleRoundingFP16 512 avgt 10 47056.983 1.335 3833.72 2.487 ns/op 12.274
Float16OperationsBenchmark.cosineSimilaritySingleRoundingFP16 1024 avgt 10 94193.06 2.748 7054.642 1.243 ns/op 13.352
Float16OperationsBenchmark.cosineSimilaritySingleRoundingFP16 2048 avgt 10 188399.012 376.598 13982.963 1.533 ns/op 13.473
Float16OperationsBenchmark.divBenchmark 256 avgt 10 6865.998 1.289 4029.496 1.093 ns/op 1.704
Float16OperationsBenchmark.divBenchmark 512 avgt 10 13908.829 32.138 8034.805 0.133 ns/op 1.731
Float16OperationsBenchmark.divBenchmark 1024 avgt 10 28975.886 0.595 15674.755 5.269 ns/op 1.849
Float16OperationsBenchmark.divBenchmark 2048 avgt 10 57924.129 4.542 31420.095 96.01 ns/op 1.844
Float16OperationsBenchmark.euclideanDistanceDequantizedFP16 256 avgt 10 3301.349 1.798 3150.946 0.722 ns/op 1.048
Float16OperationsBenchmark.euclideanDistanceDequantizedFP16 512 avgt 10 6325.897 3.003 6252.479 0.15 ns/op 1.012
Float16OperationsBenchmark.euclideanDistanceDequantizedFP16 1024 avgt 10 12691.852 3.678 12439.328 0.78 ns/op 1.02
Float16OperationsBenchmark.euclideanDistanceDequantizedFP16 2048 avgt 10 25219.846 3.786 24731.585 151.057 ns/op 1.02
Float16OperationsBenchmark.euclideanDistanceFP16 256 avgt 10 11596.839 8.701 1119.235 0.033 ns/op 10.361
Float16OperationsBenchmark.euclideanDistanceFP16 512 avgt 10 23342.733 4.513 2131.066 0.231 ns/op 10.954
Float16OperationsBenchmark.euclideanDistanceFP16 1024 avgt 10 46112.17 44.496 4152.953 0.146 ns/op 11.103
Float16OperationsBenchmark.euclideanDistanceFP16 2048 avgt 10 93860.737 107.312 8198.33 1.291 ns/op 11.449
Float16OperationsBenchmark.fmaBenchmark 256 avgt 10 75400.936 60.886 2681.222 0.111 ns/op 28.122
Float16OperationsBenchmark.fmaBenchmark 512 avgt 10 139251.977 619.017 3703.767 0.471 ns/op 37.597
Float16OperationsBenchmark.fmaBenchmark 1024 avgt 10 267586.32 789.884 7286.745 0.152 ns/op 36.722
Float16OperationsBenchmark.fmaBenchmark 2048 avgt 10 509900.498 656.287 14470.574 2.257 ns/op 35.237
Float16OperationsBenchmark.getExponentBenchmark 256 avgt 10 684.578 0.486 683.204 0.101 ns/op 1.002
Float16OperationsBenchmark.getExponentBenchmark 512 avgt 10 1325.275 1.96 1323.645 0.92 ns/op 1.001
Float16OperationsBenchmark.getExponentBenchmark 1024 avgt 10 2641.872 15.528 2605.375 0.261 ns/op 1.014
Float16OperationsBenchmark.getExponentBenchmark 2048 avgt 10 5028.168 1.879 5023.716 2.348 ns/op 1.001
Float16OperationsBenchmark.isFiniteBenchmark 256 avgt 10 612.872 0.68 613.996 1.142 ns/op 0.998
Float16OperationsBenchmark.isFiniteBenchmark 512 avgt 10 1205.306 2.812 1195.51 4.283 ns/op 1.008
Float16OperationsBenchmark.isFiniteBenchmark 1024 avgt 10 2335.283 17.772 2332.876 0.349 ns/op 1.001
Float16OperationsBenchmark.isFiniteBenchmark 2048 avgt 10 4697.091 10.683 5051.694 39.746 ns/op 0.93
Float16OperationsBenchmark.isFiniteCMovBenchmark 256 avgt 10 914.27 0.704 914.195 0.202 ns/op 1
Float16OperationsBenchmark.isFiniteCMovBenchmark 512 avgt 10 1844.587 6.638 1814.631 5.204 ns/op 1.017
Float16OperationsBenchmark.isFiniteCMovBenchmark 1024 avgt 10 3595.557 0.453 3597.302 0.594 ns/op 1
Float16OperationsBenchmark.isFiniteCMovBenchmark 2048 avgt 10 7157.07 0.673 7155.312 0.37 ns/op 1
Float16OperationsBenchmark.isFiniteStoreBenchmark 256 avgt 10 677.109 0.214 677.146 0.249 ns/op 1
Float16OperationsBenchmark.isFiniteStoreBenchmark 512 avgt 10 1336.623 1.292 1336.019 0.067 ns/op 1
Float16OperationsBenchmark.isFiniteStoreBenchmark 1024 avgt 10 2640.22 10.33 2637.142 0.464 ns/op 1.001
Float16OperationsBenchmark.isFiniteStoreBenchmark 2048 avgt 10 5249.69 1.726 5255.893 23.545 ns/op 0.999
Float16OperationsBenchmark.isInfiniteBenchmark 256 avgt 10 728.603 1.596 693.678 1.057 ns/op 1.05
Float16OperationsBenchmark.isInfiniteBenchmark 512 avgt 10 1348.349 0.21 1297.384 1.692 ns/op 1.039
Float16OperationsBenchmark.isInfiniteBenchmark 1024 avgt 10 2610.138 0.492 2518.27 5.889 ns/op 1.036
Float16OperationsBenchmark.isInfiniteBenchmark 2048 avgt 10 5132.662 0.833 4908.692 3.115 ns/op 1.046
Float16OperationsBenchmark.isInfiniteCMovBenchmark 256 avgt 10 915.357 0.228 915.45 0.464 ns/op 1
Float16OperationsBenchmark.isInfiniteCMovBenchmark 512 avgt 10 1769.662 0.53 1767.386 0.119 ns/op 1.001
Float16OperationsBenchmark.isInfiniteCMovBenchmark 1024 avgt 10 3470.309 0.284 3472.84 1.01 ns/op 0.999
Float16OperationsBenchmark.isInfiniteCMovBenchmark 2048 avgt 10 6932.564 3.019 6932.776 2.647 ns/op 1
Float16OperationsBenchmark.isInfiniteStoreBenchmark 256 avgt 10 730.842 0.085 730.841 0.308 ns/op 1
Float16OperationsBenchmark.isInfiniteStoreBenchmark 512 avgt 10 1434.546 0.066 1434.594 0.344 ns/op 1
Float16OperationsBenchmark.isInfiniteStoreBenchmark 1024 avgt 10 2321.078 0.048 2320.688 1.318 ns/op 1
Float16OperationsBenchmark.isInfiniteStoreBenchmark 2048 avgt 10 4465.404 3.87 4463.767 1.083 ns/op 1
Float16OperationsBenchmark.isNaNBenchmark 256 avgt 10 636.582 0.118 637.523 0.012 ns/op 0.999
Float16OperationsBenchmark.isNaNBenchmark 512 avgt 10 1252.001 0.199 1251.084 0.227 ns/op 1.001
Float16OperationsBenchmark.isNaNBenchmark 1024 avgt 10 2477.057 0.761 2476.489 0.378 ns/op 1
Float16OperationsBenchmark.isNaNBenchmark 2048 avgt 10 4923.221 0.192 4920.501 0.247 ns/op 1.001
Float16OperationsBenchmark.isNaNCMovBenchmark 256 avgt 10 689.366 2.118 680.408 0.073 ns/op 1.013
Float16OperationsBenchmark.isNaNCMovBenchmark 512 avgt 10 1340.416 0.228 1341.398 0.411 ns/op 0.999
Float16OperationsBenchmark.isNaNCMovBenchmark 1024 avgt 10 2651.651 0.392 2649.589 0.475 ns/op 1.001
Float16OperationsBenchmark.isNaNCMovBenchmark 2048 avgt 10 5261.672 0.906 5312.504 7.238 ns/op 0.99
Float16OperationsBenchmark.isNaNStoreBenchmark 256 avgt 10 642.548 0.288 640.636 0.128 ns/op 1.003
Float16OperationsBenchmark.isNaNStoreBenchmark 512 avgt 10 1258.312 0.368 1258.208 1.662 ns/op 1
Float16OperationsBenchmark.isNaNStoreBenchmark 1024 avgt 10 2477.097 0.654 2476.883 0.623 ns/op 1
Float16OperationsBenchmark.isNaNStoreBenchmark 2048 avgt 10 4932.261 2.213 4933.748 6.211 ns/op 1
Float16OperationsBenchmark.maxBenchmark 256 avgt 10 6262.318 0.184 3737.469 0.442 ns/op 1.676
Float16OperationsBenchmark.maxBenchmark 512 avgt 10 13346.887 9.165 7427.994 0.388 ns/op 1.797
Float16OperationsBenchmark.maxBenchmark 1024 avgt 10 26453.036 4.278 14793.968 2.125 ns/op 1.788
Float16OperationsBenchmark.maxBenchmark 2048 avgt 10 52887.135 1.596 29524.416 5.099 ns/op 1.791
Float16OperationsBenchmark.minBenchmark 256 avgt 10 6294.667 24.257 3764.081 2.646 ns/op 1.672
Float16OperationsBenchmark.minBenchmark 512 avgt 10 13246.531 2.183 7426.716 0.868 ns/op 1.784
Float16OperationsBenchmark.minBenchmark 1024 avgt 10 26772.888 457.212 14792.145 0.623 ns/op 1.81
Float16OperationsBenchmark.minBenchmark 2048 avgt 10 53933.3 820.653 29521.248 0.755 ns/op 1.827
Float16OperationsBenchmark.mulBenchmark 256 avgt 10 4953.512 1.465 2596.331 0.089 ns/op 1.908
Float16OperationsBenchmark.mulBenchmark 512 avgt 10 9905.903 51.296 5168.052 0.773 ns/op 1.917
Float16OperationsBenchmark.mulBenchmark 1024 avgt 10 21513.646 31.823 10015.796 0.14 ns/op 2.148
Float16OperationsBenchmark.mulBenchmark 2048 avgt 10 42641.223 1.12 19990.315 0.805 ns/op 2.133
Float16OperationsBenchmark.negateBenchmark 256 avgt 10 218.717 0.13 218.341 0.006 ns/op 1.002
Float16OperationsBenchmark.negateBenchmark 512 avgt 10 352.992 0.189 356.34 0.384 ns/op 0.991
Float16OperationsBenchmark.negateBenchmark 1024 avgt 10 581.243 3.774 581.086 0.08 ns/op 1
Float16OperationsBenchmark.negateBenchmark 2048 avgt 10 1034.852 0.629 1034.441 0.462 ns/op 1
Float16OperationsBenchmark.sqrtBenchmark 256 avgt 10 49892.244 115.282 3807.597 0.089 ns/op 13.103
Float16OperationsBenchmark.sqrtBenchmark 512 avgt 10 30012.543 14.094 7678.139 1.214 ns/op 3.909
Float16OperationsBenchmark.sqrtBenchmark 1024 avgt 10 59690.622 23.593 15406.483 0.357 ns/op 3.874
Float16OperationsBenchmark.sqrtBenchmark 2048 avgt 10 118895.955 19.41 30877.922 0.35 ns/op 3.851
Float16OperationsBenchmark.subBenchmark 256 avgt 10 4991.926 6.009 2593.842 0.047 ns/op 1.925
Float16OperationsBenchmark.subBenchmark 512 avgt 10 9871.741 4.777 5165.983 0.325 ns/op 1.911
Float16OperationsBenchmark.subBenchmark 1024 avgt 10 21324.947 2.31 10015.511 0.23 ns/op 2.129
Float16OperationsBenchmark.subBenchmark 2048 avgt 10 42604.606 2.182 19992.064 0.818 ns/op 2.131

Progress

  • Change must be properly reviewed (1 review required, with at least 1 Reviewer)
  • Change must not contain extraneous whitespace
  • Commit message must refer to an issue

Issue

  • JDK-8345298: RISC-V: Add riscv backend for Float16 operations - scalar (Enhancement - P4)

Reviewers

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk.git pull/23844/head:pull/23844
$ git checkout pull/23844

Update a local copy of the PR:
$ git checkout pull/23844
$ git pull https://git.openjdk.org/jdk.git pull/23844/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 23844

View PR using the GUI difftool:
$ git pr show -t 23844

Using diff file

Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/23844.diff

Using Webrev

Link to Webrev Comment

Sorry, something went wrong.

hamlin added 3 commits February 27, 2025 17:39
@bridgekeeper
Copy link

bridgekeeper bot commented Feb 28, 2025

👋 Welcome back mli! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

@openjdk
Copy link

openjdk bot commented Feb 28, 2025

@Hamlin-Li This change is no longer ready for integration - check the PR body for details.

@openjdk
Copy link

openjdk bot commented Feb 28, 2025

@Hamlin-Li this pull request can not be integrated into master due to one or more merge conflicts. To resolve these merge conflicts and update this pull request you can run the following commands in the local repository for your personal fork:

git checkout float16-scalar
git fetch https://git.openjdk.org/jdk.git master
git merge FETCH_HEAD
# resolve conflicts and follow the instructions given by git merge
git commit -m "Merge master"
git push

@openjdk openjdk bot added merge-conflict Pull request has merge conflict with target branch rfr Pull request is ready for review labels Feb 28, 2025
@openjdk
Copy link

openjdk bot commented Feb 28, 2025

@Hamlin-Li The following label will be automatically applied to this pull request:

  • hotspot

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing list. If you would like to change these labels, use the /label pull request command.

@openjdk openjdk bot added the hotspot hotspot-dev@openjdk.org label Feb 28, 2025
@openjdk openjdk bot removed the merge-conflict Pull request has merge conflict with target branch label Feb 28, 2025
@mlbridge
Copy link

mlbridge bot commented Feb 28, 2025

Webrevs

Copy link
Member

@RealFYang RealFYang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi, Thanks for working on this part. Some initial comments after a cursory look.

@openjdk openjdk bot added the merge-conflict Pull request has merge conflict with target branch label Mar 5, 2025
hamlin added 2 commits March 5, 2025 09:25
@openjdk openjdk bot removed the merge-conflict Pull request has merge conflict with target branch label Mar 5, 2025
hamlin added 2 commits March 6, 2025 14:22
@Hamlin-Li
Copy link
Author

BTW, I just removed TestFloat16VectorConvChain test for riscv temporariely, as with this patch the ConvF2HF(AddF(ConvHF2F, ConvHF2F)) will be replaced by a ReinterpretHF2S(AddHF(ReinterpretS2HF, ReinterpretS2HF)).

In the short future, when we support vectorization of float16 operations, we can enable the test, but for riscv it will still be different form of Nodes, could be something like VectorReinterpretHF2S(VectorAddHF(VectorReinterpretS2HF, VectorReinterpretS2HF)).

Copy link
Member

@RealFYang RealFYang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Several more comments after a closer look.

@RealFYang
Copy link
Member

RealFYang commented Mar 7, 2025

BTW, I just removed TestFloat16VectorConvChain test for riscv temporariely, as with this patch the ConvF2HF(AddF(ConvHF2F, ConvHF2F)) will be replaced by a ReinterpretHF2S(AddHF(ReinterpretS2HF, ReinterpretS2HF)).

Maybe we should also update the @requires of the test at the same time?
Currently, it says | (os.arch == "riscv64" & vm.cpu.features ~= ".*zvfh.*").
Maybe we change zvfh into zfh?

hamlin added 2 commits March 7, 2025 11:33
@Hamlin-Li
Copy link
Author

Maybe we should also update the @requires of the test at the same time? Currently, it says | (os.arch == "riscv64" & vm.cpu.features ~= ".*zvfh.*"). Maybe we change zvfh into zfh?

No, as this test is for "vector conversion chain", only support of zfh should not trigger the test. BTW, scalar tests are in other test files.
In the future, when we support vectorization the IR verification test should be enabled again, but it still depends on zvfh rather than zfh.
Hope this answer your question?

@RealFYang
Copy link
Member

Maybe we should also update the @requires of the test at the same time? Currently, it says | (os.arch == "riscv64" & vm.cpu.features ~= ".*zvfh.*"). Maybe we change zvfh into zfh?

No, as this test is for "vector conversion chain", only support of zfh should not trigger the test. BTW, scalar tests are in other test files. In the future, when we support vectorization the IR verification test should be enabled again, but it still depends on zvfh rather than zfh. Hope this answer your question?

That makes sense to me. Thanks.

Copy link
Member

@RealFYang RealFYang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the update. There minor comments remain. Seems fine to me otherwise.

Copy link
Member

@RealFYang RealFYang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall LGTM. Thanks for the updates.

@openjdk openjdk bot added the ready Pull request is ready to be integrated label Mar 11, 2025
@openjdk openjdk bot removed the ready Pull request is ready to be integrated label Mar 11, 2025
Copy link
Member

@RealFYang RealFYang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Still good to me.

@openjdk openjdk bot added the ready Pull request is ready to be integrated label Mar 11, 2025
@Hamlin-Li
Copy link
Author

Still good to me.

Thank you!

Copy link
Contributor

@robehn robehn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems alright, thanks!

@openjdk openjdk bot removed the ready Pull request is ready to be integrated label Mar 12, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
hotspot hotspot-dev@openjdk.org rfr Pull request is ready for review
Development

Successfully merging this pull request may close these issues.

None yet

3 participants