8340241: RISC-V: Returns mispredicted #21406

robehn · 2024-10-08T12:54:35Z

Hi, please consider.

RISC-V doesn't have dedicated call/ret instructions.
Instead, the registers used in the jal/jalr instructions determine whether this is a JUMP or a CALL/RET.
The CPU has a return-address stack (RAS) where it stores return addresses for prediction.
There are two possible link registers for the calling convention: x1 and x5 (or both, for co-routines).
This stack is updated according to this table (from the unprivileged manual, 2.5.1. Unconditional Jumps) for JALR:

rd is x1/x5	rs1 is x1/x5	rd=rs1	RAS action
No	No	—	None
No	Yes	—	Pop
Yes	No	—	Push
Yes	Yes	No	Pop, then push
Yes	Yes	Yes	Push
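
As a rough illustration, the table can be read as a small decision function. This is only a sketch of the hint rules, not code from the patch; the function name and the register constants are made up for the example.

```python
# Sketch of the JALR return-address-stack hints from the table above.
# Names are illustrative only; nothing here is taken from HotSpot.
LINK_REGS = {1, 5}  # x1 (ra) and x5 (t0) are the registers treated as link registers


def jalr_ras_action(rd: int, rs1: int) -> str:
    """Return the RAS action hinted by `jalr rd, imm(rs1)`."""
    rd_link = rd in LINK_REGS
    rs1_link = rs1 in LINK_REGS
    if not rd_link:
        return "pop" if rs1_link else "none"
    if not rs1_link:
        return "push"
    # Both are link registers: the action depends on whether they are the same one.
    return "push" if rd == rs1 else "pop, then push"


X0, RA, T0, T1 = 0, 1, 5, 6
print(jalr_ras_action(X0, RA))  # ret
print(jalr_ras_action(X0, T0))  # indirect jump through t0
print(jalr_ras_action(X0, T1))  # indirect jump through t1
print(jalr_ras_action(RA, T0))  # indirect call through t0
print(jalr_ras_action(RA, T1))  # indirect call through t1
```

Under these rules an indirect jump through t0 carries a pop hint and an indirect call through t0 is hinted as a co-routine call, while the same instructions through t1 are a plain jump and a plain call.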

And additionally:
"A JAL instruction should push the return address onto a return-address stack (RAS) only when rd is x1 or x5."

As the JDK uses x5 (t0) as its main scratch register, all plain jumps are actually calls and calls are co-routine calls (push and pop).
This causes performance issues, as the predictions are often wrong.
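
To make the effect concrete, here is a toy model of the RAS (a sketch, not a description of any real core). It assumes the plain jump inside the callee is encoded as `jalr x0, 0(t0)`, which the table classifies as a pop; all addresses and names are hypothetical.

```python
def count_mispredictions(jump_ras_actions):
    """Replay: call a function, do one plain jump inside it, then return.

    `jump_ras_actions` lists the RAS actions hinted by the plain jump.
    """
    ras = []            # modelled return-address stack
    mispredicted = 0

    def predict_from_ras(actual_target):
        nonlocal mispredicted
        predicted = ras.pop() if ras else None
        if predicted != actual_target:
            mispredicted += 1

    return_addr = 0x1004   # hypothetical address right after the call site
    jump_target = 0x2040   # hypothetical target of the plain jump in the callee
    after_jump = 0x2010    # hypothetical address right after that jump

    ras.append(return_addr)            # jal ra, callee: push
    for action in jump_ras_actions:    # plain jump inside the callee
        if action == "pop":
            predict_from_ras(jump_target)
        elif action == "push":
            ras.append(after_jump)
    predict_from_ras(return_addr)      # ret, i.e. jalr x0, 0(ra): pop
    return mispredicted


print(count_mispredictions(["pop"]))  # jump via t0: hinted as a return
print(count_mispredictions([]))       # jump via t1: no RAS action
```

In this model the t0 variant consumes the caller's return address at the jump, so both the jump and the following return are predicted wrongly, while the t1 variant leaves the stack intact.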

Average time for 10 best iterations (VF2):

Benchmark	Baseline (ms)	RAS fixed (ms)	Diff
future-genetic	22126.6	20461.8	-7.52%
akka-uct	97119.6	97498	0.39%
movie-lens	82359.3	81009.2	-1.64%
scala-doku	29246.1	24518.6	-16.16%
chi-square	10207.3	10624.9	4.09%
fj-kmeans	55127.9	56169.1	1.89%
finagle-http	24845	24891.9	0.19%
reactors	97473.9	96655.5	-0.84%
dec-tree	8322.99	8243.11	-0.96%
naive-bayes	79249.1	76851.9	-3.02%
als	52678	51245.9	-2.72%
par-mnemonics	52237.4	53149.8	1.75%
scala-kmeans	2990.88	2992.14	0.04%
philosophers	9156.9	7754.5	-15.32%
log-regression	7621.65	7540.85	-1.06%
gauss-mix	9835.7	9396.25	-4.47%
mnemonics	73087.3	69426.6	-5.01%
dotty	10970.9	10719.1	-2.30%
finagle-chirper	23386.1	23630.3	1.04%
recursive fibonacci	7338.56	5369.83	-26.83%
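
For reference, the Diff column appears to be the relative change of the fixed time against the baseline; negative means the patched build is faster. A minimal sketch (the function name is hypothetical) that reproduces a few rows:

```python
def diff_percent(baseline_ms: float, fixed_ms: float) -> float:
    """Relative change in percent; negative values mean the fixed run is faster."""
    return (fixed_ms - baseline_ms) / baseline_ms * 100.0


# Rows copied from the table above.
for name, baseline, fixed in [
    ("future-genetic", 22126.6, 20461.8),
    ("scala-doku", 29246.1, 24518.6),
    ("chi-square", 10207.3, 10624.9),
    ("recursive fibonacci", 7338.56, 5369.83),
]:
    print(f"{name}: {diff_percent(baseline, fixed):+.2f}%")
```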

For some workloads, e.g. calls to a small function in a loop, it really matters.

This patch blacklists x5 (t0) for JAL/JALR, as we only use the x1 calling convention.
It also changes all jumps to use x6 (t1) instead of x5 (t0).
This patch was done incrementally, i.e. the first change removed the default t0.
I visited all places making jumps to make sure t1 was available.
Then I changed the default to t1 and removed the argument in many cases.
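
The blacklisting idea can be sketched as a guard in the instruction emitter. The snippet below is not the HotSpot assembler; `encode_jalr` is a hypothetical helper that only uses the standard I-type layout of JALR (opcode 0x67, funct3 0) and refuses x5 as either operand.

```python
def encode_jalr(rd: int, rs1: int, imm: int = 0) -> int:
    """Encode `jalr rd, imm(rs1)`, rejecting x5 (t0) so no unwanted RAS hint is emitted."""
    if rd == 5 or rs1 == 5:
        raise ValueError("x5 (t0) must not be used with JALR: it is treated as a link register")
    if not (0 <= rd < 32 and 0 <= rs1 < 32):
        raise ValueError("register number out of range")
    if not -2048 <= imm <= 2047:
        raise ValueError("immediate does not fit in 12 bits")
    # I-type: imm[11:0] | rs1 | funct3=000 | rd | opcode=1100111
    return ((imm & 0xFFF) << 20) | (rs1 << 15) | (rd << 7) | 0x67


print(hex(encode_jalr(0, 1)))  # ret
print(hex(encode_jalr(0, 6)))  # plain indirect jump through t1
```

A guard like this turns an accidental use of t0 into an immediate failure when code is generated, rather than a silent prediction penalty at run time.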

Other approaches were tested, e.g. completely switching t0 <-> t1.
This was much harder and more intrusive, as you need to do the switch completely in one go.
The use of x6 (t1) as the flag register in C2 was luckily not an issue, as RFLAGS is always killed when making a jump.
But please inspect this.

Note that jumping to a label was a bit more tricky. To solve that, this patch defaults to using only JAL when no register is supplied, which is now the default. We never jump to a label so far away that we need a longer range.
But please consider this carefully.
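
For context on the range in question: JAL encodes a signed 21-bit byte offset whose lowest bit is always zero, so it reaches about ±1 MiB from the instruction. A minimal sketch of such a reachability check (the helper name is hypothetical, not a HotSpot function):

```python
def fits_jal(offset: int) -> bool:
    """True if `offset` (bytes, relative to the JAL itself) is encodable in a single JAL."""
    return offset % 2 == 0 and -(1 << 20) <= offset < (1 << 20)


print(fits_jal(4096))       # a nearby label
print(fits_jal(1 << 20))    # exactly 1 MiB forward is just out of range
```

A label further away than this would need a two-instruction sequence with a temporary register, which is the case the description says does not occur for label jumps.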

Secondly, note that CompiledICData was moved to x5 (t0), as x1+x6 (ra/t1) are used for the call.
Please inspect this also (as this can silently go unnoticed while causing the VEP to go into the runtime for an IC miss).

Arguably this is a performance bug, not an enhancement.

No issues found running tier1 to tier3 with fastdebug; re-testing more to make sure.

Progress

Change must be properly reviewed (1 review required, with at least 1 Reviewer)
Change must not contain extraneous whitespace
Commit message must refer to an issue

Issue

JDK-8340241: RISC-V: Returns mispredicted (Enhancement - P4)

Reviewers

Fei Yang (@RealFYang - Reviewer)
Ludovic Henry (@luhenry - Committer)

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk.git pull/21406/head:pull/21406
$ git checkout pull/21406

Update a local copy of the PR:
$ git checkout pull/21406
$ git pull https://git.openjdk.org/jdk.git pull/21406/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 21406

View PR using the GUI difftool:
$ git pr show -t 21406

Using diff file

Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/21406.diff

Webrev

Link to Webrev Comment

bridgekeeper · 2024-10-08T12:55:02Z

👋 Welcome back rehn! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

openjdk · 2024-10-08T12:55:53Z

@robehn This change now passes all automated pre-integration checks.

ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details.

After integration, the commit message for the final commit will be:

8340241: RISC-V: Returns mispredicted

Reviewed-by: fyang, luhenry

You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed.

At the time when this comment was updated there had been no new commits pushed to the master branch. If another commit should be pushed before you perform the /integrate command, your PR will be automatically rebased. If you prefer to avoid any potential automatic rebasing, please check the documentation for the /integrate command for further details.

➡️ To integrate this PR with the above commit message to the master branch, type /integrate in a new comment.

openjdk · 2024-10-08T12:56:18Z

@robehn The following label will be automatically applied to this pull request:

hotspot

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing list. If you would like to change these labels, use the /label pull request command.

mlbridge · 2024-10-09T08:10:24Z

Webrevs

src/hotspot/cpu/riscv/assembler_riscv.hpp

src/hotspot/cpu/riscv/macroAssembler_riscv.cpp

RealFYang · 2024-10-09T13:12:39Z

Great finding. Apparently, we didn't realize the impact of these prediction hints before. Let me try this on hardware from other vendors to see.
BTW: Does this have anything to do with the use of x6 (t1) as the flag register in C2? Seems to me it's all about the encoding of x1 and x5 in jalr.

robehn · 2024-10-09T15:12:12Z

> Great finding. Apparently, we didn't realize the impact of these prediction hints before. Let me try this on hardware from other vendors to see. BTW: Does this have anything to do with the use of x6 (t1) as the flag register in C2? Seems to me it's all about the encoding of x1 and x5 in jalr.

Thanks! The issue in C2 is that you now need to kill CR if your code may, in any scenario, execute a JALR (assuming the code does return), and that is not obvious.

RealFYang

Hi, I witnessed a performance improvement on another vendor's hardware too. Minor comments after a cursory look. Will take a closer look. Thanks.

src/hotspot/cpu/riscv/templateTable_riscv.cpp

src/hotspot/cpu/riscv/templateInterpreterGenerator_riscv.cpp

src/hotspot/cpu/riscv/c1_CodeStubs_riscv.cpp

RealFYang · 2024-10-11T11:14:14Z

> > Great finding. Apparently, we didn't realize the impact of these prediction hints before. Let me try this on hardware from other vendors to see. BTW: Does this have anything to do with the use of x6 (t1) as the flag register in C2? Seems to me it's all about the encoding of x1 and x5 in jalr.
>
> Thanks! The issue in C2 is that you now need to kill CR if your code may, in any scenario, execute a JALR (assuming the code does return), and that is not obvious.

Ah. Now I see what you mean. Thanks.

robehn · 2024-10-11T11:29:58Z

> Hi, I witnessed a performance improvement on another vendor's hardware too. Minor comments after a cursory look. Will take a closer look. Thanks.

Awesome, thanks!

src/hotspot/cpu/riscv/stubGenerator_riscv.cpp

src/hotspot/cpu/riscv/gc/z/zBarrierSetAssembler_riscv.cpp

src/hotspot/cpu/riscv/c1_CodeStubs_riscv.cpp

src/hotspot/cpu/riscv/templateTable_riscv.cpp

src/hotspot/cpu/riscv/sharedRuntime_riscv.cpp

src/hotspot/cpu/riscv/c1_LIRAssembler_riscv.cpp

src/hotspot/cpu/riscv/riscv.ad

src/hotspot/cpu/riscv/sharedRuntime_riscv.cpp

src/hotspot/cpu/riscv/vtableStubs_riscv.cpp

RealFYang

Thanks for the update. Seems that we missed the jump_link in MacroAssembler::trampoline_call [1]? I also witnessed another place where we missed killing the rflags after this change. See comment for details.

[1] https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/riscv/macroAssembler_riscv.cpp#L4272

(PS: Ignore this as I just noticed that it's a jal instead of jalr by the jump_link)

src/hotspot/cpu/riscv/macroAssembler_riscv.cpp

robehn · 2024-10-16T12:21:14Z

> Thanks for the update. Seems that we missed the jump_link in MacroAssembler::trampoline_call [1]? I also witnessed another place where we missed killing the rflags after this change. See comment for details.
>
> [1] https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/riscv/macroAssembler_riscv.cpp#L4272
>
> (PS: Ignore this as I just noticed that it's a jal instead of jalr by the jump_link)

For C2 calls, i.e. when the compiler is doing a deliberate call, RFLAGS is SOC (save-on-call).
This means calls do not need to kill CR; it is implicitly killed, AFAICT.
C2 should have saved the RFLAGS register before doing the call if it needs that value after the call.

src/hotspot/cpu/riscv/macroAssembler_riscv.cpp

openjdk · 2024-10-17T08:12:00Z

@robehn this pull request can not be integrated into master due to one or more merge conflicts. To resolve these merge conflicts and update this pull request you can run the following commands in the local repository for your personal fork:

git checkout remove_t0
git fetch https://git.openjdk.org/jdk.git master
git merge FETCH_HEAD
# resolve conflicts and follow the instructions given by git merge
git commit -m "Merge master"
git push

RealFYang · 2024-10-17T14:06:13Z

Thanks for the update. I think I can finish the first round of review tomorrow. BTW: It would be good to know how this may affect other benchmark workloads, like specjbb2015, etc.

robehn · 2024-10-17T14:37:29Z

> Thanks for the update. I think I can finish the first round of review tomorrow. BTW: It would be good to know how this may affect other benchmark workloads, like specjbb2015, etc.

Thank you for the in-depth review! Maybe @Hamlin-Li can take it for a spin?

RealFYang

Latest version LGTM. Thanks for fixing this!

robehn · 2024-10-21T06:55:00Z

@RealFYang
I programmatically looked for t1 uses in Nodes.
The only ones I found missing cr (set or kill) were ForwardExceptionjmp and RethrowException.
But the compiler doesn't expect cr to survive these, i.e. x86 does not kill cr in those cases.
Note that some Node(s) may never have been created, so this is not a 100% complete check, but 99%.

@luhenry thanks!

RealFYang · 2024-10-21T07:25:53Z

> @RealFYang I programmatically looked for t1 uses in Nodes. The only ones I found missing cr (set or kill) were ForwardExceptionjmp and RethrowException. But the compiler doesn't expect cr to survive these, i.e. x86 does not kill cr in those cases. Note that some Node(s) may never have been created, so this is not a 100% complete check, but 99%.

Hi, thanks for doing the check. Yeah, I think we should be safe to go.
(And I also created PR fixing similar issue for riscv-specific changes in the loom repo: openjdk/loom#215)

robehn · 2024-10-21T13:14:40Z

/integrate

openjdk · 2024-10-21T13:15:56Z

Going to push as commit 66ddaaa.
Since your change was applied there have been 8 commits pushed to the master branch:

07f550b: 8340818: Add a new jtreg test root to test the generated documentation
27ef6c9: 8341470: BigDecimal.stripTrailingZeros() optimization
5d5d88a: 8339570: Add Tidy build support for JDK tests
239d84a: 8342578: GHA: RISC-V: Bootstrap using Debian snapshot is still failing
aa060f2: 8342334: CDS: Scratch mirrors should not point to dead klasses
680dc5d: 8342496: C2/Shenandoah: SEGV in compiled code when running jcstress
8f2b23b: 8341407: C2: assert(main_limit == cl->limit() || get_ctrl(main_limit) == new_limit_ctrl) failed: wrong control for added limit
21682bc: 8342612: Increase memory usage of compiler/c2/TestScalarReplacementMaxLiveNodes.java

Your commit was automatically rebased without conflicts.

openjdk · 2024-10-21T13:16:05Z

@robehn Pushed as commit 66ddaaa.

💡 You may see a message that your pull request was closed with unmerged commits. This can be safely ignored.


d81501a

openjdk bot added the hotspot label Oct 8, 2024

robehn marked this pull request as ready for review October 9, 2024 08:05

openjdk bot added the rfr label Oct 9, 2024

luhenry reviewed Oct 9, 2024

RealFYang reviewed Oct 11, 2024

src/hotspot/cpu/riscv/templateTable_riscv.cpp

src/hotspot/cpu/riscv/templateInterpreterGenerator_riscv.cpp (outdated)

src/hotspot/cpu/riscv/c1_CodeStubs_riscv.cpp (outdated)

RealFYang mentioned this pull request Oct 14, 2024

8342014: RISC-V: ZStoreBarrierStubC2 clobbers rflags #21485

Closed

4 tasks

RealFYang reviewed Oct 14, 2024

src/hotspot/cpu/riscv/stubGenerator_riscv.cpp (outdated)

src/hotspot/cpu/riscv/gc/z/zBarrierSetAssembler_riscv.cpp

robehn added 7 commits October 14, 2024 12:09

Fixed return should be done with RA by using RET mnemonic

1bfd40c

No need for explicit use of default t1

c8d8fdc

Use x9, comment update

1bab811

Updated asserts

b60789e

Updated assert

135686b

Another one removing explicit use of default t1

f5783d9

Merge branch 'master' into remove_t0

4e974db

RealFYang reviewed Oct 14, 2024

robehn added 4 commits October 14, 2024 15:29

Fixed no explicit use of default t1

d67b2d0

Upstream comment

1ef98a2

Revert clinit_barrier t1

59382a8

Fixed no explicit use of default t1

ec28fe3

RealFYang reviewed Oct 15, 2024

src/hotspot/cpu/riscv/c1_LIRAssembler_riscv.cpp (outdated)

src/hotspot/cpu/riscv/riscv.ad (outdated)

RealFYang reviewed Oct 15, 2024

src/hotspot/cpu/riscv/sharedRuntime_riscv.cpp (outdated)

src/hotspot/cpu/riscv/vtableStubs_riscv.cpp (outdated)

Merge branch 'master' into remove_t0

52b02cb

RealFYang reviewed Oct 16, 2024

src/hotspot/cpu/riscv/macroAssembler_riscv.cpp

RealFYang reviewed Oct 17, 2024

src/hotspot/cpu/riscv/macroAssembler_riscv.cpp

RealFYang reviewed Oct 17, 2024

src/hotspot/cpu/riscv/macroAssembler_riscv.cpp

KILL cr

030f45b

openjdk bot added the merge-conflict label Oct 17, 2024

Merge branch 'master' into remove_t0

14d77ae

openjdk bot removed the merge-conflict label Oct 17, 2024

Updated UEP pretty print with t1

f185683

RealFYang mentioned this pull request Oct 18, 2024

8342579: RISC-V: C2: Cleanup effect of killing flag register for call instructs #21576

Closed

4 tasks

RealFYang approved these changes Oct 18, 2024

openjdk bot added the ready label Oct 18, 2024

luhenry approved these changes Oct 18, 2024

RealFYang mentioned this pull request Oct 21, 2024

RISC-V: Avoid return misprediction openjdk/loom#215

Closed

3 tasks

Merge branch 'master' into remove_t0

cf3fed8

openjdk bot added the integrated label Oct 21, 2024

openjdk bot closed this Oct 21, 2024

openjdk bot removed the ready and rfr labels Oct 21, 2024

RealFYang mentioned this pull request Oct 23, 2024

8342882: RISC-V: Unify handling of jumps to runtime #21661

Closed

5 tasks

robehn deleted the remove_t0 branch November 27, 2024 11:35
