8340241: RISC-V: Returns mispredicted #21406

Closed. robehn wants to merge 22 commits.

@robehn robehn commented Oct 8, 2024

Hi, please consider.

RISC-V doesn't have dedicated call/ret instructions.
Instead, the registers used in the jal/jalr instructions determine whether the instruction is a JUMP or a CALL/RET.
The CPU has a return-address stack (RAS) where it stores return addresses for prediction.
There are two possible calling conventions: x1 or x5 as the link register (or both, for co-routines).
The stack is updated according to this table (from the unprivileged manual, 2.5.1. Unconditional Jumps) for JALR:

| rd is x1/x5 | rs1 is x1/x5 | rd=rs1 | RAS action |
|---|---|---|---|
| No | No | - | None |
| No | Yes | - | Pop |
| Yes | No | - | Push |
| Yes | Yes | No | Pop, then push |
| Yes | Yes | Yes | Push |

And additionally:
"A JAL instruction should push the return address onto a return-address stack (RAS) only when rd is x1 or x5."

As the JDK uses x5(/t0) as its main scratch register, all plain jumps through it are treated by the predictor as calls/returns, and calls become co-routine calls (pop, then push).
This causes performance issues, as the predictions are often wrong.
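The effect can be illustrated with a small model of the hint table. This is only a sketch: the list-based RAS, the function names, and the instruction sequence (a call, one far jump through the scratch register inside the callee, then a return) are assumptions made for illustration, not code from the patch or a model of any particular core.

```python
X0, RA, T0, T1 = 0, 1, 5, 6   # x0, x1 (ra), x5 (t0), x6 (t1)
LINK_REGS = {RA, T0}          # the two link registers named by the hint table


def jalr_ras_action(rd, rs1):
    """RAS action hinted by `jalr rd, imm(rs1)`, following the table above."""
    rd_link = rd in LINK_REGS
    rs1_link = rs1 in LINK_REGS
    if not rd_link:
        return "pop" if rs1_link else "none"
    if not rs1_link or rd == rs1:
        return "push"
    return "pop, then push"


def apply_action(ras, action, return_address=None):
    """Apply an action to a list-based RAS model; return the popped entry, if any."""
    predicted = None
    if action.startswith("pop") and ras:
        predicted = ras.pop()
    if action.endswith("push"):
        ras.append(return_address)
    return predicted


def predicted_return(scratch):
    """Hypothetical sequence: call via `scratch`, far jump via `scratch`, then ret."""
    ras = []
    apply_action(ras, jalr_ras_action(RA, scratch), 0x1004)  # jalr ra, 0(scratch)
    apply_action(ras, jalr_ras_action(X0, scratch))          # jalr x0, 0(scratch)
    return apply_action(ras, jalr_ras_action(X0, RA))        # ret = jalr x0, 0(ra)


print(predicted_return(T0))  # the far jump popped the entry, so ret has no prediction
print(predicted_return(T1))  # the RAS is untouched by the jump, ret predicts 0x1004
```

In this model, using t0 as the scratch register lets the plain jump consume the entry that the later return needs, while t1 leaves the stack alone.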

Average time for 10 best iterations (VF2):

| Benchmark | Baseline (ms) | RAS fixed (ms) | Diff |
|---|---|---|---|
| future-genetic | 22126.6 | 20461.8 | -7.52% |
| akka-uct | 97119.6 | 97498 | 0.39% |
| movie-lens | 82359.3 | 81009.2 | -1.64% |
| scala-doku | 29246.1 | 24518.6 | -16.16% |
| chi-square | 10207.3 | 10624.9 | 4.09% |
| fj-kmeans | 55127.9 | 56169.1 | 1.89% |
| finagle-http | 24845 | 24891.9 | 0.19% |
| reactors | 97473.9 | 96655.5 | -0.84% |
| dec-tree | 8322.99 | 8243.11 | -0.96% |
| naive-bayes | 79249.1 | 76851.9 | -3.02% |
| als | 52678 | 51245.9 | -2.72% |
| par-mnemonics | 52237.4 | 53149.8 | 1.75% |
| scala-kmeans | 2990.88 | 2992.14 | 0.04% |
| philosophers | 9156.9 | 7754.5 | -15.32% |
| log-regression | 7621.65 | 7540.85 | -1.06% |
| gauss-mix | 9835.7 | 9396.25 | -4.47% |
| mnemonics | 73087.3 | 69426.6 | -5.01% |
| dotty | 10970.9 | 10719.1 | -2.30% |
| finagle-chirper | 23386.1 | 23630.3 | 1.04% |
| recursive fibonacci | 7338.56 | 5369.83 | -26.83% |
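The Diff column appears to be the relative change of the fixed time against the baseline, so negative values mean the patched build is faster. A minimal sketch that recomputes a few rows under that assumption (the helper name `diff_percent` is made up here):

```python
def diff_percent(baseline_ms, fixed_ms):
    """Relative change of the fixed run against the baseline, in percent."""
    return (fixed_ms - baseline_ms) / baseline_ms * 100.0


# A few rows copied from the table above: (baseline ms, RAS fixed ms).
rows = {
    "future-genetic": (22126.6, 20461.8),
    "chi-square": (10207.3, 10624.9),
    "recursive fibonacci": (7338.56, 5369.83),
}

for name, (baseline, fixed) in rows.items():
    print(f"{name}: {diff_percent(baseline, fixed):+.2f}%")
```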

For some workloads, e.g. calls to a small function in a loop, it really matters.

This patch blacklists x5(/t0) for JAL/JALR, as we only use the x1 calling convention,
and changes all jumps to use x6(/t1) instead of x5(/t0).
The patch was done incrementally, i.e. the first change removed the default t0.
I then visited all places making jumps to make sure t1 was available,
changed the default to t1, and removed the argument in many cases.
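As an illustration of what blacklisting a register in the encoder can look like, here is a minimal sketch of a JALR encoder that refuses x5. The function name and the use of an exception are assumptions for this example; the actual patch works inside the HotSpot assembler and is not reproduced here. The bit layout is the standard I-type JALR encoding.

```python
T0 = 5  # x5, the register the patch stops using for jumps


def encode_jalr(rd, rs1, imm=0):
    """Encode `jalr rd, imm(rs1)`, rejecting x5 so no unintended RAS hint is emitted."""
    for reg in (rd, rs1):
        if not 0 <= reg < 32:
            raise ValueError(f"invalid register x{reg}")
    if T0 in (rd, rs1):
        raise ValueError("x5 (t0) is blacklisted for JALR in this sketch")
    if not -2048 <= imm < 2048:
        raise ValueError("immediate does not fit in 12 signed bits")
    # I-type layout: imm[11:0] | rs1 | funct3=000 | rd | opcode=1100111
    return ((imm & 0xFFF) << 20) | (rs1 << 15) | (0b000 << 12) | (rd << 7) | 0b1100111


print(hex(encode_jalr(0, 1)))  # ret, i.e. jalr x0, 0(x1)
print(hex(encode_jalr(1, 6)))  # call through t1, i.e. jalr x1, 0(x6)
```

Failing early in the encoder means an accidental use of t0 in a jump is caught where the instruction is emitted, rather than showing up later as a misprediction.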

Other approaches were tested, e.g. completely switching t0 <-> t1.
This was much harder and more intrusive, as the switch has to be done completely in one go.
The use of x6(/t1) as the flag register in C2 was luckily not an issue, as RFLAGS is always killed when making a jump.
But please inspect this.

Note that jumps to labels were a bit more tricky. To solve that, this patch uses only JAL when no register is supplied, which is now the default. We never jump to a label so far away that we need a longer range.
But please consider this carefully.
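For reference, JAL encodes a signed offset in multiples of 2 bytes, which gives a reach of ±1 MiB around the instruction. A minimal sketch of the range check that the "no register supplied" default relies on (the helper name is hypothetical, not from the patch):

```python
JAL_MIN_OFFSET = -(1 << 20)      # -1 MiB
JAL_MAX_OFFSET = (1 << 20) - 2   # +1 MiB minus one 2-byte step


def fits_in_jal(offset):
    """True if a pc-relative byte offset can be encoded in a single JAL."""
    return offset % 2 == 0 and JAL_MIN_OFFSET <= offset <= JAL_MAX_OFFSET


print(fits_in_jal(4096))      # a nearby label
print(fits_in_jal(1 << 20))   # one step past the forward limit
```

A label further away than this would need a register-based sequence (for example auipc plus jalr), which is where the choice of scratch register matters.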

Secondly, note that CompiledICData was moved to x5(/t0), as x1+x6 (ra/t1) are used for the call.
Please inspect this as well (a mistake here can silently go unnoticed while causing the VEP to go into the runtime for an IC miss).

Arguably this is a performance bug, not an enhancement.

No issues found running tier1-tier3 (t1->t3) on fastdebug; re-testing more to make sure.


Progress

  • Change must be properly reviewed (1 review required, with at least 1 Reviewer)
  • Change must not contain extraneous whitespace
  • Commit message must refer to an issue

Issue

  • JDK-8340241: RISC-V: Returns mispredicted (Enhancement - P4)

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk.git pull/21406/head:pull/21406
$ git checkout pull/21406

Update a local copy of the PR:
$ git checkout pull/21406
$ git pull https://git.openjdk.org/jdk.git pull/21406/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 21406

View PR using the GUI difftool:
$ git pr show -t 21406

Using diff file

Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/21406.diff


bridgekeeper bot commented Oct 8, 2024

👋 Welcome back rehn! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

openjdk bot commented Oct 8, 2024

@robehn This change now passes all automated pre-integration checks.

ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details.

After integration, the commit message for the final commit will be:

8340241: RISC-V: Returns mispredicted

Reviewed-by: fyang, luhenry

You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed.

At the time when this comment was updated there had been no new commits pushed to the master branch. If another commit should be pushed before you perform the /integrate command, your PR will be automatically rebased. If you prefer to avoid any potential automatic rebasing, please check the documentation for the /integrate command for further details.

➡️ To integrate this PR with the above commit message to the master branch, type /integrate in a new comment.

openjdk bot commented Oct 8, 2024

@robehn The following label will be automatically applied to this pull request:

  • hotspot

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing list. If you would like to change these labels, use the /label pull request command.

@openjdk openjdk bot added the hotspot hotspot-dev@openjdk.org label Oct 8, 2024
@robehn robehn marked this pull request as ready for review October 9, 2024 08:05
@openjdk openjdk bot added the rfr Pull request is ready for review label Oct 9, 2024
mlbridge bot commented Oct 9, 2024

RealFYang commented

Great finding. Apparently, we didn't realize the impact of these prediction hints before. Let me try this on hardware from other vendors to see.
BTW: Does this have anything to do with the use of x6(/t1) as flag register in C2? Seems to me it's all about the encoding of x1 and x5 in jalr.

robehn commented Oct 9, 2024

> Great finding. Apparently, we didn't realize the impact of these prediction hints before. Let me try this on hardware from other vendors to see. BTW: Does this have anything to do with the use of x6(/t1) as flag register in C2? Seems to me it's all about the encoding of x1 and x5 in jalr.

Thanks! The issue in C2 is that you now need to kill CR if your code may, in any scenario, execute a JALR (assuming the code does return), and that is not obvious.

@RealFYang RealFYang left a comment

Hi, I witnessed a performance improvement on another vendor's hardware too. Minor comments after a cursory look. I will take a closer look. Thanks.

RealFYang commented

> > Great finding. Apparently, we didn't realize the impact of these prediction hints before. Let me try this on hardware from other vendors to see. BTW: Does this have anything to do with the use of x6(/t1) as flag register in C2? Seems to me it's all about the encoding of x1 and x5 in jalr.
>
> Thanks! The issue in C2 is that you now need to kill CR if your code may, in any scenario, execute a JALR (assuming the code does return), and that is not obvious.

Ah. Now I see what you mean. Thanks.

robehn commented Oct 11, 2024

> Hi, I witnessed a performance improvement on another vendor's hardware too. Minor comments after a cursory look. I will take a closer look. Thanks.

Awesome, thanks!

@RealFYang RealFYang left a comment

Thanks for the update. Seems that we missed the jump_link in MacroAssembler::trampoline_call [1]? I also witnessed another place where we missed killing the rflags after this change. See comment for details.

[1] https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/riscv/macroAssembler_riscv.cpp#L4272

(PS: Ignore this as I just noticed that it's a jal instead of jalr by the jump_link)

robehn commented Oct 16, 2024

> Thanks for the update. Seems that we missed the jump_link in MacroAssembler::trampoline_call [1]? I also witnessed another place where we missed killing the rflags after this change. See comment for details.
>
> [1] https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/riscv/macroAssembler_riscv.cpp#L4272
>
> (PS: Ignore this as I just noticed that it's a jal instead of jalr by the jump_link)

For C2 calls, i.e. when the compiler is doing a deliberate call, RFLAGS is SOC (save-on-call).
This means calls do not need to kill CR; it is implicitly killed, AFAICT.
C2 should have saved the RFLAGS register before doing the call if it needs that value after the call.

openjdk bot commented Oct 17, 2024

@robehn this pull request can not be integrated into master due to one or more merge conflicts. To resolve these merge conflicts and update this pull request you can run the following commands in the local repository for your personal fork:

git checkout remove_t0
git fetch https://git.openjdk.org/jdk.git master
git merge FETCH_HEAD
# resolve conflicts and follow the instructions given by git merge
git commit -m "Merge master"
git push

@openjdk openjdk bot added the merge-conflict Pull request has merge conflict with target branch label Oct 17, 2024
@openjdk openjdk bot removed the merge-conflict Pull request has merge conflict with target branch label Oct 17, 2024
RealFYang commented

Thanks for the update. Hopefully I can finish the first round of review tomorrow. BTW: It would be good to know how this may affect other benchmark workloads, like specjbb2015, etc.

robehn commented Oct 17, 2024

> Thanks for the update. Hopefully I can finish the first round of review tomorrow. BTW: It would be good to know how this may affect other benchmark workloads, like specjbb2015, etc.

Thank you for the in-depth review! Maybe @Hamlin-Li can take it for a spin?

@RealFYang RealFYang left a comment

Latest version LGTM. Thanks for fixing this!

@openjdk openjdk bot added the ready Pull request is ready to be integrated label Oct 18, 2024
robehn commented Oct 21, 2024

@RealFYang
I programmatically looked for t1 uses in Nodes.
The only ones I found missing cr (set or kill) were ForwardExceptionjmp and RethrowException.
But the compiler doesn't expect cr to survive these, i.e. x86 does not kill cr in those cases.
Note that some Node(s) may never have been created, so this is not a 100% complete check, but 99%.

@luhenry thanks!

RealFYang commented Oct 21, 2024

> @RealFYang I programmatically looked for t1 uses in Nodes. The only ones I found missing cr (set or kill) were ForwardExceptionjmp and RethrowException. But the compiler doesn't expect cr to survive these, i.e. x86 does not kill cr in those cases. Note that some Node(s) may never have been created, so this is not a 100% complete check, but 99%.

Hi, thanks for doing the check. Yeah, I think we should be safe to go.
(I also created a PR fixing a similar issue for the riscv-specific changes in the loom repo: openjdk/loom#215)

robehn commented Oct 21, 2024

/integrate

openjdk bot commented Oct 21, 2024

Going to push as commit 66ddaaa.
Since your change was applied there have been 8 commits pushed to the master branch:

  • 07f550b: 8340818: Add a new jtreg test root to test the generated documentation
  • 27ef6c9: 8341470: BigDecimal.stripTrailingZeros() optimization
  • 5d5d88a: 8339570: Add Tidy build support for JDK tests
  • 239d84a: 8342578: GHA: RISC-V: Bootstrap using Debian snapshot is still failing
  • aa060f2: 8342334: CDS: Scratch mirrors should not point to dead klasses
  • 680dc5d: 8342496: C2/Shenandoah: SEGV in compiled code when running jcstress
  • 8f2b23b: 8341407: C2: assert(main_limit == cl->limit() || get_ctrl(main_limit) == new_limit_ctrl) failed: wrong control for added limit
  • 21682bc: 8342612: Increase memory usage of compiler/c2/TestScalarReplacementMaxLiveNodes.java

Your commit was automatically rebased without conflicts.

@openjdk openjdk bot added the integrated Pull request has been integrated label Oct 21, 2024
@openjdk openjdk bot closed this Oct 21, 2024
@openjdk openjdk bot removed ready Pull request is ready to be integrated rfr Pull request is ready for review labels Oct 21, 2024
openjdk bot commented Oct 21, 2024

@robehn Pushed as commit 66ddaaa.

💡 You may see a message that your pull request was closed with unmerged commits. This can be safely ignored.
