8313406: nep_invoker_blob can be simplified more #15089
👋 Welcome back ysuenaga! A progress list of the required criteria for merging this PR into |
@YaSuenag The following labels will be automatically applied to this pull request:
When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing lists. If you would like to change these labels, use the /label pull request command. |
This looks good. I'm running tests in our CI.
After this change, FFM performance on another ffmasm testcase was improved:
If you consider the 'error' in those benchmarks, only the first 2 digits of the throughput are significant, so I don't think we can claim that this improves performance based on these numbers alone.
However, I think the improvement in code quality is also a plus, just for making things easier to read/understand. I think these changes are non-invasive, and IIRC in the past we've always set RAX mostly since the code shape at that time didn't allow us to avoid it easily. So, I'm okay with this patch.
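The point about the benchmark 'error' can be sketched as a simple interval check: if the score ± error ranges of two runs overlap, the difference between them is not clearly distinguishable from noise. The numbers below are hypothetical and are not the results from this PR; the class and method names are illustrative only.

```java
public class IntervalOverlap {
    // True when [a - aErr, a + aErr] and [b - bErr, b + bErr] share at least one point.
    static boolean overlaps(double a, double aErr, double b, double bErr) {
        return Math.abs(a - b) <= aErr + bErr;
    }

    public static void main(String[] args) {
        // Hypothetical throughput scores (ops/s) with their reported errors.
        // Wide error bars: the two ranges overlap, so no improvement can be claimed.
        System.out.println(overlaps(1000.0, 20.0, 1015.0, 18.0));
        // Narrow error bars: the ranges are disjoint, so the difference is meaningful.
        System.out.println(overlaps(1000.0, 5.0, 1015.0, 4.0));
    }
}
```

This is only a rough rule of thumb; it does not replace a proper statistical comparison of the runs.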
All green
@YaSuenag This change now passes all automated pre-integration checks. ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details. After integration, the commit message for the final commit will be:
You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed. At the time when this comment was updated there had been 157 new commits pushed to the
As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details. ➡️ To integrate this PR with the above commit message to the |
FWIW, if you want to look into reducing the generated code further, I think we can potentially reduce the amount of shuffling between registers that's needed by reordering the arguments on the Java side so that each VMStorage corresponding to an argument of the leaf method handle is the same as the register for that argument in the Java calling convention. I think the right place to do this is in DowncallLinker where we are creating the NativeEntryPoint. The way I think it should work:
Pushing this shuffling to the Java side will allow the JIT to reduce data motion, and I think this should reduce the amount of shuffling needed overall.
@JornVernee Thanks for your review! I will integrate this once I get a second reviewer.
That would be great! I guess you suggested that Again, this idea is great. I'd like to call native functions via FFM with less overhead, so I'm happy to help if I can.
No, ArgumentShuffle should stay inside HotSpot. We cannot do all the shuffling on the Java side. We can only eliminate some of the register moves that are needed by re-ordering the arguments on the Java side. For instance, if you look at the comment in
i.e. all the registers in the Java calling convention are 'off by one' compared to the native calling convention. This makes sense for JNI since we need to prepend the JNIEnv* to the start of the argument list, but it doesn't make sense for Panama. Let's say we have a native function taking five
i.e. the first argument we pass on the Java side gets moved (by the downcall stub) into We can simply re-arrange the entries in the
i.e. the argument that should go into Ok, but now the arguments we pass to the downcall stub are going to go in the wrong registers as well :( So, to compensate for that, we also have to re-order the incoming argument values on the Java side so that each argument will correspond to its original register again. To do that, we just need to pass the first argument in the fifth position as well, and shift arguments 1-4 forward by one spot. Make sense?
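The re-ordering described above can be illustrated with a toy model that counts how many register moves a stub would need. This is a sketch under stated assumptions, not HotSpot code: the register orders below are assumed for Linux x86_64 (they are not quoted in this thread), the model ignores the target address and any other hidden arguments, and `ShuffleSketch` and `countMoves` are hypothetical names.

```java
import java.util.List;

public class ShuffleSketch {
    // Assumption: integer argument registers of the Java calling convention,
    // in argument order, on Linux x86_64.
    static final List<String> JAVA_REGS = List.of("rsi", "rdx", "rcx", "r8", "r9", "rdi");
    // Assumption: integer argument registers of the native (System V) convention.
    static final List<String> NATIVE_REGS = List.of("rdi", "rsi", "rdx", "rcx", "r8", "r9");

    // javaOrder[i] is the index of the native argument passed in Java position i.
    // A move is needed whenever the Java register differs from the native register.
    static int countMoves(int[] javaOrder) {
        int moves = 0;
        for (int pos = 0; pos < javaOrder.length; pos++) {
            String from = JAVA_REGS.get(pos);
            String to = NATIVE_REGS.get(javaOrder[pos]);
            if (!from.equals(to)) {
                moves++;
            }
        }
        return moves;
    }

    public static void main(String[] args) {
        // Arguments passed in their natural order: every register is off by one.
        System.out.println(countMoves(new int[] {0, 1, 2, 3, 4}));
        // First argument passed in the fifth position, arguments 1-4 shifted forward.
        System.out.println(countMoves(new int[] {1, 2, 3, 4, 0}));
    }
}
```

In this simplified model the natural order needs five moves and the rotated order needs one, which is the kind of reduction the re-ordering is aiming for. The real downcall stub has more to account for, so the actual savings may differ.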
Ideally, it would be best to eliminate all of the shuffling, but I think that is impossible. We have to use Thus this topic would be lower priority if my guess is correct.
Since things on the Java side are visible to the JIT, it should be able to avoid the extra data motion.
Yes, it is lower priority. It's relatively complex to solve, and also CPUs, in my experience, don't generally care that much about the shuffling. They probably just change their internal register allocation table instead of doing the actual moves. Also, it will ultimately help more to implement C2 intrinsics for native calls, as that avoids going through the downcall stub altogether. I have an old POC for that which I will dust off. |
Can I get a second reviewer?
PING: could you review this PR? I need one more reviewer to push. This PR has passed the java/foreign jtreg tests and CI at Oracle.
Looks good.
/integrate
Going to push as commit 583cb75.
Your commit was automatically rebased without conflicts.
In FFM, a native function is called via nep_invoker_blob. If the function has two arguments, the stub looks like the following:
subq $0, %rsp is for shadow space on the stack, and movq %r8, %rax sets the number of vector register arguments for a variadic function. So they are not necessary in some cases. They should be removed as follows when they are not needed:
All java/foreign jtreg tests pass.
We can see this stub code in the ffmasm testcase with -XX:+UnlockDiagnosticVMOptions -XX:+PrintStubCode and the hsdis library. This testcase links the code with Linker.Option.isTrivial().
After this change, FFM performance on another ffmasm testcase improved:
before:
after:
Environment:
Progress
Issue
Reviewers
Reviewing
Using git
Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk.git pull/15089/head:pull/15089
$ git checkout pull/15089
Update a local copy of the PR:
$ git checkout pull/15089
$ git pull https://git.openjdk.org/jdk.git pull/15089/head
Using Skara CLI tools
Checkout this PR locally:
$ git pr checkout 15089
View PR using the GUI difftool:
$ git pr show -t 15089
Using diff file
Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/15089.diff