New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
8303238: Create generalizations for existing LShift ideal transforms #12734
Conversation
👋 Welcome back SuperCoder7979! A progress list of the required criteria for merging this PR into |
@SuperCoder7979 The following label will be automatically applied to this pull request:
When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing list. If you would like to change these labels, use the /label pull request command. |
Please link to JDK-8303238. |
Webrevs
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Very nice overall!
Some superficial comments inline.
src/hotspot/share/opto/mulnode.cpp
Outdated
add1->in(2) == in(2) ) | ||
// Convert to "(x & -(1<<c0))" | ||
return new AndINode(add1->in(1),phase->intcon( -(1<<con))); | ||
// Check for "(x >> C1) << C2" which just masks off low bits |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The "which just masks off the low bits" comments should move to the C1 == C2 special case. Same below and for LShiftLNode
.
@State(Scope.Benchmark) | ||
public static class BenchState { | ||
int[] ints; | ||
Random random = new Random(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A hard-coded or parameterized seed is preferred for microbenchmarking to reduce noise from different data distributions in back-to-back runs.
Thanks for the comments! I have updated the code. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The change looks good to me and internal testing passes. Thanks for the contribution!
This transformation is profitable because typically more bitwise ands can be dispatched per cycle than bit shifts
Are we sure that this is profitable on all architectures?
Another review would be good.
test/hotspot/jtreg/compiler/c2/irTests/LShiftINodeIdealizationTests.java
Outdated
Show resolved
Hide resolved
|
@SuperCoder7979 This change now passes all automated pre-integration checks. ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details. After integration, the commit message for the final commit will be:
You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed. At the time when this comment was updated there had been 247 new commits pushed to the
As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details. As you do not have Committer status in this project an existing Committer must agree to sponsor your change. Possible candidates are the reviewers of this PR (@cl4es, @TobiHartmann) but any other Committer may sponsor as well. ➡️ To flag this PR as ready for integration with the above commit message, type |
/reviewers 2 |
@TobiHartmann |
Hi, thanks for the review! I have fixed the mistype in the comments, and have updated the bug headers. As for profitability, I looked through Agner Fog's instruction tables and uops.info and found that for all x86 microarchitectures listed, bitwise For different architectures, I wasn't able to readily find resources on instruction latency such as Agner for x86, but I was able to find LLVM's scheduler models for aarch64, ppc, and risc-v. These all seemed to be similar to x86- where the bitwise |
Can't say I'm very familiar with aarch64 (yet) but on my Mac M1 (osx-aarch64) I see similar improvements:
|
Nice! Glad to see the change has an impact there :) |
Thanks for updating the comments and the additional details. Looks good to me! |
Thanks for the reviews! |
/integrate |
@SuperCoder7979 |
I've started an internal testing run (tier 1-3) and will report any issues or sponsor depending on the results. The test that's failing in GHA is an unrelated bug that was supposedly fixed yesterday: https://bugs.openjdk.org/browse/JDK-8303105 |
Testing looks good /sponsor |
Going to push as commit 8e41bf2.
Your commit was automatically rebased without conflicts. |
@cl4es @SuperCoder7979 Pushed as commit 8e41bf2. 💡 You may see a message that your pull request was closed with unmerged commits. This can be safely ignored. |
@SuperCoder7979 I see a new issue, it must be related to this change. Can you take a look? https://bugs.openjdk.org/browse/JDK-8304230 |
Hi, and apologies- I'll address this ASAP. Thanks for the heads up. |
Hello,
I would like to generalize two ideal transforms for bitwise shifts. Left shift nodes perform the transformations
(x >> C1) << C2 => x & (-1 << C2)
and((x >> C1) & Y) << C2 => x & (Y << C2)
, but only when the case whereC1 == C2
. However, it is possible to use both of these rules to improve cases where the constants aren't equal, by removing one of the shifts and replacing it with a bitwise and. This transformation is profitable because typically more bitwise ands can be dispatched per cycle than bit shifts. In addition, the strength reduction from a shift to a bitwise and can allow more profitable transformations to occur. These patterns are found throughout the JDK, mainly around strings and OW2 ASM. I've attached some profiling results from my (Zen 2) machine below:In addition, in the process of making this PR I've found a missing ideal transform for
RShiftLNode
, so right shifts of large numbers (such asx >> 65
) are not properly folded down, like how they areRShiftINode
andURShiftLNode
. I'll address this in a future RFR.Testing: GHA, tier1 local, and performance testing
Thanks,
Jasmine K
Progress
Issue
Reviewers
Reviewing
Using
git
Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk pull/12734/head:pull/12734
$ git checkout pull/12734
Update a local copy of the PR:
$ git checkout pull/12734
$ git pull https://git.openjdk.org/jdk pull/12734/head
Using Skara CLI tools
Checkout this PR locally:
$ git pr checkout 12734
View PR using the GUI difftool:
$ git pr show -t 12734
Using diff file
Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/12734.diff