8300865: C2: product reduction in ProdRed_Double is not vectorized #14065

sviswa7 · 2023-05-19T23:27:32Z

This PR fixes the problem with double reduction on x86_64.

In the test compiler.loopopts.superword.ProdRed_Double, the product reduction loop in prodReductionImplement() was not getting vectorized when run as follows:
jtreg -XX:CompileCommand=PrintAssembly,compiler.loopopts.superword.ProdRed_Double::prodReductionImplement compiler/loopopts/superword/ProdRed_Double.java
The PrintAssembly output written to the pid-xxx.log file in the JTwork/scratch directory did not show any vector_reduction_double node.

This happened because ReductionNode::implemented was passed a vector size of one element. The check for whether a vector reduction is implemented needs to be made with a vector size of at least two elements.

With this PR, the vector_reduction_double node is generated.
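
For context, a product reduction over doubles has roughly the shape below. This is a minimal sketch and not the test's actual source: the class name ProdRedSketch, the method name prodReduction and the array contents are made up for illustration.

```java
public class ProdRedSketch {
    // Multiplies `total` by every difference a[i] - b[i].
    // A loop of this shape is a candidate for a vectorized product reduction.
    static double prodReduction(double[] a, double[] b, double total) {
        for (int i = 0; i < a.length; i++) {
            total *= a[i] - b[i];
        }
        return total;
    }

    public static void main(String[] args) {
        // Illustrative inputs only: every difference is 2.0.
        double[] a = {3.0, 5.0, 4.0};
        double[] b = {1.0, 3.0, 2.0};
        System.out.println(prodReduction(a, b, 1.0));
    }
}
```

Whether C2 actually vectorizes such a loop depends on the platform and the flags in use, which is what the PrintAssembly run above checks.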

Please review.

Best Regards,
Sandhya

Progress

Change must be properly reviewed (1 review required, with at least 1 Reviewer)
Change must not contain extraneous whitespace
Commit message must refer to an issue

Issue

JDK-8300865: C2: product reduction in ProdRed_Double is not vectorized

Reviewers

Fei Gao (@fg1417 - Committer) ⚠️ Review applies to ba3b5dfa
Emanuel Peter (@eme64 - Committer)
Vladimir Kozlov (@vnkozlov - Reviewer)

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk.git pull/14065/head:pull/14065
$ git checkout pull/14065

Update a local copy of the PR:
$ git checkout pull/14065
$ git pull https://git.openjdk.org/jdk.git pull/14065/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 14065

View PR using the GUI difftool:
$ git pr show -t 14065

Using diff file

Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/14065.diff

Webrev

Link to Webrev Comment

bridgekeeper · 2023-05-19T23:28:56Z

👋 Welcome back sviswanathan! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

openjdk · 2023-05-19T23:30:24Z

@sviswa7 The following label will be automatically applied to this pull request:

hotspot-compiler

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing list. If you would like to change these labels, use the /label pull request command.

mlbridge · 2023-05-22T17:05:14Z

Webrevs

fg1417 · 2023-05-23T02:59:41Z

src/hotspot/share/opto/superword.cpp

+            // Matcher::min_vector_size may return 1 in some cases, e.g. double for x86.
+            // For vector reduction implemented check we need at least two elements.
+            int min_vec_size = MAX2(Matcher::min_vector_size(bt), 2);
+            if (ReductionNode::implemented(use->Opcode(), min_vec_size, bt)) {


Hi @sviswa7, can we use superword_max_vector_size() as the input here? MAX2(Matcher::min_vector_size(bt), 2); may not tally with the actual situation on other 64-bit platforms. WDYT?

@fg1417 Thanks a lot for the review. Yes, we could use superword_max_vector_size here. I have made the change.
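
The effect of the queried size can be illustrated with a toy model. This is a sketch under stated assumptions, not HotSpot code: reductionImplemented is a hypothetical stand-in for ReductionNode::implemented, and the maximum of four lanes is an assumed value (for example, four doubles in a 256-bit vector).

```java
public class ReductionCheckSketch {
    // Hypothetical stand-in for the "is this reduction implemented?" query.
    // Assumption: a reduction over fewer than two lanes is never reported as implemented.
    static boolean reductionImplemented(int vectorLength, int maxSupportedLength) {
        return vectorLength >= 2 && vectorLength <= maxSupportedLength;
    }

    public static void main(String[] args) {
        int minVectorSize = 1; // per the code comment: may be 1 for double on x86
        int maxVectorSize = 4; // assumed value for illustration

        // Querying with the minimum size rejects the reduction.
        System.out.println(reductionImplemented(minVectorSize, maxVectorSize));
        // Clamping to two elements, as in the first version of the patch, accepts it.
        System.out.println(reductionImplemented(Math.max(minVectorSize, 2), maxVectorSize));
        // Querying with the maximum size, as suggested in review, accepts it as well.
        System.out.println(reductionImplemented(maxVectorSize, maxVectorSize));
    }
}
```

In this model the clamped size and the maximum size both pass. The review suggestion is about which of the two better reflects the vector size used on platforms other than x86.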

fg1417

LGTM

eme64 · 2023-05-25T06:40:56Z

@sviswa7 Thanks for taking care of this. Looks good, but let me run testing at commit 4. I will report back.

sviswa7 · 2023-05-25T17:27:50Z

Thanks a lot @eme64.

eme64 · 2023-05-30T07:59:37Z

@sviswa7 Testing up to tier5 and stress testing look good.

Out of curiosity: do you have a benchmark that shows a speedup with this change? Would be nice to add it.
Maybe we could start with a benchmark from https://git.openjdk.org/jdk/pull/13056 and add some more compute-instructions to outweigh the latency of the reduction? Not sure if that is very easy.

eme64

Thanks for the fix! Looks good.

sviswa7 · 2023-05-31T00:46:36Z

Thanks a lot @eme64. There was an existing JMH benchmark for vector reduction. I have updated it to add a double reduction case.

sviswa7 · 2023-05-31T00:55:22Z

The performance numbers on my desktop are:

Base runs, no vectorization happens with superword:
Benchmark                              (COUNT)  (seed)  Mode  Cnt    Score    Error  Units
VectorReduction.NoSuperword.mulRedD        512       0  avgt    4  435.795 ±  0.082  ns/op
VectorReduction.WithSuperword.mulRedD      512       0  avgt    4  434.154 ±  0.042  ns/op

With the PR reduction succeeds and vectorization of the loop happens when superword is enabled:
Benchmark                              (COUNT)  (seed)  Mode  Cnt    Score    Error  Units
VectorReduction.NoSuperword.mulRedD        512       0  avgt    4  435.897 ±  0.137  ns/op
VectorReduction.WithSuperword.mulRedD      512       0  avgt    4  405.479 ±  1.896  ns/op

eme64 · 2023-05-31T06:30:44Z

@sviswa7 Thanks for adding the benchmark. The win is small, but that was to be expected given that the double reduction has to be performed in a linear order, and hence has quite a large latency.
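
The need for a linear order comes from floating-point multiplication not being associative, so regrouping the factors can change the result. A minimal sketch of that effect follows. The class and method names are hypothetical, and twoLaneProduct is a deliberately reordered variant for comparison, not what C2 generates.

```java
public class ReductionOrderSketch {
    // Strict left-to-right product, the order the Java loop specifies.
    static double linearProduct(double[] a) {
        double p = 1.0;
        for (double v : a) {
            p *= v;
        }
        return p;
    }

    // Hypothetical reordering: two independent lane accumulators, combined at the end.
    // Assumes an even array length.
    static double twoLaneProduct(double[] a) {
        double lane0 = 1.0;
        double lane1 = 1.0;
        for (int i = 0; i + 1 < a.length; i += 2) {
            lane0 *= a[i];
            lane1 *= a[i + 1];
        }
        return lane0 * lane1;
    }

    public static void main(String[] args) {
        // 1e308 * 2.0 overflows to Infinity in the linear order,
        // while the lane-wise grouping never forms that intermediate value.
        double[] a = {1e308, 2.0, 0.5, 0.5};
        System.out.println(linearProduct(a));
        System.out.println(twoLaneProduct(a));
    }
}
```

On this input the linear product overflows to Infinity while the reordered one stays finite, which is why a reduction that preserves Java semantics cannot simply multiply the lanes independently.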

sviswa7 · 2023-05-31T16:00:59Z

/integrate

openjdk · 2023-05-31T16:02:11Z

@sviswa7 This pull request has not yet been marked as ready for integration.

sviswa7 · 2023-05-31T16:57:41Z

@vnkozlov Could you also please review this PR?

vnkozlov

Looks good. You may regress on KNL for the following case when using the max vector size (the code is in x86.ad), which I think is fine.

    case Op_MinReductionV:
    case Op_MaxReductionV:
      if (UseAVX > 2 && (!VM_Version::supports_avx512dq() && size_in_bits == 512)) {
        return false;
      }

openjdk · 2023-05-31T19:32:38Z

@sviswa7 This change now passes all automated pre-integration checks.

ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details.

After integration, the commit message for the final commit will be:

8300865: C2: product reduction in ProdRed_Double is not vectorized

Reviewed-by: fgao, epeter, kvn

You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed.

At the time when this comment was updated there had been 167 new commits pushed to the master branch:

eae1f59: 8309159: Some minor comment and code cleanup in jdk/com/sun/jdi/PopFramesTest.java
45473ef: 8309230: ProblemList jdk/incubator/vector/Float64VectorTests.java on aarch64
78aa5f3: 8299505: findVirtual on array classes incorrectly restricts the receiver type
42ca6e6: 8308022: update for deprecated sprintf for java.base
1264902: 8308316: Default decomposition mode in Collator
70670b4: 8308872: enhance logging and some exception in krb5/Config.java
024d9b1: 8308910: Allow executeAndLog to accept running process
25b9803: 8308917: C2 SuperWord::output: assert before bailout with CountedLoopReserveKit
d66b6d8: 8308765: RISC-V: Expand size of stub routines for zgc only
4aea7da: 8309120: java/net/httpclient/AsyncShutdownNow.java fails intermittently
... and 157 more: https://git.openjdk.org/jdk/compare/939344b8433b32166f42ad73ae3d96e84b033478...master

As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details.

➡️ To integrate this PR with the above commit message to the master branch, type /integrate in a new comment.

sviswa7 · 2023-05-31T22:39:09Z

Looks good. You may regress on KNL for the following case when using the max vector size (the code is in x86.ad), which I think is fine.
    case Op_MinReductionV:
    case Op_MaxReductionV:
      if (UseAVX > 2 && (!VM_Version::supports_avx512dq() && size_in_bits == 512)) {
        return false;
      }

Yes, that should be ok. Thanks a lot for the review @vnkozlov.

sviswa7 · 2023-05-31T22:39:28Z

/integrate

openjdk · 2023-05-31T22:40:01Z

Going to push as commit f9ad7df.
Since your change was applied there have been 175 commits pushed to the master branch:

8eda97d: 8305320: DbgStrings and AsmRemarks are leaking
0951474: 8309150: Need to escape " inside attribute values
0119969: 8309171: Test vmTestbase/nsk/jvmti/scenarios/jni_interception/JI05/ji05t001/TestDescription.java fails after JDK-8308341
f8a924a: 8308975: Fix signed integer overflow in compiler code, part 2
5531f6b: 8308819: add JDWP and JDI virtual thread support for ThreadReference.ForceEarlyReturn
e42a4b6: 8309236: ProblemList java/util/concurrent/locks/Lock/OOMEInAQS.java with ZGC and Generational ZGC again
8dbd384: 8308678: (fs) UnixPath::toRealPath needs additional permissions when running with SM (macOS)
c3cd481: 8304914: Use OperatingSystem, Architecture, and Version in jpackage
eae1f59: 8309159: Some minor comment and code cleanup in jdk/com/sun/jdi/PopFramesTest.java
45473ef: 8309230: ProblemList jdk/incubator/vector/Float64VectorTests.java on aarch64
... and 165 more: https://git.openjdk.org/jdk/compare/939344b8433b32166f42ad73ae3d96e84b033478...master

Your commit was automatically rebased without conflicts.

openjdk · 2023-05-31T22:40:17Z

@sviswa7 Pushed as commit f9ad7df.

💡 You may see a message that your pull request was closed with unmerged commits. This can be safely ignored.

8300865: C2: product reduction in ProdRed_Double is not vectorized

0f6c941

openjdk bot added the hotspot-compiler label May 19, 2023

Update IR test

0cc5157

sviswa7 marked this pull request as ready for review May 22, 2023 16:59

openjdk bot added the rfr label May 22, 2023

fg1417 reviewed May 23, 2023

sviswa7 added 2 commits May 23, 2023 15:11

use max_vector_size instead

d0acd1e

change to superword_max_vector_size

ba3b5df

fg1417 approved these changes May 24, 2023

eme64 approved these changes May 30, 2023

Add jmh test case

1c29051

eme64 approved these changes May 31, 2023

vnkozlov approved these changes May 31, 2023

openjdk bot added the ready label May 31, 2023

openjdk bot added the integrated label May 31, 2023

openjdk bot closed this May 31, 2023

openjdk bot removed ready rfr labels May 31, 2023

sviswa7 deleted the doublered branch June 3, 2024 21:34
