
8283091: Support type conversion between different data sizes in SLP #7806

Closed
wants to merge 13 commits into master

Conversation

@fg1417 fg1417 commented Mar 14, 2022

After JDK-8275317, C2's SLP vectorizer supports type conversion between operands of the same data size. We can also support conversions between different data sizes, like:
int <-> double
float <-> long
int <-> long
float <-> double

A typical test case:

int[] a;
double[] b;
for (int i = start; i < limit; i++) {
b[i] = (double) a[i];
}

Our expected OptoAssembly code for one iteration is like below:

add R12, R2, R11, LShiftL #2
vector_load V16,[R12, #16]
vectorcast_i2d V16, V16 # convert I to D vector
add R11, R1, R11, LShiftL #3 # ptr
add R13, R11, #16 # ptr
vector_store [R13], V16

To enable the vectorization, the patch solves the following problems in SLP.

There are three main operations in the case above: LoadI, ConvI2D and StoreD. Assuming that the vector length is 128 bits, how many scalar nodes should be packed together into a vector? If we decide it separately for each operation node, as we did before the patch in SuperWord::combine_packs(), a 128-bit vector will hold 4 LoadI, 2 ConvI2D or 2 StoreD nodes. However, if we put these packed nodes into one vector node sequence, like loading 4 elements into a vector, then typecasting 2 elements and lastly storing those 2 elements, the sequence becomes invalid. As a result, we should look through the whole def-use chain and then pick the minimum of these element counts, as the function SuperWord::max_vector_size_in_ud_chain() does in superword.cpp. In this case, we pack 2 LoadI, 2 ConvI2D and 2 StoreD nodes, and then generate a valid vector node sequence: loading 2 elements, converting the 2 elements to another type, and storing the 2 elements with the new type.
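The pack-size decision described above can be sketched as follows. This is a simplified illustrative model with made-up names (PackSizeSketch, maxPackSize), not the actual superword.cpp code:

```java
public class PackSizeSketch {
    // Simplified model: the number of scalar nodes we can pack is limited
    // by the widest element type anywhere in the def-use chain.
    static int maxPackSize(int vectorBytes, int[] elemSizesInChain) {
        int widest = 1;
        for (int s : elemSizesInChain) {
            widest = Math.max(widest, s);
        }
        return vectorBytes / widest;
    }

    public static void main(String[] args) {
        // 128-bit (16-byte) vector; chain LoadI (4B) -> ConvI2D (8B) -> StoreD (8B).
        // The widest type (8 bytes) limits every pack to 2 nodes.
        System.out.println(maxPackSize(16, new int[]{4, 8, 8})); // prints 2
    }
}
```

With a homogeneous chain (e.g. all 4-byte ints) the same model yields 4 nodes per 128-bit vector, matching the pre-patch per-node decision.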

After this, LoadI nodes no longer make full use of the whole vector and occupy only part of it, so we adapt the code in SuperWord::get_vw_bytes_special() to this situation.

In SLP, we calculate a kind of alignment as a position trace for each scalar node in the whole vector. In this case, the alignments for the 2 LoadI nodes are 0 and 4, while the alignments for the 2 ConvI2D nodes are 0 and 8. Sometimes, 4 for LoadI and 8 for ConvI2D mean the same thing, both marking that the node is the second node in the whole vector; the difference between 4 and 8 is due only to their data sizes. In this situation, we should try to remove the impact caused by different data sizes in SLP. For example, in the stage of SuperWord::extend_packlist(), while determining whether a pair of def nodes can be packed in SuperWord::follow_use_defs(), we remove the side effect of different data sizes by rescaling the target alignment taken from the use node. We believe that, assuming the vector length is 512 bits, if the ConvI2D use nodes have alignments of 16 and 24 and their def nodes, LoadI, have alignments of 8 and 12, these two LoadI nodes should be packed as a pair as well.
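The alignment rescaling can be illustrated with a small sketch (hypothetical names; a simplified model of the adjustment, not the code in SuperWord::follow_use_defs()):

```java
public class AlignmentSketch {
    // Rescale an alignment expressed in the use node's element size into the
    // def node's element size, so positions in the vector can be compared.
    static int scaleAlignment(int useAlignment, int useElemSize, int defElemSize) {
        return useAlignment / useElemSize * defElemSize;
    }

    public static void main(String[] args) {
        // 512-bit vector: ConvI2D uses at alignments 16 and 24 (8-byte elements)
        // map to LoadI defs at alignments 8 and 12 (4-byte elements).
        // Both pairs sit in lanes 2 and 3, so the LoadI pair should be packed too.
        System.out.println(scaleAlignment(16, 8, 4)); // prints 8
        System.out.println(scaleAlignment(24, 8, 4)); // prints 12
    }
}
```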

Similarly, when determining whether the vectorization is profitable, a type conversion between different data sizes takes a type of one size and produces a type of another size, hence special checks on alignment and size should be applied, as we do in SuperWord::is_vector_use().
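As a sketch of the kind of check involved (illustrative only, with invented names; the real logic lives in SuperWord::is_vector_use()), a conversion pack is consistent when each use node and its def node occupy the same lane once alignments are divided by their own element sizes:

```java
public class VectorUseCheckSketch {
    // Illustrative consistency check: the use pack and def pack must cover the
    // same lanes after normalizing alignments by each side's element size.
    static boolean lanesMatch(int[] useAligns, int useElemSize,
                              int[] defAligns, int defElemSize) {
        if (useAligns.length != defAligns.length) {
            return false;
        }
        for (int i = 0; i < useAligns.length; i++) {
            if (useAligns[i] / useElemSize != defAligns[i] / defElemSize) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // ConvI2D pack at alignments {16, 24} (8B) vs LoadI pack at {8, 12} (4B):
        // lanes {2, 3} on both sides, so the pairing is consistent.
        System.out.println(lanesMatch(new int[]{16, 24}, 8, new int[]{8, 12}, 4));
    }
}
```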

After solving these problems, we successfully implemented the vectorization of type conversion between different data sizes.

Here is the test data (-XX:+UseSuperWord) on NEON:

Before the patch:
Benchmark                (length)  Mode  Cnt    Score    Error  Units
VectorLoop.convertD2F         523  avgt   15  216.431 ±  0.131  ns/op
VectorLoop.convertD2I         523  avgt   15  220.522 ±  0.311  ns/op
VectorLoop.convertF2D         523  avgt   15  217.034 ±  0.292  ns/op
VectorLoop.convertF2L         523  avgt   15  231.634 ±  1.881  ns/op
VectorLoop.convertI2D         523  avgt   15  229.538 ±  0.095  ns/op
VectorLoop.convertI2L         523  avgt   15  214.822 ±  0.131  ns/op
VectorLoop.convertL2F         523  avgt   15  230.188 ±  0.217  ns/op
VectorLoop.convertL2I         523  avgt   15  162.234 ±  0.235  ns/op

After the patch:
Benchmark                (length)  Mode  Cnt    Score    Error  Units
VectorLoop.convertD2F         523  avgt   15  124.352 ±  1.079  ns/op
VectorLoop.convertD2I         523  avgt   15  557.388 ±  8.166  ns/op
VectorLoop.convertF2D         523  avgt   15  118.082 ±  4.026  ns/op
VectorLoop.convertF2L         523  avgt   15  225.810 ± 11.180  ns/op
VectorLoop.convertI2D         523  avgt   15  166.247 ±  0.120  ns/op
VectorLoop.convertI2L         523  avgt   15  119.699 ±  2.925  ns/op
VectorLoop.convertL2F         523  avgt   15  220.847 ±  0.053  ns/op
VectorLoop.convertL2I         523  avgt   15  122.339 ±  2.738  ns/op

perf data on X86:
Before the patch:
Benchmark                (length)  Mode  Cnt    Score   Error  Units
VectorLoop.convertD2F         523  avgt   15  279.466 ± 0.069  ns/op
VectorLoop.convertD2I         523  avgt   15  551.009 ± 7.459  ns/op
VectorLoop.convertF2D         523  avgt   15  276.066 ± 0.117  ns/op
VectorLoop.convertF2L         523  avgt   15  545.108 ± 5.697  ns/op
VectorLoop.convertI2D         523  avgt   15  745.303 ± 0.185  ns/op
VectorLoop.convertI2L         523  avgt   15  260.878 ± 0.044  ns/op
VectorLoop.convertL2F         523  avgt   15  502.016 ± 0.172  ns/op
VectorLoop.convertL2I         523  avgt   15  261.654 ± 3.326  ns/op

After the patch:
Benchmark                (length)  Mode  Cnt    Score   Error  Units
VectorLoop.convertD2F         523  avgt   15  106.975 ± 0.045  ns/op
VectorLoop.convertD2I         523  avgt   15  546.866 ± 9.287  ns/op
VectorLoop.convertF2D         523  avgt   15   82.414 ± 0.340  ns/op
VectorLoop.convertF2L         523  avgt   15  542.235 ± 2.785  ns/op
VectorLoop.convertI2D         523  avgt   15   92.966 ± 1.400  ns/op
VectorLoop.convertI2L         523  avgt   15   79.960 ± 0.528  ns/op
VectorLoop.convertL2F         523  avgt   15  504.712 ± 4.794  ns/op
VectorLoop.convertL2I         523  avgt   15  129.753 ± 0.094  ns/op

perf data on AVX512:
Before the patch:
Benchmark                (length)  Mode  Cnt    Score   Error  Units
VectorLoop.convertD2F         523  avgt   15  282.984 ± 4.022  ns/op
VectorLoop.convertD2I         523  avgt   15  543.080 ± 3.873  ns/op
VectorLoop.convertF2D         523  avgt   15  273.950 ± 0.131  ns/op
VectorLoop.convertF2L         523  avgt   15  539.568 ± 2.747  ns/op
VectorLoop.convertI2D         523  avgt   15  745.238 ± 0.069  ns/op
VectorLoop.convertI2L         523  avgt   15  260.935 ± 0.169  ns/op
VectorLoop.convertL2F         523  avgt   15  501.870 ± 0.359  ns/op
VectorLoop.convertL2I         523  avgt   15  257.508 ± 0.174  ns/op

After the patch:
Benchmark                (length)  Mode  Cnt    Score   Error  Units
VectorLoop.convertD2F         523  avgt   15   76.687 ± 0.530  ns/op
VectorLoop.convertD2I         523  avgt   15  545.408 ± 4.657  ns/op
VectorLoop.convertF2D         523  avgt   15  273.935 ± 0.099  ns/op
VectorLoop.convertF2L         523  avgt   15  540.534 ± 3.032  ns/op
VectorLoop.convertI2D         523  avgt   15  745.234 ± 0.053  ns/op
VectorLoop.convertI2L         523  avgt   15  260.865 ± 0.104  ns/op
VectorLoop.convertL2F         523  avgt   15   63.834 ± 4.777  ns/op
VectorLoop.convertL2I         523  avgt   15   48.183 ± 0.990  ns/op


Progress

  • Change must be properly reviewed (1 review required, with at least 1 Reviewer)
  • Change must not contain extraneous whitespace
  • Commit message must refer to an issue

Issue

  • JDK-8283091: Support type conversion between different data sizes in SLP

Reviewers

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk pull/7806/head:pull/7806
$ git checkout pull/7806

Update a local copy of the PR:
$ git checkout pull/7806
$ git pull https://git.openjdk.org/jdk pull/7806/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 7806

View PR using the GUI difftool:
$ git pr show -t 7806

Using diff file

Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/7806.diff

@bridgekeeper

bridgekeeper bot commented Mar 14, 2022

👋 Welcome back fgao! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

@openjdk openjdk bot added the rfr Pull request is ready for review label Mar 14, 2022
@openjdk

openjdk bot commented Mar 14, 2022

@fg1417 The following label will be automatically applied to this pull request:

  • hotspot-compiler

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing list. If you would like to change these labels, use the /label pull request command.

@openjdk openjdk bot added the hotspot-compiler hotspot-compiler-dev@openjdk.org label Mar 14, 2022
@mlbridge

mlbridge bot commented Mar 14, 2022

Webrevs

@DamonFool
Member

Good job!

Did you forget the micro-benchmark VectorLoop.java?

@fg1417
Author

fg1417 commented Mar 14, 2022

Did you forget the micro-benchmark VectorLoop.java?

Thanks for your review. Sorry for missing it. I'll add it ASAP :-)

Fei Gao added 2 commits March 15, 2022 08:00
Change-Id: I674581135fd0844accc65520574fcef161eededa
Change-Id: I3c741255804ce410c8b6dcbdec974fa2c9051fd8
@fg1417
Author

fg1417 commented Mar 15, 2022

Did you forget the micro-benchmark VectorLoop.java?

Thanks. Done.

@fg1417
Author

fg1417 commented Mar 28, 2022

The patch supports type conversion between different data sizes, like ConvL2F and ConvL2I. Could anybody help review it? Thanks :)

@DamonFool
Member

Can you explain why convertD2I becomes much slower on NEON after this patch?
Thanks.

@DamonFool
Member

There seem to be conflicts with jdk master, so please merge in the latest version.
Thanks.

@fg1417
Author

fg1417 commented Apr 8, 2022

Can you explain why convertD2I becomes much slower on NEON after this patch? Thanks.

Thanks for your review.

On NEON, there is no real vector instruction to do the conversion from double to int. The match rule

instruct vcvt2Dto2I(vecD dst, vecX src)

is implemented with 5 instructions, costing more than the scalar instructions, since there are only two elements for VectorCastD2I on a 128-bit NEON machine.

@DamonFool
Member

costing more than scalar instructions, as we know that there are only two elements for VectorCastD2I on 128-bit NEON machine.

So shall we disable vcvt2Dto2I for NEON?

@fg1417
Author

fg1417 commented Apr 8, 2022

So shall we disable vcvt2Dto2I for NEON?

I'm afraid we can't. We still need to support it in VectorAPI.

Fei Gao added 2 commits April 27, 2022 08:05
Change-Id: I1dfb4a6092302267e3796e08d411d0241b23df83
Change-Id: I8deeae48449f1fc159c9bb5f82773e1bc6b5105f
@vnkozlov
Contributor

@fg1417 Thank you for suggesting this optimization. I see that it was not updated for some time. Do you still intend to work on it?

Please update to the latest JDK (Loom was integrated) and run performance again. Also include the % of changes.

I have the same concern as @DamonFool about regressions when vectorizing some conversions. Maybe we should have an additional Matcher property we could consult when trying to auto-vectorize. I understand that we need vcvt2Dto2I when the VectorAPI specifically asks to generate it, but we should not enforce auto-generation.

@vnkozlov
Contributor

And separate thank you for updating tests to verify correctness of this optimization!

@fg1417
Author

fg1417 commented May 25, 2022

Please update to the latest JDK (Loom was integrated) and run performance again. Also include the % of changes.

I have the same concern as @DamonFool about regressions when vectorizing some conversions. Maybe we should have an additional Matcher property we could consult when trying to auto-vectorize. I understand that we need vcvt2Dto2I when the VectorAPI specifically asks to generate it, but we should not enforce auto-generation.

@vnkozlov thanks for your review and kind suggestion! I'll update the patch to resolve the potential performance regression.

@vnkozlov
Contributor

@fg1417 I don't see a new update in this PR. Please also show performance numbers with the new changes.

Fei Gao added 2 commits June 2, 2022 12:20
Change-Id: Ieb9a530571926520e478657159d9eea1b0f8a7dd
…tch rules

Change-Id: I8dcfae69a40717356757396faa06ae2d6015d701
@fg1417
Author

fg1417 commented Jun 2, 2022

@fg1417 I don't see a new update in this PR. Please also show performance numbers with the new changes.

Here is the perf uplift data (ns/op) on different machines for the latest patch.

NEON perf change (ns/op)
convertB2D not supported
convertB2F -45.55%
convertB2L not supported
convertD2B not supported
convertD2F -42.32%
convertD2I not supported (VectorAPI supported)
convertD2S not supported
convertF2B -42.95%
convertF2D -45.28%
convertF2L -5.78%
convertF2S -51.30%
convertI2D -27.82%
convertI2L -44.54%
convertL2B not supported
convertL2F not supported (VectorAPI supported)
convertL2I -28.58%
convertL2S not supported
convertS2D not supported
convertS2F -53.37%
convertS2L not supported

SVE perf change (ns/op)
convertB2D -36.15%
convertB2F -63.48%
convertB2L -32.48%
convertD2B 0.02%
convertD2F -47.85%
convertD2I -46.42%
convertD2S -32.08%
convertF2B -59.54%
convertF2D -60.81%
convertF2L -61.81%
convertF2S -67.67%
convertI2D -60.63%
convertI2L -57.23%
convertL2B 0.04%
convertL2F -47.21%
convertL2I -34.49%
convertL2S -19.57%
convertS2D -47.20%
convertS2F -74.86%
convertS2L -49.00%

X86 perf change (ns/op)
convertB2D -64.13%
convertB2F -79.37%
convertB2L -70.97%
convertD2B not supported
convertD2F -62.69%
convertD2I not supported
convertD2S not supported
convertF2B not supported
convertF2D -68.90%
convertF2L not supported
convertF2S not supported
convertI2D -87.48%
convertI2L -69.64%
convertL2B -3.96%
convertL2F -0.11%
convertL2I -49.59%
convertL2S -24.75%
convertS2D -84.35%
convertS2F -86.09%
convertS2L -70.42%

AVX512 perf change (ns/op)
convertB2D -78.08%
convertB2F -86.39%
convertB2L -79.07%
convertD2B not supported
convertD2F -71.86%
convertD2I not supported
convertD2S not supported
convertF2B not supported
convertF2D -78.17%
convertF2L not supported
convertF2S not supported
convertI2D -90.26%
convertI2L -79.92%
convertL2B -70.75%
convertL2F -86.67%
convertL2I -80.94%
convertL2S -71.54%
convertS2D -90.84%
convertS2F -83.94%
convertS2L -80.51%

@fg1417
Author

fg1417 commented Jun 2, 2022

@fg1417 I don't see a new update in this PR. Please also show performance numbers with the new changes.

@vnkozlov I updated the patch to resolve the potential performance regression and also updated the performance numbers in the comment.

But the patch depends on the fix "8287517: C2: assert(vlen_in_bytes == 64) failed: 2" by sviswa7 (Pull Request #8961, openjdk/jdk), because when we enable the type conversion and loop induction at the same time, we can vectorize more scenarios, like static void test_rint(double[] a0, double[] a1). When we set MaxVectorSize=16, that case would fail. All jtreg tests passed except that one.

Please help review. Thanks.

@vnkozlov
Contributor

vnkozlov commented Jun 6, 2022

@fg1417 Please do not rebase or force-push to an active PR as it invalidates existing review comments. All changes will be squashed into a single commit automatically when integrating. See OpenJDK Developers’ Guide for more information.

May I ask if I did anything wrong? I just rebased on master, resolved the conflict and pushed a new commit as the guide says... and did not do any force-push... Why did I get the notification this time?

Which git command did you use? It is recommended to use git merge master in the PR's branch after the master branch is updated (8. Merge the latest changes):

Avoid rebasing changes, and prefer merging instead.

@vnkozlov
Contributor

vnkozlov commented Jun 6, 2022

But only the webrev was affected, so it is fine.

@vnkozlov vnkozlov left a comment

The latest changes look good. I have one comment about the method name.

These changes are related to #8877, which also limits vectors in the auto-vectorizer. #8877 has priority since it fixed that issue.

Also, the JDK 19 fork is coming in 3 days: https://openjdk.java.net/projects/jdk/19/

I would advise you to push these changes into JDK 20 after the fork.

I will start my testing and do approval based on results.

@@ -325,6 +325,10 @@ class Matcher : public PhaseTransform {
// should generate this one.
static const bool match_rule_supported(int opcode);

// Identify extra cases that we might want to vectorize automatically
// and exclude cases which are not profitable to auto-vectorize.
static const bool match_rule_supported_vectorization(int opcode, int vlen, BasicType bt);
Contributor

Maybe we should rename it to match_rule_supported_superword. That is what we call the auto-vectorizer in C2.

Author

Maybe we should rename it to match_rule_supported_superword. That is what we call the auto-vectorizer in C2.

Done.

And update after #8877. Thanks.

Fei Gao added 2 commits June 9, 2022 02:11
Change-Id: I3ef746178c07004cc34c22081a3044fb40e87702
Change-Id: Ie1907f86e2df7051aa2ddb7e5b05a371e887d1bc
@vnkozlov vnkozlov left a comment

Good.

I submitted testing.

@vnkozlov
Contributor

vnkozlov commented Jun 9, 2022

Please consider adding an IR framework test to make sure the expected vector nodes are generated.

@vnkozlov vnkozlov left a comment

Testing results are good.

@openjdk

openjdk bot commented Jun 9, 2022

@fg1417 This change now passes all automated pre-integration checks.

ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details.

After integration, the commit message for the final commit will be:

8283091: Support type conversion between different data sizes in SLP

Reviewed-by: kvn, sviswanathan

You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed.

At the time when this comment was updated there had been 45 new commits pushed to the master branch:

  • f7ba3b7: 8288300: AArch64: Remove the assertion in fmovs/fmovd(FloatRegister, FloatRegister)
  • 0761228: 8288443: Simplify vmClasses::resolve_all()
  • 9ff4034: 8288530: ProblemList serviceability/jvmti/VMObjectAlloc/VMObjectAllocTest.java in -Xcomp mode
  • 1855e9d: 8220732: setSeed(long) java api doc is missing warning about provided seed quality
  • 6d59561: 8288425: Footprint regression due MH creation when initializing StringConcatFactory
  • 3475e12: 8288330: Avoid redundant ConcurrentHashMap.get call in Http2ClientImpl.deleteConnection
  • cb5ef3d: 8288499: Restore cancel-in-progress in GHA
  • 13d4ddc: 8286962: java/net/httpclient/ServerCloseTest.java failed once with ConnectException
  • 6633855: 8288399: MacOS debug symbol files not always deterministic in reproducible builds
  • d5cd2f2: 8284849: Add deoptimization to unified logging
  • ... and 35 more: https://git.openjdk.org/jdk/compare/f1143b1b57683665c81d24ff192a9babc30f28ea...master

As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details.

As you do not have Committer status in this project an existing Committer must agree to sponsor your change. Possible candidates are the reviewers of this PR (@vnkozlov, @sviswa7) but any other Committer may sponsor as well.

➡️ To flag this PR as ready for integration with the above commit message, type /integrate in a new comment. (Afterwards, your sponsor types /sponsor in a new comment to perform the integration).

@openjdk openjdk bot added the ready Pull request is ready to be integrated label Jun 9, 2022
Fei Gao added 2 commits June 14, 2022 01:35
Change-Id: I9525ae9310c3c493da29490d034cbb8f223e7f80
Change-Id: Ifbcc8d233aa27dfe93acef548c7e42721d86376e
@fg1417
Author

fg1417 commented Jun 14, 2022

Please consider adding an IR framework test to make sure the expected vector nodes are generated.

Added an IR framework testcase and updated to the latest JDK.

Thanks for your review @vnkozlov .

@vnkozlov
Contributor

Thank you for the test. @sviswa7 can you review the latest changes so we have 2 reviews?

@sviswa7

sviswa7 commented Jun 15, 2022

Looks good to me.

@fg1417
Author

fg1417 commented Jun 16, 2022

Thanks for your review @sviswa7 @vnkozlov .

Can I integrate it now?

@vnkozlov
Contributor

Thanks for your review @sviswa7 @vnkozlov .

Can I integrate it now?

Yes

@fg1417
Author

fg1417 commented Jun 16, 2022

/integrate

@openjdk openjdk bot added the sponsor Pull request is ready to be sponsored label Jun 16, 2022
@openjdk

openjdk bot commented Jun 16, 2022

@fg1417
Your change (at version 49e6f56) is now ready to be sponsored by a Committer.

@pfustc
Member

pfustc commented Jun 16, 2022

/sponsor

@openjdk

openjdk bot commented Jun 16, 2022

Going to push as commit a179590.
Since your change was applied there have been 45 commits pushed to the master branch:

  • f7ba3b7: 8288300: AArch64: Remove the assertion in fmovs/fmovd(FloatRegister, FloatRegister)
  • 0761228: 8288443: Simplify vmClasses::resolve_all()
  • 9ff4034: 8288530: ProblemList serviceability/jvmti/VMObjectAlloc/VMObjectAllocTest.java in -Xcomp mode
  • 1855e9d: 8220732: setSeed(long) java api doc is missing warning about provided seed quality
  • 6d59561: 8288425: Footprint regression due MH creation when initializing StringConcatFactory
  • 3475e12: 8288330: Avoid redundant ConcurrentHashMap.get call in Http2ClientImpl.deleteConnection
  • cb5ef3d: 8288499: Restore cancel-in-progress in GHA
  • 13d4ddc: 8286962: java/net/httpclient/ServerCloseTest.java failed once with ConnectException
  • 6633855: 8288399: MacOS debug symbol files not always deterministic in reproducible builds
  • d5cd2f2: 8284849: Add deoptimization to unified logging
  • ... and 35 more: https://git.openjdk.org/jdk/compare/f1143b1b57683665c81d24ff192a9babc30f28ea...master

Your commit was automatically rebased without conflicts.

@openjdk openjdk bot added the integrated Pull request has been integrated label Jun 16, 2022
@openjdk openjdk bot closed this Jun 16, 2022
@openjdk openjdk bot removed ready Pull request is ready to be integrated rfr Pull request is ready for review sponsor Pull request is ready to be sponsored labels Jun 16, 2022
@openjdk

openjdk bot commented Jun 16, 2022

@pfustc @fg1417 Pushed as commit a179590.

💡 You may see a message that your pull request was closed with unmerged commits. This can be safely ignored.

@TobiHartmann
Member

This change introduced a regression, see JDK-8290910.
