8283091: Support type conversion between different data sizes in SLP #7806
Conversation
Good job! Did you forget the micro-benchmark VectorLoop.java?
Thanks for your review. Sorry for missing it. I'll add it ASAP :-)
Thanks. Done.
The patch is to support type conversion between different data sizes like ConvL2F and ConvL2I. Anybody can help review it? Thanks :) |
Can you explain why |
There seem to be conflicts with jdk master, so please merge with the latest version.
Thanks for your review. On NEON, there are no real vector instructions to do the conversion from double to int; it is implemented with 5 instructions (see jdk/src/hotspot/cpu/aarch64/aarch64_neon.ad, line 512, at commit 8c18705).
So shall we disable |
I'm afraid we can't. We still need to support it in VectorAPI. |
@fg1417 Thank you for suggesting this optimization. I see that it has not been updated for some time. Do you still intend to work on it? Please update to the latest JDK (Loom was integrated) and run performance again. Also include the % of changes. I have the same concern as @DamonFool about regression when vectorizing some conversions. Maybe we should have an additional |
And separate thank you for updating tests to verify correctness of this optimization! |
@vnkozlov thanks for your review and kind suggestion! I'll update the patch to resolve the potential performance regression. |
@fg1417 I don't see a new update in this PR. Please also show performance numbers with the new changes.
Here is the perf uplift data (ns/op) on different machines for the latest patch: NEON perf change (ns/op), SVE perf change (ns/op), X86 perf change (ns/op), AVX512 perf change (ns/op).
@vnkozlov I updated the patch to resolve the potential performance regression and also updated the performance numbers in the comment. But the patch depends on the fix in 8287517: C2: assert(vlen_in_bytes == 64) failed (Pull Request #8961 by sviswa7, openjdk/jdk), because when we enable the type conversion and loop induction at the same time, we can vectorize more scenarios. With MaxVectorSize=16, the case here would fail. All jtreg tests passed except that one.
Please help review. Thanks.
Which
But only webrev was affected so it is fine. |
Latest changes look good. I have one comment about method name.
These changes are related to #8877, which also limits vectors in the auto-vectorizer. #8877 has priority since it fixed that issue.
Also, the JDK 19 fork is coming in 3 days: https://openjdk.java.net/projects/jdk/19/
I would advise you to push these changes into JDK 20 after the fork.
I will start my testing and do approval based on results.
src/hotspot/share/opto/matcher.hpp (Outdated)
@@ -325,6 +325,10 @@ class Matcher : public PhaseTransform {
  // should generate this one.
  static const bool match_rule_supported(int opcode);

  // Identify extra cases that we might want to vectorize automatically
  // And exclude cases which are not profitable to auto-vectorize.
  static const bool match_rule_supported_vectorization(int opcode, int vlen, BasicType bt);
Maybe we should rename it to match_rule_supported_superword. That is how we call the auto-vectorizer in C2.
Done.
And updated after #8877. Thanks.
Good.
I submitted testing.
Please consider adding an IR framework test to make sure the expected vector nodes are generated.
Testing results are good.
@fg1417 This change now passes all automated pre-integration checks. ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details. After integration, the commit message for the final commit will be:
You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed. At the time when this comment was updated there had been 45 new commits pushed to the
As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details. As you do not have Committer status in this project an existing Committer must agree to sponsor your change. Possible candidates are the reviewers of this PR (@vnkozlov, @sviswa7) but any other Committer may sponsor as well. ➡️ To flag this PR as ready for integration with the above commit message, type |
Added an IR framework testcase and updated to the latest JDK. Thanks for your review @vnkozlov.
Thank you for the test. @sviswa7, can you review the latest changes so we have 2 reviews?
Looks good to me.
/integrate
/sponsor
Going to push as commit a179590.
Your commit was automatically rebased without conflicts.
This change introduced a regression, see JDK-8290910.
After JDK-8275317, C2's SLP vectorizer has supported type conversion between the same data size. We can also support conversions between different data sizes like:
int <-> double
float <-> long
int <-> long
float <-> double
A typical test case:
int[] a;
double[] b;
for (int i = start; i < limit; i++) {
b[i] = (double) a[i];
}
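For reference, here is a self-contained, runnable version of the loop above (the start/limit bounds are replaced with full-array bounds purely for illustration):

```java
import java.util.Arrays;

public class ConvI2DLoop {
    // The int -> double conversion loop from the description above,
    // a candidate for SLP auto-vectorization with this patch.
    static double[] convert(int[] a) {
        double[] b = new double[a.length];
        for (int i = 0; i < a.length; i++) {
            b[i] = (double) a[i];
        }
        return b;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(convert(new int[]{1, 2, 3})));
    }
}
```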
Our expected OptoAssembly code for one iteration is like below:
add R12, R2, R11, LShiftL #2
vector_load V16,[R12, #16]
vectorcast_i2d V16, V16 # convert I to D vector
add R11, R1, R11, LShiftL #3 # ptr
add R13, R11, #16 # ptr
vector_store [R13], V16
To enable the vectorization, the patch solves the following problems in the SLP.
There are three main operations in the case above: LoadI, ConvI2D and StoreD. Assuming that the vector length is 128 bits, how many scalar nodes should be packed together into a vector? If we decide it separately for each operation node, as we did before the patch in SuperWord::combine_packs(), a 128-bit vector will hold 4 LoadI, 2 ConvI2D or 2 StoreD nodes. However, if we put these packed nodes into a vector node sequence, like loading 4 elements to a vector, then typecasting 2 elements and lastly storing those 2 elements, the sequence becomes invalid. As a result, we should look through the whole def-use chain
and then pick the minimum of these vector sizes (in element counts), as the function SuperWord::max_vector_size_in_ud_chain() does in superword.cpp. In this case, we pack 2 LoadI, 2 ConvI2D and 2 StoreD nodes, and then generate a valid vector node sequence: loading 2 elements, converting the 2 elements to another type and storing the 2 elements with the new type.
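As a hypothetical illustration of the idea (a sketch in Java, not HotSpot's actual C++ code), the common pack size is limited by the largest element size anywhere in the def-use chain:

```java
import java.util.List;

public class PackSizeSketch {
    // Sketch of the idea behind SuperWord::max_vector_size_in_ud_chain():
    // with a fixed hardware vector width, the common pack size is the
    // minimum lane count supported by any node in the def-use chain,
    // i.e. it is limited by the largest element size in the chain.
    static int maxPackSize(int vectorWidthBytes, List<Integer> elemSizesInChain) {
        int maxElemSize = elemSizesInChain.stream().max(Integer::compare).orElseThrow();
        return vectorWidthBytes / maxElemSize;
    }

    public static void main(String[] args) {
        // LoadI (4 bytes) -> ConvI2D (8 bytes) -> StoreD (8 bytes), 128-bit vector:
        // 4 LoadI lanes would fit on their own, but the chain limits us to 2 lanes.
        System.out.println(maxPackSize(16, List.of(4, 8, 8))); // 2
    }
}
```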
After this, LoadI nodes no longer make full use of the whole vector and occupy only part of it. So we adapt the code in SuperWord::get_vw_bytes_special() to this situation.
In SLP, we calculate a kind of alignment as a position trace for each scalar node in the whole vector. In this case, the alignments for the 2 LoadI nodes are 0 and 4, while the alignments for the 2 ConvI2D nodes are 0 and 8. Here, 4 for LoadI and 8 for ConvI2D mean the same thing: both mark that the node is the second node in the whole vector, and the difference between 4 and 8 is only due to their own data sizes. In this situation, we should remove the impact caused by different data sizes in SLP. For example, in the stage of SuperWord::extend_packlist(), while determining whether it is possible to pack a pair of def nodes in the function SuperWord::follow_use_defs(), we remove the side effect of different data sizes by rescaling the target alignment taken from the use node. We believe that, assuming the vector length is 512 bits, if the ConvI2D use nodes have alignments of 16 and 24 and their def nodes, LoadI, have alignments of 8 and 12, these two LoadI nodes should be packed as a pair as well.
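The alignment normalization can be sketched as follows (the method name and shape are illustrative, not HotSpot's code): the use node's alignment is rescaled from its own data size to the def node's data size before lane positions are compared.

```java
public class AlignmentScaleSketch {
    // Rescale a use node's alignment (in bytes) from the use's data size
    // to the def's data size, so that the same lane position compares equal
    // even when a conversion changes the element size.
    static int scaleToDef(int useAlignment, int useDataSize, int defDataSize) {
        return useAlignment / useDataSize * defDataSize;
    }

    public static void main(String[] args) {
        // With a 512-bit vector: ConvI2D uses (8-byte lanes) at alignments
        // 16 and 24 correspond to LoadI defs (4-byte lanes) at 8 and 12.
        System.out.println(scaleToDef(16, 8, 4)); // 8
        System.out.println(scaleToDef(24, 8, 4)); // 12
    }
}
```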
Similarly, when determining whether the vectorization is profitable, a type conversion between different data sizes takes a type of one size and produces a type of another size, hence special checks on alignment and size should be applied, as we do in SuperWord::is_vector_use().
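That extra check can be sketched like this (hypothetical names and shape; the real logic lives in SuperWord::is_vector_use()): the def pack and use pack must have the same lane count, and each use lane's alignment, rescaled to the def's data size, must match the corresponding def lane's alignment.

```java
public class VectorUseCheckSketch {
    // Hypothetical sketch: a packed def feeds a packed use across a
    // size-changing conversion only if lane counts match and every use
    // lane maps to the same lane position on the def side.
    static boolean isVectorUse(int[] defAligns, int defSize,
                               int[] useAligns, int useSize) {
        if (defAligns.length != useAligns.length) {
            return false;
        }
        for (int i = 0; i < defAligns.length; i++) {
            if (useAligns[i] / useSize * defSize != defAligns[i]) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // 2 LoadI defs (4-byte lanes) at {0, 4} feeding
        // 2 ConvI2D uses (8-byte lanes) at {0, 8}: valid.
        System.out.println(isVectorUse(new int[]{0, 4}, 4, new int[]{0, 8}, 8)); // true
    }
}
```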
After solving these problems, we successfully implemented the vectorization of type conversion between different data sizes.
Here is the test data (-XX:+UseSuperWord) on NEON:
Before the patch:
Benchmark (length) Mode Cnt Score Error Units
convertD2F 523 avgt 15 216.431 ± 0.131 ns/op
convertD2I 523 avgt 15 220.522 ± 0.311 ns/op
convertF2D 523 avgt 15 217.034 ± 0.292 ns/op
convertF2L 523 avgt 15 231.634 ± 1.881 ns/op
convertI2D 523 avgt 15 229.538 ± 0.095 ns/op
convertI2L 523 avgt 15 214.822 ± 0.131 ns/op
convertL2F 523 avgt 15 230.188 ± 0.217 ns/op
convertL2I 523 avgt 15 162.234 ± 0.235 ns/op
After the patch:
Benchmark (length) Mode Cnt Score Error Units
convertD2F 523 avgt 15 124.352 ± 1.079 ns/op
convertD2I 523 avgt 15 557.388 ± 8.166 ns/op
convertF2D 523 avgt 15 118.082 ± 4.026 ns/op
convertF2L 523 avgt 15 225.810 ± 11.180 ns/op
convertI2D 523 avgt 15 166.247 ± 0.120 ns/op
convertI2L 523 avgt 15 119.699 ± 2.925 ns/op
convertL2F 523 avgt 15 220.847 ± 0.053 ns/op
convertL2I 523 avgt 15 122.339 ± 2.738 ns/op
perf data on X86:
Before the patch:
Benchmark (length) Mode Cnt Score Error Units
convertD2F 523 avgt 15 279.466 ± 0.069 ns/op
convertD2I 523 avgt 15 551.009 ± 7.459 ns/op
convertF2D 523 avgt 15 276.066 ± 0.117 ns/op
convertF2L 523 avgt 15 545.108 ± 5.697 ns/op
convertI2D 523 avgt 15 745.303 ± 0.185 ns/op
convertI2L 523 avgt 15 260.878 ± 0.044 ns/op
convertL2F 523 avgt 15 502.016 ± 0.172 ns/op
convertL2I 523 avgt 15 261.654 ± 3.326 ns/op
After the patch:
Benchmark (length) Mode Cnt Score Error Units
convertD2F 523 avgt 15 106.975 ± 0.045 ns/op
convertD2I 523 avgt 15 546.866 ± 9.287 ns/op
convertF2D 523 avgt 15 82.414 ± 0.340 ns/op
convertF2L 523 avgt 15 542.235 ± 2.785 ns/op
convertI2D 523 avgt 15 92.966 ± 1.400 ns/op
convertI2L 523 avgt 15 79.960 ± 0.528 ns/op
convertL2F 523 avgt 15 504.712 ± 4.794 ns/op
convertL2I 523 avgt 15 129.753 ± 0.094 ns/op
perf data on AVX512:
Before the patch:
Benchmark (length) Mode Cnt Score Error Units
convertD2F 523 avgt 15 282.984 ± 4.022 ns/op
convertD2I 523 avgt 15 543.080 ± 3.873 ns/op
convertF2D 523 avgt 15 273.950 ± 0.131 ns/op
convertF2L 523 avgt 15 539.568 ± 2.747 ns/op
convertI2D 523 avgt 15 745.238 ± 0.069 ns/op
convertI2L 523 avgt 15 260.935 ± 0.169 ns/op
convertL2F 523 avgt 15 501.870 ± 0.359 ns/op
convertL2I 523 avgt 15 257.508 ± 0.174 ns/op
After the patch:
Benchmark (length) Mode Cnt Score Error Units
convertD2F 523 avgt 15 76.687 ± 0.530 ns/op
convertD2I 523 avgt 15 545.408 ± 4.657 ns/op
convertF2D 523 avgt 15 273.935 ± 0.099 ns/op
convertF2L 523 avgt 15 540.534 ± 3.032 ns/op
convertI2D 523 avgt 15 745.234 ± 0.053 ns/op
convertI2L 523 avgt 15 260.865 ± 0.104 ns/op
convertL2F 523 avgt 15 63.834 ± 4.777 ns/op
convertL2I 523 avgt 15 48.183 ± 0.990 ns/op
Reviewing
Using git
Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk pull/7806/head:pull/7806
$ git checkout pull/7806
Update a local copy of the PR:
$ git checkout pull/7806
$ git pull https://git.openjdk.org/jdk pull/7806/head
Using Skara CLI tools
Checkout this PR locally:
$ git pr checkout 7806
View PR using the GUI difftool:
$ git pr show -t 7806
Using diff file
Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/7806.diff