Comparing changes

base repository: openjdk/jdk
base: 230726ea
head repository: openjdk/jdk
compare: 0d731bb2
  • 11 commits
  • 22 files changed
  • 2 contributors

Commits on Mar 14, 2022

  1. 8283091: Support type conversion between different data sizes in SLP

    After JDK-8275317, C2's SLP vectorizer supports type conversions
    between types of the same data size. This patch adds support for
    conversions between different data sizes, such as:
    int <-> double
    float <-> long
    int <-> long
    float <-> double
    
    A typical test case:
    
    int[] a;
    double[] b;
    for (int i = start; i < limit; i++) {
        b[i] = (double) a[i];
    }
    
    The expected OptoAssembly code for one iteration looks like this:
    
    add R12, R2, R11, LShiftL #2
    vector_load   V16,[R12, #16]
    vectorcast_i2d  V16, V16  # convert I to D vector
    add R11, R1, R11, LShiftL #3	# ptr
    add R13, R11, #16	# ptr
    vector_store [R13], V16
    
    To enable this vectorization, the patch solves the following
    problems in SLP.
    
    The case above has three main operations: LoadI, ConvI2D and
    StoreD. Assuming the vector length is 128 bits, how many scalar
    nodes should be packed together into one vector? If we decide this
    separately for each operation node, as SuperWord::combine_packs()
    did before the patch, a 128-bit vector holds 4 LoadI, 2 ConvI2D or
    2 StoreD nodes. However, if we chain such packs into one vector
    node sequence, e.g. loading 4 elements, then converting only 2 of
    them and finally storing those 2, the sequence is invalid. As a
    result, we should look through the whole def-use chain and pick
    the minimum of these element counts, as the function
    SuperWord::max_vector_size_in_ud_chain() in superword.cpp does.
    In this case, we pack 2 LoadI, 2 ConvI2D and 2 StoreD nodes and
    generate a valid vector node sequence: load 2 elements, convert
    the 2 elements to the other type, and store the 2 elements in the
    new type.
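
    To make the pack-size choice concrete, here is a minimal
    standalone C++ sketch of the idea. It is a simplification, not the
    actual HotSpot code: Op and max_elems_in_chain() are hypothetical
    stand-ins for C2 nodes and for
    SuperWord::max_vector_size_in_ud_chain().

    #include <algorithm>
    #include <climits>
    #include <vector>

    // Hypothetical stand-in for a C2 node: each operation records the
    // byte size of the element type it works on.
    struct Op {
      const char* name;
      int elem_bytes;  // e.g. 4 for int, 8 for double
    };

    // Alone, each op could pack vector_bytes / elem_bytes scalars; the
    // whole chain must use the minimum of those counts so that every
    // vector op in the sequence stays valid.
    int max_elems_in_chain(const std::vector<Op>& chain, int vector_bytes) {
      int elems = INT_MAX;
      for (const Op& op : chain) {
        elems = std::min(elems, vector_bytes / op.elem_bytes);
      }
      return elems;
    }

    int main() {
      std::vector<Op> chain = {{"LoadI", 4}, {"ConvI2D", 8}, {"StoreD", 8}};
      // 128-bit vector = 16 bytes: min(16/4, 16/8, 16/8) = 2 elements.
      return max_elems_in_chain(chain, 16) == 2 ? 0 : 1;
    }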
    
    After this, LoadI nodes no longer make full use of the whole
    vector and occupy only part of it, so we adapt the code in
    SuperWord::get_vw_bytes_special() to this situation.
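
    As a worked example of that partial occupancy (numbers from the
    case above; used_vector_bytes() is a hypothetical helper, not the
    actual get_vw_bytes_special() code):

    // Bytes a pack actually occupies in the vector register.
    int used_vector_bytes(int lane_count, int elem_bytes) {
      return lane_count * elem_bytes;
    }
    // used_vector_bytes(2, 4) == 8  : 2 LoadI ints fill only 8 of 16 bytes
    // used_vector_bytes(2, 8) == 16 : 2 ConvI2D/StoreD doubles fill all 16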
    
    In SLP, we calculate a kind of alignment that traces each scalar
    node's position within the whole vector. In this case, the
    alignments of the 2 LoadI nodes are 0 and 4, while the alignments
    of the 2 ConvI2D nodes are 0 and 8. Here, 4 for LoadI and 8 for
    ConvI2D mean the same thing: the node is the second element in
    the whole vector; the difference between 4 and 8 comes only from
    the nodes' own data sizes. In this situation, we should remove
    the impact of differing data sizes in SLP. For example, in the
    stage of SuperWord::extend_packlist(), when determining whether a
    pair of def nodes can be packed in SuperWord::follow_use_defs(),
    we cancel out the data-size difference by rescaling the target
    alignment taken from the use node. We believe that, assuming a
    vector length of 512 bits, if the ConvI2D use nodes have
    alignments 16 and 24 and their def nodes, LoadI, have alignments
    8 and 12, those two LoadI nodes should be packed as a pair as
    well.
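
    A minimal sketch of that rescaling, with a hypothetical helper
    rather than the actual follow_use_defs() code: convert the use
    node's byte alignment into a lane index, then back into bytes
    using the def node's element size.

    // Translate a use node's alignment into the alignment its def
    // node should have: same lane in the vector, different element size.
    int def_alignment_for_use(int use_align, int use_elem_bytes,
                              int def_elem_bytes) {
      int lane = use_align / use_elem_bytes;  // lane index within the vector
      return lane * def_elem_bytes;
    }
    // With 512-bit vectors: ConvI2D uses at alignments 16 and 24 sit in
    // 8-byte lanes 2 and 3, so their LoadI defs belong at 4-byte
    // alignments def_alignment_for_use(16, 8, 4) == 8 and
    // def_alignment_for_use(24, 8, 4) == 12.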
    
    Similarly, when determining whether the vectorization is
    profitable, a type conversion between different data sizes takes
    a type of one size and produces a type of another size, so
    special checks on alignment and size must be applied, as in
    SuperWord::is_vector_use().
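
    As a hedged sketch of the kind of check this implies
    (packs_line_up() is a hypothetical helper, not the actual
    is_vector_use() code): a size-changing conversion is a valid
    vector use only if the def pack has the same lane count as the use
    pack and each def occupies the same lane as its use.

    // Both packs must have the same lane count, and the i-th def must
    // occupy the same lane as the i-th use despite differing sizes.
    bool packs_line_up(const int* use_aligns, int use_elem_bytes,
                       const int* def_aligns, int def_elem_bytes,
                       int lanes) {
      for (int i = 0; i < lanes; i++) {
        if (use_aligns[i] / use_elem_bytes != def_aligns[i] / def_elem_bytes) {
          return false;
        }
      }
      return true;
    }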
    
    After solving these problems, the patch successfully vectorizes
    type conversions between different data sizes.
    
    Benchmark data on NEON:
    
    Before the patch:
      Benchmark              (length)  Mode  Cnt    Score   Error  Units
      VectorLoop.convertD2F       523  avgt   15  216.431 ± 0.131  ns/op
      VectorLoop.convertD2I       523  avgt   15  220.522 ± 0.311  ns/op
      VectorLoop.convertF2D       523  avgt   15  217.034 ± 0.292  ns/op
      VectorLoop.convertF2L       523  avgt   15  231.634 ± 1.881  ns/op
      VectorLoop.convertI2D       523  avgt   15  229.538 ± 0.095  ns/op
      VectorLoop.convertI2L       523  avgt   15  214.822 ± 0.131  ns/op
      VectorLoop.convertL2F       523  avgt   15  230.188 ± 0.217  ns/op
      VectorLoop.convertL2I       523  avgt   15  162.234 ± 0.235  ns/op
    
    After the patch:
      Benchmark              (length)  Mode  Cnt    Score    Error  Units
      VectorLoop.convertD2F       523  avgt   15  124.352 ±  1.079  ns/op
      VectorLoop.convertD2I       523  avgt   15  557.388 ±  8.166  ns/op
      VectorLoop.convertF2D       523  avgt   15  118.082 ±  4.026  ns/op
      VectorLoop.convertF2L       523  avgt   15  225.810 ± 11.180  ns/op
      VectorLoop.convertI2D       523  avgt   15  166.247 ±  0.120  ns/op
      VectorLoop.convertI2L       523  avgt   15  119.699 ±  2.925  ns/op
      VectorLoop.convertL2F       523  avgt   15  220.847 ±  0.053  ns/op
      VectorLoop.convertL2I       523  avgt   15  122.339 ±  2.738  ns/op
    
    Benchmark data on x86:

    Before the patch:
      Benchmark              (length)  Mode  Cnt    Score   Error  Units
      VectorLoop.convertD2F       523  avgt   15  279.466 ± 0.069  ns/op
      VectorLoop.convertD2I       523  avgt   15  551.009 ± 7.459  ns/op
      VectorLoop.convertF2D       523  avgt   15  276.066 ± 0.117  ns/op
      VectorLoop.convertF2L       523  avgt   15  545.108 ± 5.697  ns/op
      VectorLoop.convertI2D       523  avgt   15  745.303 ± 0.185  ns/op
      VectorLoop.convertI2L       523  avgt   15  260.878 ± 0.044  ns/op
      VectorLoop.convertL2F       523  avgt   15  502.016 ± 0.172  ns/op
      VectorLoop.convertL2I       523  avgt   15  261.654 ± 3.326  ns/op
    
    After the patch:
      Benchmark              (length)  Mode  Cnt    Score   Error  Units
      VectorLoop.convertD2F       523  avgt   15  106.975 ± 0.045  ns/op
      VectorLoop.convertD2I       523  avgt   15  546.866 ± 9.287  ns/op
      VectorLoop.convertF2D       523  avgt   15   82.414 ± 0.340  ns/op
      VectorLoop.convertF2L       523  avgt   15  542.235 ± 2.785  ns/op
      VectorLoop.convertI2D       523  avgt   15   92.966 ± 1.400  ns/op
      VectorLoop.convertI2L       523  avgt   15   79.960 ± 0.528  ns/op
      VectorLoop.convertL2F       523  avgt   15  504.712 ± 4.794  ns/op
      VectorLoop.convertL2I       523  avgt   15  129.753 ± 0.094  ns/op
    
    Benchmark data on AVX-512:

    Before the patch:
      Benchmark              (length)  Mode  Cnt    Score   Error  Units
      VectorLoop.convertD2F       523  avgt   15  282.984 ± 4.022  ns/op
      VectorLoop.convertD2I       523  avgt   15  543.080 ± 3.873  ns/op
      VectorLoop.convertF2D       523  avgt   15  273.950 ± 0.131  ns/op
      VectorLoop.convertF2L       523  avgt   15  539.568 ± 2.747  ns/op
      VectorLoop.convertI2D       523  avgt   15  745.238 ± 0.069  ns/op
      VectorLoop.convertI2L       523  avgt   15  260.935 ± 0.169  ns/op
      VectorLoop.convertL2F       523  avgt   15  501.870 ± 0.359  ns/op
      VectorLoop.convertL2I       523  avgt   15  257.508 ± 0.174  ns/op
    
    After the patch:
      Benchmark              (length)  Mode  Cnt    Score   Error  Units
      VectorLoop.convertD2F       523  avgt   15   76.687 ± 0.530  ns/op
      VectorLoop.convertD2I       523  avgt   15  545.408 ± 4.657  ns/op
      VectorLoop.convertF2D       523  avgt   15  273.935 ± 0.099  ns/op
      VectorLoop.convertF2L       523  avgt   15  540.534 ± 3.032  ns/op
      VectorLoop.convertI2D       523  avgt   15  745.234 ± 0.053  ns/op
      VectorLoop.convertI2L       523  avgt   15  260.865 ± 0.104  ns/op
      VectorLoop.convertL2F       523  avgt   15   63.834 ± 4.777  ns/op
      VectorLoop.convertL2I       523  avgt   15   48.183 ± 0.990  ns/op
    
    Change-Id: I93e60fd956547dad9204ceec90220145c58a72ef
    Faye Gao authored and Fei Gao committed Mar 14, 2022
    Commit: c2c1373

Commits on Mar 15, 2022

  1. Merge branch 'master' into fg8283091

    Change-Id: I674581135fd0844accc65520574fcef161eededa
    Fei Gao committed Mar 15, 2022
    Commit: 0ea2853
  2. Add micro-benchmark cases

    Change-Id: I3c741255804ce410c8b6dcbdec974fa2c9051fd8
    Fei Gao committed Mar 15, 2022
    Commit: bf3fc41

Commits on Apr 27, 2022

  1. Merge branch 'master' into fg8283091

    Change-Id: I1dfb4a6092302267e3796e08d411d0241b23df83
    Fei Gao committed Apr 27, 2022
    Commit: cd07555

Commits on May 12, 2022

  1. Merge branch 'master' into fg8283091

    Change-Id: I8deeae48449f1fc159c9bb5f82773e1bc6b5105f
    Fei Gao committed May 12, 2022
    Commit: 66f621c

Commits on Jun 2, 2022

  1. Merge branch 'master' into fg8283091

    Change-Id: Ieb9a530571926520e478657159d9eea1b0f8a7dd
    Fei Gao committed Jun 2, 2022
    Commit: 725e127
  2. Implement an interface for auto-vectorization to consult supported match rules
    
    Change-Id: I8dcfae69a40717356757396faa06ae2d6015d701
    Fei Gao committed Jun 2, 2022
    Commit: 74895bf

Commits on Jun 6, 2022

  1. Merge branch 'master' into fg8283091

    Change-Id: I42bec08da55e86fb1f049bb691138f3fcf6dbed6
    Fei Gao committed Jun 6, 2022
    Commit: 2b08a0a
  2. Add an assertion for opcode() and extract some common code into a function
    
    Change-Id: I7b5dbe60fec6979de454f347d074e6fc01126dfe
    Fei Gao committed Jun 6, 2022
    Commit: cf97e42

Commits on Jun 9, 2022

  1. Merge branch 'master' into fg8283091

    Change-Id: I3ef746178c07004cc34c22081a3044fb40e87702
    Fei Gao committed Jun 9, 2022
    Commit: 49380db
  2. Update to the latest JDK and fix the function name

    Change-Id: Ie1907f86e2df7051aa2ddb7e5b05a371e887d1bc
    Fei Gao committed Jun 9, 2022
    Commit: 0d731bb