8292758: put support for UNSIGNED5 format into its own header file #10067

Closed
wants to merge 5 commits into from

Conversation

rose00
Contributor

@rose00 rose00 commented Aug 29, 2022

Refactor code from inside of CompressedStream into its own unit.

This code is likely to be used in future refactorings, such as JDK-8292818 (replace 96-bit representation for field metadata with variable-sized streams).

Add gtests.


Progress

  • Change must be properly reviewed (1 review required, with at least 1 Reviewer)
  • Change must not contain extraneous whitespace
  • Commit message must refer to an issue

Issue

  • JDK-8292758: put support for UNSIGNED5 format into its own header file

Reviewers

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk pull/10067/head:pull/10067
$ git checkout pull/10067

Update a local copy of the PR:
$ git checkout pull/10067
$ git pull https://git.openjdk.org/jdk pull/10067/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 10067

View PR using the GUI difftool:
$ git pr show -t 10067

Using diff file

Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/10067.diff

@bridgekeeper

bridgekeeper bot commented Aug 29, 2022

👋 Welcome back jrose! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

@openjdk

openjdk bot commented Aug 29, 2022

@rose00 The following labels will be automatically applied to this pull request:

  • hotspot
  • serviceability

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing lists. If you would like to change these labels, use the /label pull request command.

@openjdk openjdk bot added the serviceability (serviceability-dev@openjdk.org) and hotspot (hotspot-dev@openjdk.org) labels Aug 29, 2022
@openjdk

openjdk bot commented Aug 31, 2022

@rose00 this pull request can not be integrated into master due to one or more merge conflicts. To resolve these merge conflicts and update this pull request you can run the following commands in the local repository for your personal fork:

git checkout compressed-stream
git fetch https://git.openjdk.org/jdk master
git merge FETCH_HEAD
# resolve conflicts and follow the instructions given by git merge
git commit -m "Merge master"
git push

@openjdk openjdk bot added the merge-conflict (Pull request has merge conflict with target branch) label Aug 31, 2022
@openjdk openjdk bot removed the merge-conflict (Pull request has merge conflict with target branch) label Aug 31, 2022
@rose00
Contributor Author

rose00 commented Sep 1, 2022

This code passes tiers 1,2,3.

@rose00 rose00 marked this pull request as ready for review September 1, 2022 23:53
@openjdk openjdk bot added the rfr (Pull request is ready for review) label Sep 1, 2022
@mlbridge

mlbridge bot commented Sep 2, 2022

Webrevs

@openjdk-notifier

@rose00 Please do not rebase or force-push to an active PR as it invalidates existing review comments. All changes will be squashed into a single commit automatically when integrating. See OpenJDK Developers’ Guide for more information.

@rose00
Contributor Author

rose00 commented Sep 2, 2022

The new header file presents the encoding algorithm by means of templates.
The template arguments in general are:

  • ARY - a logical base address for reads and writes of bytes
  • OFF - an integral type (of any size or signed-ness) providing an offset to ARY
  • GET and SET - function-like arguments (e.g., lambdas) which get or set bytes from an address logically of the form a[i] shaped like ARY[OFF]
  • GFN - a lambda used when the application requires on-the-fly resizing of an output buffer (of type ARY)

Defaults are set in such a way that any C++ types that natively support a[i] can be fully inferred, including the get/set behaviors.

In addition, there are small "gadgets" for reading a series of ints from a buffer, writing a series to a buffer, and sizing a series (which is faster than writing or reading). These are not yet used. However, prototyping of further use cases for this compression (particularly, FieldInfo) makes it clear that these are repeated tasks that "canned" templates will help with.
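To make the template shape described above concrete, here is a minimal sketch. It is not the header from this PR: the names `U5Sketch`, `ArrayGetSet`, and `roundtrip`, the exact signatures, and the constants (64 continuation codes, one excluded byte value) are assumptions made for illustration. It shows how defaulted GET/SET arguments let any type that supports a[i] be inferred, while a lambda can be supplied for anything else.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Illustrative stand-in only; names, signatures and constants are assumptions.
struct U5Sketch {
  static constexpr uint32_t X = 1;            // excluded byte values (only '\0')
  static constexpr uint32_t lg_H = 6;         // assumed: 64 continuation codes
  static constexpr uint32_t H = 1u << lg_H;
  static constexpr uint32_t L = 256 - X - H;  // 191 terminal codes
  static constexpr int MAX_LENGTH = 5;        // an encoding is 1 to 5 bytes

  // Default accessor for any ARR that natively supports a[i].
  template<typename ARR>
  struct ArrayGetSet {
    template<typename OFF>
    uint8_t operator()(const ARR& a, OFF i) const { return a[i]; }
    template<typename OFF>
    void operator()(ARR& a, OFF i, uint8_t b) const { a[i] = b; }
  };

  // Writes one value at array[offset...] and advances offset.
  template<typename ARR, typename OFF, typename SET = ArrayGetSet<ARR>>
  static void write_uint(uint32_t value, ARR& array, OFF& offset, SET set = SET()) {
    uint32_t sum = value;
    for (int i = 0; ; i++) {
      if (sum < L || i == MAX_LENGTH - 1) {   // terminal code or last byte
        set(array, offset++, (uint8_t)(X + sum));
        return;
      }
      sum -= L;
      set(array, offset++, (uint8_t)(X + L + (sum % H)));  // continuation code
      sum >>= lg_H;
    }
  }

  // Reads one value from array[offset...] and advances offset.
  template<typename ARR, typename OFF, typename GET = ArrayGetSet<ARR>>
  static uint32_t read_uint(ARR& array, OFF& offset, GET get = GET()) {
    uint32_t sum = 0;
    for (int i = 0; ; i++) {
      const uint32_t b = (uint8_t)get(array, offset++);
      assert(b >= X && "excluded byte in stream");
      sum += (b - X) << (lg_H * i);
      if (b < X + L || i == MAX_LENGTH - 1) return sum;
    }
  }
};

// Usage: default a[i] access for writing, a bounds-checked lambda for reading.
inline uint32_t roundtrip(uint32_t value) {
  std::vector<uint8_t> buf(5);
  size_t wpos = 0;                            // OFF may be any integral type
  U5Sketch::write_uint(value, buf, wpos);
  auto checked_get = [](const std::vector<uint8_t>& a, int i) { return a.at(i); };
  int rpos = 0;
  return U5Sketch::read_uint(buf, rpos, checked_get);
}
```

In this sketch the writer infers `ARR`, `OFF`, and the default setter from its arguments, while the reader takes a different offset type and a custom getter, which is the kind of flexibility the list above describes.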

Comment on lines 83 to 85
const int min_expansion = UNSIGNED5::MAX_LENGTH;
if (nsize < min_expansion*2)
  nsize = min_expansion*2;
Member

@dean-long dean-long Sep 2, 2022

It's not clear if this is needed or just an optimization. Maybe add a comment. Also, using MAX2 might be clearer.
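For reference, the MAX2 form suggested here might look like the sketch below. The `MAX2` template and the `UNSIGNED5` struct are stand-ins defined only so the fragment compiles on its own (HotSpot supplies its own `MAX2`, and `MAX_LENGTH` is assumed to be 5); `clamp_new_size` is a hypothetical wrapper name. The sketch takes no position on whether the floor is required or merely an optimization.

```cpp
#include <cassert>

// Stand-ins so the sketch compiles alone; both are assumptions.
template<typename T> constexpr T MAX2(T a, T b) { return (a > b) ? a : b; }
struct UNSIGNED5 { static constexpr int MAX_LENGTH = 5; };

// Returns the proposed new size, never smaller than 2 * MAX_LENGTH.
inline int clamp_new_size(int nsize) {
  assert(nsize >= 0);
  const int min_expansion = UNSIGNED5::MAX_LENGTH;
  return MAX2(nsize, min_expansion * 2);
}
```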

Member

@dean-long dean-long left a comment

I'm concerned about the "excluded bytes" change. It appears to change the encoding, which would mean that SA would not be able to read streams from earlier JVMs, which may or may not be a concern. I suggest splitting this up into the straight-forward refactor (without the excluded bytes change), then adding excluded bytes as a follow-up.

@rose00
Contributor Author

rose00 commented Sep 3, 2022

I suggest splitting this up into the straight-forward refactor (without the excluded bytes change), then adding excluded bytes as a follow-up.

Yes, that is a slight change. Splitting is not necessary for the reason you mention because this PR includes SA changes. This SA change is tested by several jtreg tests: that is, injecting bugs causes SA tests to fail, and fixing them fixes the tests. This is the case because the compressed stream data structure is used for all stack walking. So if there's a bug, we find it immediately. And the same is true for SA unit tests which perform stack walking. Net result: It's safe, there is no bug, because testing coverage is robust.

The math of the encoding is the same whether there are 255 or 256 byte values available, so the adjustment is very low risk. (The math works for any underlying byte type or size; we choose 8-bit bytes excluding nulls to provide 255 distinct tokens.) The benefit is that the encoding can be accompanied by null termination, which enhances debuggability and resilience against bad counts and bad pointers.
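The point about 255 versus 256 available byte values can be sketched as follows. This is an illustration under assumptions, not the PR's code: the template `U5<X>`, its member names, and `read_until_null` are hypothetical, and the choice of 64 continuation codes is assumed. `X` is the number of excluded low byte values, so `X == 0` uses all 256 bytes and `X == 1` skips `'\0'`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical sketch; X is the number of excluded low byte values.
template<uint32_t X>
struct U5 {
  static constexpr uint32_t lg_H = 6;         // assumed: 64 continuation codes
  static constexpr uint32_t H = 1u << lg_H;
  static constexpr uint32_t L = 256 - X - H;  // terminal codes: 192 or 191

  // Appends the encoding of v; every emitted byte is >= X.
  static void append(std::string& out, uint32_t v) {
    for (int i = 0; ; i++) {
      if (v < L || i == 4) { out.push_back((char)(X + v)); return; }
      v -= L;
      out.push_back((char)(X + L + v % H));
      v >>= lg_H;
    }
  }

  // Decodes one value starting at p[pos] and advances pos.
  static uint32_t read(const char* p, size_t& pos) {
    uint32_t sum = 0;
    for (int i = 0; ; i++) {
      const uint32_t b = (uint8_t)p[pos++];
      sum += (b - X) << (lg_H * i);
      if (b < X + L || i == 4) return sum;
    }
  }
};

// With X == 1 no encoded byte is '\0', so a terminator can replace a count.
inline std::vector<uint32_t> read_until_null(const char* p) {
  assert(p != nullptr);
  std::vector<uint32_t> values;
  size_t pos = 0;
  while (p[pos] != '\0') values.push_back(U5<1>::read(p, pos));
  return values;
}
```

The encode and decode loops are identical for both settings; only the constant `L` changes. With `X == 1` a reader can stop at the first zero byte, which is what makes null termination usable as a guard against bad counts.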

@dean-long
Member

Wouldn't the SA stack walk fail when attaching to a core dump from an earlier JVM that does not exclude nulls?

@plummercj
Contributor

Wouldn't the SA stack walk fail when attaching to a core dump from an earlier JVM that does not exclude nulls?

I'm not sure how important compatibility is with older JVMs. SA is capable of (if properly maintained) connecting to any JVM. You'll see signs of that in the code:

// The threadStatus is only present starting in 1.5
if (threadStatusField != null) {
  Oop holderOop = threadHolderField.getValue(threadOop);
  return (int) threadStatusField.getValue(holderOop);
} else {
  // All we can easily figure out is if it is alive, but that is
  // enough info for a valid unknown status.
  ...
}

However, attempts like this one are ancient, and certainly this one should be removed. There are probably a hundred other reasons why attaching to a 1.5 JVM won't work.

We don't do any compatibility testing with older JVMs. For the most part I'd say SA users should always match SA with the version of the JVM they are targeting. I'd be interested in hearing if anyone feels otherwise.

Contributor

@coleenp coleenp left a comment

Thank you for adding the SA and debug.cpp for debugging this.

src/hotspot/share/code/compressedStream.cpp (resolved)

// T.I.L.
// $ sh ./configure ... --with-gtest=<...>/googletest ...
// $ make exploded-test TEST=gtest:unsigned5
Contributor

What does T.I.L mean? This isn't the only way to run this test, so don't include this.

src/hotspot/share/utilities/unsigned5.cpp (outdated, resolved)
}

// returns the encoded byte length of an unsigned 32-bit int
static constexpr int encoded_length(uint32_t value) {
Contributor

@coleenp coleenp Sep 5, 2022

Should all of these functions be private?

Writer(const ARR& array)
: _array(const_cast<ARR&>(array)), _limit_ptr(NULL)
// note: if _limit_ptr is NULL, the ARR& is never reassigned
{ limit_init(); _position = 0; }
Contributor

indent needed.


// reports the largest uint32_t value that can be encoded using len bytes
// len must be in the range [1..5]
static constexpr uint32_t max_encoded_in_length(uint32_t len) {
Contributor

@coleenp coleenp Sep 7, 2022

A few of the classes make the gtest test a friend so that functions that are internal to unsigned5 can be made private. The test would have to call these functions in a class though, and not directly in the test cases. Maybe this is ok.

Contributor

@coleenp coleenp left a comment

This looks good to me, for the current CompressedStream and for upcoming changes.

@openjdk

openjdk bot commented Sep 7, 2022

@rose00 This change now passes all automated pre-integration checks.

ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details.

After integration, the commit message for the final commit will be:

8292758: put support for UNSIGNED5 format into its own header file

Reviewed-by: dlong, coleenp

You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed.

At the time when this comment was updated there had been 78 new commits pushed to the master branch:

  • fc5f97f: 8293474: RISC-V: Unify the way of moving function pointer
  • 2d13f53: 8293512: ProblemList serviceability/tmtools/jstat/GcNewTest.java in -Xcomp mode
  • f84386c: 8293182: Improve testing of CDS archive heap
  • 51de765: 8283010: serviceability/sa/ClhsdbThread.java failed with "'Base of Stack:' missing from stdout/stderr "
  • 8a48965: 8293514: ProblemList gc/metaspace/TestMetaspacePerfCounters.java#Epsilon-64 on all platforms
  • 1e031e6: 8293232: Fix race condition in pkcs11 SessionManager
  • 1080c4e: 8293508: ProblemList gc/metaspace/TestMetaspacePerfCounters.java#Epsilon-64
  • aff9a69: 8283224: Remove THREAD_NOT_ALIVE from possible JDWP error codes
  • 76df73b: 8293456: runtime/os/TestTracePageSizes.java sub-tests fail with "AssertionError: No memory range found for address: NNNN"
  • 32c7b62: 8293146: Strict DateTimeFormatter fails to report an invalid week 53
  • ... and 68 more: https://git.openjdk.org/jdk/compare/0fb9469d93bffd662848b63792406717f7b4ec0d...master

As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details.

➡️ To integrate this PR with the above commit message to the master branch, type /integrate in a new comment.

@openjdk openjdk bot added the ready (Pull request is ready to be integrated) label Sep 7, 2022
@rose00
Contributor Author

rose00 commented Sep 8, 2022

@dean-long and @coleenp Thank you for the reviews.

The public functions in UNSIGNED5 are not internal but are designed to be useful by themselves. That is why the gtest tests them separately. For example, check_length is used inside the Reader but if you are building your own reader-like logic, you might want to use it to avoid buffer overflows. Likewise fits_in_limit is used by the sizing logic of write_uint_grow but if you are rolling your own auto-grow logic, you would want to call that function on its own, or maybe encoded_length. The fancy API points like Reader and Sizer and write_uint_grow are not intended to be primitives, but rather best practices for using the static functions, which are the real primitives.
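As an illustration of using the static functions as primitives, here is a sketch of hand-rolled auto-grow logic. It is not the code from this PR. The signatures of `encoded_length` and `max_encoded_in_length` follow the snippets quoted in the review above (minus `constexpr`), but their bodies, the parameters of `fits_in_limit`, the helper `append_growing`, the namespace `u5_sketch`, and the constants (64 continuation codes, one excluded byte value) are all assumptions.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

namespace u5_sketch {  // illustrative stand-in, not the PR's header

constexpr uint32_t X = 1;                       // excluded byte values ('\0')
constexpr uint32_t lg_H = 6;                    // assumed: 64 continuation codes
constexpr uint32_t H = 1u << lg_H;
constexpr uint32_t L = 256 - X - H;             // 191 terminal codes
constexpr uint32_t MAX_LENGTH = 5;

// Largest value that can be encoded using len bytes; len must be in [1..5].
inline uint32_t max_encoded_in_length(uint32_t len) {
  assert(len >= 1 && len <= MAX_LENGTH);
  if (len == MAX_LENGTH) return UINT32_MAX;
  uint32_t total = 0, combos = L;               // combos: encodings of length i+1
  for (uint32_t i = 0; i < len; i++) { total += combos; combos <<= lg_H; }
  return total - 1;
}

// Encoded byte length of value, in [1..5].
inline int encoded_length(uint32_t value) {
  uint32_t len = 1;
  while (len < MAX_LENGTH && value > max_encoded_in_length(len)) len++;
  return (int)len;
}

// Hypothetical parameters: true if value's encoding fits in [pos, limit).
inline bool fits_in_limit(uint32_t value, size_t pos, size_t limit) {
  return pos + (size_t)encoded_length(value) <= limit;
}

// Roll-your-own auto-grow: size first, resize if needed, then write.
inline void append_growing(std::vector<uint8_t>& buf, size_t& pos, uint32_t value) {
  if (!fits_in_limit(value, pos, buf.size())) {
    buf.resize(std::max<size_t>(buf.size() * 2, pos + MAX_LENGTH));
  }
  for (uint32_t i = 0; ; i++) {
    if (value < L || i == MAX_LENGTH - 1) {     // terminal code or last byte
      buf[pos++] = (uint8_t)(X + value);
      return;
    }
    value -= L;
    buf[pos++] = (uint8_t)(X + L + value % H);  // continuation code
    value >>= lg_H;
  }
}

}  // namespace u5_sketch
```

Sizing before writing keeps the bounds decision in one place, so the write loop itself needs no per-byte limit check.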

Testing notes:

This change passes a targeted test that runs test/hotspot/jtreg/compiler/c2/Test6*.java (about 50 tests) with CONF=fastdebug and JTREG="JAVA_OPTIONS=-Xcomp -XX:+TraceDeoptimization". That run is chosen to deoptimize many times (about 39k), so as to exercise the pre-existing stack walk logic, which relies on compressed streams. Fault injection confirms that any JVM run crashes immediately if there is a bug in the encoding.

It also passes tiers 1/2/3 (in a slightly earlier version) and all gtests (in the latest version).

@rose00
Contributor Author

rose00 commented Sep 8, 2022

/integrate

@openjdk

openjdk bot commented Sep 8, 2022

Going to push as commit 8d3399b.
Since your change was applied there have been 80 commits pushed to the master branch:

  • 6677227: 8293497: Build failure due to MaxVectorSize was not declared when C2 is disabled after JDK-8293254
  • 986b834: 8293489: Accept CAs with BasicConstraints without pathLenConstraint
  • fc5f97f: 8293474: RISC-V: Unify the way of moving function pointer
  • 2d13f53: 8293512: ProblemList serviceability/tmtools/jstat/GcNewTest.java in -Xcomp mode
  • f84386c: 8293182: Improve testing of CDS archive heap
  • 51de765: 8283010: serviceability/sa/ClhsdbThread.java failed with "'Base of Stack:' missing from stdout/stderr "
  • 8a48965: 8293514: ProblemList gc/metaspace/TestMetaspacePerfCounters.java#Epsilon-64 on all platforms
  • 1e031e6: 8293232: Fix race condition in pkcs11 SessionManager
  • 1080c4e: 8293508: ProblemList gc/metaspace/TestMetaspacePerfCounters.java#Epsilon-64
  • aff9a69: 8283224: Remove THREAD_NOT_ALIVE from possible JDWP error codes
  • ... and 70 more: https://git.openjdk.org/jdk/compare/0fb9469d93bffd662848b63792406717f7b4ec0d...master

Your commit was automatically rebased without conflicts.

@openjdk openjdk bot added the integrated (Pull request has been integrated) label Sep 8, 2022
@openjdk openjdk bot closed this Sep 8, 2022
@openjdk openjdk bot removed the ready (Pull request is ready to be integrated) and rfr (Pull request is ready for review) labels Sep 8, 2022
@openjdk

openjdk bot commented Sep 8, 2022

@rose00 Pushed as commit 8d3399b.

💡 You may see a message that your pull request was closed with unmerged commits. This can be safely ignored.

@rose00 rose00 deleted the compressed-stream branch September 8, 2022 07:42
Labels
hotspot (hotspot-dev@openjdk.org), integrated (Pull request has been integrated), serviceability (serviceability-dev@openjdk.org)
4 participants