Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

8256811: Delayed/missed jdwp class unloading events #635

Closed
wants to merge 4 commits into from

Conversation

adinn
Copy link
Contributor

@adinn adinn commented Aug 17, 2022

Backport of patch to fix class unloading event notification.

This backport was originally developed by @zhengyu123 and involved some minor changes to the original patch.

n.b. the original patch required a follow up fix (cf JDK-8290908) which also needs backporting after this has been included.


Progress

  • Change must be properly reviewed (1 review required, with at least 1 Reviewer)
  • Change must not contain extraneous whitespace
  • Commit message must refer to an issue

Issue

  • JDK-8256811: Delayed/missed jdwp class unloading events

Reviewers

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk17u-dev pull/635/head:pull/635
$ git checkout pull/635

Update a local copy of the PR:
$ git checkout pull/635
$ git pull https://git.openjdk.org/jdk17u-dev pull/635/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 635

View PR using the GUI difftool:
$ git pr show -t 635

Using diff file

Download this PR as a diff file:
https://git.openjdk.org/jdk17u-dev/pull/635.diff

@bridgekeeper
Copy link

bridgekeeper bot commented Aug 17, 2022

👋 Welcome back adinn! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

@openjdk openjdk bot changed the title Backport 54854d9300479c22c416fd9d2fdb5c29fc1884bc 8256811: Delayed/missed jdwp class unloading events Aug 17, 2022
@openjdk
Copy link

openjdk bot commented Aug 17, 2022

This backport pull request has now been updated with issue from the original commit.

@openjdk openjdk bot added backport rfr Pull request is ready for review labels Aug 17, 2022
@mlbridge
Copy link

mlbridge bot commented Aug 17, 2022

Webrevs

@RealCLanger
Copy link
Contributor

RealCLanger commented Aug 17, 2022

@adinn: could you please enable GitHub actions on your repository and trigger a run via the workflow-dispatch event? Same goes for #636. Thanks!

You might as well merge master into your branch and push it after you've enabled GHA. Then we'll get the run automatically.

@zhengyu123
Copy link
Contributor

This fix results a couple of followup CRs: JDK-8290908 and JDK-8291456. JDK-8291456 has yet been resolved, probably this backport should wait till JDK-8291456 is resolved.

@adinn
Copy link
Contributor Author

adinn commented Aug 18, 2022

@RealCLanger The settings for my repo look like they already enable actions. Under Actions/General/Actions Permissions the radio button is selected for Allow all actions and reusable workflows. Do I need to set something else?

What do you mean by 'trigger a run via the workflow-dispatch event'?

@adinn
Copy link
Contributor Author

adinn commented Aug 18, 2022

@zhengyu123 Thanks for the heads-up regarding JDK-8291456. I was not aware there was another dependent issue.

@adinn
Copy link
Contributor Author

adinn commented Aug 18, 2022

@RealCLanger Ah, ok, I think I see now what you mean. I merged master and pushed and I can see that workflows are now running for the updated branch.

Copy link
Member

@gnu-andrew gnu-andrew left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Backport mostly looks clean. The changes in JvmtiExport::post_object_free are appropriate, given the additions in JDK 19+ for virtual threads that are not present in 17.

I'm not 100% on the copyright additions which are not part of the original patch, as they may interfere with other backports. However, I guess they are needed at some point.

Most curious to me is this addition:

+-    JvmtiThreadState *state = thread->as_Java_thread()->jvmti_thread_state();
++    JvmtiThreadState *state = ((JavaThread*)thread)->jvmti_thread_state();

Is there a reason for altering this line? It doesn't seem related to any of the other changes, and, while the direct cast may be more performant than the method call, the existing usage with the method call seems safer.

@adinn
Copy link
Contributor Author

adinn commented Sep 7, 2022

Most curious to me is this addition:

+- JvmtiThreadState *state = thread->as_Java_thread()->jvmti_thread_state();
++ JvmtiThreadState state = ((JavaThread)thread)->jvmti_thread_state();

Is there a reason for altering this line? It doesn't seem related to any of the other changes, and, while the direct cast may be more performant than the method call, the existing usage with the method call seems safer.

No, there is no reason for it other that that it was in the patch Zhengyu originally started working on. Of course, the cast is safe because of the guard in the preceding if test. However, that's not a reason to put it into the backport. I'm not it even helps performance since as_Java_thread() just does a static cast. I'll remove it.

Copy link
Member

@gnu-andrew gnu-andrew left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, the rest looks good.

Note that JDK-8291456 seems to mainly be about the test and has spawned two related test fixes, JDK-8292880 and JDK-8291650, which may be worth a look.

@openjdk
Copy link

openjdk bot commented Sep 12, 2022

@adinn This change now passes all automated pre-integration checks.

ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details.

After integration, the commit message for the final commit will be:

8256811: Delayed/missed jdwp class unloading events

Reviewed-by: andrew

You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed.

At the time when this comment was updated there had been 130 new commits pushed to the master branch:

  • 58b4363: 8251466: test/java/io/File/GetXSpace.java fails on Windows with mapped network drives.
  • 96a5e40: 8281296: Create a regression test for JDK-4515999
  • 4a49294: 8284681: compiler/c2/aarch64/TestFarJump.java fails with "RuntimeException: for CodeHeap < 250MB the far jump is expected to be encoded with a single branch instruction"
  • d4d2e34: 8280872: Reorder code cache segments to improve code density
  • d7a3a9e: 8284752: Zero does not build on Mac OS X due to missing os::current_thread_enable_wx implementation
  • b38b6d5: 8222323: ChildAlwaysOnTopTest.java fails with "RuntimeException: Failed to unset alwaysOnTop"
  • 62eb5ae: 8293826: Closed test fails after JDK-8276108 on aarch64
  • eea78cf: 6829250: Reg test: java/awt/Toolkit/ScreenInsetsTest/ScreenInsetsTest.java fails in Windows
  • 2fbc5cb: 8290711: assert(false) failed: infinite loop in PhaseIterGVN::optimize
  • 51107ab: 8257722: Improve "keytool -printcert -jarfile" output
  • ... and 120 more: https://git.openjdk.org/jdk17u-dev/compare/129484da3bd5ae1053b9275b7c706e908003e0b8...master

As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details.

➡️ To integrate this PR with the above commit message to the master branch, type /integrate in a new comment.

@openjdk openjdk bot added the ready Pull request is ready to be integrated label Sep 12, 2022
@gnu-andrew
Copy link
Member

gnu-andrew commented Sep 12, 2022

Next step is to label the bug with jdk17u-fix-request, add reasoning for the backport in the comments and wait for jdk17u-fix-yes to be added by a maintainer.

@adinn
Copy link
Contributor Author

adinn commented Sep 12, 2022

Thanks for the review @gnu-andrew

Note that JDK-8291456 seems to mainly be about the test and has spawned two related test fixes, JDK-8292880 and JDK-8291650, which may be worth a look.

Yes, I am aware of both of those follow-up issues. Although they both try to fix the test and, indeed, both appear to have lowered the chance of a failure, neither of them was 100% successful. So, I have been waiting for a proper test fix upstream (which is pending but seems not to be imminent). I was also hoping that I could maybe help out with an upstream fix but have not had the time to do so.

We could continue on that course or we could go ahead and push both these patches as is and ignore any failure in the test. Do you have any opinion on how to proceed?

@gnu-andrew
Copy link
Member

My impression from the bugs was that the code change itself is an improvement, and fewer events are missed, but it is difficult to get a reliable test which triggers class unloading and picks up all the events. I'm tending towards going ahead with backporting the four existing patches, but I expect you have a better idea of the risk than I do.

If we do this now, it'll be targeted for the January release so we still have some time to add further follow-up fixes or even revert in 17u. I don't get the impression that the main patch is flawed enough to warrant being reverted in trunk, but if we want to be extra safe, we could wait until some time after OpenJDK 20 is released next March.

@adinn
Copy link
Contributor Author

adinn commented Oct 5, 2022

The upstream work on this issue is still in progress as fixes up to now have improved but neither eradicated the problem nor even diagnosed root cause - in fact the fixes actually stumbled into a secondary issue on Windows. So, I have been hesitating before pushing something that might not remove or workaround the problem.

It looks like Chris Plummer has identified the source of the Windows issue and provided a workaround for it (see https://bugs.openjdk.org/browse/JDK-8292879 and associated PR openjdk/jdk#10519). So, I think it is probably worth proceeding with all the fixes up to and including this one.

@adinn
Copy link
Contributor Author

adinn commented Oct 24, 2022

@zhengyu123 @gnu-andrew

I think we are now ready to queue and push this backport to jdk17u (JDK-8256811) as long as it is followed up by backports for these follow-up JIRAs. I propose the following sequence:

  • JDK-8256811 : Zhengyu's original fix
  • JDK-8290908 : Correct TCK failure from last patch
  • JDK-8291456 : Flush pending ObjectFree events at VM shutdown
  • JDK-8291650 : Add delay to ClassUnloadEventTest to allow ObjectFree events to propagate
  • JDK-8292880 : Enable debuggee logging in ClassUnloadEventTest
  • JDK-8292879 : Add delay to skew race between ClassPrepare event propagation and GC request initiation in debuggee

I already have a dependent PR queued for JDK-8290908 and am preparing further dependent PRs for the next 4 JIRAs.

A few comments:

JDK-8291456 is required to ensure that events are not missed but it is not enough in itself to fix the test

Two of the three following fixes to the test code (JDK-8291650, JDK-JDK-8292879) also do not guarantee to remove the problem since all they do is add delays to skew races in favour of the desired outcome. However, absent any proper fix to these timing problems this is the best option we have to remedy the failure.

Strictly, JDK-8292880 is not needed. However, it should not cause any harm and it will be very useful for diagnosis if we run into problems later on which demand a proper fix.

@adinn
Copy link
Contributor Author

adinn commented Oct 24, 2022

Correction: JDK-8291456 needs to be added after JDK-8291650 as it removes the delay added by JDK-8291650.

@adinn
Copy link
Contributor Author

adinn commented Oct 24, 2022

Aargh! These were all pushed out of order.
The required order is:

  • JDK-8256811 : Zhengyu's original fix
  • JDK-8290908 : Correct TCK failure from last patch
  • JDK-8291650 : Add delay to ClassUnloadEventTest to allow ObjectFree events to propagate
  • JDK-8292880 : Enable debuggee logging in ClassUnloadEventTest
  • JDK-8292879 : Add delay to skew race between ClassPrepare event propagation and GC request initiation in debuggee
  • JDK-8291456 : Flush pending ObjectFree events at VM shutdown

@GoeLin
Copy link
Member

GoeLin commented Oct 27, 2022

@adinn, It seems they all miss the Fix request comment in the JBS issue.
While the other ones are trivial, for this one please state reason and risk. Thanks.

@adinn
Copy link
Contributor Author

adinn commented Oct 27, 2022

Fix Request

The failure to correctly report class unloading events both in the OpenJDK tests and in actual deployments is causing spurious failures for the eclipse team's testing regime. This initial patch and the follow up patches mentioned above were adopted as remedies in upstream releases. Although they are still only an ad hoc fix, do in practice avoid the OpenJDK test failures and should produce the same results for other users.

Some of the upstream patches were superseded by later ones in order to arrive at a satisfactory solution. However, rather than create a new patch to arrive at the same end result it seems best to backport them all in sequence. They all apply cleanly apart from the first one.

One patch merely enables debug output in a relevant test. That has been retained, 1) to maintain patch compatibility with upstream, 2) to ensure follow-on patches apply cleanly, 3) because it will be useful if the ad hoc nature of the current fixes means further problems are encountered and 4) because its effect is minimal, being local only to the test and only resulting in a small amount more info in the test output file.

The overall risk is low because these patches only modify the class unloading code.

The changes have been tested by running the relevant class unloading unit tests. This does not guarantee the problem is fixed (because it is intermittent). However, it does indicate that the patches are not flawed.

@adinn
Copy link
Contributor Author

adinn commented Oct 27, 2022

@GoeLin Apologies for forgetting to include Fix request comments. See above for a comment that covers the full set of patches.

@GoeLin
Copy link
Member

GoeLin commented Oct 27, 2022

Thanks, the reasoning is fine, but it usually goes into the JBS issue.
I copied it there.

@gnu-andrew
Copy link
Member

Thanks both. I've approved 8256811. I see @GoeLin already approved the others, so this should all be good to go.

As they are clean, it should be possible to integrate the others once testing on the PR completes successfully.

@adinn
Copy link
Contributor Author

adinn commented Oct 27, 2022

Thanks, the reasoning is fine, but it usually goes into the JBS issue.
I copied it there.

@GoeLin Apologies for my all too obviously rusty knowledge of the backport process and many thanks for fixing up my mistakes. Clearly, I have been spending too much time working on the GraalVM project.

@gnu-andrew Thanks for checking all the PRs. The test failures on this PR don't seem to be anything to do with the changes in the patch. The Windows one appears to be a (transient?) download failure during the gate run. The Mac failures are all timeouts (in TestUnsignedByteCompare1, TestTrichotomyExpressions and TestLinearScanOrderMain, none of which will can anything to do with this change). Is another run needed? If so is there a simple way to force that to happen?

@gnu-andrew
Copy link
Member

I agree with your analysis and I don't think another run is necessary. I was thinking more of the follow-on PRs, as, being clean, they don't need to wait on a review. I now see you've already filed these, and, given that the tests on those pass (https://github.com/adinn/jdk17u-dev/actions/) with this change included, that seems to be enough.

For reference, you should be able to re-run the job from the button on the right-hand side of https://github.com/adinn/jdk17u-dev/actions/runs/3006655779

@adinn
Copy link
Contributor Author

adinn commented Oct 28, 2022

@gnu-andrew @GoeLin Ok, so does that mean I now proceed to integrate this PR and the dependent ones?

@GoeLin
Copy link
Member

GoeLin commented Oct 28, 2022

Yes, feel free!
I had also checked the tests, to me they look unrelated, too.
You must give the bots some time to work on the dependent PRs.
They will change the branch to compare to to master etc.
Sometimes they ask for a trivial but annoying resolve.

@adinn
Copy link
Contributor Author

adinn commented Oct 28, 2022

/integrate

@openjdk
Copy link

openjdk bot commented Oct 28, 2022

Going to push as commit b2061e0.
Since your change was applied there have been 186 commits pushed to the master branch:

  • 5b6a428: 8292899: CustomTzIDCheckDST.java testcase failed on AIX platform
  • 2e9ef14: 8272123: Problem list 4 jtreg tests which regularly fail on macos-aarch64
  • f65bf06: 8267138: Stray suffix when starting gtests via GTestWrapper.java
  • 996c0e9: 8292778: EncodingSupport_md.c convertUtf8ToPlatformString wrong placing of free
  • 5de8b9c: 8295412: support latest VS2022 MSC_VER in abstract_vm_version.cpp
  • 271d85f: 8294837: unify Windows 2019 version check in os_windows and java_props_md
  • 4df92e1: 8294840: langtools OptionalDependencyTest.java use File.pathSeparator
  • 5ff0ebd: 8277881: Missing SessionID in TLS1.3 resumption in compatibility mode
  • bf866a4: 8269571: NMT should print total malloc bytes and invocation count
  • b6f8f49: 8293701: jdeps InverseDepsAnalyzer runs into NoSuchElementException: No value present
  • ... and 176 more: https://git.openjdk.org/jdk17u-dev/compare/129484da3bd5ae1053b9275b7c706e908003e0b8...master

Your commit was automatically rebased without conflicts.

@openjdk openjdk bot added the integrated Pull request has been integrated label Oct 28, 2022
@openjdk openjdk bot closed this Oct 28, 2022
@openjdk openjdk bot removed ready Pull request is ready to be integrated rfr Pull request is ready for review labels Oct 28, 2022
@openjdk
Copy link

openjdk bot commented Oct 28, 2022

@adinn Pushed as commit b2061e0.

💡 You may see a message that your pull request was closed with unmerged commits. This can be safely ignored.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backport integrated Pull request has been integrated
5 participants