-
Notifications
You must be signed in to change notification settings - Fork 5.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
8256811: Delayed/missed jdwp class unloading events #9168
8256811: Delayed/missed jdwp class unloading events #9168
Conversation
👋 Welcome back zgu! A progress list of the required criteria for merging this PR into |
@zhengyu123 The following label will be automatically applied to this pull request:
When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing list. If you would like to change these labels, use the /label pull request command. |
Webrevs
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One issue I have with this fix is that is doesn't fix the "delay" issue. It helps it, and ensures the CLASS_UNLOAD events are sent before the debuggee exits, but it doesn't guarantee that they are sent in a time manner.
Your test is somewhat contrived in that the debuggee only runs for as long as it takes to look up all the test classes. Then it exits, generating a VM_DEATH, and guaranteeing that by that point all CLASS_UNLOAD events have been sent (and without delay). However, we also have the issue of a JDK-8257705, which is a test that times out waiting for a CLASS_UNLOAD event, because there isn't another event to trigger sending it. Since the debuggee does not exit until after the test is done (and has timed out), the VM_DEATH comes too late.
JDK-8282459 offers other suggestions for the fix, albeit more complex than your. Please consider the one mentioned in the first comment, which is basically adding a watchdog thread.
Yes, it improves timing in some cases, but not for all.
I could not seem to find the other solution in JDK-8256811 comments. |
Sorry, I meant JDK-8282459. I fixed my comment above. |
|
/label add hotspot-runtime |
Thanks.
The function |
Your comment is pending because you clicked on "Start a review" instead of "Add single comment", and then never finished the review by clicking on "Review changes". |
I never click "Start a review". I tried to inline my comment related to the source fragment. |
@sspitsyn Good catch! I believe it should be synchronized if But it raises a question: why it needs Thanks.
came in from JDK-8212879, could you comment why it is needed? Thanks! |
The following RN is related to this question: |
That wasn't quite what I was getting at. My comment / question was much more general. If you call SetEventNotificationMode() to disable a specific event, once it returns are you guaranteed that no more events will be delivered. This means that if JVMTI was in the middle of calling the event handler at the same time that SetEventNotificationMode() is called, then before returning, SetEventNotificationMode() will block until the event handler returns. |
There is no such general guarantee with calls to the |
This is a reasonable question. I do not remember the exact motivation to add this call. |
Hi, it was added because some tests were failing. With the concurrent posting, we were missing events and flushing them made the tests pass. |
Thanks, @coleenp I removed the block and passed |
I'd suggest to run additionally: |
@zhengyu123 try the tests with stress shendandoah flags if you have them. I think I had a lot of failures with ZGC and ZCG ZFragmentationLimit=0.1 or something like that. |
|
This becomes a problem when the test tries to free a raw monitor on exit. It first disable events, then it frees the raw monitor that the even handler synchronizes on. However, it's possible that an event can be in flight when doing this (an event that was triggered before disabling events, but not fully delivered yet). So now you have a chicken-and-egg problem because you need a 2nd raw monitor to synchronize around freeing the 1st raw monitor. So the end result is you can't always safely cleanup on test exit the way we currently are. I think we are getting away with it because the set of events we expect is fixed or predictable. So once the test decides it is done, it won't get more events, even if they are not disabled. Thus we can free the raw monitor after disabling events without worry of some spurious event causing an untimely entry into the event handler. However, I thought there was one test you fixed where we had this exact problem, and you fixed it by not freeing the raw monitor on exit. ThreadStart and ThreadDeath events come to mind, since these are now notoriously unpredictable. |
Tested on Linux x86_64 (fastdebug)
[1] I pushed update that removed the block, could you please run it through Oracle's internal tests? Thanks! |
I fixed it this way for simplicity. However, it is still possible to correctly sync if necessary. |
I think, you can run all the needed tests by yourself. |
My mach5 debug test results looks good. Sorry for initial posting closed mach5 links. Tested without extra flags: With flags Approved the fix. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok. I believe you. Thanks for all the testing.
Thanks, @plummercj @sspitsyn and @coleenp |
/integrate |
Going to push as commit 54854d9.
Your commit was automatically rebased without conflicts. |
@zhengyu123 Pushed as commit 54854d9. 💡 You may see a message that your pull request was closed with unmerged commits. This can be safely ignored. |
I'm afraid that the potential problem can be highly intermittent and did not show up in my testing. |
Currently, jdi only check and process class unloading event when it detects a new GC cycle.
After JDK-8212879, posting class events can overlap with GC finish event, that results, sometimes, it only captures partial or even empty unloaded class list. The pending list usually can be flushed out at next GC cycle. But for the classes unloaded during the last GC cycle, the class unloading events may lost forever.
This patch checks and processes class unloading events unconditionally, suggested by @kbarrett, the last pending unloaded class list can be flushed by other events, such as
VM_DEATH
.It also performs
commonRef_compact()
only when there are classes unloaded.New test failed about 20% without patch, none with patch.
Update
There are significant changes from early patch.
The new approach:
No longer removing dead objects and post events on VM thread. I believe it was implemented this way to workaround the following issues:
_in_native
stateJvmtiTagMap
while walking, when transition to_in_native
stateThe new solution breaks up into two steps:
_in_native
state and post object free events in one batchThis way, JDI event handler can process object free events upon arrivals without delay.
Update 2
There is a comment for
JvmtiTagMap::check_hashmap()
that statesObjectFree
events are posted before heap walks.Now, the events are actually posted after heap walks, but I don't think it makes any material material difference.
Even the events are posted earlier in old code, but they are only processed after next GC cycle.
Progress
Issue
Reviewers
Contributors
<cjplummer@openjdk.org>
Reviewing
Using
git
Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk pull/9168/head:pull/9168
$ git checkout pull/9168
Update a local copy of the PR:
$ git checkout pull/9168
$ git pull https://git.openjdk.org/jdk pull/9168/head
Using Skara CLI tools
Checkout this PR locally:
$ git pr checkout 9168
View PR using the GUI difftool:
$ git pr show -t 9168
Using diff file
Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/9168.diff