Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

8352088: Call of com.sun.jdi.ThreadReference.threadGroups() can lock up target VM #24236

Closed
wants to merge 10 commits into from

Conversation

plummercj
Copy link
Contributor

@plummercj plummercj commented Mar 25, 2025

Calling ThreadGroupReference.groups() from an event handler can cause a deadlock. Details in first comment. Tested with :jdk_lang on all supported platforms and tier1, tier2, tier3, and tier5 svc testing.


Progress

  • Change must be properly reviewed (1 review required, with at least 1 Reviewer)
  • Change must not contain extraneous whitespace
  • Commit message must refer to an issue

Issue

  • JDK-8352088: Call of com.sun.jdi.ThreadReference.threadGroups() can lock up target VM (Bug - P4)

Reviewers

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk.git pull/24236/head:pull/24236
$ git checkout pull/24236

Update a local copy of the PR:
$ git checkout pull/24236
$ git pull https://git.openjdk.org/jdk.git pull/24236/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 24236

View PR using the GUI difftool:
$ git pr show -t 24236

Using diff file

Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/24236.diff

Using Webrev

Link to Webrev Comment

Sorry, something went wrong.

Verified

This commit was signed with the committer’s verified signature.
minborg Per-Ake Minborg
@bridgekeeper
Copy link

bridgekeeper bot commented Mar 25, 2025

👋 Welcome back cjplummer! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

@openjdk
Copy link

openjdk bot commented Mar 25, 2025

@plummercj This change now passes all automated pre-integration checks.

ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details.

After integration, the commit message for the final commit will be:

8352088: Call of com.sun.jdi.ThreadReference.threadGroups() can lock up target VM

Reviewed-by: alanb, jpai, sspitsyn

You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed.

At the time when this comment was updated there had been 273 new commits pushed to the master branch:

  • 15d36ee: 8353330: Test runtime/cds/appcds/SignedJar.java fails in CDSHeapVerifier
  • e6fe249: 8323100: com/sun/tools/attach/StartManagementAgent.java failed with "WaitForSingleObject failed"
  • 096e70d: 8352437: Support --add-exports with -XX:+AOTClassLinking
  • 6970cf6: 8352775: JVM crashes with -XX:AOTMode=create -XX:+UseZGC
  • afcad8c: 5043343: FileImageInputStream and FileImageOutputStream do not properly track streamPos for RandomAccessFile
  • 6891490: 8353324: Clean up of comments and import after 8319192
  • 07fd666: 8342984: Bump minimum boot jdk to JDK 24
  • 6a46d55: 8353129: CDS ArchiveRelocation tests fail after JDK-8325132
  • 19eabaf: 8353227: JFR: Prepare tests for strong parser validation
  • 564066d: 8353118: Deprecate the use of java.locale.useOldISOCodes system property
  • ... and 263 more: https://git.openjdk.org/jdk/compare/a347ecdedc098bd23598ba6acf28d77db01be066...master

As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details.

➡️ To integrate this PR with the above commit message to the master branch, type /integrate in a new comment.

@openjdk openjdk bot changed the title 8352773 8352773: JVMTI should disable events during java upcalls Mar 25, 2025
@openjdk
Copy link

openjdk bot commented Mar 25, 2025

@plummercj The following labels will be automatically applied to this pull request:

  • core-libs
  • serviceability

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing lists. If you would like to change these labels, use the /label pull request command.

@openjdk openjdk bot added serviceability serviceability-dev@openjdk.org core-libs core-libs-dev@openjdk.org labels Mar 25, 2025

Verified

This commit was signed with the committer’s verified signature.
minborg Per-Ake Minborg

Verified

This commit was signed with the committer’s verified signature.
minborg Per-Ake Minborg

Verified

This commit was signed with the committer’s verified signature.
minborg Per-Ake Minborg
@plummercj
Copy link
Contributor Author

The reason is because this JDI API eventually ends up with JVMTI doing a java upcall to ThreadGroup.subgroupsAsArray(), which can trigger class loading the first time it is called. Thjis results in a ClassPrepareEvent that the debugger will get, but not process because its event handler thread is blocked on the ThreadGroupReference.groups(), so we have a deadlock.

The workaround is to make ThreadGroupReference.groups() not trigger any class loading. The fix is subtle. Before the fix, the java stack trace at the time of the ClasPrepareEvent looks like:

java.util.Arrays.copyOf(java.lang.Object[], int, java.lang.Class) bci:18 line:3513
java.util.ArrayList.toArray(java.lang.Object[]) bci:21 line:401
java.lang.ThreadGroup.subgroupsAsArray() bci:10 line:751

This is ThreadGroup.subgroupsAsArray:

private ThreadGroup[] subgroupsAsArray() {
List groups = synchronizedSubgroups();
return groups.toArray(new ThreadGroup[0]);
}

This is ArrayList.toArray():

public T[] toArray(T[] a) {
if (a.length < size)
// Make a new array of a's runtime type, but my contents:
return (T[]) Arrays.copyOf(elementData, size, a.getClass()); <--- Line 401
System.arraycopy(elementData, 0, a, 0, size);
if (a.length > size)
a[size] = null;
return a;
}

And this is Arrays.copyOf():

public static <T,U> T[] copyOf(U[] original, int newLength, Class<? extends T[]> newType) {
@SuppressWarnings("unchecked")
T[] copy = ((Object)newType == (Object)Object[].class)
? (T[]) new Object[newLength]
: (T[]) Array.newInstance(newType.getComponentType(), newLength);
System.arraycopy(original, 0, copy, 0,
Math.min(original.length, newLength));
return copy;
}

Line 3513 is actually the return statement, so this seems incorrect of by a bit. Adding some tracing of ClassPrepareEvents shows that we are currently handling the following:

cbClassPrepare: java.lang.reflect.Array

So it looks like we took the Array.newInstance() path, which triggered class loading of java.lang.reflect.Array. After the fix we never end up in Arrays.copyOf(). Instead ArrayList.toArray() calls System.arraycopy() directly, avoiding any class loading..

@plummercj plummercj marked this pull request as ready for review March 26, 2025 00:00
@openjdk openjdk bot added the rfr Pull request is ready for review label Mar 26, 2025
@mlbridge
Copy link

mlbridge bot commented Mar 26, 2025

@plummercj
Copy link
Contributor Author

Sorry, I used the wrong bugID for this PR

@plummercj plummercj changed the title 8352773: JVMTI should disable events during java upcalls 8352088 Mar 26, 2025
@openjdk openjdk bot changed the title 8352088 8352088: Call of com.sun.jdi.ThreadReference.threadGroups() can lock up target VM Mar 26, 2025
@plummercj
Copy link
Contributor Author

Ok, bugID is correct now.

@@ -661,7 +661,7 @@ private List<ThreadGroup> synchronizedSubgroups() {
*/
private ThreadGroup[] subgroupsAsArray() {
List<ThreadGroup> groups = synchronizedSubgroups();
return groups.toArray(new ThreadGroup[0]);
return groups.toArray(new ThreadGroup[groups.size()]);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hello Chris, would a comment on this line be necessary to prevent future (IDE encouraged [1]) refactoring of this code back to toArray(new ThreadGroup[0])?

[1]
toArray

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What would you think about using the simple code + loop from the JBS comment? That would remove any discussion about toArray and any concerns that it would load classes.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Regarding adding a comment, I wasn't too sure what the comment should say because once you start down that path, it's hard to not end up with too much detail, and then things just get very wordy. Since we have a test that will fail if this is ever modified to cause classloading again, at least that should help prevent undoing this fix. However, the one downside of the test is that I can only get it to fail with non-debug builds, so it may not be caught early.

Regarding using the loop, that's probably a more bullet proof option, although there is still no guarantee that someday someone won't look at it and say "I can just use toArray() here". There's probably no escaping an AI code cleaning bot coming to that conclusion sometime in the near future. Then we're back to adding a comment to make sure it isn't changed.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just noticed that this subgroupsAsArray() private method is only used by JVMTI and a comment exists about that on that method:

/**
 * Returns a snapshot of the subgroups as an array, used by JVMTI.
 */

So, irrespective of what solution we choose, perhaps we could add another brief sentence to that comment, something like the following?

/**
 * Returns a snapshot of the subgroups as an array, used by JVMTI.
 * Care should be taken not to trigger any classloading in this method.
 */

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I switched to the code that does the array copy inline rather than rely on toArray(). I also added a comment. I intentionally kept is short and simple. I was tempted to say something about toArray(), reference the CR, mention JDI and the debug agent, etc, but that's heading down the "too wordy" path that I wanted to avoid. Readers can "git blame" to find out why it was done.

Verified

This commit was signed with the committer’s verified signature.
minborg Per-Ake Minborg
@@ -658,10 +658,16 @@ private List<ThreadGroup> synchronizedSubgroups() {

/**
* Returns a snapshot of the subgroups as an array, used by JVMTI.
* WARNING: Make sure this API does not trigger any class loading.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might be better to say "method" rather than "API" as this is all internal.


public class EarlyThreadGroupChildrenTest extends TestScaffold {
ClassType targetClass;
ThreadReference mainThread;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are these used? Maybe copied over from another test?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, copied from another test. I'll remove them.

@Override
public void classPrepared(ClassPrepareEvent event) {
try {
++classPreparedCount;
Copy link
Contributor

@AlanBateman AlanBateman Mar 26, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's a long time since I looked at at test based on TestScaffold. Would I be correct to say that it creates an event handler thread to remove events from the event queue and dispatch them to the registered listeners? Only asking as it's not immediately clear if the increment of the event count is correct. I think TestScaffold uses one thread but would need to dig into the test infra to be sure.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, one EventHandler thread dispatching to all listeners. More details below:

There is an interface called TargetListener which declares all the event callbacks. TargetAdapter implements TargetListener and provides and empty implementation for each callback. TestScaffold extends TargetAdapter, and each test extends TestScaffold. So every test has default empty implementations of all the callbacks, and can override some (there are a couple of overrides in this tests).

A TargetListener instance gets events by adding itself as a listener using TestScaffold.addListener(). You can see that this test is doing that early on so it gets events.

In addition, some of the TestScaffold APIs will create an instance of a TargetAdapter with one or more event callback overrides, and register that listener. For example, see waitForRequestedEvent(). So it is possible to have 2 or more listeners registered, although as I just pointed out in JDK-8352759, that can be problematic at times.

TestScaffold always has an EventHandler thread active (and only one). It sits in a loop calling EventQueue.remove(). For any each EventSet received it will iterate over each Event in the EventSet, and for each registered listener, it will call the listener's event callback for that Event.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BTW, classPreparedCount is somewhat unneeded legacy code. Previously I had limited it to 50 events to avoid the VMDisconnected exception, but I got rid of the count check and instead now catch the exception.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the expanded comment, it just means that classPreparedCount doesn't need to be volatile, it's only accessed by one thread. For this test it doesn't matter of course.

@openjdk openjdk bot added the ready Pull request is ready to be integrated label Mar 26, 2025

Verified

This commit was signed with the committer’s verified signature.
minborg Per-Ake Minborg
@openjdk openjdk bot removed the ready Pull request is ready to be integrated label Mar 26, 2025

Verified

This commit was signed with the committer’s verified signature.
minborg Per-Ake Minborg
Copy link
Member

@jaikiran jaikiran left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you Chris for adding the code comment. The source change and the code comment look good to me.

@openjdk openjdk bot added the ready Pull request is ready to be integrated label Mar 27, 2025
@@ -658,10 +658,16 @@ private List<ThreadGroup> synchronizedSubgroups() {

/**
* Returns a snapshot of the subgroups as an array, used by JVMTI.
* WARNING: Make sure this method does not trigger any class loading.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you say why briefly here? Because it's called during thread startup and may deadlock with ClassLoad events. I see the comment in the test but not sure if someone will find it, except through git blame, which might be enough.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was trying to avoid getting too much into the weeds with the explanation. There's always that one extra bit of information that would be useful, but after adding that then there's another, and you need to decide when to stop. How about the following (and I really don't want to go any further than this):

 * WARNING: Make sure this method does not trigger any class loading,
 * because a ClassPrepareEvent can deadlock the debugger and debug agent.

Copy link
Contributor

@sspitsyn sspitsyn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The fix looks okay to me. I guess new test fails with a deadlock without your fix.

@plummercj
Copy link
Contributor Author

The fix looks okay to me. I guess new test fails with a deadlock without your fix.

Yes. That is the failure mode. There is no detecting the failure other than a timeout.

Verified

This commit was signed with the committer’s verified signature.
minborg Per-Ake Minborg
@openjdk openjdk bot removed ready Pull request is ready to be integrated rfr Pull request is ready for review labels Mar 28, 2025
@openjdk openjdk bot added the rfr Pull request is ready for review label Mar 28, 2025
@openjdk openjdk bot added the ready Pull request is ready to be integrated label Mar 29, 2025
Copy link
Member

@jaikiran jaikiran left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Still looks good.

@@ -658,10 +658,17 @@ private List<ThreadGroup> synchronizedSubgroups() {

/**
* Returns a snapshot of the subgroups as an array, used by JVMTI.
* WARNING: Make sure this method does not trigger any class loading,
* because a ClassPrepareEvent can deadlock the debugger and debug agent.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The comment is okay but if you are doing any further edits then you could change this to say that a callback for the ClassPrepare event can deadlock (the event is "ClassPrepare" in the JVM TI spec, I realize the "ClassPrepareEvent" may have come from JDI).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd rather not use "callback" here. Even the JVMTI spec mostly documents in terms of events, not callbacks. For example, for the ClassLoad callback it says "For most purposes the ClassPrepare event will be more useful". Maybe as a compromise I can change "ClassPrepareEvent" to "ClassPrepare event".

@openjdk openjdk bot removed the ready Pull request is ready to be integrated label Apr 1, 2025
@openjdk openjdk bot added the ready Pull request is ready to be integrated label Apr 1, 2025
@plummercj
Copy link
Contributor Author

Thanks reviews Sergei, Jai, and Alan

/integrate

@openjdk
Copy link

openjdk bot commented Apr 2, 2025

Going to push as commit cc870d4.
Since your change was applied there have been 287 commits pushed to the master branch:

  • d979bd8: 8344671: Few JFR streaming tests fail with application not alive error on MacOS 15
  • 49cb7aa: 8339114: DaCapo xalan performance with -XX:+UseObjectMonitorTable
  • d32ff13: 8353117: Crash: assert(id >= ThreadIdentifier::initial() && id < ThreadIdentifier::current()) failed: must be reasonable)
  • a0677d9: 8353263: Parallel: Remove locking in PSOldGen::resize
  • 8608b16: 8348887: Create IR framework test for JDK-8347997
  • 23eb648: 8353545: Improve debug info for StartOptionTest
  • 4f97c4c: 8349211: Add support for intrusive trees to the utilities red-black tree
  • c9baa8a: 8352418: Add verification code to check that the associated loop nodes of useless Template Assertion Predicates are dead
  • b80b04d: 8353329: Small memory leak when create GrowableArray with initial size 0
  • 4a50778: 8353458: Don't pass -Wno-format-nonliteral to CFLAGS
  • ... and 277 more: https://git.openjdk.org/jdk/compare/a347ecdedc098bd23598ba6acf28d77db01be066...master

Your commit was automatically rebased without conflicts.

@openjdk openjdk bot added the integrated Pull request has been integrated label Apr 2, 2025
@openjdk openjdk bot closed this Apr 2, 2025
@openjdk openjdk bot removed ready Pull request is ready to be integrated rfr Pull request is ready for review labels Apr 2, 2025
@openjdk
Copy link

openjdk bot commented Apr 2, 2025

@plummercj Pushed as commit cc870d4.

💡 You may see a message that your pull request was closed with unmerged commits. This can be safely ignored.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
core-libs core-libs-dev@openjdk.org integrated Pull request has been integrated serviceability serviceability-dev@openjdk.org
Development

Successfully merging this pull request may close these issues.

None yet

5 participants