8287281: adjust guarantee in Handshake::execute for the case of target thread being current #8992

Closed
wants to merge 10 commits into from

Contributor

@jdksjolen jdksjolen commented Jun 2, 2022

Please review this PR for fixing JDK-8287281.

If a thread is handshake safe we immediately execute the closure, instead of going through the regular Handshake process.

Finally: Should VirtualThreadGetThreadClosure and its do_thread() body be inlined instead? We can do this in this PR, imho, but I'm hoping to get some input on this.

Passes tier1. Running tier2-5.


Progress

  • Change must be properly reviewed (1 review required, with at least 1 Reviewer)
  • Change must not contain extraneous whitespace
  • Commit message must refer to an issue

Issue

  • JDK-8287281: adjust guarantee in Handshake::execute for the case of target thread being current

Reviewers

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk pull/8992/head:pull/8992
$ git checkout pull/8992

Update a local copy of the PR:
$ git checkout pull/8992
$ git pull https://git.openjdk.org/jdk pull/8992/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 8992

View PR using the GUI difftool:
$ git pr show -t 8992

Using diff file

Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/8992.diff

@bridgekeeper

bridgekeeper bot commented Jun 2, 2022

👋 Welcome back jdksjolen! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

@openjdk openjdk bot added the rfr Pull request is ready for review label Jun 2, 2022
@openjdk

openjdk bot commented Jun 2, 2022

@jdksjolen The following labels will be automatically applied to this pull request:

  • hotspot
  • serviceability

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing lists. If you would like to change these labels, use the /label pull request command.

@openjdk openjdk bot added serviceability serviceability-dev@openjdk.org hotspot hotspot-dev@openjdk.org labels Jun 2, 2022
Contributor

@robehn robehn left a comment

I agree, thanks for fixing.

@openjdk

openjdk bot commented Jun 2, 2022

@jdksjolen This change now passes all automated pre-integration checks.

ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details.

After integration, the commit message for the final commit will be:

8287281: adjust guarantee in Handshake::execute for the case of target thread being current

Reviewed-by: rehn, pchilanomate, dholmes, dcubed

You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed.

At the time when this comment was updated there had been 346 new commits pushed to the master branch:

  • 2728770: 8288589: Files.readString ignores encoding errors for UTF-16
  • ef17ee4: 8288515: (ch) Unnecessary use of Math.addExact() in java.nio.channels.FileLock.overlaps()
  • 72f286a: 8287580: (se) CancelledKeyException during channel registration
  • b8db0c3: 6980847: (fs) Files.copy needs to be "tuned"
  • d579916: 8288740: Change incorrect documentation for sjavac flag
  • 26c03c1: 8288719: [arm32] SafeFetch32 thumb interleaving causes random crashes
  • a802b98: 8287760: --do-not-resolve-by-default gets overwritten if --warn-if-resolved flags is used
  • bf0623b: 8286314: Trampoline not created for far runtime targets outside small CodeCache
  • 5b583e4: Merge
  • 6458ebc: 8288988: ProblemList serviceability/jvmti/vthread/ContStackDepthTest/ContStackDepthTest.java in -Xcomp mode
  • ... and 336 more: https://git.openjdk.org/jdk/compare/6e55a72f25f7273e3a8a19e0b9a97669b84808e9...master

As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details.

As you do not have Committer status in this project an existing Committer must agree to sponsor your change. Possible candidates are the reviewers of this PR (@robehn, @pchilano, @dholmes-ora, @dcubed-ojdk) but any other Committer may sponsor as well.

➡️ To flag this PR as ready for integration with the above commit message, type /integrate in a new comment. (Afterwards, your sponsor types /sponsor in a new comment to perform the integration).

@openjdk openjdk bot added the ready Pull request is ready to be integrated label Jun 2, 2022
@mlbridge

mlbridge bot commented Jun 2, 2022

@jdksjolen
Contributor Author

The tests failed and my assumption was wrong: there are other instances of threads handshaking with themselves as the target. We reverse the strategy and call do_thread directly in Handshake::execute.

Contributor

@robehn robehn left a comment

Thanks for the update, looks good.

Remember to re-run the tests!

Contributor

@pchilano pchilano left a comment

Looks good to me.

Thanks,
Patricio

@jdksjolen
Contributor Author

Passed tier1-5.

/integrate

@openjdk openjdk bot added the sponsor Pull request is ready to be sponsored label Jun 5, 2022
@openjdk

openjdk bot commented Jun 5, 2022

@jdksjolen
Your change (at version bf75d4c) is now ready to be sponsored by a Committer.

Member

@dholmes-ora dholmes-ora left a comment

Hi Johan,

I like the idea of this, but am not clear on all the details for all possible cases - see below.

It also makes me wonder about the async case, where Handshake::execute(AsyncHandshakeClosure*, ...) never processes the handshake directly even if it is for the current thread. The async case seems to be a two-phase protocol:

  1. Install async op on yourself
  2. At some later handshake state poll, discover the op you previously installed.
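
To make the two phases listed above concrete, here is a toy C++ model. It is not HotSpot code: `ToyThread`, `install_async` and `poll` are hypothetical names invented for illustration, and the sketch assumes only what the comment describes, namely that an async op is queued at install time and runs at some later poll.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <functional>
#include <utility>

// Toy stand-in for a thread with a queue of pending async operations.
// Hypothetical type; it does not mirror HotSpot's HandshakeState.
struct ToyThread {
  std::deque<std::function<void()>> pending;

  // Phase 1: install the op; nothing is executed yet.
  void install_async(std::function<void()> op) {
    pending.push_back(std::move(op));
  }

  // Phase 2: at a later poll, discover and run whatever was installed.
  // Returns how many ops were run.
  std::size_t poll() {
    std::size_t ran = 0;
    while (!pending.empty()) {
      std::function<void()> op = std::move(pending.front());
      pending.pop_front();
      op();
      ++ran;
    }
    return ran;
  }
};
```

In this model, an op installed by a thread on itself has no effect until that thread reaches `poll()`, which is the behaviour the comment contrasts with direct execution.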

There are a few minor nits/suggestions below as well.

Thanks.

src/hotspot/share/runtime/handshake.cpp (outdated)
src/hotspot/share/prims/jvmtiEnvThreadState.cpp (outdated)
Comment on lines +400 to +401
Handshake::execute(&op, thread);
guarantee(op.completed(), "Handshake failed. Target thread is not alive?");
Member

I much prefer that the current-thread case is internalised by Handshake::execute now. The code creating the handshake op shouldn't have to worry about current thread or not.

Member

Having Handshake::execute() handle the current-thread case will certainly
allow us to make the code consistent in all the callers of Handshake::execute().

src/hotspot/share/prims/jvmtiEventController.cpp (outdated)
Comment on lines 355 to 357
if (target->is_handshake_safe_for(self)) {
hs_cl->do_thread(target);
return;
Member

I like the idea of doing this, but I can't quite convince myself that it will always be safe when the target is not the current thread.

Member

Because we're pushing the special case handling for current-thread down
into the three parameter version of Handshake::execute(), we'll also
directly execute the closure's do_thread() function in other calls to the
three parameter version of Handshake::execute() where we didn't change
the calling code site in this patch:

  • src/hotspot/share/classfile/javaClasses.cpp: async_get_stack_trace()
  • src/hotspot/share/prims/jvmtiExtensions.cpp: GetCarrierThread()
  • src/hotspot/share/prims/whitebox.cpp: WB_HandshakeReadMonitors(), WB_HandshakeWalkStack()
  • src/hotspot/share/runtime/handshake.cpp: execute(HandshakeClosure* hs_cl, JavaThread* target)
    Of course, since the two parameter version of Handshake::execute() is
    now a changed code path, that means that all callers to the two parameter
    version of Handshake::execute() are also affected. No, I'm not going to
    list all those call sites.

This is a change in behavior and I'm not saying that this is wrong, but it's
not clear to me that the repercussions are understood and discussed in
this PR.

What I'm mumbling about here might be the same thing that @dholmes-ora is
worried about, but I'm just being more verbose about it. :-)

Member

@dcubed-ojdk dcubed-ojdk left a comment

This bug/PR is specifically about this block of code:

  if (tlh == nullptr) {
    guarantee(Thread::is_JavaThread_protected_by_TLH(target),
              "missing ThreadsListHandle in calling context.");
    target->handshake_state()->add_operation(&op);

and the bug makes the claim that we need to adjust this
guarantee(). Okay, but this proposed fix is indirectly changing
the guarantee() by inserting this block of code before the
guarantee():

  if (target->is_handshake_safe_for(self)) {
    hs_cl->do_thread(target);
    return;
  }

so we still have the original guarantee() that checks a
specific state with respect to ThreadsListHandles and
we replace it with a check, the is_handshake_safe_for()
call, that has nothing to do with ThreadsListHandles!

The original purpose of this logic block:

  if (tlh == nullptr) {
    guarantee(Thread::is_JavaThread_protected_by_TLH(target),
              "missing ThreadsListHandle in calling context.");
    target->handshake_state()->add_operation(&op);

is to require a protecting ThreadsListHandle to be in place
somewhere in the calling context since we have not
passed in a ThreadsListHandle from the calling context.

When I added the above block of code, I intentionally
updated all of the call sites that reached the new strict
check with ThreadsListHandles. This included call sites
where the caller was the current thread. This was an
intentional change on my part to make sure that all the
JavaThreads being operated on (including current) are
protected by ThreadsListHandles.

When the Loom project was being developed, a number
of these carefully placed ThreadsListHandles were moved
and unprotected code paths were introduced. We believe
that these unprotected code paths are safe because we
believe that they are only used by the current thread and
the current thread does not really need a ThreadsListHandle.
That might be true, but it certainly complicates the reasoning
about the code paths.

The bug talks about adjusting the guarantee() to allow the
current thread to be unprotected by a ThreadsListHandle, but
the logic that we have switched to:

  // A JavaThread can always safely operate on it self and other threads
  // can do it safely if they are the active handshaker.
  bool is_handshake_safe_for(Thread* th) const {
    return _handshake.active_handshaker() == th || this == th;
  }

does more than that. It also allows the target to be unprotected
by a ThreadsListHandle if the calling thread is the active handshaker.
I'm not (yet) convinced that is a good policy.
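
The difference between the two checks can be sketched with a toy model. This is an illustration only: `ToyThread` and `is_self` are hypothetical names, and `active_handshaker` is reduced to a plain pointer field rather than HotSpot's handshake state.

```cpp
#include <cassert>

// Toy stand-in for JavaThread; NOT the HotSpot class.
struct ToyThread {
  // The thread currently executing a handshake op on this thread, if any.
  const ToyThread* active_handshaker = nullptr;

  // Same shape as the quoted predicate: true for the thread itself
  // and also for whoever is the active handshaker.
  bool is_handshake_safe_for(const ToyThread* th) const {
    return active_handshaker == th || this == th;
  }

  // The narrower check the bug title describes: target is current.
  bool is_self(const ToyThread* th) const {
    return this == th;
  }
};
```

With `target.active_handshaker = &caller`, `target.is_handshake_safe_for(&caller)` is true while `target.is_self(&caller)` is false. That extra case is the one being questioned here.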

@openjdk openjdk bot removed the sponsor Pull request is ready to be sponsored label Jun 7, 2022
@jdksjolen
Contributor Author

It seems that we have at least two choices here:

  1. Change the is_handshake_safe_for to current == target and be done with it.
  2. Investigate whether is_handshake_safe_for is OK to be used in this context.

Is there anything I am missing?

I'm fine with going for option 1, but unless we need to get this change in quickly (it's a P3 bug, not sure what that entails), I'd like to wait for @robehn's input.

@dholmes-ora
Member

I have to agree with Dan. This is supposed to only be about targeting the current thread, but we are now no longer ensuring the target is protected by a TLH when the current thread is the active_handshaker. So I would vote for:

  1. Change the is_handshake_safe_for to current == target and be done with it.

@robehn
Contributor

robehn commented Jun 13, 2022

The only way to become an active handshaker is to handshake another thread (target); when that happens, we verify that the target is ThreadsList safe.
Thus the active handshaker is guaranteed that the target has already been verified to be on a ThreadsList.
As long as we are the active handshaker, the target is blocked, i.e. the target is safepoint safe.

The reason I think handshake safe is good is because we have 3 (4) cases:
1: Current != Target (Not 3 and not 4)
2: Current == Target
3: Current != Target, but already executing a handshake for target
4: Current != Target, but we are in a safepoint (still not internally handled)
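
The four cases above can be sketched as a small classifier. This is a toy model, not HotSpot code: `ToyThread`, `HandshakeCase`, `classify` and the `at_safepoint` flag are hypothetical, and the order in which overlapping conditions are checked is an assumption made for the example.

```cpp
#include <cassert>

// Toy stand-in for JavaThread; NOT the HotSpot class.
struct ToyThread {
  const ToyThread* active_handshaker = nullptr;
};

enum class HandshakeCase {
  Remote,            // case 1: current != target, neither 3 nor 4
  Self,              // case 2: current == target
  ActiveHandshaker,  // case 3: already executing a handshake for target
  AtSafepoint        // case 4: current != target, but in a safepoint
};

// Classify a handshake request. The precedence among cases 3 and 4
// is an assumption of this sketch.
HandshakeCase classify(const ToyThread* current,
                       const ToyThread* target,
                       bool at_safepoint) {
  assert(current != nullptr && target != nullptr);
  if (current == target) return HandshakeCase::Self;
  if (target->active_handshaker == current) return HandshakeCase::ActiveHandshaker;
  if (at_safepoint) return HandshakeCase::AtSafepoint;
  return HandshakeCase::Remote;
}
```

In these terms, `is_handshake_safe_for` accepts `Self` and `ActiveHandshaker`, whereas a plain `current == target` check accepts only `Self`.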

@dholmes-ora
Member

@robehn can you explain to me how the current thread can both be the active handshaker of the target and at the same time be executing another handshake with the target? This is making my head spin.

This change has deviated quite considerably from the issue that caused a bug to be filed. And Dan still has concerns that the current thread should still be protected by a TLH even if not strictly necessary. Maybe we actually need to backtrack and restore an invariant that there is always a TLH even for the current thread and fix the JVMTI code that did things differently?

@robehn
Contributor

robehn commented Jun 13, 2022

@dholmes-ora it can't.

The point was that the code was originally written, and only truly tested, for the case:
1: Current != Target

The other cases, 2-4, used to be handled externally.
There was a suggestion that 2 (current == target) should be handled internally (let it slide past the guarantee).

In all cases current and target (even if they are the same) must be present on some ThreadsList (e.g. the main list when current == target).
I.e. they may not be terminated, e.g. because a handshake operation may use handles, so the thread must be processed by GC.
The same goes for newly created threads that have not yet been added to a ThreadsList. (@dcubed-ojdk is correct)

So we are letting a "new" case happen when 'just' adjusting the guarantee. Maybe this case works fine; I don't know.

@sspitsyn
Contributor

Maybe we actually need to backtrack and restore an invariant that there is always a TLH even for the current thread and fix the JVMTI code that did things differently?

This will make JVMTI code unnecessarily ugly in a couple of spots.
But I'm okay with that if keeping this invariant is important.
I can help with fixing JVMTI if needed.

@jdksjolen
Contributor Author

The scope of the ticket is precisely for the case when Thread::current() == target and as such the fix only checks for this particular case now.
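
As a rough illustration of a self-only fast path, here is a toy model. The actual patch is not reproduced in this thread, so everything below is an assumption: `ToyThread`, `ToyClosure`, `Path` and `toy_execute` are hypothetical names, the `protected_by_tlh` flag stands in for the ThreadsListHandle check, and a `Rejected` result marks the spot where a guarantee() would fire.

```cpp
#include <cassert>
#include <vector>

struct ToyThread;

// Toy closure that just counts how often it was run directly.
struct ToyClosure {
  int runs = 0;
  void do_thread(ToyThread* /*target*/) { ++runs; }
};

// Toy stand-in for JavaThread; NOT the HotSpot class.
struct ToyThread {
  bool protected_by_tlh = false;   // stand-in for the TLH check
  std::vector<ToyClosure*> queue;  // ops waiting to be run by the target
};

enum class Path { Direct, Queued, Rejected };

// Sketch: run directly only when target is the current thread;
// otherwise insist on protection before queueing the operation.
Path toy_execute(ToyClosure* cl, ToyThread* current, ToyThread* target) {
  assert(cl != nullptr && current != nullptr && target != nullptr);
  if (current == target) {
    cl->do_thread(target);  // self case: no queueing, no TLH check
    return Path::Direct;
  }
  if (!target->protected_by_tlh) {
    return Path::Rejected;  // where a guarantee() would fail
  }
  target->queue.push_back(cl);
  return Path::Queued;
}
```

In this model the active-handshaker case gets no special treatment: an unprotected remote target is rejected regardless of who is handshaking it.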

@dcubed-ojdk, does this change look good to you?

@dcubed-ojdk
Member

@jdksjolen - I've reread all the comments in the PR and the latest version of the code
and I'm okay with the latest version. Please clarify what testing has been done on this
latest version of the fix.

Member

@dcubed-ojdk dcubed-ojdk left a comment

Thumbs up. Please see my query about the latest testing.

@dcubed-ojdk
Member

Just to be clear:

@dholmes-ora wrote this:

Maybe we actually need to backtrack and restore an invariant that there is always a TLH even for the current thread and fix the JVMTI code that did things differently?

@sspitsyn wrote this:

This will make JVMTI code unnecessarily ugly in a couple of spots.
But I'm okay with that if keeping this invariant is important.
I can help with fixing JVMTI if needed.

The current version of the fix does NOT restore the invariant that there is always a TLH
even for the current thread. I'm (mostly) okay with this.

Member

@dholmes-ora dholmes-ora left a comment

I'm okay with this in current form.

I'll leave it to Dan to decide whether he thinks restoring the old TLH "invariant" should be done in a separate RFE.

Thanks.

@jdksjolen
Contributor Author

Cheers. Feel free to sponsor this.

/integrate

@openjdk openjdk bot added the sponsor Pull request is ready to be sponsored label Jun 23, 2022
@openjdk

openjdk bot commented Jun 23, 2022

@jdksjolen
Your change (at version cc6736c) is now ready to be sponsored by a Committer.

@dholmes-ora
Member

/sponsor

@openjdk

openjdk bot commented Jun 24, 2022

Going to push as commit 9dc9a64.
Since your change was applied there have been 351 commits pushed to the master branch:

Your commit was automatically rebased without conflicts.

@openjdk openjdk bot added the integrated Pull request has been integrated label Jun 24, 2022
@openjdk openjdk bot closed this Jun 24, 2022
@openjdk openjdk bot removed ready Pull request is ready to be integrated rfr Pull request is ready for review sponsor Pull request is ready to be sponsored labels Jun 24, 2022
@openjdk

openjdk bot commented Jun 24, 2022

@dholmes-ora @jdksjolen Pushed as commit 9dc9a64.

💡 You may see a message that your pull request was closed with unmerged commits. This can be safely ignored.

@dcubed-ojdk
Member

This PR has been backed out because it caused failures in Mach5 Tier[1-4].
See JDK-8289129 [BACKOUT] JDK-8287281 adjust guarantee in Handshake::execute for the case of target thread being current

Labels
hotspot hotspot-dev@openjdk.org integrated Pull request has been integrated serviceability serviceability-dev@openjdk.org
6 participants