8348907: Stress times out when is executed with ZGC #24209
@mgronlun This change now passes all automated pre-integration checks. ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details. After integration, the commit message for the final commit will be:
You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed. At the time when this comment was updated there had been 29 new commits pushed to the
As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details. ➡️ To integrate this PR with the above commit message to the
/label remove hotspot
/label add hotspot-gc
/label add hotspot-jfr
@mgronlun Please do not rebase or force-push to an active PR, as it invalidates existing review comments. Note for future reference: the bots always squash all changes into a single commit automatically as part of the integration. See the OpenJDK Developers' Guide for more information.
Looks like a pragmatic solution.
I am not reviewing the implications of not setting the epoch, as my understanding here is a bit lacking.
The name `JfrNonReentrant` seems a little general for how tightly coupled the property is to running on a virtual thread and loading an oop. At the same time, this is currently the only interaction that exhibits problems with reentry, and I am not sure there is a better name.
This seems like a pragmatic workaround for the problem. We might want to revisit this at some point to make it easier to use JFR in the GC code, but I think this is an appropriate fix for the bug right now. Thanks for fixing this Markus.
/integrate
Going to push as commit c2a4fed.
Your commit was automatically rebased without conflicts.
Greetings,
Here is a suggested solution to the intricate deadlock issues involving virtual threads, ZGC load barriers, and JFR.
A JFR event can be allocated and committed in specific sensitive contexts, such as inside mutex-protected load barriers. If the thread is a virtual thread, JFR determines its thread name by loading the oop from the thread (jt->vthread()) as part of the event commit.
This load triggers the load barrier again, and because the barrier holds a non-reentrant lock, the thread effectively deadlocks with itself.
So, for specific sensitive event sites, JFR must not recurse into or reenter the same event site as part of the event commit.
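To make the reentry path concrete, here is a toy, single-threaded C++ model of the hazard. It is not HotSpot code: `NonReentrantLock`, `barrier_lock`, `load_through_barrier`, `commit_event`, and `barrier_slow_path_committing_event` are hypothetical stand-ins for the barrier mutex, the oop load of `jt->vthread()`, the event commit, and the sensitive event site. A real non-reentrant lock would simply hang; this one throws so the self-deadlock is observable.

```cpp
#include <stdexcept>
#include <string>

// Hypothetical toy lock: throws instead of hanging when its holder reacquires it.
class NonReentrantLock {
 public:
  void lock() {
    if (held_) throw std::logic_error("self-deadlock: lock reacquired by its holder");
    held_ = true;
  }
  void unlock() { held_ = false; }

 private:
  bool held_ = false;  // single-threaded model: only tracks "is it held"
};

// RAII guard so the lock is released while an exception unwinds.
class Locker {
 public:
  explicit Locker(NonReentrantLock& lock) : lock_(lock) { lock_.lock(); }
  ~Locker() { lock_.unlock(); }
  Locker(const Locker&) = delete;
  Locker& operator=(const Locker&) = delete;

 private:
  NonReentrantLock& lock_;
};

NonReentrantLock barrier_lock;  // stands in for the mutex inside the load barrier

// Stand-in for loading an oop (e.g. jt->vthread()) through the load barrier.
std::string load_through_barrier() {
  Locker guard(barrier_lock);
  return "worker-1";
}

// Stand-in for a JFR commit on a virtual thread: resolving the thread name
// loads an oop, i.e. goes through the load barrier.
std::string commit_event() { return load_through_barrier(); }

// Stand-in for a sensitive site: an event is committed while the lock is held.
std::string barrier_slow_path_committing_event() {
  Locker guard(barrier_lock);
  return commit_event();  // reenters barrier_lock on the same thread
}
```

Calling `load_through_barrier()` on its own succeeds, while `barrier_slow_path_committing_event()` hits the reentry check, which is the model's version of the thread deadlocking with itself.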
After a few iterations and prototypes, which failed because they eventually ended up touching some oop, I came up with the following.
From a user perspective, an event (site) can now be marked as "non-reentrant" by wrapping it in a helper class.
Doing so guarantees that JFR will not reenter this site as part of event.commit().
The tradeoff is that we cannot write the virtual thread name for these sensitive event sites; instead, we report "" (the empty string) as the virtual thread name, which is the default virtual thread name in Java. All other information about the thread, such as the thread ID and whether it is a virtual thread, is still reported.
I believe it is a reasonable tradeoff and a general solution for sensitive JFR event sites, which are rare in practice, with minimal impact on event programming.
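The following C++ sketch illustrates one way such a wrapper could work; it is an assumption for illustration, not the actual HotSpot change. A hypothetical `NonReentrant<EventType>` wrapper (the PR's real class is named `JfrNonReentrant`, but its internals are not shown in this thread) sets a thread-local marker for the duration of `commit()`. Name resolution then skips the oop load and reports "" while the other thread fields are still filled in. `SampleEvent`, `ThreadInfo`, `t_non_reentrant`, and `load_vthread_name_from_oop` are likewise hypothetical.

```cpp
#include <cstdint>
#include <string>

// Hypothetical marker: "this thread is committing a non-reentrant event site".
thread_local bool t_non_reentrant = false;

// Stand-in for loading the name from the jt->vthread() oop (a barrier-protected load).
std::string load_vthread_name_from_oop() { return "worker-1"; }

std::string resolve_vthread_name() {
  // Avoid touching the oop; report Java's default virtual thread name instead.
  if (t_non_reentrant) return "";
  return load_vthread_name_from_oop();
}

struct ThreadInfo {
  std::uint64_t tid;
  bool is_virtual;
  std::string name;
};

// Hypothetical event type whose commit() captures thread information.
class SampleEvent {
 public:
  ThreadInfo commit() const { return ThreadInfo{42, true, resolve_vthread_name()}; }
};

// RAII scope: sets the marker and restores the previous value on exit.
class NonReentrantScope {
 public:
  NonReentrantScope() : previous_(t_non_reentrant) { t_non_reentrant = true; }
  ~NonReentrantScope() { t_non_reentrant = previous_; }
  NonReentrantScope(const NonReentrantScope&) = delete;
  NonReentrantScope& operator=(const NonReentrantScope&) = delete;

 private:
  bool previous_;
};

// Hypothetical wrapper: same event, but commit() runs with the marker set.
template <typename EventType>
class NonReentrant : public EventType {
 public:
  ThreadInfo commit() const {
    NonReentrantScope scope;
    return EventType::commit();
  }
};
```

At a sensitive site, `NonReentrant<SampleEvent> event;` would replace `SampleEvent event;`, which keeps the impact on event programming to a one-token change. Restoring the previous marker value, rather than clearing it, keeps nested wrapped commits correct.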
Testing: jdk_jfr, stress testing
Let me know what you think.
Thanks
Markus
Using git
Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk.git pull/24209/head:pull/24209
$ git checkout pull/24209
Update a local copy of the PR:
$ git checkout pull/24209
$ git pull https://git.openjdk.org/jdk.git pull/24209/head
Using Skara CLI tools
Checkout this PR locally:
$ git pr checkout 24209
View PR using the GUI difftool:
$ git pr show -t 24209
Using diff file
Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/24209.diff