-
Notifications
You must be signed in to change notification settings - Fork 5.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
8347406: [REDO] C1/C2 don't handle allocation failure properly during initialization (RuntimeStub::new_runtime_stub fatal crash) #23630
Conversation
…nal Error (codeBlob.cpp:429) Initial size of CodeCache is too small
Co-authored-by: Tobias Hartmann <tobias.hartmann@oracle.com>
Co-authored-by: Tobias Hartmann <tobias.hartmann@oracle.com>
Co-authored-by: Tobias Hartmann <tobias.hartmann@oracle.com>
Co-authored-by: Tobias Hartmann <tobias.hartmann@oracle.com>
…ime::generate
👋 Welcome back dfenacci! A progress list of the required criteria for merging this PR into |
@dafedafe This change now passes all automated pre-integration checks. ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details. After integration, the commit message for the final commit will be:
You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed. At the time when this comment was updated there had been 388 new commits pushed to the
As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details. ➡️ To integrate this PR with the above commit message to the |
@dafedafe this pull request can not be integrated into git checkout JDK-8347406
git fetch https://git.openjdk.org/jdk.git master
git merge FETCH_HEAD
# resolve conflicts and follow the instructions given by git merge
git commit -m "Merge master"
git push |
Webrevs
|
@adinn I noticed I touched quite a few runtime files that you recently refactored. I guess it might make sense if you had a look at them. Thanks! |
I don't understand why JDK-8326615 didn't work. If the minimum codecache size was too small, can't we just increase it? |
} | ||
return _pre_barrier_c1_runtime_code_blob != nullptr && reference_barrier_success; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wouldn't it be better to return false immediately after each failure, rather than continuing?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, totally. Fixed.
@@ -540,4 +540,8 @@ void ZBarrierSetC1::generate_c1_runtime_stubs(BufferBlob* blob) { | |||
generate_c1_store_runtime_stub(blob, true /* self_healing */, "store_barrier_on_oop_field_with_healing"); | |||
_store_barrier_on_oop_field_without_healing = | |||
generate_c1_store_runtime_stub(blob, false /* self_healing */, "store_barrier_on_oop_field_without_healing"); | |||
return _load_barrier_on_oop_field_preloaded_runtime_stub != nullptr && |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Again, why not return false immediately on first failure?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed too.
if (rs == nullptr) { | ||
C->record_failure("CodeCache is full"); | ||
} else { | ||
C->set_stub_entry_point(rs->entry_point()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is the deleted rs->is_runtime_stub() assert still useful here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A slightly modified one surely is. Inserted it again.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was thinking it could be moved into the else
clause and simplified further.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh I see 👍. Moved. Thanks!
It seems that the small minimum codecache size wasn’t the core of the problem. In JDK-8326615 I’ve actually tried increasing the minimum code cache size (even calculating the minimum C1 and C2 sizes needed) but I kept hitting the same problem seemingly because what triggers the crash is the code cache exceeding its limits exactly during one of the allocations in the 4 call paths listed in the description (any allocation at another point simply turns off C1 or C2 (or both)). Increasing the minimum was possibly making it a bit less probable (i.e. when running something small that would use very little code cache, e.g. Even so, it might be a good idea to additionally increase the minimum code cache anyway. @dean-long do you think it would make sense to file an RFE for that? |
Co-authored-by: Dean Long <17332032+dean-long@users.noreply.github.com>
Refreshing my memory, isn't the real problem with trying to fix this with a minimum codecache size is that some of these stubs are not allocated during initial single-threaded JVM startup, but later when the first compiler threads start, and that allows other code blobs to fill up the codecache? |
Sure, if it's still an issue. |
Yes, exactly. This seems to be even more of an issue with 2 compiler threads (i.e. C1/C2) since the first can fill up the code cache first at the expense of the other. |
@dean-long You are right that the root of the problem is delayed stub init when running in tiered mode. It's actually a race between the C1 and C2 threads to consume code cache space. Neither C1 nor C2 can generate compiled method code before they have initialized their respective stubs. However, initialization of C2 stubs may potentially occur after a C1 compiler thread has initialized its own stubs and generated some number of C1-compiled methods. Clearly that can't happen the other way around because in tiered mode a C2 thread will only end up compiling a method after 1) a prior C1-compiled method exists in the runtime and 2) its own stubs have been generated. An implication of 1) is that the C1 stubs have already been generated. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changes look good to me too.
Thank you very much for your reviews @dean-long and @adinn! |
/integrate |
Going to push as commit 48fac66.
Your commit was automatically rebased without conflicts. |
Issue
The test
src/hotspot/share/opto/c2compiler.cpp
fails intermittently due to a crash that happens when trying to allocate code cache space for C1 and C2 inRuntimeStub::new_runtime_stub
andSingletonBlob::operator new
.Causes
There are a few call paths during the initialization of C1 and C2 that can lead to the code cache allocations in
RuntimeStub::new_runtime_stub
(throughRuntimeStub::operator new
) andSingletonBlob::operator new
triggering a fatal error if there is no more space. The paths in question are:Compiler::init_c1_runtime
->Runtime1::initialize
->Runtime1::generate_blob_for
->Runtime1::generate_blob
->RuntimeStub::new_runtime_stub
C2Compiler::initialize
->C2Compiler::init_c2_runtime
->OptoRuntime::generate
->OptoRuntime::generate_stub
->Compile::Compile
->Compile::Code_Gen
->PhaseOutput::install
->PhaseOutput::install_stub
->RuntimeStub::new_runtime_stub
C2Compiler::initialize
->C2Compiler::init_c2_runtime
->OptoRuntime::generate
->OptoRuntime::generate_uncommon_trap_blob
->UncommonTrapBlob::create
->new UncommonTrapBlob
C2Compiler::initialize
->C2Compiler::init_c2_runtime
->OptoRuntime::generate
->OptoRuntime::generate_exception_blob
->ExceptionBlob::create
->new ExceptionBlob
Solution
Instead of fatally crashing the we can use the
alloc_fail_is_fatal
flag ofRuntimeStub::new_runtime_stub
to avoid crashing in cases 1 and 2 and add a similar flag toSingletonBlob::operator new
for cases 3 and 4. In the latter case we need to adjust all calls accordingly.Note: In JDK-8326615 it was argued that increasing the minimum code cache size would solve the issue but that wasn't entirely accurate: doing so possibly decreases the chances of a failed allocation in these 4 places but doesn't totally avoid it.
Testing
The original failing regression test in
test/hotspot/jtreg/compiler/startup/StartupOutput.java
has been modified to run multiple times with randomized values (within the original failing range) to increase the chances of hitting the fatal assertion.Tests: Tier 1-4 (windows-x64, linux-x64/aarch64, and macosx-x64/aarch64; release and debug mode)
Progress
Issue
Reviewers
Reviewing
Using
git
Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk.git pull/23630/head:pull/23630
$ git checkout pull/23630
Update a local copy of the PR:
$ git checkout pull/23630
$ git pull https://git.openjdk.org/jdk.git pull/23630/head
Using Skara CLI tools
Checkout this PR locally:
$ git pr checkout 23630
View PR using the GUI difftool:
$ git pr show -t 23630
Using diff file
Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/23630.diff
Using Webrev
Link to Webrev Comment