-
Notifications
You must be signed in to change notification settings - Fork 5.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
8293972: runtime/NMT/NMTInitializationTest.java#default_long-off failed with "Suspiciously long bucket chains in lookup table." #14607
Conversation
👋 Welcome back gziemski! A progress list of the required criteria for merging this PR into |
@gerard-ziemski The following label will be automatically applied to this pull request:
When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing list. If you would like to change these labels, use the /label pull request command. |
Webrevs
|
Isn't there only a single process created per test? |
This is failing in GHA:
|
Investigating... |
I used
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking back at this, I wonder whether we should reduce the table size. 64k for this lookup table is pretty big. The MallocSizeTable itself, that arguably holds a lot more allocations, is just have as wide.
Here are the results with Thomas proposed hash function
Summary:
So we don't gain anything by using clever bit shifts and xors (I know I tried many myself) compared to the "naked" pointer - the opposite in fact. I will post the code to the sandbox I used for these experiments after I clean it up a bit... |
Uploaded the src code of the sandbox + data sets, which I used for comparing hash functions, to the bug issue https://bugs.openjdk.org/browse/JDK-8293972 |
Tested with a smaller table size 4093, about half of 8191:
I think we could go smaller as you asked. My proposed hash function still is the best 2 out of 3 for stddev and best for all max chain and ranges. |
I'm going on vacation till July 5th. Thank you David and Thomas for feedback! |
… which lets us simplify the code
Once I started thinking of making the Please take another look and tell me what you think. After this fix goes in, I would like to continue work alongside the idea of less reliance on |
Sorry, I appreciate your work, but I'm not a big fan. I had similar thoughts when building NMTPreinit, but decided to implement realloc fully because it is just a lot safer and flexible. You bank on no reallocs ever happening in early initialization code. The amount of code that could run may surprise you. I know it surprised me. And those can be code paths very rarely taken. Realloc is such a basic function. Automatic C++ objects may realloc, for instance. I think UL has a ton of those, as has the Compiler. Or, you may crash and execute the crash handler - are we sure no realloc ever runs then? Your patch introduces a new dependency: now we will never be able to run code beforehand that uses realloc. And we will not be able to precede NMT argument parsing with another argument. That makes the code brittle. To be sure this works, we'd need at least add a guarantee - not an assert - at os::realloc(). Because the code paths taken can vary wildly, and you can never be sure we test everything. Bottomline, you trade in potential crashes (guarantees) and more inflexibility for a complexity reduction in code that makes no real problem. I'd rather just dumb down the test. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can someone remind me what the realloc issue is please. On the surface this looks like a big simplification.
Thanks.
In brief, NMTPreInit is a mechanism that allows us to initialize NMT together with all other VM subsystems after normal argument processing. Before NMTPreInit, NMT was controlled by the launcher. That was complex and brittle and prevented us from using NMT when hotspot was embedded into a different launcher. E.g. IntelliJ, Eclipse, our gtestlauncher... the latter prevented us from having NMT-related gtests. NMTPreInit needs to track early-state mallocs and reallocs to handle them correctly in case these blocks are handed into os::realloc or os::free after NMT initialization. Not doing so corrupts the C-heap because of malloc header mismatch. This PR wants to remove the capability to handle early state reallocs. It tries to achieve that by moving NMT argument processing to the start of CJVM. The assumption is that no C-heap reallocs happen before that. The problem with that assumption is that we cannot be sure. We have a lot of code that runs right from the point of loading the libjvm.so on, long before CJVM is invoked. In the case of static linking (gtestlauncher or whatever Google tries to do), right from the point of initial program load. This PR bets on no reallocs happening before CJVM. Should our bet go wrong, the C-heap will be corrupted if:
We had a similar discussion for https://bugs.openjdk.org/browse/JDK-8301811 and https://bugs.openjdk.org/browse/JDK-8299196 - see my comment there. For some reason, NMTPreInit seems to attract a lot of cleanup efforts. Dealing with malloc and free seems deceptively simple. So NMTPreInit seems unnecessarily complex. But the issue itself is complex. NMTPreInit is the simplest possible answer to that complex problem. Dumbing it down more incurs a much greater risk and/or a larger deal of inflexibility. We could get rid of it if we agree that no code must run, under any circumstances, before NMT initialization. No code because otherwise, the use of C-heap allocation is very difficult to police. And getting it wrong is very costly. That would mean forbidding global C++ objects and platform-dependent DLL initialization. That, in turn, should be first agreed upon by the hotspot community and then has to actually happen. And I'm not sure it would be the right decision since global C++ objects are a powerful tool, especially for testing and prototyping. My gut feeling is that I don't want to depend on this assumption and that even if we get this right now, an inability to run any form of complex code before CJVM ties us down in ways I don't yet foresee. |
One more note of why I hesitate to remove "realloc"-support: realloc is often used for buffers that grow on demand. These growth points are difficult to predict since they depend on dynamic buffer content. Therefore this code is very difficult to fully cover with tests. We have seen these problems with stringStream back when it was still using RA as backing store. All would go well in tests, but when the customer turned on logging in a particular way, and we logged a particular sequence of log strings, the reallocation would hit just right at the wrong spot - in this case, we would reallocate in a leaf function under a temporary ResourceMark - and then later crash in the caller when the RM was removed. For our case, it means if someone creates a global C++ object that uses e.g. a stringStream (now uses C-Heap) or a GrowableHeapArray, and uses that before CJVM, it may trigger reallocs before CJVM at a customer, but not in our tests. |
I really do not like the complexity of So I can add What about the other change, where we move |
Thank you David and Thomas for your feedback. I will do David's suggestions shortly, but Thomas |
I'm leaning towards adding back the realloc to I will tackle replacing |
Yeah, I noticed. IMHO There are tons of better targets for cleanups and reviewer cycles. Just from the top of my head the NMT backend, the virtual memory abstraction layer, UL, ReservedSpace, Hotspot arena handling... all these are areas are quite complex and often very messy and would benefit a lot from cleanup attention. NMTPreinit was a large improvement over the way things were. It has made little problems over the years, barring over-strict tests like this one (my own darn fault for writing them, ironically - untested code does not draw attention).
I like the change to a Mersenne prime, that's a good idea. But the initialization change not so much. It adds a lot of complexity and brittleness to NMT initialization, for not much benefit. Right now, NMT initialization is simple. Because it is time-independent. All VM subsystems are already started, and we are in normal VM land. We can log, we can use Thread::current, we can use ostream, we can crash and have useful hs_err files... That was a big simplification that NMTPreInit brought. With your patch, we are back at having two NMT initialization phases. And the early phase happens before UL or Thread::current() or ostream or reasonable hs_err file printing are available. And we don't get that much for it. I think instead our time would be well spent on tracking down those mallocs and removing them. IIRC A lot of that stuff comes from UL. And UL would benefit so much from simplification. It is so darn complex for a simple logging framework. I don't see why it needs to dynamically allocate at all before argument parsing, since it has no way of knowing yet what's what. I suspect whatever is generated there could just as well live as automatic variable, or inlined, or hard-coded as constants. For this test, I think you could make the failure threshold much higher. When I originally wrote the test, I intended to weed out stupid problems like "oh, hashcode is constant", or "oh, we only use every 16th table bucket because of ptr alignment". But in this case, the hash distribution is not that bad. |
OK, I will keep the hash table changes and revert anything else. I'm not giving up on tackling |
Removed anything even remotely controversial. We only change the heuristic function and tweak the hash table size to be 8191, i.e. Mersenne prime. Thank you David and Thomas for your patience! |
I'm going on vacation for a week this Monday, so if I get your OKs I will integrate after I come back, just in case this will affect anything and I need to be available... |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good!
@gerard-ziemski This change now passes all automated pre-integration checks. ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details. After integration, the commit message for the final commit will be:
You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed. At the time when this comment was updated there had been 33 new commits pushed to the
As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details. ➡️ To integrate this PR with the above commit message to the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm also fine with this very simple fix.
Thanks.
Thank you David and Thomas for the reviews! |
/integrate |
Going to push as commit 78f6799.
Your commit was automatically rebased without conflicts. |
@gerard-ziemski Pushed as commit 78f6799. 💡 You may see a message that your pull request was closed with unmerged commits. This can be safely ignored. |
We improve the hash function, by using Mersenne prime (2^x - 1) i.e. 8191 , as the table size, which when combined with "modulo" operation, provides a uniform bit distribution.
To help with testing this change to make sure we improve the hash function, I collected
NMTPreInitAllocation
data on 3 platforms: macOS, Linux and Windows, using 100 runs oftest/hotspot/jtreg/runtime/NMT/NMTInitializationTest.java
and computedstdev
as well as averagemax chain
:Note: the lower the number, the better.
Note: the
stdedev
andmax chain
are the most important numbers here.Tested manually with
jtreg -nr -va -jdk:./build/macosx-x86_64-server-fastdebug/images/jdk test/hotspot/jtreg/runtime/NMT
andgtestLauncher -jdk ./build/macosx-x86_64-server-fastdebug/images/jdk --gtest_output=xml:test_result.xml --gtest_catch_exceptions=0 --gtest_filter="NMT*:os*" -XX:NativeMemoryTracking=summary
andMach5 hs-tier1,2,3,4,5
Progress
Issue
Reviewers
Reviewing
Using
git
Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk.git pull/14607/head:pull/14607
$ git checkout pull/14607
Update a local copy of the PR:
$ git checkout pull/14607
$ git pull https://git.openjdk.org/jdk.git pull/14607/head
Using Skara CLI tools
Checkout this PR locally:
$ git pr checkout 14607
View PR using the GUI difftool:
$ git pr show -t 14607
Using diff file
Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/14607.diff
Webrev
Link to Webrev Comment