Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

JDK-8289477: Memory corruption with CPU_ALLOC, CPU_FREE on muslc #9328

Conversation

tstuefe
Copy link
Member

@tstuefe tstuefe commented Jun 29, 2022

In os::Linux::active_processor_count(), we use the CPU_xxx macros to manage sets of CPU information.

muslc defines those macros to call calloc(3) and free(3):

#define CPU_ALLOC(n) ((cpu_set_t *)calloc(1,CPU_ALLOC_SIZE(n)))
#define CPU_FREE(set) free(set)

whereas glibc uses intermediate functions:

#define __CPU_ALLOC(count) __sched_cpualloc (count)
#define __CPU_FREE(cpuset) __sched_cpufree (cpuset)

which in the end also takes from C-heap, but those calls are not inlined.

So, on muslc we call calloc() and free(). Call happens inside the os::Linux namespace, free() resolves to os::free(). We have no wrapper in os for calloc though, so calloc() calls into muslc right away.

That means we have raw ::malloc() -> os::free(), which is unbalanced. Raw ::malloc() does not write the header os::free() expects. If NMT is on, we assert now, because NMT does not find its header in os::free().

This can be very easily reproduced by starting an Alpine VM with NMT on (or, a debug VM) and -XX:+UnlockDiagnosticVMOptions -XX:+UseCpuAllocPath. This causes a NMT fence alert right away:

NMT Block at 0x00007fc5a35db9f0, corruption at: 0x00007fc5a35db9f0: 
0x00007fc5a35db970:   00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x00007fc5a35db980:   00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x00007fc5a35db990:   f8 a1 c8 1b ea 55 00 00 00 00 00 00 00 c0 00 00
0x00007fc5a35db9a0:   00 a4 c8 1b ea 55 00 00 01 00 00 00 00 c0 00 00
0x00007fc5a35db9b0:   d8 a3 c8 1b ea 55 00 00 1d 00 00 00 00 a0 00 00
0x00007fc5a35db9c0:   2d 63 70 00 00 00 00 00 08 00 00 00 00 61 01 00
0x00007fc5a35db9d0:   2d 76 65 72 73 69 6f 6e 00 00 00 00 00 82 02 00
0x00007fc5a35db9e0:   2d 73 65 72 76 65 72 00 00 00 00 00 00 83 03 00
0x00007fc5a35db9f0:   2d 63 6c 69 65 6e 74 00 00 00 00 00 00 84 04 00
0x00007fc5a35dba00:   ff ff 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x00007fc5a35dba10:   00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x00007fc5a35dba20:   00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x00007fc5a35dba30:   00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x00007fc5a35dba40:   00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x00007fc5a35dba50:   00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x00007fc5a35dba60:   00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 
# To suppress the following error report, specify this argument
# after -XX: or in .hotspotrc:  SuppressErrorAt=/mallocTracker.cpp:151
#
# A fatal error has been detected by the Java Runtime Environment:
#
#  Internal Error (/home/ubuntu/client_home/workspace/build-user-branch-linux_alpine_x86_64/SapMachine/src/hotspot/share/services/mallocTracker.cpp:151), pid=219496, tid=219512

The position of the musl devs is that "calloc" and "free" are reserved words in C, and should not be used [1]. I think they are right. The way we reuse known C- and Posix symbol names in the os namespace has bitten me in the past in similar cases.


The fix is minimally invasive for easy backporting. I just move the content of os::Linux::active_processor_count() into its own local function, outside the os namespace. That way, CPU_FREE cannot pick up os::free() accidentally, and the error is fixed.

I really would like a more thorough solution though, renaming all the potential conflict candidates, but leave that for a follow-up RFE.

[1] https://www.openwall.com/lists/musl/2022/06/29/3


Progress

  • Change must be properly reviewed (1 review required, with at least 1 Reviewer)
  • Change must not contain extraneous whitespace
  • Commit message must refer to an issue

Issue

  • JDK-8289477: Memory corruption with CPU_ALLOC, CPU_FREE on muslc

Reviewers

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk pull/9328/head:pull/9328
$ git checkout pull/9328

Update a local copy of the PR:
$ git checkout pull/9328
$ git pull https://git.openjdk.org/jdk pull/9328/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 9328

View PR using the GUI difftool:
$ git pr show -t 9328

Using diff file

Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/9328.diff

@bridgekeeper
Copy link

bridgekeeper bot commented Jun 29, 2022

👋 Welcome back stuefe! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

@openjdk
Copy link

openjdk bot commented Jun 29, 2022

@tstuefe The following label will be automatically applied to this pull request:

  • hotspot-runtime

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing list. If you would like to change these labels, use the /label pull request command.

@openjdk openjdk bot added the hotspot-runtime hotspot-runtime-dev@openjdk.org label Jun 29, 2022
@tstuefe tstuefe force-pushed the JDK-8289477-mem-corruption-on-alpine-cpu_alloc branch from 9d9125e to b022763 Compare June 29, 2022 17:11
@tstuefe tstuefe marked this pull request as ready for review June 29, 2022 17:12
@openjdk openjdk bot added the rfr Pull request is ready for review label Jun 29, 2022
@mlbridge
Copy link

mlbridge bot commented Jun 29, 2022

Webrevs

@tstuefe tstuefe changed the title JDK-8289477: Memory corruption caused by CPU_ALLOC, CPU_FREE on muslc JDK-8289477: Memory corruption with CPU_ALLOC, CPU_FREE on muslc Jun 29, 2022
Copy link
Member

@dholmes-ora dholmes-ora left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems to rely on the compiler not recognising that the single use of the new function could be inlined into the original.

@tstuefe
Copy link
Member Author

tstuefe commented Jun 30, 2022

This seems to rely on the compiler not recognising that the single use of the new function could be inlined into the original.

No. Using free() (without global scope "::") inside os::Linux:: will resolve the the nearest outer scope os::free(). Using free() inside a global function will not do that since free() is already in global scope, so gets resolved to ::free(). This is not dependent on any inlining decisions.

Copy link
Member

@dholmes-ora dholmes-ora left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay so the "binding" of free() will happen before any potential inlining could occur.

It is a nice simple workaround.

Thanks.

@openjdk
Copy link

openjdk bot commented Jun 30, 2022

@tstuefe This change now passes all automated pre-integration checks.

ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details.

After integration, the commit message for the final commit will be:

8289477: Memory corruption with CPU_ALLOC, CPU_FREE on muslc

Reviewed-by: dholmes, clanger

You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed.

At the time when this comment was updated there had been 8 new commits pushed to the master branch:

  • 31e50f2: 8286104: use aggressive liveness for unstable_if traps
  • dddd4e7: 8289291: HttpServer sets incorrect value for "max" parameter in Keep-Alive header value
  • 048bffa: Merge
  • cf71544: 8289252: Recommend Locale.of() method instead of the constructor
  • 5708974: 8288596: Random:from() adapter does not delegate to supplied generator in all cases
  • dbc6e11: 8289399: Update SourceVersion to use snippets
  • 15efb2b: 8289238: Refactoring changes to PassFailJFrame Test Framework
  • b6bd190: 8288985: P11TlsKeyMaterialGenerator should work with ChaCha20-Poly1305

Please see this link for an up-to-date comparison between the source branch of this pull request and the master branch.
As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details.

➡️ To integrate this PR with the above commit message to the master branch, type /integrate in a new comment.

@openjdk openjdk bot added the ready Pull request is ready to be integrated label Jun 30, 2022
@tstuefe
Copy link
Member Author

tstuefe commented Jun 30, 2022

Okay so the "binding" of free() will happen before any potential inlining could occur.

It is a nice simple workaround.

Thanks.

Thank you!

Copy link
Contributor

@RealCLanger RealCLanger left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks for fixing.

@tstuefe
Copy link
Member Author

tstuefe commented Jun 30, 2022

Thanks @dholmes-ora and @RealCLanger.

/integrate

@openjdk
Copy link

openjdk bot commented Jun 30, 2022

Going to push as commit da6d1fc.
Since your change was applied there have been 8 commits pushed to the master branch:

  • 31e50f2: 8286104: use aggressive liveness for unstable_if traps
  • dddd4e7: 8289291: HttpServer sets incorrect value for "max" parameter in Keep-Alive header value
  • 048bffa: Merge
  • cf71544: 8289252: Recommend Locale.of() method instead of the constructor
  • 5708974: 8288596: Random:from() adapter does not delegate to supplied generator in all cases
  • dbc6e11: 8289399: Update SourceVersion to use snippets
  • 15efb2b: 8289238: Refactoring changes to PassFailJFrame Test Framework
  • b6bd190: 8288985: P11TlsKeyMaterialGenerator should work with ChaCha20-Poly1305

Your commit was automatically rebased without conflicts.

@openjdk openjdk bot added the integrated Pull request has been integrated label Jun 30, 2022
@openjdk openjdk bot closed this Jun 30, 2022
@openjdk openjdk bot removed ready Pull request is ready to be integrated rfr Pull request is ready for review labels Jun 30, 2022
@openjdk
Copy link

openjdk bot commented Jun 30, 2022

@tstuefe Pushed as commit da6d1fc.

💡 You may see a message that your pull request was closed with unmerged commits. This can be safely ignored.

@openjdk
Copy link

openjdk bot commented Jul 6, 2022

@tstuefe Unknown command backport - for a list of valid commands use /help.

@openjdk
Copy link

openjdk bot commented Jul 6, 2022

@tstuefe Unknown command backport - for a list of valid commands use /help.

@openjdk
Copy link

openjdk bot commented Jul 6, 2022

@tstuefe Unknown command backport - for a list of valid commands use /help.

@RealCLanger
Copy link
Contributor

Hey @tstuefe, you need to do this on the commit 😉

@tstuefe
Copy link
Member Author

tstuefe commented Jul 6, 2022

Hey @tstuefe, you need to do this on the commit wink

... I need more coffee :-(

@tstuefe tstuefe deleted the JDK-8289477-mem-corruption-on-alpine-cpu_alloc branch August 24, 2023 08:36
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
hotspot-runtime hotspot-runtime-dev@openjdk.org integrated Pull request has been integrated
3 participants