
8349713: [leyden] Memory map the cached code file #34

Closed
wants to merge 5 commits

Conversation

@shipilev (Member) commented Feb 10, 2025

Profiles of many applications show that reading the SC cache file at startup has a significant cost. In the JavacBenchApp example, loading ~25M of code takes about 30 ms. That is ~1 GB/sec, so it is I/O limited.

We should really mmap the SC cache file to alleviate these costs, and let the actual SC readers (separate threads) eat the cost of reading from the backing file.

I was not entirely sure COW for file mappings works correctly on Windows, so I excluded that platform.

Additional testing:

  • Linux x86_64 server fastdebug, runtime/cds

Progress

  • Change must not contain extraneous whitespace
  • Change must be properly reviewed (1 review required, with at least 1 Committer)

Issue

  • JDK-8349713: [leyden] Memory map the cached code file (Enhancement - P4)


Using git

Check out this PR locally:
$ git fetch https://git.openjdk.org/leyden.git pull/34/head:pull/34
$ git checkout pull/34

Update a local copy of the PR:
$ git checkout pull/34
$ git pull https://git.openjdk.org/leyden.git pull/34/head

Using Skara CLI tools

Check out this PR locally:
$ git pr checkout 34

View PR using the GUI difftool:
$ git pr show -t 34

Using diff file

Download this PR as a diff file:
https://git.openjdk.org/leyden/pull/34.diff

Using Webrev

Link to Webrev Comment


bridgekeeper bot commented Feb 10, 2025

👋 Welcome back shade! A progress list of the required criteria for merging this PR into premain will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

openjdk bot commented Feb 10, 2025

@shipilev This change now passes all automated pre-integration checks.

ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details.

After integration, the commit message for the final commit will be:

8349713: [leyden] Memory map the cached code file

Reviewed-by: kvn, iklam

You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed.

At the time this comment was updated, 1 new commit had been pushed to the premain branch:

  • 58e1381: 8349905: [leyden] Make SCCache depend on CDS build feature

Please see this link for an up-to-date comparison between the source branch of this pull request and the premain branch.
As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details.

As you do not have Committer status in this project an existing Committer must agree to sponsor your change. Possible candidates are the reviewers of this PR (@vnkozlov, @iklam) but any other Committer may sponsor as well.

➡️ To flag this PR as ready for integration with the above commit message, type /integrate in a new comment. (Afterwards, your sponsor types /sponsor in a new comment to perform the integration).

@shipilev (Member Author) commented Feb 10, 2025

It demonstrably improves performance on Linux, kicking the ~30 ms out of the critical startup path.

# Without mmap (legacy code)
Benchmark 1: build/linux-x86_64-server-release/images/jdk/bin/java -Xms64m -Xmx1g -XX:CacheDataStore=JavacBenchApp.cds -XX:+UseParallelGC -cp JavacBenchApp.jar -XX:-MmapCachedCode JavacBenchApp 50 1
  Time (mean ± σ):     408.0 ms ±   2.5 ms    [User: 1231.7 ms, System: 196.2 ms]
  Range (min … max):   404.6 ms … 412.8 ms    10 runs

# With mmap
Benchmark 1: build/linux-x86_64-server-release/images/jdk/bin/java -Xms64m -Xmx1g -XX:CacheDataStore=JavacBenchApp.cds -XX:+UseParallelGC -cp JavacBenchApp.jar JavacBenchApp 50 1
  Time (mean ± σ):     382.1 ms ±   2.6 ms    [User: 1229.9 ms, System: 181.6 ms]
  Range (min … max):   378.9 ms … 388.0 ms    10 runs

@openjdk openjdk bot added the rfr Pull request is ready for review label Feb 10, 2025
mlbridge bot commented Feb 10, 2025

Webrevs

franz1981 commented Feb 10, 2025

@shipilev

Do the numbers still hold with

sync; echo 3 > /proc/sys/vm/drop_caches

before the benchmark (single-shot) run?
With big servers, unless there is some expected sharing (i.e. multiple processes using the same archive) to boost it further, I would expect direct I/O to benefit loading the archive by saving the extra copy (into the OS page cache) that is otherwise required before the data can be used.
The other concern re mmap is the munmap cost, which on the kernel side relies (IIRC) on a single (!) lock over the virtual memory mappings, and usually slows down process termination.

@shipilev (Member Author) commented Feb 10, 2025

Do the numbers still hold with sync; echo 3 > /proc/sys/vm/drop_caches

Yes, they do, and there is a good reason why: without caches, the hit on the critical startup path is even worse, even with a modern SSD. And AFAIU, file-backed mmap plays well with I/O caches too. Observe:

# Drop caches, read
  Time (mean ± σ):     521.0 ms ±   6.1 ms    [User: 1297.3 ms, System: 259.0 ms]
  Range (min … max):   514.6 ms … 532.2 ms    10 runs

# Drop caches, mmap
  Time (mean ± σ):     479.9 ms ±   2.6 ms    [User: 1148.7 ms, System: 223.4 ms] ; <--- ~40ms faster
  Range (min … max):   476.4 ms … 484.3 ms    10 runs

# Cached, read
  Time (mean ± σ):     413.0 ms ±   3.5 ms    [User: 1267.0 ms, System: 207.9 ms]
  Range (min … max):   408.6 ms … 417.7 ms    10 runs

# Cached, mmap
  Time (mean ± σ):     386.0 ms ±   4.7 ms    [User: 1258.5 ms, System: 183.3 ms]  ; <--- ~30ms faster
  Range (min … max):   378.7 ms … 393.7 ms    10 runs

The other concern re mmap is the munmap cost, which on the kernel side relies (IIRC) on a single (!) lock over the virtual memory mappings, and usually slows down process termination.

My hyperfine tests include that cost, as they are end-to-end invocation tests. (I remember this from 1BRC times.)

@shipilev (Member Author) commented

Current GHA failures should be fixed by #35.

@vnkozlov (Collaborator) left a comment

Good, until, as we discussed, the cached code (and its data) becomes part of the CDS archive file.

How does CDS handle mmap on Windows?

@openjdk openjdk bot added the ready Pull request is ready to be integrated label Feb 10, 2025
@shipilev (Member Author) commented Feb 10, 2025

Good, until, as we discussed, the cached code (and its data) becomes part of the CDS archive file.

Yes, that would make this whole thing a no-op. CDS already mmap-s the archive.

How does CDS handle mmap on Windows?

Windows can still memory-map the file through its native APIs. But I vaguely recollect some corner cases that @iklam fights every so often in CDS. Something with remapping, perhaps? See MetaspaceShared::use_windows_memory_mapping, for example. I don't think it is wise to spend time dealing with those corner cases in the current Leyden prototype; let it become part of the whole CDS archive first.

@iklam (Member) commented Feb 10, 2025

Good, until, as we discussed, the cached code (and its data) becomes part of the CDS archive file.

Yes, that would make this whole thing a no-op. CDS already mmap-s the archive.

How does CDS handle mmap on Windows?

Windows can still memory-map the file through its native APIs. But I vaguely recollect some corner cases that @iklam fights every so often in CDS. Something with remapping, perhaps? See MetaspaceShared::use_windows_memory_mapping, for example. I don't think it is wise to spend time dealing with those corner cases in the current Leyden prototype; let it become part of the whole CDS archive first.

With the Windows APIs used by HotSpot today (VirtualAlloc and MapViewOfFile), we can't map a file into a reserved region. If we just want to mmap the SCCache into non-reserved, random location that's picked by the OS, we can already do that today with HotSpot.

In some cases, CDS wants to mmap into reserved regions. On Windows, we end up not mapping with MapViewOfFile, but simply reading the entire CDS file into reserved memory.

I think with the new MapViewOfFile3 API, we can map into a reserved region. This would be useful if, for example, we want the SCCache to sit immediately next to CDS, so that the AOT code can use relative addressing for metadata pointers (InstanceKlass*, etc.).

@shipilev (Member Author) commented

With the Windows APIs used by HotSpot today (VirtualAlloc and MapViewOfFile), we can't map a file into a reserved region. If we just want to mmap the SCCache into non-reserved, random location that's picked by the OS, we can already do that today with HotSpot.

In some cases, CDS wants to mmap into reserved regions. On Windows, we end up not mapping with MapViewOfFile, but simply reading the entire CDS file into reserved memory.

OK, great. AFAICS, this technically allows us to set MmapCachedCode = true on Windows too. Non-Windows platforms currently rely on MAP_PRIVATE to get COW; I believe FILE_MAP_COPY in os::pd_map_memory gives us the same on Windows.

Still, I think it is a bit saner to keep mmap-ing only on Linux, if only to keep the non-mmap path tested and to avoid debugging Windows mmap issues in the Leyden prototype. But I don't feel strongly about this. Opinions welcome!

@shipilev (Member Author) commented

Any other opinions about this? I would like to integrate this to reap some startup benefits :)

@iklam (Member) left a comment

I think this looks fine. The change is small, so we can update or remove it when migrating the SCCache into CDS.

@vnkozlov what do you think?

@vnkozlov (Collaborator) commented

I think this looks fine. The change is small, so we can update or remove it when migrating the SCCache into CDS.

@vnkozlov what do you think?

Yes, that is the plan.

@shipilev (Member Author) commented Feb 12, 2025

I think this PR will be superseded by @ashu-mehra's #39, but we can still do it ahead of #39.

@ashu-mehra (Collaborator) commented

@shipilev I am fine with merging this as is.

This reverts commit 2f884d8.
@shipilev (Member Author) commented

All right, the flag stays for a while. Once Ashu moves this whole thing to CDS, we can remove the flag. Meanwhile, we only do mmap on Linux. I think we are ready to integrate this.

@shipilev (Member Author) commented

/integrate

@openjdk openjdk bot added the sponsor Pull request is ready to be sponsored label Feb 12, 2025
openjdk bot commented Feb 12, 2025

@shipilev
Your change (at version f2364f2) is now ready to be sponsored by a Committer.

@vnkozlov (Collaborator) commented

/sponsor

openjdk bot commented Feb 12, 2025

Going to push as commit 79f7e61.
Since your change was applied, 1 commit has been pushed to the premain branch:

  • 58e1381: 8349905: [leyden] Make SCCache depend on CDS build feature

Your commit was automatically rebased without conflicts.

@openjdk openjdk bot added the integrated Pull request has been integrated label Feb 12, 2025
@openjdk openjdk bot closed this Feb 12, 2025
@openjdk openjdk bot removed ready Pull request is ready to be integrated rfr Pull request is ready for review sponsor Pull request is ready to be sponsored labels Feb 12, 2025
openjdk bot commented Feb 12, 2025

@vnkozlov @shipilev Pushed as commit 79f7e61.

💡 You may see a message that your pull request was closed with unmerged commits. This can be safely ignored.
