8288979: Improve CLDRConverter run time #9243
if (Objects.isNull(newMap)) {
newMap = new HashMap<>();
Map<String, Object> newMap = new HashMap<>(map);
Map<BundleEntryValue, BundleEntryValue> dedup = new HashMap<>(map.size());
LinkedHashMap could be used to retain the iteration order.
Or TreeMap if some deterministic order was desirable.
True. Which raises the question: do we need any arbitrary order? The original code used a hashmap too. It preserved the original order only when no duplicates were detected.
A stable order is useful when comparing between builds (by a human).
It also supports the goal of reproducible builds.
@naotoj What do you think?
IIUC, once this fix makes it to the repository, the build will be reproducible. Sorting the output would be a welcome enhancement (I compare the generated bundles manually from time to time), but it may be costly, so could it defeat the performance improvement?
BTW, this can utilize the new HashMap.newHashMap(), although I don't think resizing would be occurring in this case.
> once this fix makes it to the repository, the build will be reproducible
Yes, we always produce the same source code. Given the same order of modifications, a hashmap will produce the same iteration order.
> LinkedHashMap could be used to retain the iteration order.
Just added. The input maps were always sorted in some order (they were either LinkedHashMaps or TreeMaps), and now we preserve that order.
This means a lot of changes in the generated output files now, but hopefully in the future the changes will be much easier to review.
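To make the ordering discussion concrete, here is a minimal sketch (not code from this PR) contrasting the two suggestions: a LinkedHashMap iterates in insertion order, while a TreeMap iterates in sorted key order. The class name `OrderSketch`, the helper `keysInOrder`, and the sample keys are hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class OrderSketch {
    // Inserts the keys into the given map in argument order and returns
    // the order in which the map iterates over them.
    static List<String> keysInOrder(Map<String, String> target, String... keys) {
        for (String k : keys) {
            target.put(k, k);
        }
        return new ArrayList<>(target.keySet());
    }

    public static void main(String[] args) {
        // LinkedHashMap keeps insertion order: [zh, en, fr]
        System.out.println(keysInOrder(new LinkedHashMap<>(), "zh", "en", "fr"));
        // TreeMap sorts by natural key order: [en, fr, zh]
        System.out.println(keysInOrder(new TreeMap<>(), "zh", "en", "fr"));
    }
}
```

Either choice gives an order that does not depend on hash values; LinkedHashMap does so without the extra cost of keeping keys sorted.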
TIL: put / putIfAbsent on an existing entry does not change the iteration order of a LinkedHashMap unless accessOrder is true.
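The behaviour described above can be checked with a small sketch; the class name `AccessOrderSketch` and the sample keys are made up for illustration. It uses the three-argument LinkedHashMap constructor, whose last parameter selects access order.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class AccessOrderSketch {
    // Builds a map a, b, c, then calls put and putIfAbsent on keys that
    // already exist, and returns the resulting iteration order.
    static List<String> keysAfterReinsert(boolean accessOrder) {
        Map<String, Integer> m = new LinkedHashMap<>(16, 0.75f, accessOrder);
        m.put("a", 1);
        m.put("b", 2);
        m.put("c", 3);
        m.put("a", 10);         // existing key: value replaced
        m.putIfAbsent("b", 20); // existing key: value left unchanged
        return new ArrayList<>(m.keySet());
    }

    public static void main(String[] args) {
        // Insertion order (the default): re-inserting does not move entries.
        System.out.println(keysAfterReinsert(false)); // [a, b, c]
        // Access order: each touched entry moves to the end.
        System.out.println(keysAfterReinsert(true));  // [c, a, b]
    }
}
```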
> BTW, this can utilize the new HashMap.newHashMap(), although I don't think resizing would be occurring in this case.
It may occur if there are very few duplicates. Still, the performance impact of proper sizing is minimal here.
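As a rough illustration of the sizing point: `new HashMap<>(n)` treats `n` as the initial table capacity, so with the default load factor of 0.75 the table grows before `n` entries are stored, whereas `HashMap.newHashMap(n)` (Java 19 and later) sizes the table for `n` mappings. The sketch below avoids the newer API and computes a suitable capacity by hand; `SizingSketch`, `capacityFor`, and `presized` are hypothetical names, and the formula is an approximation of what a presizing factory has to do, not a copy of the JDK source.

```java
import java.util.HashMap;
import java.util.Map;

class SizingSketch {
    // Initial capacity that holds expectedMappings entries without a
    // resize, assuming the default load factor of 0.75.
    static int capacityFor(int expectedMappings) {
        return (int) Math.ceil(expectedMappings / 0.75);
    }

    // Hand-rolled stand-in for HashMap.newHashMap(expectedMappings).
    static <K, V> Map<K, V> presized(int expectedMappings) {
        return new HashMap<>(capacityFor(expectedMappings));
    }

    public static void main(String[] args) {
        // new HashMap<>(16) resizes once more than 12 entries are added,
        // so 16 expected entries need a larger initial capacity.
        System.out.println(capacityFor(12)); // 16
        Map<String, String> m = presized(12);
        m.put("key", "value");
        System.out.println(m.size());        // 1
    }
}
```

In the deduplication map, the number of entries only approaches `map.size()` when almost every value is distinct, which matches the remark that a resize may occur if there are very few duplicates.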
make/jdk/src/classes/build/tools/cldrconverter/ResourceBundleGenerator.java
Thanks for the refactoring, Daniel. Removing the leftover List was a bonus.
LGTM
/integrate
Going to push as commit c8cc94a.
Your commit was automatically rebased without conflicts.
@djelinski Pushed as commit c8cc94a. 💡 You may see a message that your pull request was closed with unmerged commits. This can be safely ignored.
This PR improves the performance of deduplication done by ResourceBundleGenerator.
The original implementation compared every pair of values, requiring O(n^2) time. The new implementation uses a HashMap to find duplicates, trading some extra memory for O(n) expected time. In practice, the time to generate the jdk.localedata files on my Linux VM dropped from 14 to 8 seconds.
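The idea can be sketched as follows. This is a simplified illustration, not the code in ResourceBundleGenerator: the class `DedupSketch`, the method `dedup`, and the use of plain lists as values are assumptions made for the example. It relies on the values having consistent `equals` and `hashCode`.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class DedupSketch {
    // Replaces each value that equals an earlier value with the first
    // instance seen, so equal values share one object. Each entry costs
    // one hash lookup instead of a comparison against every other value.
    static <K, V> Map<K, V> dedup(Map<K, V> input) {
        Map<V, V> canonical = new HashMap<>();
        Map<K, V> result = new LinkedHashMap<>(); // keeps the input order
        for (Map.Entry<K, V> e : input.entrySet()) {
            V value = e.getValue();
            V first = canonical.putIfAbsent(value, value);
            result.put(e.getKey(), first != null ? first : value);
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, List<String>> input = new LinkedHashMap<>();
        input.put("a", new ArrayList<>(List.of("x")));
        input.put("b", new ArrayList<>(List.of("x"))); // equal to "a"
        input.put("c", new ArrayList<>(List.of("y")));

        Map<String, List<String>> out = dedup(input);
        System.out.println(out.get("a") == out.get("b")); // true
        System.out.println(out.get("a") == out.get("c")); // false
    }
}
```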
The resulting files (under build/support/gensrc/java.base and jdk.localedata) have different contents; map iteration order depends on the insertion order, and the insertion order of the new implementation is different from the original.
The files generated before and after this change have the same size.
Reviewing
Using git
Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk pull/9243/head:pull/9243
$ git checkout pull/9243
Update a local copy of the PR:
$ git checkout pull/9243
$ git pull https://git.openjdk.org/jdk pull/9243/head
Using Skara CLI tools
Checkout this PR locally:
$ git pr checkout 9243
View PR using the GUI difftool:
$ git pr show -t 9243
Using diff file
Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/9243.diff