New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
8289834: Add SBCS and DBCS Only EBCDIC charsets #9399
Conversation
👋 Welcome back itakiguchi! A progress list of the required criteria for merging this PR into |
/label remove nio |
@takiguc |
/label add core-libs |
@takiguc |
@takiguc The |
Discussions are available on : |
Yes, I think this need discussion on whether the JDK really needs to keep including and adding more EBCDIC charsets. I understand they can be useful for someone using JDBC to connect to a database on z/OS but this scenario would work equally well if the EBCDIC charsets were deployed on the class path or module path. Do you know if the icu4j project is still alive? I've often wondered why there wasn't more use of the provider mechanism. |
/label -build |
@magicus |
I second Alan's comment here. I am not sure we could justify those legacy less common EBCDIC charsets making into the |
Mailing list message from Bernd Eckenfels on core-libs-dev: And also there is no reason why db drivers or host connectors should not ship their own charset support (Oracle JDBC for example had nls_charset addons. My employer also ship a custom EBCDIC encoding which includes some compatibility hacks, and that took some effort to adopt it to the missing ext mechanism). Having said that, with JPMS a ?legacy ebcdic? encoding module would be possible while still being optional. Maybe in the future a mechanism for modules which can be added (instead of removed) from standard distribution would make that nicer? Is there a performance restriction for charset if they are not part of a platform module (optimized string access)? Gruss -- On Wed, 6 Jul 2022 16:18:08 GMT, Ichiroh Takiguchi <itakiguchi at openjdk.org> wrote:
Yes, I think this need discussion on whether the JDK really needs to keep including and adding more EBCDIC charsets. I understand they can be useful for someone using JDBC to connect to a database on z/OS but this scenario would work equally well if the EBCDIC charsets were deployed on the class path or module path. Do you know if the icu4j project is still alive? I've often wondered why there wasn't more use of the provider mechanism. ------------- PR: https://git.openjdk.org/jdk/pull/9399 |
@takiguc This pull request has been inactive for more than 4 weeks and will be automatically closed if another 4 weeks passes without any activity. To avoid this, simply add a new comment to the pull request. Feel free to ask for assistance if you need help with progressing this pull request towards integration! |
Hello @AlanBateman . As you know |
If they have ASCII compatible regions then that may be so but I haven't see any performance data published on that. Do you know if any experiments that have deployed a CharsetProvider for the EBCDIC charsets and compared the performance with the charsets that in the JDK? There may be merit in exploring adding base abstracts implementations of CharsetEncoder/CharsetDecoder to java.nio.charsets.spi to support single and double byte charsets to see how such base implementations might look, how they would help performance, and if there are any security downsides. |
Hello @AlanBateman . I created attached small test program, I'm not sure it's good or not
Following test result is just for my test environment
Use following options, like I used jdk-20 b12
Notes:
Actually, I'm confused by this result. Anyway, please evaluate about this result. |
My initial reaction is one of relief that the icu4j provider can be used with current JDK builds. This means there is an option should we decide to stop adding more EBCDIC charsets to the JDK. The test uses IBM-1047 and I can't tell if the icu4j provider is used or not. Charset doesn't define a provider method but I think would be useful to print cs.getClass() or cs.getClass().getModule() so we know which Charset implementation is used. Also I think any discussion on performance would be better served with a JMH benchmark rather than a standalone test. |
@takiguc This pull request has been inactive for more than 4 weeks and will be automatically closed if another 4 weeks passes without any activity. To avoid this, simply add a new comment to the pull request. Feel free to ask for assistance if you need help with progressing this pull request towards integration! |
Hello @AlanBateman . I created Charset SPI JAR Test code:
All test related files are in JDK-8289834. Test results are as follows on RHEL8.6 x86_64 (Intel Core i7 3520M) :
IBM1047 is 2.6 times faster than the SPI version on JDK20. |
I don't think that is a workable option. Maybe you could do more performance analysis on the providers that are using the SPI to see if there are any secure primitives that would make sense to expose in java.nio.charset.spi? I'm hesitate to suggest adding base classes to java.nio.charset.spi but that may be something that could be explored too. |
@takiguc This pull request has been inactive for more than 4 weeks and will be automatically closed if another 4 weeks passes without any activity. To avoid this, simply add a new comment to the pull request. Feel free to ask for assistance if you need help with progressing this pull request towards integration! |
@takiguc This pull request has been inactive for more than 8 weeks and will now be automatically closed. If you would like to continue working on this pull request in the future, feel free to reopen it! This can be done using the |
I'm still working on this one. |
@takiguc This pull request is now open |
@takiguc This pull request has been inactive for more than 4 weeks and will be automatically closed if another 4 weeks passes without any activity. To avoid this, simply add a new comment to the pull request. Feel free to ask for assistance if you need help with progressing this pull request towards integration! |
@takiguc This pull request has been inactive for more than 8 weeks and will now be automatically closed. If you would like to continue working on this pull request in the future, feel free to reopen it! This can be done using the |
OpenJDK supports "Japanese EBCDIC - Katakana" and "Korean EBCDIC" SBCS and DBCS Only charsets.
But OpenJDK does not supports some of "Japanese EBCDIC - English" / "Simplified Chinese EBCDIC" / "Traditional Chinese EBCDIC" SBCS and DBCS Only charsets.
I'd like to request Cp1027/Cp835/Cp836/Cp837 for consistency
*1: Cp037 compatible
Progress
Issue
Reviewing
Using
git
Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk pull/9399/head:pull/9399
$ git checkout pull/9399
Update a local copy of the PR:
$ git checkout pull/9399
$ git pull https://git.openjdk.org/jdk pull/9399/head
Using Skara CLI tools
Checkout this PR locally:
$ git pr checkout 9399
View PR using the GUI difftool:
$ git pr show -t 9399
Using diff file
Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/9399.diff