Skip to content

8343110: Add getChars(int, int, char[], int) to CharSequence and CharBuffer #21730

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 12 commits into
base: master
Choose a base branch
from

Conversation

mkarg
Copy link
Contributor

@mkarg mkarg commented Oct 26, 2024

This Pull Request proposes an implementation for JDK-8343110: Adding the new method public void getChars(int srcBegin, int srcEnd, char[] dst, int dstBegin) to the CharSequence interface, providing a bulk-read facility including a default implementation iterating over charAt(int).

In addition, this Pull Request proposes to replace the implementation of Reader.of(CharSequence).read(char[] cbuf, int off, int len) to invoke CharSequence.getChars(next, next + n, cbuf, off) instead of utilizing pattern matching for switch. Also, this PR proposes to implement CharBuffer.getChars(int srcBegin, int srcEnd, char[] dst, int dstBegin) as an alias for CharBuffer.get(srcBegin, dst, dstBegin, srcEnd - srcBegin).

To ensure quality...

  • ...the method signature and JavaDocs are adapted from AbstractStringBuilder.getChars(...).
  • ...this PR relies upon the existing tests for Reader.of(CharSequence), as these provide sufficient coverage of all changes introduced by this PR.

Progress

  • Change must not contain extraneous whitespace
  • Commit message must refer to an issue
  • Change requires CSR request JDK-8343111 to be approved
  • Change must be properly reviewed (2 reviews required, with at least 2 Reviewers)

Issues

  • JDK-8343110: Add getChars(int, int, char[], int) to CharSequence and CharBuffer (Enhancement - P4)
  • JDK-8343111: Add getChars(int, int, char[], int) to CharSequence and CharBuffer (CSR)

Reviewers

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk.git pull/21730/head:pull/21730
$ git checkout pull/21730

Update a local copy of the PR:
$ git checkout pull/21730
$ git pull https://git.openjdk.org/jdk.git pull/21730/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 21730

View PR using the GUI difftool:
$ git pr show -t 21730

Using diff file

Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/21730.diff

Using Webrev

Link to Webrev Comment

@bridgekeeper
Copy link

bridgekeeper bot commented Oct 26, 2024

👋 Welcome back mkarg! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

@openjdk
Copy link

openjdk bot commented Oct 26, 2024

❗ This change is not yet ready to be integrated.
See the Progress checklist in the description for automated requirements.

@openjdk
Copy link

openjdk bot commented Oct 26, 2024

@mkarg The following labels will be automatically applied to this pull request:

  • core-libs
  • nio

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing lists. If you would like to change these labels, use the /label pull request command.

@openjdk openjdk bot added nio nio-dev@openjdk.org core-libs core-libs-dev@openjdk.org labels Oct 26, 2024
@mkarg mkarg mentioned this pull request Oct 26, 2024
4 tasks
@mkarg
Copy link
Contributor Author

mkarg commented Oct 26, 2024

/csr

@openjdk openjdk bot added the csr Pull request needs approved CSR before integration label Oct 26, 2024
@openjdk
Copy link

openjdk bot commented Oct 26, 2024

@mkarg has indicated that a compatibility and specification (CSR) request is needed for this pull request.

@mkarg please create a CSR request for issue JDK-8343110 with the correct fix version. This pull request cannot be integrated until the CSR request is approved.

@mkarg
Copy link
Contributor Author

mkarg commented Oct 26, 2024

FYI @liach @bokken @eirbjo @robtimus @parttimenerd

@liach
Copy link
Member

liach commented Oct 26, 2024

Sorry for belated mail response, but I think we should design the API to not take source start/end. I think JIT can escape analysis the new String in practice.

@mkarg
Copy link
Contributor Author

mkarg commented Oct 26, 2024

Sorry for belated mail response, but I think we should design the API to not take source start/end. I think JIT can escape analysis the new String in practice.

Chen, thank you for chiming in! There is nothing to be sorry, I was just posting drafts so far! 🙂

Regarding your actual question, I do understand your idea and while originally I had the same in mind (it really is appealing!), I came up with a draft using the original String.getChars() signature instead, due to the following drawbacks:

  • There might exist (possibly lotsof) CharSequence.getChars(int, int, char[], int) implementations already, as this problem (and the idea how to solve it) is anything but new. At least such implementations are String, StringBuilder and StringBuffer. If we come up with a different signature, then none of these already existing performance boosters will get used by Reader.of(CharSequence) automatically - at least until they come up with alias methods. Effectively this leads to (possibly lots) of alias methods. At least it leads to alias methods in String, StringBuilder, StringBuffer and CharBuffer. In contrast, when keeping the signature copied from String.getChars, chances are good that (possibly lots of) implementations will instantly be supported by Reader.of(CharSequence) without alias methods. At least, String, StringBuilder and StringBuffer will be.
  • Since decades people are now very used to StringBuilder.getChars(int, int, char[], int), so (possibly a lot of) people might simply expect us to come up with that lengthy signature. These people might be rather confused (if not to say frustrated) when we now force them to write an intermediate subSequence(int, int) for something that was "such simple" before.
  • Custom implementations of CharSequence.subSequence could come up with the (performance-wise "bad") idea of creating copies instead of views. At least it seems like AbstractStringBuilder is doing that, so chances are "good" that custom libs will do that, too. For example, because they need it for safety. Or possibly, because they have a technical reason that enforces a copy. That would (possibly massively, depending on the actual class) spoil the idea of performance-boosting this PR is actually all about.

@liach
Copy link
Member

liach commented Oct 26, 2024

Hi Markus, I recommend to continue the discussion on the mailing list instead of on GitHub. Note that GitHub pull requests are only forwarded to the mailing list when it's ready for review, and this proposal is way too early. (And the CSR is too early, too: we should agree on an API surface, such as exception contracts, first)

Sorry that no one has responded to you yet, but many engineers are busy with other areas, such as pushing the JEPs (as you see, we have a huge number of candidate/submitted JEPs right now, and JDK 24 RDP1 is just 6 weeks away). They are monitoring your core-libs thread, stay assured!

@mkarg
Copy link
Contributor Author

mkarg commented Oct 26, 2024

@liach Agreed with all you say, but actually: I did not expect any response - in particular on a weekend!

I do not know why apparently people started to review the draft documents. I just filed it for no other purpose than taking note of its existence. 🤔 I thought it is pretty clear to everybody that draft means "do not review". At least this is how I used draft documents in the past 30 years. 🙂

So it seems there was a misunderstanding here regarding my drafts. My intention ever was and still is to first finish the discussion about alternatives (A)...(E) on the core-lib mailing list first. My PR and CSR drafts are intentionally in draft state, which means, they are not open for review yet (there is no rfr label) and solely serve as my public storage for ideas-in-progress in lieu of out-of-band media. The detail discussion you started regarding details of the PR was a bit too early IMHO (and surprised me, and actually your duplicate question in the PR confused me), as it is not even decided that we actually go with proposal A, so why discussing details of exactly that already (my conclusion was, you like to prepare it for the case it becomes the finalist of the mailing list discussion)? Maybe I misunderstood your intent, and you actually wanted to propose alternative (F), then sorry for me not seeing that in the first place.

@bridgekeeper
Copy link

bridgekeeper bot commented Dec 21, 2024

@mkarg This pull request has been inactive for more than 8 weeks and will be automatically closed if another 8 weeks passes without any activity. To avoid this, simply add a new comment to the pull request. Feel free to ask for assistance if you need help with progressing this pull request towards integration!

@mkarg
Copy link
Contributor Author

mkarg commented Dec 22, 2024

Kindly asking anybody's contribution to the ongoing discussion on the core-devs mailing list. Thanks you.

@openjdk
Copy link

openjdk bot commented Feb 9, 2025

@mkarg this pull request can not be integrated into master due to one or more merge conflicts. To resolve these merge conflicts and update this pull request you can run the following commands in the local repository for your personal fork:

git checkout getchars
git fetch https://git.openjdk.org/jdk.git master
git merge FETCH_HEAD
# resolve conflicts and follow the instructions given by git merge
git commit -m "Merge master"
git push

@openjdk openjdk bot added the merge-conflict Pull request has merge conflict with target branch label Feb 9, 2025
@openjdk openjdk bot removed the merge-conflict Pull request has merge conflict with target branch label Feb 9, 2025
@mkarg mkarg marked this pull request as ready for review February 9, 2025 18:37
@openjdk openjdk bot added the rfr Pull request is ready for review label Feb 9, 2025
@mlbridge
Copy link

mlbridge bot commented Feb 9, 2025

@bridgekeeper
Copy link

bridgekeeper bot commented Mar 9, 2025

@mkarg This pull request has been inactive for more than 4 weeks and will be automatically closed if another 4 weeks passes without any activity. To avoid this, simply add a new comment to the pull request. Feel free to ask for assistance if you need help with progressing this pull request towards integration!

@mkarg
Copy link
Contributor Author

mkarg commented Mar 15, 2025

Regarding your actual question, I do understand your idea and while originally I had the same in mind (it really is appealing!), I came up with a draft using the original String.getChars() signature instead, due to the following drawbacks:

  • There might exist (possibly lotsof) CharSequence.getChars(int, int, char[], int) implementations already, as this problem (and the idea how to solve it) is anything but new. At least such implementations are String, StringBuilder and StringBuffer. If we come up with a different signature, then none of these already existing performance boosters will get used by Reader.of(CharSequence) automatically - at least until they come up with alias methods. Effectively this leads to (possibly lots) of alias methods. At least it leads to alias methods in String, StringBuilder, StringBuffer and CharBuffer. In contrast, when keeping the signature copied from String.getChars, chances are good that (possibly lots of) implementations will instantly be supported by Reader.of(CharSequence) without alias methods. At least, String, StringBuilder and StringBuffer will be.

  • Since decades people are now very used to StringBuilder.getChars(int, int, char[], int), so (possibly a lot of) people might simply expect us to come up with that lengthy signature. These people might be rather confused (if not to say frustrated) when we now force them to write an intermediate subSequence(int, int) for something that was "such simple" before.

  • Custom implementations of CharSequence.subSequence could come up with the (performance-wise "bad") idea of creating copies instead of views. At least it seems like AbstractStringBuilder is doing that, so chances are "good" that custom libs will do that, too. For example, because they need it for safety. Or possibly, because they have a technical reason that enforces a copy. That would (possibly massively, depending on the actual class) spoil the idea of performance-boosting this PR is actually all about.

Kindly asking for more comments! Something I am missing? Some good reason to not adopt my proposal?

@liach
Copy link
Member

liach commented Mar 25, 2025

/reviewers 2 reviewer

@openjdk
Copy link

openjdk bot commented Mar 25, 2025

@liach
The total number of required reviews for this PR (including the jcheck configuration and the last /reviewers command) is now set to 2 (with at least 2 Reviewers).

…E behavior - when an IOOBE is thrown, some characters may be already transferred (this is important for concurrent char sequences)'
@mkarg
Copy link
Contributor Author

mkarg commented Apr 4, 2025

All requested changes are included in this PR since last week. Kindly asking to re-review / mark as reviewed, so we can proceed with the needed CSR. Thanks.

@AlanBateman
Copy link
Contributor

I think the API docs in the latest draft looks okay. It mildly bothers me a bit is that getChars is JDK 1.0 API but the trade off with doing a new API is that it would need to be implemented by String and SB so I think the proposal on the table is okay.

@mkarg
Copy link
Contributor Author

mkarg commented Apr 8, 2025

I think the API docs in the latest draft looks okay. It mildly bothers me a bit is that getChars is JDK 1.0 API but the trade off with doing a new API is that it would need to be implemented by String and SB so I think the proposal on the table is okay.

Kindly asking to mark this PR as reviewed, so I will go on with the CSR next.

@liach
Copy link
Member

liach commented Apr 8, 2025

Should we ask hotspot compiler engineers for inputs on how useful this API is for allowing batch/vectorized copy and eliminating overheads of defensive copy array allocation?

@mkarg
Copy link
Contributor Author

mkarg commented Apr 11, 2025

Should we ask hotspot compiler engineers for inputs on how useful this API is for allowing batch/vectorized copy and eliminating overheads of defensive copy array allocation?

I have no strong feelings about that.

As it looks like we reached concensus on the proposed change, I have now copied the spec into the CSR and set it into the proposed state. Kindly asking for your reviews of the CSR, so it can rather soon proceed into the provisional state.

Comment on lines +336 to +338
* Concurrent truncation of this character sequence can throw
* {@code IndexOutOfBoundsException}. In this case, some characters, but not
* all, may be already transferred.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Though its been absent for decades, it might be worthwhile to include a class level warning that the implementations of the interface are not known or required to be thread safe and if used concurrently by multiple threads the behavior is unpredictable. I'd be more circumspect than trying to specify the only aberrant behavior is IndexOutOfBounds.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any quick proposals how to rephrase JavaDocs and CSR? As the CSR is already reviewed I am a bit reluctant to hold the train for this, as IMHO masses of classes and interfaces would need such a disclaimer, if we kick-off that ball here...

Copy link
Contributor

@AlanBateman AlanBateman Apr 16, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think Roger's suggestion is for a different JBS issue / PR / CSR.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, we already added this API to allow implementations to be more consistent under race condition. So for changing char sequences, this method's default implementation can throw IOOBE, but it is already specified so it is well-behaved even under race (as you can argue a read observed the shorter state of the CS that causes the failure)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Omitting it is fine, it was just a reminder that an interface cannot make any definitive statements about its implementations and sometime should warn the unwary user. (We all expect them to adhere to the contract described in each method, but who knows.)
The statement in the implSpec is fine as it only applies to the default implementation, but can be mis-read and thought to apply to all implementations. YMMV.
And though I see consensus, I don't see any approvals yet, including the CSR.
Its good to see it settle down and make progress.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alan, I am a bit lost here. How to proceed: Change the JavaDocs, or keep them as found in the CSR?

@liach
Copy link
Member

liach commented Apr 16, 2025

The implementation looks good. Do you think we need some unit tests for this new API, especially against the default implementation?

@mkarg
Copy link
Contributor Author

mkarg commented Apr 16, 2025

IMHO the existing Reader.of() tests do cover all implementations already.

@liach
Copy link
Member

liach commented Apr 16, 2025

Relying on another user is not a good practice - it is possible that we switch Reader.of to another implementation. The tests can be as simple as copying the Reader.of tests.

@mkarg
Copy link
Contributor Author

mkarg commented Apr 16, 2025

Nevertheless, the tests are there, so no new ones are needed. We can duplicate them, if a majority thinks it is beneficial.

@jaikiran
Copy link
Member

Nevertheless, the tests are there, so no new ones are needed. We can duplicate them, if a majority thinks it is beneficial.

As noted in the contribution guide (https://openjdk.org/guide/#testing-the-jdk), regression tests are expected for a majority of changes that are done in the JDK. There are situations where regressions tests cannot be added, but this isn't one. I believe a new method with a default specified implementation in a public exported interface deserves a regression test of its own.

@mkarg
Copy link
Contributor Author

mkarg commented Apr 16, 2025

Nevertheless, the tests are there, so no new ones are needed. We can duplicate them, if a majority thinks it is beneficial.

As noted in the contribution guide (https://openjdk.org/guide/#testing-the-jdk), regression tests are expected for a majority of changes that are done in the JDK. There are situations where regressions tests cannot be added, but this isn't one. I believe a new method with a default specified implementation in a public exported interface deserves a regression test of its own.

The docs do not say that the regression test must be separate. Again: We do have testing covered. The question is not whether we need a test -we do have one-, but whether to duplicate it.

Edit: Don't get me wrong, I do not have any strong feelings here. If duplication is wanted, I will do it. I am just waiting for a majority to clearly say that they want duplication. What I do not want is to duplicate the tests and later another reviewer chimes in asking me to throw the duplicate away.

@jaikiran
Copy link
Member

If duplication is wanted, I will do it. I am just waiting for a majority to clearly say that they want duplication.

By duplication, do you mean copy pasting the testReadBII(Reader reader), testReadBIILenZero(Reader reader), testReadDirectCharBuffer(Reader reader) and similar such test methods from the existing test/jdk/java/io/Reader/Of.java test class? Including all those references to the Reader?

Implementing a test that calls Reader.of() to verify the behaviour of an independent CharSequence.getChars() method isn't the kind of test I am proposing. It should be possible to just invoke CharSequence.getChars() on a CharSequence instance to verify that its default method does what it specifies. So these new tests won't be duplicating any existing test code.

@mkarg
Copy link
Contributor Author

mkarg commented Apr 16, 2025

It should be possible to just invoke CharSequence.getChars() on a CharSequence instance to verify that its default method does what it specifies. So these new tests won't be duplicating any existing test code.

IIUC then you will be fine with a test that solely tests the default implementation of CharSequence.getChars()?

@jaikiran
Copy link
Member

IIUC then you will be fine with a test that solely tests the default implementation of CharSequence.getChars()?

Correct. And a similar separate test for CharBuffer.getChars() since that one too is a new default implementation method. I apologize if that wasn't clear in the discussion so far.

…t implementation of CharSequence.getChars() and for CharBuffer.getChars()
@mkarg
Copy link
Contributor Author

mkarg commented Apr 17, 2025

IIUC then you will be fine with a test that solely tests the default implementation of CharSequence.getChars()?

Correct. And a similar separate test for CharBuffer.getChars() since that one too is a new default implementation method. I apologize if that wasn't clear in the discussion so far.

I now have covered CharSequence.getChars() and CharBuffer.getChars() with separate basic unit testing. Does this fit? 🙂

Copy link
Member

@liach liach left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The unit tests look good.

* @summary Check for expected behavior of CharBuffer.getChars().
* @run testng GetChars
*/
public class GetChars {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The tests for buffers, including CharBuffer, are in test/jdk/java/nio/Buffer. Look at the existing Chars.java tests idea, and specifically the DataProvider "createCharBuffers" as it will show the different kinds of CharBuffers that can be tested.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for this tip, Alan! I have move the unit test from nio/CharBuffer to nio/Buffer, and appended it with the randomized test found in Buffer/Chars.java. Hope it looks fine now?

…getChars() with unit test for CharBuffer.chars()
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
core-libs core-libs-dev@openjdk.org csr Pull request needs approved CSR before integration nio nio-dev@openjdk.org rfr Pull request is ready for review
Development

Successfully merging this pull request may close these issues.

6 participants