8300207: Add a pre-check for the number of canonical equivalent permutations in j.u.r.Pattern #12027

Closed
wants to merge 2 commits into from

Conversation

Contributor

@rgiulietti rgiulietti commented Jan 17, 2023

  • Strengthen a computation that could overflow.
  • Specify that use of CANON_EQ could exhaust memory in the compilation phase.
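
The first bullet can be illustrated with a minimal sketch. It is not the code from this PR: the class and method names are hypothetical, the error message is an assumption, and it assumes the quantity being computed is the number of orderings (n!) of n combining marks. It shows one way to fail fast instead of letting an `int` product silently wrap around.

```java
class PermutationCount {
    // Hypothetical helper: number of orderings (n!) of n combining marks.
    // Math.multiplyExact throws ArithmeticException on int overflow, which
    // is translated here into a pre-emptive OutOfMemoryError.
    static int permutations(int n) {
        int count = 1;
        for (int i = 2; i <= n; i++) {
            try {
                count = Math.multiplyExact(count, i);
            } catch (ArithmeticException e) {
                // Assumed message, for illustration only.
                throw new OutOfMemoryError("Pattern too complex");
            }
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(permutations(12)); // 479001600, still fits in an int
        try {
            permutations(13); // 13! exceeds Integer.MAX_VALUE
        } catch (OutOfMemoryError e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

With a plain `count * i`, the overflowing product would wrap to a small or negative value and only fail later, or not at all, when used as an array length.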

Progress

  • Change must be properly reviewed (1 review required, with at least 1 Reviewer)
  • Change must not contain extraneous whitespace
  • Commit message must refer to an issue
  • Change requires CSR request JDK-8300209 to be approved

Issues

  • JDK-8300207: Add a pre-check for the number of canonical equivalent permutations in j.u.r.Pattern
  • JDK-8300209: Specify that usage of CANON_EQ in j.u.r.Pattern may lead to memory exhaustion (CSR)

Reviewers

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk pull/12027/head:pull/12027
$ git checkout pull/12027

Update a local copy of the PR:
$ git checkout pull/12027
$ git pull https://git.openjdk.org/jdk pull/12027/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 12027

View PR using the GUI difftool:
$ git pr show -t 12027

Using diff file

Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/12027.diff


bridgekeeper bot commented Jan 17, 2023

👋 Welcome back rgiulietti! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

@openjdk openjdk bot added rfr Pull request is ready for review csr Pull request needs approved CSR before integration labels Jan 17, 2023

openjdk bot commented Jan 17, 2023

@rgiulietti The following label will be automatically applied to this pull request:

  • core-libs

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing list. If you would like to change these labels, use the /label pull request command.

@openjdk openjdk bot added the core-libs core-libs-dev@openjdk.org label Jan 17, 2023

mlbridge bot commented Jan 17, 2023

Webrevs

@@ -1095,6 +1096,9 @@ public static Pattern compile(String regex) {
* Compiles the given regular expression into a pattern with the given
* flags.
*
* <p>Setting {@link #CANON_EQ} among the flags may impose a moderate risk
Contributor

This may be a candidate for @apiNote, same thing for the note added to CANON_EQ.

Contributor Author

The choice of a <p> paragraph rather than @apiNote is for consistency with similar commentary paragraphs in the specs of CASE_INSENSITIVE, UNICODE_CASE, and UNICODE_CHARACTER_CLASS.

I have no problems in using @apiNote instead, but then it would be better to apply the same for the other mentioned flags as well.

Contributor

Okay, I see your point and to use apiNote consistently would require "converting" some of the existing text to apiNote too.

I'm still mulling over Pattern.compile throwing OOME. An implNote is probably the right category for this, in which case it can start with "In the JDK Reference Implementation ...". I assume the static Pattern.matches needs the same, and also Pattern.matcher for the lazy compilation case.

Contributor Author

There's no hard limit for the number of combining marks in the Unicode specification, even though in practice it never reaches the implementation limit. A high number of combining marks is thus more akin to a resource exhaustion condition than to anything else, IMO.

Even today, compilation of a pattern risks throwing an OOME anyway when trying to generate the permutations. Pre-emptively throwing an OOME just anticipates and extrapolates this behavior beyond the int limit of array lengths.

Alternatively, compilation (greedy or lazy) could throw PatternSyntaxException, although there is nothing really wrong with the syntax.

I'll add @implNote to the other methods you mention.

Contributor Author

The CSR will be updated once this PR stabilizes.

Member

@AlanBateman

I had previously discussed with @rgiulietti whether OOME or PatternSyntaxException was more appropriate. The issue is that you might try to compile a pattern that contains a character with N combining diacritics, and it might work fine. But if you change that character to have N+1 combining diacritics, it might throw OOME. There's no syntax difference, but rather the issue is hitting an internal implementation limit.
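
As a small illustration of the kind of pattern under discussion, here is a hedged sketch. The class and helper names are hypothetical. The first method relies only on the example documented for `Pattern.CANON_EQ` ("a" followed by U+030A matching the precomposed U+00E5). The second builds a base letter followed by n combining marks, and the sketch deliberately compiles it only for a small n.

```java
import java.util.regex.Pattern;

class CanonEqDemo {
    // Compiles "a" + COMBINING RING ABOVE and matches it against the
    // precomposed LATIN SMALL LETTER A WITH RING ABOVE.
    static boolean matchesPrecomposed(int flags) {
        return Pattern.compile("a\u030A", flags).matcher("\u00E5").matches();
    }

    // Hypothetical helper: base letter "a" followed by n combining marks,
    // taken consecutively from U+0300 (COMBINING GRAVE ACCENT) onwards.
    static String baseWithMarks(int n) {
        StringBuilder sb = new StringBuilder("a");
        for (int i = 0; i < n; i++) {
            sb.appendCodePoint(0x0300 + i);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(matchesPrecomposed(0));
        System.out.println(matchesPrecomposed(Pattern.CANON_EQ));
        // Keep n small: the syntax is identical for any n, but the work done
        // at compile time with CANON_EQ grows quickly with n.
        Pattern p = Pattern.compile(baseWithMarks(3), Pattern.CANON_EQ);
        System.out.println(p.flags() == Pattern.CANON_EQ);
    }
}
```

Nothing about the regex syntax changes between n and n+1 marks, which is the argument made above against `PatternSyntaxException`.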

There's a bit of a precedent for throwing OOME in such cases. Various places in the library try to grow arrays. If the required array size is greater than Integer.MAX_VALUE, the library pre-emptively throws OOME without even trying to allocate the array.
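
The array-growth precedent can be sketched as follows. This is not the JDK's internal code: the class and method names are hypothetical and the doubling policy is an assumption chosen for illustration. The point is only the shape of the check, which does the arithmetic in `long` and throws before any allocation is attempted.

```java
class GrowDemo {
    // Hypothetical helper: computes a new capacity for an array that must
    // grow by at least minGrowth elements. Throws OutOfMemoryError up front
    // when the required length cannot be represented as an array length.
    static int newCapacity(int oldLength, int minGrowth) {
        long required = (long) oldLength + minGrowth;
        if (required > Integer.MAX_VALUE) {
            throw new OutOfMemoryError("Required length exceeds implementation limit");
        }
        // Assumed policy: prefer doubling, capped at Integer.MAX_VALUE.
        long preferred = Math.max(required, (long) oldLength * 2);
        return (int) Math.min(preferred, Integer.MAX_VALUE);
    }

    public static void main(String[] args) {
        System.out.println(newCapacity(10, 1)); // 20
        try {
            newCapacity(Integer.MAX_VALUE, 1);
        } catch (OutOfMemoryError e) {
            System.out.println("rejected without allocating: " + e.getMessage());
        }
    }
}
```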

Member

"In the JDK Reference Implementation ..."

I'm still not sure of the right style for the JDK to refer to itself. This is really the "Java SE Reference Implementation". Or perhaps it should just be "OpenJDK". Regardless, this kind of wording would stick out in a funny way, as it's not used very frequently. I'm content not having such a preamble. Having the text within @implNote is probably sufficient.

@rgiulietti
Contributor Author

The CSR is ready for review.

@openjdk openjdk bot removed the csr Pull request needs approved CSR before integration label Jan 21, 2023

openjdk bot commented Jan 21, 2023

@rgiulietti This change now passes all automated pre-integration checks.

ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details.

After integration, the commit message for the final commit will be:

8300207: Add a pre-check for the number of canonical equivalent permutations in j.u.r.Pattern

Reviewed-by: smarks

You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed.

At the time when this comment was updated there had been 128 new commits pushed to the master branch:

  • 3ea4eac: 8300817: The build is broken after JDK-8294693
  • cbfc069: 8300731: Avoid unnecessary array fill after creation in PaletteBuilder
  • 67b1c89: 8294693: Add Collections.shuffle overload that accepts RandomGenerator interface
  • c8dd758: 8300260: Remove metaprogramming/isSame.hpp
  • a6c2a2a: 8300692: GCC 12 reports some compiler warnings in bundled freetype
  • bb42e61: 8300493: Use ArraysSupport.vectorizedHashCode in j.u.zip.ZipCoder
  • 06394ee: 8300590: [JVMCI] BytecodeFrame.equals is broken
  • 5331a3e: 8298908: Instrument Metaspace for ASan
  • e1ee672: 8300725: Improve performance of ColorConvertOp for default destinations with alpha
  • 7c2f77a: 8300584: Accelerate AVX-512 CRC32C for small buffers
  • ... and 118 more: https://git.openjdk.org/jdk/compare/289aed465e9b8449938d4cdb515748e7aca1d070...master

As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details.

➡️ To integrate this PR with the above commit message to the master branch, type /integrate in a new comment.

@openjdk openjdk bot added the ready Pull request is ready to be integrated label Jan 21, 2023
@rgiulietti
Contributor Author

/integrate


openjdk bot commented Jan 22, 2023

Going to push as commit 030b071.
Since your change was applied there have been 129 commits pushed to the master branch:

  • 7ced08d: 8300638: Tier1 IR Test failure after JDK-8298632 on macosx-x64-debug
  • 3ea4eac: 8300817: The build is broken after JDK-8294693
  • cbfc069: 8300731: Avoid unnecessary array fill after creation in PaletteBuilder
  • 67b1c89: 8294693: Add Collections.shuffle overload that accepts RandomGenerator interface
  • c8dd758: 8300260: Remove metaprogramming/isSame.hpp
  • a6c2a2a: 8300692: GCC 12 reports some compiler warnings in bundled freetype
  • bb42e61: 8300493: Use ArraysSupport.vectorizedHashCode in j.u.zip.ZipCoder
  • 06394ee: 8300590: [JVMCI] BytecodeFrame.equals is broken
  • 5331a3e: 8298908: Instrument Metaspace for ASan
  • e1ee672: 8300725: Improve performance of ColorConvertOp for default destinations with alpha
  • ... and 119 more: https://git.openjdk.org/jdk/compare/289aed465e9b8449938d4cdb515748e7aca1d070...master

Your commit was automatically rebased without conflicts.

@openjdk openjdk bot added the integrated Pull request has been integrated label Jan 22, 2023
@openjdk openjdk bot closed this Jan 22, 2023
@openjdk openjdk bot removed ready Pull request is ready to be integrated rfr Pull request is ready for review labels Jan 22, 2023

openjdk bot commented Jan 22, 2023

@rgiulietti Pushed as commit 030b071.

💡 You may see a message that your pull request was closed with unmerged commits. This can be safely ignored.

@rgiulietti rgiulietti deleted the JDK-8300207 branch January 23, 2023 12:54
Labels
core-libs core-libs-dev@openjdk.org integrated Pull request has been integrated
3 participants