8293232: Fix race condition in pkcs11 SessionManager #10125

Closed
wants to merge 2 commits into from

Conversation

zzambers
Contributor

@zzambers zzambers commented Sep 1, 2022

There is a race condition in JDK's SessionManager, which can lead to random exceptions.

Exception:

javax.net.ssl.SSLException: Internal error: close session with active objects
	at java.base/sun.security.ssl.Alert.createSSLException(Alert.java:133)
	at java.base/sun.security.ssl.TransportContext.fatal(TransportContext.java:371)
	at java.base/sun.security.ssl.TransportContext.fatal(TransportContext.java:314)
	at java.base/sun.security.ssl.TransportContext.fatal(TransportContext.java:309)
	at java.base/sun.security.ssl.SSLSocketImpl.handleException(SSLSocketImpl.java:1707)
	at java.base/sun.security.ssl.SSLSocketImpl$AppInputStream.read(SSLSocketImpl.java:1080)
	at java.base/sun.security.ssl.SSLSocketImpl$AppInputStream.read(SSLSocketImpl.java:971)
	at SSLSocketServer.serverLoop(SSLSocketServer.java:133)
	at SSLSocketServer$1.run(SSLSocketServer.java:75)
	at java.base/java.lang.Thread.run(Thread.java:833)
Caused by: java.security.ProviderException: Internal error: close session with active objects
	at jdk.crypto.cryptoki/sun.security.pkcs11.Session.close(Session.java:127)
	at jdk.crypto.cryptoki/sun.security.pkcs11.Session.close(Session.java:114)
	at jdk.crypto.cryptoki/sun.security.pkcs11.SessionManager.closeSession(SessionManager.java:237)
	at jdk.crypto.cryptoki/sun.security.pkcs11.SessionManager$Pool.release(SessionManager.java:270)
	at jdk.crypto.cryptoki/sun.security.pkcs11.SessionManager.demoteObjSession(SessionManager.java:210)
	at jdk.crypto.cryptoki/sun.security.pkcs11.Session.removeObject(Session.java:101)
	at jdk.crypto.cryptoki/sun.security.pkcs11.SessionKeyRef.updateNativeKey(P11Key.java:1396)
	at jdk.crypto.cryptoki/sun.security.pkcs11.SessionKeyRef.removeNativeKey(P11Key.java:1377)
	at jdk.crypto.cryptoki/sun.security.pkcs11.NativeKeyHolder.releaseKeyID(P11Key.java:1329)
	at jdk.crypto.cryptoki/sun.security.pkcs11.P11Key.releaseKeyID(P11Key.java:156)
	at jdk.crypto.cryptoki/sun.security.pkcs11.P11AEADCipher.reset(P11AEADCipher.java:529)
	at jdk.crypto.cryptoki/sun.security.pkcs11.P11AEADCipher.ensureInitialized(P11AEADCipher.java:436)
	at jdk.crypto.cryptoki/sun.security.pkcs11.P11AEADCipher.implDoFinal(P11AEADCipher.java:732)
	at jdk.crypto.cryptoki/sun.security.pkcs11.P11AEADCipher.engineDoFinal(P11AEADCipher.java:624)
	at java.base/javax.crypto.Cipher.doFinal(Cipher.java:2500)
	at java.base/sun.security.ssl.SSLCipher$T12GcmReadCipherGenerator$GcmReadCipher.decrypt(SSLCipher.java:1659)
	at java.base/sun.security.ssl.SSLSocketInputRecord.decodeInputRecord(SSLSocketInputRecord.java:260)
	at java.base/sun.security.ssl.SSLSocketInputRecord.decode(SSLSocketInputRecord.java:181)
	at java.base/sun.security.ssl.SSLTransport.decode(SSLTransport.java:111)
	at java.base/sun.security.ssl.SSLSocketImpl.decode(SSLSocketImpl.java:1508)
	at java.base/sun.security.ssl.SSLSocketImpl.readApplicationRecord(SSLSocketImpl.java:1479)
	at java.base/sun.security.ssl.SSLSocketImpl$AppInputStream.read(SSLSocketImpl.java:1064)
	... 4 more

Reproducibility:
I started getting this exception quite reliably on JDK 17 on my machine with one particular test setup that uses the ssl-tests test suite. Unfortunately, the setup itself needs some RH-specific patches, and reproducibility also depends on other factors, such as the number of keys in the keystore and the machine the tests run on. I tried to create a standalone reproducer, but I couldn't find a way to reproduce this issue easily :(

Problem:
The SunPKCS11 provider does session pooling. This is done in SessionManager [1] (one per SunPKCS11 provider). Released sessions are kept by SessionManager for a while (in limited number) for reuse. This is a bit complicated, however, because some sessions can own objects (e.g. keys). So there are actually two pools: one for sessions with objects ("objSessions") and one for sessions without objects ("opSessions"). The reason is that unused sessions without objects can be safely closed (SessionManager only keeps a limited number of them around), while sessions with objects cannot be safely closed until all objects are removed from them. SessionManager has methods for getting a Session for a given purpose (object creation or other operations), each prioritizing the appropriate pool. Each session has a counter (called "createdObjects") to track how many objects it owns. When a session is returned to the pool, this counter is checked and the session is placed in the appropriate pool. Also, when the counter of a Session in the "objSessions" pool reaches zero, the session is moved ("demoted") to the "opSessions" pool.
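
To make the pooling rules above concrete, here is a minimal sketch of the two-pool idea. It is a simplified model written for illustration, not the JDK source: the class name PoolModelSketch, the MAX_IDLE_OP_SESSIONS limit, and the use of ArrayDeque are assumptions. Only the names objSessions, opSessions, createdObjects, hasObjects() and releaseSession() mirror names mentioned in this PR.

```java
import java.security.ProviderException;
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.concurrent.atomic.AtomicInteger;

// Simplified, hypothetical model of the two-pool scheme; NOT the JDK source.
class PoolModelSketch {
    static final class Session {
        // Number of objects (e.g. keys) this session currently owns.
        final AtomicInteger createdObjects = new AtomicInteger();
        boolean closed;

        boolean hasObjects() {
            return createdObjects.get() != 0;
        }

        void close() {
            if (hasObjects()) {
                throw new ProviderException("close session with active objects");
            }
            closed = true;
        }
    }

    static final int MAX_IDLE_OP_SESSIONS = 2; // assumed limit, for illustration only

    final Deque<Session> objSessions = new ArrayDeque<>();
    final Deque<Session> opSessions = new ArrayDeque<>();

    // Return a session to whichever pool matches its current object count.
    synchronized void releaseSession(Session s) {
        if (s.hasObjects()) {
            objSessions.push(s);   // must stay open while it owns objects
        } else if (opSessions.size() < MAX_IDLE_OP_SESSIONS) {
            opSessions.push(s);    // idle and reusable
        } else {
            s.close();             // surplus idle session, safe to close
        }
    }

    public static void main(String[] args) {
        PoolModelSketch manager = new PoolModelSketch();
        Session withKey = new Session();
        withKey.createdObjects.incrementAndGet(); // pretend a key was created
        manager.releaseSession(withKey);
        for (int i = 0; i < 3; i++) {
            manager.releaseSession(new Session()); // the third one gets closed
        }
        System.out.println("objSessions=" + manager.objSessions.size()
                + " opSessions=" + manager.opSessions.size());
    }
}
```

In this model, closing a session is only ever attempted for a session whose counter is zero at release time, which is the invariant the race described below breaks.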

And here comes the complicated part. As far as I understand it, Session.addObject() [2] (which increments the "createdObjects" counter) is always called by the thread "holding" the session that owns the created object. (That is: a thread gets a session, uses it to create an object and calls Session.addObject() on that session to increment the counter before returning the session to the pool. See e.g. [3].) However, this is not true for Session.removeObject() [4]. (That is: a thread gets a session, which is not necessarily the one owning the object being removed, performs the object removal, but then calls Session.removeObject() on the session that owned the object. See e.g. [5].) So Session.removeObject() can be called on a Session which is sitting in the "objSessions" pool or which is being used by another thread. (Object removal can happen as a result of releasing a key, either explicitly or as a result of GC, etc.)

And finally, there is a problem in the code handling object removal from a session. Session.removeObject() [4] first decrements the "createdObjects" counter and checks whether it has reached zero. If so, it calls SessionManager.demoteObjSession(this) [6], which attempts to remove the Session from the objSessions pool. If the session is successfully removed, meaning no other thread "holds" it, it is put into the opSessions pool. If not (meaning another thread "holds" it), the method just returns, since that other thread will put the session into the appropriate pool when it is done with it by calling SessionManager.releaseSession(session).

There is a race condition here. Consider the following scenario:

// Thread T1 runs:
Session.removeObject() // [4]
createdObjects.decrementAndGet() // returns zero

// Thread T2 steps in (operating on the same session instance):
Session.addObject() // increases "createdObjects" counter [2]
SessionManager.releaseSession(session) // releases session to objSessions pool

// Thread T1 continues:
SessionManager.demoteObjSession(this) // [6]
objSessions.remove(session) // returns true
opSessions.release(session)  // puts session (with objects!) to opSessions pool
// if opSessions is already full, an attempt is made to close the session (which still has objects), throwing the exception above

Fix:
The SessionManager.demoteObjSession [6] method was changed so that the check for objects is done once more after the session has been successfully removed from the "objSessions" pool (now that it is out of the pool, other threads should not be adding objects to it). Based on this check, the session is either released to the "opSessions" pool or returned to the "objSessions" pool. This is achieved by calling releaseSession(session) instead of opSessions.release(session).
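
The effect of the fix can be illustrated by replaying the T1/T2 interleaving from the scenario above on a single thread, once with the old demotion logic and once with the new one. This is a minimal sketch under stated assumptions, not the JDK code: DemoteFixSketch, releaseToOpPool, demoteBuggy, demoteFixed and replay are hypothetical names, the pools are plain lists, and the opSessions limit is set to zero to model an already full pool.

```java
import java.security.ProviderException;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical single-threaded replay of the race; NOT the JDK source.
class DemoteFixSketch {
    static final class Session {
        final AtomicInteger createdObjects = new AtomicInteger();
        boolean hasObjects() { return createdObjects.get() != 0; }
    }

    final List<Session> objSessions = new ArrayList<>();
    final List<Session> opSessions = new ArrayList<>();
    final int maxOpSessions;

    DemoteFixSketch(int maxOpSessions) { this.maxOpSessions = maxOpSessions; }

    // Models releaseSession(): re-checks the counter and picks the pool.
    void releaseSession(Session s) {
        if (s.hasObjects()) {
            objSessions.add(s);
        } else {
            releaseToOpPool(s);
        }
    }

    // Models opSessions.release(): a full pool means the session gets closed.
    void releaseToOpPool(Session s) {
        if (opSessions.size() >= maxOpSessions) {
            if (s.hasObjects()) {
                throw new ProviderException("close session with active objects");
            }
            return; // session closed, nothing to pool
        }
        opSessions.add(s);
    }

    void demoteBuggy(Session s) {
        if (!objSessions.remove(s)) return; // held by another thread
        releaseToOpPool(s);                 // trusts the stale zero check
    }

    void demoteFixed(Session s) {
        if (!objSessions.remove(s)) return; // held by another thread
        releaseSession(s);                  // re-checks hasObjects() first
    }

    // Replays the interleaving; reports where the session ended up.
    static String replay(boolean fixed) {
        DemoteFixSketch m = new DemoteFixSketch(0); // opSessions already "full"
        Session s = new Session();
        s.createdObjects.set(1);                    // s owns one key; T2 holds s

        int remaining = s.createdObjects.decrementAndGet(); // T1: removeObject()
        s.createdObjects.incrementAndGet();                 // T2: addObject()
        m.releaseSession(s);                                // T2: back to objSessions
        try {
            if (remaining == 0) {                           // T1 continues
                if (fixed) m.demoteFixed(s); else m.demoteBuggy(s);
            }
        } catch (ProviderException e) {
            return "exception";
        }
        return m.objSessions.contains(s) ? "objSessions" : "opSessions";
    }

    public static void main(String[] args) {
        System.out.println("old demote: " + replay(false));
        System.out.println("new demote: " + replay(true));
    }
}
```

With the old logic the replay ends in the "close session with active objects" exception. With the new logic the session goes back to the objSessions list, because the counter is read again only after the session has left the pool.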

Testing:
jdk_security tests passed for me locally with this change.
I have also tested this change on top of a custom JDK 17 build with the setup in which I can reproduce this issue. The problem was fixed.

[1] https://github.com/openjdk/jdk/blob/9444a081cc9873caa7b5c6a78df0d1aecda6e4f1/src/jdk.crypto.cryptoki/share/classes/sun/security/pkcs11/SessionManager.java
[2]


[3]
[4]
[5]
[6]


Progress

  • Change must be properly reviewed (1 review required, with at least 1 Reviewer)
  • Change must not contain extraneous whitespace
  • Commit message must refer to an issue

Issue

  • JDK-8293232: Fix race condition in pkcs11 SessionManager

Reviewers

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk pull/10125/head:pull/10125
$ git checkout pull/10125

Update a local copy of the PR:
$ git checkout pull/10125
$ git pull https://git.openjdk.org/jdk pull/10125/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 10125

View PR using the GUI difftool:
$ git pr show -t 10125

Using diff file

Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/10125.diff

@bridgekeeper

bridgekeeper bot commented Sep 1, 2022

👋 Welcome back zzambers! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

@zzambers zzambers changed the title Fix race condition in pkcs11 SessionManager 8293232: Fix race condition in pkcs11 SessionManager Sep 1, 2022
@openjdk openjdk bot added the rfr Pull request is ready for review label Sep 1, 2022
@openjdk

openjdk bot commented Sep 1, 2022

@zzambers The following label will be automatically applied to this pull request:

  • security

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing list. If you would like to change these labels, use the /label pull request command.

@openjdk openjdk bot added the security security-dev@openjdk.org label Sep 1, 2022
@mlbridge

mlbridge bot commented Sep 1, 2022

Webrevs

@valeriepeng

Thanks for the suggested fix. I share your opinion about the potential race condition regarding demoting object session. Will take a look.

@@ -207,7 +207,7 @@ void demoteObjSession(Session session) {
             // will be added to correct pool on release, nothing to do now
             return;
         }
-        opSessions.release(session);
+        releaseSession(session);

With the described race condition, have you tried fixing it by adding an if-condition check before lines 204-210, i.e. if (!session.hasObjects()) { .... }?

Contributor Author

I am afraid putting a check before line 204 would not solve the issue (it would just lower its likelihood). The problem is that the operation consisting of checking a session for objects and then removing it from the objSessions pool is not atomic. After session.hasObjects() is called, the session could still be obtained from the objSessions pool by another thread, have an object added to it, and be released back to the objSessions pool before objSessions.remove(session) is called. I think the check for objects can only be trusted after the session has been successfully removed from objSessions (that is, the session was in the objSessions pool, so no thread "held" it, and it was removed).
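
The non-atomic check-then-remove sequence described above can be replayed step by step on one thread. This is only an illustrative sketch: PreCheckSketch and preCheckStillRaces are hypothetical names, the pools are plain lists, and the interleaving is hard-coded rather than produced by real threads.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical replay of the suggested pre-check; NOT the JDK source.
class PreCheckSketch {
    static final class Session {
        final AtomicInteger createdObjects = new AtomicInteger();
        boolean hasObjects() { return createdObjects.get() != 0; }
    }

    // Returns true if a session that owns objects ended up in opSessions.
    static boolean preCheckStillRaces() {
        List<Session> objSessions = new ArrayList<>();
        List<Session> opSessions = new ArrayList<>();
        Session s = new Session();
        objSessions.add(s); // idle in objSessions; its last object was just removed

        // T1: the suggested pre-check, done before touching the pool
        boolean looksEmpty = !s.hasObjects();

        // T2 steps in: takes s from the pool, creates an object, releases s
        objSessions.remove(s);
        s.createdObjects.incrementAndGet();
        objSessions.add(s);

        // T1 continues, acting on the now stale result of the pre-check
        if (looksEmpty && objSessions.remove(s)) {
            opSessions.add(s);
        }
        return opSessions.contains(s) && s.hasObjects();
    }

    public static void main(String[] args) {
        System.out.println("session with objects in opSessions: "
                + preCheckStillRaces());
    }
}
```

The pre-check result is already stale by the time the remove succeeds, so only a check made after the successful remove reflects a session that no other thread can be modifying.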

Actually, the whole call to the demoteObjSession method is already behind one check for zero objects, but that check cannot be trusted and needs to be redone after objSessions.remove(session), because of the problem described above. See:

I am aware of the zero objects check in the reference above.
I am fine with the proposed fix then. Perhaps add a comment to this releaseSession() call to warn about this race condition.

@openjdk

openjdk bot commented Sep 6, 2022

@zzambers This change now passes all automated pre-integration checks.

ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details.

After integration, the commit message for the final commit will be:

8293232: Fix race condition in pkcs11 SessionManager

Reviewed-by: valeriep

You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed.

At the time when this comment was updated there had been 94 new commits pushed to the master branch:

  • aff9a69: 8283224: Remove THREAD_NOT_ALIVE from possible JDWP error codes
  • 76df73b: 8293456: runtime/os/TestTracePageSizes.java sub-tests fail with "AssertionError: No memory range found for address: NNNN"
  • 32c7b62: 8293146: Strict DateTimeFormatter fails to report an invalid week 53
  • 02dce24: 8207166: jdk/jshell/JdiHangingLaunchExecutionControlTest.java - launch timeout
  • d36abbe: 8293496: ProblemList runtime/os/TestTracePageSizes.java on linux-x64
  • 1ee59ad: 8289798: Update to use jtreg 7
  • 5934669: 8292383: Create a SymbolHandle type to use for ResourceHashtable
  • 6ff4775: 8285487: AArch64: Do not generate unneeded trampolines for runtime calls
  • d696104: 4850101: Setting mnemonic to VK_F4 underlines the letter S in a button.
  • 14fd1b6: 8292921: Rewrite object field printer
  • ... and 84 more: https://git.openjdk.org/jdk/compare/3c1bda4bc3ad81ebabdd9ae05de53ff16f555027...master

As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details.

As you do not have Committer status in this project an existing Committer must agree to sponsor your change. Possible candidates are the reviewers of this PR (@valeriepeng) but any other Committer may sponsor as well.

➡️ To flag this PR as ready for integration with the above commit message, type /integrate in a new comment. (Afterwards, your sponsor types /sponsor in a new comment to perform the integration).

@openjdk openjdk bot added the ready Pull request is ready to be integrated label Sep 6, 2022
@zzambers
Contributor Author

zzambers commented Sep 6, 2022

@valeriepeng, Thank you for your review
/integrate

@openjdk openjdk bot added the sponsor Pull request is ready to be sponsored label Sep 6, 2022
@openjdk

openjdk bot commented Sep 6, 2022

@zzambers
Your change (at version 713f617) is now ready to be sponsored by a Committer.

@openjdk openjdk bot removed the sponsor Pull request is ready to be sponsored label Sep 7, 2022
@valeriepeng

/sponsor

@openjdk

openjdk bot commented Sep 7, 2022

@valeriepeng The PR has been updated since the change author (@zzambers) issued the integrate command - the author must perform this command again.

@valeriepeng

@zzambers Your change (at version 713f617) is now ready to be sponsored by a Committer.

The bot says that you need to do "/integrate" command again before I can do "/sponsor".

@zzambers
Contributor Author

zzambers commented Sep 7, 2022

/integrate

@openjdk openjdk bot added the sponsor Pull request is ready to be sponsored label Sep 7, 2022
@openjdk

openjdk bot commented Sep 7, 2022

@zzambers
Your change (at version cf49318) is now ready to be sponsored by a Committer.

@valeriepeng

/sponsor

@openjdk

openjdk bot commented Sep 7, 2022

Going to push as commit 1e031e6.
Since your change was applied there have been 95 commits pushed to the master branch:

  • 1080c4e: 8293508: ProblemList gc/metaspace/TestMetaspacePerfCounters.java#Epsilon-64
  • aff9a69: 8283224: Remove THREAD_NOT_ALIVE from possible JDWP error codes
  • 76df73b: 8293456: runtime/os/TestTracePageSizes.java sub-tests fail with "AssertionError: No memory range found for address: NNNN"
  • 32c7b62: 8293146: Strict DateTimeFormatter fails to report an invalid week 53
  • 02dce24: 8207166: jdk/jshell/JdiHangingLaunchExecutionControlTest.java - launch timeout
  • d36abbe: 8293496: ProblemList runtime/os/TestTracePageSizes.java on linux-x64
  • 1ee59ad: 8289798: Update to use jtreg 7
  • 5934669: 8292383: Create a SymbolHandle type to use for ResourceHashtable
  • 6ff4775: 8285487: AArch64: Do not generate unneeded trampolines for runtime calls
  • d696104: 4850101: Setting mnemonic to VK_F4 underlines the letter S in a button.
  • ... and 85 more: https://git.openjdk.org/jdk/compare/3c1bda4bc3ad81ebabdd9ae05de53ff16f555027...master

Your commit was automatically rebased without conflicts.

@openjdk openjdk bot added the integrated Pull request has been integrated label Sep 7, 2022
@openjdk openjdk bot closed this Sep 7, 2022
@openjdk openjdk bot removed ready Pull request is ready to be integrated rfr Pull request is ready for review labels Sep 7, 2022
@openjdk openjdk bot removed the sponsor Pull request is ready to be sponsored label Sep 7, 2022
@openjdk

openjdk bot commented Sep 7, 2022

@valeriepeng @zzambers Pushed as commit 1e031e6.

💡 You may see a message that your pull request was closed with unmerged commits. This can be safely ignored.

Labels
integrated Pull request has been integrated security security-dev@openjdk.org
2 participants