Skip to content
This repository was archived by the owner on Sep 19, 2023. It is now read-only.
/ jdk19 Public archive

8290250: Shenandoah: disable Loom for iu mode #140

Closed

Conversation

zhengyu123
Copy link
Contributor

@zhengyu123 zhengyu123 commented Jul 13, 2022

Please review this trivial patch to disable Loom for Shenandoah iu mode (an experimental feature).

Test:

  • hotspot_gc_shenandoah
  • jdk_loom with Shenandoah + iu
  • hotspot_loom with Shenandoah + iu

Progress

  • Change must be properly reviewed (1 review required, with at least 1 Reviewer)
  • Change must not contain extraneous whitespace
  • Commit message must refer to an issue

Issue

Reviewers

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk19 pull/140/head:pull/140
$ git checkout pull/140

Update a local copy of the PR:
$ git checkout pull/140
$ git pull https://git.openjdk.org/jdk19 pull/140/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 140

View PR using the GUI difftool:
$ git pr show -t 140

Using diff file

Download this PR as a diff file:
https://git.openjdk.org/jdk19/pull/140.diff

Sorry, something went wrong.

@bridgekeeper
Copy link

bridgekeeper bot commented Jul 13, 2022

👋 Welcome back zgu! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

@openjdk openjdk bot added the rfr Pull request is ready for review label Jul 13, 2022
@openjdk
Copy link

openjdk bot commented Jul 13, 2022

@zhengyu123 The following labels will be automatically applied to this pull request:

  • hotspot-gc
  • shenandoah

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing lists. If you would like to change these labels, use the /label pull request command.

@openjdk openjdk bot added hotspot-gc hotspot-gc-dev@openjdk.org shenandoah shenandoah-dev@openjdk.org labels Jul 13, 2022
@mlbridge
Copy link

mlbridge bot commented Jul 13, 2022

Webrevs

@fisk
Copy link

fisk commented Jul 13, 2022

Speaking of IU mode, how does Shenandoah IU mode deal with barrier elision on newly allocated objects? It was only valid in CMS because the entire young generation was traversed when terminating the concurrent marking, so that unvisited pointers could be found.

@zhengyu123
Copy link
Contributor Author

Speaking of IU mode, how does Shenandoah IU mode deal with barrier elision on newly allocated objects? It was only valid in CMS because the entire young generation was traversed when terminating the concurrent marking, so that unvisited pointers could be found.

Shenandoah SATB/IU both elide barrier on newly allocated objects. SATB barrier should guarantee they can only refer marked objects. Do I miss anything?

Copy link
Contributor

@rkennke rkennke left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, looks good. Thank you.

@openjdk
Copy link

openjdk bot commented Jul 13, 2022

@zhengyu123 This change now passes all automated pre-integration checks.

ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details.

After integration, the commit message for the final commit will be:

8290250: Shenandoah: disable Loom for iu mode

Reviewed-by: rkennke

You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed.

At the time when this comment was updated there had been 1 new commit pushed to the master branch:

  • 73b83e0: 8290207: Missing notice in dom.md

Please see this link for an up-to-date comparison between the source branch of this pull request and the master branch.
As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details.

➡️ To integrate this PR with the above commit message to the master branch, type /integrate in a new comment.

@openjdk openjdk bot added the ready Pull request is ready to be integrated label Jul 13, 2022
@fisk
Copy link

fisk commented Jul 13, 2022

Speaking of IU mode, how does Shenandoah IU mode deal with barrier elision on newly allocated objects? It was only valid in CMS because the entire young generation was traversed when terminating the concurrent marking, so that unvisited pointers could be found.

Shenandoah SATB/IU both elide barrier on newly allocated objects. SATB barrier should guarantee they can only refer marked objects. Do I miss anything?

Yes, I believe so. For a SATB collector, it is okay to elide the barrier on newly allocated objects, because they were not part of the snapshot-of-the-beginning, and all new objects are implicitly alive and don't need to be visited.

However, with an IU scheme, that property does not translate.

Consider that you have a particular object graph when marking starts, then one mutator loads an object from the graph, and clears the field such that it is only reachable from the roots of said murator thread. Then the mutator writes the object to a newly allocated object without barriers and discards the root. Now, the reference is only reachable from the newly allocated object.

So for this elision to be valid with IU, the GC has to visit newly allocated objects again before terminating, which e.g. CMS indeed did, by doing a young collection in the safepoint that terminated old marking, but I don't know that Shenandoah does anything like that.

@zhengyu123
Copy link
Contributor Author

Speaking of IU mode, how does Shenandoah IU mode deal with barrier elision on newly allocated objects? It was only valid in CMS because the entire young generation was traversed when terminating the concurrent marking, so that unvisited pointers could be found.

Shenandoah SATB/IU both elide barrier on newly allocated objects. SATB barrier should guarantee they can only refer marked objects. Do I miss anything?

Yes, I believe so. For a SATB collector, it is okay to elide the barrier on newly allocated objects, because they were not part of the snapshot-of-the-beginning, and all new objects are implicitly alive and don't need to be visited.

However, with an IU scheme, that property does not translate.

Consider that you have a particular object graph when marking starts, then one mutator loads an object from the graph, and clears the field such that it is only reachable from the roots of said murator thread. Then the mutator writes the object to a newly allocated object without barriers and discards the root. Now, the reference is only reachable from the newly allocated object.

So for this elision to be valid with IU, the GC has to visit newly allocated objects again before terminating, which e.g. CMS indeed did, by doing a young collection in the safepoint that terminated old marking, but I don't know that Shenandoah does anything like that.

Unfortunately, IU problem is beyond the barrier elision. Even I don't elide the barrier for IU, I still see failures.

@zhengyu123
Copy link
Contributor Author

/integrate

@openjdk
Copy link

openjdk bot commented Jul 14, 2022

Going to push as commit c8e0315.
Since your change was applied there have been 4 commits pushed to the master branch:

  • fb27ddc: 8290252: Add TEST.properties to java/nio/channels/FileChannel and move tests out of largeMemory sub-dir
  • fd89ab8: 8288112: C2: Error: ShouldNotReachHere() in Type::typerr()
  • 2bf6285: 8290209: jcup.md missing additional text
  • 73b83e0: 8290207: Missing notice in dom.md

Your commit was automatically rebased without conflicts.

@openjdk openjdk bot added the integrated Pull request has been integrated label Jul 14, 2022
@openjdk openjdk bot closed this Jul 14, 2022
@openjdk openjdk bot removed ready Pull request is ready to be integrated rfr Pull request is ready for review labels Jul 14, 2022
@openjdk
Copy link

openjdk bot commented Jul 14, 2022

@zhengyu123 Pushed as commit c8e0315.

💡 You may see a message that your pull request was closed with unmerged commits. This can be safely ignored.

@fisk
Copy link

fisk commented Jul 14, 2022

Speaking of IU mode, how does Shenandoah IU mode deal with barrier elision on newly allocated objects? It was only valid in CMS because the entire young generation was traversed when terminating the concurrent marking, so that unvisited pointers could be found.

Shenandoah SATB/IU both elide barrier on newly allocated objects. SATB barrier should guarantee they can only refer marked objects. Do I miss anything?

Yes, I believe so. For a SATB collector, it is okay to elide the barrier on newly allocated objects, because they were not part of the snapshot-of-the-beginning, and all new objects are implicitly alive and don't need to be visited.

However, with an IU scheme, that property does not translate.

Consider that you have a particular object graph when marking starts, then one mutator loads an object from the graph, and clears the field such that it is only reachable from the roots of said murator thread. Then the mutator writes the object to a newly allocated object without barriers and discards the root. Now, the reference is only reachable from the newly allocated object.

So for this elision to be valid with IU, the GC has to visit newly allocated objects again before terminating, which e.g. CMS indeed did, by doing a young collection in the safepoint that terminated old marking, but I don't know that Shenandoah does anything like that.

Unfortunately, IU problem is beyond the barrier elision. Even I don't elide the barrier for IU, I still see failures.

Right. Loom assumes that write barriers are not needed on newly allocated objects, so that it can copy oops from the stack to the heap. Shenandoah with IU mode is the only collector where that is not true, due to the reason I outlined. What I'm saying isn't that turning off barrier elision in IU mode will fix your current loom crash. I'm saying that even without loom, the barrier elision is invalid in general. I thought I'd let you know, in case you want Shenandoah with IU mode to work reliably at all.

@rkennke
Copy link
Contributor

rkennke commented Jul 14, 2022

Speaking of IU mode, how does Shenandoah IU mode deal with barrier elision on newly allocated objects? It was only valid in CMS because the entire young generation was traversed when terminating the concurrent marking, so that unvisited pointers could be found.

Shenandoah SATB/IU both elide barrier on newly allocated objects. SATB barrier should guarantee they can only refer marked objects. Do I miss anything?

Yes, I believe so. For a SATB collector, it is okay to elide the barrier on newly allocated objects, because they were not part of the snapshot-of-the-beginning, and all new objects are implicitly alive and don't need to be visited.

However, with an IU scheme, that property does not translate.

Consider that you have a particular object graph when marking starts, then one mutator loads an object from the graph, and clears the field such that it is only reachable from the roots of said murator thread. Then the mutator writes the object to a newly allocated object without barriers and discards the root. Now, the reference is only reachable from the newly allocated object.

So for this elision to be valid with IU, the GC has to visit newly allocated objects again before terminating, which e.g. CMS indeed did, by doing a young collection in the safepoint that terminated old marking, but I don't know that Shenandoah does anything like that.

IIRC, in Shenandoah's IU mode we treat new objects as implicitely alive, much like we do with SATB mode. Therefore it should be good to elide the barriers there. I know that this is not the classical I-U scheme that I believe CMS followed. There has been a paper (I believe it is that paper [1]) that showed that it's orthogonal issues whether we follow new or old references and whether or not we treat new objects alive or not. Treating new objects as live has the advantage that it solves the termination problem.

[1] https://www.cs.technion.ac.il/~yahave/papers/pldi06-cgc.pdf

@ysramakrishna
Copy link
Member

ysramakrishna commented Jul 14, 2022

CMS collected only old gen, and direct allocations in old gen were grey, and so would updates of black objects in old gen. I assume new objects in Shenandoah would also be grey as would be updates of those objects? (That would be the classical IU scheme in my understanding.)

I should read the Vechev paper though. Is the Shenandoah treatment of new allocations described anywhere for the IU case?

@mlbridge
Copy link

mlbridge bot commented Jul 15, 2022

Mailing list message from Erik Osterlund on shenandoah-dev:

Hi Roman,

The problem isn?t the new object itself. It?s dangling pointers created from it. Sure the new object will be kept alive. The problem is when you store the last reference to *another* object, into a field in the new object, but without barriers that would normally keep that other object alive, as I described in the example. Then you are kind of toast. With SATB anything you stored into the new object is already guaranteed to be kept alive as it is either a new object (implicitly live), or something from the snapshot, which pre-write barriers ensure is fully remembered. The pre-write barrier would only find null which by definition has no business being marked.

I will try to explain my example more clearly step by step.

1. There is an object A, reachable from a root of thread T1. A has a field pointing at another object B.

2. Marking starts. A is matked as a root, but before B is marked concurrently?

3. T1 loads B from the field of A.

4. T1 stores null to the field of A. B is now only reachable from a private root of T1, and is not visible to concurrent marking, and storing null did not keep B alive with IU barriers.

5. T1 allocates a new object C. It is implicitly alive and won?t be traced through.

6. T1 stores a reference from C to B. No barriers are used as C is a new object. (This is the illegal part causing a potential crash)

7. The root from T1 to B is discarded.

8. Concurrent marking traces from A but finds only null from step 4.

9. Marking terminates.

You now have a dangling pointer from C to B that has not been marked through, and the marking has terminated. As I said, the elision in CMS was only valid because the entire young generation was traced again when old marking terminates, which would have found the reference from C to B, and kept it alive. But you don?t do that, so it seems like it will crash instead. That?s why store barrier elision on newly allocated objects in general is seemingly unsound with Shenandoah IU mode.

Hope this helps.

/Erik

@mlbridge
Copy link

mlbridge bot commented Jul 15, 2022

Mailing list message from Roman Kennke on shenandoah-dev:

Hey Erik,

I just had a closer look at the IU barriers again. First of all, we are
*not* using the same barriers for IU and SATB. That's why disabling
barrier elision does nothing. Instead, we emit a c2 node for the IU
barrier, and optimize on that and expand it during barrier expansion
pass (together with the LRBs). The barrier consumes and replaces values
being stored into fields and array elements. If such a value is provably
a newly created object, then we don't need the barrier there - the
object is implicitely live already.

Another difference between IU and SATB is that we don't need to
keep-alive referents when somebody calls Reference.get() during
concurrent marking. If such a referent goes out of scope again, then
it's fine trivially. If the referent is stored anywhere, we will catch
it with the barrier. If it only remains in a local variable by the end
of concurrent marking, we will catch it in final remark. Please punch
holes into this reasoning if you find anything. ;-)

Thanks,
Roman

Hi Roman,

The problem isn?t the new object itself. It?s dangling pointers created from it. Sure the new object will be kept alive. The problem is when you store the last reference to *another* object, into a field in the new object, but without barriers that would normally keep that other object alive, as I described in the example. Then you are kind of toast. With SATB anything you stored into the new object is already guaranteed to be kept alive as it is either a new object (implicitly live), or something from the snapshot, which pre-write barriers ensure is fully remembered. The pre-write barrier would only find null which by definition has no business being marked.

I will try to explain my example more clearly step by step.

1. There is an object A, reachable from a root of thread T1. A has a field pointing at another object B.

2. Marking starts. A is matked as a root, but before B is marked concurrently?

3. T1 loads B from the field of A.

4. T1 stores null to the field of A. B is now only reachable from a private root of T1, and is not visible to concurrent marking, and storing null did not keep B alive with IU barriers.

5. T1 allocates a new object C. It is implicitly alive and won?t be traced through.

6. T1 stores a reference from C to B. No barriers are used as C is a new object. (This is the illegal part causing a potential crash)

7. The root from T1 to B is discarded.

8. Concurrent marking traces from A but finds only null from step 4.

9. Marking terminates.

You now have a dangling pointer from C to B that has not been marked through, and the marking has terminated. As I said, the elision in CMS was only valid because the entire young generation was traced again when old marking terminates, which would have found the reference from C to B, and kept it alive. But you don?t do that, so it seems like it will crash instead. That?s why store barrier elision on newly allocated objects in general is seemingly unsound with Shenandoah IU mode.

Hope this helps.

/Erik

@mlbridge
Copy link

mlbridge bot commented Jul 15, 2022

Mailing list message from Erik Osterlund on shenandoah-dev:

The problem is that when you store the edge from C to B in my example, at step 6, the newly allocated object C is implicitly live, but the new value being written to the field (B) is not implicitly live and needs to be explicitly kept alive in an IU model. But it isn?t if the barrier is elided. This is a difference to SATB where B is also implicitly live due to being part of the snapshot-at-the-beginning, which is why eliding the barrier has no effect with SATB.

/Erik

On 14 Jul 2022, at 21:49, Roman Kennke <rkennke at redhat.com> wrote:

?Hey Erik,

I just had a closer look at the IU barriers again. First of all, we are *not* using the same barriers for IU and SATB. That's why disabling barrier elision does nothing. Instead, we emit a c2 node for the IU barrier, and optimize on that and expand it during barrier expansion pass (together with the LRBs). The barrier consumes and replaces values being stored into fields and array elements. If such a value is provably a newly created object, then we don't need the barrier there - the object is implicitely live already.

Another difference between IU and SATB is that we don't need to keep-alive referents when somebody calls Reference.get() during concurrent marking. If such a referent goes out of scope again, then it's fine trivially. If the referent is stored anywhere, we will catch it with the barrier. If it only remains in a local variable by the end of concurrent marking, we will catch it in final remark. Please punch holes into this reasoning if you find anything. ;-)

Thanks,
Roman

Hi Roman,
The problem isn?t the new object itself. It?s dangling pointers created from it. Sure the new object will be kept alive. The problem is when you store the last reference to *another* object, into a field in the new object, but without barriers that would normally keep that other object alive, as I described in the example. Then you are kind of toast. With SATB anything you stored into the new object is already guaranteed to be kept alive as it is either a new object (implicitly live), or something from the snapshot, which pre-write barriers ensure is fully remembered. The pre-write barrier would only find null which by definition has no business being marked.
I will try to explain my example more clearly step by step.
1. There is an object A, reachable from a root of thread T1. A has a field pointing at another object B.
2. Marking starts. A is matked as a root, but before B is marked concurrently?
3. T1 loads B from the field of A.
4. T1 stores null to the field of A. B is now only reachable from a private root of T1, and is not visible to concurrent marking, and storing null did not keep B alive with IU barriers.
5. T1 allocates a new object C. It is implicitly alive and won?t be traced through.
6. T1 stores a reference from C to B. No barriers are used as C is a new object. (This is the illegal part causing a potential crash)
7. The root from T1 to B is discarded.
8. Concurrent marking traces from A but finds only null from step 4.
9. Marking terminates.
You now have a dangling pointer from C to B that has not been marked through, and the marking has terminated. As I said, the elision in CMS was only valid because the entire young generation was traced again when old marking terminates, which would have found the reference from C to B, and kept it alive. But you don?t do that, so it seems like it will crash instead. That?s why store barrier elision on newly allocated objects in general is seemingly unsound with Shenandoah IU mode.
Hope this helps.
/Erik

@mlbridge
Copy link

mlbridge bot commented Jul 15, 2022

Mailing list message from Roman Kennke on shenandoah-dev:

No wait. The barrier is elided when B is new, but when C is new. This
works differently than SATB. Makes sense?

Roman

The problem is that when you store the edge from C to B in my example, at step 6, the newly allocated object C is implicitly live, but the new value being written to the field (B) is not implicitly live and needs to be explicitly kept alive in an IU model. But it isn?t if the barrier is elided. This is a difference to SATB where B is also implicitly live due to being part of the snapshot-at-the-beginning, which is why eliding the barrier has no effect with SATB.

/Erik

On 14 Jul 2022, at 21:49, Roman Kennke <rkennke at redhat.com> wrote:

?Hey Erik,

I just had a closer look at the IU barriers again. First of all, we are *not* using the same barriers for IU and SATB. That's why disabling barrier elision does nothing. Instead, we emit a c2 node for the IU barrier, and optimize on that and expand it during barrier expansion pass (together with the LRBs). The barrier consumes and replaces values being stored into fields and array elements. If such a value is provably a newly created object, then we don't need the barrier there - the object is implicitely live already.

Another difference between IU and SATB is that we don't need to keep-alive referents when somebody calls Reference.get() during concurrent marking. If such a referent goes out of scope again, then it's fine trivially. If the referent is stored anywhere, we will catch it with the barrier. If it only remains in a local variable by the end of concurrent marking, we will catch it in final remark. Please punch holes into this reasoning if you find anything. ;-)

Thanks,
Roman

Hi Roman,
The problem isn?t the new object itself. It?s dangling pointers created from it. Sure the new object will be kept alive. The problem is when you store the last reference to *another* object, into a field in the new object, but without barriers that would normally keep that other object alive, as I described in the example. Then you are kind of toast. With SATB anything you stored into the new object is already guaranteed to be kept alive as it is either a new object (implicitly live), or something from the snapshot, which pre-write barriers ensure is fully remembered. The pre-write barrier would only find null which by definition has no business being marked.
I will try to explain my example more clearly step by step.
1. There is an object A, reachable from a root of thread T1. A has a field pointing at another object B.
2. Marking starts. A is matked as a root, but before B is marked concurrently?
3. T1 loads B from the field of A.
4. T1 stores null to the field of A. B is now only reachable from a private root of T1, and is not visible to concurrent marking, and storing null did not keep B alive with IU barriers.
5. T1 allocates a new object C. It is implicitly alive and won?t be traced through.
6. T1 stores a reference from C to B. No barriers are used as C is a new object. (This is the illegal part causing a potential crash)
7. The root from T1 to B is discarded.
8. Concurrent marking traces from A but finds only null from step 4.
9. Marking terminates.
You now have a dangling pointer from C to B that has not been marked through, and the marking has terminated. As I said, the elision in CMS was only valid because the entire young generation was traced again when old marking terminates, which would have found the reference from C to B, and kept it alive. But you don?t do that, so it seems like it will crash instead. That?s why store barrier elision on newly allocated objects in general is seemingly unsound with Shenandoah IU mode.
Hope this helps.
/Erik

@mlbridge
Copy link

mlbridge bot commented Jul 15, 2022

Mailing list message from Erik Osterlund on shenandoah-dev:

Hi Roman,

We seem to agree that eliding barriers purely on the basis that the containing object of a field is newly allocated, is invalid. But you seem to say that you do not elide barriers on e.g. initializing stores. If that is the case, all is well. However?
Last time I looked at the store capturing code of C2 for optimizing initializing stores, it emits normal stores with barriers and then in InitializeNode::capture_store it finds stores that are initializing stores for the new object (C), finds and clones the plain store (excluding barriers), and cuts out the old initializing store with IGVN, which should make the IU barrier not ?useful? (only used by dead code), and hence be removed in the useless barrier elimination phase, having the implicit effect of eliding barriers on initializing stores. I think that was indeed the intention of cloning the plain store, and cutting out the old store + barriers. But that isn?t legal with IU. If this works, I don?t see how. Did I miss something?

/Erik

On 14 Jul 2022, at 22:11, Roman Kennke <rkennke at redhat.com> wrote:

?No wait. The barrier is elided when B is new, but when C is new. This works differently than SATB. Makes sense?

Roman

The problem is that when you store the edge from C to B in my example, at step 6, the newly allocated object C is implicitly live, but the new value being written to the field (B) is not implicitly live and needs to be explicitly kept alive in an IU model. But it isn?t if the barrier is elided. This is a difference to SATB where B is also implicitly live due to being part of the snapshot-at-the-beginning, which is why eliding the barrier has no effect with SATB.
/Erik

On 14 Jul 2022, at 21:49, Roman Kennke <rkennke at redhat.com> wrote:

?Hey Erik,

I just had a closer look at the IU barriers again. First of all, we are *not* using the same barriers for IU and SATB. That's why disabling barrier elision does nothing. Instead, we emit a c2 node for the IU barrier, and optimize on that and expand it during barrier expansion pass (together with the LRBs). The barrier consumes and replaces values being stored into fields and array elements. If such a value is provably a newly created object, then we don't need the barrier there - the object is implicitely live already.

Another difference between IU and SATB is that we don't need to keep-alive referents when somebody calls Reference.get() during concurrent marking. If such a referent goes out of scope again, then it's fine trivially. If the referent is stored anywhere, we will catch it with the barrier. If it only remains in a local variable by the end of concurrent marking, we will catch it in final remark. Please punch holes into this reasoning if you find anything. ;-)

Thanks,
Roman

Hi Roman,
The problem isn?t the new object itself. It?s dangling pointers created from it. Sure the new object will be kept alive. The problem is when you store the last reference to *another* object, into a field in the new object, but without barriers that would normally keep that other object alive, as I described in the example. Then you are kind of toast. With SATB anything you stored into the new object is already guaranteed to be kept alive as it is either a new object (implicitly live), or something from the snapshot, which pre-write barriers ensure is fully remembered. The pre-write barrier would only find null which by definition has no business being marked.
I will try to explain my example more clearly step by step.
1. There is an object A, reachable from a root of thread T1. A has a field pointing at another object B.
2. Marking starts. A is matked as a root, but before B is marked concurrently?
3. T1 loads B from the field of A.
4. T1 stores null to the field of A. B is now only reachable from a private root of T1, and is not visible to concurrent marking, and storing null did not keep B alive with IU barriers.
5. T1 allocates a new object C. It is implicitly alive and won?t be traced through.
6. T1 stores a reference from C to B. No barriers are used as C is a new object. (This is the illegal part causing a potential crash)
7. The root from T1 to B is discarded.
8. Concurrent marking traces from A but finds only null from step 4.
9. Marking terminates.
You now have a dangling pointer from C to B that has not been marked through, and the marking has terminated. As I said, the elision in CMS was only valid because the entire young generation was traced again when old marking terminates, which would have found the reference from C to B, and kept it alive. But you don?t do that, so it seems like it will crash instead. That?s why store barrier elision on newly allocated objects in general is seemingly unsound with Shenandoah IU mode.
Hope this helps.
/Erik

@mlbridge
Copy link

mlbridge bot commented Jul 15, 2022

Mailing list message from Erik Osterlund on shenandoah-dev:

Hi Ramki,

If memory serves well, CMS used ?deferred card marking?, to colour allocations into the old generation grey, at a subsequent allocation or in a subsequent safepoint. The important effect of that is that they were coloured grey, before marking terminates, yet importantly after the initializing stores I just described, which would have their barriers elided in C2. So the GC would read the fields after the initializing stores have become visible.

As for Shenandoah IU, I?m not sure if the approach is described somewhere. But my understanding is that instead of first writing a new value to a field, and then handing the address of the field to the GC, so that it can read this new value, Shenandoah hands the new value to the GC directly. So rather than colouring the containing object of the field grey, it colours the new value grey. New allocations are black. And that is fine if you don?t elide barriers on newly allocated objects. If initializing stores elide barriers though, the newly allocated object needs to be asynchronously grey, like CMS (due to deferred card marking for old objects and due to subsequent YC for young objects).

There is AFAICT no such code, and I think the assumption is that elisions are not happening on newly allocated objects, but I think that they are happening, for initializing stores in C2. At least I can?t see what would stop that machinery from removing the barriers.

/Erik

@zhengyu123
Copy link
Contributor Author

The problem is beyond C2, Skynet fails with -Xint:

Referenced from:
  interior location: 0x00000000e4de6df0
  0x00000000e4de6dc8 - klass 0x0000000800036ec0 jdk.internal.vm.StackChunk
        allocated after mark start
    not after update watermark
        marked strong
        marked weak
    not in collection set
  mark: mark(is_neutral no_hash age=0)
  region: | 1335|R  |BTE     e4dc0000,     e4e00000,     e4e00000|TAMS     e4dc0000|UWM     e4e00000|U   256K|T   256K|G     0B|S     0B|L   256K|CP   0

Object:
  0x00000000e4c2e2b0 - klass 0x0000000800036ec0 jdk.internal.vm.StackChunk
    not allocated after mark start
    not after update watermark
    not marked strong
    not marked weak
    not in collection set
  mark: mark(is_neutral no_hash age=0)
  region: | 1328|R  |BTE     e4c00000,     e4c40000,     e4c40000|TAMS     e4c40000|UWM     e4c40000|U   256K|T   256K|G     0B|S     0B|L 93264B|CP   0

Forwardee:
  (the object itself)

So, it seems that IU mode either misses iu barrier on load or we need to treat new object as grey.

@mlbridge
Copy link

mlbridge bot commented Jul 16, 2022

Mailing list message from Roman Kennke on shenandoah-dev:

Hi Erik,

Hi Roman,

We seem to agree that eliding barriers purely on the basis that the containing object of a field is newly allocated, is invalid. But you seem to say that you do not elide barriers on e.g. initializing stores. If that is the case, all is well. However?
Last time I looked at the store capturing code of C2 for optimizing initializing stores, it emits normal stores with barriers and then in InitializeNode::capture_store it finds stores that are initializing stores for the new object (C), finds and clones the plain store (excluding barriers), and cuts out the old initializing store with IGVN, which should make the IU barrier not ?useful? (only used by dead code), and hence be removed in the useless barrier elimination phase, having the implicit effect of eliding barriers on initializing stores. I think that was indeed the intention of cloning the plain store, and cutting out the old store + barriers. But that isn?t legal with IU. If this works, I don?t see how. Did I miss something?

Ok, this seems indeed problematic.

A way to get rid of it easily would be -XX:-ReduceFieldZeroing, right?
But I suspect that does more than is needed. I need to think about this
a little more.

Thanks for pointing this out!

/Roman

/Erik

On 14 Jul 2022, at 22:11, Roman Kennke <rkennke at redhat.com> wrote:

?No wait. The barrier is elided when B is new, but when C is new. This works differently than SATB. Makes sense?

Roman

The problem is that when you store the edge from C to B in my example, at step 6, the newly allocated object C is implicitly live, but the new value being written to the field (B) is not implicitly live and needs to be explicitly kept alive in an IU model. But it isn?t if the barrier is elided. This is a difference to SATB where B is also implicitly live due to being part of the snapshot-at-the-beginning, which is why eliding the barrier has no effect with SATB.
/Erik

On 14 Jul 2022, at 21:49, Roman Kennke <rkennke at redhat.com> wrote:

?Hey Erik,

I just had a closer look at the IU barriers again. First of all, we are *not* using the same barriers for IU and SATB. That's why disabling barrier elision does nothing. Instead, we emit a c2 node for the IU barrier, and optimize on that and expand it during barrier expansion pass (together with the LRBs). The barrier consumes and replaces values being stored into fields and array elements. If such a value is provably a newly created object, then we don't need the barrier there - the object is implicitely live already.

Another difference between IU and SATB is that we don't need to keep-alive referents when somebody calls Reference.get() during concurrent marking. If such a referent goes out of scope again, then it's fine trivially. If the referent is stored anywhere, we will catch it with the barrier. If it only remains in a local variable by the end of concurrent marking, we will catch it in final remark. Please punch holes into this reasoning if you find anything. ;-)

Thanks,
Roman

Hi Roman,
The problem isn?t the new object itself. It?s dangling pointers created from it. Sure the new object will be kept alive. The problem is when you store the last reference to *another* object, into a field in the new object, but without barriers that would normally keep that other object alive, as I described in the example. Then you are kind of toast. With SATB anything you stored into the new object is already guaranteed to be kept alive as it is either a new object (implicitly live), or something from the snapshot, which pre-write barriers ensure is fully remembered. The pre-write barrier would only find null which by definition has no business being marked.
I will try to explain my example more clearly step by step.
1. There is an object A, reachable from a root of thread T1. A has a field pointing at another object B.
2. Marking starts. A is matked as a root, but before B is marked concurrently?
3. T1 loads B from the field of A.
4. T1 stores null to the field of A. B is now only reachable from a private root of T1, and is not visible to concurrent marking, and storing null did not keep B alive with IU barriers.
5. T1 allocates a new object C. It is implicitly alive and won?t be traced through.
6. T1 stores a reference from C to B. No barriers are used as C is a new object. (This is the illegal part causing a potential crash)
7. The root from T1 to B is discarded.
8. Concurrent marking traces from A but finds only null from step 4.
9. Marking terminates.
You now have a dangling pointer from C to B that has not been marked through, and the marking has terminated. As I said, the elision in CMS was only valid because the entire young generation was traced again when old marking terminates, which would have found the reference from C to B, and kept it alive. But you don?t do that, so it seems like it will crash instead. That?s why store barrier elision on newly allocated objects in general is seemingly unsound with Shenandoah IU mode.
Hope this helps.
/Erik

@mlbridge
Copy link

mlbridge bot commented Jul 16, 2022

Mailing list message from Erik Osterlund on shenandoah-dev:

Hi Zhengyu,

I suppose requires_barriers needs to return true when marking is active with IU, as you need barriers even on newly allocated objects then, as I was discussing.

/Erik

@zhengyu123
Copy link
Contributor Author

Mailing list message from Erik Osterlund on shenandoah-dev:

Hi Zhengyu,

I suppose requires_barriers needs to return true when marking is active with IU, as you need barriers even on newly allocated objects then, as I was discussing.

/Erik

Tried that, still not working.
I can not see how it can be Loom specific? but seems it's only exposed by Skynet

Thanks,

-Zhengyu

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
hotspot-gc hotspot-gc-dev@openjdk.org integrated Pull request has been integrated shenandoah shenandoah-dev@openjdk.org
Development

Successfully merging this pull request may close these issues.

None yet

4 participants