-
Notifications
You must be signed in to change notification settings - Fork 5.8k
8303040: linux PPC64le: Implementation of Foreign Function & Memory API (Preview) #12708
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
👋 Welcome back mdoerr! A progress list of the required criteria for merging this PR into |
@TheRealMDoerr The following labels will be automatically applied to this pull request:
When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing lists. If you would like to change these labels, use the /label pull request command. |
In the most recent version of the Panama FFM API, any memory layout (including struct and padding layouts) are always byte aligned. |
I will do a more thorough review soon. Some preliminary comments:
FWIW, we have to do this for Windows vararg floats as well (here) This can be done by
I think supplying the WRT the extension of int -> long. This could potentially also be handled in Java by adding the conversion as a
Zero-length memory segments are supposed to be resized before they are written to or read from (see Zero-length memory segments). We shouldn't disable the check for them, as that would have far-reaching implications for the safety design of the memory access API. Can you explain a bit more about where/why/how the issue occurs? |
FYI: master...JornVernee:jdk:I2L (assuming |
/** | ||
* The {@code T*} native type. | ||
*/ | ||
public static final ValueLayout.OfAddress C_POINTER = ValueLayout.ADDRESS.withBitAlignment(64); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is where the issue with the check in MemorySegment::copy
comes from. Note how other platforms add a call to asUnbounded
for the created layout, which makes any pointer boxed using this layout writable/readable (such as the in memory return pointer for upcalls).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the hint! You found it pretty quickly. I had missed that when rebasing my early prototype. Fixed.
@@ -1356,7 +1356,11 @@ static void copy(MemorySegment srcSegment, ValueLayout srcElementLayout, long sr | |||
} | |||
long size = elementCount * srcElementLayout.byteSize(); | |||
srcImpl.checkAccess(srcOffset, size, true); | |||
dstImpl.checkAccess(dstOffset, size, false); | |||
if (dstImpl instanceof NativeMemorySegmentImpl && dstImpl.byteSize() == 0) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As Jorn said, this change is not acceptable, as it allows bulk copy disregarding the segment real size. In such cases, the issue is always a missing unsafe resize somewhere in the linker code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removed this workaround. I'm glad to get rid of it.
Thanks for looking into this port! |
Correct, |
Thanks a lot!
Maybe I need to think a bit more about it. I don't really like the extra cases for |
I have removed the size restriction for structs. Passing a struct consisting of 1 char works (on Little Endian). However, passing a struct consisting of 3 chars doesn't (getting IndexOutOfBoundsException: Out of bound access on segment MemorySegment). Neither on PPC64, nor on x86. Is that known or should I file a bug for that? |
I should add tests for the tricky corner cases like the following ones:
Can I add them to the existing libraries? If so, what is the correct naming scheme and what is needed to get them executed (adding the EXPORT alone is not sufficient). Or should I create a separate test for these cases? |
There are two kinds of tests for the linker - some tests (e.g. TestDowncallXYZ and TestUpcallXYZ) execute end to end test with several shapes, and make sure that things work. Then there are ABI specific tests (e.g. see TestXYZCallArranger). These latter tests are typically used to stress tests corners of specific ABIs - and they are easier to write as you can just provide the input (some function descriptor) then test that the resulting set of bindings is the expected one. This allows for much more in-depth testing. |
Would it be possible to generate a biding cast + move and the recognize the pattern in the HS backend and optimize it away as a |
The design philosophy has been to put as much as possible on the Java side, and there are a few reasons for that:
So... WRT efficiency, I think it depends. I've found in the past that adding a few more move instructions to the downcall stub didn't visibly affect performance. This might be because the CPU is good at just aliasing the registers instead of performing an actual move, or because it's just noise next to the membar we do on the return path. Ultimately, I don't think it matters much for performance, though (you could measure). I think the maintainability/future-proofing from implementing in Java is more important. |
I think the same arguments apply here. I'll have a more thorough look at the patch and then get back to you on this. |
I think you might be running into: https://bugs.openjdk.org/browse/JDK-8303017 which was recently found. (If you have a simpler test case please add it to the JBS issue, in a comment) Edit: I've found a simpler test case as well. It's an issue with structs that are not the size of a power of two, since we can only read/write a power of two at a time. And I see now what your comment about |
At the moment, there is a forced indirection between the Java code and downcall stub, so the JIT can not do any optimizations that have to take both sides into account. The downcall stub is opaque to the JIT. (though, perhaps in the long run we can add intrinsification of |
I meant generating |
Oh, sorry, I see what you mean now. But yeah, the VM stub only handles moves, and I think we want to keep it that way (for reasons outlined). The VM stub can be viewed as a low-level primitive that accepts a set of register/stack values, and moves them into the corresponding locations. It's not really supposed to be doing any kind of processing of the values it receives or returns. |
Some more remarks about other issues:
|
Changing the default is fine by me. |
… instructions for moving between FP and GP reg. Improve comments.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi Martin,
finally I've completed a pass over the hotspot part of the port. This seems a good point to share the few comments and questions I've collected so far.
In general the changes do look very good. Hardly any shared code changes.
Nice work!
Cheers, Richard.
} | ||
|
||
allocated_frame_size = align_up(allocated_frame_size, StackAlignmentInBytes); | ||
_frame_size_slots = allocated_frame_size >> LogBytesPerInt; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
VMRegImpl::stack_slot_size
could be used when converting from size in bytes to size in slots.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I think this would be better readable. But the following code should also be adapted, then:
int framesize() const {
return (_frame_size_slots >> (LogBytesPerWord - LogBytesPerInt));
}
Maybe it makes sense to do some cleanup for all platforms.
|
||
// Load the invoker, as NEP -> .invoker | ||
__ verify_oop(nep_reg); | ||
__ ld(temp_target, jdk_internal_foreign_abi_NativeEntryPoint::downcall_stub_address_offset_in_bytes(), nep_reg); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Other platforms use access_load_at
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Interesting. I have no idea why. It does the same but with a more complicated API.
I just noticed that other platforms use NONZERO
. I think I should at least add that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It does the same but with a more complicated API.
AFAIK It depends on the GC that's being used. access_load_at
will make sure the right GC barriers are inserted (mostly for concurrent GCs).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As I see it, the access API is an abstraction to be used instead of raw loads. It hides details. See for instance TemplateTable::getfield_or_static
on x86 where it is also used. PPC lags behind in making use of the access API.
With a fancy new GC the oop in nep_reg could be stale, requiring some processing which would be taken care of by using the access API.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
GC barriers are used when loading or storing an oop. No GC we currently have (not even the generational ones) use barriers for loading a plain address from an oop. The PPC64 implementation of the BarrierSetAssembler currently has Unimplemented()
for non-oop types and all GCs are implemented.
Maybe it was intended for some future GC or other feature which has not yet reached the official repo.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just figured it out. It was introduced by https://bugs.openjdk.org/browse/JDK-8203172 (on aarch64) which mentions Shenandoah and future GCs. However, the Shenandoah comment says "non-reference load, no additional barrier is needed" and it doesn't use barriers in such a case. So, for the time being, I'll keep the normal load (because access_load_at
is not ready for non-oop types). But I should add the NONZERO
check.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FWIW, since Shenandoah changed their load barriers we have been cleaning away the usages of the Access API for loads and stores to primitive values. There's no such support in the C++ Runtime code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, since this is loading a long
(which represents an address that points into the code cache) I think we're fine without using the access API then?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Correct. The code had been written for the previous version of Shenandoah (1.0). No current GC uses barriers for non-oop types and the C++ Runtime doesn't support it any more as Stefan pointed out.
It is still possible to use the access API on other platforms, but it does nothing more than a plain load/store for non-oop types.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm ok with doing a plain access. I don't like the difference to other ports as it will at least waste time of people with less expertise in the area (e.g. new to the project). No need to continue the discussion in this pr though.
|
||
__ block_comment("{ receiver "); | ||
__ load_const_optimized(R3_ARG1, (intptr_t)receiver, R0); | ||
__ resolve_jobject(R3_ARG1, tmp, R31, MacroAssembler::PRESERVATION_FRAME_LR_GP_FP_REGS); // kills R31 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As a simplification the receiver could be resolved in UpcallLinker::on_entry
and returned in JavaThread::_vm_result
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This sounds like a nice enhancement proposal for all platforms. The register spilling code in resolve_jobject
can get lengthy dependent on the selected GC. Doing it in the C code (which we call anyway above) would make the upcall stubs smaller.
@JornVernee: What do you think about this idea?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems like a nice idea. The resolution here pre-dates the time where we called into the VM.
case T_CHAR : | ||
case T_BYTE : | ||
case T_SHORT : | ||
case T_INT : segment_mask = REG32_MASK; break; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder why the segment_mask depends on bt
on ppc?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The usage of the segment_mask
can be defined for each platform. I'm using it to encode the information if a value on the Java side uses a 32 or 64 bit slot. In case of 32 bit values, the C side requires all 64 register bits to get defined values (ints get sign extended, floats get converted to double-precision format).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi Martin,
I've made a pass over the Java part (except HFA). I found the specs hard to understand but most specs are like this.
I'll finish the rest beginning of next week.
Cheers, Richard.
*/ | ||
package jdk.internal.foreign.abi.ppc64; | ||
|
||
import java.lang.foreign.AddressLayout; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Imports are not grouped and ordered alphabetically.
(Very much as the aarch64 version)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
public static final int MAX_REGISTER_ARGUMENTS = 8; | ||
public static final int MAX_FLOAT_REGISTER_ARGUMENTS = 13; | ||
|
||
// This is derived from the 64-Bit ELF V2 ABI spec, restricted to what's |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The comment says ABI V2 but the code seems to handle V1 too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's derived from ABI v2, but v1 is compatible. Added that to the comment.
|
||
class StorageCalculator { | ||
private final boolean forArguments; | ||
private boolean forVarArgs = false; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems to be not used.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I had kept it in case another PPC64 OS would need it, but I guess it's unlikely. So, I just removed it. Could get added back easily if needed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see. There are other examples of redundant code that might serve a purpose in the future. I honestly don't like that.
In the case of forVarArgs
porters can still find it in the aarch64 version :)
// TODO: Big Endian can't pass partially used slots correctly in some cases with: | ||
// !useABIv2() && layout.byteSize() > 8 && layout.byteSize() % 8 != 0 | ||
|
||
// Allocate individual fields as gp slots (regs and stack). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You explained to me, it's not individual (struct) fields that are handled here. Looks like registers and 8 byte stack slots are allocated to completely cover the struct. Would be good if you could change the comment and names in the code to better reflect this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Adapted to aarch64 implementation (using MAX_COPY_SIZE
).
private static LinuxPPC64leLinker instance; | ||
|
||
public static LinuxPPC64leLinker getInstance() { | ||
if (instance == null) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Other platforms optimized this to return a constant (probably after you forked off the port).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch. Adapted.
* public constants CallArranger.ABIv1/2. | ||
*/ | ||
public abstract class CallArranger { | ||
protected abstract boolean useABIv2(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This could also be refactored into a static method with the same trick that is used in LinuxPPC64leLinker::getInstance
. Callers could be static then and you could delete CallArranger::ABIv2
and ABIv2CallArranger
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe something like?
protected static final boolean useABIv2 = CABI.current() == CABI.LINUX_PPC_64_LE;
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That would be better to read, but would make the PPC64 CallArranger dependent on the current CABI.
Note that there are tests which use
import jdk.internal.foreign.abi.aarch64.CallArranger;
...
CallArranger.LINUX.getBindings(mt, fd, false);
for example. The tests are designed to run on all platforms.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That would be better to read, but would make the PPC64 CallArranger dependent on the current CABI. Note that there are tests which use
import jdk.internal.foreign.abi.aarch64.CallArranger; ... CallArranger.LINUX.getBindings(mt, fd, false);
for example. The tests are designed to run on all platforms.
I see, thanks. Would be nice to have some for PPC64 too :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Probably, yes. I didn't find time for figuring out what would be useful tests. We could still add some in the future or with the big endian port.
Another idea: Would the following be better?
final boolean useABIv2 = (this.getClass() == ABIv2CallArranger.class);
That would also allow getting rid of the method useABIv2()
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Or better final boolean useABIv2 = (this instanceof ABIv2CallArranger);
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, good idea.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please take a look at 70736be
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks good.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi Martin,
there seems to be a mismatch between this pr and the 64-bit ELF ABI V2 for PPC. In fact all dynamically generated calls that have to conform to ABI V2 are affected. I'm giving a short summary of our discussion this afternoon.
Very briefly: ABI V2 states that a Parameter Save Area (PSA) shall be allocated unless all parameters can be passed in registers as indicated by the caller's prototype, whereas the port always allocates a PSA of 8 double words.
(Details under "Parameter Save Area" in "2.2.3.3. Optional Save Areas" of ELF ABI V2)
It is not wrong what we're doing. It is like we didn't know the prototype of the call targets. But for most calls [1] we are wasting stack space (and confusing everybody that tries to match the implementation with the spec).
Interestingly ABI V1 states that a PSA of at least 8 double words is always needed. Looks like we've missed that change.
I have conducted a little experiment and compiled the following test program using Compiler Explorer [2]
#include <stdint.h>
int64_t test_callee(int64_t p1, int64_t p2, int64_t p3, int64_t p4);
int64_t test_call(int64_t p1, int64_t p2, int64_t p3, int64_t p4) {
return test_callee(p1, p2, p3, p4);
}
This is the -O2 output for ELF ABI V2 (little endian)
Note: the stdu allocates just the minimal frame of 4 double words without PSA.
test_call: # @test_call
.Lfunc_gep0:
addis 2, 12, .TOC.-.Lfunc_gep0@ha
addi 2, 2, .TOC.-.Lfunc_gep0@l
mflr 0
stdu 1, -32(1)
std 0, 48(1)
bl test_callee
nop
addi 1, 1, 32
ld 0, 16(1)
mtlr 0
blr
This is the -O2 output for ELF ABI V1 (big endian)
test_call: # @test_call
.quad .Lfunc_begin0
.quad .TOC.@tocbase
.quad 0
.Lfunc_begin0:
mflr 0
stdu 1, -112(1)
std 0, 128(1)
bl test_callee
nop
addi 1, 1, 112
ld 0, 16(1)
mtlr 0
blr
Note: the stdu allocates a much larger frame because it accomodates a PSA of 8 double words.
I'd suggest to keep the current well tested version but add comments to the code that a PSA is always allocated even though ABI V2 does not require it. This should also be explained in the JBS item.
Furthermore RFEs should be filed to adopt ABI V2 in the FFM API port and in the hotspot port to PPC64le. There's quite a bit of room for improvement there. Ironically I've very recently fixed JDK-8306111 citing the ABI V2 spec without realizing that the fix is not needed for V2, just for V1.
[1] Exceptions that do require a PSA are calls with a really long or variable parameter list or if a prototype is not available
// (native_abi_reg_args is native_abi_minframe plus space for 8 argument register spill slots) | ||
assert(_abi._shadow_space_bytes == frame::native_abi_minframe_size, "expected space according to ABI"); | ||
// Note: For ABIv2, we only need (_input_registers.length() > 8) ? _input_registers.length() : 0 | ||
int register_save_area_slots = MAX2(_input_registers.length(), 8); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Both specs, ABI V1 and V2, call this "Parameter Save Area" we should use the same name.
int register_save_area_slots = MAX2(_input_registers.length(), 8); | |
int parameter_save_area_slots = MAX2(_input_registers.length(), 8); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks! Changed.
// This may be a bit more than needed when HFA is used (see CallArranger.java). | ||
// (native_abi_reg_args is native_abi_minframe plus space for 8 argument register spill slots) | ||
assert(_abi._shadow_space_bytes == frame::native_abi_minframe_size, "expected space according to ABI"); | ||
// Note: For ABIv2, we only need (_input_registers.length() > 8) ? _input_registers.length() : 0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is hard to understand. It should be explained that we allocate a PSA even though ABI V2 only requires it if not all parameters can be passed in registers.
stackAlloc(4, STACK_SLOT_SIZE); // Skip first half of stack slot. | ||
stack = stackAlloc(4, 4); | ||
} else { | ||
stack = stackAlloc(is32Bit ? 4 : 8, STACK_SLOT_SIZE); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks like a stack slot is always allocated. Please explain that for ABI V2 this is actually only required if it is know from a prototype that not all parameters can be passed in registers and that we plan to change this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This basically computes the stack layout. We need to count all slots to get the right offset for the registers which actually get written on stack. The first such register will hit native_abi_minframe_size + 8 slots. If fewer registers are used, the counted stack slots will not be used.
The decision whether we allocate the Parameter Save Area or not is done in the downcall stub and doesn't depend on the stackAllocs.
Thanks for publishing our discussion, here. The unnecessary PSA affects other areas of hotspot much more than Panama. Yes, we should file an RFE. I think one for hotspot is sufficient as the downcall stub is part of it. I don't think it needs extra treatment. |
assert(_abi._shadow_space_bytes == frame::native_abi_minframe_size, "expected space according to ABI"); | ||
// The Parameter Save Area needs to be at least 8 slots for ABIv1. | ||
// ABIv2 allows omitting it when all parameters can get passed in registers. We currently don't optimize this. | ||
// For ABIv2, we only need (_input_registers.length() > 8) ? _input_registers.length() : 0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The PSA is also needed if the parameter list is variable in length. Is the expression (_input_registers.length() > 8) ? _input_registers.length() : 0
correct in that case too?
Otherwise: ABIv2 allows omitting it if the callee's prototype indicates that stack parameters are not expected. We currently don't optimize this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, I see now. This is not obvious though. There are a few layers of abstraction at play which hide this. A comment is needed. Maybe like this:
// With ABIv1 a Parameter Save Area of at least 8 double words is always needed.
// ABIv2 allows omitting it if the callee's prototype indicates that stack parameters are not expected.
// We currently don't optimize this (see DowncallStubGenerator in the backend).
if (reg == null) return stack;
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe omitting the PSA is wrong for varargs, but we don't have this information in the backend. So, I think we should simply not optimize it. Reserving 64 Byte stack space should be affordable for a downcall even if it's not always needed. The Java side could compute it, but there's no way to pass this information to the backend. I've improved the comments. Please take a look.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe omitting the PSA is wrong for varargs, but we don't have this information in the backend.
It is clearly wrong.
So, I think we should simply not optimize it.
Already agreed for this version.
Reserving 64 Byte stack space should be affordable for a downcall even if it's not always needed.
It is hardly ever needed to begin with. It's just not pretty to allocate the PSA for little test functions. It'll confuse people that compare the downcall stub to what the C compiler produces. They might even think we havn't understood the ABI ;)
The Java side could compute it, but there's no way to pass this information to the backend.
I'm not sure about that. First idea would be to keep the allocated stack
in nextStorage
if needed and pass it to the backend.
I've improved the comments. Please take a look.
Great! Thanks :)
That's fine. It should have a little list of areas to be revisited. Adoc:
Subtasks can be generated from that list. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is really good to go now.
(maybe after a final round of tests)
Thanks a lot!
Richard.
Thanks a lot for all discussions, feedback and the reviews! This was really helpful! I'm planning to integrate tomorrow. |
/integrate |
Going to push as commit 20f1535.
Your commit was automatically rebased without conflicts. |
@TheRealMDoerr Pushed as commit 20f1535. 💡 You may see a message that your pull request was closed with unmerged commits. This can be safely ignored. |
Implementation of "Foreign Function & Memory API" for linux on Power (Little Endian) according to "Power Architecture 64-Bit ELF V2 ABI Specification".
This PR does not include code for VaList support because it's supposed to get removed by JDK-8299736. I've kept the related tests disabled for this platform and throw an exception instead. Note that the ABI doesn't precisely specify variable argument lists. Instead, it refers to
<stdarg.h>
(2.2.4 Variable Argument Lists).Big Endian support is implemented to some extend, but not complete. E.g. structs with size not divisible by 8 are not passed correctly (see
useABIv2
in CallArranger.java). Big Endian is excluded by selectingARCH.equals("ppc64le")
(CABI.java) only.There's another limitation: This PR only accepts structures with size divisible by 4. (An
IllegalArgumentException
gets thrown otherwise.) I think arbitrary sizes are not usable on other platforms, either, becauseSharedUtils.primitiveCarrierForSize
only accepts powers of 2. Update: Resolved after merging of JDK-8303017The ABI has some tricky corner cases related to HFA (Homogeneous Float Aggregate). The same argument may need to get passed in both, a FP reg and a GP reg or stack slot (see "no partial DW rule"). This cases are not covered by the existing tests.
I had to make changes to shared code and code for other platforms:
VMStorage
objects fromVMReg
. This is needed for the following reasons:NativeMemorySegmentImpl
is used as a raw pointer (with byteSize() == 0) while running TestUpcallScope. Hence, existing size checks don't work (see MemorySegment.java). As a workaround, I'm just skipping the check in this particular case. Please check if this makes sense or if there's a better fix (possibly as separate RFE). Update: This issue is resolved by 2nd commit.Progress
Issue
Reviewers
Reviewing
Using
git
Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk.git pull/12708/head:pull/12708
$ git checkout pull/12708
Update a local copy of the PR:
$ git checkout pull/12708
$ git pull https://git.openjdk.org/jdk.git pull/12708/head
Using Skara CLI tools
Checkout this PR locally:
$ git pr checkout 12708
View PR using the GUI difftool:
$ git pr show -t 12708
Using diff file
Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/12708.diff
Webrev
Link to Webrev Comment