Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

8215788: Clarify JarInputStream Manifest access #10045

Closed
wants to merge 17 commits into from
Closed
Changes from 1 commit
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
31 changes: 15 additions & 16 deletions src/java.base/share/classes/java/util/jar/JarInputStream.java
Original file line number Diff line number Diff line change
@@ -31,51 +31,50 @@
import jdk.internal.util.jar.JarIndex;

/**
* The {@code JarInputStream} class, which extends {@linkplain ZipInputStream},
* The {@code JarInputStream} class, which extends {@link ZipInputStream},
* is used to read the contents of a JAR file from an input stream.
* It provides support for reading an optional {@linkplain JarFile#MANIFEST_NAME Manifest}
* It provides support for reading an optional {@link JarFile#MANIFEST_NAME Manifest}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What would you think about linking this to {@docroot}/../specs/jar/jar.html#jar-manifest rather tan JarFile#MANIFEST_NAME?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure if that is your preference.

* entry. The {@code Manifest} can be used to store
* meta-information about the JAR file and its entries.
*
* <h2>Accessing the Manifest</h2>
* <p>
* The {@linkplain #getManifest} method will return the {@code Manifest} when it is
* The {@link #getManifest} method will return the {@code Manifest} when it is
* the first entry in the stream or {@code META-INF/} is the first entry and
* the {@code Manifest} is the second entry within the stream. When the
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you can insert a comma after "when it is the first entry in the stream"? I think that would make it a bit clearer that there are two cases.

Also I'm wondering if the paragraph should be split into two, meaning "When the Manifest ..." can be the start of a new paragraph. The reason is that the text is trying to explain two things, the first is that the manifest must be at the start of the JAR file, the second is that the ordering that methods are invoked will influence how other methods behave.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you can insert a comma after "when it is the first entry in the stream"? I think that would make it a bit clearer that there are two cases.

Done

Also I'm wondering if the paragraph should be split into two, meaning "When the Manifest ..." can be the start of a new paragraph. The reason is that the text is trying to explain two things, the first is that the manifest must be at the start of the JAR file, the second is that the ordering that methods are invoked will influence how other methods behave.

Moved to a new paragraph per your suggestion

* {@code Manifest} is returned by {@code getManifest()}, the {@linkplain #getNextEntry()}
* and {@linkplain #getNextJarEntry()} methods will not return the {@code Manifest}.
* If {@code META-INF/} is the first entry in the input stream it will be
* also not be returned by {@linkplain #getNextEntry()} and
* {@linkplain #getNextJarEntry()}.
* {@code Manifest} is returned by {@code getManifest()}, the {@link #getNextEntry()}
* and {@link #getNextJarEntry()} methods will not return the {@code Manifest}.
* If {@code META-INF/} is the first entry in the input stream it will be
* also not be returned by {@link #getNextEntry()} and
* {@link #getNextJarEntry()}.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

JAR files will almost always have the manifest as the first or second entry. It's very hazardous to have getNextEntry/getNextJarEntry work differently when the manifest is present but not at the start. This makes it really hard to specify too. I wonder what the impact would be of changing the implementation so that getNextEntry/getNextJarEntry consistently skip the manifest rather than only when it's at the start. I can't think of anything right now that could dependent on the current behavior where it might or might be returned.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tend to agree but am reluctant to change 20+ year behavior via this PR as we don't know what we don't know.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, in which case what would you think about just saying that the getNextEntry/getNextJarEntry method do not return the Manifest when it's at the start of the stream, and it's unspecified whether they return the Manifest when it located later in the stream. I think this would give us wriggle room to change it in the future.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, in which case what would you think about just saying that the getNextEntry/getNextJarEntry method do not return the Manifest when it's at the start of the stream, and it's unspecified whether they return the Manifest when it located later in the stream. I think this would give us wriggle room to change it in the future.

Is this any better:

 * <p>
 * When the {@code Manifest} is returned by {@code getManifest()}, the {@link #getNextEntry()}
 * and {@link #getNextJarEntry()} methods will not return the {@code Manifest}.
 * If {@code META-INF/} is the first entry in the input stream it will be
 * also not be returned by {@link #getNextEntry()} and {@link #getNextJarEntry()}. 
 * </p>
 * @apiNote 
 * It is unspecified whether {@link #getNextEntry()} and
 * {@link #getNextJarEntry()} will return the {@code Manifest}
 * when the {@code Manifest} occurs later in the input stream.
 * <p>
 * {@link JarEntry#getAttributes()} will return the {@code Manifest}'s
 *  attributes for the current JAR file entry, if any, providing
 *  {@code getManifest()} returns a {@code Manifest} for the JAR file.
 * </p>

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's a bit better but I think we can make it clearer and also link the JAR Manifest section of the JAR file spec. Can you try this:

 * <p> The {@link #getManifest() getManifest} method is used to read the
 * <a href="{@docRoot}/../specs/jar/jar.html#jar-manifest">JAR Manifest</a>
 * from the entry {@code META-INF/MANIFEST.MF} when it is the first entry
 * in the stream (or the second entry in the case that the fist entry is
 * {@code META-INF/} and the second entry {@code META-INF/MANIFEST.MF}).
 *
 * <p> The {@link #getNextJarEntry()} and {@link #getNextEntry()} methods are
 * used to read JAR file entries from the stream. These methods skip over the
 * manifest ({@code META-INF/MANIFEST.MF}) when it is at the beginning of the
 * stream. In other words, these methods do not return an entry for the manifest
 * when the manifest is the first entry in the stream. If the first entry is
 * {@code META-INF/} and the second entry is the manifest then both are skipped
 * over by these methods. Whether these methods skip over the manifest when it
 * appears later in the stream is not specified.

I think we also have to update getManifest method to align with the above as doesn't say anything about the manifest needing to be at the beginning of the stream.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's a bit better but I think we can make it clearer and also link the JAR Manifest section of the JAR file spec. Can you try this:

 * <p> The {@link #getManifest() getManifest} method is used to read the
 * <a href="{@docRoot}/../specs/jar/jar.html#jar-manifest">JAR Manifest</a>
 * from the entry {@code META-INF/MANIFEST.MF} when it is the first entry
 * in the stream (or the second entry in the case that the fist entry is
 * {@code META-INF/} and the second entry {@code META-INF/MANIFEST.MF}).
 *
 * <p> The {@link #getNextJarEntry()} and {@link #getNextEntry()} methods are
 * used to read JAR file entries from the stream. These methods skip over the
 * manifest ({@code META-INF/MANIFEST.MF}) when it is at the beginning of the
 * stream. In other words, these methods do not return an entry for the manifest
 * when the manifest is the first entry in the stream. If the first entry is
 * {@code META-INF/} and the second entry is the manifest then both are skipped
 * over by these methods. Whether these methods skip over the manifest when it
 * appears later in the stream is not specified.

Revised using the above

I think we also have to update getManifest method to align with the above as doesn't say anything about the manifest needing to be at the beginning of the stream.

Ok, I added some verbiage similar to what I had originally before we decided to update the class description. Please let me know if this is what you had in mind

Thank you again for your feedback

* </p>
* <p>
* {@linkplain JarEntry#getAttributes()} will return the {@code Manifest}'s
* {@link JarEntry#getAttributes()} will return the {@code Manifest}'s
* attributes for the current JAR file entry, if any, providing
* {@code getManifest()} returns a {@code Manifest} for the JAR file.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Per-entry attributes is an advanced feature to attempt to bring into the class description. I think it would be simpler to just drop this paragraph. If you really want something on this topic then it would require first describing main vs. per entry attributes and then explaining that the per-entry attributes are obtained with JarEntry::getAttributes when the manifest is at the beginning of the stream.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can remove, but I am not sure I agree we need to describe main vs attribute here given we are pointing to the Jar spec and if there is any discussion of Pre-entry attributes, it should be in JarEntry IMHO. I guess the clarification I was trying to make, apparently unsuccessfully is that JarEntry will not have access to the attributes if getManifest does not return the Manifest.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can remove, but I am not sure I agree we need to describe main vs attribute here given we are pointing to the Jar spec and if there is any discussion of Pre-entry attributes, it should be in JarEntry IMHO. I guess the clarification I was trying to make, apparently unsuccessfully is that JarEntry will not have access to the attributes if getManifest does not return the Manifest.

Wording it is hard. The draft wording made it look that must call getManifest, ignore the return value, and then subsequent calls to JarEntry::getAttributes will return attributes for the JAR file entry. I think to properly describe would require more setup to explain that a manifest can optionally include per entry attributes and these are read with JarEntry::getAttributes when the manifest is found at the beginning of the stream..

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can remove, but I am not sure I agree we need to describe main vs attribute here given we are pointing to the Jar spec and if there is any discussion of Pre-entry attributes, it should be in JarEntry IMHO. I guess the clarification I was trying to make, apparently unsuccessfully is that JarEntry will not have access to the attributes if getManifest does not return the Manifest.

Wording it is hard. The draft wording made it look that must call getManifest, ignore the return value, and then subsequent calls to JarEntry::getAttributes will return attributes for the JAR file entry. I think to properly describe would require more setup to explain that a manifest can optionally include per entry attributes and these are read with JarEntry::getAttributes when the manifest is found at the beginning of the stream..

Thinking about this some more, would the following be any better:

 * <p>
 * The {@code Manifest} for a JAR file may include
 *  <a href="{@docRoot}/../specs/jar/jar.html#main-attributes">main</a> and
 *  <a href="{@docRoot}/../specs/jar/jar.html#per-entry-attributes">per entry</a>
 *  attributes. {@link JarEntry#getAttributes()} will return the per entry
 *  attributes for the current JAR file entry, if any, providing
 *  {@code getManifest()} returns the {@code Manifest} for the JAR file.
 *  </p>

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this mean that the "Verifying a JarInputStream" should also avoid mentioning "getManifest method returns the manifest"? I understand precisely it should be "getManifest method is able to return the manifest if you call it".

It almost sounds like we should first define the concepts of "well-formed JAR file" and "well-formed signed JAR" and then specify what these methods behave.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this mean that the "Verifying a JarInputStream" should also avoid mentioning "getManifest method returns the manifest"? I understand precisely it should be "getManifest method is able to return the manifest if you call it".

See Alan's comment below. I will be copying the wording regarding the Manifest being the 1st/2nd entry

It almost sounds like we should first define the concepts of "well-formed JAR file" and "well-formed signed JAR" and then specify what these methods behave.

The Manifest location requirement is unique to JarInputStream so really want to try to keep these updates to a minimum if at all possible so that we are not copying parts of the Jar spec into the javadoc.

* </p>
*
* <h2>Verifying a JarInputStream</h2>
* {@linkplain #JarInputStream(InputStream, boolean)} may be used to
* {@link #JarInputStream(InputStream, boolean)} may be used to
Copy link
Contributor

@AlanBateman AlanBateman Sep 19, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I realise you've had a few iterations with Max on this section but I'm concerned that the text is telling the reader that they should use the 2-arg constructor to verify the signatures when a JAR is signed. The default is to verify and the main reason to use the 2-arg constructor is when you want to opt out, not opt-in.

I think the intro to this section will need to start with a sentence to say that JAR files can be signed (link to specs/jar/jar.html#signed-jar-file) and that JarInputStream can read a signed JAR from the input stream. As per the description further up, the manifest must be at the start of the stream.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I realise you've had a few iterations with Max on this section but I'm concerned that the text is telling the reader that they should use the 2-arg constructor to verify the signatures when a JAR is signed. The default is to verify and the main reason to use the 2-arg constructor is when you want to opt out, not opt-in.

I think the intro to this section will need to start with a sentence to say that JAR files can be signed (link to specs/jar/jar.html#signed-jar-file) and that JarInputStream can read a signed JAR from the input stream. As per the description further up, the manifest must be at the start of the stream.

OK, will make another pass at this today

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, will make another pass at this today

I looked at the latest draft (2bafc00). I think it would help if the section "Verifying a JarInputStream" were renamed to "Signed JAR files". The link to getManifest makes the reader wonder if they have to call this method whereas I think what you want to say that the manifest must be at the start of the stream (as per the first section) and then followed by signature entries.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, will make another pass at this today

I looked at the latest draft (2bafc00). I think it would help if the section "Verifying a JarInputStream" were renamed to "Signed JAR files".

OK, I will change as you suggest

The link to getManifest makes the reader wonder if they have to call this method whereas I think what you want to say that the manifest must be at the start of the stream (as per the first section) and then followed by signature entries.

The reason I used the getManifest wording is I felt it was easier and less redundant than copying the wording about the Manifest needing to be either the first or second entry (assuming META-INF/ is the first in the stream). However if you prefer that, I will make that change.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the updates in 6913871. Just one comment on the updated text, and specifically this sentence:

    • A {@code JarInputStream} may be used to verify the signatures of a
    • assuming the following requirements are met:

The signatures are verified by default so I think you can reduce this down to:

A {@code JarInputStream} verifies the signatures of entries in a Signed JAR file when:

We could say that a program could opt-out of verification by using the 2-arg constructor but that is probably too much to try to fit in here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the updates in 6913871. Just one comment on the updated text, and specifically this sentence:

    • A {@code JarInputStream} may be used to verify the signatures of a
    • assuming the following requirements are met:

The signatures are verified by default so I think you can reduce this down to:

A {@code JarInputStream} verifies the signatures of entries in a Signed JAR file when:

We could say that a program could opt-out of verification by using the 2-arg constructor but that is probably too much to try to fit in here.

Updated per your suggestion above.

* verify the signatures of a signed {@code JarInputStream} assuming the
* following requirements are met:
* <ul>
* <li>
* The {@linkplain #getManifest()} returns a {@code Manifest} for the JAR
* The {@link #getManifest()} returns a {@code Manifest} for the JAR
* file
* </li>
* <li>
* All signature-related entries must immediately follow the {@code Manifest}
* All signature-related entries immediately follow the {@code Manifest}
* </li>
* </ul>
* Once the {@code JarEntry} has been completely verified, which is done by
* reading until the end of the entry's input stream,
* {@linkplain JarEntry#getCertificates()} may be called to obtain the certificates
* for this entry and {@linkplain JarEntry#getCodeSigners()} may be called to obtain
* {@link JarEntry#getCertificates()} may be called to obtain the certificates
* for this entry and {@link JarEntry#getCodeSigners()} may be called to obtain
* the signers.
* <p>
* <b>Note:</b>If a {@code JarEntry} is modified after the Jar file is signed,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add a space before If. Capitalize Jar.

* a {@linkplain SecurityException} will be thrown when an attempt is made to
* read the entry.
* a {@link SecurityException} will be thrown when the entry is read.
* </p>
*
* @author David Connelly