Namespacemap reconstitution #490

Merged (20 commits) on Nov 30, 2023

@sbarnum (Collaborator) commented Aug 31, 2023

This is a PR for the proposed namespace map solution, addressing all explicitly identified issues with the previous solution (layering of namespace maps, conflicts between peer namespace maps, etc.).
It also provides more contextual detail on the intended semantics and use of the namespace map construct in the NamespaceMap.md class specification.

@davaya (Contributor) commented Sep 4, 2023

X-Collection.md says only:

> An X collection Element is a collection of Elements that does not contain any other X Collection Elements.
> In this way it can be thought of as an outer shell collection of SPDX content without self-recursion that can be used as a content aggregation target for serialization.

There are no use cases or requirements for why, where, or when it is used or what it accomplishes, given the requirement in NamespaceMap.md that:

> the namespace map set of prefixes and namespaces MUST be implemented in a given serialization form.

Without a justification, there is no reason to define an X-Collection element. Or it could be defined in the logical model with the requirement that it never be serialized. The amorphous "X" concept represents what we have been discussing, but not what we have yet to discuss: externalMap, which is a critical consideration.
The three options for the X-collection (or X-file) element semantics are:

  1. applies to a specific collection/file/payload of elements in a specific data format (has a verifiedBy property)
  2. applies to a specific logical collection of elements regardless of data format (does not have a verifiedBy property)
  3. can be applied to future collections of elements (X is a subclass of Element, not ElementCollection, and has no element property)

One of those three must be chosen before an X element can be defined.

Currently externalMap is a property of ElementCollection, which means that an Sbom element applies only to a specific payload in a specific format. The Consumer/Producer who reads a payload includes the hash of that payload in the list of elements in the newly-produced BOM. If the Consumer/Producer had read a different payload containing the identical elements he would need to create a different Sbom element.

So if ElementCollection is payload-specific, then its subclass X-Collection is also payload-specific (option 1). At a minimum we want to make ElementCollection serialization-agnostic by removing externalMap from it. But if we include externalMap in X-collection then X-collection is still serialization-specific (option 1).

An instance of the X-collection element cannot be created until all of its externalMap property values are known.

Those values are not known until the Consumer/Producer reads a specific payload, which is why X-collection instances cannot exist in the logical model until the serialized source of an element has been chosen. As Max says, those instances can be created and inserted into the graph by a Consumer/Producer after reading a payload. And as I say, those instances could also be created and inserted into the graph by the producer, but not until the producer has serialized the payload and knows its signature/hash.


This boils down to my original objection from 9 months ago: the logical model should be serialization-agnostic. A logical SBOM element can contain two File elements, for a total of three elements. Serialization independence means its serialized payload would contain 3 elements (SBOM, File1, File2), not be forced to serialize 4 elements (X-collection plus the other three). Consumers of that payload have the option of creating the X-collection element, but don't have to unless they produce other payloads that depend on (reference) the first payload. In particular, the Consumer could copy those three element values into his new payload instead of referencing them, in which case no X-collection element is ever needed.

@goneall (Member) left a comment

Thanks @sbarnum for writing this up - A few comments to consider for the tech call.

A serialization MAY choose to use prefixes and namespaces other than the namespace map content.
A serialization MAY choose to use no prefixes at all and rather use the more verbose full ElementID IRIs.

If utilized the namespace map set of prefixes and namespaces MUST be implemented in a given serialization

@goneall (Member):

It took me a few passes to parse this statement. At first it seemed to contradict the "MAY choose to use" above, but now I understand that it is saying we must not replace the native serialization mechanism.

Suggest restructuring the sentence to start with the serialization form, something like "The prefix/namespace mapping of the serialization format must always be used to convey the namespace mapping for that serialization ..."

@sbarnum (Collaborator, Author):

I am not sure your characterization is aligned with the intent here.
Lines 22-26 all go together.
They are saying that if you are going to actually use the namespace map content as prefixes in a given serialization, then how those prefixes are represented varies from form to form (e.g., json-ld, xml, etc.), and how they are represented in any given form is specified in the binding rules (which bind the model to that serialization form). In other words, if you are going to implement them in json-ld, you must use the form for prefixes specified in the json-ld binding rules specification. For serialization forms that natively support prefixes this is fairly obvious. For serialization forms that don't, where we have to define custom representations, this clause is much more important.

I believe the whole first sentence (lines 22-24) states this point clearly as is.
Does the above explanation clarify it, or do you still find the sentence unclear?
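
To illustrate the distinction being drawn here, a minimal Python sketch of what "using the namespace map content as prefixes" amounts to: shortening full ElementID IRIs to `prefix:local` form and expanding them back. It assumes the namespace map is a plain dict from prefix to namespace IRI and that compact IDs use a single colon separator; `expand_id` and `compact_id` are hypothetical helpers, not part of any SPDX tooling. How the prefixes are actually written into a payload (for example, a JSON-LD `@context`) is left to each serialization's binding rules.

```python
def expand_id(compact: str, namespace_map: dict[str, str]) -> str:
    """Expand 'prefix:local' to a full IRI using the namespace map.

    IDs whose prefix is not in the map (including full IRIs such as
    'https://...') are returned unchanged.
    """
    prefix, sep, local = compact.partition(":")
    if sep and prefix in namespace_map:
        return namespace_map[prefix] + local
    return compact


def compact_id(iri: str, namespace_map: dict[str, str]) -> str:
    """Shorten a full IRI to 'prefix:local' using the longest matching namespace."""
    best = None
    for prefix, ns in namespace_map.items():
        if iri.startswith(ns) and (best is None or len(ns) > len(namespace_map[best])):
            best = prefix
    if best is None:
        return iri  # no matching namespace: keep the verbose full IRI
    return best + ":" + iri[len(namespace_map[best]):]


# Hypothetical example namespace, for illustration only.
ns_map = {"acme": "https://acme.example/spdx/"}
print(expand_id("acme:pkg-1", ns_map))
print(compact_id("https://acme.example/spdx/pkg-1", ns_map))
```

An ID whose prefix is not in the map passes through unchanged, which matches the option of using no prefixes at all. Real JSON-LD compaction applies more rules than this longest-namespace match.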

The namespace map itself is also conveyed as native SPDX content to support clarity, transparency and
consistency independent of any particular serialization form.

A given serialization payload (whether file or streaming) MUST NOT contain multiple namespace maps with conflicting mappings.

@goneall (Member):

I take it the above sentence refers to the "native" namespace mapping. If so, this may not be a required statement since the serialization format standards already have this requirement.

@sbarnum (Collaborator, Author):

Serialization formats that natively support prefixes usually have this requirement, though some do not enforce it and instead simply use the last defined prefix, ignoring any earlier conflicting definitions.
Serialization formats that do not natively support prefixes, for which we will have to define custom prefix representations, will have no such rules.

In either case, I would argue that making this statement explicit in the SPDX spec is highly useful.
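
A sketch, under stated assumptions, of the check this explicit MUST NOT enables: a validator that reports conflicts rather than silently letting the last definition win. It assumes each namespace map is a list of (prefix, namespace) pairs and treats a conflict as one prefix bound to two or more different namespaces; `find_conflicts` is a hypothetical name, not an SPDX API.

```python
from collections import defaultdict


def find_conflicts(namespace_maps):
    """Return {prefix: set_of_namespaces} for every prefix bound to more
    than one namespace across the given namespace maps.

    Each map is an iterable of (prefix, namespace) pairs. Repeating an
    identical binding is not treated as a conflict.
    """
    bindings = defaultdict(set)
    for namespace_map in namespace_maps:
        for prefix, namespace in namespace_map:
            bindings[prefix].add(namespace)
    return {p: ns for p, ns in bindings.items() if len(ns) > 1}


# Two hypothetical maps in one payload that disagree about "acme".
a = [("acme", "https://acme.example/a/")]
b = [("acme", "https://acme.example/b/"), ("ex", "https://example.org/")]
print(find_conflicts([a, b]))
```

Whether a single namespace bound to two different prefixes should also count as a conflict is a separate policy question this sketch does not decide.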

model/Core/Classes/NamespaceMap.md: two outdated review threads (resolved)
This would involve the conflicted mappings issue briefly characterized above this list of use cases.
6) An SPDX content consumer wishing to maintain consistent prefix use while receiving serialized content that does not include a namespace map but does utilize prefixes, and at some future point reserializing that content.
The consumer can simply "wrap" the received content in a collection with a namespace map and specify the prefix to namespace mappings that were actually implemented in the received content.
7) It should be possible to derive and maintain namespace mapping provenance for content.

@goneall (Member):

Why would this be important?

@sbarnum (Collaborator, Author):

As content is received, reserialized, and potentially "wrapped" by consumer/producers with new namespace maps, it becomes harder to tell who asserted which namespace maps (prefixes) and in what context.
Understanding who asserted which namespace maps (prefixes), and in what context, can help a consumer judge whether asserted prefixes come from the original producer and which ones they should use.
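
For use cases 6 and 7, a rough Python sketch of a consumer "wrapping" received content: it records only the prefix mappings the received content actually used, plus who asserted the wrapper, which is the kind of provenance described above. The dict keys (`namespaceMap`, `element`, `assertedBy`) and the `wrap_received` helper are hypothetical placeholders, not SPDX property names or an SPDX API.

```python
def wrap_received(element_ids, declared_prefixes, consumer_id):
    """Wrap received element IDs in a new collection-like dict whose
    namespace map records only the prefixes the received content used.

    element_ids: IDs as they appeared in the received payload.
    declared_prefixes: prefix -> namespace bindings declared by that payload.
    consumer_id: ID of the party asserting this wrapper (for provenance).
    """
    used = {}
    for eid in element_ids:
        prefix, sep, _ = eid.partition(":")
        if sep and prefix in declared_prefixes:
            used[prefix] = declared_prefixes[prefix]
    return {
        "namespaceMap": [
            {"prefix": p, "namespace": ns} for p, ns in sorted(used.items())
        ],
        "element": list(element_ids),
        "assertedBy": consumer_id,
    }


wrapper = wrap_received(
    ["acme:pkg-1", "acme:file-2", "https://other.example/x"],
    {"acme": "https://acme.example/spdx/", "unused": "https://unused.example/"},
    "ex:consumer",
)
print(wrapper["namespaceMap"])
```

Declared but unused prefixes (here `unused`) are left out, so the wrapper reflects the mappings that were actually implemented in the received content.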

@goneall (Member) left a comment

A couple more comments

@@ -19,10 +19,10 @@ An SpdxCollection is a collection of Elements, not necessarily with unifying con
## Properties

- element
- type: Element
- type: Element and NOT (X-Collection)

@goneall (Member):

Note: the spec parser does not currently support this expression

@sbarnum (Collaborator, Author):

That does not surprise me.
It will need to, though.
This sort of issue is why I strongly prefer having the formal ontology specification (RDFS/OWL/SHACL) be THE ground-truth specification rather than a prose form that must be converted.
Specifying this sort of range is simple, native, and inherent in RDFS/OWL/SHACL.
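
As a rough illustration of what the range "Element and NOT (X-Collection)" means for the `element` and `rootElement` properties, here is a Python sketch with minimal stand-in classes. The class names and the `range_violations` helper are hypothetical; in SHACL the same restriction would typically be expressed with `sh:not` around an `sh:class` constraint.

```python
from dataclasses import dataclass, field


@dataclass
class Element:
    spdx_id: str


@dataclass
class ElementCollection(Element):
    element: list = field(default_factory=list)
    root_element: list = field(default_factory=list)


@dataclass
class XCollection(ElementCollection):
    """Hypothetical stand-in for the X-Collection class discussed in this PR."""


def range_violations(collection: ElementCollection) -> list:
    """Return IDs of element/rootElement values that violate the range
    'Element and NOT (X-Collection)'."""
    bad = []
    for value in collection.element + collection.root_element:
        if not isinstance(value, Element) or isinstance(value, XCollection):
            bad.append(getattr(value, "spdx_id", repr(value)))
    return bad


file1 = Element("ex:file1")
nested = XCollection("ex:inner")
outer = XCollection("ex:outer", element=[file1, nested], root_element=[file1])
print(range_violations(outer))
```

The check only looks one level down: an X-Collection may contain ordinary collections, but never another X-Collection, which is the "outer shell without self-recursion" property described for X-Collection.md.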

- minCount: 1
- rootElement
- type: Element
- type: Element and NOT (X-Collection)

@goneall (Member):

Same as above

@sbarnum (Collaborator, Author):

Same as above for me too. :-)

@sbarnum (Collaborator, Author) commented Sep 6, 2023

> X-Collection.md says only:
>
> > An X collection Element is a collection of Elements that does not contain any other X Collection Elements.
> > In this way it can be thought of as an outer shell collection of SPDX content without self-recursion that can be used as a content aggregation target for serialization.
>
> There are no use cases or requirements for why, where, or when it is used or what it accomplishes, given the requirement in NamespaceMap.md that:

We do not typically put "use cases or requirements for why, where, or when it is used or what it accomplishes" in the class specifications.
The class specification for NamespaceMap (NamespaceMap.md) atypically provides this greater level of detail for the namespace map, rather than attempting to put it here, at least one level removed from its direct application.
That being said, I can see that X-Collection.md could use a brief statement that it is intended to convey a namespace map for a given set of SPDX content and that it MAY (though not MUST) be used as the single outermost enclosing SPDX element in a specific instance of serialization, for simplicity and consistency.

> > the namespace map set of prefixes and namespaces MUST be implemented in a given serialization form.

The full statement from NamespaceMap.md that this is excerpted from is:
"If utilized the namespace map set of prefixes and namespaces MUST be implemented in a given serialization form (e.g., json-ld or xml) as specified in the binding rules specification for that serialization and utilizing the appropriate inherent or custom specified mechanism for that serialization. The namespace map itself is also conveyed as native SPDX content to support clarity, transparency and consistency independent of any particular serialization form."

This is saying that if you are going to actually use the namespace map content as prefixes in a given serialization, then how those prefixes are represented varies from form to form (e.g., json-ld, xml, etc.), and how they are represented in any given form is specified in the binding rules (which bind the model to that serialization form). In other words, if you are going to implement them in json-ld, you must use the form for prefixes specified in the json-ld binding rules specification. For serialization forms that natively support prefixes this is fairly obvious. For serialization forms that don't, where we have to define custom representations, this clause is much more important.

> Without a justification, there is no reason to define an X-Collection element.

As stated above, I can see value in adding a brief statement to X-Collection.md to clarify its intent, though I do not believe class specifications must or should have to "justify" their existence.

> Or it could be defined in the logical model with the requirement that it never be serialized.

No. The entire purpose of namespace maps is that they are serialized as part of the content.

> The amorphous "X" concept represents what we have been discussing, but not what we have yet to discuss: externalMap, which is a critical consideration. The three options for the X-collection (or X-file) element semantics are:

I would assert that externalMap is a separate and largely unrelated topic from what we are discussing here, and I am unaware of any currently identified issues with its current implementation.

I would caution against using names like X-File in relation to X-Collection. X-Collection is NOT intended to deal with a specific instance of serialization.

>   1. applies to a specific collection/file/payload of elements in a specific data format (has a verifiedBy property)
>   2. applies to a specific logical collection of elements regardless of data format (does not have a verifiedBy property)
>   3. can be applied to future collections of elements (X is a subclass of Element, not ElementCollection, and has no element property)

> One of those three must be chosen before an X element can be defined.

The explicit intent of the namespace map is option 2.
It is explicitly not tied to any specific instance of serialization.

> Currently externalMap is a property of ElementCollection, which means that an Sbom element applies only to a specific payload in a specific format. The Consumer/Producer who reads a payload includes the hash of that payload in the list of elements in the newly-produced BOM. If the Consumer/Producer had read a different payload containing the identical elements he would need to create a different Sbom element.

I am confused here. How did externalMap get into this conversation? It is unrelated.

> Currently externalMap is a property of ElementCollection, which means that an Sbom element applies only to a specific payload in a specific format.

Having externalMap on ElementCollection implies no such thing for Sbom elements. Sbom elements, and every other part of the SPDX model except the File object, have nothing to do with any specific instance (payload) of serialization.

> So if ElementCollection is payload-specific, then its subclass X-Collection is also payload-specific (option 1). At a minimum we want to make ElementCollection serialization-agnostic by removing externalMap from it. But if we include externalMap in X-collection then X-collection is still serialization-specific (option 1).

ElementCollection is definitely not payload specific and neither is X-Collection.

> An instance of the X-collection element cannot be created until all of its externalMap property values are known.

Again, I am confused about how externalMap got introduced here and how its meaning and intent got so confused.

> Those values are not known until the Consumer/Producer reads a specific payload, which is why X-collection instances cannot exist in the logical model until the serialized source of an element has been chosen. As Max says, those instances can be created and inserted into the graph by a Consumer/Producer after reading a payload. And as I say, those instances could also be created and inserted into the graph by the producer, but not until the producer has serialized the payload and knows its signature/hash.

> This boils down to my original objection from 9 months ago: the logical model should be serialization-agnostic. A logical SBOM can contain two File elements, for a total of three. Serialization independence means its serialized payload would contain 3 elements (SBOM, File1, File2), not be forced to serialize 4 elements (X-collection plus the other three.) Consumers of that payload have the option of creating the X-collection element, but don't have to unless they produce other payloads that depend on (reference) the first payload. In particular, the Consumer could copy those three element values into his new payload instead of referencing them, in which case no X-collection element is ever needed.

I agree that the logical model should be serialization-agnostic. It should also be agnostic of any serialization instance.

@goneall (Member) commented Sep 20, 2023

In the namespace meeting on 18 Sept 2023, we decided to move forward with the proposal documented in pull request #491

PR #491 is now merged into this pull request so we can review a single PR before merging into the base.

Note that the branch used for PR #491 was not deleted, so we can refer back to the changes if needed. I also did a merge commit so it would be easy to reconstruct the state of this PR prior to the merge.

Please review and comment on the wording for this proposal.

@goneall added the serialization label (Something about the representation of data in bytes) on Oct 12, 2023
@davaya (Contributor) commented Oct 17, 2023

PR #500 includes NamespaceMap in payload data, allowing it to be used independently of SerializedCollection. This does not prevent SerializedCollection from defining NamespaceMap independently of serialization if there are use cases for doing so.

@goneall (Member) commented Oct 27, 2023

@sbarnum - Do you want to update this PR with the decisions from the serialization team?

  • Move imports from ElementCollection to X-Collection

@goneall (Member) left a comment

One minor typo

model/Core/Classes/X-Collection.md: outdated review thread (resolved)

@goneall (Member) commented Nov 2, 2023

@sbarnum - can you rename the X-Collection to SpdxDocument?

@goneall (Member) left a comment

LGTM - Thanks @sbarnum

@goneall (Member) commented Nov 3, 2023

Note that there is previous discussion in the PR merged into this one: #491

@goneall (Member) commented Nov 3, 2023

Fixes #467

@goneall (Member) commented Nov 3, 2023

Fixes #415

@maxhbr (Member) commented Nov 21, 2023

related: #557

@goneall (Member) commented Nov 28, 2023

@nishakm @maxhbr @zvr - I resolved the merge conflicts - ready for review.

@goneall merged commit 3357c71 into main on Nov 30, 2023 (1 check passed)
@goneall deleted the namespacemap_reconstitution branch on November 30, 2023 20:09
Labels: serialization (Something about the representation of data in bytes)

Issues linked for closing by this pull request:
- Proposal: Move data license from CreationInfo to SpdxDocument
- Clarify Bom & SpdxDocument in 3.0 Model

5 participants