Namespacemap reconstitution #490
X-Collection.md says only:
There are no use cases or requirements for why, where, or when it is used or what it accomplishes, given the requirement in NamespaceMap.md that:
Without a justification, there is no reason to define an X-Collection element. Or it could be defined in the logical model with the requirement that it never be serialized. The amorphous "X" concept represents what we have been discussing, but not what we have yet to discuss: externalMap, which is a critical consideration.
One of those three must be chosen before an X element can be defined. Currently externalMap is a property of ElementCollection, which means that an Sbom element applies only to a specific payload in a specific format. The Consumer/Producer who reads a payload includes the hash of that payload in the list of elements in the newly-produced BOM. If the Consumer/Producer had read a different payload containing the identical elements, he would need to create a different Sbom element. So if ElementCollection is payload-specific, then its subclass X-Collection is also payload-specific (option 1).

At a minimum we want to make ElementCollection serialization-agnostic by removing externalMap from it. But if we include externalMap in X-Collection, then X-Collection is still serialization-specific (option 1). An instance of the X-Collection element cannot be created until all of its externalMap property values are known. Those values are not known until the Consumer/Producer reads a specific payload, which is why X-Collection instances cannot exist in the logical model until the serialized source of an element has been chosen.

As Max says, those instances can be created and inserted into the graph by a Consumer/Producer after reading a payload. And as I say, those instances could also be created and inserted into the graph by the producer, but not until the producer has serialized the payload and knows its signature/hash.

This boils down to my original objection from 9 months ago: the logical model should be serialization-agnostic. A logical SBOM can contain two File elements, for a total of three elements. Serialization independence means its serialized payload would contain 3 elements (SBOM, File1, File2), and would not be forced to contain 4 elements (X-Collection plus the other three). Consumers of that payload have the option of creating the X-Collection element, but don't have to unless they produce other payloads that depend on (reference) the first payload.
In particular, the Consumer could copy those three element values into his new payload instead of referencing them, in which case no X-Collection element is ever needed.
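The element-count argument can be made concrete with a small sketch. Everything here is hypothetical and only for illustration: the `Element` dataclass, the `serialize` helper, and the `urn:example:` identifiers are not part of the SPDX model or of any SPDX library. It only shows the difference between a payload that carries the three logical elements and one that is forced to carry an extra collection element.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Element:
    """Hypothetical stand-in for an SPDX element (not an SPDX library type)."""
    spdx_id: str
    kind: str


def serialize(elements, wrap_in_collection=False):
    """Return the elements that would appear in a payload.

    If wrap_in_collection is True, one extra (hypothetical) X-Collection
    element is emitted in addition to the logical elements.
    """
    payload = list(elements)
    if wrap_in_collection:
        payload.insert(0, Element("urn:example:x-collection", "X-Collection"))
    return payload


# The logical SBOM from the comment: one Sbom plus two File elements.
logical = [
    Element("urn:example:sbom", "Sbom"),
    Element("urn:example:file1", "File"),
    Element("urn:example:file2", "File"),
]

agnostic = serialize(logical)
forced = serialize(logical, wrap_in_collection=True)
print(len(agnostic), len(forced))  # 3 4
```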
Thanks @sbarnum for writing this up - A few comments to consider for the tech call.
model/Core/Classes/NamespaceMap.md
A serialization MAY choose to use prefixes and namespaces other than the namespace map content.
A serialization MAY choose to use no prefixes at all and rather use the more verbose full ElementID IRIs.

If utilized the namespace map set of prefixes and namespaces MUST be implemented in a given serialization
It took me a few passes to parse this statement. At first, it seemed to contradict the "MAY" choose to use, but now I understand that this is describing that we must not replace the native serialization.
Suggest restructuring the sentence to start with the serialization form - something like "The prefix / namespace mapping of the serialization format must always be used to convey the namespace mapping for that serialization ..."
I am not sure if your characterization is aligned to intent here.
Lines 22-26 all go together.
They are saying that if you are going to actually use the namespace map content as prefixes in a given serialization, then how those prefixes are represented in a given serialization form (e.g., json-ld, xml, etc.) varies from form to form, and how they are represented in any given form is specified in the binding rules (that bind the model to a given serialization form) for that serialization form. In other words, if you are going to implement them in json-ld you must use the form for prefixes specified in the json-ld binding rules specification. For serialization forms that natively support prefixes it is a little more obvious. For serialization forms that don't, and for which we have to define custom representations, this is a much more important clause.
I believe the whole first sentence (lines 22-24) clearly states this point as is.
Does the above explanation clarify or do you still find the sentence unclear?
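For readers following along, here is a minimal Python sketch of what "using the namespace map content as prefixes" amounts to at the logical level: shortening full ElementID IRIs to a prefixed form and expanding them back. The `prefix:local` notation, the function names, and the example namespace are assumptions for illustration only. They are not the json-ld (or any other) binding rules, which are what actually define the representation for a given serialization form.

```python
def compact(iri, namespace_map):
    """Shorten an IRI to 'prefix:local' using the longest matching namespace.

    Returns the IRI unchanged when no namespace in the map matches.
    """
    best_prefix, best_namespace = None, ""
    for prefix, namespace in namespace_map.items():
        if iri.startswith(namespace) and len(namespace) > len(best_namespace):
            best_prefix, best_namespace = prefix, namespace
    if best_prefix is None:
        return iri
    return f"{best_prefix}:{iri[len(best_namespace):]}"


def expand(compact_id, namespace_map):
    """Expand 'prefix:local' back to a full IRI; unknown prefixes pass through."""
    prefix, sep, local = compact_id.partition(":")
    if sep and prefix in namespace_map:
        return namespace_map[prefix] + local
    return compact_id


# Hypothetical namespace map: prefix -> namespace IRI.
namespace_map = {"ex": "https://example.com/spdx/"}

short = compact("https://example.com/spdx/file1", namespace_map)
print(short)                         # ex:file1
print(expand(short, namespace_map))  # https://example.com/spdx/file1
```

A serialization that chooses to use no prefixes at all would simply skip `compact` and write the full IRIs.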
model/Core/Classes/NamespaceMap.md
The namespace map itself is also conveyed as native SPDX content to support clarity, transparency and
consistency independent of any particular serialization form.

A given serialization payload (whether file or streaming) MUST NOT contain multiple namespace maps with conflicting mappings.
I take it the above sentence refers to the "native" namespace mapping. If so, this may not be a required statement since the serialization format standards already have this requirement.
Serialization formats that natively support prefixes usually have this requirement, though some do not enforce it and instead simply use the last defined prefix and ignore any earlier conflicting prefixes.
Serialization formats that do not natively support prefixes, and for which we will have to define custom prefix representations, will have no such rules.
In either case, I would propose that stating this explicitly in the SPDX spec is highly useful.
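A rough sketch of the two behaviours discussed here, assuming a namespace map can be modelled as a plain `dict` from prefix to namespace IRI, and assuming "conflicting" means the same prefix bound to different namespaces (the spec text does not spell this out, so that reading is mine). `find_conflicts` is the strict check the MUST NOT clause would call for; `last_wins` mimics formats that silently keep the last definition. Both names are hypothetical.

```python
def find_conflicts(namespace_maps):
    """Return {prefix: set of namespaces} for prefixes bound to more than one namespace."""
    seen = {}
    conflicts = {}
    for ns_map in namespace_maps:
        for prefix, namespace in ns_map.items():
            first = seen.setdefault(prefix, namespace)
            if first != namespace:
                conflicts.setdefault(prefix, {first}).add(namespace)
    return conflicts


def last_wins(namespace_maps):
    """Merge maps the lenient way: a later definition overrides an earlier one."""
    merged = {}
    for ns_map in namespace_maps:
        merged.update(ns_map)
    return merged


# Two hypothetical namespace maps found in one payload.
maps = [
    {"a": "https://example.com/one/"},
    {"a": "https://example.com/two/", "b": "https://example.com/b/"},
]

print(find_conflicts(maps))   # prefix "a" is reported with both namespaces
print(last_wins(maps)["a"])   # https://example.com/two/
```

The lenient merge hides exactly the ambiguity that the strict check reports, which is the case for making the rule explicit.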
model/Core/Classes/NamespaceMap.md
This would involve the conflicted mappings issue briefly characterized above this list of use cases.
6) An SPDX content consumer wishing to maintain consistent prefix use while receiving serialized content that does not include a namespace map but does utilize prefixes, and at some future point reserializing that content.
The consumer can simply "wrap" the received content in a collection with a namespace map and specify the prefix to namespace mappings that were actually implemented in the received content.
7) It should be possible to derive and maintain namespace mapping provenance for content.
Why would this be important?
As content is received and reserialized and potentially "wrapped" by consumer/producers with new namespace maps, it becomes more complicated to tell who asserted which namespace maps (prefixes) and in what context.
Understanding who asserted which namespace maps (prefixes) and in what context can help a consumer trust that asserted prefixes are from the original producer and decide which ones they should use.
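As a purely illustrative sketch of what "namespace mapping provenance" could look like, the snippet below records, for each prefix mapping, who asserted it and in what context, and then picks out what the original producer asserted. The `NamespaceAssertion` record, its field names, and the party names are hypothetical; the proposal does not define any such structure.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NamespaceAssertion:
    """Hypothetical provenance record for one prefix mapping."""
    prefix: str
    namespace: str
    asserted_by: str  # who asserted the mapping
    context: str      # e.g. "original payload" or "consumer wrap"


def original_namespace(prefix, assertions, original_producer):
    """Return the namespace the original producer asserted for a prefix, or None."""
    for a in assertions:
        if a.prefix == prefix and a.asserted_by == original_producer:
            return a.namespace
    return None


# A producer's mapping, later re-wrapped by a consumer/producer with a different one.
assertions = [
    NamespaceAssertion("ex", "https://example.com/spdx/", "producer-A", "original payload"),
    NamespaceAssertion("ex", "https://mirror.example.org/spdx/", "consumer-B", "consumer wrap"),
]

print(original_namespace("ex", assertions, "producer-A"))  # https://example.com/spdx/
```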
Couple more comments
@@ -19,10 +19,10 @@ An SpdxCollection is a collection of Elements, not necessarily with unifying con
## Properties

- element
- type: Element
- type: Element and NOT (X-Collection)
Note: the spec parser does not currently support this expression
That does not surprise me.
It will need to though.
This sort of issue is why I strongly prefer having the formal ontology specification (RDFS/OWL/SHACL) be THE ground truth specification rather than a prose form that must be converted.
Specifying this sort of range is simple, native and inherent in RDFS/OWL/SHACL.
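Until the spec parser understands the expression, the intended range `Element and NOT (X-Collection)` can at least be pinned down with a small check. This is a sketch in plain Python, not SHACL and not the spec parser's code: the classes below are hypothetical stand-ins that mirror only the hierarchy mentioned in this thread (X-Collection as a subclass of ElementCollection, itself an Element).

```python
class Element:
    """Hypothetical stand-in for the SPDX Element class."""


class File(Element):
    pass


class ElementCollection(Element):
    pass


class XCollection(ElementCollection):
    """Stand-in for X-Collection, a subclass of ElementCollection."""


def in_range(value):
    """True if value satisfies 'Element and NOT (X-Collection)'."""
    return isinstance(value, Element) and not isinstance(value, XCollection)


print(in_range(File()))               # True
print(in_range(ElementCollection()))  # True
print(in_range(XCollection()))        # False
```

Note that other collections remain valid values of `element` and `rootElement`; only X-Collection instances are scoped out.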
- minCount: 1
- rootElement
- type: Element
- type: Element and NOT (X-Collection)
Same as above
Same as above for me too. :-)
We do not typically put " use cases or requirements for why, where, or when it is used or what it accomplishes" in the class specifications.
The full statement from NamespaceMap.md that this is excerpted from is:
This is saying that if you are going to actually use the namespace map content as prefixes in a given serialization, then how those prefixes are represented in a given serialization form (e.g., json-ld, xml, etc.) varies from form to form, and how they are represented in any given form is specified in the binding rules (that bind the model to a given serialization form) for that serialization form. In other words, if you are going to implement them in json-ld you must use the form for prefixes specified in the json-ld binding rules specification. For serialization forms that natively support prefixes it is a little more obvious. For serialization forms that don't, and for which we have to define custom representations, this is a much more important clause.
As stated above, I can see value in adding a brief statement to X-Collection.md to clarify its intent though I do not believe class specification must or should have to "justify" their existence.
No. The entire purpose of namespace maps is that they are serialized as part of the content.
I would assert that externalMap is a completely separate and mostly unrelated topic from what we are discussing here, and I am unaware of any currently identified issues with its current implementation. I would caution against using any names like X-File in relation to X-Collection. X-Collection is NOT intended to deal with a specific instance of serialization.
The very explicit intent of namespace map is 2.
I am confused here. How did externalMap get into this conversation? It is unrelated.
ExternalMap on ElementCollection makes no such implication on Sbom elements. Sbom elements, or any part of the SPDX model other than the File object, have nothing to do with any specific instance (payload) of serialization.
ElementCollection is definitely not payload specific and neither is X-Collection.
Again, I am confused about how externalMap got introduced here or how its meaning and intent got so confused.
I agree that the logical model should be serialization-agnostic. It should also be agnostic of any serialization instance.
In the namespace meeting on 18 Sept 2023, we decided to move forward with the proposal documented in pull request #491. PR #491 is now merged into this pull request so we can review a single PR before merging into the base. Note that the branch used for PR #491 was not deleted, so we can refer back to the changes if needed. I also did a merge commit so it would be easy to reconstruct the state of this PR prior to the merge. Please review and comment on the wording for this proposal.
PR #500 includes NamespaceMap in payload data, allowing it to be used independently of SerializedCollection. This does not prevent SerializedCollection from defining NamespaceMap independently of serialization if there are use cases for doing so.
@sbarnum - Do you want to update this PR with the decisions from the serialization team?
One minor typo
@sbarnum - can you rename the X-Collection to SpdxDocument?
LGTM - Thanks @sbarnum
Note that this is a previous discussion in the PR merged into this: #491
Fixes #467
Fixes #415
related: #557
Reset the ranges of element and rootElement to scope out X-Collection
Signed-off-by: Gary O'Neall <[email protected]>
Per review comments
The change from "serialization formats" to "serialization" should cover both multiple instances of serialization in a single format and multiple instances in different formats.
Co-authored-by: Gary O'Neall <[email protected]>
Signed-off-by: Gary O'Neall <[email protected]>
7d7de60 to 3c5eaca
This is a PR for the proposed namespace map solution addressing all explicitly identified issues (layering of namespace maps, conflicts between peer namespace maps, etc.) with the previous solution.
It also provides more contextual detail around the intended semantics and use for the namespace map construct in the NamespaceMap.md class specification.