Currently, the <alt> tag is used in two fundamentally different capacities.
As a field to mark customizable text
Many of the license texts include customizable details related to the project being licensed, like project name, copyright statements, maintainer or rightsholder addresses, etc.
These are frequently wrapped in an alt tag that will match anything (match=".*"), although a few have more specific matching patterns. A variable name is always specified (name="something") to capture the matched string.
For example, in BSD-2-Clause.xml, the text specifying "THE COPYRIGHT HOLDER(S) (AND|OR) CONTRIBUTORS" is made customizable in two places, captured as copyrightHolderAsIs and copyrightHolderLiability:
<alt match=".+" name="copyrightHolderLiability">THE COPYRIGHT HOLDER OR CONTRIBUTORS</alt> BE LIABLE FOR
In Python-2.0.1.xml, the specific Python version for which the license applies is similarly captured in a number of places, using a more specific regular expression:
Python <alt match="(([0-9]+)\.([0-9]+)\.([0-9]+))?" name="version">2.0.1</alt>, Licensee agrees to be bound by the
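To make the difference between these patterns concrete, here is a minimal Python sketch that tests candidate strings against the two `match` values quoted above. It assumes whole-string matching via `re.fullmatch`; the actual SPDX matching tools may anchor or normalize text differently, and `alt_accepts` is a hypothetical helper, not part of any SPDX library.

```python
import re

# Patterns copied from the match="..." attributes quoted above.
FREE_FORM = r".+"
VERSION = r"(([0-9]+)\.([0-9]+)\.([0-9]+))?"


def alt_accepts(pattern: str, candidate: str) -> bool:
    """Return True if the whole candidate string satisfies the pattern.

    Assumption: the pattern must cover the entire replaced text.
    """
    return re.fullmatch(pattern, candidate) is not None


# The catch-all pattern accepts any non-empty replacement.
print(alt_accepts(FREE_FORM, "ACME CORP AND ITS AFFILIATES"))
# The version pattern accepts a three-part version number...
print(alt_accepts(VERSION, "3.11.4"))
# ...or nothing at all, because the whole group is optional...
print(alt_accepts(VERSION, ""))
# ...but not a two-part version.
print(alt_accepts(VERSION, "2.0"))
```

Under that assumption, the version field is customizable only within a fixed shape, which already sits somewhere between "anything goes" and a closed list of spellings.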
To support minor variations in license texts
Other uses of alt tags aren't free-form/customizable at all, but merely prevent slight variations in license text from causing a failure to match the license. Going back to BSD-2-Clause.xml, the word "EXPRESS" is surrounded by an alt tag not because it's customizable, but simply because some versions of the text contain "EXPRESSED" instead of "EXPRESS":
<alt match="EXPRESS(ED)?" name="express">EXPRESS</alt> OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY
The same is true in xpp.xml, where "University" may be misspelled as "Univeristy", and where a certain conjunction may be either "and" or "or" (but nothing else):
The name "Indiana <alt match="Univeristy|University" name="uni1">Univeristy</alt>" <alt match="and|or" name="and">and</alt> "Indiana <alt match="Univeristy|University" name="uni2">Univeristy</alt> Extreme! Lab"
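As a rough illustration of how the two categories could be told apart mechanically, the sketch below labels each `<alt>` by its `match` attribute. It is only a heuristic under stated assumptions: the fragment is a simplified, un-namespaced excerpt assembled from the snippets quoted above (the real files declare an XML namespace), and `classify_alts` and the `CATCH_ALL` set are hypothetical names introduced here.

```python
import xml.etree.ElementTree as ET

# Simplified, un-namespaced fragment built from the snippets quoted above.
FRAGMENT = """<text>
<alt match=".+" name="copyrightHolderLiability">THE COPYRIGHT HOLDER OR CONTRIBUTORS</alt>
<alt match="EXPRESS(ED)?" name="express">EXPRESS</alt>
<alt match="and|or" name="and">and</alt>
</text>"""

# Assumption: only these catch-all patterns count as "free-form".
CATCH_ALL = {".*", ".+"}


def classify_alts(xml_text: str) -> dict:
    """Map each alt's name to 'free-form' or 'variation'."""
    root = ET.fromstring(xml_text)
    result = {}
    for alt in root.iter("alt"):
        is_free_form = alt.get("match") in CATCH_ALL
        result[alt.get("name")] = "free-form" if is_free_form else "variation"
    return result


print(classify_alts(FRAGMENT))
```

A pattern like the Python version regex would land in the "variation" bucket under this heuristic even though it is arguably customizable, so an explicit marker in the XML would likely be more reliable than inferring intent from the regex.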
Other tags handled as replaceable
This also applies to e.g. <copyrightText> and <bullet>, which are presented identically as red replaceable text, despite having different purposes.
It makes sense for a project to customize the copyright text of its license as needed, so <copyrightText> can fairly be treated like the first category of <alt> tags above.
But the bullets used in the license are more akin to the second type of <alt> tag above, in that there is a fairly limited set of possibilities for what might be found in their place. The 1. before the first clause in BSD-2-Clause.xml might be replaced with 1), or 1 —, or even nothing if the list is numbered automatically, but it probably shouldn't be replaced with 45) or apple).
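The bullet expectation described above could be expressed roughly as follows. This is a hypothetical sketch, not how the SPDX tooling treats `<bullet>` today: `bullet_ok` is an invented helper, and the set of accepted punctuation is an assumption taken from the examples in this paragraph.

```python
import re


def bullet_ok(expected: int, candidate: str) -> bool:
    """Accept the expected number with common punctuation, or no bullet."""
    stripped = candidate.strip()
    if stripped == "":
        return True  # list numbered automatically, no literal bullet
    # Optional "(", the expected number, optional space, then one of . ) em-dash -
    pattern = rf"\(?{expected}\s*[.)\u2014-]?"
    return re.fullmatch(pattern, stripped) is not None


for bullet in ["1.", "1)", "1 \u2014", "", "45)", "apple)"]:
    print(repr(bullet), bullet_ok(1, bullet))
```

The first four candidates are accepted and the last two are rejected, which mirrors the distinction drawn in the paragraph above.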
Presentation of <alt>
Software doesn't care about the purpose of a given <alt> tag, and for the purposes of matching, the information that it's replaceable is sufficient. But to humans, the implications of the two types of "replaceable" text are unlikely to be the same. And because these two very different situations are handled the same way in the code/data, they're also presented the same way on the website. All replaceable text is presented in red, so text that only allows for minor variations looks customizable or free-form when in fact it is not.
In the display of BSD-2-Clause.xml, for example, it seems potentially confusing for "EXPRESS" to be shown in the same red text as "THE COPYRIGHT HOLDERS AND CONTRIBUTORS", at least without also providing some explanation for why and how someone would "customize" the word EXPRESS.
Since optional and replaceable texts are indicated to humans by coloring the text blue or red, respectively, it's presumably of some value to highlight those locations. But if "there could be anything here" freely-customizable areas of the text, and other areas where only a very limited set of options will pass, are all presented the same, it seems as though that value could be somewhat reduced?
Hi @ferdnyc, thanks for your detailed thoughts here!
I agree, and this is something that has been sitting in the back of my mind for some time now. The specific variations encoded in the regular expressions for <alt> tags are important, and I understand that some downstream projects (such as Fedora) are handling these.
But I suspect that most people aren't seeing the regexes from this repo, or from license-list-data, and are instead just viewing the website versions at https://spdx.org/licenses. And as you noted, nothing in that HTML view clearly indicates whether the red text for a given <alt> tag (or <bullet>, etc.) is "replace with anything" or "replace with these specific characters."
I haven't had a chance to dig into this, but I'm certainly open to us coming up with a cleaner solution. Here are a few options; feel free to share others:
1. When a red text field is hovered over, it could pop up a tooltip showing the different regex values
2. When a red text field is clicked on, it could open a view (maybe something other than a tooltip?) showing the regex
3. And/or, each HTML page could include a link directly to the corresponding XML file in license-list-data
(Option 3 is probably a good idea, regardless of whether we also do 1 or 2)
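For the tooltip idea, a minimal sketch of what the HTML generation might emit is shown below. It assumes the site's pages are produced by a script that can wrap each replaceable span; `render_alt` and the `replaceable-license-text` class name are assumptions for illustration and are not taken from the actual publishing code.

```python
from html import escape


def render_alt(text: str, match: str) -> str:
    """Wrap replaceable text in a span whose tooltip shows the regex."""
    # "replaceable-license-text" is an assumed class name for this sketch.
    return (
        f'<span class="replaceable-license-text" '
        f'title="Accepts: {escape(match, quote=True)}">{escape(text)}</span>'
    )


print(render_alt("EXPRESS", "EXPRESS(ED)?"))
```

A plain `title` attribute needs no JavaScript, although it is not shown on touch devices, so the click-to-open view might still be worth having alongside it.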