Clarify D in req/core/identifier #182

Open
amilan17 opened this issue Feb 12, 2024 · 8 comments

@amilan17
Member

|D |The +id+ property shall include a local identifier as defined by the data publisher. The local identifier shall not have spaces or special or accented characters.

The question is: what counts as a "special" character?

@amilan17
Member Author

@tomkralidis

@tomkralidis
Contributor

Perhaps we can further qualify with:

  • no spaces
  • no accents
  • no colons (given they are URN separators)
  • none of the following: `~!@#$%^&*()=][{}|'";,.?/+
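
A minimal sketch of how the list above could be checked, assuming the rule is applied character by character. The names `FORBIDDEN`, `is_accented` and `offending_characters` are hypothetical, and treating "accented" as "decomposes into a base letter plus a combining mark" is an assumption, not something the requirement states.

```python
import unicodedata

# Characters from the proposed list, plus space and colon (assumed reading of the bullets).
FORBIDDEN = set("`~!@#$%^&*()=][{}|'\";,.?/+") | {" ", ":"}


def is_accented(ch):
    """True if the character decomposes (NFD) into a base plus a combining mark."""
    return any(unicodedata.combining(c) for c in unicodedata.normalize("NFD", ch))


def offending_characters(local_id):
    """Return, in order, the characters the proposed rule would reject."""
    return [
        ch
        for ch in local_id
        if ch in FORBIDDEN or ch.isspace() or is_accented(ch)
    ]


if __name__ == "__main__":
    for candidate in ("surface-weather_obs", "data.core", "foo/bar"):
        print(candidate, offending_characters(candidate))
```

Under this reading a hyphen and an underscore pass, while a dot or a slash is reported. Letters such as "ø" do not decompose and would slip through the accent check, so a character-set restriction may be simpler.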

cc @josusky

@josusky
Contributor

josusky commented Feb 13, 2024

This is too restrictive. The regular expression that you have provided is correct only for the "namespace identifier" (NID) part, but in our case the NID is fixed to wmo. The rest of the URN is the "Namespace Specific String" (NSS), and its validation is more permissive. The original description is in https://www.rfc-editor.org/rfc/rfc2141.html (section 2.2) and is slightly modified (extended) by the newer RFC 8141 (https://www.rfc-editor.org/rfc/rfc8141). An example of a valid URN is:
urn:example:a123,z456?+abc
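
For illustration, a rough sketch of the NSS character rules as I read the RFC 8141 grammar (`NSS = pchar *(pchar / "/")`, with `pchar` taken from RFC 3986). This is a simplified check, not a complete URN validator, and the names `PCHAR`, `NSS_RE` and `is_valid_nss` are hypothetical.

```python
import re

# pchar = unreserved / pct-encoded / sub-delims / ":" / "@"  (RFC 3986)
PCHAR = r"(?:[A-Za-z0-9\-._~!$&'()*+,;=:@]|%[0-9A-Fa-f]{2})"

# NSS must start with a pchar; a slash is allowed only after the first character.
NSS_RE = re.compile(rf"{PCHAR}(?:{PCHAR}|/)*")


def is_valid_nss(nss):
    """True if the string matches the simplified NSS pattern above."""
    return NSS_RE.fullmatch(nss) is not None


if __name__ == "__main__":
    for candidate in ("a123,z456", "data.core/obs", "foo bar", "/leading-slash"):
        print(candidate, is_valid_nss(candidate))
```

In this reading, a dot, comma and slash are all accepted inside the NSS, while a space is not. The `?+abc` in the example URN would be an r-component that follows the NSS rather than part of it.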

@josusky
Contributor

josusky commented Feb 13, 2024

I am not dead set against a rule that is stricter than the actual URN specification. I looked up the specification because I spotted the innocent dot (.) in Tom's list - that "lifted me off the chair" :-)
I can hardly imagine anyone putting ~ or ] into a metadata ID, but a dot (.) or slash (/) seems quite OK to me.

@tomkralidis
Contributor

Having a slash (/) in the ID introduces URLs like the following in the GDC:

https://example.org/collections/foo/items/foo%2Fbar

While we can relax the regex set mentioned previously, the above would be error prone.
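
A small sketch of where the `%2F` comes from, assuming the item URL is built by percent-encoding the identifier as a single path segment. The base URL is the placeholder from the example above and `item_url` is a hypothetical helper, not GDC code.

```python
from urllib.parse import quote

# Placeholder base URL taken from the example above.
BASE = "https://example.org/collections/foo/items/"


def item_url(local_id):
    """Build an item URL, encoding the identifier as one path segment."""
    # safe="" forces "/" to be encoded; by default quote() leaves "/" untouched.
    return BASE + quote(local_id, safe="")


if __name__ == "__main__":
    print(item_url("foo/bar"))  # identifier containing a slash
    print(BASE + quote("foo/bar"))  # default quoting keeps the slash as a path separator
```

The error-prone part is that every client and server along the way has to agree on encoding the slash. If one of them uses the default behaviour, the identifier silently becomes two path segments.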

tomkralidis mentioned this issue Feb 14, 2024
@amilan17
Member Author

amilan17 commented Oct 22, 2024

The definition as approved in PR #183: "The id property SHALL include a local identifier as defined by the data publisher. The local identifier SHALL NOT have spaces or accented characters."

@tomkralidis
Contributor

TT-WISMD 2024-10-22:

  • can we use the ISO charset
  • WTH uses IRA T.50, can we reuse
  • LSP reworded the requirement into "no space" or "no accented characters"

@josusky
Contributor

josusky commented Oct 28, 2024

Specifying a character set that does not have accented characters and other things that can complicate the usage of this identifier is a good idea. IRA T.50 is an appropriate choice. Apart from that (and the space), did you discuss some more restrictions during TT-WISMD 2024-10-22?
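
As a rough sketch of what a character-set rule could look like in practice: this assumes the international reference version of IRA (ITU-T T.50) coincides with 7-bit ASCII, and that only its printable, non-space characters (0x21 to 0x7E) would be allowed. The function name `is_ira_t50_graphic` is hypothetical.

```python
def is_ira_t50_graphic(identifier):
    """True if the identifier is non-empty and uses only 7-bit printable,
    non-space characters (code points 0x21 through 0x7E)."""
    return bool(identifier) and all(0x21 <= ord(ch) <= 0x7E for ch in identifier)


if __name__ == "__main__":
    for candidate in ("surface-obs.2024", "foo bar", "m\u00e9t\u00e9o"):
        print(candidate, is_ira_t50_graphic(candidate))
```

A check like this excludes spaces and accented characters in one step, but it still lets through characters such as `/` or `:`, so any further restrictions would need to be listed separately.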
