language strings missing a language tag #37
Comments
It can be considered an ill-formed literal according to the datatype, much like
The (extended) lexical-to-value mapping does not exist for the case
The text below the definition, "A literal is a language-tagged string if the third element is present.", says it isn't a language-tagged string, i.e. it has no value (the "literal value" text).
Special case datatypes are problematic because they propagate: e.g.
(Also, bullet two only refers to lexical form.)
My implementation does, in fact, parse it, even though it's invalid, as it parses
My interpretation is that it's legal Turtle, as @afs says; it just results in an invalid triple. The same would be the case for language tags that are not valid according to BCP47.
This is very different, IMO. On a practical level, it means that many implementations will choke or behave strangely with it. As @gkellogg points out, his implementation will crash when trying to serialize it back to N-Triples. It will not crash when serializing it back to Turtle, but will produce invalid Turtle, namely:
Your testing showed this isn't the case. Nothing has changed from RDF 1.1.
At one level, if it's wrong, then it's outside the spec and we don't define the behaviour. We might suggest a behaviour, but that's not the same as requiring a behaviour. There are good reasons for systems to choose to accept it: when streaming large data, with one occurrence far down the input stream, throwing an error and aborting a large load after a few hours is extremely inconvenient.
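To make the trade-off above concrete, here is a minimal Python sketch of a loader that can either abort on the first ill-formed literal or keep going and record a warning. The function name `load_literals`, the `strict` flag, and the tuple layout are all hypothetical and not taken from any implementation mentioned in this thread.

```python
RDF_LANG_STRING = "http://www.w3.org/1999/02/22-rdf-syntax-ns#langString"


def load_literals(literals, strict=False):
    """Load (lexical_form, datatype, language) tuples from a stream.

    Hypothetical helper: in strict mode the first rdf:langString literal
    without a language tag aborts the load; otherwise the literal is kept
    as ill-formed and a warning is recorded.
    """
    accepted = []
    warnings = []
    for index, (lexical_form, datatype, language) in enumerate(literals):
        if datatype == RDF_LANG_STRING and not language:
            message = (
                f"item {index}: rdf:langString literal "
                f"{lexical_form!r} has no language tag"
            )
            if strict:
                # Abort the whole load, however far into the stream we are.
                raise ValueError(message)
            warnings.append(message)
        accepted.append((lexical_form, datatype, language))
    return accepted, warnings
```

With `strict=False`, a single bad literal late in a long input costs one warning rather than the whole load; with `strict=True`, the caller gets a `ValueError` at the first occurrence.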
My testing shows that most implementations don't choke on it, that's right. I do expect that they behave strangely down the line, though. E.g., @gkellogg's implementation accepts it during parsing, but then fails to serialize it back...
Indeed. I'm not claiming that this issue is specific to RDF 1.2.
I agree that we do not define the behaviour of parsers when they encounter invalid input. An alternative would be to relax the constraints on
Agreed, but this is a very general consideration, not specific to this issue.
Of the two options, I prefer relaxing the abstract syntax and having it be ill-formed, i.e. widen and not restrict existing usage. JSON-LD already says "and, if the datatype is rdf:langString, an optional language tag" and systems do accept it. I'd prefer leaving things as they are. JSON-LD and RDF/XML don't rely on the document parsing for language tags. (The title of the issue maybe should be "missing language string" for the long-term record of the WG's work.)
Here's another corner case for "missing language tag".
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
[] rdfs:label "abc"@--rtl .
Sort of "unknown language but known to be rtl" or "always rtl".
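A small Python sketch of how this corner case looks once the tag is split. It assumes that `--` separates the language from the base direction, as the example suggests; `split_lang_dir` is a hypothetical helper, not part of any parser discussed here.

```python
def split_lang_dir(tag):
    """Split the text after '@' into (language, direction).

    Assumption: '--' separates the language tag from the base direction.
    The direction is None when no '--' separator is present.
    """
    language, separator, direction = tag.partition("--")
    return language, (direction if separator else None)
```

For the tag `--rtl`, the helper returns an empty language together with the direction `rtl`, which is exactly the "direction without a language" situation described above.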
Seen in the wild: apache/jena#2555. That's in a result set, but the principle is the same.
It just occurred to me that the following Turtle should be rejected as invalid:
Indeed, if accepted, it generates an invalid literal: a literal whose datatype is rdf:langString but which does not have a language tag. This contradicts the "if and only if" of RDF Concepts' definition.
Almost all implementations that I have tested accept it (notable exception: @gkellogg's Ruby implementation).
We probably need a negative test for this, and of course similar tests for other concrete syntaxes.
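As a rough illustration of what such a test would check, here is a minimal Python sketch of the "if and only if" condition on an already-parsed literal. The function `check_literal` and its argument layout are hypothetical; a real negative test would run against each concrete syntax's parser instead.

```python
RDF_LANG_STRING = "http://www.w3.org/1999/02/22-rdf-syntax-ns#langString"


def check_literal(datatype, language=None):
    """Return None if datatype and language tag are consistent.

    Otherwise return a short message describing which direction of the
    "if and only if" is violated. Hypothetical helper for illustration.
    """
    has_language = bool(language)
    is_lang_string = datatype == RDF_LANG_STRING
    if is_lang_string and not has_language:
        return "rdf:langString literal without a language tag"
    if has_language and not is_lang_string:
        return "language tag on a literal whose datatype is not rdf:langString"
    return None
```

The first branch is the case raised in this issue; the second covers the converse direction of the same definition.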