Validation schema and online editor #67
@simontaurus I like the idea: immediately show the content of the .eln file on a github.io page and allow the user to inspect it.
Hey @FlorianRhiem, @nicobrandt, @NicolasCARPi: can you help? Have you already created one? There is no GUI / preview aspect to this.
TBH I'm surprised RO-Crate doesn't provide a schema. Seems it would be most helpful to everyone. This seems to discuss it: ResearchObject/ro-crate#33

Maybe the JSON-LD nature of it makes it hard to create... Given that all the nodes can accept a wide range of properties, which can themselves have values of different types and subtypes, I wonder if it's even possible to create a schema that will validate all .eln files. It will necessarily be restrictive/incomplete, and we will need to adjust it often, based on our use.

Here is a generated one from an elabftw metadata, as a starting point:
{
"$schema": "http://json-schema.org/draft-07/schema#",
"type": "object",
"properties": {
"@context": {
"type": "string",
"format": "uri"
},
"@graph": {
"type": "array",
"items": {
"type": "object",
"properties": {
"@id": {
"type": "string"
},
"@type": {
"type": "string"
},
"about": {
"type": "object",
"properties": {
"@id": {
"type": "string"
}
}
},
"conformsTo": {
"type": "object",
"properties": {
"@id": {
"type": "string",
"format": "uri"
}
}
},
"dateCreated": {
"type": "string",
"format": "date-time"
},
"sdPublisher": {
"type": "object",
"properties": {
"@id": {
"type": "string"
}
}
},
"version": {
"type": "string"
},
"author": {
"type": "object",
"properties": {
"@id": {
"type": "string"
}
}
},
"dateModified": {
"type": "string",
"format": "date-time"
},
"name": {
"type": "string"
},
"encodingFormat": {
"type": "string"
},
"url": {
"type": "string",
"format": "uri"
},
"genre": {
"type": "string"
},
"creativeWorkStatus": {
"type": "string"
},
"identifier": {
"type": "string"
},
"keywords": {
"type": "string"
},
"step": {
"type": "array",
"items": {
"type": "object",
"properties": {
"@type": {
"type": "string"
},
"position": {
"type": "integer"
},
"creativeWorkStatus": {
"type": "string"
},
"itemListElement": {
"type": "array",
"items": {
"type": "object",
"properties": {
"@type": {
"type": "string"
},
"text": {
"type": "string"
}
}
}
}
}
}
},
"hasPart": {
"type": "array",
"items": {
"type": "object",
"properties": {
"@id": {
"type": "string"
}
}
}
},
"comment": {
"type": "array",
"items": {
"type": "object",
"properties": {
"@id": {
"type": "string"
},
"@type": {
"type": "string"
},
"dateCreated": {
"type": "string",
"format": "date-time"
},
"text": {
"type": "string"
},
"author": {
"type": "object",
"properties": {
"@id": {
"type": "string"
}
}
}
}
}
}
},
"required": ["@id", "@type"]
}
}
},
"required": ["@context", "@graph"]
}
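To make concrete what the generated schema above actually enforces, here is a minimal standard-library sketch. It only mirrors the two `required` lists (`@context` and `@graph` at the top level, `@id` and `@type` on every node) and ignores types and formats. The function name `check_required` and the sample document are made up for this illustration; a real setup would more likely hand the schema to a JSON Schema validator.

```python
import json


def check_required(doc):
    """Return a list of human-readable problems; an empty list means the check passed."""
    problems = []
    # Top-level "required": ["@context", "@graph"]
    for key in ("@context", "@graph"):
        if key not in doc:
            problems.append(f"missing top-level key {key!r}")
    graph = doc.get("@graph", [])
    if not isinstance(graph, list):
        return problems + ["'@graph' is not an array"]
    # Per-node "required": ["@id", "@type"]
    for index, node in enumerate(graph):
        if not isinstance(node, dict):
            problems.append(f"node {index} is not an object")
            continue
        for key in ("@id", "@type"):
            if key not in node:
                problems.append(f"node {index} is missing {key!r}")
    return problems


# Hypothetical metadata; the third node deliberately lacks "@id" and "@type".
sample = json.loads("""
{
  "@context": "https://w3id.org/ro/crate/1.1/context",
  "@graph": [
    {"@id": "ro-crate-metadata.json", "@type": "CreativeWork", "about": {"@id": "./"}},
    {"@id": "./", "@type": "Dataset", "hasPart": []},
    {"name": "node without identifiers"}
  ]
}
""")

for problem in check_required(sample):
    print(problem)
```

Because nothing beyond these keys is checked, any additional property on a node passes silently, which matches the concern that such a schema will be restrictive in some places and incomplete in others.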
I haven't created one; rather, I'm using a custom-implemented parser (after using the built-in
The main issue is that RO-Crate in principle specifies an RDF graph, serialized as flattened and compacted JSON-LD (@graph with a list of nodes). This limits any syntactic validation (like JSON Schema) in comparison to semantic RDF "schemas" like SHACL (which, vice versa, have their limits in adoption and tool availability).

However, looking more closely, the RO-Crate spec is not purely semantic but also syntactic:
Purely semantic would mean, e.g. "the triple
{
"@context": "https://w3id.org/ro/crate/1.1/context",
"@type": "Dataset"
}
{
"@context": [
"https://w3id.org/ro/crate/1.1/context",
{
"type": "@type"
}
],
"type": "Dataset"
}

In addition, RO-Crate inherits the very lax handling of data types, cardinality, range and required-ness of properties, further complicating validation and consumer implementation.

Early ELN-Fileformat Schema Draft
{
"@context": [
"https://w3id.org/ro/crate/1.1/context",
{
"id": "@id",
"type": "@type"
}
],
"title": "RO-Crate",
"type": "object",
"version": "0.1.1",
"required": [
"id",
"type",
"conformsTo",
"sdPublisher"
],
"definitions": {
"Thing": {
"type": "object",
"required": [
"type"
],
"properties": {
"type": {
"type": "string",
"default": "Thing"
},
"id": {
"type": "string"
},
"name": {
"type": "string"
},
"description": {
"type": "string"
}
}
},
"Organization": {
"allOf": [
{
"$ref": "#/definitions/Thing"
}
],
"type": "object",
"required": [
"type"
],
"properties": {
"type": {
"type": "string",
"default": "Organization"
},
"url": {
"type": "string",
"format": "url"
},
"areaServed": {
"type": "string"
},
"slogan": {
"type": "string"
},
"logo": {
"type": "string",
"format": "url",
"links": [
{
"href": "{{self}}",
"type": "img/png"
}
]
},
"parentOrganization": {
"$ref": "#/definitions/Organization"
}
},
"options": {
"display_required_only": true
}
},
"Person": {
"allOf": [
{
"$ref": "#/definitions/Thing"
}
],
"title": "Person",
"type": "object",
"properties": {
"email": {
"type": "string",
"format": "email"
},
"familyName": {
"type": "string"
},
"givenName": {
"type": "string"
}
}
},
"CreativeWork": {
"allOf": [
{
"$ref": "#/definitions/Thing"
}
],
"title": "CreativeWork",
"type": "object",
"properties": {
"dateCreated": {
"type": "string",
"format": "datetime-local",
"options": {
"flatpickr": {}
}
},
"dateModified": {
"type": "string",
"format": "datetime-local",
"options": {
"flatpickr": {}
}
},
"keywords": {
"type": "array",
"items": {
"type": "string"
}
}
}
},
"File": {
"allOf": [
{
"$ref": "#/definitions/CreativeWork"
}
],
"title": "File",
"type": "object",
"id": "file",
"required": [
"type",
"id"
],
"properties": {
"type": {
"type": "string",
"enum": [
"File"
]
},
"_id": {
"type": "string"
},
"encodingFormat": {
"type": "string"
},
"id": {
"type": "string",
"format": "url",
"options": {
"upload": {
"upload_handler": "testUploadHandler"
}
},
"links": [
{
"href": "dummy",
"rel": "view / download"
}
]
}
},
"not": { "required": [ "hasPart" ] },
"options": {
"display_required_only": true
}
},
"Dataset": {
"allOf": [
{
"$ref": "#/definitions/CreativeWork"
}
],
"title": "Dataset",
"type": "object",
"required": [
"type",
"id"
],
"properties": {
"type": {
"type": "string",
"enum": [
"Dataset"
]
},
"about": {
"$ref": "#/definitions/Thing"
},
"hasPart": {
"type": "array",
"format": "tabs",
"items": {
"discriminator": {
"propertyName": "type",
"mapping": {
"File": "#/definitions/File",
"Dataset": "#/definitions/Dataset"
}
},
"oneOf": [
{
"$ref": "#/definitions/File"
},
{
"$ref": "#/definitions/Dataset"
}
]
}
}
},
"options": {
"display_required_only": true
}
}
},
"properties": {
"id": {
"type": "string",
"enum": [
"ro-crate-metadata.json"
]
},
"type": {
"type": "string",
"enum": [
"CreativeWork"
]
},
"conformsTo": {
"type": "string",
"enum": [
"https://w3id.org/ro/crate/1.1"
]
},
"version": {
"type": "string"
},
"sdPublisher": {
"$ref": "#/definitions/Organization"
},
"about": {
"$ref": "#/definitions/Dataset",
"properties": {
"id": {
"enum": [
"./"
]
}
},
"default": {
"hasPart": []
}
}
},
"options": {
"_display_required_only": true
}
}

This would provide us three outcomes:
As a demo for 1. and 3., For 2., throwing the same schema at the OO-LD Python playground gives us pydantic dataclasses both for validation and implementation.

Generated Dataclasses
from __future__ import annotations
from enum import Enum
from typing import List, Literal, Optional, Union

from pydantic import BaseModel, EmailStr, Field


class Id(Enum):
    ro_crate_metadata_json = "ro-crate-metadata.json"


class Type(Enum):
    CreativeWork = "CreativeWork"


class ConformsTo(Enum):
    https___w3id_org_ro_crate_1_1 = "https://w3id.org/ro/crate/1.1"


class Thing(BaseModel):
    type: str
    id: Optional[str] = None
    name: Optional[str] = None
    description: Optional[str] = None


class Organization(Thing):
    type: str
    url: Optional[str] = None
    areaServed: Optional[str] = None
    slogan: Optional[str] = None
    logo: Optional[str] = Field(None, links=[{"href": "{{self}}", "type": "img/png"}])
    parentOrganization: Optional[Organization] = None


class Person(Thing):
    email: Optional[EmailStr] = None
    familyName: Optional[str] = None
    givenName: Optional[str] = None


class CreativeWork(Thing):
    dateCreated: Optional[str] = Field(None, options={"flatpickr": {}})
    dateModified: Optional[str] = Field(None, options={"flatpickr": {}})
    keywords: Optional[List[str]] = None


class Type1(Enum):
    File = "File"


class File(CreativeWork):
    type: Literal["File"]
    field_id: Optional[str] = Field(None, alias="_id")
    encodingFormat: Optional[str] = None
    id: str = Field(
        ...,
        links=[{"href": "dummy", "rel": "view / download"}],
        options={"upload": {"upload_handler": "testUploadHandler"}},
    )


class Type2(Enum):
    Dataset = "Dataset"


class ROCrate(BaseModel):
    id: Id
    type: Type
    conformsTo: ConformsTo
    version: Optional[str] = None
    sdPublisher: Organization
    about: Optional[Dataset] = Field(
        default_factory=lambda: Dataset.parse_obj({"hasPart": []})
    )


class Dataset(CreativeWork):
    type: Literal["Dataset"]
    about: Optional[Thing] = None
    hasPart: Optional[List[Union[File, Dataset]]] = Field(None, discriminator="type")
    id: str

While this approach never forbids additional properties, we can easily define, in a machine-readable way, which properties we expect to be used and how we expect them to be used. Also we can define subclasses of

What do you think?
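As a rough sketch of the "expected but not exclusive" idea, the following standard-library code walks a nested `hasPart` tree the way the draft schema describes it: the node's `type` selects the rules (the discriminator), a `File` must not carry `hasPart`, a `Dataset` may contain `File` or `Dataset` children, and any other property is simply ignored. The function name `check_part`, the path notation and the sample data are assumptions made for this illustration; they are not part of OO-LD or the generated dataclasses.

```python
def check_part(node, path="about"):
    """Collect problems for a File or Dataset node, recursing into hasPart."""
    problems = []
    # Both File and Dataset list "type" and "id" as required in the draft.
    for key in ("type", "id"):
        if key not in node:
            problems.append(f"{path}: missing {key!r}")
    kind = node.get("type")
    if kind == "File":
        # Mirrors "not": {"required": ["hasPart"]} on the File definition.
        if "hasPart" in node:
            problems.append(f"{path}: a File must not have 'hasPart'")
    elif kind == "Dataset":
        for index, child in enumerate(node.get("hasPart", [])):
            problems.extend(check_part(child, f"{path}.hasPart[{index}]"))
    elif kind is not None:
        # hasPart items are restricted to File or Dataset via oneOf.
        problems.append(f"{path}: unexpected type {kind!r}")
    return problems


# Hypothetical crate root; "name" and "encodingFormat" are extra properties
# that the check tolerates without complaint.
crate_root = {
    "type": "Dataset",
    "id": "./",
    "name": "extra properties are fine",
    "hasPart": [
        {"type": "File", "id": "data.csv", "encodingFormat": "text/csv"},
        {"type": "Dataset", "id": "sub/", "hasPart": [{"type": "File", "hasPart": []}]},
        {"type": "Person", "id": "#me"},
    ],
}

for problem in check_part(crate_root):
    print(problem)
```

The point of the design is that only the listed expectations produce errors; anything the schema does not mention stays valid, so consumers can rely on the known properties without blocking producers that add more.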
We are currently implementing the RO-Crate / ELN file format for OpenSemanticLab.
Along the way we will create a validation schema using OO-LD.
As discussed with @SteffenBrinckmann, this issue is meant to share the work early and provide a first preview (code):
playground example