Remove the JSON from Qleverfile #117

tpluscode · 2025-01-17T10:35:53Z

I have been looking into Qleverfiles recently and I find the JSON keys a little awkward.

Take the [index] section from the Wikidata example:

[index]
INPUT_FILES      = latest-all.ttl.bz2 latest-lexemes.ttl.bz2 dcatap.nt
MULTI_INPUT_JSON = [{ "cmd": "lbzcat -n 4 latest-all.ttl.bz2", "format": "ttl", "parallel": "true" },
                    { "cmd": "lbzcat -n 1 latest-lexemes.ttl.bz2", "format": "ttl", "parallel": "false" },
                    { "cmd": "cat dcatap.nt", "format": "nt", "parallel": "false" }]
SETTINGS_JSON    = { "languages-internal": [], "prefixes-external": [""], "locale": { "language": "en", "country": "US", "ignore-punctuation": true }, "ascii-prefixes-only": true, "num-triples-per-batch": 5000000 }
STXXL_MEMORY     = 10G

I'd propose allowing "dynamic" sections to break up the complex objects in MULTI_INPUT_JSON and SETTINGS_JSON:

[index]
INPUT_FILES  = latest-all.ttl.bz2 latest-lexemes.ttl.bz2 dcatap.nt
STXXL_MEMORY = 10G

[index.latest-all]
CMD      = lbzcat -n 4 latest-all.ttl.bz2
FORMAT   = ttl
PARALLEL = true

[index.latest-lexemes]
CMD      = lbzcat -n 1 latest-lexemes.ttl.bz2
FORMAT   = ttl
PARALLEL = false

[index.dcatap]
CMD      = cat dcatap.nt
FORMAT   = nt
PARALLEL = false

[index.settings]
LANGUAGES-INTERNAL    = []
PREFIXES-EXTERNAL     = [""]
ASCII-PREFIXES-ONLY   = true
NUM-TRIPLES-PER-BATCH = 5000000

[index.settings.locale]
LANGUAGE           = en
COUNTRY            = US
IGNORE-PUNCTUATION = true
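
For illustration, here is a minimal Python sketch of how dotted sections like these could be folded back into the two JSON structures the current format uses. It is not how qlever-control reads Qleverfiles; `collect_index_config` and `parse_scalar` are hypothetical names, and the sample assumes each input gets its own section name (here `[index.dcatap]` for the third file). It relies on the standard `configparser` module, which lowercases keys by default, so `LANGUAGES-INTERNAL` becomes `languages-internal`.

```python
import configparser
import json

# Hypothetical Qleverfile fragment using the proposed dotted sections.
QLEVERFILE = """
[index]
INPUT_FILES  = latest-all.ttl.bz2 latest-lexemes.ttl.bz2 dcatap.nt
STXXL_MEMORY = 10G

[index.latest-all]
CMD      = lbzcat -n 4 latest-all.ttl.bz2
FORMAT   = ttl
PARALLEL = true

[index.latest-lexemes]
CMD      = lbzcat -n 1 latest-lexemes.ttl.bz2
FORMAT   = ttl
PARALLEL = false

[index.dcatap]
CMD      = cat dcatap.nt
FORMAT   = nt
PARALLEL = false

[index.settings]
LANGUAGES-INTERNAL    = []
PREFIXES-EXTERNAL     = [""]
ASCII-PREFIXES-ONLY   = true
NUM-TRIPLES-PER-BATCH = 5000000

[index.settings.locale]
LANGUAGE           = en
COUNTRY            = US
IGNORE-PUNCTUATION = true
"""


def parse_scalar(text):
    """Interpret a value as JSON when possible, else keep it as a string."""
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        return text


def collect_index_config(source):
    """Return (multi_input, settings) rebuilt from [index.*] sections."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(source)
    multi_input = []
    settings = {}
    for name in parser.sections():
        if not name.startswith("index."):
            continue
        path = name.split(".")[1:]
        items = dict(parser[name])
        if path[0] == "settings":
            # Walk (or create) nested dicts for e.g. index.settings.locale.
            target = settings
            for key in path[1:]:
                target = target.setdefault(key, {})
            target.update({k: parse_scalar(v) for k, v in items.items()})
        else:
            # Any other [index.<name>] section is treated as one input spec.
            # Values stay strings, as in the original MULTI_INPUT_JSON.
            multi_input.append(items)
    return multi_input, settings


multi_input, settings = collect_index_config(QLEVERFILE)
print(json.dumps(multi_input, indent=2))
print(json.dumps(settings, indent=2))
```

Two limitations of this sketch are worth noting: an input section could not be called `settings`, and section names containing dots (such as a file name with an extension) would be misread as nesting.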

Alternatively, you could support YAML instead. That would, for example, allow you to publish a JSON Schema for validation and editor suggestions.

PS: The standard way to write the locale in this case would be en-US. I'm curious why it is split into two keys.

hannahbast · 2025-01-17T12:29:13Z

@tpluscode Thank you for the comment. Can you be more specific about why you find the JSON awkward? By the way, the enclosing [...] are not needed for MULTI_INPUT_JSON; JSONL is fine as well.
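
To make the array-versus-JSONL distinction concrete, here is a small Python sketch of a reader that accepts either form. It only illustrates the idea and is an assumption about the behaviour, not the actual qlever-control code; `parse_multi_input` is a hypothetical helper name.

```python
import json


def parse_multi_input(value):
    """Parse a MULTI_INPUT_JSON-style value given as a JSON array or as JSONL."""
    text = value.strip()
    if text.startswith("["):
        # A single JSON array, possibly spread over several lines.
        return json.loads(text)
    # JSONL: one JSON object per non-empty line, no separating commas.
    return [json.loads(line) for line in text.splitlines() if line.strip()]


as_array = """[{ "cmd": "lbzcat -n 1 latest-lexemes.ttl.bz2", "format": "ttl", "parallel": "false" },
               { "cmd": "cat dcatap.nt", "format": "nt", "parallel": "false" }]"""

as_jsonl = """{ "cmd": "lbzcat -n 1 latest-lexemes.ttl.bz2", "format": "ttl", "parallel": "false" }
              { "cmd": "cat dcatap.nt", "format": "nt", "parallel": "false" }"""

# Both spellings yield the same list of input specifications.
print(parse_multi_input(as_array) == parse_multi_input(as_jsonl))
```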

tpluscode · 2025-01-17T12:48:45Z

First thing is readability. JSON also feels alien to the textual format used. Like... putting binary columns in a relational database.

And the current formatting requires jumping through some hoops to stringify it just right. From JavaScript, this is not exactly what JSON.stringify produces, so some additional code is necessary to get the right output.

hannahbast · 2025-02-04T01:44:34Z

@tpluscode Coming back to this after three stressful weeks. I am not a fan of the "dynamic" sections. But I agree that YAML would be a natural format.

When we started, we wanted something that is as simple as possible and no more complex than necessary. YAML is the way to go when you have multiline stuff, where indentation is important for readability. For most Qleverfiles, this is what we want now (either sequences of commands or SPARQL queries or both).

The switch will be a bit annoying because we will have to support the old format for a while. But that's life.

ktk · 2025-02-13T12:38:09Z

@hannahbast let's talk about how we can do that together. We can contribute validation & things around it so it becomes easier to work with it. @tpluscode and @ludovicm67 have ideas/experience in it.

tpluscode mentioned this issue Jan 17, 2025

Multiple commands to get data #118

Open
