This repository is intended to be the source of truth for EGA's metadata schemas. To submit data to the EGA, you need more than the data files themselves (e.g. .bam files): you also need appropriate metadata that describes your data (e.g. what sequencing technology was used) and the underlying relationships between its objects (e.g. which experiments encompass which samples).
The new metadata model (see diagram) and JSON schemas are in development and are therefore expected to change often until they are officially released. In the meantime, submissions should be made through the usual procedures (see the Submission tab at ega-archive.org).
Depending on the nature of your data (raw sequences, variant calling, arrays...), the metadata and its submission procedure will differ:
- Array-based metadata: must be submitted using the EGA submitter portal by completing the Array-based format (AF) spreadsheet (direct download).
- Sequence-based metadata: can be submitted either using the EGA submitter portal or through the programmatic submission procedure. For the latter, you will need to create correctly formatted XMLs containing your metadata:
  - You will find examples of such XMLs (one file for each metadata object) within this repository: (1) descriptive XMLs show which type of information corresponds to which part of the XML's structure; (2) true example XMLs contain fabricated information so you can see what a finished, ready-to-submit XML looks like.
  - To ease this process, you can use the tool star2xml. Follow its README to create these XMLs from the given joint template.
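
Before attempting a programmatic submission, it can help to confirm that every XML you created at least parses. The following is a minimal sketch using only the Python standard library; the function name `check_well_formed` and the directory layout (one folder holding your `.xml` files) are assumptions made for illustration, not part of EGA's tooling. It only checks that the files are well-formed XML. It does not validate them against EGA's schemas.

```python
import xml.etree.ElementTree as ET
from pathlib import Path


def check_well_formed(xml_dir):
    """Map each .xml file name in xml_dir to None (parsed fine) or a parse error message.

    Hypothetical helper: checks well-formedness only, not schema validity.
    """
    results = {}
    for path in sorted(Path(xml_dir).glob("*.xml")):
        try:
            ET.parse(str(path))  # raises ParseError on malformed XML
            results[path.name] = None
        except ET.ParseError as err:
            results[path.name] = str(err)
    return results
```

Calling `check_well_formed("my_submission")` (where `my_submission` is a placeholder directory name) returns a dictionary you can loop over to report which files need fixing before you submit them.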