Testing Big GML Files #22

Open
bermud opened this issue Mar 23, 2016 · 2 comments

bermud commented Mar 23, 2016

Testing big GML files (100 MB or larger) seems to be a problem.

Discussed at the OGC CITE SC meeting (Washington TC 2016).

Plan:

  • set up a GitHub project dedicated to this problem
  • the project should:
    • validate the file against its XML schema
    • validate the file using a Schematron file
    • do some kind of geometry validation
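
As a rough illustration of the schema validation step, here is a minimal sketch using the JDK's built-in `javax.xml.validation` API with a `StreamSource`, so the application code never builds an in-memory tree of the document. The class name `StreamingSchemaCheck` and its method are hypothetical, not part of any existing project, and whether this scales to a 300 MB GML file with the full GML schemas still has to be measured.

```java
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.ErrorHandler;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

// Hypothetical helper: validates an XML file against an XSD and collects the problems.
public class StreamingSchemaCheck {

    public static List<String> validate(File xsd, File xml) throws IOException, SAXException {
        SchemaFactory factory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
        Schema schema = factory.newSchema(xsd);   // throws SAXException if the XSD itself is broken
        Validator validator = schema.newValidator();
        final List<String> problems = new ArrayList<>();

        // Record every problem instead of stopping at the first validation error.
        validator.setErrorHandler(new ErrorHandler() {
            @Override public void warning(SAXParseException e) { record("warning", e); }
            @Override public void error(SAXParseException e) { record("error", e); }
            @Override public void fatalError(SAXParseException e) throws SAXException {
                record("fatal", e);
                throw e;                          // the document is not well-formed, stop here
            }
            private void record(String level, SAXParseException e) {
                problems.add(level + " at line " + e.getLineNumber() + ": " + e.getMessage());
            }
        });

        try {
            // The validator reads the file itself through the StreamSource.
            validator.validate(new StreamSource(xml));
        } catch (SAXException e) {
            if (problems.isEmpty()) {
                problems.add("fatal: " + e.getMessage());
            }
        }
        return problems;
    }
}
```

An empty list means the file passed. Schematron and geometry checks would need separate tooling and are not covered by this sketch.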

We will test with a 300 MB file, which typical validators (e.g. Oxygen, XML Spy) cannot handle. The test files are:

  • CDDA_SITES_10000.zip: a 300 MB XML file with 10,000 features from a pan-European dataset, relevant to the INSPIRE Protected Sites data theme
  • ProtectedSites_v4.0.sch.zip: a Schematron file relevant to the INSPIRE Protected Sites theme

bermud assigned keshavnangare and unassigned rjmartell on May 19, 2016
bermud changed the title from "Testing Big GML files" to "Testing Big GML Files" on May 19, 2016


bermud commented Jun 7, 2016

Based on Richard's ideas, the next steps might be:

  • implement a utility class to split up very large GML documents (GmlDocumentSplitter).
  • experiment with various chunk sizes to determine what works best (e.g. if doc size > 100 MB, split it into 50 MB chunks)
  • if necessary, do Schematron validation of the resulting chunks in parallel
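
The parallel step could be sketched roughly as below. This is only an assumption about how it might look: `ParallelChunkValidator` is a hypothetical name, and the actual per-chunk Schematron check is passed in as a function because the thread does not say which Schematron engine would be used.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

// Hypothetical runner: applies a per-chunk check to all chunks on a thread pool.
public class ParallelChunkValidator {

    /**
     * Runs checkChunk on every chunk using a fixed-size pool and returns the
     * reported problems per chunk, in the order the chunks were given.
     */
    public static Map<File, List<String>> validateAll(
            List<File> chunks, Function<File, List<String>> checkChunk, int threads)
            throws InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            // Submit one task per chunk; the pool limits how many run at once.
            List<Future<List<String>>> pending = new ArrayList<>();
            for (File chunk : chunks) {
                Callable<List<String>> task = () -> checkChunk.apply(chunk);
                pending.add(pool.submit(task));
            }
            // Collect results in submission order; get() blocks until a task is done.
            Map<File, List<String>> results = new LinkedHashMap<>();
            for (int i = 0; i < chunks.size(); i++) {
                results.put(chunks.get(i), pending.get(i).get());
            }
            return results;
        } finally {
            pool.shutdown();
        }
    }
}
```

Note that each worker holds its own chunk in flight, so the thread count and the chunk size together determine peak memory use.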

It should be possible to split any GML document into chunks of a desired size at feature member boundaries. Consider a utility class, GmlDocumentSplitter, with a method that accepts the desired chunk size as an input parameter:

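public File splitDocument(File gmlFile, int chunkSizeMB) {
    // Split gmlFile at feature member boundaries into chunks of roughly chunkSizeMB each.
    // return first file in set: dataFile-01 (dataFile-02, dataFile-03, …)
    throw new UnsupportedOperationException("not yet implemented");
}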
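
One way the splitter might be fleshed out is with the JDK's StAX event API, which reads and writes the document as a stream of events. The sketch below is an assumption-laden illustration, not an agreed design: it treats every direct child of the root element as a member boundary (so elements such as a bounding box are copied like any other member), takes the limit in bytes rather than MB to make small experiments easy, names chunks `<base>-01.gml`, `<base>-02.gml`, and returns all chunk files rather than only the first. A chunk can exceed the limit by up to one member, since it is closed only after the member that crosses the limit.

```java
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.FilterOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLEventFactory;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLEventWriter;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.events.StartElement;
import javax.xml.stream.events.XMLEvent;

public class GmlDocumentSplitter {

    /** Counts bytes written so the splitter can tell when a chunk is full. */
    private static final class CountingStream extends FilterOutputStream {
        long count;
        CountingStream(OutputStream out) { super(out); }
        @Override public void write(int b) throws IOException { out.write(b); count++; }
        @Override public void write(byte[] b, int off, int len) throws IOException {
            out.write(b, off, len);
            count += len;
        }
    }

    public static List<File> split(File gmlFile, long maxChunkBytes)
            throws IOException, XMLStreamException {
        XMLInputFactory inFactory = XMLInputFactory.newFactory();
        inFactory.setProperty(XMLInputFactory.SUPPORT_DTD, false); // no DTD processing needed
        XMLOutputFactory outFactory = XMLOutputFactory.newFactory();
        XMLEventFactory events = XMLEventFactory.newFactory();
        File dir = gmlFile.getAbsoluteFile().getParentFile();
        String base = gmlFile.getName().replaceFirst("\\.[^.]+$", "");
        List<File> chunks = new ArrayList<>();
        StartElement root = null;
        XMLEventWriter writer = null;
        CountingStream out = null;
        int depth = 0;
        try (InputStream in = new BufferedInputStream(new FileInputStream(gmlFile))) {
            XMLEventReader reader = inFactory.createXMLEventReader(in);
            while (reader.hasNext()) {
                XMLEvent event = reader.nextEvent();
                if (event.isStartElement()) {
                    depth++;
                    if (depth == 1) {              // root element: remember it, re-emit per chunk
                        root = event.asStartElement();
                        continue;
                    }
                    if (depth == 2 && writer == null) {   // first member of a new chunk
                        File chunk = new File(dir, String.format("%s-%02d.gml", base, chunks.size() + 1));
                        chunks.add(chunk);
                        out = new CountingStream(new BufferedOutputStream(new FileOutputStream(chunk)));
                        writer = outFactory.createXMLEventWriter(out, "UTF-8");
                        writer.add(events.createStartDocument("UTF-8", "1.0"));
                        writer.add(root);          // carries the root's namespace declarations
                    }
                }
                boolean isEnd = event.isEndElement();
                if (isEnd) depth--;
                if (isEnd && depth == 0) break;    // end of the root element
                if (writer != null) writer.add(event);
                if (isEnd && depth == 1 && writer != null) {   // a member just closed
                    writer.flush();
                    if (out.count >= maxChunkBytes) {
                        finish(writer, out, root, events);
                        writer = null;
                    }
                }
            }
            if (writer != null) finish(writer, out, root, events);
        } finally {
            if (out != null) out.close();          // harmless if the chunk was already closed
        }
        return chunks;
    }

    /** Closes the root element and the chunk file. */
    private static void finish(XMLEventWriter writer, OutputStream out, StartElement root,
            XMLEventFactory events) throws IOException, XMLStreamException {
        writer.add(events.createEndElement(root.getName().getPrefix(),
                root.getName().getNamespaceURI(), root.getName().getLocalPart()));
        writer.add(events.createEndDocument());
        writer.close();
        out.close();
    }
}
```

A call such as `GmlDocumentSplitter.split(new File("data.gml"), 50L * 1024 * 1024)` would correspond to the 50 MB chunks mentioned above. Because each chunk repeats the root element with its namespace declarations, every chunk should be a well-formed document that can be validated on its own.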
