Data missing in the result (and other problems) #260

Open
jangxx opened this issue Nov 18, 2015 · 7 comments

jangxx commented Nov 18, 2015

I'm getting this data from an API, but when I parse it with xml2js, fields are added, renamed, and missing.

Input XML:

<?xml version="1.0" encoding="UTF-8"?>
<searchRetrieveResponse xmlns="http://www.loc.gov/zing/srw/"><version>1.1</version><numberOfRecords>1</numberOfRecords><records><record><recordSchema>oai_dc</recordSchema><recordPacking>xml</recordPacking><recordData><dc xmlns:dnb="http://d-nb.de/standards/dnbterms" xmlns="http://www.openarchives.org/OAI/2.0/oai_dc/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <dc:title>Elektronik ohne Ballast : Einf. in d. Schaltungstechnik d. industriellen Elektronik; mit 3 Tab. / von Otto Limann</dc:title>
  <dc:creator>Limann, Otto</dc:creator>
  <dc:publisher>München : Franzis-Verlag</dc:publisher>
  <dc:date>1973</dc:date>
  <dc:identifier xmlns:tel="http://krait.kb.nl/coop/tel/handbook/telterms.html" xsi:type="tel:ISBN">3-7723-5613-3 kart. : DM 30.00</dc:identifier>
  <dc:identifier xsi:type="dnb:IDN">740202677</dc:identifier>
  <dc:subject>20a Technik, Industrie, Gewerbe</dc:subject>
  <dc:format>396 S.</dc:format>
</dc></recordData><recordPosition>1</recordPosition></record></records><nextRecordPosition>2</nextRecordPosition><echoedSearchRetrieveRequest><version>1.1</version><query>3772356133</query><xQuery xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:nil="true"/></echoedSearchRetrieveRequest><extraResponseData><accountOf xmlns="">S*** SRU</accountOf></extraResponseData></searchRetrieveResponse>

The resulting object is unexpectedly large:

{ searchRetrieveResponse: 
   { '$': { xmlns: 'http://www.loc.gov/zing/srw/' },
     version: [ '1.1' ],
     numberOfRecords: [ '1' ],
     records: 
      [ { record: 
           [ { recordSchema: [ 'RDFxml' ],
               recordPacking: [ 'xml' ],
               recordData: 
                [ { 'rdf:RDF': 
                     [ { '$': 
                          { 'xmlns:gndo': 'http://d-nb.info/standards/elementset/gnd#',
                            'xmlns:marcRole': 'http://id.loc.gov/vocabulary/relators/',
                            'xmlns:lib': 'http://purl.org/library/',
                            'xmlns:owl': 'http://www.w3.org/2002/07/owl#',
                            'xmlns:skos': 'http://www.w3.org/2004/02/skos/core#',
                            'xmlns:rdfs': 'http://www.w3.org/2000/01/rdf-schema#',
                            'xmlns:geo': 'http://www.opengis.net/ont/geosparql#',
                            'xmlns:umbel': 'http://umbel.org/umbel#',
                            'xmlns:rdau': 'http://rdaregistry.info/Elements/u/',
                            'xmlns:sf': 'http://www.opengis.net/ont/sf#',
                            'xmlns:rdf': 'http://www.w3.org/1999/02/22-rdf-syntax-ns#',
                            'xmlns:dcterms': 'http://purl.org/dc/terms/',
                            'xmlns:bibo': 'http://purl.org/ontology/bibo/',
                            'xmlns:isbd': 'http://iflastandards.info/ns/isbd/elements/',
                            'xmlns:foaf': 'http://xmlns.com/foaf/0.1/',
                            'xmlns:dc': 'http://purl.org/dc/elements/1.1/' },
                         'rdf:Description': 
                          [ { '$': { 'rdf:about': 'http://d-nb.info/740202677' },
                              'rdf:type': [ { '$': { 'rdf:resource': 'http://purl.org/ontology/bibo/Document' } } ],
                              'dcterms:medium': [ { '$': { 'rdf:resource': 'http://rdvocab.info/termList/RDACarrierType/1044' } } ],
                              'owl:sameAs': [ { '$': { 'rdf:resource': 'http://hub.culturegraph.org/resource/DNB-740202677' } } ],
                              'bibo:isbn10': 
                               [ { _: '3772356133',
                                   '$': { 'rdf:datatype': 'http://www.w3.org/2001/XMLSchema#string' } } ],
                              'rdau:P60521': 
                               [ { _: 'kart. : DM 30.00',
                                   '$': { 'rdf:datatype': 'http://www.w3.org/2001/XMLSchema#string' } } ],
                              'dc:identifier': 
                               [ { _: '(OColc)74126361',
                                   '$': { 'rdf:datatype': 'http://www.w3.org/2001/XMLSchema#string' } } ],
                              'dc:title': 
                               [ { _: 'Elektronik ohne Ballast',
                                   '$': { 'rdf:datatype': 'http://www.w3.org/2001/XMLSchema#string' } } ],
                              'dcterms:creator': [ { '$': { 'rdf:resource': 'http://d-nb.info/gnd/105782297' } } ],
                              'rdau:P60163': 
                               [ { _: 'München',
                                   '$': { 'rdf:datatype': 'http://www.w3.org/2001/XMLSchema#string' } } ],
                              'dc:publisher': 
                               [ { _: 'Franzis-Verlag',
                                   '$': { 'rdf:datatype': 'http://www.w3.org/2001/XMLSchema#string' } } ],
                              'rdau:P60333': 
                               [ { _: 'München : Franzis-Verlag, 1973',
                                   '$': { 'rdf:datatype': 'http://www.w3.org/2001/XMLSchema#string' } } ],
                              'isbd:P1053': 
                               [ { _: '396 S.',
                                   '$': { 'rdf:datatype': 'http://www.w3.org/2001/XMLSchema#string' } } ],
                              'dcterms:issued': 
                               [ { _: '1973',
                                   '$': { 'rdf:datatype': 'http://www.w3.org/2001/XMLSchema#string' } } ],
                              'rdau:P60493': 
                               [ { _: 'Einf. in d. Schaltungstechnik d. industriellen Elektronik; mit 3 Tab.',
                                   '$': { 'rdf:datatype': 'http://www.w3.org/2001/XMLSchema#string' } } ],
                              'bibo:authorList': 
                               [ { 'rdf:Description': 
                                    [ { '$': { 'rdf:nodeID': 'node1a3jmb6mbx1324003' },
                                        'rdf:type': [ { '$': { 'rdf:resource': 'http://www.w3.org/1999/02/22-rdf-syntax-ns#Seq' } } ],
                                        'rdf:_1': [ { '$': { 'rdf:resource': 'http://d-nb.info/gnd/105782297' } } ] } ] } ] } ] } ] } ],
               recordPosition: [ '1' ] } ] } ],
     nextRecordPosition: [ '2' ],
     echoedSearchRetrieveRequest: 
      [ { version: [ '1.1' ],
          query: [ '3772356133' ],
          xQuery: 
           [ { '$': 
                { 'xmlns:xsi': 'http://www.w3.org/2001/XMLSchema-instance',
                  'xsi:nil': 'true' } } ],
          recordSchema: [ 'RDFxml' ] } ],
     extraResponseData: [ { accountOf: [ { _: 'S*** SRU', '$': { xmlns: '' } } ] } ] } }

Not only are there new elements that aren't present in the original XML, such as rdau:P60493, but some, such as dc:creator, are missing and replaced with dcterms:creator, which is a link to a resource. Is this correct behavior? Is there an option to parse the XML 'as-is', ignoring the namespaces (or whatever is causing this)?
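
The mismatch described above can be checked mechanically. The following is a minimal sketch, not part of xml2js: the helper names `elementNamesInXml`, `elementNamesInParsed`, and `unexpectedNames` are hypothetical, and the regex-based tag scan is a rough assumption that ignores CDATA sections and comments. It lists element names that occur in a parsed object but never in the XML text. Any hit suggests the object was produced from a different response than the one being inspected; in the data above, the input reports recordSchema oai_dc while the output reports RDFxml.

```javascript
// Hypothetical helpers for illustration; they are not part of xml2js.

// Collect every element name (including any prefix) used in an opening tag.
function elementNamesInXml(xml) {
  const names = new Set();
  // Matches "<name" or "<prefix:name", but not "</name", "<?xml", or "<!--".
  const openTag = /<([A-Za-z_][\w.\-]*(?::[A-Za-z_][\w.\-]*)?)/g;
  let match;
  while ((match = openTag.exec(xml)) !== null) {
    names.add(match[1]);
  }
  return names;
}

// Collect every element key from an xml2js-style result object.
// '$' (attributes) and '_' (text) are the xml2js default keys for non-element data.
function elementNamesInParsed(node, names = new Set()) {
  if (Array.isArray(node)) {
    node.forEach((child) => elementNamesInParsed(child, names));
  } else if (node !== null && typeof node === 'object') {
    for (const [key, value] of Object.entries(node)) {
      if (key === '$' || key === '_') continue;
      names.add(key);
      elementNamesInParsed(value, names);
    }
  }
  return names;
}

// Names present in the parsed object but absent from the XML text.
function unexpectedNames(xml, parsed) {
  const inXml = elementNamesInXml(xml);
  return [...elementNamesInParsed(parsed)]
    .filter((name) => !inXml.has(name))
    .sort();
}

// Reduced stand-ins for the input and output shown above.
const xml =
  '<records><record><dc:creator>Limann, Otto</dc:creator></record></records>';
const parsed = {
  records: {
    record: [
      { 'dcterms:creator': [{ $: { 'rdf:resource': 'http://d-nb.info/gnd/105782297' } }] },
    ],
  },
};

console.log(unexpectedNames(xml, parsed)); // [ 'dcterms:creator' ]
```

An empty array would mean every element key in the object also appears as a tag in the XML text, which is what one would expect when both come from the same response.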

@jangxx jangxx changed the title Result is missing data Data missing in the result (and other problems) Nov 18, 2015
@kaltri-n

Hi, have you solved this issue yet? I use the same API and have the same problems.

jangxx commented Mar 22, 2018

No, I switched to marc4js, which was able to parse the data correctly.

Edit: Important to note: I was parsing bibliographic data, which was available in many different formats. The one I tried and failed to parse above was Dublin Core. Not only did I switch to marc4js, I also changed my requests to request data in MARCXML format.

@Leonidas-from-XIV
Owner

I am very confused, since there isn't really a way that xml2js would invent new elements.

jangxx commented Mar 28, 2018

Well, I share your confusion, which is why I raised this issue in the first place.

@Leonidas-from-XIV
Owner

Can you post some minimal code to reproduce?

jangxx commented Mar 31, 2018

I'm gonna pass that question on to @kaltri-n since they seem to be currently using (or at least trying to use) this library. The code with which I initially stumbled across this issue is long gone.

kaltri-n commented Apr 3, 2018

Hi guys,
I failed to parse the data in Dublin Core, and so far I haven't looked at other formats. So, unfortunately, I can't be of help here :(
