Unable to parse an XML buffer containing an encoded URL using parseString #498

Open
chetan-plrch opened this issue Jan 23, 2019 · 7 comments


@chetan-plrch

No description provided.

@chetan-plrch
Author

Got this error:

Failed Error parsing, in parseString Error: Invalid character in entity name
Line: 64
Column: 285
Char: %
    at error (/home/chetan/company/lambdas/cmsSitemapCreator/lambda/node_modules/sax/lib/sax.js:667:10)
    at strictFail (/home/chetan/company/lambdas/cmsSitemapCreator/lambda/node_modules/sax/lib/sax.js:693:7)
    at SAXParser.write (/home/chetan/company/lambdas/cmsSitemapCreator/lambda/node_modules/sax/lib/sax.js:1504:13)
    at Parser.exports.Parser.Parser.parseString (/home/chetan/company/lambdas/cmsSitemapCreator/lambda/node_modules/xml2js/lib/parser.js:322:31)
    at Parser.parseString (/home/chetan/company/lambdas/cmsSitemapCreator/lambda/node_modules/xml2js/lib/parser.js:5:59)
    at exports.parseString (/home/chetan/company/lambdas/cmsSitemapCreator/lambda/node_modules/xml2js/lib/parser.js:354:19)
    at Promise (/home/chetan/company/lambdas/cmsSitemapCreator/lambda/sitemapHandler.js:233:9)
    at new Promise (<anonymous>)
    at convertXmlToJson (/home/chetan/company/lambdas/cmsSitemapCreator/lambda/sitemapHandler.js:232:12)
    at editEntriesFromSitemap (/home/chetan/company/lambdas/cmsSitemapCreator/lambda/sitemapHandler.js:20:32)
    at <anonymous>
    at process._tickDomainCallback (internal/process/next_tick.js:229:7)

@Leonidas-from-XIV
Owner

Looks like you have an entity with invalid characters in your XML. What does the XML look like at the location the error message reports?

@chetan-plrch
Author

I am parsing an XML buffer.

@chetan-plrch
Author

chetan-plrch commented Jan 24, 2019

<Buffer 3c 3f 78 6d 6c 20 76 65 72 73 69 6f 6e 3d 22 31 2e 30 22 20 65 6e 63 6f 64 69 6e 67 3d 22 55 54 46 2d 38 22 3f 3e 0a 3c 3f 78 6d 6c 2d 73 74 79 6c 65 ... >
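The bytes shown decode to `<?xml version="1.0" encoding="UTF-8"?>`, a newline, and the start of an `<?xml-style` processing instruction (the dump is cut off there), so the Buffer looks like ordinary UTF-8 text. To see what the XML looks like where the parser failed, one option is to decode the Buffer and list every `&` that does not begin a valid entity or character reference, since that is what a strict parser rejects with "Invalid character in entity name". This is a minimal sketch: `findBareAmpersands` is a hypothetical helper, the sample Buffer is made up, and the line/column values it reports are 1-based, which may not match the numbering sax uses in its error message.

```javascript
// Hypothetical helper: decode an XML Buffer and report the 1-based
// line/column of every "&" that does NOT start a well-formed entity
// reference (&amp;) or character reference (&#38;, &#x26;).
function findBareAmpersands(buf) {
  const text = buf.toString('utf8');
  const bare = /&(?!(?:[A-Za-z_][\w.-]*|#\d+|#x[0-9A-Fa-f]+);)/g;
  const hits = [];
  text.split('\n').forEach((line, i) => {
    for (const m of line.matchAll(bare)) {
      hits.push({
        line: i + 1,
        column: m.index + 1,
        // A little surrounding text so the offending spot is easy to find.
        context: line.slice(Math.max(0, m.index - 20), m.index + 20),
      });
    }
  });
  return hits;
}

// Made-up sample: line 2 is already escaped, line 3 has a bare "&".
const buf = Buffer.from(
  '<?xml version="1.0" encoding="UTF-8"?>\n' +
    '<urlset><url><loc>https://example.com/a%20&amp;%20b</loc></url>\n' +
    '<url><loc>https://example.com/c%20&%20d</loc></url></urlset>\n',
  'utf8'
);

console.log(findBareAmpersands(buf));
```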

@chetan-plrch
Author

chetan-plrch commented Jan 24, 2019

<urlset
    xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
    xmlns:news="http://www.google.com/schemas/sitemap-news/0.9"
    xmlns:xhtml="http://www.w3.org/1999/xhtml"
    xmlns:mobile="http://www.google.com/schemas/sitemap-mobile/1.0"
    xmlns:image="http://www.google.com/schemas/sitemap-image/1.1"
    xmlns:video="http://www.google.com/schemas/sitemap-video/1.1">
    <url>
        <loc>https://domainname.com/ideal-y5vrwp5ia7</loc>
        <lastmod>2018-12-26T09:23:01.743Z</lastmod>
        <image:image>
            <image:loc>https://images.domainname.com/cs/1/b3/Are%20the%20ideal%20features%20&%20functionalities%20of%20Lab%20Information%20Management%20Webiste%20and%20its%20Mobile%20app1545628606621.jpg</image:loc>
        </image:image>
    </url>
</urlset>
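The likely culprit is the bare `&` inside `<image:loc>` (in `features%20&%20functionalities`). In XML, `&` always starts an entity or character reference, so the parser reads the next character, `%`, as part of an entity name and rejects it, which is consistent with `Char: %` in the error. As found above, `strict: false` makes sax tolerate this, but the cleaner fix is for whatever generates the sitemap to write `&amp;`. If the input can't be fixed at the source, a preprocessing step like the following may help. It is a minimal sketch: `escapeBareAmpersands` is a hypothetical helper, and it assumes any `&name;`, `&#NNN;` or `&#xHH;` sequence is already a valid reference and leaves it alone. After parsing, sax decodes `&amp;` back to `&`, so the resulting URL string is unchanged.

```javascript
// Hypothetical helper: escape every "&" that does not already begin a
// well-formed entity or character reference (&amp;, &#38;, &#x26;, ...),
// so the document can be parsed in strict mode.
function escapeBareAmpersands(xml) {
  return xml.replace(/&(?!(?:[A-Za-z_][\w.-]*|#\d+|#x[0-9A-Fa-f]+);)/g, '&amp;');
}

// Shortened version of the <image:loc> value from the sitemap above.
const loc =
  '<image:loc>https://images.domainname.com/cs/1/b3/' +
  'Are%20the%20ideal%20features%20&%20functionalities.jpg</image:loc>';

console.log(escapeBareAmpersands(loc));
// <image:loc>https://images.domainname.com/cs/1/b3/Are%20the%20ideal%20features%20&amp;%20functionalities.jpg</image:loc>
```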

@chetan-plrch
Author

I had to use strict: false and normalizeTags: true, and then everything worked fine. However, the xmlns attributes now appear in uppercase, which I didn't expect.

@chetan-plrch
Author

How can I make the xmlns attributes lowercase when { strict: false, normalizeTags: true } is passed as options to parseString?

Output

   { '$': 
      { XMLNS: 'http://www.sitemaps.org/schemas/sitemap/0.9',
        'XMLNS:NEWS': 'http://www.google.com/schemas/sitemap-news/0.9',
        'XMLNS:XHTML': 'http://www.w3.org/1999/xhtml',
        'XMLNS:MOBILE': 'http://www.google.com/schemas/sitemap-mobile/1.0',
        'XMLNS:IMAGE': 'http://www.google.com/schemas/sitemap-image/1.1',
        'XMLNS:VIDEO': 'http://www.google.com/schemas/sitemap-video/1.1' },
     url: 
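In non-strict mode sax uppercases both element and attribute names; normalizeTags only lowercases element names, which is why the xmlns attributes stay uppercase. The xml2js README documents an `attrNameProcessors` option and a built-in `xml2js.processors.normalize` that lowercases a name, so adding `attrNameProcessors: [xml2js.processors.normalize]` to the options may be enough on versions that support it (please check yours). As a version-independent fallback, here is a minimal sketch that lowercases the keys of every `$` attribute object after parsing. `lowercaseAttrNames` is a hypothetical helper, and it assumes the default `attrkey` of `$`.

```javascript
// Hypothetical post-processing step: walk the object produced by
// xml2js and lowercase every key inside "$" (attribute) objects.
// Element names are left alone (normalizeTags already lowercases them).
function lowercaseAttrNames(node) {
  if (Array.isArray(node)) {
    return node.map(lowercaseAttrNames);
  }
  if (node === null || typeof node !== 'object') {
    return node; // text content and other primitives pass through
  }
  const out = {};
  for (const [key, value] of Object.entries(node)) {
    if (key === '$' && value && typeof value === 'object') {
      const attrs = {};
      for (const [name, attrValue] of Object.entries(value)) {
        attrs[name.toLowerCase()] = attrValue;
      }
      out[key] = attrs;
    } else {
      out[key] = lowercaseAttrNames(value);
    }
  }
  return out;
}

// Shape modeled on the output above (attributes shortened).
const parsed = {
  urlset: {
    $: {
      XMLNS: 'http://www.sitemaps.org/schemas/sitemap/0.9',
      'XMLNS:IMAGE': 'http://www.google.com/schemas/sitemap-image/1.1',
    },
    url: [{ loc: ['https://domainname.com/ideal-y5vrwp5ia7'] }],
  },
};

console.log(lowercaseAttrNames(parsed).urlset.$);
// keys are now 'xmlns' and 'xmlns:image'
```

Applied to the object printed above, the keys would become xmlns, xmlns:news, xmlns:xhtml, and so on, with the namespace URLs unchanged.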
