0.1 to 0.4 upgrade big slowdown #157

Open
Redsandro opened this issue Sep 11, 2014 · 27 comments

Comments

@Redsandro

Hi people,

I have been using node-xml2js v0.1.14's xml2json.Parser().parseString without further options on XML files which kept getting bigger. These files are now up to ~2MB and take nearly a second to translate to JS.

So I thought, let's update and see if there are speed improvements. However, after I updated node-xml2js to v0.4.4, I noticed that instead of a speed increase, I get a big slowdown where it now takes 8 to 10 seconds.

Are there some special options necessary in the 0.4 version? Or is this module just not meant for bigger XML? Something else I'm missing?

For now I'm downgrading to the much faster version 0.1, and I'll continue my search for fast converters.

@Leonidas-from-XIV
Owner

Are you using the 0.1 settings or 0.2 settings in xml2js?

@Redsandro
Author

I don't set any options/settings, so I am guessing it's the 0.1 settings since I am using 0.1.14. (?)

After using multiple measurements, I have to correct myself and say that 0.4.4 is 'only' about twice as slow as 0.1.14 in the above issue.

@Leonidas-from-XIV
Owner

Then please try with the 0.1 settings in 0.4 (see the README) and let me know what the numbers look like.

@Redsandro
Author

Ok. I'll get back to you.

@Redsandro
Author

Trying 0.1.14...
Done.
1998 ms
Trying 0.4.4 using 0.1 defaults...
Done.
3298 ms
Trying 0.4.4 using 0.2 defaults...
Done.
3836 ms

var path        = require('path');
var fs          = require('fs');
var q           = require('q');
    q.longStackSupport = true;

var xml2js01    = require('xml2js0114');
var xml2js04    = require('xml2js');
var p01         = new xml2js01.Parser();
var p04a        = new xml2js04.Parser(xml2js04.defaults["0.1"]);
var p04b        = new xml2js04.Parser(xml2js04.defaults["0.2"]);

var fileName    = 'soccer.xml';
var timeMs;

var pwd         = path.dirname(require.main.filename);
var file        = path.join(pwd, fileName);

var xml         = fs.readFileSync(file, 'utf8');




q(true)
.then(function() {
    console.log('Trying 0.1.14...');
    stopwatch(false);

    return q.nfcall(p01.parseString, xml);
})
.then(function(json) {
    console.log('Done.');
    stopwatch();

    return;
})
.then(function() {
    console.log('Trying 0.4.4 using 0.1 defaults...');
    stopwatch(false);

    return q.nfcall(p04a.parseString, xml);
})
.then(function(json) {
    console.log('Done.');
    stopwatch();

    return;
})
.then(function() {
    console.log('Trying 0.4.4 using 0.2 defaults...');
    stopwatch(false);

    return q.nfcall(p04b.parseString, xml);
})
.then(function(json) {
    console.log('Done.');
    stopwatch();

    return;
})
.fail(function(e){
    console.log(e.stack);
})
.done();



function stopwatch(log) {
    if (timeMs && log !== false)
        console.log((Date.now() - timeMs) + ' ms');

    timeMs = Date.now();
}

These timings are fluctuating +/- 200ms.
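
One way to reduce run-to-run noise like this is to time several runs per parser and report the mean, minimum and maximum. A minimal sketch, assuming a hypothetical `benchmark` helper (not part of xml2js or the script above) and a stand-in `fakeParse` function where a real `parser.parseString` would go:

```javascript
// Hypothetical helper: times a Node-style (input, callback) function over
// several runs using process.hrtime, then reports summary statistics.
function benchmark(label, fn, input, runs, done) {
    var samples = [];

    function next() {
        if (samples.length === runs) {
            var total = samples.reduce(function (a, b) { return a + b; }, 0);
            return done(null, {
                label: label,
                runs: runs,
                meanMs: total / runs,
                minMs: Math.min.apply(null, samples),
                maxMs: Math.max.apply(null, samples)
            });
        }

        var start = process.hrtime();
        fn(input, function (err) {
            if (err) return done(err);
            var diff = process.hrtime(start);            // [seconds, nanoseconds]
            samples.push(diff[0] * 1e3 + diff[1] / 1e6); // convert to milliseconds
            next();
        });
    }

    next();
}

// Stand-in for a parser; a real run would pass e.g. p04a.parseString instead.
function fakeParse(xml, callback) {
    callback(null, { length: xml.length });
}

benchmark('stand-in parser', fakeParse, '<a><b>1</b></a>', 10, function (err, stats) {
    if (err) return console.log(err.stack);
    console.log(stats.label + ': mean ' + stats.meanMs.toFixed(3) + ' ms over ' + stats.runs + ' runs');
});
```

Reporting the spread alongside the mean makes it easier to tell whether a difference between two parsers is larger than the noise.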

@Leonidas-from-XIV
Owner

Hmm, yes, for now I'd recommend a faster XML parser. Maybe once I get to finish the htmlparser2 port it will get faster, but that one is not a big priority right now, sorry.

@Redsandro
Author

Do you know any other parsers? For now I put node-xml2json and node-xml2object in the same testing setup and they perform equally at best. There was one, simple-xml-to-json (iirc), that performed so badly that I removed it from the tests. I also tried a binary non-node parser called xml-json but it also took twice as long.

One would think that a node module using a binary component could outperform pure javascript modules but the availability is meager at best. So I am hoping I'm missing something. There's a lot of XML out there.

@Leonidas-from-XIV
Owner

You could try node-xml2js-expat, which was forked before I took over, so it is at the state of xml2js 0.1.x but replaces sax-js with Expat, which is written in C.

For me one of the priorities had been to go without native compilation, but if you need the speed, xml2js is admittedly not the best choice.

@Redsandro
Author

Thank you. I will add this to my tests.

And as general words of praise, especially for smaller XML files, xml2js has always been friendly to me.

Just curious, since expat has SAX bindings, could node-expat be a relatively hassle-free drop-in replacement to sax-js in xml2js? An option or switch in source to use a compiled parser might make some people happy; those working with XML files that started young and light but have grown old and ugly. ;)

@Leonidas-from-XIV
Owner

It might be possible, I haven't given this any thought. You sure the node-expat binding exports the SAX API?

@Redsandro
Author

Actually I am not sure. There is no wiki and I cannot find any documentation.

But the title of the repo is:

node-xmpp/node-expat

libexpat XML SAX parser binding for node.js

And since SAX implies API (Simple API for XML) I blatantly assumed it did. :P

@csimi

csimi commented Mar 8, 2016

If anyone is still interested, EasySax seems to be a damn fast parser, written in JS.
I had to fix some bugs in easysax myself and integrate it into xml2js but it sped up things pretty well.

sax x 57,812 ops/sec ±7.41% (78 runs sampled)
node-xml x 76,807 ops/sec ±1.75% (87 runs sampled)
libxmljs x 163,375 ops/sec ±2.58% (88 runs sampled)
node-expat x 201,663 ops/sec ±0.76% (84 runs sampled)
easysax x 828,169 ops/sec ±2.59% (86 runs sampled)

I feel there has to be a catch somewhere but I don't see it yet, other than the weird characters in the documentation.

@Redsandro
Author

Seems to be streaming. Doesn't convert to object itself.
(Read: not a drop-in replacement for node-xml2js)
Am I wrong?

@csimi

csimi commented Mar 8, 2016

Yeah, this is regarding the discussion about the sax-js "backend" of xml2js and replacing it with node-expat.
I've seen a drop in CPU usage of around 50% after replacing sax-js with easysax. Much of the time is still spent on building an object of the whole XML, so I'm thinking about just straight-up using the SAX parser.
We're saturating our 100Mbps proxy VMs with gzipped XML files, so for parsing that much data every little speedup counts.

@tflanagan
Contributor

How does easysax (or any of these other libs) work with browserify?

@Leonidas-from-XIV
Owner

@csimi Does your version with the easysax backend pass the unit tests? I am by no means married to sax-js, I just want to avoid a dependency that has to be compiled.

On the other hand, if you saturate your network connection, serializing XML into an object is probably going to be inherently expensive. If you have high performance in mind, a streaming solution like raw SAX or similar might indeed be preferable.

@tflanagan
Contributor

@Leonidas-from-XIV, if you replace the parser with one that does not work 100% with browserify, then I will be forced to fork it.

Can we introduce an external hook rather than outright replacing it? Side effects could be huge - If one of these replacement libs is actually async, unlike sax-js (because of eventemitter), then a lot of people will come screaming.
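
To illustrate the concern about asynchronous backends: Node's EventEmitter calls listeners synchronously inside `emit`, so a backend that emits directly has already delivered its events by the time the call returns, while one that defers emission has not. A minimal sketch, assuming two hypothetical stand-in backends (`syncBackend` and `asyncBackend`; neither is a real parser or part of xml2js):

```javascript
var EventEmitter = require('events').EventEmitter;

// Hypothetical backend that emits 'end' synchronously inside write().
function syncBackend() {
    var em = new EventEmitter();
    em.write = function (xml) { em.emit('end', xml.length); };
    return em;
}

// Hypothetical backend that defers 'end' to a later turn of the event loop.
function asyncBackend() {
    var em = new EventEmitter();
    em.write = function (xml) {
        setImmediate(function () { em.emit('end', xml.length); });
    };
    return em;
}

// Records when the 'end' event fires relative to write() returning.
var order = [];
function run(label, backend) {
    backend.on('end', function () { order.push(label + ': end event'); });
    backend.write('<a/>');
    order.push(label + ': write() returned');
}

run('sync', syncBackend());
run('async', asyncBackend());

// Print once the deferred event has had a chance to fire.
setImmediate(function () { console.log(order.join('\n')); });
```

With the synchronous backend the event is recorded before `write()` returns; with the deferred one it arrives afterwards. Code that reads a result immediately after calling the parser would break under the second ordering.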

@Leonidas-from-XIV
Owner

@tflanagan I understand your issue and will try to take it into account.

Your complaint is actually why I would be against supporting multiple backends: the semantics of the library might change in unforeseen ways and some people will be surprised. Trying to have the same behaviour everywhere might end up an uphill battle for little benefit. I'd rather have one solid backend that works for everybody.

@kyrylkov

@Redsandro The Russian intro on the easysax page actually says it's not streaming.

@kyrylkov

@csimi can you publish your xml2js + easysax repo?

@kyrylkov

My run on a 46kB XML file:

sax x 66.98 ops/sec ±2.62% (50 runs sampled)
node-xml x 70.33 ops/sec ±2.56% (54 runs sampled)
libxmljs x 162 ops/sec ±2.46% (62 runs sampled)
node-expat x 116 ops/sec ±3.50% (59 runs sampled)
ltx x 236 ops/sec ±3.67% (60 runs sampled)
EasySax x 1,167 ops/sec ±4.62% (59 runs sampled)
Fastest is EasySax

@Redsandro
Author

@kyrylkov just for comparison, can you add RapidX2J to this benchmark? It will illustrate using a RapidXML backend.

@kyrylkov

@Redsandro It doesn't seem to compile with Node.js 5.8.0? Does it support Node.js 4.x and 5.x?

@Redsandro
Author

@kyrylkov oops no idea actually. I'm running legacy (pre io.js-post-fork-merge) node.js for production reasons.

@Redsandro
Author

@kyrylkov Compiles on 4.x according to dev:

damirn/rapidx2j#17

it should work with node 4.4.0:

git clone https://github.com/damirn/rapidx2j.git
npm install rapidx2j/

I just tried it on mac os x w/o issues

@csimi

csimi commented Mar 17, 2016

I'll try to put something usable together during the weekend.
More benchmarks:

xml2js x 2.33 ops/sec ±27.49% (10 runs sampled)
xml2js easysax x 11.31 ops/sec ±19.67% (18 runs sampled)
rapidx2j x 28.87 ops/sec ±16.22% (30 runs sampled)
easysax x 129 ops/sec ±15.94% (30 runs sampled)

The easysax bench doesn't actually build a js object, just runs a SAX pass over my 215KiB XML test file (fastest mode, without parsing attributes, etc).

Rapidx2j is pretty fast, but the time it takes to actually build a JS object is clearly visible.

I feel like EasySax is too fast (compared to expat for example) to be standards compliant. There has to be a catch somewhere.

@Redsandro
Author

how much time it takes to actually build a JS object is clearly visible.

How do you mean? I need to create that JS object anyhow, so I prefer the module takes care of that. In the case of rapidx2j, the js object is built in the compiled code. I think that's why it's so much faster. In the case of node-xml2js, it happens in javascript, which is slower. I believe this overhead will be added to EasySax.

Is there a (simple) way to use EasySax for getting the JS object out of xml? I'd like to see the time it takes including building the JS object. But in the docs it seems to just throw events on every element encountered (that's what I assumed was streaming, as SAX works like that too).

So far, rapidx2j seems the fastest but I cannot use it without forking, as the current module implements a custom lossy standard, changing caps and such.
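
On the question of getting a JS object out of per-element SAX events: the usual approach is to keep a stack of open elements and attach each finished element to its parent when its closing tag arrives. A minimal sketch, assuming a hypothetical `createObjectBuilder` with `openTag`, `text` and `closeTag` handlers. These names are illustrative and are not the EasySax or sax-js API; the events are fed in by hand here rather than by a real parser, and the output shape (`$` for attributes, `_` for mixed text, arrays for children) is only one possible convention:

```javascript
// Hypothetical builder: turns SAX-style events into a nested JS object.
function createObjectBuilder() {
    var root = {};
    // Each frame tracks an open element, its object and its accumulated text.
    var stack = [{ name: null, node: root, text: '' }];

    return {
        openTag: function (name, attrs) {
            var node = {};
            if (attrs && Object.keys(attrs).length) node.$ = attrs;
            stack.push({ name: name, node: node, text: '' });
        },
        text: function (chunk) {
            stack[stack.length - 1].text += chunk;
        },
        closeTag: function () {
            var frame = stack.pop();
            var parent = stack[stack.length - 1].node;
            var text = frame.text.trim();
            var value;
            if (Object.keys(frame.node).length === 0) {
                value = text;                       // text-only element
            } else {
                if (text) frame.node._ = text;      // mixed content
                value = frame.node;
            }
            // Children are always collected into arrays, keyed by tag name.
            (parent[frame.name] = parent[frame.name] || []).push(value);
        },
        result: function () { return root; }
    };
}

// Events for <team id="7"><player>Alice</player><player>Bob</player></team>
var builder = createObjectBuilder();
builder.openTag('team', { id: '7' });
builder.openTag('player', {});
builder.text('Alice');
builder.closeTag();
builder.openTag('player', {});
builder.text('Bob');
builder.closeTag();
builder.closeTag();

console.log(JSON.stringify(builder.result()));
// prints {"team":[{"$":{"id":"7"},"player":["Alice","Bob"]}]}
```

Timing a SAX pass with handlers like these attached would include the object-building cost that the raw SAX numbers leave out.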


5 participants