Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Handling HTML Entities #450

Closed
boxfoot opened this issue Apr 27, 2018 · 7 comments
Closed

Handling HTML Entities #450

boxfoot opened this issue Apr 27, 2018 · 7 comments

Comments

@boxfoot
Copy link

boxfoot commented Apr 27, 2018

I'm using xml2js to transform an XML file that I don't have much control over. I don't want to make any changes to the XML besides the specific changes I'm trying to make.

I'm running into an issue with HTML entities where build(parse(xml)) is not a reversible process. Easiest to show with an example:

var xml2js = require('xml2js');
var builder = new xml2js.Builder({headless:true});
var xml = "<root>Hello &quot;xml2js&quot;!\r\n</root>"

xml2js.parseString(xml, function (err, result) {
    console.log(builder.buildObject(result));
});
// logs: "<root>Hello \"xml2js\"!&#xD;\n</root>"

As you can see, there are two issues here:

  1. &quot; in the input is transformed to \"
  2. \r in the input is transformed to #xD;

I've been looking through the api options and can't figure out a way to hook in and change this behavior. Any ideas, or is this a bug?

@boxfoot
Copy link
Author

boxfoot commented Apr 27, 2018

Dug around some more and realized I could fix this if able to pass stringify options through to xmlbuilder. I made PR #451 to do that, which enables my desired behavior like this:

var xml2js = require('xml2js');

var builder = new xml2js.Builder({
	headless: true,
	stringify: {
		elEscape: function (str) {
			var ampregex;
			ampregex = this.noDoubleEncoding ? /(?!&\S+;)&/g : /&/g;
			return str
				.replace(ampregex, '&amp;')
				.replace(/</g, '&lt;')
				.replace(/>/g, '&gt;')
				.replace(/"/g, '&quot;')
		}
	}
});

var xml = "<root>Hello &quot;xml2js&quot;!\r\n</root>";

xml2js.parseString(xml, function (err, result) {
    console.log(builder.buildObject(result));
});
// logs: "<root>Hello &quot;xml2js&quot;!\r\n</root>"

@jcsahnwaldt
Copy link
Contributor

I don't think it makes sense for xml2js to handle these differences, because in a way, there are no differences. Any reasonable XML parser will treat the following inputs exactly the same:

<root>"xml2js"!</root>
<root>&quot;xml2js&quot;!</root>
<root>&#x22;xml2js&#x22;!</root>
<root>&#x22;xml2js&#x22;&#x21;</root>
<root>&#x22;&#x78;&#x6D;&#x6C;&#x32;&#x6A;&#x73;&#x22;&#x21;</root>

Of course, these look like very different strings, but interpreted according to the XML specification, they are equivalent - they simply use different encodings for certain characters. It's the same for whitespace: CRLF can be represented as \r\n in JavaScript, but as &#xD;&#xA; or &#13;&#10; (and several other ways) in XML.

If you run into problems because of such syntactic differences, you should change your approach to handling XML. Saying that &quot; and " are different in XML text content is like saying these two JavaScript objects are different:

{foo: "bar"}
{foo: 'bar'}

@jcsahnwaldt
Copy link
Contributor

On the other hand, if xmlbuilder supports it and it's simple to pass through the configuration to xmlbuilder, why not?

@boxfoot
Copy link
Author

boxfoot commented Jun 10, 2018

@jcsahnwaldt While technically these are equivalent but for encodings, the fact remains that other systems may treat them differently. The main use case for XML is data exchange, and unfortunately (despite your assessment of "any reasonable XML parser...") -- we don't always have control over how other systems generate and consume their XML. I expect that's why xmlbuilder gives options on how to encode and serialize. Since it's easy to pass through those options, I'd recommend making it easy to do.

@jcsahnwaldt
Copy link
Contributor

Basically the same thing: #165

@boxfoot
Copy link
Author

boxfoot commented Jun 15, 2018

@jcsahnwaldt Since I've prepped PR #451 to address this, would you be open to reviewing that PR? Sounds like it would close this and the other issue I duplicated...

@Leonidas-from-XIV
Copy link
Owner

I agree with @jcsahnwaldt here. The fact which dependencies xml2js uses should not be what defines xml2js's public interface so passing in options to xmlbuilder-js breaks the layering (in a Law of Demeter kind of way).

Especially since as it was said, the XML generated is valid and in correct parsers equivalent.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants