html-dom-parser

HTML to DOM parser that works on both the server (Node.js) and the client (browser):

HTMLDOMParser(string[, options])

The parser converts an HTML string to a JavaScript object that describes the DOM tree.

Example

const parse = require('html-dom-parser');
parse('<p>Hello, World!</p>');

Output:

[
  Element {
    type: 'tag',
    parent: null,
    prev: null,
    next: null,
    startIndex: null,
    endIndex: null,
    children: [
      Text {
        type: 'text',
        parent: [Circular],
        prev: null,
        next: null,
        startIndex: null,
        endIndex: null,
        data: 'Hello, World!'
      }
    ],
    name: 'p',
    attribs: {}
  }
]
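The `[Circular]` marker in the output reflects the `parent` back-reference from each child to its element, which means the tree cannot be passed to `JSON.stringify` directly. The following is a minimal sketch of one way to get a serializable copy. It runs on a hand-built stand-in with the same shape as the output above, so it does not need the library, and `toPlain` is a hypothetical helper, not part of html-dom-parser.

```javascript
// Hand-built stand-in mirroring the output above (assumption: real nodes are
// class instances, but they expose the same own properties).
const p = { type: 'tag', name: 'p', attribs: {}, parent: null, prev: null, next: null, children: [] };
const text = { type: 'text', data: 'Hello, World!', parent: p, prev: null, next: null };
p.children.push(text);
const helloTree = [p];

// Hypothetical helper: copy a node without its parent/prev/next links,
// recursing into children when the node has any.
function toPlain(node) {
  const { parent, prev, next, children, ...rest } = node;
  return children ? { ...rest, children: children.map(toPlain) } : rest;
}

const json = JSON.stringify(helloTree.map(toPlain));
console.log(json);
```

Dropping the sibling and parent links loses no information here, because they can be rebuilt from the `children` arrays.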

Replit | JSFiddle | Examples

Install

NPM:

npm install html-dom-parser --save

Yarn:

yarn add html-dom-parser

CDN:

<script src="https://unpkg.com/html-dom-parser@latest/dist/html-dom-parser.min.js"></script>
<script>
  window.HTMLDOMParser(/* string */);
</script>

Usage

Import or require the module:

// ES Modules
import parse from 'html-dom-parser';

// CommonJS
const parse = require('html-dom-parser');

Parse empty string:

parse('');

Output:

[]

Parse string:

parse('Hello, World!');

Output:

[
  Text {
    type: 'text',
    parent: null,
    prev: null,
    next: null,
    startIndex: null,
    endIndex: null,
    data: 'Hello, World!'
  }
]

Parse element with attributes:

parse('<p class="foo" style="color: #bada55">Hello, <em>world</em>!</p>');

Output:

[
  Element {
    type: 'tag',
    parent: null,
    prev: null,
    next: null,
    startIndex: null,
    endIndex: null,
    children: [ [Text], [Element], [Text] ],
    name: 'p',
    attribs: { class: 'foo', style: 'color: #bada55' }
  }
]
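Because the result is a plain array of nodes with nested `children`, it can be traversed with ordinary recursion. The sketch below collects the text content of the tree shown above. The tree is written out by hand (with the link properties omitted for brevity) so the snippet runs without the library, and `getText` is a hypothetical helper introduced only for illustration.

```javascript
// Hand-built stand-in for: <p class="foo" style="color: #bada55">Hello, <em>world</em>!</p>
const fooTree = [
  {
    type: 'tag',
    name: 'p',
    attribs: { class: 'foo', style: 'color: #bada55' },
    children: [
      { type: 'text', data: 'Hello, ' },
      { type: 'tag', name: 'em', attribs: {}, children: [{ type: 'text', data: 'world' }] },
      { type: 'text', data: '!' },
    ],
  },
];

// Hypothetical helper: concatenate the data of every text node, depth first.
// Nodes without children (or with other types) contribute an empty string.
function getText(nodes) {
  return nodes
    .map((node) => (node.type === 'text' ? node.data : getText(node.children || [])))
    .join('');
}

console.log(getText(fooTree)); // Hello, world!
```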

The server parser is a wrapper of htmlparser2's parseDOM, but with the root parent node excluded. The Options section below shows the options you can use with the server parser.

The client parser mimics the server parser by using the DOM API to parse the HTML string.

Options (server only)

Because the server parser is a wrapper of htmlparser2, which implements domhandler, you can alter how the server parser parses your code with the following options:

/**
 * These are the default options used if you omit the optional options object.
 * htmlparser2 passes the same options object to its domhandler, so the options
 * for both should be combined into a single object like so:
 */
const options = {
    /**
     * Options for the domhandler class.
     * https://github.com/fb55/domhandler/blob/master/src/index.ts#L16
     */
    withStartIndices: false,
    withEndIndices: false,
    /**
     * Shared option: xmlMode is read by both domhandler and htmlparser2.
     * An object can hold the key only once, so a single value applies to both.
     */
    xmlMode: false,
    /**
     * Options for the htmlparser2 class.
     * https://github.com/fb55/htmlparser2/blob/master/src/Parser.ts#L104
     */
    decodeEntities: true,
    lowerCaseTags: true, // !xmlMode by default
    lowerCaseAttributeNames: true, // !xmlMode by default
    recognizeCDATA: false, // xmlMode by default
    recognizeSelfClosing: false // xmlMode by default
    // Tokenizer: Tokenizer // htmlparser2's own Tokenizer class is used by default
};
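The `!xmlMode by default` and `xmlMode by default` comments above mean that several defaults depend on the value of `xmlMode`. As a rough illustration of that relationship, the sketch below computes an effective options object. `resolveOptions` is a hypothetical helper written for this README, not a function exported by html-dom-parser or htmlparser2, and the real libraries may resolve their defaults differently.

```javascript
// Hypothetical helper: derive the effective options from a partial options object.
function resolveOptions(options = {}) {
  const xmlMode = options.xmlMode ?? false;
  return {
    withStartIndices: false,
    withEndIndices: false,
    decodeEntities: true,
    lowerCaseTags: !xmlMode, // lowercased unless in XML mode
    lowerCaseAttributeNames: !xmlMode, // lowercased unless in XML mode
    recognizeCDATA: xmlMode, // only recognized in XML mode
    recognizeSelfClosing: xmlMode, // only recognized in XML mode
    ...options, // explicit values win over the derived defaults
    xmlMode,
  };
}

console.log(resolveOptions({ xmlMode: true }).lowerCaseTags); // false
```

An explicit value always wins in this sketch: `resolveOptions({ recognizeSelfClosing: true })` keeps `xmlMode: false` but still enables self-closing tags.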

If you are parsing HTML with SVG code, you can set lowerCaseTags to false without having to enable xmlMode. Keep in mind that this preserves the case of every tag name as written in the source (including camelCase SVG names such as foreignObject) instead of normalizing to the HTML standard of lowercase.

Note: If you are parsing code client-side (in-browser), you cannot control the parsing options. Client-side parsing automatically returns some tags in camelCase, such as specific SVG elements, and returns all other tags lowercased according to the HTML standard.

Testing

Run server and client tests:

npm test

Generate HTML coverage report for server tests:

npx nyc report --reporter=html

Lint files:

npm run lint
npm run lint:fix

Test TypeScript declaration file for style and correctness:

npm run lint:dts

Migration

v3.0.0

domhandler has been upgraded to v5, so some parser options, such as normalizeWhitespace, have been removed.

Security contact information

To report a security vulnerability, please use the Tidelift security contact. Tidelift will coordinate the fix and disclosure.
