
Fastest JavaScript implementation of the porter2 stemming algorithm



eilvelia/porter2.js


porter2

Fast JavaScript implementation of the porter2 English stemming algorithm.

$ npm install porter2

Usage

The package is simple: it has no dependencies and exports a single function named stem.

Import using CommonJS:

const { stem } = require('porter2')

Or import using ECMAScript modules (through interoperability with CommonJS):

import { stem } from 'porter2'

Use the stemmer:

const word = stem('animadversion')
console.log(word) //=> animadvers

This stemmer expects lowercase text.
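Because the stemmer expects lowercase input, mixed-case text needs to be normalized before stemming. The sketch below shows one way to do that. The `stemText` name, the ASCII-only tokenization regex, and passing the stemmer in as an argument are all illustrative assumptions; none of this is part of the package's API.

```javascript
// Hypothetical helper (not exported by porter2): split free text into
// lowercase word tokens and stem each one with the supplied function.
// Assumes plain English text; only ASCII letters and apostrophes are kept.
function stemText(text, stemFn) {
  // Lowercase first, because the stemmer expects lowercase input.
  const tokens = text.toLowerCase().match(/[a-z']+/g) || []
  // Stem every token with the caller-provided stemmer (e.g. porter2's `stem`).
  return tokens.map(function (token) {
    return stemFn(token)
  })
}
```

For example, with the package installed, `stemText('Animadversion', require('porter2').stem)` lowercases the input first and should return `['animadvers']`, matching the usage example above. Taking the stemmer as a parameter keeps the helper independent of how the package is imported (CommonJS or ESM).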

The code is compatible with ES5. TypeScript type declarations are included.

Benchmarks

On my machine, the 29.4k-word test suite runs in ~9.5ms in a hot loop (~3M words/s throughput); the first run takes ~70ms.
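As a sanity check on those figures: 29,400 words in ~9.5ms is about 29,400 / 0.0095s ≈ 3.1M words/s. The sketch below is one rough way to take that kind of measurement, timing a single cold pass separately from an average over hot-loop passes. It is not the project's bench/run.mjs; the `measure` name and its parameters are assumptions, and it relies on the global `performance.now()` (available in modern Node.js, Bun, and browsers).

```javascript
// Rough throughput measurement (a sketch, not the project's bench/run.mjs).
// `fn` is the function under test (e.g. `stem`), `words` is a word list,
// and `rounds` is the number of hot-loop passes to average over.
function measure(fn, words, rounds) {
  // Accumulate output lengths so the engine cannot drop the calls as dead code.
  let checksum = 0

  // Time the very first pass on its own: it includes warm-up cost.
  let start = performance.now()
  for (let i = 0; i < words.length; i++) checksum += fn(words[i]).length
  const firstRunMs = performance.now() - start

  // Then time several passes in a hot loop and average them.
  start = performance.now()
  for (let r = 0; r < rounds; r++) {
    for (let i = 0; i < words.length; i++) checksum += fn(words[i]).length
  }
  const hotRunMs = (performance.now() - start) / rounds

  // Words processed per second during the hot loop.
  const wordsPerSecond = words.length / (hotRunMs / 1000)
  return { firstRunMs, hotRunMs, wordsPerSecond, checksum }
}
```

With porter2 installed, something like `measure(stem, words, 100)` would report numbers comparable in spirit to the ones above, given a lowercase word list such as the test suite (which this sketch does not load).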

Here is a comparison with some other libraries (take these numbers with a grain of salt):

| library | throughput (Node.js) | throughput (Bun) |
|---|---|---|
| porter2.js | 3118 kops/s | 3283 kops/s |
| stemr | 342 kops/s | 367 kops/s |
| wink-porter2-stemmer [1] | 162 kops/s | 174 kops/s |

The following libraries implement the older Porter 1 algorithm (note that their behavior is not identical to porter2):

| library | throughput (Node.js) | throughput (Bun) |
|---|---|---|
| porter-stemmer-js [2] | 1422 kops/s | 1484 kops/s |
| stemmer [3] | 1064 kops/s | 623 kops/s |
| @stdlib/nlp-porter-stemmer | 842 kops/s | 685 kops/s |
| porter-stemmer | 497 kops/s | 520 kops/s |

The benchmark code is in bench/run.mjs.

These benchmarks were run with Node.js v20.12.2 and Bun v1.1.4, using the latest version of each library as of 2024-04-29.

Footnotes

  1. wink-porter2-stemmer is 99.97% porter2-compliant (it fails only on words containing an apostrophe ')

  2. That one has similar goals and, surprisingly, was published just 3 days before this package was released! (And after I started working on porter2.js.)

  3. ESM only
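For context on footnote 1: in the porter2 algorithm, a leading apostrophe is removed first, and step 0 then strips the longest matching suffix among `'s'`, `'s` and `'`. A minimal sketch of just that apostrophe handling is below. The `stripApostrophes` name is hypothetical; this is not porter2.js's code, and it omits every other step of the algorithm (including the rule that leaves words of two letters or fewer unchanged).

```javascript
// Sketch of porter2's apostrophe handling only (not the full stemmer and not
// porter2.js's implementation): drop one leading apostrophe, then remove the
// longest matching step-0 suffix among "'s'", "'s" and "'".
function stripApostrophes(word) {
  if (word.charAt(0) === "'") word = word.slice(1)
  // Suffixes are listed longest first, so the first match is the longest one.
  const suffixes = ["'s'", "'s", "'"]
  for (let i = 0; i < suffixes.length; i++) {
    if (word.endsWith(suffixes[i])) {
      return word.slice(0, word.length - suffixes[i].length)
    }
  }
  return word
}
```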
