Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

"preprocessed" headers #3998

Open
ronag opened this issue Jan 11, 2025 · 8 comments
Open

"preprocessed" headers #3998

ronag opened this issue Jan 11, 2025 · 8 comments
Labels
enhancement New feature or request Status: help-wanted This issue/pr is open for contributions

Comments

@ronag
Copy link
Member

ronag commented Jan 11, 2025

When using the same headers for multiple requests it would be nice to bypass much of the header validation and transformation logic we do:

i.e. an API like this should be possible:

const headers = new undici.Headers({ Foo: 'bar' })

for (let n = 0; n < 100; n++) {
  const res = await request(url, { headers })
  await res.dump()
}

Where undici.Headers guarantee that the headers are already in a valid state and we don't need to do anything further. Could even be pre computed into a headers string.

To start with I would makeundici.Headers immutable. Later we could add immutable composition, e.g:

const headers = new undici.Headers({ Foo: 'bar' })

for (let n = 0; n < 100; n++) {
  const res = await request(url, { headers: headers.assign({ Bar: 'foo' }) })
  await res.dump()
}

A quick start (just move logic from core/request constructor and make request access the state through symbols):

class Headers {
  constructor(headers) {
    if (Array.isArray(headers)) {
      if (headers.length % 2 !== 0) {
        throw new InvalidArgumentError('headers array must be even')
      }
      for (let i = 0; i < headers.length; i += 2) {
        processHeader(this, headers[i], headers[i + 1])
      }
    } else if (headers && typeof headers === 'object') {
      if (headers[Symbol.iterator]) {
        for (const header of headers) {
          if (!Array.isArray(header) || header.length !== 2) {
            throw new InvalidArgumentError('headers must be in key-value pair format')
          }
          processHeader(this, header[0], header[1])
        }
      } else {
        const keys = Object.keys(headers)
        for (let i = 0; i < keys.length; ++i) {
          processHeader(this, keys[i], headers[keys[i]])
        }
      }
    } else if (headers != null) {
      throw new InvalidArgumentError('headers must be an object or an array')
    }
  }
}


function processHeader (headers, key, val) {
  if (val && (typeof val === 'object' && !Array.isArray(val))) {
    throw new InvalidArgumentError(`invalid ${key} header`)
  } else if (val === undefined) {
    return
  }

  let headerName = headerNameLowerCasedRecord[key]

  if (headerName === undefined) {
    headerName = key.toLowerCase()
    if (headerNameLowerCasedRecord[headerName] === undefined && !isValidHTTPToken(headerName)) {
      throw new InvalidArgumentError('invalid header key')
    }
  }

  if (Array.isArray(val)) {
    const arr = []
    for (let i = 0; i < val.length; i++) {
      if (typeof val[i] === 'string') {
        if (!isValidHeaderValue(val[i])) {
          throw new InvalidArgumentError(`invalid ${key} header`)
        }
        arr.push(val[i])
      } else if (val[i] === null) {
        arr.push('')
      } else if (typeof val[i] === 'object') {
        throw new InvalidArgumentError(`invalid ${key} header`)
      } else {
        arr.push(`${val[i]}`)
      }
    }
    val = arr
  } else if (typeof val === 'string') {
    if (!isValidHeaderValue(val)) {
      throw new InvalidArgumentError(`invalid ${key} header`)
    }
  } else if (val === null) {
    val = ''
  } else {
    val = `${val}`
  }

  if (request.host === null && headerName === 'host') {
    if (typeof val !== 'string') {
      throw new InvalidArgumentError('invalid host header')
    }
    // Consumed by Client
    headers[kHost] = val
  } else if (request.contentLength === null && headerName === 'content-length') {
    headers[kContentLength] = parseInt(val, 10)
    if (!Number.isFinite(request.contentLength)) {
      throw new InvalidArgumentError('invalid content-length header')
    }
  } else if (request.contentType === null && headerName === 'content-type') {
    headers[kContentType] = val
    headers[kHeaders].push(key, val)
  } else if (headerName === 'transfer-encoding' || headerName === 'keep-alive' || headerName === 'upgrade') {
    throw new InvalidArgumentError(`invalid ${headerName} header`)
  } else if (headerName === 'connection') {
    const value = typeof val === 'string' ? val.toLowerCase() : null
    if (value !== 'close' && value !== 'keep-alive') {
      throw new InvalidArgumentError('invalid connection header')
    }

    if (value === 'close') {
      headers[kReset] = true
    }
  } else if (headerName === 'expect') {
    throw new NotSupportedError('expect header not supported')
  } else {
    request[kHeaders].push(key, val)
  }
}

Refs: #3994

@ronag ronag added Status: help-wanted This issue/pr is open for contributions enhancement New feature or request labels Jan 11, 2025
@mcollina
Copy link
Member

I would call it ImmutableRequestHeaders to not confuse it with the Headers object from the fetch spec.

@PandaWorker
Copy link

What do we need to solve this problem?

  1. Make a header structure and use it only inside the core
  2. When creating a core Request instance, immediately assemble the request and rewrite all the headers in advance (?connection, host, content-length, transfer-encoding) in order to only iterate over the headers and write them to the socket or cached h.toString() and use for performance
  3. remove more checks (
    socket.write(`${header}content-length: 0\r\n\r\n`, 'latin1')
    )
  4. Remove all unnecessary header checks, as we will have a structure that always returns an iterator of headers and a ready-made head row.
    if (Array.isArray(headers)) {

It turns out that all the request data will be prepared in advance and all that remains is to write them to the socket without unnecessary checks.

I want to hear your opinion on this

import { isValidHeaderName, isValidHeaderValue } from './utils.ts';

interface IHTTPHeader {
  name: string;
  values: string[];
};

interface IHTTPHeaders {
  // TODO:
}

const kHeadersMap = Symbol('headers map');
const kCachable = Symbol('cachable');

const cacheWMap: WeakMap<HTTPHeaders, string> = new WeakMap();

export class HTTPHeader implements IHTTPHeader {
  name: string;
  values: string[];

  constructor(name: string, value: string | string[]) {
    this.name = name;
    this.values = typeof value === 'string' ? [value] : value.slice();
  }

  append(value: string | string[]) {
    const { values } = this;
    if (typeof value === 'string') {
      values.push(value);
    } else {
      values.length = (values.length + value.length);
      for (let i = 0; i < value.length; i++) {
        values.push(value[i]);
      }
    }
  }

  clone() {
    return new HTTPHeader(this.name, this.values.slice());
  }

  * entries() {
    const { name: rawName, values } = this;
    const name = rawName.toLowerCase();

    if (values.length === 1) {
      yield [rawName, values[0]];
    }

    else if (name === 'set-cookie') {
      for (let index = 0; index < values.length; index++) {
        yield [rawName, values[index]];
      }
    }
    else {
      const separator = name === 'cookie' ? '; ' : ', ';
      yield [rawName, values.join(separator)];
    }
  }

  *[Symbol.iterator]() {
    return this.entries();
  }

  toString() {
    let raw = ``;
    for (const [name, value] of this.entries()) {
      raw += `${name}: ${value}\r\n`;
    }
    return raw;
  }
}

type HTTPHeadersInit = [string, string][] | Record<string, string> | Iterable<[string, string]> | HTTPHeaders | Headers;

export class HTTPHeaders implements IHTTPHeaders {
  [kCachable] = false;
  [kHeadersMap]: Map<string, IHTTPHeader> = new Map();

  static extend(h: HTTPHeaders, init: HTTPHeadersInit) {
    for (const [name, value] of new HTTPHeaders(init)) {
      h.append(name, value);
    }

    return h;
  }

  constructor(init?: HTTPHeadersInit) {
    if (!init || typeof init !== 'object') return;

    const map = this[kHeadersMap];

    if (init instanceof HTTPHeaders) {
      for (const [name, header] of init[kHeadersMap].entries()) {
        map.set(name, copyHeader(header));
      }
    }

    else if (Symbol.iterator in init) {
      for (const [name, value] of init) {
        this.append(name, value);
      }
    }

    else {
      for (const [name, value] of Object.entries(init)) {
        this.append(name, value);
      }
    }
  }

  clone() {
    return new HTTPHeaders(this);
  }

  extend(init: HTTPHeadersInit = {}) {
    const h = this.clone();
    for (const [name, value] of new HTTPHeaders(init)) {
      h.append(name, value);
    }
    return h;
  }

  assign(...init: HTTPHeadersInit[]) {
    const h = this.clone();

    for (const entry of init) {
      for (const [name, value] of new HTTPHeaders(entry)) {
        h.append(name, value);
      }
    }

    return h;
  }

  set(name: string, value: string, raw: boolean = true) {
    name = name.trim();
    if (!isValidHeaderName(name)) {
      throw new Error(`"${name}" is an invalid header name.`);
    }

    value = value.trim();
    if (!isValidHeaderValue(value)) {
      throw new Error(`"${value}" is an invalid header value.`);
    }

    const lower = name.toLowerCase();
    const map = this[kHeadersMap];

    map.set(lower, new HTTPHeader(raw ? name : lower, value));

    return this[kCachable] ? this.cache() : this;
  }

  append(name: string, value: string, raw: boolean = true) {
    const lower = name.toLowerCase();
    const map = this[kHeadersMap];

    // Проверяем наличие
    if (map.has(lower)) {
      value = value.trim();
      if (!isValidHeaderValue(value)) {
        throw new Error(`"${value}" is an invalid header value.`);
      }

      appendHeaderValues(map.get(lower)!, value);
    } else {
      name = name.trim();
      if (!isValidHeaderName(name)) {
        throw new Error(`"${name}" is an invalid header name.`);
      }

      value = value.trim();
      if (!isValidHeaderValue(value)) {
        throw new Error(`"${value}" is an invalid header value.`);
      }
      map.set(lower, new HTTPHeader(raw ? name : lower, value));
    }

    return this[kCachable] ? this.cache() : this;
  }

  has(name: string) {
    return this[kHeadersMap].has(name.toLowerCase());
  }
  get(name: string) {
    return this[kHeadersMap].get(name.toLowerCase());
  }
  delete(name: string) {
    const deleted = this[kHeadersMap].delete(name.toLowerCase());
    if (this[kCachable]) this.cache();
    return deleted;
  }

  *entries(raw: boolean = true) {
    for (const [name, header] of this[kHeadersMap]) {
      const { name: rawName, values } = header;
      const headerName = raw ? rawName : name;

      if (values.length === 1) {
        yield [headerName, values[0]];
      }

      else if (name === 'set-cookie') {
        for (let index = 0; index < values.length; index++) {
          yield [headerName, values[index]];
        }
      }
      else {
        const separator = name === 'cookie' ? '; ' : ', ';
        yield [headerName, values.join(separator)];
      }
    }
  }

  keys() {
    return this[kHeadersMap].keys();
  }

  values(name?: string) {
    if (!name) {
      return this[kHeadersMap].values().map(h => h.values)!;
    } else {
      return this[kHeadersMap].get(name)?.values ?? [];
    }
  }

  getSetCookie() {
    return (this.values('set-cookie') ?? []) as string[];
  }

  [Symbol.iterator]() {
    return this.entries();
  }

  cache() {
    this[kCachable] = true;
    cacheWMap.set(this, this.build());
    return this;
  }

  build() {
    let raw = ``;
    for (const [name, value] of this.entries()) {
      raw += `${name}: ${value}\r\n`;
    }
    return raw;
  }

  toString() {
    return cacheWMap.has(this) ? cacheWMap.get(this)! : this.build();
  }
}


export function createHeader(name: string, value: string | string[]): IHTTPHeader {
  return new HTTPHeader(name, value);
}

export function appendHeaderValues(h: IHTTPHeader, value: string | string[]) {
  const { values } = h;
  if (typeof value === 'string') {
    values.push(value);
  } else {
    values.length = (values.length + value.length);
    for (let i = 0; i < value.length; i++) {
      values.push(value[i]);
    }
  }
}

export function copyHeader(h: IHTTPHeader) {
  const { name, values } = h;
  return new HTTPHeader(name, values.slice());
}

@PandaWorker
Copy link

Caching the finished header string greatly reduces the load

import { bench, run, summary } from 'mitata';
import { HTTPHeaders } from './HTTPHeaders.ts';

const arr = Array.from({ length: 20 }, (_, i) => [`x-user-${i}`, `${i}`]);

const h = new HTTPHeaders(arr);
const h2 = h.clone().cache();

summary(() => {
  bench('without cache', () => {
    h.toString();
  });

  bench('with cache', () => {
    h2.toString();
  });

  bench('forof', () => {
    let a = '';
    for (const [name, value] of h) {
      a += `${name}: ${value}\r\n`;
    }
  });

  bench('forof noop', () => {
    for (const [name, value] of h) {}
  });
});

run();
clk: ~2.99 GHz
cpu: Apple M1 Max
runtime: node 22.12.0 (arm64-darwin)

benchmark                   avg (min … max) p75   p99    (min … top 1%)
------------------------------------------- -------------------------------
without cache                  1.05 µs/iter   1.05 µs       ▂▄█▅▃ ▂        
                        (1.02 µs … 1.09 µs)   1.08 µs ▅▂▇▆▄▂█████▇█▅▄▁▂▂▁▁▂
                  5.50 ipc (  2.53% stalls)    NaN% L1 data cache
          3.50k cycles  19.22k instructions   0.00% retired LD/ST (   0.00)

with cache                     8.41 ns/iter   8.35 ns  █                   
                       (8.22 ns … 40.03 ns)   9.94 ns ▁██▁▂▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁
                  6.11 ipc (  0.00% stalls)    NaN% L1 data cache
          27.91 cycles  170.57 instructions   0.00% retired LD/ST (   0.00)

forof                          1.05 µs/iter   1.05 µs      ▆█              
                        (1.02 µs … 1.14 µs)   1.10 µs ▃▅▂▃▃███▇▇▄▁▁▁▁▁▁▁▁▁▂
                  5.47 ipc (  2.45% stalls)    NaN% L1 data cache
          3.49k cycles  19.09k instructions   0.00% retired LD/ST (   0.00)

forof noop                   719.00 ns/iter 722.48 ns  █▂ ▄▂               
                    (707.91 ns … 767.65 ns) 744.45 ns ▆██▇██▇▆▃▃▄▅▃▃▄▂▃▂▃▂▂
                  4.86 ipc (  3.55% stalls)    NaN% L1 data cache
          2.37k cycles  11.54k instructions   0.00% retired LD/ST (   0.00)

summary
  with cache
   85.5x faster than forof noop
   124.41x faster than without cache
   124.72x faster than forof

@mcollina
Copy link
Member

That's the approach I use in https://github.com/mcollina/autocannon.

@PandaWorker
Copy link

we need a similar approach, only write the request body separately, since there may be large buffer, streams, generators, and so on that do not need to be dumped into memory by default.

@metcoder95
Copy link
Member

Like the idea/approach for the immutable headers concept; one thing that grasped my attention was the each header being represented with a class. Do we need such fine grain representation?

Overall, we can target first undici as is and later on attempt to hop fetch somehow (we can provide a layer of portability between ImmutableHeaders and the Headers spec without needing to represent each entry as individual structure), especially as shown that constructed as string provides performance benefits.

@PandaWorker
Copy link

I totally agree with you, we can just add caching to the current headers implementation. And we won't have to change much.

When assembling a request, it would be good for us to bring the headers to the same type (HTTP Headers) and then use the methods of this structure, adding system headers (host, connection, content-length, ...) and so on.

@mcollina
Copy link
Member

Overall, we can target first undici as is and later on attempt to hop fetch somehow (we can provide a layer of portability between ImmutableHeaders and the Headers spec without needing to represent each entry as individual structure), especially as shown that constructed as string provides performance benefits.

The WHATWG-compliant Headers implementation is slow, we can't really use for any internals processing.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request Status: help-wanted This issue/pr is open for contributions
Projects
None yet
Development

No branches or pull requests

4 participants