Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: add rehype-semantic-blockquotes plugin #178

Merged
merged 3 commits into from
Oct 2, 2024

Conversation

nikitarevenco
Copy link
Contributor

@nikitarevenco nikitarevenco commented Sep 25, 2024

Initial checklist

  • I read the support docs
  • I read the contributing guide
  • I agree to follow the code of conduct
  • I searched issues and couldn’t find anything (or linked relevant results below)
  • If applicable, I’ve added docs and tests

Description of changes

Initially I submitted this to remark, but I was redirected to submit it here, so I converted it to rehype

With this config it will transform the markdown

import rehypeFormat from "rehype-format";
import rehypeSemanticBlockquotes from "rehype-semantic-blockquotes";
import rehypeStringify from "rehype-stringify";
import remarkParse from "remark-parse";
import remarkRehype from "remark-rehype";
import { unified } from "unified";

const doc = `
> Better to admit you walked through the wrong door than spend your life in the wrong room.
>
> @ [Josh Davis](https://somewhere.com)
`;

const file = String(
  await unified()
    .use(remarkParse)
    .use(remarkRehype)
    .use(rehypeSemanticBlockquotes)
    .use(rehypeStringify)
    .use(rehypeFormat) // for demonstration purposes only
    .process(doc),
);

console.log(file);

Into this

<figure data-blockquote-contaienr="">
  <blockquote data-blockquote-content="">
    <p>
      Better to admit you walked through the wrong door than spend your life in
      the wrong room.
    </p>
  </blockquote>
  <figcaption data-blockquote-credit="">
    <p><a href="https://somewhere.com">Josh Davis</a></p>
  </figcaption>
</figure>

Repo: https://github.com/nikitarevenco/rehype-semantic-blockquotes

@github-actions github-actions bot added 👋 phase/new Post is being triaged automatically 🤞 phase/open Post is being triaged manually and removed 👋 phase/new Post is being triaged automatically labels Sep 25, 2024
@ChristianMurphy
Copy link
Member

ChristianMurphy commented Sep 26, 2024

Thanks for sharing @nikitarevenco! 🙇

General feedback on what gets included in the plugins list:

Some criteria to include packages in this list.

  • Some documentation with at least some instructions on how to use the package.
  • A CI job to run tests. (The tests are already there.)
  • A prepack script and CI job to build the types. (The JSDoc types are already there.)
  • The package should be published to npm.

source: syntax-tree/hast#24 (comment)

For your plugin I'm seeing:

  • Documentation
  • CI to run lint/tests
  • Job to generate types? (It looks like the types are hard coded, rather than generated from TS or JSDoc?)
  • Published to NPM

More specific comments.

  1. I'd highly recommend asserting something in the tests, and checking for different edge cases https://github.com/nikitarevenco/rehype-semantic-blockquotes/blob/c58fd80635318d6c4d19a82620b47d9dbadde21b/test.js#L1-L24, node:test offers a good framework for writing tests https://nodejs.org/api/test.html
  2. The security statement (https://github.com/nikitarevenco/rehype-semantic-blockquotes/blob/c58fd80635318d6c4d19a82620b47d9dbadde21b/README.md?plain=1#L183-L187) is a bit misleading, the plugin does involve rehype, and when documented without a sanitize step, a modified version of the readme allows for scripting https://stackblitz.com/edit/github-ztazsu?file=src%2Fmain.ts

@nikitarevenco
Copy link
Contributor Author

nikitarevenco commented Sep 27, 2024

Thanks for sharing @nikitarevenco! 🙇

General feedback on what gets included in the plugins list:

Some criteria to include packages in this list.

  • Some documentation with at least some instructions on how to use the package.
  • A CI job to run tests. (The tests are already there.)
  • A prepack script and CI job to build the types. (The JSDoc types are already there.)
  • The package should be published to npm.

source: syntax-tree/hast#24 (comment)

For your plugin I'm seeing:

* [x]  Documentation

* [ ]  CI to run lint/tests

* [ ]  Job to generate types? (It looks like the types are hard coded, rather than generated from TS or JSDoc?)

* [x]  Published to NPM

More specific comments.

1. I'd highly recommend asserting something in the tests, and checking for different edge cases https://github.com/nikitarevenco/rehype-semantic-blockquotes/blob/c58fd80635318d6c4d19a82620b47d9dbadde21b/test.js#L1-L24, `node:test` offers a good framework for writing tests https://nodejs.org/api/test.html

2. The security statement (https://github.com/nikitarevenco/rehype-semantic-blockquotes/blob/c58fd80635318d6c4d19a82620b47d9dbadde21b/README.md?plain=1#L183-L187) is a bit misleading, the plugin does involve `rehype`, and when documented without a sanitize step, a modified version of the readme allows for scripting https://stackblitz.com/edit/github-ztazsu?file=src%2Fmain.

thank you for the review! changes:

  • added tests using node:test and node:assert
  • CI to run said tests
  • remove misleading section about security
  • types are now generated using tsc command automatically using prepare npm script

@wooorm
Copy link
Member

wooorm commented Sep 27, 2024

Very nice!


I do recommend, in the readme, after your “For instance, by turning the following syntax:” line and corresponding markdown code block, to also show some HTML code. Such as:

…which is equivalent to the following HTML code:

```html
<blockquote>
  <p>We cannot solve our problems with the same thinking we used when we created them.</p>
  <p>@ Albert Einstein</p>
</blockquote>
```

People can write HTML in markdown. Which will work with this plugin and rehype.
People can use this plugin without markdown: they can write HTML, and this plugin and rehype will work.
It’s understandable that you focus on markdown because that’s your use case. It may also very well be that most people who will use this plugin will use markdown. But it’s still good to at least balance the markdown and HTML.


I’d recommend not using words such as “easy” (easily). It may be easy to you, but not to your reader. See https://css-tricks.com/words-avoid-educational-writing/ for more info.


I see some things that TS should normally catch in your code. Perhaps turn on strict in your tsconfig.json? Here’s how our config typically looks: https://github.com/remarkjs/remark-rehype/blob/main/tsconfig.json.


With rehype, you do not need to generate MDX custom nodes. rehype is used inside MDX. You can generate elements. No isMdx is needed.


I recommend some tests for different HTML structures. You are now heavily basing things on what remark-rehype generates. But consider that people might minify HTML. Or write HTML manually. Or when people do > **@ something** or > @ _this_ (it is not guarantueed that a node is a text node).
I strongly recommend against tests such as blockquote.children.length <= 4. Instead, use https://github.com/syntax-tree/hast-util-whitespace to check if the tail node (the last child of blockquote) is a text node that can be ignored (if it is such whitespace).

Something like:

/**
 * @import {Root} from 'hast'
 */

import {whitespace} from 'hast-util-whitespace'
import {visit} from 'unist-util-visit'

// Get `tree` somehow.
const tree = /** @type {Root} */ (/** @type {unknown} */ (undefined))

visit(tree, 'element', (blockquote, index, parent) => {
  if (!parent || typeof index !== 'number') return

  if (blockquote.tagName !== 'blockquote') return

  let tailIndex = blockquote.children.length - 1

  while (tailIndex > -1 && whitespace(blockquote.children[tailIndex])) {
    tailIndex--
  }

  const tail = blockquote.children[tailIndex]

  // Tail must be a `p`.
  if (!tail || tail.type !== 'element' || tail.tagName !== 'p') return

  // Depends what you want with `> *@* this`, `> @ *this*`, `> *@ this*`.
  // Tail must be a `p`.
  const tailText = tail.children[0]
  if (tail.children.length !== 1 || tailText.type !== 'text') return

  // Tail text must start with prefix.
  if (!tailText.value.startsWith(opts.syntax)) return

  parent.children[index] = {
    type: 'element',
    tagName: 'figure',
    properties: {[opts.figure]: ''},
    children: [
      {
        type: 'element',
        tagName: 'blockquote',
        properties: {[opts.blockquote]: ''},
        children: blockquote.children.slice(0, tailIndex)
      },
      {
        type: 'element',
        tagName: 'figcaption',
        properties: {[opts.figcaption]: ''},
        children: [
          {
            type: 'element',
            tagName: 'p',
            properties: {},
            children: [
              {type: 'text', value: tailText.value.slice(opts.syntax.length)}
            ]
          }
        ]
      }
    ]
  }
})

In the hast, camelcased properties are used:

  • data-blockquote-container -> dataBlockquoteContainer
  • data-blockquote-content -> dataBlockquoteContent
  • data-blockquote-credit -> dataBlockquoteCredit

See: https://github.com/syntax-tree/hast#propertyname

@nikitarevenco
Copy link
Contributor Author

nikitarevenco commented Sep 30, 2024

Very nice!

I do recommend, in the readme, after your “For instance, by turning the following syntax:” line and corresponding markdown code block, to also show some HTML code. Such as:

…which is equivalent to the following HTML code:

```html
<blockquote>
  <p>We cannot solve our problems with the same thinking we used when we created them.</p>
  <p>@ Albert Einstein</p>
</blockquote>

People can write HTML _in_ markdown. Which will work with this plugin and rehype. People can use this plugin _without_ markdown: they can write HTML, and this plugin and rehype will work. It’s understandable that you focus on markdown because that’s your use case. It may also very well be that most people who will use this plugin will use markdown. But it’s still good to at least balance the markdown and HTML.

I’d recommend not using words such as “easy” (`easily`). It may be easy to you, but not to your reader. See https://css-tricks.com/words-avoid-educational-writing/ for more info.

I see some things that TS should normally catch in your code. Perhaps turn on `strict` in your `tsconfig.json`? Here’s how our config typically looks: https://github.com/remarkjs/remark-rehype/blob/main/tsconfig.json.

With rehype, you do not need to generate MDX custom nodes. rehype is used inside MDX. You can generate `elements`. No `isMdx` is needed.

I recommend some tests for different HTML structures. You are now heavily basing things on what `remark-rehype` generates. But consider that people might minify HTML. Or write HTML manually. Or when people do `> **@ something**` or `> @ _this_` (it is not guarantueed that a node is a text node). I strongly recommend against tests such as `blockquote.children.length <= 4`. Instead, use https://github.com/syntax-tree/hast-util-whitespace to check if the tail node (the last child of `blockquote`) is a text node that can be ignored (if it is such whitespace).

Something like:

```js
/**
 * @import {Root} from 'hast'
 */

import {whitespace} from 'hast-util-whitespace'
import {visit} from 'unist-util-visit'

// Get `tree` somehow.
const tree = /** @type {Root} */ (/** @type {unknown} */ (undefined))

visit(tree, 'element', (blockquote, index, parent) => {
  if (!parent || typeof index !== 'number') return

  if (blockquote.tagName !== 'blockquote') return

  let tailIndex = blockquote.children.length - 1

  while (tailIndex > -1 && whitespace(blockquote.children[tailIndex])) {
    tailIndex--
  }

  const tail = blockquote.children[tailIndex]

  // Tail must be a `p`.
  if (!tail || tail.type !== 'element' || tail.tagName !== 'p') return

  // Depends what you want with `> *@* this`, `> @ *this*`, `> *@ this*`.
  // Tail must be a `p`.
  const tailText = tail.children[0]
  if (tail.children.length !== 1 || tailText.type !== 'text') return

  // Tail text must start with prefix.
  if (!tailText.value.startsWith(opts.syntax)) return

  parent.children[index] = {
    type: 'element',
    tagName: 'figure',
    properties: {[opts.figure]: ''},
    children: [
      {
        type: 'element',
        tagName: 'blockquote',
        properties: {[opts.blockquote]: ''},
        children: blockquote.children.slice(0, tailIndex)
      },
      {
        type: 'element',
        tagName: 'figcaption',
        properties: {[opts.figcaption]: ''},
        children: [
          {
            type: 'element',
            tagName: 'p',
            properties: {},
            children: [
              {type: 'text', value: tailText.value.slice(opts.syntax.length)}
            ]
          }
        ]
      }
    ]
  }
})

In the hast, camelcased properties are used:

* `data-blockquote-container` -> `dataBlockquoteContainer`

* `data-blockquote-content` -> `dataBlockquoteContent`

* `data-blockquote-credit` -> `dataBlockquoteCredit`

See: https://github.com/syntax-tree/hast#propertyname

Okay, so I have:

  • added tests for pure HTML and not just markdown.
  • changed property names
  • remove wording like "easily"
  • using full-on typescript now with strict option set
  • not using isMdx anymore (thank you for this!)

I don't really want to let syntax like this: > **@ hello world** be accepted, it complicates things imo and people can just use > @ **hello world** and it'll be the same, the @ is going to be removed regardless

what do you think?

Copy link
Member

@wooorm wooorm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The code looks great new!! ✨

One nit about CI failing

doc/plugins.md Outdated
Comment on lines 185 to 186
* [`rehype-semantic-blockquotes`](https://github.com/nikitarevenco/rehype-semantic-blockquotes)
— extend blockquote syntax to make it simple to mention/cite sources in a semantically correct way
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
* [`rehype-semantic-blockquotes`](https://github.com/nikitarevenco/rehype-semantic-blockquotes)
extend blockquote syntax to make it simple to mention/cite sources in a semantically correct way
* [`rehype-semantic-blockquotes`](https://github.com/nikitarevenco/rehype-semantic-blockquotes)
new blockquote syntax to more simply mention/cite sources

How about this? A bit shorter.

If you think the semantically correct part is important, you could add it. Just make sure it line wraps correctly (CI tells you about problems: npm install && npm test!)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, thank you! I have slightly reworded it to make sure it's under 80 chars with inspiration from your suggestion

@codecov-commenter
Copy link

codecov-commenter commented Oct 1, 2024

⚠️ Please install the 'codecov app svg image' to ensure uploads and comments are reliably processed by Codecov.

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 100.00%. Comparing base (9bc5528) to head (028b770).
Report is 15 commits behind head on main.

❗ Your organization needs to install the Codecov GitHub app to enable full functionality.

Additional details and impacted files
@@            Coverage Diff            @@
##              main      #178   +/-   ##
=========================================
  Coverage   100.00%   100.00%           
=========================================
  Files            6         6           
  Lines          154       154           
=========================================
  Hits           154       154           

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@wooorm wooorm merged commit e95d0c0 into rehypejs:main Oct 2, 2024
2 checks passed
@wooorm wooorm added 📚 area/docs This affects documentation 💪 phase/solved Post is done labels Oct 2, 2024

This comment has been minimized.

@github-actions github-actions bot removed the 🤞 phase/open Post is being triaged manually label Oct 2, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
📚 area/docs This affects documentation 💪 phase/solved Post is done
Development

Successfully merging this pull request may close these issues.

4 participants