Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support for woff2 via an addon #7693

Merged
merged 6 commits into from
Apr 7, 2025
Merged

Add support for woff2 via an addon #7693

merged 6 commits into from
Apr 7, 2025

Conversation

davepagurek
Copy link
Contributor

@davepagurek davepagurek commented Apr 1, 2025

While Typr supports woff, it does not yet support woff2 because those need to be decompressed. We currently aren't thinking of adding built-in support for that because the package size would be on the large side.

In order to support this in an addon, I've restructured the font parsing a bit so that parseFontData can be overridden more easily in an addon to add the decompression step.

I've started an addon here: https://github.com/davepagurek/p5.woff2/blob/main/src/p5.woff2.js

  • brotli.js kept decompressing the fonts as 0 bytes. not sure why.
  • I briefly looked into using a bundler to get a wasm version working; all were sort of annoying for now or were packaged in a way that'd need patching for non-node
  • I think this wawoff2 library could work in a bundler, but I didn't want to get hung up on that for now, so it's loaded in a very hacky way inspired by this codepen demo but updated to not add stuff to the global namespace that p5 then breaks on

Within this repo, some other things need some work before this can be called done. Most notably, when loading a Google Fonts CSS file, there are usually a lot of fonts in it, split up by character set. For example try going to this one: https://fonts.googleapis.com/css2?family=Montserrat:ital,wght@0,100..900;1,100..900&display=swap In it, the Latin characters are last. For a quick demo I'm grabbing the last parsable font, but ideally we'd have a way to combine multiple CSS fonts into one p5 font.

Demo: https://editor.p5js.org/davepagurek/sketches/tXna970AJ

@davepagurek davepagurek requested review from dhowe and ksen0 April 1, 2025 21:40
@davepagurek
Copy link
Contributor Author

Looks like this breaks some tests, I'll have to look into that first!

@davepagurek
Copy link
Contributor Author

davepagurek commented Apr 3, 2025

ok the bug was from me trying to clean up some stuff and renaming variables but forgetting one spot. thanks, automated tests! I updated the build in the demo sketch too if you want to play around with that @ksen0.

Edit: I've also updated my implementation of p5.woff2 to be a little less sketchy. I wasn't able to just import the existing wawoff2 library, as it just isn't set up in a way that works with the browser, so I ended up copying their wasm decompressor into the addon repo myself and then made the minor adjustments necessary to make it easily importable. The file is 322 KB in the end.

I ended up using wawoff2 because I saw that codepen demo that actually worked in the browser. It still ended up having to be modified to be bundlable in one file though, since it adds all its stuff to a global int he published version. @limzykenneth I initially had issues with Brotli.js not actually running in the browser, but probably that could still work if we also copy its source and make the necessary modifications, but I'm probably just going to leave it as is for now. If anyone's interested, feel free to look closer at that other library later though!

}
}
const fonts = await Promise.all(fontPromises);
return fonts[0]; // TODO: handle multiple faces?
return fonts.findLast(f => f.data) || fonts[0]; // TODO: handle multiple faces?
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What speaks against an experimental (marked unstable) optional parameter to positionally select the font? So it's not semantic (not "latin" and not "italic" etc) but leaves it to the user making the call to specify which in the list they want to load? I think it's alright to keep it as kind of an advanced edge case use before a more legible way to load one of multiple fonts is possible, and perhaps better than hardcoding a positional index like this, which seems like something that might stick around for a long time?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that makes sense, so it'd be like:

  • If you specify an index, we try to parse that index (and possibly fail if it's not something we can read), but at least then it's clear to the programmer which one we tried to read
  • If you don't specify an index:
    • For now we try to pick the most compatible one, by some logic
    • In the future we try to return a multi-font that has everything, so it would have everything we currently return, plus extra things. So I guess this wouldn't be a breaking change, as we're adding more support and not taking anything away?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My initial preference was to put this into an options object so we don't have too many positional parameters. It looks like we have an optional object parameter, though, that you can use to manually set font descriptors. I could sort of turn that into an options object, where specific keys are settings and the rest is font descriptors, e.g.:

const { index, ...descriptors } = options

so this would work to specify an index:

loadFont('path/to/css/font.css', { index: 3 })

or this would work to specify css font descriptors for a non-CSS font file:

loadFont('myfont.ttf', {
  fontFamily: '"Bricolage Grotesque", serif',
  fontOpticalSizing: 'auto',
  fontWeight: 400,
  fontStyle: 'normal',
  fontVariationSettings: '"wdth" 100',
})

Or you could supply both if you wanted to augment the descriptors in a CSS file and also specify an index:

loadFont('path/to/css/font.css', {
  index: 3,
  fontWeight: 400,
})

// parse the font data
let fonts = Typr.parse(result);

// TODO: generate descriptors from font in the future
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this also referring to the MultiFont concept, so it would be character faces? Or also style? I really like that idea, would be a great issue / feature addition to make.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Possibly things like a variant name if it exists, weight ranges included, variable ranges, and maybe character set?

@@ -966,13 +967,6 @@ function font(p5, fn) {
* // Some other forms of loading fonts:
* loadFont("https://fonts.googleapis.com/css2?family=Bricolage+Grotesque:opsz,[email protected],200..800&display=swap");
* loadFont(`@font-face { font-family: "Bricolage Grotesque", serif; font-optical-sizing: auto; font-weight: 400; font-style: normal; font-variation-settings: "wdth" 100; }`);
* loadFont({
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I took this out because I was copy-and-pasting this from an earlier comment, but looking at the code, I don't think this actually works on its own. We can flesh out this new usage pattern in examples later, since this current example was from me lazily showing just code and no canvas anyway.

@dhowe
Copy link
Contributor

dhowe commented Apr 5, 2025

Perhaps someone can do a quick summary of where we are with this issue?

Meantime, a few questions/comments:

  1. What is the size of the brotl decompression library? In npm a wasm version appears to be 14k, likely less if we remove the compression code, which argues I'd think for having it in the core (woff2 seems ubiquitous these days). Not sure if or how well this lib works, but the wawoff2 version appears to be 10x in size (do we know why?).
  2. I think I'd argue against the creation (at least by default) of combined fonts, as they may be quite large, to the point where sketches using them would not run in the browser (as an alternative see the FontSet idea below)
  3. I think I'd also argue against the index parameter from the font URL, first because its not clear to what it refers (at very least I think we'd want a better name) and second because the content of these font url's are not at all static (as per documentation), so an "index" could resolve to one character set one day and another the next
  4. I wonder if it might be possible instead to use the commented character set names ('cyrillic', 'latin', etc.) to identify the font, as these would are both meaningful and stable. Rather than choosing the first or the last we might default to 'latin' (or the user's language setting?) unless otherwise specified.
  loadFont('https://google/font/url', {
    charset: 'cyrillic',
    fontWeight: 400,
  })

    Another option would be some sort of FontSet object wrapping the various fonts. This could might as simple as an object with charset-names as keys (eg 'cyrillic') and p5.Font instances as values. Again we'd need to be aware of resource-use here (memory, download-time) so would prob make sense to create these lazily

@ksen0
Copy link
Member

ksen0 commented Apr 5, 2025

This is looking really great as far as I can tell @davepagurek ! OK to merge from my POV.

Just adding a couple thoughts based on @dhowe 's points:

1 - Keeping it as a separate add-on for now allows adding it in 2.x, but adding it to core now would make it hard to remove. I don't know enough about the details of the size issue here but based on what I understand, it is not clear-cut / lightweight. So unless there is clear consensus it's lightweight enough I think it makes sense to keep separated at least initially.
2 - Makes sense, though I guess this is also about common usage. I think Google fonts browser makes it very easy to get these combined font links so realistically I would expect them to be used.
3 and 4 - my sense here is similar to 1. It doesn't seem like there's a super obvious clear-cut right answer here so starting with index as a positional parameter (even if unstable, the element is a list, so it is expected to "work") for optional/experimental use still leaves room to create a more understandable signature in the future?

Thinking from the point of resource use, does it make sense to generate a warning or FES nudge to not use font sets but only individual fonts? So not really supporting the usage that would require resolving 3/4.

That's just how I'm thinking of this, I'm happy to support the opinions folks who have already been working on and thinking about the typography features in 2.0.

@limzykenneth
Copy link
Member

I'll have more time to follow up on this more closely next week but just to chime in on the woff2 library issue, the main problem is that woff2 is brotli compressed so we need to decompress it first. With brotli decompression, there are both WASM and pure JS solution available and their size are actually not that different, the main reason is that to do brotli decompression, it requires a lookup table/dictionary and that dictionary size is about 10k from what I can find. There are libraries that tries to compress and encode the dictionary but the size saving is still minimal.

If there is an implementation of brotli decompress that does not use the dictionary it may work but I don't know enough about brotli to know if it is even technically possible (ie. decompress without dictionary). There is the Compression Streams API but it does not support brotli and there's no note on when or if ever it will be supported.

@davepagurek
Copy link
Contributor Author

davepagurek commented Apr 6, 2025

Another option would be some sort of FontSet object wrapping the various fonts. This could might as simple as an object with charset-names as keys (eg 'cyrillic') and p5.Font instances as values. Again we'd need to be aware of resource-use here (memory, download-time) so would prob make sense to create these lazily

I think this is pretty much what I was suggesting for a multi-font to try to match the experience of loading a CSS font. Although there are some complications in p5, namely:

  • The await implies the whole font will be present afterwards, so just lazily downloading a sub-font probably won't suffice. We might need to eagerly download (which is async) but lazily decompress/parse (which is synchronous) so that you can continue to assume that e.g. after awaiting the load, you can immediately save a final image of your sketch.
  • Alternatively, a multi-font would have sub-promises that you could await, something like:
    const myMultiFont = await loadFont('...')
    const myFont = await myMultiFont.latin

4. I wonder if it might be possible instead to use the commented character set names ('cyrillic', 'latin', etc.) to identify the font, as these would are both meaningful and stable. Rather than choosing the first or the last we might default to 'latin' (or the user's language setting?) unless otherwise specified.

I would probably advise against basing the parsing off of the comments in a google font, since there's not really a spec for that, and it's also subject to change by Google. But, that said, we could define a few of these ourselves based on some subset (or all of?) the groups of unicode character ranges defined in the spec. This would require adding and maintaining a lookup table of these ranges, and parsing the ranges out of the CSS font descriptors. I think this would definitely be a better user experience than indexes, so I'll see how feasible that'd be today.

What is the size of the brotl decompression library? In npm a wasm version appears to be 14k, likely less if we remove the compression code, which argues I'd think for having it in the core (woff2 seems ubiquitous these days). Not sure if or how well this lib works, but the wawoff2 version appears to be 10x in size (do we know why?).

Right now the addon is 322kb, which like you said is because it uses wawoff2. A pretty big reason for not including this in p5 itself yet is because this is bigger than we'd like, but it's going to take some work to get the other libraries actually functioning in the browser, since none of them seem to be made for the browser yet.

@davepagurek
Copy link
Contributor Author

davepagurek commented Apr 6, 2025

ok, updated demo here: https://editor.p5js.org/davepagurek/sketches/9MJzIBMBW

Changes:

  • CSS font files with multiple included fonts now only load one font file, the "best match"
  • The best match is a combination of matching font descriptors first, + matching character sets next
  • To do matching character sets, I've added this list of named unicode group ranges that you can query with an included keyword, and then I use this small unicode range parsing library to count how many overlapping code points there are between your query + the font
  • We default to regular weight, non-italic, latin, overridden by anything else you specify
  • If nothing matching is present, we still just go with the last one included in the font. So e.g. if you load a font with only Korean characters and you don't specify a character set, it'll still load that font despite not having anything matching the default latin character set
  • You can pass in an array of queries, e.g. ['latin', 'latin extended'], but since many fonts split up latin and latin extended into two different font faces, it'll likely end up matching the one with JUST extended and not normal latin. I think the only real solution here is to do multi-fonts

Usage now looks something like this:

let font = await loadFont(
  'https://fonts.googleapis.com/css2?family=Montserrat:ital,wght@0,100..900;1,100..900&display=swap',
  { sets: 'latin' }
)

@dhowe
Copy link
Contributor

dhowe commented Apr 6, 2025

After quite some time (20 sec?) the font loads but I still see an error:

image

@davepagurek
Copy link
Contributor Author

In my experience the longest part right now is the CDN delivering the p5.woff2.js addon (not sure why but not the biggest deal, there are other CDNs and more optimization we can do on it later) and not the font loading itself.

For the error, is this just from running the demo with no code changes?

@dhowe
Copy link
Contributor

dhowe commented Apr 6, 2025

the key is, I think, to find the best default here that allows the vast majority of users not to have to think about any of this, but simply to get a font that works as expected

but for advanced usage, I like the sort of fuzzy matching idea to find the best fit - we might even allow users to specify 'unicode-range' in addition to the other css properties (I would prob argue against an option key like 'sets')

@davepagurek have you looked at what the size of the full fontset would be for the font above (and/or one with chinese chars) ?

@dhowe
Copy link
Contributor

dhowe commented Apr 6, 2025

In my experience the longest part right now is the CDN delivering the p5.woff2.js addon (not sure why but not the biggest deal, there are other CDNs and more optimization we can do on it later) and not the font loading itself.

For the error, is this just from running the demo with no code changes?

yes, only replacing the debugger line with a log of the font

@davepagurek
Copy link
Contributor Author

@davepagurek have you looked at what the size of the full fontset would be for the font above (and/or one with chinese chars) ?

In the previous demo build we were still eagerly fetching all the fonts in the CSS file:
image

Added up that comes down to 352 KB for Montserrat normal + italic.

For Noto Sans, it adds to 942 KB:
image

yes, only replacing the debugger line with a log of the font

Ah that's a p5 web editor bug, it's trying to stringify the whole font including parsed font data, which is too large of an array.

@davepagurek
Copy link
Contributor Author

Also, for what it's worth, Typr doesn't seem to be able to parse the Latin variant from Noto Sans (https://fonts.googleapis.com/css2?family=Noto+Sans:ital,wght@0,100..900;1,100..900&display=swap). So there may be some other Typr-related issues we have yet to address (maybe issues with image characters, like emoji, breaks it?)

@dhowe
Copy link
Contributor

dhowe commented Apr 6, 2025

Also, for what it's worth, Typr doesn't seem to be able to parse the Latin variant from Noto Sans (https://fonts.googleapis.com/css2?family=Noto+Sans:ital,wght@0,100..900;1,100..900&display=swap). So there may be some other Typr-related issues we have yet to address (maybe issues with image characters, like emoji, breaks it?)

is there a minimal test I can post as a Typr issue?


// TODO: handle multiple font faces?
sets = sets || ['latin']; // Default to latin for now if omitted
const requestedGroups = (sets instanceof Array ? sets : [sets])
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@davepagurek what if is array is empty?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If someone passes in sets: [] (as opposed to undefined, which gets the default latin) then it won't match anything, and will fall back to just the last font face included in the file. Probably fine for now?

@davepagurek
Copy link
Contributor Author

davepagurek commented Apr 7, 2025

@dhowe looks like it may not actually be Typr and it's just that some of its functionality assumes another library to be present, which we don't provide. I made another issue for that here: #7707

I've added to the docs that this sets feature is experimental, so we can feel free to change this in 2.x versions if we come up with a better API or a solution to the multi font case. I'm going to merge this in for now mostly for demo purposes so that we have some way of doing textToPoints on google fonts in 2.0.0, and we can make a new issue for figuring out how to handle Google fonts CSS files better going forward. (Edit: issue is here #7708)

@davepagurek davepagurek merged commit e0a3357 into dev-2.0 Apr 7, 2025
2 checks passed
@davepagurek davepagurek deleted the woff2 branch April 7, 2025 13:11
@dhowe
Copy link
Contributor

dhowe commented Apr 7, 2025

makes sense - how would you feel about calling the option 'charsets' instead ?

@davepagurek
Copy link
Contributor Author

I avoided that one for now because in HTML/CSS charset refers to something different, which is for things like ascii or utf-8, so I didn't want to accidentally have users confuse those. I do wish there were a better name though, this page calls them groups/block ranges, which still leaves something to be desired.

@dhowe
Copy link
Contributor

dhowe commented Apr 7, 2025

somehow 'sets' still seems a bit generic/unclear

I imagine that, assuming some mech for fuzzy matching, we would allow the user to specify unicode-range in addition to the other css properties - other options for the shorter name might be charset-name or unicode-tag or ... ?

Re: parsing woffs (or at least some woffs) there was just a response to my issue on Typr suggesting that the pako lib (~45k?) is needed to make this work

@davepagurek
Copy link
Contributor Author

Found some more things calling them "blocks" so that could be one option if we want to be more consistent with the outside world? or unicodeBlocks to be more verbose? Or since this is kind of a fuzzy match, maybe query?

Re: parsing woffs (or at least some woffs) there was just a response to my issue on Typr suggesting that the ~45k pako lib is needed to make this work

45k seems maybe small enough that it's worth adding. I guess some of these fonts have internal zlib compression?

@dhowe
Copy link
Contributor

dhowe commented Apr 7, 2025

45k seems maybe small enough that it's worth adding. I guess some of these fonts have internal zlib compression?

could prob be smaller still if we separate compress/decompress

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants