Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feature: Interactive Form Dictionary (PDF 1.7 Section 12.7.2) & Appearance Streams (12.5.5) #25

Merged
merged 25 commits into from
Apr 30, 2024

Conversation

tingerrr
Copy link
Contributor

This PR adds the writers and types for writing the document catalog /AcroForms dictionary, as well as appearance streams. This PR also cleans up and unifies the API and documentation of #20, #21, #22 and #23.

It finalizes the support for interactive forms (save for signature fields), allowing work on typst to begin.

I left out signature fields because the implementation and review should happen in isolation. I want to make sure the implementation of those is sound, as it comes with a higher risk.

src/forms.rs Outdated Show resolved Hide resolved
@tingerrr
Copy link
Contributor Author

I should also note that this has not been resolved and should be before this is merged:

pdf-writer/src/forms.rs

Lines 292 to 300 in 28b79dd

// TODO: the spec likely means the equivalent of unicode graphemes here
// for characters
/// Write the `/MaxLen` attribute to set the maximum length of the field's
/// text in characters. Only permissible on text fields.
pub fn text_max_len(&mut self, len: i32) -> &mut Self {
self.dict.pair(Name(b"MaxLen"), len);
self
}

@tingerrr
Copy link
Contributor Author

tingerrr commented Oct 26, 2023

@reknih I'll leave this in a draft to avoid that this is merged before all important points have been resolved. But I'd say it's ready for review. I want to actually try this out before it merged too, I'll mark it as ready when I added an example.

@tingerrr
Copy link
Contributor Author

tingerrr commented Oct 26, 2023

Blocked by #26.

@tingerrr
Copy link
Contributor Author

Currently, Annotation::appearance_state takes a Name, while Field::radio_value takes a RadioState which resolves to either to a custom /Name which is unique (or should be) across its child widget annotations, or /Off. I'm thinking that using RadioState for the annotation feels a bit like leaking the radio button impl there.

I found a few documentation errors that I fixed locally but haven't pushed yet.

Overall, I'm stuck on testing behavior, which just like the appearance is wildly different for each PDF viewer.

@tingerrr
Copy link
Contributor Author

tingerrr commented Nov 3, 2023

It appears that the optional merging of widget annotations and terminal fields (Section 12.5.6.19) is, in fact, not optional, but required by most viewers to make radio buttons work. This means that fields need an additional function that turns them into Annotations.

So while the PR is technically spec compliant so far, almost no reader will accept the fields if you don't merge them with their annots. Thankfully most radio specific attributes are easy to write to an annotation using Deref<Target = Dict>, but it's not pretty to have to fall back to this to get a working PDF.

@tingerrr
Copy link
Contributor Author

tingerrr commented Nov 8, 2023

I have rebased to incorporate #26. The following decisions must be made before merge:

  • Should /NeedAppearances be included (see my review comment)
  • What about this, I couldn't find what they mean by a character here, but it's probably not the rust equivalent pdf-writer/src/forms.rs#L292-L300
  • Are we so far ok with the API, i.e. using prefixes for specializations of methods, etc.

With the changes in this PR (and #26) pdf-writer is able to create PDF form fields which are accepted by the most common readers:

Full support:

  • PDF Acrobat

Almost Full support:

  • chromium
  • firefox

Partial support:

  • evince
  • okular

The lack of support seems to be a reader problem as, for some form field configurations neither forms created in accordance to the spec, not those created by Acrobat Pro behaved correctly, but had somewhat correct appearance.

I can add an example if that is desired.

@tingerrr tingerrr marked this pull request as ready for review November 8, 2023 20:03
@reknih
Copy link
Member

reknih commented Nov 13, 2023

  • Did you need NeedAppearances for any of your form test implementations?
  • I think they mean the number of bytes written, because section 7.2.2 states that "[regular characters] include bytes that are outside the ASCII character set", implying bytes are characters. This is also the reasonable interpretation that will fit the least characters. Have you investigated how readers interpret this, i.e. with four-byte (in UTF-8) CJK or emojis?
  • I'm happy with the API!

An example would be good, I think.

@tingerrr
Copy link
Contributor Author

tingerrr commented Nov 13, 2023

  • Did you need NeedAppearances for any of your form test implementations?

The PDF generated by Adobe Acrobat Pro did not use it anywhere as it came with default appearances. And I would generally also provide default appearances in the typst impl later on anyway.

  • Have you investigated how readers interpret this, i.e. with four-byte (in UTF-8) CJK or emojis?

It seems to depend on what the implementation sees as a character, for chromium and firefox with /MaxLen 1 it seems to work like this:

descr input bytes allowed
2 byte latin sequence á c3 a1 yes
3 byte cjk sequence e5 8e 9f yes
4 byte single emoji sequence 🏳 f0 9f 8f b3 no
12 byte composite emoji sequence 🏳️‍🌈 f0 9f 8f b3, ef b8 8f e2, 80 8d f0 9f, 8c 88 no

So it's neither a byte nor a codepoint nor a grapheme?

  • I'm happy with the API!

I'll add some additional methods I noticed would be handy when preparing the example.

EDIT: see #27 (comment) regarding the API, specifically subtypes vs method prefixes.

@laurmaedje
Copy link
Member

laurmaedje commented Nov 13, 2023

Is it perhaps UTF-16 len or something like that? I believe PdfDocEncoding is UTF-16.

@tingerrr
Copy link
Contributor Author

tingerrr commented Nov 13, 2023

Is it perhaps UTF-16 len or something like that? I believe PdfDocEncoding is UTF-16.

Wouldn't that disallow a 3-byte sequence?
It seems the encoded string was UTF-16BE, one of the valid encodings for text strings, in which is just a single code point (539f), whereas 𘬀 is two (d822, fd00), which does not work despite being a single grapheme so that clearly answers my question.

@reknih
Copy link
Member

reknih commented Nov 17, 2023

  • Did you need NeedAppearances for any of your form test implementations?

The PDF generated by Adobe Acrobat Pro did not use it anywhere as it came with default appearances. And I would generally also provide default appearances in the typst impl later on anyway.

Okay, so we don't plan to use it, neither does Acrobat and its getting deprecated. I say drop it.

@tingerrr tingerrr marked this pull request as draft November 19, 2023 17:22
@tingerrr
Copy link
Contributor Author

tingerrr commented Nov 25, 2023

After some painful debugging, I got radio buttons to also work, I've added Annotation::page while testing, it's required for some annotation actions, none that we need, but we may as well add it.

Something that doesn't work however, but is an upstream issue, is that for checkboxes pdf.js will respect the RADIOS_IN_UNISON flag, but not for radio buttons. pdfium handles this correctly, and so does Acrobat.

Dropdowns also work as expected. The following work remains:

  • Test more complex appearance streams
  • Test printing
  • Signature fields (likely a different PR as stated before)

You can view the new example to generate a PDF with some form fields. I'll change to use the writer.finish() version once I've tested everything to completion, this was easier for me to write. Feel free to test this in various readers, the feature set of forms is so large that I could've missed edge cases.

examples/forms.rs Outdated Show resolved Hide resolved
@tingerrr
Copy link
Contributor Author

tingerrr commented Jan 7, 2024

I have a hard time to find a consistent way to get appearances to show up in most viewers, it seems they almost all support XFA forms very well, but once normal forms are used they don't properly render the appearances if at all.

Okular refuses to not show fields with their names and a simple button, evince simply shows nothing making them impossible to find on a blank page, chrome renders at least some of them correctly, firefox kind of also provides its own defaults like okular does, but they fit better with the rest.

Sadly, it seems I haven't made much progress on this, and I can't test printing if I don't see anything in the PDF first.

@epsilonhalbe
Copy link

epsilonhalbe commented Jan 13, 2024

Hey I found some time to test this on Mac:

Review

Mac Preview/safari/Skim

image
  • radio buttons are buggy: the first radio also toggles the third one, which is usually caused by the first and third radio having the same label (I noticed that working with LaTeX)
  • buttons don't have any background set - so finding them is really tricky

firefox

image
  • radio buttons work independently, but look like odd googly eyes
  • the reset button appears slightly yellow on hover but is still hard to discover

chrome/microsoft edge (on mac)

image
  • radio buttons appear as rectangles and have checkmarks instead of radio button look, also first and third button go on and off together
  • reset button has a border now

Thoughts

I think that most of the difficulty with the appearance is the fallback behaviour of each pdf reader - so I think the best way would be to find a good set of default values to set like border, background color etc. to avoid having to deal with fallbacks and defaults.

Instead of digging into the PDF standard I started looking at LaTeX - bc. I know the docs are usually okay and I found this PDF documenting the hyperref package - the relevant bits are in section 9 PDF and HTML. in combination with tex.stackoverflow I was able to implement some forms in the past.

If it helps I can create some sample pdfs that have a certain behaviour/look to have a target implementation to aim for.

@tingerrr
Copy link
Contributor Author

radio buttons are buggy: the first radio also toggles the third one, which is usually caused by the first and third radio having the same label (I noticed that working with LaTeX)

This is intentional, I've used the RADIOS_IN_UNISON flag, meaning that those with the same appearance stream names (1 and 3) refer to the same value, the behavior of Firefox is actually buggy here.

I think that most of the difficulty with the appearance is the fallback behaviour of each pdf reader - so I think the best way would be to find a good set of default values to set like border, background color etc. to avoid having to deal with fallbacks and defaults.

Yeah, that's what I've tried around with, but I couldn't get it to work, I assume I need to read up more on general annotation appearance.

If it helps I can create some sample pdfs that have a certain behaviour/look to have a target implementation to aim for.

I wouldn't mind, every PDF helps to get a clearer picture!

Thank you for testing it!

@epsilonhalbe
Copy link

epsilonhalbe commented Jan 14, 2024 via email

@tingerrr
Copy link
Contributor Author

I'll likely come back to this when I have more time to work on forms. Currently, typst-test and university are taking up most of my time.

awehrfritz added a commit to awehrfritz/pdf-writer that referenced this pull request Feb 12, 2024
@tingerrr
Copy link
Contributor Author

tingerrr commented Mar 25, 2024

The last commit adds some examples for specifying appearances, unfortunately there are far too many ways to do this to meaningfully cover them in the forms example.

While the checkboxes may not look very different, this is actually now an explicit implementation replicating the chrome default behavior (although with slightly different spacing).

Unfortunately, evince still refuses to render my checkboxes correctly, whereas the example file I looked at works well on both chrome and evince. This is basically my last pain point, I'm not sure what's missing for evince to render this correctly.

EDIT: This now also works on evince, the lack of required bbox for the FormXObject threw it off. Okular behaves the same way as my reference file, which is unfortunately not very well, but it is according to reference, so I'm tempted to blame okular here.

@tingerrr
Copy link
Contributor Author

I think the example needs some polishing, what vexes me is how much redundant information has to be inserted for most viewers to accept this.

It seems, though, that the implementation itself is sufficiently complete to be able to start work on the typst side for the common field types.

I'll mark this ready for review.

@tingerrr tingerrr marked this pull request as ready for review March 25, 2024 17:54
src/forms.rs Outdated Show resolved Hide resolved
examples/forms.rs Outdated Show resolved Hide resolved
examples/forms.rs Outdated Show resolved Hide resolved
src/forms.rs Outdated Show resolved Hide resolved
@laurmaedje laurmaedje requested a review from reknih April 30, 2024 12:15
@reknih reknih merged commit f4a55a9 into typst:main Apr 30, 2024
1 check passed
@tingerrr tingerrr deleted the interactive-forms branch April 30, 2024 18:08
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants