Not displaying some text fields in tax document #4789

Open
sshock opened this issue Feb 1, 2025 · 16 comments

@sshock

sshock commented Feb 1, 2025

SumatraPDF version

  • 3.5.2 and pre-release

Describe the bug
Several fields in the attached document do not show up.

To Reproduce
Steps to reproduce the behavior:

  1. Open the attached document.
  2. Notice several boxes are empty, including TRUSTEE, RECIPIENT'S TIN, Gross distribution, Distribution code, and Account number.

Expected behavior
All the boxes mentioned above should have values in them, e.g., Gross distribution should have a 1234.56. These fields all show up fine in every other PDF viewer I have tested, including Firefox and Chrome.

File that reproduces the problem
See attached f1099sa.pdf.

Screenshots
This screenshot shows Firefox on the left and SumatraPDF on right, with areas highlighted in red to show where fields are missing.

Image

Additional context
It appears this document is using Widget annotations, and the ones that do not show up are the ones with an /AP (appearance stream).

f1099sa.pdf

@GitHubRulesOK
Collaborator

GitHubRulesOK commented Feb 1, 2025

A) SumatraPDF does not handle Adobe's proprietary JavaScript-enhanced JetForms (XFA, etc.), and many .gov forms are designed to be used only in Adobe-served readers, where results can be monitored and verified using scripts and security hashes.
See reasons below.

B) There are errors in that file that clear when the FDF is exported and reimported, so we then see what is valid AcroForm (not XFA form) data.

UNOFFICIAL COPY

PDF Producer: PDFsharp 6.1.1 (Original: Acrobat Distiller 21.0 (Windows))
PDF Version: 1.6

Image

OFFICIAL COPY Modified: 2025-02-01 22:25:21
PDF Producer: Adobe LiveCycle Designer ES 9.0
PDF Version: 1.7
Fast Web View Yes
PDF Optimizations: Tagged PDF
Number of Pages: 5

Image

Image

Image

b) Filled on an Adobe scripted platform:

OFFICIAL COPY Modified: 2025-02-01 22:25:21
PDF Producer: Adobe LiveCycle Designer ES 9.0
PDF Version: 1.7
Fast Web View Yes
PDF Optimizations: Tagged PDF
Number of Pages: 5

Image

ALL DATA ACCOUNTED FOR: no issue displaying contents written by Adobe Server.

Image

@sshock
Author

sshock commented Feb 1, 2025

Interesting. I didn't notice any JavaScript when I analyzed it, even after extracting and decompressing all the streams.

Also, I've discovered that the fields do show up if I remove the /AP (appearance stream) dictionaries. See the attached document, where I blanked out the contents of the /AP dictionaries; SumatraPDF is now able to display all the fields.

So it seems like this could be relatively easy to fix. Why give up so quickly?

Image

f1099s - fixed.pdf

@GitHubRulesOK
Collaborator

GitHubRulesOK commented Feb 1, 2025

Hints on how to bulk-fill government XFA forms:

Open the FDF, which loads the PDF for trusting (now and in bulk).

Image
This should trigger the XFA extended features; this form seems to be XFDF only!
Image

iText (before their takeover) advised "flattening" the fields of private forms using their commercial products, which is probably why PDFsharp was used in 2021 to break that file. However, there is "collateral damage": such files are not official, so you risk departmental scrutiny as to why they are now a poor submission.

When you import the XFDF, it autofills the form, and you can programmatically save the new PDF or transmit the XFDF without the user necessarily knowing.

Image

@GitHubRulesOK
Collaborator

GitHubRulesOK commented Feb 1, 2025

Try it: here is the XFDF; place it in the same folder as an approved blank 1099-SA.

f1099sa.zip

For FDF it would be a simple drag and drop;
for XML/XMP you need to import it into the Acrobat Reader form!
There are easier methods, but MuPDF does not attempt to support XFA, since it is deprecated and Adobe-licensed rather than open ISO PDF, and is to be avoided at all costs.

@sshock
Author

sshock commented Feb 2, 2025

This wasn't a form I made. I downloaded it from my HSA provider. (I just modified it to replace my personal info with fake data before attaching it to this bug.)

The goal is that I and others be able to view tax forms provided by this major HSA provider.

@GitHubRulesOK
Collaborator

GitHubRulesOK commented Feb 2, 2025

I have been through this hoop elsewhere: the forms were not open-source PDF but Adobe-licensed XFA. Some less scrupulous "template" sites (and there are hundreds of them) offer crippled copies to service providers as if they were official, when it is very clear they are not likely to be sanctioned. So first ask the IRS (or the relevant .gov agency) whether the format is acceptable.

If you are required to file Form 5498-SA, you must provide a statement to the participant (generally Copy B) by June 2, 2025. You may, but you are not required to, provide the participant with a statement of the December 31, 2024, FMV of the participant's account by January 31, 2025. For more information about statements to participants, see part M in the 2024 General Instructions for Certain Information Returns.

The Taxpayer First Act of 2019 authorized the Department of the Treasury and the IRS to issue regulations that reduce the 250-return e-file threshold. T.D. 9972, published February 23, 2023, lowered the e-file threshold to 10 (calculated by aggregating all information returns), effective for information returns required to be filed on or after January 1, 2024. Go to IRS.gov/InfoReturn for e-file options.

Since it is the provider who must comply, the question is: are you acting on behalf of 10 or more recipients?

They will possibly say to use a printer and submit it as a paper print, not as a recognised form. You can read the instructions and download a valid PDF at https://www.irs.gov/instructions/i1099sa#en_US_2024_publink100044357

@sshock
Author

sshock commented Feb 2, 2025

I just want to be able to view and maybe print these forms, and SumatraPDF has been my favorite PDF reader for many years now.

I'm not acting on behalf of anyone else. I only mention this form comes from a major HSA provider to indicate that likely many other SumatraPDF users will have trouble viewing their tax documents.

If SumatraPDF is able to view most Widget annotations, just not ones with an /AP appearance stream, why is that not something we want to look into fixing?

@GitHubRulesOK
Collaborator

GitHubRulesOK commented Feb 2, 2025

There is a lot more to the problems in that part of the standards.
As stated, XFA-sourced PDFs do not meet Adobe's published standard but serve Adobe's in-house commercial users.
When XFA is flattened to the "ISO public open standard", some AcroForm components (not all) may have been correctly edited (as you say, remove the non-working appearance and the fallback is seen). However, those are just the tip of an iceberg of related issues around rich text, which MuPDF generally does not support at present.

SumatraPDF cannot strictly "edit" an existing entry, but if it is a standard, valid "Comments" (not form) widget, it may be able to replace it with one that has a new appearance; hence it can move comment text boxes, etc. (but may degrade some that include Unicode).

@sshock
Author

sshock commented Feb 2, 2025

Interestingly, this document shows up fine in older versions of SumatraPDF (3.1.2 and earlier).

@sshock
Author

sshock commented Feb 3, 2025

It seems most of your hesitation about looking into this stems from believing this document uses XFA; however, I'm pretty confident it uses AcroForm.

You can see the document catalog has an /AcroForm object 44 0, which contains the /Fields array with all the field object references.

And of particular importance, the AcroForm dictionary contains this entry:

/NeedAppearances true

If I remove that (it defaults to false), all PDF readers exhibit the same problem as SumatraPDF.

The spec describes NeedAppearances as:

A flag specifying whether to construct appearance
streams and appearance dictionaries for all widget annotations
in the document (see 12.7.3.3, “Variable Text”).

It's not surprising the fields show up empty if NeedAppearances is false or missing (or unsupported), because these fields' appearance streams are practically empty: they contain little more than /Tx BMC EMC.
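To make the "practically empty" point concrete, here is a minimal C sketch of a hypothetical check (is_placeholder_appearance is not a MuPDF function) that reports whether a widget's content stream is nothing more than a /Tx BMC ... EMC marked-content wrapper. It assumes the stream has already been decompressed and only splits tokens on whitespace; it does not parse strings, comments, or other PDF syntax, so treat it as an illustration rather than a validator.

```c
#include <ctype.h>
#include <string.h>

/* Hypothetical helper, not part of MuPDF. Returns 1 if a decompressed
 * content stream is empty or consists only of "/Tx BMC EMC" (a marked-
 * content wrapper that paints nothing), and 0 otherwise. Tokens are split
 * on whitespace via isspace(); strings, comments and other PDF syntax are
 * deliberately not parsed. */
int is_placeholder_appearance(const char *stream)
{
    static const char *const expected[] = { "/Tx", "BMC", "EMC" };
    size_t ntok = 0;
    const char *p = stream;

    for (;;) {
        /* Skip whitespace between tokens. */
        while (*p && isspace((unsigned char)*p))
            p++;
        if (*p == '\0')
            break;

        /* Measure the current token. */
        const char *start = p;
        while (*p && !isspace((unsigned char)*p))
            p++;
        size_t len = (size_t)(p - start);

        /* A fourth token, or a token out of sequence, means the stream
         * is not the bare placeholder (e.g. it contains drawing operators). */
        if (ntok >= 3 || len != strlen(expected[ntok]) ||
            strncmp(start, expected[ntok], len) != 0)
            return 0;
        ntok++;
    }

    /* Accept an empty stream or the complete three-token wrapper. */
    return ntok == 0 || ntok == 3;
}
```

If the stored streams really are just /Tx BMC EMC, a check like this returns 1 for them, which matches why a viewer that only paints the stored appearance shows blank boxes; a stream that actually draws a value (for example one containing BT ... Tj ET) returns 0.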

So my conclusion is:

  1. This has nothing to do with XFA; it's an AcroForm.
  2. The AcroForm dictionary has /NeedAppearances true
  3. MuPDF currently lacks support for NeedAppearances.

@GitHubRulesOK
Collaborator

GitHubRulesOK commented Feb 3, 2025

OK, let's presume it counts as a regression from 3.1.2, when MuPDF behaviour was different.
I can reopen it on the basis that your own copy is a PDF derived from a crippled XFA form, but that does not mean MuPDF has to work with any such content.

I will tag it as a MuPDF difference, but it may end up as "won't fix".

@kjk over to you !

Image

The misbehaving form clearly works when saved out from Acrobat software such as Adobe Reader.

Image

@sshock
Author

sshock commented Feb 3, 2025

I see that NeedAppearances is deprecated in PDF 2.0.

Image

However, there are still a lot of %PDF-1.x documents out there, so we probably want to keep supporting it and fix this regression.

I have verified that the SumatraPDF 3.1.2 code supported this flag; maybe it was accidentally removed during the large amount of refactoring that has happened since then. Perhaps adding it back would be really easy.

@sshock
Author

sshock commented Feb 3, 2025

This patch fixes the regression, though I don't know if I did it in the appropriate way.

diff --git a/mupdf/include/mupdf/pdf/name-table.h b/mupdf/include/mupdf/pdf/name-table.h
index 598f58d87..8da932952 100644
--- a/mupdf/include/mupdf/pdf/name-table.h
+++ b/mupdf/include/mupdf/pdf/name-table.h
@@ -367,6 +367,7 @@ PDF_MAKE_NAME("N", N)
 PDF_MAKE_NAME("Name", Name)
 PDF_MAKE_NAME("Named", Named)
 PDF_MAKE_NAME("Names", Names)
+PDF_MAKE_NAME("NeedAppearances", NeedAppearances)
 PDF_MAKE_NAME("NewWindow", NewWindow)
 PDF_MAKE_NAME("Next", Next)
 PDF_MAKE_NAME("NextPage", NextPage)
diff --git a/mupdf/source/pdf/pdf-appearance.c b/mupdf/source/pdf/pdf-appearance.c
index ab075994a..d4c7c77ff 100644
--- a/mupdf/source/pdf/pdf-appearance.c
+++ b/mupdf/source/pdf/pdf-appearance.c
@@ -3559,6 +3559,13 @@ retry_after_repair:
 				local_synthesis = 1;
 		}
 
+		/* Need to reconstruct appearance streams on all widgets if NeedAppearances is true */
+		if (subtype == PDF_NAME(Widget))
+		{
+			if (ap_n && pdf_to_bool(ctx, pdf_dict_getl(ctx, pdf_trailer(ctx, annot->page->doc), PDF_NAME(Root), PDF_NAME(AcroForm), PDF_NAME(NeedAppearances), NULL)))
+				local_synthesis = 1;
+		}
+
 		/* We need to put this appearance stream back into the document. */
 		needs_resynth = pdf_annot_needs_resynthesis(ctx, annot);
 		if (needs_resynth)
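The condition this patch adds can be isolated as a pure function, which makes its truth table easy to check. This is a simplified, hypothetical model (widget_needs_local_synthesis is not a MuPDF name): in the patch, the three inputs correspond to subtype == PDF_NAME(Widget), a non-NULL ap_n, and the Root/AcroForm/NeedAppearances lookup done with pdf_dict_getl().

```c
#include <stdbool.h>

/* Hypothetical, simplified model of the check added by the patch above;
 * it is not MuPDF code and ignores every other branch of the function. */
bool widget_needs_local_synthesis(bool is_widget, bool has_normal_ap,
                                  bool need_appearances)
{
    /* The AcroForm /NeedAppearances flag only concerns widget annotations. */
    if (!is_widget)
        return false;

    /* Mirror the patch: the new branch fires only when an /AP /N stream
     * exists and the document asks for appearances to be regenerated. */
    return has_normal_ap && need_appearances;
}
```

Keeping the flag lookup separate from the per-widget decision, as here, would also make it straightforward to read NeedAppearances once per document rather than once per annotation, if that ever mattered for performance.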

@sshock
Author

sshock commented Feb 3, 2025

With the fix in place, all the fields show up and look great, except for the TRUSTEE company name and address, which has extra line spacing that shouldn't be there:

SumatraPDF 3.1.2 and all other PDF viewers display it correctly. I think the problem with current SumatraPDF is that it treats \r\n as two newlines instead of one.

Image

@sshock
Author

sshock commented Feb 3, 2025

For this other regression with the extra newlines, I can see that the old MuPDF code in SumatraPDF 3.1.2 had logic to treat \r as a newline only when it is not followed by \n, as seen in this code from pdf_append_line():

			if (*end == '\n' || *end == '\r' && *(end + 1) != '\n')
				break;

In contrast, looking at the current code, I see no such logic in the break_string() function, or in write_string_with_quadding(), which calls it.

I'm not exactly sure how to fix this one but if I get time I may take a stab at it...
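The \r\n rule from the old pdf_append_line() check can be illustrated with a small standalone C sketch. count_lines() and count_lines_naive() are hypothetical helpers, not MuPDF functions: the first treats \r\n, a lone \r, and a lone \n each as one line break, while the naive version counts every CR and every LF separately, mimicking the "\r\n as two newlines" behaviour described above.

```c
#include <stddef.h>

/* Illustrative sketch, not MuPDF code: count the lines in a field value,
 * treating "\r\n", a lone "\r", and a lone "\n" each as ONE line break. */
size_t count_lines(const char *s)
{
    size_t lines = 1;
    for (const char *p = s; *p; p++) {
        if (*p == '\r') {
            /* A CR immediately followed by LF is a single break: skip the LF. */
            if (p[1] == '\n')
                p++;
            lines++;
        } else if (*p == '\n') {
            lines++;
        }
    }
    return lines;
}

/* Naive variant: every CR and every LF starts a new line, so each "\r\n"
 * produces an extra, empty line. */
size_t count_lines_naive(const char *s)
{
    size_t lines = 1;
    for (const char *p = s; *p; p++)
        if (*p == '\r' || *p == '\n')
            lines++;
    return lines;
}
```

For a made-up value such as "ACME TRUST\r\n123 MAIN ST\r\nANYTOWN", count_lines() returns 3 while count_lines_naive() returns 5; the two extra lines are the empty ones that show up as additional spacing.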

@sshock
Author

sshock commented Feb 3, 2025

Here's a new patch that fixes both issues:

diff --git a/mupdf/include/mupdf/pdf/name-table.h b/mupdf/include/mupdf/pdf/name-table.h
index 598f58d87..a1b447a9f 100644
--- a/mupdf/include/mupdf/pdf/name-table.h
+++ b/mupdf/include/mupdf/pdf/name-table.h
@@ -367,6 +367,7 @@ PDF_MAKE_NAME("N", N)
 PDF_MAKE_NAME("Name", Name)
 PDF_MAKE_NAME("Named", Named)
 PDF_MAKE_NAME("Names", Names)
+PDF_MAKE_NAME("NeedAppearances", NeedAppearances)
 PDF_MAKE_NAME("NewWindow", NewWindow)
 PDF_MAKE_NAME("Next", Next)
 PDF_MAKE_NAME("NextPage", NextPage)
diff --git a/mupdf/source/pdf/pdf-appearance.c b/mupdf/source/pdf/pdf-appearance.c
index ab075994a..ea9482c5e 100644
--- a/mupdf/source/pdf/pdf-appearance.c
+++ b/mupdf/source/pdf/pdf-appearance.c
@@ -1906,6 +1906,9 @@ write_string_with_quadding(fz_context *ctx, fz_buffer *buf,
 				write_string(ctx, buf, lang, font, fontname, size, a, b-1);
 			else
 				write_string(ctx, buf, lang, font, fontname, size, a, b);
+			// If \r followed by \n, skip the \n; \r\n is a single newline not two.
+			if (b[-1] == '\r' && b[0] == '\n')
+				++b;
 			a = b;
 			px = x;
 		}
@@ -2043,6 +2046,9 @@ layout_string_with_quadding(fz_context *ctx, fz_layout_block *out,
 				layout_string(ctx, out, lang, font, size, xorig+x, y, a, b);
 				add_line_at_end = 0;
 			}
+			// If \r followed by \n, skip the \n; \r\n is a single newline not two.
+			if (b[-1] == '\r' && b[0] == '\n')
+				++b;
 			a = b;
 			y -= lineheight;
 		}
@@ -3559,6 +3565,13 @@ retry_after_repair:
 				local_synthesis = 1;
 		}
 
+		/* Need to reconstruct appearance streams on all widgets if NeedAppearances is true */
+		if (subtype == PDF_NAME(Widget))
+		{
+			if (ap_n && pdf_to_bool(ctx, pdf_dict_getl(ctx, pdf_trailer(ctx, annot->page->doc), PDF_NAME(Root), PDF_NAME(AcroForm), PDF_NAME(NeedAppearances), NULL)))
+				local_synthesis = 1;
+		}
+
 		/* We need to put this appearance stream back into the document. */
 		needs_resynth = pdf_annot_needs_resynthesis(ctx, annot);
 		if (needs_resynth)
