CSV header row is repeated when field is not matched #599

Open
memen45 opened this issue Jan 5, 2025 · 0 comments

~/my-venv/bin/invoice2data -t . i*.pdf --debug --input-reader pdftotext --output-format csv

I am processing multiple PDF files with the command above. For one of the input PDFs, some fields are not matched.

Actual Output

This results in a broken CSV in which the header row is repeated partway through the file:

issuer,amount,amount_tax,date,invoice_number,vat,partner_name,country_code,partner_coc,iban,bic,currency,desc
"Company B.V.",1.00,0.21,2024/01/31,1,XX123456789XXX,"Company B.V.",NL,12345678,"NLXX XXXX XXXX XXXX XX",XXXXNLXX,EUR,"Invoice from Company B.V."
"Company B.V.",1.00,0.21,2024/02/29,2,XX123456789XXX,"Company B.V.",NL,12345678,"NLXX XXXX XXXX XXXX XX",XXXXNLXX,EUR,"Invoice from Company B.V."
"Company B.V.",1.00,0.21,2024/03/31,3,XX123456789XXX,"Company B.V.",NL,12345678,"NLXX XXXX XXXX XXXX XX",XXXXNLXX,EUR,"Invoice from Company B.V."
"Company B.V.",1.00,0.21,2024/04/30,4,XX123456789XXX,"Company B.V.",NL,12345678,"NLXX XXXX XXXX XXXX XX",XXXXNLXX,EUR,"Invoice from Company B.V."
"Company B.V.",1.00,0.21,2024/05/31,5,XX123456789XXX,"Company B.V.",NL,12345678,"NLXX XXXX XXXX XXXX XX",XXXXNLXX,EUR,"Invoice from Company B.V."
"Company B.V.",1.00,0.21,2024/06/30,6,XX123456789XXX,"Company B.V.",NL,12345678,"NLXX XXXX XXXX XXXX XX",XXXXNLXX,EUR,"Invoice from Company B.V."
issuer,amount,amount_tax,date,invoice_number,vat,partner_name,country_code,partner_coc,currency,desc,,
"Company B.V.",1.00,0.21,2024/07/31,7,XX123456789XXX,"Company B.V.",NL,12345678,EUR,"Invoice from Company B.V.",,
issuer,amount,amount_tax,date,invoice_number,vat,partner_name,country_code,partner_coc,iban,bic,currency,desc
"Company B.V.",1.00,0.21,2024/08/31,8,XX123456789XXX,"Company B.V.",NL,12345678,"NLXX XXXX XXXX XXXX XX",XXXXNLXX,EUR,"Invoice from Company B.V."
"Company B.V.",1.00,0.21,2024/09/30,9,XX123456789XXX,"Company B.V.",NL,12345678,"NLXX XXXX XXXX XXXX XX",XXXXNLXX,EUR,"Invoice from Company B.V."
"Company B.V.",1.00,0.21,2024/10/31,10,XX123456789XXX,"Company B.V.",NL,12345678,"NLXX XXXX XXXX XXXX XX",XXXXNLXX,EUR,"Invoice from Company B.V."
"Company B.V.",1.00,0.21,2024/11/30,11,XX123456789XXX,"Company B.V.",NL,12345678,"NLXX XXXX XXXX XXXX XX",XXXXNLXX,EUR,"Invoice from Company B.V."
"Company B.V.",1.00,0.21,2024/12/31,12,XX123456789XXX,"Company B.V.",NL,12345678,"NLXX XXXX XXXX XXXX XX",XXXXNLXX,EUR,"Invoice from Company B.V."

This makes the exported CSV file unusable for import into other software, as it is not a valid CSV document: the header appears more than once and the column order changes between rows.
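
As a stopgap, a file like the one above can be read back into uniform records after the fact. This is only a sketch of a workaround, not part of invoice2data: the function name `parse_repeated_headers` is hypothetical, and it assumes every header row starts with the literal cell `issuer` and that no data row does.

```python
import csv
import io


def parse_repeated_headers(text, header_marker="issuer"):
    """Turn CSV text with repeated header rows into a list of dicts.

    Assumption: a row is a header if its first cell equals header_marker.
    """
    records = []
    header = None
    for row in csv.reader(io.StringIO(text)):
        if not row:
            continue  # skip blank lines
        if row[0] == header_marker:
            header = row  # a new header applies to the rows that follow
            continue
        if header is None:
            raise ValueError("data row found before any header row")
        # Pair each value with its column name; drop the empty padding
        # columns (the trailing ",," seen in the output above).
        records.append({name: value for name, value in zip(header, row) if name})
    return records
```

Each record then only contains the fields that were matched for that invoice, so missing fields such as `iban` and `bic` are simply absent from the dict rather than shifted into the wrong column.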

Expected Output

I would expect a single header row at the top, with the non-critical fields left empty for invoices where they could not be matched, as shown below:

issuer,amount,amount_tax,date,invoice_number,vat,partner_name,country_code,partner_coc,iban,bic,currency,desc
"Company B.V.",1.00,0.21,2024/01/31,1,XX123456789XXX,"Company B.V.",NL,12345678,"NLXX XXXX XXXX XXXX XX",XXXXNLXX,EUR,"Invoice from Company B.V."
"Company B.V.",1.00,0.21,2024/02/29,2,XX123456789XXX,"Company B.V.",NL,12345678,"NLXX XXXX XXXX XXXX XX",XXXXNLXX,EUR,"Invoice from Company B.V."
"Company B.V.",1.00,0.21,2024/03/31,3,XX123456789XXX,"Company B.V.",NL,12345678,"NLXX XXXX XXXX XXXX XX",XXXXNLXX,EUR,"Invoice from Company B.V."
"Company B.V.",1.00,0.21,2024/04/30,4,XX123456789XXX,"Company B.V.",NL,12345678,"NLXX XXXX XXXX XXXX XX",XXXXNLXX,EUR,"Invoice from Company B.V."
"Company B.V.",1.00,0.21,2024/05/31,5,XX123456789XXX,"Company B.V.",NL,12345678,"NLXX XXXX XXXX XXXX XX",XXXXNLXX,EUR,"Invoice from Company B.V."
"Company B.V.",1.00,0.21,2024/06/30,6,XX123456789XXX,"Company B.V.",NL,12345678,"NLXX XXXX XXXX XXXX XX",XXXXNLXX,EUR,"Invoice from Company B.V."
"Company B.V.",1.00,0.21,2024/07/31,7,XX123456789XXX,"Company B.V.",NL,12345678,,,EUR,"Invoice from Company B.V."
"Company B.V.",1.00,0.21,2024/08/31,8,XX123456789XXX,"Company B.V.",NL,12345678,"NLXX XXXX XXXX XXXX XX",XXXXNLXX,EUR,"Invoice from Company B.V."
"Company B.V.",1.00,0.21,2024/09/30,9,XX123456789XXX,"Company B.V.",NL,12345678,"NLXX XXXX XXXX XXXX XX",XXXXNLXX,EUR,"Invoice from Company B.V."
"Company B.V.",1.00,0.21,2024/10/31,10,XX123456789XXX,"Company B.V.",NL,12345678,"NLXX XXXX XXXX XXXX XX",XXXXNLXX,EUR,"Invoice from Company B.V."
"Company B.V.",1.00,0.21,2024/11/30,11,XX123456789XXX,"Company B.V.",NL,12345678,"NLXX XXXX XXXX XXXX XX",XXXXNLXX,EUR,"Invoice from Company B.V."
"Company B.V.",1.00,0.21,2024/12/31,12,XX123456789XXX,"Company B.V.",NL,12345678,"NLXX XXXX XXXX XXXX XX",XXXXNLXX,EUR,"Invoice from Company B.V."
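
For illustration, the expected behavior can be sketched with Python's standard `csv.DictWriter`. This is not invoice2data's actual output code; the function name `write_rows` and the input shape (one dict per invoice) are assumptions. The idea is to collect the union of all field names first, write the header once, and let `restval` fill in the fields an invoice does not have.

```python
import csv
import io


def write_rows(rows, out):
    """Write dicts with differing keys as one CSV with a single header."""
    # Build the union of all keys, preserving first-seen order.
    fieldnames = []
    for row in rows:
        for key in row:
            if key not in fieldnames:
                fieldnames.append(key)

    # restval="" leaves a cell empty when a row lacks that field.
    writer = csv.DictWriter(out, fieldnames=fieldnames, restval="",
                            lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return fieldnames


if __name__ == "__main__":
    rows = [
        {"issuer": "Company B.V.", "iban": "NLXX XXXX", "bic": "XXXXNLXX", "currency": "EUR"},
        {"issuer": "Company B.V.", "currency": "EUR"},  # iban and bic not matched
    ]
    buf = io.StringIO()
    write_rows(rows, buf)
    print(buf.getvalue())
```

Note that the column order follows the first invoice in which each field appears, so a field that is missing from the first invoice would end up further to the right.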

Is there a setting that avoids this, or is this a bug that should be fixed?
