Feature requests and feedback from a user's perspective #88

brianwanderson · 2020-12-05T18:28:25Z

brianwanderson
Dec 5, 2020

Hi Jonas,

I'm writing this post to provide my feedback on paperless-ng.

First, a little background: I consider myself a database guru, but a mere enthusiast/tinkerer when it comes to coding Python. I used Mayan (https://mayan-edms.com) to organize my home office documents for many years, but found it to be too cumbersome and overly complex for my needs. One thing it did do, however, is force me to think about how to organize my scanned documents in a way that makes sense to me. I've recently pulled all of my documents (~1500) out of Mayan and now store them in a simple file system hierarchy. I went searching for a new tool and found my way to your project, and all I can say is, thank you. Great work on a great project. It checks all the right boxes for me: keeps it simple, OCRs documents and provides robust searching, stores the documents in the file system in a hierarchical way of my choosing, uses SQLite for simple backups or PostgreSQL (yeah!!) for larger setups, runs in a browser, is written in Python, and has an intuitive GUI.

I'm following the development with great interest. For what it's worth, here is my list of feature requests and, perhaps, points of discussion:

1. Add a placeholder for document type to PAPERLESS_FILENAME_FORMAT

Such that if document types = Auto, Finance-Banking, Insurance-Health
and

PAPERLESS_FILENAME_FORMAT={document_type}/{correspondent}/{title}

I end up with something like:

Auto/
  /Joes_Auto_Repair
    2018.12.01-car_repair_bill-0000001.pdf
  /My_Local_Tire_Shop
    2020.08.01-new_tires_invoice-0000002.pdf
Finance-Banking/
  my_bank/
    2019.07.31-statement-0000003.pdf
Insurance-Health/
  my_insurance_company
    2020.08.31-benefits_statement-0000004.pdf
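To illustrate the request, here is a minimal Python sketch of how a format string with a {document_type} placeholder could expand into a relative path. The function name `render_path` and the plain `str.format` call are assumptions made for this example, not paperless-ng's actual implementation.

```python
from pathlib import PurePosixPath

# Hypothetical format string, mirroring the setting requested above.
FILENAME_FORMAT = "{document_type}/{correspondent}/{title}"


def render_path(document_type: str, correspondent: str, title: str,
                suffix: str = ".pdf") -> PurePosixPath:
    """Expand the placeholders and append the file extension."""
    relative = FILENAME_FORMAT.format(
        document_type=document_type,
        correspondent=correspondent,
        title=title,
    )
    return PurePosixPath(relative + suffix)


print(render_path("Auto", "Joes_Auto_Repair", "car_repair_bill"))
# Auto/Joes_Auto_Repair/car_repair_bill.pdf
```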

2. Don't alter the filename.

Currently, when the file is stored in the media directory, the filename is changed to all lower case, no spaces, and the document ID is appended on the end. For example, if I were to do the following:

import: "2020.12.01 Bank of America - Statement.pdf"
set the correspondent to "Bank of America"
set:

PAPERLESS_FILENAME_FORMAT={correspondent}/{title}

the file gets stored in the media directory as:

/path-to-media/documents/originals/bank-of-america/20201201-bank-of-america-statement-0000056.pdf

But, I would like it to be:

/path-to-media/documents/originals/Bank of America/2020.12.01 Bank of America - Statement.pdf

I would argue the following:

It's the database's job to uniquely identify a file (its ID number in this case), and to keep track of where the file is located in the file system. Removing spaces and capital letters and appending the document ID to the end of the filename is unnecessary and inconvenient to the end-user.
Having duplicate titles/filenames would be an error on my part that I want to know about. Trying to name two documents with the same title/filename should raise an error message that alerts the user with something like "You already have a document titled duplicate_title.pdf". Or, at the very least, append "-01", "-02", etc., to duplicated filenames and alert the user or put a note in the log.

3. Add one hierarchical grouping (call it category?) above document_type

Perhaps this is not worth the extra complexity, and I'm also curious to see how your idea of nested tags turns out which may be more elegant. But, I would like to have something like:

categories = Insurance, Banking, Credit Cards
document_types = Invoices, Statements, Receipts, Letters

PAPERLESS_FILENAME_FORMAT={category}/{correspondent}/{document_type}/{title}

And end up with:

Insurance
  /My Insurance Company
    /Letters
      /2020.12.01 A Letter From My Insurance Company.pdf
Banking/
  /My Bank
    /Statements
      /2020.12.31 December Bank Statement.pdf
Credit Cards
  /My Credit Card Company
    /Statements
      /2020.12.31 December Credit Card Statement.pdf

4. Perhaps store the document's metadata in the file itself

I like to think of a document viewer the same way I think about music players/organizers such as Rhythmbox, Amarok, and iTunes. I keep my digital music files organized in my computer's filesystem, but I use a music playing app to help me organize and play my music. Why not store all of a document's metadata in the document itself in the same way that MP3 music files store ID3 tags. Could this be done with XMP tagging (https://en.wikipedia.org/wiki/Extensible_Metadata_Platform)?

5. One final note

I'm running paperless-ng 0.9.4 in a FreeBSD jail with a "bare metal" install. Aside from some inotify speed bumps, everything runs fine.

jonaswinkler · 2020-12-05T19:29:53Z

jonaswinkler
Dec 5, 2020
Maintainer

Hello!

Thanks for giving such detailed feedback. This is the stuff that keeps projects like this going.

First, a little background: I consider myself a database guru, but a mere enthusiast/tinkerer when it comes to coding Python. I used Mayan (https://mayan-edms.com) to organize my home office documents for many years, but found it to be too cumbersome and overly complex for my needs. One thing it did do, however, is force me to think about how to organize my scanned documents in a way that makes sense to me. I've recently pulled all of my documents (~1500) out of Mayan and now store them in a simple file system hierarchy. I went searching for a new tool and found my way to your project, and all I can say is, thank you. Great work on a great project. It checks all the right boxes for me: keeps it simple, OCRs documents and provides robust searching, stores the documents in the file system in a hierarchical way of my choosing, uses SQLite for simple backups or PostgreSQL (yeah!!) for larger setups, runs in a browser, is written in Python, and has an intuitive GUI.

Just remember that half of the credit goes to the original project. :) For example, the entire filename format logic was done by someone else, I just reworked the implementation a bit to make it more maintainable and fail-safe.

1. Add a placeholder for document type to PAPERLESS_FILENAME_FORMAT

Sure, that's easy. I put it on the list.

2. Don't alter the filename.

It's the database's job to uniquely identify a file (its ID number in this case), and to keep track of where the file is located in the file system. Removing spaces and capital letters and appending the document ID to the end of the filename is unnecessary and inconvenient to the end-user.

That's partially true. Right now, the title field allows characters that are not valid in file names. When I change this behavior, I have to take that into consideration. Some users may use such titles and renaming would fail.

The current logic makes url-safe filenames. That's not necessarily required, but the implementation is simple and bomb proof. I'll consider using less restrictive conversions, that just remove problematic characters or replace them with dashes, or something.
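As a rough sketch of what a "less restrictive conversion" might look like, the following Python function replaces only characters that are invalid on at least one common operating system and leaves case and spaces alone. The name `sanitize_title`, the exact character set, and the "untitled" fallback are assumptions for illustration, not what paperless-ng does.

```python
import re

# Characters that are invalid in file names on at least one common OS
# (path separators, Windows-reserved characters, control characters).
_PROBLEMATIC = re.compile(r'[\\/:*?"<>|\x00-\x1f]')


def sanitize_title(title: str) -> str:
    """Replace problematic characters with dashes, keeping case and spaces."""
    cleaned = _PROBLEMATIC.sub("-", title).strip(" .")
    # Fall back to a placeholder so the result is never an empty name.
    return cleaned or "untitled"


print(sanitize_title("2020.12.01 Bank of America - Statement"))
# 2020.12.01 Bank of America - Statement
print(sanitize_title("Invoice 12/2020: final?"))
# Invoice 12-2020- final-
```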

Having duplicate titles/filenames would be an error on my part that I want to know about. Trying to name two documents with the same title/filename should raise an error message that alerts the user with something like "You already have a document titled duplicate_title.pdf". Or, at the very least, append "-01", "-02", etc., to duplicated filenames and alert the user or put a note in the log.

There's a valid reason to have multiple documents with the same title, and putting ids in the title is the safest way of dealing with that. I've got two documents titled "December 2020", both typed "Bank statement", one tagged "account1", the other "account2". I agree it's inconvenient. "-01", "-02" seems like a good idea, I'll keep that in mind. There won't be any errors about duplicate titles, though.

3. Add one hierarchical grouping (call it category?) above document_type

Perhaps this is not worth the extra complexity, and I'm also curious to see how your idea of nested tags turns out which may be more elegant. But, I would like to have something like:

I don't see myself adding a new entity type anytime soon. However, we could use tags to specify a "category" for each document type, which you would then be able to reference in the filename format. Or maybe just provide a plain text field for that. I'll think about that once I tackle this hierarchical tags thing. I also thought about scratching types entirely, since they're just a more specific form of tags, but it seems they are rather useful.

4. Perhaps store the document's metadata in the file itself

I like to think of a document viewer the same way I think about music players/organizers such as Rhythmbox, Amarok, and iTunes. I keep my digital music files organized in my computer's filesystem, but I use a music playing app to help me organize and play my music. Why not store all of a document's metadata in the document itself in the same way that MP3 music files store ID3 tags. Could this be done with XMP tagging (https://en.wikipedia.org/wiki/Extensible_Metadata_Platform)?

I'd rather not modify the original files at all. Paperless even has checksums in place that make sure they stay the same. I don't want to cause any data loss in case some library decides to wreak havoc. That's also the reason why I decided to keep originals and OCR-enhanced files next to each other with the new update.
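The checksum idea can be sketched in a few lines of Python. This is only an illustration under assumptions: the function names, the choice of SHA-256, and the idea of a digest "recorded at import time" are hypothetical and may not match how paperless-ng stores or verifies its checksums.

```python
import hashlib
from pathlib import Path


def file_checksum(path: Path, algorithm: str = "sha256") -> str:
    """Return the hex digest of a file, read in chunks to bound memory use."""
    digest = hashlib.new(algorithm)
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_unmodified(path: Path, stored_checksum: str) -> bool:
    """Compare the current digest with the one recorded at import time."""
    return file_checksum(path) == stored_checksum
```

Any library that rewrote the original file, for example to embed XMP metadata, would change the digest and make `is_unmodified` return False.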

Also, we'd need the database anyway for searching and serving data to the front end in a reasonable time. Fetching that data by reading hundreds of files when the user searches for something just won't cut it.

5. One final note

I'm running paperless-ng 0.9.4 in a FreeBSD jail with a "bare metal" install. Aside from some inotify speed bumps, everything runs fine.

Glad to hear that!

Honestly, when I started working on this, I thought about removing this filename feature altogether, since it was giving me lots of headaches and I thought that users don't care anyway about how the files are stored on disk, but it seems I was wrong.

0 replies

brianwanderson · 2020-12-07T04:52:14Z

brianwanderson
Dec 7, 2020
Author

Just remember that half of the credit goes to the original project. :) For example, the entire filename format logic was done by someone else, I just reworked the implementation a bit to make it more maintainable and fail-safe.

Yes, credit to Daniel Quinn, the original author of paperless and all who contributed to that project. But, don't sell yourself short. You've put in a lot of work.

1. Add a placeholder for document type to PAPERLESS_FILENAME_FORMAT

Sure, that's easy. I put it on the list.

Great. I was hoping you would say that. I looked over the code, and it seemed like it might be an easy fix.

2. Don't alter the filename.

The current logic makes url-safe filenames. That's not necessarily required, but the implementation is simple and bomb proof. I'll consider using less restrictive conversions, that just remove problematic characters or replace them with dashes, or something.

Fair enough. But I still feel that it would be much simpler and more elegant to leverage the file system and simply have the name of the file be the same as the document title, and either restrict illegal characters or raise an alert box when the user tries to use one. Although, I have to admit, I'm not well versed in how Python deals with illegal characters in file names across the operating systems and file systems the code might be running on.

(As a side rant: If I were the absolute ruler of all things computer, the first thing I would do is have a dedicated path separator key on the keyboard that was not a printed character. The fact that /, \, and : serve as both printed characters and directory separators is one of the most unfortunate outcomes of history.)

There's a valid reason to have multiple documents with the same title, and putting ids in the title is the safest way of dealing with that. I've got two documents titled "December 2020", both typed "Bank statement", one tagged "account1", the other "account2".

I've been putting a lot of thought into this--trying to see things from both sides. And, I'm afraid I don't agree. I think you are making things harder on yourself than they need to be. I find it inadvisable to design a system that allows both:

A

% ls  media/documents/originals
December 2020 - 0000001.pdf
December 2020 - 0000002.pdf

such that
file: December 2020 - 0000001.pdf has

title attribute set to December 2020
tagged as account 1

and file: December 2020 - 0000002.pdf has

title attribute set to December 2020
tagged as account 2

and B

PAPERLESS_FILENAME_FORMAT= {placeholder}/{another_placeholder}/{title}

I think that if you want condition A, you shouldn't allow the config setting in B.

One of the main features of both paperless and paperless-ng that I like is the ability to store my documents in a directory structure of my choosing with file names of my choosing. I've tried other document management systems that put all of the document files in one giant directory and managed all the file names. I ended up with a directory filled with thousands of PDF files with names like: "ffe24e1f-a5d8-40a6-bb64-708fb8e078d9". Yuk!

When I go look at my files, I would much rather see something like:

PAPERLESS_FILENAME_FORMAT= {document_type}/{title}
% ls media/documents/originals
Bank Statement/December 2020 - account 1.pdf
Bank Statement/December 2020 - account 2.pdf

or, alternatively

PAPERLESS_FILENAME_FORMAT= {document_type}/{tag[0]}/{title}
% ls media/documents/originals
Bank Statement/account 1/December 2020.pdf
Bank Statement/account 2/December 2020.pdf

than

PAPERLESS_FILENAME_FORMAT= {document_type}/{title}
% ls media/documents/originals
Bank Statements/December 2020-0000059.pdf
Bank Statements/December 2020-0000060.pdf

or even

Bank Statements/December 2020.pdf
Bank Statements/December 2020-01.pdf

side note:
I've found that trying to make directories from tags is clumsy at best.

3. Add one hierarchical grouping (call it category?) above document_type

I don't see myself adding a new entity type anytime soon.

Fair enough. That would probably be a heavy lift and would perhaps tip it over the "don't over complicate things" edge.

I also thought about scratching types entirely, since they're just a more specific form of tags, but it seems they are rather useful.

I think date, correspondent, and document type are the three core attributes and warrant their own special top-level type treatment as currently implemented.

4. Perhaps store the document's metadata in the file itself

I'd rather not modify the original files at all.

Agreed. That would be a big bite to chew. I tried doing a little more research on the subject of writing XMP tags (https://en.wikipedia.org/wiki/Extensible_Metadata_Platform, https://exiftool.org). Ugh, what a thorny mess. I still think the notion of storing a document's metadata within the file itself is the best way to do things. Unfortunately, I don't think anyone has come up with any good standard way of doing it.

0 replies

jonaswinkler · 2020-12-07T23:57:15Z

jonaswinkler
Dec 7, 2020
Maintainer

I've been putting a lot of thought into this--trying to see things from both sides. And, I'm afraid I don't agree. I think you are making things harder on yourself than they need to be.

Probably. The thing is, paperless allowed duplicate titles in the past, and therefore, some users will have documents with duplicate titles and I need to take that into consideration. I really like the idea of having _01, _02 at the end of the file name in case that happens and will look into getting that into the code. No impact for users who are not in that situation, and a working solution for users who are.
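A minimal Python sketch of the _01, _02 idea, assuming uniqueness is decided by checking the file system. The helper name `unique_path` is hypothetical, and the real implementation may well consult the database instead.

```python
from pathlib import Path


def unique_path(target: Path) -> Path:
    """Return target, or target with _01, _02, ... appended if it is taken."""
    if not target.exists():
        return target
    counter = 1
    while True:
        candidate = target.with_name(f"{target.stem}_{counter:02d}{target.suffix}")
        if not candidate.exists():
            return candidate
        counter += 1
```

Users without duplicate titles never reach the loop, so their file names stay exactly as the format string produces them.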

side note:
I've found that trying to make directories from tags is clumsy at best.

I'm open to any recommendations on how to make it better. That part of the code is still in there from original paperless. The idea I'm working on right now looks somewhat like this:

Placeholder {tags} always returns an alphabetically sorted, comma separated list of tags assigned to the document, which might be empty.
Placeholder {tags_folder} would return either 'none' for no tags, the name of a single assigned tag, or 'multiple' for multiple tags.
Placeholder {tags_path} would return a path where each assigned tag is a folder.

Not optimal, since tags don't translate well into folders. There's also the idea of having hierarchical tags over at #56, maybe we can work out something with that.
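The three placeholders described above could be sketched in Python roughly as follows. The function names are hypothetical, and the alphabetical ordering of folders in {tags_path} is an assumption, since the description does not say how the tags would be ordered.

```python
from typing import Sequence


def tags_placeholder(tags: Sequence[str]) -> str:
    """{tags}: alphabetically sorted, comma-separated; empty if no tags."""
    return ",".join(sorted(tags))


def tags_folder_placeholder(tags: Sequence[str]) -> str:
    """{tags_folder}: 'none', the single tag name, or 'multiple'."""
    if not tags:
        return "none"
    if len(tags) == 1:
        return tags[0]
    return "multiple"


def tags_path_placeholder(tags: Sequence[str]) -> str:
    """{tags_path}: one folder level per tag (assumed alphabetical order)."""
    return "/".join(sorted(tags))
```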

0 replies

jonaswinkler · 2020-12-10T01:38:40Z

jonaswinkler
Dec 10, 2020
Maintainer

I believe I adjusted most of the issues with the filenames now, except for the tags.

0 replies

brianwanderson · 2020-12-10T22:23:23Z

brianwanderson
Dec 10, 2020
Author

The thing is, paperless allowed duplicate titles in the past, and therefore, some users will have documents with duplicate titles and I need to take that into consideration. I really like the idea of having _01, _02 at the end of the file name in case that happens and will look into getting that into the code. No impact for users who are not in that situation, and a working solution for users who are.

Yes, I see your point. I agree that the _01, _02 is the best work-around for the few times the problem of duplicate titles comes up.

I believe I adjusted most of the issues with the filenames now, except for the tags.

I got version 0.9.6 up and running. Things are looking really, really nice. I really like how my file names don't change and that I can organize my files into a directory structure of my choosing with

PAPERLESS_FILENAME_FORMAT= {document_type}/{correspondent}/{title}

I also really like the Details, Content, and Metadata tabs on the edit document page. Great work.

Regarding tags:

I'm open to any recommendations on how to make it better.

This is a tough one. Obviously tags don't translate into file system directories because a file can have multiple tags and the file system is "flat". For me, {document_type}/{correspondent}/{title} is good enough, and I don't bother with tags in my file system directory structure.

New topic: Ability to delete the original and keep the archive version

I do have one other suggestion--not sure if I should start a new thread on a new issue ticket, but since this is already in the "discussion" section here it is:

I would like to have the ability to delete the original version of the document if I'm happy with the archive version that ocrmypdf produces. I don't want to have to save two versions of the same document. My workaround is to run ocrmypdf on the file and verify the results prior to importing it into paperless-ng and setting

PAPERLESS_OCR_MODE=skip_noarchive

0 replies

jonaswinkler · 2020-12-10T23:21:50Z

jonaswinkler
Dec 10, 2020
Maintainer

Awesome. Thank you for your feedback.

I do have one other suggestion--not sure if I should start a new thread on a new issue ticket, but since this is already in the "discussion" section here it is:

This is alright. GitHub just suggested that I enable this, and I feel it's a good place for things that aren't exactly tasks that fit into tickets.

I would like to have the ability to delete the original version of the document if I'm happy with the archive version that ocrmypdf produces. I don't want to have to save two versions of the same document. My workaround is to run ocrmypdf on the file and verify the results prior to importing it into paperless-ng and setting

I see. I already figured people would ask for that. Am I correct in assuming that you'd still want paperless to keep the original for each document you uploaded until you decide that the OCR'ed version is alright? I see two options:

1. Offer some form of UI menu option to delete the original, if archived. I am working on bulk editing documents right now, and we could also have an option to do this quickly for many selected documents.
   - When deleting originals, I'm considering moving the archived file back into the originals folder and using the database to keep track of what's archived and what is not. This would result in all documents residing in a single folder, instead of multiple.
   - Edit: When continuing that thought, maybe it wasn't such a good idea to have two separate folders in the first place.
2. Directly store the archived version instead of the original. I suppose this is not what you want. I also don't fully trust this PDF library.

0 replies

brianwanderson · 2020-12-12T15:57:57Z

brianwanderson
Dec 12, 2020
Author

Regarding tags and folders

After some more thought--the following makes sense to me:

Simply concatenate alphabetically all of the file's tags together into a directory name such that:

file0-with_no_tags.pdf
- (empty)
file1-with_one_tag.pdf
- tag1
file2-with_two_tags.pdf
- tag1
- tag2
file3-with_three_tags.pdf
- tag1
- tag2
- tag3

and:

PAPERLESS_FILENAME_FORMAT= {tags}/{title}

results in the following directory structure:

media/documents/archive|original/
    NONE/
        file0-with_no_tags.pdf
    tag1/
        file1-with_one_tag.pdf
    tag1-tag2/
        file2-with_two_tags.pdf
    tag1-tag2-tag3/
        file3-with_three_tags.pdf
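In Python, this proposal comes down to a sort and a join. The sketch below assumes a dash as the separator and NONE as the fallback, as in the listing above; the function name `tags_directory` is hypothetical.

```python
from typing import Iterable


def tags_directory(tags: Iterable[str]) -> str:
    """Join tags alphabetically with dashes; use NONE when there are no tags."""
    return "-".join(sorted(tags)) or "NONE"


for tags in ([], ["tag1"], ["tag2", "tag1"], ["tag3", "tag1", "tag2"]):
    print(tags_directory(tags))
# NONE
# tag1
# tag1-tag2
# tag1-tag2-tag3
```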

Original vs. Archived document version

Am I correct in assuming that you'd still want paperless to keep the original for each document you uploaded until you decide that the OCR'ed version is alright?

For my use case, I only want to keep one version of any given document. The document should be a PDF/A with an OCR text layer. I'm not necessarily interested in paperless-ng converting my PDFs for me. I am content to perform all of the document preparation before uploading to paperless-ng. So my work flow is the following:

1. Scan documents.
   - I use a good quality scanner with an automatic document feeder (Fujitsu fi-6230)
   - I like gscan2pdf (http://gscan2pdf.sourceforge.net/) because it allows me to rearrange pages easily before saving to PDF
2. Run resulting PDF through ocrmypdf to generate an OCR'd PDF/A file
3. Visually verify resulting PDF/A. This is my final document. Everything else gets deleted.
4. Upload my final PDF/A to paperless-ng (or similar) to help me tag, add metadata, search, and organize my documents. If paperless-ng blows up, or goes away, or whatever, I want to have my documents safely stored on my file system with at least a minimal organized structure that is simple to back-up such as:

document_type/correspondent/YYYY-MM-DD-document.pdf

This is not to say that incorporating PDF/A conversion into paperless-ng isn't a good idea--others may find it much more useful than I do. I think it's a matter of how much of steps 1-3 you want to add. With your current setup, I think I would like to see a side-by-side comparison view that would allow me to visually inspect the "archived" and "original" versions before being confident about deleting the "original". This might not be so bad when uploading one file at a time, but might be challenging if there are multiple files to go through. Perhaps present the user with a list of documents with two versions available and let the user choose to keep the original, delete the original, or retry the OCR step on the original with different settings, on a case-by-case basis.

I'm considering moving the archived file back into the originals folder and using the database to keep track of what's archived and what is not. This would result in all documents residing in a single folder, instead of multiple.

I wouldn't. I would keep the final PDF/A document in a separate directory from files that have yet to be OCR'd and converted to PDF/A's as currently implemented. I always want to be able to simply look at the file system and see what is going on.

1 reply

jonaswinkler Dec 12, 2020
Maintainer

Regarding tags and folders

This seems easy enough. I'll have that added soon.

Original vs. Archived document version

Well, it seems you've found a workflow that works out for you. gscan2pdf is nice, I use that as well on Linux.

I'll keep your comments about comparing originals and archived versions in mind. Someone else also commented about the ability to compare both versions. I might do something like that in the future.

When I implemented the mechanics regarding originals and archived documents in this way, I was primarily concerned about users who don't want their original documents touched/messed with in any way as well as users who just want paperless to do the OCR steps all by itself, without caring too much about the details. This is the best I was able to come up with.

alexgrahamuk · 2021-01-08T15:39:12Z

alexgrahamuk
Jan 8, 2021

I'm just starting to use paperless-ng, and I'd like to tag the document type against two users and then use the correspondent as the inciter of the correspondence. My scanner has a number of shortcuts I've set up for this, but obviously it isn't picking up on the naming convention User-Correspondent* == {document_type}-{correspondent}-. Is there going to be a patch available soon, or should I just hack something in and make a pull request?

2 replies

jonaswinkler Jan 8, 2021
Maintainer

I'm not exactly sure I understand what you're trying to do. Are you saying that your files are named something like documenttype-correspondent-*.pdf and you'd like paperless to pick these up?

If so, we've had something like this in paperless before, but more often than not, it would not pick up correspondent names and tags in the intended way, and as a result would fill the DB with lots of incorrect information.

If not, please elaborate.
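As an illustration of how filename-based detection can go wrong, here is a small Python sketch. The pattern and the function name `parse_filename` are hypothetical and are not taken from the original paperless code; the example only shows one plausible failure mode, a hyphen inside a field.

```python
import re

# Hypothetical pattern: <document type>-<correspondent>-<rest>.pdf
PATTERN = re.compile(
    r"^(?P<document_type>[^-]+)-(?P<correspondent>[^-]+)-(?P<title>.+)\.pdf$"
)


def parse_filename(name: str):
    """Return the guessed fields, or None if the name does not match."""
    match = PATTERN.match(name)
    return match.groupdict() if match else None


print(parse_filename("Invoice-My Bank-December 2020.pdf"))
# {'document_type': 'Invoice', 'correspondent': 'My Bank', 'title': 'December 2020'}

# A hyphen inside a field shifts every later field:
print(parse_filename("Finance-Banking-my_bank-statement.pdf"))
# {'document_type': 'Finance', 'correspondent': 'Banking', 'title': 'my_bank-statement'}
```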

Regarding making a PR: please don't just hack something into paperless. Consider other users as well. Both the users that don't want to use that feature and users who want to use it, but not in the exact way you've specified (maybe they want a different format).

This is also not related to this discussion, so please make a new discussion or issue.

alexgrahamuk Feb 8, 2021

Thanks Jonas

Yes that's basically what I needed it to do to be able to quickly filter between two users without needing full multi user support.

I think I've found a reliable (not ideal) way to handle this in the interim so I can start getting everything scanned in.

Thanks again to yourself and other contributors for a very useful software product.

anthosz · 2021-07-30T19:32:53Z

anthosz
Jul 30, 2021

Hello,

I just discovered paperless-ng, so intuitive compared to Mayan-EDMS (work in progress to move all documents from Mayan to paperless-ng).

What is the status of these feature requests?

Following the last comments, a small suggestion: add something like a group/tree function.

Example:
file1.pdf uploaded on 1st February 2021 and related to "marketing" (keyword/tag/..)
file2.pdf uploaded on 10th February 2021 and related to "marketing" (keyword/tag/..)
The goal is to create multiple views (tag/etc..) in one tree:

/ marketing (keyword)
  year (uploaded by example) -> 2021
    month (uploaded by example) -> February (or 02)
     day (uploaded by example) -> 01
       file1.pdf
     day (uploaded by example) -> 10
       file2.pdf
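A tree like this could be derived from a keyword and an upload date. The Python sketch below is only an illustration; `tree_path` and its parameters are hypothetical names, not an existing paperless-ng feature.

```python
from datetime import date
from pathlib import PurePosixPath


def tree_path(keyword: str, uploaded: date, filename: str) -> PurePosixPath:
    """Build keyword/year/month/day/filename from the upload date."""
    return PurePosixPath(keyword, f"{uploaded:%Y}", f"{uploaded:%m}",
                         f"{uploaded:%d}", filename)


print(tree_path("marketing", date(2021, 2, 1), "file1.pdf"))
# marketing/2021/02/01/file1.pdf
```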

In any case, thank you for the work accomplished :)

0 replies

svenjott · 2021-09-03T13:43:35Z

svenjott
Sep 3, 2021

Just hijacking this thread on your final note:

5. One final note

I'm running paperless-ng 0.9.4 in a FreeBSD jail with a "bare metal" install. Aside from some inotify speed bumps, everything runs fine.

Do you mind sharing your post-install steps? I managed to install all requirements (though scipy and numpy were impossible via pip for me, I ended up using them from the FreeBSD latest repo and changed the version in the requirements.txt - but that's a different story)
and to start the "not for use development" frontend, but that's all. How do you start the different parts? Did you write rc scripts for them?

greetings
Sven

2 replies

brianwanderson Sep 5, 2021
Author

Sven,

I don't remember all of the details of getting paperless-ng running on FreeBSD, but I do remember having trouble with scipy and numpy and maybe a couple of others. I ended up installing a couple of packages from ports as opposed to pip. There may be a couple of version mismatches, but everything seems to run fine.

How do you start the different parts?

I start everything manually using three tmux panes--see my notes in my paperless.conf. Everything is running in a jail on my TrueNAS server which only goes down in a power failure, so I don't have to restart paperless very often. This is very much a manual "works for me" solution. Maybe someday I'll get around to writing up some rc scripts, or a FreeBSD port would be great...

Here are some of my configuration details that may point you in the right direction. This is for paperless-ng 1.4.0. When I get around to upgrading to a newer version, I'll try to start a new thread with a more proper step-by-step install howto.

Cheers,
Brian

% freebsd-version
12.2-RELEASE

% pkg info | grep py
py37-cffi-1.14.5               Foreign Function Interface for Python calling C code
py37-cryptography-3.3.2        Cryptographic recipes and primitives for Python developers
py37-joblib-0.13.0             Lightweight pipelining using Python functions as jobs
py37-numpy-1.16.6,1            The New Numeric Extension to Python
py37-pip-20.2.3                Tool for installing and managing Python packages
py37-pycparser-2.20            C parser in Python
py37-pyinotify-0.9.6           Python interface to (lib)inotify
py37-scikit-learn-0.22_1       Machine learning algorithms for python
py37-scipy-1.5.4_1             Scientific tools for Python
py37-setuptools-44.0.0         Python packages installer
py37-six-1.15.0                Python 2 and 3 compatibility utilities
py37-sqlite3-3.7.9_7           Standard Python binding to the SQLite3 library (Python 3.7)
python37-3.7.10                Interpreted object-oriented programming language

% cat requirements.txt 

...(bunch of other stuff)...
#scikit-learn==0.24.0
#scipy==1.5.4
...(bunch of other stuff)...

% cat paperless.conf 
PAPERLESS_REDIS=redis://localhost:6379

PAPERLESS_CONSUMPTION_DIR=/mnt/paperless/consume
PAPERLESS_DATA_DIR=/mnt/paperless/data
PAPERLESS_MEDIA_ROOT=/mnt/paperless/media
PAPERLESS_STATICDIR=../static
PAPERLESS_FILENAME_FORMAT={document_type}/{correspondent}/{tag_list}/{title}

#PAPERLESS_AUTO_LOGIN_USERNAME=

PAPERLESS_OCR_LANGUAGE=eng
PAPERLESS_OCR_MODE=skip_noarchive

PAPERLESS_TIME_ZONE=America/Chicago
PAPERLESS_CONSUMER_POLLING=1000
PAPERLESS_CONSUMER_DELETE_DUPLICATES=false
#PAPERLESS_CONSUMER_RECURSIVE=false
#PAPERLESS_CONSUMER_SUBDIRS_AS_TAGS=false
PAPERLESS_OPTIMIZE_THUMBNAILS=true
PAPERLESS_FILENAME_DATE_ORDER=YMD

PAPERLESS_TIKA_ENABLED=false

PAPERLESS_CONVERT_BINARY=/usr/local/bin/convert
PAPERLESS_GS_BINARY=/usr/local/bin/gs
PAPERLESS_OPTIPNG_BINARY=/usr/local/bin/optipng

# notes:
# in order to start:
# 
# $ tmux
##################
##### pane 1 #####
# $ cd src/
# $ ./manage.py document_consumer
##################
#
#################
#### pane 2 #####
# $ cd src/
# $ ./manage.py qcluster
#################
#
################
#### pane 3 ####
# $ cd src/
# $ gunicorn -c ../gunicorn.conf.py paperless.asgi:application
################

powellc Jan 30, 2022

@brianwanderson you are a king among men for this cheatsheet for FreeBSD. I just ran through it, however, and with the current master branch pulled, I was unable to use py38-cryptography. Rather, I just had to ensure the rust toolchain was installed (pkg install rust) and then cryptography could build via pip just fine and everything else worked a treat. Thank you!

fedegiova · 2022-04-10T14:28:24Z

fedegiova
Apr 10, 2022

I'm adding the package list for FreeBSD 12.3:

pkg install py38-cffi py38-cryptography py38-joblib py38-numpy py38-pip py38-pycparser py38-pyinotify py38-scikit-learn py38-scipy py38-setuptools py38-six py38-sqlite3

pkg install liberation-fonts-ttf ImageMagick7-nox11 optipng gnupg postgresql-libpqxx mime-support

pkg install unpaper ghostscript8-base icc-profiles-openicc qpdf leptonica libxml2 pngquant tesseract-data tesseract

pkg install rust libxslt

And a supervisord configuration for starting everything automatically at boot (pkg install py38-supervisord):

[program:paperless-consumer]
command=/usr/local/bin/python3 ./manage.py document_consumer
process_name=%(program_name)s ; process_name expr (default %(program_name)s)
numprocs=1                    ; number of processes copies to start (def 1)
directory=/opt/paperless/paperless-ng/src/
user=paperless                   ; setuid to this UNIX account to run the program
environment=PATH="/sbin:/bin:/usr/sbin:/usr/bin:/usr/local/sbin:/usr/local/bin:/opt/paperless/bin:/opt/paperless/.local/bin/",
            USER="paperless",
            HOME="/opt/paperless"

[program:paperless-worker]
command=/usr/local/bin/python3 ./manage.py qcluster
process_name=%(program_name)s ; process_name expr (default %(program_name)s)
numprocs=1                    ; number of processes copies to start (def 1)
directory=/opt/paperless/paperless-ng/src/
user=paperless                   ; setuid to this UNIX account to run the program
environment=PATH="/sbin:/bin:/usr/sbin:/usr/bin:/usr/local/sbin:/usr/local/bin:/opt/paperless/bin:/opt/paperless/.local/bin/",
            USER="paperless",
            HOME="/opt/paperless"

[program:paperless-web]
command=/opt/paperless/.local/bin/gunicorn -c ../gunicorn.conf.py paperless.asgi:application
process_name=%(program_name)s ; process_name expr (default %(program_name)s)
numprocs=1                    ; number of processes copies to start (def 1)
directory=/opt/paperless/paperless-ng/src/
user=paperless                   ; setuid to this UNIX account to run the program
environment=PATH="/sbin:/bin:/usr/sbin:/usr/bin:/usr/local/sbin:/usr/local/bin:/opt/paperless/bin:/opt/paperless/.local/bin/",
            USER="paperless",
            HOME="/opt/paperless"

[group:paperless]
programs=paperless-consumer,paperless-worker,paperless-web
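The three [program:...] sections above differ only in their name and command, so they can also be generated rather than copied by hand. Here is a small Python sketch of that idea. The function name `render_supervisor_config` is made up for illustration. The paths, user and PATH value are copied from the config above and would need adjusting for a different install. The environment entries are written on one line here, where the config above wraps them across three.

```python
# Commands copied from the supervisord config above, keyed by program name.
PROGRAMS = {
    "paperless-consumer": "/usr/local/bin/python3 ./manage.py document_consumer",
    "paperless-worker": "/usr/local/bin/python3 ./manage.py qcluster",
    "paperless-web": "/opt/paperless/.local/bin/gunicorn -c ../gunicorn.conf.py paperless.asgi:application",
}

ENV_PATH = (
    "/sbin:/bin:/usr/sbin:/usr/bin:/usr/local/sbin:/usr/local/bin:"
    "/opt/paperless/bin:/opt/paperless/.local/bin/"
)


def render_supervisor_config(programs, directory="/opt/paperless/paperless-ng/src/",
                             user="paperless", home="/opt/paperless"):
    """Hypothetical helper: build one [program:x] section per entry plus a group."""
    sections = []
    for name, command in programs.items():
        sections.append("\n".join([
            f"[program:{name}]",
            f"command={command}",
            "process_name=%(program_name)s",
            "numprocs=1",
            f"directory={directory}",
            f"user={user}",
            f'environment=PATH="{ENV_PATH}",USER="{user}",HOME="{home}"',
        ]))
    # The group lets all programs be controlled together under one name.
    sections.append("[group:paperless]\nprograms=" + ",".join(programs))
    return "\n\n".join(sections) + "\n"


config_text = render_supervisor_config(PROGRAMS)
print(config_text)
```

The printed text can be saved wherever your supervisord installation reads its program definitions from.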
