Use a database as a backend for JabRef library management #12708

Open
InAnYan opened this issue Mar 12, 2025 · 4 comments

Comments

InAnYan commented Mar 12, 2025

Is your suggestion for improvement related to a problem? Please describe.

Currently, JabRef struggles with libraries that have over 1000 entries (#10209).

Short reason and solution: JabRef stores all information in RAM. JabRef needs a mechanism to manage lots of data. This is a perfect use case for databases!

Longer issue description: look at how JabRef manages libraries and entries:

  1. Load .bib file.
  2. Convert .bib file into BibDatabase (with BibDatabaseContext) and BibEntry. Those are Java objects that are stored in RAM.
  3. Manipulate library with those objects.
  4. Save those objects into a .bib file.

So, JabRef's original philosophy is to be a file editor. However, with a giant library, the JVM heap, which is limited, is simply not large enough to hold everything.
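
To make the four steps above concrete, here is a minimal sketch of the load, manipulate, save cycle. It is a toy model written in Python for brevity, not JabRef's actual Java code: the `Entry` class, the two regular expressions, and the `load`/`save` helpers are all hypothetical and only handle flat `key = {value}` fields. The point it illustrates is that every entry is a live object in memory for the whole session.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Entry:
    """Toy stand-in for a BibEntry-like object (hypothetical, heavily simplified)."""
    entry_type: str
    citation_key: str
    fields: dict = field(default_factory=dict)


# Simplistic patterns: one entry ends at a line containing only "}",
# and every field is written as  name = {value},
ENTRY_RE = re.compile(r"@(\w+)\{([^,]+),(.*?)\n\}", re.DOTALL)
FIELD_RE = re.compile(r"(\w+)\s*=\s*\{(.*?)\},?\s*$", re.MULTILINE)


def load(text):
    """Steps 1-2: parse the whole file into objects that all live in RAM."""
    return [
        Entry(m.group(1), m.group(2).strip(), dict(FIELD_RE.findall(m.group(3))))
        for m in ENTRY_RE.finditer(text)
    ]


def save(entries):
    """Step 4: serialize every in-memory object back into .bib text."""
    out = []
    for e in entries:
        body = "".join(f"  {k} = {{{v}}},\n" for k, v in e.fields.items())
        out.append(f"@{e.entry_type}{{{e.citation_key},\n{body}}}\n")
    return "\n".join(out)


# Made-up sample data for illustration.
bib = "@article{doe2021,\n  author = {Jane Doe},\n  year = {2021},\n}\n"
library = load(bib)                 # the entire library is now a list of objects
library[0].fields["year"] = "2022"  # step 3: manipulate the objects
result = save(library)
```

With this design, memory use grows with the number of entries, because nothing can be dropped from the list between loading and saving.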

Describe the solution you'd like

JabRef should have a mechanism for managing a lot of data and use it for storing and manipulating libraries.

This is the purpose of databases! A DBMS also caches data: a typical DBMS stores data in pages, some of which are kept in RAM while others are offloaded to disk. This is a perfect solution for giant libraries, as you are no longer limited by RAM, but by the space on your HDD/SSD!
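
As a rough illustration of the paging idea, the sketch below uses SQLite (one of the candidates mentioned in this issue) through Python's built-in `sqlite3` module. The table layout, file name, and the 2 MiB cache limit are assumptions chosen for the example, not a proposed JabRef schema. The library lives in a file on disk, the page cache is capped, and rows are read through a cursor instead of being materialized all at once.

```python
import os
import sqlite3
import tempfile

# The library lives in a file on disk, not in process memory.
db_path = os.path.join(tempfile.mkdtemp(), "library.db")
conn = sqlite3.connect(db_path)

# Cap SQLite's page cache at about 2 MiB (a negative value is a size in KiB).
# Pages beyond that limit are read from disk on demand.
conn.execute("PRAGMA cache_size = -2048")

conn.execute(
    "CREATE TABLE entry (citation_key TEXT PRIMARY KEY, entry_type TEXT, title TEXT)"
)
conn.executemany(
    "INSERT INTO entry VALUES (?, ?, ?)",
    ((f"key{i}", "article", f"Title number {i}") for i in range(50_000)),
)
conn.commit()

# Iterating over the cursor steps through the result row by row,
# so the client never holds the full library as a list of objects.
count = 0
for citation_key, title in conn.execute("SELECT citation_key, title FROM entry"):
    count += 1

cache_size = conn.execute("PRAGMA cache_size").fetchone()[0]
```

The same pattern should carry over to other engines; only the knob that controls the cache differs.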

Moreover, a DBMS lets you run fast and powerful queries. Here is one place where SQL can be used: #10209 (comment). Search functionality is also a perfect use case for databases.
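
A small sketch of what such a query could look like, again with SQLite via Python's `sqlite3` module as a stand-in. The `entry` table, its columns, the index name, and the sample rows are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE entry ("
    "citation_key TEXT PRIMARY KEY, author TEXT, title TEXT, year INTEGER)"
)
# Made-up sample rows.
conn.executemany(
    "INSERT INTO entry VALUES (?, ?, ?, ?)",
    [
        ("doe2015", "Jane Doe", "Early notes on indexing", 2015),
        ("doe2021", "Jane Doe", "Indexing large libraries", 2021),
        ("roe2022", "Richard Roe", "Search in practice", 2022),
    ],
)
# An index lets the engine narrow down by year without scanning every row.
conn.execute("CREATE INDEX idx_entry_year ON entry (year)")

# Filtering and sorting are expressed declaratively and run inside the DBMS.
rows = conn.execute(
    "SELECT citation_key FROM entry "
    "WHERE author LIKE ? AND year >= ? ORDER BY year",
    ("%Doe%", 2020),
).fetchall()
```

Here `rows` contains only the keys that match, so the client receives the search result rather than the whole library.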

Additional context

This is planned as a GSoC project. Beware, while this project is quite important for JabRef, it might turn out to be very complex.

We aim for a relational DBMS such as SQLite, DuckDB, or Postgres. In particular, we want the database to be embedded.

In fact, we want Postgres to be our backend, as Postgres has powerful search capabilities. It can actually be used as an embedded database; check out this library: https://github.com/zonkyio/embedded-postgres.

Here are some materials for this project:

github-project-automation bot moved this to Normal priority in Prioritization Mar 12, 2025
subhramit moved this from Normal priority to High priority in Prioritization Mar 12, 2025
koppor commented Mar 12, 2025

In last year's GSoC we opted for Postgres because it has many plugins, especially for fast regular expression search, so that the DBMS, and not the client, does the regex indexing and resolving.
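
The shape of such a regex query can be sketched as follows. This is only a stand-in: it uses SQLite through Python's `sqlite3` module, where the `REGEXP` operator has to be backed by a user-defined function, so the matching here still runs in the client process. In Postgres the equivalent would be the built-in `~` operator, and an extension such as `pg_trgm` can provide index support for regex matching; which plugin was actually used is not stated in this thread. Table, column names, and rows are made up.

```python
import re
import sqlite3


def regexp(pattern, value):
    """Backs SQLite's REGEXP operator: 'X REGEXP Y' calls regexp(Y, X)."""
    if value is None:
        return 0
    return int(re.search(pattern, value) is not None)


conn = sqlite3.connect(":memory:")
conn.create_function("REGEXP", 2, regexp)
conn.execute("CREATE TABLE entry (citation_key TEXT PRIMARY KEY, title TEXT)")
# Made-up sample rows.
conn.executemany(
    "INSERT INTO entry VALUES (?, ?)",
    [
        ("a1", "Graph algorithms for citations"),
        ("b2", "Photographs in archives"),
        ("c3", "Sparse graphs"),
    ],
)

# The pattern is passed as a parameter; the query returns only matching keys.
matches = conn.execute(
    "SELECT citation_key FROM entry WHERE title REGEXP ? ORDER BY citation_key",
    (r"(?i)\bgraphs?\b",),
).fetchall()
```

The word boundary in the pattern excludes "Photographs", which a plain substring search for "graph" would have matched.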

@ryan-carpenter

XTDB? There is so much more to do with literature than a bibtex/biblatex schema covers, and immutable data would enable some great opportunities. Examples include a traceable history of screening or detecting duplicates from the past (recognising previously imported entries even after they have changed).
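
To illustrate the immutable-data idea without XTDB itself, here is a sketch of an append-only version table in SQLite (via Python's `sqlite3`). This is a swapped-in technique, not XTDB's API or data model. The table `entry_version` and the helpers `record` and `seen_before` are hypothetical names. Every change is stored as a new row, so an imported record can be recognised later even if the current version of the entry has changed.

```python
import hashlib
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE entry_version ("
    "citation_key TEXT NOT NULL, version INTEGER NOT NULL, "
    "content_hash TEXT NOT NULL, payload TEXT NOT NULL, "
    "PRIMARY KEY (citation_key, version))"
)


def content_hash(payload):
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def record(citation_key, payload):
    """Append a new version; existing rows are never updated or deleted."""
    (latest,) = conn.execute(
        "SELECT COALESCE(MAX(version), 0) FROM entry_version WHERE citation_key = ?",
        (citation_key,),
    ).fetchone()
    conn.execute(
        "INSERT INTO entry_version VALUES (?, ?, ?, ?)",
        (citation_key, latest + 1, content_hash(payload), payload),
    )
    return latest + 1


def seen_before(payload):
    """Return (citation_key, version) if this exact content was ever stored."""
    return conn.execute(
        "SELECT citation_key, version FROM entry_version "
        "WHERE content_hash = ? ORDER BY version LIMIT 1",
        (content_hash(payload),),
    ).fetchone()


# Made-up sample data: the entry is imported, then edited.
imported = "title = {Draft title}"
first = record("doe2021", imported)
second = record("doe2021", "title = {Final title}")

# Importing the old record again is still detected via the history.
hit = seen_before(imported)
miss = seen_before("title = {Unrelated}")
```

An exact hash only catches identical content; fuzzy duplicate detection would need a different comparison on top of the same history.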

koppor commented Mar 16, 2025

Since this is about memory saving, more things need to be considered:

Loading of a .bib file

  • If the bib file is read with "plain" JabRef, strings might be in memory, too
  • LaTeX-to-Unicode and Unicode-to-LaTeX conversion (for search) should be done in the DB, not in Java (hopefully via a stored procedure)

Presenting data to JabRef

  • Tableviews might help here to gain speed and reduce memory consumption on JabRef's side.

Saving of a .bib file

  • If the bib file is saved with "plain" JabRef, strings might be in memory, too
  • Postgres should offer some pre-formatting for saving (e.g., views, stored procedures, ...)
  • Maybe a "simple" table view also helps
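
One way the pre-formatting idea could look is a view that renders each row as BibTeX text, so the client only has to write the strings to a file. The sketch below uses SQLite via Python's `sqlite3` as a stand-in for Postgres. The fixed-column `entry` table and the view name `bibtex_export` are assumptions for illustration, and the sample row is made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE entry ("
    "citation_key TEXT PRIMARY KEY, entry_type TEXT, author TEXT, title TEXT)"
)

# The view does the formatting inside the database; char(10) is a newline.
# Note: in this simple form a NULL field would make the whole string NULL.
conn.execute(
    """
    CREATE VIEW bibtex_export AS
    SELECT citation_key,
           '@' || entry_type || '{' || citation_key || ',' || char(10) ||
           '  author = {' || author || '},' || char(10) ||
           '  title = {' || title || '},' || char(10) ||
           '}' AS bibtex
    FROM entry
    """
)

# Made-up sample row.
conn.execute(
    "INSERT INTO entry VALUES ('doe2021', 'article', 'Jane Doe', 'An Example Title')"
)

# Saving becomes: stream the pre-formatted strings and join them.
exported = "\n\n".join(
    row[0]
    for row in conn.execute("SELECT bibtex FROM bibtex_export ORDER BY citation_key")
)
```

Because the cursor yields one formatted string at a time, the strings could also be written to the output file one by one instead of being joined in memory.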
