Use a database as a backend for JabRef library management #12708

InAnYan · 2025-03-12T10:09:28Z

Is your suggestion for improvement related to a problem? Please describe.

Currently, JabRef struggles with libraries that have over 1000 entries (#10209).

Short reason and solution: JabRef stores all information in RAM. JabRef needs a mechanism to manage lots of data. This is a perfect use case for databases!

Longer issue description: look at how JabRef manages libraries and entries:

1. Load .bib file.
2. Convert .bib file into BibDatabase (with BibDatabaseContext) and BibEntry. Those are Java objects that are stored in RAM.
3. Manipulate library with those objects.
4. Save those objects into a .bib file.

So, JabRef's original philosophy is to be a file editor. However, with a giant library, the whole library has to fit into the JVM heap, which is limited.

Describe the solution you'd like

JabRef should have a mechanism for managing a lot of data and use it for storing and manipulating libraries.

This is the purpose of databases! A DBMS also caches data: a typical DBMS stores data in pages, some of which are kept in RAM while the rest stay on disk. This is a perfect solution for giant libraries, as you are no longer limited by RAM, but by the space on your HDD/SSD!
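To make the "not limited by RAM" point concrete, here is a minimal sketch, not JabRef code. It assumes Python's built-in `sqlite3` as a stand-in for the embedded database (the issue prefers Postgres), and the `entry` table, `build_library`, and `iter_pages` are hypothetical names made up for illustration. The library lives in a file on disk, and the client only ever holds one page of rows at a time.

```python
import os
import sqlite3
import tempfile


def build_library(path, n_entries):
    """Create an on-disk table of entries (hypothetical schema, not JabRef's)."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE entry (id INTEGER PRIMARY KEY, citation_key TEXT, title TEXT)"
    )
    conn.executemany(
        "INSERT INTO entry (citation_key, title) VALUES (?, ?)",
        ((f"key{i}", f"Title number {i}") for i in range(n_entries)),
    )
    conn.commit()
    return conn


def iter_pages(conn, page_size):
    """Yield lists of at most page_size rows, using keyset pagination on id."""
    last_id = 0
    while True:
        rows = conn.execute(
            "SELECT id, citation_key, title FROM entry "
            "WHERE id > ? ORDER BY id LIMIT ?",
            (last_id, page_size),
        ).fetchall()
        if not rows:
            return
        yield rows
        last_id = rows[-1][0]  # continue after the last id seen


with tempfile.TemporaryDirectory() as tmp:
    conn = build_library(os.path.join(tmp, "library.db"), 2500)
    page_sizes = [len(page) for page in iter_pages(conn, 1000)]
    conn.close()

print(page_sizes)  # prints [1000, 1000, 500]
```

The same pattern should carry over to Postgres via JDBC; only the connection setup and the SQL dialect details would differ.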

Moreover, a DBMS lets you run fast and powerful queries. Here is one place where SQL can be used: #10209 (comment). Search functionality is also a perfect case for databases.
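As a rough illustration of what such a query could look like, here is a sketch under stated assumptions: Python's built-in `sqlite3` stands in for the real backend, the two-table layout (`entry` plus a key/value `field` table) is a hypothetical schema and not JabRef's, and the three entries are made-up sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE entry (
        id INTEGER PRIMARY KEY,
        citation_key TEXT UNIQUE,
        entry_type TEXT
    );
    -- One row per BibTeX field, so arbitrary fields need no schema change.
    CREATE TABLE field (
        entry_id INTEGER REFERENCES entry(id),
        name TEXT,
        value TEXT,
        PRIMARY KEY (entry_id, name)
    );
    """
)

# Made-up sample data, for illustration only.
sample = [
    ("doe2020", "article", {"author": "Jane Doe", "year": "2020"}),
    ("doe2022", "article", {"author": "Jane Doe", "year": "2022"}),
    ("roe2023", "book", {"author": "Richard Roe", "year": "2023"}),
]
for key, entry_type, fields in sample:
    cur = conn.execute(
        "INSERT INTO entry (citation_key, entry_type) VALUES (?, ?)",
        (key, entry_type),
    )
    conn.executemany(
        "INSERT INTO field (entry_id, name, value) VALUES (?, ?, ?)",
        [(cur.lastrowid, name, value) for name, value in fields.items()],
    )

# "author contains Doe AND year >= 2021", evaluated inside the database.
rows = conn.execute(
    """
    SELECT e.citation_key
    FROM entry e
    JOIN field a ON a.entry_id = e.id AND a.name = 'author'
    JOIN field y ON y.entry_id = e.id AND y.name = 'year'
    WHERE a.value LIKE ? AND CAST(y.value AS INTEGER) >= ?
    ORDER BY e.citation_key
    """,
    ("%Doe%", 2021),
).fetchall()
matches = [row[0] for row in rows]
print(matches)
```

The client only receives the matching citation keys; filtering happens in the database, which is the point made above.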

Additional context

This is planned as a GSoC project. Beware, while this project is quite important for JabRef, it might turn out to be very complex.

We aim for a relational DBMS like SQLite, DuckDB, or Postgres. In particular, we want the database to be embedded.

In fact, we want Postgres to be our backend, as Postgres has powerful capabilities for search. It can actually be used as an embedded database; check out this library: https://github.com/zonkyio/embedded-postgres.

Here are some materials for this project:

- Postgres: https://www.postgresql.org/.
- Other databases you might consider (though, Postgres is preferable):
  - DuckDB: https://duckdb.org/ -- seems promising too.
  - SQLite: https://www.sqlite.org/.
  - H2: https://www.h2database.com/html/main.html.
  - HSQLDB: https://hsqldb.org/.
- BibTeX and BibLaTeX (you can use this information to design the schema of the DB):
  - Internals of BibTeX: https://polish-mirror.evolution-host.com/ctan/biblio/bibtex/base/btxdoc.pdf.
  - Internals of BibLaTeX: https://mirrors.ibiblio.org/CTAN/macros/latex/contrib/biblatex/doc/biblatex.pdf.
- How Zotero internally stores data: https://github.com/zotero/zotero/blob/main/resource/schema/userdata.sql.
- Use Postgres as an embedded database: https://github.com/zonkyio/embedded-postgres.
- Take a look at JabRef's code:
  - Search functionality: https://github.com/JabRef/jabref/blob/main/src/main/java/org/jabref/model/search/PostgreConstants.java#L6. (It already uses embedded Postgres).
  - Shared database: https://github.com/JabRef/jabref/blob/main/src/main/java/org/jabref/logic/shared/PostgreSQLProcessor.java (schemas, etc.)


koppor · 2025-03-12T10:46:45Z

Regarding the de-coupling of the UI with the data, starting points are

koppor · 2025-03-12T10:47:25Z

In last year's GSoC, we opted for Postgres because it has many plugins, especially for fast regular expression search, so that the DBMS, and not the client, does the regex indexing and resolving.

ryan-carpenter · 2025-03-14T11:22:03Z

XTDB? There is so much more to do with literature than a bibtex/biblatex schema covers, and immutable data would enable some great opportunities. Examples include a traceable history of screening or detecting duplicates from the past (recognising previously imported entries even after they have changed).

koppor · 2025-03-16T22:05:22Z

Since this is about memory saving, more things need to be considered:

Loading of a .bib file

If the bib file is read with "plain" JabRef, strings might be in memory, too
LaTeX-to-Unicode and Unicode-to-LaTeX conversion (for search) is done in the DB, not in Java (hopefully as a stored procedure)
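A rough sketch of the "conversion callable from SQL" idea, under loud assumptions: Python's built-in `sqlite3` is used as a stand-in, and its `create_function` registers a function that SQL can call but that still runs in the client process. It is therefore only an approximation of a real Postgres stored procedure. The mapping table is deliberately tiny and incomplete, and `latex_to_unicode` and the `entry` table are hypothetical names.

```python
import sqlite3

# Deliberately tiny, incomplete mapping; a real converter covers far more.
LATEX_TO_UNICODE = {
    r'{\"a}': "ä",
    r'{\"o}': "ö",
    r'{\"u}': "ü",
    r"{\'e}": "é",
    r"{\ss}": "ß",
}


def latex_to_unicode(text):
    """Replace the known LaTeX sequences in text with Unicode characters."""
    if text is None:
        return None
    for latex, char in LATEX_TO_UNICODE.items():
        text = text.replace(latex, char)
    return text


conn = sqlite3.connect(":memory:")
# Make the function callable from SQL under the name latex_to_unicode.
conn.create_function("latex_to_unicode", 1, latex_to_unicode)

conn.execute("CREATE TABLE entry (citation_key TEXT, author TEXT)")
conn.executemany(
    "INSERT INTO entry VALUES (?, ?)",
    [
        ("mueller2021", r'M{\"u}ller, Anna'),  # made-up sample data
        ("doe2020", "Doe, Jane"),
    ],
)

# The search term is plain Unicode; the conversion happens inside the query.
rows = conn.execute(
    "SELECT citation_key FROM entry WHERE latex_to_unicode(author) LIKE ?",
    ("%Müller%",),
).fetchall()
found = [row[0] for row in rows]
print(found)
```

For search on large libraries, the converted value would more likely be stored or indexed once rather than recomputed per row on every query.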

Presenting data to JabRef

Tableviews might help here to gain speed and reduce memory consumption on JabRef's side.

Saving of a .bib file

If the bib file is saved with "plain" JabRef, strings might be in memory, too
Postgres should offer some pre-formatting for saving (e.g., views, stored procedure, ...)
Maybe, a "simple" table view also helps
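One possible shape of such pre-formatting, sketched with Python's built-in `sqlite3` as a stand-in for Postgres. The schema, the `bib_line` view, and `write_bib` are hypothetical and the entry is made-up sample data. The view produces ready-made `name = {value},` lines, and the writer streams rows from the cursor, so it never holds the whole library as strings.

```python
import io
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE entry (id INTEGER PRIMARY KEY, citation_key TEXT, entry_type TEXT);
    CREATE TABLE field (entry_id INTEGER, name TEXT, value TEXT);
    -- The view does the per-field formatting inside the database.
    CREATE VIEW bib_line AS
    SELECT e.id AS entry_id, e.entry_type, e.citation_key, f.name AS name,
           '  ' || f.name || ' = {' || f.value || '},' AS line
    FROM entry e JOIN field f ON f.entry_id = e.id;
    """
)
conn.execute("INSERT INTO entry VALUES (1, 'doe2020', 'article')")
conn.executemany(
    "INSERT INTO field VALUES (1, ?, ?)",
    [("year", "2020"), ("title", "An Example Title"), ("author", "Jane Doe")],
)


def write_bib(conn, out):
    """Stream formatted lines from the view into out, one row at a time."""
    current = None
    cursor = conn.execute(
        "SELECT entry_id, entry_type, citation_key, line "
        "FROM bib_line ORDER BY entry_id, name"
    )
    for entry_id, entry_type, key, line in cursor:
        if entry_id != current:
            if current is not None:
                out.write("}\n\n")  # close the previous entry
            out.write(f"@{entry_type}{{{key},\n")
            current = entry_id
        out.write(line + "\n")
    if current is not None:
        out.write("}\n")


buffer = io.StringIO()
write_bib(conn, buffer)
bib_text = buffer.getvalue()
print(bib_text)
```

Note that this sketch skips entries without any fields (inner join) and sorts fields alphabetically; preserving the user's field order and formatting would need extra columns.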

subhramit added project: gsoc dev: data-model labels Mar 12, 2025

subhramit added this to Prioritization Mar 12, 2025

github-project-automation bot moved this to Normal priority in Prioritization Mar 12, 2025

subhramit moved this from Normal priority to High priority in Prioritization Mar 12, 2025

koppor added the dev: performance label Mar 12, 2025
