Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Rewrite journal abbrevation repository #12571

Open
koppor opened this issue Feb 25, 2025 · 13 comments · May be fixed by #12606
Open

Rewrite journal abbrevation repository #12571

koppor opened this issue Feb 25, 2025 · 13 comments · May be fixed by #12606
Assignees
Labels
📍 Assigned Assigned by assign-issue-action (or manually assigned) dev: performance good second issue Issues that involve a tour of two or three interweaved components in JabRef 📌 Pinned

Comments

@koppor
Copy link
Member

koppor commented Feb 25, 2025

Currently, we have several maps in memory. We should use MV Store for maps!

Main reasons:

  • 1 second is used for initialiatzion
  • 500 MB of RAM are used for abbreviations

Image

Image

Task

Rewrite the whole org.jabref.logic.journals.JournalAbbreviationRepository class. Only maps of MVStore should be used: MVStore has caching itself, no need to build up own data structures.

@koppor koppor added dev: performance good second issue Issues that involve a tour of two or three interweaved components in JabRef labels Feb 25, 2025
@github-project-automation github-project-automation bot moved this to Normal priority in Prioritization Feb 25, 2025
@koppor koppor moved this from Normal priority to High priority in Prioritization Feb 25, 2025
@marwanemad07
Copy link
Contributor

/assign-me

@github-actions github-actions bot added the 📍 Assigned Assigned by assign-issue-action (or manually assigned) label Feb 26, 2025
Copy link
Contributor

👋 Hey @marwanemad07, thank you for your interest in this issue! 🎉

We're excited to have you on board. Start by exploring our Contributing guidelines, and don't forget to check out our workspace setup guidelines to get started smoothly.

In case you encounter failing tests during development, please check our developer FAQs!

Having any questions or issues? Feel free to ask here on GitHub. Need help setting up your local workspace? Join the conversation on JabRef's Gitter chat. And don't hesitate to open a (draft) pull request early on to show the direction it is heading towards. This way, you will receive valuable feedback.

Happy coding! 🚀

⏳ Please note, you will be automatically unassigned if the issue isn't closed within 45 days (by 12 April 2025). A maintainer can also add the "📌 Pinned"" label to prevent automatic unassignment.

@marwanemad07
Copy link
Contributor

marwanemad07 commented Mar 2, 2025

Can you tell me the way u profiled this constructor? I tested my solution using Runtime runtime = Runtime.getRuntime(); for the memory and long startTime = System.nanoTime(); for the time, I got the memory to be around 80MB in the initialization and the time almost the same because I iterate over the FullToAbbreviation map and move the abbreviations to other maps of MVStore.
Also, I think I need to create these maps in org.jabref.cli.JournalListMvGenerator to be generated once and saved in the H2 database, Can you confirm that?

@koppor
Copy link
Member Author

koppor commented Mar 2, 2025

I got the memory to be around 80MB in the initialization a

You seem to have a pretty nice JRE. Maybe you profiled the wrong main class? Are you sure you benchmarked org.jabref.Launcher#main?

Did you also use the YourKit profiler or something else?

@koppor
Copy link
Member Author

koppor commented Mar 2, 2025

Also, I think I need to create these maps in org.jabref.cli.JournalListMvGenerator to be generated once and saved in the H2 database, Can you confirm that?

Yes. In addition to rewrite of org.jabref.logic.journals.JournalAbbreviationRepository.

@marwanemad07
Copy link
Contributor

Did you also use the YourKit profiler or something else?

I tried to use the Intellji profiler, but I faced some errors so I used the old way of measuring time and memory for a class but it doesn't seem accurate based on what you are saying. If you can share a resource or guide on properly profiling the project, I’d really appreciate it.

@koppor
Copy link
Member Author

koppor commented Mar 3, 2025

I tried to use the Intellji profiler, but I faced some errors

This is very vague.,,,

share a resource or guide on properly profiling the project, I’d really appreciate it.

See https://www.jetbrains.com/pages/intellij-idea-profiler/

Applied here:

  1. Image
  2. Wait until JabRef Starts
  3. Shut down JabRef
  4. Ctrl+N
  5. Type JARepo
  6. Press Enter
  7. See Image

@marwanemad07
Copy link
Contributor

I made changes that passed tests on my local and tested it manually, but it failed on unit tests of the CI environment it gives me java.lang.OutOfMemoryError. I profiled the JournalListMvGenerator and I found it took 1GB of memory then I changed the cache size of the store and this reduced the memory to 650MB.
I'm thinking of another solution to create the maps in a different way but until then, if you can help me to assure that the error is from which class, I'd appreciate this.

@ungerts
Copy link
Contributor

ungerts commented Mar 5, 2025

I believe we should focus less on profiling the current solution and instead prioritize designing a more efficient one. Below is a list of performance-related issues I have identified with the current solution:

  • It should not be necessary to iterate through all abbreviation instances during the initialization of the repository. Looping over approximately 100,000 instances consumes excessive time and memory.
  • The current implementation transforms the abbreviation instances into new instances of the same class. Additionally, the abbreviation constructor performs three string operations per abbreviation. Ideally, the store should return abbreviation instances in the correct format without requiring transformation. Especially considering that all 100,000 instances undergo this process.
  • Objects should not be cached in the heap, as the cache is too large.
  • The hash maps function like a self-implemented database index. If using MVStore for this functionality is not an option, we should consider alternatives such as a (file-based) database or a Lucene index.

@marwanemad07
Copy link
Contributor

  • For the first point I moved this part to another class that will be called when seeding data.
  • These transformations are due to the Abbreviation class having two transient fields (name and dotlessAbreviation), I tried to remove them since I thought they were redundant and we could save them on the disk, but I failed on the unit test. So I tried another solution in which these fields are transient again.
  • I was thinking of reducing the cache size of the MVStore to 32MB or less.
  • I agree with using a file-based database and creating indexes on the columns.

@marwanemad07
Copy link
Contributor

@koppor Do you remember why you used a transient name in the Abbreviation.java in this commit. I gave it more than two hours and I read other commits, and all I get based on this comment is that you needed to reduce the size of the journal-list.mv which is now generated locally.

Also, should I use the MV store for this issue for now or use a file-based database as recommended above?

@koppor
Copy link
Member Author

koppor commented Mar 17, 2025

Also, should I use the MV store for this issue for now or use a file-based database as recommended above?

It was conditioned with an if. We should first get the MV implementation running.

@koppor
Copy link
Member Author

koppor commented Mar 17, 2025

@koppor Do you remember why you used a transient name in the Abbreviation.java in this commit

Yeah, you are right. Think, I thought somehow wrong when doing this...

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
📍 Assigned Assigned by assign-issue-action (or manually assigned) dev: performance good second issue Issues that involve a tour of two or three interweaved components in JabRef 📌 Pinned
Projects
Status: High priority
Development

Successfully merging a pull request may close this issue.

3 participants