Hi,

We appreciate the authors' dedicated efforts in compiling research on cybersecurity LLMs, which is highly beneficial to the entire field, including our own work!

We hope the authors would consider incorporating our latest work 🤗. We have open-sourced the first large-scale cybersecurity pretraining dataset. In addition, we provide specialized datasets for cybersecurity instruction fine-tuning (IFT) and for distilling reasoning with reflection, along with the cybersecurity LLMs trained on them.

📄 Paper: https://arxiv.org/abs/2502.11191
🤗 Hugging Face: https://huggingface.co/collections/trendmicro-ailab/primus-67b1fd27052b802b4af9d243

What We Do

Datasets:
- Primus-Seed: Manually collected, high-quality cybersecurity texts, including MITRE, Wikipedia, blogs, books, etc.
- Primus-FineWeb: 2.57B tokens of cybersecurity text filtered from FineWeb (a refined version of Common Crawl) with a purpose-trained cybersecurity text classifier.
- Primus-Instruct: Approximately 1K QA pairs covering common cybersecurity business scenarios.
- Primus-Reasoning: Reasoning and reflection data generated by o1-preview on cybersecurity tasks, for distillation (see the loading sketch after this list).
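For anyone who wants to browse the datasets, here is a minimal sketch using the 🤗 `datasets` library. The exact repo IDs under `trendmicro-ailab` are assumptions inferred from the collection link above, so please check the Hugging Face collection for the canonical names.

```python
# Minimal sketch: browsing the Primus datasets from the Hugging Face Hub.
# NOTE: the repo IDs below are assumptions inferred from the collection URL;
# confirm the canonical names at
# https://huggingface.co/collections/trendmicro-ailab/primus-67b1fd27052b802b4af9d243
from datasets import load_dataset

# Pretraining corpus (manually collected cybersecurity texts)
seed = load_dataset("trendmicro-ailab/Primus-Seed", split="train")
print(seed[0])  # inspect one pretraining document

# Instruction fine-tuning data (~1K cybersecurity QA pairs)
instruct = load_dataset("trendmicro-ailab/Primus-Instruct", split="train")
print(instruct[0])  # inspect one QA pair
```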
Cybersecurity LLMs:
- Llama-Primus-Base: Based on Llama-3.1-8B-Instruct and continually pretrained on 2.77B tokens of cybersecurity text (Primus-Seed and Primus-FineWeb), achieving a 🚀15.88% improvement in aggregated score across multiple cybersecurity benchmarks.
- Llama-Primus-Merged: Achieves a 🚀14.84% improvement across multiple cybersecurity benchmarks while maintaining nearly the same instruction-following capability as Llama-3.1-8B-Instruct.
- Llama-Primus-Reasoning: The first cybersecurity reasoning model! Distilled on reasoning and reflection data generated by o1-preview on cybersecurity tasks (Primus-Reasoning), achieving a 🚀10% improvement on CISSP (see the usage sketch after this list).
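A hedged usage sketch with `transformers` follows. The repo ID is an assumption based on the model name above; confirm it on the collection page before use.

```python
# Minimal sketch: chatting with Llama-Primus-Merged via transformers.
# NOTE: the repo ID is an assumption based on the model name above;
# verify it on the Hugging Face collection page.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "trendmicro-ailab/Llama-Primus-Merged"  # assumed repo ID
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

messages = [{"role": "user", "content": "Explain what the MITRE ATT&CK tactic Initial Access covers."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=256)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```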
We look forward to your feedback and hope this can contribute to advancing the cybersecurity AI community! 🚀
Thank you for your kind words and for sharing your impactful work with us! We've noticed your paper and are impressed by your contributions. This aligns perfectly with our mission to document advancements in cybersecurity LLMs.
We’ll update our literature review section by the end of this month to highlight recent works, and we’ll ensure your research is included in this update.