Hi,

We appreciate the authors' dedicated efforts in compiling research on cybersecurity LLMs, which is highly beneficial to the entire field, including our own work!

We hope the authors would consider incorporating our latest work 🤗. We have open-sourced the first large-scale cybersecurity pretraining dataset. In addition, we provide specialized datasets for cybersecurity instruction fine-tuning (IFT) and for distilling reasoning with reflection, along with the cybersecurity LLMs trained on them.

📄 Paper: https://arxiv.org/abs/2502.11191
🤗 Hugging Face: https://huggingface.co/collections/trendmicro-ailab/primus-67b1fd27052b802b4af9d243

What We Do

Datasets:
- Primus-Seed: Manually collected, high-quality cybersecurity texts, including MITRE, Wikipedia, blogs, books, etc.
- Primus-FineWeb: 2.57B tokens of cybersecurity text filtered from FineWeb (a refined version of Common Crawl) with a purpose-trained cybersecurity text classifier.
- Primus-Instruct: Approximately 1K QA pairs covering common cybersecurity business scenarios.
- Primus-Reasoning: Reasoning and reflection data generated by o1-preview on cybersecurity tasks, for distillation (see the loading sketch after this list).
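For anyone who wants to browse the datasets, here is a minimal sketch using the 🤗 `datasets` library. The exact repo IDs under `trendmicro-ailab` are assumptions inferred from the collection link above, so please check the Hugging Face collection for the canonical names.

```python
# Minimal sketch: browsing the Primus datasets from the Hugging Face Hub.
# NOTE: the repo IDs below are assumptions inferred from the collection URL;
# confirm the canonical names at
# https://huggingface.co/collections/trendmicro-ailab/primus-67b1fd27052b802b4af9d243
from datasets import load_dataset

# Pretraining corpus (manually collected cybersecurity texts)
seed = load_dataset("trendmicro-ailab/Primus-Seed", split="train")
print(seed[0])  # inspect one pretraining document

# Instruction fine-tuning data (~1K cybersecurity QA pairs)
instruct = load_dataset("trendmicro-ailab/Primus-Instruct", split="train")
print(instruct[0])  # inspect one QA pair
```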
Cybersecurity LLMs:
- Llama-Primus-Base: Based on Llama-3.1-8B-Instruct and continually pretrained on 2.77B tokens of cybersecurity text (Primus-Seed and Primus-FineWeb), achieving a 🚀15.88% improvement in aggregated score across multiple cybersecurity benchmarks.
- Llama-Primus-Merged: Achieves a 🚀14.84% improvement across multiple cybersecurity benchmarks while maintaining nearly the same instruction-following capability as Llama-3.1-8B-Instruct.
- Llama-Primus-Reasoning: The first cybersecurity reasoning model! Distilled on reasoning and reflection data generated by o1-preview on cybersecurity tasks (Primus-Reasoning), achieving a 🚀10% improvement on CISSP (see the usage sketch after this list).
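A hedged usage sketch with `transformers` follows. The repo ID is an assumption based on the model name above; confirm it on the collection page before use.

```python
# Minimal sketch: chatting with Llama-Primus-Merged via transformers.
# NOTE: the repo ID is an assumption based on the model name above;
# verify it on the Hugging Face collection page.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "trendmicro-ailab/Llama-Primus-Merged"  # assumed repo ID
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

messages = [{"role": "user", "content": "Explain what the MITRE ATT&CK tactic Initial Access covers."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=256)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```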
We look forward to your feedback and hope this can contribute to advancing the cybersecurity AI community! 🚀
Thank you for your kind words and for sharing your impactful work with us! We've noticed your paper and are impressed by your contributions. This aligns perfectly with our mission to document advancements in cybersecurity LLMs.
We’ll update our literature review section by the end of this month to highlight recent works, and we’ll ensure your research is included in this update.