Add SubgraphRAG model and example #10000

Draft: wants to merge 6 commits into master

@Kh4L Kh4L commented Feb 4, 2025

This PR adds an end-to-end (E2E) implementation of SubgraphRAG, the method from the paper "Simple Is Effective: The Roles of Graphs and Large Language Models in Knowledge-Graph-Based Retrieval-Augmented Generation".

Reproducing

to add

Results

Metrics with topk=100 and model_name=meta-llama/Llama-3.1-8B

==================================================
Evaluation Metrics (1638 samples):
--------------------------------------------------
Hit (Any): 0.8077 (1323/1638)
Hit@1:     0.7613 (1247/1638)
Precision: 0.7062
Recall:    0.6322
F1 Score:  0.5994
==================================================
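
The PR does not say how these numbers are aggregated. One reading, assumed here and not confirmed by the PR, is that each metric is computed per question and then averaged over all samples; that would be consistent with the F1 score being lower than both the reported precision and recall. A minimal sketch under that assumption (the function names `sample_metrics` and `average_metrics` are hypothetical, not taken from the PR):

```python
from typing import Dict, List, Tuple


def sample_metrics(predictions: List[str], answers: List[str]) -> Dict[str, float]:
    """Score a single question. `predictions` is assumed to be ranked, best first."""
    gold = set(answers)
    pred = set(predictions)
    correct = len(pred & gold)

    precision = correct / len(pred) if pred else 0.0
    recall = correct / len(gold) if gold else 0.0
    # Harmonic mean of precision and recall; 0 when nothing is correct.
    f1 = 2 * precision * recall / (precision + recall) if correct else 0.0

    return {
        "hit": float(correct > 0),  # any predicted answer is correct
        "hit@1": float(bool(predictions) and predictions[0] in gold),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


def average_metrics(samples: List[Tuple[List[str], List[str]]]) -> Dict[str, float]:
    """Average the per-question scores over all (predictions, answers) pairs."""
    scores = [sample_metrics(preds, answers) for preds, answers in samples]
    return {key: sum(s[key] for s in scores) / len(scores) for key in scores[0]}


if __name__ == "__main__":
    toy = [
        (["a", "x", "y", "z"], ["a"]),  # high recall, low precision
        (["b"], ["b", "c", "d", "e"]),  # high precision, low recall
    ]
    print(average_metrics(toy))
```

On the two toy questions, averaged precision and recall are both 0.625 while averaged F1 is 0.4, which shows how a per-question average can put F1 below both of the other two numbers.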

Ablation study, no triplets (question only): Hit@1: 0.3547

A bigger LLM (meta-llama/Llama-3.1-70B-Instruct) doesn't improve the hit rate: Hit@1: 0.7608 (vs. 0.7613 with the 8B model)

This was referenced Feb 28, 2025

@puririshi98 puririshi98 left a comment

The model file needs type hints, and CI needs to be green, but overall this looks good in principle. Also, please include a comparison on WebQSP between g_retriever.py and subgraphrag.py.

@@ -264,11 +494,17 @@ class WebQSPDataset(KGQABaseDataset):
(default: :obj:`False`)
use_pcst (bool, optional): Whether to preprocess the dataset's graph
with PCST or return the full graphs. (default: :obj:`True`)
subgraphrag (bool, optional): Whether to preprocess the dataset
into the format expected by SubgraphRAG. The dataset the full

"The dataset the full..." needs a rewrite

@@ -285,8 +521,14 @@ class CWQDataset(KGQABaseDataset):
(default: :obj:`False`)
use_pcst (bool, optional): Whether to preprocess the dataset's graph
with PCST or return the full graphs. (default: :obj:`True`)
subgraphrag (bool, optional): Whether to preprocess the dataset
into the format expected by SubgraphRAG. The dataset the full

same here



def reason(pred_dict, model_name, K_triplets, max_tokens=4096):
    llm = LLM(model_name=model_name, backend='openai')

This needs the NIM API key passed in; probably pass it in through the argument parser.
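
Passing the key through the argument parser could look roughly like the sketch below. The flag name `--api_key`, the environment variable `NIM_API_KEY`, and the helper `parse_args` are all assumptions for illustration; none of them appear in the PR.

```python
import argparse
import os
from typing import List, Optional


def parse_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    parser = argparse.ArgumentParser(description="SubgraphRAG example (sketch)")
    parser.add_argument("--model_name", type=str,
                        default="meta-llama/Llama-3.1-8B")
    # Hypothetical flag: falls back to a (hypothetical) NIM_API_KEY
    # environment variable so the key need not appear in shell history.
    parser.add_argument("--api_key", type=str,
                        default=os.environ.get("NIM_API_KEY"))
    args = parser.parse_args(argv)
    if not args.api_key:
        # parser.error prints the usage message and exits with status 2.
        parser.error("pass --api_key or set NIM_API_KEY")
    return args


if __name__ == "__main__":
    args = parse_args()
    print(f"model={args.model_name}, api key provided: {bool(args.api_key)}")
```

The parsed key would then be forwarded to `reason(...)` and from there to the LLM wrapper. How the wrapper accepts a key is not shown in this diff excerpt, so that step is left out here.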
