Add SubgraphRAG model and example #10000

Kh4L · 2025-02-04T12:03:22Z

This PR adds an end-to-end (E2E) implementation of "Simple Is Effective: The Roles of Graphs and Large Language Models in Knowledge-Graph-Based Retrieval-Augmented Generation".

Reproducing

to add

Results

Metrics with topk=100 and model_name=meta-llama/Llama-3.1-8B

==================================================
Evaluation Metrics (1638 samples):
--------------------------------------------------
Hit (Any): 0.8077 (1323/1638)
Hit@1:     0.7613 (1247/1638)
Precision: 0.7062
Recall:    0.6322
F1 Score:  0.5994
==================================================

Ablation study, no triplets (question only): Hit@1: 0.3547

Bigger LLM (meta-llama/Llama-3.1-70B-Instruct) doesn't improve the hit rate: Hit@1: 0.7608
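For readers wondering how figures like these might be produced, here is a minimal sketch of per-question scoring followed by averaging over questions. It is an assumption, not the evaluation code from this PR: the function names `score_sample` and `macro_average`, the case-insensitive exact-match rule, and the per-sample averaging are all hypothetical. If the PR averages per sample in a similar way, that would also be one possible reason the reported F1 sits below both precision and recall, since the mean of per-sample F1 scores is not the harmonic mean of the averaged precision and recall.

```python
from typing import Dict, List, Sequence


def score_sample(predictions: Sequence[str],
                 answers: Sequence[str]) -> Dict[str, float]:
    """Score one question: ranked predictions against the gold answers.

    Matching is case-insensitive exact match (an assumption of this sketch).
    """
    gold = {a.strip().lower() for a in answers}
    # Deduplicate predictions while preserving their ranking order.
    preds = list(dict.fromkeys(p.strip().lower() for p in predictions))
    correct = sum(p in gold for p in preds)
    precision = correct / len(preds) if preds else 0.0
    recall = correct / len(gold) if gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if correct else 0.0
    return {
        "hit_any": float(correct > 0),
        "hit_at_1": float(bool(preds) and preds[0] in gold),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


def macro_average(per_sample: List[Dict[str, float]]) -> Dict[str, float]:
    """Average every metric over all questions (empty input gives {})."""
    if not per_sample:
        return {}
    n = len(per_sample)
    return {key: sum(s[key] for s in per_sample) / n for key in per_sample[0]}


if __name__ == "__main__":
    samples = [
        score_sample(["Paris", "Lyon"], ["paris"]),
        score_sample(["Berlin"], ["Munich", "Berlin"]),
    ]
    for name, value in macro_average(samples).items():
        print(f"{name}: {value:.4f}")
```

In the two-question toy input above, averaged precision and recall are both 0.75 while averaged F1 is about 0.667, which shows the effect described in the lead-in.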


puririshi98

The model file needs type definitions and CI needs to be green, but overall this looks good in principle. Also, please include a comparison on WebQSP between g_retriever.py and subgraphrag.py.

puririshi98 · 2025-02-28T18:37:34Z

torch_geometric/datasets/web_qsp_dataset.py

@@ -264,11 +494,17 @@ class WebQSPDataset(KGQABaseDataset):
            (default: :obj:`False`)
        use_pcst (bool, optional): Whether to preprocess the dataset's graph
            with PCST or return the full graphs. (default: :obj:`True`)
+        subgraphrag (bool, optional): Whether to preprocess the dataset
+            into the format expected by SubgraphRAG. The dataset the full


"The dataset the full..." needs a rewrite

puririshi98 · 2025-02-28T18:37:44Z

torch_geometric/datasets/web_qsp_dataset.py

@@ -285,8 +521,14 @@ class CWQDataset(KGQABaseDataset):
            (default: :obj:`False`)
        use_pcst (bool, optional): Whether to preprocess the dataset's graph
            with PCST or return the full graphs. (default: :obj:`True`)
+        subgraphrag (bool, optional): Whether to preprocess the dataset
+            into the format expected by SubgraphRAG. The dataset the full


puririshi98 · 2025-02-28T18:39:40Z

examples/llm/subgraphrag.py

+
+
+def reason(pred_dict, model_name, K_triplets, max_tokens=4096):
+    llm = LLM(model_name=model_name, backend='openai')


This needs the NIM API key passed in, probably through the argument parser.
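A rough sketch of what that could look like with `argparse`. The flag name `--api_key` and the environment variable `NIM_API_KEY` are hypothetical names chosen for illustration, and the default model name is simply the one quoted in the PR description. How the key is then forwarded to `LLM(...)` is deliberately left out, since the constructor's full signature is not shown in this diff.

```python
import argparse
import os
from typing import Optional, Sequence


def parse_args(argv: Optional[Sequence[str]] = None) -> argparse.Namespace:
    """Parse example arguments; the API key may come from a flag or the env."""
    parser = argparse.ArgumentParser(description="SubgraphRAG example (sketch)")
    parser.add_argument("--model_name", type=str,
                        default="meta-llama/Llama-3.1-8B")
    # Hypothetical flag and environment variable names.
    parser.add_argument(
        "--api_key", type=str, default=os.environ.get("NIM_API_KEY"),
        help="API key for the hosted LLM endpoint "
        "(falls back to the NIM_API_KEY environment variable)")
    args = parser.parse_args(argv)
    if not args.api_key:
        # Fail early with a usage message instead of at the first LLM call.
        parser.error("no API key: pass --api_key or set NIM_API_KEY")
    return args


if __name__ == "__main__":
    args = parse_args()
    print(f"model_name={args.model_name}, api_key set: {bool(args.api_key)}")
```

Falling back to an environment variable keeps the key out of shell history and CI logs, while the explicit flag remains available for quick local runs.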

Kh4L and others added 2 commits February 28, 2025 00:18

Add SubgraphRAG model and example

70cb876

[pre-commit.ci] auto fixes from pre-commit.com hooks

caf7d7a

for more information, see https://pre-commit.ci

Kh4L force-pushed the add_subgraphrag branch from 7bb45bf to caf7d7a Compare February 27, 2025 15:18

Kh4L and others added 4 commits February 28, 2025 01:08

update dataset and metrics

36d9783

changelog

642658e

[pre-commit.ci] auto fixes from pre-commit.com hooks

f49cc90

for more information, see https://pre-commit.ci

Merge branch 'master' into add_subgraphrag

9570b22

This was referenced Feb 28, 2025

GNN-RAG with PyG #9852

Closed

CS224W ReaRev GNN-RAG #9857

Closed

puririshi98 requested changes Feb 28, 2025
