
Number your segments in docs and prompt the LLM to output the numbers of relevant segments instead of the actual text, to increase speed, reduce cost, and improve extraction quality #9

Open · shreyas-shinde opened this issue Feb 5, 2025 · 4 comments

Comments

@shreyas-shinde (Author)

FYI: I have actually tried the numbered approach and have seen cost and latency go down because of the output-token reduction, and I also found that the eval score of the downstream task (QA in my case) went up.

@homanp (Collaborator) commented Feb 5, 2025

@shreyas-shinde feel free to contribute!

@homanp (Collaborator) commented Feb 7, 2025

@shreyas-shinde
So is the process to:

  1. Have the LLM segment the docs
  2. Have the LLM extract the relevant segments

@shreyas-shinde (Author)

@homanp

  1. Segment the docs -> not done by the LLM, just some simple Python code. A reference algo is https://github.com/langroid/langroid/blob/main/langroid/parsing/utils.py#L135. Maybe we can give the user an option to use an LLM for this step.
  2. The LLM extracts the relevant segment numbers given the context (this could be split across multiple parallel LLM calls depending on the context length); reference prompt: https://github.com/langroid/langroid/blob/main/langroid/agent/special/relevance_extractor_agent.py#L75
  3. Get the segment text back from the numbers returned in step 2 with a simple regex; ref: https://github.com/langroid/langroid/blob/main/langroid/parsing/utils.py#L296 (a minimal sketch of the whole flow is shown below).
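
To make the three steps concrete, here is a minimal sketch of the idea (not the langroid implementation). The `<#i#>` marker format, the prompt wording, and the `llm_complete` callable are assumptions standing in for whatever segmentation convention and chat/completion API you actually use:

```python
import re


def number_segments(text: str, chunk_size: int = 2) -> tuple[str, list[str]]:
    """Step 1 (plain Python, no LLM): split into sentence groups and prefix each with <#i#>."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    segments = [" ".join(sentences[i:i + chunk_size])
                for i in range(0, len(sentences), chunk_size)]
    numbered = " ".join(f"<#{i + 1}#> {seg}" for i, seg in enumerate(segments))
    return numbered, segments


# Step 2: the model only has to emit segment numbers, not the segment text.
EXTRACTION_PROMPT = """\
The passage below has segments marked like <#1#>, <#2#>, etc.
Return ONLY the numbers of the segments relevant to the query (e.g. "2,5-7"),
or "NONE" if no segment is relevant.

Query: {query}

Passage:
{passage}
"""


def parse_segment_numbers(spec: str) -> list[int]:
    """Step 3: turn an answer like '2, 5-7' into [2, 5, 6, 7] with a simple regex."""
    numbers: list[int] = []
    for part in re.findall(r"\d+(?:-\d+)?", spec):
        if "-" in part:
            lo, hi = map(int, part.split("-"))
            numbers.extend(range(lo, hi + 1))
        else:
            numbers.append(int(part))
    return sorted(set(numbers))


def extract_relevant(query: str, text: str, llm_complete) -> list[str]:
    """End-to-end: number segments, ask the LLM for numbers, map numbers back to text.

    `llm_complete` is a hypothetical callable taking a prompt string and
    returning the model's text response.
    """
    numbered, segments = number_segments(text)
    answer = llm_complete(EXTRACTION_PROMPT.format(query=query, passage=numbered))
    if answer.strip().upper() == "NONE":
        return []
    return [segments[n - 1] for n in parse_segment_numbers(answer)
            if 1 <= n <= len(segments)]
```

Because the model only emits short number spans like `2,5-7`, the output-token count (and therefore latency and cost) stays small no matter how long the relevant segments are, which is exactly the benefit described above.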
