Create evals #11

homanp opened this issue Feb 7, 2025 · 4 comments

homanp (Collaborator) commented Feb 7, 2025

Would be nice to have an eval pipeline to continuously measure performance/cost gains over a set of LLMs and documents.
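For concreteness, a minimal sketch of such a loop could look like the example below. The `parse_document(model, document)` entry point, the model names, and the `score` callable are all hypothetical placeholders; per-document token cost could be accumulated the same way if the parser reports provider usage.

```python
import time

# Illustrative list of hosted models to compare; not tied to this repo's config.
MODELS = ["gpt-4o-mini", "claude-3-5-sonnet"]

def run_evals(dataset, parse_document, score):
    """dataset: list of (document, expected) pairs; score: callable returning 0..1."""
    results = []
    for model in MODELS:
        total_score, total_latency = 0.0, 0.0
        for document, expected in dataset:
            start = time.perf_counter()
            output = parse_document(model=model, document=document)
            total_latency += time.perf_counter() - start
            total_score += score(output, expected)
        n = len(dataset)
        results.append({
            "model": model,
            "avg_score": total_score / n,
            "avg_latency_s": total_latency / n,
        })
    return results
```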

g-hano (Contributor) commented Feb 7, 2025

Are you planning to include Ollama models in the evaluation? Since their performance metrics would depend heavily on local hardware, we might need a different approach compared to API-based models.

homanp (Collaborator, Author) commented Feb 7, 2025

> Are you planning to include Ollama models in the evaluation? Since their performance metrics would depend heavily on local hardware, we might need a different approach compared to API-based models.

I was thinking of starting with the hosted models since, as you correctly point out, local inference would need its own evals.

The tricky part here is to find a good dataset that is accepted as a benchmark. Any ideas?

g-hano (Contributor) commented Feb 7, 2025

After a quick search, I found Frames and RewardBench

homanp (Collaborator, Author) commented Feb 7, 2025

> After a quick search, I found Frames and RewardBench

I would go with Frames.
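If Frames is the pick, pulling it in via the `datasets` library could look like the sketch below. The Hub repo id `google/frames-benchmark`, the split name, and the field names are assumptions to verify against the actual dataset card before wiring it into the harness.

```python
from datasets import load_dataset

# Assumed Hub location of the FRAMES benchmark; verify repo id and split.
frames = load_dataset("google/frames-benchmark", split="test")
print(len(frames), frames.column_names)  # inspect the schema first

# Hypothetical adapter into the (document, expected) pairs the eval loop expects;
# the actual field names depend on the dataset schema.
# pairs = [(row["Prompt"], row["Answer"]) for row in frames]
```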
