Create evals #11
Comments
Are you planning to include Ollama models in the evaluation? Since their performance metrics would depend heavily on local hardware, we might need a different approach compared to API-based models.
I was thinking of starting with the hosted models, since, as you correctly point out, local inference would need its own evals. The tricky part here is to find a good dataset that is accepted as a benchmark. Any ideas?
After a quick search, I found Frames and RewardBench.
I would go with Frames.
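For reference, here is a minimal sketch of pulling Frames via the `datasets` library. The dataset id `google/frames-benchmark` and the `test` split are assumptions to verify against the Hub page before wiring this into anything:

```python
# Minimal sketch: load the Frames benchmark from the Hugging Face Hub.
# Dataset id and split name are assumptions -- confirm on the Hub page.
from datasets import load_dataset

def load_frames(split: str = "test"):
    """Load the Frames benchmark and report its size and columns."""
    ds = load_dataset("google/frames-benchmark", split=split)
    print(f"Loaded {len(ds)} examples with columns: {ds.column_names}")
    return ds

if __name__ == "__main__":
    frames = load_frames()
    # Peek at one example to confirm the question/answer field names
    # before hardcoding them anywhere in the eval code.
    print(frames[0])
```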
Would be nice to have an eval pipeline to continuously measure performance/cost gains over a set of LLMs and documents.
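To make the "continuously measure performance/cost" idea concrete, here is a rough sketch of the loop I'd expect the pipeline to run: every model over every (document, question, expected answer) case, scoring the output and tracking latency and token usage for a cost estimate. `call_model`, the price table, and the exact-match scorer are placeholders, not proposals for the actual implementation:

```python
# Rough sketch of a performance/cost eval loop over several LLMs and documents.
# call_model, PRICES_PER_1K_TOKENS, and score() are placeholders to be replaced
# with real API clients, real pricing, and a proper grader (likely an LLM judge).
import time
from dataclasses import dataclass

@dataclass
class EvalCase:
    document: str
    question: str
    expected: str

@dataclass
class ModelResult:
    model: str
    accuracy: float
    avg_latency_s: float
    est_cost_usd: float

# Hypothetical per-1K-token prices; fill in real numbers per provider.
PRICES_PER_1K_TOKENS = {"gpt-4o-mini": 0.00015, "claude-3-haiku": 0.00025}

def call_model(model: str, prompt: str) -> tuple[str, int]:
    """Placeholder: return (answer, tokens_used). Swap in the real API client."""
    return "stub answer", len(prompt.split())

def score(answer: str, expected: str) -> float:
    """Naive substring match; a real pipeline would need a stricter grader."""
    return float(expected.strip().lower() in answer.strip().lower())

def evaluate(model: str, cases: list[EvalCase]) -> ModelResult:
    correct, tokens, latencies = 0.0, 0, []
    for case in cases:
        prompt = f"Document:\n{case.document}\n\nQuestion: {case.question}"
        start = time.perf_counter()
        answer, used = call_model(model, prompt)
        latencies.append(time.perf_counter() - start)
        tokens += used
        correct += score(answer, case.expected)
    return ModelResult(
        model=model,
        accuracy=correct / len(cases),
        avg_latency_s=sum(latencies) / len(latencies),
        est_cost_usd=tokens / 1000 * PRICES_PER_1K_TOKENS.get(model, 0.0),
    )

if __name__ == "__main__":
    cases = [EvalCase("Paris is the capital of France.",
                      "What is the capital of France?", "Paris")]
    for model in PRICES_PER_1K_TOKENS:
        print(evaluate(model, cases))
```

Running this on a schedule (or in CI) over a fixed document set would give the continuous performance/cost tracking described above.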