SurrealML with state of the art models #66

Open
sFritsch09 opened this issue Feb 5, 2025 · 1 comment

Comments

@sFritsch09

I am wondering if it is possible, or planned for the near future, to upload local models like Llama 3.2 or DeepSeek and import them into SurrealDB by converting them to surml.
Training a model with sklearn or PyTorch is old tech! We can already train models in much more capable ways with tools like unsloth, LlamaFactory, or ZenML.

Why start training a model from scratch when I can train a strong model that is already pretty good at handling data?

@maxwellflitton
Contributor

maxwellflitton commented Feb 6, 2025

It depends on what you want. For instance, if you're making decisions in finance or insurance, you need to ensure that you adhere to regulations, with backtesting and explainable weights. You're going to want to use PyTorch, Sklearn, or TensorFlow for these. I myself apply ML at the London centre of bioengineering in surgical robotics, and we use PyTorch and nothing else. A lot of people I know who work professionally in ML for academia or industry in central London use PyTorch, TensorFlow, or Sklearn. If we look at Google Trends, we can see that PyTorch is still widely searched:

[Image: Google Trends chart of search interest in PyTorch]

Right now I am working on C lib wrappers so we can have better integration with other languages and better deployment. Initially, it makes sense to support the ML frameworks that are most widely used professionally. They have established ecosystems and quality control methods, and the professionals using them can explain and trace the exact data passed into the model, because they need to adhere to regulations and avoid legal action.

That being said, we can offer support for something like Llama. Machine learning models are essentially matrix operations. Their weights and computational graph can be stored in the ONNX format, which is serialized with protobuf. We support raw ONNX, as you can see at the following link:

https://github.com/surrealdb/surrealml?tab=readme-ov-file#raw-onnx-models
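To make the "matrix operations in a computational graph" point concrete, here is a minimal, dependency-free sketch. It is not the ONNX or SurrealML API: the `Node` class and `run_graph` function are hypothetical names invented for illustration, and only the operator names (`MatMul`, `Add`, `Relu`) are borrowed from ONNX. It shows a one-layer model, y = relu(x @ W + b), stored as named weights plus an ordered list of operations, which is roughly the kind of information an ONNX file carries.

```python
from dataclasses import dataclass


def matmul(a, b):
    # Plain matrix product of two nested-list matrices.
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def add(a, b):
    # Elementwise addition of two matrices with the same shape.
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def relu(a):
    # Elementwise max(0, x).
    return [[max(0.0, x) for x in row] for row in a]


# Operator names borrowed from ONNX; the implementations are toy versions.
OPS = {"MatMul": matmul, "Add": add, "Relu": relu}


@dataclass
class Node:
    # Hypothetical stand-in for a graph node: an operator name,
    # the names of its input values, and the name of its output value.
    op: str
    inputs: list
    output: str


def run_graph(nodes, weights, feeds):
    # Evaluate nodes in order, looking up inputs by name.
    # Assumes the node list is already topologically sorted.
    values = {**weights, **feeds}
    for node in nodes:
        args = [values[name] for name in node.inputs]
        values[node.output] = OPS[node.op](*args)
    return values


# The stored "model": weights plus the graph that consumes them.
weights = {"W": [[1.0, -1.0], [2.0, 0.5]], "b": [[0.5, -3.0]]}
nodes = [
    Node("MatMul", ["x", "W"], "xw"),
    Node("Add", ["xw", "b"], "z"),
    Node("Relu", ["z"], "y"),
]

result = run_graph(nodes, weights, {"x": [[1.0, 2.0]]})["y"]
print(result)
```

Because the graph and weights are plain data, any runtime that understands the operators can execute the model regardless of which framework trained it. That is the property the rest of this comment relies on.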

ONNX is the established standard for storing these computational graphs. This means that any serious machine learning model you come across should have an ONNX representation. Below is the Microsoft repo that explains the theory behind converting Llama to ONNX:

https://github.com/microsoft/Llama-2-Onnx

And below is documentation on how they accelerated Llama inference with ONNX Runtime:

https://onnxruntime.ai/blogs/accelerating-llama-2

The surml core engine uses ONNX Runtime to execute the model in the database, so if you get an ONNX representation of Llama, you can run it on SurrealML. I've also now looked at the code for unsloth and LlamaFactory; they're wrappers around PyTorch. So instead of us maintaining interfaces around unsloth and LlamaFactory, which I'm sure will change over time, you should be able to train your model using unsloth or LlamaFactory and convert it to surml using the TORCH engine.
