Commit 30dcccc

updating docs for final merge
1 parent 95ba46d commit 30dcccc

3 files changed

+7
-381
lines changed

README.md

+2-362
@@ -16,365 +16,5 @@ SurrealML is a feature that allows you to store trained machine learning models
1616

1717
## New Clients
1818

19-
We are removing the `PyO3` bindings and using raw C bindings for the `surrealml-core` library instead. This will simplify builds and also enable clients in other languages to use the `surrealml-core` library. The `c-wrapper` module can be found in the `modules/c-wrapper` directory. The new clients can be found in the `clients` directory.
20-
21-
## Installation
22-
23-
To install SurrealML, make sure you have Python installed. Then, install the `SurrealML` library and either `PyTorch` or
24-
`SKLearn`, based on your model choice. You can install the package with both `PyTorch` and `SKLearn` with the command
25-
below:
26-
27-
```
28-
pip install "git+https://github.com/surrealdb/surrealml#egg=surrealml[sklearn,torch]"
29-
```
30-
31-
If you want to use `SurrealML` with `sklearn` you will need the following installation:
32-
33-
```bash
34-
pip install "git+https://github.com/surrealdb/surrealml#egg=surrealml[sklearn]"
35-
```
36-
37-
For `PyTorch`:
38-
39-
```bash
40-
pip install "git+https://github.com/surrealdb/surrealml#egg=surrealml[torch]"
41-
```
42-
43-
For `Tensorflow`:
44-
45-
```bash
46-
pip install "git+https://github.com/surrealdb/surrealml#egg=surrealml[tensorflow]"
47-
```
48-
49-
After that, you can train your model and save it in the SurrealML format.
50-
51-
## Compilation config
52-
53-
If nothing is configured the crate will compile the ONNX runtime into the binary. This is the default behaviour. However, you have 2 more options:
54-
55-
- If you want to use an ONNX runtime that is installed on your system, you can set the environment variable `ONNXRUNTIME_LIB_PATH` before you compile the crate. This will make the crate use the ONNX runtime that is installed on your system.
56-
- If you want to statically compile the library, you can download it from https://github.com/surrealdb/onnxruntime-build/releases/tag/v1.16.3 and then build the crate this way:
57-
58-
```
59-
$ tar xvf <onnx-archive-file> -C extract-dir
60-
$ ORT_STRATEGY=system ORT_LIB_LOCATION=$(pwd)/extract-dir cargo build
61-
```
62-
63-
## Quick start with Sk-learn
64-
65-
Sk-learn models can also be converted and stored in the `.surml` format enabling developers to load them in any
66-
python version as we are not relying on pickle. Metadata in the file also enables other users of the model to use them
67-
out of the box without having to worry about the normalisation of the data or getting the right inputs in order. You
68-
will also be able to load your sk-learn models in Rust and run them meaning you can use them in your SurrealDB server.
69-
Saving a model is as simple as the following:
70-
71-
```python
72-
from sklearn.linear_model import LinearRegression
73-
from surrealml import SurMlFile, Engine
74-
from surrealml.model_templates.datasets.house_linear import HOUSE_LINEAR # click on this HOUSE_LINEAR to see the data
75-
76-
# train the model
77-
model = LinearRegression()
78-
model.fit(HOUSE_LINEAR["inputs"], HOUSE_LINEAR["outputs"])
79-
80-
# package and save the model
81-
file = SurMlFile(model=model, name="linear", inputs=HOUSE_LINEAR["inputs"], engine=Engine.SKLEARN)
82-
83-
# add columns in the order of the inputs to map dictionaries passed in to the model
84-
file.add_column("squarefoot")
85-
file.add_column("num_floors")
86-
87-
# add normalisers for the columns
88-
file.add_normaliser("squarefoot", "z_score", HOUSE_LINEAR["squarefoot"].mean(), HOUSE_LINEAR["squarefoot"].std())
89-
file.add_normaliser("num_floors", "z_score", HOUSE_LINEAR["num_floors"].mean(), HOUSE_LINEAR["num_floors"].std())
90-
file.add_output("house_price", "z_score", HOUSE_LINEAR["outputs"].mean(), HOUSE_LINEAR["outputs"].std())
91-
92-
# save the file
93-
file.save(path="./linear.surml")
94-
95-
# load the file
96-
new_file = SurMlFile.load(path="./linear.surml", engine=Engine.SKLEARN)
97-
98-
# Make a prediction (both should be the same due to the perfectly correlated example data)
99-
print(new_file.buffered_compute(value_map={"squarefoot": 5, "num_floors": 6}))
100-
print(new_file.raw_compute(input_vector=[5, 6]))
101-
```
102-
103-
## Raw ONNX models
104-
105-
You may not have a model that is supported by the `surrealml` library. However, if you can convert the model into ONNX
106-
format by yourself, you can merely use the `ONNX` engine when saving your model with the following code:
107-
108-
```python
109-
file = SurMlFile(model=raw_onnx_model, name="linear", inputs=HOUSE_LINEAR["inputs"], engine=Engine.ONNX)
110-
```
111-
112-
## Python tutorial using Pytorch
113-
114-
First we need to have one script where we create and store the model. In this example we will merely do a linear regression model
115-
to predict the house price using the number of floors and the square feet.
116-
117-
### Defining the data
118-
119-
We can create some fake data with the following python code:
120-
121-
```python
122-
import torch
123-
import torch.nn as nn
124-
import torch.optim as optim
125-
import numpy as np
126-
127-
128-
squarefoot = np.array([1000, 1200, 1500, 1800, 2000, 2200, 2500, 2800, 3000, 3200], dtype=np.float32)
129-
num_floors = np.array([1, 1, 1.5, 1.5, 2, 2, 2.5, 2.5, 3, 3], dtype=np.float32)
130-
house_price = np.array([200000, 230000, 280000, 320000, 350000, 380000, 420000, 470000, 500000, 520000], dtype=np.float32)
131-
```
132-
133-
We then compute the normalisation parameters, which give better convergence, with the following:
134-
135-
```python
136-
squarefoot_mean = squarefoot.mean()
137-
squarefoot_std = squarefoot.std()
138-
num_floors_mean = num_floors.mean()
139-
num_floors_std = num_floors.std()
140-
house_price_mean = house_price.mean()
141-
house_price_std = house_price.std()
142-
```
143-
144-
We then normalise our data with the code below:
145-
146-
```python
147-
squarefoot = (squarefoot - squarefoot.mean()) / squarefoot.std()
148-
num_floors = (num_floors - num_floors.mean()) / num_floors.std()
149-
house_price = (house_price - house_price.mean()) / house_price.std()
150-
```
151-
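The means and standard deviations are computed before the data is overwritten because they are needed later to map a normalised prediction back to a price. The sketch below illustrates that round trip using only the standard library; the helper names `z_score` and `invert_z_score` are hypothetical and are not part of `surrealml`.

```python
from statistics import fmean, pstdev


def z_score(values):
    """Return (normalised values, mean, std) for a list of numbers."""
    mean = fmean(values)
    std = pstdev(values)  # population std, the same default NumPy's .std() uses
    return [(v - mean) / std for v in values], mean, std


def invert_z_score(value, mean, std):
    """Map a normalised value back onto its original scale."""
    return value * std + mean


house_price = [200000, 230000, 280000, 320000, 350000,
               380000, 420000, 470000, 500000, 520000]

normalised, price_mean, price_std = z_score(house_price)
restored = [invert_z_score(v, price_mean, price_std) for v in normalised]
```

After normalisation the values have zero mean and unit standard deviation, and `restored` matches the original prices up to floating-point error.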
152-
We then create our tensors so they can be loaded into our model and stack it together with the following:
153-
154-
```python
155-
squarefoot_tensor = torch.from_numpy(squarefoot)
156-
num_floors_tensor = torch.from_numpy(num_floors)
157-
house_price_tensor = torch.from_numpy(house_price)
158-
159-
X = torch.stack([squarefoot_tensor, num_floors_tensor], dim=1)
160-
```
161-
162-
### Defining our model
163-
164-
We can now define our linear regression model with loss function and an optimizer with the code below:
165-
166-
```python
167-
# Define the linear regression model
168-
class LinearRegressionModel(nn.Module):
169-
    def __init__(self):
170-
        super(LinearRegressionModel, self).__init__()
171-
        self.linear = nn.Linear(2, 1)  # 2 input features, 1 output
172-
173-
    def forward(self, x):
174-
        return self.linear(x)
175-
176-
177-
# Initialize the model
178-
model = LinearRegressionModel()
179-
180-
# Define the loss function and optimizer
181-
criterion = nn.MSELoss()
182-
optimizer = optim.SGD(model.parameters(), lr=0.01)
183-
```
184-
185-
### Training our model
186-
187-
We are now ready to train our model on the data we have generated, running 1000 epochs with the following loop:
188-
189-
```python
190-
num_epochs = 1000
191-
for epoch in range(num_epochs):
192-
    # Forward pass
193-
    y_pred = model(X)
194-
195-
    # Compute the loss
196-
    loss = criterion(y_pred.squeeze(), house_price_tensor)
197-
198-
    # Backward pass and optimization
199-
    optimizer.zero_grad()
200-
    loss.backward()
201-
    optimizer.step()
202-
203-
    # Print the progress
204-
    if (epoch + 1) % 100 == 0:
205-
        print(f"Epoch [{epoch+1}/{num_epochs}], Loss: {loss.item():.4f}")
206-
```
207-
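To make the loop above less opaque, here is a minimal sketch of the same idea in plain Python: a two-feature linear model trained with full-batch gradient descent on the mean squared error. It is an illustration rather than what PyTorch does internally, and it starts from zero weights for determinism, whereas `nn.Linear` initialises its weights randomly.

```python
from statistics import fmean, pstdev


def normalise(values):
    """Z-score normalise a list of numbers (population standard deviation)."""
    mean, std = fmean(values), pstdev(values)
    return [(v - mean) / std for v in values]


squarefoot = normalise([1000, 1200, 1500, 1800, 2000, 2200, 2500, 2800, 3000, 3200])
num_floors = normalise([1, 1, 1.5, 1.5, 2, 2, 2.5, 2.5, 3, 3])
house_price = normalise([200000, 230000, 280000, 320000, 350000,
                         380000, 420000, 470000, 500000, 520000])


def errors_for(w1, w2, b):
    """Prediction minus target for every training row."""
    return [w1 * x1 + w2 * x2 + b - y
            for x1, x2, y in zip(squarefoot, num_floors, house_price)]


def mse(w1, w2, b):
    """Mean squared error of the linear model on the training data."""
    return fmean([e * e for e in errors_for(w1, w2, b)])


w1 = w2 = b = 0.0   # assumption: zero start instead of random initialisation
lr = 0.01           # same learning rate as the SGD optimizer above
n = len(house_price)
initial_loss = mse(w1, w2, b)

for epoch in range(1000):
    errors = errors_for(w1, w2, b)
    # Gradients of the MSE with respect to each parameter
    grad_w1 = 2 / n * sum(e * x for e, x in zip(errors, squarefoot))
    grad_w2 = 2 / n * sum(e * x for e, x in zip(errors, num_floors))
    grad_b = 2 / n * sum(errors)
    # Gradient descent step
    w1 -= lr * grad_w1
    w2 -= lr * grad_w2
    b -= lr * grad_b

final_loss = mse(w1, w2, b)
```

Because the targets are normalised, the loss starts at 1.0 with zero weights and should fall well below that as the weights are fitted.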
208-
### Saving our `.surml` file
209-
210-
Our model is now trained and we need some example data to trace the model with the code below:
211-
212-
```python
213-
test_squarefoot = torch.tensor([2800, 3200], dtype=torch.float32)
214-
test_num_floors = torch.tensor([2.5, 3], dtype=torch.float32)
215-
test_inputs = torch.stack([test_squarefoot, test_num_floors], dim=1)
216-
```
217-
218-
We can now wrap our model in the `SurMlFile` object with the following code:
219-
220-
```python
221-
from surrealml import SurMlFile, Engine
222-
223-
file = SurMlFile(model=model, name="linear", inputs=test_inputs[:1], engine=Engine.PYTORCH)
224-
```
225-
226-
The name is optional, but the model and inputs are required. We can now add metadata to the file, such as our inputs and outputs, with the following code. Metadata is not essential; it just helps with some types of computation:
227-
228-
```python
229-
file.add_column("squarefoot")
230-
file.add_column("num_floors")
231-
file.add_output("house_price", "z_score", house_price_mean, house_price_std)
232-
```
233-
234-
It must be stressed that the `add_column` order needs to be consistent with the input tensors that the model was trained on as these
235-
now act as key bindings to convert dictionary inputs into the model. We need to also add the normalisers for our column but these will
236-
be automatically mapped, so we do not need to worry about the order in which they are added. Again, normalisers are optional; you can
237-
normalise the data yourself:
238-
239-
```python
240-
file.add_normaliser("squarefoot", "z_score", squarefoot_mean, squarefoot_std)
241-
file.add_normaliser("num_floors", "z_score", num_floors_mean, num_floors_std)
242-
```
243-
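The reason column order matters while normaliser order does not can be sketched as follows. This is a hypothetical illustration of key bindings, not the `surrealml` implementation: `build_input_vector` and the example statistics are assumptions made for the example.

```python
def build_input_vector(value_map, columns, normalisers):
    """Order dictionary values by column name and apply z-score normalisers.

    columns     -- list of column names, in the order the model expects them
    normalisers -- dict of column name -> (mean, std); lookup is by name,
                   so the order in which normalisers were added is irrelevant
    """
    vector = []
    for name in columns:
        value = float(value_map[name])
        if name in normalisers:
            mean, std = normalisers[name]
            value = (value - mean) / std
        vector.append(value)
    return vector


# Column order mirrors the add_column calls above.
columns = ["squarefoot", "num_floors"]

# Illustrative statistics only, deliberately registered in a different order.
normalisers = {"num_floors": (2.0, 0.5), "squarefoot": (2000.0, 500.0)}

vector = build_input_vector(
    {"num_floors": 3.0, "squarefoot": 2500.0}, columns, normalisers
)
```

Here `vector` is `[1.0, 2.0]`: the dictionary keys arrive in any order, the column list fixes their position, and each normaliser is found by name.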
244-
We then save the model with the following code:
245-
246-
```python
247-
file.save("./test.surml")
248-
```
249-
250-
### Loading our `.surml` file in Python
251-
252-
If you have followed the previous steps you should have a `.surml` file saved with all our meta data. We load it with the following code:
253-
254-
```python
255-
from surrealml import SurMlFile, Engine
256-
257-
new_file = SurMlFile.load("./test.surml", engine=Engine.PYTORCH)
258-
```
259-
260-
Our model is now loaded. We can now perform computations.
261-
262-
### Buffered computation in Python
263-
264-
This is where the computation utilises the data in the header. We can do this by merely passing in a dictionary as seen below:
265-
266-
```python
267-
print(new_file.buffered_compute({
268-
"squarefoot": 1.0,
269-
"num_floors": 2.0
270-
}))
271-
```
272-
273-
### Uploading our model to SurrealDB
274-
275-
We can upload our trained model with the following code:
276-
277-
```python
278-
url = "http://0.0.0.0:8000/ml/import"
279-
SurMlFile.upload(
280-
path="./linear_test.surml",
281-
url=url,
282-
chunk_size=36864,
283-
namespace="test",
284-
database="test",
285-
username="root",
286-
password="root"
287-
)
288-
```
289-
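The `chunk_size` argument suggests that the file is sent in pieces rather than in one request body. How `SurMlFile.upload` does this internally is not shown here, so the following is only a sketch of the general technique under that assumption; `iter_chunks` is a hypothetical helper.

```python
import io


def iter_chunks(stream, chunk_size=36864):
    """Yield successive byte chunks of at most chunk_size from a binary stream."""
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        yield chunk


# Stand-in for the bytes of a .surml file opened in binary mode.
payload = bytes(100_000)
chunks = list(iter_chunks(io.BytesIO(payload)))
```

With a 100,000-byte payload this yields two full chunks of 36,864 bytes and a final chunk of 26,272 bytes, which reassemble into the original payload.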
290-
### Running SurrealQL operations against our trained model
291-
292-
With this, we can perform SQL statements in our database. To test this, we can create the following rows:
293-
294-
```sql
295-
CREATE house_listing SET squarefoot_col = 500.0, num_floors_col = 1.0;
296-
CREATE house_listing SET squarefoot_col = 1000.0, num_floors_col = 2.0;
297-
CREATE house_listing SET squarefoot_col = 1500.0, num_floors_col = 3.0;
298-
299-
SELECT * FROM (
300-
SELECT
301-
*,
302-
ml::house-price-prediction<0.0.1>({
303-
squarefoot: squarefoot_col,
304-
num_floors: num_floors_col
305-
}) AS price_prediction
306-
FROM house_listing
307-
)
308-
WHERE price_prediction > 177206.21875;
309-
```
310-
311-
What is happening here is that we are feeding the columns from the table `house_listing` into a model we uploaded
312-
called `house-price-prediction` with a version of `0.0.1`. We then get the results of that trained ML model as the column
313-
`price_prediction`. We then use the calculated predictions to filter the rows giving us the following result:
314-
315-
```json
316-
[
317-
{
318-
"id": "house_listing:7bo0f35tl4hpx5bymq5d",
319-
"num_floors_col": 3,
320-
"price_prediction": 406534.75,
321-
"squarefoot_col": 1500
322-
},
323-
{
324-
"id": "house_listing:8k2ttvhp2vh8v7skwyie",
325-
"num_floors_col": 2,
326-
"price_prediction": 291870.5,
327-
"squarefoot_col": 1000
328-
}
329-
]
330-
```
331-
332-
### Loading our `.surml` file in Rust
333-
334-
We can now load our `.surml` file with the following code:
335-
336-
```rust
337-
use crate::storage::surml_file::SurMlFile;
338-
339-
let mut file = SurMlFile::from_file("./test.surml").unwrap();
340-
```
341-
342-
### Raw computation in Rust
343-
344-
You can have an empty header if you want. This makes sense if you're doing something novel, or complex such as convolutional neural networks
345-
for image processing. To perform a raw computation you can merely just do the following:
346-
347-
```rust
348-
file.model.set_eval();
349-
let x = Tensor::f_from_slice::<f32>(&[1.0, 2.0, 3.0, 4.0]).unwrap().reshape(&[2, 2]);
350-
let outcome = file.model.forward_ts(&[x]);
351-
println!("{:?}", outcome);
352-
```
353-
354-
However, if you want to use the header, you need to perform a buffered computation.
355-
356-
### Buffered computation in Rust
357-
358-
This is where the computation utilises the data in the header. We can do this by wrapping our `SurMlFile` struct in a `ModelComputation` struct
359-
with the code below:
360-
361-
```rust
362-
use crate::execution::compute::ModelComputation;
363-
364-
let computert_unit = ModelComputation {
365-
surml_file: &mut file
366-
};
367-
```
368-
369-
Now that we have this wrapper we can create a hashmap with values and keys that correspond to the key bindings. We can then pass this into
370-
a `buffered_compute` that maps the inputs and applies normalisation to those inputs if normalisation is present for that column with the
371-
following:
372-
373-
```rust
374-
let mut input_values = HashMap::new();
375-
input_values.insert(String::from("squarefoot"), 1.0);
376-
input_values.insert(String::from("num_floors"), 2.0);
377-
378-
let outcome = computert_unit.buffered_compute(&mut input_values);
379-
println!("{:?}", outcome);
380-
```
19+
We have replaced `PyO3` with a raw dynamic C library written in Rust. This now works with Python, and we can also link this dynamic C library to other languages such as JavaScript. The new `Python` client is housed in the `clients`
20+
directory. Please visit it for the updated installation and API docs.
