Design exploration #2
base: master
Changes from 17 commits
`.gitignore` (new file):

```
/target
**/*.rs.bk
Cargo.lock

# IDEs
.idea/
tags
```
`Cargo.toml` (new file):

```toml
[package]
name = "linfa"
version = "0.1.0"
authors = ["LukeMathWalker <[email protected]>"]
edition = "2018"

[dependencies]
```
Rust source (new file):

```rust
use std::error;
use std::iter;

/// The basic `Model` trait.
///
/// It is training-agnostic: a model takes an input and returns an output.
///
/// There might be multiple ways to discover the best settings for every
/// particular algorithm (e.g. training a logistic regressor using
/// a pseudo-inverse matrix vs using gradient descent).
/// It doesn't matter: the end result, the model, is a set of parameters.
/// The way those parameters originated is an orthogonal concept.
///
/// In the same way, it has no notion of loss or "correct" predictions.
/// Those concepts are embedded elsewhere.
pub trait Model {
    type Input;
    type Output;
    type Error: error::Error;

    fn predict(&self, inputs: &Self::Input) -> Result<Self::Output, Self::Error>;
}

/// One step closer to the peak.
///
/// `Optimizer` is generic over a type `M` implementing the `Model` trait: `M` is used to
/// constrain what type of inputs and targets are acceptable.
///
/// `train` takes an instance of `M` as one of its inputs, `model`: it doesn't matter if `model`
/// has been through several rounds of training before, or if it just came out of a `Blueprint`
/// using `initialize` - it's consumed by `train` and a new model is returned.
///
/// This means that there is no difference between one-shot training and incremental training.
/// Furthermore, the optimizer doesn't have to "own" the model or know anything about its
/// hyperparameters, because it never has to initialize it.
pub trait Optimizer<M>
where
    M: Model,
{
    type Error: error::Error;

    fn train(
        &self,
        inputs: &M::Input,
        targets: &M::Output,
        model: M,
    ) -> Result<M, Self::Error>;
}

/// Where `Model`s are forged.
///
/// `Blueprint`s are used to specify how to build and initialize an instance of the model type `M`.
///
/// For the same model type `M`, nothing prevents a user from providing more than one `Blueprint`:
/// multiple initialization strategies can sometimes be used to build the same model type.
///
/// Each of these strategies can take different (hyper)parameters, even though they return an
/// instance of the same model type in the end.
///
/// The initialization procedure could be data-dependent, hence the signature of `initialize`.
pub trait Blueprint<M>
where
    M: Model,
{
    type Error: error::Error;

    fn initialize(&self, inputs: &M::Input, targets: &M::Output) -> Result<M, Self::Error>;
}

/// Any `Model` can be used as a `Blueprint`, as long as it's clonable:
/// it returns a clone of itself when `initialize` is called, ignoring the data.
impl<M> Blueprint<M> for M
where
    M: Model + Clone,
{
    type Error = M::Error;

    fn initialize(&self, _inputs: &M::Input, _targets: &M::Output) -> Result<M, Self::Error> {
        Ok(self.clone())
    }
}

/// Where you need to go meta (hyperparameters!).
///
/// `BlueprintGenerator`s can be used to explore different combinations of hyperparameters
/// when you are working with a certain `Model` type.
///
/// `BlueprintGenerator::generate` returns, if successful, an `IntoIterator` type
/// yielding instances of blueprints.
pub trait BlueprintGenerator<B, M>
where
    B: Blueprint<M>,
    M: Model,
{
    type Error: error::Error;
    type Output: IntoIterator<Item = B>;

    fn generate(&self) -> Result<Self::Output, Self::Error>;
}

/// Any `Blueprint` can be used as a `BlueprintGenerator`, as long as it's clonable:
/// it returns an iterator with a single element, a clone of itself.
impl<B, M> BlueprintGenerator<B, M> for B
where
    B: Blueprint<M> + Clone,
    M: Model,
{
    type Error = B::Error;
    type Output = iter::Once<B>;

    fn generate(&self) -> Result<Self::Output, Self::Error> {
        Ok(iter::once(self.clone()))
    }
}
```

Review comment on `pub trait Optimizer<M>`:

> Wording:

Review comment on the data-dependent signature of `initialize`:

> I'm a bit concerned about potential user confusion about what should be put in the
>
> What would be an example of a workflow with a data-dependent initialization? Are there any other options for handling that initialization?
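To see how the traits compose, here is a minimal sketch of implementing them for a toy constant-predictor. `MeanModel`, `MeanFitter`, and the `Never` error type are hypothetical and not part of this PR; the trait definitions are restated so the sketch compiles on its own.

```rust
use std::error::Error as StdError;
use std::fmt;

// Restated from the diff so this example is self-contained.
pub trait Model {
    type Input;
    type Output;
    type Error: StdError;
    fn predict(&self, inputs: &Self::Input) -> Result<Self::Output, Self::Error>;
}

pub trait Optimizer<M: Model> {
    type Error: StdError;
    fn train(&self, inputs: &M::Input, targets: &M::Output, model: M)
        -> Result<M, Self::Error>;
}

// Hypothetical infallible error type; the design leaves error types to implementors.
#[derive(Debug)]
pub struct Never;
impl fmt::Display for Never {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "infallible")
    }
}
impl StdError for Never {}

// Toy model: predicts a constant (the mean of the training targets).
#[derive(Clone, Debug)]
pub struct MeanModel {
    pub mean: f64,
}

impl Model for MeanModel {
    type Input = Vec<f64>;
    type Output = Vec<f64>;
    type Error = Never;

    fn predict(&self, inputs: &Vec<f64>) -> Result<Vec<f64>, Never> {
        // One constant prediction per input point.
        Ok(inputs.iter().map(|_| self.mean).collect())
    }
}

// An "optimizer" that refits the mean from the targets, consuming the old model.
pub struct MeanFitter;

impl Optimizer<MeanModel> for MeanFitter {
    type Error = Never;

    fn train(
        &self,
        _inputs: &Vec<f64>,
        targets: &Vec<f64>,
        _model: MeanModel,
    ) -> Result<MeanModel, Never> {
        let mean = targets.iter().sum::<f64>() / targets.len() as f64;
        Ok(MeanModel { mean })
    }
}

fn main() {
    let inputs = vec![1.0, 2.0, 3.0];
    let targets = vec![2.0, 4.0, 6.0];
    // `train` consumes the old model and returns a new one, whether the
    // old one was fresh from a `Blueprint` or already trained.
    let model = MeanFitter
        .train(&inputs, &targets, MeanModel { mean: 0.0 })
        .unwrap();
    println!("{:?}", model.predict(&inputs).unwrap());
}
```

Note that `MeanModel` is `Clone`, so under the blanket impls above it also works as its own `Blueprint` (and `BlueprintGenerator`) for free.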
Review comment on the `Model` trait:

> I'm trying to think whether `Input` and `Output` should be associated types or struct generics (`Model<Input, Output>`). It's definitely possible that a trained model could be implemented to provide predictions over multiple types of input / output. For instance, we could have a model defined over ndarray input, or dataframe input, or even a `Vec<T>`.
>
> I could also see a case for `Model<Input>` with `Output` being an associated type -- given a particular input, the output could only be a specific type.