This repository was archived by the owner on Aug 12, 2020. It is now read-only.

Design exploration #2

Open: wants to merge 35 commits into master.

Changes shown are from 17 of the 35 commits.

Commits (35)
1345273  Initialize project (LukeMathWalker, May 11, 2019)
c602eec  Start from rusty-machine approach (LukeMathWalker, May 11, 2019)
7407331  Explain rationale (LukeMathWalker, May 11, 2019)
a808405  Restructure the trainer class (LukeMathWalker, May 11, 2019)
866b1ed  Cargo fmt + comment (LukeMathWalker, May 11, 2019)
0fa4e2b  Add rationale for Optimizer trait (former Trainer) (LukeMathWalker, May 11, 2019)
6bc7a71  Introduce Blueprint trait and add details to Optimizer docs. (LukeMathWalker, May 11, 2019)
60e649b  Cargo fmt (LukeMathWalker, May 11, 2019)
1c4bcc4  Ignore IDE-related files (LukeMathWalker, May 11, 2019)
5d03be2  Remove the loss parameter from Optimizer: for most models it's not po… (LukeMathWalker, May 11, 2019)
2b0bbf6  Typo (LukeMathWalker, May 11, 2019)
d56a5a0  Typos (LukeMathWalker, May 11, 2019)
fbe9a66  Add BlueprintGenerator (LukeMathWalker, May 12, 2019)
6f1b906  Remove parameters from generate (LukeMathWalker, May 12, 2019)
f1a0312  Refine BlueprintGenerator, moving I to associated type. Implement Blu… (LukeMathWalker, May 12, 2019)
d39a6e3  Re-org (LukeMathWalker, May 12, 2019)
cbfa74c  Add blanket implementation of Blueprint for Model types (LukeMathWalker, May 12, 2019)
afebfe1  Refactor (LukeMathWalker, May 15, 2019)
7a2c4a9  Add docsc, fix typos (LukeMathWalker, May 15, 2019)
f7f8f50  Make input and output generic parameters (LukeMathWalker, May 15, 2019)
da9a9d7  Add comments (LukeMathWalker, May 15, 2019)
834ffdd  Doc minor fix (LukeMathWalker, May 19, 2019)
9ffa8cd  Add examples folder (LukeMathWalker, May 19, 2019)
dfe62f4  Basic transformer implementation for standard scaling (LukeMathWalker, May 19, 2019)
93ad90a  Add other structs (LukeMathWalker, May 19, 2019)
39b7379  Skeleton of Fit and IncrementalFit implementation (LukeMathWalker, May 19, 2019)
6aa09e0  Implemented Fit trait (LukeMathWalker, May 19, 2019)
ace68b1  Convert ddof to f64. Make fit and incremental_fit take self as mutabl… (LukeMathWalker, May 19, 2019)
d473588  Implement IncrementalFit (LukeMathWalker, May 19, 2019)
c494d3c  Add very basic usage example (LukeMathWalker, May 19, 2019)
8616d44  Move into folder (LukeMathWalker, May 19, 2019)
2a54ddd  Restructure into a proper module (LukeMathWalker, May 19, 2019)
3d23c4b  Restructure into a proper module (LukeMathWalker, May 19, 2019)
a29de61  Fix stdd update (LukeMathWalker, May 19, 2019)
eef5f6f  Clean up code for stdd update (LukeMathWalker, May 19, 2019)
7 changes: 7 additions & 0 deletions .gitignore
@@ -0,0 +1,7 @@
/target
**/*.rs.bk
Cargo.lock

# IDEs
.idea/
tags
7 changes: 7 additions & 0 deletions Cargo.toml
@@ -0,0 +1,7 @@
[package]
name = "linfa"
version = "0.1.0"
authors = ["LukeMathWalker <[email protected]>"]
edition = "2018"

[dependencies]
116 changes: 116 additions & 0 deletions src/lib.rs
@@ -0,0 +1,116 @@
use std::error;
use std::iter;

/// The basic `Model` trait.
///
/// It is training-agnostic: a model takes an input and returns an output.
///
/// There might be multiple ways to discover the best settings for every
/// particular algorithm (e.g. training a logistic regressor using
/// a pseudo-inverse matrix vs using gradient descent).
/// It doesn't matter: the end result, the model, is a set of parameters.
/// The way those parameters originated is an orthogonal concern.
///
/// In the same way, it has no notion of loss or "correct" predictions.
/// Those concepts are embedded elsewhere.
pub trait Model {
type Input;
Review comment (Collaborator): I'm trying to think whether Input and Output should be associated types or struct generics (Model<Input, Output>). It's definitely possible that a trained model could be implemented to provide predictions over multiple types of input/output. For instance, we could have a model defined over ndarray input, or dataframe input, or even a Vec<T>.

I could also see a case for Model<Input> with Output being an associated type -- given a particular input, the output could only be a specific type.

type Output;
type Error: error::Error;

fn predict(&self, inputs: &Self::Input) -> Result<Self::Output, Self::Error>;
}
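
To make the contract concrete, here is a minimal sketch of a `Model` implementation. `LineModel` is a hypothetical type invented for illustration (it is not part of this PR): it predicts `y = slope * x + intercept` over `Vec<f64>` inputs, and uses `std::convert::Infallible` as the error type since prediction cannot fail here.

use std::convert::Infallible;

// Hypothetical example type, not part of this PR.
#[derive(Clone, Debug)]
pub struct LineModel {
    pub slope: f64,
    pub intercept: f64,
}

impl Model for LineModel {
    type Input = Vec<f64>;
    type Output = Vec<f64>;
    // Infallible implements std::error::Error and signals that
    // prediction can never fail for this model.
    type Error = Infallible;

    fn predict(&self, inputs: &Self::Input) -> Result<Self::Output, Self::Error> {
        Ok(inputs.iter().map(|x| self.slope * x + self.intercept).collect())
    }
}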

/// One step closer to the peak.
///
/// `Optimizer` is generic over a type `M` implementing the `Model` trait: `M` is used to
/// constrain what types of inputs and targets are acceptable.
///
/// `train` takes an instance of `M` as one of its inputs, `model`: it doesn't matter if `model`
/// has been through several rounds of training before, or if it just came out of a `Blueprint`
/// using `initialize` - it's consumed by `train` and a new model is returned.
///
/// This means that there is no difference between one-shot training and incremental training.
/// Furthermore, the optimizer doesn't have to "own" the model or know anything about its hyperparameters,
/// because it never has to initialize it.
pub trait Optimizer<M>
Review comment (Collaborator): Wording: Optimizer or something like Estimator? Optimizer might be confusing given that some algorithms are actually optimization algorithms, but others aren't.

where
M: Model,
{
type Error: error::Error;

fn train(
&self,
inputs: &M::Input,
targets: &M::Output,
model: M,
) -> Result<M, Self::Error>;
}
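
As an illustration of the `train` contract, here is a hypothetical `Optimizer` for the `LineModel` sketched above (again, invented for illustration, not part of this PR): it performs one pass of gradient descent on mean squared error, consuming the model and returning the updated one.

// Hypothetical example, not part of this PR: one gradient-descent pass
// on mean squared error for the LineModel sketched earlier.
pub struct GradientDescent {
    pub learning_rate: f64,
}

impl Optimizer<LineModel> for GradientDescent {
    type Error = Infallible;

    fn train(
        &self,
        inputs: &Vec<f64>,
        targets: &Vec<f64>,
        mut model: LineModel,
    ) -> Result<LineModel, Self::Error> {
        let n = inputs.len() as f64;
        let (mut g_slope, mut g_intercept) = (0.0, 0.0);
        for (x, y) in inputs.iter().zip(targets.iter()) {
            // Gradient of the squared error, averaged over the dataset.
            let residual = model.slope * x + model.intercept - y;
            g_slope += 2.0 * residual * x / n;
            g_intercept += 2.0 * residual / n;
        }
        model.slope -= self.learning_rate * g_slope;
        model.intercept -= self.learning_rate * g_intercept;
        // The same call works on a freshly initialized model or on one that
        // has already been trained: there is no separate "warm start" path.
        Ok(model)
    }
}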

/// Where `Model`s are forged.
///
/// `Blueprint`s are used to specify how to build and initialize an instance of the model type `M`.
///
/// For the same model type `M`, nothing prevents a user from providing more than one `Blueprint`:
/// multiple initialization strategies can sometimes be used to build the same model type.
///
/// Each of these strategies can take different (hyper)parameters, even though they return an
/// instance of the same model type in the end.
///
/// The initialization procedure could be data-dependent, hence the signature of `initialize`.
Review comment (Collaborator): I'm a bit concerned about potential user confusion about what should be put in the Blueprint's initialize method vs the Optimizer's train method, given the similarities in the method signatures (they both take inputs and targets, they both return models).

What would be an example of a workflow with a data-dependent initialization? Are there any other options for handling that initialization?

pub trait Blueprint<M>
where
M: Model,
{
type Error: error::Error;

fn initialize(&self, inputs: &M::Input, targets: &M::Output) -> Result<M, Self::Error>;
}
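
One sketch of a data-dependent initialization, using the hypothetical `LineModel` from above (invented for illustration, not part of this PR): a blueprint that starts the intercept at the mean of the targets, a value that is only known once `initialize` sees the data.

// Hypothetical example, not part of this PR: a Blueprint whose
// initialization depends on the data it is given.
pub struct MeanInitializer {
    pub initial_slope: f64,
}

impl Blueprint<LineModel> for MeanInitializer {
    type Error = Infallible;

    fn initialize(&self, _inputs: &Vec<f64>, targets: &Vec<f64>) -> Result<LineModel, Self::Error> {
        // Start the intercept at the mean of the targets, so the first
        // prediction is already centered on the data.
        let mean = targets.iter().sum::<f64>() / targets.len() as f64;
        Ok(LineModel {
            slope: self.initial_slope,
            intercept: mean,
        })
    }
}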

/// Any `Model` can be used as a `Blueprint`, as long as it's clonable:
/// it returns a clone of itself when `initialize` is called, ignoring the data.
impl<M> Blueprint<M> for M
where
M: Model + Clone,
{
type Error = M::Error;

fn initialize(&self, _inputs: &M::Input, _targets: &M::Output) -> Result<M, Self::Error> {
Ok(self.clone())
}
}

/// Where you need to go meta (hyperparameters!).
///
/// `BlueprintGenerator`s can be used to explore different combinations of hyperparameters
/// when you are working with a certain `Model` type.
///
/// `BlueprintGenerator::generate` returns, if successful, an `IntoIterator` type
/// yielding instances of blueprints.
pub trait BlueprintGenerator<B, M>
where
B: Blueprint<M>,
M: Model,
{
type Error: error::Error;
type Output: IntoIterator<Item=B>;

fn generate(&self) -> Result<Self::Output, Self::Error>;
}
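
Continuing the hypothetical example (not part of this PR), a generator that yields one `MeanInitializer` per candidate slope, in the spirit of a hyperparameter grid:

// Hypothetical example, not part of this PR: a one-dimensional
// hyperparameter grid over the initial slope.
pub struct SlopeGrid {
    pub candidate_slopes: Vec<f64>,
}

impl BlueprintGenerator<MeanInitializer, LineModel> for SlopeGrid {
    type Error = Infallible;
    type Output = Vec<MeanInitializer>;

    fn generate(&self) -> Result<Self::Output, Self::Error> {
        // One blueprint per candidate slope; Vec implements IntoIterator.
        Ok(self
            .candidate_slopes
            .iter()
            .map(|&initial_slope| MeanInitializer { initial_slope })
            .collect())
    }
}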

/// Any `Blueprint` can be used as a `BlueprintGenerator`, as long as it's clonable:
/// it returns an iterator with a single element, a clone of itself.
impl<B, M> BlueprintGenerator<B, M> for B
where
B: Blueprint<M> + Clone,
M: Model,
{
type Error = B::Error;
type Output = iter::Once<B>;

fn generate(&self) -> Result<Self::Output, Self::Error> {
Ok(iter::once(self.clone()))
}
}
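
Putting the pieces together, a minimal end-to-end flow over the hypothetical types sketched above: generate blueprints, initialize a model from each, train it, predict.

fn main() -> Result<(), Infallible> {
    let inputs = vec![1.0, 2.0, 3.0, 4.0];
    let targets = vec![2.0, 4.0, 6.0, 8.0];

    let optimizer = GradientDescent { learning_rate: 0.01 };
    let grid = SlopeGrid { candidate_slopes: vec![0.0, 1.0, 2.0] };

    for blueprint in grid.generate()? {
        // Blueprint -> fresh model -> one round of training -> predictions.
        let model = blueprint.initialize(&inputs, &targets)?;
        let model = optimizer.train(&inputs, &targets, model)?;
        let predictions = model.predict(&inputs)?;
        println!("predictions: {:?}", predictions);
    }
    Ok(())
}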