Consider adding examples #47

Closed
munckymagik opened this issue May 11, 2019 · 5 comments

Comments

@munckymagik
Contributor

I suggest we consider adding an examples folder to demonstrate more real world usage.

The benefits I think this would bring are:

  • By seeing the library used in a more realistic situation we may learn things about the design that the tests/doc-tests didn't reveal.
  • We help users get started more quickly for typical use-cases.
  • We get to set up some good usage patterns for others to follow.

Can we brainstorm a list of the kinds of examples we would want?

Does anybody have any toy examples we could use to seed the folder?

@LukeMathWalker
Member

I finally have time to go back to this ❤️

We could start by porting some of the examples in the SciPy Cookbook.
Considering that we have focused on the stats section of SciPy, these ones could be relevant to us:

Do you know any other collections of examples we can poach from @munckymagik?

@munckymagik
Contributor Author

@LukeMathWalker sorry for the delay, some good ones here maybe: https://github.com/ddbourgin/numpy-ml.

Should we create a checklist organised by the major feature areas in our crate, and try to propose at least one compelling example for each? I realise there might be some crossover, so one example may cover several areas.

I'm going to need you to guide me on what kinds of examples would suit our crate and be good starting points for users with real problems to solve. Maybe a mix of common use cases plus something more niche? IDK, maybe regression, F-score, p-values, confidence intervals, etc. (???)

Then there's what to do about sample data. I found:

What do you think?

@munckymagik
Contributor Author

Also, have you used any of the rust plotting libraries?

@LukeMathWalker
Member

We thought of the same repo there - I have drafted a first linear regression example using ndarray-linalg and ndarray-stats, see rust-ndarray/ndarray-linalg#166.
I have filed the PR against ndarray-linalg because dealing with the BLAS backend in this crate might be a little complicated, but we can sort it out if we want to.
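
To give a flavour of what such an example computes: with a single feature, ordinary least squares has a closed form (slope = covariance of x and y divided by the variance of x). This is a simplified, std-only sketch, not the ndarray-linalg example itself. It uses plain slices instead of ndarray so no BLAS backend is needed, and `fit_line` is a hypothetical name. A multi-feature version would solve the full least-squares problem, which is where ndarray-linalg comes in.

```rust
/// Hypothetical helper: fits y ≈ slope * x + intercept by ordinary least squares.
/// Returns None for mismatched lengths, fewer than two points, or constant x.
fn fit_line(x: &[f64], y: &[f64]) -> Option<(f64, f64)> {
    if x.len() != y.len() || x.len() < 2 {
        return None;
    }
    let n = x.len() as f64;
    let mean_x = x.iter().sum::<f64>() / n;
    let mean_y = y.iter().sum::<f64>() / n;
    // Centred cross-products (covariance numerator) and squares (variance numerator).
    let mut sxy = 0.0;
    let mut sxx = 0.0;
    for (&xi, &yi) in x.iter().zip(y) {
        sxy += (xi - mean_x) * (yi - mean_y);
        sxx += (xi - mean_x) * (xi - mean_x);
    }
    if sxx == 0.0 {
        return None; // all x equal: the slope is undefined
    }
    let slope = sxy / sxx;
    let intercept = mean_y - slope * mean_x;
    Some((slope, intercept))
}

fn main() {
    let x = [0.0, 1.0, 2.0, 3.0, 4.0];
    let y = [1.0, 3.0, 5.0, 7.0, 9.0]; // lies exactly on y = 2x + 1
    let (slope, intercept) = fit_line(&x, &y).unwrap();
    println!("slope = {slope}, intercept = {intercept}");
}
```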

The examples you mentioned are a good starting point. In terms of ML, we could have a look at some stuff in the preprocessing space:

  • given a (n_samples, n_features) input matrix, keep only the columns whose variance is above a threshold (scikit-learn equivalent);
  • given a (n_samples, n_features) input matrix, recursively remove the columns whose Pearson correlation score is above a certain threshold;
  • multiclass logistic regression, using our cross_entropy method as loss function.
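
A rough sketch of the first two preprocessing steps, using a row-major `Vec<Vec<f64>>` in place of an ndarray `Array2` to keep it dependency-free. Assumptions to flag: variance is the population variance (divide by n), which, as far as we can tell, matches scikit-learn's VarianceThreshold; "recursively remove" is read here as a single greedy pass that keeps a column only if its absolute Pearson correlation with every column already kept is at or below the threshold. All function names are hypothetical.

```rust
/// Hypothetical helper: column `j` of a row-major (n_samples, n_features) matrix.
fn column(data: &[Vec<f64>], j: usize) -> Vec<f64> {
    data.iter().map(|row| row[j]).collect()
}

/// Population variance (divides by n), assumed to match VarianceThreshold.
fn variance(col: &[f64]) -> f64 {
    let n = col.len() as f64;
    let mean = col.iter().sum::<f64>() / n;
    col.iter().map(|v| (v - mean).powi(2)).sum::<f64>() / n
}

/// Pearson correlation; returns 0.0 when either column is constant.
fn pearson(a: &[f64], b: &[f64]) -> f64 {
    let n = a.len() as f64;
    let (ma, mb) = (a.iter().sum::<f64>() / n, b.iter().sum::<f64>() / n);
    let (mut cov, mut va, mut vb) = (0.0, 0.0, 0.0);
    for (&x, &y) in a.iter().zip(b) {
        cov += (x - ma) * (y - mb);
        va += (x - ma).powi(2);
        vb += (y - mb).powi(2);
    }
    if va == 0.0 || vb == 0.0 { 0.0 } else { cov / (va.sqrt() * vb.sqrt()) }
}

/// Indices of the columns whose variance is strictly above `threshold`.
fn variance_filter(data: &[Vec<f64>], threshold: f64) -> Vec<usize> {
    let n_features = data.first().map_or(0, |r| r.len());
    (0..n_features)
        .filter(|&j| variance(&column(data, j)) > threshold)
        .collect()
}

/// Greedy single pass (an assumed reading of "recursively remove"): keep column j
/// only if |pearson| with every already-kept column is <= `threshold`.
fn correlation_filter(data: &[Vec<f64>], candidates: &[usize], threshold: f64) -> Vec<usize> {
    let mut kept: Vec<usize> = Vec::new();
    for &j in candidates {
        let cj = column(data, j);
        let redundant = kept
            .iter()
            .any(|&k| pearson(&cj, &column(data, k)).abs() > threshold);
        if !redundant {
            kept.push(j);
        }
    }
    kept
}

fn main() {
    // Toy data, 4 samples x 4 features: column 1 is constant, column 2 = 2 * column 0.
    let data = vec![
        vec![1.0, 5.0, 2.0, 3.0],
        vec![2.0, 5.0, 4.0, 1.0],
        vec![3.0, 5.0, 6.0, 4.0],
        vec![4.0, 5.0, 8.0, 2.0],
    ];
    let after_variance = variance_filter(&data, 0.0);
    let selected = correlation_filter(&data, &after_variance, 0.9);
    println!("after variance: {after_variance:?}, selected: {selected:?}");
}
```

On this toy matrix the variance filter drops the constant column 1, and the correlation filter then drops column 2 because it is exactly twice column 0.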

Given the nature of our crate, I think that to make it shine we need examples that do require vectorised operations - ML is a fantastic domain for these purposes.
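
For the multiclass logistic regression idea in the list above, the core vectorisable pieces are a softmax over per-class scores, the cross-entropy loss, and the gradient with respect to the scores, which for softmax plus cross-entropy is simply `probs - one_hot`. This sketch performs one plain gradient-descent step per call on slices. The local `cross_entropy` helper computes -Σ p ln q and is only a stand-in for the crate's own `cross_entropy` method, whose exact signature isn't reproduced here; `softmax`, `sgd_step` and the toy data are hypothetical.

```rust
/// Numerically stable softmax: subtract the max score before exponentiating.
fn softmax(scores: &[f64]) -> Vec<f64> {
    let max = scores.iter().cloned().fold(f64::NEG_INFINITY, f64::max);
    let exps: Vec<f64> = scores.iter().map(|s| (s - max).exp()).collect();
    let total: f64 = exps.iter().sum();
    exps.iter().map(|e| e / total).collect()
}

/// Stand-in for the crate's method: H(p, q) = -sum_i p_i * ln(q_i),
/// skipping terms where p_i == 0.
fn cross_entropy(p: &[f64], q: &[f64]) -> f64 {
    p.iter()
        .zip(q)
        .map(|(&pi, &qi)| if pi > 0.0 { -pi * qi.ln() } else { 0.0 })
        .sum()
}

/// Hypothetical helper: one gradient-descent step on a single sample.
/// `weights[c]` holds class c's coefficients; `label` is the true class index.
/// Returns the loss computed before the update.
fn sgd_step(weights: &mut [Vec<f64>], bias: &mut [f64], x: &[f64], label: usize, lr: f64) -> f64 {
    // Linear score per class: w_c . x + b_c
    let scores: Vec<f64> = weights
        .iter()
        .zip(bias.iter())
        .map(|(w, b)| w.iter().zip(x).map(|(wi, xi)| wi * xi).sum::<f64>() + b)
        .collect();
    let probs = softmax(&scores);
    let mut target = vec![0.0; probs.len()];
    target[label] = 1.0; // one-hot encoding of the true class
    let loss = cross_entropy(&target, &probs);
    // For softmax + cross-entropy, dLoss/dscore_c = probs_c - target_c.
    for c in 0..weights.len() {
        let grad = probs[c] - target[c];
        for (wi, xi) in weights[c].iter_mut().zip(x) {
            *wi -= lr * grad * xi;
        }
        bias[c] -= lr * grad;
    }
    loss
}

fn main() {
    // Toy problem: 2 features, 3 classes, a single training sample of class 2.
    let mut weights = vec![vec![0.0; 2]; 3];
    let mut bias = vec![0.0; 3];
    let x = [1.0, -0.5];
    let first = sgd_step(&mut weights, &mut bias, &x, 2, 0.5);
    let second = sgd_step(&mut weights, &mut bias, &x, 2, 0.5);
    println!("loss before: {first:.4}, after one step: {second:.4}");
}
```

With all-zero weights the first loss is ln 3 (uniform probabilities over three classes), and the update raises the true class's score, so the second loss is lower.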

For datasets, I think we can either generate them (as in my linear regression example) or we could use openml-rust, it seems sufficiently plug-and-play.
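
For the "generate them" route, a dataset with known coefficients makes it easy to check that a fitted model recovers them. This std-only sketch uses a tiny linear congruential generator so it has no dependencies; in a real example the `rand` crate (or ndarray-rand) would be the more natural choice. `Lcg` and `make_linear_data` are hypothetical names, and the uniform-noise model is an assumption.

```rust
/// Hypothetical minimal LCG (Knuth's MMIX constants), used only to avoid
/// dependencies in this sketch. Not suitable for serious statistical work.
struct Lcg(u64);

impl Lcg {
    /// Returns a float in [0, 1) built from the top 53 bits of the state.
    fn next_f64(&mut self) -> f64 {
        self.0 = self
            .0
            .wrapping_mul(6364136223846793005)
            .wrapping_add(1442695040888963407);
        (self.0 >> 11) as f64 / (1u64 << 53) as f64
    }
}

/// Hypothetical helper: `n` points with x uniform in [0, 10) and
/// y = slope * x + intercept + noise, noise uniform in [-noise, noise].
fn make_linear_data(n: usize, slope: f64, intercept: f64, noise: f64, seed: u64) -> (Vec<f64>, Vec<f64>) {
    let mut rng = Lcg(seed);
    let mut xs = Vec::with_capacity(n);
    let mut ys = Vec::with_capacity(n);
    for _ in 0..n {
        let x = 10.0 * rng.next_f64();
        let eps = noise * (2.0 * rng.next_f64() - 1.0);
        xs.push(x);
        ys.push(slope * x + intercept + eps);
    }
    (xs, ys)
}

fn main() {
    // Same seed gives the same dataset, which keeps an example reproducible.
    let (xs, ys) = make_linear_data(5, 2.0, 1.0, 0.5, 42);
    for (x, y) in xs.iter().zip(&ys) {
        println!("x = {x:.3}, y = {y:.3}");
    }
}
```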

I opened a thread on Reddit about plotting libraries a while ago, but none of those I have seen so far seemed mature enough. I don't know if the landscape has changed significantly since.

@munckymagik
Contributor Author

Closing this crate-specific issue now. Work to build examples for the ndarray ecosystem is happening in https://github.com/rust-ndarray/ndarray-examples.
