Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support for Flair Embeddings: hero.embed(s, flair_embedding) function #146

Draft
wants to merge 17 commits into
base: master
Choose a base branch
from

Conversation

henrifroese
Copy link
Collaborator

@henrifroese henrifroese commented Aug 9, 2020

  • new file embeddings.py
  • function hero.embed(s: TokenSeries, flair_embedding: flair.embeddings.DocumentEmbeddings) -> VectorSeries
  • tests for everything in new test_embeddings.py
  • new dependency flair>=0.5.1.

The new function adds full support for Flair Document Embeddings. See embeddings.py docstrings for further explainations.

EDIT:

As you can see below, we had some (many 🙃 ) travis problems. Root cause was the added dependency flair that introduces a lot of overhead through it's dependencies (mainly torch). Problems and solutions (in .travis.yml) were:

  • macOS permission denied errors when installing torch and related dependencies ➡️ Solution: install packages with --user flag, so pip3 install --user ".[dev]" . for macOS
  • windows torch installation has to be done a little more manually, so solution is to do pip3 install --no-deps torch===1.6.0 -f https://download.pytorch.org/whl/torch_stable.html before pip3 install ".[dev]" ..

Overall, we had to declare different install steps for the different OSes (python as before, windows and macOS as described above) as you can see in the .travis.yml file.

henrifroese and others added 2 commits August 8, 2020 16:04
- add docstring
- improve functionality
- add dependency

Co-authored-by: Maximilian Krahn <[email protected]>
@henrifroese
Copy link
Collaborator Author

henrifroese commented Aug 9, 2020

Build fails on Windows and errors on macOS, will look into it

EDIT linux+macOS passing now; still working on Windows build

EDIT 2 all passing now; see above for explaination

@jbesomi
Copy link
Owner

jbesomi commented Aug 9, 2020

Nice and very beautiful! 🎉

Why is flair not automatically installing torch? Shouldn't we specify install_require=...torch... in setup.cfg too instead of in the travis yaml file?

What about the discussion/questions there: #140. Q: embed is also a representation function, shouldn't we return a RepresentationSeries? What's your view on that?

A distinction we might want to do is that in general the embed/dim red size is quite limited (100-300 dim size) and as such VectorSeries still make sense, whereas for tfidf/count/term_freq the size can be as large as the number of tokens and therefore the use of RepresentationSeries...

texthero/embeddings.py Outdated Show resolved Hide resolved
texthero/embeddings.py Outdated Show resolved Hide resolved
@henrifroese
Copy link
Collaborator Author

henrifroese commented Aug 9, 2020

Why is flair not automatically installing torch? Shouldn't we specify install_require=...torch... in setup.cfg too instead of in the travis yaml file?

Flair tries to automatically install torch, but it fails on macOS and Windows. We could probably open a separate Issue/PR after this is merged to clean up the pipeline etc. , similar to #137 . You can e.g. see here when selecting the Windows configuration that it won't just install with pip install torch

What about the discussion/questions there: #140. Q: embed is also a representation function, shouldn't we return a RepresentationSeries? What's your view on that?

EDIT: see my answer on #140

@jbesomi
Copy link
Owner

jbesomi commented Aug 9, 2020

If I well understood what you are saying, on a clean macOS/windows machines,pip install texthero won't work as torch will not installed correctly? What's your idea of how we can solve this?

What's a bit unclear is that I haven't found anything on the flair documentation that explains how torch should be installed. It appears likepip install flair should do the job!

@henrifroese
Copy link
Collaborator Author

Hmm I'm actually not sure (and also not sure how to try that out), I'll try again with some .travis.yml configurations in this PR

@henrifroese
Copy link
Collaborator Author

We have now tried out lots of stuff in #149 .

  • Linux works ✔️
  • Windows: installing torch just doesn't work with pip install torch; see here that it does not work with Windows; pip install flair also won't work on Windows as long as torch is not already installed
  • macOS: installing pip install texthero or flair etc. does usually work, the torch dependency can however produce errors when --user is not set as flag

We see the following options:

  1. Remove flair as default dependency (would save the huge torch dependency and only affects the embed function) -> when users want to use hero.embed, check if flair is installed and tell them to install it if it isn't

  2. Use the .travis.yml file that is currently used in this PR and tell macOS users in the README they might want to use --user if they have problems (most won't have any issues) and tell Windows users how they can buy a Mac install torch first

What do you think? 🙃

@jbesomi
Copy link
Owner

jbesomi commented Aug 10, 2020

  1. Seems ok for now. Anyway, the users will have to understand how flair works, so it's "okay" not to have it as dependencies (at least for now). We will have to make this clear in the documentation (docstring/getting started/tutorial).

We still need to "fake" on Travis CI and install flair as you were doing (and torch eventually) otherwise, unit-tests will fail

can buy a Mac install torch first

😂😂😂😂

- additionally set testing output to quiet as flair tests print a lot of unnecessary stuff
@henrifroese
Copy link
Collaborator Author

Great, we've now switched flair to a dev dependency and made the necessary changes in the code :octocat:

.travis.yml Outdated Show resolved Hide resolved
.travis.yml Show resolved Hide resolved
scripts/tests.sh Outdated Show resolved Hide resolved
texthero/embeddings.py Outdated Show resolved Hide resolved
texthero/embeddings.py Outdated Show resolved Hide resolved
texthero/embeddings.py Outdated Show resolved Hide resolved
texthero/embeddings.py Outdated Show resolved Hide resolved
texthero/embeddings.py Outdated Show resolved Hide resolved
texthero/embeddings.py Outdated Show resolved Hide resolved
texthero/embeddings.py Outdated Show resolved Hide resolved
(Try to) fix macOS pipeline error.
@henrifroese
Copy link
Collaborator Author

Concerning Travis: We've now moved back to one install block at the bottom. However, we need

  • for macOS: install with --user (as discussed before) -> custom install; we need to install black separately as it does not work with the --user flag that we use in pip3 install --user ".[dev]" . for mac.
  • for windows: torch installation moved to before-install.

We believe that the .travis.yml is now as nice as it can be while supporting flair as dev dependency.

@henrifroese henrifroese marked this pull request as ready for review August 22, 2020 17:09
@henrifroese
Copy link
Collaborator Author

TODO:

  • clearer description that (and how) users have to install flair themselves
  • longer explanation in the docstring

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants