Text Representation

🔖 Outline

To be added

🗒️ Notebooks

Set of notebooks associated with the chapter.

  1. One-Hot Encoding: Here we demonstrate one-hot encoding from first principles, as well as scikit-learn's implementation, on our toy corpus.

  2. Bag of Words: Here we demonstrate how to arrive at the bag-of-words representation for our toy corpus.

  3. Bag of N-Grams: Here we demonstrate how the bag-of-n-grams representation works using our toy corpus.

  4. TF-IDF: Here we demonstrate how to obtain the TF-IDF representation of a document using sklearn's TfidfVectorizer (we will be using our toy corpus).

  5. Pre-trained Word Embeddings: Here we demonstrate how we can represent text using pre-trained word embedding models and how to use them to get representations for the full text.

  6. Custom Word Embeddings: Here we demonstrate how to train a custom word embedding model (word2vec) using gensim on both our toy corpus and a subset of Wikipedia data.

  7. Vector Representations via Averaging: Here we demonstrate how to obtain document vectors by averaging word vectors using spaCy.

  8. Doc2Vec Model: Here we demonstrate how to train your own doc2vec model.

  9. Visualizing Embeddings Using TSNE: Here we demonstrate how we can use dimensionality reduction techniques such as TSNE to visualize embeddings.

  10. Visualizing Embeddings using Tensorboard: Here we demonstrate how we can visualize embeddings using TensorBoard.
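
For a quick feel of what the first four notebooks cover, here is a minimal standard-library sketch of the count-based representations. It is not the notebooks' code: the toy corpus and the helper names (`one_hot`, `bag_of_words`, `ngrams`, `tf_idf`) are assumptions made for illustration. The TF-IDF shown is the plain `tf * log(N / df)` variant, so its numbers will differ from sklearn's TfidfVectorizer, which by default smooths the IDF and L2-normalizes each row.

```python
from collections import Counter
import math

# Hypothetical toy corpus (an assumption; the notebooks define their own).
corpus = ["dog bites man", "man bites dog", "dog eats meat", "man eats food"]
docs = [doc.split() for doc in corpus]

# Vocabulary: every distinct token, sorted so column order is stable.
vocab = sorted({tok for doc in docs for tok in doc})
index = {tok: i for i, tok in enumerate(vocab)}


def one_hot(token):
    """Vector of zeros with a single 1 at the token's vocabulary index."""
    vec = [0] * len(vocab)
    vec[index[token]] = 1
    return vec


def bag_of_words(tokens):
    """Per-document token counts, one column per vocabulary entry."""
    counts = Counter(tokens)
    return [counts[tok] for tok in vocab]


def ngrams(tokens, n):
    """Contiguous n-token chunks; counting these gives a bag of n-grams."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def tf_idf(tokens):
    """Plain TF-IDF: relative term frequency times log(N / document frequency)."""
    counts = Counter(tokens)
    weights = []
    for tok in vocab:
        tf = counts[tok] / len(tokens)
        df = sum(1 for doc in docs if tok in doc)  # never 0: vocab comes from docs
        weights.append(tf * math.log(len(docs) / df))
    return weights


print(vocab)
print(one_hot("dog"))
print(bag_of_words(docs[0]))
print(ngrams(docs[0], 2))
print(tf_idf(docs[0]))
```

Note that "dog bites man" and "man bites dog" get identical bag-of-words vectors, while their bigrams differ; that loss of word order is what the n-gram variant partly recovers.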

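The averaging idea behind notebook 7 can also be sketched without any library. The embeddings below are made-up 3-dimensional vectors, and `average_vector` is a hypothetical helper, not spaCy's API; spaCy's `Doc.vector` is documented as defaulting to an average of the token vectors, which is the behaviour this sketch imitates. Skipping words that have no embedding is one possible design choice among several.

```python
# Hypothetical 3-dimensional embeddings, invented for illustration only.
embeddings = {
    "dog": [1.0, 0.0, 2.0],
    "bites": [0.0, 1.0, 1.0],
    "man": [2.0, 2.0, 0.0],
}


def average_vector(tokens, embeddings):
    """Element-wise mean of the embeddings of all known tokens."""
    vectors = [embeddings[tok] for tok in tokens if tok in embeddings]
    if not vectors:
        return None  # no known tokens, so no meaningful document vector
    dim = len(vectors[0])
    return [sum(vec[i] for vec in vectors) / len(vectors) for i in range(dim)]


doc_vector = average_vector("dog bites man".split(), embeddings)
print(doc_vector)
```
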
🖼️ Figures

Color figures as requested by the readers.
