This repository has been archived by the owner on Mar 19, 2024. It is now read-only.

supervised tutorial and autotune documentation with python tabs
Summary: Docusaurus now allows multiple language tabs. This commit adds a python tab for supervised and autotune examples. It also includes a snippet that activates the same language tab for the whole page.

Reviewed By: EdouardGrave

Differential Revision: D17091834

fbshipit-source-id: 6e6f76aa9408baa08fcd6c0bfd011de2cb477dfb
Celebio authored and facebook-github-bot committed Aug 28, 2019
1 parent 4aca28c commit 38350a5
Showing 4 changed files with 365 additions and 10 deletions.
68 changes: 65 additions & 3 deletions docs/autotune.md
@@ -13,28 +13,55 @@ In order to activate hyperparameter optimization, we must provide a validation file

For example, using the same data as our [tutorial example](/docs/en/supervised-tutorial.html#our-first-classifier), the autotune can be used in the following way:

<!--DOCUSAURUS_CODE_TABS-->
<!--Command line-->
```sh
>> ./fasttext supervised -input cooking.train -output model_cooking -autotune-validation cooking.valid
```
<!--Python-->
```py
>>> import fasttext
>>> model = fasttext.train_supervised(input='cooking.train', autotuneValidationFile='cooking.valid')
```
<!--END_DOCUSAURUS_CODE_TABS-->


Then, fastText will search for the hyperparameters that give the best f1-score on the `cooking.valid` file:
```sh
Progress: 100.0% Trials: 27 Best score: 0.406763 ETA: 0h 0m 0s
```
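Each trial in the progress line above is one full training run with a candidate set of hyperparameters, scored against the validation file. The following is only a toy sketch of such a search loop, with a hypothetical stand-in objective instead of a real fastText training run; fastText's actual search strategy is more sophisticated than plain random sampling:

```python
import random

def autotune_sketch(objective, search_space, trials=27, seed=0):
    """Random search: sample candidate hyperparameters, keep the best score."""
    rng = random.Random(seed)
    best_score, best_params = float("-inf"), None
    for _ in range(trials):
        # Draw one candidate value for each hyperparameter.
        params = {name: rng.choice(values) for name, values in search_space.items()}
        score = objective(params)  # in reality: f1-score on the validation file
        if score > best_score:
            best_score, best_params = score, params
    return best_score, best_params

space = {"lr": [0.05, 0.1, 0.5], "epoch": [5, 25, 50], "wordNgrams": [1, 2, 3]}
# Stand-in objective; a real run would train and evaluate a model here:
score, params = autotune_sketch(lambda p: p["lr"] * p["epoch"] / 10, space)
```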

Now we can test the obtained model with:
<!--DOCUSAURUS_CODE_TABS-->
<!--Command line-->
```sh
>> ./fasttext test model_cooking.bin cooking.valid
N 3000
P@1 0.666
R@1 0.288
```
<!--Python-->
```py
>>> model.test("cooking.valid")
(3000L, 0.666, 0.288)
```
<!--END_DOCUSAURUS_CODE_TABS-->
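`P@1` and `R@1` are precision and recall at one. As a rough, self-contained illustration of what those numbers mean (a hypothetical helper, not fastText's implementation): precision@k asks what fraction of the predicted labels are correct, recall@k what fraction of the true labels were retrieved:

```python
def precision_recall_at_k(true_labels, predicted, k=1):
    """Micro-averaged precision@k and recall@k over a validation set.

    true_labels: one set of gold labels per example.
    predicted:   one ranked list of predicted labels per example.
    """
    hits = retrieved = relevant = 0
    for gold, ranked in zip(true_labels, predicted):
        topk = ranked[:k]
        hits += sum(1 for label in topk if label in gold)
        retrieved += len(topk)
        relevant += len(gold)
    return hits / retrieved, hits / relevant

# Tiny made-up validation set with cooking-style labels:
gold = [{"__label__baking", "__label__bread"}, {"__label__equipment"}]
pred = [["__label__baking", "__label__food-safety"], ["__label__bread"]]
p1, r1 = precision_recall_at_k(gold, pred, k=1)
print(p1, r1)  # → 0.5 0.3333333333333333
```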


By default, the search will take 5 minutes. You can set the timeout in seconds with the `-autotune-duration` argument. For example, if you want to set the limit to 10 minutes:

<!--DOCUSAURUS_CODE_TABS-->
<!--Command line-->
```sh
>> ./fasttext supervised -input cooking.train -output model_cooking -autotune-validation cooking.valid -autotune-duration 600
```
<!--Python-->
```py
>>> import fasttext
>>> model = fasttext.train_supervised(input='cooking.train', autotuneValidationFile='cooking.valid', autotuneDuration=600)
```
<!--END_DOCUSAURUS_CODE_TABS-->


While autotuning, fastText displays the best f1-score found so far. If we decide to stop the tuning before the time limit, we can send a `SIGINT` signal (via `CTRL-C`, for example). FastText will then finish the current training, and retrain with the best parameters found so far.

@@ -46,23 +73,42 @@ As you may know, fastText can compress the model with [quantization](/docs/en/ch

Fortunately, autotune can also find the hyperparameters for this compression task while targeting the desired model size. To this end, we can set the `-autotune-modelsize` argument:

<!--DOCUSAURUS_CODE_TABS-->
<!--Command line-->
```sh
>> ./fasttext supervised -input cooking.train -output model_cooking -autotune-validation cooking.valid -autotune-modelsize 2M
```

This will produce a `.ftz` file with the best accuracy having the desired size:
```sh
>> ls -la model_cooking.ftz
-rw-r--r--. 1 celebio users 1990862 Aug 25 05:39 model_cooking.ftz
>> ./fasttext test model_cooking.ftz cooking.valid
N 3000
P@1 0.57
R@1 0.246
```
<!--Python-->
```py
>>> import fasttext
>>> model = fasttext.train_supervised(input='cooking.train', autotuneValidationFile='cooking.valid', autotuneModelSize="2M")
```
If you save the model, you will obtain a model file with the desired size:
```py
>>> model.save_model("model_cooking.ftz")
>>> import os
>>> os.stat("model_cooking.ftz").st_size
1990862
>>> model.test("cooking.valid")
(3000L, 0.57, 0.246)
```
<!--END_DOCUSAURUS_CODE_TABS-->
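The saved `.ftz` file above (1990862 bytes) indeed fits the requested `2M` budget. A minimal sketch of that size check, assuming the suffix is read decimally (`K` = 10^3, `M` = 10^6; the exact convention fastText applies is an assumption here):

```python
def parse_size(text):
    """Parse a size string such as '2M' or '500K' into a byte count."""
    suffixes = {"K": 10**3, "M": 10**6, "G": 10**9}
    suffix = text[-1].upper()
    if suffix in suffixes:
        return int(float(text[:-1]) * suffixes[suffix])
    return int(text)

budget = parse_size("2M")
actual = 1990862  # st_size of model_cooking.ftz reported above
print(actual <= budget)  # → True
```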


# How to set the optimization metric?

<!--DOCUSAURUS_CODE_TABS-->
<!--Command line-->
<br />
By default, autotune tests the validation file you provide, exactly the same way as `./fasttext test model_cooking.bin cooking.valid`, and optimizes for the highest [f1-score](https://en.wikipedia.org/wiki/F1_score).

But, if we want to optimize the score of a specific label, say `__label__baking`, we can set the `-autotune-metric` argument:
@@ -74,3 +120,19 @@ But, if we want to optimize the score of a specific label, say `__label__baking`
This is equivalent to manually optimizing the f1-score we get when testing with `./fasttext test-label model_cooking.bin cooking.valid | grep __label__baking` in the command line.

Sometimes, you may be interested in predicting more than one label. For example, if you were optimizing the hyperparameters manually to get the best score when predicting two labels, you would test with `./fasttext test model_cooking.bin cooking.valid 2`. You can also tell autotune to optimize the parameters by testing two labels with the `-autotune-predictions` argument.
<!--Python-->
<br />
By default, autotune tests the validation file you provide, exactly the same way as `model.test("cooking.valid")`, and optimizes for the highest [f1-score](https://en.wikipedia.org/wiki/F1_score).

But, if we want to optimize the score of a specific label, say `__label__baking`, we can set the `autotuneMetric` argument:

```py
>>> import fasttext
>>> model = fasttext.train_supervised(input='cooking.train', autotuneValidationFile='cooking.valid', autotuneMetric="f1:__label__baking")
```

This is equivalent to manually optimizing the f1-score we get when testing with `model.test_label('cooking.valid')['__label__baking']`.

Sometimes, you may be interested in predicting more than one label. For example, if you were optimizing the hyperparameters manually to get the best score when predicting two labels, you would test with `model.test("cooking.valid", k=2)`. You can also tell autotune to optimize the parameters by testing two labels with the `autotunePredictions` argument.
<!--END_DOCUSAURUS_CODE_TABS-->
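Both tabs above optimize the same quantity: the f1-score of one label, derived from that label's precision and recall. A small sketch of that derivation, using made-up numbers shaped like a per-label report (the exact structure `test_label` returns is not reproduced here):

```python
def label_f1(per_label, label):
    """f1-score for one label from a {label: {'precision': p, 'recall': r}} mapping."""
    p = per_label[label]["precision"]
    r = per_label[label]["recall"]
    return 0.0 if p + r == 0 else 2 * p * r / (p + r)

# Hypothetical per-label scores for the cooking validation set:
scores = {"__label__baking": {"precision": 0.6, "recall": 0.3}}
print(round(label_f1(scores, "__label__baking"), 3))  # → 0.4
```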

