
Commit

adjust docs
t83714 committed Jul 25, 2024
1 parent 4e813ca commit ea2df19
Showing 2 changed files with 21 additions and 16 deletions.
README.md — 2 changes: 1 addition & 1 deletion
@@ -46,7 +46,7 @@ Kubernetes: `>= 1.21.0`
| Key | Type | Default | Description |
|-----|------|---------|-------------|
| affinity | object | `{}` | |
-| appConfig | object | `{}` | Application configuration of the service. You can supply a list of key-value pairs to be used as the application configuration. Currently, the only supported config field is `modelList`. Via the `modelList` field, you can specify a list of LLM models that the service supports. Although you can specify multiple models, only one model will be used at this moment. Each model item has the following fields: - `name` (string): The Hugging Face registered model name. Only ONNX models are supported at this moment. This field is required. - `default` (bool): Optional; whether this model is the default model. If not specified, the first model in the list will be the default model. Only the default model will be loaded. - `quantized` (bool): Optional; whether the quantized version of the model will be used. If not specified, the quantized version of the model will be loaded. - `config` (object): Optional; the configuration object that will be passed to the model. - `cache_dir` (string): Optional; the cache directory for downloaded models. If not specified, the default cache directory will be used. - `local_files_only` (bool): Optional; whether to only load the model from local files. If not specified, the model will be downloaded from the Hugging Face model hub. - `revision` (string): Optional; defaults to 'main'. The specific model version to use. It can be a branch name, a tag name, or a commit id. Since a git-based system is used for storing models and other artifacts on huggingface.co, `revision` can be any identifier allowed by git. NOTE: This setting is ignored for local requests. - `model_file_name` (string): Optional. - `extraction_config` (object): Optional; the configuration object that will be passed to the model extraction function for embedding generation. - `pooling`: ('none'|'mean'|'cls') Defaults to 'none'. The pooling method to use. - `normalize`: (bool) Defaults to `true`. Whether or not to normalize the embeddings in the last dimension. - `quantize`: (bool) Defaults to `false`. Whether or not to quantize the embeddings. - `precision`: ("binary" | "ubinary") Defaults to "binary". The precision to use for quantization. Only used when `quantize` is `true`. Please note: the released Docker image only contains the "Alibaba-NLP/gte-base-en-v1.5" model. If you specify other models, the server will download the model from the Hugging Face model hub at startup. You might want to adjust the `startupProbe` settings to accommodate the model download time. Depending on the model size, you might also want to adjust the `resources.limits.memory` and `resources.requests.memory` values. |
+| appConfig | object | `{}` | Application configuration of the service. You can supply a list of key-value pairs to be used as the application configuration. Currently, the only supported config field is `modelList`. Via the `modelList` field, you can specify a list of LLM models that the service supports. Although you can specify multiple models, only one model will be used at this moment. Each model item has the following fields: <ul> <li> `name` (string): The Hugging Face registered model name. Only ONNX models are supported at this moment. This field is required. </li> <li> `default` (bool): Optional; whether this model is the default model. If not specified, the first model in the list will be the default model. Only the default model will be loaded. </li> <li> `quantized` (bool): Optional; whether the quantized version of the model will be used. If not specified, the quantized version of the model will be loaded. </li> <li> `config` (object): Optional; the configuration object that will be passed to the model. </li> <li> `cache_dir` (string): Optional; the cache directory for downloaded models. If not specified, the default cache directory will be used. </li> <li> `local_files_only` (bool): Optional; whether to only load the model from local files. If not specified, the model will be downloaded from the Hugging Face model hub. </li> <li> `revision` (string): Optional; defaults to 'main'. The specific model version to use. It can be a branch name, a tag name, or a commit id. Since a git-based system is used for storing models and other artifacts on huggingface.co, `revision` can be any identifier allowed by git. NOTE: This setting is ignored for local requests. </li> <li> `model_file_name` (string): Optional. </li> <li> `extraction_config` (object): Optional; the configuration object that will be passed to the model extraction function for embedding generation. <br/> <ul> <li> `pooling`: ('none' or 'mean' or 'cls') Defaults to 'none'. The pooling method to use. </li> <li> `normalize`: (bool) Defaults to `true`. Whether or not to normalize the embeddings in the last dimension. </li> <li> `quantize`: (bool) Defaults to `false`. Whether or not to quantize the embeddings. </li> <li> `precision`: ("binary" or "ubinary") Defaults to "binary". The precision to use for quantization. Only used when `quantize` is `true`. </li> </ul> </li> </ul> Please note: the released Docker image only contains the "Alibaba-NLP/gte-base-en-v1.5" model. If you specify other models, the server will download the model from the Hugging Face model hub at startup. You might want to adjust the `startupProbe` settings to accommodate the model download time. Depending on the model size, you might also want to adjust the `resources.limits.memory` and `resources.requests.memory` values. See the example `appConfig` after this table. |
| autoscaling.hpa.enabled | bool | `false` | |
| autoscaling.hpa.maxReplicas | int | `3` | |
| autoscaling.hpa.minReplicas | int | `1` | |
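For reference, a minimal sketch of this `appConfig` in YAML, using the model bundled with the released Docker image. The `extraction_config` values here are illustrative examples, not the chart's defaults:

```yaml
appConfig:
  modelList:
    - name: "Alibaba-NLP/gte-base-en-v1.5"  # the model bundled with the released Docker image
      default: true       # only the default model is loaded
      quantized: true     # use the quantized ONNX weights (also the behaviour when unset)
      # revision: "main"  # optionally pin a branch, tag, or commit id
      extraction_config:
        pooling: "cls"    # illustrative; use the pooling method your model expects
        normalize: true   # normalize embeddings in the last dimension
```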
deploy/magda-embedding-api/values.yaml — 35 changes: 20 additions & 15 deletions
@@ -31,21 +31,26 @@ closeGraceDelay: 25000
# Currently, the only supported config field is `modelList`.
# Via the `modelList` field, you can specify a list of LLM models that the service supports.
# Although you can specify multiple models, only one model will be used at this moment.
-# Each model item has the following fields:
-# - `name` (string): The Hugging Face registered model name. Only ONNX models are supported at this moment. This field is required.
-# - `default` (bool): Optional; whether this model is the default model. If not specified, the first model in the list will be the default model. Only the default model will be loaded.
-# - `quantized` (bool): Optional; whether the quantized version of the model will be used. If not specified, the quantized version of the model will be loaded.
-# - `config` (object): Optional; the configuration object that will be passed to the model.
-# - `cache_dir` (string): Optional; the cache directory for downloaded models. If not specified, the default cache directory will be used.
-# - `local_files_only` (bool): Optional; whether to only load the model from local files. If not specified, the model will be downloaded from the Hugging Face model hub.
-# - `revision` (string): Optional; defaults to 'main'. The specific model version to use. It can be a branch name, a tag name, or a commit id.
-#   Since a git-based system is used for storing models and other artifacts on huggingface.co, `revision` can be any identifier allowed by git. NOTE: This setting is ignored for local requests.
-# - `model_file_name` (string): Optional.
-# - `extraction_config` (object): Optional; the configuration object that will be passed to the model extraction function for embedding generation.
-#   - `pooling`: ('none'|'mean'|'cls') Defaults to 'none'. The pooling method to use.
-#   - `normalize`: (bool) Defaults to `true`. Whether or not to normalize the embeddings in the last dimension.
-#   - `quantize`: (bool) Defaults to `false`. Whether or not to quantize the embeddings.
-#   - `precision`: ("binary" | "ubinary") Defaults to "binary". The precision to use for quantization. Only used when `quantize` is `true`.
+# Each model item has the following fields:
+# <ul>
+# <li> `name` (string): The Hugging Face registered model name. Only ONNX models are supported at this moment. This field is required. </li>
+# <li> `default` (bool): Optional; whether this model is the default model. If not specified, the first model in the list will be the default model. Only the default model will be loaded. </li>
+# <li> `quantized` (bool): Optional; whether the quantized version of the model will be used. If not specified, the quantized version of the model will be loaded. </li>
+# <li> `config` (object): Optional; the configuration object that will be passed to the model. </li>
+# <li> `cache_dir` (string): Optional; the cache directory for downloaded models. If not specified, the default cache directory will be used. </li>
+# <li> `local_files_only` (bool): Optional; whether to only load the model from local files. If not specified, the model will be downloaded from the Hugging Face model hub. </li>
+# <li> `revision` (string): Optional; defaults to 'main'. The specific model version to use. It can be a branch name, a tag name, or a commit id.
+# Since a git-based system is used for storing models and other artifacts on huggingface.co, `revision` can be any identifier allowed by git. NOTE: This setting is ignored for local requests. </li>
+# <li> `model_file_name` (string): Optional. </li>
+# <li> `extraction_config` (object): Optional; the configuration object that will be passed to the model extraction function for embedding generation. <br/>
+# <ul>
+# <li> `pooling`: ('none' or 'mean' or 'cls') Defaults to 'none'. The pooling method to use. </li>
+# <li> `normalize`: (bool) Defaults to `true`. Whether or not to normalize the embeddings in the last dimension. </li>
+# <li> `quantize`: (bool) Defaults to `false`. Whether or not to quantize the embeddings. </li>
+# <li> `precision`: ("binary" or "ubinary") Defaults to "binary". The precision to use for quantization. Only used when `quantize` is `true`. </li>
+# </ul>
+# </li>
+# </ul>
# Please note: The released Docker image only contains the "Alibaba-NLP/gte-base-en-v1.5" model.
# If you specify other models, the server will download the model from the Hugging Face model hub at startup.
# You might want to adjust the `startupProbe` settings to accommodate the model download time.
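To illustrate that note, here is a hedged sketch of the values you might override when pointing `modelList` at a model that is not baked into the image. The model name, probe numbers, and memory figures below are assumptions for illustration, not tested recommendations:

```yaml
appConfig:
  modelList:
    # hypothetical alternative ONNX model, downloaded from the Hugging Face model hub at startup
    - name: "Xenova/all-MiniLM-L6-v2"
      default: true
# Give the pod longer to start while the model downloads
# (assumes the chart passes these standard Kubernetes probe fields through to the container).
startupProbe:
  periodSeconds: 10
  failureThreshold: 60  # up to ~10 minutes before the pod is marked failed
resources:
  requests:
    memory: "1Gi"
  limits:
    memory: "2Gi"       # scale with the size of the chosen model
```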
