Embedding Function Templates for unstructured text data
Prem Piyush Goyal edited this page Sep 18, 2024
IBM watsonx.governance users need to pass these custom embedding functions as an input while generating embeddings for a subscription via the notebook. This page provides some templates of embedding functions that can be used for reference.
- The input to the embedding function has to be a list of strings.
- The output of the embedding function has to be a list of embedding vectors (lists of floats).
- The size of the output list has to be the same as the size of the input list.
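The contract above can be sketched with a minimal toy function; the embedding logic here is purely illustrative (string length and word-separator count), not a real model:

```python
def toy_embeddings_fn(inputs):
    # inputs: list of strings -> list of equal-length float vectors,
    # one vector per input string (same length as the input list)
    return [[float(len(text)), float(text.count(" "))] for text in inputs]

vectors = toy_embeddings_fn(["hello world", "hi"])
assert len(vectors) == 2          # one vector per input
assert all(isinstance(v, list) for v in vectors)
```

A real embedding function, such as the watsonx.ai template below, follows the same shape: strings in, float vectors out.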
- Make sure to install the `ibm-watsonx-ai` package.
- Users need to add the `API_KEY` of their account.
- The embeddings functionality of watsonx.ai works within the scope of a project or a space. The example below asks for a `PROJECT_ID`.
- The example below uses `sentence-transformers/all-minilm-l12-v2` to generate embeddings. Please check the list of supported embedding models in the watsonx.ai documentation. A list is also available via the watsonx.ai client: `client.foundation_models.EmbeddingModels.show()`.
```python
def embeddings_fn(inputs):
    from ibm_watsonx_ai import Credentials, APIClient
    from ibm_watsonx_ai.foundation_models import Embeddings
    from ibm_watsonx_ai.metanames import EmbedTextParamsMetaNames
    # from time import time
    # start_time = time()

    API_KEY = "TO BE EDITED"
    WX_URL = "https://us-south.ml.cloud.ibm.com"
    PROJECT_ID = "TO BE EDITED"

    credentials = Credentials(
        url=WX_URL,
        api_key=API_KEY,
    )
    client = APIClient(credentials, project_id=PROJECT_ID)
    # client.foundation_models.EmbeddingModels.show()

    embedding = Embeddings(
        model_id=client.foundation_models.EmbeddingModels.ALL_MINILM_L12_V2,
        api_client=client,
        params={
            EmbedTextParamsMetaNames.TRUNCATE_INPUT_TOKENS: 128,
        },
    )
    result = embedding.embed_documents(texts=inputs)
    # print(f"Got embeddings of {len(inputs)} inputs in {time() - start_time}s.")
    return result
```
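As a variant, the API key and project ID could be read from environment variables instead of being hardcoded in the notebook. The variable names below (`WATSONX_API_KEY`, `WATSONX_PROJECT_ID`) are illustrative assumptions, not names required by the SDK:

```python
import os

# Hypothetical variable names -- pick whichever names suit your environment.
# Falls back to the same "TO BE EDITED" placeholders used above.
API_KEY = os.environ.get("WATSONX_API_KEY", "TO BE EDITED")
PROJECT_ID = os.environ.get("WATSONX_PROJECT_ID", "TO BE EDITED")
```

These values can then be passed to `Credentials` and `APIClient` exactly as in the template above.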
- The example below uses the `sentence-transformers` library with the `all-MiniLM-L12-v2` model to generate embeddings locally. Please check the list of supported embedding models.
```python
from sentence_transformers import SentenceTransformer

# 1. Load a pretrained Sentence Transformer model
model = SentenceTransformer("all-MiniLM-L12-v2")

# 2. Calculate embeddings by calling model.encode()
embeddings_fn = model.encode
```
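Note that `model.encode` returns a NumPy array by default. If a plain list of float vectors is required to satisfy the contract at the top of this page, a small wrapper can convert the output; the helper name `to_list_embeddings` below is a hypothetical example, not part of either library:

```python
import numpy as np

def to_list_embeddings(encode_fn, inputs):
    """Call an encoder on a list of strings and coerce its output to a
    plain list of float vectors (one vector per input string)."""
    vectors = encode_fn(inputs)
    if isinstance(vectors, np.ndarray):
        vectors = vectors.tolist()
    return vectors
```

With the model above, the embedding function would then be `embeddings_fn = lambda inputs: to_list_embeddings(model.encode, inputs)`.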