Embedding Function Templates for unstructured text data

IBM watsonx.governance users need to pass a custom embedding function as an input while generating embeddings for a subscription via the notebook. This page provides templates of embedding functions that can be used for reference.

Input to embedding function

  • The input to the embedding function has to be a list of strings.

Output of embedding function

  • The output of the embedding function has to be a list of embedding vectors, where each vector is a list of floats.
  • The size of the output list needs to be the same as the size of the input list (see the sketch below).
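
To illustrate this contract, here is a minimal dummy embedding function; the fixed-size zero vectors and the dimension of 384 are placeholders for this sketch, not real embeddings:

from typing import List

def dummy_embeddings_fn(inputs: List[str]) -> List[List[float]]:
    # Return one fixed-size vector of floats per input string.
    # A real implementation would call an embedding model instead.
    return [[0.0] * 384 for _ in inputs]

# One embedding vector per input string, in the same order
vectors = dummy_embeddings_fn(["first text", "second text"])
assert len(vectors) == 2 and len(vectors[0]) == 384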

IBM watsonx.ai

  1. Make sure to install the ibm-watsonx-ai package.
  2. Add the API_KEY of your account.
  3. The embeddings functionality of watsonx.ai works within the scope of a project or a space. The example below asks for a PROJECT_ID.
  4. The example below uses sentence-transformers/all-minilm-l12-v2 to generate embeddings. Please check the list of supported embedding models in the watsonx.ai documentation. A list is also available via the watsonx.ai client: client.foundation_models.EmbeddingModels.show()
def embeddings_fn(inputs):
    from ibm_watsonx_ai import Credentials, APIClient
    from ibm_watsonx_ai.foundation_models import Embeddings
    from ibm_watsonx_ai.metanames import EmbedTextParamsMetaNames 
    
    # from time import time
    # start_time = time()

    API_KEY = "TO BE EDITED"
    WX_URL = "https://us-south.ml.cloud.ibm.com"
    PROJECT_ID = "TO BE EDITED"

    credentials = Credentials(
        url = WX_URL,
        api_key = API_KEY
    )

    client = APIClient(credentials, project_id=PROJECT_ID)
    # client.foundation_models.EmbeddingModels.show()
    # Initialize the embedding model; inputs longer than 128 tokens are truncated
    embedding = Embeddings(
        model_id=client.foundation_models.EmbeddingModels.ALL_MINILM_L12_V2,
        api_client=client,
        params={
            EmbedTextParamsMetaNames.TRUNCATE_INPUT_TOKENS: 128
        }
    )
    # embed_documents returns one embedding vector (a list of floats) per input string
    result = embedding.embed_documents(texts=inputs)
    # print(f"Got embeddings of {len(inputs)} inputs in {time() - start_time}s.")
    return result
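
As a quick sanity check, the function can be called directly on a small list of strings once the credentials and project ID are filled in. The sample texts below are illustrative only, and the dimension of 384 assumes the all-minilm-l12-v2 model:

# Illustrative usage: one embedding vector is returned per input string
sample_texts = ["The invoice was paid on time.", "The claim was rejected."]
vectors = embeddings_fn(sample_texts)
print(len(vectors), len(vectors[0]))  # 2 vectors, each with 384 floats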

Sentence Transformers Library

  1. The example below uses sentence-transformers/all-MiniLM-L12-v2 to generate embeddings. Please check the list of supported embedding models in the Sentence Transformers documentation.
from sentence_transformers import SentenceTransformer

# 1. Load a pretrained Sentence Transformer model
model = SentenceTransformer("all-MiniLM-L12-v2")

# 2. Calculate embeddings by calling model.encode()
embeddings_fn = model.encode
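
Note that model.encode returns NumPy arrays. If the platform requires plain Python lists of floats (this is an assumption about the expected output type, not part of the original template), a thin wrapper like the following sketch can convert the output:

from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L12-v2")

def embeddings_fn(inputs):
    # encode() returns a NumPy array with one row per input string;
    # tolist() converts it to a list of lists of floats
    return model.encode(inputs).tolist()

vectors = embeddings_fn(["first text", "second text"])
print(len(vectors), len(vectors[0]))  # 2 vectors, 384 floats each for all-MiniLM-L12-v2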