
Score function templates for unstructured image data

salman-khan-s edited this page Feb 3, 2025 · 3 revisions

Score function templates

IBM watsonx.governance users need to pass a custom score function as an input when generating the common configuration package for a subscription via the notebook. This page provides score function templates that can be used for reference.

Input to score function:

  • The input to the score function is the training data and the schema as described below.
    • training_data_frame : (type: pandas.DataFrame) A data frame containing the feature columns, the meta columns, the label column etc. In case of images, this should contain a column with all the file paths of images to read.
    • schema : (type: Dict) This is for identifying different columns in the scoring response.
      • prediction_column: Name of the prediction column in the scoring response
      • probability_column: Name of the probability column in the scoring response
      • input_token_count_column: Name of the input token count column. Applicable when input data type is unstructured_text (prompt assets)
      • output_token_count_column: Name of the output token count column. Applicable when input data type is unstructured_text (prompt assets)
      • prediction_probability_column: Name of the prediction probability column. Applicable when input data type is unstructured_text (prompt assets)
      • label_column: Name of the label column. Applicable when input data type is unstructured_image
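      • image_path_column: Name of the image path column, which contains the paths of the images to load. Applicable when input data type is unstructured_image
      • meta_columns: Optional list of meta column names in the training data. The template on this page reads this key and sends these columns in the meta field of the scoring payload.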
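To illustrate the two inputs described above, the following sketch shows a schema and a few training rows for an image classification subscription. All column names and file paths are hypothetical placeholders, not values required by watsonx.governance. The rows are plain dictionaries that `pandas.DataFrame(rows)` would turn into `training_data_frame`.

```python
# Hypothetical schema for an unstructured_image classification subscription.
# The keys come from the list above; the values are placeholder column names.
schema = {
    "prediction_column": "prediction",
    "probability_column": "probability",
    "label_column": "label",
    "image_path_column": "image_path",
}

# Placeholder training rows: one image path and one label per row.
rows = [
    {"image_path": "images/sample_0.png", "label": 7},
    {"image_path": "images/sample_1.png", "label": 2},
]

# Columns the training data must provide before scoring.
required_input_columns = [schema["image_path_column"], schema["label_column"]]

# Collect any required column that is absent from at least one row.
missing = [
    column
    for column in required_input_columns
    if any(column not in row for row in rows)
]
```

An empty `missing` list means every row carries both the image path and the label that the score function needs.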

Output from score function:

  • The output of the score function will be a pandas.DataFrame for all problem and input data types.
  • The output pandas.DataFrame should contain all the columns of the input pandas.DataFrame.
  • The additional columns required in the output pandas.DataFrame depend on the problem_type and the input data type:
    • For binary and multiclass problem types, the output pandas.DataFrame should contain both probability_column and prediction_column.
    • For regression problems, the output pandas.DataFrame should contain prediction_column.
    • For prompt asset related problem types, e.g. classification, generation etc., the output pandas.DataFrame may contain the input_token_count_column, output_token_count_column, prediction_probability_column etc.
    • For unstructured_image input data types, the output pandas.DataFrame should also contain the label_column.
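The output rules above can be checked before the frame is returned. The helper below is a minimal sketch, not part of watsonx.governance: the function name, the problem type strings ("binary", "multiclass", "regression") and the `dropped` parameter are assumptions made for illustration. It works on column names only, so it can be called with `list(df.columns)`. The `dropped` parameter is there because the classification template on this page removes the image path column from its output.

```python
def missing_output_columns(input_columns, output_columns, schema,
                           problem_type, input_data_type=None, dropped=()):
    """Return expected output column names that are absent (hypothetical helper)."""
    # Start from the input columns, minus any the score function drops on purpose.
    expected = [c for c in input_columns if c not in dropped]

    # Classification needs both prediction and probability; regression only prediction.
    if problem_type in ("binary", "multiclass"):
        expected += [schema["prediction_column"], schema["probability_column"]]
    elif problem_type == "regression":
        expected.append(schema["prediction_column"])

    # Image data must also carry the label column through to the output.
    if input_data_type == "unstructured_image":
        expected.append(schema["label_column"])

    present = set(output_columns)
    result = []
    for column in expected:
        # Report each missing column once, in the order it was expected.
        if column not in present and column not in result:
            result.append(column)
    return result


example_schema = {
    "prediction_column": "prediction",
    "probability_column": "probability",
    "label_column": "label",
}
example_missing = missing_output_columns(
    ["image_path", "label"], ["label", "prediction"], example_schema,
    "binary", "unstructured_image", dropped=("image_path",))
```

In this example the output frame lacks the probability column, so the helper reports it.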

WML Model Engine:

This section provides the score function templates for models deployed in WML.

The templates specified below are common for binary / multi-class classification cases.

Online Scoring

  • Install the Python library ibm_watsonx_ai using pip install ibm_watsonx_ai. The snippets below use this client to score against the online endpoint of a WML model. Note that scoring with this method incurs a cost.

Classification Problems

def score(training_data_frame, schema=None):
    # Avoid a mutable default argument; treat a missing schema as empty
    schema = schema or {}
    # To be filled by the user
    WML_CREDENTIALS = {
        <EDIT THIS>
    }
    try:
        deployment_id = <EDIT THIS>
        space_id = <EDIT THIS>

        # The label column and the prediction column must have the same data type.
        # The user needs to make sure that both columns contain the same
        # set of unique class labels.
        label_column_name = schema.get("label_column")
        prediction_column_name = schema.get("prediction_column")
        probability_column_name = schema.get("probability_column")
        image_path_column = schema.get("image_path_column")
        meta_columns = schema.get("meta_columns", [])

        # Validation to ensure that the required columns are present in the schema
        if label_column_name is None:
            raise ValueError("'label_column' must be present in schema")
        if prediction_column_name is None:
            raise ValueError("'prediction_column' must be present in schema")
        if image_path_column is None:
            raise ValueError("'image_path_column' must be present in schema")
        if probability_column_name is None:
            raise ValueError("'probability_column' must be present in schema")
        if meta_columns and not (isinstance(meta_columns, list)):
            raise ValueError("Meta columns are of incorrect type. Need to be of type list of strings.")
        if meta_columns and not all(col in training_data_frame.columns for col in meta_columns):
            raise ValueError("Meta columns are not present in the training data.")
        

        training_df = training_data_frame.copy()
        meta_payload = {}
        cols_to_remove = []
        
        # Prepare meta payload values if meta columns are available in dataframe
        if meta_columns:
            meta_df = training_df[meta_columns].copy()
            meta_df = meta_df.fillna('')
            meta_fields = meta_df.columns.tolist()
            meta_values = meta_df[meta_fields].values.tolist()
            cols_to_remove.extend(meta_columns)
            meta_payload = {
                "fields": meta_fields,
                "values": meta_values
            }

        # Removing the meta columns from the dataframe
        for col in cols_to_remove:
            if col in training_df.columns:
                del training_df[col]
        
        # Read individual images from the path for scoring the images
        import matplotlib.image as mpimg
        from skimage.transform import resize

        image_vals = []
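        for _, row in training_data_frame.iterrows():
            image_path = row[image_path_column]
            image = mpimg.imread(image_path)
            # (28, 28, 1) is the input shape used in this template;
            # change it to the shape your model expects
            processed_image = resize(image, (28, 28, 1))
            image_vals.append(processed_image.tolist())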

        scoring_data = {"input_data": [{"values": image_vals, "meta": meta_payload}]}

        from ibm_watsonx_ai import APIClient
        wml_client = APIClient(WML_CREDENTIALS)
        wml_client.set.default_space(space_id)

        score = wml_client.deployments.score(deployment_id, scoring_data)
        score_predictions = score.get("predictions")[0]
        
    
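        # list.index raises ValueError instead of returning -1, so check
        # membership before looking up the column positions
        response_fields = list(score_predictions.get("fields"))
        if (probability_column_name not in response_fields
                or prediction_column_name not in response_fields):
            raise Exception(
                "Missing prediction/probability column in the scoring response")

        prob_col_index = response_fields.index(probability_column_name)
        predict_col_index = response_fields.index(prediction_column_name)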

        import numpy as np
        import pandas as pd
        
        probability_array = np.array([value[prob_col_index]
                                     for value in score_predictions.get("values")])
        prediction_vector = np.array(
            [value[predict_col_index] for value in score_predictions.get("values")])
            
        # Incorporate the prediction and probability values into the training dataframe
        training_data_frame = training_data_frame.assign(**{
            prediction_column_name: prediction_vector,
            probability_column_name: probability_array.tolist()})

        # Remove the image_path_column since it does not qualify as a feature column for the model.
        training_data_frame = training_data_frame.drop([image_path_column], axis=1)
        return training_data_frame

    except Exception as ex:
        # Chain the original exception so its traceback is preserved
        raise Exception("Scoring failed. Error: {}".format(str(ex))) from ex
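
The response handling in the template can be tried offline, without credentials or scoring cost. The sketch below assumes the response layout the template relies on (a "predictions" list whose first entry holds "fields" and "values"). The function name `extract_columns` and the mock response values are hypothetical.

```python
def extract_columns(scoring_response, prediction_column, probability_column):
    """Pull prediction and probability values out of a scoring response."""
    predictions = scoring_response["predictions"][0]
    fields = list(predictions["fields"])

    # Fail early with a clear message if either column is absent.
    absent = [c for c in (prediction_column, probability_column) if c not in fields]
    if absent:
        raise ValueError(
            "Missing column(s) in the scoring response: {}".format(absent))

    pred_idx = fields.index(prediction_column)
    prob_idx = fields.index(probability_column)
    rows = predictions["values"]
    return [row[pred_idx] for row in rows], [row[prob_idx] for row in rows]


# Mock response with made-up values, shaped like the one the template parses.
mock_response = {
    "predictions": [{
        "fields": ["prediction", "probability"],
        "values": [[7, [0.1, 0.9]], [2, [0.8, 0.2]]],
    }]
}
predicted, probabilities = extract_columns(
    mock_response, "prediction", "probability")
```

Both returned lists hold one entry per scored row, in the order of the response, so they can be assigned directly as new columns of the training data frame.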