Score function templates for unstructured image data
salman-khan-s edited this page Feb 3, 2025
IBM watsonx.governance users must pass a custom score function as input when generating the common configuration package for a subscription via the notebook. This page provides score function templates that can be used as a reference.
- The input to the score function is the training data and the schema, described below.
  - `training_data_frame` (type: `pandas.DataFrame`): A data frame containing the feature columns, the meta columns, the label column, etc. For images, it should contain a column with the file paths of the images to read.
  - `schema` (type: `Dict`): Identifies the different columns in the scoring response. It can contain the following keys:
    - `prediction_column`: Name of the prediction column in the scoring response.
    - `probability_column`: Name of the probability column in the scoring response.
    - `input_token_count_column`: Name of the input token count column. Applicable when the input data type is `unstructured_text` [prompt assets].
    - `output_token_count_column`: Name of the output token count column. Applicable when the input data type is `unstructured_text` [prompt assets].
    - `prediction_probability_column`: Name of the prediction probability column. Applicable when the input data type is `unstructured_text` [prompt assets].
    - `label_column`: Name of the label column. Applicable when the input data type is `unstructured_image`.
    - `image_path_column`: Name of the image path column, which contains the paths of the images to load. Applicable when the input data type is `unstructured_image`.
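For an `unstructured_image` subscription, the two inputs could look like the following minimal sketch. The column names (`image_path`, `label`, `prediction`, `probability`) and the file paths are hypothetical placeholders; only the schema keys come from the list above.

```python
import pandas as pd

# Hypothetical column names and file paths, for illustration only
training_data_frame = pd.DataFrame({
    "image_path": ["images/0001.png", "images/0002.png"],
    "label": ["cat", "dog"],
})

# Keys follow the schema fields described above; values are column names
schema = {
    "label_column": "label",
    "image_path_column": "image_path",
    "prediction_column": "prediction",
    "probability_column": "probability",
}

# The image path and label columns must exist in the input data frame
assert schema["image_path_column"] in training_data_frame.columns
assert schema["label_column"] in training_data_frame.columns
```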
- The output of the score function is a `pandas.DataFrame` for all problem and input data types.
  - The output `pandas.DataFrame` should contain all the columns of the input `pandas.DataFrame`.
  - The output `pandas.DataFrame` may or may not contain certain additional columns, depending on the `problem_type`:
    - For `binary` and `multiclass` problem types, it should contain both `probability_column` and `prediction_column`.
    - For `regression` problems, it should contain `prediction_column`.
    - For prompt asset related problem types, e.g. `classification`, `generation`, etc., it may contain `input_token_count_column`, `output_token_count_column`, `prediction_probability_column`, etc.
    - For `unstructured_image` input data types, it will also contain `label_column`.
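The output rules above can be checked before the score function is handed to the configuration notebook. `missing_output_columns` is a hypothetical helper written for illustration, not part of `ibm_watsonx_ai` or watsonx.governance, and it only checks column names. The `allowed_drops` parameter is an assumption that accounts for columns a template deliberately removes, such as the image path column dropped by the WML template on this page.

```python
import pandas as pd


def missing_output_columns(input_df, output_df, schema, problem_type, allowed_drops=()):
    """Return the columns that the output data frame is missing.

    Hypothetical helper: checks column names only, not values or dtypes.
    """
    # Every input column must be kept, except the ones explicitly allowed to be dropped
    required = [col for col in input_df.columns if col not in allowed_drops]
    if problem_type in ("binary", "multiclass"):
        # Classification needs both the prediction and the probability columns
        required += [schema["prediction_column"], schema["probability_column"]]
    elif problem_type == "regression":
        # Regression needs only the prediction column
        required.append(schema["prediction_column"])
    return [col for col in required if col not in output_df.columns]
```

An empty list means every required column is present; any returned names point to what the score function still needs to add or keep.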
This section provides score function templates for models deployed in Watson Machine Learning (WML).
The templates below are common to the binary and multiclass classification cases.
- Install the Python library `ibm_watsonx_ai` with `pip install ibm_watsonx_ai`. The snippets use this client to score against the online endpoint of a WML deployment. Be aware that scoring with this method incurs a cost.
- The image template also uses `numpy`, `pandas`, `matplotlib` and `scikit-image` (imported as `skimage`).
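Because every online scoring call is billed, it can be useful to inspect the request body offline first. The sketch below assembles the same `input_data` payload (image values plus an optional `meta` section with `fields` and `values`) that the WML template sends. `build_scoring_payload` is a hypothetical helper name, and `image_vals` is assumed to already hold the preprocessed images as nested lists.

```python
import pandas as pd


def build_scoring_payload(image_vals, training_data_frame, meta_columns=None):
    """Hypothetical helper mirroring the payload built by the WML template."""
    meta_payload = {}
    if meta_columns:
        # Missing meta values are sent as empty strings, as in the template
        meta_df = training_data_frame[meta_columns].fillna("")
        meta_payload = {
            "fields": list(meta_columns),
            "values": meta_df.values.tolist(),
        }
    return {"input_data": [{"values": image_vals, "meta": meta_payload}]}
```

Printing the result (or its size) for a handful of rows is a cheap way to confirm the shape before sending a real request.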
```python
def score(training_data_frame, schema=None):
    # Imports are kept inside the function so that it is self-contained
    import matplotlib.image as mpimg
    import numpy as np
    from ibm_watsonx_ai import APIClient
    from skimage.transform import resize

    # To be filled by the user
    WML_CREDENTIALS = {
        # <EDIT THIS>: your WML credentials
    }
    deployment_id = "<EDIT THIS>"
    space_id = "<EDIT THIS>"

    schema = schema or {}
    try:
        # The label column and the prediction column must have the same data type,
        # and both must use the same set of unique class labels.
        label_column_name = schema.get("label_column")
        prediction_column_name = schema.get("prediction_column")
        probability_column_name = schema.get("probability_column")
        image_path_column = schema.get("image_path_column")
        meta_columns = schema.get("meta_columns", [])

        # Validate that the required columns are present in the schema
        if label_column_name is None:
            raise ValueError("'label_column' must be present in schema")
        if prediction_column_name is None:
            raise ValueError("'prediction_column' must be present in schema")
        if image_path_column is None:
            raise ValueError("'image_path_column' must be present in schema")
        if probability_column_name is None:
            raise ValueError("'probability_column' must be present in schema")
        if meta_columns and not isinstance(meta_columns, list):
            raise ValueError("Meta columns are of incorrect type. Need to be of type list of strings.")
        if meta_columns and not all(col in training_data_frame.columns for col in meta_columns):
            raise ValueError("Meta columns are not present in the training data.")

        # Prepare the meta payload if meta columns are available in the data frame
        meta_payload = {}
        if meta_columns:
            meta_df = training_data_frame[meta_columns].fillna("")
            meta_payload = {
                "fields": meta_columns,
                "values": meta_df.values.tolist(),
            }

        # Read each image from its path and resize it to the model's input shape.
        # (28, 28, 1) matches a model that takes 28x28 single-channel images;
        # change it to fit your model.
        image_vals = []
        for _, row in training_data_frame.iterrows():
            image = mpimg.imread(row[image_path_column])
            processed_image = resize(image, (28, 28, 1))
            image_vals.append(processed_image.tolist())

        scoring_data = {"input_data": [{"values": image_vals, "meta": meta_payload}]}

        wml_client = APIClient(WML_CREDENTIALS)
        wml_client.set.default_space(space_id)
        scoring_response = wml_client.deployments.score(deployment_id, scoring_data)
        score_predictions = scoring_response.get("predictions")[0]

        # list.index() raises instead of returning -1, so check membership explicitly
        fields = list(score_predictions.get("fields"))
        if probability_column_name not in fields or prediction_column_name not in fields:
            raise Exception("Missing prediction/probability column in the scoring response")
        prob_col_index = fields.index(probability_column_name)
        predict_col_index = fields.index(prediction_column_name)

        values = score_predictions.get("values")
        probability_array = np.array([value[prob_col_index] for value in values])
        prediction_vector = np.array([value[predict_col_index] for value in values])

        # Add the prediction and probability values to the training data frame
        training_data_frame = training_data_frame.assign(**{
            prediction_column_name: prediction_vector,
            probability_column_name: probability_array.tolist(),
        })
        # Drop the image path column since it is not a feature column for the model
        return training_data_frame.drop(columns=[image_path_column])
    except Exception as ex:
        raise Exception("Scoring failed. Error: {}".format(str(ex))) from ex
```
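The response handling at the end of the template can also be exercised without a deployment by feeding it a mocked response. The sketch below assumes the response shape the template relies on, `{"predictions": [{"fields": [...], "values": [...]}]}`, with one row per input image in the same order. `attach_predictions` is a hypothetical helper, and its row-count check is an extra safeguard that the template itself does not perform.

```python
import pandas as pd


def attach_predictions(df, scoring_response, prediction_column, probability_column):
    """Add prediction/probability columns from a WML-style scoring response.

    Hypothetical helper for offline testing; the response shape is an assumption.
    """
    predictions = scoring_response["predictions"][0]
    fields = list(predictions["fields"])
    # Fail early with a clear message if either expected field is absent
    for column in (prediction_column, probability_column):
        if column not in fields:
            raise KeyError(f"'{column}' missing from scoring response fields {fields}")
    values = predictions["values"]
    # Rows are matched by position, so the counts must agree
    if len(values) != len(df):
        raise ValueError("Scoring response has a different number of rows than the input")
    pred_idx = fields.index(prediction_column)
    prob_idx = fields.index(probability_column)
    return df.assign(**{
        prediction_column: [row[pred_idx] for row in values],
        probability_column: [row[prob_idx] for row in values],
    })
```

Keeping each probability vector as a Python list mirrors the template, which stores `probability_array.tolist()` in the probability column.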