This is a flow illustrating how to evaluate the performance of a classification system. It compares each prediction to the groundtruth, assigns a "Correct" or "Incorrect" grade, and aggregates the results to produce metrics such as accuracy, which reflects how well the system classifies the data.
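To make the idea concrete, here is a small standalone sketch in plain Python (independent of the flow itself; the sample records are made up for illustration):

```python
# Grade each prediction against its groundtruth, then aggregate
# the grades into an accuracy metric. Sample records are invented
# purely for demonstration.
records = [
    {"groundtruth": "APP", "prediction": "APP"},
    {"groundtruth": "PDF", "prediction": "APP"},
    {"groundtruth": "Academic", "prediction": "Academic"},
]

grades = [
    "Correct" if r["groundtruth"].lower() == r["prediction"].lower() else "Incorrect"
    for r in records
]

accuracy = grades.count("Correct") / len(grades)
print(grades)                       # ['Correct', 'Incorrect', 'Correct']
print(f"accuracy={accuracy:.2f}")   # accuracy=0.67
```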
Tools used in this flow:
- `python` tool
In this flow, you will learn
- how to compose a point-based evaluation flow, where you can calculate point-wise metrics.
- the way to log metrics with `from promptflow import log_metric` (see the sketch after this list)
    - see file `calculate_accuracy.py`
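A minimal sketch of such an aggregation node, assuming the standard `promptflow` imports; the actual `calculate_accuracy.py` in this folder is the authoritative version and may differ in detail:

```python
from typing import List

from promptflow import log_metric, tool


@tool
def calculate_accuracy(grades: List[str]):
    # Aggregation node: receives the list of per-line grades produced
    # by the grade node over the whole dataset.
    accuracy = round(grades.count("Correct") / len(grades), 2)
    # log_metric surfaces the value as a run-level metric of the evaluation run.
    log_metric("accuracy", accuracy)
    return accuracy
```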
Prepare your Azure OpenAI resource following the instructions and get your `api_key` if you don't have one.
```bash
# Override keys with --set to avoid yaml file changes
pf connection create --file ../../../connections/azure_openai.yml --set api_key=<your_api_key> api_base=<your_api_base>
```
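If you prefer the Python SDK over the CLI, you can verify the connection was created with something along these lines (a sketch, assuming the promptflow SDK is installed locally):

```python
from promptflow import PFClient

# List local connections to confirm the one created above exists;
# its name comes from ../../../connections/azure_openai.yml.
pf = PFClient()
for connection in pf.connections.list():
    print(connection.name, connection.type)
```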
```bash
# test with default input value in flow.dag.yaml
pf flow test --flow .

# test with flow inputs
pf flow test --flow . --inputs groundtruth=APP prediction=APP

# test node with inputs
pf flow test --flow . --node grade --inputs groundtruth=groundtruth prediction=prediction
```
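The `grade` node exercised by the last command is a plain `python` tool. A minimal sketch of what such a node can look like (the actual implementation in this folder is the source of truth):

```python
from promptflow import tool


@tool
def grade(groundtruth: str, prediction: str) -> str:
    # Case-insensitive comparison of the predicted label against the
    # groundtruth label; any mismatch is graded "Incorrect".
    return "Correct" if groundtruth.lower() == prediction.lower() else "Incorrect"
```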
There are two ways to evaluate a classification flow.
```bash
pf run create --flow . --data ./data.jsonl --stream
```
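The same batch run can also be created from Python; a sketch using the promptflow SDK, assuming it is run from this folder:

```python
from promptflow import PFClient

pf = PFClient()

# Equivalent of `pf run create --flow . --data ./data.jsonl --stream`:
# every line of data.jsonl is sent through the evaluation flow.
run = pf.run(flow=".", data="./data.jsonl")
pf.stream(run)              # stream run logs, like --stream on the CLI
print(pf.get_metrics(run))  # aggregated metrics, e.g. {"accuracy": ...}
```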
Learn more in the web-classification example.