diff --git a/docs/how-to-guides/develop-a-flow/develop-standard-flow.md b/docs/how-to-guides/develop-a-flow/develop-standard-flow.md
new file mode 100644
index 00000000000..b8f6d6c2e18
--- /dev/null
+++ b/docs/how-to-guides/develop-a-flow/develop-standard-flow.md
@@ -0,0 +1,313 @@
+# Develop standard flow
+
+:::{admonition} Experimental feature
+This is an experimental feature, and may change at any time. Learn [more](../faq.md#stable-vs-experimental).
+:::
+
+This document shows how to develop a standard flow by writing a flow yaml from scratch. You can
+find additional information about the flow yaml schema in [Flow YAML Schema](../../reference/flow-yaml-schema-reference.md).
+
+## Flow input data
+The flow input data is the data that you want to process in your flow.
+::::{tab-set}
+:::{tab-item} CLI
+:sync: CLI
+You can add a flow input in the inputs section of the flow yaml.
+```yaml
+inputs:
+  url:
+    type: string
+    default: https://www.microsoft.com/en-us/d/xbox-wireless-controller-stellar-shift-special-edition/94fbjc7h0h6h
+```
+:::
+
+:::{tab-item} VS Code Extension
+:sync: VS Code Extension
+When unfolding the Inputs section on the authoring page, you can set and view your flow inputs, including the input schema (name and type)
+and the input value.
+
+![flow_input](../../media/how-to-guides/develop-standard-flow/flow_input.png)
+:::
+
+::::
+For the Web Classification sample shown in the screenshot above, the flow input is a url of string type. We also support
+the input types `int`, `bool`, `double`, `list` and `object`.
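+
+Flow inputs are not limited to strings. As an illustrative sketch only (the extra input names and default values below are hypothetical, not part of the Web Classification sample), a flow could declare several typed inputs in the same way:
+
+```yaml
+inputs:
+  url:
+    type: string
+    default: https://www.microsoft.com
+  max_results:
+    type: int
+    default: 5
+  include_summary:
+    type: bool
+    default: true
+  categories:
+    type: list
+    default: ["Movie", "App", "Academic"]
+```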
+
+## Develop the flow using different tools
+In one flow, you can consume different kinds of tools. We currently support built-in tools like
+[LLM](../../reference/tools-reference/llm-tool.md), [Python](../../reference/tools-reference/python-tool.md) and
+[Prompt](../../reference/tools-reference/prompt-tool.md), and
+third-party tools like [Serp API](../../reference/tools-reference/serp-api-tool.md),
+[Vector Search](../../reference/tools-reference/vector_db_lookup_tool.md), etc.
+
+### Add a tool as needed
+::::{tab-set}
+:::{tab-item} CLI
+:sync: CLI
+You can add a tool node in the nodes section of the flow yaml. For example, the yaml below shows how to add a Python tool node to the flow.
+
+```yaml
+nodes:
+- name: fetch_text_content_from_url
+  type: python
+  source:
+    type: code
+    path: fetch_text_content_from_url.py
+  inputs:
+    url: ${inputs.url}
+```
+:::
+
+:::{tab-item} VS Code Extension
+:sync: VS Code Extension
+By selecting a tool card at the very top, you'll add a new tool node to the flow.
+
+![add_tool](../../media/how-to-guides/develop-standard-flow/add_tool.png)
+:::
+
+::::
+
+### Edit tool
+::::{tab-set}
+:::{tab-item} CLI
+:sync: CLI
+You can edit the tool by simply opening the source file and making edits. For example, we provide a simple Python tool below.
+
+```python
+from promptflow import tool
+
+# The inputs section will change based on the arguments of the tool function, after you save the code
+# Adding type to arguments and return value will help the system show the types properly
+# Please update the function name/signature per need
+@tool
+def my_python_tool(input1: str) -> str:
+    return 'hello ' + input1
+```
+
+We also provide an LLM tool prompt below.
+
+```jinja
+Please summarize the following text in one paragraph. 100 words.
+Do not add any information that is not in the text.
+
+Text: {{text}}
+Summary:
+```
+:::
+
+:::{tab-item} VS Code Extension
+:sync: VS Code Extension
+When a new tool node is added to the flow, it will be appended at the bottom of the flattened view with a random name by default.
+At the top of each tool node card, there's a toolbar for adjusting the tool node. You can move it up or down, and you can also delete or rename it.
+For a Python tool node, you can edit the tool code by clicking the code file. For an LLM tool node, you can edit the
+tool prompt by clicking the prompt file and adjust input parameters such as connection and api.
+![edit_tool](../../media/how-to-guides/develop-standard-flow/edit_tool.png)
+:::
+
+::::
+
+### Create connection
+Please refer to [Create necessary connections](../quick-start.md#create-necessary-connections) for details.
+
+## Chain your flow - link nodes together
+Before linking nodes together, you need to define and expose an interface.
+
+### Define LLM node interface
+An LLM node has only one output, the completion returned by the LLM provider.
+
+As for inputs, we offer a templating strategy that can help you create parametric prompts that accept different input
+values. Instead of fixed text, enclose your input name in `{{}}`, so it can be replaced on the fly. We use Jinja as our
+templating language. For example:
+
+```jinja
+Your task is to classify a given url into one of the following types:
+Movie, App, Academic, Channel, Profile, PDF or None based on the text content information.
+The classification will be based on the url, the webpage text content summary, or both.
+
+Here are a few examples:
+{% for ex in examples %}
+URL: {{ex.url}}
+Text content: {{ex.text_content}}
+OUTPUT:
+{"category": "{{ex.category}}", "evidence": "{{ex.evidence}}"}
+
+{% endfor %}
+
+For a given URL : {{url}}, and text content: {{text_content}}.
+Classify above url to complete the category and indicate evidence.
+OUTPUT:
+```
+
+### Define Python node interface
+A Python node might have multiple inputs and outputs. Define its inputs and outputs as shown below.
+If you have multiple outputs, remember to return a dictionary so that downstream nodes can reference each key separately.
+For example:
+
+```python
+import json
+from promptflow import tool
+
+@tool
+def convert_to_dict(input_str: str, input_str2: str) -> dict:
+    try:
+        print(input_str2)
+        return json.loads(input_str)
+    except Exception as e:
+        print("input is not valid, error: {}".format(e))
+        return {"category": "None", "evidence": "None"}
+```
+
+### Link nodes together
+After the interface is defined, you can use:
+
+- `${inputs.key}` to link with a flow input.
+- `${upstream_node_name.output}` to link with a single-output upstream node.
+- `${upstream_node_name.output.key}` to link with a multi-output upstream node.
+
+Below are common scenarios for linking nodes together.
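+
+To make the three reference styles concrete before walking through the scenarios, here is a small sketch that combines all of them in one Python node. The node name and file path are hypothetical; the upstream node names reuse the Web Classification sample:
+
+```yaml
+- name: my_example_node
+  type: python
+  source:
+    type: code
+    path: my_example_node.py
+  inputs:
+    url: ${inputs.url}                            # flow input
+    summary: ${summarize_text_content.output}     # single-output upstream node
+    evidence: ${convert_to_dict.output.evidence}  # multi-output upstream node
+```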
+
+### Scenario 1 - Link LLM node with flow input and single-output upstream node
+After you add a new LLM node and edit the prompt file as in [Define LLM node interface](#define-llm-node-interface),
+three inputs called `url`, `examples` and `text_content` are created in the inputs section.
+
+::::{tab-set}
+:::{tab-item} CLI
+:sync: CLI
+You can link the LLM node input with the flow input by `${inputs.url}`.
+And you can link `examples` to the upstream `prepare_examples` node and `text_content` to the `summarize_text_content` node
+by `${prepare_examples.output}` and `${summarize_text_content.output}` respectively.
+```yaml
+- name: classify_with_llm
+  type: llm
+  source:
+    type: code
+    path: classify_with_llm.jinja2
+  inputs:
+    deployment_name: text-davinci-003
+    suffix: ""
+    max_tokens: 128
+    temperature: 0.2
+    top_p: 1
+    echo: false
+    presence_penalty: 0
+    frequency_penalty: 0
+    best_of: 1
+    url: ${inputs.url} # Link with flow input
+    examples: ${prepare_examples.output} # Link LLM node with single-output upstream node
+    text_content: ${summarize_text_content.output} # Link LLM node with single-output upstream node
+```
+:::
+
+:::{tab-item} VS Code Extension
+:sync: VS Code Extension
+In the value drop-down, select `${inputs.url}`, `${prepare_examples.output}` and `${summarize_text_content.output}`, then
+you'll see in the graph view that the newly created LLM node is linked to the flow input and the upstream `prepare_examples` and `summarize_text_content` nodes.
+
+![link_llm_with_flow_input_single_output_node](../../media/how-to-guides/develop-standard-flow/link_llm_with_flow_input_single_output_node.png)
+:::
+
+::::
+When running the flow, the `url` input of the node will be replaced by the flow input on the fly, and the `examples` and
+`text_content` inputs of the node will be replaced by the `prepare_examples` and `summarize_text_content` node outputs on the fly.
+
+### Scenario 2 - Link LLM node with multi-output upstream node
+Suppose we want to link the newly created LLM node with the `convert_to_dict` Python node whose output is a dictionary with two keys: `category` and `evidence`.
+::::{tab-set}
+:::{tab-item} CLI
+:sync: CLI
+You can link `text_content` to the `evidence` output of the upstream `convert_to_dict` node by `${convert_to_dict.output.evidence}` like below:
+```yaml
+- name: classify_with_llm
+  type: llm
+  source:
+    type: code
+    path: classify_with_llm.jinja2
+  inputs:
+    deployment_name: text-davinci-003
+    suffix: ""
+    max_tokens: 128
+    temperature: 0.2
+    top_p: 1
+    echo: false
+    presence_penalty: 0
+    frequency_penalty: 0
+    best_of: 1
+    text_content: ${convert_to_dict.output.evidence} # Link LLM node with multi-output upstream node
+```
+:::
+
+:::{tab-item} VS Code Extension
+:sync: VS Code Extension
+In the value drop-down, select `${convert_to_dict.output}`, then manually append `evidence`. You'll then see in the graph
+view that the newly created LLM node is linked to the upstream `convert_to_dict` node.
+
+![link_llm_with_multi_output_node](../../media/how-to-guides/develop-standard-flow/link_llm_with_multi_output_node.png)
+:::
+::::
+When running the flow, the `text_content` input of the node will be replaced by the `evidence` value from the `convert_to_dict` node output dictionary on the fly.
+
+### Scenario 3 - Link Python node with upstream node/flow input
+After you add a new Python node and edit the code file as in [Define Python node interface](#define-python-node-interface),
+two inputs called `input_str` and `input_str2` are created in the inputs section. The linkage is the same as for the LLM node,
+using `${inputs.input_name}` to link with a flow input or `${upstream_node_name.output}` to link with an upstream node.
+
+::::{tab-set}
+:::{tab-item} CLI
+:sync: CLI
+```yaml
+- name: prepare_examples
+  type: python
+  source:
+    type: code
+    path: prepare_examples.py
+  inputs:
+    input_str: ${inputs.url} # Link Python node with flow input
+    input_str2: ${fetch_text_content_from_url.output} # Link Python node with single-output upstream node
+```
+:::
+
+:::{tab-item} VS Code Extension
+:sync: VS Code Extension
+
+![link_python_with_flow_node_input](../../media/how-to-guides/develop-standard-flow/link_python_with_flow_node_input.png)
+:::
+
+::::
+When running the flow, the `input_str` input of the node will be replaced by the flow input on the fly and the `input_str2`
+input of the node will be replaced by the `fetch_text_content_from_url` node output on the fly.
+
+## Set flow output
+When the flow is complicated, instead of checking outputs on each node, you can set flow outputs and check the outputs of
+multiple nodes in one place. Moreover, flow outputs help:
+
+- Check bulk test results in one single table.
+- Define evaluation interface mapping.
+- Set deployment response schema.
+
+::::{tab-set}
+:::{tab-item} CLI
+:sync: CLI
+You can add flow outputs in the outputs section of the flow yaml. The linkage is the same as for nodes,
+using `${convert_to_dict.output.category}` to link the `category` flow output with the `category` value of the upstream node
+`convert_to_dict`.
+
+```yaml
+outputs:
+  category:
+    type: string
+    reference: ${convert_to_dict.output.category}
+  evidence:
+    type: string
+    reference: ${convert_to_dict.output.evidence}
+```
+:::
+
+:::{tab-item} VS Code Extension
+:sync: VS Code Extension
+First define the flow output schema, then select in the drop-down the node whose output you want to set as a flow output.
+Since `convert_to_dict` has a dictionary output with two keys, `category` and `evidence`, you need to manually append
+`category` and `evidence` to each reference. Then run the flow, and after a while you can check the flow output in a table.
+
+![flow_output](../../media/how-to-guides/develop-standard-flow/flow_output.png)
+:::
+
+::::
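+
+Putting the pieces together, the skeleton below sketches how the inputs, nodes and outputs sections described above fit into one flow yaml. It is an illustrative single-node sketch only, not the complete Web Classification sample:
+
+```yaml
+inputs:
+  url:
+    type: string
+nodes:
+- name: fetch_text_content_from_url
+  type: python
+  source:
+    type: code
+    path: fetch_text_content_from_url.py
+  inputs:
+    url: ${inputs.url}
+outputs:
+  text:
+    type: string
+    reference: ${fetch_text_content_from_url.output}
+```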
\ No newline at end of file
diff --git a/docs/how-to-guides/develop-a-flow/index.md b/docs/how-to-guides/develop-a-flow/index.md
new file mode 100644
index 00000000000..3d409421419
--- /dev/null
+++ b/docs/how-to-guides/develop-a-flow/index.md
@@ -0,0 +1,9 @@
+# Develop a flow
+In this section, we provide guides on how to develop a flow by writing a flow yaml from scratch.
+
+```{toctree}
+:maxdepth: 1
+:hidden:
+
+develop-standard-flow
+```
\ No newline at end of file
diff --git a/docs/how-to-guides/index.md b/docs/how-to-guides/index.md
index 852cde13deb..a6be983dd90 100644
--- a/docs/how-to-guides/index.md
+++ b/docs/how-to-guides/index.md
@@ -5,6 +5,7 @@ Simple and short articles grouped by topics, each introduces a core feature of p
 
 ```{toctree}
 :maxdepth: 1
 
+develop-a-flow/index
 init-and-test-a-flow
 run-and-evaluate-a-flow
 tune-prompts-with-variants
diff --git a/docs/how-to-guides/quick-start.md b/docs/how-to-guides/quick-start.md
index 658fab9a057..a655b4571cf 100644
--- a/docs/how-to-guides/quick-start.md
+++ b/docs/how-to-guides/quick-start.md
@@ -110,6 +110,7 @@ inputs:
     default: https://play.google.com/store/apps/details?id=com.twitter.android
 ...
 ```
+See more details of this topic in [Develop a flow](./develop-a-flow/index.md).
### Create necessary connections @@ -283,6 +284,7 @@ See more details of this topic in [Initialize and test a flow](./init-and-test-a ## Next steps Learn more on how to: +- [Develop a flow](./develop-a-flow/index.md): details on how to develop a flow by writing a flow yaml from scratch. - [Initialize and test a flow](./init-and-test-a-flow.md): details on how develop a flow from scratch or existing code. - [Run and evaluate a flow](./run-and-evaluate-a-flow.md): run and evaluate the flow using multi line data file. - [Deploy a flow](./deploy-a-flow/index.md): how to deploy the flow as a web app. diff --git a/docs/index.md b/docs/index.md index 33aecaaea14..7afeb5cd245 100644 --- a/docs/index.md +++ b/docs/index.md @@ -38,6 +38,7 @@ This documentation site contains guides for prompt flow [sdk, cli](https://pypi. - header: "📒 How-to Guides" content: " Articles guide user to complete a specific task in prompt flow.

+ - [Develop a flow](how-to-guides/develop-a-flow/index.md)
- [Initialize and test a flow](how-to-guides/init-and-test-a-flow.md)
- [Run and evaluate a flow](how-to-guides/run-and-evaluate-a-flow.md)
- [Tune prompts using variants](how-to-guides/tune-prompts-with-variants.md)
diff --git a/docs/media/how-to-guides/develop-standard-flow/add_tool.png b/docs/media/how-to-guides/develop-standard-flow/add_tool.png new file mode 100644 index 00000000000..50596457c2f Binary files /dev/null and b/docs/media/how-to-guides/develop-standard-flow/add_tool.png differ diff --git a/docs/media/how-to-guides/develop-standard-flow/edit_tool.png b/docs/media/how-to-guides/develop-standard-flow/edit_tool.png new file mode 100644 index 00000000000..139e64e1888 Binary files /dev/null and b/docs/media/how-to-guides/develop-standard-flow/edit_tool.png differ diff --git a/docs/media/how-to-guides/develop-standard-flow/flow_input.png b/docs/media/how-to-guides/develop-standard-flow/flow_input.png new file mode 100644 index 00000000000..9d4c43e11a3 Binary files /dev/null and b/docs/media/how-to-guides/develop-standard-flow/flow_input.png differ diff --git a/docs/media/how-to-guides/develop-standard-flow/flow_output.png b/docs/media/how-to-guides/develop-standard-flow/flow_output.png new file mode 100644 index 00000000000..f821486c2e8 Binary files /dev/null and b/docs/media/how-to-guides/develop-standard-flow/flow_output.png differ diff --git a/docs/media/how-to-guides/develop-standard-flow/link_llm_with_flow_input_single_output_node.png b/docs/media/how-to-guides/develop-standard-flow/link_llm_with_flow_input_single_output_node.png new file mode 100644 index 00000000000..1425cf83a18 Binary files /dev/null and b/docs/media/how-to-guides/develop-standard-flow/link_llm_with_flow_input_single_output_node.png differ diff --git a/docs/media/how-to-guides/develop-standard-flow/link_llm_with_multi_output_node.png b/docs/media/how-to-guides/develop-standard-flow/link_llm_with_multi_output_node.png new file mode 100644 index 00000000000..e33aa6f3c90 Binary files /dev/null and b/docs/media/how-to-guides/develop-standard-flow/link_llm_with_multi_output_node.png differ diff --git a/docs/media/how-to-guides/develop-standard-flow/link_python_with_flow_node_input.png b/docs/media/how-to-guides/develop-standard-flow/link_python_with_flow_node_input.png new file mode 100644 index 00000000000..07e2639fbdc Binary files /dev/null and b/docs/media/how-to-guides/develop-standard-flow/link_python_with_flow_node_input.png differ diff --git a/docs/reference/flow-yaml-schema-reference.md b/docs/reference/flow-yaml-schema-reference.md new file mode 100644 index 00000000000..1aad6a0c04a --- /dev/null +++ b/docs/reference/flow-yaml-schema-reference.md @@ -0,0 +1,79 @@ +# Flow YAML Schema + +:::{admonition} Experimental feature +This is an experimental feature, and may change at any time. Learn [more](../how-to-guides/faq.md#stable-vs-experimental). +::: + +The source JSON schema can be found at [Flow.schema.json](https://azuremlschemas.azureedge.net/promptflow/latest/Flow.schema.json) + +## YAML syntax + +| Key | Type | Description | +|----------------------------|-----------|-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| +| `$schema` | string | The YAML schema. If you use the Prompt flow VS Code extension to author the YAML file, including `$schema` at the top of your file enables you to invoke schema and resource completions. | +| `inputs` | object | Dictionary of flow inputs. The key is a name for the input within the context of the flow and the value is the flow input definition. | +| `inputs.` | object | The flow input definition. 
See [Flow input](#flow-input) for the set of configurable properties. | +| `outputs` | object | Dictionary of flow outputs. The key is a name for the output within the context of the flow and the value is the flow output definition. | +| `outputs.` | object | The component output definition. See [Flow output](#flow-output) for the set of configurable properties. | +| `nodes` | array | Sets of dictionary of individual nodes to run as steps within the flow. Node can use built-in tool or third-party tool. See [Nodes](#nodes) for more information. | +| `node_variants` | object | Dictionary of nodes with variants. The key is the node name and value contains variants definition and `default_variant_id`. See [Node variants](#node-variants) for more information. | +| `environment` | object | The environment to use for the flow. The key can be `image` or `python_requirements_txt` and the value can be either a image or a python requirements text file. | +| `additional_includes` | array | Additional includes is a list of files that can be shared among flows. Users can specify additional files and folders used by flow, and Prompt flow will help copy them all to the snapshot during flow creation. | + + +### Flow input + +| Key | Type | Description | Allowed values | +|-------------------|-------------------------------------------|------------------------------------------------------|-----------------------------------------------------| +| `type` | string | The type of flow input. | `int`, `double`, `bool`, `string`, `list`, `object` | +| `description` | string | Description of the input. | | +| `default` | int, double, bool, string, list or object | The default value for the input. | | +| `is_chat_input` | boolean | Whether the input is the chat flow input. | | +| `is_chat_history` | boolean | Whether the input is the chat history for chat flow. | | + +### Flow output + +| Key | Type | Description | Allowed values | +|------------------|---------|-------------------------------------------------------------------------------|-----------------------------------------------------| +| `type` | string | The type of flow output. | `int`, `double`, `bool`, `string`, `list`, `object` | +| `description` | string | Description of the output. | | +| `reference` | string | A reference to the node output, e.g. ${.output.} | | +| `is_chat_output` | boolean | Whether the output is the chat flow output. | | + +### Nodes +Nodes is a set of node which is a dictionary with following fields. Below, we only show the common fields of a single node using built-in tool. + +| Key | Type | Description | Allowed values | +|----------------|--------|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|------------------------------------------------------------------------------------------------------| +| `name` | string | The name of the node. | | +| `type` | string | The type of the node. | Type of built-in tool like `Python`, `Prompt`, `LLM` and third-party tool like `Vector Search`, etc. | +| `inputs` | object | Dictionary of node inputs. The key is the input name and the value can be primitive value or a reference to the flow input or the node output, e.g. `${inputs.}`, `${.output}` or `${.output.}` | | +| `source` | object | Dictionary of tool source used by the node. The key contains `type`, `path` and `tool`. 
The type can be `code`, `package` and `package_with_prompt`. | | +| `provider` | string | It indicates the provider of the tool. Used when the `type` is LLM. | `AzureOpenAI` or `OpenAI` | +| `connection` | string | The connection name which has been created before. Used when the `type` is LLM. | | +| `api` | string | The api name of the provider. Used when the `type` is LLM. | | +| `module` | string | The module name of the tool using by the node. Used when the `type` is LLM. | | +| `use_variants` | bool | Whether the node has variants. | | + + +### Node variants +Node variants is a dictionary containing variants definition for nodes with variants with their respective node names as dictionary keys. +Below, we explore the variants for a single node. + +| Key | Type | Description | Allowed values | +|----------------------|----------|---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|----------------| +| `` | string | The name of the node. | | +| `default_variant_id` | string | Default variant id. | | +| `variants ` | object | This dictionary contains all node variations, with the variant id serving as the key and a node definition dictionary as the corresponding value. Within the node definition dictionary, the key labeled 'node' should contain a variant definition similar to [Nodes](#nodes), excluding the 'name' field. | | + + + +## Examples + +Flow examples are available in the [GitHub repository](https://github.com/microsoft/promptflow/tree/main/examples/flows). + +- [basic](https://github.com/microsoft/promptflow/tree/main/examples/flows/standard/basic) +- [web-classification](https://github.com/microsoft/promptflow/tree/main/examples/flows/standard/web-classification) +- [basic-chat](https://github.com/microsoft/promptflow/tree/main/examples/flows/chat/basic-chat) +- [chat-with-pdf](https://github.com/microsoft/promptflow/tree/main/examples/flows/chat/chat-with-pdf) +- [eval-basic](https://github.com/microsoft/promptflow/tree/main/examples/flows/evaluation/eval-basic) \ No newline at end of file diff --git a/docs/reference/index.md b/docs/reference/index.md index b37481d585e..e8eaf140815 100644 --- a/docs/reference/index.md +++ b/docs/reference/index.md @@ -37,4 +37,12 @@ tools-reference/serp-api-tool tools-reference/faiss_index_lookup_tool tools-reference/vector_db_lookup_tool tools-reference/embedding_tool +``` + +```{toctree} +:caption: YAML Schema +:maxdepth: 1 + +flow-yaml-schema-reference.md + ``` \ No newline at end of file diff --git a/examples/flows/chat/chat-with-wikipedia/get_wiki_url.py b/examples/flows/chat/chat-with-wikipedia/get_wiki_url.py index bc4578f1b21..e371fea6d17 100644 --- a/examples/flows/chat/chat-with-wikipedia/get_wiki_url.py +++ b/examples/flows/chat/chat-with-wikipedia/get_wiki_url.py @@ -35,7 +35,7 @@ def get_wiki_url(entity: str, count=2): if mw_divs: # mismatch result_titles = [decode_str(div.get_text().strip()) for div in mw_divs] result_titles = [remove_nested_parentheses(result_title) for result_title in result_titles] - print(f"Could not find {entity}. Similar ententity: {result_titles[:count]}.") + print(f"Could not find {entity}. 
Similar entity: {result_titles[:count]}.") url_list.extend( [f"https://en.wikipedia.org/w/index.php?search={result_title}" for result_title in result_titles] ) diff --git a/examples/flows/standard/basic/hello.py b/examples/flows/standard/basic/hello.py index 95cb1bc94e6..035df3ac801 100644 --- a/examples/flows/standard/basic/hello.py +++ b/examples/flows/standard/basic/hello.py @@ -38,7 +38,7 @@ def my_python_tool( load_dotenv() if "AZURE_OPENAI_API_KEY" not in os.environ: - raise Exception("Please sepecify environment variables: AZURE_OPENAI_API_KEY") + raise Exception("Please specify environment variables: AZURE_OPENAI_API_KEY") conn = dict( api_key=os.environ["AZURE_OPENAI_API_KEY"], diff --git a/examples/flows/standard/web-classification/convert_to_dict.py b/examples/flows/standard/web-classification/convert_to_dict.py index 3b287df5c65..8e9490b801a 100644 --- a/examples/flows/standard/web-classification/convert_to_dict.py +++ b/examples/flows/standard/web-classification/convert_to_dict.py @@ -8,5 +8,5 @@ def convert_to_dict(input_str: str): try: return json.loads(input_str) except Exception as e: - print("input is not valid, error: {}".format(e)) + print("The input is not valid, error: {}".format(e)) return {"category": "None", "evidence": "None"} diff --git a/src/promptflow-tools/promptflow/version.txt b/src/promptflow-tools/promptflow/version.txt index b7cbee9a727..e3c3372d4a2 100644 --- a/src/promptflow-tools/promptflow/version.txt +++ b/src/promptflow-tools/promptflow/version.txt @@ -1 +1 @@ -VERSION = "0.1.0b5" \ No newline at end of file +VERSION = "0.1.0b6" \ No newline at end of file diff --git a/src/promptflow/CHANGELOG.md b/src/promptflow/CHANGELOG.md index ebce9783d26..765eb500c52 100644 --- a/src/promptflow/CHANGELOG.md +++ b/src/promptflow/CHANGELOG.md @@ -1,5 +1,11 @@ # Release History +## 0.1.0b6 (Upcoming) + +### Features Added + +- Add token metrics in run properties + ## 0.1.0b5 (2023.09.08) ### Features Added diff --git a/src/promptflow/promptflow/_core/_errors.py b/src/promptflow/promptflow/_core/_errors.py index bfb7023f18a..b89482ba09a 100644 --- a/src/promptflow/promptflow/_core/_errors.py +++ b/src/promptflow/promptflow/_core/_errors.py @@ -121,3 +121,11 @@ class MetaFileNotFound(GenerateMetaUserError): class MetaFileReadError(GenerateMetaUserError): pass + + +class RunRecordNotFound(SystemErrorException): + pass + + +class FlowOutputUnserializable(UserErrorException): + pass diff --git a/src/promptflow/promptflow/_core/run_tracker.py b/src/promptflow/promptflow/_core/run_tracker.py index e278e216396..ec66b889fba 100644 --- a/src/promptflow/promptflow/_core/run_tracker.py +++ b/src/promptflow/promptflow/_core/run_tracker.py @@ -8,6 +8,7 @@ from types import GeneratorType from typing import Any, Dict, List, Mapping, Optional, Union +from promptflow._core._errors import FlowOutputUnserializable, RunRecordNotFound from promptflow._core.log_manager import NodeLogManager from promptflow._core.thread_local_singleton import ThreadLocalSingleton from promptflow._utils.dataclass_serializer import serialize @@ -16,7 +17,7 @@ from promptflow.contracts.run_info import FlowRunInfo, RunInfo, Status from promptflow.contracts.run_mode import RunMode from promptflow.contracts.tool import ConnectionType -from promptflow.exceptions import ErrorTarget, UserErrorException, ValidationException +from promptflow.exceptions import ErrorTarget from promptflow.storage import AbstractRunStorage from promptflow.storage._run_storage import DummyRunStorage @@ -216,7 +217,14 @@ def end_run( 
): run_info = self._flow_runs.get(run_id) or self._node_runs.get(run_id) if run_info is None: - raise RunRecordNotFound(message=f"Run {run_id} not found", target=ErrorTarget.RUN_TRACKER) + raise RunRecordNotFound( + message_format=( + "Run record with ID '{run_id}' was not tracked in promptflow execution. " + "Please contact support for further assistance." + ), + target=ErrorTarget.RUN_TRACKER, + run_id=run_id, + ) if isinstance(run_info, FlowRunInfo): self._flow_run_postprocess(run_info, result, ex) elif isinstance(run_info, RunInfo): @@ -247,14 +255,26 @@ def _ensure_inputs_is_json_serializable(self, inputs: dict, node_name: str) -> d } def _assert_flow_output_serializable(self, output: Any) -> Any: - try: - return {k: self._ensure_serializable_value(v) for k, v in output.items()} - except Exception as e: - # If it is flow output not node output, raise an exception. - raise UserErrorException( - f"Flow output must be json serializable, dump json failed: {e}", - target=ErrorTarget.FLOW_EXECUTOR, - ) from e + serializable_output = {} + for k, v in output.items(): + try: + serializable_output[k] = self._ensure_serializable_value(v) + except Exception as e: + # If a specific key-value pair is not serializable, raise an exception with the key. + error_type_and_message = f"({e.__class__.__name__}) {e}" + message_format = ( + "The output '{output_name}' for flow is incorrect. The output value is not JSON serializable. " + "JSON dump failed: {error_type_and_message}. Please verify your flow output and " + "make sure the value serializable." + ) + raise FlowOutputUnserializable( + message_format=message_format, + target=ErrorTarget.FLOW_EXECUTOR, + output_name=k, + error_type_and_message=error_type_and_message, + ) from e + + return serializable_output def _enrich_run_info_with_exception(self, run_info: Union[RunInfo, FlowRunInfo], ex: Exception): """Update exception details into run info.""" @@ -283,7 +303,12 @@ def ensure_run_info(self, run_id: str) -> Union[RunInfo, FlowRunInfo]: run_info = self._node_runs.get(run_id) or self._flow_runs.get(run_id) if run_info is None: raise RunRecordNotFound( - message=f"Run {run_id} not found when tracking inputs", target=ErrorTarget.RUN_TRACKER + message_format=( + "Run record with ID '{run_id}' was not tracked in promptflow execution. " + "Please contact support for further assistance." 
+ ), + target=ErrorTarget.RUN_TRACKER, + run_id=run_id, ) return run_info @@ -390,7 +415,3 @@ def get_status_summary(self, run_id: str): def persist_status_summary(self, status_summary: Dict[str, int], run_id: str): self._storage.persist_status_summary(status_summary, run_id) - - -class RunRecordNotFound(ValidationException): - pass diff --git a/src/promptflow/promptflow/_sdk/_constants.py b/src/promptflow/promptflow/_sdk/_constants.py index 1f080e2fcc8..161dacfcdad 100644 --- a/src/promptflow/promptflow/_sdk/_constants.py +++ b/src/promptflow/promptflow/_sdk/_constants.py @@ -140,6 +140,7 @@ class FlowRunProperties: OUTPUT_PATH = "output_path" NODE_VARIANT = "node_variant" RUN = "run" + SYSTEM_METRICS = "system_metrics" class CommonYamlFields: diff --git a/src/promptflow/promptflow/_sdk/_orm/run_info.py b/src/promptflow/promptflow/_sdk/_orm/run_info.py index 53efdfdddcb..67aef235776 100644 --- a/src/promptflow/promptflow/_sdk/_orm/run_info.py +++ b/src/promptflow/promptflow/_sdk/_orm/run_info.py @@ -11,7 +11,12 @@ from sqlalchemy.exc import IntegrityError from sqlalchemy.orm import declarative_base -from promptflow._sdk._constants import RUN_INFO_CREATED_ON_INDEX_NAME, RUN_INFO_TABLENAME, ListViewType +from promptflow._sdk._constants import ( + RUN_INFO_CREATED_ON_INDEX_NAME, + RUN_INFO_TABLENAME, + FlowRunProperties, + ListViewType, +) from promptflow._sdk._errors import RunExistsError, RunNotFoundError from .retry import sqlite_retry @@ -89,6 +94,7 @@ def update( tags: Optional[Dict[str, str]] = None, start_time: Optional[Union[str, datetime.datetime]] = None, end_time: Optional[Union[str, datetime.datetime]] = None, + system_metrics: Optional[Dict[str, int]] = None, ) -> None: update_dict = {} if status is not None: @@ -110,7 +116,20 @@ def update( self.end_time = end_time if isinstance(end_time, str) else end_time.isoformat() update_dict["end_time"] = self.end_time with mgmt_db_session() as session: - session.query(RunInfo).filter(RunInfo.name == self.name).update(update_dict) + # if not update system metrics, we can directly update the row; + # otherwise, we need to get properties first, update the dict and finally update the row + if system_metrics is None: + session.query(RunInfo).filter(RunInfo.name == self.name).update(update_dict) + else: + # with high concurrency on same row, we may lose the earlier commit + # we regard it acceptable as it should be an edge case to update properties + # on same row with high concurrency; + # if it's a concern, we can move those properties to an extra column + run_info = session.query(RunInfo).filter(RunInfo.name == self.name).first() + props = json.loads(run_info.properties) + props[FlowRunProperties.SYSTEM_METRICS] = system_metrics.copy() + update_dict["properties"] = json.dumps(props) + session.query(RunInfo).filter(RunInfo.name == self.name).update(update_dict) session.commit() @staticmethod diff --git a/src/promptflow/promptflow/_sdk/entities/_run.py b/src/promptflow/promptflow/_sdk/entities/_run.py index 6094afd8945..602e1051cc0 100644 --- a/src/promptflow/promptflow/_sdk/entities/_run.py +++ b/src/promptflow/promptflow/_sdk/entities/_run.py @@ -194,6 +194,7 @@ def _from_orm_object(cls, obj: ORMRun) -> "Run": end_time=datetime.datetime.fromisoformat(str(obj.end_time)) if obj.end_time else None, status=str(obj.status), data=Path(obj.data).resolve().absolute().as_posix() if obj.data else None, + properties={FlowRunProperties.SYSTEM_METRICS: properties_json.get(FlowRunProperties.SYSTEM_METRICS, {})}, ) @classmethod diff --git 
a/src/promptflow/promptflow/_sdk/operations/_run_operations.py b/src/promptflow/promptflow/_sdk/operations/_run_operations.py index c06811a6b12..b18758e16ad 100644 --- a/src/promptflow/promptflow/_sdk/operations/_run_operations.py +++ b/src/promptflow/promptflow/_sdk/operations/_run_operations.py @@ -232,6 +232,7 @@ def _visualize(self, runs: List[Run], html_path: Optional[str] = None) -> None: metadata = RunMetadata( name=run.name, display_name=run.display_name, + create_time=run.created_on, tags=run.tags, lineage=run.run, metrics=self.get_metrics(name=run.name), diff --git a/src/promptflow/promptflow/_sdk/operations/_run_submitter.py b/src/promptflow/promptflow/_sdk/operations/_run_submitter.py index 35d9160277c..39eda5b976c 100644 --- a/src/promptflow/promptflow/_sdk/operations/_run_submitter.py +++ b/src/promptflow/promptflow/_sdk/operations/_run_submitter.py @@ -310,15 +310,17 @@ def _submit_bulk_run(self, flow: Flow, run: Run, local_storage: LocalStorageOper local_storage.dump_snapshot(flow) local_storage.dump_inputs(mapped_inputs) # result: outputs and metrics - # TODO: retrieve root run system metrics from executor return, we might store it in db local_storage.persist_result(bulk_result) - + # exceptions local_storage.dump_exception(exception=exception, bulk_results=bulk_result) + # system metrics: token related + system_metrics = bulk_result.get_openai_metrics() self.run_operations.update( name=run.name, status=status, end_time=datetime.datetime.now(), + system_metrics=system_metrics, ) def _resolve_data(self, run: Run): diff --git a/src/promptflow/promptflow/contracts/_run_management.py b/src/promptflow/promptflow/contracts/_run_management.py index 558570660a0..d2d974cffc5 100644 --- a/src/promptflow/promptflow/contracts/_run_management.py +++ b/src/promptflow/promptflow/contracts/_run_management.py @@ -19,6 +19,7 @@ class RunDetail: class RunMetadata: name: str display_name: str + create_time: str tags: Optional[List[Dict[str, str]]] lineage: Optional[str] metrics: Optional[Dict[str, Any]] diff --git a/src/promptflow/promptflow/executor/flow_executor.py b/src/promptflow/promptflow/executor/flow_executor.py index 6d2d992e389..c5875650877 100644 --- a/src/promptflow/promptflow/executor/flow_executor.py +++ b/src/promptflow/promptflow/executor/flow_executor.py @@ -741,7 +741,7 @@ def _extract_outputs(self, nodes_outputs, bypassed_nodes, flow_inputs): if not node: raise OutputReferenceNotExist( message_format=( - "Flow is defined incorrectly. The node '{node_name}' " + "The output '{output_name}' for flow is incorrect. The node '{node_name}' " "referenced by the output '{output_name}' can not found in flow. " "Please rectify the error in your flow and try again." ), diff --git a/src/promptflow/promptflow/executor/flow_validator.py b/src/promptflow/promptflow/executor/flow_validator.py index 9ef63b9e4f4..05abe4c4b18 100644 --- a/src/promptflow/promptflow/executor/flow_validator.py +++ b/src/promptflow/promptflow/executor/flow_validator.py @@ -33,11 +33,14 @@ def _ensure_nodes_order(flow: Flow): if i.value_type != InputValueType.NODE_REFERENCE: continue if i.value not in dependencies: - msg = ( - f"Node '{n.name}' references a non-existent node '{i.value}' in your flow. " - f"Please review your flow to ensure that the node name is accurately specified." + msg_format = ( + "Invalid node definitions found in the flow graph. Node '{node_name}' references " + "a non-existent node '{reference_node_name}' in your flow. 
Please review your flow to " + "ensure that the node name is accurately specified." + ) + raise NodeReferenceNotFound( + message_format=msg_format, node_name=n.name, reference_node_name=i.value ) - raise NodeReferenceNotFound(message=msg) dependencies[n.name].add(i.value) sorted_nodes = [] picked = set() @@ -50,9 +53,12 @@ def _ensure_nodes_order(flow: Flow): # Figure out the nodes names with circular dependency problem alphabetically remaining_nodes = sorted(list(set(dependencies.keys()) - picked)) raise NodeCircularDependency( - message=f"Node circular dependency has been detected among the nodes in your flow. " - f"Kindly review the reference relationships for the nodes {remaining_nodes} " - f"and resolve the circular reference issue in the flow." + message_format=( + "Invalid node definitions found in the flow graph. Node circular dependency has been detected " + "among the nodes in your flow. Kindly review the reference relationships for the nodes " + "{remaining_nodes} and resolve the circular reference issue in the flow." + ), + remaining_nodes=remaining_nodes, ) sorted_nodes.append(node_to_pick) picked.add(node_to_pick.name) @@ -73,9 +79,12 @@ def _validate_nodes_topology(flow: Flow) -> Flow: for node in flow.nodes: if node.name in node_names: raise DuplicateNodeName( - message=f"Node with name '{node.name}' appears more than once in the node definitions in your " - f"flow, which is not allowed. To address this issue, please review your " - f"flow and either rename or remove nodes with identical names.", + message_format=( + "Invalid node definitions found in the flow graph. Node with name '{node_name}' appears " + "more than once in the node definitions in your flow, which is not allowed. To address " + "this issue, please review your flow and either rename or remove nodes with identical names." + ), + node_name=node.name, ) node_names.add(node.name) for node in flow.nodes: @@ -83,13 +92,15 @@ def _validate_nodes_topology(flow: Flow) -> Flow: if v.value_type != InputValueType.FLOW_INPUT: continue if v.value not in flow.inputs: - msg = ( - f"Node '{node.name}' references flow input '{v.value}' which is not defined in your " - f"flow. To resolve this issue, please review your flow, " - f"ensuring that you either add the missing flow inputs or adjust node reference " - f"to the correct flow input." + msg_format = ( + "Invalid node definitions found in the flow graph. Node '{node_name}' references flow input " + "'{flow_input_name}' which is not defined in your flow. To resolve this issue, " + "please review your flow, ensuring that you either add the missing flow inputs " + "or adjust node reference to the correct flow input." + ) + raise InputReferenceNotFound( + message_format=msg_format, node_name=node.name, flow_input_name=v.value ) - raise InputReferenceNotFound(message=msg) return FlowValidator._ensure_nodes_order(flow) @staticmethod @@ -107,11 +118,14 @@ def resolve_flow_inputs_type(flow: Flow, inputs: Mapping[str, Any], idx: Optiona updated_inputs[k] = v.type.parse(inputs[k]) except Exception as e: line_info = "" if idx is None else f"in line {idx} of input data" - msg = ( - f"The value '{inputs[k]}' for flow input '{k}' {line_info} does not match the expected type " - f"'{v.type}'. Please review the input data or adjust the input type of '{k}' in your flow." + msg_format = ( + "The input for flow is incorrect. The value for flow input '{flow_input_name}' {line_info} " + "does not match the expected type '{expected_type}'. 
Please change flow input type " + "or adjust the input value in your input data." ) - raise InputTypeError(message=msg) from e + raise InputTypeError( + message_format=msg_format, flow_input_name=k, line_info=line_info, expected_type=v.type + ) from e return updated_inputs @staticmethod @@ -125,11 +139,12 @@ def ensure_flow_inputs_type(flow: Flow, inputs: Mapping[str, Any], idx: Optional for k, v in flow.inputs.items(): if k not in inputs: line_info = "in input data" if idx is None else f"in line {idx} of input data" - msg = ( - f"The value for flow input '{k}' is not provided {line_info}. " - f"Please review your input data or remove this input in your flow if it's no longer needed." + msg_format = ( + "The input for flow is incorrect. The value for flow input '{input_name}' is not " + "provided {line_info}. Please review your input data or remove this input in your flow " + "if it's no longer needed." ) - raise InputNotFound(message=msg) + raise InputNotFound(message_format=msg_format, input_name=k, line_info=line_info) return FlowValidator.resolve_flow_inputs_type(flow, inputs, idx) @staticmethod @@ -145,23 +160,36 @@ def convert_flow_inputs_for_node(flow: Flow, node: Node, inputs: Mapping[str, An for k, v in node.inputs.items(): if v.value_type == InputValueType.FLOW_INPUT: if v.value not in flow.inputs: - flow_input_keys = ", ".join(flow.inputs.keys()) if flow.inputs is not None else None raise InputNotFound( - message=f"Node input {k} is not found in flow input '{flow_input_keys}' for node" + message_format=( + "The input for node is incorrect. Node input '{node_input_name}' is not found " + "from flow inputs of node '{node_name}'. Please review the node definition in your flow." + ), + node_input_name=v.value, + node_name=node.name, ) if v.value not in inputs: - input_keys = ", ".join(inputs.keys()) raise InputNotFound( - message=f"Node input {k} is not found in input data with keys of '{input_keys}' for node" + message_format=( + "The input for node is incorrect. Node input '{node_input_name}' is not found " + "in input data for node '{node_name}'. Please verify the inputs data for the node." + ), + node_input_name=v.value, + node_name=node.name, ) try: updated_inputs[v.value] = flow.inputs[v.value].type.parse(inputs[v.value]) except Exception as e: - msg = ( - f"Input '{k}' for node '{node.name}' of value '{inputs[v.value]}' " - f"is not type '{flow.inputs[v.value].type}'." + msg_format = ( + "The input for node is incorrect. Value for input '{input_name}' of node '{node_name}' " + "is not type '{expected_type}'. Please review and rectify the input data." ) - raise InputTypeError(message=msg) from e + raise InputTypeError( + message_format=msg_format, + input_name=k, + node_name=node.name, + expected_type=flow.inputs[v.value].type, + ) from e return updated_inputs @staticmethod @@ -169,27 +197,30 @@ def _ensure_outputs_valid(flow: Flow): updated_outputs = {} for k, v in flow.outputs.items(): if v.reference.value_type == InputValueType.LITERAL and v.reference.value == "": - msg = ( - f"The reference is not specified for the output '{k}' in the flow. " - f"To rectify this, ensure that you accurately specify the reference in the flow." + msg_format = ( + "The output '{output_name}' for flow is incorrect. The reference is not specified for " + "the output '{output_name}' in the flow. To rectify this, " + "ensure that you accurately specify the reference in the flow." 
) - raise EmptyOutputReference(message=msg) + raise EmptyOutputReference(message_format=msg_format, output_name=k) if v.reference.value_type == InputValueType.FLOW_INPUT and v.reference.value not in flow.inputs: - msg = ( - f"The output '{k}' references non-existent flow input '{v.reference.value}' in your flow. " - f"please carefully review your flow " - f"and correct the reference definition for the output in question." + msg_format = ( + "The output '{output_name}' for flow is incorrect. The output '{output_name}' references " + "non-existent flow input '{flow_input_name}' in your flow. Please carefully review your flow and " + "correct the reference definition for the output in question." + ) + raise OutputReferenceNotFound( + message_format=msg_format, output_name=k, flow_input_name=v.reference.value ) - raise OutputReferenceNotFound(message=msg) if v.reference.value_type == InputValueType.NODE_REFERENCE: node = flow.get_node(v.reference.value) if node is None: - msg = ( - f"The output '{k}' references non-existent node '{v.reference.value}' in your flow. " - f"To resolve this issue, please carefully review your flow " - f"and correct the reference definition for the output in question." + msg_format = ( + "The output '{output_name}' for flow is incorrect. The output '{output_name}' references " + "non-existent node '{node_name}' in your flow. To resolve this issue, please carefully review " + "your flow and correct the reference definition for the output in question." ) - raise OutputReferenceNotFound(message=msg) + raise OutputReferenceNotFound(message_format=msg_format, output_name=k, node_name=v.reference.value) if node.aggregation: msg = f"Output '{k}' references a reduce node '{v.reference.value}', will not take effect." logger.warning(msg) diff --git a/src/promptflow/tests/executor/e2etests/test_executor_validation.py b/src/promptflow/tests/executor/e2etests/test_executor_validation.py index 2465047758c..f0077f72f97 100644 --- a/src/promptflow/tests/executor/e2etests/test_executor_validation.py +++ b/src/promptflow/tests/executor/e2etests/test_executor_validation.py @@ -3,6 +3,7 @@ import pytest +from promptflow._core._errors import FlowOutputUnserializable from promptflow._core.tool_meta_generator import PythonParsingError from promptflow._core.tools_manager import APINotFound from promptflow.contracts._errors import FailedToImportModule @@ -30,19 +31,108 @@ @pytest.mark.usefixtures("use_secrets_config_file", "dev_connections") @pytest.mark.e2etest class TestValidation: + @pytest.mark.parametrize( + "flow_folder, yml_file, error_class, error_msg", + [ + ( + "nodes_names_duplicated", + "flow.dag.yaml", + DuplicateNodeName, + ( + "Invalid node definitions found in the flow graph. Node with name 'stringify_num' appears more " + "than once in the node definitions in your flow, which is not allowed. To " + "address this issue, please review your flow and either rename or remove " + "nodes with identical names." + ), + ), + ( + "source_file_missing", + "flow.dag.jinja.yaml", + InvalidSource, + ( + "Node source path 'summarize_text_content__variant_1.jinja2' is invalid on " + "node 'summarize_text_content'." + ), + ), + ( + "node_reference_not_found", + "flow.dag.yaml", + NodeReferenceNotFound, + ( + "Invalid node definitions found in the flow graph. Node 'divide_num_2' references a non-existent " + "node 'divide_num_3' in your flow. Please review your flow to ensure that the " + "node name is accurately specified." 
+ ), + ), + ( + "node_circular_dependency", + "flow.dag.yaml", + NodeCircularDependency, + ( + "Invalid node definitions found in the flow graph. Node circular dependency has been detected " + "among the nodes in your flow. Kindly review the reference relationships for " + "the nodes ['divide_num', 'divide_num_1', 'divide_num_2'] and resolve the " + "circular reference issue in the flow." + ), + ), + ( + "flow_input_reference_invalid", + "flow.dag.yaml", + InputReferenceNotFound, + ( + "Invalid node definitions found in the flow graph. Node 'divide_num' references flow input 'num_1' " + "which is not defined in your flow. To resolve this issue, please review your " + "flow, ensuring that you either add the missing flow inputs or adjust node " + "reference to the correct flow input." + ), + ), + ( + "flow_output_reference_invalid", + "flow.dag.yaml", + EmptyOutputReference, + ( + "The output 'content' for flow is incorrect. The reference is not specified for the output " + "'content' in the flow. To rectify this, ensure that you accurately specify " + "the reference in the flow." + ), + ), + ( + "outputs_reference_not_valid", + "flow.dag.yaml", + OutputReferenceNotFound, + ( + "The output 'content' for flow is incorrect. The output 'content' references non-existent " + "node 'another_stringify_num' in your flow. To resolve this issue, please " + "carefully review your flow and correct the reference definition for the " + "output in question." + ), + ), + ( + "outputs_with_invalid_flow_inputs_ref", + "flow.dag.yaml", + OutputReferenceNotFound, + ( + "The output 'num' for flow is incorrect. The output 'num' references non-existent flow " + "input 'num11' in your flow. Please carefully review your flow and correct " + "the reference definition for the output in question." 
+ ), + ), + ], + ) + def test_executor_create_failure_type_and_message( + self, flow_folder, yml_file, error_class, error_msg, dev_connections + ): + with pytest.raises(error_class) as exc_info: + FlowExecutor.create(get_yaml_file(flow_folder, WRONG_FLOW_ROOT, yml_file), dev_connections) + assert error_msg == exc_info.value.message + @pytest.mark.parametrize( "flow_folder, yml_file, error_class", [ - ("nodes_names_duplicated", "flow.dag.yaml", DuplicateNodeName), ("source_file_missing", "flow.dag.python.yaml", PythonParsingError), - ("source_file_missing", "flow.dag.jinja.yaml", InvalidSource), - ("node_reference_not_found", "flow.dag.yaml", NodeReferenceNotFound), - ("node_circular_dependency", "flow.dag.yaml", NodeCircularDependency), - ("flow_input_reference_invalid", "flow.dag.yaml", InputReferenceNotFound), - ("flow_output_reference_invalid", "flow.dag.yaml", EmptyOutputReference), ], ) - def test_executor_create(self, flow_folder, yml_file, error_class, dev_connections): + def test_executor_create_failure_type(self, flow_folder, yml_file, error_class, dev_connections): with pytest.raises(error_class): FlowExecutor.create(get_yaml_file(flow_folder, WRONG_FLOW_ROOT, yml_file), dev_connections) @@ -62,8 +152,6 @@ def test_node_topology_in_order(self, ordered_flow_folder, unordered_flow_folder @pytest.mark.parametrize( "flow_folder, error_class", [ - ("outputs_reference_not_valid", OutputReferenceNotFound), - ("outputs_with_invalid_flow_inputs_ref", OutputReferenceNotFound), ("invalid_connection", ConnectionNotFound), ("tool_type_missing", NotImplementedError), ("wrong_module", FailedToImportModule), @@ -88,6 +176,27 @@ def test_flow_run_input_type_invalid(self, flow_folder, line_input, error_class, with pytest.raises(error_class): executor.exec_line(line_input) + @pytest.mark.parametrize( + "flow_folder, line_input, error_class, error_msg", + [ + ( + "flow_output_unserializable", + {"num": "22"}, + FlowOutputUnserializable, + ( + "The output 'content' for flow is incorrect. The output value is not JSON serializable. " + "JSON dump failed: (TypeError) Object of type UnserializableClass is not JSON serializable. " + "Please verify your flow output and make sure the value serializable." + ), + ), + ], + ) + def test_flow_run_execution_errors(self, flow_folder, line_input, error_class, error_msg, dev_connections): + executor = FlowExecutor.create(get_yaml_file(flow_folder, WRONG_FLOW_ROOT), dev_connections) + # For now, there exception is designed to be swallowed in executor. But Run Info would have the error details + res = executor.exec_line(line_input) + assert error_msg == res.run_info.error["message"] + @pytest.mark.parametrize( "flow_folder, batch_input, error_message, error_class", [ @@ -104,8 +213,9 @@ def test_flow_run_input_type_invalid(self, flow_folder, line_input, error_class, "simple_flow_with_python_tool", [{"num": "hello"}], ( - "The value 'hello' for flow input 'num' in line 0 of input data does not match the expected " - "type 'int'. Please review the input data or adjust the input type of 'num' in your flow." + "The input for flow is incorrect. The value for flow input 'num' in line 0 of input data does not " + "match the expected type 'int'. Please change flow input type or adjust the input value in " + "your input data." 
), "InputTypeError", ), @@ -136,18 +246,48 @@ def test_bulk_run_input_type_invalid(self, flow_folder, batch_input, error_messa ), f"Expected message {error_class} but got {str(bulk_result.line_results[0].run_info.error)}" @pytest.mark.parametrize( - "path_root, flow_folder, node_name, line_input, error_class", + "path_root, flow_folder, node_name, line_input, error_class, error_msg", [ - (FLOW_ROOT, "simple_flow_with_python_tool", "divide_num", {"num11": "22"}, InputNotFound), - (FLOW_ROOT, "simple_flow_with_python_tool", "divide_num", {"num": "hello"}, InputTypeError), - (WRONG_FLOW_ROOT, "flow_input_reference_invalid", "divide_num", {"num": "22"}, InputNotFound), + ( + FLOW_ROOT, + "simple_flow_with_python_tool", + "divide_num", + {"num11": "22"}, + InputNotFound, + ( + "The input for node is incorrect. Node input 'num' is not found in input data " + "for node 'divide_num'. Please verify the inputs data for the node." + ), + ), + ( + FLOW_ROOT, + "simple_flow_with_python_tool", + "divide_num", + {"num": "hello"}, + InputTypeError, + ( + "The input for node is incorrect. Value for input 'num' of node 'divide_num' " + "is not type 'int'. Please review and rectify the input data." + ), + ), + ( + WRONG_FLOW_ROOT, + "flow_input_reference_invalid", + "divide_num", + {"num": "22"}, + InputNotFound, + ( + "The input for node is incorrect. Node input 'num_1' is not found from flow " + "inputs of node 'divide_num'. Please review the node definition in your flow." + ), + ), ], ) def test_single_node_input_type_invalid( - self, path_root: str, flow_folder, node_name, line_input, error_class, dev_connections + self, path_root: str, flow_folder, node_name, line_input, error_class, error_msg, dev_connections ): # Single Node run - the inputs are from flow_inputs + dependency_nodes_outputs - with pytest.raises(error_class): + with pytest.raises(error_class) as exe_info: FlowExecutor.load_and_exec_node( flow_file=get_yaml_file(flow_folder, path_root), node_name=node_name, @@ -157,6 +297,8 @@ def test_single_node_input_type_invalid( raise_ex=True, ) + assert error_msg == exe_info.value.message + @pytest.mark.parametrize( "flow_folder, msg", [ diff --git a/src/promptflow/tests/executor/unittests/executor/test_flow_validator.py b/src/promptflow/tests/executor/unittests/executor/test_flow_validator.py index 226eb0f01b5..1fbb2bd6d92 100644 --- a/src/promptflow/tests/executor/unittests/executor/test_flow_validator.py +++ b/src/promptflow/tests/executor/unittests/executor/test_flow_validator.py @@ -32,34 +32,32 @@ def test_ensure_nodes_order(self, flow_folder, expected_node_order): ( "nodes_cycle", ( - "Node circular dependency has been detected among the nodes in your flow. " - "Kindly review the reference relationships for the nodes " - "['first_node', 'second_node'] and resolve the circular reference issue in " - "the flow." + "Invalid node definitions found in the flow graph. Node circular dependency has been detected " + "among the nodes in your flow. Kindly review the reference relationships for the nodes " + "['first_node', 'second_node'] and resolve the circular reference issue in the flow." ), ), ( "nodes_cycle_with_skip", ( - "Node circular dependency has been detected among the nodes in your flow. " - "Kindly review the reference relationships for the " - "nodes ['first_node', 'second_node'] and resolve the circular reference issue " - "in the flow." + "Invalid node definitions found in the flow graph. Node circular dependency has been detected " + "among the nodes in your flow. 
Kindly review the reference relationships for the nodes " + "['first_node', 'second_node'] and resolve the circular reference issue in the flow." ), ), ( "nodes_cycle_with_activate", ( - "Node circular dependency has been detected among the nodes in your flow. " - "Kindly review the reference relationships for the nodes ['first_node', " - "'second_node'] and resolve the circular reference issue in the flow." + "Invalid node definitions found in the flow graph. Node circular dependency has been detected " + "among the nodes in your flow. Kindly review the reference relationships " + "for the nodes ['first_node', 'second_node'] and resolve the circular reference issue in the flow." ), ), ( "wrong_node_reference", ( - "Node 'second_node' references a non-existent node 'third_node' in your flow. " - "Please review your flow to ensure that the node " + "Invalid node definitions found in the flow graph. Node 'second_node' references a non-existent " + "node 'third_node' in your flow. Please review your flow to ensure that the node " "name is accurately specified." ), ), diff --git a/src/promptflow/tests/sdk_cli_azure_test/e2etests/test_run_operations.py b/src/promptflow/tests/sdk_cli_azure_test/e2etests/test_run_operations.py index fa98392095c..75f17d69116 100644 --- a/src/promptflow/tests/sdk_cli_azure_test/e2etests/test_run_operations.py +++ b/src/promptflow/tests/sdk_cli_azure_test/e2etests/test_run_operations.py @@ -543,6 +543,7 @@ def submit(*args, **kwargs): data=f"{DATAS_DIR}/env_var_names.jsonl", ) + @pytest.mark.skip(reason="temporarily disable this for service-side error.") def test_automatic_runtime_creation_failure(self, pf): with pytest.raises(FlowRequestException) as e: diff --git a/src/promptflow/tests/sdk_cli_test/e2etests/test_flow_run.py b/src/promptflow/tests/sdk_cli_test/e2etests/test_flow_run.py index 083cb1b1915..bbae2db33ae 100644 --- a/src/promptflow/tests/sdk_cli_test/e2etests/test_flow_run.py +++ b/src/promptflow/tests/sdk_cli_test/e2etests/test_flow_run.py @@ -7,7 +7,7 @@ from promptflow import PFClient from promptflow._constants import PROMPTFLOW_CONNECTIONS -from promptflow._sdk._constants import LocalStorageFilenames, RunStatus +from promptflow._sdk._constants import FlowRunProperties, LocalStorageFilenames, RunStatus from promptflow._sdk._errors import InvalidFlowError, RunExistsError, RunNotFoundError from promptflow._sdk._run_functions import create_yaml_run from promptflow._sdk._utils import _get_additional_includes @@ -680,3 +680,9 @@ def test_error_message_dump(self, pf): run_dict = run._to_dict() assert "error" in run_dict assert run_dict["error"] == exception + + def test_system_metrics_in_properties(self, pf) -> None: + run = create_run_against_multi_line_data(pf) + assert FlowRunProperties.SYSTEM_METRICS in run.properties + assert isinstance(run.properties[FlowRunProperties.SYSTEM_METRICS], dict) + assert "total_tokens" in run.properties[FlowRunProperties.SYSTEM_METRICS] diff --git a/src/promptflow/tests/test_configs/wrong_flows/flow_output_unserializable/divide_num.py b/src/promptflow/tests/test_configs/wrong_flows/flow_output_unserializable/divide_num.py new file mode 100644 index 00000000000..576002520ae --- /dev/null +++ b/src/promptflow/tests/test_configs/wrong_flows/flow_output_unserializable/divide_num.py @@ -0,0 +1,14 @@ +from promptflow import tool + + +@tool +def divide_num(num: int): + return UnserializableClass(num=(int)(num / 2)) + + +class UnserializableClass: + def __init__(self, num: int): + self.num = num + + def __str__(self): + return 
str(self.num) \ No newline at end of file diff --git a/src/promptflow/tests/test_configs/wrong_flows/flow_output_unserializable/flow.dag.yaml b/src/promptflow/tests/test_configs/wrong_flows/flow_output_unserializable/flow.dag.yaml new file mode 100644 index 00000000000..e2499fee0f1 --- /dev/null +++ b/src/promptflow/tests/test_configs/wrong_flows/flow_output_unserializable/flow.dag.yaml @@ -0,0 +1,17 @@ +inputs: + num: + type: int +outputs: + content: + type: string + reference: ${divide_num.output} +nodes: +- name: divide_num + type: python + source: + type: code + path: divide_num.py + inputs: + num: ${inputs.num} + +