[DJM] Document custom host tags env variables #28577

Merged · 12 commits · Apr 11, 2025
18 changes: 16 additions & 2 deletions content/en/data_jobs/databricks.md
@@ -74,6 +74,16 @@ Datadog can install and manage a global init script in the Databricks workspace.
1. Click **Save Databricks Workspace** at the bottom of the browser window.
{{< img src="data_jobs/databricks/configure-data-jobs-monitoring-existing.png" alt="In the Datadog-Databricks integration tile, Datadog Agent Setup for a Databricks workspace already added to the integration. Datadog can install and manage a global init script." style="width:100%;" >}}

Optionally, you can add tags to your Databricks cluster and Spark performance metrics by configuring the following environment variables in the Advanced Configuration section of your cluster in the Databricks UI, or as [Spark env vars][2] with the Databricks API:

| Variable | Description |
|--------------------------|------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| DD_TAGS | Add tags to Databricks cluster and Spark performance metrics. Comma- or space-separated `key:value` pairs. Follow [Datadog tag conventions][1]. Example: `env:staging,team:data_engineering` |
| DD_ENV | Set the `env` environment tag on metrics, traces, and logs from this cluster. |

[1]: /getting_started/tagging/
[2]: https://docs.databricks.com/api/workspace/clusters/edit#spark_env_vars
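
For example, a minimal sketch of setting these variables through the Databricks clusters API. The workspace URL, token, and cluster values are placeholders, and `clusters/edit` replaces the whole cluster spec, so carry over your existing cluster settings alongside `spark_env_vars`:

```
# Sketch only: update a cluster's Spark environment variables.
# <your-workspace>, <cluster-id>, and the cluster settings are placeholders.
curl -X POST "https://<your-workspace>.cloud.databricks.com/api/2.1/clusters/edit" \
  -H "Authorization: Bearer $DATABRICKS_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
        "cluster_id": "<cluster-id>",
        "spark_version": "<existing-spark-version>",
        "node_type_id": "<existing-node-type>",
        "num_workers": 2,
        "spark_env_vars": {
          "DD_TAGS": "env:staging,team:data_engineering",
          "DD_ENV": "staging"
        }
      }'
```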

{{% /tab %}}

{{% tab "Manually install a global init script" %}}
@@ -125,11 +135,13 @@ Optionally, you can also set other init script parameters and Datadog environmen
| DATABRICKS_WORKSPACE | Name of your Databricks Workspace. It should match the name provided in the [Datadog-Databricks integration step](#configure-the-datadog-databricks-integration). Enclose the name in double quotes if it contains whitespace. | |
| DRIVER_LOGS_ENABLED | Collect Spark driver logs in Datadog. | false |
| WORKER_LOGS_ENABLED | Collect Spark worker logs in Datadog. | false |
| DD_DJM_ADD_LOGS_TO_FAILURE_REPORT | Include init script logs for debugging when reporting a failure back to Datadog. | false |
| DD_TAGS | Add tags to Databricks cluster and Spark performance metrics. Comma- or space-separated `key:value` pairs. Follow [Datadog tag conventions][4]. Example: `env:staging,team:data_engineering` | |
| DD_ENV | Set the `env` environment tag on metrics, traces, and logs from this cluster. | |

[1]: https://app.datadoghq.com/organization-settings/api-keys
[2]: /getting_started/site/
[3]: https://github.com/DataDog/datadog-agent/blob/main/pkg/fleet/installer/setup/djm/databricks.go
[4]: /getting_started/tagging/
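
For example, a sketch of these parameters as `KEY=value` environment variable entries on the cluster. The values are illustrative only; set them wherever the init script reads its environment:

```
# Illustrative values only:
DATABRICKS_WORKSPACE="My Workspace"
DRIVER_LOGS_ENABLED=true
WORKER_LOGS_ENABLED=true
DD_TAGS=env:staging,team:data_engineering
DD_ENV=staging
```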

{{% /tab %}}

@@ -178,12 +190,14 @@ Optionally, you can also set other init script parameters and Datadog environmen
| DATABRICKS_WORKSPACE | Name of your Databricks Workspace. It should match the name provided in the [Datadog-Databricks integration step](#configure-the-datadog-databricks-integration). Enclose the name in double quotes if it contains whitespace. | |
| DRIVER_LOGS_ENABLED | Collect Spark driver logs in Datadog. | false |
| WORKER_LOGS_ENABLED | Collect Spark worker logs in Datadog. | false |
| DD_DJM_ADD_LOGS_TO_FAILURE_REPORT | Include init script logs for debugging when reporting a failure back to Datadog. | false |
| DD_TAGS | Add tags to Databricks cluster and Spark performance metrics. Comma- or space-separated `key:value` pairs. Follow [Datadog tag conventions][4]. Example: `env:staging,team:data_engineering` | |
| DD_ENV | Set the `env` environment tag on metrics, traces, and logs from this cluster. | |


[1]: https://app.datadoghq.com/organization-settings/api-keys
[2]: /getting_started/site/
[3]: https://github.com/DataDog/datadog-agent/blob/main/pkg/fleet/installer/setup/djm/databricks.go
[4]: /getting_started/tagging/
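
As above, a brief `KEY=value` sketch with illustrative values:

```
# Illustrative values only:
DD_DJM_ADD_LOGS_TO_FAILURE_REPORT=true
DD_TAGS=env:staging,team:data_engineering
DD_ENV=staging
```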

3. Click **Confirm**.

9 changes: 9 additions & 0 deletions content/en/data_jobs/dataproc.md
@@ -60,6 +60,15 @@ When you create a new **Dataproc Cluster on Compute Engine** in the [Google Clou

The script above sets the required parameters, and downloads and runs the latest init script for Data Jobs Monitoring in Dataproc. If you want to pin your script to a specific version, you can replace the filename in the URL with `install-dataproc-0.12.9.sh` to use version `0.12.9`, for example. The source code used to generate this script, and the changes between script versions, can be found on the [Datadog Agent repository][13].

Optionally, the script can be configured by adding the following environment variables:

| Variable | Description |
|--------------------------|------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| DD_TAGS | Add tags to Dataproc cluster and Spark performance metrics. Comma- or space-separated `key:value` pairs. Follow [Datadog tag conventions][15]. Example: `env:staging,team:data_engineering` |
| DD_ENV | Set the `env` environment tag on metrics, traces, and logs from this cluster. |

[15]: /getting_started/tagging/
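
For example, a sketch that exports these variables in the script you saved in the previous step, before the init script downloads and runs (values are illustrative):

```
# Illustrative additions to the wrapper script from the previous step:
export DD_TAGS="env:staging,team:data_engineering"
export DD_ENV="staging"
```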

1. On the **Customize cluster** page, locate the **Initialization Actions** section. Enter the path where you saved the script from the previous step.

When your cluster is created, this initialization action installs the Datadog Agent and downloads the Java tracer on each node of the cluster.
11 changes: 11 additions & 0 deletions content/en/data_jobs/emr.md
@@ -113,8 +113,19 @@ When you create a new EMR cluster in the [Amazon EMR console][4], add a bootstra

```

The script above sets the required parameters, and downloads and runs the latest init script for Data Jobs Monitoring in EMR. If you want to pin your script to a specific version, you can replace the filename in the URL with `install-emr-0.12.9.sh` to use version `0.12.9`, for example. The source code used to generate this script, and the changes between script versions, can be found on the [Datadog Agent repository][12].

Optionally, the script can be configured by adding the following environment variables:

| Variable | Description | Default |
|--------------------------|------------------------------------------------------------------------------------------------------------------------------------------------------------------|---------|
| DD_TAGS | Add tags to EMR cluster and Spark performance metrics. Comma- or space-separated `key:value` pairs. Follow [Datadog tag conventions][15]. Example: `env:staging,team:data_engineering` | |
| DD_ENV | Set the `env` environment tag on metrics, traces, and logs from this cluster. | |
| DD_EMR_LOGS_ENABLED | Send Spark driver and worker logs to Datadog. | false |

[15]: /getting_started/tagging/
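
For example, a sketch that exports these variables in the bootstrap script before the init script runs (values are illustrative):

```
# Illustrative additions to the bootstrap script:
export DD_TAGS="env:staging,team:data_engineering"
export DD_ENV="staging"
export DD_EMR_LOGS_ENABLED=true
```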

1. On the **Create Cluster** page, find the **Bootstrap actions** section. Click **Add** to bring up the **Add bootstrap action** dialog.
{{< img src="data_jobs/emr/add_bootstrap_action_without_arguments.png" alt="Amazon EMR console, Create Cluster, Add Bootstrap Action dialog. Text fields for name, script location, and arguments." style="width:80%;" >}}
- For **Name**, give your bootstrap action a name. You can use `datadog_agent`.