The Temporal server can be configured with OTEL trace exporters to emit spans and traces for observability. More specifically, the server uses the Go OpenTelemetry library for instrumentation and for multi-protocol, multi-model telemetry export. This document is intended to help developers understand how to configure exporters and instrument their code. A full exploration of tracing and telemetry is out of scope of this document; the reader is referred to external reference material, third-party descriptions, and the OpenTelemetry specification itself.
- Run `make start-dependencies` (which starts Grafana Tempo)
- Start the server using `make OTEL=true start` (or any other start-x command)
- Visit http://localhost:3000/explore and select "Tempo" from the datasource dropdown.

Tip: use the TraceQL query `{ .temporalWorkflowID =~ "<WF-ID>.*" }` to find the traces for your workflow.
No trace exporters are configured by default and thus trace data is neither collected nor emitted without additional configuration.
In OpenTelemetry, the concept of an "exporter" is abstract. The concrete implementation of an exporter is determined by a 3-tuple of values: the exporter signal, model, and protocol:
- a "signal" is one of traces, metrics, or logs (in this document we will only deal with traces),
- "model" indicates the abstract data model for the span and trace data being exported,
- and the "protocol" specifies the concrete application protocol binding for the indicated model.
Temporal is known to support exporting trace data as defined by `otlp` over `grpc`.

The server supports an `otel` YAML stanza which is used to configure a set of process-wide exporters.
A common configuration is to emit tracing data to an agent such as the otel-collector running locally. To configure such a system, add the stanza below to your configuration YAML file(s).
```yaml
otel:
  exporters:
    - kind:
        signal: traces
        model: otlp
        protocol: grpc
      spec:
        connection:
          insecure: true
          endpoint: localhost:4317
```
Another example is pointing Temporal directly at Honeycomb's hosted OTLP collection service. To achieve such a configuration, you will need an API key from the upstream Honeycomb service and the stanza below.
```yaml
otel:
  exporters:
    - kind:
        signal: traces
        model: otlp
        protocol: grpc
      spec:
        connection:
          endpoint: api.honeycomb.io:443
        headers:
          x-honeycomb-team: <a honeycomb API key>
```
Note that the configuration parser supports defining multiple exporters by supplying additional `kind` and `spec` declarations. Additional configuration fields can be found in `config_test.go` and are mostly related to the underlying gRPC client configuration (retries, timeouts, etc.).
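For example, a hypothetical configuration with two otlp/grpc trace exporters (the second endpoint is a placeholder, not a real service) could look like this:

```yaml
otel:
  exporters:
    - kind:
        signal: traces
        model: otlp
        protocol: grpc
      spec:
        connection:
          insecure: true
          endpoint: localhost:4317
    - kind:
        signal: traces
        model: otlp
        protocol: grpc
      spec:
        connection:
          endpoint: collector.example.internal:4317
```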
An OTEL span exporter can also be configured via environment variables: setting `OTEL_TRACES_EXPORTER` creates a span exporter.

```sh
OTEL_TRACES_EXPORTER=otlp
```

Note that if the configuration file already defines a traces exporter, no additional exporter will be created.
The Go OTEL SDK will also read a well-known set of environment variables to configure the exporter, so if you prefer setting environment variables to writing YAML then you can use the variables defined in the OTEL specification. For example:

```sh
OTEL_SERVICE_NAME=my-service OTEL_EXPORTER_OTLP_TRACES_INSECURE=true
```

NOTE: If an environment variable conflicts with YAML-provided configuration then the environment variable takes precedence.
While the exporter configuration described above is executed and set up at process startup time, instrumentation code - the creation and termination of spans - is inserted inline (like logging statements) into normal server processing code. Spans are created by `go.opentelemetry.io/otel/trace.Tracer` objects, which are themselves created by `go.opentelemetry.io/otel/trace.TracerProvider` instances. The `TracerProvider` instances are bound to a single logical service, and as such a single Temporal process will have up to four such instances (for the worker, matching, history, and frontend services respectively). The `Tracer` object is bound to a single logical library, which is different than a service: consider that a history service instance might run code from the temporal common library, the gRPC library, and the gocql library.
`Tracer` and `TracerProvider` object management has been added to the server's fx DI configuration, and thus they are available to be added to any fx-enabled object constructors. Due to the possibility of multiple services being coresident within a single process, we do not use the OTEL library's capability to host and access a single global `TracerProvider`.
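As a purely illustrative sketch (the component and constructor below are hypothetical, not existing server types), an fx-enabled constructor can declare the provider as a parameter and derive its library-scoped `Tracer` from it:

```go
package example

import "go.opentelemetry.io/otel/trace"

// HistoryHandler is a hypothetical fx-managed component.
type HistoryHandler struct {
	tracer trace.Tracer
}

// NewHistoryHandler is a hypothetical fx-enabled constructor. Because the
// service-scoped TracerProvider is registered with fx, the constructor simply
// declares it as a parameter and fx supplies the instance for the current
// service. Tracers are named after the instrumented library, not the service.
func NewHistoryHandler(tp trace.TracerProvider) *HistoryHandler {
	return &HistoryHandler{
		tracer: tp.Tracer("go.temporal.io/server/service/history"),
	}
}
```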
By default, gRPC clients and servers are instrumented via the open source otelgrpc library.
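For reference, a minimal sketch of how otelgrpc instrumentation is typically attached to a gRPC server and client using the library's stats-handler API; this illustrates the general pattern, not the Temporal server's exact wiring:

```go
package example

import (
	"go.opentelemetry.io/contrib/instrumentation/google.golang.org/grpc/otelgrpc"
	"google.golang.org/grpc"
	"google.golang.org/grpc/credentials/insecure"
)

// newInstrumentedServer returns a gRPC server whose handled RPCs are traced
// automatically by the otelgrpc server handler.
func newInstrumentedServer() *grpc.Server {
	return grpc.NewServer(grpc.StatsHandler(otelgrpc.NewServerHandler()))
}

// newInstrumentedClientConn returns a client connection whose outgoing RPCs
// carry trace context and produce client spans.
func newInstrumentedClientConn(target string) (*grpc.ClientConn, error) {
	return grpc.NewClient(
		target,
		grpc.WithTransportCredentials(insecure.NewCredentials()),
		grpc.WithStatsHandler(otelgrpc.NewClientHandler()),
	)
}
```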
The OpenTelemetry project has published a non-normative set of guidelines for attribute naming. If nothing else, please

- Always check for an appropriate attribute in semconv before creating your own
- Always prefix Temporal attributes with `io.temporal`
- Do not create a single file in common for all attributes
- Do not create packages just for OTEL attributes
- Do create a set of `attribute.Key`s in the semantically appropriate package and re-use those to create `attribute.KeyValue`s as needed.
- Do create a set of utility functions that can transform frequently used aggregate types (Tasks, WorkflowExecutions, TaskQueues, etc) into an `[]attribute.KeyValue`. The association of `attribute.KeyValue`s to a `trace.Span` can be verbose in terms of the number of lines of code needed, so any reduction in that noise is a good idea, not to mention the consistency benefit of sharing a single mapping function.
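As a hedged illustration, the pattern might look like the following (the package, key names, and helper are hypothetical, not existing server identifiers):

```go
package workflowattrs // hypothetical package; put keys where the concept lives

import "go.opentelemetry.io/otel/attribute"

// Attribute keys are declared once and prefixed with io.temporal per the
// guideline above. The key names here are illustrative only.
var (
	workflowIDKey = attribute.Key("io.temporal.workflow.id")
	runIDKey      = attribute.Key("io.temporal.workflow.run_id")
)

// executionAttributes is a hypothetical helper that converts a frequently used
// aggregate (here just a workflow ID / run ID pair) into the []attribute.KeyValue
// shape that spans consume, e.g. span.SetAttributes(executionAttributes(id, run)...)
func executionAttributes(workflowID, runID string) []attribute.KeyValue {
	return []attribute.KeyValue{
		workflowIDKey.String(workflowID),
		runIDKey.String(runID),
	}
}
```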
Q: Given that common code can be called from any service, how can I start a span in common library code that is bound to the appropriate service (frontend/history/matching/worker)?

A: The `TracerProvider` that created the currently active Span can be retrieved from that Span itself, and the currently active Span can be retrieved from the `context.Context`.
```go
// DoFoo is a function in the common package
func DoFoo(ctx context.Context, x int, y string) string {
	var span trace.Span
	ctx, span = trace.SpanFromContext(ctx).
		TracerProvider().
		Tracer("go.temporal.io/server/common").
		Start(ctx, "DoFoo")
	defer span.End()
	return fmt.Sprintf("%v-%v", y, x)
}
```
Using `Span.RecordError` is a good idea, but not all errors imply failure. Thus, if you want to capture an error and also capture that a span failed, you must additionally call `Span.SetStatus(codes.Error, err.Error())`. A `FailSpanWithError` utility function might be a good idea.
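Such a helper does not exist in the server today; a minimal sketch of what it could look like:

```go
package example

import (
	"go.opentelemetry.io/otel/codes"
	"go.opentelemetry.io/otel/trace"
)

// FailSpanWithError is a hypothetical helper: it records the error on the span
// and also marks the span itself as failed, which RecordError alone does not do.
func FailSpanWithError(span trace.Span, err error) {
	if err == nil {
		return
	}
	span.RecordError(err)
	span.SetStatus(codes.Error, err.Error())
}
```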
This is taken care of by default for gRPC calls via the otelgrpc interceptors. However, you may want to propagate tracing information between goroutines or other places where the `context.Context` is not passed, such as handoffs through a Go channel or an external datastore. There are two broad approaches that are applicable in different situations:
- If the object being transferred is not externally durable (e.g. an object put into a Go channel but not spooled to a database) then you can pull the `trace.SpanContext` out of the current `trace.Span` with `trace.SpanContextFromContext(context.Context)` or `Span.SpanContext()` and pass that object along with the data being transferred. The consuming side can restore the tracing state with `trace.ContextWithSpanContext(trace.SpanContext)`.
- If the tracing state needs to be serialized, the OTEL library provides the `propagation` package to convert trace state into a more serialization-friendly type such as a `map[string]string`. The `propagation.TraceContext` type can be used to inject and extract trace state into a key-value-ish object, as shown below.
```go
carrier := propagation.MapCarrier(map[string]string{})
propagation.TraceContext{}.Inject(ctx, carrier)
// write the carrier object to a durable store
```
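The consuming side reverses the process. A minimal sketch, assuming the carrier was read back from the store and `tracer` is any `trace.Tracer` obtained as described earlier:

```go
// read the carrier back from the durable store, then restore the trace state
ctx := propagation.TraceContext{}.Extract(context.Background(), carrier)

// spans started from ctx now belong to the original trace
ctx, span := tracer.Start(ctx, "ProcessStoredItem")
defer span.End()
```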
OpenTelemetry Spans can be linked together to form a non-parent-child relationship. One of the main use cases for linking is so that a batch process (e.g. a database read that fills a large buffer of work items) can create Spans for each of the individual work items it creates and those Spans can be linked back to the parent batch Span without that span becoming their logical parent.
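A minimal sketch of creating such a link (`batchSpan`, `tracer`, and the span name are illustrative):

```go
// batchSpan is the span that covers the whole batch read
link := trace.Link{SpanContext: batchSpan.SpanContext()}

// each work item gets its own span that is linked, not parented, to the batch span
_, itemSpan := tracer.Start(context.Background(), "ProcessWorkItem",
	trace.WithLinks(link))
defer itemSpan.End()
```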
Use `Span.AddEvent` to write messages that will be associated with that `Span`. From the OTEL manual:

> An event is a human-readable message on a span that represents “something happening” during its lifetime
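For example, a small illustrative sketch (the event name, attribute key, and `key` variable are hypothetical):

```go
// inside an instrumented function, where span is the currently active trace.Span
span.AddEvent("cache miss, falling back to the database",
	trace.WithAttributes(attribute.String("io.temporal.example.cache_key", key)))
```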