Prometheus integration

The Prometheus integration enables you to query and visualize Coder's platform metrics.

Requirements

Configuration

Coder exposes Prometheus-formatted metrics on port 2112 of the coderd container. Use the following PodMonitor resource to point the Prometheus Operator at this endpoint:

apiVersion: monitoring.coreos.com/v1
kind: PodMonitor
metadata:
  name: master-monitor
  namespace: coder
spec:
  selector:
    matchLabels:
      app.kubernetes.io/component: coderd
  podMetricsEndpoints:
    - port: prom-coderd

Workspace Metrics

Each Coder workspace has an agent that connects to a single coderd instance. Each coderd instance exposes the metrics of the workspaces whose agents are connected to it. Workspace metrics have the following form:

coderd_workspace_<workspace_metric_name>{user_id="<user_id>",workspace_id="<workspace_id>"}

Because every workspace has a unique ID, these metrics produce high label cardinality, which can be a problem for some Prometheus setups. If you do not need workspace metrics, or they are causing issues, you can configure your metric scraping service to drop them.

If a workspace connects to a different coderd instance (after a rebuild, a network issue, a Coder update, and so on), its metrics move to the new instance's metrics endpoint, and the scraped series will likely carry the new coderd pod name. To follow a single workspace over its whole lifetime, track it by workspace_id alone until the workspace is deleted.
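
One way to follow a workspace across coderd pods is a recording rule that aggregates away every label except workspace_id. The sketch below is illustrative only: `coderd_workspace_example_metric` is a hypothetical stand-in for a real workspace metric name, and the group and rule names are made up.

```yaml
groups:
  - name: coder-workspace-example   # hypothetical group name
    rules:
      # Collapse pod- and instance-level labels so the series stays
      # continuous when the workspace moves to another coderd pod.
      - record: workspace:example_metric:max
        expr: max by (workspace_id) (coderd_workspace_example_metric)
```

`max` is used here only to merge any briefly overlapping series into one; choose the aggregation that suits the metric in question.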

Drop workspace metrics config

See the Prometheus documentation on relabeling metrics. The following rule drops all metrics that carry the workspace_id label:

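metric_relabel_configs:
  # An explicit regex is required: the default (.*) also matches an empty
  # workspace_id and would drop every sample.
  - source_labels: ["workspace_id"]
    regex: ".+"
    action: drop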

With the Prometheus Operator, add the equivalent rule to the coderd PodMonitor spec:

apiVersion: monitoring.coreos.com/v1
kind: PodMonitor
metadata:
  name: master-monitor
  namespace: coder
spec:
  selector:
    matchLabels:
      app.kubernetes.io/component: coderd
  podMetricsEndpoints:
    - port: prom-coderd
      metricRelabelings:
        # metricRelabelings run on scraped samples (relabelings run on targets).
        - action: drop
          sourceLabels:
            - workspace_id
          regex: ".+"
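
As an alternative to matching on the label, workspace metrics can be dropped by name. This sketch assumes that every workspace metric uses the `coderd_workspace_` prefix shown in the pattern above and that no metric you want to keep shares that prefix.

```yaml
metric_relabel_configs:
  # Drop every series whose metric name starts with coderd_workspace_.
  - source_labels: [__name__]
    regex: coderd_workspace_.*
    action: drop
```

Matching on `__name__` does not depend on which labels a series carries, so it keeps working if the workspace label set changes.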

Coderd Metrics

The following metrics are exposed by Coder's Prometheus endpoint:

| Metric | Type | Description |
|---|---|---|
| coderd_agent_aggregator_agent_push_backlog | gauge | Total number of agent metric bundles waiting to be processed. |
| coderd_agent_aggregator_collect_backlog | gauge | Total amount of gathers waiting to collect metrics. |
| coderd_agent_aggregator_collect_nanoseconds | summary | Time taken to collect all metrics. |
| coderd_agent_aggregator_count_total | gauge | Total number of agent metrics being reported by this coderd. |
| coderd_agent_aggregator_delete_backlog | gauge | Total number of agents waiting to be deleted in aggregator. |
| coderd_agent_aggregator_workspace_count_total | gauge | Total number of workspace agents pushing metrics to this coderd. |
| coderd_api_concurrent_requests | gauge | The total number of concurrent API requests |
| coderd_api_concurrent_websockets | gauge | The total number of concurrent API websockets |
| coderd_api_request_latencies_ms | histogram | Latency distribution of requests in milliseconds |
| coderd_api_requests_processed_total | counter | The total number of processed API requests |
| coderd_api_websocket_durations_ms | histogram | Websocket duration distribution of requests in milliseconds |
| coderd_background_workspace_build_duration_s | histogram | Duration distribution of workspace builds in seconds |
| coderd_backgroundjob_completed_total | counter | Total number of jobs completed since startup. |
| coderd_backgroundjob_current_enqueued_jobs | gauge | Current number of enqueued and not started background jobs. |
| coderd_backgroundjob_enqueue_time_seconds | histogram | Histogram of total time taken by job type to transition from Enqueue to Running. |
| coderd_backgroundjob_enqueued_total | counter | Total number of jobs enqueued. |
| coderd_backgroundjob_execution_time_seconds | histogram | Histogram of total time taken by job type to transition from Running to Completed. |
| coderd_backgroundjob_started_total | counter | Total number of jobs started. |
| coderd_db_sql_queries_executed_total | counter | The total number of executed SQL queries |
| coderd_db_sql_query_latencies_ms | histogram | Latency distribution of SQL queries in milliseconds |
| coderd_license_expires_at_unix | gauge | Unix timestamp of the license expiry date. |
| coderd_license_issued_at_unix | gauge | Unix timestamp of the license issue date. |
| coderd_license_time_until_expires_days | gauge | Number of days until the license expires. |
| coderd_license_user_count | gauge | Number of active (non-dormant) users. |
| coderd_license_user_limit | gauge | Number of users allowed by the license. |
| coderd_rtc_agent_listeners_concurrent | gauge | The total number of concurrent RTC agent listener websockets. |
| coderd_rtc_client_connections_total | counter | The total number of RTC client connections. |
| coderd_rtc_turn_connections_concurrent | gauge | The number of concurrent TURN connections. |
| coderd_rtc_turn_connections_total | counter | The total number of TURN connections opened. |
| coderd_rtc_workspace_connections_current | gauge | The number of concurrent wsnet workspace connections. |
| coderd_rtc_workspace_connections_total | counter | The total number of wsnet workspace connections opened. |
| go_gc_cycles_automatic_gc_cycles_total | counter | Count of completed GC cycles generated by the Go runtime. |
| go_gc_cycles_forced_gc_cycles_total | counter | Count of completed GC cycles forced by the application. |
| go_gc_cycles_total_gc_cycles_total | counter | Count of all completed GC cycles. |
| go_gc_duration_seconds | summary | A summary of the pause duration of garbage collection cycles. |
| go_gc_heap_allocs_by_size_bytes | histogram | Distribution of heap allocations by approximate size. Note that this does not include tiny objects as defined by /gc/heap/tiny/allocs:objects, only tiny blocks. |
| go_gc_heap_allocs_bytes_total | counter | Cumulative sum of memory allocated to the heap by the application. |
| go_gc_heap_allocs_objects_total | counter | Cumulative count of heap allocations triggered by the application. Note that this does not include tiny objects as defined by /gc/heap/tiny/allocs:objects, only tiny blocks. |
| go_gc_heap_frees_by_size_bytes | histogram | Distribution of freed heap allocations by approximate size. Note that this does not include tiny objects as defined by /gc/heap/tiny/allocs:objects, only tiny blocks. |
| go_gc_heap_frees_bytes_total | counter | Cumulative sum of heap memory freed by the garbage collector. |
| go_gc_heap_frees_objects_total | counter | Cumulative count of heap allocations whose storage was freed by the garbage collector. Note that this does not include tiny objects as defined by /gc/heap/tiny/allocs:objects, only tiny blocks. |
| go_gc_heap_goal_bytes | gauge | Heap size target for the end of the GC cycle. |
| go_gc_heap_objects_objects | gauge | Number of objects, live or unswept, occupying heap memory. |
| go_gc_heap_tiny_allocs_objects_total | counter | Count of small allocations that are packed together into blocks. These allocations are counted separately from other allocations because each individual allocation is not tracked by the runtime, only their block. Each block is already accounted for in allocs-by-size and frees-by-size. |
| go_gc_pauses_seconds | histogram | Distribution individual GC-related stop-the-world pause latencies. |
| go_goroutines | gauge | Number of goroutines that currently exist. |
| go_info | gauge | Information about the Go environment. |
| go_memory_classes_heap_free_bytes | gauge | Memory that is completely free and eligible to be returned to the underlying system, but has not been. This metric is the runtime's estimate of free address space that is backed by physical memory. |
| go_memory_classes_heap_objects_bytes | gauge | Memory occupied by live objects and dead objects that have not yet been marked free by the garbage collector. |
| go_memory_classes_heap_released_bytes | gauge | Memory that is completely free and has been returned to the underlying system. This metric is the runtime's estimate of free address space that is still mapped into the process, but is not backed by physical memory. |
| go_memory_classes_heap_stacks_bytes | gauge | Memory allocated from the heap that is reserved for stack space, whether or not it is currently in-use. |
| go_memory_classes_heap_unused_bytes | gauge | Memory that is reserved for heap objects but is not currently used to hold heap objects. |
| go_memory_classes_metadata_mcache_free_bytes | gauge | Memory that is reserved for runtime mcache structures, but not in-use. |
| go_memory_classes_metadata_mcache_inuse_bytes | gauge | Memory that is occupied by runtime mcache structures that are currently being used. |
| go_memory_classes_metadata_mspan_free_bytes | gauge | Memory that is reserved for runtime mspan structures, but not in-use. |
| go_memory_classes_metadata_mspan_inuse_bytes | gauge | Memory that is occupied by runtime mspan structures that are currently being used. |
| go_memory_classes_metadata_other_bytes | gauge | Memory that is reserved for or used to hold runtime metadata. |
| go_memory_classes_os_stacks_bytes | gauge | Stack memory allocated by the underlying operating system. |
| go_memory_classes_other_bytes | gauge | Memory used by execution trace buffers, structures for debugging the runtime, finalizer and profiler specials, and more. |
| go_memory_classes_profiling_buckets_bytes | gauge | Memory that is used by the stack trace hash map used for profiling. |
| go_memory_classes_total_bytes | gauge | All memory mapped by the Go runtime into the current process as read-write. Note that this does not include memory mapped by code called via cgo or via the syscall package. Sum of all metrics in /memory/classes. |
| go_memstats_alloc_bytes | gauge | Number of bytes allocated and still in use. |
| go_memstats_alloc_bytes_total | counter | Total number of bytes allocated, even if freed. |
| go_memstats_buck_hash_sys_bytes | gauge | Number of bytes used by the profiling bucket hash table. |
| go_memstats_frees_total | counter | Total number of frees. |
| go_memstats_gc_sys_bytes | gauge | Number of bytes used for garbage collection system metadata. |
| go_memstats_heap_alloc_bytes | gauge | Number of heap bytes allocated and still in use. |
| go_memstats_heap_idle_bytes | gauge | Number of heap bytes waiting to be used. |
| go_memstats_heap_inuse_bytes | gauge | Number of heap bytes that are in use. |
| go_memstats_heap_objects | gauge | Number of allocated objects. |
| go_memstats_heap_released_bytes | gauge | Number of heap bytes released to OS. |
| go_memstats_heap_sys_bytes | gauge | Number of heap bytes obtained from system. |
| go_memstats_last_gc_time_seconds | gauge | Number of seconds since 1970 of last garbage collection. |
| go_memstats_lookups_total | counter | Total number of pointer lookups. |
| go_memstats_mallocs_total | counter | Total number of mallocs. |
| go_memstats_mcache_inuse_bytes | gauge | Number of bytes in use by mcache structures. |
| go_memstats_mcache_sys_bytes | gauge | Number of bytes used for mcache structures obtained from system. |
| go_memstats_mspan_inuse_bytes | gauge | Number of bytes in use by mspan structures. |
| go_memstats_mspan_sys_bytes | gauge | Number of bytes used for mspan structures obtained from system. |
| go_memstats_next_gc_bytes | gauge | Number of heap bytes when next garbage collection will take place. |
| go_memstats_other_sys_bytes | gauge | Number of bytes used for other system allocations. |
| go_memstats_stack_inuse_bytes | gauge | Number of bytes in use by the stack allocator. |
| go_memstats_stack_sys_bytes | gauge | Number of bytes obtained from system for stack allocator. |
| go_memstats_sys_bytes | gauge | Number of bytes obtained from system. |
| go_sched_goroutines_goroutines | gauge | Count of live goroutines. |
| go_sched_latencies_seconds | histogram | Distribution of the time goroutines have spent in the scheduler in a runnable state before actually running. |
| go_sql_idle_connections | gauge | The number of idle connections. |
| go_sql_in_use_connections | gauge | The number of connections currently in use. |
| go_sql_max_idle_closed_total | counter | The total number of connections closed due to SetMaxIdleConns. |
| go_sql_max_idle_time_closed_total | counter | The total number of connections closed due to SetConnMaxIdleTime. |
| go_sql_max_lifetime_closed_total | counter | The total number of connections closed due to SetConnMaxLifetime. |
| go_sql_max_open_connections | gauge | Maximum number of open connections to the database. |
| go_sql_open_connections | gauge | The number of established connections both in use and idle. |
| go_sql_wait_count_total | counter | The total number of connections waited for. |
| go_sql_wait_duration_seconds_total | counter | The total time blocked waiting for a new connection. |
| go_threads | gauge | Number of OS threads created. |
| process_cpu_seconds_total | counter | Total user and system CPU time spent in seconds. |
| process_max_fds | gauge | Maximum number of open file descriptors. |
| process_open_fds | gauge | Number of open file descriptors. |
| process_resident_memory_bytes | gauge | Resident memory size in bytes. |
| process_start_time_seconds | gauge | Start time of the process since unix epoch in seconds. |
| process_virtual_memory_bytes | gauge | Virtual memory size in bytes. |
| process_virtual_memory_max_bytes | gauge | Maximum amount of virtual memory available in bytes. |
| promhttp_metric_handler_requests_in_flight | gauge | Current number of scrapes being served. |
| promhttp_metric_handler_requests_total | counter | Total number of scrapes by HTTP status code. |
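
To illustrate how these metrics might be used, here is a sketch of a PrometheusRule resource with one alert and one recording rule. The resource name, group name, rule names, the 30-day threshold, and the 5-minute window are all hypothetical choices for the example, not recommendations from Coder. It assumes `coderd_api_request_latencies_ms` is exposed as a standard Prometheus histogram, with `_bucket` series and an `le` label.

```yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: coder-example-rules   # hypothetical resource name
  namespace: coder
spec:
  groups:
    - name: coder.example     # hypothetical group name
      rules:
        # Fire when the license has fewer than 30 days left (example threshold).
        - alert: CoderLicenseExpiringSoon
          expr: coderd_license_time_until_expires_days < 30
          for: 1h
          labels:
            severity: warning
          annotations:
            summary: Coder license expires in under 30 days.
        # Precompute the 95th percentile API latency across all coderd pods.
        - record: coderd:api_request_latency_ms:p95
          expr: |
            histogram_quantile(
              0.95,
              sum by (le) (rate(coderd_api_request_latencies_ms_bucket[5m]))
            )
```

Depending on how the Prometheus Operator is installed, the resource may need extra labels so that the Prometheus instance's rule selector picks it up.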