Thanos receive - failed to handle request, internal server error #7703

Open
ALX-TH opened this issue Sep 6, 2024 · 1 comment

ALX-TH commented Sep 6, 2024

Hi all. I have an issue with Thanos Receive. Starting with Thanos 0.30.0, my Prometheus server cannot write chunks to the receiver, whereas with Thanos 0.29.0 (and lower) there are no issues. Could you please help me solve this problem?
I suppose something changed in the Prometheus implementation (API?) in Thanos starting from version 0.30.0.

Object Storage Provider:
GCS (Google Bucket)

What happened:
Prometheus cannot write chunks into thanos receive

What you expected to happen:
Prometheus should successfully write chunks into thanos receive

Full logs to relevant components:

Prometheus Logs

{"caller":"dedupe.go:112","component":"remote","err":"server returned HTTP status 500 Internal Server Error: \n","level":"warn","msg":"Failed to send batch, retrying","remote_name":"69f30b","ts":"2024-09-06T06:31:42.261Z","url":"http://xxxxxxxxxxxxxxxx:19291/api/v1/receive"}

Thanos Logs

Sep 06 06:34:12 thanos-receiver[9671]: {"caller":"handler.go:584","component":"receive-handler","err":"","level":"debug","msg":"failed to handle request","name":"receive","tenant":"prometheus-ha","ts":"2024-09-06T06:34:12.706646006Z"}
Sep 06 06:34:12 thanos-receiver[9671]: {"caller":"handler.go:595","component":"receive-handler","err":"","level":"error","msg":"internal server error","name":"receive","tenant":"prometheus-ha","ts":"2024-09-06T06:34:12.710355721Z"}
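
Both Thanos entries above carry an empty `err` field, so the receiver reports a 500 without saying why. When scanning a longer journal for the same pattern, a small filter can help. This is only a sketch: it assumes each line is a journald prefix followed by a single JSON object, as in the two lines above, and the function names are made up for illustration.

```python
import json


def parse_prefixed_json(line):
    """Decode the JSON object that follows a journald-style prefix, or return None."""
    start = line.find("{")
    if start == -1:
        return None
    try:
        return json.loads(line[start:])
    except json.JSONDecodeError:
        return None


def entries_with_empty_err(lines):
    """Return the decoded records whose 'err' field is present but empty."""
    records = (parse_prefixed_json(line) for line in lines)
    return [r for r in records if r is not None and r.get("err") == ""]


# The two Thanos log lines quoted in this issue.
SAMPLE = [
    'Sep 06 06:34:12 thanos-receiver[9671]: {"caller":"handler.go:584","component":"receive-handler","err":"","level":"debug","msg":"failed to handle request","name":"receive","tenant":"prometheus-ha","ts":"2024-09-06T06:34:12.706646006Z"}',
    'Sep 06 06:34:12 thanos-receiver[9671]: {"caller":"handler.go:595","component":"receive-handler","err":"","level":"error","msg":"internal server error","name":"receive","tenant":"prometheus-ha","ts":"2024-09-06T06:34:12.710355721Z"}',
]

for rec in entries_with_empty_err(SAMPLE):
    print(rec["ts"], rec["level"], rec["caller"], rec["msg"])
```

The `caller` values (`handler.go:584` and `handler.go:595`) are the useful part here, since they point at the code paths that produced the empty error.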

Configs:
Prometheus:

    remote_write:
    - basic_auth:
        password: xxxxxx
        username: xxxxxx
      enable_http2: true
      headers:
        X-SRC: xxxxx
        THANOS-TENANT: prometheus-ha
      metadata_config:
        send: false
      queue_config:
        capacity: 5000
        max_backoff: 25s
        max_shards: 100
        min_backoff: 30ms
        retry_on_http_429: true
      url: http://xxxxxxxxx:19291/api/v1/receive

Thanos

ExecStart=/opt/thanos/bin/thanos receive \
           --auto-gomemlimit.ratio=0.75  \
           --enable-auto-gomemlimit \
           --log.level=debug \
           --debug.name=receive \
           --log.format=json \
           --tsdb.path=/opt/thanos/data/receive \
           --tsdb.retention=6h \
           --tsdb.too-far-in-future.time-window=120s \
           --tsdb.out-of-order.time-window=120s \
           --tsdb.max-exemplars=10000 \
           --tsdb.allow-overlapping-blocks \
           --tsdb.no-lockfile \
           --objstore.config-file=/opt/thanos/config/storage.yml \
           --http.config=/opt/thanos/config/http-config.yml \
           --grpc-address=0.0.0.0:12901 \
           --remote-write.address=0.0.0.0:19291 \
           --http-address=0.0.0.0:12902 \
           --receive.default-tenant-id=prometheus-ha \
           --receive.tenant-header="THANOS-TENANT" \
           --receive-forward-timeout=60s \
           --receive.local-endpoint=127.0.0.1:12901 \
           --receive.replication-factor=0 \
           --receive.forward.async-workers=32 \
           --receive.hashrings-algorithm=ketama \
           --receive.hashrings-file=/opt/thanos/config/thanos-receive-hashrings.json \
           --label="prometheus_minion=\"thanos_receive\"" \
           --label="receive=\"true\"" \
           --label="replica=\"thanos_receive\""
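
The hashrings file passed via `--receive.hashrings-file` is not included in this report. One thing that could be ruled out is a mismatch between `--receive.local-endpoint` and the endpoints listed in that file, since the local endpoint is what identifies this node in the hashring. The sketch below is not a confirmed cause of the 500s. It assumes the file is a JSON list of hashring objects with an `endpoints` array whose entries are either plain strings or objects with an `address` key; the sample content and function names are hypothetical.

```python
import json


def endpoint_addresses(hashrings):
    """Collect every endpoint address from a decoded hashrings list."""
    addresses = set()
    for ring in hashrings:
        for endpoint in ring.get("endpoints", []):
            if isinstance(endpoint, str):
                addresses.add(endpoint)
            elif isinstance(endpoint, dict) and "address" in endpoint:
                addresses.add(endpoint["address"])
    return addresses


def local_endpoint_listed(hashrings_json, local_endpoint):
    """Return True if local_endpoint appears in any hashring in the JSON text."""
    return local_endpoint in endpoint_addresses(json.loads(hashrings_json))


# Hypothetical file content; the real file is not shown in this issue.
EXAMPLE = '[{"hashring": "default", "tenants": [], "endpoints": ["127.0.0.1:12901"]}]'

print(local_endpoint_listed(EXAMPLE, "127.0.0.1:12901"))
```

To check a real setup, read the actual file and pass its text together with the value given to `--receive.local-endpoint`.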

Environment:

  • OS: Debian 12
  • Kernel: 6.1.0-25-amd64
  • Prometheus version: v2.54.1
  • Thanos version: 0.36.1

dosubot bot commented Sep 6, 2024

Hey @ALX-TH! I'm here to help you with the issue you're facing. Let's work together to get this sorted out.

Prometheus failing to write chunks to Thanos Receive starting from Thanos version 0.30.0 may be related to a change introduced in the following pull request:

  • #7248 Receive: Fix RemoteWriteAsync was sequentially executed causing high latency in the ingestion path.

This change likely altered the behavior of the remote write path, which could be causing the issue with Prometheus writing chunks to Thanos receive [1].

To address this issue, you may need to review and adjust your configuration to align with the changes introduced in version 0.30.0. Specifically, ensure that your configuration accounts for the new features and fixes:

  1. Tenant Globbing Support:

    • Added tenant globbing support to hashring config.
    • PR: #7155
  2. Remote Write Client TLS Configuration:

    • Added --remote-write.client-tls-secure and --remote-write.client-tls-skip-verify flags.
    • PR: #7219
  3. Splitting Remote-Write HTTP Requests:

    • Split remote-write HTTP requests via tenant labels of series.
    • PR: #7256 [1].

Additionally, the expected HTTP status codes and error messages for the /api/v1/receive endpoint in Thanos 0.30.0 and later versions include:

  • 400 Bad Request: For bad data errors.
  • 409 Conflict: For conflict-type errors.
  • 500 Internal Server Error: For internal errors.
  • 503 Service Unavailable: When the server or the target is not ready or available [2].
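
These status codes matter on the sending side because they decide whether Prometheus keeps retrying a batch, which is what the "Failed to send batch, retrying" log line in the report shows for the 500. A rough sketch of that decision follows, assuming the usual remote-write behaviour (5xx responses are retried, 429 is retried only when `retry_on_http_429` is enabled, other 4xx responses are not retried). The function name is illustrative and is not taken from Prometheus itself.

```python
def sender_retries(status, retry_on_http_429=False):
    """Return True if a remote-write sender would retry after this HTTP status.

    Assumed rules: retry every 5xx, retry 429 only when opted in,
    never retry any other status.
    """
    if 500 <= status <= 599:
        return True
    if status == 429:
        return retry_on_http_429
    return False


# Status codes from the list above, with retry_on_http_429 set as in the reported config.
for code in (400, 409, 429, 500, 503):
    print(code, sender_retries(code, retry_on_http_429=True))
```

Under these assumptions the 500 seen here is retried indefinitely with backoff, so the same samples keep being resent until the receiver accepts them or the queue drops them.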

Reviewing these changes and ensuring your configuration is updated accordingly should help resolve the issue.
