Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[receiver/prometheusremotewrite] Flaky test #36654

Open
ArthurSens opened this issue Dec 3, 2024 · 15 comments · May be fixed by #37563
Open

[receiver/prometheusremotewrite] Flaky test #36654

ArthurSens opened this issue Dec 3, 2024 · 15 comments · May be fixed by #37563

Comments

@ArthurSens
Copy link
Member

Component(s)

receiver/prometheusremotewrite

Describe the issue you're reporting

I've just noticed we have some flaky tests in main: https://github.com/open-telemetry/opentelemetry-collector-contrib/actions/runs/12146357038/job/33870088465

=== FAIL: . TestComponentLifecycle/metrics-lifecycle (0.00s)
    generated_component_test.go:65: 
        	Error Trace:	/home/runner/work/opentelemetry-collector-contrib/opentelemetry-collector-contrib/receiver/prometheusremotewritereceiver/generated_component_test.go:65
        	Error:      	Received unexpected error:
        	            	failed to create prometheus remote-write listener: listen tcp 127.0.0.1:9090: bind: address already in use
        	Test:       	TestComponentLifecycle/metrics-lifecycle

=== FAIL: . TestComponentLifecycle (0.00s)

=== FAIL: . TestHandlePRWContentTypeNegotiation/no_content_type (0.00s)
    receiver_test.go:90: 
        	Error Trace:	/home/runner/work/opentelemetry-collector-contrib/opentelemetry-collector-contrib/receiver/prometheusremotewritereceiver/receiver_test.go:90
        	Error:      	Received unexpected error:
        	            	Post "http://localhost:9090/api/v1/write": dial tcp [::1]:9090: connect: connection refused
        	Test:       	TestHandlePRWContentTypeNegotiation/no_content_type

=== FAIL: . TestHandlePRWContentTypeNegotiation (0.00s)
    receiver_test.go:[43](https://github.com/open-telemetry/opentelemetry-collector-contrib/actions/runs/12146357038/job/33870088465#step:7:44): 
        	Error Trace:	/home/runner/work/opentelemetry-collector-contrib/opentelemetry-collector-contrib/receiver/prometheusremotewritereceiver/receiver_test.go:36
        	            				/home/runner/work/opentelemetry-collector-contrib/opentelemetry-collector-contrib/receiver/prometheusremotewritereceiver/receiver_test.go:43
        	Error:      	Received unexpected error:
        	            	failed to create prometheus remote-write listener: listen tcp 127.0.0.1:9090: bind: address already in use
        	Test:       	TestHandlePRWContentTypeNegotiation
@ArthurSens ArthurSens added the needs triage New item requiring triage label Dec 3, 2024
Copy link
Contributor

github-actions bot commented Dec 3, 2024

Pinging code owners:

See Adding Labels via Comments if you do not have permissions to add labels yourself.

@ArthurSens
Copy link
Member Author

/label -needs-triage

@github-actions github-actions bot removed the needs triage New item requiring triage label Dec 4, 2024
@dashpole
Copy link
Contributor

dashpole commented Dec 4, 2024

maybe we need to change tests to use a random port?

@ArthurSens
Copy link
Member Author

Yeah, that's probably enough :)

@pjanotti
Copy link
Contributor

@pjanotti
Copy link
Contributor

maybe we need to change tests to use a random port?

Very wild guess without looking at the code, but, I saw this some symptom many times on Windows tests. One possibility is that the component has a goroutine that holds the port/server and while the goroutine is terminated when the component is shut down it may do so asynchronously so the next test may find the port is in use. Once more I'm shooting from the hip here ...

@pjanotti
Copy link
Contributor

@ArthurSens
Copy link
Member Author

We're hardcoding the port during tests:

req, err := http.NewRequest(http.MethodPost, "http://localhost:9090/api/v1/write", bytes.NewBuffer(compressedBody))

We probably want to use httptest.ServeHTTP instead, so we pick a random port. @perebaj, is this another you'd like to fix? Again no pressure, my family issues should be over next week. I can tackle this if you can't :)

@perebaj
Copy link
Contributor

perebaj commented Jan 24, 2025

On my list. Don't worry. ✌🏽 @ArthurSens

@pjanotti
Copy link
Contributor

@perebaj @ArthurSens another thing to consider: sometimes the type of failure is because a prior test is still holding the port when another test is trying to open it again. Perhaps this was already helped by #37430 - I'm not sure at this point.

@perebaj
Copy link
Contributor

perebaj commented Jan 27, 2025

@perebaj @ArthurSens another thing to consider: sometimes the type of failure is because a prior test is still holding the port when another test is trying to open it again. Perhaps this was already helped by #37430 - I'm not sure at this point.

Thanks for the tip, I will consider it

@adwait-godbole
Copy link

Hi @perebaj, I hope you're doing well! I wanted to ask if you’ve already started working on fixing up this issue? I'm an LFX enthusiast, and since PrometheusRemoteWrite receiver is listed as a project for this term, I’ve been trying to get familiar with the code so far. If you’ve already started work on it, I wouldn’t mind stepping aside, but if not, would you be okay with me picking it up? Also, if you have any inputs or suggestions, I’d greatly appreciate them. Thanks!

@perebaj
Copy link
Contributor

perebaj commented Jan 28, 2025

Hi @perebaj, I hope you're doing well! I wanted to ask if you’ve already started working on fixing up this issue? I'm an LFX enthusiast, and since PrometheusRemoteWrite receiver is listed as a project for this term, I’ve been trying to get familiar with the code so far. If you’ve already started work on it, I wouldn’t mind stepping aside, but if not, would you be okay with me picking it up? Also, if you have any inputs or suggestions, I’d greatly appreciate them. Thanks!

Hey ✌🏽. I'm already started to work on it.

Maybe this issue would be nice for you #33661

I'm working on `addTypeSum. But there are a lot of open TODOs to address.

@adwait-godbole
Copy link

adwait-godbole commented Jan 28, 2025

Thanks @perebaj for pointing me over there at #33661.

I'm working on `addTypeSum. But there are a lot of open TODOs to address.

Shall I work on top of this already open PR of yours at #37156?

@perebaj
Copy link
Contributor

perebaj commented Jan 28, 2025

Shall I work on top of this already open PR of yours at #37156?

I don't think so, I'm finishing it, just missing a test case. I think that this one would be a good idea.

The approach that I'm following is using the previous implementation as reference, but using prw_v2 functions. WDYT?

@perebaj perebaj linked a pull request Jan 29, 2025 that will close this issue
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

6 participants