Memory issue after docker image update. #44

Closed
tophercullen opened this issue Jul 2, 2024 · 4 comments

Comments

@tophercullen

We run the scalyr/fluentd:latest Docker image as a side-car for a few different Fargate tasks. Since the latest tag on Docker Hub was updated to 0.8.18 in the past 24 hours, the scalyr container in every one of these tasks has been going OOM more or less instantly. Pinning everything to 0.8.17 works fine again.

One of these tasks is grossly over-provisioned in terms of memory (for CPU reasons), with 1 GB+ more than it uses, and it's still going OOM. So it seems unlikely that this is an unfortunate edge case across multiple tasks with different uses, libraries, and resource allocations.

@weilliu

weilliu commented Jul 2, 2024

@tophercullen Are you seeing the issue with the fargate integration?
https://app.scalyr.com/solutions/fargate
The only difference in 0.8.18 is adding the new fluentd docker image.
https://github.com/scalyr/scalyr-fluentd/commit/3d6caf67dd3391805efe8e63187872c1c2c696e6
That itself shouldn't trigger any overutilization of the container resources.

Could you open a ticket on support.dataset.com and send us the screenshots/command-line output showing the OOM? Granting us access to your Scalyr team will help us troubleshoot the issue.

@tophercullen
Author

tophercullen commented Jul 2, 2024

Yes, we are using the scalyr image in a side-car in Fargate.

I am aware of the change that was made. It's why I first tried reverting to the previous image version, which, again, still works as-is. Clearly there are differences beyond a simple version change, as even the resulting image sizes are substantially different.

Reviewing the config again, I now see there's a memory limit on the container (100 MB, same as in the linked docs). Previously, I thought the scalyr container was just subject to the overarching task resource constraints (which it seemed absurd that it would be hitting).

Increasing this limit to 200 MB appears to allow the new container version to function properly. At the documented 100 MB, the current latest image (0.8.18) does not function properly for us. The actual required memory now appears to be somewhere between 100 and 200 MB. 0.8.17 still works with 100 MB.

Given the nature of this memory issue, and the documented limit published by Scalyr, I would classify this as a breaking change for this version. I recommend reverting the latest tag until an updated memory allocation can be determined, the documentation updated, and customers notified of the new memory requirement.
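
For readers applying the same workaround, here is a minimal sketch of the two settings discussed above: an explicit image tag instead of latest, and a per-container hard memory limit. It only builds the container-definition fragment of an ECS task definition as a Python dict. The container name `scalyr-fluentd` and the helper name are hypothetical, and the sketch assumes the "mb" figures in this thread correspond to the MiB units of the ECS `memory` field.

```python
import json


def sidecar_container_definition(tag: str, memory_mib: int) -> dict:
    """Build an ECS container-definition fragment for the logging side-car.

    Hypothetical helper: the container name is made up for illustration.
    """
    return {
        "name": "scalyr-fluentd",          # hypothetical container name
        "image": f"scalyr/fluentd:{tag}",  # explicit tag instead of "latest"
        "memory": memory_mib,              # hard limit in MiB for this container
    }


if __name__ == "__main__":
    # 0.8.18 with the raised limit reported to work in this thread
    print(json.dumps(sidecar_container_definition("0.8.18", 200), indent=2))
```

Pinning the tag keeps an upstream image update from changing the side-car's memory footprint without a deliberate change to the task definition.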

@weilliu

weilliu commented Jul 2, 2024

Thanks for the additional context. I just created an internal ticket for engineering to look at the issue.

@weilliu

weilliu commented Jul 24, 2024

Engineering has reviewed the issue and confirmed that the increased memory requirement is caused by a fluentd process update. The recommended memory limit for running the Fargate agent is now 300 MB.
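
One way to audit existing task definitions against this recommendation is sketched below. It is an illustration, not an official tool: the function name is hypothetical, the input is assumed to be a task definition already loaded as a dict with the standard `containerDefinitions`, `image`, and `memory` keys, and 300 is treated as MiB. Note that it flags every scalyr/fluentd side-car under the threshold, including ones pinned to 0.8.17, which this thread reports still runs at 100.

```python
RECOMMENDED_MIB = 300  # recommended limit from this thread, assumed to be MiB


def undersized_sidecars(task_definition: dict, minimum_mib: int = RECOMMENDED_MIB) -> list:
    """Return names of scalyr/fluentd containers whose hard limit is below minimum_mib."""
    flagged = []
    for container in task_definition.get("containerDefinitions", []):
        image = container.get("image", "")
        limit = container.get("memory")  # None when no hard limit is set
        if image.startswith("scalyr/fluentd") and limit is not None and limit < minimum_mib:
            flagged.append(container.get("name", image))
    return flagged


if __name__ == "__main__":
    example = {
        "containerDefinitions": [
            {"name": "app", "image": "example/app:1.0", "memory": 1024},
            {"name": "scalyr-fluentd", "image": "scalyr/fluentd:0.8.18", "memory": 100},
        ]
    }
    print(undersized_sidecars(example))
```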

@weilliu weilliu closed this as completed Jul 24, 2024