Bug: clampResources called with invalid arguments: upperBound=X has field less than current=Y #1229

Open
petuhovskiy opened this issue Jan 30, 2025 · 1 comment
Labels
c/autoscaling/autoscaler-agent Component: autoscaling: autoscaler-agent t/bug Issue Type: Bug

Comments

@petuhovskiy
Member

Environment

prod, staging

Subject

Found this error in the logs:

clampResources called with invalid arguments: upperBound=&{VCPU:3.75 Mem:14Gi} has field less than current={VCPU:3.75 Mem:15Gi}

with the following stacktrace:

goroutine 55421856 [running]:
runtime/debug.Stack()
	/usr/local/go/src/runtime/debug/stack.go:26 +0x5e
github.com/neondatabase/autoscaling/pkg/agent.(*Runner).spawnBackgroundWorker.func1.1()
	/workspace/pkg/agent/runner.go:366 +0x1b5
panic({0x1d1c560?, 0xc0101920d0?})
	/usr/local/go/src/runtime/panic.go:785 +0x132
github.com/neondatabase/autoscaling/pkg/agent/core.(*state).clampResources(0x1ef5a00?, {0xea6, 0x3c0000000}, {0x1af9518?, 0xc1df12b8eb3c3fd0?}, 0x406b736f072fe?, 0x330c9e0?)
	/workspace/pkg/agent/core/state.go:974 +0x126
github.com/neondatabase/autoscaling/pkg/agent/core.(*state).calculateNeonVMAction(0xc00f61bb08, {0x7feb6e551f48?, 0xc001607480?, 0x330c9e0?}, {0x2052aa0?, 0x5?}, 0xc00878a648, {0x20546a7?, 0x7})
	/workspace/pkg/agent/core/state.go:490 +0x326
github.com/neondatabase/autoscaling/pkg/agent/core.(*state).nextActions(0xc00f61bb08, {0x206c5b1?, 0x1e19d00?, 0x330c9e0?})
	/workspace/pkg/agent/core/state.go:321 +0x145
github.com/neondatabase/autoscaling/pkg/agent/core.(*State).NextActions(...)
	/workspace/pkg/agent/core/state.go:292
github.com/neondatabase/autoscaling/pkg/agent/executor.(*ExecutorCore).getActions(0xc00fc13040)
	/workspace/pkg/agent/executor/core.go:107 +0x397
github.com/neondatabase/autoscaling/pkg/agent/executor.(*ExecutorCoreWithClients).DoPluginRequests(0xc00fc13080, {0x23758b0, 0xc00912b400}, 0xc0084ef110)
	/workspace/pkg/agent/executor/exec_plugin.go:33 +0x10d
github.com/neondatabase/autoscaling/pkg/agent.(*Runner).spawnBackgroundWorker.func1()
	/workspace/pkg/agent/runner.go:380 +0xcf
created by github.com/neondatabase/autoscaling/pkg/agent.(*Runner).spawnBackgroundWorker in goroutine 55421847
	/workspace/pkg/agent/runner.go:346 +0x1c5

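There is also a second stack trace that differs slightly: it reaches clampResources through DoNeonVMRequests instead of DoPluginRequests, and the line numbers are different: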

goroutine 493 [running]:
runtime/debug.Stack()
	/usr/local/go/src/runtime/debug/stack.go:26 +0x5e
github.com/neondatabase/autoscaling/pkg/agent.(*Runner).spawnBackgroundWorker.func1.1()
	/workspace/pkg/agent/runner.go:450 +0x1b5
panic({0x1d23280?, 0xc00215a970?})
	/usr/local/go/src/runtime/panic.go:785 +0x132
github.com/neondatabase/autoscaling/pkg/agent/core.(*state).clampResources(0x1efdd00?, {0x7d0, 0x200000000}, {0x1afde18?, 0xc1df13ed6e1b0fa3?}, 0x27d4c98a1c7?, 0x331da00?)
	/workspace/pkg/agent/core/state.go:997 +0x126
github.com/neondatabase/autoscaling/pkg/agent/core.(*state).calculateNeonVMAction(0xc001b48008, {0x3341a80?, 0xc003242080?, 0x331da00?}, {0x41045e?, 0x7f86185e0ac8?}, 0xc001b27ae8, {0x205e78b?, 0x7})
	/workspace/pkg/agent/core/state.go:498 +0x326
github.com/neondatabase/autoscaling/pkg/agent/core.(*state).nextActions(0xc001b48008, {0x20766bf?, 0x1e21420?, 0x331da00?})
	/workspace/pkg/agent/core/state.go:329 +0x145
github.com/neondatabase/autoscaling/pkg/agent/core.(*State).NextActions(...)
	/workspace/pkg/agent/core/state.go:300
github.com/neondatabase/autoscaling/pkg/agent/executor.(*ExecutorCore).getActions(0xc001b082c0)
	/workspace/pkg/agent/executor/core.go:107 +0x397
github.com/neondatabase/autoscaling/pkg/agent/executor.(*ExecutorCoreWithClients).DoNeonVMRequests(0xc001b08300, {0x2380550, 0xc001b26190}, 0xc001b18e70)
	/workspace/pkg/agent/executor/exec_neonvm.go:39 +0x119
github.com/neondatabase/autoscaling/pkg/agent.(*Runner).spawnBackgroundWorker.func1()
	/workspace/pkg/agent/runner.go:464 +0xcf
created by github.com/neondatabase/autoscaling/pkg/agent.(*Runner).spawnBackgroundWorker in goroutine 506
	/workspace/pkg/agent/runner.go:430 +0x1c5
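
As a cross-check, the struct argument printed in the clampResources frame of each trace can be decoded by hand. The sketch below assumes (the issue does not confirm this) that the two words are CPU in thousandths of a core followed by memory in bytes; decodeArg is a hypothetical helper name, not part of the agent.

```go
package main

import "fmt"

// decodeArg formats a {cpu, mem} pair as printed in a Go stack trace.
// Assumption: the first word is CPU in thousandths of a core and the
// second word is memory in bytes.
func decodeArg(milliCPU, memBytes uint64) string {
	const gib = 1 << 30
	return fmt.Sprintf("VCPU:%g Mem:%gGi", float64(milliCPU)/1000, float64(memBytes)/gib)
}

func main() {
	// First trace: clampResources(..., {0xea6, 0x3c0000000}, ...)
	fmt.Println(decodeArg(0xea6, 0x3c0000000))
	// Second trace: clampResources(..., {0x7d0, 0x200000000}, ...)
	fmt.Println(decodeArg(0x7d0, 0x200000000))
}
```

Under that assumption, the first trace decodes to VCPU:3.75 Mem:15Gi, which matches the current value in the logged error, and the second decodes to VCPU:2 Mem:8Gi.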

The error itself may be legitimate, but the agent shouldn't panic on it. We need to understand the root cause and then fix the underlying error, the panic, or both.

Other logs, links

Similar errors in the last hour: https://neonprod.grafana.net/goto/GlKDS6dNg?orgId=1

@petuhovskiy petuhovskiy added t/bug Issue Type: Bug c/autoscaling/autoscaler-agent Component: autoscaling: autoscaler-agent labels Jan 30, 2025
@petuhovskiy
Member Author

I looked at one specific case and collected the logs for it: https://gist.github.com/petuhovskiy/2d05fb76bc34fd085588d6845a2ac09b

(explore https://neonprod.grafana.net/goto/PKUAoY5NR?orgId=1)

At first glance, it looks like the response from the NeonVM task arrived after the plugin had downsized the permit. That made the current state, s.VM.Using(), greater than s.Plugin.Permit, which caused the panic.

Will try to think of a fix tomorrow.
