Fix timeout bug on very long command output #687
+5
−3
SUMMARY
I was hitting timeouts on IOS XE and IOS XR devices with very long configurations (more than 50k lines / 2M bytes). On those same devices, a "show running-config" (with the pager disabled) over a plain OpenSSH client takes about 10-15 seconds.
Even for smaller configurations (20k lines / 700k bytes), I had to raise command_timeout to 300 for my Ansible playbook to succeed.
After some debugging, I found that the problem is the regex search for errors and prompts in network_cli: it runs on the full buffer after every 4096 bytes received from the wire, so the total parsing time grows roughly quadratically with the output size.
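As a rough back-of-the-envelope check, assume each read delivers exactly c = 4096 bytes and each regex pass scans the whole buffer once (both are simplifications, not measurements). The bytes scanned for an output of N bytes are then about:

$$
\sum_{k=1}^{N/c} k\,c \;=\; \frac{c}{2}\cdot\frac{N}{c}\left(\frac{N}{c}+1\right) \;\approx\; \frac{N^2}{2c}
$$

For the 2188377-byte configuration in the timings further down, that is roughly 2188377² / 8192 ≈ 5.8 × 10^8 bytes scanned per pattern. Searching only the last 1k after each read would scan about (N/c) × 1024 ≈ 5.5 × 10^5 bytes.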
My change restricts the regex search to the last 1k of the buffer; I think there is no chance of a prompt or error being longer than 1k.
It prevents the timeout issues and also greatly speeds up many tasks on those devices.
Currently, as a proof of concept, the limit is hardcoded in the source code, but if someone thinks it would benefit from being an option, I could look into changing that.
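To illustrate the idea, here is a minimal, self-contained sketch of tail-only matching. It is not the actual network_cli code: `TAIL_BYTES`, `PROMPT_RE`, `read_until_prompt` and `fake_device_output` are hypothetical names made up for this example, and the prompt pattern is only illustrative.

```python
import re

# Hypothetical constant for this sketch; the PR hardcodes a 1k limit.
TAIL_BYTES = 1024

# Illustrative prompt pattern only (e.g. "router1#"), not the plugin's real regexes.
PROMPT_RE = re.compile(rb"[\r\n][\w\-.:/]+[>#]\s*$")


def read_until_prompt(chunks, prompt_re=PROMPT_RE, tail_bytes=TAIL_BYTES):
    """Accumulate chunks and search only the last `tail_bytes` of the buffer."""
    buffer = bytearray()
    for chunk in chunks:
        buffer += chunk
        # The window is cut from the accumulated buffer, so a prompt that is
        # split across two reads is still seen once both halves have arrived.
        window = bytes(buffer[-tail_bytes:])
        if prompt_re.search(window):
            return bytes(buffer)
    raise TimeoutError("prompt not found before input ended")


def fake_device_output(n_lines, chunk_size=4096):
    """Simulate a long command output delivered in fixed-size reads."""
    data = b"interface GigabitEthernet0/0\r\n" * n_lines + b"router1#"
    for i in range(0, len(data), chunk_size):
        yield data[i:i + chunk_size]


if __name__ == "__main__":
    output = read_until_prompt(fake_device_output(50000))
    print(len(output), output[-8:])
```

With this approach the work per read is bounded by `tail_bytes` instead of growing with the buffer. The trade-off is that a prompt or error message longer than the window would be missed, which is why the limit might be worth exposing as an option.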
My initial issue: https://forum.ansible.com/t/ios-config-and-iosxr-config-for-very-long-configs-50k-lines/40757
ISSUE TYPE
COMPONENT NAME
network_cli
ADDITIONAL INFORMATION
Before changes:
After change: OK, no error
I also did some timing checks on some devices using this command:
time ansible -vvvv -i inventory.yaml --playbook-dir . <DEVICE> -m cisco.ios.ios_command -a '{"commands":["sh runn"]}' -c ansible.netcommon.network_cli
Device with config of 55967 lines / 2188377 bytes
Before: Timeout
After: Ok in 25s
Device with config of 24651 lines / 717168 bytes
Before: Ok in 1m16s
After: Ok in 13s
Device with config of 2740 lines / 67080 bytes
Before: Ok in 13s
After: Ok in 11s