Reduce CI fuzz iterations as we're now timing out #3691

Open
wants to merge 3 commits into
base: main

TheBlueMatt
Collaborator

I think GitHub has slowed down the runners, so our fuzz tests are now timing out. Here we just reduce the iteration count a bit.

ldk-reviews-bot commented Mar 29, 2025

I've assigned @joostjager as a reviewer!
I'll wait for their review and will help manage the review process.
Once they submit their review, I'll check if a second reviewer would be helpful.

Contributor

@tnull tnull left a comment

ACK, seems we hit yet another failing case in the router target now: #3692

tnull previously approved these changes Mar 31, 2025
 else
-	HFUZZ_RUN_ARGS="$HFUZZ_RUN_ARGS -N1000000"
+	HFUZZ_RUN_ARGS="$HFUZZ_RUN_ARGS -N500000"
Contributor

Any specific reasoning behind the 10x, 10x and 2x reductions?

Collaborator Author

No. I probably could have benchmarked, but in general there's not a lot of value in full_stack_target in CI because it's just too dense to make any real progress, and the same goes for chanmon_consistency_target.

Contributor

joostjager commented Apr 1, 2025

I checked out a fuzzer run on the attr failures PR and grepped the logs. Interestingly it seems like indeed chanmon_consistency_target takes nearly 3 hours. But the other one you mention, full_stack_target is only 18 seconds?

process_network_graph_target is also slow (2+ hours), and looking at the fuzz log in #3687, it doesn't seem to be resolved with that change.

Maybe I am not interpreting correctly though.

Also from this log, you'd say that all the fast tests (<30 sec) don't need their iteration count changed.

base32_target.rs
Summary iterations:1000002 time:198 speed:5050 crashes_count:0 timeout_count:0 new_units_added:449 slowest_unit_ms:17 guard_nb:566357 branch_coverage_percent:0 peak_rss_mb:42

bech32_parse_target.rs
Summary iterations:1000002 time:22 speed:45454 crashes_count:0 timeout_count:0 new_units_added:897 slowest_unit_ms:16 guard_nb:566357 branch_coverage_percent:0 peak_rss_mb:42

bolt11_deser_target.rs
Summary iterations:1000002 time:18 speed:55555 crashes_count:0 timeout_count:0 new_units_added:211 slowest_unit_ms:17 guard_nb:566357 branch_coverage_percent:0 peak_rss_mb:42

chanmon_consistency_target.rs
Summary iterations:100002 time:9769 speed:10 crashes_count:0 timeout_count:87 new_units_added:3435 slowest_unit_ms:1139 guard_nb:566357 branch_coverage_percent:2 peak_rss_mb:42

chanmon_deser_target.rs
Summary iterations:1000002 time:18 speed:55555 crashes_count:0 timeout_count:0 new_units_added:963 slowest_unit_ms:17 guard_nb:566357 branch_coverage_percent:0 peak_rss_mb:42

fromstr_to_netaddress_target.rs
Summary iterations:1000002 time:17 speed:58823 crashes_count:0 timeout_count:0 new_units_added:160 slowest_unit_ms:17 guard_nb:566357 branch_coverage_percent:0 peak_rss_mb:42

full_stack_target.rs
Summary iterations:1000002 time:18 speed:55555 crashes_count:0 timeout_count:0 new_units_added:797 slowest_unit_ms:16 guard_nb:566357 branch_coverage_percent:0 peak_rss_mb:42

indexedmap_target.rs
Summary iterations:1000002 time:648 speed:1543 crashes_count:0 timeout_count:0 new_units_added:698 slowest_unit_ms:19 guard_nb:566357 branch_coverage_percent:0 peak_rss_mb:42

invoice_deser_target.rs
Summary iterations:1000002 time:20 speed:50000 crashes_count:0 timeout_count:0 new_units_added:2955 slowest_unit_ms:17 guard_nb:566357 branch_coverage_percent:0 peak_rss_mb:42

invoice_request_deser_target.rs
Summary iterations:1000002 time:19 speed:52631 crashes_count:0 timeout_count:0 new_units_added:2244 slowest_unit_ms:16 guard_nb:566357 branch_coverage_percent:0 peak_rss_mb:42

msg_accept_channel_target.rs
Summary iterations:1000002 time:23 speed:43478 crashes_count:0 timeout_count:0 new_units_added:1115 slowest_unit_ms:16 guard_nb:566357 branch_coverage_percent:0 peak_rss_mb:42

msg_accept_channel_v2_target.rs
Summary iterations:1000002 time:246 speed:4065 crashes_count:0 timeout_count:0 new_units_added:1031 slowest_unit_ms:16 guard_nb:566357 branch_coverage_percent:0 peak_rss_mb:42

msg_announcement_signatures_target.rs
Summary iterations:1000002 time:18 speed:55555 crashes_count:0 timeout_count:0 new_units_added:322 slowest_unit_ms:16 guard_nb:566357 branch_coverage_percent:0 peak_rss_mb:42

msg_channel_announcement_target.rs
Summary iterations:1000002 time:20 speed:50000 crashes_count:0 timeout_count:0 new_units_added:413 slowest_unit_ms:16 guard_nb:566357 branch_coverage_percent:0 peak_rss_mb:42

msg_channel_details_target.rs
Summary iterations:1000002 time:18 speed:55555 crashes_count:0 timeout_count:0 new_units_added:993 slowest_unit_ms:16 guard_nb:566357 branch_coverage_percent:0 peak_rss_mb:42

msg_channel_ready_target.rs
Summary iterations:1000002 time:18 speed:55555 crashes_count:0 timeout_count:0 new_units_added:374 slowest_unit_ms:17 guard_nb:566357 branch_coverage_percent:0 peak_rss_mb:42

msg_channel_reestablish_target.rs
Summary iterations:1000002 time:18 speed:55555 crashes_count:0 timeout_count:0 new_units_added:434 slowest_unit_ms:16 guard_nb:566357 branch_coverage_percent:0 peak_rss_mb:42

msg_channel_update_target.rs
Summary iterations:1000002 time:19 speed:52631 crashes_count:0 timeout_count:0 new_units_added:210 slowest_unit_ms:16 guard_nb:566357 branch_coverage_percent:0 peak_rss_mb:42

msg_closing_signed_target.rs
Summary iterations:1000002 time:18 speed:55555 crashes_count:0 timeout_count:0 new_units_added:380 slowest_unit_ms:17 guard_nb:566357 branch_coverage_percent:0 peak_rss_mb:42

msg_commitment_signed_target.rs
Summary iterations:1000002 time:18 speed:55555 crashes_count:0 timeout_count:0 new_units_added:697 slowest_unit_ms:16 guard_nb:566357 branch_coverage_percent:0 peak_rss_mb:42

msg_decoded_onion_error_packet_target.rs
Summary iterations:1000002 time:18 speed:55555 crashes_count:0 timeout_count:0 new_units_added:178 slowest_unit_ms:16 guard_nb:566357 branch_coverage_percent:0 peak_rss_mb:42

msg_error_message_target.rs
Summary iterations:1000002 time:19 speed:52631 crashes_count:0 timeout_count:0 new_units_added:108 slowest_unit_ms:16 guard_nb:566357 branch_coverage_percent:0 peak_rss_mb:42

msg_funding_created_target.rs
Summary iterations:1000002 time:18 speed:55555 crashes_count:0 timeout_count:0 new_units_added:342 slowest_unit_ms:16 guard_nb:566357 branch_coverage_percent:0 peak_rss_mb:42

msg_funding_signed_target.rs
Summary iterations:1000002 time:18 speed:55555 crashes_count:0 timeout_count:0 new_units_added:304 slowest_unit_ms:16 guard_nb:566357 branch_coverage_percent:0 peak_rss_mb:42

msg_gossip_timestamp_filter_target.rs
Summary iterations:1000002 time:18 speed:55555 crashes_count:0 timeout_count:0 new_units_added:294 slowest_unit_ms:17 guard_nb:566357 branch_coverage_percent:0 peak_rss_mb:42

msg_init_target.rs
Summary iterations:1000002 time:18 speed:55555 crashes_count:0 timeout_count:0 new_units_added:1006 slowest_unit_ms:17 guard_nb:566357 branch_coverage_percent:0 peak_rss_mb:42

msg_node_announcement_target.rs
Summary iterations:1000002 time:21 speed:47619 crashes_count:0 timeout_count:0 new_units_added:851 slowest_unit_ms:16 guard_nb:566357 branch_coverage_percent:0 peak_rss_mb:42

msg_open_channel_target.rs
Summary iterations:1000002 time:35 speed:28571 crashes_count:0 timeout_count:0 new_units_added:1163 slowest_unit_ms:17 guard_nb:566357 branch_coverage_percent:0 peak_rss_mb:42

msg_open_channel_v2_target.rs
Summary iterations:1000002 time:24 speed:41666 crashes_count:0 timeout_count:0 new_units_added:1289 slowest_unit_ms:17 guard_nb:566357 branch_coverage_percent:0 peak_rss_mb:42

msg_ping_target.rs
Summary iterations:1000002 time:19 speed:52631 crashes_count:0 timeout_count:0 new_units_added:91 slowest_unit_ms:16 guard_nb:566357 branch_coverage_percent:0 peak_rss_mb:42

msg_pong_target.rs
Summary iterations:1000002 time:20 speed:50000 crashes_count:0 timeout_count:0 new_units_added:83 slowest_unit_ms:17 guard_nb:566357 branch_coverage_percent:0 peak_rss_mb:42

msg_query_channel_range_target.rs
Summary iterations:1000002 time:18 speed:55555 crashes_count:0 timeout_count:0 new_units_added:334 slowest_unit_ms:16 guard_nb:566357 branch_coverage_percent:0 peak_rss_mb:42

msg_query_short_channel_ids_target.rs
Summary iterations:1000002 time:19 speed:52631 crashes_count:0 timeout_count:0 new_units_added:108 slowest_unit_ms:17 guard_nb:566357 branch_coverage_percent:0 peak_rss_mb:42

msg_reply_channel_range_target.rs
Summary iterations:1000002 time:18 speed:55555 crashes_count:0 timeout_count:0 new_units_added:158 slowest_unit_ms:16 guard_nb:566357 branch_coverage_percent:0 peak_rss_mb:42

msg_reply_short_channel_ids_end_target.rs
Summary iterations:1000002 time:18 speed:55555 crashes_count:0 timeout_count:0 new_units_added:337 slowest_unit_ms:16 guard_nb:566357 branch_coverage_percent:0 peak_rss_mb:42

msg_revoke_and_ack_target.rs
Summary iterations:1000002 time:18 speed:55555 crashes_count:0 timeout_count:0 new_units_added:360 slowest_unit_ms:17 guard_nb:566357 branch_coverage_percent:0 peak_rss_mb:42

msg_shutdown_target.rs
Summary iterations:1000002 time:18 speed:55555 crashes_count:0 timeout_count:0 new_units_added:333 slowest_unit_ms:16 guard_nb:566357 branch_coverage_percent:0 peak_rss_mb:42

msg_splice_ack_target.rs
Summary iterations:1000002 time:18 speed:55555 crashes_count:0 timeout_count:0 new_units_added:387 slowest_unit_ms:17 guard_nb:566357 branch_coverage_percent:0 peak_rss_mb:42

msg_splice_init_target.rs
Summary iterations:1000002 time:19 speed:52631 crashes_count:0 timeout_count:0 new_units_added:401 slowest_unit_ms:17 guard_nb:566357 branch_coverage_percent:0 peak_rss_mb:42

msg_splice_locked_target.rs
Summary iterations:1000002 time:18 speed:55555 crashes_count:0 timeout_count:0 new_units_added:319 slowest_unit_ms:16 guard_nb:566357 branch_coverage_percent:0 peak_rss_mb:42

msg_stfu_target.rs
Summary iterations:1000002 time:18 speed:55555 crashes_count:0 timeout_count:0 new_units_added:319 slowest_unit_ms:16 guard_nb:566357 branch_coverage_percent:0 peak_rss_mb:42

msg_tx_abort_target.rs
Summary iterations:1000002 time:18 speed:55555 crashes_count:0 timeout_count:0 new_units_added:403 slowest_unit_ms:17 guard_nb:566357 branch_coverage_percent:0 peak_rss_mb:42

msg_tx_ack_rbf_target.rs
Summary iterations:1000002 time:18 speed:55555 crashes_count:0 timeout_count:0 new_units_added:366 slowest_unit_ms:16 guard_nb:566357 branch_coverage_percent:0 peak_rss_mb:42

msg_tx_add_input_target.rs
Summary iterations:1000002 time:24 speed:41666 crashes_count:0 timeout_count:0 new_units_added:1270 slowest_unit_ms:16 guard_nb:566357 branch_coverage_percent:0 peak_rss_mb:42

msg_tx_add_output_target.rs
Summary iterations:1000002 time:18 speed:55555 crashes_count:0 timeout_count:0 new_units_added:390 slowest_unit_ms:16 guard_nb:566357 branch_coverage_percent:0 peak_rss_mb:42

msg_tx_complete_target.rs
Summary iterations:1000002 time:18 speed:55555 crashes_count:0 timeout_count:0 new_units_added:285 slowest_unit_ms:16 guard_nb:566357 branch_coverage_percent:0 peak_rss_mb:42

msg_tx_init_rbf_target.rs
Summary iterations:1000002 time:18 speed:55555 crashes_count:0 timeout_count:0 new_units_added:393 slowest_unit_ms:16 guard_nb:566357 branch_coverage_percent:0 peak_rss_mb:42

msg_tx_remove_input_target.rs
Summary iterations:1000002 time:18 speed:55555 crashes_count:0 timeout_count:0 new_units_added:297 slowest_unit_ms:16 guard_nb:566357 branch_coverage_percent:0 peak_rss_mb:42

msg_tx_remove_output_target.rs
Summary iterations:1000002 time:17 speed:58823 crashes_count:0 timeout_count:0 new_units_added:322 slowest_unit_ms:16 guard_nb:566357 branch_coverage_percent:0 peak_rss_mb:42

msg_tx_signatures_target.rs
Summary iterations:1000002 time:52 speed:19230 crashes_count:0 timeout_count:0 new_units_added:851 slowest_unit_ms:16 guard_nb:566357 branch_coverage_percent:0 peak_rss_mb:42

msg_update_add_htlc_target.rs
Summary iterations:1000002 time:19 speed:52631 crashes_count:0 timeout_count:0 new_units_added:430 slowest_unit_ms:17 guard_nb:566357 branch_coverage_percent:0 peak_rss_mb:42

msg_update_fail_htlc_target.rs
Summary iterations:1000002 time:18 speed:55555 crashes_count:0 timeout_count:0 new_units_added:480 slowest_unit_ms:16 guard_nb:566357 branch_coverage_percent:0 peak_rss_mb:42

msg_update_fail_malformed_htlc_target.rs
Summary iterations:1000002 time:18 speed:55555 crashes_count:0 timeout_count:0 new_units_added:321 slowest_unit_ms:17 guard_nb:566357 branch_coverage_percent:0 peak_rss_mb:42

msg_update_fee_target.rs
Summary iterations:1000002 time:17 speed:58823 crashes_count:0 timeout_count:0 new_units_added:308 slowest_unit_ms:16 guard_nb:566357 branch_coverage_percent:0 peak_rss_mb:42

msg_update_fulfill_htlc_target.rs
Summary iterations:1000002 time:18 speed:55555 crashes_count:0 timeout_count:0 new_units_added:309 slowest_unit_ms:16 guard_nb:566357 branch_coverage_percent:0 peak_rss_mb:42

offer_deser_target.rs
Summary iterations:1000002 time:19 speed:52631 crashes_count:0 timeout_count:0 new_units_added:1333 slowest_unit_ms:16 guard_nb:566357 branch_coverage_percent:0 peak_rss_mb:42

onion_hop_data_target.rs
Summary iterations:1000002 time:18 speed:55555 crashes_count:0 timeout_count:0 new_units_added:981 slowest_unit_ms:16 guard_nb:566357 branch_coverage_percent:0 peak_rss_mb:42

onion_message_target.rs
Summary iterations:1000002 time:24 speed:41666 crashes_count:0 timeout_count:0 new_units_added:395 slowest_unit_ms:16 guard_nb:566357 branch_coverage_percent:0 peak_rss_mb:42

peer_crypt_target.rs
Summary iterations:1000002 time:27 speed:37037 crashes_count:0 timeout_count:0 new_units_added:315 slowest_unit_ms:16 guard_nb:566357 branch_coverage_percent:0 peak_rss_mb:42

process_network_graph_target.rs
Summary iterations:735922 time:8434 speed:87 crashes_count:0 timeout_count:0 new_units_added:30 slowest_unit_ms:40 guard_nb:566357 branch_coverage_percent:0 peak_rss_mb:161
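
A listing like the one above can be pulled out of a raw fuzz log with a short script rather than by hand. The following is only a minimal sketch, assuming each target name sits on its own line ending in `_target.rs` and is followed by a `Summary` line of `key:value` pairs as shown in this comment. The function names and the 30-second threshold are illustrative and not part of the repository, and the sample `Summary` lines are abbreviated.

```python
import re

# Matches the "key:value" pairs that make up a Summary line.
PAIR_RE = re.compile(r"(\w+):(\d+)")


def parse_summaries(log_text):
    """Return {target_name: {field: int}} for each target/Summary pair."""
    results = {}
    current_target = None
    for raw_line in log_text.splitlines():
        line = raw_line.strip()
        if line.endswith("_target.rs"):
            # Remember the target so the next Summary line can be attached to it.
            current_target = line[: -len(".rs")]
        elif line.startswith("Summary ") and current_target is not None:
            results[current_target] = {k: int(v) for k, v in PAIR_RE.findall(line)}
            current_target = None
    return results


def slow_targets(results, threshold_secs=30):
    """List (target, seconds) pairs above the threshold, slowest first."""
    slow = [
        (name, fields["time"])
        for name, fields in results.items()
        if fields["time"] > threshold_secs
    ]
    return sorted(slow, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Abbreviated sample built from three of the entries above.
    sample = """\
full_stack_target.rs
Summary iterations:1000002 time:18 speed:55555 crashes_count:0 timeout_count:0
chanmon_consistency_target.rs
Summary iterations:100002 time:9769 speed:10 crashes_count:0 timeout_count:87
process_network_graph_target.rs
Summary iterations:735922 time:8434 speed:87 crashes_count:0 timeout_count:0
"""
    for name, secs in slow_targets(parse_summaries(sample)):
        print(f"{name}: {secs}s ({secs / 3600:.1f}h)")
```

With the sample above, only the two multi-hour targets are reported, which matches the reading that the fast targets do not need their iteration counts changed.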

@TheBlueMatt
Collaborator Author

> I checked out a fuzzer run on the attr failures PR and grepped the logs. Interestingly it seems like indeed chanmon_consistency_target takes nearly 3 hours. But the other one you mention, full_stack_target is only 18 seconds?

Ah, thanks for doing that. Yea, I was going on the results when I fuzz with real corpuses, where full_stack_target can get quite slow. But given that we currently don't initialize our CI fuzzers with a real corpus, it really doesn't get into it too much. I did go ahead and update the CI fuzzer to use the hard-coded full_stack_target seeds, but it still doesn't spend much time in complicated paths.

> process_network_graph_target is also slow (2+ hours), and looking at the fuzz log in #3687, it doesn't seem to be resolved with that change.

Huh, interesting, I went ahead and slowed this one down though.

> Also from this log, you'd say that all the fast tests (<30 sec) don't need their iteration count changed.

Yep!

@TheBlueMatt TheBlueMatt force-pushed the 2025-03-fuzz-less branch 2 times, most recently from c87c3f3 to ad4fb1e Compare April 1, 2025 18:33

codecov bot commented Apr 1, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 89.16%. Comparing base (2e435de) to head (ad4fb1e).
Report is 6 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #3691      +/-   ##
==========================================
- Coverage   89.18%   89.16%   -0.02%     
==========================================
  Files         155      155              
  Lines      120796   120796              
  Branches   120796   120796              
==========================================
- Hits       107731   107710      -21     
- Misses      10415    10432      +17     
- Partials     2650     2654       +4     

@TheBlueMatt TheBlueMatt force-pushed the 2025-03-fuzz-less branch 5 times, most recently from 2773431 to b93b1d6 Compare April 2, 2025 22:23
When we made `test_node_counter_consistency` run more
aggressively, our `process_network_graph` fuzzer got materially
slower, resulting in consistent fuzz CI job timeouts.

Thus, here, we tweak the iteration count on all our fuzz jobs to
get them running in more consistent times.

We also reduce `full_stack_target` iterations further in
anticipation of a later commit which will start using our
hard-coded fuzz seeds, creating substantially more coverage and
slowing down fuzzing iterations.

In 3145168 we disabled
`test_node_counter_consistency` in debug builds since it can
make things very slow, including `lightning-rapid-gossip-sync`
tests.

We should, however, have kept it when fuzzing, since that gives us
testing of potential coverage gaps in normal tests.
This should materially improve our fuzzing coverage in CI.
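
The reductions in this PR were chosen without benchmarking. For comparison, one way to derive an iteration count from a measured run is to scale it to a time budget. The sketch below assumes throughput stays roughly constant, which will not hold once seeds or extra checks change what each iteration does. The function name and the 600-second budget are hypothetical; the measured pairs are copied from the `Summary` lines posted earlier in this thread.

```python
def iterations_for_budget(iterations, elapsed_secs, budget_secs, round_to=1000):
    """Scale a measured run to a time budget, assuming constant throughput."""
    if elapsed_secs <= 0 or budget_secs <= 0:
        raise ValueError("elapsed_secs and budget_secs must be positive")
    per_sec = iterations / elapsed_secs
    estimate = int(per_sec * budget_secs)
    # Round down so the estimate errs on the side of finishing early, but
    # never return less than round_to: zero is not a useful iteration cap.
    return max(round_to, (estimate // round_to) * round_to)


# Measured (iterations, seconds) pairs from the Summary lines in this thread.
measured = {
    "chanmon_consistency_target": (100002, 9769),
    "process_network_graph_target": (735922, 8434),
}

BUDGET_SECS = 600  # hypothetical per-target budget, not a value from the PR

for target, (iters, secs) in measured.items():
    n = iterations_for_budget(iters, secs, BUDGET_SECS)
    print(f'{target}: HFUZZ_RUN_ARGS="$HFUZZ_RUN_ARGS -N{n}"')
```

Any number produced this way is only a starting point and would still need to be confirmed against an actual CI run.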
Contributor

@tnull tnull left a comment

LGTM

Fuzz failure is #3708

Contributor

@joostjager joostjager left a comment

Great that fuzzing again found an issue. This is awesome.

Before merging this, I think we still want to see a successful fuzz run well within the timeout?

@TheBlueMatt
Collaborator Author

Yea, happy to wait until we can at least run a fuzzing run and check that all the timings make sense.
