Detect fuzzing issues by considering past results #2054

Open
phi-go opened this issue Feb 4, 2025 · 2 comments

Comments

@phi-go (Contributor) commented Feb 4, 2025

Hello, as part of some research we analyzed fuzzer performance degradation by looking at the reasons why fuzzing coverage decreases for C/C++ projects in OSS-Fuzz. We found several types of issues that are easier to detect by comparing against past reports.

I would be happy to implement these metrics if you are interested.

  • Detecting coverage drops would be a generic way to detect degradation; this is already discussed in idea: treat a major coverage drop an issue! google/oss-fuzz#11398. A threshold would need to be decided, either a percentage or an absolute number of lines (a sketch of such checks follows this list).
  • A common reason for large coverage drops is the vendoring of third-party library code, though sometimes it is also project-specific code. If you agree that library code should not be included in the coverage measurement, large changes should raise an alert and then be excluded. See grpc-httpjson-transcoding as an example: the project itself is a few hundred lines of code with close to 100% coverage, but it vendored 100k lines of library code.
  • Compare the fuzz targets over time. It sometimes happens that a project starts to have a partial build failure that only stops one (or a few) fuzz targets from building, while not necessarily causing a build-failure issue to be created for the project. For example, this happened with curl: idea: treat a major coverage drop an issue! google/oss-fuzz#11398 (comment)
  • The number of corpus entries is normally quite stable, but due to the way coverage is collected it can fluctuate and drop to a fraction of the real size: Reported coverage results do not match corpus google/oss-fuzz#12986 and Understanding inconsistent coverage reports google/oss-fuzz#11935. This could be detected by looking at past corpus sizes, though, if I understand correctly, the seed corpus is combined across fuzz targets? Alternatively, an expected number of corpus entries per covered branch/line could be decided; for example, covering 10k lines with five corpus entries does not seem like effective fuzzing.
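
To make the ideas above more concrete, here is a minimal sketch in Python of what such history-based checks could look like. The `Report` data model, the function name, and all thresholds are hypothetical placeholders for discussion, not the actual report format or API of this project:

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Report:
    """One historical coverage report (hypothetical model, not the real report format)."""
    covered_lines: int
    corpus_sizes: Dict[str, int]  # fuzz target name -> number of corpus entries


def detect_degradation(history: List[Report], current: Report,
                       rel_drop: float = 0.2, min_abs_drop: int = 1000) -> List[str]:
    """Compare the current report against past reports and flag suspicious changes."""
    issues: List[str] = []
    if not history:
        return issues

    # 1. Coverage drop: both thresholds are placeholders and would need tuning.
    baseline = max(r.covered_lines for r in history)
    drop = baseline - current.covered_lines
    if drop >= min_abs_drop and drop / baseline >= rel_drop:
        issues.append(f"coverage dropped by {drop} lines ({drop / baseline:.0%})")

    # 2. Fuzz targets that existed before but are missing now (possible partial build failure).
    previous_targets = set().union(*(r.corpus_sizes.keys() for r in history))
    for missing in sorted(previous_targets - set(current.corpus_sizes)):
        issues.append(f"fuzz target '{missing}' no longer present")

    # 3. Corpus sizes that collapsed to a fraction of their historical size.
    for name, entries in current.corpus_sizes.items():
        past = [r.corpus_sizes[name] for r in history if name in r.corpus_sizes]
        if past and entries < 0.5 * max(past):
            issues.append(f"corpus of '{name}' shrank from {max(past)} to {entries} entries")

    return issues
```

The thresholds used here (20% relative or 1000 lines absolute for coverage, 50% shrinkage for corpus size) are only illustrative defaults; picking sensible values is exactly the open question from the first bullet.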

This is also related to diffing runs: #734

I can also provide more examples if you want; I just wanted to keep it short.

@DavidKorczynski (Contributor) commented:

I like these ideas a lot and would be more than happy to review PRs.

Regarding third-party code, my personal position is that any third-party code in your target is, from a security standpoint, the same as your own code, as long as it's reachable/triggerable from untrusted input. So I think it's a bit more nuanced than just excluding third-party code.

In general I like the direction of these ideas and would be happy to land them. I think these would require most changes to be done in the webapp rather than core, but I am happy in either case to review and get PRs landed.

@phi-go (Contributor, Author) commented Feb 11, 2025

Happy to hear you are interested. It will take a bit before I have some real results as I'm still getting familiar with the code.

Regarding third-party code, my personal position is that any third-party code in your target is, from a security standpoint, the same as your own code, as long as it's reachable/triggerable from untrusted input. So I think it's a bit more nuanced than just excluding third-party code.

I understand your point to be that third-party code included in the project can have the same impact on security as project code. I definitely agree; however, what I am not quite sure about is who is responsible for testing/fuzzing the third-party code. So maybe we can discuss this a bit.

Thinking about this some more, we could differentiate between:

  • (1.) Code that is actually vendored, i.e., copied into the repo
  • Code that is included only as a dependency; this can be split in two again:
    • (2.) a dependency that is already fuzzed separately
    • (3.) a dependency that is not fuzzed separately

I would only exclude code coverage for category 2 (a rough sketch of such filtering follows below). I guess the alternative would be to duplicate the fuzzer harnesses for this dependency, which seems wasteful to me. There is, however, the argument that the project might use the library code in a specific way that is not already tested for.
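
As a rough sketch, excluding a category-2 dependency from the coverage metric could be as simple as filtering a file-path-to-coverage mapping by path prefix. The prefixes and the mapping shape below are assumptions for illustration; the real exclusion list would have to come from project configuration:

```python
from typing import Dict, Iterable

# Hypothetical prefixes for dependencies that are already fuzzed separately (category 2).
SEPARATELY_FUZZED_DEPS = ("third_party/protobuf/", "third_party/grpc/")


def project_only_coverage(per_file_coverage: Dict[str, float],
                          excluded_prefixes: Iterable[str] = SEPARATELY_FUZZED_DEPS) -> Dict[str, float]:
    """Drop files under an excluded dependency prefix from a path -> coverage mapping."""
    return {
        path: cov
        for path, cov in per_file_coverage.items()
        if not any(path.startswith(prefix) for prefix in excluded_prefixes)
    }
```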

For me the big reason to exclude code coverage of these dependencies is to make the coverage metric more meaningful. Coming back to the grpc-httpjson-transcoding example, I actually made a mistake: the code is not vendored but should be of category 2. So if the "real" coverage of this project drops, we would not really know; a current introspector report also seems to suggest that there is hardly any fuzzing going on. Is this just because the runtime coverage is higher than the statically reachable code?
