WAF Harvester parsing issues #309

benjwadams · 2023-05-22T18:03:24Z

WAF harvesting can fail to parse many things that are a de facto WAF, such as this listing: https://gcoos4.tamu.edu/erddap/metadata/iso19115/xml/

Because the harvester looks explicitly for "a href", anything that doesn't follow that exact string ordering will fail to harvest. Is there any reason a proper XML parsing library isn't used to find links, instead of a general-purpose parsing library, which has known pitfalls when parsing XML?
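To illustrate the kind of approach meant here, below is a minimal sketch that collects links with the standard library's `html.parser` instead of matching the literal string "a href". It is not the harvester's current code: `LinkCollector` and `extract_links` are hypothetical names, and the sketch assumes the listing is served as HTML with anchors pointing at the metadata documents.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Hypothetical helper: collect href values from every <a> tag."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases tag and attribute names, so <A HREF=...>
        # and <a class="x" href=...> are both handled the same way.
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                self.hrefs.append(value)


def extract_links(html_text, base_url):
    """Return absolute URLs for all anchors found in html_text."""
    collector = LinkCollector()
    collector.feed(html_text)
    collector.close()
    # Resolve relative links against the listing URL.
    return [urljoin(base_url, href) for href in collector.hrefs]
```

Because the attributes arrive as parsed name/value pairs, attribute order, quoting style, letter case and line breaks inside the tag no longer matter. Filtering out parent-directory links or non-XML files would still be needed on top of this.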

Also, on the above link, the "apache" parser is used because of the "Server" header, even though this is clearly not an Apache directory listing but rather a reverse-proxied application. This was difficult to track down when I had to create custom logic for the "other" parser to account for some of the shortcomings of the WAF parser mentioned above.

amercader · 2023-07-06T12:09:37Z

@benjwadams, you are right that the parser used in the WAF harvester is very brittle. Any improvements on that front would be a great contribution.
