Failure to parse RSS feeds that use unencoded <body> #32
Comments
@daveaglick Can you send a link to the spec for the `<body>` element?
It looks like it was briefly considered as a replacement for content. I looked and couldn't find an official mention in any specification, but there are some references to it from around 2004: https://web.archive.org/web/20040217110945/http://www.thearchitect.co.uk/weblog/archives/2003/03/000116.html
Probably more important is that some blog engines continue to produce it, official specification or not, and that it's valid RSS: https://validator.w3.org/feed/check.cgi?url=http%3A%2F%2Ffeeds.feedburner.com%2FRockfordLhotka
The root of the problem in SyndicationFeedReaderWriter is that the body element often contains mixed content (because XHTML is mixed), but the streaming read mode of the parser requires matched node opens and closes. My PR solves the specific body problem by assuming that element is likely to have mixed content and treating it separately by loading it into a single |
Looks like `xhtml:body` is used in RSS feeds. Given an element in a feed with the XHTML namespace and the element name `body`, readers should expect unencoded XHTML. An example
I think we can add this behavior in.
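To make the namespace rule above concrete, here is a minimal sketch in Python (not the library's C# code). It matches an element by XHTML namespace plus local name `body`, so a prefix other than `xhtml:` would still be recognized. The sample feed and the helper name `is_xhtml_body` are made up for illustration.

```python
import xml.etree.ElementTree as ET

# The standard XHTML namespace URI.
XHTML_NS = "http://www.w3.org/1999/xhtml"

# Hypothetical sample feed: one item with xhtml:body, one with a plain body.
FEED = """<rss version="2.0" xmlns:xhtml="http://www.w3.org/1999/xhtml"><channel>
<item><title>First</title><xhtml:body><xhtml:p>Unencoded <xhtml:em>XHTML</xhtml:em></xhtml:p></xhtml:body></item>
<item><title>Second</title><body><p>No namespace</p></body></item>
</channel></rss>"""


def is_xhtml_body(element):
    """Return True when the element is named 'body' in the XHTML namespace."""
    # ElementTree stores qualified names as "{namespace-uri}local-name".
    return element.tag == "{%s}body" % XHTML_NS


root = ET.fromstring(FEED)
matches = []
for item in root.iter("item"):
    for child in item:
        if is_xhtml_body(child):
            # itertext() flattens the mixed content into plain text.
            matches.append((item.findtext("title"), "".join(child.itertext())))
```

Only the first item is collected here; whether a plain, un-namespaced `<body>` should get the same treatment is a separate decision that this sketch does not make.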
Some feeds that use `<body>` instead of encoded `<content>` elements are failing. It appears that the parser gets confused about open/close using the `XmlReader` in the `<body>` element since it attempts to construct `SyndicationContent` for all the nested XHTML. This results in exceptions like:
For example, see http://feeds.feedburner.com/RockfordLhotka.
I'm going to try and fix this by skipping `<body>` elements and using their inner XML as the value for the outer `SyndicationContent`. Will submit a PR if that works.
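The idea of taking a `<body>` element's inner XML as one value, rather than walking its nested XHTML node by node, can be sketched as follows. This is an illustration in Python using the standard library's `xml.dom.pulldom` streaming parser, not the C# change proposed for SyndicationFeedReaderWriter. The sample feed, the `read_items` helper, and the dictionary shape are assumptions made for the example.

```python
from xml.dom import pulldom

# Hypothetical sample feed with unencoded, mixed-content XHTML in <body>.
FEED = """<rss version="2.0"><channel><item>
<title>Post</title>
<body><p>Hello <b>mixed</b> content</p></body>
</item></channel></rss>"""


def read_items(xml_text):
    """Stream through the feed, capturing each <body> as one raw XHTML string."""
    items = []
    current = None
    events = pulldom.parseString(xml_text)
    for event, node in events:
        if event == pulldom.START_ELEMENT:
            if node.tagName == "item":
                current = {}
            elif current is not None and node.tagName == "title":
                # Read the whole element, then join its text nodes.
                events.expandNode(node)
                current["title"] = "".join(
                    c.data for c in node.childNodes if c.nodeType == c.TEXT_NODE
                )
            elif current is not None and node.tagName == "body":
                # Consume everything up to the matching </body> in one step,
                # so nested XHTML never reaches the item-level loop.
                events.expandNode(node)
                current["body"] = "".join(c.toxml() for c in node.childNodes)
        elif event == pulldom.END_ELEMENT and node.tagName == "item":
            items.append(current)
            current = None
    return items


items = read_items(FEED)
```

Because `expandNode` reads through to the element's closing tag, the outer loop only ever sees item-level opens and closes, which is the property the streaming reader needs.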