Failure to parse RSS feeds that use unencoded <body> #32

daveaglick · 2018-05-07T21:24:25Z

Some feeds that use an unencoded <body> element instead of an encoded <content> element fail to parse. It appears that the parser loses track of element open/close boundaries in the XmlReader inside <body>, because it attempts to construct a SyndicationContent for all of the nested XHTML. This results in exceptions like:

'Element' is an invalid XmlNodeType

For example, see http://feeds.feedburner.com/RockfordLhotka. I'm going to try to fix this by skipping over the contents of <body> elements and using their inner XML as the value of the outer SyndicationContent. I'll submit a PR if that works.


jimmyca15 · 2018-05-08T16:27:42Z

@daveaglick Can you send a link to the spec for the <body> element? I'm having trouble finding it.

daveaglick · 2018-05-08T16:45:23Z

It looks like <body> was briefly considered as a replacement for <content>. I couldn't find an official mention in any specification, but there are some references to it from around 2004: https://web.archive.org/web/20040217110945/http://www.thearchitect.co.uk/weblog/archives/2003/03/000116.html

Probably more important is that some blog engines continue to produce it, official specification or not, and that such feeds pass the W3C feed validator: https://validator.w3.org/feed/check.cgi?url=http%3A%2F%2Ffeeds.feedburner.com%2FRockfordLhotka

The root of the problem in SyndicationFeedReaderWriter is that the <body> element often contains mixed content (because XHTML allows text and elements to be interleaved), but the parser's streaming read mode requires every node it opens to be matched by a close. My PR solves the specific <body> problem by assuming that this element is likely to contain mixed content and handling it separately: it loads the whole element into a single SyndicationContent value without attempting to read its children.
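A minimal sketch of the "read it as one opaque value" idea, written with Python's standard `xml.etree.ElementTree` rather than the library's C# `XmlReader` code. The `inner_xml` helper is hypothetical and is not part of SyndicationFeedReaderWriter or the PR; it only shows how mixed content (text interleaved with child elements) can be captured as a single string instead of being walked element by element.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape


def inner_xml(elem: ET.Element) -> str:
    """Serialize an element's content (text plus children) as one XML string.

    Hypothetical helper: the children are copied out verbatim rather than
    interpreted, mirroring the idea of treating <body> as one content value.
    """
    # Leading text before the first child element (the "mixed" part).
    parts = [escape(elem.text or "")]
    # ET.tostring also writes each child's tail text, so text between and
    # after child elements is preserved.
    parts.extend(ET.tostring(child, encoding="unicode") for child in elem)
    return "".join(parts)


body = ET.fromstring("<body>Intro text <p>para <b>bold</b> end</p> trailing</body>")
print(inner_xml(body))  # prints: Intro text <p>para <b>bold</b> end</p> trailing
```

In .NET, `XmlReader.ReadInnerXml` returns an element's content as a string in a similar way; whether the PR uses that call is not stated in this thread.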

jimmyca15 · 2018-05-08T17:57:07Z

@drago-draganov

Looks like xhtml:body is used in RSS feeds. Given an element in a feed with the xhtml namespace and element name body, readers should expect unencoded xhtml.

An example:

<rss version="2.0" xmlns:xhtml="http://www.w3.org/1999/xhtml">
  <channel>
    <title>xhtml test feed</title>
    <link>http://example.org</link>
    <description>Test feed for xhtml body element</description>
    <item>
      <title>Test Item</title> 
      <link>http://example.org/post1</link> 
      <description>Here we go!</description> 
      <xhtml:body><xhtml:div><xhtml:p>content here demonstrating use of &lt;xhtml:body&gt; element.</xhtml:p></xhtml:div></xhtml:body>
    </item>
    <item>
      <title>Test Item single xmlns</title> 
      <link>http://example.org/post1</link> 
      <description>Here we go!</description> 
      <body xmlns="http://www.w3.org/1999/xhtml"><div><p>content here demonstrating use of &lt;body&gt; element.</p></div></body>
    </item>
  </channel>
</rss>
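To illustrate the rule above (match the XHTML namespace plus the local name `body`, whatever prefix is used), here is a Python sketch run against a trimmed copy of the example feed. It uses the standard `xml.etree.ElementTree` module, not SyndicationFeedReaderWriter; `xhtml_bodies`, `XHTML_NS`, and `BODY_TAG` are hypothetical names, and Python 3.9+ is assumed (for `tostring(..., default_namespace=...)` and the `list[tuple[...]]` annotation).

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

XHTML_NS = "http://www.w3.org/1999/xhtml"
# ElementTree names namespaced elements as "{namespace-uri}localname".
BODY_TAG = f"{{{XHTML_NS}}}body"

# Trimmed copy of the example feed (channel metadata omitted).
FEED = """<rss version="2.0" xmlns:xhtml="http://www.w3.org/1999/xhtml">
  <channel>
    <item>
      <title>Test Item</title>
      <xhtml:body><xhtml:div><xhtml:p>content here demonstrating use of &lt;xhtml:body&gt; element.</xhtml:p></xhtml:div></xhtml:body>
    </item>
    <item>
      <title>Test Item single xmlns</title>
      <body xmlns="http://www.w3.org/1999/xhtml"><div><p>content here demonstrating use of &lt;body&gt; element.</p></div></body>
    </item>
  </channel>
</rss>"""


def xhtml_bodies(feed_xml: str) -> list[tuple[str, str]]:
    """Return (item title, inner XHTML) for each <item> carrying an XHTML body."""
    root = ET.fromstring(feed_xml)
    results = []
    for item in root.iter("item"):
        # <xhtml:body> and <body xmlns="..."> both resolve to BODY_TAG,
        # so matching on the namespace URI covers either spelling.
        body = item.find(BODY_TAG)
        if body is None:
            continue
        # Copy the children out verbatim as one string; default_namespace
        # avoids auto-generated "ns0:" prefixes in the serialized XHTML.
        inner = escape(body.text or "") + "".join(
            ET.tostring(child, encoding="unicode", default_namespace=XHTML_NS)
            for child in body
        )
        results.append((item.findtext("title", default=""), inner))
    return results


for title, inner in xhtml_bodies(FEED):
    print(title, "->", inner)
```

Matching on the namespace URI rather than the literal prefix is what lets both items be handled by one rule; a feed that bound some other prefix to the same URI would match as well.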

I think we can add this behavior in.

daveaglick added a commit to daveaglick/SyndicationFeedReaderWriter that referenced this issue May 8, 2018

Support for unencoded RSS bodies (dotnet#32)

bbbbd44

daveaglick mentioned this issue May 15, 2018

Support for unencoded RSS bodies (#32) #33

Open
