Mark Pilgrim explains in complete detail why insisting that clients reject invalid XML out of hand is a bad idea. The best part is that, as I write this, he has five perfect examples:
Norman Walsh (invalid XML), Danny Ayers (invalid XML), Brent Simmons (invalid XML), Nick Bradbury (invalid XML), and Joe Gregorio (invalid XML claiming to be HTML) have all denounced me as a heretic for pointing out that, perhaps, rejecting invalid XML on the client side is a bad idea. The reason I know that they have denounced me is that I read what they had to say, and the reason I was able to read what they had to say is that my browser is very forgiving of all their various XML wellformedness and validity errors.
All of those folks have been extremely insistent that aggregators should reject bad content out of hand - and all of them have invalid feeds right now. The lot of them need to read the rest of Mark's post and consider carefully what he has to say. Here's how I checked the validity myself:
I used this code to check whether a feed was valid XML. It simply grabs the source XML and tries to parse it, without handling any errors:
| source parser doc |
source := 'http://inessential.com/xml/rss.xml'.
parser := XMLParser new.
parser validate: false.    "check well-formedness only, not DTD validity"
doc := (HttpClient new get: source) contents.    "fetch the raw feed source"
^parser parse: doc    "raises an error on any well-formedness problem"
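For readers without a Smalltalk image handy, a rough equivalent of the strict check can be sketched in Python with the standard library (this is my hypothetical translation, not BottomFeeder code; `is_well_formed` and `broken_feed` are names I made up, and the real check fetches the feed over HTTP rather than using an inline string):

```python
# Minimal sketch of a strict well-formedness check, assuming Python's
# expat-backed ElementTree stands in for the Smalltalk XMLParser above.
import xml.etree.ElementTree as ET

def is_well_formed(xml_text):
    """Return True if xml_text parses as well-formed XML, else False."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False

# An unescaped ampersand is a classic feed well-formedness error.
broken_feed = "<rss><channel><title>AT&T News</title></channel></rss>"
print(is_well_formed(broken_feed))   # → False: a strict parser rejects it
```

A conforming XML processor treats any well-formedness error as fatal, which is exactly why the strict parse below fails on these feeds.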
I used this snippet of BottomFeeder code to make sure that the invalid XML was actually being handled:
| doc cls target feed |
doc := Constructor
    documentFromURL: 'http://bitworking.org/index.rss'
    forceUpdate: true
    useMaskedAgent: false.
cls := Constructor determineClassToHandle: doc.    "pick the handler class for this feed"
target := cls objectForData.
feed := cls
    processDocument: doc
    from: 'http://bitworking.org/index.rss'
    into: target.
In each case, the simple parse fails with an error, while the BottomFeeder framework deals with the error and produces the content I want to see.
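The contrast can be sketched in a few lines of Python, again as an illustration rather than the actual framework code: the standard library's tolerant `HTMLParser` stands in here for BottomFeeder's recovery logic, and `broken_feed` and `TextCollector` are names of my own invention.

```python
# Strict vs. forgiving: the same broken markup that kills a strict XML
# parse still yields the text a reader wants to see.
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

broken_feed = "<rss><channel><title>AT&T News</title></channel></rss>"

# Strict parse: fails outright on the unescaped ampersand.
try:
    ET.fromstring(broken_feed)
    strict_ok = True
except ET.ParseError:
    strict_ok = False

# Lenient parse: a tag-soup parser recovers the character data anyway.
class TextCollector(HTMLParser):
    def __init__(self):
        super().__init__()
        self.text = []
    def handle_data(self, data):
        self.text.append(data)

collector = TextCollector()
collector.feed(broken_feed)
recovered = "".join(collector.text)

print(strict_ok)    # → False: strict rejection loses the content
print(recovered)    # the title text survives the lenient parse
```

Rejecting on the first error throws away everything; recovering, as browsers do, keeps the content readable.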