This post from BitWorking is everything you need to know about XML in one easy package. Here's what I mean:
When I tried to load it into an XML parser it failed to load. As it turns out, the file was riddled with character set encoding problems, in particular quote marks. After much hand tweaking I finally have it in a shape where a real XML parser can open it up. Now I can get on with the job of importing the data into pyblosxom. It isn't supposed to be this hard.
The real value of XML is interop and the currency of that interop is syntax as expressed in the term "well-formed".
I love this theory from the XML crowd. The theory being, "If only all XML data were well-formed, our parsers would work". Well, deal with reality - not all XML will be well-formed. Ever. There will always be crap out there. Some of it will be crap that people want to read. So, do you do what the advocates say - auto-reject the bad stuff and keep your parser "pure"? Or do what a sensible person does - have your parser deal with errors gracefully (logging them in some fashion so that you have a shot at notifying the producer) and move on? The Atom advocates are still in the navel-gazing phase of imagining a perfect world of all well-formed XML. The rest of us who live in the real world have accepted the reality of bad content and moved on.
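For what it's worth, here is roughly what I mean by dealing with errors gracefully. This is a minimal sketch in Python, not anyone's actual importer: `parse_leniently` and its `source` parameter are names I made up, and the repair step assumes the breakage is the kind described in the quote above (Windows-1252 "smart" quote bytes in a file that is supposed to be UTF-8). It tries a strict parse first, logs where the parse failed so there is something to send the producer, attempts that one repair, and returns `None` instead of blowing up if the retry also fails.

```python
import logging
import xml.etree.ElementTree as ET

log = logging.getLogger("lenient_xml")


def parse_leniently(raw: bytes, source: str = "<unknown>"):
    """Parse XML bytes strictly; on failure, log it and retry after one repair.

    Returns the root Element, or None if the document cannot be salvaged.
    `source` is a hypothetical label (URL, filename) used only in log messages.
    """
    try:
        return ET.fromstring(raw)
    except ET.ParseError as err:
        line, col = err.position
        # Record the failure so the producer can be notified later.
        log.warning("%s: not well-formed at line %d, column %d (%s)",
                    source, line, col, err)

    # Assumed repair: bytes that are not valid UTF-8 are treated as
    # Windows-1252, a common source of stray curly quote marks.
    try:
        text = raw.decode("utf-8")
    except UnicodeDecodeError:
        text = raw.decode("cp1252", errors="replace")

    try:
        return ET.fromstring(text.encode("utf-8"))
    except ET.ParseError as err:
        log.error("%s: still not well-formed after repair (%s)", source, err)
        return None


if __name__ == "__main__":
    logging.basicConfig(level=logging.WARNING)
    # A feed entry that claims UTF-8 but carries Windows-1252 quote bytes.
    bad = (b'<?xml version="1.0" encoding="utf-8"?>'
           b'<entry><title>\x93Hello\x94</title></entry>')
    root = parse_leniently(bad, source="example-feed")
    print(root.find("title").text if root is not None else "rejected")
```

The point of the design is the ordering: well-formed input costs nothing extra, bad input gets logged before anything is changed, and only input that survives neither pass is rejected. A real importer would want more repairs than this one, but the shape stays the same.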