Encoding Issues
Dare Obasanjo talks about the specs and reality, and the variances thereof, in the encoding of XML docs on the web:
All files are sent with a content type of text/xml and no encoding specified in the charset parameter of the Content-Type HTTP header. According to RFC 3023, which Mark Pilgrim quoted in his article, clients should treat them as us-ascii. With the above examples this behavior would be wrong in all four cases.
He then goes on to list the way a client application actually needs to deal with this conundrum - check for:
- the encoding given in the charset parameter of the Content-Type HTTP header, or
- the encoding given in the encoding attribute of the XML declaration within the document, or
- utf-8.
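
For the curious, here is roughly what that check order looks like as a Python sketch. This is my own illustration, not Dare's code and not what BottomFeeder actually does: the function name detect_encoding and both regular expressions are made up for the example, and it assumes the document starts with an ASCII-compatible XML declaration (so no byte order mark and no UTF-16 handling).

```python
import re

# Hypothetical helper patterns, written for this example only.
# charset parameter, e.g. "text/xml; charset=iso-8859-1"
_CHARSET_PARAM = re.compile(r'charset\s*=\s*"?([^\s;"]+)"?', re.IGNORECASE)
# encoding attribute, e.g. <?xml version="1.0" encoding="iso-8859-1"?>
_XML_DECL_ENCODING = re.compile(
    rb'^<\?xml\s[^>]*?encoding\s*=\s*["\']([A-Za-z][A-Za-z0-9._-]*)["\']'
)


def detect_encoding(content_type, body):
    """Return the encoding name to try for an XML response.

    content_type is the raw Content-Type header value (or None);
    body is the undecoded response as bytes.
    """
    # 1. The charset parameter of the Content-Type HTTP header.
    if content_type:
        match = _CHARSET_PARAM.search(content_type)
        if match:
            return match.group(1).lower()

    # 2. The encoding attribute of the XML declaration.
    match = _XML_DECL_ENCODING.match(body)
    if match:
        return match.group(1).decode("ascii").lower()

    # 3. Otherwise fall back to utf-8.
    return "utf-8"


if __name__ == "__main__":
    feed = b'<?xml version="1.0" encoding="ISO-8859-1"?><rss/>'
    print(detect_encoding("text/xml", feed))
```

Note that with a bare text/xml header this deliberately ignores the us-ascii default from RFC 3023 and trusts the document instead, which is the whole point of the list above.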
Which is what I stumbled on for BottomFeeder a while back. I wish Dare had posted this back when I was stumbling in the dark :)


Comments
Untitled
February 13, 2004 15:46:13.226
Although non-normative, there is a part of the XML spec that attempts to shed some light on this encoding detection issue: http://www.w3.org/TR/2004/REC-xml-20040204/#sec-guessing