xml

Encoding Issues

February 13, 2004 11:23:43.220

Dare Obasanjo talks about the specs and reality, and the variances thereof, in the encoding of XML docs on the web:

All files are sent with a content type of text/xml and no encoding specified in the charset parameter of the Content-Type HTTP header. According to RFC 3023, which Mark Pilgrim quoted in his article, clients should treat them as us-ascii. With the above examples this behavior would be wrong in all four cases.

He then goes on to list the way a client application actually needs to deal with this conundrum - check for:

  1. the encoding given in the charset parameter of the Content-Type HTTP header, or
  2. the encoding given in the encoding attribute of the XML declaration within the document, or
  3. utf-8.
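
A rough Python sketch of that fallback order, for illustration only. The function name `pick_encoding` and the regular expression are my own assumptions, not code from Dare's post or from BottomFeeder. The sketch also assumes the document starts with an ASCII-compatible XML declaration, so it does not handle byte order marks or UTF-16.

```python
import re
from email.message import Message

# Hypothetical helper pattern: pulls encoding="..." out of an XML declaration
# at the very start of the body. Only works for ASCII-compatible encodings.
_XML_DECL = re.compile(
    rb'^<\?xml[^>]*?encoding\s*=\s*["\']([A-Za-z][A-Za-z0-9._\-]*)["\']'
)


def pick_encoding(content_type, body):
    """Return the encoding name to try for an XML document fetched over HTTP.

    content_type: value of the Content-Type header, or None if absent.
    body: the raw response bytes.
    """
    # 1. The charset parameter of the Content-Type HTTP header.
    if content_type:
        msg = Message()
        msg["Content-Type"] = content_type
        charset = msg.get_content_charset()  # lowercased, or None if missing
        if charset:
            return charset

    # 2. The encoding attribute of the XML declaration.
    match = _XML_DECL.match(body)
    if match:
        return match.group(1).decode("ascii").lower()

    # 3. Fall back to utf-8.
    return "utf-8"
```

Called as `pick_encoding(headers.get("Content-Type"), raw_bytes)`, it gives you a name to hand to `bytes.decode`. A real client would still want to cope with an unknown or wrong encoding name.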

Which is what I stumbled on for BottomFeeder a while back. I wish Dare had posted this back when I was stumbling in the dark :)

Comments

Untitled

[] February 13, 2004 15:46:13.226

Although non-normative, there is a part of the XML spec that attempts to shed some light on this encoding detection issue: http://www.w3.org/TR/2004/REC-xml-20040204/#sec-guessing
