[pubcookie-dev] Re: utf-8 encoding?
Konstantin Ryabitsev
icon at fedoraproject.org
Tue Jun 6 12:37:21 PDT 2006
On 6/6/06, Nathan Dors <dors at cac.washington.edu> wrote:
> In our case, the login cgi doesn't use a utf-8 decoder on its
> input. Instead, the login cgi's xss countermeasures scan for and
> percent-encode harmful chars like '<' and '>' using plain old
> ASCII (I think) as the input encoding. It's safe for it to specify
> "charset=utf-8" on the output because ASCII and utf-8 align for
> those chars. If ASCII and utf-8 didn't align, then harmful chars
> wouldn't be detected and could be passed thru to the browser
> unencoded; in other words, if they didn't align, the "charset="
> enforcement wouldn't be effective.
>
> Does that make sense? That's how I've been thinking about it, but
> I'm no whiz when it comes to character encodings.
To illustrate:
User submits "<script>" in utf-8.
Pubcookie does byte-by-byte replacement of all "<" and ">" into "&lt;"
and "&gt;" and outputs it on the page. Since utf-8 and ascii use the
same bytes for these characters, we're good and thwart the XSS.
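
A minimal sketch of why the byte-by-byte replacement is safe for utf-8 input. It is written in Python for illustration only and is not pubcookie's actual code; the function name escape_angle_brackets is hypothetical, and the entity replacement simply follows the description above.

```python
def escape_angle_brackets(raw: bytes) -> bytes:
    """Replace the bytes for '<' and '>' without ever decoding the input."""
    return raw.replace(b"<", b"&lt;").replace(b">", b"&gt;")


# Accented text plus a script tag, submitted as utf-8.
submitted = "Montréal <script>".encode("utf-8")
escaped = escape_angle_brackets(submitted)

# In utf-8, every byte of a multibyte sequence is >= 0x80, so the bytes
# 0x3C ('<') and 0x3E ('>') can only ever stand for the ASCII characters.
multibyte = "é".encode("utf-8")
assert all(b >= 0x80 for b in multibyte)
```

Because the accented character never produces a 0x3C or 0x3E byte, a filter that knows nothing about utf-8 still catches every real "<" and ">".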
In utf-7 (which is a 7-bit encoding of Unicode data), "<script>" looks like so:
icon at rakta:[~]$ echo "<script>" | iconv -f ascii -t utf-7
So, the hacker submits "+ADw-script+AD4" (in ascii, since utf-7 is
just a 7-bit ascii-only representation of multibyte data). Pubcookie
looks through it, doesn't find any "<" or ">" (since there aren't
any), and outputs it on the page. Because the page doesn't specify the
encoding, Internet Explorer tries to be clever, sees the "+XXX-" used to
encode utf-7 content, and decides that the page itself must be in utf-7.
So, it decodes that back into "<" and ">", which results in a vulnerability.
If, however, we specifically tell the browser "what you receive is in
utf-8, so don't try to be clever," then the "+ADw-" is going to be
left undecoded and never gets converted into "<" or ">".
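
The two interpretations can be compared with a short Python sketch. The helper contains_angle_brackets is a hypothetical stand-in for the scan described above, not pubcookie's code, and the payload here uses the fully terminated form with a trailing "-".

```python
def contains_angle_brackets(raw: bytes) -> bool:
    """Stand-in for a byte-level scan for '<' and '>'."""
    return b"<" in raw or b">" in raw


payload = b"+ADw-script+AD4-"

# The scan finds nothing to escape, so the payload is echoed unchanged.
passes_filter = not contains_angle_brackets(payload)

# What a browser that guesses utf-7 would see.
as_utf7 = payload.decode("utf-7")

# What a browser that is told "charset=utf-8" would see.
as_utf8 = payload.decode("utf-8")

print(passes_filter, as_utf7, as_utf8)
```

Read as utf-7 the payload turns back into markup; read as utf-8 it stays the harmless literal string that the filter inspected.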
As I was saying, it doesn't matter what encoding you output in, as
long as you tell the browser what it is, so it doesn't try to guess it
-- it's the guessing that can be used against us. Going with utf-8 is
not going to make the application less secure, but will make life much
easier for places like McGill, where we have to use both English and
French and have long since standardized on utf-8 for this purpose.
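
As a rough sketch of "telling the browser what it is," a CGI-style response can carry the charset in its Content-Type header. The function build_response below is hypothetical and only illustrates the idea; it is not how the pubcookie login cgi is written.

```python
def build_response(body: str, charset: str = "utf-8") -> bytes:
    """Build a CGI-style response that declares its encoding explicitly."""
    header = f"Content-Type: text/html; charset={charset}\r\n\r\n"
    # The body is encoded with the same charset the header announces,
    # so the browser has nothing left to guess.
    return header.encode("ascii") + body.encode(charset)


response = build_response("<p>Bienvenue à McGill</p>")
```

The same function serves English and French text alike, since utf-8 covers both.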
Cheers,
--
Konstantin Ryabitsev
Montréal, Québec