[Alpine-info] long MIME encoded-word subject lines
mbmiller at taxa.epi.umn.edu
Tue May 20 13:04:03 PDT 2008
Thanks for correcting me on this, Mark. Also, RFC 2047 says:
The length restrictions are included both to ease interoperability
through internetwork mail gateways, and to impose a limit on the
amount of lookahead a header parser must employ (while looking for a
final ?= delimiter) before it can decide whether a token is an
"encoded-word" or something else.
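To make that lookahead cost concrete, here is a minimal Python sketch. It is not taken from the RFC or from Alpine, and the function name `find_encoded_word_end` and its `limit` parameter are hypothetical. Starting at a candidate `=?`, it looks for the closing `?=`. With the 75-character limit the scan examines at most 75 characters. With `limit=None` it keeps going until whitespace or the end of the string, which is the more lenient behaviour.

```python
from typing import Optional

MAX_ENCODED_WORD = 75  # length limit from RFC 2047 section 2


def find_encoded_word_end(s: str, start: int,
                          limit: Optional[int] = MAX_ENCODED_WORD) -> Optional[int]:
    """Return the index just past the closing '?=' of a candidate encoded-word
    that begins with '=?' at s[start], or None if it is not one.

    With a limit, at most `limit` characters are examined (bounded lookahead);
    with limit=None (hypothetical lenient mode) the scan runs until whitespace
    or the end of the string.
    """
    if not s.startswith("=?", start):
        return None
    end = len(s) if limit is None else min(len(s), start + limit)
    qmarks = 0
    for i in range(start, end):
        c = s[i]
        if c.isspace():
            return None  # encoded-words never contain whitespace
        if c == "?":
            qmarks += 1
            # =?charset?encoding?text?= has exactly four '?'; the fourth must
            # be followed by '=' inside the scanned window
            if qmarks == 4:
                return i + 2 if i + 1 < end and s[i + 1] == "=" else None
    return None  # window exhausted without finding a closing '?='
```

For an 82-character encoded-word, the default call gives up after 75 characters and returns `None`, while `limit=None` finds the end at index 82.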
That is from Section 2, referred to below. So the plan was to constrain
the encoding in a way that would make decoding easier. The problem with
this plan is that when the encoding is done wrong, a decoder that relies
on the rules gets it wrong too, but it doesn't have to. We can decode
longer strings. I think you are reading the rule below the wrong way.
Its whole point is to save a little extra work in decoding by assuming
that the encoding was done according to the RFC. With a dozen years of
hindsight, I think we can now see that the plan has failed. I think the
RFC authors were remiss in not considering a robust implementation for
decoding headers.
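Here is a minimal Python sketch of what a more robust decoder for a '*text' field such as Subject could look like. It rests on some assumptions. The names `decode_text_field` and `max_len` are hypothetical, and the charset grammar is simplified. It handles only the B and Q encodings from RFC 2047 section 4. With `max_len=75` it follows rule (1) of RFC 2047 section 6.1, quoted below. With `max_len=None` it also decodes over-long encoded-words instead of showing them as raw text. Whitespace between adjacent encoded-words is dropped, as section 6.2 requires.

```python
import base64
import binascii
import re
from typing import Optional

# Simplified encoded-word syntax (RFC 2047 section 2): =?charset?encoding?encoded-text?=
# Assumption: charset is any run without '?' or whitespace; the RFC grammar is stricter.
ENCODED_WORD = re.compile(r"=\?([^?\s]+)\?([BbQq])\?([^?\s]*)\?=")


def _try_decode(token: str, max_len: Optional[int]) -> Optional[str]:
    """Decode one whitespace-delimited token, or return None to keep it as plain text."""
    if max_len is not None and len(token) > max_len:
        return None  # strict mode: too long to be an encoded-word
    m = ENCODED_WORD.fullmatch(token)
    if m is None:
        return None
    charset, encoding, text = m.groups()
    try:
        if encoding in "Bb":
            raw = base64.b64decode(text, validate=True)
        else:
            # Q encoding: '_' means a space, '=XX' is a hex-escaped octet
            raw = binascii.a2b_qp(text.encode("ascii"), header=True)
        return raw.decode(charset)
    except (LookupError, ValueError):
        # unknown charset, bad base64 or non-ASCII text, or undecodable bytes:
        # fall back to showing the token as ordinary text
        return None


def decode_text_field(body: str, max_len: Optional[int] = 75) -> str:
    """Decode encoded-words in a '*text' header body such as Subject.

    max_len=75 mirrors RFC 2047 section 6.1 rule (1); max_len=None is the
    lenient mode argued for in this thread (hypothetical parameter).
    """
    out = []
    pending_ws = ""
    last_was_encoded = False
    for part in re.split(r"(\s+)", body):
        if not part:
            continue
        if part.isspace():
            pending_ws = part
            continue
        decoded = _try_decode(part, max_len)
        if decoded is None:
            out.append(pending_ws + part)
            last_was_encoded = False
        else:
            # RFC 2047 section 6.2: whitespace between adjacent encoded-words is dropped
            out.append(("" if last_was_encoded else pending_ws) + decoded)
            last_was_encoded = True
        pending_ws = ""
    out.append(pending_ws)  # keep any trailing whitespace
    return "".join(out)
```

With an 82-character word such as `"=?UTF-8?Q?" + "x" * 70 + "?="`, the default call returns the raw text unchanged, while `max_len=None` returns seventy `x` characters. The extra cost of the lenient mode is only the unbounded scan of each token.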
On Tue, 20 May 2008, Mark Crispin wrote:
> On Tue, 20 May 2008, Mike Miller wrote:
>> What I am saying is that I think there is no rule stating that the MUA
>> should not decode long strings. The RFC says something different.
> The comment from RFC 2045 that you quoted describes "encoded lines" in the
> body of messages using the QUOTED-PRINTABLE encoding. It does not describe
> encoded words in headers. Two different things.
> The definition of encoded words is in RFC 2047:
> (1) Any message or body part header field defined as '*text', or any
> user-defined header field, should be parsed as follows: Beginning
> at the start of the field-body and immediately following each
> occurrence of 'linear-white-space', each sequence of up to 75
> printable characters (not containing any 'linear-white-space')
> should be examined to see if it is an 'encoded-word' according to
> the syntax rules in section 2. Any other sequence of printable
> characters should be treated as ordinary ASCII text.
> (2) Any header field not defined as '*text' should be parsed
> according to the syntax rules for that header field. However,
> any 'word' that appears within a 'phrase' should be treated as an
> 'encoded-word' if it meets the syntax rules in section 2.
> Otherwise it should be treated as an ordinary 'word'.
> (3) Within a 'comment', any sequence of up to 75 printable characters
> (not containing 'linear-white-space'), that meets the syntax
> rules in section 2, should be treated as an 'encoded-word'.
> Otherwise it should be treated as normal comment text.
> -- Mark --
> Science does not emerge from voting, party politics, or public debate.
> Si vis pacem, para bellum.