\begindata{text,1461464}
\textdsversion{12}
\template{default}
\define{rationale
}
The following is added right before Section 5.1:


\rationale{NOTE ON TRANSLATING ENCODINGS:

The quoted-printable and base64 encodings are designed so that conversion

between them is possible. The only issue that arises in such a conversion is

the handling of line breaks. When converting from quoted-printable to base64 a

line break must be converted into a CRLF sequence. Similarly, a CRLF sequence

in base64 data should be converted to a quoted-printable line break, but ONLY

when converting text data.


NOTE ON CANONICAL ENCODING MODEL:   There was some confusion, in earlier 
drafts of this memo, regarding the model for when email data was to be 
converted to canonical form and encoded, and in particular how this process 
would affect the treatment of CRLFs, given that the representation of newlines 
varies greatly from system to system.  For this reason, a canonical model for 
encoding is presented as Appendix H.}
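
As an illustration of the note on translating encodings above, the following is a minimal sketch of both conversions for text data, using the Python standard library modules quopri and base64. The function names qp_to_base64 and base64_to_qp are hypothetical and not part of this memo, and the sketch assumes the data really is text; for binary data a CRLF pair is not a line break and must not be treated as one.

```python
import base64
import quopri
import re


def qp_to_base64(qp_text: bytes) -> bytes:
    """Re-encode quoted-printable text data as base64 (illustrative only)."""
    # Make every literal line break in the encoded input a CRLF sequence.
    # Octets written as =0D or =0A are untouched, since they are not breaks.
    normalized = re.sub(rb"\r\n|\r|\n", b"\r\n", qp_text)
    # Decoding drops soft line breaks and keeps hard breaks as CRLF.
    canonical = quopri.decodestring(normalized)
    return base64.b64encode(canonical)


def base64_to_qp(b64_text: bytes) -> bytes:
    """Re-encode base64 text data as quoted-printable (illustrative only).

    Assumption: the decoded data is text, so each CRLF is a line break.
    """
    canonical = base64.b64decode(b64_text)
    # With CRLF-delimited input, CPython's quopri writes each CRLF as a
    # line break rather than as "=0D=0A".
    return quopri.encodestring(canonical)
```

For example, passing the quoted-printable lines "caf=E9" and "bar" through qp_to_base64 and then base64_to_qp is expected to give back the same two lines, delimited by CRLF.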


This is the new Appendix H:


There was some confusion, in earlier drafts of this memo, regarding the model 
for when email data was to be converted to canonical form and encoded, and in 
particular how this process would affect the treatment of CRLFs, given that 
the representation of newlines varies greatly from system to system.  For this 
reason, a canonical model for encoding is presented below.


The process of composing a MIME message part can be modelled as being done

in a number of steps. Note that these steps are roughly similar to those

used in RFC1113:


Step 1. Creation of local form.


   The body part to be transmitted is created in the system's native format.

   The native character set is used, and where appropriate local end of line

   conventions are used as well. This may be a UNIX-style text file, or

   a Sun raster image, or a VMS indexed file, or audio data in a

   system-dependent format stored only in memory, or anything else that

   corresponds to the local model for the representation of some form of

   information.


Step 2. Conversion to canonical form.


   The entire body part, including "out-of-band" information such as record

   lengths and possibly file attribute information, is converted to a

   universal canonical form. The specific content type of the body part

   as well as its associated attributes dictate the nature of the canonical

   form that is used. Conversion to the proper canonical form may involve

   character set conversion, transformation of audio data, compression, or

   various other operations specific to the various content types.


   For example, in the case of text/plain data, the text must be converted to

   a supported character set and lines must be delimited with CRLF delimiters

   in accordance with RFC822. Note that the restriction on line lengths 

   implied by RFC822 is eliminated if the next step employs either

   quoted-printable or base64 encoding.


Step 3. Apply transfer encoding.


   A Content-Transfer-Encoding appropriate for this body part is applied. Note

   that there is no fixed relationship between the content type and the

   transfer encoding. In particular, it may be appropriate to base the choice

   of base64 or quoted-printable on character frequency counts which are

   specific to a given instance of a body part.


Step 4. Insertion into message.


   The encoded object is inserted into a MIME message with appropriate body

   part headers and boundary markers.
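
The four steps above can be sketched for a text/plain body part as follows. This is only an illustration of the model under stated assumptions: the local form is taken to be a Python str that uses "\n" line ends, the charset is fixed to ISO-8859-1, and the function names, the 0.3 threshold, and the boundary string are all hypothetical choices rather than anything this memo specifies.

```python
import base64
import quopri


def to_canonical(local_text: str, charset: str = "iso-8859-1") -> bytes:
    """Step 2: convert the charset and delimit lines with CRLF."""
    # Assumption: the local form (step 1) is a str using "\n" line ends.
    return local_text.replace("\n", "\r\n").encode(charset)


def choose_encoding(canonical: bytes, threshold: float = 0.3) -> str:
    """Step 3, part one: pick an encoding from octet frequency counts."""
    if not canonical:
        return "quoted-printable"
    # Octets outside printable ASCII, other than TAB/LF/CR, would each
    # need a three-character "=XX" form in quoted-printable.
    costly = sum(1 for b in canonical
                 if not (32 <= b <= 126 or b in (9, 10, 13)))
    if costly / len(canonical) > threshold:
        return "base64"
    return "quoted-printable"


def apply_encoding(canonical: bytes, encoding: str) -> bytes:
    """Step 3, part two: apply the chosen Content-Transfer-Encoding."""
    if encoding == "base64":
        # encodebytes ends each output line with "\n"; emit CRLF instead.
        return base64.encodebytes(canonical).replace(b"\n", b"\r\n")
    return quopri.encodestring(canonical)


def build_part(local_text: str, boundary: str = "example-boundary") -> bytes:
    """Step 4: wrap the encoded object in headers and boundary markers."""
    canonical = to_canonical(local_text)
    encoding = choose_encoding(canonical)
    body = apply_encoding(canonical, encoding)
    headers = (
        f"--{boundary}\r\n"
        "Content-Type: text/plain; charset=ISO-8859-1\r\n"
        f"Content-Transfer-Encoding: {encoding}\r\n"
        "\r\n"
    ).encode("ascii")
    return headers + body + f"\r\n--{boundary}--\r\n".encode("ascii")
```

Keeping each step in its own function mirrors the model; as the surrounding text stresses, a real implementation is free to merge or reorder them so long as the resulting message is the same.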


It is vital to note that these steps are only a model; they are specifically

NOT a blueprint for how an actual system would be built. In particular, the

model fails to account for two common designs:


1. In many cases the conversion to a canonical form prior to encoding will

   be subsumed into the encoder itself, which understands local formats

   directly. For example, the local newline convention for text body parts

   might be carried through to the encoder itself along with knowledge of

   what that format is.


2. The output of the encoders may have to pass through one or more additional

   steps prior to being transmitted as a message. As such, the output of the

   encoder may not be compliant with the formats specified by RFC822. In

   particular, once again it may be appropriate for the converter's output

   to be expressed using local newline conventions rather than using the

   standard RFC822 CRLF delimiters.


Other implementation variations are conceivable as well. The only important

aspect of this discussion is that the resulting messages are consistent with

those produced by the model described here.


--- Rule #1 of the quoted-printable encoding is slightly modified.


Rule #1: (General 8-bit representation) Any octet, except those indicating a

line break according to the newline convention of the canonical form of the

data being encoded, may be represented by an "=" followed by a two digit

hexadecimal representation of the octet's value. The digits of the hexadecimal

alphabet, for this purpose, are "0123456789ABCDEF". Uppercase letters must be

used when sending hexadecimal data, though a robust implementation may choose

to recognize lowercase letters on receipt. Thus, for example, the value 12

(ASCII form feed) can be represented by "=0C", and the value 61 (ASCII EQUAL

SIGN) can be represented by "=3D". Except when the following rules allow an

alternative encoding, this rule is mandatory.
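
A minimal sketch of Rule #1 follows. The names encode_octet and decode_octet are hypothetical. The encoder always emits uppercase hexadecimal digits, while the decoder also accepts lowercase, as a robust receiver may. The exception for octets that indicate a line break is not handled here; it is left to the caller.

```python
HEX_DIGITS = "0123456789ABCDEF"


def encode_octet(value: int) -> str:
    """Represent one octet as "=" followed by two uppercase hex digits."""
    if not 0 <= value <= 255:
        raise ValueError("not an octet: %r" % (value,))
    return "=" + HEX_DIGITS[value >> 4] + HEX_DIGITS[value & 0x0F]


def decode_octet(token: str) -> int:
    """Decode an "=XX" token, tolerating lowercase hex digits on receipt."""
    if len(token) != 3 or token[0] != "=":
        raise ValueError("not an =XX token: %r" % (token,))
    digits = token[1:].upper()
    if any(d not in HEX_DIGITS for d in digits):
        raise ValueError("bad hex digits: %r" % (token,))
    return int(digits, 16)
```

With these definitions, encode_octet(12) gives "=0C" and encode_octet(61) gives "=3D", matching the examples in the rule.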


--- Rule #4 is slightly modified, and a paragraph has been added.


Rule #4 (Line Breaks): A line break in a text body part, whatever its

representation in the canonical form of the data being encoded, must be

represented by an RFC 822 line break, which is a CRLF sequence, in the

Quoted-Printable encoding. If isolated CRs and LFs, or LF CR and CR LF

sequences, are allowed to appear in binary data according to the canonical

form, they must be represented using the "=0D", "=0A", "=0A=0D" and "=0D=0A"

notations respectively.


Note that many implementations may elect to encode the local representation of

various content types directly. In particular, this may apply to plain text

material on systems that use newline conventions other than CRLF delimiters.

Such an implementation is permissible, but the generation of line breaks must

be generalized to account for the case where alternate representations of

newline sequences are used.
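
One way to generalize line-break generation, as the paragraph above requires, is to pass the local newline convention to the encoder. The sketch below is an assumption-laden illustration, not a complete quoted-printable encoder: the name qp_encode and its newline parameter are hypothetical, and soft line breaks for the line length limit are omitted. Passing None as the newline treats the data as binary, so CR and LF octets are quoted.

```python
from typing import Optional


def _quote(octet: int) -> bytes:
    """Return the "=XX" form of one octet, with uppercase hex digits."""
    return b"=%02X" % octet


def _encode_line(line: bytes) -> bytes:
    """Encode octets that contain no line break of their own."""
    out = bytearray()
    for octet in line:
        printable = 33 <= octet <= 126 and octet != 0x3D  # 0x3D is "="
        if printable or octet in (0x20, 0x09):
            out.append(octet)
        else:
            out += _quote(octet)  # includes any isolated CR or LF
    # Whitespace at the end of an encoded line must not appear literally.
    if out[-1:] in (b" ", b"\t"):
        out[-1:] = _quote(out[-1])
    return bytes(out)


def qp_encode(data: bytes, newline: Optional[bytes]) -> bytes:
    """Encode data whose line breaks use the given local convention."""
    if newline is None:
        # Binary data: there are no line breaks, so CR and LF are quoted.
        return _encode_line(data)
    # Text data: each local line break becomes a CRLF in the encoding.
    return b"\r\n".join(_encode_line(l) for l in data.split(newline))
```

For instance, qp_encode(b"one\rtwo", b"\r") and qp_encode(b"one\ntwo", b"\n") both produce the two lines "one" and "two" separated by CRLF, while qp_encode(b"\r\n", None) produces "=0D=0A".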


--- The following is inserted near the end of section 5.2:


Care must be taken to use the proper octets for line breaks if base64 encoding

is applied directly to text material that has not been converted to canonical

form. In particular, text line breaks should be converted into CRLF sequences

prior to base64 encoding. The important thing to note is that this may be done

directly by the encoder rather than in a prior canonicalization step in some

implementations.
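
The following sketch shows an encoder that performs the line break conversion itself instead of relying on a prior canonicalization step. The name b64_encode_text and its newline parameter are hypothetical, and the output is a single unwrapped base64 string; breaking it into lines for transmission is left out.

```python
import base64


def b64_encode_text(local_text: bytes, newline: bytes = b"\n") -> bytes:
    """Base64-encode text, turning local line breaks into CRLF first."""
    # Split on the local convention, then rejoin with the canonical CRLF.
    canonical = b"\r\n".join(local_text.split(newline))
    return base64.b64encode(canonical)
```

Encoding the local bytes directly, without this conversion, would carry the local newline octets into the message, and a receiver decoding them as canonical text would not see the intended line breaks.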


--- In appendix B, the first two guidelines are changed:


(1) Under some circumstances the encoding used for data may change as part of

normal gateway or user agent operation. In particular, conversion from base64

to quoted-printable and vice versa may be necessary. This may result in the

confusion of CRLF sequences with line breaks in text body parts. As such, the

persistence of CRLF as something other than a line break should not be relied

on.


(2) Many systems may elect to represent and store text data using local

newline conventions. Local newline conventions may not match the RFC822 CRLF

convention -- systems are known that use plain CR, plain LF, CRLF, or counted

records. The result is that isolated CR and LF characters are not well

tolerated in general; they may be lost or converted to delimiters on some

systems, and hence should not be relied on.
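
The loss described in guideline (2) can be illustrated with a sketch of a receiving system that stores text using a plain LF convention. The functions to_local and to_canonical are hypothetical, and the lenient treatment of every CR, LF, or CRLF as a delimiter is an assumption about one kind of system, not a requirement of this memo.

```python
import re


def to_local(canonical: bytes, local_newline: bytes = b"\n") -> bytes:
    """Store received text locally, treating CR, LF and CRLF as delimiters."""
    return re.sub(rb"\r\n|\r|\n", local_newline, canonical)


def to_canonical(local: bytes, local_newline: bytes = b"\n") -> bytes:
    """Convert locally stored text back to CRLF-delimited canonical form."""
    return b"\r\n".join(local.split(local_newline))
```

On such a system, text containing an isolated CR does not survive a round trip: the CR is stored as a delimiter and comes back as a full CRLF line break.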


\enddata{text,1461464}
