
draft                Languages and character sets               Sept 94


           Characters and character sets for various languages

                       Thu, 01 Sep 1994 09:35:37 MET 1994

                     <draft-ietf-mailext-lang-char-00.txt>

                         Harald Tveit Alvestrand
                               SINTEF DELAB
                    Harald.Alvestrand@delab.sintef.no



    Abstract

    There is a need to have a source of information about the
    characters that are used in various languages. No such information
    is currently readily available on the net. This document attempts
    to fill that void.


    Status of this Memo

    This draft document is being circulated for comment.
    It does not yet cover anything but Latin-based scripts; volunteers
    to collect material for other scripts are sought.
    Promises made have not been kept, so the Cyrillic information is
    still not present; this draft has only minor updates and
    corrections compared to the June 93 version.

    Please send comments to the author, or to the RARE WG-CHAR list
    <wg-char@rare.nl>.

    The following text is required by the Internet-draft rules:

    This document is an Internet Draft.  Internet Drafts are working
    documents of the Internet Engineering Task Force (IETF), its
    Areas, and its Working Groups. Note that other groups may also
    distribute working documents as Internet Drafts.

    Internet Drafts are draft documents valid for a maximum of six
    months. Internet Drafts may be updated, replaced, or obsoleted by
    other documents at any time.  It is not appropriate to use
    Internet Drafts as reference material or to cite them other than
    as a "working draft" or "work in progress."






Alvestrand                 Expires Aug 2 94                   [Page 1]

    Please check the I-D abstract listing contained in each Internet
    Draft directory to learn the current status of this or any other
    Internet Draft.

    1.  Introduction

    There are a lot of languages in the world. Estimates vary between
    500 and 6000, with some eternal conflicts about the difference
    between a language and a dialect guaranteeing that any list
    claiming to be authoritative will be the source of endless debate.

    Many of these languages have a writing system. Some have several.
    These are also likely to have changed over time, with the meaning
    of character symbols changing, the shape of the characters
    changing, or completely new characters being added, or old ones
    removed from the set. This means that even within a single
    language, a list of characters is likely to be controversial.

    These problems have made several experts in the field of languages
    and characters refuse to even consider the idea of working out
    such a list.

    Nevertheless, it is clear that an easily available source of this
    kind of information is needed, in order to:


    (1)  Identify the problems encountered when trying to use
         equipment with limited character support for a language

    (2)  Identify what support for additional characters will be
         "enough" for that language

    (3)  Identify what internationally standardized character sets are
         able to fulfill the requirements for that language


    The tables given below are an attempt at providing such an
    identification.

    The rest of the document is in 3 parts: The language tables a




    2.  Introduction to language tables

    2.1.  Table structure

    Each language is listed in 5 parts:


    (1)  The language name with its ISO 639 code if applicable

    (2)  The characters required for that language. For brevity, the
         characters of ASCII (A-Z) are not listed. Note that some
         languages do NOT require all the ASCII characters.

    (3)  Characters that are in normal use, but have replacements that
         mostly do not change the meaning of the word in context.
         These may be called "optional" characters. This should _not_
         be taken as liberty to remove those characters from the
         language, but as a reminder that if it is great trouble to
         use the charsets that cover the complete language, a smaller
         character set may be used without causing grievous harm to
         the expressive power of the writer.

    (4)  Internationally registered character sets that cover the
         required and/or optional characters for that language.

    (5)  Comments

         The division between "required" and "optional" characters is
         likely to produce much discussion. As a rough guide, I have
         taken the registered ISO 646 variants of a number of
         countries, and classified as "optional" all characters which
         did _not_ appear in that ISO 646 variant. As a result, an ISO
         646 variant should appear under the "required characters
         only" for all languages that have an ISO 646 variant.

         Note that for brevity, only the lower case version of the
         character is listed. If no note is made, one should assume
         that the upper case version is equally required.

         Note, however, that a lot of languages permit the dropping of
         accents on upper case characters where it would be considered
         improper to drop them on lower case characters.
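    The table structure above can be read as a small data model. The
    following is a minimal sketch in Python, not part of the original
    tables: the names LanguageEntry, repertoire, with_upper_case and
    coverage are hypothetical, and it assumes that Python's
    "iso8859_4" and "latin_1" codecs correspond to ISO_8859-4:1988 and
    ISO_8859-1:1987. It classifies a character set as covering the
    whole language, the required characters only, or neither, and
    treats the upper case form of each listed character as equally
    required.

```python
from dataclasses import dataclass, field
from typing import Optional, Set


@dataclass
class LanguageEntry:
    """One language table: name, ISO 639 code, and character lists."""
    name: str
    iso639: Optional[str]          # None when no ISO 639 code applies
    required: Set[str]
    optional: Set[str] = field(default_factory=set)
    comments: str = ""


def repertoire(codec: str) -> Set[str]:
    """Return the characters a single-byte Python codec can represent."""
    chars = set()
    for byte in range(256):
        try:
            chars.add(bytes([byte]).decode(codec))
        except UnicodeDecodeError:
            pass  # byte value is unassigned in this codec
    return chars


def with_upper_case(chars: Set[str]) -> Set[str]:
    """Add the upper case forms, which the tables leave implicit."""
    result = set(chars)
    for ch in chars:
        upper = ch.upper()
        if len(upper) == 1:  # skip cases such as sharp s -> "SS"
            result.add(upper)
    return result


def coverage(entry: LanguageEntry, codec: str) -> str:
    """Classify a codec as 'whole', 'required only' or 'none'."""
    available = repertoire(codec)
    if not with_upper_case(entry.required) <= available:
        return "none"
    if with_upper_case(entry.optional) <= available:
        return "whole"
    return "required only"


# Data copied from the Sami table (3.5).
sami = LanguageEntry(
    name="Sami",
    iso639=None,
    required=set("\u00e1\u0111\u014b\u0167\u010d\u0161\u017e"),
    optional=set("\u00e9\u00e2\u00e4\u00eb\u00ef\u00f6\u00fc"
                 "\u00e6\u00e5\u00f8\u0144"),
)
print(coverage(sami, "iso8859_4"))
print(coverage(sami, "latin_1"))
```

    With the Sami data, the sketch reports ISO 8859-4 as covering the
    required characters only, which agrees with the Sami table, and
    ISO 8859-1 as covering neither list.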

    2.2.  Sources utilized

    The table of Latin-script languages is based on work by Johan van
    Wingen <BUTPAA@rulmvs.leidenuniv.nl>. The others are best
    guesses by the author.

    The tables of character sets prepared by Keld Jorn Simonsen
    <keld@dkuug.dk> (RFC-KELD) were invaluable in matching the data on
    languages to the data on character sets.

    The language codes (for those languages that have codes) come from
    ISO 639.

    NOTE: ISO 639 is a very incomplete list of the world's languages
    (perhaps 10 or 20 % according to some experts), and is undergoing
    revision. The only reason for using it is that it is the only
    ISO-standardized shorthand notation for languages available at the
    moment.

    Languages for which no such exact information is known are listed
    at the end of the tables.
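    The headings of the language tables carry the ISO 639 code in
    front of the language name, for example "lt Lithuanian". Where the
    tables print "??" instead (as for Sami and Gaelic), the sketch
    below assumes that no ISO 639 code was available; that reading,
    and the function name parse_heading, are assumptions made for
    illustration.

```python
import re
from typing import Optional, Tuple

# Matches headings such as "3.5.  ?? Sami" or "3.10.  is Icelandic".
HEADING = re.compile(r"^\s*(\d+\.\d+)\.\s+(\S\S)\s+(.+?)\s*$")


def parse_heading(line: str) -> Optional[Tuple[str, Optional[str], str]]:
    """Split a table heading into (section, ISO 639 code, name)."""
    match = HEADING.match(line)
    if match is None:
        return None  # not a language table heading
    section, code, name = match.groups()
    return section, (None if code == "??" else code), name


print(parse_heading("    3.1.  lt Lithuanian"))
print(parse_heading("    3.5.  ?? Sami"))
```

    Headings of other sections, such as "2.1.  Table structure", do
    not have a two-character token after the number and are rejected.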


    2.3.  What accents mean

    For those who feel unfamiliar with the names of accents:


    Grave
         slants upwards to the left, like the Unix "backtick".


    Acute
         slants upwards to the right.


    Circumflex
         looks like a little pointed hat.


    Tilde
         looks like a wavy line.


    Macron
         looks like a bar placed on top of the character.


    Breve
         looks like the lower quarter of a circle, placed on top of
         the character.


    Dot above
         should be self-explanatory.


    Diaeresis
         looks like 2 dots above the character.


    Ring above
         should be self-explanatory.


    Cedilla
         looks like a little squiggle on the bottom of the letter,
         down and then left.


    Ogonek
         looks like a squiggle too, but goes down and to the right.


    Caron
         looks like a little "v" on top of the character.
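    For readers who would rather inspect the accents by machine, the
    sketch below uses Python's standard unicodedata module to
    decompose a letter and report the names of its combining marks. It
    assumes that the accent names used in this document match the
    Unicode names of the combining characters (which spell grave,
    acute and circumflex as "GRAVE ACCENT" and so on); the function
    name accents_of is hypothetical.

```python
import unicodedata


def accents_of(letter: str) -> list:
    """Return the names of the combining marks carried by a letter."""
    decomposed = unicodedata.normalize("NFD", letter)
    names = []
    for ch in decomposed:
        if unicodedata.combining(ch):  # non-zero for combining marks
            name = unicodedata.name(ch)
            names.append(name.replace("COMBINING ", "", 1))
    return names


for letter in ["\u0173", "\u010d", "\u00e5", "\u00e8", "\u00f8"]:
    print(f"{ord(letter):04x}", accents_of(letter))
```

    Letters such as o with stroke (00f8) have no decomposition, so the
    list comes back empty for them.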


    3.  Language tables


    3.1.  lt Lithuanian

    Required characters

    a;    0105 LATIN SMALL LETTER A WITH OGONEK
    e;    0119 LATIN SMALL LETTER E WITH OGONEK
    i;    012f LATIN SMALL LETTER I WITH OGONEK
    u;    0173 LATIN SMALL LETTER U WITH OGONEK
    e.    0117 LATIN SMALL LETTER E WITH DOT ABOVE
    u-    016b LATIN SMALL LETTER U WITH MACRON
    c<    010d LATIN SMALL LETTER C WITH CARON
    s<    0161 LATIN SMALL LETTER S WITH CARON
    z<    017e LATIN SMALL LETTER Z WITH CARON

    Character sets covering the whole

    videotex-suppl (iso 70)
    iso-ir-90 (iso 90)
    ANSI_X3.110-1983 (iso 99)
    T.61-8bit (iso 103)
    ISO_8859-4:1988 (iso 110)
    T.101-G2 (iso 128)
    ISO_6937-2-add (iso 142)
    latin6 (iso 157)
    JIS_X0212-1990 (iso 159)
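    Each row in the character lists consists of a two-character
    mnemonic, a four-digit hexadecimal code and a character name. The
    codes appear to be ISO 10646 (Unicode) code positions; under that
    assumption, a row can be checked mechanically against Python's
    unicodedata module. This is only a sketch, and parse_row and
    row_is_consistent are hypothetical names.

```python
import re
import unicodedata


def parse_row(line: str):
    """Split a row into (mnemonic, character, name)."""
    mnemonic, code, name = line.split(None, 2)
    # Drop trailing annotations such as "(Icelandic)".
    name = re.sub(r"\s*\([^)]*\)\s*$", "", name.strip())
    return mnemonic, chr(int(code, 16)), name


def row_is_consistent(line: str) -> bool:
    """True when the code position carries the listed name."""
    _, char, name = parse_row(line)
    return unicodedata.name(char, "") == name


rows = [
    "    a;    0105 LATIN SMALL LETTER A WITH OGONEK",
    "    e.    0117 LATIN SMALL LETTER E WITH DOT ABOVE",
    "    z<    017e LATIN SMALL LETTER Z WITH CARON",
]
for row in rows:
    print(parse_row(row)[0], row_is_consistent(row))
```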


    3.2.  lv Latvian

    Required characters

    a-    0101 LATIN SMALL LETTER A WITH MACRON
    e-    0113 LATIN SMALL LETTER E WITH MACRON
    i-    012b LATIN SMALL LETTER I WITH MACRON
    o-    014d LATIN SMALL LETTER O WITH MACRON
    u-    016b LATIN SMALL LETTER U WITH MACRON
    g,    0123 LATIN SMALL LETTER G WITH CEDILLA
    k,    0137 LATIN SMALL LETTER K WITH CEDILLA
    l,    013c LATIN SMALL LETTER L WITH CEDILLA
    n,    0146 LATIN SMALL LETTER N WITH CEDILLA
    r,    0157 LATIN SMALL LETTER R WITH CEDILLA
    c<    010d LATIN SMALL LETTER C WITH CARON
    s<    0161 LATIN SMALL LETTER S WITH CARON
    z<    017e LATIN SMALL LETTER Z WITH CARON

    Character sets covering the whole

    videotex-suppl (iso 70)
    iso-ir-90 (iso 90)
    ANSI_X3.110-1983 (iso 99)
    T.61-8bit (iso 103)
    ISO_8859-4:1988 (iso 110)
    T.101-G2 (iso 128)
    ISO_6937-2-add (iso 142)
    latin6 (iso 157)


    3.3.  et Estonian

    Required characters

    o?    00f5 LATIN SMALL LETTER O WITH TILDE
    a:    00e4 LATIN SMALL LETTER A WITH DIAERESIS
    o:    00f6 LATIN SMALL LETTER O WITH DIAERESIS
    u:    00fc LATIN SMALL LETTER U WITH DIAERESIS
    s<    0161 LATIN SMALL LETTER S WITH CARON
    z<    017e LATIN SMALL LETTER Z WITH CARON

    Character sets covering the whole

    videotex-suppl (iso 70)
    iso-ir-90 (iso 90)
    ANSI_X3.110-1983 (iso 99)
    T.61-8bit (iso 103)
    ISO_8859-4:1988 (iso 110)
    T.101-G2 (iso 128)
    ISO_6937-2-add (iso 142)
    latin6 (iso 157)
    JIS_X0212-1990 (iso 159)


    3.4.  fi Finnish

    Required characters

    a:    00e4 LATIN SMALL LETTER A WITH DIAERESIS
    o:    00f6 LATIN SMALL LETTER O WITH DIAERESIS

    Character sets covering the whole

    NATS-SEFI (iso 8)
    NATS-DANO-ADD (iso 9)
    SEN_850200_B (iso 10)
    SEN_850200_C (iso 11)
    DIN_66003 (iso 21)
    videotex-suppl (iso 70)
    iso-ir-90 (iso 90)
    ANSI_X3.110-1983 (iso 99)
    ISO_8859-1:1987 (iso 100)
    ISO_8859-2:1987 (iso 101)
    T.61-8bit (iso 103)
    ISO_8859-3:1988 (iso 109)
    ISO_8859-4:1988 (iso 110)
    T.101-G2 (iso 128)
    CSN_369103 (iso 139)
    ISO_6937-2-add (iso 142)
    ISO_8859-9:1989 (iso 148)
    latin6 (iso 157)
    JIS_X0212-1990 (iso 159)
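    The character set lists give a name followed by a number in the
    form "(iso N)", which appears to be the number in the ISO
    international register of coded character sets (for example,
    iso-ir-100 for ISO 8859-1). The sketch below parses such a line
    and, as an assumption of this example rather than part of the
    tables, maps a few of the numbers to Python codec names so that
    the Finnish list can be spot-checked. The names parse_charset,
    can_encode and PYTHON_CODECS are hypothetical.

```python
import re
from typing import Tuple

CHARSET_LINE = re.compile(r"^\s*(\S+)\s+\(iso (\d+)\)\s*$")

# Assumed correspondence between register numbers and Python codecs.
PYTHON_CODECS = {
    100: "iso8859_1",
    101: "iso8859_2",
    109: "iso8859_3",
    110: "iso8859_4",
    148: "iso8859_9",
    157: "iso8859_10",
}


def parse_charset(line: str) -> Tuple[str, int]:
    """Split a line such as 'latin6 (iso 157)' into name and number."""
    match = CHARSET_LINE.match(line)
    if match is None:
        raise ValueError(f"not a character set line: {line!r}")
    return match.group(1), int(match.group(2))


def can_encode(text: str, codec: str) -> bool:
    """True when every character of text exists in the codec."""
    try:
        text.encode(codec)
        return True
    except UnicodeEncodeError:
        return False


finnish_required = "\u00e4\u00f6\u00c4\u00d6"  # a:, o: and upper case
name, number = parse_charset("    ISO_8859-4:1988 (iso 110)")
print(name, number, can_encode(finnish_required, PYTHON_CODECS[number]))
```

    Only the ISO 8859 parts are mapped here because Python ships
    codecs for them; the other registered sets in the lists would need
    their own tables.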


    3.5.  ?? Sami

    Required characters

    a'    00e1 LATIN SMALL LETTER A WITH ACUTE
    d/    0111 LATIN SMALL LETTER D WITH STROKE
    ng    014b LATIN SMALL LETTER ENG
    t/    0167 LATIN SMALL LETTER T WITH STROKE
    c<    010d LATIN SMALL LETTER C WITH CARON
    s<    0161 LATIN SMALL LETTER S WITH CARON
    z<    017e LATIN SMALL LETTER Z WITH CARON

    Optional characters

    e'    00e9 LATIN SMALL LETTER E WITH ACUTE
    a>    00e2 LATIN SMALL LETTER A WITH CIRCUMFLEX
    a:    00e4 LATIN SMALL LETTER A WITH DIAERESIS
    e:    00eb LATIN SMALL LETTER E WITH DIAERESIS
    i:    00ef LATIN SMALL LETTER I WITH DIAERESIS
    o:    00f6 LATIN SMALL LETTER O WITH DIAERESIS
    u:    00fc LATIN SMALL LETTER U WITH DIAERESIS
    ae    00e6 LATIN SMALL LETTER AE
    aa    00e5 LATIN SMALL LETTER A WITH RING ABOVE
    o/    00f8 LATIN SMALL LETTER O WITH STROKE
    n'    0144 LATIN SMALL LETTER N WITH ACUTE

    Comments

    Information from Otto Prytz <otto.prytz@kri.uio.no>. This
    information is for the current Norwegian North Sami orthography.
    The letters aa, ae and o/ are in use for Norwegian/Swedish names,
    but not for Sami proper. a> and n' are no longer used.  There is
    some doubt about whether e: and i: were ever used, but they are
    listed by van Wingen.

    Character sets covering the whole

    videotex-suppl (iso 70)
    iso-ir-90 (iso 90)
    ANSI_X3.110-1983 (iso 99)
    T.61-8bit (iso 103)
    T.101-G2 (iso 128)
    ISO_6937-2-add (iso 142)
    latin6 (iso 157)
    JIS_X0212-1990 (iso 159)

    Character sets covering the required characters only

    ISO_8859-4:1988 (iso 110)
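    When only a character set covering the required characters is
    available, the optional characters have to be replaced. The tables
    do not say which replacement to use, so the sketch below makes an
    assumption: it substitutes the unaccented base letter obtained
    from Unicode decomposition, and it refuses to touch required
    characters. The function name and the use of Python's "iso8859_4"
    codec for ISO_8859-4:1988 are likewise assumptions.

```python
import unicodedata


def encode_with_fallback(text: str, codec: str, optional: set) -> bytes:
    """Encode text, replacing unencodable optional letters by their base."""
    out = bytearray()
    for ch in text:
        try:
            out += ch.encode(codec)
        except UnicodeEncodeError:
            if ch.lower() not in optional:
                raise  # a required character is missing: give up
            base = unicodedata.normalize("NFD", ch)[0]
            out += base.encode(codec)
    return bytes(out)


# Optional characters from the Sami table that have a decomposition.
sami_optional = set("\u00e9\u00e2\u00e4\u00eb\u00ef\u00f6\u00fc\u0144")
sample = "\u010d\u00ef\u0144"  # c<, i:, n' in the table's notation
encoded = encode_with_fallback(sample, "iso8859_4", sami_optional)
print(encoded.decode("iso8859_4"))
```

    In the sample, c with caron survives unchanged, while i with
    diaeresis and n with acute, which ISO 8859-4 lacks, are reduced to
    plain "i" and "n".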


    3.6.  sv Swedish

    Required characters

    a:    00e4 LATIN SMALL LETTER A WITH DIAERESIS
    o:    00f6 LATIN SMALL LETTER O WITH DIAERESIS
    aa    00e5 LATIN SMALL LETTER A WITH RING ABOVE

    Optional characters

    a'    00e1 LATIN SMALL LETTER A WITH ACUTE
    e'    00e9 LATIN SMALL LETTER E WITH ACUTE
    e:    00eb LATIN SMALL LETTER E WITH DIAERESIS
    u:    00fc LATIN SMALL LETTER U WITH DIAERESIS

    Character sets covering the whole

    videotex-suppl (iso 70)
    iso-ir-90 (iso 90)
    ANSI_X3.110-1983 (iso 99)
    ISO_8859-1:1987 (iso 100)
    T.61-8bit (iso 103)
    ISO_8859-4:1988 (iso 110)
    T.101-G2 (iso 128)
    ISO_6937-2-add (iso 142)
    ISO_8859-9:1989 (iso 148)
    latin6 (iso 157)
    JIS_X0212-1990 (iso 159)

    Character sets covering the required characters only

    NATS-SEFI (iso 8)
    SEN_850200_B (iso 10)
    SEN_850200_C (iso 11)


    3.7.  no Norwegian

    Required characters

    ae    00e6 LATIN SMALL LETTER AE
    aa    00e5 LATIN SMALL LETTER A WITH RING ABOVE
    o/    00f8 LATIN SMALL LETTER O WITH STROKE

    Optional characters

    e'    00e9 LATIN SMALL LETTER E WITH ACUTE
    o'    00f3 LATIN SMALL LETTER O WITH ACUTE
    o>    00f4 LATIN SMALL LETTER O WITH CIRCUMFLEX
    a!    00e0 LATIN SMALL LETTER A WITH GRAVE
    u:    00fc LATIN SMALL LETTER U WITH DIAERESIS
    a<    01ce LATIN SMALL LETTER A WITH CARON

    Comments

    Information from Johan van Wingen and Otto Prytz.

    Character sets covering the whole

    JIS_X0212-1990 (iso 159)

    Character sets covering the required characters only

    NATS-DANO (iso 9)
    NS_4551-1 (iso 60)
    NS_4551-2 (iso 61)
    videotex-suppl (iso 70)
    iso-ir-90 (iso 90)
    ANSI_X3.110-1983 (iso 99)
    ISO_8859-1:1987 (iso 100)
    T.61-8bit (iso 103)
    ISO_8859-4:1988 (iso 110)
    T.101-G2 (iso 128)
    ISO_6937-2-add (iso 142)
    ISO_8859-9:1989 (iso 148)
    latin6 (iso 157)


    3.8.  da Danish

    Required characters

    ae    00e6 LATIN SMALL LETTER AE
    aa    00e5 LATIN SMALL LETTER A WITH RING ABOVE
    o/    00f8 LATIN SMALL LETTER O WITH STROKE

    Optional characters

    a'    00e1 LATIN SMALL LETTER A WITH ACUTE
    e'    00e9 LATIN SMALL LETTER E WITH ACUTE
    i'    00ed LATIN SMALL LETTER I WITH ACUTE
    o'    00f3 LATIN SMALL LETTER O WITH ACUTE
    u'    00fa LATIN SMALL LETTER U WITH ACUTE
    y'    00fd LATIN SMALL LETTER Y WITH ACUTE

    Character sets covering the whole

    videotex-suppl (iso 70)
    iso-ir-90 (iso 90)
    ANSI_X3.110-1983 (iso 99)
    ISO_8859-1:1987 (iso 100)
    T.61-8bit (iso 103)
    T.101-G2 (iso 128)
    ISO_6937-2-add (iso 142)
    latin6 (iso 157)
    JIS_X0212-1990 (iso 159)

    Character sets covering the required characters only

    NATS-DANO (iso 9)
    NS_4551-1 (iso 60)
    NS_4551-2 (iso 61)
    ISO_8859-4:1988 (iso 110)
    ISO_8859-9:1989 (iso 148)


    3.9.  fo Faeroese

    Required characters

    a'    00e1 LATIN SMALL LETTER A WITH ACUTE
    i'    00ed LATIN SMALL LETTER I WITH ACUTE
    o'    00f3 LATIN SMALL LETTER O WITH ACUTE
    u'    00fa LATIN SMALL LETTER U WITH ACUTE
    y'    00fd LATIN SMALL LETTER Y WITH ACUTE
    ae    00e6 LATIN SMALL LETTER AE
    o/    00f8 LATIN SMALL LETTER O WITH STROKE
    d-    00f0 LATIN SMALL LETTER ETH (Icelandic)

    Character sets covering the whole

    videotex-suppl (iso 70)
    iso-ir-90 (iso 90)
    ANSI_X3.110-1983 (iso 99)
    ISO_8859-1:1987 (iso 100)
    T.61-8bit (iso 103)
    T.101-G2 (iso 128)
    ISO_6937-2-add (iso 142)
    latin6 (iso 157)
    JIS_X0212-1990 (iso 159)


    3.10.  is Icelandic

    Required characters

    a'    00e1 LATIN SMALL LETTER A WITH ACUTE
    e'    00e9 LATIN SMALL LETTER E WITH ACUTE
    i'    00ed LATIN SMALL LETTER I WITH ACUTE
    o'    00f3 LATIN SMALL LETTER O WITH ACUTE
    u'    00fa LATIN SMALL LETTER U WITH ACUTE
    y'    00fd LATIN SMALL LETTER Y WITH ACUTE
    o:    00f6 LATIN SMALL LETTER O WITH DIAERESIS
    ae    00e6 LATIN SMALL LETTER AE
    d-    00f0 LATIN SMALL LETTER ETH (Icelandic)
    th    00fe LATIN SMALL LETTER THORN (Icelandic)

    Character sets covering the whole

    videotex-suppl (iso 70)
    iso-ir-90 (iso 90)
    ANSI_X3.110-1983 (iso 99)
    ISO_8859-1:1987 (iso 100)
    T.61-8bit (iso 103)
    T.101-G2 (iso 128)
    ISO_6937-2-add (iso 142)
    latin6 (iso 157)
    JIS_X0212-1990 (iso 159)


    3.11.  kl Greenlandic

    Required characters

    a'    00e1 LATIN SMALL LETTER A WITH ACUTE
    e'    00e9 LATIN SMALL LETTER E WITH ACUTE
    i'    00ed LATIN SMALL LETTER I WITH ACUTE
    u'    00fa LATIN SMALL LETTER U WITH ACUTE
    a>    00e2 LATIN SMALL LETTER A WITH CIRCUMFLEX
    e>    00ea LATIN SMALL LETTER E WITH CIRCUMFLEX
    i>    00ee LATIN SMALL LETTER I WITH CIRCUMFLEX
    o>    00f4 LATIN SMALL LETTER O WITH CIRCUMFLEX
    u>    00fb LATIN SMALL LETTER U WITH CIRCUMFLEX
    ae    00e6 LATIN SMALL LETTER AE
    aa    00e5 LATIN SMALL LETTER A WITH RING ABOVE
    o/    00f8 LATIN SMALL LETTER O WITH STROKE
    a?    00e3 LATIN SMALL LETTER A WITH TILDE
    i?    0129 LATIN SMALL LETTER I WITH TILDE
    u?    0169 LATIN SMALL LETTER U WITH TILDE
    kk    0138 LATIN SMALL LETTER KRA (Greenlandic)

    Character sets covering the whole

    videotex-suppl (iso 70)
    iso-ir-90 (iso 90)
    ANSI_X3.110-1983 (iso 99)
    T.61-8bit (iso 103)
    T.101-G2 (iso 128)
    ISO_6937-2-add (iso 142)


    3.12.  ?? Gaelic

    Required characters

    a'    00e1 LATIN SMALL LETTER A WITH ACUTE
    e'    00e9 LATIN SMALL LETTER E WITH ACUTE
    o'    00f3 LATIN SMALL LETTER O WITH ACUTE
    a!    00e0 LATIN SMALL LETTER A WITH GRAVE
    e!    00e8 LATIN SMALL LETTER E WITH GRAVE
    i!    00ec LATIN SMALL LETTER I WITH GRAVE
    o!    00f2 LATIN SMALL LETTER O WITH GRAVE
    u!    00f9 LATIN SMALL LETTER U WITH GRAVE

    Character sets covering the whole

    GB_2312-80 (iso 58)
    videotex-suppl (iso 70)
    iso-ir-90 (iso 90)
    ANSI_X3.110-1983 (iso 99)
    ISO_8859-1:1987 (iso 100)
    T.61-8bit (iso 103)
    ISO_8859-3:1988 (iso 109)
    T.101-G2 (iso 128)
    ISO_6937-2-add (iso 142)
    JIS_X0212-1990 (iso 159)


    3.13.  ga Irish

    Required characters

    a'    00e1 LATIN SMALL LETTER A WITH ACUTE
    e'    00e9 LATIN SMALL LETTER E WITH ACUTE
    i'    00ed LATIN SMALL LETTER I WITH ACUTE
    o'    00f3 LATIN SMALL LETTER O WITH ACUTE
    u'    00fa LATIN SMALL LETTER U WITH ACUTE

    Character sets covering the whole

    GB_2312-80 (iso 58)
    videotex-suppl (iso 70)
    iso-ir-90 (iso 90)
    ANSI_X3.110-1983 (iso 99)
    ISO_8859-1:1987 (iso 100)
    ISO_8859-2:1987 (iso 101)
    T.61-8bit (iso 103)
    ISO_8859-3:1988 (iso 109)
    CSA_Z243.4-1985-gr (iso 123)
    T.101-G2 (iso 128)
    CSN_369103 (iso 139)
    ISO_6937-2-add (iso 142)
    ISO_8859-9:1989 (iso 148)
    latin6 (iso 157)
    JIS_X0212-1990 (iso 159)


    3.14.  cy Welsh

    Required characters

    w'    1e83 LATIN SMALL LETTER W WITH ACUTE
    y'    00fd LATIN SMALL LETTER Y WITH ACUTE
    a'    00e1 LATIN SMALL LETTER A WITH ACUTE
    e'    00e9 LATIN SMALL LETTER E WITH ACUTE
    i'    00ed LATIN SMALL LETTER I WITH ACUTE
    o'    00f3 LATIN SMALL LETTER O WITH ACUTE
    u'    00fa LATIN SMALL LETTER U WITH ACUTE
    a!    00e0 LATIN SMALL LETTER A WITH GRAVE
    e!    00e8 LATIN SMALL LETTER E WITH GRAVE
    i!    00ec LATIN SMALL LETTER I WITH GRAVE
    o!    00f2 LATIN SMALL LETTER O WITH GRAVE
    u!    00f9 LATIN SMALL LETTER U WITH GRAVE
    w!    1e81 LATIN SMALL LETTER W WITH GRAVE
    y!    1ef3 LATIN SMALL LETTER Y WITH GRAVE
    a>    00e2 LATIN SMALL LETTER A WITH CIRCUMFLEX
    e>    00ea LATIN SMALL LETTER E WITH CIRCUMFLEX
    i>    00ee LATIN SMALL LETTER I WITH CIRCUMFLEX
    o>    00f4 LATIN SMALL LETTER O WITH CIRCUMFLEX
    u>    00fb LATIN SMALL LETTER U WITH CIRCUMFLEX
    w>    0175 LATIN SMALL LETTER W WITH CIRCUMFLEX
    y>    0177 LATIN SMALL LETTER Y WITH CIRCUMFLEX
    a:    00e4 LATIN SMALL LETTER A WITH DIAERESIS
    e:    00eb LATIN SMALL LETTER E WITH DIAERESIS
    i:    00ef LATIN SMALL LETTER I WITH DIAERESIS
    o:    00f6 LATIN SMALL LETTER O WITH DIAERESIS
    u:    00fc LATIN SMALL LETTER U WITH DIAERESIS
    w:    1e85 LATIN SMALL LETTER W WITH DIAERESIS
    y:    00ff LATIN SMALL LETTER Y WITH DIAERESIS

    This language has no known character set.


    3.15.  br Breton

    Required characters

    e>    00ea LATIN SMALL LETTER E WITH CIRCUMFLEX
    u!    00f9 LATIN SMALL LETTER U WITH GRAVE
    u:    00fc LATIN SMALL LETTER U WITH DIAERESIS
    n?    00f1 LATIN SMALL LETTER N WITH TILDE

    Character sets covering the whole

    videotex-suppl (iso 70)
    iso-ir-90 (iso 90)
    ANSI_X3.110-1983 (iso 99)
    ISO_8859-1:1987 (iso 100)
    T.61-8bit (iso 103)
    ISO_8859-3:1988 (iso 109)
    CSA_Z243.4-1985-gr (iso 123)
    T.101-G2 (iso 128)
    ISO_6937-2-add (iso 142)
    JIS_X0212-1990 (iso 159)


    3.16.  fy Frisian

    Required characters

    e'    00e9 LATIN SMALL LETTER E WITH ACUTE
    u'    00fa LATIN SMALL LETTER U WITH ACUTE
    a>    00e2 LATIN SMALL LETTER A WITH CIRCUMFLEX
    e>    00ea LATIN SMALL LETTER E WITH CIRCUMFLEX
    o>    00f4 LATIN SMALL LETTER O WITH CIRCUMFLEX
    u>    00fb LATIN SMALL LETTER U WITH CIRCUMFLEX
    a:    00e4 LATIN SMALL LETTER A WITH DIAERESIS
    e:    00eb LATIN SMALL LETTER E WITH DIAERESIS
    i:    00ef LATIN SMALL LETTER I WITH DIAERESIS
    o:    00f6 LATIN SMALL LETTER O WITH DIAERESIS
    u:    00fc LATIN SMALL LETTER U WITH DIAERESIS

    Character sets covering the whole

    videotex-suppl (iso 70)
    iso-ir-90 (iso 90)
    ANSI_X3.110-1983 (iso 99)
    ISO_8859-1:1987 (iso 100)
    T.61-8bit (iso 103)
    ISO_8859-3:1988 (iso 109)
    T.101-G2 (iso 128)
    ISO_6937-2-add (iso 142)


    3.17.  nl Dutch

    Required characters

    a'    00e1 LATIN SMALL LETTER A WITH ACUTE
    e'    00e9 LATIN SMALL LETTER E WITH ACUTE
    i'    00ed LATIN SMALL LETTER I WITH ACUTE
    o'    00f3 LATIN SMALL LETTER O WITH ACUTE
    u'    00fa LATIN SMALL LETTER U WITH ACUTE
    a:    00e4 LATIN SMALL LETTER A WITH DIAERESIS
    e:    00eb LATIN SMALL LETTER E WITH DIAERESIS
    i:    00ef LATIN SMALL LETTER I WITH DIAERESIS
    o:    00f6 LATIN SMALL LETTER O WITH DIAERESIS
    u:    00fc LATIN SMALL LETTER U WITH DIAERESIS
    ij    0133 LATIN SMALL LIGATURE IJ

    Character sets covering the whole

    videotex-suppl (iso 70)
    iso-ir-90 (iso 90)
    ANSI_X3.110-1983 (iso 99)
    T.61-8bit (iso 103)
    T.101-G2 (iso 128)
    ISO_6937-2-add (iso 142)
    JIS_X0212-1990 (iso 159)


    3.18.  af Afrikaans

    Required characters

    a'    00e1 LATIN SMALL LETTER A WITH ACUTE
    e'    00e9 LATIN SMALL LETTER E WITH ACUTE
    e!    00e8 LATIN SMALL LETTER E WITH GRAVE
    a>    00e2 LATIN SMALL LETTER A WITH CIRCUMFLEX
    e>    00ea LATIN SMALL LETTER E WITH CIRCUMFLEX
    i>    00ee LATIN SMALL LETTER I WITH CIRCUMFLEX
    o>    00f4 LATIN SMALL LETTER O WITH CIRCUMFLEX
    u>    00fb LATIN SMALL LETTER U WITH CIRCUMFLEX
    e:    00eb LATIN SMALL LETTER E WITH DIAERESIS
    i:    00ef LATIN SMALL LETTER I WITH DIAERESIS
    o:    00f6 LATIN SMALL LETTER O WITH DIAERESIS

    Character sets covering the whole

    videotex-suppl (iso 70)
    iso-ir-90 (iso 90)
    ANSI_X3.110-1983 (iso 99)
    T.61-8bit (iso 103)
    T.101-G2 (iso 128)
    ISO_6937-2-add (iso 142)


    3.19.  de German

    Required characters

    a:    00e4 LATIN SMALL LETTER A WITH DIAERESIS
    o:    00f6 LATIN SMALL LETTER O WITH DIAERESIS
    u:    00fc LATIN SMALL LETTER U WITH DIAERESIS
    ss    00df LATIN SMALL LETTER SHARP S (German)

    Optional characters

    e'    00e9 LATIN SMALL LETTER E WITH ACUTE
    a!    00e0 LATIN SMALL LETTER A WITH GRAVE

    Comments

    The "ss" character exists only in lower case; the upper case
    equivalent is "SS" (2 letters).

    Character sets covering the whole

    videotex-suppl (iso 70)
    iso-ir-90 (iso 90)
    ANSI_X3.110-1983 (iso 99)
    ISO_8859-1:1987 (iso 100)
    T.61-8bit (iso 103)
    ISO_8859-3:1988 (iso 109)
    T.101-G2 (iso 128)
    ISO_6937-2-add (iso 142)
    ISO_8859-9:1989 (iso 148)
    JIS_X0212-1990 (iso 159)

    Character sets covering the required characters only

    DIN_66003 (iso 21)
    ISO_8859-2:1987 (iso 101)
    ISO_8859-4:1988 (iso 110)
    CSN_369103 (iso 139)
    latin6 (iso 157)
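    The comment on the sharp s has a practical consequence: converting
    German text to upper case can change its length. Python's built-in
    str.upper() follows the Unicode case mappings, which map 00df to
    the two letters "SS"; the short sketch below shows this. The
    sample word is chosen for illustration only.

```python
sharp_s = "\u00df"          # LATIN SMALL LETTER SHARP S
word = "stra" + sharp_s + "e"

upper = word.upper()
print(upper, len(word), len(upper))

# The mapping is one-way: lower-casing "SS" gives "ss", not sharp s.
print(upper.lower() == word)
```

    Software that assumes a one-to-one mapping between lower and upper
    case needs a special case for this character.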


    3.20.  fr French

    Required characters

    e'    00e9 LATIN SMALL LETTER E WITH ACUTE
    e!    00e8 LATIN SMALL LETTER E WITH GRAVE
    u!    00f9 LATIN SMALL LETTER U WITH GRAVE
    c,    00e7 LATIN SMALL LETTER C WITH CEDILLA
    a!    00e0 LATIN SMALL LETTER A WITH GRAVE

    Optional characters

    a>    00e2 LATIN SMALL LETTER A WITH CIRCUMFLEX
    e>    00ea LATIN SMALL LETTER E WITH CIRCUMFLEX
    i>    00ee LATIN SMALL LETTER I WITH CIRCUMFLEX
    o>    00f4 LATIN SMALL LETTER O WITH CIRCUMFLEX
    u>    00fb LATIN SMALL LETTER U WITH CIRCUMFLEX
    ae    00e6 LATIN SMALL LETTER AE
    oe    0153 LATIN SMALL LIGATURE OE
    e:    00eb LATIN SMALL LETTER E WITH DIAERESIS
    i:    00ef LATIN SMALL LETTER I WITH DIAERESIS
    u:    00fc LATIN SMALL LETTER U WITH DIAERESIS
    y:    00ff LATIN SMALL LETTER Y WITH DIAERESIS

    Comments

    ae and y: are very uncommon in current French; there have been
    arguments that all of the others should be "required".

    Character sets covering the whole

    videotex-suppl (iso 70)
    iso-ir-90 (iso 90)
    ANSI_X3.110-1983 (iso 99)
    T.61-8bit (iso 103)
    T.101-G2 (iso 128)
    ISO_6937-2-add (iso 142)

    Character sets covering the required characters only

    IT (iso 15)
    NF_Z_62-010_(1973) (iso 25)
    NF_Z_62-010 (iso 69)
    ISO_8859-1:1987 (iso 100)
    ISO_8859-3:1988 (iso 109)
    CSA_Z243.4-1985-1 (iso 121)
    CSA_Z243.4-1985-2 (iso 122)
    CSA_Z243.4-1985-gr (iso 123)
    ISO_8859-9:1989 (iso 148)
    JIS_X0212-1990 (iso 159)


    3.21.  ca Catalan

    Required characters

    e'    00e9 LATIN SMALL LETTER E WITH ACUTE
    i'    00ed LATIN SMALL LETTER I WITH ACUTE
    o'    00f3 LATIN SMALL LETTER O WITH ACUTE
    u'    00fa LATIN SMALL LETTER U WITH ACUTE
    a!    00e0 LATIN SMALL LETTER A WITH GRAVE
    e!    00e8 LATIN SMALL LETTER E WITH GRAVE
    o!    00f2 LATIN SMALL LETTER O WITH GRAVE
    i:    00ef LATIN SMALL LETTER I WITH DIAERESIS
    u:    00fc LATIN SMALL LETTER U WITH DIAERESIS
    l.    0140 LATIN SMALL LETTER L WITH MIDDLE DOT
    n?    00f1 LATIN SMALL LETTER N WITH TILDE

    Character sets covering the whole

    videotex-suppl (iso 70)
    iso-ir-90 (iso 90)
    ANSI_X3.110-1983 (iso 99)
    T.61-8bit (iso 103)
    T.101-G2 (iso 128)
    ISO_6937-2-add (iso 142)
    JIS_X0212-1990 (iso 159)


    3.22.  es Spanish

    Required characters

    n?    00f1 LATIN SMALL LETTER N WITH TILDE
    !I    00a1 INVERTED EXCLAMATION MARK
    ?I    00bf INVERTED QUESTION MARK

    Optional characters

    a'    00e1 LATIN SMALL LETTER A WITH ACUTE
    e'    00e9 LATIN SMALL LETTER E WITH ACUTE
    i'    00ed LATIN SMALL LETTER I WITH ACUTE
    o'    00f3 LATIN SMALL LETTER O WITH ACUTE
    u'    00fa LATIN SMALL LETTER U WITH ACUTE
    u:    00fc LATIN SMALL LETTER U WITH DIAERESIS
    c,    00e7 LATIN SMALL LETTER C WITH CEDILLA
    -a    00aa FEMININE ORDINAL INDICATOR
    -o    00ba MASCULINE ORDINAL INDICATOR

    Comments

    Note that this language also uses special punctuation marks.  The
    accented vowels may be mandatory; Spanish speakers who think they
    should be are encouraged to speak up.  Information from Otto Prytz
    <otto.prytz@kri.uio.no>.

    Character sets covering the whole

    videotex-suppl (iso 70)
    iso-ir-90 (iso 90)
    ANSI_X3.110-1983 (iso 99)
    ISO_8859-1:1987 (iso 100)
    T.61-8bit (iso 103)
    CSA_Z243.4-1985-gr (iso 123)
    T.101-G2 (iso 128)
    ISO_6937-2-add (iso 142)
    ISO_8859-9:1989 (iso 148)
    JIS_X0212-1990 (iso 159)

    Character sets covering the required characters only

    ES (iso 17)
    ES2 (iso 85)
    NC_NC00-10:81 (iso 151)


    3.23.  gl Galician

    Required characters

    a'    00e1 LATIN SMALL LETTER A WITH ACUTE
    e'    00e9 LATIN SMALL LETTER E WITH ACUTE
    i'    00ed LATIN SMALL LETTER I WITH ACUTE
    o'    00f3 LATIN SMALL LETTER O WITH ACUTE
    u'    00fa LATIN SMALL LETTER U WITH ACUTE
    u:    00fc LATIN SMALL LETTER U WITH DIAERESIS
    n?    00f1 LATIN SMALL LETTER N WITH TILDE

    Character sets covering the whole

    videotex-suppl (iso 70)
    iso-ir-90 (iso 90)
    ANSI_X3.110-1983 (iso 99)
    ISO_8859-1:1987 (iso 100)
    T.61-8bit (iso 103)
    ISO_8859-3:1988 (iso 109)
    CSA_Z243.4-1985-gr (iso 123)
    T.101-G2 (iso 128)
    ISO_6937-2-add (iso 142)
    ISO_8859-9:1989 (iso 148)
    JIS_X0212-1990 (iso 159)


    3.24.  pt Portuguese

    Required characters

    a?    00e3 LATIN SMALL LETTER A WITH TILDE
    o?    00f5 LATIN SMALL LETTER O WITH TILDE
    c,    00e7 LATIN SMALL LETTER C WITH CEDILLA

    Optional characters

    a'    00e1 LATIN SMALL LETTER A WITH ACUTE
    e'    00e9 LATIN SMALL LETTER E WITH ACUTE
    i'    00ed LATIN SMALL LETTER I WITH ACUTE
    o'    00f3 LATIN SMALL LETTER O WITH ACUTE
    u'    00fa LATIN SMALL LETTER U WITH ACUTE
    a!    00e0 LATIN SMALL LETTER A WITH GRAVE
    a>    00e2 LATIN SMALL LETTER A WITH CIRCUMFLEX
    e>    00ea LATIN SMALL LETTER E WITH CIRCUMFLEX
    o>    00f4 LATIN SMALL LETTER O WITH CIRCUMFLEX
    u:    00fc LATIN SMALL LETTER U WITH DIAERESIS

    Character sets covering the whole

    videotex-suppl (iso 70)
    iso-ir-90 (iso 90)
    ANSI_X3.110-1983 (iso 99)
    ISO_8859-1:1987 (iso 100)
    T.61-8bit (iso 103)
    T.101-G2 (iso 128)
    ISO_6937-2-add (iso 142)
    JIS_X0212-1990 (iso 159)

    Character sets covering the required characters only

    PT (iso 16)
    PT2 (iso 84)
    ISO_8859-9:1989 (iso 148)


    3.25.  eu Basque

    Required characters

    n?    00f1 LATIN SMALL LETTER N WITH TILDE
    c,    00e7 LATIN SMALL LETTER C WITH CEDILLA

    Character sets covering the whole

    ES (iso 17)
    videotex-suppl (iso 70)
    ES2 (iso 85)
    iso-ir-90 (iso 90)
    ANSI_X3.110-1983 (iso 99)
    ISO_8859-1:1987 (iso 100)
    T.61-8bit (iso 103)
    ISO_8859-3:1988 (iso 109)
    CSA_Z243.4-1985-gr (iso 123)
    T.101-G2 (iso 128)
    ISO_6937-2-add (iso 142)
    ISO_8859-9:1989 (iso 148)
    JIS_X0212-1990 (iso 159)


    3.26.  mt Maltese

    Required characters

    a!    00e0 LATIN SMALL LETTER A WITH GRAVE
    e!    00e8 LATIN SMALL LETTER E WITH GRAVE
    i!    00ec LATIN SMALL LETTER I WITH GRAVE
    o!    00f2 LATIN SMALL LETTER O WITH GRAVE
    u!    00f9 LATIN SMALL LETTER U WITH GRAVE
    i>    00ee LATIN SMALL LETTER I WITH CIRCUMFLEX
    c.    010b LATIN SMALL LETTER C WITH DOT ABOVE
    g.    0121 LATIN SMALL LETTER G WITH DOT ABOVE
    h/    0127 LATIN SMALL LETTER H WITH STROKE
    z.    017c LATIN SMALL LETTER Z WITH DOT ABOVE

    Character sets covering the whole

    videotex-suppl (iso 70)
    iso-ir-90 (iso 90)
    ANSI_X3.110-1983 (iso 99)
    T.61-8bit (iso 103)
    ISO_8859-3:1988 (iso 109)
    T.101-G2 (iso 128)
    ISO_6937-2-add (iso 142)
    JIS_X0212-1990 (iso 159)


    3.27.  it Italian

    Required characters

    e'    00e9 LATIN SMALL LETTER E WITH ACUTE
    o'    00f3 LATIN SMALL LETTER O WITH ACUTE
    a!    00e0 LATIN SMALL LETTER A WITH GRAVE
    e!    00e8 LATIN SMALL LETTER E WITH GRAVE
    i!    00ec LATIN SMALL LETTER I WITH GRAVE
    o!    00f2 LATIN SMALL LETTER O WITH GRAVE

    Optional characters

    i'    00ed LATIN SMALL LETTER I WITH ACUTE
    u'    00fa LATIN SMALL LETTER U WITH ACUTE
    u!    00f9 LATIN SMALL LETTER U WITH GRAVE
    i:    00ef LATIN SMALL LETTER I WITH DIAERESIS

    Comments

    The accented characters appear only in the lower case variant in
    the Italian version of ISO 646 (ISO-IR-15).

    Character sets covering the whole

    videotex-suppl (iso 70)
    iso-ir-90 (iso 90)
    ANSI_X3.110-1983 (iso 99)
    ISO_8859-1:1987 (iso 100)
    T.61-8bit (iso 103)
    ISO_8859-3:1988 (iso 109)
    T.101-G2 (iso 128)
    ISO_6937-2-add (iso 142)
    JIS_X0212-1990 (iso 159)

    Character sets covering the required characters only

    GB_2312-80 (iso 58)


    3.28.  rm Rhaeto-Romance

    Required characters

    e'    00e9 LATIN SMALL LETTER E WITH ACUTE
    a!    00e0 LATIN SMALL LETTER A WITH GRAVE
    e!    00e8 LATIN SMALL LETTER E WITH GRAVE
    o!    00f2 LATIN SMALL LETTER O WITH GRAVE
    a>    00e2 LATIN SMALL LETTER A WITH CIRCUMFLEX
    e>    00ea LATIN SMALL LETTER E WITH CIRCUMFLEX
    i>    00ee LATIN SMALL LETTER I WITH CIRCUMFLEX
    o>    00f4 LATIN SMALL LETTER O WITH CIRCUMFLEX
    o:    00f6 LATIN SMALL LETTER O WITH DIAERESIS
    u:    00fc LATIN SMALL LETTER U WITH DIAERESIS

    Comments

    In van Wingen's table, this appeared as "Rhaetian".

    Character sets covering the whole

    videotex-suppl (iso 70)
    iso-ir-90 (iso 90)
    ANSI_X3.110-1983 (iso 99)
    ISO_8859-1:1987 (iso 100)
    T.61-8bit (iso 103)
    ISO_8859-3:1988 (iso 109)
    T.101-G2 (iso 128)
    ISO_6937-2-add (iso 142)
    JIS_X0212-1990 (iso 159)


    3.29.  ro Romanian

    Required characters

    a>    00e2 LATIN SMALL LETTER A WITH CIRCUMFLEX
    i>    00ee LATIN SMALL LETTER I WITH CIRCUMFLEX
    a(    0103 LATIN SMALL LETTER A WITH BREVE
    s,    015f LATIN SMALL LETTER S WITH CEDILLA
    t,    0163 LATIN SMALL LETTER T WITH CEDILLA

    Character sets covering the whole

    videotex-suppl (iso 70)
    iso-ir-90 (iso 90)
    ANSI_X3.110-1983 (iso 99)
    ISO_8859-2:1987 (iso 101)
    T.61-8bit (iso 103)
    T.101-G2 (iso 128)
    CSN_369103 (iso 139)
    ISO_6937-2-add (iso 142)
    JIS_X0212-1990 (iso 159)


    3.30.  hu Hungarian

    Required characters

    a'    00e1 LATIN SMALL LETTER A WITH ACUTE
    e'    00e9 LATIN SMALL LETTER E WITH ACUTE
    i'    00ed LATIN SMALL LETTER I WITH ACUTE
    o'    00f3 LATIN SMALL LETTER O WITH ACUTE
    u'    00fa LATIN SMALL LETTER U WITH ACUTE
    o:    00f6 LATIN SMALL LETTER O WITH DIAERESIS
    u:    00fc LATIN SMALL LETTER U WITH DIAERESIS
    o"    0151 LATIN SMALL LETTER O WITH DOUBLE ACUTE
    u"    0171 LATIN SMALL LETTER U WITH DOUBLE ACUTE

    Character sets covering the whole

    videotex-suppl (iso 70)
    iso-ir-90 (iso 90)
    ANSI_X3.110-1983 (iso 99)
    ISO_8859-2:1987 (iso 101)
    T.61-8bit (iso 103)
    T.101-G2 (iso 128)
    CSN_369103 (iso 139)
    ISO_6937-2-add (iso 142)
    JIS_X0212-1990 (iso 159)


    3.31.  sq Albanian

    Required characters

    e:    00eb LATIN SMALL LETTER E WITH DIAERESIS
    c,    00e7 LATIN SMALL LETTER C WITH CEDILLA

    Character sets covering the whole

    videotex-suppl (iso 70)
    iso-ir-90 (iso 90)
    ANSI_X3.110-1983 (iso 99)
    ISO_8859-1:1987 (iso 100)
    ISO_8859-2:1987 (iso 101)
    T.61-8bit (iso 103)
    ISO_8859-3:1988 (iso 109)
    CSA_Z243.4-1985-gr (iso 123)
    T.101-G2 (iso 128)
    CSN_369103 (iso 139)
    ISO_6937-2-add (iso 142)
    ISO_8859-9:1989 (iso 148)
    JIS_X0212-1990 (iso 159)


    3.32.  tr Turkish

    Required characters

    a>    00e2 LATIN SMALL LETTER A WITH CIRCUMFLEX
    i>    00ee LATIN SMALL LETTER I WITH CIRCUMFLEX
    u>    00fb LATIN SMALL LETTER U WITH CIRCUMFLEX
    o:    00f6 LATIN SMALL LETTER O WITH DIAERESIS
    u:    00fc LATIN SMALL LETTER U WITH DIAERESIS
    i.    0131 LATIN SMALL LETTER I WITH NO DOT
    c,    00e7 LATIN SMALL LETTER C WITH CEDILLA
    s,    015f LATIN SMALL LETTER S WITH CEDILLA
    g(    011f LATIN SMALL LETTER G WITH BREVE

    Character sets covering the whole

    videotex-suppl (iso 70)
    iso-ir-90 (iso 90)
    ANSI_X3.110-1983 (iso 99)
    T.61-8bit (iso 103)
    ISO_8859-3:1988 (iso 109)
    T.101-G2 (iso 128)
    ISO_6937-2-add (iso 142)
    ISO_8859-9:1989 (iso 148)
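
    A coverage claim such as the list above can be spot-checked
    mechanically.  The following Python sketch tests whether the
    lower-case required characters for Turkish can be encoded in a
    few 8-bit sets.  It is an illustration only: the mapping from
    registry names to Python codec names is an assumption, and
    missing_characters is a hypothetical helper, not part of any
    registry tool.

```python
# Lower-case required characters for Turkish, from the table in 3.32.
TURKISH_REQUIRED = "\u00e2\u00ee\u00fb\u00f6\u00fc\u0131\u00e7\u015f\u011f"

# Assumed correspondence between registry names and Python codec names.
CODECS = {
    "ISO_8859-1:1987": "iso8859_1",
    "ISO_8859-3:1988": "iso8859_3",
    "ISO_8859-9:1989": "iso8859_9",
}


def missing_characters(chars, codec):
    """Return the characters in chars that codec cannot encode."""
    missing = []
    for ch in chars:
        try:
            ch.encode(codec)
        except UnicodeEncodeError:
            missing.append(ch)
    return missing


report = {name: missing_characters(TURKISH_REQUIRED, codec)
          for name, codec in CODECS.items()}
for name, missing in report.items():
    print(name, "covers all" if not missing else "lacks %d" % len(missing))
```

    ISO_8859-1 is included only as a contrast case; it does not
    appear in the list above.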


    3.33.  hr Croatian

    Required characters

    c'    0107 LATIN SMALL LETTER C WITH ACUTE
    d/    0111 LATIN SMALL LETTER D WITH STROKE
    c<    010d LATIN SMALL LETTER C WITH CARON
    s<    0161 LATIN SMALL LETTER S WITH CARON
    z<    017e LATIN SMALL LETTER Z WITH CARON

    Character sets covering the whole

    videotex-suppl (iso 70)
    iso-ir-90 (iso 90)
    ANSI_X3.110-1983 (iso 99)
    ISO_8859-2:1987 (iso 101)
    T.61-8bit (iso 103)
    T.101-G2 (iso 128)
    CSN_369103 (iso 139)
    JUS_I.B1.002 (iso 141)
    ISO_6937-2-add (iso 142)
    JIS_X0212-1990 (iso 159)


    3.34.  sl Slovenian

    Required characters

    c<    010d LATIN SMALL LETTER C WITH CARON
    s<    0161 LATIN SMALL LETTER S WITH CARON
    z<    017e LATIN SMALL LETTER Z WITH CARON

    Character sets covering the whole

    videotex-suppl (iso 70)
    iso-ir-90 (iso 90)
    ANSI_X3.110-1983 (iso 99)
    ISO_8859-2:1987 (iso 101)
    T.61-8bit (iso 103)
    ISO_8859-4:1988 (iso 110)
    T.101-G2 (iso 128)
    CSN_369103 (iso 139)
    JUS_I.B1.002 (iso 141)
    ISO_6937-2-add (iso 142)
    latin6 (iso 157)
    JIS_X0212-1990 (iso 159)


    3.35.  sk Slovak

    Required characters

    y'    00fd LATIN SMALL LETTER Y WITH ACUTE
    a'    00e1 LATIN SMALL LETTER A WITH ACUTE
    e'    00e9 LATIN SMALL LETTER E WITH ACUTE
    i'    00ed LATIN SMALL LETTER I WITH ACUTE
    o'    00f3 LATIN SMALL LETTER O WITH ACUTE
    u'    00fa LATIN SMALL LETTER U WITH ACUTE
    a:    00e4 LATIN SMALL LETTER A WITH DIAERESIS
    o>    00f4 LATIN SMALL LETTER O WITH CIRCUMFLEX
    l'    013a LATIN SMALL LETTER L WITH ACUTE
    r'    0155 LATIN SMALL LETTER R WITH ACUTE
    c<    010d LATIN SMALL LETTER C WITH CARON
    d<    010f LATIN SMALL LETTER D WITH CARON
    l<    013e LATIN SMALL LETTER L WITH CARON
    n<    0148 LATIN SMALL LETTER N WITH CARON
    s<    0161 LATIN SMALL LETTER S WITH CARON
    t<    0165 LATIN SMALL LETTER T WITH CARON
    z<    017e LATIN SMALL LETTER Z WITH CARON

    Character sets covering the whole

    videotex-suppl (iso 70)
    iso-ir-90 (iso 90)
    ANSI_X3.110-1983 (iso 99)
    ISO_8859-2:1987 (iso 101)
    T.61-8bit (iso 103)
    T.101-G2 (iso 128)
    CSN_369103 (iso 139)
    ISO_6937-2-add (iso 142)


    3.36.  cs Czech

    Required characters

    y'    00fd LATIN SMALL LETTER Y WITH ACUTE
    a'    00e1 LATIN SMALL LETTER A WITH ACUTE
    e'    00e9 LATIN SMALL LETTER E WITH ACUTE
    i'    00ed LATIN SMALL LETTER I WITH ACUTE
    o'    00f3 LATIN SMALL LETTER O WITH ACUTE
    u'    00fa LATIN SMALL LETTER U WITH ACUTE
    e<    011b LATIN SMALL LETTER E WITH CARON
    u0    016f LATIN SMALL LETTER U WITH RING ABOVE
    c<    010d LATIN SMALL LETTER C WITH CARON
    d<    010f LATIN SMALL LETTER D WITH CARON
    n<    0148 LATIN SMALL LETTER N WITH CARON
    r<    0159 LATIN SMALL LETTER R WITH CARON
    s<    0161 LATIN SMALL LETTER S WITH CARON
    t<    0165 LATIN SMALL LETTER T WITH CARON
    z<    017e LATIN SMALL LETTER Z WITH CARON

    Character sets covering the whole

    videotex-suppl (iso 70)
    iso-ir-90 (iso 90)
    ANSI_X3.110-1983 (iso 99)
    ISO_8859-2:1987 (iso 101)
    T.61-8bit (iso 103)
    T.101-G2 (iso 128)
    CSN_369103 (iso 139)
    ISO_6937-2-add (iso 142)


    3.37.  pl Polish

    Required characters

    o'    00f3 LATIN SMALL LETTER O WITH ACUTE
    a;    0105 LATIN SMALL LETTER A WITH OGONEK
    e;    0119 LATIN SMALL LETTER E WITH OGONEK
    c'    0107 LATIN SMALL LETTER C WITH ACUTE
    n'    0144 LATIN SMALL LETTER N WITH ACUTE
    s'    015b LATIN SMALL LETTER S WITH ACUTE
    z'    017a LATIN SMALL LETTER Z WITH ACUTE
    l/    0142 LATIN SMALL LETTER L WITH STROKE
    z.    017c LATIN SMALL LETTER Z WITH DOT ABOVE

    Character sets covering the whole

    videotex-suppl (iso 70)
    iso-ir-90 (iso 90)
    ANSI_X3.110-1983 (iso 99)
    ISO_8859-2:1987 (iso 101)
    T.61-8bit (iso 103)
    T.101-G2 (iso 128)
    CSN_369103 (iso 139)
    ISO_6937-2-add (iso 142)
    JIS_X0212-1990 (iso 159)


    3.38.  ?? Sorbian

    Required characters

    o'    00f3 LATIN SMALL LETTER O WITH ACUTE
    e<    011b LATIN SMALL LETTER E WITH CARON
    c'    0107 LATIN SMALL LETTER C WITH ACUTE
    n'    0144 LATIN SMALL LETTER N WITH ACUTE
    s'    015b LATIN SMALL LETTER S WITH ACUTE
    z'    017a LATIN SMALL LETTER Z WITH ACUTE
    l/    0142 LATIN SMALL LETTER L WITH STROKE
    c<    010d LATIN SMALL LETTER C WITH CARON
    r<    0159 LATIN SMALL LETTER R WITH CARON
    s<    0161 LATIN SMALL LETTER S WITH CARON
    z<    017e LATIN SMALL LETTER Z WITH CARON

    Character sets covering the whole

    videotex-suppl (iso 70)
    iso-ir-90 (iso 90)
    ANSI_X3.110-1983 (iso 99)
    ISO_8859-2:1987 (iso 101)
    T.61-8bit (iso 103)
    T.101-G2 (iso 128)
    CSN_369103 (iso 139)
    ISO_6937-2-add (iso 142)


    3.39.  eo Esperanto

    Required characters

    u(    016d LATIN SMALL LETTER U WITH BREVE
    c>    0109 LATIN SMALL LETTER C WITH CIRCUMFLEX
    g>    011d LATIN SMALL LETTER G WITH CIRCUMFLEX
    h>    0125 LATIN SMALL LETTER H WITH CIRCUMFLEX
    j>    0135 LATIN SMALL LETTER J WITH CIRCUMFLEX
    s>    015d LATIN SMALL LETTER S WITH CIRCUMFLEX

    Character sets covering the whole

    videotex-suppl (iso 70)
    iso-ir-90 (iso 90)
    ANSI_X3.110-1983 (iso 99)
    T.61-8bit (iso 103)
    ISO_8859-3:1988 (iso 109)
    T.101-G2 (iso 128)
    ISO_6937-2-add (iso 142)
    ISO_8859-supp (iso 154)
    JIS_X0212-1990 (iso 159)


    4.  Other languages with appropriate character sets

    Other languages for which appropriate character sets are known are
    listed in the table below.

    Language        Character set

    ar Arabic       ISO-8859-6
    be Byelorussian ISO-8859-5
    bg Bulgarian    ISO-8859-5
    el Greek        ISO-8859-7
    en English      USASCII
    fa Persian      ISO-8859-6
    iw Hebrew       ISO-8859-8
    ja Japanese     ISO-IR-87 (Japanese JIS C6226-1983)
    ko Korean       ISO-IR-149 (Korean KS C 5601-1989)
    la Latin        USASCII
    lo Laotian      ISO-IR-166
    ru Russian      ISO-8859-5
    sw Swahili      USASCII
    th Thai         ISO-IR-166
    uk Ukrainian    ISO-8859-5
    ur Urdu         ISO-8859-6
    vo Volapuk      ISO-8859-1
    zh Chinese      ISO-IR-58 (Chinese GB 2312-80)

    Additional entries in this table are welcome!
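
    A table of this kind lends itself to a simple lookup keyed on the
    ISO 639 code.  The Python sketch below is illustrative only: it
    copies four rows from the table, and the Python codec names paired
    with them are an assumption, not part of the table.  The name
    encode_for_language is hypothetical.

```python
# Rows taken from the table above. The second element of each value is
# Python's codec name, which is an assumption, not part of the table.
CHARSET_FOR_LANGUAGE = {
    "el": ("ISO-8859-7", "iso8859_7"),
    "en": ("USASCII", "ascii"),
    "iw": ("ISO-8859-8", "iso8859_8"),
    "ru": ("ISO-8859-5", "iso8859_5"),
}


def encode_for_language(tag, text):
    """Encode text with the character set listed for the language tag."""
    label, codec = CHARSET_FOR_LANGUAGE[tag]
    return label, text.encode(codec)


# Three letters each of Greek and Russian, written as escapes.
greek = encode_for_language("el", "\u03b1\u03b2\u03b3")
russian = encode_for_language("ru", "\u0430\u0431\u0432")
print(greek, russian)
```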


    4.1.  ISO 10646 only languages

    The following languages can (to the author's limited knowledge) be
    written with the current ISO 10646 standard, but with no other
    registered character sets:


    Language                Country(ies)                    Script(s)

    aa Afar                 Somalia, Ethiopia, Djibouti     Latin
    ab Abkhazian            Georgia                         Cyrillic
    am Amharic              Ethiopia                        Ethiopic
    as Assamese             India, Nepal                    Bengali
    ay Aymara               Bolivia, Peru, Chile            Latin
    az Azerbaijani          SNC, Iran, Iraq, Turkey         Cyrillic, Arabic
    ba Bashkir              SNC                             Cyrillic
    bh Bihari               India                           Gujarati (or Kaithi)
    bi Bislama              Vanuatu, New Caledonia          Latin
    bn Bengali              India                           Bengali
    co Corsican             France                          Latin
    fj Fiji                 Fiji                            Latin
    gd Scots                UK                              Latin
    gn Guarani              Paraguay                        Latin
    gu Gujarati             India                           Gujarati
    ha Hausa                Nigeria, Niger, Chad, Sudan,... Latin
    hi Hindi                India                           Devanagari
    hy Armenian             Armenia                         Armenian
    ia Interlingua          None (Artificial Language)      Latin
    ie Interlingue          None (Artificial Language)      Latin
    ik Inupiak              USA, Canada                     Latin, Cree
    in Indonesian           Indonesia                       Latin
    ji Yiddish              Germany, USA, SNC, Israel       Hebrew
    jw Javanese             Indonesia, Malaysia             Latin, Javanese
    ka Georgian             Georgia                         Georgian
    kk Kazakh               SNC, Afghanistan                Cyrillic, Arabic
    km Cambodian            Cambodia                        Khmer
    kn Kannada              India                           Kannada
    ks Kashmiri             India, Pakistan                 Arabic
    ku Kurdish              SNC, Turkey, Iraq, Iran         Cyrillic, Arabic
    ky Kirghiz              SNC, China, Afghanistan         Cyrillic, Arabic
    ln Lingala              CAR, Congo, Zaire               Latin
    mg Malagasy             Madagascar, Comoro Islands      Latin, Arabic
    mi Maori                New Zealand                     Latin
    mk Macedonian           Greece, Yugoslavia              Greek, Cyrillic
    ml Malayalam            India                           Malayalam
    mn Mongolian            Mongolia                        Cyrillic, Mongolian
    mo Moldavian            Romania                         Latin
    mr Marathi              India                           Devanagari
    ms Malay                Malaysia, Thailand              Latin
    my Burmese              Myanmar                         Burmese
    na Nauru                Nauru                           Latin
    ne Nepali               Nepal                           Devanagari
    oc Occitan              France                          Latin
    or Oriya                India                           Oriya
    pa Punjabi              India                           Gurmukhi
    ps Pashto (Western)     Afghanistan, Iran               Arabic
    qu Quechua              Peru                            Latin
    rm Rhaeto               Switzerland                     Latin
    rn Kirundi              Burundi, Uganda                 Latin
    rw Kinyarwanda          Rwanda, Uganda, Zaire           Latin
    sa Sanskrit             India                           Devanagari
    sd Sindhi               Pakistan, India, Afghanistan    Arabic, Gurmukhi
    sg Sangro               Central African Republic        Latin
    si Singhalese           Sri Lanka                       Sinhalese
    sm Samoan               Samoa, USA, New Zealand         Latin
    sn Shona                Zimbabwe, Zambia, Mozambique    Latin
    so Somali               Somalia, Ethiopia, Djibouti     Latin
    sr Serbian              former Yugoslavia               Cyrillic
    ss Siswati              S. Africa, Swaziland            Latin
    st Sesotho              S. Africa, Lesotho              Latin
    su Sudanese             Sudan                           Latin
    ta Tamil                India, Malaysia                 Tamil
    te Telugu               India                           Telugu
    tg Tajik                Tajikistan                      Arabic
    ti Tigrinya             Ethiopia                        Latin, Ethiopic
    tk Turkmen              SNC, Iran, Afghanistan          Cyrillic, Arabic
    tl Tagalog              Philippines                     Latin
    tn Setswana             S. Africa, Botswana, Namibia    Latin
    to Tonga (3)            Mozambique                      Latin
    ts Tsonga               Mozambique, Swaziland           Latin
    tt Tatar                SNC                             Cyrillic
    tw Twi (Ewe)            Ghana                           Latin
    uz Uzbek (Southern)     Afghanistan, Turkey             Arabic
    vi Vietnamese           Vietnam, Cambodia, China        Latin
    wo Wolof                Senegal, Mauritania             Latin
    xh Xhosa                S. Africa                       Latin
    yo Yoruba               Nigeria, Togo, Benin            Latin
    zu Zulu                 S. Africa, Lesotho, Malawi      Latin

    The information about languages in ISO 10646 was kindly supplied
    by Glenn Adams <glenn@metis.com>.

    Languages for which the author does NOT know any proper character
    set include:


    bo Tibetan
    dz Bhutani
    et Estonian
    lt Lithuanian
    lv Latvian, Lettish
    mt Maltese
    sh Serbo-Croatian



    5.  Encoded format of charset data

    This section contains, in a very compact format, all the
    information used to produce the technical content of this
    document, apart from the content of ISO 639 and RFC 1345.

    It would be helpful if new information were also supplied in this
    format.


    # A list of languages and their required/optional characters.
    # Format:
    # &language Name
    # Required characters
    # Important characters
    # Comments
     &language Lithuanian
     a; e; i; u; e. u- c< s< z<

     &language Latvian
     a- e- i- o- u- g, k, l, n, r, c< s< z<

     &language Estonian
     o? a: o: u: s< z<

     &language Finnish
     a: o:

     &language Sami
     a' d/ ng t/ c< s< z<
     e' a> a: e: i: o: u: ae aa o/ n'
    Information from Otto Prytz <otto.prytz@kri.uio.no>
    This information is for the current Norwegian North Sami orthography.
    The letters aa, ae and o/ are in use for Norwegian/Swedish names, but
    not for Sami proper. a> and n' are no longer used.
    There is some doubt about whether e: and i: were ever used, but they
    are listed by van Wingen.

     &language Swedish
     a: o: aa
     a' e' e: u:

     &language Norwegian
     ae aa o/
     e' o' o> a! u: a<
    Information from Johan van Wingen and Otto Prytz

     &language Danish
     ae aa o/
     a' e' i' o' u' y'

     &language Faeroese
     a' i' o' u' y' ae o/ d-

     &language Icelandic
     a' e' i' o' u' y' o: ae d- th

     &language Greenlandic
     a' e' i' u' a> e> i> o> u> ae aa o/ a? i? u? kk

     &language Gaelic
     a' e' o' a! e! i! o! u!

     &language Irish
     a' e' i' o' u'

     &language Welsh
     w' y' a' e' i' o' u' a! e! i! o! u! w! y! a> e> i> o> u> w> y> a: e: i: o: u: w: y:

     &language Breton
     e> u! u: n?

     &language Frisian
     e' u' a> e> o> u> a: e: i: o: u:

     &language Dutch
     a' e' i' o' u' a: e: i: o: u: ij

     &language Afrikaans
     a' e' e! a> e> i> o> u> e: i: o: 'n

     &language German
     a: o: u: ss
     e' a!
    The "ss" character exists only in lower case; the upper case equivalent
    is "SS" (2 letters).

     &language French
     e' e! u! c, a!
     a> e> i> o> u> ae oe e: i: u: y:
    ae and y: are very uncommon in current French; there have been arguments
    that all of the others should be "required".

     &language Catalan
     e' i' o' u' a! e! o! i: u: l. c,
     n?
    Information from van Wingen and Otto Prytz.

     &language Spanish
     n?  !I ?I
     a' e' i' o' u' u: c, -a -o
    Note that this language also uses special punctuation marks.
    The accented vowels may be mandatory; Spanish speakers who think they
    should be are encouraged to speak up.
    Information from Otto Prytz <otto.prytz@kri.uio.no>

     &language Galician
     a' e' i' o' u' u: n?

     &language Portuguese
     a? o? c,
     a' e' i' o' u' a! a> e> o> u: o!
    Information from van Wingen and Otto Prytz

     &language Basque
     n? c,

     &language Maltese
     a! e! i! o! u! i> c. g. h/ z.

     &language Italian
     e' o' a! e! i! o!
     i' u' u! i:
    The accented characters appear only in the lower case variant in
    the Italian version of ISO 646 (ISO-IR-15).

     &language Rhaeto-Romance
     e' a! e! o! a> e> i> o> o: u:

    In van Wingen's table, this appeared as "Rhaetian".

     &language Romanian
     a> i> a( s, t,

     &language Hungarian
     a' e' i' o' u' o: u: o" u"

     &language Albanian
     e: c,

     &language Turkish
     a> i> u> o: u: i. c, s, g(

     &language Croatian
     c' d/ c< s< z<

     &language Slovenian
     c< s< z<

     &language Slovak
     y' a' e' i' o' u' a: o> l' r' c< d< l< n< s< t< z<

     &language Czech
     y' a' e' i' o' u' e< u0 c< d< n< r< s< t< z<

     &language Polish
     o' a; e; c' n' s' z' l/ z.

     &language Sorbian
     o' e< c' n' s' z' l/ c< r< s< z<

     &language Esperanto
     u( c> g> h> j> s>
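
    The record format described in the "#" lines at the start of this
    section is simple enough to read mechanically.  The following
    Python sketch is one possible reader, not the tool used to produce
    this document.  It assumes that the four-column page margin has
    been removed, so that the "&language" line and the character lines
    start with one space while comment lines start in the first
    column; that indentation rule is inferred from the listing and is
    not stated in the text.  The name parse_charset_data is
    hypothetical.

```python
SAMPLE = """\
# Format:
 &language Finnish
 a: o:

 &language German
 a: o: u: ss
 e' a!
The "ss" character exists only in lower case; the upper case equivalent
is "SS" (2 letters).

 &language Rhaeto-Romance
 e' a! e! o! a> e> i> o> o: u:

In van Wingen's table, this appeared as "Rhaetian".
"""


def parse_charset_data(text):
    """Parse '&language' records into a list of dicts."""
    records = []
    current = None
    data_lines_seen = 0
    for raw in text.splitlines():
        if not raw.strip() or raw.startswith("#"):
            continue  # skip blank lines and '#' format notes
        if raw.startswith(" &language "):
            current = {
                "name": raw[len(" &language "):].strip(),
                "required": [],
                "important": [],
                "comments": [],
            }
            records.append(current)
            data_lines_seen = 0
        elif current is None:
            continue  # ignore text before the first record
        elif raw.startswith(" ") and data_lines_seen < 2:
            # First indented line: required; second: important.
            key = "required" if data_lines_seen == 0 else "important"
            current[key] = raw.split()
            data_lines_seen += 1
        else:
            current["comments"].append(raw.strip())
    return records


records = parse_charset_data(SAMPLE)
for record in records:
    print(record["name"], record["required"], record["important"])
```

    A comment line that happens to begin with a space directly after
    the required characters would be misread as the second character
    line; a stricter reader would need an explicit marker for
    comments.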




    6.  REFERENCES


    [ISO 8859]
         Information technology - 8-bit single-byte coded graphic
         character sets

    [ISO 6937]
         Information processing - Coded graphic character set for text
         communication

    [ISO 639]
         Codes for identifying languages (1988 version)

    [ISO 10646]
         Information technology - Universal Multiple-Octet Coded
         Character Set

    [RFC-KELD]
         Keld Simonsen: Character Mnemonics & Character Sets, RFC
         1345, June 1992


    Table of Contents


     Abstract ...................................................    1
     Status of this Memo ........................................    1
    1 Introduction ..............................................    3
    2 Introduction to language tables ...........................    3
    2.1 Table structure .........................................    4
    2.2 Sources utilized ........................................    5
    2.3 What accents mean .......................................    5
    3 Language tables ...........................................    6
    3.1 lt Lithuanian ...........................................    6
    3.2 lv Latvian ..............................................    7
    3.3 et Estonian .............................................    8
    3.4 fi Finnish ..............................................    8
    3.5 ?? Sami .................................................    9
    3.6 sv Swedish ..............................................   10
    3.7 no Norwegian ............................................   11
    3.8 da Danish ...............................................   12
    3.9 fo Faeroese .............................................   13
    3.10 is Icelandic ...........................................   13
    3.11 kl Greenlandic .........................................   14
    3.12 ?? Gaelic ..............................................   15
    3.13 ga Irish ...............................................   15
    3.14 cy Welsh ...............................................   16
    3.15 br Breton ..............................................   17
    3.16 fy Frisian .............................................   17
    3.17 nl Dutch ...............................................   18
    3.18 af Afrikaans ...........................................   19
    3.19 de German ..............................................   19
    3.20 fr French ..............................................   20
    3.21 ca Catalan .............................................   21
    3.22 es Spanish .............................................   22
    3.23 gl Galician ............................................   23
    3.24 pt Portuguese ..........................................   24
    3.25 eu Basque ..............................................   24
    3.26 mt Maltese .............................................   25
    3.27 it Italian .............................................   26
    3.28 rm Rhaeto-Romance ......................................   26
    3.29 ro Romanian ............................................   27
    3.30 hu Hungarian ...........................................   28
    3.31 sq Albanian ............................................   28
    3.32 tr Turkish .............................................   29
    3.33 hr Croatian ............................................   29
    3.34 sl Slovenian ...........................................   30
    3.35 sk Slovak ..............................................   31
    3.36 cs Czech ...............................................   31
    3.37 pl Polish ..............................................   32
    3.38 ?? Sorbian .............................................   33
    3.39 eo Esperanto ...........................................   33
    4 Other languages with appropriate character sets ...........   34
    4.1 ISO 10646 only languages ................................   35
    5 Encoded format of charset data ............................   37
    6 REFERENCES ................................................   41