From xemacs-m  Sun May 11 23:23:25 1997
Received: from mercury.Sun.COM (mercury.Sun.COM [192.9.25.1])
	by xemacs.org (8.8.5/8.8.5) with SMTP id XAA26417
	for <xemacs-beta@xemacs.org>; Sun, 11 May 1997 23:23:24 -0500 (CDT)
Received: from Eng.Sun.COM ([129.146.1.25]) by mercury.Sun.COM (SMI-8.6/mail.byaddr) with SMTP id VAA24536 for <xemacs-beta@xemacs.org>; Sun, 11 May 1997 21:35:56 -0700
Received: from kindra.eng.sun.com by Eng.Sun.COM (SMI-8.6/SMI-5.3)
	id VAA29168; Sun, 11 May 1997 21:22:49 -0700
Received: from xemacs.eng.sun.com by kindra.eng.sun.com (SMI-8.6/SMI-SVR4)
	id VAA21797; Sun, 11 May 1997 21:22:51 -0700
Received: by xemacs.eng.sun.com (SMI-8.6/SMI-SVR4)
	id VAA14675; Sun, 11 May 1997 21:22:46 -0700
Date: Sun, 11 May 1997 21:22:46 -0700
Message-Id: <199705120422.VAA14675@xemacs.eng.sun.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
From: Martin Buchholz <mrb@Eng.Sun.COM>
To: XEmacs Beta Test <xemacs-beta@xemacs.org>
Subject: Emacs 20.1 NEWS file from Mule mailing list
X-Mailer: VM 6.24 under 20.1 XEmacs Lucid (beta15)
Reply-To: Martin Buchholz <mrb@Eng.Sun.COM>

The Mule part of the pre-release NEWS file for Emacs 20.1 follows.

Martin
----------------------------------------------------------------------
** International character set support (MULE)

Emacs now supports a wide variety of international character sets,
including European variants of the Latin alphabet, as well as Chinese,
Devanagari (Hindi and Marathi), Ethiopian, Greek, IPA, Japanese,
Korean, Lao, Russian, Thai, Tibetan, and Vietnamese scripts.  These
features have been merged from the modified version of Emacs known as
MULE (for "MULti-lingual Enhancement to GNU Emacs").

Users of these scripts have established many more-or-less standard
coding systems for storing files.  Emacs uses a single multibyte
character encoding within Emacs buffers; it can translate from a wide
variety of coding systems when reading a file and can translate back
into any of these coding systems when saving a file.
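
As a rough illustration of this round trip, here is a small sketch in
Python rather than Emacs Lisp.  Python's own string type stands in for
the Emacs internal encoding, and the sample text and the choice of
encodings are assumptions made only for the example.

```python
# Bytes as they might sit on disk in the ISO Latin-1 coding system.
on_disk = b"na\xefve caf\xe9"

# Reading: translate from the file's coding system to the internal form.
internal = on_disk.decode("iso-8859-1")

# Saving: translate back into the same coding system...
saved_latin1 = internal.encode("iso-8859-1")

# ...or into a different one (UTF-8 is used here purely as an example).
saved_utf8 = internal.encode("utf-8")
```

The internal text is the same in both cases; only the bytes written
out depend on the coding system chosen when saving.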

Keyboards, even in the countries where these character sets are used,
generally don't have keys for all the characters in them.  So Emacs
supports various "input methods", typically one for each script or
language, to make it possible to type them.

The Emacs internal multibyte encoding represents a non-ASCII
character as a sequence of bytes in the range 0200 through 0377.
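
The byte-range property can be checked with a short Python sketch.
UTF-8 is used below only as a stand-in that happens to share this
property; it is not the Emacs internal encoding, and the helper name
is made up for the example.

```python
def is_high_byte(b):
    """True if the byte lies in the octal range 0200 through 0377."""
    return 0o200 <= b <= 0o377

# UTF-8 is an assumption here, chosen because every byte of a
# non-ASCII character falls in the same range.
encoded = "\u03b1".encode("utf-8")          # GREEK SMALL LETTER ALPHA
all_high = all(is_high_byte(b) for b in encoded)

# An ASCII character, by contrast, uses no bytes from that range.
ascii_high = any(is_high_byte(b) for b in "a".encode("utf-8"))
```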

C-x RET is a new prefix command used for commands that pertain to
coding systems and multibyte characters.

*** Input methods

An input method is a kind of character conversion which is designed
specifically for interactive input.  In Emacs, typically each language
has its own input method (though sometimes several languages which use
the same characters can share one input method).  Some languages
support several input methods.

The simplest kind of input method works by mapping ASCII letters into
another alphabet.  This is how the Greek and Russian input methods
work.
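
A minimal sketch of such a mapping, written in Python for
illustration.  The table below is a toy with invented key
assignments, not the table used by any actual Emacs input method.

```python
# Toy table; a real input method would cover the whole alphabet.
GREEK_MAP = {"a": "α", "b": "β", "g": "γ", "d": "δ"}

def map_keys(typed):
    """Replace each ASCII letter that has an entry; pass others through."""
    return "".join(GREEK_MAP.get(ch, ch) for ch in typed)
```

With this table, map_keys("bad") yields the three Greek letters beta,
alpha, delta, while characters without an entry are left unchanged.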

A more powerful technique is composition: converting sequences of
characters into one letter.  Many European input methods use
composition to produce a single non-ASCII letter from a sequence which
consists of a letter followed by diacritics.  For example, a' is one
sequence of two characters that might be converted into a single
letter.
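
The idea can be sketched in Python as follows.  The choice of
diacritic keys is an assumption, and Unicode NFC normalization is
borrowed as a convenient way to find the combined letter; Emacs input
methods use their own tables.

```python
import unicodedata

# Hypothetical keys for diacritics, each mapped to a combining mark.
DIACRITIC_KEYS = {"'": "\u0301", "`": "\u0300", '"': "\u0308", "~": "\u0303"}

def compose(typed):
    """Fold a letter followed by a diacritic key into a single letter."""
    out = []
    for ch in typed:
        mark = DIACRITIC_KEYS.get(ch)
        if mark and out:
            combined = unicodedata.normalize("NFC", out[-1] + mark)
            if len(combined) == 1:
                # A single precomposed letter exists; use it.
                out[-1] = combined
                continue
        # No composition applies; keep the character as typed.
        out.append(ch)
    return "".join(out)
```

Here compose("a'") gives a single a-acute, while a sequence with no
combined form, such as x', is left as two characters.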

The input methods for syllabic scripts typically use mapping followed
by conversion.  The input methods for Thai and Korean work this way.
First, letters are mapped into symbols for particular sounds or tone
marks; then, sequences of these which make up a whole syllable are
mapped into one syllable sign--most often a "composite character".
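
The two steps might look like this in Python.  The key assignments
are hypothetical, and Unicode NFC normalization of Hangul jamo stands
in for the Emacs notion of a composite character, which is a
different mechanism.

```python
import unicodedata

# Hypothetical key assignments, chosen only for this example.
KEY_TO_JAMO = {
    "g": "\u1112",  # leading consonant HIEUH
    "k": "\u1161",  # vowel A
    "s": "\u11ab",  # trailing consonant NIEUN
}

def type_syllable(keys):
    # Step 1: map each key to a symbol for a particular sound.
    jamo = "".join(KEY_TO_JAMO[k] for k in keys)
    # Step 2: combine the symbols that make up one whole syllable.
    return unicodedata.normalize("NFC", jamo)
```

Typing the three keys "gks" produces three sound symbols in the first
step and one syllable sign in the second.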

None of these methods works very well for Chinese and Japanese, so
they are handled specially.  First you input a whole word using
phonetic spelling; then, after the word is in the buffer, Emacs
converts it into one or more characters using a large dictionary.

Since there is more than one way to represent a phonetically spelled
word using Chinese characters, Emacs can only guess which one to use;
typically these input methods give you a way to say "guess again" if
the first guess is wrong.
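
A toy model of this convert-then-guess-again cycle, in Python.  The
one-entry dictionary and the class and method names are invented for
the example; real input methods rely on large dictionaries and their
own commands.

```python
# Tiny made-up dictionary: phonetic spelling -> candidates, first guess first.
DICTIONARY = {
    "hashi": ["橋", "箸", "端"],
}

class Conversion:
    """Tracks which candidate is currently shown for one spelled word."""

    def __init__(self, spelling):
        # Unknown words are left as typed.
        self.candidates = DICTIONARY.get(spelling, [spelling])
        self.index = 0

    def current(self):
        return self.candidates[self.index]

    def guess_again(self):
        # Move to the next candidate, wrapping around at the end.
        self.index = (self.index + 1) % len(self.candidates)
        return self.current()
```

A Conversion for "hashi" first offers the first candidate; each call
to guess_again moves on to the next one and eventually wraps around.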

*** The command C-x RET m (toggle-enable-multibyte-characters)
turns multibyte character support on or off for the current buffer.

If multibyte character support is turned off in a buffer, then each
byte is a single character, even codes 0200 through 0377--exactly as
in Emacs 19.34.  This includes the features for supporting the
European character sets ISO Latin-1 and ISO Latin-2.

However, there is no need to turn off multibyte character support to
use ISO Latin-1 or ISO Latin-2; the Emacs multibyte character set
includes all the characters in these character sets, and Emacs can
translate automatically to and from either one.

*** Displaying international characters on X Windows.

A font for X typically displays just one alphabet or script.
Therefore, displaying the entire range of characters Emacs supports
requires using many fonts.

For this reason, Emacs now supports "fontsets".  Each fontset is a
collection of fonts, each assigned to a range of character codes.

A fontset has a name, like a font.  Individual fonts are defined by
the X server; fontsets are defined within Emacs itself.  But once you
have defined a fontset, you can use it in a face or a frame just as
you would use a font.

If a fontset specifies no font for a certain character, or if it
specifies a font that does not exist on your system, then it cannot
display that character.  It will display an empty box instead.
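
The lookup described above can be modelled roughly as follows, in
Python.  The font names and code ranges are made up, and Unicode code
points stand in for Emacs character codes.

```python
# Each entry: (first code, last code, font name).  All values are made up.
FONTSET = [
    (0x00, 0x7F, "ascii-font"),
    (0x370, 0x3FF, "greek-font"),
    (0x400, 0x4FF, "missing-cyrillic-font"),
]

# Fonts assumed to exist on this system.
INSTALLED_FONTS = {"ascii-font", "greek-font"}

def font_for(ch):
    """Return the font for CH, or None if it cannot be displayed."""
    code = ord(ch)
    for first, last, font in FONTSET:
        if first <= code <= last:
            return font if font in INSTALLED_FONTS else None
    return None

def render(text):
    """Show an empty box for every character that has no usable font."""
    return "".join(ch if font_for(ch) else "\u25a1" for ch in text)
```

In this model a Cyrillic letter gets a box because its font is not
installed, and a Thai letter gets a box because the fontset has no
entry for it at all.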

The fontset height and width are determined by the ASCII characters
(that is, by the font in the fontset which is used for ASCII
characters).  If another font in the fontset has a different height,
or the wrong width, then characters assigned to that font are clipped,
and displayed within a box if highlight-wrong-size-font is non-nil.

*** Defining fontsets.

Emacs creates a standard fontset automatically according to the value
of standard-fontset-spec.  This fontset's name is `fontset-standard'.
Bold, italic, and bold-italic variants of the standard fontset are
created automatically.

If you specify a default ASCII font with the `Font' resource or `-fn'
argument, a fontset is generated from it.  This works by replacing the
FOUNDRY, FAMILY, ADD_STYLE, and AVERAGE_WIDTH fields of the font name
with `*', then using this to specify a fontset.

Emacs checks resources of the form Fontset-N where N is 0, 1, 2...
The resource value should have this form:
	FONTSET-NAME, [CHARSET-NAME:FONT-NAME]...
FONTSET-NAME should have the form of a standard X font name, except:
	* most fields should be just the wild card "*".
	* the CHARSET_REGISTRY field should be "fontset"
	* the CHARSET_ENCODING field can be any nickname of the fontset.
The construct CHARSET-NAME:FONT-NAME can be repeated any number
of times; each time specifies the font for one character set.
CHARSET-NAME should be the name of a character set, and
FONT-NAME should specify an actual font to use for that character set.

Each of these fontsets has an alias which is made from the
last two font name fields, CHARSET_REGISTRY and CHARSET_ENCODING.
You can refer to the fontset by that alias or by its full name.
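
To make the resource format concrete, here is a small Python sketch
that splits a value into its parts and derives the alias.  It is only
an illustration of the syntax, not the code Emacs uses; the function
name, the charset name and the second font name are all invented.

```python
def parse_fontset_spec(value):
    """Split 'FONTSET-NAME, [CHARSET-NAME:FONT-NAME]...' into its parts."""
    parts = [p.strip() for p in value.split(",")]
    fontset_name = parts[0]
    fonts = {}
    for item in parts[1:]:
        if not item:
            continue  # tolerate a trailing comma
        charset, _, font = item.partition(":")
        fonts[charset.strip()] = font.strip()
    # The alias is made from the last two fields of the fontset name.
    alias = "-".join(fontset_name.split("-")[-2:])
    return fontset_name, alias, fonts

# Illustrative value; the charset and its font are placeholders.
spec = ("-*-fixed-medium-r-normal-*-24-*-*-*-*-*-fontset-24, "
        "example-charset:-*-example-medium-r-normal-*-24-*-example-0")
name, alias, fonts = parse_fontset_spec(spec)
```

For this value the alias comes out as fontset-24, and the table of
explicitly assigned fonts has a single entry.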

For any character sets that you don't mention, Emacs tries to choose a
font by substituting into FONTSET-NAME.  For instance, with the
following resource,
	Emacs*Fontset-0: -*-fixed-medium-r-normal-*-24-*-*-*-*-*-fontset-24
the font for ASCII is generated as below:
	-*-fixed-medium-r-normal-*-24-*-ISO8859-1
Here is the substitution rule:
    Change CHARSET_REGISTRY and CHARSET_ENCODING to that of the charset
    defined in the variable x-charset-registries.  For instance, ASCII has
    the entry (ascii . "ISO8859-1") in this variable.  Then, reduce
    sequences of wild cards -*-...-*- to a single wildcard -*-.
    (This is to prevent use of auto-scaled fonts.)
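
The substitution rule can be traced with a few lines of Python.  This
is a sketch of the rule as stated above, not the Emacs
implementation; the function name is made up, and the registry string
is passed in directly instead of being looked up in
x-charset-registries.

```python
def font_name_for_charset(fontset_name, registry_encoding):
    """Apply the substitution rule to FONTSET-NAME for one charset."""
    fields = fontset_name.split("-")
    # Drop CHARSET_REGISTRY and CHARSET_ENCODING (the last two fields).
    head = fields[:-2]
    # Reduce each run of consecutive wild cards to a single one.
    collapsed = []
    for field in head:
        if field == "*" and collapsed and collapsed[-1] == "*":
            continue
        collapsed.append(field)
    # Append the registry and encoding for the charset.
    return "-".join(collapsed + [registry_encoding])

ascii_font = font_name_for_charset(
    "-*-fixed-medium-r-normal-*-24-*-*-*-*-*-fontset-24", "ISO8859-1")
```

For the resource shown above, ascii_font is the same name given in
the example: -*-fixed-medium-r-normal-*-24-*-ISO8859-1.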

The function which processes the fontset resource value to create the
fontset is called create-fontset-from-fontset-spec.  You can also call
that function explicitly to create a fontset.

With the X resource Emacs.Font, you can specify a fontset name just
like an actual font name.  But be careful not to specify a fontset
name in a wildcard resource like Emacs*Font--that would also apply the
fontset to other purposes, including menus, which cannot handle
fontsets.

*** The command M-x set-language-environment sets certain defaults
for a particular choice of language.

The effects of selecting a language environment typically include a
default input method and an order for trying to recognize various
coding systems.  It may also include a default choice of character
code for new files.

*** The command C-x RET f (set-buffer-file-coding-system)
specifies the file coding system for the current buffer.  This
specifies what sort of character code translation to do when saving
the file.  As an argument, you must specify the name of one of the
coding systems that Emacs supports.

*** If you specify a prefix argument in the commands to visit, read,
and write files, when calling them interactively, you can now specify
which coding system to use for reading or writing the file.

These commands include find-file and find-alternate-file, as well as
all of their variants; also insert-file, write-file and
append-to-file.

*** The command C-x RET t (set-terminal-coding-system) specifies
the coding system for terminal output.  If you specify a character
code for terminal output, all characters output to the terminal are
translated into that character code.

This feature is useful for certain character-only terminals built in
various countries to support the languages of those countries.

By default, output to the terminal is not translated at all.

*** The command C-x RET k (set-keyboard-coding-system) specifies
the coding system for keyboard input.

Character code translation of keyboard input is useful for terminals
with keys that send non-ASCII graphic characters--for example,
some terminals designed for ISO Latin-1 or subsets of it.

By default, keyboard input is not translated at all.

Character code translation of keyboard input is similar to using an
input method, in that both define sequences of keyboard input that
translate into single characters.  However, input methods are designed
to be convenient for interactive use, while the code translations are
designed to work with terminals.

*** The command C-x RET p (set-current-process-coding-system)
specifies the coding system for input and output to a subprocess.
This command applies to the current buffer; normally, each subprocess
has its own buffer, and thus you can use this command to specify
translation to and from a particular subprocess by giving the command
in the corresponding buffer.

By default, process input and output are not translated at all.

*** The command C-\ (toggle-input-method) activates or inactivates
an input method.  If no input method has been selected before, the
command prompts for you to specify the language and input method you
want to use.

C-u C-\ (select-input-method) lets you switch to a different input
method.  C-h C-\ describes the current input method.

*** The command C-h C (describe-current-coding-system) displays
the coding systems currently selected for various purposes, plus
related information.

*** The command C-h h (view-hello-file) displays a file called
HELLO, which has examples of text in many languages, using various
scripts.

*** The command C-h C-l (describe-language-support) displays
information about the support for a particular language.
You specify the language as an argument.

*** The mode line now starts with a letter that identifies the
coding system used in the visited file.  It is followed by a colon.

When you are using a character-only terminal (not a window
system), three additional characters appear before the colon.
They describe (respectively) the coding system for the visited
file, the coding system for keyboard input, and the coding system
for terminal output.

A dash indicates the default state of affairs: no code conversion
(except CRLF => newline if appropriate).  `=' means no conversion
whatsoever.  Nontrivial code conversions are represented by various
letters--for example, X refers to ISO Latin-1.

*** The new variable rmail-file-coding-system specifies the code
conversion to use for RMAIL files.  The default value is nil.

When you read mail with Rmail, each message is decoded automatically
into Emacs' internal format.  This has nothing to do with
rmail-file-coding-system.  That variable controls reading and writing
Rmail files themselves.

*** The new variable sendmail-coding-system specifies the code
conversion for outgoing mail.  The default value is nil.

*** The command C-h t (help-with-tutorial) accepts a prefix argument
to specify the language for the tutorial file.  Currently, English,
Japanese, Korean and Thai are supported.  We welcome additional
translations.

...

