From xemacs-m  Tue Aug  5 03:20:35 1997
Received: from turnbull.sk.tsukuba.ac.jp (root@turnbull.sk.tsukuba.ac.jp [130.158.99.4])
	by xemacs.org (8.8.5/8.8.5) with SMTP id DAA12229
	for <xemacs-beta@xemacs.org>; Tue, 5 Aug 1997 03:19:36 -0500 (CDT)
Received: from turnbull.sk.tsukuba.ac.jp(really [127.0.0.1]) by turnbull.sk.tsukuba.ac.jp
	via smtpd with esmtp
	id <m0wveb2-00006oC@turnbull.sk.tsukuba.ac.jp>
	for <xemacs-beta@xemacs.org>; Tue, 5 Aug 1997 17:04:00 +0900 (JST)
	(Smail-3.2 1996-Jul-4 #3 built 1997-Jun-24)
Message-Id: <m0wveb2-00006oC@turnbull.sk.tsukuba.ac.jp>
To: =?iso-8859-1?Q?Bj=F8rn?= Stabell <bjoern@stabell.priv.no>
cc: mule@etl.go.jp, xemacs-beta@xemacs.org, debian-i18n@lists.debian.org
Subject: Re: Frustration: Universally available input methods 
In-reply-to: Your message of "Mon, 04 Aug 1997 17:55:41 +0200."
             <199708041555.RAA01257@dindin.sima.sintef.no> 
Date: Tue, 05 Aug 1997 17:04:00 +0900
From: "Stephen J. Turnbull" <turnbull@turnbull.sk.tsukuba.ac.jp>

>>>>> "Bjoern" == Bjørn Stabell <bjoern@stabell.priv.no> writes:

    Bjoern> Hi, This e-mail is crossposted to the Mule, XEmacs-beta,
    Bjoern> and debian-i18n lists in a hope to gather the best
    Bjoern> comments from each... :)

I don't think this is appropriate for XEmacs-beta; almost all
XEmacs-related discussion will fit into Mule, except for XIM, which
bypasses most of the Mule "Library of Emacs Input Methods", and XIM
flunks your windowing-system-specific test.  I'm not going to post any
more in this thread in xemacs-beta after this message.  I guess I'll
have to join debian-i18n and mule....  My apologies to the denizens of
those lists for jumping in.

    Bjoern> Asian languages tend to require more sophisticated input
    Bjoern> methods (IM) that can interact with the user and help her
    Bjoern> select the right character, and some even consider the
    Bjoern> context in which the character will be put.

Check out the Mule support for Devanagari.  The glyphs jump around
like crazy.

    Bjoern> There are really a lot of input methods for Asian
    Bjoern> languages, and they are quite complex. ...

And so are Emacs modes.  Have you been following the discussion of
where to bind `find-function'? :-)

    Bjoern> It's rather needless and backwards that, e.g., the Linux
    Bjoern> console and X need to use two different ways of mapping

This is a completely different issue from the "input methods"
question.  As long as hardware differs, there will need to be a
"physical to logical" layer.  The Linux console cannot assume it can
read the keyboard; the console might very well be a serial port.

    Bjoern> keys.  We're never going to resolve the input method
    Bjoern> problem if each application have to invent their own way
    Bjoern> of doing it.

Be careful:  Linux and X are not "applications"....

    Bjoern> What I, and probably a lots of others, am dreaming of is
    Bjoern> this functionality:

    Bjoern> 	Be able to input and display any character, using a
    Bjoern> language specific input system, as Unicode in any text
    Bjoern> widget (terminal, editor, etc).

Current implementations of Unicode (UCS-2) can't satisfy this, as the
Asian hanzi/kanji/hanja alone number in the 100,000s.  I'm not a
native Japanese speaker and don't speak any other Asian language, but
to compress Asian languages into a 16-bit space requires "Han
unification", and even I can see that this is at best 99.44%
satisfactory.  For specialists (comparative ancient Chinese
literature, eg), it may not be good enough.  This is absurdly picky,
of course, but if the system is going to be truly "universal" it needs
to handle those needs.
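
The arithmetic behind that, as a back-of-the-envelope sketch in
Python.  The 100,000 figure is only the low end of the rough estimate
above, not a real count, and the names are made up for illustration.

```python
# Assumption: 100,000 is the low end of the "100,000s" estimate above,
# used here as a hypothetical lower bound rather than a real count.
UCS2_CODE_POINTS = 2 ** 16        # a 16-bit code space holds 65,536 values
ESTIMATED_IDEOGRAPHS = 100_000


def shortfall(needed: int, available: int = UCS2_CODE_POINTS) -> int:
    """Return how many characters cannot get a code point of their own."""
    return max(0, needed - available)


if __name__ == "__main__":
    print(UCS2_CODE_POINTS)
    print(shortfall(ESTIMATED_IDEOGRAPHS))
```

The real gap is larger than this suggests, since the same 16 bits
also have to hold every other script.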

Other kinds of "specialists" (eg, professional stenographers) may have
other needs.  Look up "TCode" (if you can't find it on the Web, Ken
Lunde's "Japanese Information Processing" or whatever its successor's
name is probably has a section on it).  It is much more efficient in
terms of keystrokes than Wnn or Quail, with a much shallower learning
curve.

    Bjoern> Does anyone know what the proper long-term solution is?
    Bjoern> Is anyone working on making input methods available in
    Bjoern> all text windows, and not just per application or per
    Bjoern> windowing system?  Is there a common movement, or at least
    Bjoern> a common movement in the non-commercial part of the world
    Bjoern> to increase the availability and consistency of input
    Bjoern> methods?

There are tendencies in that direction.  However, the relatively
recent withdrawal of Wnn from the freely available arena (with Wnn6, a 
commercial product) shows that the opposite tendency is also strong.

Also, considering that Microsoft doesn't sell a multilingual version
of Windows95, at least not here in Japan (although I guess WinNT could
be made so, since it's POSIX), I rather doubt that your dream of
"windowing system independence" is feasible.  My best guess
is that Microsoft intends to sell you one copy of Windows for every
non-English language you speak.  If they thought they could sell a
version each for Cambridge Massachusetts and Cambridge England, they
would.  (There's a rumor going around my local Linux Abusers' Group
that the difference between "NT Server" and "NT Workstation" is a two
line patch to the registry costing about US$800 at list prices....)
You're going to have to live with X/Motif, close relatives (NextStep),
and possibly MacOS.  But that's a minority of screens, I would
suppose.

What we have now (to my unreliable knowledge):

    * Mule:  provides Quail natively; with shell-mode, this is
             everything you want both under X and on a terminal,
             except that it won't fit into the BIOS ROM :->
    * XIM:   provides a standard means for an X Window to receive
             character input in an arbitrary language and character
             set.  However, there is no front-end implementation I
             know of that handles several languages.
    * Wnn:   provides a standard means to turn a stream of Wnn
             protocol (more or less a character stream) first into
             phonetic symbols if necessary, then into ideographs
             as required.
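
To make the two-stage conversion in the Wnn item concrete, here is a
minimal sketch in Python.  It is not the Wnn protocol or API; the
tables, the function names, and the greedy longest-match rule are all
assumptions chosen for illustration.

```python
# Toy tables, hypothetical and far from complete; a real conversion
# server uses large dictionaries plus learning.
ROMAJI_TO_KANA = {
    "ni": "に", "ho": "ほ", "n": "ん", "go": "ご",
    "ku": "く", "chi": "ち",
}
KANA_TO_KANJI = {
    "にほんご": ["日本語"],
    "くち": ["口"],
}


def to_kana(keys: str) -> str:
    """Stage 1: keystrokes to phonetic symbols, greedy longest match."""
    out = []
    i = 0
    while i < len(keys):
        for length in (3, 2, 1):
            chunk = keys[i:i + length]
            if chunk in ROMAJI_TO_KANA:
                out.append(ROMAJI_TO_KANA[chunk])
                i += len(chunk)
                break
        else:
            raise ValueError(f"no kana for input at position {i}: {keys[i:]!r}")
    return "".join(out)


def to_candidates(kana: str) -> list:
    """Stage 2: phonetic symbols to ideograph candidates.

    Falls back to the kana itself when the dictionary has no entry.
    """
    return KANA_TO_KANJI.get(kana, [kana])
```

Calling to_candidates(to_kana("nihongo")) walks both stages; the user
interface for choosing among several candidates is a separate matter
that this sketch leaves out.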

However, the size of Mule (no Mule without Emacs...) makes it a less
than general solution (and some misguided people just hate Emacs
anyway).  LEIM doesn't seem to have a developer's guide, and doesn't
provide access to Wnn, Canna, or XIM when available.  XIM is designed
for an X-like environment.  XIM could probably be generalized, but I
don't think that's what you want.  XIM does not provide a user
interface at all; it is a protocol that allows an input manager to
communicate with an application.  Each IM provides its own UI.  The
provisions for multilingual input are primitive and I've never seen
them implemented.  You seem to want a unified user interface, perhaps
with a lot of the supporting code shared.  Wnn provides that, but it's
specialized to Asian languages (I don't think that heavily accented
European languages are supported, for example, although I don't know),
Wnn4 does not easily provide for switching among them, and Wnn6 is
proprietary.

The fact is that user interfaces are not going to be completely
unified anyway, not for a long time.  TCode is just one extreme
example, but Japanese provides many more.  For example, my Sharp
Zaurus (a PIM) provides a menu of 8 different input methods for
Japanese, including 2 flavors of handwriting recognition, plus 4
subsidiary menus of special graphic characters, punctuation in half
and full width, and a numeric keypad.  I use 4 of the 8 daily, another
of those occasionally, and 3 of the 4 subsidiary menus daily.  Plus
the dictionaries can be used in a crude but transparent way to
keystroke English and enter Japanese into the text (with only about
50% overhead, negative overhead if you assume I'd have to look up the
word in any case :-).

As for sharing the supporting code, it's not clear how much of that
can be done.  The various user interfaces I've mentioned above are
very different; the same goes for handling different hardware
interfaces (pen-based, direct keyboard, serial line, voice
recognition, lip-reading, sign language :-P ).  Some code can be
shared for similar tasks, eg, dictionary lookup, but even then it's
not clear that the dictionary for handwriting-based input would be the
same as the one for keystroke-based input.  This might or might not
mean coding them differently; you could probably use the same
dictionary with different attributes, but that might not be the most
effective way to do the job.

The best should not be the enemy of the good, of course.  What should
be feasible is implementation of a Unicode-based set of tools for
multilingual input and output.  However, this is still going to
require some kind of localization, since the same things typed
phonetically in pinyin and romaji should be expected to produce wildly
different ideograph output (I'm sure there are a fair number of such
key sequences).
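
A small sketch of why the localization is unavoidable, in Python.  The
candidate tables are hypothetical toy data with a handful of entries
each, and the function names are invented for this example; the point
is only that the lookup has to be keyed by input method as well as by
keystrokes.

```python
# Toy candidate tables: hypothetical, tiny, for illustration only.
CANDIDATES = {
    "pinyin": {"kan": ["看"], "ma": ["马", "妈"], "men": ["门"]},
    "romaji": {"kan": ["感", "缶"], "ma": ["間", "真"], "men": ["面", "麺"]},
}


def candidates(keys: str, method: str) -> list:
    """Look up ideograph candidates for a key sequence under one method."""
    return CANDIDATES[method].get(keys, [])


def conflicting_sequences() -> list:
    """Key sequences valid in both methods whose candidates never overlap."""
    pinyin, romaji = CANDIDATES["pinyin"], CANDIDATES["romaji"]
    shared = pinyin.keys() & romaji.keys()
    return sorted(k for k in shared if set(pinyin[k]).isdisjoint(romaji[k]))
```

With these tables, candidates("kan", "pinyin") and
candidates("kan", "romaji") share no output at all, even though the
text that comes out is Unicode in both cases.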

I haven't looked at the Plan 9 stuff yet, but I think it's available
as Debian packages ("9term", etc).  This I would guess meets most of
your requirements, at least on the Unicode side.  But you gotta be
running X, I guess, and I don't think the "input manager" problem is
solved at all.

>>>>> Meishing Wang writes:

    MW> How about Java based Asian languages' input methods?

Java will help the multi-platform issue a little, but only if you're
able to work in the AWT context.  C/C++ plus curses really provide
most of the portability you need anyway at the terminal level.

A big problem is the dictionaries and the ancillary algorithms for
learning.  Many of these are proprietary, and Java won't help with
that.

>>>>> "David" == David Bakhash <cadet@MIT.EDU> writes:

    David> do you think that, as an input method, the XEmacs/strokes
    David> based method is satisfactory?  (imagining that OCR will
    David> eventually be implemented)?

Not a chance for general use, although it satisfies the universality
criterion.  If they take away your QWERTY and give you a Dvorak, are
you going to memorize Dvorak or type Latin characters using strokes?

OCR is unnecessary for this, don't you think?  Just more big
dictionaries of (abstract) strokes.  However, you will have to be
careful about it.  In Japanese, for example, a square has only three
sides (the upper and right sides of kuchi, "mouth", are joined into
one stroke).  I guess you need a dictionary for those who have
served their sentences in Japanese schools, and a different one for
those who think a square has four sides.
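
Roughly what I mean by two dictionaries, sketched in Python.  This is
not the XEmacs strokes package API; the stroke names, the dictionary
layout, and the recognize function are assumptions for illustration.

```python
# Hypothetical abstract-stroke dictionaries, one per writing convention.
SCHOOL_TAUGHT = {
    # kuchi in three strokes: left side, upper+right joined, bottom
    ("down", "right-then-down", "right"): "口",
}
FOUR_SIDED = {
    # the same character drawn as four separate sides
    ("down", "right", "down", "right"): "口",
}


def recognize(strokes, dictionaries):
    """Return the first character whose stroke sequence matches, else None."""
    key = tuple(strokes)
    for dictionary in dictionaries:
        if key in dictionary:
            return dictionary[key]
    return None
```

A four-sided square looked up in SCHOOL_TAUGHT alone finds nothing;
passing both dictionaries recognizes either way of drawing it.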

It's fun to think about, though.

Steve

-- 
                            Stephen J. Turnbull
Institute of Policy and Planning Sciences                    Yaseppochi-Gumi
University of Tsukuba                      http://turnbull.sk.tsukuba.ac.jp/
Tel: +81 (298) 53-5091;  Fax: 55-3849              turnbull@sk.tsukuba.ac.jp

