Message-ID: <3ABF5245.A80015DD@csi.com>
Date: Mon, 26 Mar 2001 09:29:25 -0500
From: John Colagioia <JColagioia@csi.com>
Organization: No Conspiracy Here...
X-Mailer: Mozilla 4.76 [en] (Win98; U)
X-Accept-Language: en,fr,ru,es,it,ga,de,ja,gd,eu
MIME-Version: 1.0
Newsgroups: rec.arts.int-fiction
Subject: Re: [glk] Latin-1 and other languages
References: <87y9u0crq0.fsf@zaraza.fep.ru> <3ab8b684.1293731@news.bright.net> <Xns906C9CC2D219Ehuftis@127.0.0.1>
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
NNTP-Posting-Host: 208.34.37.104
X-Original-NNTP-Posting-Host: 208.34.37.104
X-Trace: excalibur.gbmtech.net 985616638 208.34.37.104 (26 Mar 2001 09:23:58 EST)
Lines: 47
X-Authenticated-User: jnc
Path: news.duke.edu!newsgate.duke.edu!nntp-out.monmouth.com!newspeer.monmouth.com!news.maxwell.syr.edu!sjc-peer.news.verio.net!news.verio.net!uunet!lax.uu.net!chi.uu.net!arb.uu.net!nyc.uu.net!excalibur.gbmtech.net
Xref: news.duke.edu rec.arts.int-fiction:84593

Sergei Barbarash wrote:

> Okay, forget the technical details above, I could be right, or I could be
> wrong; some will say it's there for historical reasons, some will insist
> that this was a valid design decision. But notice: the same thing happens
> with Glk, which is relatively young, and which is meant to be a
> write-once-run-everywhere kind of solution. Indeed, one can easily violate
> the official spec and write his own implementation as he pleases. What
> could be done to make the _official_ Glk to become more i18n-friendly? This
> I don't know.
> So, with the wonderful IF archive at my disposal, and having some coding
> skills, I can easily create an appropriate set of tools for nearly any
> similar task. I'm grateful for that, I'm doing that, and I'm enjoying this
> immensely. But, what could be done to collaborate with IF community
> efforts, and not to create a completely independent branch of development?
>
> Any suggestions are very much welcome.
>

Mind you, I have a personal hatred for temporary solutions (because if
they're JUST useful enough, people use them, and then they have to be
supported forever and ever and ever and...).  Still, it occurs to me
that a potential temporary (or even permanent, really, depending on how
authors feel) solution might be to take the HTML approach to Unicode.
That is, have the compiler convert any "non-Z" characters into the
&#somenumber; format (a numeric character reference), which intelligent
interpreters can optionally render in the appropriate way.

The good news is that the bulk of the existing work (Latinate-language
games) remains unchanged.  The better news is that a mostly Latinate
game with a few non-Latin characters is only slightly larger.

The bad news, of course, is that a game in Chinese (or Korean, or
Arabic, or Hebrew, or...well, you get the idea) is going to
be...uhm...let's see...somewhere in the neighborhood of seven times
larger, in terms of text storage, since each character turns into an
escape sequence of seven or eight ASCII characters.  I'd imagine that
most Eastern
European languages (and the "Cyrillic families") would similarly
average about two to three times larger, since their alphabets are very
similar to the Latin.  However, that might actually be acceptable in
this day and age.  I don't know.

I can tell you, though, that it was an acceptable approach on a project
I worked on for a Korean customer, and that I used a simple, 20-line C
program to convert from Unicode to HTML (which I'll gladly donate to
anybody who cares and doesn't want to write it themselves).
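
For illustration, here is a minimal sketch of that kind of conversion.
It is not the program mentioned above, just one way the idea could look.
It assumes the input is UTF-8 (the original may well have read some other
Unicode encoding), it does not validate malformed sequences, and the
names decode_utf8 and utf8_to_ncr are made up for this example.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Decode one UTF-8 sequence starting at s, store the code point in *cp,
 * and return the number of bytes consumed.  Assumption: input is UTF-8.
 * Malformed input is not rejected; a stray byte is passed through as
 * its own value to keep the sketch short. */
static size_t decode_utf8(const unsigned char *s, unsigned long *cp)
{
    size_t extra, i;

    if (s[0] < 0x80)                { *cp = s[0];        extra = 0; }
    else if ((s[0] & 0xE0) == 0xC0) { *cp = s[0] & 0x1F; extra = 1; }
    else if ((s[0] & 0xF0) == 0xE0) { *cp = s[0] & 0x0F; extra = 2; }
    else if ((s[0] & 0xF8) == 0xF0) { *cp = s[0] & 0x07; extra = 3; }
    else                            { *cp = s[0];        extra = 0; }

    for (i = 1; i <= extra; i++) {
        if ((s[i] & 0xC0) != 0x80)  /* truncated sequence: stop early */
            return i;
        *cp = (*cp << 6) | (s[i] & 0x3F);
    }
    return extra + 1;
}

/* Convert the NUL-terminated UTF-8 string `in` to 7-bit ASCII in `out`,
 * writing every code point above 127 as a decimal reference "&#N;".
 * Returns the length written, or (size_t)-1 if `out` is too small. */
size_t utf8_to_ncr(const char *in, char *out, size_t outsz)
{
    const unsigned char *s = (const unsigned char *)in;
    size_t used = 0;

    assert(in != NULL && out != NULL);
    if (outsz == 0)
        return (size_t)-1;

    while (*s) {
        unsigned long cp;
        char buf[16];               /* "&#2097151;" plus NUL fits easily */
        size_t n;

        s += decode_utf8(s, &cp);
        if (cp < 0x80) {
            buf[0] = (char)cp;      /* plain ASCII is copied unchanged */
            buf[1] = '\0';
        } else {
            sprintf(buf, "&#%lu;", cp);
        }
        n = strlen(buf);
        if (used + n + 1 > outsz)   /* keep room for the terminator */
            return (size_t)-1;
        memcpy(out + used, buf, n);
        used += n;
    }
    out[used] = '\0';
    return used;
}
```

Called on the UTF-8 bytes for "caf" followed by U+00E9, this writes
caf&#233; to the output buffer.  A single Hangul syllable such as U+D55C
comes out as the eight characters &#54620;, which is where the size
estimate for Korean text comes from.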


