Newsgroups: rec.arts.int-fiction
Path: news.duke.edu!newsgate.duke.edu!nntp-out.monmouth.com!newspeer.monmouth.com!news.maxwell.syr.edu!news.he.net!pulsar.dimensional.com!dimensional.com!coop.net!world!buzzard
From: buzzard@world.std.com (Sean T Barrett)
Subject: Re: XML Archive index
Message-ID: <GIL12F.Awx@world.std.com>
Date: Fri, 24 Aug 2001 17:11:03 GMT
References: <9m3l0r$gqs$1@news.panix.com> <1f688a41.0108231842.7e5df8f1@posting.google.com> <9m5qc3$c6a$2@news.panix.com>
Organization: The World Public Access UNIX, Brookline, MA
Lines: 35
Xref: news.duke.edu rec.arts.int-fiction:91570

Andrew Plotkin  <erkyrath@eblong.com> wrote:
>> An Invalid character was found in text content. Line 6646, Position 35
>
>Oh, non-ASCII characters. Hm. Okay, I'll work on that.
>
>Does XML use the same set of named entities (&eacute; and so on) as
>HTML does? Grn.

I imagine the answer is somewhere within the 170K
XML specification: http://www.w3.org/TR/2000/REC-xml-20001006
but I don't see it. Section 2.2, "Characters", says it uses
the character set ISO/IEC 10646-1:2000, but does not provide
links to any online info on it; however, the description of "Char"
seems to imply that XML allows any Unicode character in valid
input. Section 2.4 defines "text" as accepting essentially
any input character--but perhaps "text content" above means a
different sense of "text"?

Hopefully someone who actually understands XML can explain.

As to your final question, the predefined entities section
only lists five named entities: lt, gt, amp, apos, quot
so I think you'll have to go numeric (e.g. &#233; for e-acute).
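
For what it's worth, "going numeric" could look something like the
sketch below. It is in Python, the function name to_xml_ascii is made
up for illustration, and it assumes the index is plain text content
(not attribute values, which would also need quote escaping).

```python
from xml.sax.saxutils import escape

def to_xml_ascii(text):
    # Hypothetical helper, not part of any existing tool.
    # Escape &, < and > first, then replace every non-ASCII character
    # with a decimal numeric character reference such as &#233;.
    escaped = escape(text)
    return escaped.encode("ascii", "xmlcharrefreplace").decode("ascii")

print(to_xml_ascii("caf\u00e9 & cr\u00e8me <brul\u00e9e>"))
# prints: caf&#233; &amp; cr&#232;me &lt;brul&#233;e&gt;
```

The output is pure ASCII, so it should sidestep any argument about
which encoding the parser assumes.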

I will take the opportunity to reiterate: I think XML is way
too heavyweight and way too non-text-friendly for most applications.
One could write a non-SGML-compatible specification for an XML-like
language which uses <> tags for structure, only requires "lt,gt,amp"
escaping, omits DTDs and the entity macro stuff, and it would probably
be about 20 lines long, 1 KB, and get you 90% of the power of XML--and
99.9% of the power used by applications like HTML (yay for design
by committee).
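
To give a feel for how small such a thing can be, here is a rough
Python sketch of a parser for a language like that. It is only an
illustration under my own assumptions (no attributes, no comments,
tag names limited to letters, digits, "_" and "-"), and it is not the
MiniML mentioned in the signature; the names parse and unescape are
invented here.

```python
import re

# One token is either <name>, </name>, or a run of text without "<".
TOKEN = re.compile(r"<(/?)([A-Za-z][A-Za-z0-9_-]*)>|([^<]+)")

def unescape(s):
    # Only three escapes exist. &amp; is handled last so that
    # "&amp;lt;" decodes to the literal text "&lt;".
    return s.replace("&lt;", "<").replace("&gt;", ">").replace("&amp;", "&")

def parse(src):
    # Each node is (tag, children); children are nodes or text strings.
    root = ("", [])
    stack = [root]
    pos = 0
    while pos < len(src):
        m = TOKEN.match(src, pos)
        if m is None:
            raise ValueError("bad markup at offset %d" % pos)
        closing, tag, text = m.groups()
        if text is not None:
            stack[-1][1].append(unescape(text))
        elif closing:
            # A close tag must match the innermost open tag.
            if len(stack) == 1 or stack[-1][0] != tag:
                raise ValueError("unexpected </%s>" % tag)
            stack.pop()
        else:
            node = (tag, [])
            stack[-1][1].append(node)
            stack.append(node)
        pos = m.end()
    if len(stack) != 1:
        raise ValueError("unclosed <%s>" % stack[-1][0])
    return root[1]

print(parse("<a>x &lt; y<b>z</b></a>"))
# prints: [('a', ['x < y', ('b', ['z'])])]
```

This version is lenient about a bare ">" in text, which a stricter
spec would presumably reject.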

SeanB
The one I wrote at my last job was called "MiniML"
