Newsgroups: rec.arts.int-fiction
Path: nntp.gmd.de!news.ruhr-uni-bochum.de!news-koe1.dfn.de!main.Germany.EU.net!Germany.EU.net!EU.net!howland.erols.net!feed1.news.erols.com!cwix!uunet!in2.uu.net!192.35.48.11!hearst.acc.Virginia.EDU!murdoch!not-for-mail
From: nr@mamba.cs.Virginia.EDU (Norman Ramsey)
Subject: Re: Questionable Languages & Compilers
X-Nntp-Posting-Host: mamba-fo.cs.virginia.edu
Message-ID: <59uj4v$i2@mamba.cs.Virginia.EDU>
Sender: usenet@murdoch.acc.Virginia.EDU
Organization: University of Virginia
References: <32b72f97@beachyhd.demon.co.uk> <59h6sq$6nc@life.ai.mit.edu> <ant221443d07M+4%@gnelson.demon.co.uk> <59k5a6$597@life.ai.mit.edu>
Date: Thu, 26 Dec 1996 19:19:59 GMT
Lines: 49

In article <59k5a6$597@life.ai.mit.edu>, David Baggett <dmb@ai.mit.edu> wrote:
>In article <ant221443d07M+4%@gnelson.demon.co.uk>,
>Graham Nelson  <graham@gnelson.demon.co.uk> wrote:
>
>>I wonder if you appreciate how poorly the parsers produced by
>>tools such as "lex" and "yacc" perform?  They generally work
>>(provided the language design makes compromises so as to be
>>expressible to "lex" and "yacc" -- which would worsen Inform
>>in my view) but impose speed penalties of up to a factor of 2.
>
>The idea that it's reasonable to use a tremendously error-prone method to
>solve a problem for which trivial tools exists, simply to save time on
>lexical analysis -- a small proportion of the entire compilation task -- is
>totally misguided, in my opinion.
>
>Suppose that the best machine-generated lexical analyzer *were* twice as
>slow as your hand-coded one.  How much longer should it take to compile
>code?  Fractionally longer, unless your compiler's doing basically no work
>to generate and optimize code.

Actually, measurements show that in many compilers, lexical analysis is
the single most time-consuming phase, primarily because it is the only
one in which each input character is touched.  In a compiler like
Inform, in which global optimization is (probably) unnecessary, there
is little static semantic checking, and the machine target is
essentially quads, I find it unsurprising that the cost of lexical
analysis is significant.
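To see why every character gets touched, here's a sketch of a typical
hand-coded scanner loop (my own toy names, not anybody's production
code); note that the inner loops advance one character at a time, so
the scanner's cost is proportional to the size of the input, not the
number of tokens:

```c
#include <ctype.h>
#include <assert.h>

/* Minimal hand-coded scanner sketch: every input character is
   examined exactly once, which is why lexing dominates when the
   later phases do little work per token. */
enum tok { T_EOF, T_ID, T_NUM, T_PUNCT };

struct lexer { const char *p; };

static enum tok next_token(struct lexer *lx)
{
    while (*lx->p == ' ' || *lx->p == '\t' || *lx->p == '\n')
        lx->p++;                      /* skip whitespace */
    if (*lx->p == '\0')
        return T_EOF;
    if (isalpha((unsigned char)*lx->p)) {
        while (isalnum((unsigned char)*lx->p))
            lx->p++;                  /* consume identifier */
        return T_ID;
    }
    if (isdigit((unsigned char)*lx->p)) {
        while (isdigit((unsigned char)*lx->p))
            lx->p++;                  /* consume number */
        return T_NUM;
    }
    lx->p++;                          /* single-character punctuation */
    return T_PUNCT;
}
```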

Hanson and Fraser's lcc compiler is twice as fast as gcc in part
because of clever lexical analysis.  People writing new code should
look into the re2c tool (described in LOPLAS around 1994, I think),
which embodies the work of a very clever person who figured out how to
apply Hanson and Fraser's tricks automatically.  I've got to look up
the name, because I keep recommending the work, and the people involved
should get credit.  Waite's paper in SP&E entitled `The Cost of
Lexical Analysis' is also a good read, and I think there are a couple
of SP&E papers on parsing as well.
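One trick of roughly that flavor (my own sketch, not lifted from lcc
or re2c) is to replace per-character library calls with a single
character-class table built once at startup, so the scanner's inner
loop is one array index and compare per character:

```c
#include <assert.h>

/* Table-driven character classification: one lookup per character
   in the scanner's inner loop instead of a function call. */
enum { C_OTHER, C_ALPHA, C_DIGIT, C_SPACE };
static unsigned char cclass[256];

static void init_cclass(void)
{
    int c;
    for (c = 'a'; c <= 'z'; c++) cclass[c] = C_ALPHA;
    for (c = 'A'; c <= 'Z'; c++) cclass[c] = C_ALPHA;
    for (c = '0'; c <= '9'; c++) cclass[c] = C_DIGIT;
    cclass[' '] = cclass['\t'] = cclass['\n'] = C_SPACE;
}

/* Scan one identifier starting at s; return its length.
   The loop stops at '\0' because cclass[0] is C_OTHER. */
static int scan_id(const char *s)
{
    const char *p = s;
    while (cclass[(unsigned char)*p] == C_ALPHA ||
           cclass[(unsigned char)*p] == C_DIGIT)
        p++;
    return (int)(p - s);
}
```

A tool like re2c goes further, compiling the whole token grammar into
a direct-coded automaton, but the table idea above is the part you can
apply by hand in an afternoon.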

It's my opinion that *compilation* speed (as opposed to speed of the
generated code) doesn't get enough attention from the
programming-language community.  

N

P.S. LOPLAS --> ACM Letters on Programming Languages and Systems
     SP&E --> Software---Practice & Experience
-- 
Norman Ramsey
http://www.cs.virginia.edu/~nr
