This is an overview of the bootstrap process for Perl6 rules (aka
regular expressions).


[In the beginning...]

* PCRE - "Perl Compatible Regular Expressions"

Provided initial pugs regex support.  Currently used when you specify
:perl5.  Fast.  Available on all platforms.  Limited to a subset of perl5
syntax and semantics.  No support for perl6 style.  Mature.  Some minor
bugs.  Significant additional development is not expected.

[Then parrot was linked to pugs...]

* PGE

Intended to become the primary engine on parrot-based perl6.  Requires a
parrot, internal or external.  Currently used when you don't specify
:perl5.  Currently supports a limited subset of perl6 regexes.  PGE
consists of a PIR (parrot assembly code) parser (which builds a tree), and
an emitter (which crawls the tree and generates a PIR implementation of the
matcher).  The parser is a throwaway, intended to be replaced by a
perl6-based parser.  The parser is currently a bottleneck. [Is primary
development taking place in parrot or pugs?]  [FIXME - this section should
be checked by someone familiar with PGE.]
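
The parser/emitter split described above can be sketched roughly as
follows.  This is not PGE code: PGE emits PIR, while this sketch emits
Python source text, and every name in it (parse, emit, lit, seq, alt,
star, compile_rule, matches) is hypothetical.  It only handles single
character literals, '|' and postfix '*'.

```python
# Hypothetical two-stage pipeline: parse() builds a tree, emit() crawls the
# tree and produces source text for a matcher. None of these names are PGE's.

def parse(src):
    """Parse literals, '|' alternation and postfix '*' into a tuple tree."""
    pos = 0

    def sequence():
        nonlocal pos
        items = []
        while pos < len(src) and src[pos] != "|":
            node = ("lit", src[pos])
            pos += 1
            if pos < len(src) and src[pos] == "*":
                pos += 1
                node = ("star", node)
            items.append(node)
        return ("seq", items)

    def alternation():
        nonlocal pos
        branches = [sequence()]
        while pos < len(src) and src[pos] == "|":
            pos += 1
            branches.append(sequence())
        return branches[0] if len(branches) == 1 else ("alt", branches)

    return alternation()

def emit(node):
    """Walk the tree and generate matcher source as a Python expression."""
    kind, arg = node
    if kind == "lit":
        return f"lit({arg!r})"
    if kind == "star":
        return f"star({emit(arg)})"
    return f"{kind}([{', '.join(emit(n) for n in arg)}])"

# Runtime support the emitted source calls into. Each matcher is a generator
# of possible end positions, which gives backtracking for free.
def lit(ch):
    def m(s, i):
        if s.startswith(ch, i):
            yield i + len(ch)
    return m

def seq(parts):
    def m(s, i, k=0):
        if k == len(parts):
            yield i
        else:
            for j in parts[k](s, i):
                yield from m(s, j, k + 1)
    return m

def alt(parts):
    def m(s, i):
        for p in parts:
            yield from p(s, i)
    return m

def star(p):
    def m(s, i):
        for j in p(s, i):
            if j > i:              # guard against zero-width loops
                yield from m(s, j)
        yield i                    # finally, try matching nothing more
    return m

def compile_rule(src):
    """Return (emitted source, callable matcher) for a rule string."""
    code = emit(parse(src))
    env = {"lit": lit, "seq": seq, "alt": alt, "star": star}
    return code, eval(code, env)

def matches(matcher, s):
    """True if some backtracking path consumes the whole string."""
    return any(end == len(s) for end in matcher(s, 0))
```

Because the emitter only ever sees the tree, the parser can be swapped out
without touching code generation, which is the property the replacement
parsers below rely on.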

* P6CRE

Intended to serve as a temporary perl6 regexp parser to permit PGE codegen
to zip ahead.  Translates perl6 regexps into a combination of PCRE perl5
regexps and capture "repackers" which create a perl6-style match tree.  A
match tree can serve as a parse tree.  The match tree for a perl6-rules
specification of perl6-rules can be walked, creating a P6CRE rx which can
generate the same match tree.  Limitations: NOT CHECKED IN; unfinished; glacial
(especially rx generation); exposes PCRE semantics (no subrule
left-recursion, odd constraints on right-recursion, etc); full perl6 regex
support is not intended (no embedded code, ...).
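
As a rough illustration of the translate-and-repack idea (not the P6CRE
code, which is not checked in), here is a minimal sketch using Python's re
module in place of PCRE.  The rule table, the '__' naming scheme, and the
functions translate and repack are all assumptions made for the example.
It assumes subrule names contain no '__' and that no subrule is called
twice from the same rule.

```python
import re

# Hypothetical rule table. '<name>' calls a subrule; all other text is
# assumed to already be valid in the target regex syntax.
RULES = {
    "pair":  r"<key>=<value>",
    "key":   r"[a-z]+",
    "value": r"<int>|<word>",
    "int":   r"\d+",
    "word":  r"[a-z]+",
}

def translate(name, path=""):
    """Expand subrule calls inline, wrapping each in a uniquely named group."""
    full = f"{path}__{name}" if path else name
    body = re.sub(r"<(\w+)>",
                  lambda m: translate(m.group(1), full),
                  RULES[name])
    return f"(?P<{full}>{body})"

def repack(match):
    """Rebuild a nested match tree from the flat named captures."""
    tree = {}
    # Sorting puts every parent name before its children ("a" < "a__b").
    for full, text in sorted(match.groupdict().items()):
        if text is None:           # group on an alternative that did not match
            continue
        parts = full.split("__")
        level = tree
        for part in parts[:-1]:
            level = level[part]["sub"]
        level[parts[-1]] = {"text": text, "sub": {}}
    return tree

pattern = translate("pair")
tree = repack(re.fullmatch(pattern, "x=42"))
```

Inline expansion like this never terminates on a recursive subrule, so a
scheme of this shape inherits whatever recursion support the underlying
engine offers rather than providing its own.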

[Then perl5 was linked to pugs...]

* Regexp::Parser

Also intended to serve as a temporary perl6 regexp parser to permit PGE
codegen to zip ahead.  And ... probably more?  It's a version of CPAN's
Regexp::Parser updated to understand perl6 syntax.  Written in perl5.
Generates a perl5 object-based parse tree.  Limitations: NOT CHECKED IN; not
quite finished(?); parser only.  [FIXME - this section should be reviewed
by someone familiar with Regexp::Parser.]

* Unnamed perl5-based engine.

With perl5 now linkable, with callbacks (which PGE and PCRE currently
lack), we could use the perl5 regex engine with P6CRE-like regexp
transliteration and capture repacking.  This could provide fast perl6
regexps, with good backtracking, and callbacks (external aliases and
partial embedded code support).  Depending on the current state of PGE,
this may or may not help folk work on large grammars (perl6, ruby, etc).
The code generator should probably be written in perl5 for speed.  Either
Regexp::Parser or P6CRE might be used to parse.

Risks: may be obsoleted by PGE - with parsing unstuck, PGE may begin to
develop rapidly, providing basic regexes (risk high), and parrot codegen for
pugs may happen rapidly, providing callbacks (risk low?high?; alternate
mechanism?); platform availability of perl5 linkage (???); friction in the
linkage (???); complexity (most of P6CRE development time was spent
fighting pugs and perl6, rather than the domain, but still, P6CRE is
unfinished, and the domain has lots of details to look after); may be
obsoleted by Parsec combinators; depends on experimental regexp features
which are apparently... unevenly supported across perl5 versions.

Limitations: paper airplane - just an idea.

A possibility: do a very rough initial version, and *check it in* (this
being written by the P6CRE author, who hasn't yet ;).  Then depending on how
easy it is, and how much PGE is or isn't a bottleneck, we can flesh it out.

* Unnamed - compiling rules into Parsec combinators

See hw2005.txt.
