^TI^'
^TITW^regexp 
^TIS^regExpr string mainVar subVar1 subVar2...
^TI^' Command
^P^
The 
^B^regexp
 command is used to perform 
^I^regular expression pattern matching
 on strings. The 
^S^regExpr
 argument is a string which is treated as a 
^I^regular expression
, meaning that certain characters within it have special meanings. The 
exact rules for regular expressions are discussed later. The 
^S^string
 argument is the string on which pattern matching is being performed--
no characters in 
^S^string
 have special meanings. If present, the 
^S^mainVar
, 
^S^subVar1
, etc. arguments are the names of variables which 
^B^regexp
 will set to indicate which parts of 
^S^string
 matched certain portions of 
^S^regExpr
.  
^B^regexp
 attempts to find some substring of 
^S^string
 which matches 
^S^regExpr
; if it finds such a substring, the match succeeds, and 
^B^regexp
 returns 1; otherwise, the match fails, and 
^B^regexp
 returns 0.
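^P^
As a rough illustration of the return value and the optional variables, 
the sketch below assumes a Tcl interpreter and uses made-up names (line, 
whole, key and value are not part of the command). The parentheses that 
fill key and value are described under Regular Expressions; the pattern 
is enclosed in braces only so that Tcl passes it to the command unchanged.

```tcl
# Hypothetical input string; any "key=value" text would do.
set line "color=blue"

# regexp returns 1 on a match and 0 otherwise, so it can drive an if.
if {[regexp {(.*)=(.*)} $line whole key value]} {
    puts "matched '$whole': key is '$key', value is '$value'"
} else {
    puts "no match"
}
```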
^P^
^TI^Regular Expressions
^P^
Regular expressions are strings which may be used to test for, and extract 
information from, other strings of a certain format. The simplest regular 
expressions are just strings of "ordinary" characters, which only 
match themselves. For example, the command:
^P^
^T^
^TW^regexp "aabc" $test
^P^
will return true only if the variable 
^B^test
 is set to a string 
^I^containing
 "
^B^aabc
"; otherwise, it will return false.
^P^
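As a small illustration of the command above, the following sketch sets 
the variable test to two made-up values and prints the result of each 
match. The values are assumptions chosen for the example.

```tcl
# The pattern occurs in the middle of the string, so the match succeeds.
set test "xxaabcyy"
set first [regexp "aabc" $test]
puts $first

# The string stops one character short of the pattern, so the match fails.
set test "aab"
set second [regexp "aabc" $test]
puts $second
```
^P^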
The power of regular expressions derives from the use of 
^I^metacharacters
, which are characters that have special meaning, and are used to 
build 
^I^patterns
 from simpler regular expressions. The simplest example is the metacharacter 
^B^*
, which is used to match 0 or more occurrences of the immediately preceding 
regular expression. We describe this concisely by saying that 
^I^R
^B^*
 will match any number of occurrences of the regular expression 
^I^R
.
^P^
The following metacharacters may be used:
^P^
^B^.
 : The dot metacharacter matches a single occurrence of any character.
^P^
^I^R
^B^*
 : The asterisk metacharacter matches 0 or more occurrences of the regular expression 
^I^R
.
^P^
^I^R
^B^+
 : The plus metacharacter matches 1 or more occurrences of the regular expression 
^I^R
.
^P^
^I^R
^B^?
 : The question mark matches 0 or 1 occurrences of the regular expression 
^I^R
.
^P^
^B^^
 : The "hat" metacharacter, if it occurs as the 
^I^first
 match character in 
^S^regExpr
, matches the beginning of 
^S^string
. That is, a match may 
^I^not
 skip any leading part of 
^S^string
. If it doesn't occur as the first match character of the entire pattern,
 then it is treated as an ordinary character, i.e. it will only match with 
another hat character.
^P^
^B^$
 : The dollar metacharacter is similar to 
^B^^
, except that it has a special meaning only if it is the 
^I^last
 match character in 
^S^regExpr
, in which case it matches the 
^I^end
 of 
^S^string
.
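^P^
The metacharacters described so far can be tried out with a few one-line 
commands. This is only a sketch with made-up patterns and strings; each 
pattern is written in braces so that Tcl does not substitute the dollar 
sign before the command sees it.

```tcl
# Each command prints 1 (match) or 0 (no match).
puts [regexp {ab*c} "ac"]      ;# 1: b* accepts zero b's
puts [regexp {ab+c} "ac"]      ;# 0: b+ needs at least one b
puts [regexp {ab?c} "abbc"]    ;# 0: b? accepts at most one b
puts [regexp {a.c} "abc"]      ;# 1: the dot matches the b
puts [regexp {^bc} "abc"]      ;# 0: the match would have to start at the a
puts [regexp {bc$} "abc"]      ;# 1: the match ends at the last character
```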
^P^
^B^\
^I^C
 : The backslash metacharacter can be used to force any character 
^I^C
 to be treated as an "ordinary" character. For example, the regular 
expression "
^B^\*
" matches an occurrence of an asterisk character, since the backslash takes 
away the asterisk's "specialness". However, the regular expression "
^B^\\*
" matches any number of backslashes--since the first backslash takes away 
the second backslash's specialness, meaning it has no effect on the 
asterisk.
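^P^
Tcl also gives the backslash a meaning of its own inside double quotes, so 
the sketch below writes the patterns in braces to hand them to the command 
unchanged. The strings and the variable names s, whole and slashes are 
made up for the example, and the parentheses used to capture the 
backslashes are described further on.

```tcl
# Braces keep Tcl from interpreting the backslashes in the pattern.
puts [regexp {a\*b} "a*b"]     ;# 1: the escaped asterisk matches a literal asterisk
puts [regexp {a\*b} "aab"]     ;# 0: the asterisk no longer means repetition

# In double quotes Tcl turns each pair of backslashes into one,
# so s holds the letter a, two backslashes, and the letter b.
set s "a\\\\b"
regexp {a(\\*)b} $s whole slashes
puts [string length $slashes]  ;# 2: both backslashes were matched
```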
^P^
^B^[
^I^chars
^B^]
 : The bracket metacharacters define a 
^I^range
; a range will match any single character which is contained in the brackets. 
For example, the regular expression 
^B^[xyz]
 will match either an "x", or a "y", or a "z". The dash character may be used 
within a range to denote an entire subsequence of characters, i.e. the 
regular expression 
^B^[a-z]
 will match any single lowercase letter. If a "^" is the first character 
of a range, then it means match anything 
^I^not
 contained in the brackets; for example, 
^B^[^0-9]
 will match any character that is 
^I^not
 a digit. To include a "]" in a range, make it the first character (after 
a possible "^"); 
^B^[^])}]
 matches anything that isn't a close brace/bracket/parenthesis. To include 
a "-" in a range, make it the last (or first) character; 
^B^[+*/-]
 will match any of the four standard arithmetic operators.
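^P^
A few ranges in use, as a sketch with made-up strings. Square brackets 
normally start a command substitution in Tcl, so each pattern is written 
in braces to reach the command unchanged.

```tcl
# Each command prints 1 (match) or 0 (no match).
puts [regexp {[xyz]} "say"]      ;# 1: the string contains a y
puts [regexp {[a-z]} "1234"]     ;# 0: there is no lowercase letter
puts [regexp {[^0-9]} "1234"]    ;# 0: every character is a digit
puts [regexp {[^0-9]} "12a4"]    ;# 1: the a is not a digit
puts [regexp {[+*/-]} "3 - 4"]   ;# 1: the dash at the end of the range is literal
puts [regexp {[+*/-]} "3 x 4"]   ;# 0: no arithmetic operator is present
```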
^P^
^B^(
^I^R
^B^)
 : The parentheses metacharacters do not have any effect on whether a 
match succeeds or not. However, if a match is successful, then 
^B^regexp
 will remember which parts of 
^S^string
 matched parenthesized sections of 
^S^regExpr
. It works like this: every matching pair of parentheses in 
^S^regExpr
 is assigned a number 
^I^n
, where the left parenthesis of the pair is the 
^I^nth
 parenthesis from the start of 
^S^regExpr
, counting the first left parenthesis as 1. Whenever 
^B^regexp
 finds a match, then that portion of 
^S^string
 which matched the part of 
^S^regExpr
 contained in the 
^I^n
th pair of parentheses is stored in the variable named by 
^S^subVarn
. 
^I^n
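 may be any number from 1 to 9. The portion of 
^S^string
 which matched the whole of 
^S^regExpr
 is stored in the variable named by 
^S^mainVar
.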
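^P^
The numbering of parentheses can be seen in the following sketch. The 
input strings and all variable names are made up for the example, and the 
patterns are in braces so that Tcl leaves the square brackets alone.

```tcl
# Hypothetical input: a date written as year-month-day.
set date "1994-07-15"

# The three left parentheses are numbered 1, 2, 3 from the left,
# so they fill year, month and day in that order.
regexp {([0-9]+)-([0-9]+)-([0-9]+)} $date whole year month day
puts "whole=$whole year=$year month=$month day=$day"

# With nested pairs the position of the left parenthesis still decides:
# the outer pair is number 1, the two inner pairs are 2 and 3.
regexp {((a+)(b+))c} "xaabbbc" all outer as bs
puts "all=$all outer=$outer as=$as bs=$bs"
```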
^P^
^I^R1
^B^|
^I^R2
^B^|
^I^R3
^B^|
... : The alternation metacharacter "
^B^|
" creates a regular expression which matches a string if any of 
^I^R1
, 
^I^R2
, 
^I^R3
, etc., match the string. For example, the regular expression 
^B^[a-z]+|[A-Z]+
 matches nonempty strings which are either all 
lowercase or all uppercase. Alternation has lower precedence than 
concatenation, so you may have to use parentheses to group the 
alternatives (even if you don't want to remember that substring).
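^P^
The effect of grouping can be checked with a sketch like the following, 
which uses made-up strings and adds the hat and dollar metacharacters to 
the example pattern. Without the parentheses the hat belongs only to the 
first alternative and the dollar only to the second.

```tcl
# Grouped: the whole string must be all lowercase or all uppercase.
puts [regexp {^([a-z]+|[A-Z]+)$} "hello"]        ;# 1
puts [regexp {^([a-z]+|[A-Z]+)$} "WORLD"]        ;# 1
puts [regexp {^([a-z]+|[A-Z]+)$} "helloWORLD"]   ;# 0: mixed case

# Ungrouped: "starts with lowercase letters" OR "ends with uppercase letters".
puts [regexp {^[a-z]+|[A-Z]+$} "helloWORLD"]     ;# 1
```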
^P^
^I^R1R2...
 : Concatenation can be used for complex regular expressions just as for 
simple characters, to form a new regular expression which matches strings 
where the first part of the string matches 
^I^R1
, the second part of the string matches 
^I^R2
, and so on. For example, the regular expression 
^B^[a-z][0-9]+
 will match any string which contains a single lowercase letter immediately followed by 1 or more digits.
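^P^
Because the command looks for a matching substring, a concatenated pattern 
can pick a piece out of a longer string. In this sketch the input string 
and the variable names text and found are made up for the example.

```tcl
# Hypothetical input with other text around the interesting part.
set text "id: x42;"

# A lowercase letter followed directly by digits; found receives the match.
if {[regexp {[a-z][0-9]+} $text found]} {
    puts "found '$found'"
}
```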
^P^
^TI^Choosing Among Alternative Matches
^P^
In general there may be more than one way to match a regular 
expression to an input string. For example, consider the command
^P^
^T^
^TW^regexp (a*)b* aabaaabb x y
^P^
Considering only the rules given so far, 
^B^x
 and 
^B^y
 could end up with the values 
^B^aabb
 and 
^B^aa
, 
^B^aaab
 and 
^B^aaa
, 
^B^ab
 and 
^B^a
, or any of several other combinations. To resolve this potential 
ambiguity, 
^B^regexp
 chooses among alternatives using the rule "first then longest". 
In other words, it considers the possible matches in order working from 
left to right across the input string and the pattern, and it attempts 
to match longer pieces of the input string before shorter 
ones. More specifically, the following rules apply in 
decreasing order of priority:
^P^
[1] If a regular expression could match two different 
parts of an input string then it will match the one 
that begins earliest.
^P^
[2] If a regular expression contains 
^B^|
 operators then 
the leftmost matching sub-expression is chosen.
^P^
[3] In 
^B^*
, 
^B^+
, and 
^B^?
 constructs, longer matches are chosen in preference to shorter ones.
^P^
[4] In sequences of expression components the components are considered 
from left to right.
^P^
In the example from above, 
^B^(a*)b*
 matches 
^B^aab
: the 
^B^(a*)
 portion of the pattern is matched first and it consumes 
the leading 
^B^aa
; then the 
^B^b*
 portion of the pattern consumes the next 
^B^b
. Or, consider the following example:
^P^
^T^
^TW^regexp (ab|a)(b*)c abc x y z
^P^
After this command 
^B^x
 will be 
^B^abc
, 
^B^y
 will be 
^B^ab
, and 
^B^z
 will be an empty string. Rule 4 specifies that 
^B^(ab|a)
 gets first shot at the input string and Rule 2 specifies that 
the 
^B^ab
 sub-expression is checked before the 
^B^a
 sub-expression. Thus the 
^B^b
 has already been claimed before the 
^B^(b*)
 component is checked and 
^B^(b*)
 must match an empty string.
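^P^
Both commands from this section can be run as they stand to see which 
match is chosen. The puts lines are added here only to display the 
variables; the quotes make the empty string visible.

```tcl
# Rule 1 starts the match at the first character; rule 3 lets (a*) take both a's.
regexp (a*)b* aabaaabb x y
puts "x='$x' y='$y'"

# Rule 2 tries ab before a, so nothing is left for (b*) to match.
regexp (ab|a)(b*)c abc x y z
puts "x='$x' y='$y' z='$z'"
```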







