@part[TEXT, Root "TMAN.MSS"]    @Comment{-*-System:TMAN-*-}
@chap[Characters and strings]


@tau[] has a special data type for representing @i[characters].
Characters are objects which may be stored in strings and communicated
between the @tau[] system and external media such as files and
terminals.  Most characters represent printed graphics
such as letters, digits, and punctuation.

@label[character syntax]        @Comment{ref: syntax chapter}
The external syntax @tc[#\]@i[x] is used for characters.
@i[x] may be either a single character or the @qu[name] of a character.
Valid character names include @tc[SPACE], @tc[TAB], @tc[FORM],
and @tc[NEWLINE].  For example:
@Begin[Display]
@Tabdivide(5)
@tc[#\b]       @\The alphabetic character lower-case @i[b]
@tc[#\7]       @\The digit 7
@tc[#\;]       @\The special character @i[semicolon]
@tc[#\tab]     @\The tab character
@tc[#\NEWLINE] @\The new-line character
@End[Display]

Some graphic characters are also readable by name:
  @begin[ProgramExample]
#\LEFT-PAREN      @ce[]   #\(
#\RIGHT-PAREN     @ce[]   #\)
#\LEFT-BRACKET    @ce[]   #\[
#\RIGHT-BRACKET   @ce[]   #\]
#\LEFT-BRACE      @ce[]   #\{
#\RIGHT-BRACE     @ce[]   #\}
#\BACKSLASH       @ce[]   #\\
#\QUOTE           @ce[]   #\'
#\BACKQUOTE       @ce[]   #\`
#\DOUBLEQUOTE     @ce[]   #\"
#\COMMA           @ce[]   #\,
#\SEMICOLON       @ce[]   #\;
  @end[ProgramExample]

The syntax @tc<#[Char @i[n]]> may also be used for characters,
where @i[n] is the ASCII code for the character (see section
@ref[CHAR->ASCII]).  This is not preferred, however, since it is
less readable and less abstract than the @tc[#\] syntax.

  @begin[ProgramExample]
#[Char 65]  @ce[]  #\A
  @end[ProgramExample]

Unlike numbers, characters @i[are] uniquely instantiated.
There is only one object which represents a given graphic or other character.
  @begin[ProgramExample]
(EQ? #\x #\x)  @ev[]  @r[true]
  @end[ProgramExample]
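
Different notations for the same character therefore denote the same
object.  The following illustration assumes only the name equivalences
listed above and the ASCII code 65 for @tc[A] used in the earlier example:
  @begin[ProgramExample]
(EQ? #\SEMICOLON #\;)       @ev[]  @r[true]
(EQ? #\A (ASCII->CHAR 65))  @ev[]  @r[true]   ; assumes ASCII code 65 for A
  @end[ProgramExample]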

Characters and strings are self-evaluating.  There is no need to quote them
to use them as constants in programs.

Strings are sequences of characters.  @dc{ Elaborate? }
A string actually consists of two
distinct components, a @i[header] and a @i[text], which may be
manipulated independently of each other.  Programs that do not use the
routines in section @ref[string header section]@dc{ ???} need not
be aware of this distinction, and can treat strings much like
lists of characters.

Strings are notated simply by enclosing the actual sequence of characters
within double quote characters.  For example,
  @begin[ProgramExample]
"Horse"
  @end[ProgramExample]
notates a five-character string consisting of the characters
@tc[#\H], @tc[#\o], @tc[#\r], @tc[#\s], and @tc[#\e].
The escape character (also known as backslash: @tc[\]) may be used
to include a double-quote or a backslash within a string:
  @begin[ProgramExample]
"The \"second\" word in this string is enclosed in double-quotes."
"\\ This string begins with one backslash."
  @end[ProgramExample]
There is no standard way to notate a string which contains non-graphic
characters (e.g. control characters).
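
An escaped character counts as a single character of the string.
The results below are what the descriptions of @tc[STRING-LENGTH] and
@tc[STRING-ELT] later in this chapter imply; they are offered as an
illustration rather than as output from a particular implementation.
  @begin[ProgramExample]
(STRING-LENGTH "\\")      @ev[]  1
(STRING-LENGTH "\"x\"")   @ev[]  3
(STRING-ELT "\"x\"" 0)    @ev[]  #\"
  @end[ProgramExample]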

Strings are not uniquely instantiated; e.g.
  @begin[ProgramExample]
(EQ? "Eland" "Eland")
  @end[ProgramExample]
may or may not yield true, depending on the implementation.
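
When the contents of two strings matter rather than their identity,
@tc[STRING-EQUAL?] is the test to use.  A sketch of the distinction:
  @begin[ProgramExample]
(STRING-EQUAL? "Eland" "Eland")  @ev[]  @r[true]
(LET ((S "Eland")) (EQ? S S))    @ev[]  @r[true]   ; one object, compared with itself
  @end[ProgramExample]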

@section[Predicates]

@info[NOTES="Type predicate"]
@desc[(CHAR? @i[object]) @yl[] @i[boolean]]
Returns true if @i[object] is a character.
  @begin[ProgramExample]
(CHAR? #\X)  @ev[]  @r[true]
  @end[ProgramExample]
@EndDesc[CHAR?]

@info[NOTES="Type predicate"]
@desc[(STRING? @i[object]) @yl[] @i[boolean]]
Returns true if @i[object] is a string.
  @begin[ProgramExample]
(STRING? "Tapir.")  @ev[]  @r[true]
  @end[ProgramExample]
@EndDesc[STRING?]

@desc[(GRAPHIC? @i[character]) @yl[] @i[boolean]]
Returns true if @i[character] is either the space character (@tc[#\SPACE]) or
a printed graphic such as a letter, digit, or punctuation mark.
  @begin[ProgramExample]
(GRAPHIC? #\X)        @ev[]  @r[true]
(GRAPHIC? #\NEWLINE)  @ev[]  @r[false]
  @end[ProgramExample]
@EndDesc[GRAPHIC?]

@desc[(WHITESPACE? @i[character]) @yl[] @i[boolean]]
Returns true if @i[character] is a whitespace character (blank, tab,
newline, carriage return, line feed, or form feed).
@index[Whitespace]
  @begin[ProgramExample]
(WHITESPACE? #\X)        @ev[]  @r[false]
(WHITESPACE? #\NEWLINE)  @ev[]  @r[true]
  @end[ProgramExample]
@EndDesc[WHITESPACE?]

@desc[(ALPHABETIC? @i[character]) @yl[] @i[boolean]]
Returns true if @i[character] is an alphabetic (upper or lower case)
character.
  @begin[ProgramExample]
(ALPHABETIC? #\y)  @ev[]  @r[true]
(ALPHABETIC? #\7)  @ev[]  @r[false]
  @end[ProgramExample]
@EndDesc[ALPHABETIC?]

@desc[(UPPERCASE? @i[character]) @yl[] @i[boolean]]
Returns true if @i[character] is an upper-case letter.
  @begin[ProgramExample]
(UPPERCASE? #\y)      @ev[]  @r[false]
(UPPERCASE? #\Y)      @ev[]  @r[true]
(UPPERCASE? #\COMMA)  @ev[]  @r[false]
  @end[ProgramExample]
@EndDesc[UPPERCASE?]

@desc[(LOWERCASE? @i[character]) @yl[] @i[boolean]]
Returns true if @i[character] is a lower-case letter.
  @begin[ProgramExample]
(LOWERCASE? #\y)      @ev[]  @r[true]
(LOWERCASE? #\Y)      @ev[]  @r[false]
(LOWERCASE? #\COMMA)  @ev[]  @r[false]
  @end[ProgramExample]
@EndDesc[LOWERCASE?]

@desc[(DIGIT? @i[character radix]) @yl[] @i[boolean]]
Returns true if @i[character] is a digit with respect to the given
@i[radix].
@begin[ProgramExample]
(DIGIT? #\5 10)  @ev[]  @r[true]
(DIGIT? #\a 10)  @ev[]  @r[false]
(DIGIT? #\a 16)  @ev[]  @r[true]
@end[ProgramExample]
@EndDesc[DIGIT?]
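
The predicates above may be combined in the usual way.  In the following
sketch, @tc[LETTER-OR-DIGIT?] is a hypothetical name chosen for
illustration; it is not a procedure defined by this manual.
@begin[ProgramExample]
;; Hypothetical helper: true for letters and for decimal digits.
(DEFINE (LETTER-OR-DIGIT? CH)
  (OR (ALPHABETIC? CH) (DIGIT? CH 10)))

(LETTER-OR-DIGIT? #\y)  @ev[]  @r[true]
(LETTER-OR-DIGIT? #\7)  @ev[]  @r[true]
(LETTER-OR-DIGIT? #\;)  @ev[]  @r[false]
@end[ProgramExample]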

@section[Comparison]

@descN[
F1="(CHAR= @i[char1 char2]) @yl[] @i[boolean]",  FN1="CHAR=",  NL1,
F2="(CHAR< @i[char1 char2]) @yl[] @i[boolean]",  FN2="CHAR<",  NL2,
F3="(CHAR> @i[char1 char2]) @yl[] @i[boolean]",  FN3="CHAR>",  NL3,
F4="(CHARN= @i[char1 char2]) @yl[] @i[boolean]", FN4="CHARN=", NL4,
F5="(CHAR>= @i[char1 char2]) @yl[] @i[boolean]", FN5="CHAR>=", NL5,
F6="(CHAR<= @i[char1 char2]) @yl[] @i[boolean]", FN6="CHAR<=", NL6
]
@dc{ Horrible explanation. }
Six comparison predicates are defined for characters.
@tc[CHAR=] and @tc[CHARN=] are defined for all characters.
The others are defined only when the arguments are both
upper-case letters, or both lower-case letters, or both digits.
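
The examples below are a sketch.  They assume that @tc[CHARN=] is the
negation of @tc[CHAR=], and that letters of one case compare in
alphabetical order and digits in numerical order.
@begin[ProgramExample]
(CHAR= #\a #\a)   @ev[]  @r[true]
(CHAR= #\a #\A)   @ev[]  @r[false]
(CHARN= #\a #\A)  @ev[]  @r[true]    ; assumes CHARN= means "not CHAR="
(CHAR< #\a #\b)   @ev[]  @r[true]    ; assumes alphabetical order
(CHAR>= #\3 #\7)  @ev[]  @r[false]   ; assumes numerical order
@end[ProgramExample]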
@EndDescN[]

@desc[(STRING-EQUAL? @i[string1 string2]) @yl[] @i[boolean]]
Returns true if the two strings have the same length and contain the same
characters in the same order.
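
For illustration, and bearing in mind that upper-case and lower-case
letters are distinct characters:
  @begin[ProgramExample]
(STRING-EQUAL? "Gnu" "Gnu")   @ev[]  @r[true]
(STRING-EQUAL? "Gnu" "gnu")   @ev[]  @r[false]
(STRING-EQUAL? "Gnu" "Gnus")  @ev[]  @r[false]
  @end[ProgramExample]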
@EndDesc[STRING-EQUAL?]

@section[String constructors]
@desc[(MAKE-STRING @i[length]) @yl[] @i[string]]
Makes a string of null characters whose length is @i[length].
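
A string made this way is normally filled in afterwards.  The sketch below
relies on @tc[STRING-ELT] being settable, as noted in its description.
  @begin[ProgramExample]
(LET ((S (MAKE-STRING 3)))
  (SET (STRING-ELT S 0) #\G)   ; replace each null character in turn
  (SET (STRING-ELT S 1) #\n)
  (SET (STRING-ELT S 2) #\u)
  S)
        @ev[]
"Gnu"
  @end[ProgramExample]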
@EndDesc[MAKE-STRING]

@desc[(STRING-APPEND . @i[strings]) @yl[] @i[string]]
Returns a new string which is the concatenation of @i[strings].
  @begin[ProgramExample]
(STRING-APPEND "llama" " and " "alpaca")  @ev[]  "llama and alpaca"
  @end[ProgramExample]
@EndDesc[STRING-APPEND]

@desc[(COPY-STRING @i[string]) @yl[] @i[string]]
Returns a new string, with new text,
whose characters and length are the same
as those of @i[string].
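
Because the copy has its own text, changes to the copy should not affect
the original.  A sketch, relying on @tc[STRING-ELT] being settable:
  @begin[ProgramExample]
(LET ((A (COPY-STRING "Okapi")))
  (LET ((B (COPY-STRING A)))
    (SET (STRING-ELT B 0) #\o)   ; alters only B's text
    A))
        @ev[]
"Okapi"
  @end[ProgramExample]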
@EndDesc[COPY-STRING]

@desc[(CHAR->STRING @i[character]) @yl[] @i[string]]
Creates a string of length one whose single element is @i[character].
  @begin[ProgramExample]
(CHAR->STRING #\B)  @ev[]  "B"
  @end[ProgramExample]
@EndDesc[CHAR->STRING]

@desc[(LIST->STRING @i[list]) @yl[] @i[string]]
Converts a list of characters to a string.
  @begin[ProgramExample]
(LIST->STRING '(#\Z #\e #\b #\u))  @ev[]  "Zebu"
  @end[ProgramExample]
@EndDesc[LIST->STRING]

@desc[(STRING->LIST @i[string]) @yl[] @i[list]]
Converts a string to a list of characters.
  @begin[ProgramExample]
(STRING->LIST "Zebu")  @ev[]  (#\Z #\e #\b #\u)
  @end[ProgramExample]
@EndDesc[STRING->LIST]


@section[String access]

@info[NOTES="Settable"]
@desc[(STRING-LENGTH @i[string]) @yl[] @i[integer]]
Returns @i[string]'s length.
A string's length may be @tc[SET], but the new length must be less
than or equal to the original length.
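
For example (the second form is a sketch which assumes that shortening a
string drops characters from its end):
@begin[ProgramExample]
(STRING-LENGTH "Horse")  @ev[]  5

(LET ((S (COPY-STRING "Gazelle")))
  (SET (STRING-LENGTH S) 4)   ; new length must not exceed the old length
  S)
        @ev[]
"Gaze"
@end[ProgramExample]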
@EndDesc[STRING-LENGTH]

@descN[
F1="(STRING-EMPTY? @i[string]) @yl[] @i[boolean]", FN1="STRING-EMPTY?",
]
Returns true if @i[string] is an empty string.
@begin[ProgramExample]
@tabclear
(STRING-EMPTY? "")         @^@ev[]  @r[true]
(STRING-EMPTY? "Bharal")@\@ev[]  @r[false]
(STRING-EMPTY? @i[string])@\@ce[]  (=0? (STRING-LENGTH @i[string]))
@tabclear
@end[ProgramExample]
@EndDescN[]

@AnEquivE[Tfn="STRING-ELT",Efn="GETCHAR"]
@AnEquivE[Tfn="STRING-ELT",Efn="GETCHARN"]
@info[NOTES="Settable"]
@descN[
F1="(STRING-ELT @i[string n]) @yl[] @i[character]", FN1="STRING-ELT",
F2="(NTHCHAR @i[string n]) @yl[] @i[character]", FN2="NTHCHAR"
]
Returns the @i[n]@+[th] character in @i[string] (zero-based).
@begin[ProgramExample]
(NTHCHAR "SAIGA" 2)  @ev[]  #\I
@end[ProgramExample]
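
Since @tc[STRING-ELT] is settable, it may also be used to store a character.
The following is a sketch of such a use:
@begin[ProgramExample]
(LET ((S (COPY-STRING "SAIGA")))
  (SET (STRING-ELT S 0) #\T)   ; overwrite the character at index 0
  S)
        @ev[]
"TAIGA"
@end[ProgramExample]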
@EndDescN[]

@info[NOTES="Settable"]
@descN[
F1="(STRING-HEAD @i[string]) @yl[] @i[character]", FN1="STRING-HEAD",
F2="(CHAR @i[string]) @yl[] @i[character]", FN2="CHAR"
]
Returns the first character in @i[string].
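
An illustration, following directly from the description:
@begin[ProgramExample]
(STRING-HEAD "Ibex.")  @ev[]  #\I
(CHAR "Ibex.")         @ev[]  #\I
@end[ProgramExample]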
@EndDescN[]

@descN[
F1="(STRING-TAIL @i[string]) @yl[] @i[string]", FN1="STRING-TAIL",
F2="(CHDR @i[string]) @yl[] @i[string]", FN2="CHDR"
]
Returns the @qu"tail" of @i[string], that is, a string consisting of all but
the first character of @i[string].
@begin[ProgramExample]
(STRING-TAIL "Ibex.")  @ev[]  "bex."
@end[ProgramExample]
@EndDescN[]

@descN[
F1="(STRING-NTHTAIL @i[string n]) @yl[] @i[string]",
FN1="STRING-NTHTAIL",
F2="(NTHCHDR @i[string n]) @yl[] @i[string]", FN2="NTHCHDR"
]
Returns the @i[n]@+[th] tail of @i[string].
  @begin[ProgramExample]
(NTHCHDR "SAIGA" 2)  @ev[]  "IGA"
  @end[ProgramExample]
@EndDescN[]

@descN[
F1="(SUBSTRING @i[string start count]) @yl[] @i[string]", FN1="SUBSTRING",
]
Returns a substring of @i[string], beginning with the @i[start]@+[th] character,
for a length of @i[count] characters.
@begin[ProgramExample]
(SUBSTRING "A small oryx" 2 5)  @ev[]  "small"
@end[ProgramExample]
@EndDescN[]

@AnEquivE[Tfn="STRING-SLICE",Efn="NSUBSTRING"]
@desc[(STRING-SLICE @i[string start count]) @yl[] @i[string]]
Returns a substring (slice) of @i[string], beginning with the @i[start]@+[th]
character, for a length of @i[count] characters.
@begin[ProgramExample]
(STRING-SLICE "A small oryx" 2 5)  @ev[]  "small"
@end[ProgramExample]
Unlike @tc[SUBSTRING], the characters returned by
@tc[STRING-SLICE] are shared with the original string;
that is, any changes to characters
in the original string which have been selected in the substring
are reflected in the substring, and vice versa.
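
The sharing can be sketched as follows, relying on @tc[STRING-ELT] being
settable.  The character at index 2 of the original is the character at
index 0 of the slice.
@begin[ProgramExample]
(LET ((S (COPY-STRING "A small oryx")))
  (LET ((SLICE (STRING-SLICE S 2 5)))
    (SET (STRING-ELT S 2) #\S)   ; change made through the original string
    SLICE))
        @ev[]
"Small"
@end[ProgramExample]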
@EndDesc[STRING-SLICE]


@section[String manipulation]

@desc[(STRING-POSQ @i[character string]) @yl[] @i[integer] @r[or] @i[false]]
Returns the index of the first occurrence of
@i[character] in @i[string], if it is there;
otherwise returns false.
@begin[ProgramExample]
(STRING-POSQ #\i "oribi")  @ev[]  2
(STRING-POSQ #\s "oribi")  @ev[]  @r[false]
@end[ProgramExample]
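
Because the result is false when the character is absent, it can be tested
directly.  In the sketch below, @tc[FIRST-WORD] is a hypothetical name used
only for illustration; it combines @tc[STRING-POSQ] with @tc[SUBSTRING].
@begin[ProgramExample]
;; Hypothetical helper: the part of STR before its first space,
;; or STR itself if it contains no space.
(DEFINE (FIRST-WORD STR)
  (LET ((I (STRING-POSQ #\SPACE STR)))
    (IF I (SUBSTRING STR 0 I) STR)))

(FIRST-WORD "llama and alpaca")  @ev[]  "llama"
(FIRST-WORD "llama")             @ev[]  "llama"
@end[ProgramExample]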
@EndDesc[STRING-POSQ]

@desc[(STRING-REPLACE @i[destination source count]) @yl[] @i[string]]
Destructively copies @i[count] characters from the @i[source] string to the
@i[destination] string, and returns the modified
@i[destination].
@begin[ProgramExample]
(DEFINE S (COPY-STRING "The bison"))
(STRING-REPLACE S "Any how" 3)  @ev[]  "Any bison"
@end[ProgramExample]
@dc[This may be renamed to be @tc[STRING-REPLACE!].]
@EndDesc[STRING-REPLACE]

@desc[(MAP-STRING @i[procedure string]) @yl[] @i[string]]
Calls @i[procedure] on each character in @i[string], collecting
the successive return values, which should be characters,
in a new string.
@begin[ProgramExample]
(MAP-STRING CHAR-UPCASE "A grisbok")  @ev[]  "A GRISBOK"
@end[ProgramExample]
@EndDesc[MAP-STRING]

@desc[(MAP-STRING! @i[procedure string]) @yl[] @i[string]]
Calls @i[procedure] on each character in @i[string], storing the results,
which should be characters, back into @i[string].
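
A sketch of its use, by analogy with the @tc[MAP-STRING] example:
@begin[ProgramExample]
(LET ((S (COPY-STRING "A grisbok")))
  (MAP-STRING! CHAR-UPCASE S)   ; results are stored back into S
  S)
        @ev[]
"A GRISBOK"
@end[ProgramExample]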
@EndDesc[MAP-STRING!]

@desc[(WALK-STRING @i[procedure string]) @yl[] @i[undefined]]
Calls @i[procedure] on each character in @i[string].
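
@tc[WALK-STRING] is useful when @i[procedure] is called for its side
effects.  In the following sketch, @tc[COUNT-CHAR] is a hypothetical name
chosen for illustration.
@begin[ProgramExample]
;; Hypothetical helper: number of occurrences of CH in STR.
(DEFINE (COUNT-CHAR CH STR)
  (LET ((N 0))
    (WALK-STRING (LAMBDA (C)
                   (COND ((CHAR= C CH) (SET N (+ N 1)))))
                 STR)
    N))

(COUNT-CHAR #\i "oribi")  @ev[]  2
@end[ProgramExample]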
@EndDesc[WALK-STRING]


@section[String header manipulation]
@Label(string header section)

A @iixs[string header] is a structure of fixed size which contains a
pointer into a @iixs[string text], and a length.  A string text is the
vector of characters itself.  The string text is not a directly
accessible object; it can be manipulated only
via a string header.  Several string headers may point into the same text.
The term @i[string] is used to refer to a header and text considered
as a whole.

@desc[(CHOPY @i[string]) @yl[] @i[string]]
Makes a new string header pointing to the same string text,
and with the same length, as the header for @i[string].
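
The following sketch shows two headers sharing one text.  It relies on
@tc[CHDR!] (described below) and on @tc[STRING-ELT] being settable.
@begin[ProgramExample]
(LET ((S (COPY-STRING "String.")))
  (LET ((H (CHOPY S)))
    (CHDR! H)                    ; H is now "tring."; S's header is unchanged
    (SET (STRING-ELT H 0) #\p)   ; the text is shared, so S sees the change
    S))
        @ev[]
"Spring."
@end[ProgramExample]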
@EndDesc[CHOPY]

@desc[(CHOPY! @i[destination source]) @yl[] @i[string]]
Copies the header for the @i[source] string into the header
for the @i[destination] string, which is returned.
@EndDesc[CHOPY!]

@descN[
F1="(STRING-TAIL! @i[string]) @yl[] @i[string]", FN1="STRING-TAIL!",
F2="(CHDR! @i[string]) @yl[] @i[string]", FN2="CHDR!"
]
Destructively modifies @i[string]'s header to point to the next
character in its text, and decrements its length.
@begin[ProgramExample]
(LET ((S (COPY-STRING "String.")))
  (CHDR! S)
  S)
        @ev[]
"tring."
@end[ProgramExample]
@EndDescN[]

@descN[
F1="(STRING-NTHTAIL! @i[string n]) @yl[] @i[string]", FN1="STRING-NTHTAIL!",
F2="(NTHCHDR! @i[string n]) @yl[] @i[string]", FN2="NTHCHDR!"
]
Destructive version of @tc[STRING-NTHTAIL].
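
A sketch, by analogy with the @tc[CHDR!] example above:
@begin[ProgramExample]
(LET ((S (COPY-STRING "SAIGA")))
  (NTHCHDR! S 2)   ; advance S's header past two characters
  S)
        @ev[]
"IGA"
@end[ProgramExample]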
@EndDescN[]


@section[Case conversion]

@desc[(CHAR-UPCASE @i[character]) @yl[] @i[character]]
If @i[character] is a lower-case character, returns the corresponding
upper-case character.  Otherwise @i[character], which must be a character,
is returned.
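
For illustration:
@begin[ProgramExample]
(CHAR-UPCASE #\a)  @ev[]  #\A
(CHAR-UPCASE #\A)  @ev[]  #\A
(CHAR-UPCASE #\7)  @ev[]  #\7
@end[ProgramExample]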
@EndDesc[CHAR-UPCASE]

@desc[(CHAR-DOWNCASE @i[character]) @yl[] @i[character]]
If @i[character] is an upper-case character, returns the corresponding
lower-case character.  Otherwise @i[character], which must be a character,
is returned.
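
For illustration:
@begin[ProgramExample]
(CHAR-DOWNCASE #\A)  @ev[]  #\a
(CHAR-DOWNCASE #\a)  @ev[]  #\a
(CHAR-DOWNCASE #\;)  @ev[]  #\;
@end[ProgramExample]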
@EndDesc[CHAR-DOWNCASE]

@desc[(STRING-UPCASE @i[string]) @yl[] @i[string]]
Returns a copy of @i[string] with all lower-case characters converted to
upper case.
@begin[ProgramExample]
(STRING-UPCASE @i[string])  @ce[]  (MAP-STRING CHAR-UPCASE @i[string])
@end[ProgramExample]
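
One possible use is comparison without regard to case.  In this sketch,
@tc[SAME-LETTERS?] is a hypothetical name chosen for illustration, not a
procedure defined by this manual.
@begin[ProgramExample]
;; Hypothetical helper: compare two strings, ignoring case.
(DEFINE (SAME-LETTERS? S1 S2)
  (STRING-EQUAL? (STRING-UPCASE S1) (STRING-UPCASE S2)))

(SAME-LETTERS? "Tapir" "TAPIR")  @ev[]  @r[true]
(SAME-LETTERS? "Tapir" "Taper")  @ev[]  @r[false]
@end[ProgramExample]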
@EndDesc[STRING-UPCASE]

@desc[(STRING-DOWNCASE @i[string]) @yl[] @i[string]]
Returns a copy of @i[string] with all upper-case characters converted to
lower case.
@begin[ProgramExample]
(STRING-DOWNCASE @i[string])  @ce[]  (MAP-STRING CHAR-DOWNCASE @i[string])
@end[ProgramExample]
@EndDesc[STRING-DOWNCASE]

@desc[(STRING-UPCASE! @i[string]) @yl[] @i[string]]
Destructive version of @tc[STRING-UPCASE].
@begin[ProgramExample]
(STRING-UPCASE! @i[string])  @ce[]  (MAP-STRING! CHAR-UPCASE @i[string])
@end[ProgramExample]
@EndDesc[STRING-UPCASE!]

@desc[(STRING-DOWNCASE! @i[string]) @yl[] @i[string]]
Destructive version of @tc[STRING-DOWNCASE].
@begin[ProgramExample]
(STRING-DOWNCASE! @i[string])  @ce[]  (MAP-STRING! CHAR-DOWNCASE @i[string])
@end[ProgramExample]
@EndDesc[STRING-DOWNCASE!]


@section[Digit conversion]

@desc[(CHAR->DIGIT @i[character radix]) @yl[] @i[integer]]
Returns the @iix[weight] of the character when treated as a digit.
@i[Character] must be a digit in the given @i[radix].
@begin[ProgramExample]
(CHAR->DIGIT #\A 16)  @ev[]  10
@end[ProgramExample]
@EndDesc[CHAR->DIGIT]

@desc[(DIGIT->CHAR @i[integer radix]) @yl[] @i[character]]
Given a non-negative @i[integer] less than @i[radix],
returns a character (a digit) whose weight is the @i[integer].
@begin[ProgramExample]
(DIGIT->CHAR 10 16)  @ev[]  #\A
@end[ProgramExample]
@EndDesc[DIGIT->CHAR]

@desc[(DIGIT @i[character radix]) @yl[] @i[integer] @r[or] @i[false]]
If @i[character] is a digit in the given @i[radix], returns its weight;
otherwise returns false.
@begin[ProgramExample]
(DIGIT #\5 10)  @ev[]  5
@end[ProgramExample]
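
Further illustrations, consistent with the @tc[CHAR->DIGIT] example above:
@begin[ProgramExample]
(DIGIT #\A 16)  @ev[]  10
(DIGIT #\A 10)  @ev[]  @r[false]   ; A is not a digit in radix 10
@end[ProgramExample]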
@EndDesc[DIGIT]
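
The digit conversion procedures can be combined with @tc[WALK-STRING] to
convert a string of digits to an integer.  The following is a sketch only:
@tc[DIGITS->INTEGER] is a hypothetical name, and the code assumes that
@tc[WALK-STRING] visits characters in order from first to last and that
every character of the string is a digit in the given radix.
@begin[ProgramExample]
;; Hypothetical helper: value of STR read as an unsigned number in RADIX.
;; Assumes WALK-STRING proceeds from the first character to the last.
(DEFINE (DIGITS->INTEGER STR RADIX)
  (LET ((N 0))
    (WALK-STRING (LAMBDA (C)
                   (SET N (+ (* N RADIX) (CHAR->DIGIT C RADIX))))
                 STR)
    N))

(DIGITS->INTEGER "255" 10)  @ev[]  255
(DIGITS->INTEGER "FF" 16)   @ev[]  255
@end[ProgramExample]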

@section[ASCII conversion]

@dc{Talk about coercion to and from integers, and the arbitrariness of the ASCII
character set.  Refer to Appendix @ref[ascii appendix].}

@AnEquivE[Tfn="CHAR->ASCII",Efn="CHAR-CODE"]
@desc[(CHAR->ASCII @i[character]) @yl[] @i[integer]]
Given a character, returns its ASCII representation as an integer.
@EndDesc[CHAR->ASCII]

@AnEquivE[Tfn="ASCII->CHAR",Efn="CODE-CHAR"]
@desc[(ASCII->CHAR @i[integer]) @yl[] @i[character]]
Given an integer which is the ASCII code for some character,
returns the character.
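
The two conversions are inverses of each other.  For illustration, taking
the ASCII code 65 for @tc[A] as in the @tc<#[Char 65]> example earlier in
this chapter:
@begin[ProgramExample]
(CHAR->ASCII #\A)                @ev[]  65
(ASCII->CHAR 65)                 @ev[]  #\A
(ASCII->CHAR (CHAR->ASCII #\;))  @ev[]  #\;
@end[ProgramExample]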
@EndDesc[ASCII->CHAR]

@desc[*NUMBER-OF-CHAR-CODES* @yl[] @i[integer]]

The value of @tc[*NUMBER-OF-CHAR-CODES*] is an integer
one greater than the largest value that will ever be returned by
@tc[CHAR->ASCII].  It may be used to make tables which are
to be indexed by ASCII codes.

@begin[ProgramExample]
(DEFINE *TABLE* (MAKE-VECTOR *NUMBER-OF-CHAR-CODES*))  @ev[]  @i[vector]
(VSET *TABLE* (CHAR->ASCII #\F) 'COW)                  @ev[]  COW
(VREF *TABLE* (CHAR->ASCII #\F))                       @ev[]  COW
@end[ProgramExample]

@EndDesc[*NUMBER-OF-CHAR-CODES*]


@section[Symbols]

Symbols are similar to strings, but are instantiated uniquely; only one
symbol with a given print name exists.  Symbols are used to identify
variables, among other things.  The fact that they have a convenient
external representation makes them useful for many purposes.

Symbols may be coerced to strings and vice versa.
Two strings convert to the same symbol exactly when they are equal to each
other (e.g. according to @tc[STRING-EQUAL?]); that is,
  @begin[ProgramExample]
(EQ? (STRING->SYMBOL @i[string1]) (STRING->SYMBOL @i[string2]))
  @end[ProgramExample]
if and only if
  @begin[ProgramExample]
(STRING-EQUAL? @i[string1] @i[string2])
  @end[ProgramExample]

See also sections @ref[SYMBOL?] and @ref[reader].

@desc[(STRING->SYMBOL @i[string]) @yl[] @i[symbol]]
Returns the symbol whose print name is equal to @i[string].
  @begin[ProgramExample]
(STRING->SYMBOL "COW")    @ev[]  COW
(STRING->SYMBOL "123")    @ev[]  \123
(STRING->SYMBOL "bison")  @ev[]  \b\i\s\o\n
(STRING->SYMBOL "")       @ev[]  #[Symbol ""]
  @end[ProgramExample]
Note that it is @tc[READ-OBJECT] (page @pageref[READ-OBJECT]), not
@tc[STRING->SYMBOL], which coerces alphabetic characters to upper case.
@EndDesc[STRING->SYMBOL]

@desc[(SYMBOL->STRING @i[symbol]) @yl[] @i[string]]
Returns a string for which @tc[STRING->SYMBOL] will return @i[symbol].
  @begin[ProgramExample]
(SYMBOL->STRING 'COW)  @ev[]  "COW"
  @end[ProgramExample]
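
The two coercions can be sketched together as follows.  The results follow
from the descriptions above and from the correspondence between
@tc[STRING-EQUAL?] strings and symbols stated at the start of this section.
  @begin[ProgramExample]
(EQ? (STRING->SYMBOL "COW")
     (STRING->SYMBOL (COPY-STRING "COW")))  @ev[]  @r[true]

(STRING-EQUAL? (SYMBOL->STRING (STRING->SYMBOL "bison"))
               "bison")                     @ev[]  @r[true]
  @end[ProgramExample]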
@EndDesc[SYMBOL->STRING]
