GNU troff Internals

GNU troff processes input in three steps.
It gathers one or more input characters into a
token,
the smallest meaningful unit of
troff input.
The process of formatting translates tokens into nodes
that populate a pending output line
(recall
Manipulating Filling and Adjustment).
A
node
is a data structure representing any object
that may ultimately appear in the output,
like a glyph or motion on the page.
When the pending output line breaks,
the formatter applies any relevant adjustment,
line number,
and margin character,
and finally appends it to the current diversion.
Periodically,
the formatter
flushes
accumulated output line(s) to the output device,
a process that translates each node
into a device-independent output language representation
understood by all output drivers.
Copy mode tokenizes but does not format;
diversions
(apart from that at the top level)
format but do not write output.
For example,
GNU
troff
converts the input
‘Gi\[:u]\%seppe’
into a
character token for
‘G’,
a character token for
‘i’,
a special character token for
‘:u’
(representing
‘u’
with an umlaut),
a token encoding a hyphenation break point,
and further character tokens.
You can observe this process
by storing the foregoing input into a string—which,
because its contents are read in copy mode,
is only tokenized,
not formatted—and
dumping it with the
pm request.
(Using printf(1) requires us to double the ‘\’
and ‘%’ characters.)
$ printf '.ds str Gi\\[:u]\\%%seppe\n.pm str\n' \
| groff 2>&1 | jq
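As a quick check of the quoting, the printf format can be expanded on its own to see exactly what groff receives. This is a small sketch that relies only on standard printf(1) behavior; the variable name `input` is illustrative and not part of groff.

```shell
# Expand the format string by itself: printf turns '\\' into '\'
# and '%%' into '%'.  The name "input" is only illustrative.
input=$(printf '.ds str Gi\\[:u]\\%%seppe\n.pm str\n')
printf '%s\n' "$input"
# prints:
#   .ds str Gi\[:u]\%seppe
#   .pm str
```

The two lines printed are the roff input that the pipeline above feeds to groff.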
Similarly,
we can observe the details of the formatting process
by interpolating the string,
or supplying its contents directly as input,
and invoking the
pline
request.
$ printf 'Gi\\[:u]\\%%seppe\n.pline\n' | groff -z 2>&1 | jq
We now see a list of nodes, including an output line start node, several glyph nodes, a discretionary break node containing a glyph node for the special character ‘:u’ and a glyph node for the special character ‘hy’ (hyphen), and a word space node at the end corresponding to the newline at the end of input.
If we change ‘G’ to ‘f’, we see that the first two glyph nodes, for ‘f’ and ‘i’, become contained by a ligature node (provided the current font has a glyph for this ligature). All output glyph nodes are “processed”, which means that they are associated with a given font, type size, advance width, and so forth.
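The ligature case can be tried with the same pipeline, changing only the first letter of the input. This sketch assumes a groff that provides the pline request and emits a dump that jq can parse, as in the command shown earlier, and a current font that has an ‘fi’ ligature glyph; the variable name `input` is illustrative.

```shell
# Same input as before with 'G' replaced by 'f', followed by .pline.
# The name "input" is only illustrative.
input=$(printf 'fi\\[:u]\\%%seppe\n.pline\n')
# -z suppresses formatted output; 2>&1 merges the standard error
# stream into the pipe so that jq can read it.
printf '%s\n' "$input" | groff -z 2>&1 | jq
```

Comparing this dump with the one for ‘Gi’ should show whether the first two glyph nodes have been wrapped in a ligature node.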
Macros, diversions, and strings collect elements in two chained lists: a list of tokens that have been passed unprocessed, and a list of nodes. Consider the following diversion.
.di xxx
a
\!b
c
.br
.di
It contains these elements.
| node list | token list | element number |
| --- | --- | --- |
| line start node | — | 1 |
| glyph node a | — | 2 |
| word space node | — | 3 |
| — | b | 4 |
| — | \n | 5 |
| glyph node c | — | 6 |
| vertical size node | — | 7 |
| vertical size node | — | 8 |
| — | \n | 9 |
troff
inserts elements 1,
7,
and 8;
the latter two
(which are always present)
specify the vertical extent of the last line,
possibly modified by \x.
The
br
request finishes the pending output line,
inserting a newline token,
which is subsequently converted to a space
when the diversion is interpolated.
Note that the word space node
has a fixed width and is no longer adjustable.
To convert horizontal space nodes back into tokens,
use the
unformat
request.
Macros contain elements only in the token list (their node list is empty); diversions and strings can contain elements in both lists.
The chop request simply reduces the number of elements in a
macro, string, or diversion by one. Exceptions are compatibility
save and compatibility ignore tokens, which are ignored. The
substring request also ignores those tokens.
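A minimal sketch of chop applied to a string, assuming groff is installed. The string name `str`, its contents, and the shell variable `result` are illustrative. The tm request writes the remaining contents to the standard error stream, so no output driver is needed.

```shell
# Define a three-character string, chop it once, and report what is left.
# -z suppresses formatted output; tm writes to the standard error stream.
result=$(printf '.ds str abc\n.chop str\n.tm \\*[str]\n' | groff -z 2>&1)
printf '%s\n' "$result"   # prints "ab"
```

Each further chop of `str` removes one more element from its end.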
Some requests like tr or cflags work on glyph identifiers
only; this means that the associated glyph can be changed without
destroying this association. This can be very helpful for substituting
glyphs. In the following example, we assume that glyph ‘foo’ isn’t
available by default, so we provide a substitution using the
fchar request and map it to input character ‘x’.
.fchar \[foo] foo
.tr x \[foo]
Now let us assume that we install an additional special font ‘bar’ that has glyph ‘foo’.
.special bar
.rchar \[foo]
Since glyphs defined with fchar are searched before glyphs in
special fonts, we must call rchar to remove the definition of the
fallback glyph. The translation remains active; ‘x’ now
maps to the real glyph ‘foo’.
Macro and request arguments preserve compatibility mode enablement.
.cp 1 \" switch to compatibility mode
.de xx
\\$1
..
.cp 0 \" switch compatibility mode off
.xx caf\['e]
⇒ café
Since compatibility mode is enabled while de is invoked, the
macro xx enables compatibility mode when it is called. Argument
$1 can still be handled properly because it inherits the
compatibility mode enablement status that was active at the point where
xx was called.
After interpolation of the parameters, the compatibility save and restore tokens are removed.