Recall from Groff Options that the groff command’s -k option runs the preconv preprocessor to perform input character encoding conversions to satisfy GNU troff’s requirement of a single-byte encoding compatible with ISO 646:1991 IRV (US-ASCII).
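As a minimal sketch of the arrangement just described, the input file below is assumed to be encoded in UTF-8. The file name menu.roff is hypothetical, and the coding tag on the first line is an optional hint that preconv can read from the first lines of a file; it is not required when the encoding is detected or given some other way.

```roff
.\" -*- coding: utf-8 -*-
.\" Sketch of a UTF-8 encoded input file; "menu.roff" is a
.\" hypothetical name.  Format it for a UTF-8 terminal with
.\"   groff -k -Tutf8 menu.roff
.\" so that preconv converts the non-ASCII letters below
.\" before GNU troff reads them.
The jalapeño and the piñata are loan words in English.
```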
Localization influences automatic hyphenation in two distinct but related respects. A macro file specific to a character coding identifies which character codes correspond to letters expected in the language’s hyphenation pattern files and sets up case equivalences for those letters. A language’s macro file determines which of these letters are equivalent to other letters for hyphenation purposes.
For example, in English, the letter ‘ñ’ occurs in loan words. The latin1.tmac and latin9.tmac macro files define a hyphenation code for ‘ñ’ and make ‘Ñ’ equivalent to it. The English localization file en.tmac furthermore makes ‘ñ’ equivalent to ‘n’. In Spanish (es.tmac), however, ‘ñ’ and ‘n’ are not equivalent. The language localization file (see Manipulating Hyphenation) loads an appropriate encoding localization file; a document need not do so directly.
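The last point can be sketched as follows, assuming a French document whose text is encoded in ISO Latin-9. The comments stand in for the body text.

```roff
.\" Sketch: a French document encoded in ISO Latin-9.
.\" Loading the language localization file is enough here;
.\" fr.tmac loads latin9.tmac itself.
.mso fr.tmac
.\" Body text encoded in ISO Latin-9 follows.
```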
koi8-r
    To use KOI8-R, an encoding for the Russian language, either place ‘.mso koi8-r.tmac’ at the very beginning of your document or supply ‘-m koi8-r’ as a command-line argument to groff. The ru.tmac localization file loads koi8-r.tmac automatically. [37]
latin1
    ISO Latin-1 is an encoding for Western European languages. The de.tmac, en.tmac, it.tmac, and sv.tmac localization files load latin1.tmac automatically.
latin2
    To use ISO Latin-2, an encoding for Central and Eastern European languages, invoke ‘.mso latin2.tmac’ at the beginning of your document or supply ‘-m latin2’ as a command-line argument to groff. The cs.tmac and pl.tmac localization files load latin2.tmac automatically.
latin5
    To use ISO Latin-5, an encoding for the Turkish language, invoke ‘.mso latin5.tmac’ at the beginning of your document or supply ‘-m latin5’ as a command-line argument to groff.
latin9
    ISO Latin-9 succeeds Latin-1; it includes a Euro sign and better coverage for French. To use this encoding, invoke ‘.mso latin9.tmac’ at the beginning of your document or supply ‘-m latin9’ as a command-line argument to groff. The es.tmac and fr.tmac localization files load latin9.tmac automatically.
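The list above names no localization file that loads latin5.tmac, so a Latin-5 document is a natural case for loading the encoding file directly. A minimal sketch, in which the file name mektup.roff is hypothetical and the comments stand in for the body text:

```roff
.\" Sketch: a Turkish document encoded in ISO Latin-5.
.\" "mektup.roff" is a hypothetical file name.
.mso latin5.tmac
.\" Body text encoded in ISO Latin-5 follows.
.\"
.\" Alternatively, omit the mso request above and format with
.\"   groff -m latin5 mektup.roff
```

The same pattern applies to the other encoding files; only the macro file name changes.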
Some characters from an input encoding may not be available with a particular output driver, or their glyphs may not have representation in the font used. For terminal devices, fallbacks are defined, like ‘EUR’ for the Euro sign and ‘(C)’ for the copyright sign. For typesetter devices, you may need to “mount” fonts that support glyphs required by the document. See Font Positions.
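A document can also supply fallbacks of this kind itself with the fchar request, which takes effect only when the glyph is missing from the font in use. The following is a sketch with illustrative sample text; it is not necessarily how the macro files for terminal devices define their fallbacks.

```roff
.\" Sketch: document-level fallbacks for two glyphs.
.\" fchar definitions apply only if the glyph is not found
.\" in the current font.
.fchar \[Eu] EUR
.fchar \[co] (C)
Total: 25\~\[Eu].
\[co] 2024 Example Press.
```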
Because a Euro glyph was not historically defined in PostScript fonts,
groff comes with a font called freeeuro.pfa that provides
the Euro in several styles. Standard PostScript fonts contain the
glyphs from Latin-5 and Latin-9 that Latin-1 lacks, so these
encodings are supported for the ps and pdf output
devices as groff ships, while Latin-2 is not.
Unicode supports characters from all other input encodings; the utf8 output driver for terminals therefore does as well. The DVI output driver supports the Latin-2 and Latin-9 encodings if the command-line option ‘-m ec’ is used as well. [38]