National Series 32000 Assembler, Linker, Library Utilities

Report Bugs to:

Bruce Culbertson
2083 Princeton Street
Palo Alto, CA 94306
culberts@hplabs.hp.com


I.  Assembler Syntax

The AS32K assembler's syntax follows closely that of the National NSX
assembler.  Most differences in syntax were necessitated by differences
in features between the two assemblers.  Some differences exist to make
the assembler more compatible with the Free Software Foundation C
compiler.

A.  Lines

Source code lines are limited in length, currently to 200 characters.
All characters to the right of a semicolon are considered to be
comments.  Tokens (e.g.  mnemonics, numbers, labels, etc.) in source
code lines may be separated by spaces or tabs.  Lines may be blank or
may contain one or more of the following: labels; instructions and
assembler directives; and comments.

B.  Labels

Labels must precede instructions and must be followed by one or two
colons.  Labels followed by a single colon are not visible to the linker
and, consequently, may not be referenced by other files.  Conversely,
labels followed by two colons may be referenced by other files.  Using
two colons has the same effect as using the .EXPORT assembler directive.
It is not an error to reference a label which is not defined, i.e.  does
not appear to the left of a colon, in the current file; it is assumed
that these references will be resolved by the linker.

Labels may be composed of upper and lower case alphabetic characters,
digits, and "." and "_".  Labels may not begin with a digit.  Upper case
letters are considered different from lower case, i.e.  the label "a" is
different from "A".  The length of a label is limited only by the
maximum line length.  Two labels are the same only if all of their
characters match.  All characters of each label are exported to the
linker.  Labels must be different from assembler mnemonics, directives,
and register names.
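
The character rules for labels can be summarized in a short Python
sketch.  The names is_valid_label and is_exported are illustrative only
and are not part of AS32K.  The check against mnemonics, directives, and
register names is deliberately left out.

```python
import re

# Letters, digits, "." and "_"; the first character may not be a digit.
LABEL_RE = re.compile(r"[A-Za-z._][A-Za-z0-9._]*")


def is_valid_label(name):
    """Check only the character rules; reserved names are not modeled."""
    return LABEL_RE.fullmatch(name) is not None


def is_exported(definition):
    """True when a label definition ends in two colons, e.g. 'func::'."""
    return definition.endswith("::")
```

Because case is significant, "a" and "A" both pass the check and name two
different labels.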

C.  Constants

Examples:

d'1234, D'1234, 12345				decimal numbers

b'10110, B'10110				binary numbers

o'765, O'765, q'765, Q'765			octal numbers

h'12ef, H'12ef, x'12ef, X'12ef			hexadecimal numbers

f'1, F'123.456, f'1.2e203			floating point numbers
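
As an illustration of how the radix prefixes above map to values, here
is a minimal Python sketch.  The function parse_integer_constant is
hypothetical, handles only the integer forms, and relies on Python's
int(), which may accept input the assembler rejects.

```python
# Radix selected by the letter before the apostrophe, in either case.
RADIX = {"d": 10, "b": 2, "o": 8, "q": 8, "h": 16, "x": 16}


def parse_integer_constant(token):
    """Return the value of an integer constant such as h'12ef or 12345."""
    if len(token) > 2 and token[1] == "'":
        prefix = token[0].lower()
        if prefix not in RADIX:
            # f' (floating point) and unknown prefixes are not handled here.
            raise ValueError("not an integer constant: " + token)
        return int(token[2:], RADIX[prefix])
    return int(token, 10)  # no prefix: plain decimal
```

For example, h'12ef and X'12EF both evaluate to 4847 decimal.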

Strings and character constants are surrounded by single or double
quotes.  Two adjacent occurrences of the quote character in a string
cause a single copy of the quote character to be emitted in the string,
e.g.  '5 o''clock'.  Strings are used with the .BYTE directive.
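
The quote-doubling rule can be sketched in Python as follows.  The
function decode_string_constant is hypothetical.  It assumes the token
has already been isolated from the source line, and it does not check
for a stray single delimiter inside the string.

```python
def decode_string_constant(token):
    """Strip the delimiters and collapse doubled delimiter characters."""
    if len(token) < 2 or token[0] not in "'\"" or token[-1] != token[0]:
        raise ValueError("not a quoted string: " + token)
    quote = token[0]
    body = token[1:-1]
    # Two adjacent delimiters inside the string stand for one.
    return body.replace(quote * 2, quote)
```

Applied to the example above, '5 o''clock' yields the nine characters
5 o'clock.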

The current location counter is indicated by "*".

Register lists and configuration lists are treated as constants.
Consequently, instruction sequences like the following are allowed:

			; Start of new function
	func::
			enter	reg_list, 10
			...			; function body
			restore	reg_list
			ret	10
	reg_list:	.equ	[r0,r1,r2]

This feature was added for the convenience of compilers, like the AT&T
portable C compiler, which emit function prologs before knowing what
registers a function will use.

Floating point constants are not allowed in expressions.  They may be
used as immediate operands in floating point instructions and they may
be used with the .FLOAT and .LONG directives.

E.  Registers

Register names may use upper or lower case letters.  The following
registers are recognized by the assembler:

R0	R1	R2	R3	R4	R5	R6	R7
F0	F1	F2	F3	F4	F5	F6	F7
US	FP	SP	SB	PC	PSR	INTBASE	BPR0
BPR1	PF0	PF1	SC	MSR	BCNT	PTB0	PTB1
EIA	DCR	BPC	DSR	CAR	USP	CFG	MCR
TEAR	IVAR0	IVAR1

F.  Assembler Directives

Directives may be upper or lower case.

.ALIGN

Syntax: .ALIGN <expression>.  Increase location counter for the current
segment until it is a multiple of <expression>.  Note: when the linker
combines multiple relocatable text segments into a single executable
text segment, it aligns relocatable segments on four byte boundaries
within the new segment.  The same is true for data and bss segments.
Consequently, just because an object is aligned on, say, a sixteen byte
boundary in the relocatable segment, it may not still be aligned on a
sixteen byte boundary after linking.  Thus, the only reasonable values
for <expression> are between 1 and 4.
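
The arithmetic behind this warning can be worked through in a few lines
of Python.  The sizes and offsets are made-up values chosen only to
show the effect.

```python
def align_up(value, boundary):
    """Smallest multiple of boundary that is >= value (what .ALIGN does)."""
    return (value + boundary - 1) // boundary * boundary


# Inside file B's relocatable text segment, ".ALIGN 16" moves the
# location counter from 21 up to a multiple of sixteen.
offset_in_b = align_up(21, 16)           # 32

# The linker places B's text after file A's text, aligned only to a
# four byte boundary within the new segment.
size_of_a = 10                           # hypothetical size of A's text
base_of_b = align_up(size_of_a, 4)       # 12

final_address = base_of_b + offset_in_b  # 44: a multiple of 4, not of 16
```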

.ASCII

Syntax: .ASCII <gcc string>.  This directive has been added to minimize
changes which need to be made to the GNU C compiler (gcc).  It uses a
different string syntax than .BYTE.  Gcc strings are delimited by single
or double quotes.  A non-printing character may be included in a string
by inserting a back-slash followed by the character's octal value.  A
back-slash or a string delimiter may be included by preceding it with a
back-slash.  <gcc string> may be arbitrarily long.
Example: .ASCII "a quote: \", a back-slash: \\, a tab: \11"
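
The escape rules for gcc strings can be sketched in Python.  The
function decode_gcc_string is hypothetical.  It assumes an octal escape
is one to three digits long, as in C; the text above does not state a
limit.

```python
def decode_gcc_string(token):
    """Decode an .ASCII operand into the bytes it would emit."""
    if len(token) < 2 or token[0] not in "'\"" or token[-1] != token[0]:
        raise ValueError("not a quoted string")
    body = token[1:-1]
    out = []
    i = 0
    while i < len(body):
        ch = body[i]
        i += 1
        if ch != "\\":
            out.append(ord(ch))
            continue
        if i >= len(body):
            raise ValueError("dangling back-slash")
        # Assumption: up to three octal digits form the character value.
        digits = ""
        while i < len(body) and len(digits) < 3 and body[i] in "01234567":
            digits += body[i]
            i += 1
        if digits:
            out.append(int(digits, 8))
        else:
            # Escaped back-slash or string delimiter: emit it literally.
            out.append(ord(body[i]))
            i += 1
    return bytes(out)  # raises ValueError for values that exceed one byte
```

For the example operand above, the result contains one double quote,
one back-slash, and one tab character (octal 11).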

.BLKB	.BLKW	.BLKD

Syntax: .BLKi <expression>.  Reserve space in the current segment.
Number of bytes reserved is the value of the expression times 1, 2, or 4
for .BLKB, .BLKW, and .BLKD, respectively.

.BYTE	.WORD	.DOUBLE

Syntax: .BYTE <expression>[,<expression>...].  Reserve a byte, word or
double for each expression, and initialize the object to the value of
the expression.  For the .BYTE directive only, <expression> may be a
string.  .LONG is synonymous with .DOUBLE.

.EQU

Syntax: <label>: .EQU <expression>.  Create a symbol and set its value
to that of the expression.

.EXPORT	.GLOBL

Syntax: .EXPORT <label>[,<label>...].  Make each label visible to the
linker so that it can be referenced from other files.  This has the same
effect as putting two colons after the symbol when the symbol is
defined.

.FLOAT	.LONG

Syntax: .FLOAT <constant>[,<constant>...].  Reserve four [.FLOAT] or
eight [.LONG] bytes for each constant, and initialize space to the value
of the constant.

.PROGRAM	.TEXT

Set the current segment to the text segment.

.STATIC	.DATA

Set the current segment to the data segment.

.BSS

Set the current segment to the bss segment.


G.  Expression Operators, in order of precedence

+ (unary), - (2's complement), com (1's complement), not (complement LSB)
*, /, mod (modulus), and, shl (shift left), shr (shift right)
+, -, or, xor

Parentheses may be used to override the precedence.
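
To make the three precedence levels concrete, here is a small Python
evaluator written as a sketch.  It is not the assembler's parser.  It
accepts only decimal numbers, assumes operators of equal precedence
group left to right, and uses Python's unbounded integers and floor
division; the assembler's word size and rounding may differ.

```python
import re

TOKEN_RE = re.compile(r"\s*(\d+|[A-Za-z]+|[-+*/()])")


def tokenize(text):
    text = text.rstrip()
    tokens, pos = [], 0
    while pos < len(text):
        match = TOKEN_RE.match(text, pos)
        if not match:
            raise ValueError("bad character at column %d" % pos)
        tokens.append(match.group(1).lower())
        pos = match.end()
    return tokens


def evaluate(text):
    tokens = tokenize(text)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def take():
        nonlocal pos
        if pos >= len(tokens):
            raise ValueError("unexpected end of expression")
        pos += 1
        return tokens[pos - 1]

    def unary():  # highest precedence: +, -, com, not
        tok = take()
        if tok == "+":
            return unary()
        if tok == "-":
            return -unary()
        if tok == "com":
            return ~unary()
        if tok == "not":
            return unary() ^ 1  # complement the least significant bit
        if tok == "(":
            value = additive()
            if take() != ")":
                raise ValueError("expected )")
            return value
        if tok.isdigit():
            return int(tok)
        raise ValueError("unexpected token " + tok)

    def multiplicative():  # middle level: *, /, mod, and, shl, shr
        value = unary()
        while peek() in ("*", "/", "mod", "and", "shl", "shr"):
            op, rhs = take(), unary()
            if op == "*":
                value *= rhs
            elif op == "/":
                value //= rhs
            elif op == "mod":
                value %= rhs
            elif op == "and":
                value &= rhs
            elif op == "shl":
                value <<= rhs
            else:
                value >>= rhs
        return value

    def additive():  # lowest level: +, -, or, xor
        value = multiplicative()
        while peek() in ("+", "-", "or", "xor"):
            op, rhs = take(), multiplicative()
            if op == "+":
                value += rhs
            elif op == "-":
                value -= rhs
            elif op == "or":
                value |= rhs
            else:
                value ^= rhs
        return value

    result = additive()
    if peek() is not None:
        raise ValueError("unexpected token " + peek())
    return result
```

Note that shl and shr sit on the same level as multiplication, so under
these rules "1 shl 4 + 1" is (1 shl 4) + 1 = 17, which differs from the
grouping C uses for << and +.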

H.  Addressing Modes

Some 32000 assemblers allow you to define a label in such a way that any
subsequent use of the label implies a certain addressing mode.  This is
not the case with this assembler; addressing modes must always be given
explicitly.  A "gen" operand which is a single label will always cause
immediate addressing to be used.

Addressing mode syntax is the same as it appears in the various 32000
instruction manuals, with the following exceptions and clarifications.
<expression> emits immediate mode, @<expression> emits absolute mode,
and <expression>(pc) emits program memory mode.  In the three
situations, bsr <expression>, b<cc> <expression>, and <expression>(pc),
the current location counter is automatically subtracted from the
expression.  Thus, to jump to my_label, use "br my_label", not "br
my_label - *".
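
The automatic subtraction can be illustrated with a few lines of
Python.  The function name and the addresses are hypothetical.

```python
def emitted_displacement(expression_value, location_counter):
    """Displacement stored for bsr, b<cc>, and <expression>(pc) operands."""
    return expression_value - location_counter


# "br my_label" assembled at location 0x100, with my_label at 0x140.
correct = emitted_displacement(0x140, 0x100)         # 0x40

# "br my_label - *" has the location counter subtracted a second time.
wrong = emitted_displacement(0x140 - 0x100, 0x100)   # -0xC0
```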

II.  Program Environment, Linking and Relocating Features

The program model, i.e.  the environment in which a program runs, has
four segments: text, data, bss, and stack.  The operating system, when
it loads a program, should allocate memory for each of these segments.
Then the operating system should initialize the text and data segments
from copies of those segments which are stored in the object file.  The
operating system may establish its own conventions for initializing the
bss and stack segments; there are no copies of these segments in the
object file and it is not possible to specify their contents with the
assembler.  The sizes of the text and data segments are recorded in the
object file.  The object file also contains an initial size for the bss
segment.  The operating system must choose the initial size of the stack
segment.  Typically, operating systems allow the stack and bss sizes to
grow.  The program NM32K, when run with the -v option, will print the
segment sizes of an object file.

The precise use of each segment is not strictly enforced.  A programmer
may choose not to use some of the segments, in which case the sizes of
those segments will be zero.  Nevertheless, each segment does have an
intended use which is described below.

A.  Text Segment

The text segment is the only segment which can contain instructions.  It
can also contain data and variables.  If care is taken to place only
instructions and read-only data in the text segment, then the text
segment can be placed in ROM or may be made read-only by memory
management hardware.  This allows multiple processes to safely share the
same text segment.

B.  Data Segment

The data segment is intended to contain variables which must be
initialized before the program is run.  Since the data segment contains
variables, it must be loaded in write-able memory and must not be shared
among processes.  The .BYTE, .WORD, and .DOUBLE assembler directives are
typically used to specify the contents of the data segment.

C.  Bss Segment

The bss segment is intended to contain variables which are not
initialized before the program is run or are initialized to some default
by the operating system.  The bss segment exists simply to minimize the
size of the object file -- all the variables in the bss segment could be
placed in the data segment if this were not a concern.  Since the bss
segment contains variables, it must be loaded in write-able memory and
must not be shared among processes.  The .BLKB, .BLKW, and .BLKD
assembler directives are typically used to reserve space in the bss
segment.

D.  Stack Segment

The stack segment is used to store return addresses during subroutine
calls, subroutine parameters, and temporary variables.

E.  Relocation

An object file produced by the assembler is "relocatable", meaning the
location at which it is intended to run has not yet been determined.
The linker makes it possible to combine multiple relocatable files,
resolve references they make to each other, fix the location at which
each piece of code will run, and output a single executable object file.
The linker will combine several relocatable files into a single larger
relocatable file if the -r option is given.  The linked object file
contains a single text segment and a single data segment.  The new text
segment is the concatenation of the text segments from the relocatable
files.  Similarly, the new data segment is the concatenation of the data
segments from the relocatable files.  By default, the new text segment
is relocated to be loaded starting at address 0 and the new data segment
is relocated to be loaded starting at the first page beyond the end of
the text segment.  This allows the text and data segments to be assigned
different memory management permissions.  The bss segment is relocated
to be contiguous with the data segment.  The linker options -D and -T
override the default starting addresses of the data and text segments.
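
The default placement described above amounts to the following
arithmetic, shown here as a Python sketch.  The function default_layout
is hypothetical, the sizes are those of the already combined segments,
and the page size is passed in because its value is not stated here.
The sketch also assumes that a text segment ending exactly on a page
boundary is followed immediately by the data segment.

```python
def round_up(value, multiple):
    return (value + multiple - 1) // multiple * multiple


def default_layout(text_size, data_size, bss_size, pagesz,
                   text_base=0, data_base=None):
    """Return the start address of each segment and the end of bss.

    text_base and data_base model the -T and -D options.
    """
    text_end = text_base + text_size
    if data_base is None:
        # Data starts on the first page boundary at or after the text end.
        data_base = round_up(text_end, pagesz)
    bss_base = data_base + data_size     # bss is contiguous with data
    return {"text": text_base, "data": data_base,
            "bss": bss_base, "end": bss_base + bss_size}
```

With a hypothetical page size of h'1000, a text segment of h'1234 bytes
and a data segment of h'200 bytes, data starts at h'2000 and bss at
h'2200.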

The linker only edits four byte values.  For example, it is an error to
try to assemble ".byte some_label" if some_label is defined in another
file or in another segment in the same file.  ".double some_label" will
be relocated correctly.  "bsr some_label" will be assembled with a four
byte displacement if some_label is not defined in the current file.  The
linker will edit all of the following types of objects if they contain
references to other files or other segments: immediate values in
instructions, absolute addresses in instructions, displacements in
instructions, and data.

The linker does not support the external addressing mode.  The
programmer must explicitly create a module table and link tables if
external addressing is used.

In addition to linking individual relocatable files, the linker can link
relocatable files which are contained in library files.  Libraries are
created by first running AR32K to combine relocatable files into a
single archive file.  Then RANLIB is run to add a symbol index to the
archive file.  When the linker is run, it is given a list of files to
link.  It processes the files in order from left to right.  When
processing a library, the symbol index is searched for symbols which are
currently unresolved, i.e.  symbols referenced, but not defined, in
files which have already been processed.  When such a symbol is found,
the relocatable file in the library which defines the symbol is linked
into the new object file being created.  The symbol index is searched
repeatedly until a complete scan of the index finds no new unresolved
symbols.  This makes the ordering of the files in the library
unimportant.
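
The repeated scan can be sketched in Python as follows.  The data
structures are invented for illustration and do not reflect the actual
layout of the symbol index or of LD32K's internals.

```python
def select_library_members(unresolved, index, members):
    """Pick the library members needed to resolve the given symbols.

    unresolved: symbols referenced but not defined by files processed so far
    index:      dict mapping symbol name -> name of the member defining it
    members:    dict mapping member name -> (defined symbols, referenced symbols)
    Returns (members linked, in order; symbols still unresolved).
    """
    unresolved = set(unresolved)
    defined = set()
    linked = []
    found_one = True
    while found_one:                      # rescan until nothing new is found
        found_one = False
        for symbol, member in index.items():
            if symbol in unresolved and member not in linked:
                defines, references = members[member]
                linked.append(member)
                defined |= defines
                # A newly linked member may itself need other symbols.
                unresolved |= references
                unresolved -= defined
                found_one = True
    return linked, unresolved
```

If member a.o defines foo and references bar, and bar is defined by a
member that appears earlier in the index, the first scan links a.o and
a later scan picks up the member defining bar.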

The special symbol "start" is considered to be unresolved when the
linker starts processing files.  If an executable file is being created,
then "start" must be defined in some file or library.  The value of the
"start" symbol is recorded in the new object file; the operating system
should initialize the program counter to this value when the program is
run.  Typically, programs are linked to a standard initialization
routine which runs before the user-written code runs.  If the "start"
symbol is defined in such a routine and the routine is placed in a
standard library which is always given in the list of files to link,
then the routine will automatically be linked into programs and will be
the first code run when the program is executed.

The linker creates several symbols when linking an executable file.  The
addresses of the beginning and end of the text segment are assigned to
the symbols _btext and _etext and the addresses of the beginning and end
of the data segment are assigned to the symbols _bdata and _edata.  The
address of the end of the bss segment is assigned to the symbol _end.
These symbols may be referenced by the program being linked.  For
example, if a program needed to have the bss segment initially all zero,
it could contain a loop to set to zero all bytes between _edata and
_end.  Because the text segment is rounded up to a multiple of PAGESZ
bytes, the text segment size is generally larger than the amount of
actual code.  The symbol _etext, on the other hand, is set to the
address of the first byte beyond the end of the actual code.

III.  The Tools and How to Run Them

A.  AS32K -- Assembler

Syntax: AS32K [options] <filename>.  This assembles the given file.  A
relocatable object file is created.  If <filename> ends with the suffix
".s", then the object file's name is the same except that the suffix is
replaced with ".o".  Otherwise, the suffix ".o" is added to the source
file name to produce the object file name.  The "-o <filename>" option
overrides this default; the object file name will be the given name.
The "-A" option causes a listing to be printed to the screen.  The "-a
<filename>" option causes a listing file to be made with the given name.
Example: "as32k -a foo.lst foo.s" assembles foo.s, produces an object
file foo.o, and produces a listing file foo.lst.
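
The naming rule can be written out as a short Python sketch.  The
function object_file_name is hypothetical and only mirrors the rule
stated above.

```python
def object_file_name(source_name, output_name=None):
    """Name of the object file AS32K would write for a given source file."""
    if output_name is not None:          # "-o <filename>" was given
        return output_name
    if source_name.endswith(".s"):
        return source_name[:-2] + ".o"   # replace the ".s" suffix
    return source_name + ".o"            # otherwise append ".o"
```

Thus foo.s produces foo.o, while a source file named foo.asm produces
foo.asm.o.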

B.  LD32K -- Linker

Syntax: LD32K [options] files.  The given files are linked and an
executable object file is created called "a.out".  The files may be
relocatable files or libraries.  The "-o <filename>" option causes the
object file to have the given name instead of "a.out".  The "-r" option
causes a relocatable file to be created (suitable for further linking)
instead of an executable file.  The option "-e <symbol name>" changes
the name of the start symbol from "start" to the given name.  The "-T
<hexadecimal address>" option relocates the text segment to be loaded at
the given address.  Normally, the text segment is relocated to be loaded
starting at address 0.  The "-D <hexadecimal address>" option relocates
the data segment to be loaded at the given address.  Normally, the data
segment is relocated to be loaded at the first page following the text
segment.  A convention that is sometimes used is to name libraries
lib<suffix>.a, for example "libc.a" is often used with the C language.
The shorthand "-l<suffix>" is allowed to specify such files, e.g.  "-lc"
links the file "libc.a".  Example: "ld32k -o my_prog foo.o bar.o" links
the object or library files foo.o and bar.o to produce an executable
file my_prog.
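
The "-l<suffix>" shorthand is a simple name substitution, sketched here
in Python.  The function expand_library_option is hypothetical, and the
sketch says nothing about which directories the linker searches.

```python
def expand_library_option(argument):
    """Turn "-l<suffix>" into "lib<suffix>.a"; pass other arguments through."""
    if argument.startswith("-l") and len(argument) > 2:
        return "lib" + argument[2:] + ".a"
    return argument
```

For example, "-lc" becomes "libc.a", while "foo.o" is returned
unchanged.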

C.  AR32K -- Archive Manager

Syntax: AR32K [options] <archive file> <file list>.  An archive file is
one large file which contains copies of one or more smaller files.  It
is sometimes more convenient to have a collection of files organized
into a single file than to store them as individual files.  Libraries
are an important example.  Many frequently used subroutines, assembled
into relocatable files, can be combined into a library.  The linker
knows how to selectively link from a library exactly those files which
have been referenced by other files being linked.  A library is produced
by first using AR32K to combine relocatable files into an archive and
then using RANLIB to add an additional file, an index, to the archive.

AR32K is a program which manages archives: it adds, deletes and extracts
files to and from an archive.  "AR32K -a <archive file> <file list>"
appends the files named in the file list to the end of the given
archive.  The archive is created if it does not already exist.  "AR32K
-p <archive file> <file list>" prepends the files named in the file list to
the beginning of the given archive.  The archive is created if it does
not already exist.  When files are added to the archive, the copies in
the current directory are unchanged.  "AR32K -x <archive file> <file
list>" copies the listed files from the archive into files in the
current directory.  The archive is unchanged.  Existing files in the
directory with names that match the extracted files are overwritten.
"AR32K -d <archive file> <file list>" deletes the listed files from the
archive.  "AR32K -t <archive file>" lists the files stored in the
archive.

D.  NM32K -- Symbol Table Program

Syntax: NM32K [-v] <file list>.  NM32K lists the symbols in relocatable
or executable files and gives the addresses to which each symbol has
been relocated.  This information is invaluable for debugging programs.
If the -v option is given then additional information about the files is
printed.

E.  RANLIB -- Random Access Library Maker

Syntax: RANLIB <archive file>.  The given archive file must contain only
relocatable files.  This program converts the archive into a library by
prepending a new file, called __.SYMDEF, to the archive.  The new file
is a symbol table which is used by the linker.  The symbol table lists
the symbols from the relocatable files and tells which file each symbol
is in.  __.SYMDEF is not in readable form.

F.  EXTRACTT -- Text Segment Extractor

Syntax: EXTRACTT <input file> <output file>.  This program extracts the
text segment from the input file, which must be an executable or
relocatable file, and writes it to the output file.

G.  EXTRACTD -- Data Segment Extractor

Syntax: EXTRACTD <input file> <output file>.  This program extracts the
data segment from the input file, which must be an executable or
relocatable file, and writes it to the output file.
