.nr # 0 1
.EQ
delim $$
.EN
.TL
Introduction to the Csound File System
.AU
D. Gareth Loy
.AI
Computer Audio Research Laboratory
Center for Music Experiment, Q-037
University of California, San Diego
La Jolla, California  92093
.AB 
At least three distinct problems must be addressed in order
to record, store, manipulate and play back high quality digital sound samples. 
The problems are 1) the sheer volume of data involved, 2) the high speed
at which the data must travel and 3) the bookkeeping involved in managing
the sound's vital statistics.  Each of these problems goes beyond
what can be handled conveniently by the regular 
.UX
file system, 
so a special file system just for sound has been constructed,
called the 
.B csound
file system.  This document covers most of
what is needed to get started using it.
.AE
.bp
.NH 1
In General
.PP
The 
.B csound
file system has many points of similarity to the regular 
.UX
file system.
They both are used to store and retrieve information from 
mass storage systems such as disks.  But where the UNIX file system
is suitable for storing most things, the
.B csound
file system has been developed to cover its weaknesses in sound sample
storage.  In many cases, the names of commands which manipulate
.B csound
files are modeled on the names of their UNIX counterparts.
Where UNIX has an 
.B ls 
command to list the files in a directory, the 
.B csound
file system has an 
.B lsf 
command (``list sound files'').  Where UNIX has a 
.B mv
command to move files, the 
.B csound
file system has a 
.B mvsf.  
Similarly the UNIX 
.B cp
command, which copies files, has a counterpart named 
.B cpsf 
in 
.B csound. 
Lastly, the UNIX 
.B mkdir 
program (``make a directory'') becomes 
.B mksfdir.
.PP
From here the resemblance begins to fade.  Whereas UNIX allows you a variety of
ways to create new files, 
.B csound
has but two: 
.B sndout 
and 
.B record.
The 
.B record 
program creates a new sound file by reading the Analog to Digital
Converters (ADC) and storing the samples in a sound file.  
.B sndout
reads its standard input as the source of sound samples to be stored
on the sound file system.  
.PP
Similarly, there are two ways to get sound back from the
.B csound
file system: 
.B sndin 
and 
.B play.
.B play 
writes samples from the 
.B csound 
file system
to the Digital to Analog Converters (DAC),
which allow a
sound to be heard.  
.B sndin 
reads sound samples from the 
.B csound
file system and writes
them on its standard output. 
You can keep your sense of direction as to what ``in'' and ``out'' mean for
.B sndin
and
.B sndout
by remembering that the direction of transfer is always
.I
with respect to yourself.
.R
That is,
.B sndout
sends your samples
.I out
to the 
.B csound
file system,
.B sndin
reads them back
.I in.
.NH 1
Sndout
.PP
.B sndout 
reads its standard input and writes what it finds
there to a file on the 
.B csound
file system.
In order to demonstrate 
.B sndout 
we must first have a program that writes
samples for 
.B sndout 
to read.
For example, at CARL there is a program called 
.B wave 
which produces a 1 second sine waveform (sampled
at 16KHz; sampling rates will be discussed further
below).
.B wave
writes sound samples on its 
standard output suitable for input to 
.B sndout.  
We can pipe this output to 
.B sndout 
like this:
.DS
% wave | sndout waveform
.DE
This action causes the samples to be fed from 
.B wave 
to 
.B sndout, 
which sends
them to the 
.B csound
file system
to be stored in a file called 
.I waveform.  
To be explicit, samples 
will be saved in a file whose full name 
is 
.I /some_filesystem/your_login_name/waveform
where 
.I some_filesystem
is the name of a 
.B csound
filesystem.  For example, at CARL, there are currently three possibilities
for this: snda, sndb and sndc.
You need only remember the 
.I waveform 
part when you
are dealing with files you yourself have written.
More on this subject later.
.PP
Now that the sound is stored in a file on the 
.B csound
file system
we can play it by saying:
.DS
% play waveform
.DE
.PP
(Note: 
.B play 
will only work for files stored on the 
.B csound
file system.  
It will 
.I not 
work
for files on the UNIX file system).
.PP
Let us say that after hearing the sound we 
decided 
to rescale its amplitude.
A program to do this is called
.B gain.  
We would
use 
.B sndin 
to read the samples from the file
.I waveform,
then pipe them to 
.B gain.
Lastly, we pipe them back to the 
.B csound
file system to be saved.
.DS
% sndin waveform | gain .7 | sndout wave1
.DE
A couple of observations: 
.B gain
has one argument which is
a coefficient (in this case .7) with which to scale the signal on
its standard input.
Also, note that in writing out the file with 
.B sndout, 
we created a new file, 
.I wave1, 
instead
of simply writing back to 
.I waveform.  
In general, 
.I
it is not possible to both
read and write the same sound file at the same time.  
.R
While this is not strictly true under all circumstances, it is for the
ones being described here.  What would happen, if for instance we gave
the command
.DS
% sndin foo | sndout foo
.DE
is that 
.B sndout
would create a new file
.I foo
before 
.B sndin
had a chance to read the old one.  The net result would be that the contents
of the file would be lost.
.PP
If we really wanted this scaled-down version of the file to be called 
.I waveform,
then we could use the command
.B mvsf 
to ``move'' the sound to a file of a different name, e.g.:
.DS
% mvsf wave1 waveform
.DE
.NH 1
Sound File Space Allocation
.PP
Former releases of this software utilized a
.I
fixed contiguous block 
.R
scheme of storage allocation which required that the user
know in advance of writing a sound file how much storage space would be
required for it.  This was sufficiently obnoxious that
it has been extended to use a
.I
variable noncontiguous block
.R
scheme which alleviates this requirement almost entirely.  This
means it is no longer necessary to specify sizes or durations; space
is simply claimed as needed.  One program,
.B record,
still uses the contiguous block method for reasons of efficiency (see
below).
.NH 1
About Samples
.PP
Sound is stored digitally as a stream of numbers called 
.I samples.
Sound samples represent the instantaneous values of an
acoustic waveform somewhat in
the same fashion that successive frames of a moving picture store 
visual motion. 
.NH 2
Sample Representation
.PP
There are three representations of samples:
.IP \(bu
binary short integer (called
.I shortsams),
.IP \(bu
binary floating point (called
.I floatsams),
and
.IP \(bu
ASCII (you were expecting, perhaps,
.I jetsams?).
.LP
.I Shortsam
format is required by the DAC and ADC converters.
It is in this format that samples are (usually) stored on the 
.B csound
file system.  
.I Floatsam 
format
is capable of a much wider dynamic range
than 
.I shortsam, 
and is (usually) 
used by all CARL programs whenever programs pass samples between themselves
via the standard input/output.  
.I ASCII\u\s-2\n+#\s0\d
.FS
\n#.  ASCII stands for American Standard Code for Information Interchange.
.FE
is human-readable format, presented whenever 
a program notices that its standard output is connected to
a terminal.  As the term ``ASCII'' is not very mnemonic, 
and covers a broader context than just numeric data, we adopt the term 
.I Arabic
to refer to generalized human-readable numeric format.
.PP
.I Shortsams
are 16-bit, two's complement, which means they have
a range of integer values from \(mi32768 to +32767.  The DACs convert these
numbers to voltages between roughly \(mi10 and +10 volts.  Each 16-bit sample
occupies two 8-bit bytes on the disk.  
.I Floatsams
have a precision of about 6 significant decimal digits.
Only values in the 
.I 
signed unit interval
.R
(that is, values in the interval [\(mi1,+1])
are (ordinarily) used to represent sound sample data.
Floatsams are 32 bits, and occupy 4 bytes.
All CARL 
programs which must convert from shortsam to floatsam sample formats
equate floatsam \(mi1.0 with shortsam \(mi32767 and floatsam +1.0
with shortsam +32767.  (Note that the value \(mi32768 is not ordinarily used).
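.PP
As an illustration only, the scaling convention just described can be
sketched in Python.  The function names are hypothetical and are not
part of any CARL program, and the clipping of out-of-range input is an
assumption.
.DS
```python
# Sketch only: these names are hypothetical, not CARL library routines.
FULL_SCALE = 32767  # shortsam value equated with floatsam 1.0

def float_to_short(x):
    """Map a floatsam in [-1, +1] to a 16-bit shortsam."""
    x = max(-1.0, min(1.0, x))      # assumption: clip out-of-range input
    return int(round(x * FULL_SCALE))

def short_to_float(s):
    """Map a shortsam back into the signed unit interval."""
    return s / FULL_SCALE

print(float_to_short(1.0), float_to_short(-1.0), short_to_float(32767))
```
.DE
Because both ends of the signed unit interval map to 32767 in
magnitude, the value \(mi32768 is never produced by this sketch.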
.PP
As already mentioned, samples are stored on disk as shortsams.
But 
.B sndout,
(like a good CARL program)
reads floatsams on its standard input.
Thus, one of the actions of 
.B sndout
is to convert from floatsams to shortsams before storing samples on the 
disk.  Conversely, 
.B sndin
must first convert shortsams to floatsams.
.PP
Programs can tell whether their standard input/output is connected to
a terminal or a file or pipe.  All CARL and
.B csound
programs which write sample data via their standard output first
determine whether the output is a terminal.  If so, the samples are 
printed in Arabic
format suitable for display on a terminal, otherwise they are written as
floatsams.
There are several specific formats of Arabic printout for CARL programs.
Most programs default to printing a sample index number followed on
the same line by the sample value in floating point format.  
Here as well, samples are subject to scaling.  E.g., 
shortsams will be converted
to floatsams within the signed unit interval, then printed.  Some CARL
programs allow you to view the sample stream in other Arabic formats,
such as octal and hexadecimal, or to substitute time in seconds for
the sample index (cf. btoa(1carl), xform(1carl)).
.NH 2
Sample Frames
.PP
A 
.I
sample frame 
.R
is taken to mean one sample for each channel.  
For mono files, each sample is a sample frame.
In multi-channel
files, the channels are stored in sample-interleaved order.  
For instance, in a 4-channel file 
the individual samples are stored:
.DS
A,B,C,D, A,B,C,D, A,B,C,D, ...
.DE
The sample index 
.I i,  
which corresponds to a particular time in seconds
.I T,
and channel
.I a, 
at sampling rate
.I R,
is given by 
.EQ
i ~=~ TRN + a
.EN
where
.I N
is the number of channels in a sample frame.
The time corresponding to a particular sample index (to within one sample frame) is
.EQ
T ~=~ i over {RN}
.EN
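.PP
The indexing relations above can be tried out with a short Python
sketch.  It assumes that channels are numbered from 0 and that a time
falling between frames is truncated to the preceding frame; neither
assumption is stated by the file system itself, and the function names
are hypothetical.
.DS
```python
def sample_index(t, rate, nchans, chan):
    """Index of the sample for channel chan (0-based) at time t seconds."""
    frame = int(t * rate)            # whole sample frames before time t
    return frame * nchans + chan

def sample_time(i, rate, nchans):
    """Time in seconds of the sample frame containing sample index i."""
    return (i // nchans) / rate

i = sample_index(1.0, 16384, 4, 2)
print(i, sample_time(i, 16384, 4))
```
.DE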
.PP
The number of channels is not limited by the
.B csound
file system.
However, there is a limit placed on the number of channels that can
be converted by the DACs and ADCs.  The current limits at CARL are
four channels of DAC and two channels of ADC.
.NH 2
Calculating Length of Files
.PP
Length of sound files in the
.B csound
file system
is measured in units of 
.I cylinders.
A
.I cylinder
is a large unit of storage on a disk, comprising roughly 300k to 400k
bytes on disks used at CARL.
Knowing the byte size
of a sample and how many there are per second allows us to calculate
the storage requirements of sound files of different duration.  For
instance, if we take a typical sample rate of 16384 samples per second,
one second of mono shortsams requires 
.EQ
{2 ~ bytes} over {sample} {16384 ~ samples} over {second} ~=~ 
{32768 ~ bytes} over {second}.
.EN
The actual cylinder size of one of the CARL disks (CDC 9766, mounted
as /snd1) is 311296 bytes.
Thus, a cylinder
can store 311296/32768 = 9.5 seconds worth of shortsams at that
sampling rate.  
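.PP
The same arithmetic, written as a Python sketch so that other sampling
rates and channel counts can be tried.  The function name is
hypothetical; the cylinder size is the CDC 9766 figure quoted above.
.DS
```python
CYLINDER_BYTES = 311296    # CDC 9766 cylinder size quoted in the text
SHORTSAM_BYTES = 2         # one 16-bit shortsam

def seconds_per_cylinder(rate, nchans=1):
    """Seconds of shortsam sound that fit in one cylinder."""
    bytes_per_second = SHORTSAM_BYTES * rate * nchans
    return CYLINDER_BYTES / bytes_per_second

print(seconds_per_cylinder(16384))
print(seconds_per_cylinder(16384, nchans=2))
```
.DE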
.NH 2
Sample Rates
.PP
The 
.I 
sample rate
.R
is the number of sample frames sent to (or fetched from)
the converters per second.
It is possible to record a sound with the
.B record
program at nearly any sampling rate, from about 50 samples per second
(sampling rates are conventionally measured in Hz units)
up to a maximum of 49152 Hz\u\s-2\n+#\s0\d,
.FS
\n#.  Sampling rates may be different at different sites.  At IRCAM
for instance, the high sampling rate is 48000 samples per second.
.FE
often written as 48K\u\s-2\n+#\s0\d.   
.FS
\n#.  The notation ``48K'' is to be read as $48 ~ * ~ 1024 ~ = ~ 49152$.
At IRCAM, one could say 48k, which is $48 ~ * ~ 1000$.
.FE
At CARL, two sample rates are typically used, 
48K Hz for high-quality sound and 16K Hz for tests and speech.
If, as shown above, 9.5 seconds can be stored at 16K Hz,
the amount of sound that can be stored on a cylinder at 48K Hz
goes down to 3.166... seconds. 
But the frequency range that
can be represented goes up by a factor of three.  
For the 48K Hz rate, the
frequency response goes from 0Hz up to nearly 20KHz, 
which covers the range of
human hearing pretty well.  For the 16KHz rate, the frequency
range is from 0Hz
up to 6.5KHz.  
.PP
The sampling rate is analogous to the speed a tape recorder
runs when recording a sound.  Once recorded at some rate, playing
back the sound at any other
sampling rate merely changes the pitch and duration, not the sound quality
nor the storage requirements.  So the only places the sampling rate can be
set are where sound files are created: 
.B record 
and 
.B sndout.  
For both programs, you
supply a flag specifying the sampling rate you want.  For instance:
.DS
% record \(miR48K \(miT6 foo
.DE
will create sound file foo with the fast sampling rate\u\s-2\n+#\s0\d.   
.FS
\n#.   
.B csound
programs understand 48K to mean 49152 by interpreting the 'K' as a
.I postoperator.  
See below.
.FE
If you do not supply a sampling rate flag, 
.B record
uses the 48K Hz rate.
Also, notice the presence of the 
.B \(miT 
flag given to 
.B record.  
For 
.B record, 
you must always supply the length
of the file desired with a \(miTN flag, where N is the duration in
seconds desired.  This is required because 
.B record
can only store sound on contiguous blocks of the disk, and it has
to know in advance how big a block it will need.  The actual duration
of the file will be rounded up to the next cylinder boundary.
.NH 1
More on File Lengths
.PP
Because sound files tend to be quite large, the size of a file
can be significant with respect to the amount of storage available.
Some benchmarks are worth pointing out.
.PP
As mentioned above,
the size of a file storage unit on the
.B csound
file system is a cylinder.  We have shown that
the CDC 9766 disk holds 9.5 seconds per cylinder at 16K Hz.
The space requirement for stereo is exactly double that for mono, and
for N channels, is N times the mono length.
For the 48K sampling rate, space requirements are multiplied by
another factor of 3.
.PP
Let us figure the storage capacity of this disk.
It has 821 cylinders available for sound
storage.  This comes out to 
.EQ
{821 ~ cyl} {311296 ~ bytes} over {cyl} {1 ~ sec} over {32768 ~ bytes}
{1 ~ minute} over {60 ~ sec} ~ = tilde ~ 130 ~ minutes
.EN
or roughly 2.2 hours of mono sound storage available
on the whole disk at that sampling rate.  At 48K that becomes
about 43 minutes of mono.
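.PP
For disks of other sizes, the whole-disk figure can be recomputed with
a sketch such as the following.  The function name and its parameters
are hypothetical; the cylinder count and cylinder size are the ones
given above for the CDC 9766.
.DS
```python
def disk_minutes(cylinders, cylinder_bytes, rate, nchans=1):
    """Minutes of 16-bit sound that fit on a disk of the given geometry."""
    total_bytes = cylinders * cylinder_bytes
    bytes_per_second = 2 * rate * nchans    # 2 bytes per shortsam
    return total_bytes / bytes_per_second / 60.0

print(round(disk_minutes(821, 311296, 16384)))    # mono at 16K
print(round(disk_minutes(821, 311296, 49152)))    # mono at 48K
```
.DE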
.PP
Another useful benchmark that only requires you to know how many
megabytes of storage are available on a disk to know how much sound
it can store goes as follows.  The CDC 9766 stores 300MB (megabytes,
or 300,000,000 bytes) of data.  Figuring for a minute that 48K is
nearly 50,000, that means that one second of the fast sampling
rate requires
.EQ
{50000 ~ samples} over {second} 
{2 ~ bytes} over {sample} ~=~ {100,000 ~ bytes} over
{second}
.EN
or .1MB/second.  Inverting this gives us the rough benchmark
of 10 seconds per megabyte of storage for the fast sampling rate.
This comes out to 
.EQ
{300~*~10} over 60 ~=~ 50 ~ minutes
.EN
of storage, which is off by 7 minutes but a useful thing to remember
nonetheless.
.PP
For comparison, the three RA81 disks (mounted as /snda, /sndb and /sndc
at CARL) each hold 450MB.  How much sound can they each hold?
.PP
The moral here is that
there is obviously a tradeoff of high frequency response
.I vs. 
storage space needed.  Since the sound file storage space
is finite, and since one can get 3 times as much sound sampled
at the lower rate to fit on a disk, 
there is a strong impetus to work most
of the time at the slower sampling rate.  Not only do files written at
the 16K sampling rate occupy less space, they also can be computed
three times as fast.  So here is a good rule of thumb: use the slow rate
for tests or sketches, and the faster rate for final products.
.NH 1
Sample Data Headers
.PP
All programs that write a sample stream on their standard output 
prefix that stream with a 
.I header
containing information about the sampling rate, number of channels,
and possibly other things.  For instance, 
.B sndin
writes such a header, as does
.B cmusic
and even
.B wave.
.B sndout 
reads such a header, which means that it is usually not
necessary to tell 
.B sndout
explicitly about the kind of data it is getting; it can figure it
out from the header.  All CARL programs are able to deal with headers,
so that the header will survive being piped through any number
of programs.
.NH 1
Arithmetic Expressions In Flags
.PP
You may use arithmetic expressions in calculating numeric
values written after flags.  For instance, these three commands are
equivalent:
.DS
% sndout \(miR49152
% sndout \(miR48K
% sndout \(miR"3*2^14"
.DE
In the second example, the 
.B K 
is a postoperator, 
which acts as a multiplier
.I 
times 1024.  
.R
Other postoperators include 
.B k
which multiplies by 1000.
Also,
using the exponentiation operator, 3*2^14 = 49152.  
The binary operators 
.B +,
.B \(mi,
.B *,
and
.B /,
as well as 
.B ()
are also available\u\s-2\n+#\s0\d.   
.EQ
delim off
.EN
.FS
\n#.  Care must be exercised
when using the binary operator ``*''
(for multiplication) and parentheses.
The UNIX shell gives these characters a special meaning of its own
(``*'' is expanded as a file name pattern, and parentheses group commands),
usually with unsuccessful results which abort the command.
It is necessary to enclose expressions with ``*''
and ``()''
in them in double quote marks ``"'' to avoid this.
On the other hand, ``+'', ``/'', and ``-'' work fine anywhere.
.FE
Other postoperators which are available are 
.B k
for ``times 1000'',
.B S
for samples, 
.B s 
for seconds, 
.B ms 
for milliseconds, and 
.B m 
for minutes.
These are useful for retrieving parts of a sound file, described next.
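.PP
To make the idea of a postoperator concrete, here is a small Python
sketch that handles only the two multipliers whose meaning is fixed,
.B K
and
.B k.
It is an illustration under that assumption, not the parser used by the
.B csound
programs, and the function name is hypothetical.
.DS
```python
# Hypothetical sketch; the real flag parser is not shown in this document.
MULTIPLIERS = {"K": 1024, "k": 1000}

def apply_postoperator(text):
    """Evaluate a number with an optional trailing K or k postoperator."""
    if text and text[-1] in MULTIPLIERS:
        return float(text[:-1]) * MULTIPLIERS[text[-1]]
    return float(text)

print(apply_postoperator("48K"), apply_postoperator("48k"))
```
.DE
The time and sample postoperators are left out because their effect
depends on the sampling rate of the file being read.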
.NH 1
Sndin
.PP
.B sndin 
can
read flags in addition to a file name
to determine its operation.
We'll focus here on the begin time and end time flags.
For instance:
.DS
% sndin \(mib1 \(mie1.1 test
.DE
reads file test between times 1 and 1.1 seconds.  That is, it starts
with the 16384th sample (assuming the sampling rate is 16K), 
and writes out 1638 samples.  
If you say nothing, or
if you simply supply the name of a file (as in our examples above), 
these times default to the beginning and end of the file.
We could have
asked it to calculate time in samples instead of seconds by using the
.B S
postoperator. 
For instance, to look at
the first 200 sample frames\u\s-2\n+#\s0\d,
.FS
\n#.  Note we said sample frames: if the file being read were stereo, we would
read out 400 actual samples, two from each frame.
.FE
regardless of sampling rate, we would say
.DS
% sndin \(mib0 \(mie200S
.DE
.EQ
delim $$
.EN
Postoperators are also cumulative; we could have said 200KS to read in the
first $200 * 1024$ samples.
.PP
It is often more convenient to specify a begin time and duration rather
than a begin time and end time.  To this end, the \(mib and \(mid flags
behave as you would expect:
.DS
% sndin \(mib3 \(mid64S
.DE
reads starting at three seconds, and goes for 64 samples. 
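.PP
The conversion from begin and end times to sample frames can be
sketched in Python as follows.  The function name is hypothetical, and
the truncation of fractional frames is an assumption; the document does
not say how
.B sndin
rounds.
.DS
```python
def frame_range(begin, end, rate):
    """Return (first frame, frame count) for times given in seconds."""
    first = int(begin * rate)        # assumption: truncate to whole frames
    count = int(end * rate) - first
    return first, count

print(frame_range(1, 1.1, 16384))
```
.DE
For the first example in this section, at the 16K sampling rate, this
gives a first frame of 16384 and a count of 1638, matching the figures
quoted there.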
.NH 1
Lsf \(mi Listing Sound Files
.PP
The
.B lsf
command,
if given by itself,
lists all the files in your current 
.B csound
directory.
It is possible to get a list of someone else's files simply by adding their
name as an argument, preceded by a slash:
.DS
% lsf /frm
.DE
lists the names of frm's files that happen to be on the same filesystem
as you are.  
If you are interested in only whether a
particular file exists, just name it:
.DS
% lsf /frm/joy
.DE
To refer to a different filesystem, prefix its name:
.DS
% lsf /sndb/frm
.DE
will print all files frm has on the /sndb filesystem.
.NH 1
Manual Pages For Sound Programs
.PP
There is more information about a sound file than
its name, which you can see by providing flags to 
.B lsf
to modify its behavior.
You can find out about these other flags and more about 
.B lsf 
itself, 
as well as all the rest of
the programs discussed here, by reading their entries
in the CARL section of the
UNIX Programmer's Manual.
These manual pages give all the gory details too numerous
to cover here.  The manual exists in two forms: hardcopy and on-line,
that is, in pressed wood pulp and
on the computer.  To read the on-line manual entry for 
.B lsf 
say:
.DS
% man lsf
.DE
The 
.B man 
command shows you a screenful at a time from the manual.
After a full page of text is displayed on the screen it prints ``\(mimore\(mi''
at the bottom of the screen and waits for
you to press the spacebar before showing the next screenful.  
If you would like to see all the sound file programs about
which there are manual page entries, try another command:
.DS
% apropos csound
.DE
This will give you a list (it goes on for more than one screenful; ask
someone to show you how to make it slow down) of
all the sound programs in the manual.  Each has its own manual entry
available with the 
.B man 
command.
.NH 1
Cpsf \(mi Copy Sound Files
.PP
This is a very simple program, similar to 
.B mvsf.  
But where 
.B mvsf 
simply
changes the name of a sound file, 
.B cpsf 
physically copies the samples into another file. 
For example,
.DS
% cpsf source destination
.DE
where 
.I source 
is the name of the file to be copied, 
.I destination 
is where
to put it.
.NH 1
Directories, and Filename Defaults
.PP
Just as the UNIX file system has a tree-structured directory, so does
.B csound.  
When you log in, unless you change it, you will be set up to access 
.B csound
files in your
.I home
.B csound
directory.
Your home directory may vary with the installation, and you may
also change it with the program
.B cdsf.
All 
.B csound
programs assume that an incomplete filename refers to your current
directory, which is your home directory unless you have changed it.  
For instance, when you just say 
.B lsf 
by itself this is the directory 
.B lsf
examines.  You can
create subdirectories (branches) from this directory with 
.B mksfdir.
For instance:
.DS
% mksfdir noises
.DE
creates directory /some_filesystem/your_login_name/noises.  
.PP
To use this directory, you must change your current sound file directory
to be this new one.  This is accomplished by the following statement:
.DS
% cdsf /some_filesystem/your_login_name/noises
.DE
Then saying 
.DS
% wave | sndout fudge
.DE
will write a file /some_filesystem/your_login_name/noises/fudge.  
This state of affairs will persist until you either log out or run
.B cdsf
again, specifying a different directory.
You can see that
.B cdsf 
determines where sound file programs will
look for, or create, files.  
You can change to other users' directories.
For instance:
.DS
% cdsf /some_filesystem/frm
.DE
will cause the command
.DS
% sndin joy
.DE
to read file /some_filesystem/frm/joy.  
.PP
Note: you must supply the entire path to
.B cdsf
when changing directories.  
In this way it is unlike its UNIX counterpart,
.B cd,
in that it does not support the idea of ``relative'' directories.
However, to change back to your home csound directory, you need only
say 
.B cdsf
by itself.
.PP
If you need to refresh your memory as to what directory you are in,
use the command
.B pwsf
which prints the current working sound file directory.
.NH 1
Csound Programs
.LP
Here is a list of some of the currently available 
.B csound
programs.  The number in the first column is an indication as to
the ``relevance'' the program has to beginning users, with 0 being
important, and 9 being relatively unimportant.  Some recent additions
may be missing from this list.
.DS L
0 cdsf - change csound file directory
0 lsf - list sound files, sound file directories
0 play - play sound file(s) through DACs
0 pwsf - print working sound file directory
0 rmsf - remove sound file(s)
0 sndin - read csound files onto standard output
0 sndout - write sound files.
1 cpsf - copy sound file
1 mvsf - move sound file
2 mksfdir - make a sound file directory
2 record - record sound file through ADCs
3 catsf - concatenate sound files
5 sndcmp - compare two sound files.
5 sndhist  - produce histogram of sound file
6 visf - edit sound file parameters
9 burpsf - free space compaction for csound file system
9 dumpsf - dump sound files to tape
9 restorsf - restore sound files from magtape
9 sfck - check sound file system for soundness
.DE
.NH 1
Epilogue
.PP
There, now you know more than you need to know about the sound file system.
Go make some music.
