.nr # 0 1
.EQ
delim $$
.EN
.TL
Introduction to BICSF, the Berkeley/IRCAM/CARL Sound Filesystem
.AU
D. Gareth Loy
.AI
Computer Audio Research Laboratory
Center for Music Experiment, Q-037
University of California, San Diego
La Jolla, California  92093
.AB 
At least three distinct problems must be addressed in order
to record, store, manipulate and play back high quality digital sound samples. 
The problems are 1) the sheer volume of data involved, 2) the high speed
at which the data must travel and 3) the bookkeeping involved in managing
the sound's vital statistics.  Each of these problems goes beyond
what can be handled conveniently by the regular 
.UX
file system. 
As a result, a special filesystem has been created for manipulating sound.
This document covers most of what is needed to get started using it.
.AE
.NH 1
A Bit of History
.LP
At least two special filesystems for managing sound under
the UNIX operating system have been constructed.  First came the
\fIcsound\fP(1carl) filesystem written by Gareth Loy at CARL.  This was taken
as a prototype by Rob Gross and Dan Timis at IRCAM, who created the IRCAM
sound filesystem.  While their implementation was original, it
incorporated many features of \fIcsound\fP(1carl) making it possible for
users to switch over easily.  The IRCAM work has been collected
here together with some additional programs, and retooled to be simpler to
install, maintain and operate.  To distinguish it from the IRCAM sound filesystem,
I've named it BICSF, for the ``Berkeley/IRCAM/CARL'' Sound Filesystem, those being
the principal contributors to the architecture of the filesystem.
Other contributors include Brad Garton at Columbia, who wrote the Digisound-16
device driver and its associated play and record programs.
.NH 1
In General
.PP
BICSF
is very similar to the regular
.UX
file system.
Both are used to store and retrieve information from 
mass storage systems such as disks.  But where the UNIX file system
is suitable for storing most things, 
BICSF
has been developed to cover its weaknesses in sound sample
storage.  In many cases, the names of commands which manipulate
BICSF
files closely parallel those of their UNIX counterparts.
Where UNIX has an 
.B ls 
command to list the files in a directory, the 
BICSF
file system has an 
.B lsf 
command (``list sound files'').  Where UNIX has a 
.B mv
command to move files, the 
BICSF
file system has a 
.B mvsf.  
Similarly the UNIX 
.B cp
command, which copies files, has a counterpart named 
.B cpsf 
in 
BICSF. 
Lastly, the UNIX 
.B mkdir 
program (``make a directory'') becomes 
.B mkdirsf.
In fact, the rule is that all BICSF programs have an \fIsf\fP suffix.\u\s-2\n+#\s0\d
.FS
\n#.  This is different from both \fIcsound\fP(1carl) and the IRCAM soundfile
system, where some programs had an \fIsf\fP suffix but some had a \fIsnd\fP prefix.
For users switching over to BICSF, there are aliases that allow old names to
effect the same operation, but the programs have all been renamed in the interest
of consistency.  A complete list of programs is reproduced below.
.FE
.PP
From here the resemblance begins to fade.  Whereas UNIX allows you a variety of
ways to create new files, 
BICSF
has but two: 
.B tosf 
and 
.B record.
The 
.B record 
program creates a new sound file by reading the Analog to Digital
Converters (ADC) and storing the samples in a sound file.  
.B tosf
reads its standard input as the source of sound samples to be stored
on the sound file system.  
(\fIcsound\fP and IRCAM users note: \fBtosf\fP
and \fBsndout\fP are synonymous.  See below.)
.PP
Similarly, there are two ways to get sound back from the
BICSF
file system: 
.B fromsf 
and 
.B play.
.B play 
writes samples from the BICSF file system to the Digital to Analog Converters (DAC),
which allow a sound to be heard.  
.B fromsf 
reads sound samples from the BICSF file system and writes them on its standard output. 
(\fIcsound\fP and IRCAM users note: \fBfromsf\fP
and \fBsndin\fP are synonymous.  See below.)
You can keep your sense of direction as to what ``to'' and ``from'' mean for
.B tosf
and
.B fromsf
by remembering that the direction of transfer is always
.I
with respect to yourself.
.R
That is,
.B tosf
sends your samples
.I to
the BICSF file system, while
.B fromsf
reads them back
.I from
BICSF.
.NH 1
tosf
.PP
.B tosf 
reads its standard input and writes what it finds
there to a file on the 
BICSF
file system.
In order to demonstrate 
.B tosf 
we must first have a program that writes
samples for 
.B tosf 
to read.
For example, at CARL there is a program called 
.B wave 
which produces a 1 second sine waveform (sampled
at 16KHz; sampling rates are discussed further
below).
.B wave
writes sound samples on its 
standard output suitable for input to 
.B tosf.  
We can pipe this output to 
.B tosf 
like this:
.DS
% wave | tosf waveform
.DE
This action causes the samples to be fed from 
.B wave 
to 
.B tosf, 
which sends
them to the 
BICSF
file system
to be stored in a file called 
.I waveform.  
To be explicit, samples 
will be saved in a file whose full name 
is 
.I /some_filesystem/your_login_name/waveform
where 
.I some_filesystem
is the name of a 
BICSF
filesystem.  For example, at CARL, there are currently three possibilities
for this:
\fIsnda\fP, \fIsndb\fP, and \fIsndc\fP.
At CMIL, another UCSD system, there is only one: \fIsnd\fP.
You need only remember the name
.I waveform 
when you
are dealing with files you yourself have written.
More on this subject later.
.PP
Now that the sound is stored in a file on the 
BICSF
file system
we can play it by saying:
.DS
% play waveform
.DE
.PP
(Note: 
.B play 
will only work for files stored on the 
BICSF
file system.  
It will 
generally not 
work
for files on the UNIX file system.)
.PP
Let us say that after hearing the sound we 
decided 
to rescale its amplitude.
A program to do this is called
.B gain.  
We would
use 
.B fromsf 
to read the samples from the file
.I waveform,
then pipe them to 
.B gain.
Lastly, we pipe them back to the 
BICSF
file system to be saved.
.DS
% fromsf waveform | gain .7 | tosf wave1
.DE
A couple of observations: 
.B gain
has one argument which is
a coefficient (in this case .7) with which to scale the signal on
its standard input.
Also, note that in writing out the file with 
.B tosf, 
we created a new file, 
.I wave1, 
instead
of simply writing back to 
.I waveform.  
In general, 
.I
it is not possible to both
read and write the same sound file at the same time.  
.R
While this is not strictly true under all circumstances, it is for the
ones being described here.  What would happen if, for instance, we gave
the command
.DS
% fromsf foo | tosf foo
.DE
is that 
.B tosf
would create a new file
.I foo
before 
.B fromsf
had a chance to read the old one.  The net result would be that the contents
of the file would be lost.
.PP
If we really wanted this scaled-down version of the file to be called 
.I waveform,
then we could use the command
.B mvsf 
to ``move'' the sound to a file of a different name, e.g.:
.DS
% mvsf wave1 waveform
.DE
.NH 1
About Samples
.PP
Sound is stored digitally as a stream of numbers called 
.I samples.
Sound samples represent the instantaneous values of an
acoustic waveform in somewhat
the same fashion that successive frames of a moving picture store 
visual motion. 
.NH 2
Sample Representation
.PP
There are three representations of samples:
.IP \(bu
binary short integer (called
.I shortsams),
.IP \(bu
binary floating point, (called
.I floatsams),
and
.IP \(bu
ASCII (you were expecting, perhaps,
.I jetsams?).
.LP
.I Shortsam
format is required by the DAC and ADC converters.
It is in this format that samples are (usually) stored on the 
BICSF
file system.  
.I Floatsam 
format
is capable of a much wider dynamic range
than 
.I shortsam, 
and is (usually) 
used by all CARL programs whenever programs pass samples between themselves
via the standard input/output.  
.I ASCII\u\s-2\n+#\s0\d
.FS
\n#.  ASCII stands for American Standard Code for Information Interchange.
.FE
is a human-readable format, presented whenever 
a program notices that its standard output is connected to
a terminal.  As the term ``ASCII'' is not very mnemonic, 
and covers a broader context than just numeric data, I adopt the term 
.I Numeric
to refer to generalized human-readable numeric format, including the following.
.DS
.ta 2i
integers:	0 1 32768,
floating point:	0.0 1. 3276.8 3.2768e3
.DE
.PP
.I Shortsams
are 16-bit, two's complement binary numbers, which means they have
a range of integer values from \(mi32768 to +32767.  The DACs convert these
numbers to voltages between roughly \(mi10 and +10 volts.  Each 16-bit sample
occupies two 8-bit bytes on the disk.  
.I Floatsams
are 32-bit binary floating point numbers, and
have a precision of about 6 significant decimal digits.
Only values in the 
.I 
signed unit interval
.R
(that is, values in the interval [\(mi1,+1])
are (ordinarily) used to represent sound sample data.
Each floatsam occupies four 8-bit bytes.
All programs which must convert from shortsam to floatsam sample formats
equate floatsam \(mi1.0 with shortsam \(mi32767 and floatsam +1.0
with shortsam +32767.  (Note that the value \(mi32768 is not ordinarily used.)
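.PP
As a rough illustration (not the actual CARL code), the scaling rule just
described can be sketched in Python.  The clipping of out-of-range values and
the rounding to the nearest integer are assumptions of mine, and the names
FULL_SCALE, float_to_short and short_to_float are hypothetical.
.DS
```python
FULL_SCALE = 32767  # shortsam value equated with floatsam 1.0


def float_to_short(x):
    """Scale a floatsam to a shortsam (clipping and rounding are assumed)."""
    clipped = max(-1.0, min(1.0, x))
    return int(round(clipped * FULL_SCALE))


def short_to_float(s):
    """Scale a shortsam back into the signed unit interval."""
    return s / FULL_SCALE
```
.DE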
.NH 2
Some rules about floatsams and shortsams
.IP (1)
Samples are (ordinarily) stored on disk as shortsams so that they can be
directly played by the DACs (and also because they require half the storage
space of floatsams).
.IP (2)
Samples are (ordinarily) passed between programs on pipes as floatsams.
.IP (3)
Programs (ordinarily) act internally on samples in their floatsam form.
.PP
So,
.B tosf
ordinarily expects to read floatsams on its standard input.
Thus, one of the default actions of 
.B tosf
is to convert from floatsams to shortsams before storing samples on the 
disk.  Conversely, 
.B fromsf
must first convert shortsams to floatsams if they are stored as shortsams
on the disk.
.PP
Programs can tell whether their standard input/output is connected to
a terminal or a file or pipe.  All CARL and
BICSF
programs which write sample data via their standard output first
determine whether the output is a terminal.  If so, the samples are 
printed in Numeric
format suitable for display on a terminal, otherwise they are written as
floatsams.
There are several specific formats of Numeric printout for these programs.
Most programs default to printing sample values in floating point Numeric format.
Here as well, samples are subject to scaling.  E.g., 
shortsams will be converted
to floatsams within the signed unit interval, then printed.  Some
programs allow you to view the sample stream in other Numeric formats,
such as octal and hexadecimal, or to substitute time in seconds for
the sample index (cf. \fIbtoa\fP(1carl), \fIxform\fP(1carl)).
.NH 2
Sample Frames
.PP
A 
.I
sample frame 
.R
is taken to mean one sample for each channel.  
For mono files, each sample is a sample frame.
In multi-channel
files, the channels are stored in sample-interleaved order.  
For instance, in a 4-channel file 
the individual samples are stored:
.DS
A,B,C,D, A,B,C,D, A,B,C,D, ...
.DE
The sample index 
.I i,  
which corresponds to a particular time in seconds
.I T,
and channel
.I a, 
at sampling rate
.I R,
is given by 
.EQ
i ~=~ TRN + a
.EN
where
.I N
is the number of channels in a sample frame.
The time corresponding to a particular sample index is
.EQ
T ~=~ {i ~-~ a} over {RN}
.EN
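.PP
The indexing rule can be sketched in Python as follows.  This is only an
illustration under stated assumptions: channels and sample indices are counted
from 0, the time is rounded to the nearest whole frame, and the function names
are hypothetical.
.DS
```python
def sample_index(t, rate, nchans, chan):
    """Index of the sample for channel chan (from 0) at time t seconds."""
    frame = int(round(t * rate))   # whole sample frames before time t
    return frame * nchans + chan   # i = T*R*N + a


def sample_time(i, rate, nchans):
    """Time in seconds of the sample frame that contains sample index i."""
    frame = i // nchans            # discard the channel offset
    return frame / rate
```
.DE
For a 4-channel file at 16000 Hz, channel 2 at one second falls at sample index 64002.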
.PP
The number of channels is not limited by the
BICSF
file system.
However, there is a limit placed on the number of channels that can
be converted by the DACs and ADCs.  The current limits at CARL are
four channels of DAC and two channels of ADC.
At CMIL, the limit is stereo, both ways.
.NH 2
Calculating Length of Files
.PP
Length of sound files in the
BICSF
file system
is measured in units of 
sample frames.  As discussed above, the size of a sample frame varies with the number
of channels, and also with the number of bytes required to store each sample.
Knowing the byte size
of a sample and how many there are per second allows us to calculate
the storage requirements of sound files of different duration.  For
instance, if we take a typical sample rate of 16000 samples per second,
one second of mono shortsams requires 
.EQ
{2 ~ bytes} over {sample} {16000 ~ samples} over {second} ~=~ 
{32000 ~ bytes} over {second}.
.EN
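.PP
The same arithmetic extends to other channel counts and sample formats.
The following Python sketch is illustrative only; the function names are
hypothetical, and the byte sizes (2 for shortsams, 4 for floatsams) are the
ones given earlier.
.DS
```python
def bytes_per_second(rate, nchans=1, bytes_per_sample=2):
    """Storage consumed by one second of sound."""
    return rate * nchans * bytes_per_sample


def file_bytes(seconds, rate, nchans=1, bytes_per_sample=2):
    """Storage consumed by a sound of the given duration."""
    return int(seconds * bytes_per_second(rate, nchans, bytes_per_sample))
```
.DE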
.NH 2
Sample Rates
.PP
The 
.I 
sample rate
.R
is the number of sample frames sent to (or fetched from)
the converters per second.
It is possible to record a sound with the
.B record
program at nearly any sampling rate, from about 50 samples per second
(sampling rates are conventionally measured in Hz units)
up to a maximum of about 48000 Hz\u\s-2\n+#\s0\d,
.FS
\n#.  Sampling rates may be different at different sites.  At IRCAM
for instance, the high sampling rate is 48000 samples per second.
.FE
often written as 48k\u\s-2\n+#\s0\d.   
.FS
\n#.  The notation ``48k'' is to be read as $48 ~ * ~ 1000 ~ = ~ 48000$.
The notation ``48K'' on the other hand
is to be read as $48 ~ * ~ 1024 ~ = ~ 49152$.
.FE
At CARL, two sample rates are typically used:
48KHz for high-quality sound and 16KHz for tests and speech.
At IRCAM, the typical rates are 48kHz and 16kHz.  (The latter rates are much more common
because the recording industry adopted 48000Hz as a standard rate, along
with 44.1kHz.)
.PP
The amount of sound that can be stored in the same space varies with the sampling
rate.  A sound recorded at 16kHz will take up 1/3 of the space of a sound
of the same duration recorded at 48kHz.
For the 48kHz rate, the
frequency response goes from 0Hz up to nearly 20kHz, 
which covers the range of
human hearing pretty well.  For the 16kHz rate, the frequency
range is from 0Hz
up to roughly 6kHz.  
.PP
The sampling rate is analogous to the speed a tape recorder
runs when recording a sound.  Once recorded at some rate, playing
back the sound at any other
sampling rate merely changes the pitch and duration, not the sound quality
nor the storage requirements.  So the only places the sampling rate can be
set are where sound files are created: 
.B record 
and 
.B tosf.  
For both programs, you
may 
supply an (optional) flag specifying the sampling rate you want.  For instance:
.DS
% record \(miR48k foo
.DE
will record sound into file 
.I foo 
with the fast sampling rate.   
If you do not supply a sampling rate flag, 
.B record
uses the 48kHz rate.
For 
.B record, 
you must have an existing sound file to record into, or create
one in advance with 
.B createsf.
This is required because 
.B record
is too busy simply writing samples to the disk to be able to also spend time
figuring out where to put the next block of samples.  This step must be
done in advance.
So the actual sequence for recording a new soundfile might be this:
.DS
% createsf -r 16k -c 2 -i=int myfile
% record myfile
.DE
.NH 1
More on File Lengths
.PP
Because sound files tend to be quite large, the size of a file
can be significant with respect to the amount of storage available.
Some benchmarks are worth pointing out.
.PP
Let us figure the storage capacity of a 300 megabyte (MB) disk.
It has 821 cylinders available for sound
storage.  This comes out to 
.EQ
{821 ~ cyl} {311296 ~ bytes} over {cyl} {1 ~ sec} over {32768 ~ bytes}
{1 ~ minute} over {60 ~ sec} ~ = tilde ~ 130 ~ minutes
.EN
or roughly 2.2 hours of mono sound storage available
on the whole disk at the 16K sampling rate (32768 bytes per second).  At 48K that becomes
about 43 minutes of mono.
.PP
Another useful benchmark that only requires you to know how many
megabytes of storage are available on a disk to know how much sound
it can store goes as follows.  The CDC 9766 stores 300MB (megabytes,
or 300,000,000 bytes) of data.  Figuring for a minute that 48K is
nearly 50,000, that means that one second of the fast sampling
rate requires
.EQ
{50000 ~ samples} over {second} 
{2 ~ bytes} over {sample} ~=~ {100,000 ~ bytes} over
{second}
.EN
or .1MB/second.  Inverting this gives us the rough benchmark
of 10 seconds per megabyte of storage for the fast sampling rate.
This comes out to 
.EQ
{300~*~10} over 60 ~=~ 50 ~ minutes
.EN
of storage, which is off by about 7 minutes from the 43 minutes
computed above, but a useful thing to remember
nonetheless.
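.PP
Both benchmarks can be checked with a few lines of Python.  This is only a
sketch; the function names are hypothetical, and the disk geometry (821
cylinders of 311296 bytes) is the one quoted above.
.DS
```python
def minutes_of_sound(disk_bytes, rate, nchans=1, bytes_per_sample=2):
    """Exact capacity in minutes for a given sampling rate."""
    return disk_bytes / (rate * nchans * bytes_per_sample) / 60.0


def rule_of_thumb_minutes(megabytes, seconds_per_mb=10):
    """Rough capacity at the fast rate: about 10 seconds per megabyte."""
    return megabytes * seconds_per_mb / 60.0


disk_bytes = 821 * 311296                      # usable sound storage
slow = minutes_of_sound(disk_bytes, 16 * 1024)  # 16K rate, mono shortsams
fast = minutes_of_sound(disk_bytes, 48 * 1024)  # 48K rate, mono shortsams
rough = rule_of_thumb_minutes(300)              # 300MB disk
```
.DE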
.PP
The moral here is that
there is obviously a tradeoff of high frequency response
.I vs. 
storage space needed.  Since the sound file storage space
is finite, and since one can get 3 times as much sound sampled
at the lower rate to fit on a disk, 
there is a strong impetus to work most
of the time at the slower sampling rate.  Not only do files written at
the 16k sampling rate occupy less space, they also can be computed
three times as fast.  So here is a good rule of thumb: use the slow rate
for tests or sketches, and the faster rate for final products.
.NH 1
Sample Data Headers
.PP
All programs that write a sample stream on their standard output 
prefix that stream with a 
.I header
containing information about the sampling rate, number of channels,
and possibly other things.  For instance, 
.B fromsf
writes such a header, as do
.B cmusic
and even
.B wave.
.B tosf 
reads such a header, which means that it is usually not
necessary to tell 
.B tosf
explicitly about the kind of data it is getting; it can figure it
out from the header.  All CARL programs are able to deal with headers,
so that the header will survive being piped through any number
of programs.
.NH 1
Arithmetic Expressions In Flags
.PP
You may use arithmetic expressions in calculating numeric
values written after flags.  For instance, these three commands are
equivalent:
.DS
% tosf \(miR49152
% tosf \(miR48K
% tosf \(miR"3*2^14"
.DE
In the second example, the 
.B K 
is a postoperator, 
which acts as a multiplier
.I 
times 1024.  
.R
Other postoperators include 
.B k
which multiplies by 1000.
Also,
using the exponentiation operator, 3*2^14 = 49152.  
The binary operators 
.B +,
.B \(mi,
.B *,
and
.B /,
as well as 
.B ()
are also available\u\s-2\n+#\s0\d.   
.EQ
delim off
.EN
.FS
\n#.  Care must be exercised
when using the binary operator ``*''
(for multiplication) and parentheses.
The UNIX shell gives these characters special meaning: it treats ``*'' as a
filename ``wildcard'' and searches for matching files, and it treats
parentheses as command grouping,
usually with unsuccessful results which abort the command.
It is necessary to enclose expressions with ``*''
and ``()''
in them in double quote marks ``"'' to avoid this.
On the other hand, ``+'', ``/'', and ``-'' work fine anywhere.
.FE
Further postoperators are
.B S
for samples, 
.B s 
for seconds, 
.B ms 
for milliseconds, and 
.B m 
for minutes.
These are useful for retrieving parts of a sound file, described next.
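.PP
To make the two multiplier postoperators concrete, here is a small Python
sketch.  It is not the BICSF expression parser; it handles only a plain number
followed by at most one K or k, and the names MULTIPLIERS and expand are
hypothetical.
.DS
```python
MULTIPLIERS = {"K": 1024, "k": 1000}  # the two multiplier postoperators


def expand(text):
    """Expand a number with an optional trailing K or k, e.g. 48K."""
    if text and text[-1] in MULTIPLIERS:
        return float(text[:-1]) * MULTIPLIERS[text[-1]]
    return float(text)
```
.DE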
.NH 1
fromsf
.PP
.B fromsf 
can
read flags in addition to a file name
to determine its operation.
We'll focus here on the begin time and end time flags.
For instance:
.DS
% fromsf \(mib1 \(mie1.1 test
.DE
reads file test between times 1 and 1.1 seconds.  That is, it starts
with the 16384th sample (assuming the sampling rate is 16K), 
and writes out 1638 samples.  
If you say nothing, or
if you simply supply the name of a file (as in our examples above), 
these times default to the beginning and end of the file.
We could have
asked it to calculate time in samples instead of seconds by using the
.B S
postoperator. 
For instance, to look at
the first 200 sample frames\u\s-2\n+#\s0\d,
.FS
\n#.  Note we said sample frames: if the file being read were stereo, we would
read out 400 actual samples, two from each frame.
.FE
regardless of sampling rate, we would say
.DS
% fromsf \(mib0 \(mie200S
.DE
.EQ
delim $$
.EN
Postoperators are also cumulative; we could have said 200KS to read in the
first $200 * 1024$ samples.
.PP
It is often more convenient to specify a begin time and duration rather
than a begin time and end time.  To this end, the \(mib and \(mid flags
behave as you would expect:
.DS
% fromsf \(mib3 \(mid64S
.DE
reads starting at three seconds, and goes for 64 samples. 
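.PP
The sample counts quoted in this section can be reproduced with the
following Python sketch.  Truncating fractional frames is an assumption on my
part (it matches the 1638 samples quoted above), and the function names are
hypothetical.
.DS
```python
def frame_range(begin_s, end_s, rate):
    """First frame and number of frames between two times in seconds."""
    first = int(begin_s * rate)    # fractional frames truncated (assumed)
    last = int(end_s * rate)
    return first, last - first


def samples_in_frames(frames, nchans):
    """Actual samples read when each frame holds one sample per channel."""
    return frames * nchans
```
.DE
With a 16K (16384 Hz) mono file, times 1 and 1.1 seconds give a start at sample 16384 and 1638 samples.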
.NH 1
lsf \(mi Listing Sound Files
.PP
The
.B lsf
command,
if given by itself,
lists all the files in your current 
BICSF
directory.
It is possible to get a list of someone else's files like this:
.DS
% lsf /snd/frm
.DE
lists the names of frm's files that happen to be on the same filesystem
as you are.  
If you are interested in only whether a
particular file exists, just name it:
.DS
% lsf /frm/joy
.DE
To refer to a different filesystem, prefix its name:
.DS
% lsf /sndb/frm
.DE
will print all files frm has on the /sndb filesystem.
.NH 1
Manual Pages For Sound Programs
.PP
There is more information about a sound file than
its name, which you can see by providing flags to 
.B lsf
to modify its behavior.
You can find out about these other flags and more about 
.B lsf 
itself, 
as well as all the rest of
the programs discussed here, by reading their entries
in the CARL section of the
UNIX Programmer's Manual.
To read the on-line manual entry for 
.B lsf 
say:
.DS
% man lsf
.DE
The 
.B man 
command shows you a screenful at a time from the manual.
After a full page of text is displayed on the screen it prints ``\(mimore\(mi''
at the bottom of the screen and waits for
you to press the spacebar before showing the next screenful.  
When you have reached the bottom, or are ready to stop, press the ``q'' key to quit.
Each of these programs has its own manual entry
available with the 
.B man 
command.
.NH 1
cpsf \(mi Copy Sound Files
.PP
This is a very simple program, similar to 
.B mvsf.  
But where 
.B mvsf 
simply
changes the name of a sound file, 
.B cpsf 
physically copies the samples into another file. 
For example,
.DS
% cpsf source destination
.DE
where 
.I source 
is the name of the file to be copied, and
.I destination 
is where
to put it.
.NH 1
Directories, and Filename Defaults
.PP
Just as the UNIX file system has a tree-structured directory, so does
BICSF.  
When you log in, unless you change it, you will be set up to access 
BICSF
files in your
.I home
BICSF
directory.
The location of your home directory may vary with the installation, and you may
change your current working directory with the program
.B cdsf.
For instance,
.DS
% cdsf /snd/frm
.DE
All 
BICSF
programs assume that an incomplete or missing pathname means to start looking in your 
current working sound file directory by default.  
For instance, when you just say 
.B lsf 
by itself this is the directory 
.B lsf
examines.  You can
create subdirectories (branches) from your own root sound file
directory with 
.B mkdirsf.
For instance:
.DS
% mkdirsf noises
.DE
creates directory /some_filesystem/your_login_name/noises
(assuming your current working soundfile directory is /some_filesystem/your_login_name).  
.PP
You can now put files in this directory, list them, make that directory your
current sound file directory, etc.
For example,
.DS
% cdsf noises
.DE
or more explicitly
.DS
% cdsf /some_filesystem/your_login_name/noises
.DE
Then saying 
.DS
% wave | tosf fudge
.DE
will write a file 
.I /some_filesystem/your_login_name/noises/fudge.  
This state of affairs will persist until you either log out or run
.B cdsf
again, specifying a different directory.
You can see that
.B cdsf 
determines where sound file programs will
look for, or create, files.  
You can change to other users' directories.
For instance:
.DS
% cdsf /some_filesystem/frm
.DE
will cause the command
.DS
% fromsf joy
.DE
to read file /some_filesystem/frm/joy.  
.PP
If you need to refresh your memory as to what directory you are in,
use the command
.B pwsf
which prints the current working sound file directory.
.NH 1
BICSF Programs
.LP
Here is a list of some of the currently available 
BICSF programs.  
.LP
aimonitor, 
aiplay, 
airecord, 
catsf, 
cdsf, 
chgrpsf, 
chmodsf, 
chownsf, 
cpsf, 
createsf, 
dsplay, 
dsrecord, 
dyplay, 
dyrecord, 
fromsf, 
gainsf, 
lsf, 
mkdirsf, 
mksfdir, 
monitor, 
mvsf, 
normsf, 
pansf, 
peaksf, 
play, 
pwdsf, 
pwsf, 
querysf, 
record, 
restorsf, 
retrosf, 
rmdirsf, 
rmsf, 
scalesf, 
setsf, 
sfcreate, 
sndawk, 
sndcat, 
sndgain, 
sndin, 
sndinfo, 
sndnorm, 
sndout, 
sndpan, 
sndpeak, 
sndreverse, 
sndscale, 
sndset, 
sndtransp, 
swabsf, 
tarsf, 
tosf, 
transpsf
