From dgl Wed May 25 21:52:47 1983
To: mit-ems!ircam!adrian mit-ems!ircam!loy
Subject: Re:  sound file system
Date: 25-May-83 21:52:47-PDT (Wed)
Status: O

	From mit-ems!ircam!adrian  Wed May 25 20:05:48 1983
	To: loy
	Subject: sound file system

	Idea:
		sound file headers need not get in the way in front of
		sound files.  if you include a magic sequence or a
		special non-existent floating point value at the
		beginning of a sound file which has an attached header,
		programs can jump the header, modify it and pass it
		on.

BAD idea!  You have re-invented the Stanford sound file system.  Consider:
	- otherwise simple subroutines must now be position sensitive.
You have to know whether this file has a header, and if it does,
a reference to the 0'th sample must have an offset of some sort.  This
test usually ends up having to be made on each call for samples.
	- what happens when you try to expand the information in a header?
This breaks down into two categories of question:
	* what happens when you invent another property that all files should
keep track of?  You have to recompile all programs that access the header,
since the offset into the header for all data must usually be modified.
In my system, properties are identified by keys, not
by the position of the data in the file, so they can be entered in any order,
and old programs are made so that they ignore properties they don't understand.
	* what happens when a property datum needs to grow in size in an
existing file? You have to rewrite the file.  What if the file is VERY LONG?
You sit around and wait for the disk i/o to complete!  
You could pad the header with a certain amount of blank
space so you wouldn't have to rewrite it.  But then some adventurous 
programmer will discover the blank space and start using it for his own
purposes.  This happened at Stanford, when Loren Rush discovered that Edsnd
could scribble in the unused portion of the header.  He made some bad
decisions on what to put in there, and effectively eliminated anybody else's
use of headers to keep useful data.  Another idea we kicked around at
Stanford was having the first header point to optional additional "headers" 
in the file as a way of allowing them to expand.  But the complications on
the i/o subroutines were so frightening we gave up.
	- what happens when you try to edit the data in the header?  You
have to write a program to extract the stuff, allow the user to modify it,
and write it back in.  This program will eventually grow to have the features
of a complete text editor.  Why not just have the stuff in text format in
a text file in the first place and use a real editor that everyone knows?
You pay no time penalty in accessing the properties listed in the text
file, and it has the marvelous advantage that anyone can look at it with the
standard tool of their choice.

Your idea is so close to the Stanford system that I would guess some renegade
from the PDP-10 put the idea in your ear.  If you indeed thought of it 
yourself, then I think you should call in an exorcist quickly, because the
ghost of the 10 is trying to reincarnate!  I am, as you can see, painfully
familiar with the Stanford approach.  Indeed, I was "born and raised" on it.
Let's not reinvent that particular wheel.



From mit-ems!smh  Thu May 26 21:41:30 1983
To: ircam!adrian sdcarl!dgl
Subject: Beg pardon...
From: (Steven M. Haflich at MIT-EMS)
Status: RO

Gareth:
    Pardon my gaucheness in reading someone else's mail, but I noticed
several messages which were apparently copies of one another.  Your
mail to Adrian seems routinely cc'd to yourself.  I suspect you are
using the brain-damaged Berkeley mailer's `r'eply command, which seems
to generate exponential pathnames on remote mail.  I have already
spoken to Adrian about this, but reply to mail from a!b!c generates
targets of a!b!c and a!b!a.  The net effect is that many extra uucp
phone calls are generated; our budget is presently in very bad shape,
and getting worse.  So until someone fixes this feature, may I ask
that you avoid using reply, and instead `m'ail explicitly?  Thanks.

Gareth and Adrian:
     As long as I have shamelessly read your private communications,
I might as well join in the discussion.  Our Vax should be up (finally)
in a week or two, and shortly we will also be contending with a new
soundfile system.  I have a very old copy of the famous Loy system
from a November or December CARL distribution, but I will ask for a
more current one when the time comes.  By the way, since I am
speaking literally out of both sides of my mouth(!), in what follows
below I shall address you both in the third person.
     Some months ago, before I got entirely snowed under with other
trash, I looked over the system and thought about what I would like
to do with it, based on MIT's soundfile experience.  I had started
writing up some proposed changes with the intention of communicating
them, but I learned that you seemed to be making a number of them anyway,
and other things started competing very hard for my time.  Now I find
in the purloined mail that some of the very same points are under
discussion.
     The particular issue I would like to address is the maintenance
of attribute data along with soundfiles.  I surmise the discussion
so far was motivated by Adrian's (?) suggestion that attribute data be
sent along with sample data (in pipes and other files).  From this
follows Gareth's warnings about the old CCRMA soundfile system.
     Adrian's (and my) argument makes sense.  It would certainly be nice
for pipe-fitted programs such as filters automatically to know the sample
rate (etc.) of the data passing through them.  The human user ought not
be required to tell each of several successive processes any information
which could have been reasonably preserved from the source of the data.
(It is too much work, and there is too great a chance for error.)  Nor
should the user be forced to use inappropriate units, such as the
dimensionless HZ/SR, when he really wants to think in units of HZ.
     Gareth's cautions, however, are certainly to the point.  One
should not ignore his extensive experience in this matter.
     However, I think it is possible to accommodate both sides.
What follows is an extract from my earlier sketches on soundfile
design, with the [nt]roff junk removed:

     It is useful to divide digital sound files stored or processed in a
computer into three classes.  These are distinguished not so much by the
medium of storage as by the protocols for using the data:

 (1) Raw sound samples with no associated identification of attribute data.
     At the very least, this type is needed for communication with any other
     systems (e.g. off-site or strange hardware) which do not understand
     the locally-enforced protocols employed for the next two types.
 (2) Files stored within the soundfile system itself, whether in
     contiguous or non-contiguous storage.
     [Elsewhere I maintain that this distinction is not really useful;
     provided the allocation quantum is large compared to real-time
     buffer size and seek time, very little is gained by contiguous
     storage.  My understanding is that the Loy system has been or
     will be redesigned in this regard.  Yes???]
 (3) Data stored or under processing in other Unix files, which will be
     understood to include tapes and pipes as well as disk files.

[Now go back and reread definitions (2) and (3) carefully.  They are
central to everything that follows!]

Certainly the primary function of a soundfile system is to provide
efficient, high-bandwidth storage for large amounts of sample data.
However, it is also essential for the soundfile system to associate
attribute data with each file of sample data.  For example, when the
machine is requested to play a soundfile by performing real-time D/A
conversion, the user should not normally be required to specify the
sample rate.  Attribute data supported by the system should include
many things [see discussion below, which I will mostly defer to a later
mailing] but the data stored in the CARL Sound Descriptor Files (SDF) --
sample rate, number of channels, creation date, etc. -- includes the
obvious elements and is a good set upon which to build.  It is obviously
desirable that the soundfile system always associate attribute data with
a soundfile and facilitate its use by people and processes except when
it is absolutely impossible to do so. It is also clear that attribute
data should be both human and machine readable, notwithstanding that
some items may be intended for only one or the other.

   Of the three classes of soundfiles above, it is clear that class
(1) cannot usefully have associated machine-readable attributes simply
because transportability precludes protocols for formatting the data.
However, since use of classes (2) and (3) is controlled within a known
programming environment, attribute data protocols may be enforced.
The existing CARL system maintains attribute data for class (2) files,
but fails to provide any such support for class (3).  In practice,
this can be a severe operational liability.  The most obvious example
is with pipes.	The CARL system frequently passes sample streams from
process to process using pipes.  Postprocessing equalization or
reverberation is typically accomplished in this way.  It is most
natural to specify parameters to such a postprocessor in terms of
seconds or CPS.  However, since the data does not have its sample rate
associated with it, the user must either resort to control using
dimensionless parameters or else take the responsibility for informing
programs about the data.  There is a possibility of error any time a
person gives a machine data, and in any case, the user should not be
needlessly burdened to specify data the machine already has stored
elsewhere.

   The solution is to keep attribute data along with the sample data.
The problem is doing so in a reasonable manner without compromising
efficiency.  There are two obvious strategies.	One is to associate an
attribute file with each soundfile.  The other is to put both kinds of
data in a single file.

   Keeping the attribute data separate is the method employed by class (2)
soundfiles.  Obviously, storage efficiency, real-time access, and canonic
addressability are prime considerations here.  Also, since class (2) files
are by nature static -- they are `stored away' semi-permanently within
the soundfile system -- it is relatively easy to associate two files
and keep track of them.  The SDF mechanism does this appropriately.

   Class (3) files are under less stringent control -- any kind of
Unix file can hold sound samples.  Pipes are perhaps the most interesting
representatives.  If one is doing a lot of pipe fitting, it seems best
to combine the data together to minimize the number of connections
and relations that have to be manipulated.  These files are, with the
exception of digital magtapes, transitory and difficult to keep track
of.  Combining the data provides a foolproof method of keeping the
data together, if nothing else.  (Think how much easier this makes
magtape archives -- one file holds everything!)

[End of included text: Now I begin to extemporize again.]

   The proper implementation would seem to circumvent most of Gareth's
objections.  I propose that the attribute data be similar to the SDF
data, and similarly encoded in ascii.  The attribute data would be
at the head of the class (3) soundfile.  For obvious efficiency
reasons, it would be ascii padded to occupy an integral multiple of the
least common multiple (LCM) of system blocksizes.  The escape need not
be magic -- how about "\nEND\n" as the last 5 characters, to coincide
with the end of a block?  Notice that if some (human?) editing makes
the header the "wrong" size, later filters ought to still work,
albeit at reduced efficiency, and should silently correct the problem.
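In outline, such a header-skipper might look like the sketch below.  The
function name and the assumption that the whole header fits in one buffer
are mine, purely for illustration, not part of any existing system:

```c
#include <string.h>
#include <stddef.h>

/*
 * Sketch: find the end of an ascii attribute header terminated by
 * "\nEND\n" and return the offset of the first sample byte, or 0 if
 * no header is present and the stream is raw samples.  A real filter
 * would also insist the header begin with a recognizable keyword so
 * that raw data containing "\nEND\n" by accident is not mistaken
 * for a header.
 */
size_t skip_header(const char *buf, size_t len)
{
	size_t i;

	for (i = 0; i + 5 <= len; i++)
		if (memcmp(buf + i, "\nEND\n", 5) == 0)
			return i + 5;	/* samples begin right after */
	return 0;			/* no header: treat as raw data */
}
```

Note that even when the header is the "wrong" size, this scan still finds
the escape; the only cost is the unaligned read that follows.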
Now for Gareth's objections:

-	Position sensitivity:  Almost all soundfile access is sequential.
	The only time origin need be taken into account is in seeks,
	and these aren't even meaningful for pipes.  Remember, I am
	talking about class (3) soundfiles, not class (2)!

-	Invention of new attribute datum:  Since the datum is encoded
	in ascii using keywords (well, keyletters, but that ought to
	change), programs which don't understand a datum will just
	pass it along unmodified.  (Good programs are like good mail
	administrators -- they avoid trouble by not reading other
	people's mail.)  So where's the problem?

-       Difficulty expanding information in the header:  Expanding info
	in class (2) is just as easy as it ever was, since the files
	are separate.  Since the attribute data precedes the sound data,
	it is all written (down the pipe, onto the tape, etc.) before
	any samples are written.  The attribute data expands (or contracts)
	in LCM hunks fairly automatically.  In case you haven't caught on,
	the real problem is how to write attribute data which cannot be
	determined until after the samples are generated (e.g. soundfile
	length on a real-time A-D).  This sounds like a serious problem,
	and I propose to take it up in a later mailing.  But for now,
	consider that instances of this problem are few and far between.
	Do you *need* to know the length of a real-time performance
	prior to its completion?  If so, you are in deep trouble with
	certain physical principles like causality!

-	The difficulty of writing an editor:  I agree -- use ascii.
	Actually, I think you are oversimplifying the problem of
	`editing', as both humans and programs want to edit attribute
	data.  See the following remarks on subroutine packages.

   How to package this mess in a manner that doesn't burden the writer
of new soundfile filters and also which doesn't destroy years of work
on programs using the existing convention?  The obvious tactic of
packaging almost everything inside a set of standardly-available
open / close / parse-command-line subroutines would seem to win.
I am not hugely familiar with the entire CARL system yet, but it seems
that most of the existing programs which might source or sink class
(3) soundfiles do so with subroutines.  Certainly the SDF open-for-input
routines should know how to extract SDF data into `stdout' format,
and conversely, the SDF open-for-output should be able to snarf
attributes from `stdin' and create/merge-into the SDF entry.
The functionality would seem to be reasonably localized, although
Gareth is best equipped to judge.  Routines which just do stdin to
stdout transformations, hence do not invoke the SDF routines, can
be either encapsulated inside a controlling process which copies
the attribute data before exec'ing the filter, or else a single
subroutine call performing the same function can be added at the
start of such filters.

   This approach wins for compatibility, but clearly does not
optimally use the attribute data system.  If the problem were
simply one of communicating attribute data from point A to point
B, other mechanisms (e.g. parallel pipes?) might be cleaner.
But the idea is to provide the information to each process along
the path so the data can be used, and even more interesting, can
be emended as appropriate by the process.  Still, until processes
are rewritten to maintain the newly-available information --
mechanically, in cases such as supplying SR to a filter, but
perhaps sophisticatedly in case of a note onset detector --
this subroutine encapsulation preserves all previous functionality.

   Now, before my fingers give out, a brief tantalizing sketch of
additional kinds of data I think should be included in soundfile
attributes -- both in class (2) and class (3).

   Wouldn't it be nice if each program which sourced or passed
a soundfile would put its stamp on it?	For instance, suppose a
soundfile is the product of running a sound synthesis language
(a la Music11) on an orchestra and a score file.  Wouldn't it
be nice to be able, hours or years later, to find out the pathnames
(and modification times) of those files?  (Hence the interface
with command line parsing suggested above.  If argument files were
transient, then there is little chance of finding them on tape
archives --  but nontransient source files might be retrievable.)
Suppose next that a nearly finished piece is being postprocessed,
e.g. to add reverb.  Wouldn't it be nice if the soundfile system
could keep track of the reverb parameters of successive versions,
rather than entrusting such data to memory or slips of paper?  A-B
listening comparisons might start meaning something!  Lastly, more
exotic uses might appear.  We have at EMS a "digitized drumset" --
one hit each of all the sounds of a trap set.  Rather like a
modern version of musique concrete, composers have been known
to splice together backup tracks for pieces out of these snippets.
For such marathon splicing pieces, the soundfile attribute system
could almost accumulate and support a complete score/map!

   A problem:  Soundfiles that are repeatedly processed might
accumulate ridiculously huge attribute files.  Perhaps some automatic
(but overridable) method for stripping excess junk would have to
be devised.  However, it is difficult to imagine many cases where the
attribute data could begin to reach the volume of the sample data.
By default, I would say it is better to let data accumulate uselessly
than to throw anything away.  But then, you should see my attic.....

   I greatly appreciate your forbearance if you have made it to the
bottom of this epistle.  The ideas of the final section in particular
need better exposition and some rethinking, I know.

Anyway, if you two have no objections, please include me in these
electronic-mail design discussions of the soundfile system.  One
way or another, I will be immersed in the problem soon enough.

				Your postman,
				Steve
				mit-ems!smh



(Following is my reply to smh and adrian)

Of course your idea has merit, and indeed a scheme to handle this case
was kicked around here once when we were getting started, but we dropped
it.  The reason we let the idea go was not so much for lack of merit,
but simply because at that point our efforts were much more on writing
programs that could communicate AT ALL, let alone WELL.

For the purposes of this discussion, I would collapse Steve's three file
types into two: static and dynamic, for files on disk and files on pipes,
respectively.

As for static files:

I feel the case for separating properties from data for static files is
irrefutable.  The only issue to debate there is how to store the properties,
but storing it as text has such obvious merit that even there I find little
to argue.  The only really better way would be for UNIX to extend its filesystem
to include user-definable properties, similar to the like-named capability
of LISP.  Assuming we're stuck with text files,
the remaining compass of argument is the format of the properties.  Naturally,
the more coherent the format the better.  What could be more isomorphic
with standard UNIX practice than to base the format on the notion of the flag?
Thus, the csound system uses a simple keyletter-plus-argument system
both to recognize a flag on the command line
specifying a property of a new sound file, and to
store that property in the sound descriptor files.
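A minimal sketch of the keyletter-plus-argument idea follows.  The
structure fields, keyletters, and function name here are illustrative
stand-ins of my own, not the actual csound routines:

```c
#include <stdlib.h>

/*
 * Sketch of the keyletter-plus-argument convention: the same one-letter
 * codes serve both as command-line flags and as the stored property
 * format in the descriptor file.  Fields and letters are made up.
 */
struct sdf {
	double srate;		/* R: sampling rate */
	int nchans;		/* c: number of channels */
};

/* Stuff one property, e.g. "R44100" or "c2", into the descriptor. */
int setprop(struct sdf *d, const char *prop)
{
	switch (prop[0]) {
	case 'R': d->srate = atof(prop + 1); return 0;
	case 'c': d->nchans = atoi(prop + 1); return 0;
	default:  return -1;	/* unknown keyletter: leave it alone */
	}
}
```

The same routine can then digest argv[] flags and descriptor-file lines
alike, which is the coherence being argued for.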

As for dynamic files:

Now, 1) could this system be expanded to automate the passing of properties
between cooperating programs communicating via pipes?  If so, 2) could
it be done effectively enough so as to be entirely concealed from the user
of the system?  If so, 3) could it be further augmented to allow the definition
and communication of arbitrary properties without requiring the modification of 
existing loosely coupled program systems that may (or may not)
rely on the property list?

(1)  Happily, I think the system could be put in place from existing components
in the sound file subroutine library.  There are some tricky implementation
details, which I'll deal with here by the seat of my pants.
The simplest implementation would involve modification to
the routines putfloat(), putshort(), getfloat() and getshort().  
Insofar as these routines are universally used by CARL software, a single
coherent modification of them, allowing for downward compatibility, would
suffice.  As they
stand, they read/write floatsams and shortsams only.  Put{float,short}()
would be modified to write a "sentinel word" indicating a header follows.
At Stanford, the sentinel was the number 052525252...  which seemed to work
well enough.  This would be followed by a call to the routine wsfd()
to write the descriptor down the pipe.  The end of header would be followed
by another sentinel to be defined.  It could be immediately followed
without padding by the data.
Get{float,short}() would similarly be modified to detect the sentinel,
and finding it, to trap text up to the next sentinel, and forward it on
to the routine rsfd().  Failing to find the sentinel, get{float,short}()
would default to act as it currently does, supplying the required downward
compatibility.
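To make the sentinel framing concrete, here is a seat-of-the-pants
sketch.  The sentinel value, the function names, and the use of plain
stdio in place of the real sample i/o routines are all my assumptions:

```c
#include <stdio.h>
#include <string.h>

#define SENTINEL 0x55555555L	/* stand-in for the Stanford-style pattern */

/* Write sentinel, NUL-terminated ascii descriptor, closing sentinel,
 * before any samples go down the stream. */
void put_header(FILE *fp, const char *desc)
{
	long s = SENTINEL;

	fwrite(&s, sizeof s, 1, fp);
	fwrite(desc, 1, strlen(desc) + 1, fp);
	fwrite(&s, sizeof s, 1, fp);
}

/*
 * Look for a header; return 1 and copy the descriptor text out if one
 * is found, else rewind and return 0 so the caller reads raw samples
 * exactly as before (the downward-compatible case).  A raw stream
 * whose first word happened to equal the sentinel would fool this
 * sketch; a real implementation needs a less probable escape.
 */
int get_header(FILE *fp, char *desc, int max)
{
	long s = 0;
	int c, i = 0;

	if (fread(&s, sizeof s, 1, fp) != 1 || s != SENTINEL) {
		rewind(fp);
		return 0;
	}
	while ((c = getc(fp)) > 0 && i < max - 1)
		desc[i++] = c;
	desc[i] = '\0';
	fseek(fp, (long)sizeof s, SEEK_CUR);	/* skip closing sentinel */
	return 1;
}
```

On a true pipe one can't rewind(), of course; there the first word would
have to be buffered or pushed back with ungetc() instead.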

So far, so good.  The only thought that must go into such a scheme is how
to interface this to the user program environment.  The simple case is
for sndin and sndout which already have access to the sound file descriptor.
They plug directly into the scheme without a problem.  

But what about
such a program as cmusic?  Believe it or not, cmusic is a causal system,
because it can't anticipate when it will finish.  For instance,
it has a unit generator called "test", which can stop an instrument when
it detects certain phenomena.  The typical use is to terminate a run when
the reverberation dies below -60dB.  Thus, the length of data is indeterminate
for cmusic and many other programs.  All other aspects besides the length
of data can be known at startup time.  This isn't a killing problem, but
it must be kept in mind as a serious limitation.

What about the question of setting and fetching properties inside programs
that produce and consume sample data?  
This enlarges to the question of specifying
the software discipline for such a system.  The quickest implementation 
(and one that would also increase the coherency of the system as a whole)
would be
based on the one currently used for manipulating sound file descriptors.  A
routine called setsfd() takes a pointer to a sound file descriptor structure,
a keyletter specifying a structure subfield, and a datum, and stuffs it.
If the pointer is NULL, a new structure is malloc()'ed.  Thus, one
subroutine manages creation and stuffing of a structure. 
The disadvantage is that the structure contains a lot of cruft that is only
interesting to csound routines that manipulate the disk.

An implementation problem comes up when considering a case of a program that
can both get the descriptor from a pipe, and can also read the command line
for flags that want to override the piped-in descriptor.  The typical layout
of such programs is:

main()
{
	getargs(); /* deal with flags and other arguments */
	while (getfloat(&signal) > 0) {
		signal = frobnicate(signal);
		putfloat(&signal);
	}
	flushfloat();
}

so that flags are dealt with first, and the piped-in descriptor comes along
later.  We'd presumably want the flags to be evaluated after the piped
version so that they would override the piped version instead of vice versa,
but this can't be done in the simple case.  However, this would work:

main()
{
	proplist = getprop(stdin);/* read properties, if any */
	proplist = getargs(proplist); /* deal with flags and other arguments */
	putprop(proplist);
	while (getfloat(&signal) > 0) {
		signal = frobnicate(signal, proplist);
		putfloat(&signal);
	}
	flushfloat();
}

The subroutine getprop() would read stdin, look for a sentinel, read up
the property list if found, otherwise call ungetc() or whatever and
return emptyhanded.  Getargs() then gets its crack at it, overriding
the piped property list as desired.  Getfloat() could still blindly look
for the sentinel, but simply wouldn't find it.  A problem comes up as to
how to pass the properties out via putfloat().  I've avoided having to
add an argument to putfloat() by inventing a routine putprop() which does
the obvious.  One could modify putfloat() itself, retaining downward
compatibility, by adding variable argument list code with varargs(3).

(2) Is such a system effective enough so as to be entirely concealed 
from the user of the system?   We're here only considering the non-extensible
version discussed above which is hacked together out of the sound file system.
It does a pretty good job with regard to user-level interface streamlining,
but as the code examples demonstrate, it introduces some subtleties to
the creation of programs.

My gut feeling about such endeavors is that they constitute putting in an
automatic transmission.  You still have to have a gear stick so the bloody
thing can back up.  It also allows the perpetration of fictions such as
"neutral" and "park" which form a prophylactic barrier to a person trying
to go from being a user to being a programmer.  Will it do more harm than
good?  A good question.

(3) Could it be further augmented to allow the definition
and communication of arbitrary properties without requiring the modification of 
existing loosely coupled program systems that may (or may not)
rely on the property list?

The real test of such a system is how to make it extensible.  
We'd ideally want to be able to redefine the structure used to store 
properties on the fly.  Unfortunately, this goes
against the grain of C, which does not allow run-time redefinition of
data types.  We'd want to have more of the facility of LISP property
functions.  However, I believe we're stuck with C for the foreseeable future.
One could kludge around this limitation by defining the property list
as a linked list of structures of the following form:

struct property {
	struct property *lastprop;
	char *symbol;
	char *data_type;
	char *datum;
	char *semantic;
	struct property *nextprop;
} *proplist;

We'd then have to have subroutines like defprop() to stuff the property
list.  We'd probably want stdprop() which returns a vanilla-flavored one.
We could make the field "datum" a union of various pointer data types
if character data weren't appropriate.
If the usage of the symbol "semantic" isn't immediately clear, I thought
of that as a way to communicate the import of the symbol, so that some
other program, reading this property list on its standard input and seeing
a property it didn't know about, might somehow be able to grok its meaning
by examining this (good luck!).  
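For concreteness, a sketch of defprop() and a lookup routine over that
structure.  The text above names only defprop() and stdprop(); these
bodies (and getdatum()) are my own invention, and the structure is
repeated so the sketch stands alone:

```c
#include <stdlib.h>
#include <string.h>

/* The property structure from above, with a hypothetical defprop()
 * that allocates and links a new node, and a lookup routine.  Old
 * programs simply never look up symbols they don't understand, and
 * so pass them along untouched. */
struct property {
	struct property *lastprop;
	char *symbol;
	char *data_type;
	char *datum;
	char *semantic;
	struct property *nextprop;
};

static char *copystr(const char *s)
{
	char *t = malloc(strlen(s) + 1);

	strcpy(t, s);
	return t;
}

struct property *defprop(struct property *list, const char *sym,
			 const char *type, const char *datum)
{
	struct property *p = malloc(sizeof *p);

	p->symbol = copystr(sym);
	p->data_type = copystr(type);
	p->datum = copystr(datum);
	p->semantic = NULL;	/* import of the symbol, when known */
	p->lastprop = NULL;
	p->nextprop = list;	/* push onto the head of the list */
	if (list)
		list->lastprop = p;
	return p;
}

char *getdatum(struct property *list, const char *sym)
{
	for (; list; list = list->nextprop)
		if (strcmp(list->symbol, sym) == 0)
			return list->datum;
	return NULL;		/* unknown property */
}
```

Since the list grows by linkage rather than by structure layout, a new
property costs nothing to programs that predate it.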

Reviewing in my mind some of the CARL programs that analyze sound files
and report statistics, it is evident to me that the property stuff could
easily be overused.  But at least it would accumulate in an extensible fashion.
However, I'll point out that the net result of an automatic transmission
is that your gas mileage goes down.  What's more, the drivers start to
hanker for an air conditioner, then a stereo, cruise control, automatic
windows, seats, antennae retractors, rotating headlamps, portable wet
bars, TV, then they start bitching about the lousy gas mileage.
