
Pvc is a NeXTstep phase vocoder application.  Phase vocoders perform
frequency analysis of input signals using overlapped fast Fourier
transforms (FFTs).  A phase vocoder such as this one is typically used
for frequency and time scaling of audio signals without modification
of the opposing domain.  In other words, a phase vocoder can be used
to change the length of sampled mouth harp twangs without
significantly changing their "pitch", and vice versa.  Handy.

What many people don't realize, though, is that the phase vocoder
provides super powerful spectral representations of audio signals.  I
have an entire battery of digital signal processors that modify phase
vocoder data.  They are currently available in vanilla C, but they
will eventually be adopted into this application.  I am not happy to
document them all in one night for you.  Sorry.  One or two at a time.

A groovy in-application help feature is on the way.



Pvc parameters:

 FFT size	size of the Fourier transform.  This must be an integer
		and a power of two.
 Window Size	the size of the window is normally set to the FFT size.
		Making it smaller provides better time resolution at
		the cost of frequency aliasing.  Making it larger
		provides better frequency resolution at the cost of
		time aliasing.  Opinion: I like larger window values.
 decimation	the decimation factor (D) determines the analysis sampling
		rate.  Phase vocoder analysis requires overlap between
		successive analysis frames.  This value determines the
		amount of analysis overlap (overlap = N - D, where N is
		the window size).  See the notes on the "Dolson Rule"
		below.  D is specified in samples, thus it must be
		a positive integer.
 interpolation	the interpolation factor (I) determines the amount of
		resynthesis overlap between successive frames
		(overlap = N - I).
 frequency	this flag acts in two ways.  It toggles oscillator bank
 multiplier	resynthesis and it specifies a frequency multiplier for the
		output signal; thus, this flag can be used to specify
		oscillator resynthesis alone (-F1.), or to transpose
		the spectrum (pitch transposition) of the input.
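
The constraints in the list above can be checked before an analysis is
run.  Here is a minimal Python sketch of such a check.  It is not part
of Pvc; the function names (is_power_of_two, check_params) are made up
for illustration, and it assumes that the interpolation factor must be
a positive integer just like the decimation factor, which the list
above does not state outright.

```python
def is_power_of_two(n):
    # A positive integer is a power of two when it has exactly one bit set.
    return isinstance(n, int) and n > 0 and (n & (n - 1)) == 0


def check_params(fft_size, decimation, interpolation):
    """Return a list of problems found; an empty list means none.

    Hypothetical helper, not a Pvc function.
    """
    problems = []
    if not is_power_of_two(fft_size):
        problems.append("FFT size must be a power of two")
    # Assumption: I is held to the same rule the notes give for D.
    for name, value in (("decimation", decimation),
                        ("interpolation", interpolation)):
        if not (isinstance(value, int) and value > 0):
            problems.append(name + " must be a positive integer")
    return problems


print(check_params(1024, 128, 64))
print(check_params(1000, 0, 64))
```

The first call reports no problems; the second reports both the FFT
size and the decimation factor.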



More explanation!

Ok, you need to learn what I call the "Dolson Rule":

The larger of D (the decimation factor) and I (the interpolation
factor) should never be greater than W/8 (W = window size) if you wish
to avoid gross amplitude modulation.  D sets the input or analysis
overlap (overlap in samples = W-D) while I sets the output or
resynthesis overlap (overlap in samples = W-I).  The ratio between D
and I determines time scaling.  If D/I is greater than 1, then the
sound will have a shorter duration.  If D/I is less than 1, then the
sound will be longer.
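
The rule above and the D/I ratio are easy to work out by hand, but a
small Python sketch may help.  The function names (dolson_ok,
duration_factor) are invented for this example and are not part of
Pvc.  The duration factor is taken to be I/D, which follows from the
statement that D/I greater than 1 shortens the sound; treat the result
as approximate.

```python
def dolson_ok(window_size, decimation, interpolation):
    # The larger of D and I must not exceed W/8.
    return max(decimation, interpolation) * 8 <= window_size


def duration_factor(decimation, interpolation):
    # Approximate output duration relative to the input duration (I/D).
    # D/I > 1 gives a factor below 1, i.e. a shorter sound.
    return interpolation / decimation


print(dolson_ok(1024, 128, 64))   # 128 is exactly W/8, so this passes
print(duration_factor(128, 64))   # half the original duration
```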


Example:

	FFT Size:		1024
	Window Size:		1024
	Decimation:		128
	Interpolation:		64
	Frequency Multiplier:	0

This will decrease the time-scale of the signal by a factor of two; in
other words, the duration of the sound will be half of its original
length, and its spectral disposition (or loosely: "pitch") will remain
approximately constant.

	FFT Size:		1024
	Window Size:		1024
	Decimation:		64
	Interpolation:		128
	Frequency Multiplier:	0

This will increase the time-scale of the signal by a factor of two.

	FFT Size:		1024
	Window Size:		1024
	Decimation:		128
	Interpolation:		128
	Frequency Multiplier:	1.5

The resultant signal will have an approximately unchanged time-scale, and its
spectrum will be scaled by 1.5, a perfect fifth up in musical parlance.
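
To pick a frequency multiplier for some other musical interval, it
helps to convert between multipliers and semitones.  The Python sketch
below does this under the assumption of twelve-tone equal temperament;
the function names are made up for illustration and are not part of
Pvc.  It only makes sense for positive multipliers.

```python
import math


def multiplier_to_semitones(multiplier):
    # Equal temperament: one octave (a ratio of 2) is 12 semitones.
    return 12.0 * math.log2(multiplier)


def semitones_to_multiplier(semitones):
    # Inverse of the conversion above.
    return 2.0 ** (semitones / 12.0)


print(round(multiplier_to_semitones(1.5), 2))
print(round(semitones_to_multiplier(7), 4))
```

A multiplier of 1.5 comes out at about 7.02 semitones, a hair wider
than the equal-tempered fifth of 7 semitones (a multiplier of about
1.4983).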


Remember that frequency resolution is [nyquist frequency] / (N/2).
Thus, if you want to represent noisy or dense signals (at 44.1 kHz),
then window sizes (N) of 4096 are not uncommon.  Window size must
always be a power of two.  Also, decimation (D) determines the
sampling rate of analysis, so for large values of N, it is typical to
use overlap values (maximum of D and I) <= 128.
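
As a quick check of the resolution formula above, here is a small
Python sketch.  The function name frequency_resolution is made up for
this example; the formula is the one given above, [nyquist frequency]
/ (N/2), with the Nyquist frequency taken as half the sampling rate.

```python
def frequency_resolution(sample_rate, n):
    # Width of one analysis band in Hz: nyquist / (N/2).
    nyquist = sample_rate / 2.0
    return nyquist / (n / 2.0)


for n in (1024, 4096):
    print(n, round(frequency_resolution(44100, n), 2))
```

At 44.1 kHz a window of 1024 gives bands about 43.07 Hz wide, while
4096 narrows them to about 10.77 Hz.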


Christopher Penrose
penrose@silvertone.princeton.edu

4/10/92
