.VR 1.2 12/10/92
.TI "CG-DDF Domain"
.ds DO "CG-DDF
.EQ
delim off
.EN
.AU
Soonhoi Ha
.AE
.H1 "Introduction"
.pp
.b Warning:
this code is experimental and unfinished; at this point demos
of interesting parallel scheduling schemes for graphs with
dynamic behavior are supplied, with little else.
.pp
All code generation domains included in this release assume that
the dataflow graph is synchronous (SDF); that is, the number of
tokens consumed and produced by each star does not vary at run time.
In an SDF graph, therefore, we can determine the execution order of blocks
at compile time and minimize the run-time overhead of scheduling them.
If we target a multiprocessor architecture, we furthermore assume that
the relative execution times of blocks are specified. We do not
allow blocks with dynamic behavior such as the case construct,
data-dependent iteration, and recursion. The
.c CG-DDF
domain is the code generation version of the
.c DDF
domain (hence the name CG-DDF), and is intended to overcome this limitation.
By using the CG-DDF domain inside
a code generation domain, the user can include some
.Ir "dynamic constructs, CG-DDF"
predetermined dynamic constructs in the application graph.
The CG-DDF domain always resides in a code generation domain as a
wormhole:
.c CGWormhole .
Moreover, the target of the code generation domain should be
a multiprocessor target.
.pp
The dynamic constructs we support in the CG-DDF domain are
\fIcase, for, do-while,\fR and \fIrecursion\fR.
The \fIcase\fR construct is a generalization of the more familiar
\fIif-then-else\fR construct.
Unlike the simulation DDF domain, the graphical topologies of
these dynamic constructs are enforced. In the simulation DDF domain, 
all connected SDF stars are collected to make a DDF wormhole 
automatically to reduce the runtime overhead by \fIquasi-static\fR
.Ir "quasi-static scheduling, CG-DDF"
scheduling.  In the CG-DDF domain, however, the user must create
CG-DDF wormholes explicitly, so that the graph in the CG-DDF domain
consists of only CG-DDF stars and CG-DDF wormholes. The topology of
the graph is then matched against the predetermined topologies of the
dynamic constructs supported by \*(PT.

.H1 "Target"
.pp
There is only one target in the CG-DDF domain,
.c CGDDFTarget .
.Ir "CGDDFTarget, class"
The CGDDFTarget is derived from the
.c MultiTarget
class. 
Note that the CG-DDF domain does not change the physical target
of the outside code generation domain, and neither do the code generation
domains inside the CG-DDF wormholes.  Hence, we set
a target parameter,
.c inheritProcessors ,
.Ir "inheritProcessors, parameter"
to \fBYES\fR, meaning that the target inside a wormhole uses
the same physical architecture (or child targets) as the outside domain.
The targets of all code generation domains inside CG-DDF wormholes
should have this parameter set to YES. If the parameter is set to YES, the
.c nprocs
target parameter is ignored.
This is also why the domain outside the CG-DDF domain must
have a multiprocessor target: the target of the CG-DDF domain inherits
the child targets of the outside target, and a single-processor target
has no child targets.
.pp
The CGDDFTarget has a string parameter
.c constructType
.Ir "constructType, parameter"
to specify which dynamic construct the graph represents. The user
has to specify one of the supported dynamic constructs: case, do-while,
for, or recursion. Only the first character of the given string is
examined (case-insensitively) to decide the construct. By default it is set to
"case". After the graph in the CG-DDF domain is confirmed to match the
topology of the specified construct, the scheduler is selected automatically.
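.pp
The first-character rule can be sketched in C++ as follows. This is only
an illustration: the enumeration
.c Construct
and the function
.c classifyConstruct
are hypothetical names, not part of the CGDDFTarget class, and the
handling of an unrecognized string is an assumption.
.(c
```cpp
#include <cctype>
#include <string>

// Hypothetical enumeration of the four supported constructs.
enum class Construct { Case, DoWhile, For, Recursion, Unknown };

// Decide the construct from the first character only, ignoring case.
// Returning Unknown for other input is an assumption of this sketch.
Construct classifyConstruct(const std::string& constructType) {
    if (constructType.empty()) return Construct::Unknown;
    switch (std::tolower(static_cast<unsigned char>(constructType[0]))) {
        case 'c': return Construct::Case;
        case 'd': return Construct::DoWhile;
        case 'f': return Construct::For;
        case 'r': return Construct::Recursion;
        default:  return Construct::Unknown;
    }
}
```
.)c
Under this sketch, the strings "do-while", "DoWhile", and "d" all select
the \fIdo-while\fR construct.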

.H1 "Scheduler"
.pp
The presence of dynamic constructs in a program graph keeps the user
from using static scheduling algorithms directly.
Several static scheduling algorithms are implemented in \*(PT
and are described in the scheduler section of the CG domain document.
If an application has dynamic behavior,
the conventional solution is to discard static scheduling and incur the
substantial cost of dynamic scheduling. However, fully dynamic scheduling is
unnecessary for most signal processing algorithms; thus a much
simpler approach based on \fIquasi-static\fR scheduling is proposed.
.Id "quasi-static scheduling"
In quasi-static scheduling, most of the scheduling decisions are
made at compile-time. Some scheduling decisions are made at run time,
but only when absolutely necessary. Refer to [1] for detailed discussion
of the scheduling scheme.
.pp
The scheduling idea is as follows. We first treat each dynamic construct
as a special SDF star and use a static scheduling algorithm. The SDF star
that stands for a dynamic construct is special in the sense that it may
need to be mapped onto more than one processor, and its execution
time on the assigned processors varies at run time (although it is assumed fixed
when we compute the schedule).  Most conventional scheduling algorithms
assume that a block is assigned to a single processor. Therefore, we had to
modify the scheduling algorithms to support the case in which some blocks
are scheduled onto more than one processor. The scheduling results
decide the assignment and ordering of blocks on the processors (child
targets). At run time, the performance will generally differ from what the
compile-time schedule predicts, since dynamic constructs will behave
differently from what we assumed at compile time. Sometimes a
dynamic construct will finish its execution earlier than expected, and
sometimes later. Our goal is to minimize the expected makespan of the
program graph at run time.
.pp
We assume that the run-time behavior of each dynamic construct
is known or can be approximated with a certain probability distribution.
For example, the number of iterations is the variable for a \fIfor\fR or
\fIdo-while\fR construct, and the depth of recursion is the variable for a
\fIrecursion\fR construct.
This information is entered through the following CGDDF target parameters.
.ip
.c paramType :
.Ir "paramType, parameter"
type of the distribution. Currently, we support a "geometric" distribution,
a "uniform" distribution, and a "general" distribution specified by a table.
By default, the geometric distribution is chosen.
.ip
.c paramGeo :
.Ir "paramGeo, parameter"
geometric constant of a geometric distribution. Its value is effective only
when the geometric distribution is selected as the
.c paramType .
If the construct is a \fIcase\fR construct, this parameter indicates the
probability of branch 1 being taken. Branch numbers start from 0.
Therefore, if
there are only two branches, the parameter indicates the probability
of the "TRUE" branch being taken. If there are more than two
branches, the
.c paramFile
parameter must be used to specify the probability of taking each branch.
.ip
.c paramMin :
.Ir "paramMin, parameter"
minimum value for a uniform distribution. It is effective only when the uniform
distribution is chosen.
.ip
.c paramMax :
.Ir "paramMax, parameter"
maximum value for a uniform distribution.  It is effective only when the
uniform distribution is chosen.
.ip
.c paramFile :
.Ir "paramFile, parameter"
file name that contains the information on the distribution. If the
construct is a \fIcase\fR
construct, each line contains the value for the probability of taking a
branch numbered from 0.
Otherwise, each line contains an integer index value and the probability
for that index. The indices should be in increasing order.
It is effective only when a \fIgeneral\fR distribution is selected.
.pp
Based on the specified distribution of the run-time behavior, we
determine the compile-time \fIprofile\fR (or simply the profile) of each dynamic
.Id "profile, CG-DDF"
construct. The profile of a dynamic construct consists of the number
of processors assigned to the construct and the (assumed) execution times
of the construct on the assigned processors. Suppose we have a \fIfor\fR
construct. If the loop body is scheduled on one processor, it takes
6 time units. On two processors, the loop body takes 3 and 4 time units,
respectively. Moreover, successive iteration cycles can be executed in
parallel with a skew of 1 time unit. There are four processors. We then
have to determine how many processors to assign to the construct, and
how many times the loop body will be scheduled at compile time. Should we assign
two processors to the loop body and parallelize two iteration cycles,
thus taking all 4 processors? Or should we assign one processor to the loop body
and parallelize three iteration cycles, thus taking 3 processors in total?
We have developed a systematic approach to answer these tricky
scheduling problems based on the distribution information [1].
The number of assigned processors can also be fixed manually by setting the
.c fixedNum
.Ir "fixedNum, parameter"
parameter of the CG-DDF target. Note that we still have to decide how
to schedule the dynamic construct with the given number of processors.
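.pp
To make the size of the decision space in the \fIfor\fR example concrete,
the following C++ sketch enumerates every combination of processors per
loop body and parallelized iteration cycles that fits in the available
processors. It does not choose among them; that choice is made by the
method of [1]. The structure
.c Candidate
and the function
.c enumerateCandidates
are hypothetical names introduced for this illustration.
.(c
```cpp
#include <vector>

// One candidate profile shape for a for construct.
struct Candidate {
    int procsPerBody;    // processors assigned to one copy of the loop body
    int parallelCycles;  // iteration cycles overlapped at compile time
    int procsUsed;       // procsPerBody * parallelCycles
};

// List every (procsPerBody, parallelCycles) pair whose product
// does not exceed the number of available processors.
std::vector<Candidate> enumerateCandidates(int totalProcs) {
    std::vector<Candidate> out;
    for (int k = 1; k <= totalProcs; ++k)
        for (int m = 1; k * m <= totalProcs; ++m)
            out.push_back({k, m, k * m});
    return out;
}
```
.)c
With four processors, the list produced by this sketch includes the two
options discussed above: two processors per body with two parallel
cycles (4 processors), and one processor per body with three parallel
cycles (3 processors).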
.pp
Since the Gantt chart program currently implemented cannot show the
schedule inside a CG-DDF wormhole, we show only the profile of the
dynamic construct. The outside code generation domain uses the profile
of each dynamic construct (or CGWormhole) for overall static
scheduling.

.H1 "CGDDF Stars"
.pp
The CGDDF stars are the key for identifying dynamic constructs.
For example, the
.c Case
.Sr Case
and the
.c EndCase
.Sr EndCase
stars are used in the \fIcase\fR, \fIdo-while\fR, and \fIrecursion\fR
constructs, which differ from each other in the connection topology of
these CGDDF stars and CGDDFWormholes. Therefore, a user who wants to
use one of these dynamic constructs does not need to write a new star.
.pp
The \fIfor\fR construct consists of a pair of an
.c UpSample
type star and a
.c DownSample
type star, where UpSample and DownSample are not star names but
star types. A star that produces more tokens than it consumes
is called an UpSample star. In the preprocessor file of such a star, we define
a method
.c readTypeName ,
as shown below:
.(c
        method {
                name { readTypeName }
                access { public }
                type { "const char *" }
                code { return "UpSample"; }
        }
.)c
Examples of UpSample type stars are
.c Repeater
.Sr Repeater
and
.c DownCounter .
.Sr DownCounter 
The Repeater star has two inputs. One input receives a control value
specifying how many times the star repeats the value of the other input
to the output. The DownCounter star receives a positive integer from
the input and produces down-counted values to the output. The number
of tokens produced from both stars is data-dependent.
On the other hand, we can design a DownSample star that has the following
method:
.(c
        method {
                name { readTypeName }
                access { public }
                type { "const char *" }
                code { return "DownSample"; }
        }
.)c
One example of DownSample type star is
.c LastOfN .
.Sr LastOfN
The LastOfN star has two inputs, one of which is a control input
that reads the value N. The star then receives N tokens on the other input and
sends the last one received to the output. Thus, the number of
tokens consumed is data-dependent.
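.pp
The token behavior of the Repeater and LastOfN stars described above
can be modeled by two plain C++ functions. This is a behavioral sketch
only, not star code: the function names and the use of integer tokens
are assumptions made for the illustration.
.(c
```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Repeater (UpSample type): the control value count says how many
// copies of value are produced, so the output size is data-dependent.
std::vector<int> repeater(int count, int value) {
    assert(count >= 0);
    return std::vector<int>(static_cast<std::size_t>(count), value);
}

// LastOfN (DownSample type): the control value n says how many tokens
// are consumed; only the last of them is sent to the output.
int lastOfN(int n, const std::vector<int>& tokens) {
    assert(n > 0 && static_cast<std::size_t>(n) <= tokens.size());
    return tokens[static_cast<std::size_t>(n) - 1];
}
```
.)c
In this model, a Repeater with control value 3 and data value 7 produces
three tokens of value 7, and a LastOfN with control value 3 applied to the
tokens 1, 2, 3, 4 consumes the first three and outputs 3.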
.pp
As explained above, all customized CG-DDF stars will be of either
UpSample type or DownSample type. We do not expect that
a casual user will need to write a new CG-DDF star if we provide
some representative UpSample and DownSample stars.
The code generation part of this domain is not yet finished,
so the CG-DDF stars only generate comments in this release.

.H1 "Status"
.pp
In this release, we include demos showing only the scheduling result,
not the generated code. The code generation part of this domain has not
been completed yet.

.UH "References"
.ip [1]
S. Ha,
"Compile-Time Scheduling of Dataflow Program Graphs with Dynamic Constructs",
Ph.D. dissertation, U.C. Berkeley, 1992.
.EQ
delim $$
.EN

