.\" @(#)demoFor1	1.2	12/11/92
.NA demoFor1
.SD
Demonstration of the quasi-static scheduler for a graph with a "for" construct.
.DE
.SV 1.1 2/6/92
.AL "S. Ha"
.LO "~ptolemy/src/domains/cg-ddf/demo"
.LD
.Ir "for construct"
.Ir "quasi-static scheduler"
This demo displays the schedule that results from applying a quasi-static
scheduling technique to a program graph with a \fIfor\fR construct.
A \fIfor\fR construct, which resides in a CGWormhole, consists of a
.c Repeater
star at the front, a
.c LastOfN
star at the back, and a wormhole that represents the loop body.
The number of iteration cycles of the \fIfor\fR loop depends on
the control input to the
.c Repeater 
star. 
By setting the target parameters of the CG-DDF domain inside
the CGWormhole (the green block at the top level), we indicate that
it is a \fIfor\fR construct (
.c constructType
= for ) and that the number of iteration cycles is uniformly
distributed (
.c paramType
= uniform ) from 1 to 15 (
.c paramMin
= 1,
.c paramMax
= 15 ). Since the
.c fixedNum
state is 0, the scheduler decides how many processors to
assign to the \fIfor\fR construct.
.pp
The CG-DDF galaxy contains a wormhole of the CG domain
that represents the body of the loop.
The outer CG domain and this inner CG domain share
the same child targets because the
.c inheritProcessors
target parameter is set to "YES" in the inner target. The idea behind
quasi-static scheduling is to fix the assignment and the execution order
of stars in the CG targets at compile time.  At the top level, the
\fIfor\fR construct is regarded as an atomic block. The blocks inside the
loop body are also scheduled (assigned and ordered)
at compile time. At run time,
the loop body is executed as many times as the control value
fed to the
.c Repeater
star specifies.
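.pp
The split between compile-time and run-time decisions can be sketched
as follows. This is only an illustration in Python, not Ptolemy code:
the function names are hypothetical, and the round-robin assignment
stands in for whatever assignment the real scheduler computes.
.nf
```python
# Hypothetical sketch; none of these names come from Ptolemy.

def build_schedule(body_stars, processors):
    """Compile time: fix the processor and the order of every star."""
    schedule = {p: [] for p in range(processors)}
    for index, star in enumerate(body_stars):
        # Round-robin assignment, assumed purely for illustration.
        schedule[index % processors].append(star)
    return schedule

def run_for_construct(schedule, control_value):
    """Run time: repeat the fixed loop-body schedule control_value times."""
    trace = []
    for cycle in range(control_value):
        for processor, stars in sorted(schedule.items()):
            for star in stars:
                trace.append((cycle, processor, star))
    return trace
```
.fi
Only the repetition count passed to the hypothetical
run_for_construct is unknown until run time; the assignment and
ordering returned by build_schedule never change.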
.pp
The core of the scheduling technique is to determine the assumed
execution profile of the \fIfor\fR construct at compile time,
based on the distribution of the number of iteration cycles.
The displayed Gantt chart shows the
.Ir "profile, CG-DDF"
estimated profile of the construct. The construct is assigned to all four
processors.  Each iteration of the loop body is executed on a single
processor, and four iterations run in parallel since the loop body has no
intercycle dependency. The assumed execution times differ among
the assigned processors
because the assumed number of iteration cycles is 3 (or 7) in this case,
which cannot be divided evenly among four processors.
Note that we pipeline the program graph by manually inserting delays
on a cutset to increase the available parallelism.
In this release, code generation does not yet work properly.
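.pp
To see why the assumed execution times differ among the processors,
the following Python sketch divides an assumed number of independent
iterations among the processors as evenly as possible. The function
names and the body_time parameter are hypothetical, and the even split
is an assumption made for illustration; the actual scheduler may
divide the work differently.
.nf
```python
# Hypothetical sketch; an even split of iterations is assumed.

def iterations_per_processor(assumed_iterations, processors):
    """Spread independent iterations as evenly as possible."""
    base, extra = divmod(assumed_iterations, processors)
    # The first "extra" processors each take one additional iteration.
    return [base + 1 if p < extra else base for p in range(processors)]

def assumed_profile(assumed_iterations, processors, body_time):
    """Busy time per processor when one iteration costs body_time."""
    counts = iterations_per_processor(assumed_iterations, processors)
    return [count * body_time for count in counts]
```
.fi
With four processors, 3 assumed iterations give the counts
[1, 1, 1, 0] and 7 assumed iterations give [2, 2, 2, 1], so in both
cases at least one processor is busy for less time than the others.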
.ES

