1. Loop through all control statements under the “par” block to find the number of
   barriers needed and the number of members of each barrier.
2. Add all cells and groups needed.
3. Loop through all control statements, find the statements with the @sync
   attribute, and replace them with
Compiles @sync without the use of std_sync_reg.
Upon encountering @sync, it first instantiates a std_reg(1) for each thread (bar)
and a std_wire(1) for each barrier (s).
It then continuously assigns 1'd1 to s.in, guarded by the
expression that the bar values of all threads under the barrier are
set to 1'd1.
Then it replaces the @sync control operator with:
    seq {
      barrier;
      clear;
    }
barrier sets the value of bar to 1'd1 and then waits
for s.out to go high.
clear resets the value of bar to 1'd0 so that the barrier can be reused.
Using this method, each thread incurs only 3 cycles of latency overhead for
the barrier, and there is theoretically no limit on the number of threads
under one barrier.
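
The barrier scheme described above can be illustrated with a small cycle-by-cycle toy model. This is only a sketch under stated assumptions, not the hardware the pass generates: the names Thread, simulate, arrival and released_at are hypothetical, register writes are assumed to become visible one cycle later, and the model makes no claim about the exact cycle counts of the real design. It shows that s goes high only once every bar is set, so all threads leave the barrier in the same cycle.

```python
from dataclasses import dataclass


@dataclass
class Thread:
    arrival: int            # hypothetical cycle at which the thread reaches @sync
    bar: int = 0            # models the thread's std_reg(1)
    state: str = "run"      # run -> barrier -> clear -> done
    released_at: int = -1   # cycle in which the thread observed s high


def simulate(arrivals, max_cycles=100):
    """Toy model: return the cycle in which each thread is released."""
    threads = [Thread(a) for a in arrivals]
    for cycle in range(max_cycles):
        # s models the std_wire(1): high iff every bar register already holds 1.
        # It is sampled before any register update made in this cycle.
        s = int(all(t.bar == 1 for t in threads))
        for t in threads:
            if t.state == "run" and cycle >= t.arrival:
                t.bar = 1               # "barrier" group: set bar to 1
                t.state = "barrier"
            elif t.state == "barrier" and s:
                t.released_at = cycle   # s is high: stop waiting
                t.state = "clear"
            elif t.state == "clear":
                t.bar = 0               # "clear" group: reset bar for reuse
                t.state = "done"
        if all(t.state == "done" for t in threads):
            break
    return [t.released_at for t in threads]


released = simulate([0, 3, 1])
```

In this model no thread is released before the slowest one has arrived, and adding more threads only widens the guard on s.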
Transforms all par into seq. When the correctness-checking option is on,
uses analysis::ControlOrder to get a sequentialization of par such that
the program still computes the same value, and errors out when
there is no such sequentialization.
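
One way to picture the search for a valid sequentialization is as a topological sort over the children of a par. The sketch below is an assumption-laden illustration, not the implementation of analysis::ControlOrder: it assumes each child is summarized by the sets of cells it reads and writes, and that a child reading a cell must run before any sibling that writes it (so it still sees the old value, as it would in parallel). The function name sequentialize and the input shape are hypothetical.

```python
from graphlib import CycleError, TopologicalSorter


def sequentialize(children):
    """children maps a child name to (reads, writes), two sets of cell names.

    Returns an order in which every reader precedes the siblings that write
    what it reads, or raises ValueError when no such order exists.
    """
    # graph[b] holds the children that must run before b.
    graph = {name: set() for name in children}
    for a, (reads_a, _) in children.items():
        for b, (_, writes_b) in children.items():
            if a != b and reads_a & writes_b:
                graph[b].add(a)  # a reads what b writes: a goes first
    try:
        return list(TopologicalSorter(graph).static_order())
    except CycleError:
        raise ValueError("no sequentialization preserves the par result")


# A reads r and B writes r, so A has to come first.
order = sequentialize({"A": ({"r"}, {"x"}), "B": (set(), {"r"})})
```

When two children each read what the other writes, the dependency graph has a cycle and the sketch raises an error, mirroring the case where the pass errors out.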