Queues
A queue is a standard data structure that maintains a set of elements in some total order.
A new element is added to the queue using the enqueue
operation, which is also known as push
or insert
in some contexts.
Because of the total order, some element of the queue is the most favorably ranked element.
We can remove this element from the queue using the dequeue
operation, which is also known as pop
or remove
in some contexts.
We provision four types of queues in Calyx. The first three follow the same shared interface, while the fourth follows a slightly extended interface. The frontend is implemented using the Calyx builder library, and the source code is heavily commented.
Installation
- Install flit
- Install the
queues
package:
$ cd frontends/queues/
$ flit install --symlink
Shared Interface
All queues in Calyx expose the same interface.
- Input port
cmd
, a 2-bit integer. Selects the operation to perform:0
:pop
.1
:push
.
- Input port
value
, a 32-bit integer. The value to push. - Register
ans
, a 32-bit integer that is passed to the queue by reference. Ifpop
is selected, the queue writes the result to this register. - Register
err
, a 1-bit integer that is passed to the queue by reference. The queue raises this flag in case of overflow or underflow.
Shared Testing Harness
Because they expose a common interface, we can test all queues using the same harness.
Data Generation
First, we have a Python module that generates randomized sequences of operations and values, and dumps the sequences out in a Calyx .data
file.
It accepts as a command line argument the number of operations to generate.
It also takes a flag, --no-err
, that generates a special sequence of operations that does not trigger any overflow or underflow errors.
If this flag is provided, the queue's length must also be provided.
The Python code is in gen_oracle_data.py
.
Oracles
Next, we have a Python module for each kind of queue that reads the .data
file, simulates the queue in Python, and dumps the expected result out in a Calyx .expect
file.
This Python code is aware of our shared interface.
The oracles are in fifo_oracle.py
, pifo_oracle.py
, pifo_tree_oracle.py
, and binheap_oracle.py
.
They all appeal to pure-Python implementations of the queues, which are found in queues.py
.
Each oracle also requires, as command line arguments, the number of operations being simulated and the queue's length.
Queue Call
The steps above lay out .data
and .expect
files for our Calyx code.
To actually pass a series of commands to a given eDSL queue implementation, we need to call the queue repeatedly with memories parsed from the .data
file.
Unlike the above, which happens at the pure Python level, this needs to happen at the eDSL level.
This is exactly what queue_call.py
does.
It accepts as a Python-level argument a handle to the queue component that is to be tested.
It inserts a main
component that reads the .data
file and calls the queue component repeatedly.
If a command-line flag, --keepgoing
, is provided, the main
component will ignore any overflow or underflow errors raised by the queue component and will complete the entire sequence of operations.
If not, the main
component will stop after the queue component first raises an error.
Putting It All Together
The testing harness can be executed, for all our queues, by running the shell script gen_test_data.sh
.
This generates the .data
and .expect
files for each queue.
Then, each queue can be tested using our runt
setup.
The queue-generating Python files themselves expect command-line arguments (the number of operations and the optional --keepgoing
flag) and you can see these being passed in the relevant runt
stanza.
FIFO
The most basic queue is the first in, first out (FIFO) queue: the most favorably ranked element is just the one that was added to the queue first.
Our queue frontend generates a simple FIFO in Calyx. The source code is available here.
Internally, our FIFO queue is implemented as a circular buffer. One register marks the next element that would be popped, another marks the next cell into which an element would be pushed, and a third marks the number of elements in the queue. The control logic is straightforward, and proceeds in three parallel arms based on the operation requested.
Using a circular buffer usually entails incrementing indices modulo the buffer size.
We use a trick to avoid this: we require the FIFO's length to be a power of two, say 2^k
, and we use adders of width k
to increment the indices.
This means we can just naively increment the indices forever and the wrap-around behavior we want is automatically provided by overflow.
Specialized PIFOs
A more complex instance is the priority queue. At time of enqueue, an element is associated with a priority. The most favorably ranked element is the one with the highest priority. Two elements may be pushed with the same priority; a priority queue that is additionally defined to break such ties in FIFO order is called a push in, first out (PIFO) queue.
We provide PIFOs in the general sense (i.e., queues that accept (item, rank)
pairs and enqueue based on rank
) shortly. For now, let's focus on specialized PIFOs that have a policy "baked in" to the queue itself.
We have two types of specialized PIFOs - Round Robin and Strict - that implement policies which determine which flow to pop from next. These PIFOs are parameterized over the number of flows, n
, that they can arbitrate between.
Round-Robin Queues
Round robin queues are PIFOs generalized to n
flows that operate in a work
conserving round-robin fashion. That is, if a flow is silent when it is its turn, that flow
simply skips its turn and the next flow is offered service.
Internally, it operates n
subqueues.
It takes in a list boundaries
that must be of length n
, using which the
client can divide the incoming traffic into n
flows.
For example, if n = 3
and the client passes boundaries [133, 266, 400]
,
packets will be divided into three flows according to the intervals: [0, 133]
, [134, 266]
, [267, 400]
.
- At
push
, we check theboundaries
list to determine which flow to push to. Take the boundaries example given earlier,[133, 266, 400]
. If we push the value89
, it will, under the hood, be pushed into subqueue 0 because0 <= 89 <= 133
, and305
will be pushed into subqueue 2 since266 <= 305 <= 400
. - The program maintains a
hot
pointer that starts off at 0, meaning the next subqueue to pop from is queue 0. Atpop
we first try to pop fromhot
. If this succeeds, great. If it fails, we incrementhot
and therefore continue to check all other flows in round robin fashion.
The source code is available in strict_or_rr.py
, which takes as arguments n
, boundaries
, and handles to the subqueues it must administer. It also takes a boolean parameter round_robin
, which, if true
, results in the generation of a round-robin queue.
Strict Queues
Strict queues support n
flows as well, but instead, flows have a strict order of priority and this which determines popping
order. That is, the second-highest priority subqueue will only be allowed to pop if the highest priority subqueue is empty.
If the higher-priority flow get pushed to in the interim, the next call to pop
will again try to pop from the highest priority flow.
Like round-robin queues, it takes in a list boundaries
that must be of length n
, which divide the incoming traffic into n
flows.
For example, if n = 3
and the client passes boundaries [133, 266, 400]
,
packets will be divided into three flows according to the intervals: [0, 133]
, [134, 266]
, [267, 400]
.
It takes a list order
that must be of length n
, which specifies the order
of priority of the flows. For example, if n = 3
and the client passes order
[1, 2, 0]
, then flow 1 (packets in range [134, 266]
) has first priority, flow 2
(packets in range [267, 400]
) has second priority, and flow 0 (packets in range
[0, 133]
) has last priority.
- At push, we check the
boundaries
list to determine which flow to push to. Take the boundaries example given earlier,[133, 266, 400]
. If we push the value89
, it will, under the hood, be pushed into subqueue 0 because0 <= 89 <= 133
, and305
will be pushed into subqueue 2 since266 <= 305 <= 400
. - Pop first tries to pop from
order[0]
. If this succeeds, great. If it fails, it triesorder[1]
, and so on.
The source code is available in [gen_strict_or_rr.py
][gen_strict_or_rr.py], which takes as arguments n
, boundaries
, order
, and handles to the subqueues it must administer. It also takes a boolean parameter round_robin
, which, if false
, results in the generation of a strict queue.
PIFO Tree
A PIFO tree is a tree-shaped data structure in which each node is associated with a PIFO. The PIFO tree stores elements in its leaf PIFOs and scheduling metadata in its internal nodes.
Pushing a new element into a PIFO tree involves putting the element into a leaf PIFO and putting additional metadata into internal nodes from the root to the leaf. A variety of scheduling policies can be realized by manipulating the various priorities with which this data is inserted into each PIFO. Popping the most favorably ranked element from a PIFO tree is relatively straightforward: popping the root PIFO tells us which child PIFO to pop from next, and we recurse until we reach a leaf PIFO. We refer interested readers to this research paper for more details on PIFO trees.
Our frontend allows for the creation of PIFO trees of any height, number of children, and with the scheduling policy at each internal node being round-robin or strict.
See the source code for an example where we create a PIFO tree of height 2. Specifically, the example implements the PIFO tree described in ยง2 of this research paper.
Internally, our PIFO tree is implemented by leveraging the PIFO frontend. The PIFO frontend seeks to orchestrate two queues, which in the simple case will just be two FIFOs. However, it is easy to generalize those two queues: instead of being FIFOs, they can be PIFOs or even PIFO trees.
We see a more complex example of a PIFO tree in [complex_tree_test.py
] complex_tree_test.py. This tree does round robin between three children, two of which are strict queues and the other is a round robin queue. This tree has a height of 3. The overall structure is rr(strict(A, B, C), rr(D, E, F), strict(G, H))
.
Minimum Binary Heap
A minimum binary heap is another tree-shaped data structure where each node has at most two children.
However, unlike the queues discussed above, a heap exposes an extended interface:
in addition to the input ports and reference registers discussed above, a heap has an additional input rank
.
The push
operation now accepts both a value
and the rank
that the user wishes to associate with that value.
Consequently, a heap orders its elements by rank
, with the pop
operation set to remove the element with minimal rank.
To maintain this ordering efficiently, a heap stores (rank, value)
pairs in each node and takes special care to maintain the following invariant:
Min-Heap Property: for any given node
C
, ifP
is a parent ofC
, then the rank ofP
is less than or equal to the rank ofC
.
To push
or pop
an element is easy at the top level: write to or read from the correct node, and then massage the tree to restore the Min-Heap Property.
The push
and pop
operations are logarithmic in the size of the heap.
Our frontend allows for the creation of minimum binary heaps in Calyx; the source code is available in binheap.py
.
One quirk of any minimum binary heap is its ambiguous behavior in the case of rank ties.
More specifically, if the value a
is pushed with some rank, and then later value b
is pushed with the same rank, it's unclear which will be popped first.
Often, it's desirable to break such ties in FIFO order: that is we'd like a guarantee that a
will be popped first.
A binary heap that provides this guarantee is called a stable binary heap, and our frontend provides a thin layer over our heap that enforces this property.
Our stable_binheap
is a heap accepting 32-bit ranks and values.
It uses a counter i
and instantiates, in turn, a binary heap that accepts 64-bit ranks and 32-bit values.
- To push a pair
(r, v)
intostable_binheap
, we craft a new 64-bit rank that incorporates the counteri
(specifically, we computer << 32 + i
), and we pushv
into our underlying binary heap with this new 64-bit rank. We also increment the counteri
. - To pop
stable_binheap
, we pop the underlying binary heap.
The source code is available in stable_binheap.py
.