# Queues

A queue is a standard data structure that maintains a set of elements in some total order.
A new element is added to the queue using the `enqueue`

operation, which is also known as `push`

or `insert`

in some contexts.
Because of the total order, some element of the queue is the *most favorably ranked* element.
We can remove this element from the queue using the `dequeue`

operation, which is also known as `pop`

or `remove`

in some contexts.

We provision four types of queues in Calyx. The first three follow the same shared interface, while the fourth follows a slightly extended interface. The frontend is implemented using the Calyx builder library, and the source code is heavily commented.

## Installation

- Install flit
- Install the
`queues`

package:

```
$ cd frontends/queues/
$ flit install --symlink
```

## Shared Interface

All queues in Calyx expose the same interface.

- Input port
`cmd`

, a 2-bit integer. Selects the operation to perform:`0`

:`pop`

.`1`

:`push`

.

- Input port
`value`

, a 32-bit integer. The value to push. - Register
`ans`

, a 32-bit integer that is passed to the queue by reference. If`pop`

is selected, the queue writes the result to this register. - Register
`err`

, a 1-bit integer that is passed to the queue by reference. The queue raises this flag in case of overflow or underflow.

## Shared Testing Harness

Because they expose a common interface, we can test all queues using the same harness.

#### Data Generation

First, we have a Python module that generates randomized sequences of operations and values, and dumps the sequences out in a Calyx `.data`

file.
It accepts as a command line argument the number of operations to generate.
It also takes a flag, `--no-err`

, that generates a special sequence of operations that does not trigger any overflow or underflow errors.
If this flag is provided, the queue's length must also be provided.
The Python code is in `gen_oracle_data.py`

.

#### Oracles

Next, we have a Python module *for each kind of queue* that reads the `.data`

file, simulates the queue in Python, and dumps the expected result out in a Calyx `.expect`

file.
This Python code is aware of our shared interface.
The oracles are in `fifo_oracle.py`

, `pifo_oracle.py`

, `pifo_tree_oracle.py`

, and `binheap_oracle.py`

.
They all appeal to pure-Python implementations of the queues, which are found in `queues.py`

.
Each oracle also requires, as command line arguments, the number of operations being simulated and the queue's length.

#### Queue Call

The steps above lay out `.data`

and `.expect`

files for our Calyx code.
To actually pass a series of commands to a given eDSL queue implementation, we need to call the queue repeatedly with memories parsed from the `.data`

file.
Unlike the above, which happens at the pure Python level, this needs to happen at the eDSL level.

This is exactly what `queue_call.py`

does.
It accepts as a Python-level argument a handle to the queue component that is to be tested.
It inserts a `main`

component that reads the `.data`

file and calls the queue component repeatedly.
If a command-line flag, `--keepgoing`

, is provided, the `main`

component will ignore any overflow or underflow errors raised by the queue component and will complete the entire sequence of operations.
If not, the `main`

component will stop after the queue component first raises an error.

#### Putting It All Together

The testing harness can be executed, for all our queues, by running the shell script `gen_test_data.sh`

.
This generates the `.data`

and `.expect`

files for each queue.

Then, each queue can be tested using our `runt`

setup.
The queue-generating Python files themselves expect command-line arguments (the number of operations and the optional `--keepgoing`

flag) and you can see these being passed in the relevant `runt`

stanza.

## FIFO

The most basic queue is the *first in, first out* (FIFO) queue: the most favorably ranked element is just the one that was added to the queue first.

Our queue frontend generates a simple FIFO in Calyx. The source code is available here.

Internally, our FIFO queue is implemented as a circular buffer. One register marks the next element that would be popped, another marks the next cell into which an element would be pushed, and a third marks the number of elements in the queue. The control logic is straightforward, and proceeds in three parallel arms based on the operation requested.

Using a circular buffer usually entails incrementing indices modulo the buffer size.
We use a trick to avoid this: we require the FIFO's length to be a power of two, say `2^k`

, and we use adders of width `k`

to increment the indices.
This means we can just naively increment the indices forever and the wrap-around behavior we want is automatically provided by overflow.

## Specialized PIFOs

A more complex instance is the priority queue.
At time of enqueue, an element is associated with a priority.
The most favorably ranked element is the one with the highest priority.
Two elements may be pushed with the same priority; a priority queue that is additionally defined to break such ties in FIFO order is called a *push in, first out* (PIFO) queue.

We provide PIFOs in the general sense (i.e., queues that accept `(item, rank)`

pairs and enqueue based on `rank`

) shortly. For now, let's focus on specialized PIFOs that have a policy "baked in" to the queue itself.

We have two types of specialized PIFOs - Round Robin and Strict - that implement policies which determine which flow to pop from next. These PIFOs are parameterized over the number of flows, `n`

, that they can arbitrate between.

### Round-Robin Queues

Round robin queues are PIFOs generalized to `n`

flows that operate in a work
conserving round-robin fashion. That is, if a flow is silent when it is its turn, that flow
simply skips its turn and the next flow is offered service.

Internally, it operates `n`

subqueues.
It takes in a list `boundaries`

that must be of length `n`

, using which the
client can divide the incoming traffic into `n`

flows.
For example, if `n = 3`

and the client passes boundaries `[133, 266, 400]`

,
packets will be divided into three flows according to the intervals: `[0, 133]`

, `[134, 266]`

, `[267, 400]`

.

- At
`push`

, we check the`boundaries`

list to determine which flow to push to. Take the boundaries example given earlier,`[133, 266, 400]`

. If we push the value`89`

, it will, under the hood, be pushed into subqueue 0 because`0 <= 89 <= 133`

, and`305`

will be pushed into subqueue 2 since`266 <= 305 <= 400`

. - The program maintains a
`hot`

pointer that starts off at 0, meaning the next subqueue to pop from is queue 0. At`pop`

we first try to pop from`hot`

. If this succeeds, great. If it fails, we increment`hot`

and therefore continue to check all other flows in round robin fashion.

The source code is available in `strict_or_rr.py`

, which takes as arguments `n`

, `boundaries`

, and handles to the subqueues it must administer. It also takes a boolean parameter `round_robin`

, which, if `true`

, results in the generation of a round-robin queue.

### Strict Queues

Strict queues support `n`

flows as well, but instead, flows have a strict order of priority and this which determines popping
order. That is, the second-highest priority subqueue will only be allowed to pop if the highest priority subqueue is empty.
If the higher-priority flow get pushed to in the interim, the next call to `pop`

will again try to pop from the highest priority flow.

Like round-robin queues, it takes in a list `boundaries`

that must be of length `n`

, which divide the incoming traffic into `n`

flows.
For example, if `n = 3`

and the client passes boundaries `[133, 266, 400]`

,
packets will be divided into three flows according to the intervals: `[0, 133]`

, `[134, 266]`

, `[267, 400]`

.

It takes a list `order`

that must be of length `n`

, which specifies the order
of priority of the flows. For example, if `n = 3`

and the client passes order
`[1, 2, 0]`

, then flow 1 (packets in range `[134, 266]`

) has first priority, flow 2
(packets in range `[267, 400]`

) has second priority, and flow 0 (packets in range
`[0, 133]`

) has last priority.

- At push, we check the
`boundaries`

list to determine which flow to push to. Take the boundaries example given earlier,`[133, 266, 400]`

. If we push the value`89`

, it will, under the hood, be pushed into subqueue 0 because`0 <= 89 <= 133`

, and`305`

will be pushed into subqueue 2 since`266 <= 305 <= 400`

. - Pop first tries to pop from
`order[0]`

. If this succeeds, great. If it fails, it tries`order[1]`

, and so on.

The source code is available in [`gen_strict_or_rr.py`

][gen_strict_or_rr.py], which takes as arguments `n`

, `boundaries`

, `order`

, and handles to the subqueues it must administer. It also takes a boolean parameter `round_robin`

, which, if `false`

, results in the generation of a strict queue.

## PIFO Tree

A PIFO tree is a tree-shaped data structure in which each node is associated with a PIFO. The PIFO tree stores elements in its leaf PIFOs and scheduling metadata in its internal nodes.

Pushing a new element into a PIFO tree involves putting the element into a leaf PIFO and putting additional metadata into internal nodes from the root to the leaf. A variety of scheduling policies can be realized by manipulating the various priorities with which this data is inserted into each PIFO. Popping the most favorably ranked element from a PIFO tree is relatively straightforward: popping the root PIFO tells us which child PIFO to pop from next, and we recurse until we reach a leaf PIFO. We refer interested readers to this research paper for more details on PIFO trees.

Our frontend allows for the creation of PIFO trees of any height, number of children, and with
the scheduling policy at each internal node being *round-robin* or *strict*.

See the source code for an example where we create a PIFO tree of height 2. Specifically, the example implements the PIFO tree described in ยง2 of this research paper.

Internally, our PIFO tree is implemented by leveraging the PIFO frontend. The PIFO frontend seeks to orchestrate two queues, which in the simple case will just be two FIFOs. However, it is easy to generalize those two queues: instead of being FIFOs, they can be PIFOs or even PIFO trees.

We see a more complex example of a PIFO tree in [`complex_tree_test.py`

] complex_tree_test.py. This tree does round robin between three children, two of which are strict queues and the other is a round robin queue. This tree has a height of 3. The overall structure is `rr(strict(A, B, C), rr(D, E, F), strict(G, H))`

.

## Minimum Binary Heap

A minimum binary heap is another tree-shaped data structure where each node has at most two children.
However, unlike the queues discussed above, a heap exposes an extended interface:
in addition to the input ports and reference registers discussed above, a heap has an additional input `rank`

.
The `push`

operation now accepts both a `value`

and the `rank`

that the user wishes to associate with that value.
Consequently, a heap *orders* its elements by `rank`

, with the `pop`

operation set to remove the element with minimal rank.

To maintain this ordering efficiently, a heap stores `(rank, value)`

pairs in each node and takes special care to maintain the following invariant:

Min-Heap Property: for any given node`C`

, if`P`

is a parent of`C`

, then the rank of`P`

is less than or equal to the rank of`C`

.

To `push`

or `pop`

an element is easy at the top level: write to or read from the correct node, and then massage the tree to restore the Min-Heap Property.
The `push`

and `pop`

operations are logarithmic in the size of the heap.

Our frontend allows for the creation of minimum binary heaps in Calyx; the source code is available in `binheap.py`

.

One quirk of any minimum binary heap is its ambiguous behavior in the case of rank ties.
More specifically, if the value `a`

is pushed with some rank, and then later value `b`

is pushed with the same rank, it's unclear which will be popped first.
Often, it's desirable to break such ties in FIFO order: that is we'd like a guarantee that `a`

will be popped first.
A binary heap that provides this guarantee is called a *stable binary heap*, and our frontend provides a thin layer over our heap that enforces this property.

Our `stable_binheap`

is a heap accepting 32-bit ranks and values.
It uses a counter `i`

and instantiates, in turn, a binary heap that accepts 64-bit ranks and 32-bit values.

- To push a pair
`(r, v)`

into`stable_binheap`

, we craft a new 64-bit rank that incorporates the counter`i`

(specifically, we compute`r << 32 + i`

), and we push`v`

into our underlying binary heap with this new 64-bit rank. We also increment the counter`i`

. - To pop
`stable_binheap`

, we pop the underlying binary heap.

The source code is available in `stable_binheap.py`

.