Passing Cells by Reference

One question that may arise when using Calyx as a backend is how to pass a cell "by reference" between components. In C++, this might look like:

#include <array>
#include <cstdint>

// Adds one to the first element in `v`.
void add_one(std::array<uint32_t, 1>& v) {
  v[0] = v[0] + 1;
}

int main() {
  std::array<uint32_t, 1> x = { 0 };
  add_one(x); // The value at x[0] is now 1.
}

In Calyx, there are two steps to passing a cell by reference:

  1. Define the component in a manner such that it can accept a cell by reference.
  2. Pass the desired cell by reference.

When we say cell, we mean any cell, including memories of various dimensions and registers.

The language provides two ways of doing this.

The Easy Way: ref Cells

Calyx uses the ref keyword to describe cells that are passed by reference:

component add_one() -> () {
  cells {
    ref mem = comb_mem_d1(32, 4, 3); // A memory passed by reference.
    ...
  }
  ...
}

This component defines mem as a memory that is passed by reference to the component. Inside the component, we can use the cell as usual.

Next, to pass the memory to the component, we use the invoke syntax:

component add_one() -> () { ... }
component main() -> () {
  cells {
    A = comb_mem_d1(32, 4, 3); // A memory passed by reference.
    one = add_one();
    ...
  }
  wires { ... }
  control {
    invoke one[mem = A]()(); // pass A as the `mem` for this invocation.
  }
}

The Calyx compiler will correctly lower the add_one component and the invoke call such that the memory is passed by reference. In fact, any cell can be passed by reference in a Calyx program. Read the next section if you're curious about how this process is implemented.

Multiple memories, multiple components

To understand the power of ref cells, let us work through an example. We will study a relatively simple arbitration logic: the invoker has six memories of size 4 each, but needs to pretend, sometimes simulatenously, that:

  1. They are actually two memories of size 12 each.
  2. They are actually three memories of size 8 each.

We will do up two components that are designed to receive memories by reference:

component wrap2(i: 32, j: 32) -> () {
  cells {
    // Six memories that will be passed by reference.
    ref mem1 = comb_mem_d1(32, 4, 32);
    // ...
    ref mem6 = comb_mem_d1(32, 4, 32);
    // An answer cell, also passed by reference.
    ref ans = comb_mem_d1(32, 1, 32);
  }
  wires { ... }
  control { ... }
}

and

component wrap3(i: 32, j: 32) -> () {
  cells {
    // Six memories that will be passed by reference.
    ref mem1 = comb_mem_d1(32, 4, 32);
    // ...
    ref mem6 = comb_mem_d1(32, 4, 32);
    // An answer cell, also passed by reference.
    ref ans = comb_mem_d1(32, 1, 32);
  }
  wires { ... }
  control { ... }
}

That is, they have the same signature including input ports, output ports, and ref cells. We have elided the logic, but feel free to explore the source code in Python. You can also generate the Calyx code by running python calyx-py/test/correctness/arbiter_6.py.

Now the invoker has six locally defined memories. By passing these memories to the components above, the invoker is able to wrap the same six memories two different ways, and then maintain two different fictional indexing systems at the same time.

component main() -> () {
  cells {
    // Six memories that will pass by reference.
    @external A = comb_mem_d1(32, 4, 32);
    //...
    @external F = comb_mem_d1(32, 4, 32);

    // Two answer cells that we will also pass.
    @external out2 = comb_mem_d1(32, 1, 32);
    @external out3 = comb_mem_d1(32, 1, 32);

    // Preparing to invoke the components above.
    together2 = wrap2();
    together3 = wrap3();
  }

  wires {
  }

  control {
    seq {
      invoke together2[mem1=A, mem2=B, mem3=C, mem4=D, mem5=E, mem6=F, ans=out2](i=32'd1, j=32'd11)();
      invoke together3[mem1=A, mem2=B, mem3=C, mem4=D, mem5=E, mem6=F, ans=out3](i=32'd2, j=32'd7)();
    }
  }
}

Observe: when "wrapped" into two chunks, \( 0 \le i < 2 \) and \( 0 \le j < 12 \); when wrapped into three chunks, \( 0 \le i < 3 \) and \( 0 \le j < 8 \).

The Hard Way: Without ref Cells

Proceed with caution. We recommend using the ref syntax in almost all cases since it enables the compiler to perform more optimizations.

If we wish not to use ref cells, we can leverage the usual input and output ports to establish a call-by-reference-esque relationship between the calling and called components. In fact, the Calyx compiler takes ref cells as descibed above and lowers them into code of the style described here.

Let us walk through an example.

Worked example: mem_cpy

In the C++ code above, we've constructed an "l-value reference" to the array, which essentially means we can both read and write from x in the function add_one.

Now, let's allow similar functionality at the Calyx IR level. We define a new component named add_one which represents the function above. However, we also need to include the correct ports to both read and write to x:

Read from xWrite to x
read_datadone
address portswrite_data
write_en
address ports

Since we're both reading and writing from x, we'll include the union of the columns above:

component add_one(x_done: 1, x_read_data: 32) ->
                 (x_write_data: 32, x_write_en: 1, x_addr0: 1) {

One tricky thing to note is where the ports belong, i.e. should it be an input port or an output port of the component? The way to reason about this is to ask whether we want to receive signal from or send signal to the given wire. For example, with read_data, we will always be receiving signal from it, so it should be an input port. Conversely, address ports are used to mark where in memory we want to access, so those are used as output ports.

We then simply use the given ports to both read and write to the memory passed by reference. Note that we've split up the read and write to memory x in separate groups, to ensure we can schedule them sequentially in the execution flow. We're also using the exposed ports of the memory through the component interface rather than, say, x.write_data.

    group read_from_x {
      x_addr0 = 1'd0;            // Set address port to zero.
      tmp_reg.in = x_read_data;  // Read the value at address zero.
      tmp_reg.write_en = 1'd1;
      read_from_x[done] = tmp_reg.done;
    }
    group write_to_x {
      x_addr0 = 1'd0;            // Set address port to zero.
      add.left = one.out;
      add.right = tmp_reg.out;   // Saved value from previous read.

      x_write_data = add.out;    // Write value to address zero.
      x_write_en = 1'd1;         // Set write enable signal to high.

      write_to_x[done] = x_done; // The group is done when the write is complete.
    }

Bringing everything back together, the add_one component is written accordingly:

component add_one(x_done: 1, x_read_data: 32) ->
                 (x_write_data: 32, x_write_en: 1, x_addr0: 1) {
  cells {
    one = std_const(32, 1);
    add = std_add(32);
    tmp_reg = std_reg(32);
  }
  wires {
    group read_from_x {
      x_addr0 = 1'd0;            // Set address port to zero.
      tmp_reg.in = x_read_data;  // Read the value at address zero.
      tmp_reg.write_en = 1'd1;
      read_from_x[done] = tmp_reg.done;
    }
    group write_to_x {
      x_addr0 = 1'd0;            // Set address port to zero.
      add.left = one.out;
      add.right = tmp_reg.out;   // Saved value from previous read.

      x_write_data = add.out;    // Write value to address zero.
      x_write_en = 1'd1;         // Set write enable signal to high.

      write_to_x[done] = x_done; // The group is done when the write is complete.
    }
  }
  control {
    seq { read_from_x; write_to_x; }
  }
}

The final step is creating a main component from which the original component will be invoked. In this step, it is important to hook up the proper wires in the call to invoke to the corresponding memory you'd like to read and/or write to:

  control {
    invoke add_one0(x_done = x.done, x_read_data = x.read_data)
                   (x_write_data = x.write_data, x_write_en = x.write_en, x_addr0 = x.addr0);
  }

This gives us the main component:

component main() -> () {
  cells {
    add_one0 = add_one();
    @external(1) x = comb_mem_d1(32, 1, 1);
  }
  wires {
  }
  control {
    invoke add_one0(x_done = x.done, x_read_data = x.read_data)
                   (x_write_data = x.write_data, x_write_en = x.write_en, x_addr0 = x.addr0);
  }
}

To see this example simulated, run the command:

fud e examples/futil/memory-by-reference/memory-by-reference.futil --to dat \
-s verilog.data examples/futil/memory-by-reference/memory-by-reference.futil.data

Multi-dimensional Memories

Not much changes for multi-dimensional arrays. The only additional step is adding the corresponding address ports. For example, a 2-dimensional memory will require address ports addr0 and addr1. More generally, an N-dimensional memory will require address ports addr0, ..., addr(N-1).

Multiple Memories

Similarly, multiple memories will just require the ports to be passed for each of the given memories. Here is an example of a memory copy (referred to as mem_cpy in the C language), with 1-dimensional memories of size 5:

import "primitives/core.futil";
import "primitives/memories/comb.futil";
component copy(dest_done: 1, src_read_data: 32, length: 3) ->
              (dest_write_data: 32, dest_write_en: 1, dest_addr0: 3, src_addr0: 3) {
  cells {
    lt = std_lt(3);
    N = std_reg(3);
    add = std_add(3);
  }
  wires {
    comb group cond {
      lt.left = N.out;
      lt.right = length;
    }
    group upd_index<"static"=1> {
      add.left = N.out;
      add.right = 3'd1;
      N.in = add.out;
      N.write_en = 1'd1;
      upd_index[done] = N.done;
    }
    group copy_index_N<"static"=1> {
      src_addr0 = N.out;
      dest_addr0 = N.out;
      dest_write_en = 1'd1;
      dest_write_data = src_read_data;
      copy_index_N[done] = dest_done;
    }
  }
  control {
    while lt.out with cond {
      seq {
        copy_index_N;
        upd_index;
      }
    }
  }
}

component main() -> () {
  cells {
    @external(1) d = comb_mem_d1(32,5,3);
    @external(1) s = comb_mem_d1(32,5,3);
    length = std_const(3, 5);
    copy0 = copy();
  }
  wires {
  }
  control {
    seq {
      invoke copy0(dest_done=d.done, src_read_data=s.read_data, length=length.out)
                  (dest_write_data=d.write_data, dest_write_en=d.write_en, dest_addr0=d.addr0, src_addr0=s.addr0);
    }
  }
}