µFork Tutorial

A pure actor-based concurrent machine architecture

with memory-safety and object-capability security

Reduce, Reuse, Recycle

Both code and data structures in uFork are immutable. This makes it safe to share these structures among actors. We can take advantage of commonality to reduce our memory footprint, and also to pass information efficiently between actors (without making copies).

When two or more actors need access to the same mutable state, we define a "state holder" actor for that state. Since shared data is immutable, the state-holder maintains the mutable state privately, and the sharing actors coordinate by sending messages to the state-holder. We will define a "storage cell" actor as a simple state-holder example.

Common Tail-Sequences

In our previous examples, you may have noticed that the code for many actors ends with the some of the same instructions. Almost all actors end with an end commit instruction. Many actors preceed this with an actor send instruction. A common pattern is to send to a customer provided as msg 1. These patterns are so common, they are provided in the std.asm module (usually imported with an std prefix).

; Common Tail-Sequences

cust_send:                  ; msg
    msg 1                   ; msg cust
send_msg:                   ; msg cust
    actor send              ; --
sink_beh:                   ; _ <- _
commit:
    end commit

It may come as a surprise that these instructions may not be sequential in memory. In fact, there is no way to tell if they are sequential or not! This is because uFork memory is organized into linked structures called quad-cells. Quad-cells are the minimum addressable unit. They have 4 value fields. Each value is a fixnum, a capability (actor address), or a pointer to a quad-cell. There is no conversion between these types, thus you can't determine the numeric value of an address. The code fragment above is illustrated by this diagram:

cust_send ---> [#instr_t, "msg", 1, k]
                                    |
               +--------------------+
               |
               V
send_msg ----> [#instr_t, "actor", "send", k]
                                           |
                                           |
sink_beh ------+---------------------------+
               |
               V
commit ------> [#instr_t, "end", "commit", #?]

This code fragment consists of 3 instructions and occupies 3 quad-cells. This first field of each quad designates the type of the quad, in this case #instr_t for executable instructions. The labels cust_send, send_msg, sink_beh, and commit denote pointers to these instruction quads. The final field of each quad contains a pointer to the next instruction in the code sequence (the continuation), regardless of where it resides in memory.

By default, the assembler translates a sequence of instructions into a linked-list of quad-cells. However, the continuation can always be specified explicitly by including an extra argument at the end of an instruction. For example, cust_send could be written as:

cust_send:                  ; msg
    msg 1 send_msg          ; msg cust

A more common approach is to use a ref pseudo-instruction to provide the continuation, as shown here:

cust_send:                  ; msg
    msg 1                   ; msg cust
    ref send_msg

A ref can also be used to introduce a named constant. The following code associates the label E_OK with the fixnum value 0.

E_OK:                       ; not an error
    ref 0

A ref occupies no storage. It's just a reference to another value. You should use ref std.commit instead of end commit to avoid creating redundant instruction-quads. The send_msg and cust_send labels allow sharing of longer common tail-sequences. Watch for opportunities to use them.

Static Data

We will use the ref mechanism to define a set of operation-codes, and then use the dict_t data constructor to define a dictionary of code-addresses.

read_op:
    ref 0
write_op:
    ref 1
CAS_op:
    ref 2

op_table:
    dict_t read_op read
    dict_t write_op write
    dict_t CAS_op CAS
    ref #nil

The dict_t data constructor defines statically-initialized instances of type #dict_t. It takes a key, a value, and an optional next pointer. As with instructions, if the final next pointer is missing, it defaults to the next statement in the source code. The #nil value represents the empty dictionary, which terminates the sequence. Note that the named constants (#?, #nil, #f and #t) are pointers to read-only memory.

Method Dispatch Table

We begin defining the "storage cell" behavior by using the staically-defined op_table to dispatch to the code specified by the op selector in the incoming message.

beh:
cell_beh:                   ; value <- cust,op,args
    push op_table           ; op_table
    msg 2                   ; op_table op
    dict get                ; op_code
    dup 1                   ; op_code op_code
    typeq #instr_t          ; op_code is_instr(op_code)
    if_not std.abort        ; op_code
    jump                    ; --

If the dictionary lookup produces a code-pointer, we jump to the selected method. Otherwise, the actor transaction is aborted. The message is discarded and there is no effect on the actor or the system. On some platforms, an "auditor" may be notified and the anomaly can be reported.

Read Method

The method code is executed in the same transaction, with the same message and the same private state as the method dispatch. The "read" method simply sends the current value to the cust provided in the message.

read:                       ; value <- cust,#read,_
    state 0                 ; value
    ref std.cust_send

Write Method

The "write" method updates the cell's private state with a new value' from the message, and sends its own address to the provided cust.

write:                      ; value <- cust,#write,value'
    msg -2                  ; value'
    push cell_beh           ; value' cell_beh
    actor become            ; --
    actor self              ; SELF
    ref std.cust_send

Sending the actor's address to the customer is the "sync signal" pattern. It allows the customer to continue processing, knowing that the write has been completed.

Compare-and-Swap Method

You might think that "read" and "write" are all the operations you need for a mutable storage cell. However, consider trying to do something as simple as incrementing a shared counter. This requires a read-modify-write cycle. With only a read/write API, the cell's value could change between the read of the original value and the write of the modified value. This is called a "data race". One mechanism to avoid this kind of hazard is to add a "compare-and-swap" operation to the API.

CAS:                        ; value <- cust,#CAS,old,new
    msg 3                   ; old
    state 0                 ; old value
    cmp eq                  ; old==value
    if_not read             ; --
    msg -3                  ; new
    push cell_beh           ; new cell_beh
    actor become            ; --
    ref read

The "CAS" method request message provides the expected old value and the desired new value. If old matches the current value, then the cell's private state is updated to new. In either case, "CAS" acts like "read" (literally reusing the same code), sending value to cust. Note that, by definition, this is the value before any updates.

If the "CAS" cust receives the old value they expected, they know the operation was successful. Otherwise, they can use the returned value as old and retry the operation, possibly with a recalculated new. With actors, nothing blocks and everyone makes progress, even when there is contention. The one-at-a-time transactional nature of actor event handling ensures consistency.

Usage Example

An actor with cas_add behavior uses a cell's "CAS" method to safely update the cell's value. The actor's state includes the expected old value (initialized to #?), the increment to add, and the cell holding the value. It receives the cell's updated old' value as the customer of a "CAS" operation.

cas_add:                    ; old,inc,cell <- old'
    state 1                 ; old
    msg 0                   ; old old'
    cmp eq                  ; old==old'
    if std.commit           ; --

If the old' value received matches the expected old value, the operation was successful and there is no more work to do.

    state 0                 ; old,inc,cell
    part 2                  ; cell inc old
    drop 1                  ; cell inc
    msg 0                   ; cell inc old'

Otherwise, we replace the old value with the updated old'.

    dup 2                   ; cell inc old' inc old'
    alu add                 ; cell inc old' new=inc+old'
    pick 2                  ; cell inc old' new old'
    push CAS_op             ; cell inc old' new old' #CAS
    actor self              ; cell inc old' new old' #CAS cust=SELF
    pair 3                  ; cell inc old' cust,#CAS,old',new
    state -2                ; cell inc old' cust,#CAS,old',new cell
    actor send              ; cell inc old'

Then we recalculate new=inc+old' and prepare another "CAS" request for the cell, designating ourself as the customer.

    pair 2                  ; old',inc,cell
    push cas_add            ; old',inc,cell cas_add
    actor become            ; --
    ref std.commit

Finally, we update our private state and commit the transaction.

Complete Module

The complete module is available in the uFork library at https://ufork.org/lib/cell.asm

Open in Playground

.import
    dev: "https://ufork.org/lib/dev.asm"
    std: "https://ufork.org/lib/std.asm"
    lib: "https://ufork.org/lib/lib.asm"

...

beh:
cell_beh:                   ; value <- cust,op,args
    ...

boot:                       ; _ <- {caps}
    ...

test:                       ; judge <- {caps}
    ...

.export
    beh
    read_op
    write_op
    CAS_op
    boot
    test

The module exports beh, rather than cell_beh, so when the module is imported with a namespace prefix of cell it can be referenced as cell.beh

The module also defines demonstration and testing procedures which are a subject for future tutorials.