µFork Tutorial
A pure actor-based concurrent machine architecture
with memory-safety and object-capability security
Reduce, Reuse, Recycle
Both code and data structures in uFork are immutable. This makes it safe to share these structures among actors. We can take advantage of commonality to reduce our memory footprint, and also to pass information efficiently between actors (without making copies).
When two or more actors need access to the same mutable state, we define a "state holder" actor for that state. Since shared data is immutable, the state-holder maintains the mutable state privately, and the sharing actors coordinate by sending messages to the state-holder. We will define a "storage cell" actor as a simple state-holder example.
Common Tail-Sequences
In our previous examples,
you may have noticed
that the code for many actors
ends with the some of the same instructions.
Almost all actors end with
an end commit
instruction.
Many actors preceed this with
an actor send
instruction.
A common pattern is to send to a customer
provided as msg 1
.
These patterns are so common,
they are provided in the
std.asm
module (usually imported with an std
prefix).
; Common Tail-Sequences cust_send: ; msg msg 1 ; msg cust send_msg: ; msg cust actor send ; -- sink_beh: ; _ <- _ commit: end commit
It may come as a surprise that these instructions may not be sequential in memory. In fact, there is no way to tell if they are sequential or not! This is because uFork memory is organized into linked structures called quad-cells. Quad-cells are the minimum addressable unit. They have 4 value fields. Each value is a fixnum, a capability (actor address), or a pointer to a quad-cell. There is no conversion between these types, thus you can't determine the numeric value of an address. The code fragment above is illustrated by this diagram:
cust_send ---> [#instr_t, "msg", 1, k] | +--------------------+ | V send_msg ----> [#instr_t, "actor", "send", k] | | sink_beh ------+---------------------------+ | V commit ------> [#instr_t, "end", "commit", #?]
This code fragment consists of 3 instructions
and occupies 3 quad-cells.
This first field of each quad
designates the type of the quad,
in this case #instr_t
for executable instructions.
The labels cust_send
, send_msg
,
sink_beh
, and commit
denote pointers to these instruction quads.
The final field of each quad
contains a pointer to the next instruction
in the code sequence (the continuation),
regardless of where it resides in memory.
By default,
the assembler translates a sequence of instructions
into a linked-list of quad-cells.
However, the continuation
can always be specified explicitly
by including an extra argument
at the end of an instruction.
For example,
cust_send
could be written as:
cust_send: ; msg msg 1 send_msg ; msg cust
A more common approach is
to use a ref
pseudo-instruction
to provide the continuation,
as shown here:
cust_send: ; msg msg 1 ; msg cust ref send_msg
A ref
can also be used
to introduce a named constant.
The following code associates the label E_OK
with the fixnum value 0
.
E_OK: ; not an error ref 0
A ref
occupies no storage.
It's just a reference to another value.
You should use ref std.commit
instead of end commit
to avoid creating redundant instruction-quads.
The send_msg
and cust_send
labels allow sharing of longer common tail-sequences.
Watch for opportunities to use them.
Static Data
We will use the ref
mechanism
to define a set of operation-codes,
and then use the dict_t
data constructor
to define a dictionary of code-addresses.
read_op: ref 0 write_op: ref 1 CAS_op: ref 2 op_table: dict_t read_op read dict_t write_op write dict_t CAS_op CAS ref #nil
The dict_t
data constructor
defines statically-initialized instances
of type #dict_t
.
It takes a key, a value,
and an optional next pointer.
As with instructions,
if the final next pointer is missing,
it defaults to the next statement in the source code.
The #nil
value
represents the empty dictionary,
which terminates the sequence.
Note that the named constants
(#?
, #nil
, #f
and #t
)
are pointers to read-only memory.
Method Dispatch Table
We begin defining the "storage cell" behavior
by using the staically-defined op_table
to dispatch to the code specified by the op
selector in the incoming message.
beh: cell_beh: ; value <- cust,op,args push op_table ; op_table msg 2 ; op_table op dict get ; op_code dup 1 ; op_code op_code typeq #instr_t ; op_code is_instr(op_code) if_not std.abort ; op_code jump ; --
If the dictionary lookup produces a code-pointer,
we jump
to the selected method.
Otherwise, the actor transaction is aborted.
The message is discarded
and there is no effect on the actor or the system.
On some platforms,
an "auditor" may be notified
and the anomaly can be reported.
Read Method
The method code is executed in the same transaction,
with the same message and the same private state
as the method dispatch.
The "read" method simply sends the current value
to the cust
provided in the message.
read: ; value <- cust,#read,_ state 0 ; value ref std.cust_send
Write Method
The "write" method updates the cell's private state
with a new value'
from the message,
and sends its own address to the provided cust
.
write: ; value <- cust,#write,value' msg -2 ; value' push cell_beh ; value' cell_beh actor become ; -- actor self ; SELF ref std.cust_send
Sending the actor's address to the customer is the "sync signal" pattern. It allows the customer to continue processing, knowing that the write has been completed.
Compare-and-Swap Method
You might think that "read" and "write" are all the operations you need for a mutable storage cell. However, consider trying to do something as simple as incrementing a shared counter. This requires a read-modify-write cycle. With only a read/write API, the cell's value could change between the read of the original value and the write of the modified value. This is called a "data race". One mechanism to avoid this kind of hazard is to add a "compare-and-swap" operation to the API.
CAS: ; value <- cust,#CAS,old,new msg 3 ; old state 0 ; old value cmp eq ; old==value if_not read ; -- msg -3 ; new push cell_beh ; new cell_beh actor become ; -- ref read
The "CAS" method request message provides
the expected old
value
and the desired new
value.
If old
matches the current value
,
then the cell's private state is updated to new
.
In either case, "CAS" acts like "read"
(literally reusing the same code),
sending value
to cust
.
Note that, by definition, this is the value
before any updates.
If the "CAS" cust
receives
the old
value they expected,
they know the operation was successful.
Otherwise, they can use the returned value
as old
and retry the operation,
possibly with a recalculated new
.
With actors, nothing blocks
and everyone makes progress,
even when there is contention.
The one-at-a-time transactional nature
of actor event handling ensures consistency.
Usage Example
An actor with cas_add
behavior
uses a cell's "CAS" method
to safely update the cell's value.
The actor's state includes
the expected old
value
(initialized to #?
),
the inc
rement to add,
and the cell
holding the value.
It receives the cell's
updated old'
value
as the customer of a "CAS" operation.
cas_add: ; old,inc,cell <- old' state 1 ; old msg 0 ; old old' cmp eq ; old==old' if std.commit ; --
If the old'
value received
matches the expected old
value,
the operation was successful
and there is no more work to do.
state 0 ; old,inc,cell part 2 ; cell inc old drop 1 ; cell inc msg 0 ; cell inc old'
Otherwise,
we replace the old
value
with the updated old'
.
dup 2 ; cell inc old' inc old' alu add ; cell inc old' new=inc+old' pick 2 ; cell inc old' new old' push CAS_op ; cell inc old' new old' #CAS actor self ; cell inc old' new old' #CAS cust=SELF pair 3 ; cell inc old' cust,#CAS,old',new state -2 ; cell inc old' cust,#CAS,old',new cell actor send ; cell inc old'
Then we recalculate new=inc+old'
and prepare another "CAS" request
for the cell
,
designating ourself as the customer.
pair 2 ; old',inc,cell push cas_add ; old',inc,cell cas_add actor become ; -- ref std.commit
Finally, we update our private state and commit the transaction.
Complete Module
The complete module is available
in the uFork library at
https://ufork.org/lib/cell.asm
.import dev: "https://ufork.org/lib/dev.asm" std: "https://ufork.org/lib/std.asm" lib: "https://ufork.org/lib/lib.asm" ... beh: cell_beh: ; value <- cust,op,args ... boot: ; _ <- {caps} ... test: ; judge <- {caps} ... .export beh read_op write_op CAS_op boot test
The module exports beh
,
rather than cell_beh
,
so when the module is imported
with a namespace prefix of cell
it can be referenced as cell.beh
The module also defines demonstration and testing procedures which are a subject for future tutorials.