Starting guile-crdt

I noticed that in the The Heart of Spritely: Distributed Objects and Capability Security whitepaper, there was mention that the Goblins team intends to implement a standard library of CRDT patterns.

Now, I have no expectation that this CRDT library I'm writing will be that standard library, but I know enough about CRDTs to use that as an excuse to write a library in Guile: guile-crdt

Way back in 2010, I got really into Riak. I interviewed with a company called Mochi Media that used Riak. I didn't end up moving out to San Francisco, but I brought back an interest in Riak.

Riak is an eventually consistent distributed key/value store. You may have a cluster of 5 Riak nodes. Commonly, a quorum of 3 nodes will have to agree to write a value, but if there's an error or network issue, a partial write may occur. In that event, a client reading the value will have to reconcile the conflicting values somehow.

This "reconcile the conflicting values somehow" made me incredibly nervous about using Riak as a data store. I was working in the news industry at the time and was looking for a scalable object store for published articles. Riak looked perfect for that use case. Articles were published infrequently, so using a highly available object store made sense. Unfortunately, I struggled with what to do in the event of conflicting writes.

Last Write Wins was a common technique, but felt insufficient. What if two reporters were altering the same article and tried to save changes to the article? Two different modifications would be made to the article, one overwriting the other. One of the reporters would be very upset that their hard work was lost.

I am having trouble remembering the sequence of events (ironic, isn't it), but this is how I remember Riak users learning of CRDTs. Bob Ippolito released a project called statebox in May of 2011. Bob independently discovered what is called an operation based replicated data type in the literature.

I remember the announcement of statebox spurring a discussion on Twitter among the developers at Basho and Mochi Media about reconciling conflicts. Someone mentioned the A comprehensive study of Convergent and Commutative Replicated Data Types paper released in January of 2011, and the rest was history.

I remember seeing this paper and all the fervor over the paper because CRDTs appeared to be the magic bullet for joining all our split brains. The only problem for me was, I had no idea how to read a computer science paper. It was full of mathematical symbols and words like "join-semilattice" that might as well be magical incantations to this self-taught programmer.

Determined and stubborn, I taught myself enough discrete math and set theory to understand formulas like b = (∀i ∈ [0, n − 1] : X.P [i] ≤ Y.P [i]. The glossary of mathematical symbols wikipedia page was my best friend for a couple of weeks. I eventually learned enough to implement a few of the fundamental CRDT types in Python and Erlang.

An inadequate introduction to CRDTs

A lot has changed in CRDT land since 2011. There are a number of libraries and data stores that natively support CRDTs. There's an excellent interactive introduction of CRDTs that does an amazing job at explaining CRDTs.

I would be doing you a disservice to try to do a better job than the interactive introduction, so I'll give you just enough in order to explain my Guile code.

A CRDT is a structure that has a merge(A, B) function with the following properties:

Commutativity: The order of the arguments don't matter merge(A, B) == merge(B, A)
Associativity: The order at which more than 3 CRDTs are merged doesn't matter, merge(A, merge(B, C)) == merge(merge(A, B), C)
Idempotent: A merge of the same value results in the same value merge(A, A) == A

One of the easiest CRDTs to implement is an increment-only counter called a G-Counter. Each replica keeps their own counts:

alice: 0
bob: 0

When a client increments the counter, they increment their count:

# alice increments once
alice: 1
bob: 0

# bob increments 3 times
alice: 1
bob: 3

The value of the counter is then the sum of all the counts: 4

The merge function is done by taking the max value of each count. Let's look at a sequence of events where Alice and Bob concurrently increment the counter.

# initial value
alice: 0
bob: 0
---
# alice increments once
alice: 1
bob: 0
---
# bob increments 3 times
alice: 0
bob: 3
---
# merge
alice: max(1, 0)
bob: max(0, 3)
---
alice: 1
bob: 3

My implementation in Guile

The formal definition of the G-Counter is

payload integer[n] P
    initial [0,0,...,0]
update increment()
    let g = myId()
    P[g] := P[g] + 1
query value() : integer v
    let v = Σi P[i]
compare (X, Y) : boolean b
    let b = (∀i ∈ [0, n - 1] : X.P[i] ≤ Y.P[i])
merge (X, Y) : payload Z
    let ∀i ∈ [0, n - 1] : Z.P[i] = max(X.P[i], Y.P[i])

I implemented my G-Counter using an Association list. For a large number of clients, an assoc list isn't the most efficient, but for a proof of concept, it is fine.

The new payload function is simple, it is an empty list:

(define (payload-init) '())

Next I defined some utilities functions for if I want to swap out the alist with something better

(define (payload-ref payload key)
  (or (assoc-ref payload key) 0))
  
(define (payload-set! payload key val)
  (assoc-set! payload key val))

The merge function creates a new alist taking the max of key each

(define (payload-merge left-payload right-payload)
  (define keys (lset-union equal?
                           (map car left-payload)
                           (map car right-payload)))
  (map (lambda (key)
         (cons key (max (payload-ref left-payload key)
                        (payload-ref right-payload key))))
       keys))

The value of the payload is simply the sum of all the counts:

(define (payload-value payload)
  (fold (lambda (x acc) (+ (cdr x) acc))
        0
        payload))

One final function that we haven't talked about yet is the compare function. The compare(X, Y) function returns true if X happened before Y or X and Y are concurrent updates. This is calculated by checking if X[key] ≤ Y[key] for every key.

(define (payload-happened-before? left-payload right-payload)
  (define keys (lset-union equal?
                           (map car left-payload)
                           (map car right-payload)))
  (every (lambda (key) (<= (payload-ref left-payload key)
                           (payload-ref right-payload key)))
         keys))

Testing my library

I intend to use guile-quickcheck to verify the properties of the CRDTs, but for the time being, I used the test API defined by SRFI-64. Each test suite is simply a Guile script that makes assertions using the test API.

(define-module (test-g-counter)
  :use-module (crdt g-counter)
  :use-module (srfi srfi-1)
  :use-module (srfi srfi-64))

(define (increment-n-times client-id n counter)
  (if (zero? n)
      counter
      (increment-n-times client-id (- n 1) (g-counter-1+ client-id counter))))

;; I'll want to replace this test suite with a quickcheck test
(test-begin "g-counter")

(test-equal "local incr works"
  5
  (g-counter-value (increment-n-times 'alice 5 (g-counter-init))))

(test-equal "merge works"
  8
  (g-counter-value (g-counter-merge (increment-n-times 'alice 5 (g-counter-init))
                                    (increment-n-times 'bob 3 (g-counter-init)))))


(define start (g-counter-init))
(define bob-alone (g-counter-1+ 'bob start))
(define alice-alone (increment-n-times 'alice 5 start))
(define bob-alice-sync (g-counter-merge bob-alone alice-alone))

(test-assert "a g-counter happens before itself"
  (g-counter-happened-before? start start))

(test-assert "the start value happens before an incremented value (bob)"
  (g-counter-happened-before? start bob-alone))

(test-assert "the start value happens before an incremented value (alice)"
  (g-counter-happened-before? start alice-alone))

(test-assert "an incremented value (bob) does not happen before its start value"
  (not (g-counter-happened-before? bob-alone start)))

(test-assert "an incremented value (alice) does not happen before its start value"
  (not (g-counter-happened-before? alice-alone start)))

(test-assert "two disconnected increments don't happen before each other"
  (not (g-counter-happened-before? bob-alone alice-alone)))

(test-assert "two disconnected increments don't happen before each other (reversed variant)"
  (not (g-counter-happened-before? alice-alone bob-alone)))

(test-assert "a disconnected counter (bob) happens before the sync of two disconnected counters"
  (g-counter-happened-before? bob-alone bob-alice-sync))

(test-assert "a disconnected counter (alice) happens before the sync of two disconnected counters"
  (g-counter-happened-before? alice-alone bob-alice-sync))

(test-end "g-counter")

Conclusion

This library is just a start. I want to implement every type defined in the Inria paper. Before I do that, I want to explain how I set up the guile-crdt repository as a Guile library using GNU autotools. That will be the next installment in the tale of Eric Codes writes a Guile library.

Resources

guile-crdt, my work in progress CRDT library for Guile
crdt.tech, a directory of CRDT resources
An Interactive Intro to CRDTs
Automerge, a CRDT library that provides a JSON like document CRDT