Zeeshan Lakhani

Riak, as an eventually consistent distributed key-value database, can use an in-band optimization called read repair to update stale values in replicas. However, for old, less-accessed values an asynchronous job must run and update keys with stale values. In the worst case, that means a full scan of all the keys. Inefficient but simple to optimize. A more efficient update scheme based on maintaining a merkel tree to identify keys that may need updating could be used, but that’ll be harder to implement. How do you verify the implementation?

You can model the merkel tree operations and the state transitions, and then use quickcheck to establish a correspondence between this model and the actual execution. It’s a bit different from the typical use of quickcheck found in languages like Haskell, since those tend to focus on data more “algebraic” properties, rather than trying to make a connection code that’s more operational in description.