Good morning again! and a big thank you to the organizers of CUFP for starting at 9am instead of… any time earlier. Yesterday’s OCaml Workshop liveblog fell behind very quickly, but I think I’ve got an idea of how to avoid that today. The program for CUFP 2015 can be found here. I’ll kick this off just after the keynote, once the coffee takes effect.
Futures for C++11 at Facebook
Futures-based concurrency at Facebook was inspired by Finagle, a Scala RPC library from Twitter. A future is a value that will become determined at some point in the future. You can construct these values and then block on one to wait until the value is available; once it’s available you can continue the computation.
One interesting thing about this futures library is that you can assign each individual future to run on whatever scheduler you like. These schedulers are called Executors, and expose a simple interface for adding a unit of work to a queue. Implementations are free to schedule and execute that work as they wish, on OS threads, green threads, or whatnot. So you can do things like this:
```cpp
future0
  .then(executor1, future1)
  .then(executor2, future2)
```

future1 will be scheduled and executed with executor1, and future2 will be scheduled and executed on executor2.
Question: Which schedulers do developers use in practice?
Answer: There are a few in use, but for the most part picking a scheduler tailored for CPU-heavy workloads or one tailored for IO-heavy workloads is enough choice to cover most use-cases.
Question: Can you do cluster scheduling with this?
Answer: It’s limited to scheduling on a single box.
Modeling state transitions with specification-based random testing
Riak, as an eventually consistent distributed key-value database, can use an in-band optimization called read repair to update stale values in replicas. However, for old, less-accessed values an asynchronous job must run and update keys with stale values. In the worst case, that means a full scan of all the keys: inefficient, but simple. A more efficient update scheme based on maintaining a Merkle tree to identify keys that may need updating could be used, but that’s harder to implement. How do you verify the implementation?
You can model the Merkle tree operations and the state transitions, and then use QuickCheck to establish a correspondence between this model and the actual execution. It’s a bit different from the typical use of QuickCheck found in languages like Haskell, since those tend to focus on more “algebraic” properties of data, rather than trying to make a connection to code that’s more operational in description.
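The flavor of this style of testing can be shown with a toy example (a hand-rolled Haskell sketch, not the talk’s code or a real QuickCheck library; `Cmd`, `modelStep`, and `implStep` are all invented here): random command sequences are run against both a simple model and the implementation under test, and the two must agree on the resulting state.

```haskell
-- Toy model-based random testing: a naive association-list model is
-- checked against Data.Map as the "implementation under test".
import qualified Data.Map as M
import Data.List (foldl')

data Cmd = Put Int Int | Delete Int deriving Show

-- Model: an association list with no duplicate keys.
modelStep :: [(Int, Int)] -> Cmd -> [(Int, Int)]
modelStep s (Put k v)  = (k, v) : filter ((/= k) . fst) s
modelStep s (Delete k) = filter ((/= k) . fst) s

-- Implementation under test: Data.Map.
implStep :: M.Map Int Int -> Cmd -> M.Map Int Int
implStep s (Put k v)  = M.insert k v s
implStep s (Delete k) = M.delete k s

-- Tiny linear congruential generator, so the sketch needs no packages.
lcg :: Int -> [Int]
lcg = iterate (\x -> (1103515245 * x + 12345) `mod` 2147483648)

genCmds :: Int -> Int -> [Cmd]
genCmds seed n = take n (go (lcg seed))
  where
    go (a : b : c : rest) =
      (if even a then Put (b `mod` 8) c else Delete (b `mod` 8)) : go rest

-- Property: after any command sequence, model and implementation
-- agree on the final state.
agrees :: [Cmd] -> Bool
agrees cmds =
  foldl' implStep M.empty cmds == M.fromList (foldl' modelStep [] cmds)

main :: IO ()
main = print (all (\s -> agrees (genCmds s 50)) [1 .. 100])  -- True
```

A real state-machine test would also check the results of read operations along the way, shrink failing command sequences, and model concurrent interleavings; this only shows the model/implementation correspondence at the heart of the technique.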
Fighting spam with Haskell at Facebook
- Some people want to write spam (and other bad things)
- there’s a lot of it
- want to catch as much as possible in a completely automated way
In the write pipeline, inject a box that checks if a post is “evil” and if so, block it and provide feedback to the user.
```haskell
Sigma :: Content -> Bool
```
Sigma is a rule engine
- for each action type, evaluate a set of rules
- rules can block or take other action
- manual + machine-learned rules
- rules can be updated live.
As an example, if you want to block posts from people in Vancouver posting about functional programming, you want to express rules that say: If the person is posting about Functional Programming; they are logged in from Vancouver; and they have more than 100 friends; and more than half of their friends like C++; then block.
Express these rules using the Haxl monad, within the Haskell language, using all the tools the language makes available to you:
```haskell
cufpSpammer :: Haxl Bool
cufpSpammer =
  talkingAboutFP
    .&& location .== "Vancouver"
    .&& numFriends .> 100
    .&& friendsLikeCPlusPlus
  where
    talkingAboutFP =
      ("Functional Programming" `isInfixOf`) <$> postContent
    friendsLikeCPlusPlus = do
      friends    <- getFriends
      cppFriends <- filterM likesCPlusPlus friends
      return $ fromIntegral (length cppFriends)
             / fromIntegral (length friends) > (0.5 :: Double)
```
The programmer chooses what data to fetch, but the implementation chooses when to fetch it. This allows short-circuiting and concurrency to be automated by the library.
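The mechanism can be sketched with a toy Fetch type in the style of the Haxl paper (heavily simplified, invented types, not the real Haxl API): independent requests composed with Applicative are batched into a single round of fetching, while a monadic dependency forces an extra round.

```haskell
-- Toy sketch of how independent fetches get batched. A computation is
-- either done, or blocked on a batch of pending requests.
data Fetch a = Done a | Blocked [String] (Fetch a)

instance Functor Fetch where
  fmap f (Done a)       = Done (f a)
  fmap f (Blocked rs c) = Blocked rs (fmap f c)

instance Applicative Fetch where
  pure = Done
  Done f       <*> x             = fmap f x
  Blocked rs c <*> Done a        = Blocked rs (c <*> Done a)
  -- Both sides are blocked: merge their requests into one batch.
  Blocked rs c <*> Blocked rs' d = Blocked (rs ++ rs') (c <*> d)

instance Monad Fetch where
  Done a       >>= k = k a
  Blocked rs c >>= k = Blocked rs (c >>= k)

-- Pretend the value becomes available one "round" after it is requested.
dataFetch :: String -> Fetch String
dataFetch key = Blocked [key] (Done ("value:" ++ key))

-- Count how many round trips a computation needs.
rounds :: Fetch a -> Int
rounds (Done _)      = 0
rounds (Blocked _ c) = 1 + rounds c

main :: IO ()
main = do
  print (rounds ((,) <$> dataFetch "a" <*> dataFetch "b"))  -- 1: batched
  print (rounds (dataFetch "a" >>= \_ -> dataFetch "b"))    -- 2: sequential
```

The real Haxl does this with a smarter scheduler and actual data sources, but the Applicative-merges-batches idea is the same one that makes the rule code above concurrent without the programmer asking for it.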
The initial prototype was developed in April 2013. By the end of the year they could process a whole request, and it became functionally complete by summer 2014. First live traffic was handled several months afterwards. This eventually completely replaced a DSL called FXL. Migrating users to Haxl involved writing teaching materials, running multi-day hands-on workshops, and creating an internal Facebook group called “Haxl Therapy.” Code reviews were provided by the development team.
Does it Work? Found a multi-year-old bug in GHC which caused machines to crash, but that has since been resolved. One other runtime bug was discovered, but only affected shutdown. Besides those runtime bugs, the Haskell code doesn’t crash… which is good because diagnosing a crash in Haskell is Very Hard. Of course, that’s excluding FFI code which is inherently unsafe. Good monitoring, however, is essential to ensuring good service performance, and load on the system varies dramatically over time.
Resource Limits Server resources are finite, and the service has SLAs. Some requests hog resources and starve the rest of the requests being processed; some regex libraries are exceptionally good at this. To address this, they implemented allocation limits in GHC, so at the very least those large requests won’t starve the system.
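The per-thread allocation-limit API that came out of this work lives in base’s GHC.Conc (GHC 7.10 and later). A minimal sketch, with the limit and the oversized computation invented for illustration:

```haskell
-- Minimal sketch of GHC's per-thread allocation limits (GHC >= 7.10).
import GHC.Conc (setAllocationCounter, enableAllocationLimit)
import Control.Exception (try, evaluate)
import GHC.IO.Exception (AllocationLimitExceeded (..))

main :: IO ()
main = do
  -- Allow this thread ~10 MB of allocation before an exception is thrown.
  setAllocationCounter (10 * 1024 * 1024)
  enableAllocationLimit
  -- A deliberately allocation-heavy computation (invented example).
  r <- try (evaluate (length (show (product [1 .. 100000 :: Integer]))))
  case r of
    Left AllocationLimitExceeded -> putStrLn "request exceeded its limit"
    Right n                      -> print n
```

Allocation is a rough proxy for work, but it is cheap to track and, crucially, it is something the runtime can enforce even in the middle of a pure computation.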
Recognizing patterns in noisy data using trainable 'Functional' State Machines
Use ‘functional’ state machines to help extract the signal from noisy gesture-recognition data. Here, functional means actually using functions to encode the state machines. The reason you’d want to do this is to “compress” the representation of the state space that is used.
With this sort of representation, it’s really easy to parameterize your state machine by simply adding another argument to the function representation. This could assist in, for example, redefining thresholds for gesture recognition in response to new data collection and analysis.
Side-note: This was a really good talk, but I couldn’t write down the example code fast enough for it to come through.
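Since the talk’s code went by too fast to capture, here is a rough sketch of the idea only (an invented example, not the speaker’s code): each state is a function from an input sample to an output and a next state, and the whole recognizer is parameterized by its thresholds.

```haskell
-- A 'functional' state machine: each state is a function that consumes
-- one input and yields an output plus the next state.
newtype Mealy i o = Mealy { step :: i -> (o, Mealy i o) }

-- Fire once n consecutive samples exceed the threshold. Both parameters
-- are ordinary function arguments, so retuning the recognizer against
-- new data is just re-applying `detector`.
detector :: Double -> Int -> Mealy Double Bool
detector threshold n = go 0
  where
    go count = Mealy $ \sample ->
      let count' = if sample > threshold then count + 1 else 0
      in (count' >= n, go count')

-- Feed a list of samples through the machine.
runMealy :: Mealy i o -> [i] -> [o]
runMealy _ []       = []
runMealy m (x : xs) = let (o, m') = step m x in o : runMealy m' xs

main :: IO ()
main = print (runMealy (detector 0.5 3) [0.6, 0.7, 0.9, 0.1, 0.8])
-- [False,False,True,False,False]
```

The “compression” mentioned above falls out of this encoding: instead of enumerating an explicit state table, the closure over `count` carries exactly the state that matters.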
Building binary protocol drivers in Elixir
The setup: a bunch of self-described old OO dudes are trying to bust into this FP thing, trying to use the right tool to solve the right problem. They moved to Erlang to help manage the millions of concurrent connections from incoming clients, but an aging architecture remained behind that still needed to be accessible by the application: namely Microsoft SQL Server, for which Erlang’s odbc module lacks support for modern SQL primitives, e.g., UNIQUE. The choice was therefore made to write a Tabular Data Stream (TDS) driver in Elixir.
Some benefits of Elixir:
- Very nice generated documentation
- Really great community
- Erlang-to-Elixir and Elixir-to-Erlang interop are both easy
The TDS work was based on postgrex. A message is a collection of packets, each with a header and a fragment of the payload. The payload is reconstructed once all the packets are received. Elixir-style pattern matching enables dispatch that makes writing this sort of parser code very simple.
Integrated the Tds library with ecto, which looks like an EDSL for generating SQL queries.
How to create more Elixir developers?
- Evangelize - talk about it
- Open Source contributions in the language
- Think Functionally
Conclusion: Find the right tool for the job.
Question: How long did it take to do this?
Answer: Team of 2-3 developers working for 2-3 months, including ecto integration.
Haskell in production, a survey
Various ways to pitch Haskell to your employer.
Concern 1: Teaching it to existing developers will be hard, and if all the Haskell developers leave, nobody will be able to maintain the system.
Rebuttal 1: Don’t hire slow learners. Maintainability if people leave is a concern, but it’s a concern for all languages. Types help by putting static information about behavior into the code.
Cabal’s sort of busted; use Stackage with stack.
Writing Nagios checks in Haskell is possible!
If you want to bring Haskell to your company, think about how you’ll do things, and why your business will benefit.
Coping with change - data schema migration in Haskell
The approach to data schema migration relies on two DSLs: one that describes the data model, and one that describes the changelog. Like most data model schemas, it’s typed. Here’s an example.
```
User = record
  name  :: Username
  admin :: boolean

Username = basic string
```
If you were to add a new field:
```
User = record
  name   :: Username
  admin  :: boolean
  logins :: LoginCount

Username   = basic string
LoginCount = basic integer
```
You could express the change in the following way:
```
version "0.2"
  changed record User
    field added logins :: LoginCount default 0
  added LoginCount basic integer
```
Question: Is this autogenerated or written by a programmer?
Answer: In many cases yes, it’s autogenerated, but you also have the option to manually author changelogs as well.
With these in place you can do some analysis about whether, for example, you can get from an older version to the latest version. Furthermore, you can interpret the changelog and produce a migration that you can run on your data. If the changelog language does not support a particular operation you need, there’s a release valve built in that allows you to run arbitrary Haskell code within an upgrade.
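A heavily simplified sketch of what interpreting a changelog as a data migration could look like (invented types for illustration; the real system works on its typed schema DSL and JSON exports):

```haskell
-- Hypothetical, heavily simplified: records as key/value pairs, and a
-- changelog as a list of field operations with defaults.
data Value  = VStr String | VInt Int | VBool Bool deriving (Eq, Show)
type Record = [(String, Value)]

data Change = FieldAdded String Value  -- new field name and its default
            | FieldRemoved String

-- Interpret a list of changes as a function over stored records.
migrate :: [Change] -> Record -> Record
migrate changes r = foldl apply r changes
  where
    apply rec (FieldAdded f def) = rec ++ [(f, def)]
    apply rec (FieldRemoved f)   = filter ((/= f) . fst) rec

main :: IO ()
main = print (migrate [FieldAdded "logins" (VInt 0)]
                      [("name", VStr "kim"), ("admin", VBool False)])
-- [("name",VStr "kim"),("admin",VBool False),("logins",VInt 0)]
```

This mirrors the `version "0.2"` changelog above: the `logins` field is added to every stored record with its declared default of 0.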
All records in the system are stored with a schema version. So the upgrade process (which the presenter notes is a “Stop the World” upgrade) proceeds as follows:
- export the old application’s dataset as JSON
- install new version of application
- find the version number in the changelog
- apply changes to schema and data in parallel
- check resulting schema is as expected
- import the JSON dataset into the new application
The result is that easy changes are very simple and straightforward, with validation identifying errors before they’re executed.