Simon Marlow

The problem

In the write pipeline, inject a box that checks if a post is “evil” and if so, block it and provide feedback to the user.

Sigma :: Content -> Bool

Sigma is a rule engine

As an example, if you want to block posts from people in Vancouver posting about functional programming, you want to express rules that say: If the person is posting about Functional Programming; they are logged in from Vancouver; and they have more than 100 friends; and more than half of their friends like C++; then block.

Express these rules using the Haxl monad, within the Haskell language and using all the tools the language makes available to you:

cufpSpammer :: Haxl Bool
cufpSpammer =
  talkingAboutFP .&&
  location .== "Vancouver" .&&
  numFriends .> 100 .&&
  friendsLikeCPlusPlus
  where
    talkingAboutFP =
      ("Functional Programming" `isInfixOf`) <$> postContent
    friendsLikeCPlusPlus = do
      friends <- getFriends
      cppFriends <- filterM likesCPlusPlus friends
      -- "more than half" compared on list lengths, avoiding division
      return (2 * length cppFriends > length friends)
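The lifted operators used in the rule are not defined in these notes. As a rough sketch of how they could behave, the block below defines them for any monad and runs a cut-down rule in Identity. Everything here is an assumption for illustration: the real Haxl operators take a Haxl computation on both sides, and demoRule and its stubbed values are hypothetical stand-ins for postContent, location and numFriends.

```haskell
import Data.Functor.Identity (Identity (..))
import Data.List (isInfixOf)

infixr 3 .&&
infix 4 .==, .>

-- Short-circuiting conjunction: the right-hand computation only runs
-- when the left-hand one produced True.
(.&&) :: Monad m => m Bool -> m Bool -> m Bool
a .&& b = do
  x <- a
  if x then b else pure False

-- Compare the result of a computation against a plain value.
-- (Assumption: the real operators lift both operands.)
(.==) :: (Functor m, Eq a) => m a -> a -> m Bool
a .== v = (== v) <$> a

(.>) :: (Functor m, Ord a) => m a -> a -> m Bool
a .> v = (> v) <$> a

-- Hypothetical rule with the data fetches replaced by constants.
demoRule :: Identity Bool
demoRule =
  (("Functional Programming" `isInfixOf`) <$> post)
    .&& location .== "Vancouver"
    .&& numFriends .> 100
  where
    post = pure "I love Functional Programming" :: Identity String
    location = pure "Vancouver" :: Identity String
    numFriends = pure 150 :: Identity Int

main :: IO ()
main = do
  print (runIdentity demoRule)
  -- The right operand is never evaluated because the left is False.
  print (runIdentity (pure False .&& error "never evaluated"))
```

Because .&& only runs its right operand when needed, putting the cheapest test first means a post that fails it never triggers the more expensive friend lookups.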

The programmer chooses whether to fetch data, but the implementation chooses when to fetch it. This lets the library short-circuit rules and run fetches concurrently, without the rule author having to ask for either.
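To make the "whether versus when" split concrete, here is a toy model of the idea, not the Haxl library itself. The names Fetch, Status, fetch, runFetch and source are all hypothetical. A computation either finishes or reports the keys it is blocked on; the Applicative instance merges the blocked keys of both sides into one batch, while monadic bind has to wait for the first result.

```haskell
import qualified Data.Map as Map

type Key = String
type Cache = Map.Map Key Int

-- A computation inspects the cache and either finishes or reports
-- which keys it still needs, plus how to continue afterwards.
newtype Fetch a = Fetch { step :: Cache -> Status a }
data Status a = Done a | Blocked [Key] (Fetch a)

instance Functor Fetch where
  fmap f (Fetch g) = Fetch $ \c -> case g c of
    Done a -> Done (f a)
    Blocked ks k -> Blocked ks (fmap f k)

instance Applicative Fetch where
  pure a = Fetch (\_ -> Done a)
  Fetch f <*> Fetch x = Fetch $ \c -> case (f c, x c) of
    (Done g, Done a) -> Done (g a)
    (Done g, Blocked ks k) -> Blocked ks (g <$> k)
    (Blocked ks k, Done a) -> Blocked ks (($ a) <$> k)
    -- Both sides blocked: their keys go out in the same round.
    (Blocked ks1 k1, Blocked ks2 k2) -> Blocked (ks1 ++ ks2) (k1 <*> k2)

instance Monad Fetch where
  Fetch m >>= f = Fetch $ \c -> case m c of
    Done a -> step (f a) c
    Blocked ks k -> Blocked ks (k >>= f)

-- Request one key; blocks until the runner has put it in the cache.
fetch :: Key -> Fetch Int
fetch k = Fetch $ \c -> case Map.lookup k c of
  Just v -> Done v
  Nothing -> Blocked [k] (fetch k)

-- Run in rounds, returning the result and the batch issued per round.
runFetch :: (Key -> Int) -> Fetch a -> (a, [[Key]])
runFetch src = go Map.empty
  where
    go cache m = case step m cache of
      Done a -> (a, [])
      Blocked ks k ->
        let cache' = foldr (\key -> Map.insert key (src key)) cache ks
            (a, rounds) = go cache' k
        in (a, ks : rounds)

-- Hypothetical data source: the value of a key is its length.
source :: Key -> Int
source = length

batched, sequential :: Fetch Int
batched = (+) <$> fetch "alice" <*> fetch "bob"
sequential = fetch "alice" >>= \a -> fetch "bob" >>= \b -> pure (a + b)

main :: IO ()
main = do
  print (runFetch source batched)     -- (8,[["alice","bob"]])
  print (runFetch source sequential)  -- (8,[["alice"],["bob"]])
```

Both computations ask for the same two keys, but the applicative one is served in a single round. A real implementation would issue each round's requests concurrently and deduplicate keys, which this sketch does not attempt.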

The initial prototype was developed in April 2013. By the end of the year they could process a whole request, and it became functionally complete by summer 2014. First live traffic was handled several months afterwards. Haxl eventually replaced a DSL called FXL completely. Migrating users to Haxl involved writing teaching materials, running multi-day hands-on workshops, and creating an internal Facebook group called “Haxl Therapy.” The development team also provided code reviews.

Does it work?

The team found a multi-year-old bug in GHC that caused machines to crash, but it has since been resolved. One other runtime bug was discovered, but it only affected shutdown. Besides those runtime bugs, the Haskell code doesn’t crash… which is good because diagnosing a crash in Haskell is Very Hard. Of course, that excludes FFI code, which is inherently unsafe. Good monitoring, however, is essential to ensuring good service performance, and load on the system varies dramatically over time.

Resource limits

Server resources are finite, and the service has SLAs. Some requests hog resources and starve the other requests being processed. Some regex libraries are exceptionally good at this. To address this, they implemented allocation limits in GHC, so at the very least those large requests won’t starve the system.
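GHC's base library exposes per-thread allocation limits through System.Mem. The sketch below shows one way they might be used to bound a single request. The wrapper withAllocLimit, the describe helper and the chosen budgets are hypothetical, and how Sigma wires this into its request handling is not described in these notes.

```haskell
import Control.Exception
  (AllocationLimitExceeded (..), evaluate, finally, try)
import Data.Int (Int64)
import System.Mem
  (disableAllocationLimit, enableAllocationLimit, setAllocationCounter)

-- Run an action on the current thread with an allocation budget in
-- bytes. When the budget runs out, the runtime throws
-- AllocationLimitExceeded to this thread, which is caught here.
withAllocLimit :: Int64 -> IO a -> IO (Either AllocationLimitExceeded a)
withAllocLimit bytes action =
  try (setAllocationCounter bytes >> enableAllocationLimit >> action)
    `finally` disableAllocationLimit

describe :: Either AllocationLimitExceeded Int -> String
describe (Right n) = "finished: " ++ show n
describe (Left _) = "aborted: allocation limit exceeded"

main :: IO ()
main = do
  -- A small request that fits comfortably in a 1 MB budget.
  small <- withAllocLimit 1000000 (evaluate (sum [1 .. 10 :: Int]))
  putStrLn (describe small)
  -- A request that allocates far more than 1 MB of big integers.
  big <- withAllocLimit 1000000
           (evaluate (length (show (product [1 .. 20000 :: Integer]))))
  putStrLn (describe big)
```

The limit counts bytes allocated by the thread rather than memory retained, so it acts as a rough proxy for the work a request does. It also only covers Haskell allocation, so time spent inside foreign code is not bounded by it.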