I should have emphasized more that deployment speed is a first-order concern. We certainly can (and do) build our code for every change, but not at the speed at which we want to be updating.
We use a monorepo for all of the benefits it has, and deploying fast business logic updates this way helps mitigate one of its downsides (particularly once you've maximally parallelized the build). I've found that https://danluu.com/monorepo/ gives a quick overview of how splitting up the repo would bring its own downsides.
The section about "Sticky Shared Objects" speaks directly to mutable state across code modifications, just with a Haskell-minded focus.
How much of this is because of Haskell's build times in particular? Is there a sort of "target build time" that would make you more comfortable with this stuff?
I don't think running into these problems is Haskell-specific in general. We've grown enough for these issues to bubble up in this Haskell project, but we would have needed to do something much sooner if this were C++.
> make you more comfortable with this stuff
Which stuff are you referring to?
Overall, I'd love it if all builds were significantly faster, so we contribute to upstream GHC to improve it in the ways we come across. Our platform has a deployment SLA that we strive to maintain, and that serves as our "target build time".
I'm assuming you're asking what's important in writing "production" Haskell rather than "toy example" Haskell.
Ixiaus's point about mechanics mattering more than theory certainly rings true, though we did think a lot about whether to use GADTs for the Dimension type.
Overall, I see this as similar to writing "production" code in other languages: going through a couple of feedback loops using real use cases, profiling to find the bottlenecks, observing how APIs are used in practice compared to how they were intended, and bringing the service to a stable equilibrium.
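To make the GADT question above concrete, here is a hypothetical sketch contrasting a plain ADT with a GADT whose type index records what each dimension yields. The `Dimension` constructors, `User`, and the helper functions are illustrative assumptions, not the project's actual types.

```haskell
{-# LANGUAGE GADTs #-}

-- Hypothetical request context.
data User = User { userAge :: Int, userCountry :: String }

-- Option 1: plain ADT. Every dimension yields the same sum type, so
-- callers must pattern-match on Value and handle mismatched cases.
data Value = VInt Int | VText String
  deriving (Eq, Show)

data DimensionADT = AgeADT | CountryADT
  deriving (Eq, Show)

lookupADT :: DimensionADT -> User -> Value
lookupADT AgeADT     u = VInt (userAge u)
lookupADT CountryADT u = VText (userCountry u)

-- Option 2: GADT. The type index records what each dimension yields,
-- so the compiler knows `lookupDim Age u` is an Int.
data Dimension a where
  Age     :: Dimension Int
  Country :: Dimension String

lookupDim :: Dimension a -> User -> a
lookupDim Age     = userAge
lookupDim Country = userCountry

-- The cost: dimensions with different indices can't share a plain list,
-- so heterogeneous collections need an existential wrapper.
data SomeDimension where
  SomeDimension :: Show a => Dimension a -> SomeDimension

allDimensions :: [SomeDimension]
allDimensions = [SomeDimension Age, SomeDimension Country]

describe :: User -> SomeDimension -> String
describe u (SomeDimension d) = show (lookupDim d u)
```

With the GADT, `lookupDim Age u + 1` type-checks and `lookupDim Country u + 1` is rejected at compile time, whereas the ADT version defers that mismatch to a runtime pattern match. The price is the extra wrapper (and lost `deriving` conveniences) whenever dimensions of different types need to live together, which is the kind of tradeoff that makes this a judgment call.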
Build times for this library haven't come up as a first-order concern. Using GHCi and `stack test` for the dev workflow has been fast enough (though it could always be better).
One of the cases we found while profiling performance was a tradeoff between memory usage and completing the computation. On certain requests, FXL would use too much memory and be halted by its equivalent of allocation limits, while Haxl would happily plow on using less memory and complete the request. Looking at many of those requests in aggregate, the end result was more successfully completed requests, but with some longer response times mixed in. Completing more requests was seen as a win over the apparent decrease in throughput.
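For context, GHC exposes per-thread allocation limits through `System.Mem`. Below is a minimal sketch of applying a per-request allocation budget, assuming each request runs on its own Haskell thread; the two handlers and the 1 MB budget are hypothetical stand-ins, not how FXL or Haxl actually configure this.

```haskell
import Control.Exception (AllocationLimitExceeded, evaluate, finally, try)
import Data.Int (Int64)
import System.Mem (disableAllocationLimit, enableAllocationLimit, setAllocationCounter)

-- Give the current thread a budget of `bytes` of heap allocation and run the
-- handler. Once the counter drops below zero, the RTS throws
-- AllocationLimitExceeded to this thread asynchronously; `try` turns that
-- into a Left. The accounting is approximate, so a request can overshoot a bit.
withAllocationBudget :: Int64 -> IO a -> IO (Either AllocationLimitExceeded a)
withAllocationBudget bytes handler = do
  setAllocationCounter bytes
  enableAllocationLimit
  try handler `finally` disableAllocationLimit

-- Hypothetical "requests": one cheap, one that allocates heavily
-- (each multiplication builds a new, ever larger Integer).
cheapRequest :: IO Int
cheapRequest = evaluate (sum [1 .. 100 :: Int])

heavyRequest :: IO Int
heavyRequest = evaluate (length (show (product [1 .. 5000 :: Integer])))
```

With a 1 MB budget, `withAllocationBudget 1000000 cheapRequest` should return `Right 5050`, while the heavy request should be halted and come back as a `Left`. That is the same trade the comment describes: a hard per-request limit protects memory at the cost of failing some requests that would otherwise have completed.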
Are you employing the ideas behind reactive programming? And can you explain which types of monads you used for which problems, and why? I am writing a paper on Functional Reactive Programming, and Haxl really made me curious. The paper (currently in German, but I'll translate it) proposes a new hypothesis that tries to shred FRP in general by showing a novel approach that automatically solves some of the problems that naturally occur with FRP.
I am really interested in seeing how you solve problems for distributed systems with Haxl, how query sharding is handled, etc.
I wasted a whole day looking for Haxl online a few weeks ago, only to find out that it hadn't been released yet. The release really makes me happy :)
Query sharding is at the data source layer, which Haxl doesn't delve into. It's up to each data source integration with Haxl to do the appropriate routing/etc.
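As a rough illustration of what routing at the data source layer could look like, here is a standalone sketch that does not use Haxl's actual `DataSource` API: given one round's batch of requests, it groups them by owning shard and issues one query per shard. The modulo routing rule, `fetchSharded`, and the `queryShard` callback are hypothetical.

```haskell
import qualified Data.Map.Strict as Map

type UserId = Int
type ShardId = Int

-- Hypothetical routing rule: modulo over a fixed number of shards.
-- A real data source would use whatever placement its storage layer defines.
shardFor :: Int -> UserId -> ShardId
shardFor numShards uid = uid `mod` numShards

-- Split one round's batched requests by owning shard, keeping input order.
groupByShard :: Int -> [UserId] -> Map.Map ShardId [UserId]
groupByShard numShards uids =
  Map.fromListWith (flip (++)) [ (shardFor numShards u, [u]) | u <- uids ]

-- Issue one query per shard and merge the answers. The per-shard calls run
-- one after another here; a real data source would likely overlap them.
fetchSharded
  :: Monad m
  => (ShardId -> [UserId] -> m [(UserId, String)])  -- hypothetical per-shard query
  -> Int
  -> [UserId]
  -> m (Map.Map UserId String)
fetchSharded queryShard numShards uids = do
  perShard <- traverse (uncurry queryShard) (Map.toList (groupByShard numShards uids))
  pure (Map.fromList (concat perShard))
```

For example, `groupByShard 3 [1 .. 7]` sends `[3, 6]` to shard 0, `[1, 4, 7]` to shard 1, and `[2, 5]` to shard 2. Because the batching layer hands the data source a whole round of requests at once, this grouping is a local decision for each integration, which matches the split of responsibilities described above.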
Are Bryan O'Sullivan and the team from his Haskell-based startup, which Facebook acquired in 2011, still there? I sat in on one of his classes a while back and remember him ruefully laughing about having to use PHP now.
Is it like a query engine, where you work with the entire query up-front, apply transforms and build a query plan?
Or is it more like an event loop, where you run as far as you can until the code blocks on IO, batch up and send all the pending IO requests, and run further when the tasks you're blocked on resolve?
Part of the beauty is that the actual way IO is scheduled (note: in this version, IO here almost always means 'reads from the network') is abstracted away, so we could go with either approach without impacting client code.
That said, the way it currently works is more like the first. You can think of the entire Haxl run (program) as an AST that is handed to the executor. It expands as much of the AST as possible (anything that isn't IO), and wherever it needs IO, it enqueues those requests to be scheduled. Once it has explored as much as possible, it aggressively schedules the IO (deduping, batching, and overlapping the calls). Once everything comes back, it unblocks the AST where it can and repeats the process.
This isn't necessarily the optimal scheduling (as you point out, unblocking each part of the tree as each result comes in might be better), but it was specifically designed to make it easy to experiment with that kind of thing later. Since the concurrency is entirely implicit, the scheduling implementation is entirely abstracted away from client code.
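The round-based scheduling described above can be modeled in a few dozen lines. This is a toy sketch, not Haxl's implementation: it assumes a single hypothetical request type (`getName`), a pure `fakeBackend` standing in for the network, and a `runFetch` loop that dedupes and batches each round's requests.

```haskell
import Data.List (nub)
import qualified Data.Map.Strict as Map

-- Hypothetical request key: look up a user's name by id.
type UserId = Int
type Cache = Map.Map UserId String

-- A computation either finished, or is blocked on outstanding requests
-- together with the rest of the program to resume once they resolve.
data Result a = Done a | Blocked [UserId] (Fetch a)

newtype Fetch a = Fetch { unFetch :: Cache -> Result a }

instance Functor Fetch where
  fmap f (Fetch g) = Fetch $ \c -> case g c of
    Done a       -> Done (f a)
    Blocked rs k -> Blocked rs (fmap f k)

-- Independent subcomputations: explore both sides and merge their
-- blocked requests, so they can be fetched in the same round.
instance Applicative Fetch where
  pure x = Fetch (const (Done x))
  Fetch ff <*> Fetch fx = Fetch $ \c -> case (ff c, fx c) of
    (Done f, Done x)                 -> Done (f x)
    (Done f, Blocked rs k)           -> Blocked rs (fmap f k)
    (Blocked rs k, Done x)           -> Blocked rs (fmap ($ x) k)
    (Blocked rs1 k1, Blocked rs2 k2) -> Blocked (rs1 ++ rs2) (k1 <*> k2)

-- Data dependencies: the continuation can't run until the left side is done.
instance Monad Fetch where
  Fetch m >>= f = Fetch $ \c -> case m c of
    Done a       -> unFetch (f a) c
    Blocked rs k -> Blocked rs (k >>= f)

-- A single request: answer from the cache, otherwise block on it.
getName :: UserId -> Fetch String
getName uid = Fetch $ \c -> case Map.lookup uid c of
  Just name -> Done name
  Nothing   -> Blocked [uid] (getName uid)

-- Scheduler: evaluate as far as possible, dedupe the blocked requests,
-- hand them to the batch fetcher as one call, cache the results, repeat.
-- Returns the result plus the batch issued in each round.
-- Assumes the batch fetcher answers every id it is given.
runFetch :: ([UserId] -> [(UserId, String)]) -> Fetch a -> (a, [[UserId]])
runFetch batchFetch = go Map.empty []
  where
    go cache batches (Fetch f) = case f cache of
      Done a -> (a, reverse batches)
      Blocked rs k ->
        let batch  = nub rs
            cache' = Map.union (Map.fromList (batchFetch batch)) cache
        in go cache' (batch : batches) k

-- Stand-in for a network service.
fakeBackend :: [UserId] -> [(UserId, String)]
fakeBackend = map (\i -> (i, Map.findWithDefault "unknown" i users))
  where users = Map.fromList [(1, "ada"), (2, "brian"), (3, "cleo")]
```

Here `runFetch fakeBackend (traverse getName [1, 2, 3, 1])` issues a single deduplicated batch `[1, 2, 3]`, while `getName 1 >>= \n -> getName (length n)` needs two rounds because the second id depends on the first result. Note that `<*>` deliberately differs from `ap`; that is what lets independent fetches share a round. Since client code only sees `Fetch`, the scheduling policy lives entirely in `runFetch` and could in principle change without touching programs like these.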
Have a look at the SQLTap service written by the guys from DaWanda.com (https://github.com/paulasmuth/sqltap). It does basically exactly that for SQL queries, but is implemented as a standalone Java/Scala SQL proxy server.
Interpreted code was no longer cutting it for performance reasons, and any time you create your own language you end up reinventing the entire toolchain (debuggers, profilers, etc.). Haskell provides so much functionality in the language itself and has mature solutions to the other issues plaguing us in FXL, so it was a natural choice.