I went through the callback-hell problem myself and came to the conclusion that swapcontext is the only way to go. I am about to start a large personal project that will be based entirely on the swapcontext model (but using boost::coroutine), so I would like to learn from your experience before I invest in possibly the wrong model. Since you haven't explained your solution in detail, I have a question:
Did you use multiple N:M schedulers (each owning an isolated group of OS threads), or does your entire process use a single N:M threading library that owns all OS threads of the process?
Here is what I am doing:
My project does a lot of disk and network IO, so my design goal is to keep the CPU, disk, and network all busy. Having a single N:M scheduler that handles all threads makes it very difficult to understand/analyze/manage system performance, so I made the choice of having multiple N:M schedulers (that communicate through message passing), each dealing with a separate resource of the system. For example, the OS threads of the N:M cpu-scheduler only do compute work; when disk (or network) IO is necessary, the greenlet/task/whatever is queued into the disk-io-scheduler (or network-io-scheduler). When the OS threads of the disk-io-scheduler (or network-io-scheduler) finish the IO, they queue the greenlet/task back into the cpu-scheduler.
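To make the hand-off concrete, here is a minimal sketch of that message-passing arrangement. The `Scheduler` class and its `post` method are my own hypothetical names, not from any library; each scheduler owns its OS threads and a queue of closures, and schedulers talk to each other only by queueing work into one another:

```cpp
#include <cassert>
#include <condition_variable>
#include <functional>
#include <future>
#include <mutex>
#include <queue>
#include <thread>
#include <vector>

// Each N:M scheduler owns its own OS threads and a task queue; the only
// communication between schedulers is posting closures to each other.
class Scheduler {
public:
    explicit Scheduler(int nthreads) {
        for (int i = 0; i < nthreads; ++i)
            workers_.emplace_back([this] { run(); });
    }
    ~Scheduler() {
        { std::lock_guard<std::mutex> lk(m_); done_ = true; }
        cv_.notify_all();
        for (auto& t : workers_) t.join();
    }
    void post(std::function<void()> task) {   // the "message passing"
        { std::lock_guard<std::mutex> lk(m_); q_.push(std::move(task)); }
        cv_.notify_one();
    }
private:
    void run() {
        for (;;) {
            std::function<void()> task;
            {
                std::unique_lock<std::mutex> lk(m_);
                cv_.wait(lk, [this] { return done_ || !q_.empty(); });
                if (q_.empty()) return;       // shutting down, queue drained
                task = std::move(q_.front());
                q_.pop();
            }
            task();                            // run outside the lock
        }
    }
    std::mutex m_;
    std::condition_variable cv_;
    std::queue<std::function<void()>> q_;
    std::vector<std::thread> workers_;
    bool done_ = false;
};
```

A cpu task that needs IO would then `post` itself into the disk scheduler, and the disk scheduler would `post` the continuation back when the IO completes.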
This, IMO, makes managing/analyzing/tuning the system's performance and complexity a little easier. For example,
> All of a sudden you're forced to write a userspace scheduler, and guess what, it's really hard to write a scheduler that's going to do a better job than Linux's scheduler, which has man-years of effort put into it. Now you want your scheduler to map N green threads to M physical threads, so you have to worry about synchronization. Synchronization brings performance problems, so now you're down a new lockless rabbit hole. Building a correct highly concurrent scheduler is no easy task.
Agreed, writing a good scheduler that deals with mixed workloads is very challenging. In my model, I am hoping that a poor scheduler won't be an issue because OS threads in the cpu-scheduler compete only for cpu-bound tasks; similarly, OS threads of an io-scheduler compete only for io-bound tasks. The real scheduling between IO and CPU operations is still left to the kernel (because the kernel decides which scheduler's OS thread to run). From the kernel's point of view, there are threads that only do IO work and threads that only do CPU work. I am not yet sure whether kernel schedulers are optimized for such a workload.
> A lot of 3rd party code doesn't work great with userspace threads. You end up with very subtle bugs in your code that are hard to track down. In many cases this is due to assumptions about TLS (but this isn't the only reason). In order to make it work you now can't have work stealing between your native threads, and then you end up with performance problems and starvation problems.
My design solution to this is to segregate 3rd-party code onto a fixed set of threads that never participate in the N:M threading library. Would that solve the above issues?
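The segregation I have in mind looks roughly like this sketch (all names hypothetical): every call into a TLS-dependent 3rd-party library is funneled through one fixed OS thread that never runs migrating green threads, so the library never observes a thread switch:

```cpp
#include <cassert>
#include <condition_variable>
#include <functional>
#include <future>
#include <mutex>
#include <queue>
#include <thread>

// One fixed OS thread dedicated to a TLS-using library. The library's
// code only ever runs on this thread, so its TLS assumptions hold.
class PinnedThread {
public:
    PinnedThread() : worker_([this] { run(); }) {}
    ~PinnedThread() {
        post([this] { done_ = true; });   // runs on worker_, so no race
        worker_.join();
    }
    // Run f on the pinned thread and wait for its result.
    template <class F>
    auto call(F f) -> decltype(f()) {
        std::packaged_task<decltype(f())()> task(std::move(f));
        auto fut = task.get_future();
        post([&task] { task(); });
        return fut.get();   // a green-thread version would yield here
    }
private:
    void post(std::function<void()> fn) {
        { std::lock_guard<std::mutex> lk(m_); q_.push(std::move(fn)); }
        cv_.notify_one();
    }
    void run() {
        while (!done_) {
            std::function<void()> fn;
            {
                std::unique_lock<std::mutex> lk(m_);
                cv_.wait(lk, [this] { return !q_.empty(); });
                fn = std::move(q_.front());
                q_.pop();
            }
            fn();
        }
    }
    std::mutex m_;
    std::condition_variable cv_;
    std::queue<std::function<void()>> q_;
    bool done_ = false;
    std::thread worker_;   // declared last: started after the members above
};
```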
> The final nail in the coffin for me was disk IO. The fact is that non network IO is generally blocking and no OS has great non-blocking disk IO interfaces (windows is best but it's still not great). First, it's pretty low level, eg. difficult to use. You have to do IO on block boundaries. Second, it bypasses the page cache (at least on Linux) which in most cases kills performance right there. And in many cases this non-blocking interface will end up blocking (even on windows) if the filesystem needs to do certain things under the covers (like extend the file or load metadata). Also, the way these operations are implement require a lot lot of syscalls thus context switches which further negate any perceived performance benefits. The bottom line is that regular blocking IO (better yet mmaped IO) outperforms what most people are capable of achieving using the non-blocking disk IO facilities.
Yes, a lot of disk/filesystem operations have no non-blocking equivalents. This is one of the main reasons for having a separate disk-io-scheduler (with its own thread pool) in my design. In my model, any blocking-only operation is made asynchronous by queueing it into a dedicated IO thread pool and letting the pool inform me when the operation completes. This gives me flexibility in tuning IO parallelism to my needs. For example, an SSD can handle more concurrent operations than an HDD, so I can create an ssd-io-scheduler with more threads than an hdd-io-scheduler.
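The core of that pattern fits in a few lines. This is only a sketch with a hypothetical name (`submit_blocking`); a real version would enqueue into a fixed-size per-device pool (more threads for the ssd pool than the hdd pool) instead of spawning a thread per operation:

```cpp
#include <cassert>
#include <functional>
#include <future>
#include <thread>

// Make a blocking-only call (fsync, metadata ops, ...) "asynchronous":
// run it on an IO thread and deliver a completion instead of blocking
// a cpu-scheduler thread.
void submit_blocking(std::function<int()> blocking_op,
                     std::function<void(int)> on_complete) {
    std::thread([blocking_op, on_complete] {
        int result = blocking_op();  // blocks an IO thread, not a CPU thread
        on_complete(result);         // real code: queue the green thread
                                     // back into the cpu-scheduler here
    }).detach();
}
```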
It would be great if you could share your thoughts on my analysis. Thanks.
I haven't used boost::coroutine, but I have experience working with Mordor (https://github.com/mozy/mordor), with swapcontext directly, and with implementing my own swapcontext. Why implement your own swapcontext? Well, the default one in glibc on Linux can be a bottleneck if you're going to be calling it frequently: it makes the sigprocmask() syscall, and if you can avoid that, it's a substantial speedup. You can read more about this here: http://rethinkdb.com/blog/making-coroutines-fast/
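For readers who haven't used ucontext, here is a minimal swapcontext ping-pong. Every glibc swapcontext() call also saves/restores the signal mask via sigprocmask(), one syscall per switch; register-only switchers (like boost::context's fcontext, or the hand-rolled one in the rethinkdb post) skip it, which is where the speedup comes from:

```cpp
#include <cassert>
#include <ucontext.h>

static ucontext_t main_ctx, co_ctx;
static int steps = 0;

static void coro_fn() {
    ++steps;                          // runs on the coroutine's own stack
    swapcontext(&co_ctx, &main_ctx);  // yield back to main
}

int run_ping_pong() {
    static char stack[64 * 1024];     // real code: pooled, mmap'd stacks
    getcontext(&co_ctx);
    co_ctx.uc_stack.ss_sp = stack;
    co_ctx.uc_stack.ss_size = sizeof stack;
    co_ctx.uc_link = &main_ctx;       // where to go if coro_fn returns
    makecontext(&co_ctx, coro_fn, 0);
    swapcontext(&main_ctx, &co_ctx);  // enter the coroutine
    return steps;
}
```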
Another thing that will give you a big performance boost is stack pooling (i.e. not calling mmap for every makecontext). More on that in the rethinkdb article I mentioned. If your scheduler targets multiple OS threads, you should also be careful here to avoid synchronization slowdowns: either some kind of lockless list or per-thread pools.
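The per-thread-pool variant is the simpler of the two; a sketch (malloc standing in for mmap-plus-guard-page, which a real implementation would use):

```cpp
#include <cassert>
#include <cstdlib>
#include <vector>

// Stack pooling with per-thread free lists: reusing stacks avoids an
// mmap/munmap pair per coroutine, and thread_local pools avoid any
// synchronization on a shared free list.
constexpr std::size_t kStackSize = 64 * 1024;
thread_local std::vector<void*> stack_pool;

void* acquire_stack() {
    if (!stack_pool.empty()) {
        void* s = stack_pool.back();   // reuse: no syscall at all
        stack_pool.pop_back();
        return s;
    }
    return std::malloc(kStackSize);    // real code: mmap + guard page
}

void release_stack(void* s) {
    stack_pool.push_back(s);           // keep it warm for the next coroutine
}
```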
Now let's talk about 3rd party code. If you have a 3rd party library that internally uses TLS and you swap its context onto a different thread, it's bound to misbehave, and when it does it's usually subtle and hard to debug. So if you're using 3rd party libraries, you either have to audit them (and hope you didn't miss anything), disable context migration (and risk unbalanced workloads), or have a separate scheduler that only runs those tasks. Pick your poison.
It doesn't even have to be 3rd party code that misbehaves when green threads are migrated. I pulled my hair out for a couple of weeks trying to debug an issue with a call to accept(). It was returning -1 but errno was set to 0. What gives? Well, it turns out that on Linux with glibc, errno is a macro that calls a function to get the address of errno for your thread, and that function is marked with gcc's __attribute__((const)). What that means is that once the address of errno is computed in the body of a function, the compiler is free to assume it will always be that address (it's treated as a function without side effects whose result never changes). Here's the sequence:
1. accept() == -1
2. errno == EAGAIN
3. scheduler_yield()
4. errno = 0
5. accept() == -1
6. errno == 0 (although it should be something else)
This will happen on Linux with glibc if your scheduler_yield() call returns on a different thread than the one it was called on. So even your own innocent code that doesn't use TLS can break in interesting ways.
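A defensive pattern that avoids the trap is to copy errno into a local before any yield point and never touch errno across one. Sketch below; `scheduler_yield()` is a hypothetical stand-in for a real scheduler and is stubbed as a no-op here:

```cpp
#include <cassert>
#include <cerrno>
#include <sys/socket.h>

static void scheduler_yield() {}   // stub: a real one may migrate threads

int accept_retrying(int listen_fd) {
    for (;;) {
        int fd = accept(listen_fd, nullptr, nullptr);
        if (fd >= 0) return fd;
        int err = errno;           // copy BEFORE yielding: after the yield
                                   // we may be on another OS thread and a
                                   // cached errno address would be stale
        if (err != EAGAIN && err != EWOULDBLOCK) return -1;
        scheduler_yield();         // wait until the socket is readable
    }
}
```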
If you have very small green threads and a naive work-stealing scheduler with a mutex, you can be sure you'll spend significant time on synchronization. You can get fancier with non-blocking queues and atomic instructions to overcome this.
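For a flavor of what "fancier" means, here is a sketch of a lock-free (Treiber) run-queue stack built on compare-and-swap instead of a mutex. A production scheduler would also have to solve the ABA problem and node reclamation (hazard pointers, epochs); this sketch sidesteps both by never freeing nodes:

```cpp
#include <atomic>
#include <cassert>

struct Task { int id; Task* next = nullptr; };
std::atomic<Task*> run_queue{nullptr};

void push_task(Task* t) {
    t->next = run_queue.load(std::memory_order_relaxed);
    while (!run_queue.compare_exchange_weak(
        t->next, t, std::memory_order_release, std::memory_order_relaxed)) {
    }   // on failure, t->next was reloaded with the current head; retry
}

Task* pop_task() {
    Task* t = run_queue.load(std::memory_order_acquire);
    while (t && !run_queue.compare_exchange_weak(
        t, t->next, std::memory_order_acquire, std::memory_order_relaxed)) {
    }   // on failure, t was reloaded; retry against the new head
    return t;   // nullptr when the queue is empty
}
```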
I did have multiple schedulers for both CPU-bound tasks and IO-bound tasks. I would say that if you're doing disk IO and just forwarding the data (versus having to process it), you're better off with non-blocking sendfile() or non-blocking vmsplice() (plus mmap) in your event loop. If you're doing lots of disk IO on an SSD array that can push 2GB/s, you're going to need lots of IO threads, and the latency of the message passing between the two schedulers is going to add up. Again, this may or may not be a problem in your application.
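The sendfile() path looks roughly like this sketch: the data goes file to socket inside the kernel without crossing into userspace, and with a non-blocking socket, EAGAIN is where a green thread would yield until the socket is writable again (this sketch just retries):

```cpp
#include <cassert>
#include <cerrno>
#include <cstdlib>
#include <cstring>
#include <sys/sendfile.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

// Forward `size` bytes from a regular file to a socket via sendfile(),
// starting at file offset 0. The kernel advances `off` for us.
bool forward_file(int out_sock, int in_fd, off_t size) {
    off_t off = 0;
    while (off < size) {
        ssize_t n = sendfile(out_sock, in_fd, &off, size - off);
        if (n == 0) return false;            // unexpected EOF
        if (n < 0) {
            if (errno == EAGAIN) continue;   // real code: yield until the
                                             // socket is writable, then retry
            return false;
        }
    }
    return true;
}
```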
Those are some of my own experiences; they may or may not apply to you, but I hope they help.