
Can you point to the autobatching packages? What strategy are they taking? Will they recognize opportunities to combine compatible operations within a given function body into a batch? Does one need a cost model for merging and splitting data into batches?

Also, what does an approach like bucketing even look like under the approach Julia is taking? The idea there, of course, is to allow some 'slop': combine many similar examples whose tensor sizes differ by small amounts, and carefully define all your primitive operations so that they can ignore the padding used to fit similar tensors into a uniform shape. Doing this requires awareness of the tensor sizes all the way back to how you sample the training data, so I don't see how compiler magic can match the performance you get from bucketing.
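To make the padding-with-slop idea concrete, here is a minimal NumPy sketch of length-bucketing (the function and parameter names are my own, not from any package mentioned above). Sequences are sorted by length so each batch holds similarly-sized examples, then padded up to a bucket boundary; the mask lets downstream operations ignore the padding:

```python
import numpy as np

def bucket_batches(sequences, bucket_width=8, batch_size=4):
    """Group variable-length sequences into batches of similar length,
    padding each batch's members up to a shared bucket length.
    `bucket_width` is the 'slop': lengths within one batch may differ
    by up to roughly this amount, so padding waste stays small."""
    # Sort by length so neighboring sequences have similar sizes.
    ordered = sorted(sequences, key=len)
    batches = []
    for i in range(0, len(ordered), batch_size):
        group = ordered[i:i + batch_size]
        # Round the batch's max length up to a multiple of bucket_width,
        # so only a handful of distinct shapes (and compiled kernels) exist.
        max_len = -(-max(len(s) for s in group) // bucket_width) * bucket_width
        padded = np.zeros((len(group), max_len), dtype=np.float32)
        mask = np.zeros((len(group), max_len), dtype=bool)
        for row, seq in enumerate(group):
            padded[row, :len(seq)] = seq
            mask[row, :len(seq)] = True
        batches.append((padded, mask))
    return batches
```

Note that the batch composition is decided at data-sampling time, before any model code runs — which is exactly why it is hard to see a compiler recovering this on its own.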

Of course, bucketing becomes more complex for things like trees, graphs, and other higher-level objects. And bucketing can, in theory, introduce bias into your gradients, if there is any correlation between an example's gradient and its tensor shape.



From the blog post:

"Automatic Batching

To get the most from these accelerators – which can have significant overheads per kernel launch, but scale very well over input size – it is common to batch programs, applying the forwards and backwards passes to multiple training examples at once. In simple cases, such as with convolutional nets, it's simple to handle this by concatenating, say, 10 images along an extra batch dimension. But this task becomes much harder when dealing with variably-structured inputs, such as trees or graphs.

Most researchers address this by taking on the significant burden of batching code by hand. Different solutions have been proposed for different frameworks (DyNet, TensorFlow Fold), which heuristically try to batch some high-level operations together when possible, but these typically either have their own usability issues or do not achieve the performance of hand-written code.

We suggest that this problem is identical to that of Single Program Multiple Data (SPMD) programming, which has been well-studied by the language and compiler community for decades, and becomes visible in more recent approaches to batching like matchbox. Indeed, it is very similar to the model of parallelism used by GPUs internally, and has been implemented as a compiler transform for the SIMD units of CPUs. Taking inspiration from this work, we are implementing the same transform in Julia to provide SPMD programming both for scalar SIMD units and for model-level batching. This allows us to reach the ideal of writing simple code that operates on individual samples, while still getting the best performance on modern hardware."
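The SPMD transform the post describes — write code for one sample, compile it to run over a batch — can be illustrated by hand in NumPy (this is my own sketch of the idea, not the Julia implementation; JAX exposes an automatic version of this as `vmap`):

```python
import numpy as np

# Per-example code: a tiny forward pass written for a single sample.
def forward_single(w, x):      # x has shape (3,)
    return np.tanh(w @ x)      # result has shape (4,)

# What the SPMD transform effectively produces: the same computation
# rewritten over a leading batch axis, so the per-sample matrix-vector
# products become one batched matrix multiply.
def forward_batched(w, xs):    # xs has shape (batch, 3)
    return np.tanh(xs @ w.T)   # result has shape (batch, 4)

w = np.ones((4, 3))
xs = np.random.rand(10, 3)

# The transformed code agrees with looping over samples one at a time.
assert np.allclose(forward_batched(w, xs),
                   np.stack([forward_single(w, x) for x in xs]))
```

The promise is that researchers write only `forward_single`-style code, and the compiler derives the batched version — for fixed-shape inputs this is mechanical, and the open question raised above is whether it can match hand-tuned bucketing for variably-shaped ones.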



