Flatbuffers by Google – CapnProto alternative

kentonv · on Oct 25, 2014

Detailed (though perhaps biased) comparison I wrote up a few months ago:

http://kentonv.github.io/capnproto/news/2014-06-17-capnproto...

tbastos · on Oct 25, 2014

Even though capnproto would be our first choice, the lack of support for Windows/CMake is kind of a party killer. FlatBuffers doesn't offer everything we need either, but its codebase is simpler to grasp and hack, so it may end up being the safer choice... which is unfortunate.

kentonv · on Oct 25, 2014

I'm trying to get basic MSVC support into version 0.5.0, which is planned for release in late November. The reflection and dynamic APIs probably won't be supported initially because too much would have to be rewritten to work around missing C++11 features in MSVC -- they'll come online as soon as MSVC adds support. But, most common use cases don't need them anyway.

0.5.0 will also feature cmake support (this is already in git).

halayli · on Oct 25, 2014

I found it interesting that they didn't benchmark it against capnproto (http://google.github.io/flatbuffers/md__benchmarks.html)

bch · on Oct 25, 2014

Apparently Cap'n Proto doesn't build with MS's compiler [1]. Does it build on Windows (where the benchmarks were performed [2]) at all?

[1] http://kentonv.github.io/capnproto/news/2014-06-17-capnproto... (see feature matrix)

[2] http://google.github.io/flatbuffers/md__benchmarks.html (introduction/top of article)

kentonv · on Oct 25, 2014

It builds in Cygwin -- not that I'd expect typical Windows developers to be satisfied by that.

MinGW and limited MSVC support will come soon.

kevinbowman · on Oct 25, 2014

Wow: "For applications on Google Play that integrate this tool, usage is tracked" without even an option to disable that. Sure, it's open source so can be changed by editing the source code, but does anyone else find that kinda creepy?

e.g. if I make an app using 10 FOSS libraries, then I wouldn't want my app reporting to 10 different places everything which the user is doing.

Also, on the actual homepage for it (http://google.github.io/flatbuffers), the only mention of this call-home feature is buried at the bottom of the "building" page.

[EDIT] This is incorrect; see below comments about this tracking not being a call-home feature but instead just Google scanning apps submitted to the Play Store

lilyball · on Oct 25, 2014

I'm a bit confused. Is it actually calling home? That seems kind of unlikely, given that a) that seems pretty egregious for a library like this, and b) they said this doesn't affect the application at all beyond consuming "a few bytes".

Could it instead be just that Google scans Google Play apps for a string in the binary that matches the Flatbuffers version string format? That seems more likely given what the README does say about this. And it also seems more useful in general; Google would benefit more from knowing how many applications use the library than knowing how popular these applications are.

subim · on Oct 25, 2014

> Could it instead be just that Google scans Google Play apps for a string in the binary that matches the Flatbuffers version string format?

Of course that's what's happening: https://github.com/google/flatbuffers/blob/master/include/fl...

kevinbowman · on Oct 25, 2014

Ah yes, that's possible; I'm not sure why I leapt to that conclusion. In that case, an app which uses this library and is only published on the Amazon Android App Store would not be tracked I guess.

I actually find it kind of interesting that I mentally turned "tracked" into "calls home".

on_and_off · on Oct 25, 2014

I don't see an issue with adding a string letting the Play Store scanner know that we use this lib. It is reporting who the users of this lib are (the apps that implement it), and not informing on the app users themselves.

It seems like a very good way to gauge the interest in this lib on Android in order to decide how many resources they will allocate to its development.

sandGorgon · on Oct 25, 2014

Again, taking this from a previous conversation on this topic - https://qht.co/item?id=7904443 - it seems CapnProto and Flatbuffers are much faster in C++, Go and Rust... the benchmarks may be very different in Javascript, Python, Ruby, etc.

It would be really interesting (and possibly more relevant for HN) to have benchmarks based on one dynamic language - say Python.

Oh and @kentonv - I'm not a native American English speaker (rest of the world really). I really, really have trouble pronouncing Cap'n Proto. It is even more difficult to pronounce it in a meeting and have people recall/Google it.

kentonv · on Oct 25, 2014

To be clear, the thing that you'd think would be a problem in dynamic languages -- lack of pointer arithmetic -- actually isn't a problem. Every language has a way to extract values from a byte string, e.g. the `struct` module in Python, TypedArrays in Javascript, ByteBuffer in Java, etc.
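As a minimal sketch of that point, here is how Python's `struct` module can read fields at fixed offsets in a byte string without any pointer arithmetic. The 16-byte record layout and the helper names are made up for illustration; this is not Cap'n Proto's actual wire format.

```python
import struct

# Hypothetical 16-byte record: uint32 id at offset 0, 4 padding bytes,
# float64 score at offset 8. Little-endian, fixed offsets.
def make_record(record_id: int, score: float) -> bytes:
    return struct.pack("<I4xd", record_id, score)

def read_id(buf: bytes) -> int:
    # unpack_from decodes at a byte offset without slicing or copying buf.
    return struct.unpack_from("<I", buf, 0)[0]

def read_score(buf: bytes) -> float:
    return struct.unpack_from("<d", buf, 8)[0]

record = make_record(42, 1.5)
```

Each reader touches only the bytes of its own field, which is the same idea the other languages' byte-buffer APIs provide.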

The real problem in dynamic languages is that they tend to be worse at inlining accessor functions. This is not really because inlining is impossible -- v8 can do it -- but because most dynamic languages don't prioritize performance in the first place and so haven't implemented such optimizations. This is actually a problem in Go as well, weirdly. Because of this, if you actually intend to consume most of the content of a message, it may make sense to parse it into a language-native data structure up front so that access doesn't need to go through accessor functions. Most Cap'n Proto implementations support this. Doing this will still be much faster than using Protobufs because the Cap'n Proto format is naturally faster to decode.
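To illustrate the trade-off, here is a rough sketch contrasting a lazy accessor object with an up-front parse into a native structure. The three-int32 layout and the names `PointReader`, `Point`, and `parse_point` are hypothetical and not taken from any Cap'n Proto implementation.

```python
import struct
from typing import NamedTuple

_POINT = struct.Struct("<iii")  # hypothetical layout: three int32 fields

class PointReader:
    """Lazy view: every attribute access decodes from the buffer again."""
    def __init__(self, buf: bytes, offset: int = 0):
        self._buf = buf
        self._offset = offset

    @property
    def x(self) -> int:
        return struct.unpack_from("<i", self._buf, self._offset)[0]

    @property
    def y(self) -> int:
        return struct.unpack_from("<i", self._buf, self._offset + 4)[0]

    @property
    def z(self) -> int:
        return struct.unpack_from("<i", self._buf, self._offset + 8)[0]

class Point(NamedTuple):
    """Native structure: decoded once, then plain attribute access."""
    x: int
    y: int
    z: int

def parse_point(buf: bytes, offset: int = 0) -> Point:
    # A single call decodes all three fields at once.
    return Point(*_POINT.unpack_from(buf, offset))

buf = _POINT.pack(1, -2, 3)
lazy = PointReader(buf)
eager = parse_point(buf)
```

The lazy view is cheaper when only one field is read; the eager parse is likely to win when most fields are consumed repeatedly, since later accesses skip the accessor call entirely.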

As David says, "Cap'n" should be pronounced like "happen", though pronouncing it as "captain" is OK as well (and will still get people to the right place if they Google it).

sandGorgon · on Oct 26, 2014

@kentonv - you misunderstand. I do know that all of this can be implemented in dynamic languages. The question is whether the benchmarks there will be significantly different than benchmarks on languages with direct unsafe memory access.

I don't know the answer, ergo the question.

kentonv · on Oct 26, 2014

Hmm, I thought that was what I was answering. Maybe I'm still misunderstanding. You're asking if Cap'n Proto's advantage over something like Protobufs will be less pronounced in a dynamic language compared to C++? Yes, that is likely the case, due to one or both of the inlining issue and the language's general slowness dwarfing any gains from the encoding library.

Of course, in cases where Cap'n Proto has a more-than-constant speedup, such as reading a single field from a large message (O(1) in Cap'n Proto, O(n) in Protobufs), then the difference will still be huge regardless of language.
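A toy sketch of why the single-field case differs in complexity. The fixed layout and the tag-length-value stream below are simplified stand-ins invented for illustration; neither is the real Cap'n Proto or Protobuf wire format.

```python
import struct

def encode_fixed(values):
    # Fixed layout: field i lives at byte offset 4 * i.
    return struct.pack(f"<{len(values)}I", *values)

def read_fixed(buf, index):
    # O(1): jump straight to the field's offset.
    return struct.unpack_from("<I", buf, 4 * index)[0]

def encode_tlv(values):
    # Toy tag-length-value stream: 1-byte tag, 1-byte length, payload.
    out = bytearray()
    for tag, value in enumerate(values):
        payload = value.to_bytes((value.bit_length() + 7) // 8 or 1, "little")
        out += bytes([tag, len(payload)]) + payload
    return bytes(out)

def read_tlv(buf, wanted_tag):
    # O(n): a field's position depends on the sizes of all earlier fields,
    # so the reader has to walk the stream from the start.
    pos = 0
    steps = 0
    while pos < len(buf):
        tag, length = buf[pos], buf[pos + 1]
        steps += 1
        if tag == wanted_tag:
            value = int.from_bytes(buf[pos + 2 : pos + 2 + length], "little")
            return value, steps
        pos += 2 + length
    raise KeyError(wanted_tag)

values = [i * 1000 for i in range(100)]
fixed = encode_fixed(values)
tlv = encode_tlv(values)
```

Reading the last field takes one `unpack_from` call in the fixed layout, while `read_tlv` reports 100 steps for the same field.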

If you're looking for specific benchmark numbers, I don't have any handy, sorry. (But benchmarks can be manipulated to show any result, so you shouldn't trust any author-provided numbers anyway.)

sandGorgon · on Oct 27, 2014

@kentonv - yes that is what I was asking and thank you for asking. I think one aspect of my question got lost in the noise and it is my fault. On Python, protobuf vs capnproto is not apples to apples, since the former is pure Python ... while yours is a Python wrapper over C. I have read your justifications [1] and I agree with you. But do note that there are some large use cases for Python on desktop software. C extensions turn out to be blockers in those cases. In many ways, I was hoping that you would have a pure-Python version as well (since you did build one at Google) which sacrifices speed for compatibility.

I was thinking in that context.

[1] http://kentonv.github.io/capnproto/news/2013-09-04-capnproto...

kentonv · on Oct 28, 2014

It would be great if someone were to contribute a pure-Python implementation, but it's unlikely the sandstorm.io team will work on this since it has no real use to us.

I actually think it's likely that a pure-Python version of Cap'n Proto would be significantly faster than the pure-Python protobuf implementation. Parsing Protobufs in Python is really horrible performance-wise since you have to inspect and branch on almost every byte. The way to make Python fast is to delegate as much work as possible to the built-in libraries that are written in C. But, there's just nothing that can be delegated in the case of Protobufs. In contrast, a Cap'n Proto parser could pretty easily leverage the existing `struct` module.
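A small sketch of the contrast, assuming only the base-128 varint encoding that Protobufs uses for integers. The function names are made up, and the fixed-width side is a generic `struct` layout rather than Cap'n Proto's real format.

```python
import struct

def decode_varint(buf, pos=0):
    """Decode one base-128 varint starting at pos.

    Every byte costs a Python-level loop iteration and a branch.
    """
    result = 0
    shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:  # high bit clear: this was the last byte
            return result, pos
        shift += 7

def decode_varints(buf, count):
    values, pos = [], 0
    for _ in range(count):
        value, pos = decode_varint(buf, pos)
        values.append(value)
    return values

def decode_fixed(buf, count):
    # Fixed-width fields: one struct call decodes everything in C.
    return list(struct.unpack_from(f"<{count}Q", buf, 0))

varint_buf = bytes([0x01, 0xAC, 0x02, 0x7F])  # encodes 1, 300, 127
fixed_buf = struct.pack("<3Q", 1, 300, 127)
```

Both buffers hold the same three integers; the varint path runs interpreter-level code per byte, while the fixed-width path hands the whole job to `struct`.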

That said, if you enable Cap'n Proto's "packed" mode, then this advantage is lost, since that's another byte-by-byte algorithm that will perform poorly in pure Python.
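For a sense of what that byte-by-byte work looks like, here is a rough unpacker sketch. It follows the packing scheme as described in the Cap'n Proto encoding documentation (a tag byte per 8-byte word, with special handling for tags 0x00 and 0xff); the details should be checked against that spec before relying on them, and the function name is made up.

```python
def unpack(packed: bytes) -> bytes:
    """Expand a packed byte stream back into 8-byte words."""
    out = bytearray()
    pos = 0
    while pos < len(packed):
        tag = packed[pos]
        pos += 1
        # Each set bit in the tag means that byte of the word is present.
        word = bytearray(8)
        for bit in range(8):
            if tag & (1 << bit):
                word[bit] = packed[pos]
                pos += 1
        out += word
        if tag == 0x00:
            # Next byte counts additional all-zero words.
            out += bytes(8 * packed[pos])
            pos += 1
        elif tag == 0xFF:
            # Next byte counts words that follow verbatim.
            run = 8 * packed[pos]
            pos += 1
            out += packed[pos : pos + run]
            pos += run
    return bytes(out)
```

The inner loop inspects every tag bit in interpreted code, which is the kind of work pure Python handles poorly.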

dwrensha · on Oct 25, 2014

For what it's worth, I pronounce "Cap'n" to rhyme with "happen", and sometimes I fall back to saying "Captain Proto".

userbinator · on Oct 25, 2014

I looked at their implementation at http://google.github.io/flatbuffers/md__internals.html and found this rather confusing paragraph:

> Strings are simply a vector of bytes, and are always null-terminated. Vectors are stored as contiguous aligned scalar elements prefixed by a 32bit element count (not including any null termination).

So... does the count include the null terminator byte or not?

ultimape · on Oct 27, 2014

I think the first use of the term 'vector' is more conceptual - but is actually defining a string type that is implemented using a C-style string strategy. The second mention, "Vectors", is a more direct reference to the C++/Java Vector class and its implementation.

Technically speaking, you can implement a C-style string using a std::vector by ignoring the length preamble and ensuring room is made for the null character. I got away with this in my intro to C++ class after showing the teacher that I already knew how to implement strings in C from a previous class.

C-style strings: http://www.learncpp.com/cpp-tutorial/66-c-style-strings/

C++ Vector class: http://en.cppreference.com/w/cpp/container/vector

Java Vector Class: http://docs.oracle.com/javase/7/docs/api/java/util/Vector.ht...

zeroxfe · on Oct 25, 2014

That sounds like an unambiguous no to me (null-terminator not included in count.)

robert_tweed · on Oct 25, 2014

I believe you are correct, mainly because the basic purpose of the protocol is "no parsing", so therefore it must work when loaded directly into RAM.

The spec is a bit confusing though, because of the statement that "Strings are simply a vector of bytes". The way I understand it is that a string is a vector FOLLOWED BY a null terminator. The spec should probably say that rather than the current wording.

This would appear to be necessary so that a string can be treated as either a vector like any other (with the correct number of elements) and can also be accessed directly by a (char *) pointer without things going awry.
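Under that reading, the layout could be sketched as follows. This is only an illustration of the interpretation above: it assumes a little-endian 32-bit count, ignores alignment padding and offsets, and the helper names are made up rather than taken from the FlatBuffers API.

```python
import struct

def encode_string(text: str) -> bytes:
    data = text.encode("utf-8")
    # 32-bit count of data bytes; the null terminator is NOT counted.
    return struct.pack("<I", len(data)) + data + b"\x00"

def decode_string(buf: bytes, offset: int = 0) -> str:
    (length,) = struct.unpack_from("<I", buf, offset)
    start = offset + 4
    end = start + length
    # The terminator sits just past the counted bytes.
    if buf[end] != 0:
        raise ValueError("missing null terminator")
    return buf[start:end].decode("utf-8")

encoded = encode_string("hello")
```

Here `encoded` is 10 bytes: 4 for the count (which is 5), 5 for the characters, and 1 for the terminator, so vector-style and `char *`-style readers both see a consistent string.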

Disclaimer: I haven't read the whole spec yet; this is my off-the-top-of-my-head interpretation and I may have misunderstood it completely.

mockery · on Oct 25, 2014

I'd expect that strings are just a vector of bytes, with strlen()+1 bytes, the last of which is always null.

jhallenworld · on Oct 25, 2014

I used the C preprocessor as the schema compiler in my serialization library: https://github.com/jhallen/joes-sandbox/tree/master/lib/sdu

ultimape · on Oct 27, 2014

Cool. I'd love to know more about what makes your system awesome - it is a very creative idea! Have you thought about creating a DTD out of the schema or vice-versa? Having a DTD to validate the file against would allow for some serious robustness in hot-loading stuff from the web.

I think the big draw for the flatbuffer system is that it can stream data in with a low memory foot-print.

desdiv · on Oct 25, 2014

Related discussion 130 days ago: https://qht.co/item?id=7901991

WhitneyLand · on Oct 25, 2014

So when did it become acceptable to leave tracking code turned on by default (opt out) in open source repos?

maxerickson · on Oct 25, 2014

Are there rules attached to the plurality of (so called) open source licenses, or even OSI approved licenses? Not really.

Are you finding this particular tracking code acceptable? Apparently not.

So there was never any coherent whole that could have found something unacceptable to begin with and in the end there are still disparate parts that continue to find it unacceptable.

I guess this is an obvious and tiresome answer, but I'm not sure what else you would expect anyone to say.

WhitneyLand · on Oct 25, 2014

Just because there are no rules against something does not make it acceptable or good.

Just because I review code doesn't mean I accept or use it, so your assumption is wrong.

There should be nothing tiresome about calling out for discussion of evolving trends in software or anything else.

If you see nothing negative about this practice then speak out constructively.

maxerickson · on Oct 25, 2014

I assumed that you did not find the tracker here acceptable.

What I find tiresome is insisting that the use of some license or the other is a statement of values (your phrasing also implies that history clearly agrees with you, which I tend to find tiresome).

If the readme and other materials made repeated attempts to invoke some set of values and the source was contrary to that, fine you have a valid gripe, but the readme doesn't mention the license and the homepage ( http://google.github.io/flatbuffers/ ) keeps it to "It is available as open source under the Apache license, v2 (see LICENSE.txt)."

kevb · on Oct 26, 2014

It's a version string, which as far as I know, has been acceptable since the beginning of open source. They just let us know that Google Play scans APKs for that string. I imagine Google Play also scans for other libraries, open source or otherwise.

https://github.com/google/flatbuffers/blob/master/include/fl...
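
A rough sketch of what such a scan could look like. The marker format (`FlatBuffers ` followed by a dotted version) and the function name are assumptions for illustration; how Google Play actually scans APKs is not documented here.

```python
import re

# Assumed marker: the literal "FlatBuffers " followed by a dotted version.
VERSION_PATTERN = re.compile(rb"FlatBuffers (\d+\.\d+\.\d+)")

def find_library_versions(binary: bytes):
    # Search the raw bytes; nothing in the app needs to run for this to work.
    return [m.group(1).decode("ascii") for m in VERSION_PATTERN.finditer(binary)]

fake_binary = b"\x7fELF\x00\x01junk\x00FlatBuffers 1.0.0\x00more\xffbytes"
```

Because this is a static search over the file's bytes, nothing is sent from the user's device at runtime.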