Even though capnproto would be our first choice, the lack of support for Windows/CMake is kind of a party killer. FlatBuffers doesn't offer everything we need either, but its codebase is simpler to grasp and hack, so it may end up being the safer choice... which is unfortunate
I'm trying to get basic MSVC support into version 0.5.0, which is planned for release in late November. The reflection and dynamic APIs probably won't be supported initially because too much would have to be rewritten to work around missing C++11 features in MSVC -- they'll come online as soon as MSVC adds support. But, most common use cases don't need them anyway.
0.5.0 will also feature cmake support (this is already in git).
Wow: "For applications on Google Play that integrate this tool, usage is tracked" without even an option to disable that. Sure, it's open source so can be changed by editing the source code, but does anyone else find that kinda creepy?
e.g. if I make an app using 10 FOSS libraries, then I wouldn't want my app reporting to 10 different places everything which the user is doing.
Also, on the actual homepage for it (http://google.github.io/flatbuffers), the only mention of this call-home feature is buried at the bottom of the "building" page.
[EDIT] This is incorrect; see below comments about this tracking not being a call-home feature but instead just Google scanning apps submitted to the Play Store
I'm a bit confused. Is it actually calling home? That seems kind of unlikely, given that a) that seems pretty egregious for a library like this, and b) they said this doesn't affect the application at all beyond consuming "a few bytes".
Could it instead be just that Google scans Google Play apps for a string in the binary that matches the Flatbuffers version string format? That seems more likely given what the README does say about this. And it also seems more useful in general; Google would benefit more from knowing how many applications use the library than knowing how popular these applications are.
Ah yes, that's possible; I'm not sure why I leapt to that conclusion. In that case, an app which uses this library and is only published on the Amazon Android App Store would not be tracked I guess.
I actually find it kind of interesting that I mentally turned "tracked" into "calls home".
I don't see an issue with adding a string letting the Play Store scanner know that we use this lib.
It is reporting who the users of this lib are (the apps that implement it), and not informing on the app users themselves.
It seems like a very good way to gauge interest in this lib on Android in order to decide how many resources to allocate to its development.
Again, taking this from a previous conversation on this topic - https://qht.co/item?id=7904443 - it seems CapnProto and Flatbuffers are much faster in C++, Go and Rust... the benchmarks may be very different in Javascript, Python, Ruby, etc.
It would be really interesting (and possibly more relevant for HN) to have benchmarks based on one dynamic language - say Python.
Oh and @kentonv - I'm not a native American English speaker (rest of the world really). I really, really have trouble pronouncing Capn'Proto. Even more difficult to pronounce it in a meeting and have people recall/Google it.
To be clear, the thing that you'd think would be a problem in dynamic languages -- lack of pointer arithmetic -- actually isn't a problem. Every language has a way to extract values from a byte string, e.g. the `struct` module in Python, TypedArrays in Javascript, ByteBuffer in Java, etc.
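As a rough illustration of the point about extracting values from a byte string, here is a minimal sketch using Python's `struct` module. The record layout (a uint32 at offset 0, a float64 at offset 8, a uint16 at offset 16) and the function names are made up for the example; they are not the Cap'n Proto or FlatBuffers wire format.

```python
import struct

# Hypothetical fixed layout, little-endian, no implicit padding:
#   offset 0:  uint32 id
#   offset 4:  4 explicit padding bytes
#   offset 8:  float64 score
#   offset 16: uint16 flags
buf = struct.pack("<IxxxxdH", 7, 2.5, 513)

def read_id(data):
    # Read the uint32 directly at its known offset; nothing else is decoded.
    return struct.unpack_from("<I", data, 0)[0]

def read_score(data):
    # Read the float64 at offset 8.
    return struct.unpack_from("<d", data, 8)[0]

def read_flags(data):
    # Read the uint16 at offset 16.
    return struct.unpack_from("<H", data, 16)[0]

print(read_id(buf), read_score(buf), read_flags(buf))
```

Each accessor is a single `struct.unpack_from` call at a fixed offset, which is the Python stand-in for the pointer arithmetic a C++ implementation would use.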
The real problem in dynamic languages is that they tend to be worse at inlining accessor functions. This is not really because inlining is impossible -- v8 can do it -- but because most dynamic languages don't prioritize performance in the first place and so haven't implemented such optimizations. This is actually a problem in Go as well, weirdly. Because of this, if you actually intend to consume most of the content of a message, it may make sense to parse it into a language-native data structure up front so that access doesn't need to go through accessor functions. Most Cap'n Proto implementations support this. Doing this will still be much faster than using Protobufs because the Cap'n Proto format is naturally faster to decode.
As David says, "Cap'n" should be pronounced like "happen", though pronouncing it as "captain" is OK as well (and will still get people to the right place if they Google it).
@kentonv - you misunderstand. I do know that all of this can be implemented in dynamic languages. The question is whether the benchmarks there will be significantly different than benchmarks on languages with direct unsafe memory access.
Hmm, I thought that was what I was answering. Maybe I'm still misunderstanding. You're asking if Cap'n Proto's advantage over something like Protobufs will be less pronounced in a dynamic language compared to C++? Yes, that is likely the case, due to one or both of the inlining issue and the language's general slowness dwarfing any gains from the encoding library.
Of course, in cases where Cap'n Proto has a more-than-constant speedup, such as reading a single field from a large message (O(1) in Cap'n Proto, O(n) in Protobufs), then the difference will still be huge regardless of language.
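The O(1) versus O(n) distinction can be sketched in Python as follows. Both layouts here are simplified stand-ins invented for the example (a flat array of uint32 values, and a stream of tag/length/payload records); neither is the real Cap'n Proto or Protobuf encoding, and the function names are hypothetical.

```python
import struct

def read_fixed(buf, offset):
    # Fixed layout: one read at a known offset, independent of message size.
    return struct.unpack_from("<I", buf, offset)[0]

def read_tagged(buf, wanted_tag):
    # Tagged layout: records of (tag: u8, length: u8, payload).
    # Earlier records must be walked to find where the wanted one starts.
    pos = 0
    while pos < len(buf):
        tag, length = buf[pos], buf[pos + 1]
        if tag == wanted_tag:
            return struct.unpack_from("<I", buf, pos + 2)[0]
        pos += 2 + length
    raise KeyError(wanted_tag)

# 1000 uint32 values at fixed offsets.
fixed = struct.pack("<1000I", *range(1000))
# 250 tagged records, each with a 4-byte uint32 payload.
tagged = b"".join(bytes([i, 4]) + struct.pack("<I", i) for i in range(250))

print(read_fixed(fixed, 999 * 4))
print(read_tagged(tagged, 249))
```

Reading the last value from `fixed` costs the same as reading the first, while reading the last record from `tagged` requires stepping over every record before it.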
If you're looking for specific benchmark numbers, I don't have any handy, sorry. (But benchmarks can be manipulated to show any result, so you shouldn't trust any author-provided numbers anyway.)
@kentonv - yes that is what I was asking and thank you for asking.
I think one aspect of my question got lost in the noise and it is my fault. On Python, protobuf vs capnproto is not apples to apples, since the former is pure Python ... while yours is a Python wrapper over C. I have read your justifications [1] and I agree with you. But do note that there are some large use cases for Python in desktop software. C extensions turn out to be blockers in those cases. In many ways, I was hoping that you would have a pure-Python version as well (since you did build one at Google) which sacrifices speed for compatibility.
It would be great if someone were to contribute a pure-Python implementation, but it's unlikely the sandstorm.io team will work on this since it has no real use to us.
I actually think it's likely that a pure-Python version of Cap'n Proto would be significantly faster than the pure-Python protobuf implementation. Parsing Protobufs in Python is really horrible performance-wise since you have to inspect and branch on almost every byte. The way to make Python fast is to delegate as much work as possible to the built-in libraries that are written in C. But, there's just nothing that can be delegated in the case of Protobufs. In contrast, a Cap'n Proto parser could pretty easily leverage the existing `struct` module.
That said, if you enable Cap'n Proto's "packed" mode, then this advantage is lost, since that's another byte-by-byte algorithm that will perform poorly in pure Python.
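To make the contrast above concrete, here is a small sketch of the two decoding styles in pure Python. `decode_varint` follows the standard base-128 varint scheme that Protobufs use for integers (7 payload bits per byte, high bit as a continuation flag); `decode_fixed` is a hypothetical helper that reads three fixed-width 64-bit words in one `struct` call. It illustrates the shape of the work, not either library's actual parser.

```python
import struct

def decode_varint(buf, pos=0):
    # Byte-by-byte loop: inspect and branch on every byte, all in Python.
    result = 0
    shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7

def decode_fixed(buf, offset=0):
    # Fixed-width words: a single call into the C-implemented struct module.
    return struct.unpack_from("<3Q", buf, offset)

varint_bytes = b"\xac\x02"  # base-128 varint encoding of 300
fixed_bytes = struct.pack("<3Q", 300, 1, 2)

print(decode_varint(varint_bytes))
print(decode_fixed(fixed_bytes))
```

The varint loop runs one iteration of interpreted Python per input byte, whereas the fixed-width read hands the whole span to C at once. A packed-mode decoder would look more like the first function than the second.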
Strings are simply a vector of bytes, and are always null-terminated. Vectors are stored as contiguous aligned scalar elements prefixed by a 32bit element count (not including any null termination).
So... does the count include the null terminator byte or not?
I think the first use of the term 'vector' is more conceptual - but is actually defining a string type that is implemented using a C-style string strategy. The second mention "Vector" is a more direct reference to the C++/Java Vector class and its implementation.
Technically speaking, you can implement a C-style string using a std::vector by ignoring the length preamble and ensuring room is made for the null character. I got away with this in my intro to C++ class after showing the teacher that I already knew how to implement strings in C from a previous class.
I believe you are correct, mainly because the basic purpose of the protocol is "no parsing", so therefore it must work when loaded directly into RAM.
The spec is a bit confusing though, because of the statement that "Strings are simply a vector of bytes". The way I understand it is that a string is a vector FOLLOWED BY a null terminator. The spec should probably say that rather than the current wording.
This would appear to be necessary so that a string can both be treated as a vector like any other (with the correct number of elements) and be accessed directly by a (char *) pointer without things going awry.
Disclaimer: I haven't read the whole spec yet; this is my off-the-top-of-my-head interpretation and I may have misunderstood it completely.
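Under that interpretation, the layout could be sketched in Python like this. It assumes a little-endian 32-bit count that excludes the terminator, followed by the bytes and a single NUL, and it ignores alignment padding; the function names are hypothetical and this is a reading of the quoted spec text, not the FlatBuffers implementation.

```python
import struct

def encode_string(text):
    data = text.encode("utf-8")
    # 32-bit element count (terminator NOT counted), then bytes, then NUL.
    return struct.pack("<I", len(data)) + data + b"\x00"

def decode_string(buf, offset=0):
    # Read the count, then slice exactly that many bytes after the prefix.
    (count,) = struct.unpack_from("<I", buf, offset)
    start = offset + 4
    data = buf[start:start + count]
    # The byte after the counted elements is expected to be the terminator.
    if buf[start + count] != 0:
        raise ValueError("missing null terminator")
    return data.decode("utf-8")

encoded = encode_string("hi")
print(encoded)
print(decode_string(encoded))
```

A reader using the count sees two elements, while C code pointing at offset 4 sees a valid null-terminated string, which is the dual access described above.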
Cool. I'd love to know more about what makes your system awesome - It is a very creative idea! Have you thought about creating a DTD out of the schema or vice versa? Having a DTD to validate the file against would allow for some serious robustness in hot-loading stuff from the web.
I think the big draw for the flatbuffer system is that it can stream data in with a low memory foot-print.
Are there rules attached to the plurality of (so called) open source licenses, or even OSI approved licenses? Not really.
Are you finding this particular tracking code acceptable? Apparently not.
So there was never any coherent whole that could have found something unacceptable to begin with and in the end there are still disparate parts that continue to find it unacceptable.
I guess this is an obvious and tiresome answer, but I'm not sure what else you would expect anyone to say.
I assumed that you did not find the tracker here acceptable.
What I find tiresome is insisting that the use of some license or the other is a statement of values (your phrasing also implies that history clearly agrees with you, which I tend to find tiresome).
If the readme and other materials made repeated attempts to invoke some set of values and the source was contrary to that, fine you have a valid gripe, but the readme doesn't mention the license and the homepage ( http://google.github.io/flatbuffers/ ) keeps it to "It is available as open source under the Apache license, v2 (see LICENSE.txt)."
It's a version string, which as far as I know, has been acceptable since the beginning of open source. They just let us know that Google Play scans APKs for that string. I imagine Google Play also scans for other libraries, open source or otherwise.
http://kentonv.github.io/capnproto/news/2014-06-17-capnproto...