This is really interesting and very cool to see. The approach we're taking with ...

marpaia · on Oct 29, 2014

Right now osquery only supports read operations, but you're totally right: if you could kick off tasks, kill processes, unload kexts, and so on via CREATE, DELETE, etc. statements, that would be so killer!

vezzy-fnord · on Oct 29, 2014

So in essence, you're motivated by the same underlying concept as the Plan 9/Inferno developers: define a small set of abstractions and apply them ruthlessly, in contrast to the myriad of non-uniform interfaces one is practically using every day in an operating system.

For Plan 9, it was "everything is a file" and the power of using simple operations like bind mounts to create complex software interactions that would otherwise require monolithic protocol and library stacks anywhere else.

Here, it's... "everything is a table"? I'm not very familiar with table-oriented programming, but what advantages does having the RDBMS be the prime metaphor over the file system really bring? Structure? Rob Pike had some interesting words on that: http://slashdot.org/story/50858

grandalf · on Oct 29, 2014

> what advantages does having the RDBMS be the prime metaphor over the file system?

Easy queryability and easy joining of data across a whole datacenter.

This makes it easy to think about system data across all sorts of boundaries that the file metaphor makes somewhat cumbersome.
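To make the joining point concrete, here is a minimal sketch using Python's built-in sqlite3 module. The `processes` and `listening_ports` tables, their `host` column, and the sample rows are all made up for illustration; this is not osquery's actual schema or API, just the general shape of a cross-host join.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with two hypothetical system tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE processes (host TEXT, pid INTEGER, name TEXT);
        CREATE TABLE listening_ports (host TEXT, pid INTEGER, port INTEGER);
        """
    )
    # Made-up rows standing in for data collected from two machines.
    conn.executemany(
        "INSERT INTO processes VALUES (?, ?, ?)",
        [
            ("web1", 100, "nginx"),
            ("web1", 200, "sshd"),
            ("db1", 100, "postgres"),
            ("db1", 300, "cron"),
        ],
    )
    conn.executemany(
        "INSERT INTO listening_ports VALUES (?, ?, ?)",
        [("web1", 100, 80), ("web1", 200, 22), ("db1", 100, 5432)],
    )
    return conn


def listeners_by_host(conn):
    """Join processes to their listening ports across every host at once."""
    return conn.execute(
        """
        SELECT p.host, p.name, l.port
        FROM processes AS p
        JOIN listening_ports AS l
          ON l.host = p.host AND l.pid = p.pid
        ORDER BY p.host, l.port
        """
    ).fetchall()


rows = listeners_by_host(build_demo_db())
for host, name, port in rows:
    print(host, name, port)
```

With files, the same question means collecting and parsing two differently formatted outputs per machine before any matching can start.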

trhway · on Oct 29, 2014

> This makes it easy to think about system data across all sorts of boundaries that the file metaphor makes somewhat cumbersome.

Meanwhile, the file metaphor has become big in data processing, which used to be SQL's domain: think of the whole ecosystem of HDFS and everything built on top of it.

electrum · on Oct 29, 2014

At the end of the day, almost everyone queries those HDFS files with a query language, using Pig, Hive, Presto, etc.

geofft · on Oct 30, 2014

"Everything is a file" means that you need to parse files to get useful data out (think things like /etc/mtab and /proc/mounts), which is closely tied to another UNIX philosophy, that tools should generate plain text and parse it using generic text-processing tools. This is great for getting things done quickly. It's also great for security holes (think CVE-2011-1749 and related issues; arguably, see also Shellshock).

One advantage of "everything is a table" is that your structures are well-formatted and there's no risk of problems when you put a space in a pathname. For most implementations of "table", you can also have the data formats be well-typed. This brings reliability and security benefits.
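A small sketch of the space-in-a-pathname problem, in Python. The helper names and the sample mount are hypothetical, and the writer deliberately skips the escaping that the kernel applies in /proc/mounts (where a space appears as `\040`), to mimic a tool that forgot about it.

```python
import sqlite3


def naive_format(device, path, fstype):
    """Write a mount entry as space-separated text, with no escaping."""
    return f"{device} {path} {fstype} rw 0 0"


def parse_mount_line(line):
    """Read the entry back by splitting on whitespace, as many scripts do."""
    fields = line.split()
    return {"device": fields[0], "path": fields[1], "fstype": fields[2]}


# Text round trip: the space in the path shifts every later field.
line = naive_format("/dev/sdb1", "/mnt/my disk", "ext4")
parsed = parse_mount_line(line)
print(parsed["path"], parsed["fstype"])  # /mnt/my disk  (path="/mnt/my", fstype="disk")

# Table round trip: each value stays in its own column.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mounts (device TEXT, path TEXT, fstype TEXT)")
conn.execute(
    "INSERT INTO mounts VALUES (?, ?, ?)", ("/dev/sdb1", "/mnt/my disk", "ext4")
)
row = conn.execute("SELECT path, fstype FROM mounts").fetchone()
print(row)  # ('/mnt/my disk', 'ext4')
```

The text version can of course be fixed with escaping, but then every reader and writer has to agree on the escaping rules; the table version has no such rules to get wrong.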

I think there's validity to Rob Pike's argument in many contexts -- for instance, you absolutely won't see me defending the semantic web over the greppable/Googleable one. But in the specific case of text files with a single, well-defined structure, his own argument seems to imply that there's no sense in a second tool having to infer the structure on its own.

(The usual way this is worked around these days is separate files for each field, or files designed to be parseable, which is why Linux's /proc/*/ is such a mess. Compare /proc/self/stat and /proc/self/status, and /proc/self/mounts and /proc/self/mountinfo. Also look around /sys a bit.)

harelba · on Oct 31, 2014

There's a command line tool called q, which allows performing SQL-like queries directly on text files, basically treating text as data and auto detecting column types.

http://harelba.github.io/q/

geofft · on Nov 2, 2014

Neat, but auto-detection is exactly what I don't want. We have structure on one side. Why round-trip it through an unstructured format and attempt to guess the exact same structure on the other side? If I guess wrong, it's a security hole.

emmelaich · on Oct 30, 2014

Yes!

This is great. One of the frustrations I've had with Puppet and Ansible is the lack of a clear model for data. It's quite difficult to know the scope and dependencies and origin of all the variables that one deals with.

If one could update tables and then have that representation be reified to the machines it would be awesome.

7952 · on Oct 29, 2014

>> The next step would be manipulating the OS as relations.

This approach could be a good fit for package management. So that packages are updated and run within a transaction and changes can be committed in a single seamless step.
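As a rough sketch of what "updated within a transaction" could mean on the database side, here is an all-or-nothing batch update using Python's sqlite3 module. The `installed` table and the package names are invented, and this only changes rows in a table; it does not touch a real package manager or the filesystem.

```python
import sqlite3


def upgrade_all(conn, upgrades):
    """Apply every (name, version) upgrade, or none of them."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            for name, version in upgrades:
                cur = conn.execute(
                    "UPDATE installed SET version = ? WHERE name = ?",
                    (version, name),
                )
                if cur.rowcount != 1:
                    raise LookupError(f"package not installed: {name}")
    except LookupError:
        return False
    return True


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE installed (name TEXT PRIMARY KEY, version TEXT)")
conn.executemany(
    "INSERT INTO installed VALUES (?, ?)", [("libfoo", "1.0"), ("libbar", "2.0")]
)
conn.commit()  # make the starting state permanent before experimenting

# The second upgrade names a package that is not installed, so the first
# one is rolled back as well.
ok = upgrade_all(conn, [("libfoo", "1.1"), ("libmissing", "9.9")])
versions = dict(conn.execute("SELECT name, version FROM installed"))
print(ok, versions)
```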

lifty · on Oct 29, 2014

Package managers usually have non-idempotent actions that might change parts of the operating system in undesired ways. That means you could not have atomic operations à la SQL. There is one package manager that solves that, Nix (from NixOS), on top of which you could apply something like an SQL language.

amelius · on Oct 30, 2014

I'd rather see an OS and package manager that has a "functional" design (as in functional programming languages and functional data structures). This would allow conflicting packages to be installed next to each other in different branches of a functional filesystem.

mej10 · on Oct 30, 2014

The Nix package manager refers to itself as "The Purely Functional Package Manager", and that is exactly what it lets you do.

"It provides atomic upgrades and rollbacks, side-by-side installation of multiple versions of a package, multi-user package management and easy setup of build environments."

http://nixos.org/nix/

amelius · on Oct 31, 2014

This seems interesting. But I find it a bit unsatisfying that it can only be used as a package manager then. How about using this functional machinery, e.g., for a general build system? A "functional make" so to speak. And I bet there are plenty of other use cases.

kbenson · on Oct 29, 2014

This would really require the underlying package management system to support this, and then it's simply a shim from osquery to the package manager to do the actual work. The main problem would probably be making it work across all the disparate systems it's supposed to support.

That said, it would be awesome to query specific package versions, or even individual package file MD5s from an SQL interface to check system exposure when new exploits come down the pipe.
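A minimal sketch of that kind of exposure check, again with Python's sqlite3 module. The `packages` and `advisories` tables, the package names, and the version strings are all hypothetical, and the match is an exact string comparison on versions rather than a real version-range check.

```python
import sqlite3


def exposed_packages(conn):
    """List installed packages whose exact version appears in an advisory."""
    return conn.execute(
        """
        SELECT p.host, p.name, p.version
        FROM packages AS p
        JOIN advisories AS a
          ON a.name = p.name AND a.version = p.version
        ORDER BY p.host, p.name
        """
    ).fetchall()


conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE packages (host TEXT, name TEXT, version TEXT);
    CREATE TABLE advisories (name TEXT, version TEXT);
    """
)
# Made-up inventory for two machines.
conn.executemany(
    "INSERT INTO packages VALUES (?, ?, ?)",
    [
        ("web1", "libfoo", "1.2.3"),
        ("web1", "libbar", "2.0.2"),
        ("db1", "libfoo", "1.2.4"),
        ("db1", "libbar", "2.0.1"),
    ],
)
# Made-up list of versions named in a new advisory.
conn.executemany(
    "INSERT INTO advisories VALUES (?, ?)",
    [("libfoo", "1.2.3"), ("libbar", "2.0.1")],
)

exposed = exposed_packages(conn)
for host, name, version in exposed:
    print(host, name, version)
```

Checking per-file MD5s would have the same shape: one more table of (package, path, md5) rows joined against a list of known-bad hashes.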

andrewchambers · on Oct 29, 2014

Package managers usually work with transactions already.

maaku · on Oct 29, 2014

While you're at it, one related piece that I'd like to see explored is relations-as-code, code-as-relations a la Lisp. It would be interesting to see what a program would look like if represented as tuples in a relation, and able to self-modify by updating that relation.

Abstract, I know, but I don't think anyone has looked at this in any detail yet.

ibdknox · on Oct 29, 2014

We're already doing that one :)

p4bl0 · on Oct 30, 2014

Would you mind sharing a link to a paper or something about that?

jamii · on Oct 31, 2014

The only good references I know of:

http://p2.berkeley.intel-research.net/papers/EvitaRacedVLDB2...

http://www.seas.upenn.edu/~mengmeng/papers/datalog2.0.pdf

maaku · on Oct 31, 2014

Datalog isn't what I was talking about. I would like to see a programming language where both the basic data type and the representation of code itself are relations, as is the case with Lisp and lists.

jamii · on Oct 31, 2014

Datalog is relational. Both of those papers show datalog compilers in datalog.

maaku · on Nov 4, 2014

Thank you. I have had a misunderstanding of what datalog is for years. I have always thought of it as a DSL for making queries against a database using the relational calculus. I have never considered, nor have the texts I've encountered demonstrated how the syntax of datalog is itself relational, though it seems obvious now that I've noticed.

I have some planned projects which require a homoiconic relational language. I was hoping someone else could be inspired to design such a thing so that I don't have to. It looks like someone already did. I am happy to be proven wrong :)

maaku · on Oct 30, 2014

Yes please!

emmanueloga_ · on Oct 30, 2014

ibdknox, here's [1] an example of a relational scheme on top of HTTP APIs, maybe this could serve as inspiration for some aspect of your project. (BTW I feel like I keep bringing this [seemingly defunct] project up... oh well).

p.s.: Oh! there's a Meijer paper behind it [2].

1: http://ql.io/

2: http://queue.acm.org/detail.cfm?id=1961297