The downside is that you have to recreate full data constraints within your application - the database offers none of it.
Of course constraints are tedious when you're in "experimentation" mode (to quote another post I see here) and are doing rapid, early development. But once you're in production with data that's critical/important (i.e., not someone's list of their favorite songs; more like their bank statements and medical histories), constraints are the bee's knees.
Once you have data constraints in place, now migrations are hard - whether or not you're on a SQL database. You need to either update all old documents to match new schemas, or open up your constraints to "expect both" (where by "both" I really mean, "any number of 18 different formats...oh make that 19") - and that is the potentially slippery slope here into a coding crapfest.
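The "expect both" situation often ends up as a read-side normalizer that upgrades whatever historical shape it finds. A minimal sketch of that pattern, with entirely hypothetical field names and format versions:

```python
# Lazy migration: normalize any known historical document format to
# the current one on read. Every branch here is another "oh make
# that 19" format the code must keep recognizing forever.

def normalize_user(doc):
    """Upgrade any known historical format to the current one."""
    doc = dict(doc)
    # v1 stored a single "name" string; the current schema splits it.
    if "name" in doc and "first_name" not in doc:
        first, _, last = doc.pop("name").partition(" ")
        doc["first_name"], doc["last_name"] = first, last
    # v2 stored "emails" as a bare string; the current schema wants a list.
    if isinstance(doc.get("emails"), str):
        doc["emails"] = [doc["emails"]]
    return doc
```

The alternative is a one-shot batch update of all old documents, which is exactly the migration work a schemaless store was supposed to spare you.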
Disclaimer: I'm the author of a very popular SQL tool for Python (SQLAlchemy) as well as a new database migrations tool (Alembic).
While I can see the advantages of document databases in some use cases, I can't help but feel that a lot of the adoption of MongoDB and its ilk is more related to the rough edges of interfaces/ORMs to relational databases than to fundamental flaws in the relational databases themselves. This, coupled with the fact that the rising generation of developers is less familiar with SQL than the last, is leading to some curious choices of datastores. It should be an interesting next couple of years.
Also great work on Alembic, I started using it a few weeks ago and am very pleased.
It has to be noted that a lot of the >referential< constraints are only necessary because an RDBMS wants you to split your "object" over several tables. What I found pleasant with the non-relational databases is that these constraints CAN just fall away, because you can e.g. nest "comments" inside an array in a "blogpost" object. When you delete the blogpost, all its comments are deleted with it, and cascading deletes are simply unnecessary.
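A hypothetical example of the nesting described above, with made-up field names:

```python
# A blog post with its comments embedded in the document itself:
# no foreign keys, no join table, no cascading delete rules.
post = {
    "_id": "post-42",
    "title": "Why we switched",
    "comments": [
        {"author": "alice", "text": "Nice write-up"},
        {"author": "bob", "text": "What about migrations?"},
    ],
}

# Removing the post (e.g. db.posts.remove({"_id": "post-42"}) in the
# MongoDB shell) drops the comments too, because they only exist
# inside this one document.
```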
Some of the other constraints you'll have to implement in your software. The advantage: you don't put application logic outside of your application. The disadvantage: Every bit of code touching that value has to know the limitations.
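A sketch of what such an application-side constraint looks like, with hypothetical field names; the catch is exactly the one stated above: every code path that writes this value must remember to call it.

```python
# Application-level constraint check, standing in for what a CHECK
# constraint or NOT NULL would do inside an RDBMS.
def validate_score(doc):
    if not isinstance(doc.get("points"), int):
        raise ValueError("points must be an integer")
    if doc["points"] < 0:
        raise ValueError("points must be non-negative")
```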
I wonder if this could be solved by using a message queue and having a dedicated step for updating/deleting data.
> I wonder if this could be solved by using a message queue and having a dedicated step for updating/deleting data.
See, but now you're building some big thing. Let's just include that in MongoDB or whatever: a "constraints engine". That way you don't have to build it from scratch each time, and you can have some mature, well-tested thing instead of something ad hoc and probably buggy. Now you need to carefully build migrations again!
One of the messier parts of having to implement referential constraints in the application is that you have to handle the concurrency issues yourself. An RDBMS handles the locking of the referred-to rows for you.
Referential integrity is to databases what pointers are to code. You definitely need it.
What's worse than migrations? Being unable to turn an application off, since it uses the old schema.
With ChronicDB we support indefinite backward compatibility. Unlike per-record versioning tricks, application code does not need to be aware of migration code.
Using conventions for data constraints is quite dangerous. Whereas bugs in software can be corrected, bugs in corrupted data often cannot: the data was captured in a temporally-dependent way (i.e., you captured it wrong and can't go back), and it may be massive (i.e., the corruption is spread across TBs of data). Basically, you might only get one chance with data.
A convention-based approach, or even a well thought out data-enforcement approach, will have bugs and failures, and you just have to hope these failures aren't severe enough that you lose your "one" chance.
Relational constraints, OTOH, when used in the usual way, make situations like this virtually impossible.
OMG I'm so excited because it is schema-less! Was there something insightful in that blog post that helped it bump to the frontpage, or is it just because it had MongoDB in the title?
Downvote away, but is it too much to ask to upvote meaningful blog posts that present something new?
HN is so easy to game - it takes very few votes in a short period of time to get something on the front page. The guy probably sent the link around to his buddies - I've been on the receiving end of a number of such requests (and ignored them).
I'm curious, when you change the schema on the fly, the app code has to deal with both versions, right? I'm afraid this means you shift the pain instead of blowing it away.
Distributed system development is a zero-sum game. That is the dirty little secret/feature of NoSQL. It is great if the shifted work can be partially addressed, simply addressed, or entirely ignored (for your domain). But if you find yourself reinventing an RDBMS, it is time to re-evaluate your choices.
(I'm a NoSQL OSS developer/contributor and enthusiast.)
I have some hacks for making migrations literally painless I'll share soon. It takes some tinkering to figure things out but that's why it's exciting, because the solutions are actually better.
I've been developing with MongoDB for a few months now, and the thing that always comes back to me as missing is the ability to group a set of actions so that they perform atomically. It's not exactly that I need transactions (I don't care about the commit/rollback part); I just need to say: hey server, do X, Y, and Z as a batch so that another thread won't do something in the middle of that.
The general consensus on this is to structure your data so that it encapsulates your business needs in one document structure (which is atomic on changes), but I find that hard to always conform to in the real world.
So now I have to use ZooKeeper (memcached also works) to set up global locks on those specific batch-update actions. I guess it's a small price to pay, right? Right?
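The memcached variant of that lock usually leans on the fact that "add" is atomic and fails if the key already exists. A minimal sketch of the idea; `FakeCache` is an in-memory stand-in for a real memcached client, and the key names are made up:

```python
import time

# A cache whose add() has memcached semantics: atomic, and a no-op
# returning False if the key already exists. That one property is
# enough to build a crude global mutex.
class FakeCache:
    def __init__(self):
        self._data = {}

    def add(self, key, value):
        if key in self._data:
            return False
        self._data[key] = value
        return True

    def delete(self, key):
        self._data.pop(key, None)

def with_global_lock(cache, name, fn, retries=50, wait=0.01):
    """Run fn() while holding a cache-based lock on `name`."""
    for _ in range(retries):
        if cache.add("lock:" + name, "1"):
            try:
                return fn()
            finally:
                cache.delete("lock:" + name)
        time.sleep(wait)
    raise RuntimeError("could not acquire lock " + name)
```

A production version would also need an expiry on the lock key so a crashed client can't hold the lock forever.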
I tend to use a lock just long enough to write a transaction document. After you get the transaction captured in a document you can go back and fix up the primary documents without a lock.
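A sketch of that transaction-document pattern, using plain dicts as stand-ins for Mongo collections; the collection and field names are hypothetical. The point is that step 1 is a single-document insert (atomic in MongoDB), so a recovery job can later replay any transaction still marked "pending":

```python
# Two-phase fix-up: record the intent first, then repair the primary
# documents without holding any lock.
def transfer(txns, accounts, txn_id, src, dst, amount):
    # 1. Capture the transaction in its own document (atomic insert).
    txns[txn_id] = {"src": src, "dst": dst,
                    "amount": amount, "state": "pending"}
    # 2. Fix up the primary documents.
    accounts[src]["balance"] -= amount
    accounts[dst]["balance"] += amount
    # 3. Mark the transaction done.
    txns[txn_id]["state"] = "done"
```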
I couldn't agree more with this blog post. The drastic reduction in friction has allowed me to experiment a lot more painlessly. Mongo may not always be as robust as a mature SQL solution, but the friction is so low I just don't care.
The fact is that Mongo is _fun_ to develop with and that's invaluable. But there's another side to databases and that's DB administration. Most Mongo reviews I've seen so far are from developers, but I haven't seen many positive reviews from DBAs.
I'm still struggling to understand how embedded documents can be useful in Mongo, considering there is no way to do a SELECT with a WHERE clause on their contents. https://jira.mongodb.org/browse/SERVER-142 . The age of this bug report has me wondering whether there is some fundamental design flaw with a schemaless database that causes this. Thoughts?
Embedded documents currently serve a smaller purpose than one would assume at first glance. I answer a lot of questions in the MongoDB-User group by telling people to "pull it into its own collection".
However, there are a couple of planned features that'll change this. Virtual collections is one; the other is the $ operator in field selection, which I believe is planned for 2.1.
Even as-is though, they are useful. The tag example is simplistic...let me give you a real case from mogade.com. We have a scores collection, which looks something like:
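A hypothetical reconstruction of such a document, based only on the description in this thread; the real field names on mogade.com may differ:

```python
# One document per (user, leaderboard) pair, with all three scopes
# embedded, so "user" and "leaderboard" are stored and indexed once.
score = {
    "user": "leto",
    "leaderboard": "level-1",
    "daily": 5000,
    "weekly": 5000,
    "overall": 9001,
}
```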
We essentially store the user's top score for each scope (daily, weekly, and overall). You could store one scope per document, which is how it was initially... but that isn't how the data is modeled, and it takes a lot more space (user and leaderboard get repeated 3x, along with the index, which is on those two fields).
Edit:
I wouldn't rely on the age of a feature request as a sign that there's something fundamentally wrong with a design. Not everything can be top priority.
I'm currently working on an Android app, and started looking at couchdb a few days ago. I have to say I'm pretty intrigued, but I'm going to be dealing with location data, so the lack of querying has put me off for now.
I had read a lot about NoSQL stuff over the last few months and never really got it. What were the advantages/disadvantages, etc.? I was happy with MySQL and it had worked for me over the years... why change?
Then, last week, I was working with a 3rd-party API that returned a big JSON response for their transactions. I wanted to store a lot of that response in a database, and it looked like a huge pain. While searching for the best way to store JSON in MySQL, I found the following comment (http://stackoverflow.com/questions/3564024/storing-data-in-m...):
"CouchDB and MySQL are two very different beasts. JSON is the native way to store stuff in CouchDB. In MySQL, the best you could do is store JSON data as text in a single field. This would entirely defeat the purpose of storing it in an RDBMS and would greatly complicate every database transaction.
Don't."
Wait, NoSQL systems' default store is JSON?!? A few clicks later and I was playing with Mongo over at https://mongolab.com/ ... installing the PHP Mongo extension was a piece of cake. I was up and running in minutes.
Now, instead of designing one or several tables to store the information from the 3rd-party API, I just dump their JSON response right into a Mongo collection. I can query whatever I want from that... and there might be information I'll want later on that I didn't think to store initially. If I had created a regular MySQL table, that info wouldn't be there; with Mongo I'm storing everything, so I'll be able to use it later if I want.
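A sketch of that workflow in Python; the response body and field names are invented. With a live MongoDB you would follow the `json.loads` with something like pymongo's `db.transactions.insert_many(payload["transactions"])` instead of the plain list used here:

```python
import json

# Simulated third-party API response, stored as-is.
response_body = '''{"transactions": [
    {"id": "t-1", "amount": 12.5, "meta": {"source": "web"}},
    {"id": "t-2", "amount": 3.0,  "meta": {"source": "mobile"}}
]}'''

payload = json.loads(response_body)
collection = list(payload["transactions"])  # stand-in for a Mongo collection

# Fields you never planned a column for are still there to query later:
web_txns = [t for t in collection if t["meta"]["source"] == "web"]
```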
I don't think that Mongo is a replacement for MySQL — they are two different tools with distinct advantages and disadvantages. But Mongo definitely suits certain applications better. So, use the tool that best suits your project!
I did it in SQLite. Putting JSON in a SQL table is trivial.
Adding a second table as a key/value search index is also trivial and extremely fast. Best of all, it works on Android, iOS, and Python: with one simple wrapper layer (<100 lines of code) I was able to implement a user-authentication web server + DB code in 300 lines of Python, with any kind of user attributes that can be added later on.
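A minimal sketch of that layout, with made-up table and column names: JSON blobs in one table, plus a key/value index table for lookups.

```python
import json
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE docs (id INTEGER PRIMARY KEY, body TEXT)")
db.execute("CREATE TABLE kv_idx (key TEXT, value TEXT, doc_id INTEGER)")
db.execute("CREATE INDEX idx_kv ON kv_idx (key, value)")

def put(doc, index_keys):
    """Store a JSON document, indexing only the chosen keys."""
    cur = db.execute("INSERT INTO docs (body) VALUES (?)",
                     (json.dumps(doc),))
    for k in index_keys:
        db.execute("INSERT INTO kv_idx VALUES (?, ?, ?)",
                   (k, str(doc[k]), cur.lastrowid))

def find(key, value):
    """Look up full documents through the key/value index."""
    rows = db.execute(
        "SELECT d.body FROM docs d JOIN kv_idx i ON i.doc_id = d.id "
        "WHERE i.key = ? AND i.value = ?", (key, str(value)))
    return [json.loads(r[0]) for r in rows]

put({"user": "ada", "role": "admin", "note": "anything"}, ["user", "role"])
```

Un-indexed fields (like `note` here) still come back with the document; they just can't drive a lookup without adding them to the index table.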
I read somewhere that FB uses MySQL as a pure key/value data store.