From what I have read, Facebook uses a similar method: commit and deploy often, and roll back if something breaks. We also use this method on Plurk.com and have done so for about a year. Though IMVU's case is pretty extreme :)
The major problem is rolling back client-side changes (those located in scripts or CSS). These are pretty costly to roll back because of the browser cache. We solve this with real versioning of the static files, so we can force a refresh of the browser cache (real versioning = script_{timestamp}.js and not script.js?v={timestamp}).
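To make the two naming schemes concrete, here is a minimal sketch in Python. The function names are made up for illustration and this is not Plurk's actual code; it only shows how the version ends up in the filename versus in the query string.

```python
import os


def versioned_name(filename: str, timestamp: int) -> str:
    """'Real' versioning: put the version inside the filename itself.

    Hypothetical helper: 'script.js' becomes 'script_<timestamp>.js'.
    """
    stem, ext = os.path.splitext(filename)
    return f"{stem}_{timestamp}{ext}"


def query_versioned_name(filename: str, timestamp: int) -> str:
    """Query-string versioning: the file keeps its name, only the URL changes."""
    return f"{filename}?v={timestamp}"


if __name__ == "__main__":
    print(versioned_name("script.js", 1234567890))
    print(query_versioned_name("script.js", 1234567890))
```

With the first scheme a rollback just means pointing the pages back at the older filename, which the browser treats as a different resource.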
I've read your post on unit tests, and I didn't understand what you were trying to say.
Were you saying don't write automated tests that test your code, and instead focus on monitoring the actual production environment?
Or were you saying that specifically the "unit test" class of automated tests are not worth their time?
I can imagine a system that monitors the business metrics well enough to prevent defects from slipping into production (it's a stretch, metrics are soft and squishy moving targets), but I can't imagine using only those metrics to find every bug you ever slip into production. Metrics are so distant from the bug that caused their downturn; you'd waste so many cycles debugging. The gap between writing the code and finding the problem would be much larger than if unit tests found them; that has to slow things down as well.
- Monitoring the production environment, tons of effort. We record and analyze an incredible amount of data about everything that happens on the site, and have more and more automated processes looking for anomalies (though still nowhere near as many as I would like).
- Automated testing not including unit tests, some effort. I wouldn't be opposed to us doing more of this, but it's not incredibly high-priority and there always seems to be something else that's more important.
- Unit testing, yeah, not worth our time as far as I'm concerned.
I've worked at places that have done this sort of thing too. We basically dropped CSS and JS files in a new directory named after the svn rev number, which made it very easy to deploy and break through the client-side cache.
That makes sure the code is consistent if the user refreshes the webpage or just visits it the first time. But what happens if the user just keeps the AJAXy web-page open for hours (as I do with Gmail for instance)? If you deploy too often and both frontend+backend code are in flux, you're more likely to end up with an inconsistent code state.
I guess you could make the frontend code aware of the code version, include it as a param with each XHR request, have the server check versions and return a "version mismatch", and then produce some alert on the browser asking to refresh the page. But this would trade off far too much usability.
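A rough sketch of the server side of that idea, in Python. Everything here is assumed for illustration: the parameter name `v`, the version string, the choice of status code 409, and the shape of the response body are not taken from any real site.

```python
from typing import Dict, Tuple

# Hypothetical identifier for the code currently deployed on the server.
SERVER_VERSION = "r1042"


def handle_xhr(params: Dict[str, str],
               server_version: str = SERVER_VERSION) -> Tuple[int, dict]:
    """Check the version the client sent before doing any real work.

    Returns a (status_code, body) pair. A missing or different version
    is reported as a mismatch so the frontend can decide what to do.
    """
    client_version = params.get("v")
    if client_version != server_version:
        return 409, {"error": "version_mismatch",
                     "server_version": server_version}
    # The normal request handling would go here.
    return 200, {"ok": True}


if __name__ == "__main__":
    print(handle_xhr({"v": "r1042"}))
    print(handle_xhr({"v": "r1041"}))
```

What the browser does on a mismatch (alert, silent reload, nothing) is a separate decision and is where the usability cost comes in.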
Last time we ran into this one, we made sure as much page state as possible was pushed into the fragment portion of the URL (for bookmarkability as much as anything else).
Then when the AJAX stuff saw a version mismatch, it would wait until the user completed any operation that -wasn't- stored in the fragment and put up an "updating, gimme a sec" box, and refresh itself.
It was a hell of a lot of work but -extremely- slick (which I'm allowed to say because it wasn't me who wrote that part ;)
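The "wait, then refresh" logic described above can be modelled roughly like this. The real thing would live in browser JavaScript; this Python sketch only models the decision, and the class and method names are invented for illustration.

```python
class PageSession:
    """Models a long-lived AJAX page that may be running stale code."""

    def __init__(self, loaded_version: str) -> None:
        self.loaded_version = loaded_version
        # Operations in progress whose state is NOT stored in the URL fragment.
        self.pending_ops = 0
        self.stale = False
        self.reloaded = False

    def begin_operation(self) -> None:
        self.pending_ops += 1

    def end_operation(self) -> None:
        self.pending_ops = max(0, self.pending_ops - 1)
        self._maybe_reload()

    def on_response(self, server_version: str) -> None:
        """Called for every server response that carries a version."""
        if server_version != self.loaded_version:
            self.stale = True
        self._maybe_reload()

    def _maybe_reload(self) -> None:
        # Reload only once, and only when nothing unsaved is in flight.
        if self.stale and self.pending_ops == 0 and not self.reloaded:
            # Stand-in for showing the "updating" box and reloading the page.
            self.reloaded = True


if __name__ == "__main__":
    session = PageSession("r1")
    session.begin_operation()
    session.on_response("r2")   # mismatch seen, but an operation is pending
    print(session.reloaded)
    session.end_operation()     # now it is safe to refresh
    print(session.reloaded)
```

Because the rest of the page state is in the fragment, the reload can restore it from the URL.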
The major advantage to using "script.js?v={timestamp}" is that it maintains a consistent URI for the resource, whereas with "script_{timestamp}.js", everything that points to it needs to be updated every time it changes.
You could create a symbolic link or rewrite rule that directs requests for "script.js" to the latest "script_{timestamp}.js" but it's more convenient to use a URI parameter.
The problem with script.js?v={timestamp} is that it's ignored by some browsers while script_{timestamp}.js isn't. And with script.js?v={timestamp} you can't set good cache headers.
Also, if you ever move to a CDN, then you are forced to use real versioning (at least with Amazon Cloudfront).
The versioning scheme we use is `md5 hash of name + file contents + file extension` (and not timestamp).
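Here is one possible reading of that scheme as a Python sketch. The description does not say how the three parts are combined or what the final filename looks like, so both the concatenation order and the `name_<digest>.ext` output format are assumptions, and `hashed_name` is a made-up helper.

```python
import hashlib
import os


def hashed_name(filename: str, contents: bytes) -> str:
    """Build a content-addressed filename.

    Assumed scheme: md5 over (name + file contents + extension), with the
    hex digest placed between the name and the extension.
    """
    stem, ext = os.path.splitext(os.path.basename(filename))
    digest = hashlib.md5(
        stem.encode("utf-8") + contents + ext.encode("utf-8")
    ).hexdigest()
    return f"{stem}_{digest}{ext}"


if __name__ == "__main__":
    print(hashed_name("script.js", b"alert(1);"))
    print(hashed_name("script.js", b"alert(2);"))
```

Compared with a timestamp, the name only changes when the contents change, so redeploying an unchanged file does not invalidate anyone's cache.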
I'm not aware of a browser that ignores URI parameters.
Moving to a CDN does not force you to put versioning in the path or filename. The URI parameter merely tricks the browser into thinking there is a new file. The parameter itself is otherwise ignored.
Unless you specify "Cache-control: no-cache" header you aren't really sure how the browser caches your static files (especially if the user is behind a proxy - and even "Cache-control: no-cache" can easily be ignored).
Deploy/rollback is probably OK for a consumer site. But not everything is a public website (no really...) - if you're deploying a service with an SLA with dollar penalties for downtime you might want to stick to a more traditional release cycle. I sure hope the phone network, the stock exchange and my bank aren't using deploy/rollback and releasing 50 times a day!