Thanks jedberg. You've definitely captured how we feel about it.
I will say that while our sustained use discount (SUD) works as you describe, our committed use discounts (CUD) are more similar to the AWS Flexible RIs. For some customers, even if it's obviously cheaper to use on-demand and just get the SUD benefit, they prefer the predictable "let me be sure I pay X" model.
As a note, even though CUDs and Flexible RIs are similar, I prefer ours (obviously). It's just a pile of cores and RAM in a region that you sign up to. That is easy enough to do in arrears that no ETL is necessary, and like you imagine: we want to automate this, too. We also don't play games with upfront payment or not, but most enterprises don't care.
There is a certain amount of clarity or volatility in a system. Volatility will entail cost that will effectively be passed onto the buyer one way or another.
Google can be smart about it and try to do some predictions, but ultimately, the customer should in many cases know much more about what their usage patterns will be. This information reduces volatility and therefore cost ... passed on to the buyer.
If you say "I want to rent 10 cars for a month" vs. "I want the ability to rent from 2 to 20 cars for a month, not sure about what" - what is the intrinsic cost going to be for the provider? Even with demand smoothing over a large client base ... the increased volatility is cost.
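To make the volatility cost concrete, here's a minimal back-of-the-envelope sketch in Python (all prices are made up for illustration):

```python
COST_PER_CAR_DAY = 30.0   # hypothetical marginal cost the provider bears
DAYS = 30

# Firm booking: "I want 10 cars for a month" -- the provider provisions exactly 10.
firm_capacity = 10

# Flexible booking: "somewhere from 2 to 20 cars" -- the provider either carries
# capacity for the top of the range or risks turning the customer away.
flexible_capacity = 20

firm_cost = firm_capacity * COST_PER_CAR_DAY * DAYS        # $9,000
flexible_cost = flexible_capacity * COST_PER_CAR_DAY * DAYS  # $18,000

# The gap is the intrinsic cost of the volatility, before any demand smoothing.
print(flexible_cost - firm_cost)
```

Demand smoothing over many customers narrows that gap but never closes it, which is why the flexible option carries a premium.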
I don't think this is related to gambling.
In many ways this is really similar to financial planning and I think the language of this article will make way more sense to financial types than technical types. I might even go so far as saying this is really a problem for financial ops, and not devops.
It's not a bad deal, companies purchase physical servers all the time which are around 3 year investments. Plus you can prorate your reserved instances to larger ones for only the difference in cost.
I wonder if this contributes to Amazon's ability to have better capacity planning. I've had issues with available quota from Google in the past, and haven't had the same experience with Amazon.
I agree that there are more dollars to be made on consulting; however, there is little money to be saved by the client. It's all about promising cost savings while pocketing the difference, and then some.
Whatever GCP has is rock-solid and often superior to AWS.
For example: when AWS encounters non-catastrophic issues with their hypervisor, you are on the hook for moving the instances away (meaning stop-start, or termination and relaunch for instance store). Depending on the instance type, this can cause service disruption.
GCP will transparently migrate the VM while it is running for you. You never see it, and your customers don't either.
Same for networking: if you use their "premium" network, you can have anycast IPs to the closest POP, which will route traffic on Google's Network, not the open internet. AWS does not have anything close to this, the closest is multi-region VPC peering, without the fancy routing.
AWS offers more features though, which could be important if you require them.
> Whatever GCP has is rock-solid and often superior to AWS.
A number of global outages in various network-related APIs or features on GCP in the past year says otherwise. I don't think there has been any global or even multi-region outage at AWS in many, many years; the biggest recent outage, for S3 in one region (us-east-1), impacted the internet more than any of the (~3) global outages at GCP and some other recent ones at Azure.
> For example: when AWS encounters non-catastrophic issues with their hypervisor, you are on the hook for moving the instances away (meaning stop-start, or termination and relaunch for instance store).
Apparently this is only for older instance types.
> GCP will transparently migrate the VM while it is running for you. You never see it, and your customers don't either.
If you never see it from GCP, how do you know AWS doesn't do it for more recent instance types?
> Same for networking: if you use their "premium" network, you can have anycast IPs to the closest POP, which will route traffic on Google's Network, not the open internet.
> GCP will transparently migrate the VM while it is running for you. You never see it, and your customers don't either.
This is just hot-migration between hosts. It "stuns" a VM for a fraction of a second - very high performance databases and videogame servers will sometimes notice, while everything else just sees a spot of lag and keeps going.
AWS are an odd cloud vendor who don't support many common cloud features:
- No hot migrate between hosts.
- No hot-add RAM or CPU.
- No memory-state snapshot, only disk snapshot.
- No arbitrary CPU or RAM quantities, only "t-shirt" sizes - can't build servers in "nonsensical" configurations like 12 CPUs and 1GB RAM, or 1 CPU and 128 GB RAM.
This is on top of having arbitrary separators in their data centers, so they send you strange messages about having to delete and rebuild your servers in the same data center. AWS may think it's cute to sell different "areas" in their data center, but the way to have redundancy for servers in AWS us-east-1 is to have servers in Azure US West 2 or GCP us-central-1.
Like early iOS in the smartphone space, AWS are dominant in the VM space through marketing, not features.
From re:invent presentations, we know that even availability zones might be made up of multiple datacenters. In 2014, James Hamilton's presentation said that one of the AZs in us-east-1 had 6 separate datacenters.
I don't think it's really accurate to say AWS is selling us different 'areas of a datacenter' when we know that AZs are not only not sharing a datacenter, but might be multiple datacenters themselves.
From a certain point of view, it's Amazon using different terminology - what others call a data center, they call an AZ...or even a subset of an AZ, and we'll never know exactly - nor the capacity of each[0]. And what they call a data center, others call a region.
[0] When a server class is "sold out" in a region, you can't start your server - but there's no indication of this anywhere until you try to start your server. Other cloud providers auto-rebalance VMs to make space - using AWS is sometimes more like using physical servers than VMs - maybe more so with paravirtualization.
I'm really confused as to what you're trying to say.
AWS has been very open over the years about what their terminology means. When they say datacenter, they mean it in the traditional sense of the word. When they say an AZ is made up of at least one but sometimes multiple datacenters, they mean that an AZ has multiple physical datacenters. They're not slicing up a server room and calling these multiple datacenters.
We also know that AZs are physically separated from each other.
So an AWS region has at least as many physical datacenters as it has AZs, and potentially quite a few more.
James Hamilton has talked pretty extensively about this stuff at re:invent, and as an AWS customer, his talks have been some of the most interesting to me.
Other people calling a datacenter a region doesn't suddenly reduce an AWS region down to a datacenter. A datacenter is a word with a pretty specific definition, and an AWS region does not fit that definition.
The AWS equivalent of Anycast / closest region is to route all traffic through Cloudfront. That way users enter the AWS fiber within 50ms (sometimes 5ms) and have SSL terminated there as well. Only works for HTTP(S) traffic, though, not general networking.
It's not so much sane as possibly a little more friendly to those who can't, or don't want to, do capacity planning.
Visibility in terms of outcomes means savings, or rather, variability means cost.
Ultimately, you're going to bear the cost if you cannot provide visibility because Google is not likely ever going to do it as well as you can for your own business.
Ultimately, if you knew exactly what you needed over the next few years, the cost would be significantly lower, as there's no need for slack or wasted capacity. Google effectively forgoes this option entirely and assumes at least some minimum volatility, which means more cost.
I think it would be nice to have Google's offer, but then also a longer term 'lock in' low price option as well, as frankly, this fits a lot of businesses. Most of the economy is not as dynamic as Valley startups.
> I think it would be nice to have Google's offer, but then also a longer term 'lock in' low price option as well, as frankly, this fits a lot of businesses.
We hear you. That's why we offer Committed Use Discounts [1]. Are you saying that a 3-year commitment to a specific price (or lower, as we do price cuts) is insufficient though? (I want to understand)
I'm only making a very general reference to the fact that long-term visibility and predictability entails lower cost and therefore lower price, and that business owners are likely more empowered to determine that outlook than the cloud provider, either AWS or Google. Ergo - some kind of customer oriented long term lock-in would likely, in the long run, produce the cheapest prices in the system. That's all.
Ahh, I misunderstood your point! Yes, it’s certainly easier for us providers if you provide a clear demand signal. RIs, CUDs, SUDs, and even general contracting terms each provide some measure of information between customer and provider.
We actually had an interesting debate about this topic at the NSF Workshop on Cloud Economics [1]. The AWS person sadly had to cancel, but both some Google people and MSFT people were present along with CS and Economics academics. There are lots of industries where similar behavior exists, e.g., airlines or hotels. If you book in advance, or commit to a room block, you get a discount in exchange for certainty. We had lots of amusing debate about how similar cloud actually is to say energy markets (which turn out to be incredibly distorted, regulated, and confusing). Hopefully David and the rest of the organizers will have their summary report out within a few months.
The notion that the break even point is 70% is ignoring some really important stuff.
If you reserve workload x on hardware y for n years, you're effectively strapping yourself into a sure-to-be-obsolete and more expensive platform which you'll have to then move off of at an arbitrary point n years in the future.
If you don't move, you wind up paying a premium to be stuck with the obsolete / more expensive platform just to avoid the cost of migration.
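For reference, the 70% break-even itself is just a ratio of effective hourly prices; a quick sketch with made-up numbers (not actual AWS rates):

```python
# Hypothetical prices, chosen only to illustrate how a 70% break-even arises.
ON_DEMAND_HOURLY = 0.10           # $/hr if you pay as you go
RESERVED_EFFECTIVE_HOURLY = 0.07  # $/hr with the commitment spread over the term

HOURS_PER_YEAR = 8760
reserved_annual = RESERVED_EFFECTIVE_HOURLY * HOURS_PER_YEAR  # paid whether used or not

# On-demand annual cost at utilization u is ON_DEMAND_HOURLY * HOURS_PER_YEAR * u,
# so the RI wins once u exceeds the ratio of the two effective rates.
break_even_utilization = RESERVED_EFFECTIVE_HOURLY / ON_DEMAND_HOURLY
print(f"RI wins above {break_even_utilization:.0%} utilization")
```

Which is exactly the parent's point: the ratio says nothing about what happens when the hardware behind the commitment ages out mid-term.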
It's not, unless you depend on a huge number of other AWS services. Buying hardware and colocating, or even paying month to month to rent servers from a dedicated hosting provider, will typically be much cheaper than reserved instances.
Which is likely the reason AWS bandwidth charges are so high. That's their lock-in factor that often makes it infeasible to use cheaper servers elsewhere as long as anything else is on AWS.
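As a rough illustration of how egress charges add up (the $/GB rate here is a hypothetical placeholder in the ballpark of public list prices, not a quote):

```python
# Egress pricing as the lock-in tax: moving bytes out of the cloud has a
# standing monthly cost that cheaper compute elsewhere has to overcome.
EGRESS_PER_GB = 0.09     # assumed internet egress rate, $/GB
monthly_gb = 50 * 1024   # a service pushing ~50 TB/month out of AWS

egress_bill = monthly_gb * EGRESS_PER_GB
print(egress_bill)       # ~$4,600/month just to move the data
```

At that rate, a hybrid setup that constantly shuttles data between AWS and cheaper servers elsewhere can easily cost more than staying put.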
My experience hasn’t been so black and white. There is still a trade-off on giving up flexibility, which is one reason to move to the cloud. When I modeled it out (3 years ago for AWS, more recently for Microsoft) the one year commit struck the best balance between cost and flexibility.
There is one truism: cloud costs always seem to grow faster than revenue. :-)
I don't see how arbitrary AWS instances are in any way going to go 'obsolete'. They have been around for a decade and are becoming more and more normative.
Second, the underlying financial principle is that with visibility comes lower volatility comes lower cost - that's some very basic financial logic that's at play here.
Yes, of course the contract implies a degree of vendor lock-in, but this is inherent in the nature of the underlying operational costs.
"RIs are a lock in." - of course. And if you don't want to be locked in, then you're going to have to pay a lot more: AWS, GCP, it doesn't matter; it's the same financial reality everywhere.
No, the GP is referring to things like c3 vs c5 families. When you're locked into the 3 year old hardware, you have to pay the cost of upgrading eventually.
First - that 'lock in' is very evident by the nature of the contract.
Second - because newer things are available, it does not mean that others become 'obsolete' in three years by any means.
The vast majority of business do not need to have access to the latest, specific version of what are essentially commodity bits of hardware and software.
It's not really a risk for most businesses, and if it is, then obviously they can pay a higher price for the volatility inherent in switching at any time, if they so choose.
My experience with AWS reserved instances has not been very good previously.
1. Once you buy a reserved instance, you're locked in to that type and price for the duration, even though newer types at lower prices may get introduced (as they almost definitely would over 1-3 yrs).
2. If you're from outside the US, you might not be able to resell your reserved instance. So you're stuck with an old instance type at an inflated cost.
In contrast, Google Cloud just gives you a price equivalent to a reserved instance price (or better), based on hours of usage, without asking for an upfront commitment.
I’ve gotten proactive emails from our account manager when they release new/cheaper instances and they offer us the option to transition and get a credit for our existing RIs.
We aren’t a huge account (less than 30k/month) so I thought this was a nice gesture on Amazon’s part.
I'm consistently surprised at how big the variance in quality of service is from different AWS account managers; seemingly regardless of the size of account.
The two or three reps we've had have been night and day in the level of service they've given us, and we're a top-10% customer by volume.
Recommend checking who the account emails are configured to be sent to. Often they go to a finance person who may not understand the importance of some communications such as this.
This is terrible advice for all but the largest of organizations.
Running your own hardware is AWFUL. Get ready to dedicate an entire team to network engineering, fixing broken hard disks, patching operating systems, screwing around with RAID controllers, upgrading switches, planning power and cooling, and endless vendor negotiation -- with ISPs, hardware manufacturers, datacenter operators, etc.
Oh, and did I mention, throw elasticity out the window -- all of this has to be planned in advance, and purchased, and installed, months ahead of when it will be operable. So forget the ease and convenience of just spinning up more capacity.
Also, there's a massive distraction of having to focus management attention on this non-value-adding part of the business, all so that you can shave down some cost, rather than investing in growing revenue.
Having been in this position firsthand: don't do this. If Netflix can run 1/3 of Internet traffic off of AWS, I guarantee they're much larger than you, and it should say something that they'd rather outsource this part of their business than deal with all this crap.
Focus on software and product/market fit. It's just a much, much, much better use of expensive technical people, that will be done on day 1, without any risk, hassle, or complexity, than trying to replicate something someone else already does for you, much better and at competitive costs, than trying to reinvent all of this yourself.
To be honest, I don't even want to deal with EC2 anymore; I'd rather just use a PaaS.
>This is terrible advice for all but the largest of organizations.
Don't start a conversation with an opening generalization like that if you want something constructive. Especially when the rest of your post is clearly based on the single anecdote of your experience.
>Running your own hardware is AWFUL.
Maybe for you. Not for any sysadmin with even just a couple of years of experience.
>patching operating systems
We're talking about instances; none of that sysadmin stuff goes away if you're on AWS. If you don't have patch management for operating systems on AWS, then your instances are screwed. AWS instances don't eliminate the need for sysadmin work.
The only real difference is the hardware management. And if you read my post you would have seen that I said using aws for the on-demand flexibility is okay. All of the static workloads are what belongs down in your datacenter.
Netflix doesn't run 1/3 of the Internet traffic off of AWS, only a tiny subset because of the aforementioned shitty economics. The real workhorses are in custom netflix servers at peering points. Netflix would be bankrupt if they used AWS for video. Do some research before spreading free marketing propaganda.
This forum tends to only think in terms of explosive growth of traffic, which <0.1% of companies actually have to deal with. AWS flexibility is needed by very few successful B2C companies, but it's supported by the cargo-culting of orders of magnitude more developers ("jedberg said this worked for reddit, we need this because we're like reddit").
Also, your whole argument about non 'value-add' is bogus. That's the same excuse that management uses to outsource all development. Everything has a cost and provides some value to the company.
> Don't start a conversation with an opening generalization like that if you want something constructive. Especially when the rest of your post is clearly based on the single anecdote of your experience.
You are doing as much, if not more, generalization by way of the assumptions you're making.
> >Running your own hardware is AWFUL.
> Maybe for you. Not for any sysadmin with even just a couple of years of experience.
Sysadmins aren't real estate attorneys or facilities managers or security personnel. A lot of them aren't even tech ops who physically manage the DC hardware and are oncall 24/7 to fix problems on-site if needed. You need all of those things to run your own DC, and potentially a lot more if you're physically building the center itself (architects, contractors, civil engineers, etc.). And if you're serious about latency, availability, and durability, you're going to need those things in multiples for however many datacenters are needed to meet your targets. How many organizations have the millions of dollars of capex needed to get that off the ground and keep it all running?
> Netflix doesn't run 1/3 of the Internet traffic off of AWS, only a tiny subset because of the aforementioned shitty economics.
To how many organizations do the economics of serving 1/3 of Internet traffic apply? 2? How is that a counterexample to his point about datacenters only making sense for the very largest?
Even if you sidestep all those costs by renting instead of building and even if we take for granted that your "shitty economics" are still shitty down numerous orders of magnitude from Netflix-size, you're still burning money making your devs design and operate your system twice - once for the DC, once for AWS, and however much work it is to glue the 2 together. The end result may be cheaper infrastructure-wise but it will also be unavoidably less reliable and more complex purely by virtue of having more than twice as many moving parts.
Let's say implementing things this way takes a single dev time-and-a-half compared to just doing it on one or the other. Let's say (very conservatively) you pay your dev $100k a year + $50k benefits. You're now $75k in the hole from the get-go. That's enough to pay for roughly 83 m5.large EC2 instances on-demand (no RI) for a year. Your company has to be very large for the marginal savings of using a DC to outweigh that kind of deficit.
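The parent's arithmetic, spelled out (the m5.large rate is an assumed placeholder for illustration; at this rate it lands near the parent's "roughly 83" figure):

```python
# Extra engineering cost of running the system twice (DC + AWS), vs. what
# that same money buys in on-demand EC2 capacity.
dev_salary = 100_000
dev_benefits = 50_000
extra_effort = 0.5                    # "time-and-a-half" = half an extra dev

overhead = extra_effort * (dev_salary + dev_benefits)   # $75,000/year

m5_large_hourly = 0.104               # assumed on-demand $/hr; check current pricing
annual_instance = m5_large_hourly * 24 * 365            # ~$911/year per instance

print(overhead / annual_instance)     # how many instances the overhead buys
```

The exact instance count shifts with region and current pricing, but the shape of the argument (one fractional engineer buys a lot of compute) holds.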
> How many organizations have the millions of dollars of capex needed to get that off the ground and keep it all running?
I think you are exaggerating when talking about "millions". Companies running in an old-school DC will not build the actual DC. They will rent a 42U cabinet or part of it.
You can typically rent 1/4 of a cabinet for a few hundred bucks a month.
If you operate at a small scale, you typically don't have that many servers, maybe 4 or 5. A decent server is in the range of $4,000 to $6,000 and will generally last around 5 years.
And these servers rarely break (we have a fleet of more than 300 servers, and maybe 1 or 2 "wake the on-call" crashes a year).
Throw in 1 or 2 decent switches at about $1,000 to $2,000 each and you are mostly good to go.
You end up with a capex of ~$50,000 to get you started, with depreciation over 5 years.
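Tallying the figures above at the high end of each range (purely illustrative):

```python
# Bare-hardware tally from the comment's ranges.
servers = 5 * 6_000      # "4 or 5" servers at $4,000-6,000 each
switches = 2 * 2_000     # "1 or 2" switches at $1,000-2,000 each
hardware = servers + switches        # $34,000 for the boxes alone

# The ~$50k capex figure leaves headroom for storage, spares, rails,
# cabling, and setup labor on top of the bare hardware.
months = 5 * 12                      # straight-line depreciation over 5 years
print(hardware, hardware / months)   # monthly depreciated hardware cost
```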
Power is the dominant cost in datacenters unless you have very, very expensive switches. Power drives direct electricity consumption, the need for backup batteries and UPSes, cooling, and fans.
You can get pretty high-powered Supermicro machines for only like 1000 or 2000 dollars these days. Over 5 years, that works out to only about $17/month.
A 42U cabinet from HE (Hurricane Electric) that's "on sale" (http://he.net/colocation.html) runs $400/month. You need 20-30 servers before the spend on machines starts to overtake power. And I honestly doubt you'd be able to put 30 machines in that cabinet before hitting their power ceiling. I walked through an Equinix facility a while ago and if you hit 10KW/rack that's considered "hot". It's not hard to do if you stuff an entire 42U rack with 1U multi-core machines that each have CPUs drawing 50-100 watts/core (not unlikely with high-end Xeons). 15KW/rack is really hot.
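The break-even sketched here, using the comment's figures (not vendor quotes):

```python
# At what server count does amortized hardware spend overtake the
# (power-dominated) rack fee?
rack_monthly = 400.0               # 42U colo cabinet, power included
server_price = 1_000.0             # low-end Supermicro box
amortized_monthly = server_price / (5 * 12)   # ~$16.67/month per server

break_even_servers = rack_monthly / amortized_monthly
print(break_even_servers)          # ~24 servers at these prices
```

Which matches the "20-30 servers" estimate, and also why the power ceiling matters: you may hit the cabinet's power budget before you hit that count.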
What kind of specs are you getting for that kind of money? Just played around with AWS pricing calculator and honestly it seems like EC2 is even more cost-effective than I thought, depending on specs...
Edit: Try using https://awstcocalculator.com and see what it claims your savings would be (I'm curious how accurate it is)
+ dual power supply + rails with cable management + 5 years of support
Over 5 years, that's $6,000 / 60 months = $100 per month.
Something comparable, like an i3.2xlarge (fewer CPUs but more storage), is $455.52 per month at the On-Demand price.
At 3 years full upfront, it's $192.72: better, but still more expensive.
And i3 instances are less convenient than your own server because they can go down at any time. Since that means losing the local NVMe storage, you have to come up with a mix of EBS volumes + local NVMe if you want some persistence, or heavy clustering where losing a node is not a big deal.
AWS is really expensive if you are using it wrong. And a lot of us are using it wrong ('"move <INSERT LEGACY APP> to the cloud" says management' mode). Using it right is quite difficult, in fact: it requires a lot of engineering complexity to be resilient when AWS chooses to shoot the hypervisor out from under your feet. And truly leveraging the elasticity provided by things like Auto Scaling groups or Lambdas, especially at the storage layer, is far from simple. I've seen instances where attempts to build a "SERVICE THAT SCALES" ended up being even worse than <LEGACY APP IN THE CLOUD> in terms of cost.
This has been my experience being part of a team managing a large mixed deployment with ~5000 EC2 instances on one side, and, on the other 300 physical servers, each handling ~10 LXC containers in legacy DCs.
Where AWS shines is the flexibility you get. Need more capacity? Increase the instance size and you are good to go. Need to create 300 ELBs in a rush with some DNS records? It's done in 1 or 2 hours and 80 lines of boto. That level of instrumentation is not maintainable by any company except cloud providers and the biggest internet players.
But if your load is fairly static and/or predictable, if you have the capital to buy the servers upfront, if your customer base is fairly localized, and if you can manage the added complexity of doing some capacity planning and hardware inventorying once or twice a year, then legacy DCs are still cheaper. I know and understand that's a lot of ifs.
That has been exactly our experience when deciding to go AWS vs. self-hosted. For our environment, it makes sense in both time and money to go self-hosted. I get that it may be different for others, but not for us.
I get what you're saying about the benefits of PaaS or other managed services, but Netflix serves most content (by volume) off of appliances [1] they ship to ISPs that feed into the network at points-of-presence somewhere close to an edge. They use AWS for logic and compute (including transcoding [3]), but video streams from this globally-distributed CDN [2].
Having run a small hosting company for about 17 years now, I am laughing at this post... You greatly overstate the difficulty and understate the massive cost savings.
When it's your business, of course these things are easy. It's just part of routine maintenance and overhead. When you're in a different business, you generally want to be focusing on your own core competencies and solving your own specific business challenges, not care and feeding of servers. Classic division of labor problem.
The hyperbolic difficulty that folks are applying to running your own hardware is laughable. It is as if by miracle alone that civilization made it through the precarious transitionary period where we racked our own servers into the Wondrous Utopia of Cloud.
I definitely wouldn't tie my own shoe laces, only the largest orgs need laces, almost everyone can focus on walking if they use Velcro.
Right now we have 3 sysadmins for our own DC, and they can't keep up with all the maintenance. Upgrades on infra are the worst: software quality on switches/SANs/appliances is terrifying, and those are still better than the quality of server management, BIOS, etc. Even VMware has caused lots of pain with patches and upgrades in recent years.
If you don’t ever patch/upgrade to fix all the security vulnerabilities that exist at the infrastructure layer maybe “do it yourself” is cheaper. But that doesn’t fly in the banking industry.
We’re half-in the cloud and moving more there just to keep our head above water. We’re spending way less on cloud services than the $500K/year it would cost to double our infrastructure staff and keep it all in house.
I've written a lot on this thread. I think the linchpin of this entire thing is how much salaries for technical people have lifted.
If you can get a qualified dev/DC engineer for 50 or 60K, maybe you rack your own. But when Facebook is paying all-in median wages of 240K/year, things start to look different. I understand not everyone is going to be able to work at FB and not everyone lives in California. But when one of your critical inputs doubles or triples in price, of course that's going to adjust how things get done in an industry.
It's been a few years since doing the cost benefit analysis between AWS and self-hosting, but for around 50 racks worth of servers and storage, the numbers came in on AWS's side.
That didn't even take into account the "free" multi-region capability you get from Amazon. Splitting our physical servers into a second region with enough capacity to failover would have nearly doubled our costs.
Were those numbers using 50 racks worth of instances (e.g. 20,000 of them) for the comparison? Did you remember to take into account the obscene bandwidth rates (sorry if that seems like a dumb question but I've seen this bite multiple companies moving to AWS)?
The break evens happen a lot earlier in my experience for static workloads, but I would love to see a breakdown if you're willing to share details.
Why would you compare AWS vs managing your own data center?
You could also compare AWS vs building your own silicon.
I think it would be better to compare AWS vs renting dedicated servers from a large provider? I think you will find that the scales tip heavily in favor of renting bare metal as far as price is concerned.
> Why would you compare AWS vs managing your own data center?
Because we were already managing our own data center.
> I think it would be better to compare AWS vs renting dedicated servers from a large provider? I think you will find that the scales tip heavily in favor of renting bare metal as far as price is concerned.
We offloaded a lot of work to Amazon that we were doing ourselves -- database hosting, storage system management, etc (lots of little used data went into S3/Glacier that previously we had on live disks)
Also, we liked the ability to have a failover region essentially for free - we only pay for enough servers to replicate the key data we need for failover, and keep the rest of the infrastructure powered off.
I was a bit incredulous that any truly all-inclusive analysis could ever show AWS being cheaper, but this phrasing made me realize that it could have been the one (remarkably common) case where it usually does: enterprise hardware.
That world is easily more expensive than AWS, especially considering that hardware maintenance contracts are a thing (and a shockingly expensive one, to those of us accustomed to the commodity hardware world).
> Also, we liked the ability to have a failover region essentially for free - we only pay for enough servers to replicate the key data we need for failover, and keep the rest of the infrastructure powered off.
That's a useful advantage, though there's a pitfall in that there's no powering off EBS volumes.
Netflix runs a third of internet traffic off of their OpenConnect CDN appliances, not AWS (Netflix.com and other control plane-ish functions aside).
Once you’re spending a few million dollars a year on cloud (arguably even less in some cases), it behooves you to explore hosting yourself. Cloud provider margins are substantial for a reason.
If you're truly spending "a few million dollars"/year, then MAYBE.
200k/month would get you there. My personal breakpoints are 50k/mo PaaS->EC2/IaaS and then something like 100k or 150k start thinking about physical. Maybe. By the time you hire everyone, get all the planning right, etc. you might be there need-wise, but it's not a sure thing.
I'd expect that to have nearly all the disadvantages of cloud (other than virtualization).
One is still locked into the provider's pricing structure.
One is still locked into the provider's ISP choices.
One is still locked into the provider's internal network architecture choices.
One is still locked into the provider's limited hardware choices.
The last one is the biggest one, if only because there are so many opportunities in so many components to maximize performance and minimize cost, with a little forethought and customization, all while staying well within the commodity market. The "one size fits all" model does many people a disservice.
This is especially true even at fairly low scale, where the issue of obtaining certain components in large volumes doesn't come into play.
I understand why some people insist it's just too hard to deal with hardware, but I find it disingenuous to advocate that opinion without at least a more comprehensive understanding of what modern, commodity hardware is capable of. I think most of the claims of difficulty come from people who lack this knowledge and have primarily software backgrounds (relying on hearsay or other second-hand experience).
Most providers I have seen that rent hardware offer unlimited customization when it comes to the box you are renting. Network architecture would obviously be off the table, though.
How exactly does that solve the problem? Don't you have to do the same capacity planning to decide how many servers to buy for your datacenter? Except you get less flexibility because you can't buy servers and have them instantly available like you can for reserved instances?
Also, what kind of workloads are you running that don't require databases? The biggest expense in any distributed system is moving data. If you have a datacenter with all the data, you have to move that data to the cloud and back for every bursted request. Whatever you might save in running your own DC will be lost to bandwidth charges.
And on the topic of running your own datacenter, it's unlikely you can run it as efficiently as AWS. What you might save in not paying AWS's profit margin you will probably spend in not being able to be as efficient as they are.
Right, you have to do the same capacity planning, but you are getting the massive upside involved in that work instead of Amazon.
>What you might save in not paying AWS's profit margin you will probably spend in not being able to be as efficient as they are.
This isn't how I've seen the numbers work out for the huge chunk of workloads that require mostly static instances (i.e., haven't been modernized into a serverless code base). You are right about Amazon having an efficiency edge, but you are wrong about that benefit accruing to the customer's bottom line instead of theirs.
We are nowhere near the real commoditized pricing of massive scale compute. Even with the inefficiency of smaller datacenters, you can easily best AWS prices.
Where did you get the impression that you have to move all of the data into the cloud for every bursted request? That's a lazy strawman architecture to attack.
You don't have to move all the data, but you have to constantly move the needed data back and forth, unless you store a second copy in the cloud. And then you have to start capacity planning again.
> This isn't how I've seen the numbers work out for the huge chunk of workloads that require mostly static instances.
Any time someone says this I have to question if they really looked at the "all in" number. Did you include the salary of the person in purchasing who orders the servers? Did you include the lost engineering time dealing with dead servers (instead of just shutting them off)? Did you include the cost of spare hardware sitting around for emergencies? Did you include the cost of downtime due to broken hardware while waiting for it to be repaired or replaced?
There are so many other costs to running your own datacenter besides the servers and the space, which Amazon gets to amortize over all their customers, but you have to bear 100% on your own.
>There are so many other costs to running your own datacenter besides the servers and the space, which Amazon gets to amortize over all their customers, but you have to bear 100% on your own.
Yes, but those costs may be low (or even zero) for you, while Amazon has to architect at a much higher level. For example, I have researchers with data that has zero backup/DR requirements. This is 10s of TB of data, but if they lost it all due to a fire or a catastrophic system crash, they would just shrug, order a new storage array from the insurance money, and request new copies of the data from the research labs at other institutions that also have it. Amazon doesn't offer any storage products at that reliability level, and the ones that are even close have significant data access latencies or file transfer costs to run analysis over a significant chunk of the data.
So, they buy a basic NAS, stuff it with 12TB drives, and pay $0.19/Gig for it. That's one time, not monthly, and at only 50% utilization. Assuming S3 Reduced Redundancy is $0.02/Gig/mo (it's actually a little more, but we're being generous), they start saving money in month 10, not counting the data transfer or compute costs associated with processing that data either locally or in the cloud.
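The break-even arithmetic above can be checked directly, using the rates quoted in the comment (illustrative figures, not current prices):

```python
# Break-even between a one-time NAS purchase and monthly S3 storage,
# using the illustrative per-GB rates quoted above.

def breakeven_month(nas_per_gb_once: float, s3_per_gb_month: float) -> int:
    """First whole month in which cumulative S3 cost exceeds the NAS cost."""
    month = 1
    while s3_per_gb_month * month <= nas_per_gb_once:
        month += 1
    return month

# $0.19/GB one-time (at 50% utilization) vs. $0.02/GB/month
print(breakeven_month(0.19, 0.02))  # -> 10, i.e. savings start in month 10
```

The comparison ignores transfer and compute costs on both sides, as the comment notes.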
> Any time someone says this I have to question if they really looked at the "all in" number.
Yes we did.
> Did you include the salary of the person in purchasing who orders the servers?
That's a one time cost, and very low.
> Did you include the lost engineering time dealing with dead servers (instead of just shutting them off)?
Of course, but AWS has higher engineering costs to make it all work, so self-hosting comes out ahead in the end.
> Did you include the cost of spare hardware sitting around for emergencies?
You don't really need spare hardware. You have redundancy and get same day service from your vendors. And hardware nowadays rarely fails.
> Did you include the cost of downtime due to broken hardware while waiting for it to be repaired or replaced?
That would be zero, because a properly designed system has no downtime.
> There are so many other costs to running your own datacenter besides the servers and the space, which Amazon gets to amortize over all their customers, but you have to bear 100% on your own.
Sure, but there are so many other costs to running on Amazon that aren't there with self hosted.
I think this is a really illustrative example of how business strategy can influence decision-making.
"Paying AWS's profit margin" assumes you can get access to their cost structure. You can't. In order to get AWS's cost structure, you need to (a) be buying servers by the truckload to get volume discounts, (b) have a scaled labor force for physically moving, racking, and installing all of this that's insured, directed, and managed to high rates of utilization (want to pay devs to rack servers?), (c) hire expensive network engineers whose cost is fully amortized across AWS's massive installed base, (d) fully amortize all the software engineering required to control all of this, etc.
So the more realistic option for a typical company is, do I (a) try to do this myself, with the time delay, risk, and cost profile of a nonspecialist provider, or (b) pay Amazon, which will cost about the same as (a), but be better in every other way, EVEN THOUGH amazon's superior cost structure lets them make a decent profit off of that decision?
(b) is clearly the right choice. It costs you nothing more, Amazon gets to make profit, and everyone is better off. Point being, you are forced to go with Amazon because they have structural advantages you don't, which gives them access to a better cost structure that you can't replicate.
Ridiculous false choice. You make a huge assumption that AWS charges only as much as it would cost you to build your own without bulk discounts, etc. They charge way more than that.
I've worked on 40-rack build outs using supermicro without any special pricing that beat the cost of AWS for an equivalent number of reserved instances.
What are you doing with 40 racks of hardware? Serious question. That's A LOT of computers! I assume you mean 40 42U racks? That's like, 800-1000 computers depending on whether you do 1U vs. 2U machines, or use blades, how dense the switching is, how full the racks are, etc.
I was in charge of tech ops for a billion-device scale analytics company and we ran 100-200 VM instances on EC2. I can't imagine needing hundreds of bare-metal instances. I even lived with the guys running Firebase. Before they got acquired by Google, I think they had maybe a dozen or two bare-metal instances at SoftLayer.
Did you include your own salary and the salary of the person who did the POs and the person who racked them and networked them and configured the networks?
What about the salary of the person who maintains all of that? The cost of spare parts? The cost of downtime when hardware breaks?
Not the parent, but, I, too, have done the cost analysis, at much smaller scale.
And, yes, all-inclusive, AWS is 1.5x-10x more expensive than commodity hardware, depending on how poorly optimized AWS's hardware choices were for the particular workload and the commercial datacenter market at the time.
> What about the salary of the person
In general, I've found the need for the quantity of "person", bizarrely, exaggerated.
> The cost of spare parts?
Included, and it's low. Is this another aspect that's exaggerated?
> The cost of downtime when hardware breaks?
This is identical to the cost of downtime when AWS's hardware breaks.
Yep, these people are called sysadmins. With AWS they are the ones who manage firewall rules, patching, etc on your instances as well as API keys etc. Some people like to call them devops because they didn't realize sysadmins could write scripts.
Maintaining 40 racks of hardware takes a surprisingly trivial amount of time aside from regular OS management.
The only products that actually eliminate these folks are things like Lambda + DB as a service.
> Maintaining 40 racks of hardware takes a surprisingly trivial amount of time aside from regular OS management.
I think it's only surprising to people who have been listening to the "it's awful" mythology or believe in a much higher than reality hardware failure rate.
I think the worst I've ever seen was with spinning disks from the time of the Thailand flooding, and even those were only 10-15% AFR and only for certain models.
Absent that kind of black swan event, one can easily engineer around even disk failures, such that failed ones can just be spun down and left in place, and, of course, one can do the same with server-level redundancy.
If the labor burden were actually onerous, it would be easy enough to avoid it, but I think it's telling that such techniques are so rarely spoken of.
Only when there's no argument in the first place and the decision is already made.
I don't think I've ever seen AWS (or any public cloud) require more labor in terms of hours/persons, but I routinely see it require labor that's more expensive, since we're back to a market where programmers are paid more than sysadmins.
Opportunity costs are very, very real and very often overlooked when people give this kind of advice. For 95% of all workloads encountered by the types of people who dwell on this forum, there is zero value add in running your own data center.
“Saving money” by running your own servers is penny wise, pound foolish. You’ll never, ever compete with AWS for features or cost.
(And “you” means almost everybody reading this...)
> For 95% of all workloads encountered by the types of people who dwell on this forum,
I call shenanigans on this as a made-up statistic. You may be able to convince that percentage of people of that, but they may well not be aware of what's possible with commodity hardware.
> there is zero value add in running your own data center.
Sure there is. Money and maximum (I/O) performance.
> “Saving money” by running your own servers is penny wise, pound foolish. You’ll never, ever compete with AWS for features or cost.
You may be right about (software) features, but certainly not about cost. It's not "saving money" with quotes. It's actually saving money. You just have to hire someone that happens to know how to save money.
AWS will never, ever compete with you for hardware features or cost.
One of the major issues we've seen with our customers is that many of them (especially startups and SMBs/SMEs) don't have the ability to dedicate a team to just managing their RI capacity. We've also seen enterprise customers optimizing up to 70% of their EC2 usage, but many of them have trouble ensuring a level of utilization due to rapidly changing infrastructure.
I'd definitely argue that GCP has a better model for some use cases, as it requires less active effort for optimizing billing; however, if you manage your RIs on AWS effectively you can often get a better price. Looks like Azure has also gone down the same route as AWS, which is quite an interesting move on their part.
Disclosure: I head engineering/devOps at Engineer.ai - one of our products Cloudops.ai allows our customers to save up to 15% of their AWS bill without making RI purchases, as well as get discounted prices and additional flexibility (custom lock-in periods) for RIs they do wish to purchase. Feel free to reach out for information - my email address is in my about section.
Disclosure: I am a PM on Oracle Cloud Infrastructure (OCI).
I am aware that AWS and GCP are the go-to options for this audience, and that Oracle isn't particularly popular here, given the Java lawsuit (among other things). If you are able to set those grievances aside, the OCI pricing team has done something unique: they have created a means by which you can effectively buy credits from Oracle and use them for whatever service (current or future) you need. It is called the Universal Credits Model (UCM) [1].
If you anticipate usage above a certain threshold, tier-based discounts are available at the time of purchase. It's like a store gift card; buy whatever you want. This takes away some of the stress of capacity planning and instance-type selection. Additionally, you can adopt new services and take advantage of lower prices in the future.
With UCM, customers:
1. Sign a single contract that provides unlimited access to all current and future Oracle PaaS and IaaS services (Compute, DB, Block Storage, Blob Storage, Network, etc.) spanning both Oracle Cloud and Oracle Cloud at Customer.
2. Gain on-demand access to all services plus the benefit of the lower cost of pre-paid services. Depending on the projected spend, customers can negotiate discounts.
3. Possess the flexibility to upgrade, expand or move services across datacenters based on their requirements.
4. Have the freedom to switch PaaS or IaaS services they are using without having to notify Oracle.
5. Can adopt new services when they GA.
Please send any questions my way, and I will get answers to you.
My experience from Sumo Logic is that to take full advantage of RIs you need to do capacity planning, and that takes some effort. Still, that's well over 30% in savings, which you need if you run at scale.
I'd recommend using CloudHealth or another tool vs. a custom ETL. I tried to do it myself with my own tools, but got worse results than with a dedicated tool.
However, a dedicated tool needs input from development. Sometimes it's worth buying non-convertible RIs for a bigger instance. Sometimes convertible RIs are easier. I just found that convertible RIs with some upfront are incredibly tricky to amortize correctly.
> To automate this, we built an ETL process in SQL and Python that detects when we fall outside this band and automatically prepares a purchase for us to approve.
@Stripe: Will this (or parts of it) be open sourced?
They published the code in a gist. It doesn't have a license, but since the Python code is only 61 lines, it would be trivial to rewrite yourself from their example.
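The band-based approach the quote describes can be sketched as follows. This is a hypothetical reconstruction, not Stripe's published code; the function name, band thresholds, and baseline heuristic are all invented for illustration:

```python
# Hypothetical sketch of band-based RI purchasing: if reserved coverage of
# the steady-state usage floor drops below a target band, compute how many
# instances to buy to return to the middle of the band.

def plan_purchase(usage_hours, reserved_count, low=0.70, high=0.90):
    """usage_hours: per-hour instance counts over a lookback window."""
    baseline = min(usage_hours)          # steady-state floor we could reserve
    if baseline == 0:
        return 0
    coverage = reserved_count / baseline
    if coverage >= low:                  # still inside (or above) the band
        return 0
    target = (low + high) / 2            # aim for the middle of the band
    return max(0, round(baseline * target) - reserved_count)

# steady floor of 100 instances, 60 reserved -> coverage 0.60, below the band
print(plan_purchase([100, 120, 110, 100], 60))  # -> 20 (back to 80% coverage)
```

In practice the purchase would also be split by instance family and availability zone, and routed through an approval step as the article describes.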
Suppose you have a data source and business logic which you want to run periodically on the data source. Here are two scenarios which you could reasonably implement this as:
Method one: You write a SQL query and some Python. You put a sticky note on your computer "Remember to run that biweekly."
Method two: You pull up your shop's documentation for how to add the (BIG_NUMBER)th entry into the data processing pipeline. This gets you automatic scheduling, retries, monitoring, audit trails, alerts to the right people in case of breakage, etc etc. You write a SQL query and some Python. You plug it into the existing infrastructure.
I thought that payment processors are using their own hardware. How is AWS protecting their own customers' privacy? Can uncle Bob insert his fancy flash drive, copy my data, and sell it? Before you say it is encrypted: where does the encryption happen, and don't AWS employees have access to the keys too?
What are you talking about? I have so many questions - who is uncle Bob in this scenario, an AWS employee? Whose uncle is he, and why is that important? And what makes his flash drive fancy?
AWS has several encryption products you can easily look up, such as KMS. No, the employees don't have the keys. [1]
KMS is a hardware security module, kind of like the secure enclave on an iPhone. The private key doesn't leave the hardware; your process requests that KMS encrypt or decrypt something (which is probably another disposable key used for your session to a DB or whatever, like in a browser TLS session). All of AWS's core services are neatly integrated with KMS: EBS, EFS, RDS, DynamoDB, etc.
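A toy sketch of the envelope-encryption pattern described above: a per-object data key encrypts the data, and the master key (which never leaves the HSM in real KMS) encrypts only the data key. The XOR keystream cipher here is deliberately insecure and stands in for AES-GCM; real code would call the KMS GenerateDataKey API:

```python
# Toy illustration of envelope encryption - NOT a secure cipher, only a
# demonstration of the key hierarchy that KMS implements.
import hashlib
import secrets

def keystream_xor(key: bytes, data: bytes) -> bytes:
    # Derive a deterministic keystream from the key and XOR it over the
    # data; XOR makes the operation its own inverse (toy cipher only).
    stream = b""
    counter = 0
    while len(stream) < len(data):
        stream += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(a ^ b for a, b in zip(data, stream))

master_key = secrets.token_bytes(32)   # stays inside the HSM in real KMS
data_key = secrets.token_bytes(32)     # per-object key (cf. GenerateDataKey)

ciphertext = keystream_xor(data_key, b"card data")
wrapped_key = keystream_xor(master_key, data_key)  # stored with ciphertext

# Decryption: unwrap the data key under the master key, then decrypt.
recovered_key = keystream_xor(master_key, wrapped_key)
print(keystream_xor(recovered_key, ciphertext))  # b'card data'
```

The point is that only the short wrapped key ever needs the HSM; bulk data never passes through it.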
I'd trust the AWS datacenter security and processes over your average big-corp datacenter any day, having seen quite a few.
I had the same question as you and took a look at their FAQ.
#1 You should check what an "HSM" is, and you'll know the answer to your question :D.
#2 KMS offers client-side encryption. So if you don't trust AWS for whatever reason, you can choose to encrypt client-side too. :D
AWS has different options for different companies/data. They even have options for US government data that are certified by DSS I believe, and they have options if you need PCI, HIPAA, and other types of compliance.
I'd expect and hope that a Hardware Security Module is in the chain of trust somewhere. For certain cases you just don't want a key to ever be physically accessible except in one heavily-defended location.
Google's approach to pricing is, "do it as efficiently and quickly as possible, and we'll make sure that's the cheapest option".
AWS's approach is more, "help us do capacity planning and we'll let you get a price break for it.".
Google applies bulk discounts after the fact, AWS makes you ask for them ahead of time.