A big selling point for cloud computing is resilience. That is, the cloud shouldn't break, at least not very easily, as it's hosted in multiple data centres with backup power, separate network connections and knowledgeable techies on call 24/7 in case something happens.
The truth, however, is that the cloud can and will break, as happened this week when Amazon Web Services in Sydney went partially offline. Power was lost at one of the cloud giant's data centres during the massive storm that hit the city, which in turn took out lots of online businesses. It took six hours before AWS was back to normal. Big names that depend on the internet for their business, such as Domain, Carsales, The Iconic, streaming video company Stan and Domino's Pizza, were inaccessible.
For an online business, six hours of problems must seem like an eternity. More so because there's not a huge amount admins can do, besides trying to connect to the cloud provider and attempting to start up their site services.
Outages, however, are something any cloud-based business must factor in, because one day that remote data centre somewhere will be unreachable, with potentially disastrous financial effects.
The problem with cloud hosting is that when problems strike, your users will expect you to fix them. And why should they not? It's your business they are dealing with, and not some nebulous cloud provider somewhere. Good luck trying to tell them it's not your fault, because nobody will believe you. Don't expect to be compensated either if things go wrong. In its service level agreement for the Elastic Compute Cloud (EC2) and Elastic Block Store (EBS) products, AWS says:
"AWS will use commercially reasonable efforts to make Amazon EC2 and Amazon EBS each available with a Monthly Uptime Percentage [defined below] of at least 99.95%, in each case during any monthly billing cycle [the "Service Commitment"]."
For June, a 30-day month, that SLA allows for just over 21 minutes of the cloud being inaccessible. AWS says it will offer credits if the downtime is longer than that, but customers have to ask for them, and there are a number of caveats, including no guaranteed performance.
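The arithmetic behind that figure is easy to check. The following is a minimal sketch, not AWS's own method of measuring uptime: the function name is made up for illustration, and it simply treats the 99.95% commitment as a share of all the minutes in the month.

```python
def allowed_downtime_minutes(days_in_month: int, uptime_percent: float) -> float:
    """Minutes of downtime a monthly uptime commitment leaves room for.

    Hypothetical helper for illustration only; the real SLA defines
    "unavailable" in its own, narrower terms.
    """
    total_minutes = days_in_month * 24 * 60
    return total_minutes * (1 - uptime_percent / 100)


# June has 30 days, and the SLA quoted above commits to 99.95% uptime.
june_budget = allowed_downtime_minutes(30, 99.95)

# The Sydney outage lasted about six hours.
outage_minutes = 6 * 60

print(f"Downtime allowed in June: {june_budget:.1f} minutes")
print(f"Six-hour outage: {outage_minutes} minutes")
```

By this rough measure, a six-hour outage uses up a month's downtime allowance many times over.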
One direct effect of the Sydney AWS outage is that the companies that were hit by it have started reviewing their cloud architectures. In that way, the outage was a good thing, as it didn't last that long and showed the businesses where the weak links in the chain are in case an AWS data centre loses power.
Working out a good strategy for cloud resilience can be a daunting task though, one that requires reading and understanding often opaque technical documents with odd terminology - AWS for instance calls data centres "availability zones", with one or more facilities in each. Nevertheless, if your business is in the cloud, you should regularly review its architecture (and connectivity options) and plan ahead for disasters.
Devising a cloud survival strategy will no doubt cost more, but it'll still be cheaper than lost sales and the reputation of the business in tatters as angry users and customers take to social media to spread the word about how inept your company is.
You might even need to look at hosting in multiple geographic areas, so that if a Sydney-sized storm ravages one data centre, another one elsewhere can take over. Some companies go with different cloud providers for that reason, but this is technically much harder to do, due to the platforms often being disparate with different features.
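In its simplest form, the idea of one location taking over from another looks something like the sketch below. It is an illustration under loud assumptions, not how any particular provider does it: the endpoint addresses and the function name are invented, and the health check is left as a function the caller supplies. In practice this job is usually done by DNS or a load balancer rather than application code.

```python
from typing import Callable, Optional, Sequence

# Hypothetical endpoints, listed in order of preference.
ENDPOINTS = [
    "https://sydney.example.com",
    "https://singapore.example.com",
]


def pick_endpoint(
    endpoints: Sequence[str],
    is_healthy: Callable[[str], bool],
) -> Optional[str]:
    """Return the first endpoint whose health check passes, or None."""
    for endpoint in endpoints:
        try:
            if is_healthy(endpoint):
                return endpoint
        except Exception:
            # A health check that raises is treated as a failed check.
            continue
    return None


# Simulate the Sydney site being down: only Singapore passes its check.
def sydney_is_down(endpoint: str) -> bool:
    return "sydney" not in endpoint


active = pick_endpoint(ENDPOINTS, sydney_is_down)
print(active)
```

The hard part is not this selection logic but everything behind it: keeping data in sync between locations and making sure the second site can actually carry the load.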
Cloud computing offers many compelling advantages and can provide cost savings and agility for fast-growing businesses, if done right. Just remember the old adage that the cloud means running your software and keeping your data on someone else's computers, which could break. When that happens, your business needs to be prepared or it might go under.