No tech company, big or small, can guarantee 100% uptime for its online services. With factors like environmental damage and human error always in play, no Internet-reliant business can be available all the time. Even something as routine as scheduled maintenance can knock out service for an extended period. Outages can happen at any time, and they can disable a large part of a service or even the entire thing. They are frustrating both for the providers who have to fix them and for the users who rely on the service.

The same is true for cloud providers. When part of a cloud provider's infrastructure goes down, it can affect a large share of its customers at once. Because cloud providers such as AWS, Azure, and Google Cloud offer many services through a single platform, outages vary widely in size and impact. Some outages affect only a handful of cloud services. Others can bring down the entire platform.

In recent months, a number of public cloud outages have raised the question of whether the cloud is reliable enough to run business-critical workloads.

Modernization trends in the enterprise
  • Adoption of public cloud
  • Digital transformation
  • Self-service and help desk
  • Use of cloud technology
  • 'As a service' consumption of everything from software to hardware
  • Container-first architectures
Challenges that came with modernization
  • Public cloud outages impacting enterprise IT
  • Difficulty achieving high availability (HA) for traditional business-critical applications in the public cloud
  • Achieving high availability for applications in container-based architectures
  • Loss of control over cloud infrastructure monitoring
  • Cloud provider SLAs that do not meet business-critical requirements


Google Cloud PaaS outage

On Feb. 15, 2019, a database glitch affecting Google's application development platform caused headaches for some high-profile Google Cloud customers.
Problems with Google Cloud Datastore, a NoSQL database designed for scale, started appearing just before noon Pacific Time.
Users of Google App Engine, a Platform-as-a-Service that provides access to Cloud Datastore, saw errors and high latency for more than an hour.
Gamers were particularly annoyed, as many popular online games rely on these Google services. Pokémon GO and Snapchat were among the applications affected.

Cloudflare network outage

On July 2, 2019, sites running on Cloudflare began returning HTTP 502 (Bad Gateway) errors, caused by a massive spike in CPU utilization across Cloudflare's global network.

More than 140 data centers were affected.

The CPU spike was caused by a bad software deploy (a new Web Application Firewall rule), which Cloudflare rolled back.

Once the deploy was rolled back, the service returned to normal operation and all domains using Cloudflare returned to normal traffic levels.
