Data Center

When it comes to business-, mission-, and safety-critical applications and the performance of the data center, companies invest heavily to see results. However, the investment doesn't always deliver the hoped-for outcome.

Despite advances in infrastructure robustness, many IT organizations still face database, hardware, and software downtime, ranging from brief interruptions to outages that shut down the business for days.

Mounting statistics touch nearly every major enterprise software vendor and customer, from ERP to CRM and more, yet just bringing up the topic of outages still terrifies those in the industry. Against this backdrop, IT failures have become an accepted, virtually expected, aspect of enterprise life.

A high-availability survey by the Information Technology and Intelligence Corp. found that, although companies can't achieve zero downtime, one out of 10 said they need greater than 99.999% availability.
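To put the 99.999% figure in perspective, an availability target can be converted into an annual downtime budget. The following is a minimal sketch of that arithmetic; the function name `downtime_budget_minutes` and the 365-day year are assumptions made for illustration and are not part of the survey.

```python
# Assumption: a 365-day year, i.e. 525,600 minutes.
MINUTES_PER_YEAR = 365 * 24 * 60


def downtime_budget_minutes(availability_percent: float) -> float:
    """Return the maximum downtime per year, in minutes, that still
    meets the given availability target (expressed as a percentage)."""
    if not 0 <= availability_percent <= 100:
        raise ValueError("availability_percent must be between 0 and 100")
    return (1 - availability_percent / 100) * MINUTES_PER_YEAR


# Compare three common availability targets.
for target in (99.9, 99.99, 99.999):
    budget = downtime_budget_minutes(target)
    print(f"{target}% availability -> {budget:.2f} minutes of downtime per year")
```

Under these assumptions, 99.999% availability leaves roughly 5.26 minutes of downtime per year, which is why a single prolonged outage can exhaust the budget many times over.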

For enterprises whose revenue models depend solely on the data center's ability to deliver IT and networking services to customers, such as telecommunications service providers and e-commerce companies, downtime can be particularly costly, with the highest cost of a single event topping $1 million (more than $11,000 per minute).

For a total data center outage, which had an average recovery time of 134 minutes, average costs were approximately $680,000.

The average cost of data center downtime across industries was approximately $5,600 per minute.
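The per-minute figure makes it straightforward to estimate what an outage of a given length might cost. The sketch below assumes that cost scales linearly with duration, which is a simplification; the function name `estimate_outage_cost` is hypothetical and used only for illustration.

```python
# Cross-industry average cost per minute cited above, in USD.
AVG_COST_PER_MINUTE = 5_600


def estimate_outage_cost(duration_minutes: float,
                         cost_per_minute: float = AVG_COST_PER_MINUTE) -> float:
    """Estimate outage cost in USD, assuming cost grows linearly with time."""
    if duration_minutes < 0:
        raise ValueError("duration_minutes must be non-negative")
    return duration_minutes * cost_per_minute


# Estimate for the 134-minute average recovery time cited above.
estimated = estimate_outage_cost(134)
print(f"Estimated cost of a 134-minute outage: ${estimated:,.0f}")

# Per-minute rate implied by the reported $680,000 average for a total outage.
implied_rate_per_minute = 680_000 / 134
print(f"Implied rate: ${implied_rate_per_minute:,.0f} per minute")
```

At $5,600 per minute, a 134-minute outage comes to about $750,400, while the reported $680,000 average implies roughly $5,075 per minute. The gap is a reminder that such averages are rough guides rather than precise rates.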

Modernization Trends in the Data Center

Focus on cloud deployments
Pay-as-you-go model
Managed services and SaaS (Software as a Service) based solutions
PaaS (Platform as a Service) and IaaS (Infrastructure as a Service)
Ecosystem of service providers

Challenges That Came with Modernization

Availability of infrastructure and services
Availability across regions through distributed computing, i.e. geo-redundancy
Increased fault points because of hypervisors, the virtualized infrastructure manager (VIM), software-defined networking (SDN), etc.
Fault detection, isolation, service recovery, and repair in milliseconds
Zero-downtime upgrades

Impact

AWS Outages in 2017-2018

A year after the massive AWS S3 outage of February 2017, AWS customers, including critical enterprise IT solution providers Atlassian, Slack, and Twilio, experienced downtime in March 2018. This time, the outage hit the AWS US-East region and affected several applications relying on AWS servers in its Ashburn, Virginia data center space.

In July 2018, on Amazon Prime Day, the peak sales period for the Amazon.com e-commerce site, the service experienced an outage lasting six hours of the record-breaking 36-hour promotional sales event. The outage was tied to a software issue.


Google Cloud Outage, July 2018

Like all major cloud service providers, Google had its fair share of issues delivering infrastructure services to a rapidly growing customer base in 2018. The services affected in the July outage included Google App Engine, Stackdriver, Dialogflow, and Global Load Balancers. Customers including Spotify, Discord, Pokémon Go, and Snapchat rely on these cloud networking services to reach a global audience, so the impact cascaded worldwide.

The outage lasted around 30 minutes, and up to 87 percent of customers experienced some form of error on the App Engine, HTTPS Load Balancer, or TCP/SSL Proxy Load Balancer solutions.

According to a detailed description by Google, the issue was caused by a bug in a new security feature added to the Google Front End (GFE) architecture layer.
The affected customers received service credits under the Service Level Agreement (SLA), the standard compensation offered by cloud vendors. However, the true cost of data center downtime, which averaged around $750,000 as of 2015 according to a Ponemon Institute research report, far outweighed the offered compensation.
