High Availability (HA) is not just for the data center. The principles of achieving extreme uptime were honed by enterprise IT teams, but they are just as important for industrial and embedded applications, which are often deployed in mission-critical environments. By understanding and applying the HA principles perfected in the enterprise, designers can make industrial and embedded servers more robust, reliable, and resilient.
HA originated in the enterprise, but not all of the techniques and design patterns that work in the data center translate directly to industrial applications.
The enterprise approach can achieve HA even with relatively unreliable hardware, but it depends heavily on trained IT personnel to design, monitor, and maintain complex HA infrastructure and software.
Industrial and embedded systems operate in a significantly different context. They are often expected to run with little or no maintenance, and once deployed in the field they have to “just work” without IT staff continually monitoring and configuring them. In addition, space is usually at a premium, both inside the system itself and often in the surrounding environment. Space-constrained industrial systems can’t afford as many layers of redundancy as large data centers, and industrial environments often can’t accommodate the multiple servers that are typical of an enterprise deployment.
All this means that industrial systems have to be designed to provide HA that works out of the box. In many cases, hardware redundancy alone is not a complete solution. Reliable software components are key to improving availability without massive redundancy. In addition, monitoring and failover processes have to be automated and foolproof, because there will often be little to no staff in the field to monitor and configure the system.
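To make the idea of automated monitoring and recovery concrete, here is a minimal sketch of a software supervisor that restarts a failed task without operator involvement. It is an illustration only: the names `supervise`, `max_restarts`, and `backoff_s` are hypothetical, and a fielded system would more likely rely on a hardware watchdog or an OS-level service manager than on a loop like this.

```python
import time
from typing import Callable


def supervise(task: Callable[[], None],
              max_restarts: int = 3,
              backoff_s: float = 0.0) -> int:
    """Run `task`, restarting it automatically whenever it raises.

    Returns the number of restarts that were needed before the task
    completed. Raises RuntimeError once `max_restarts` is exhausted,
    which is the point where a real system would escalate, for example
    by failing over to a standby unit (assumption, not shown here).
    """
    restarts = 0
    while True:
        try:
            task()
            return restarts
        except Exception as exc:
            if restarts >= max_restarts:
                raise RuntimeError("task failed after all restarts") from exc
            restarts += 1
            # Wait a little longer after each failure before retrying.
            time.sleep(backoff_s * restarts)


if __name__ == "__main__":
    attempts = {"count": 0}

    def flaky_task() -> None:
        # Hypothetical workload that fails on its first two runs.
        attempts["count"] += 1
        if attempts["count"] < 3:
            raise RuntimeError("simulated software fault")

    used = supervise(flaky_task, max_restarts=5)
    print(f"task recovered after {used} automatic restarts")
```

The bounded restart count is a deliberate choice: recovering silently forever can hide a persistent fault, so the sketch gives up after a fixed number of attempts and leaves escalation to the caller.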
By focusing on automated recovery from software failure, industrial and embedded systems can achieve HA that’s easily deployable. Availability of these systems can be improved by increasing reliability, adding redundancy, or a combination of both.
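The trade-off between reliability and redundancy can be put into numbers with the standard steady-state availability formula, MTBF / (MTBF + MTTR). The sketch below uses it to compare three options: a baseline unit, the same unit with faster automated recovery, and two redundant baseline units. The function names and the MTBF and MTTR figures are illustrative assumptions, not measurements, and the redundancy formula assumes that units fail independently.

```python
def availability(mtbf_hours: float, mttr_hours: float) -> float:
    """Steady-state availability: the fraction of time the system is up.

    mtbf_hours: mean time between failures.
    mttr_hours: mean time to repair (or to recover automatically).
    """
    return mtbf_hours / (mtbf_hours + mttr_hours)


def redundant_availability(single: float, units: int) -> float:
    """Availability of `units` parallel units where any one suffices.

    Assumes failures are independent, which is optimistic when units
    share power, software, or environmental stress.
    """
    return 1.0 - (1.0 - single) ** units


if __name__ == "__main__":
    # Illustrative figures only (assumptions, not field data).
    baseline = availability(mtbf_hours=1000.0, mttr_hours=10.0)
    fast_recovery = availability(mtbf_hours=1000.0, mttr_hours=1.0)
    two_units = redundant_availability(baseline, units=2)

    print(f"baseline unit:            {baseline:.6f}")
    print(f"faster automated recovery: {fast_recovery:.6f}")
    print(f"two redundant units:       {two_units:.6f}")
```

Cutting recovery time raises availability without adding hardware, which is why automated recovery from software failure matters so much where space for redundant units is limited.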