January 23, 2017 — Gravic Recently Published a Two-Part Series in The Connection on Improving Availability via Staggered System

[Figure: graph of the mean time to failure formula]

Part 1 covers MTTF (Mean Time To Failure), while Part 2 focuses on Mitigating Redundant Failures via System Staggering. The reliability of a redundant system is optimized by minimizing the probability that both systems fail simultaneously. If both systems have the same failure probability distribution, then when one system is most likely to fail, so is the other. Previous methods for calculating estimated availability from any point in time are flawed because they are based on memoryless random variables: the calculated average time to the next failure is always the same, regardless of how long a system has been in service. By staggering the systems' starting times so that their probability distributions are not aligned, the times at which the two systems are most likely to fail differ. When one system is most likely to fail, the probability that the other will fail is significantly reduced, and so is the probability of a dual system failure. Redundant system reliability can therefore be greatly enhanced by staggering the starting times of the two systems. This strategy applies to both hardware and software failures.
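The staggering argument can be illustrated with a small calculation. The sketch below is not taken from the Gravic articles; it is a minimal illustration under stated assumptions: each system's time to failure, measured from its own start, is normally distributed around a wear-out age (so it is not memoryless), only the first failure of each system is considered, and a dual failure means the two failures occur within one repair window of each other. The function name, parameters, and numbers are all hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def dual_failure_probability(sigma, stagger, repair_time):
    """Probability that the two systems fail within repair_time of each other.

    Assumption (illustrative only): each system's time to failure, measured
    from its own start, is Normal(mean, sigma) with the same mean and sigma.
    System B starts `stagger` hours after system A, so the gap between the
    two failure instants is Normal(stagger, sigma * sqrt(2)); the common
    mean cancels out of the difference.
    """
    gap = NormalDist(mu=stagger, sigma=sigma * sqrt(2))
    # A dual failure means the gap lies inside [-repair_time, +repair_time].
    return gap.cdf(repair_time) - gap.cdf(-repair_time)


# Hypothetical figures: 100 h spread in failure age, 24 h repair window.
aligned = dual_failure_probability(sigma=100, stagger=0, repair_time=24)
staggered = dual_failure_probability(sigma=100, stagger=500, repair_time=24)

print(f"aligned start:   {aligned:.6f}")
print(f"staggered start: {staggered:.6f}")
```

Under these assumptions the staggered figure comes out far smaller than the aligned one, because the two failure distributions no longer peak at the same time. With a memoryless (exponential) failure model the same shift would bring no such benefit, which is why the choice of distribution matters here.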