Shadowbase Product Management

Keith B Evans

Is Your Business Continuity Plan Adequate?

While everyone acknowledges that unplanned outages happen, are costly, and must be protected against, there is substantial evidence that IT departments are not applying sufficient resources to business continuity in practice (even though they might think otherwise). The first lesson is to take a thorough and objective look at your business continuity plan and ask: is it adequate and will it work, or do you just hope it will?

Because of issues such as failover faults, recovery times for an active/passive business continuity architecture may be on the order of several hours, potentially costing millions of dollars. Worse, if a serious failover fault occurs, the standby system may never be brought into service at all; the mission-critical application goes down and stays down, denying service to users for a prolonged period. An active/passive architecture therefore offers insufficient protection for a mission-critical application.

However, alternative business continuity technologies that do not suffer from these issues can be deployed today. The first of these is known as “sizzling-hot-standby.” This technology looks much the same as an active/passive architecture (all transactions are routed to and executed by a primary system, with data replication to a standby system), but it has one big difference: the standby system is “hot.” The business applications are all up and running on the standby system, with the database open read-write. A sizzling-hot-standby architecture significantly improves failover times and failover reliability, and so substantially reduces recovery times and outage costs. It is an excellent solution when the application cannot run in full active/active mode for some reason, and it is no more complex to implement than an active/passive architecture.

Sizzling-hot-standby leads naturally to active/active architectures. In an active/active configuration there are two or more geographically separated systems, each running online business transactions and updating its local copy of the database, with data replicated between the systems. Replication is bi-directional, meaning that each active system both sends its changes to and receives changes from the others. Active/active solutions provide the fastest takeover times, with minimal data loss. Recovery times are measured in seconds, and only the users connected to the failed system see an outage at all. In a two-system configuration with users split evenly, half of the users are unaffected, so outage costs are half those of the sizzling-hot-standby architecture, and several orders of magnitude less than for an active/passive architecture.
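The halving of outage costs follows from simple arithmetic, sketched below. The function name and all figures (hourly cost, recovery time) are hypothetical values chosen only for illustration, and the sketch assumes two systems with users split evenly and the same recovery time for both architectures.

```python
def outage_cost(cost_per_hour: float, recovery_hours: float,
                users_affected: float = 1.0) -> float:
    """Estimate the cost of one outage.

    users_affected is the fraction of users (0.0 to 1.0) who lose service.
    """
    if not 0.0 <= users_affected <= 1.0:
        raise ValueError("users_affected must be between 0.0 and 1.0")
    return cost_per_hour * recovery_hours * users_affected


# Hypothetical figures, not taken from the article.
COST_PER_HOUR = 100_000.0      # dollars lost per hour of full downtime
RECOVERY_HOURS = 30 / 3600     # a 30-second recovery, expressed in hours

# Sizzling-hot-standby: every user is on the failed primary system.
hot_standby_cost = outage_cost(COST_PER_HOUR, RECOVERY_HOURS, users_affected=1.0)

# Active/active with two systems: only the failed system's users are affected.
active_active_cost = outage_cost(COST_PER_HOUR, RECOVERY_HOURS, users_affected=0.5)

print(f"sizzling-hot-standby: ${hot_standby_cost:,.2f}")
print(f"active/active:        ${active_active_cost:,.2f}")
```

With more than two active systems the affected fraction, and therefore the cost per outage, falls further under the same assumptions.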

Active/active solutions can suffer from complexities that do not arise in active/passive modes. Principal among these is the possibility of data collisions. Because the same logical database is being updated on multiple nodes, and the same business applications are executing on those nodes, transactions on two systems can update the same database record at nearly the same time. When each change is then replicated to the other system, each system overwrites its own update with the one from the other system, and consequently both databases are incorrect. In addition, all of the technologies described so far suffer from some amount of data loss, however small.
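A data collision can be illustrated with a minimal sketch. It is a toy model, not any product's replication engine: each node's database is a plain dictionary, the account key and amounts are hypothetical, and replication is assumed to ship the updated value of the record (its after-image) to the other node.

```python
# Both nodes start with identical copies of the same logical database.
node_a = {"acct-1": 100}
node_b = {"acct-1": 100}

# Two transactions update the same record at nearly the same time,
# each against its own local copy.
node_a["acct-1"] = node_a["acct-1"] - 30   # withdrawal on node A -> 70
node_b["acct-1"] = node_b["acct-1"] + 50   # deposit on node B    -> 150

# Each node queues its change (key, new value) for replication
# before it has seen the other node's change.
change_from_a = ("acct-1", node_a["acct-1"])
change_from_b = ("acct-1", node_b["acct-1"])

# Bi-directional replication applies each change on the other node,
# overwriting that node's own update.
key, value = change_from_a
node_b[key] = value
key, value = change_from_b
node_a[key] = value

correct_balance = 100 - 30 + 50  # what both copies should hold

print("node A:", node_a["acct-1"])   # holds node B's value
print("node B:", node_b["acct-1"])   # holds node A's value
print("correct:", correct_balance)
```

In this model the two copies simply swap values, so neither reflects both transactions and the copies no longer agree with each other.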

Synchronous replication resolves both of these issues. With synchronous replication, application data updates are not committed (made visible and permanent) by either system until the updated data has been replicated to the other system. This technology guarantees that no data is lost in the event of an outage of the system performing the update (known as “zero data loss”). Additionally, in an active/active environment, data collisions cannot occur because the updated data records are locked on both systems before any changes are committed on either system. Synchronous replication therefore further reduces outage costs by avoiding any data loss and, by eliminating data collisions, opens up the benefits of active/active architectures to any application. It represents the pinnacle of business continuity replication solutions.
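The lock-before-commit rule can be sketched as follows. This is a simplified, single-process illustration under stated assumptions, not a description of how any particular product implements synchronous replication: the `Node` class and the helpers `try_lock_everywhere` and `commit_everywhere` are hypothetical names, and real systems must also handle network failures, timeouts, and deadlocks.

```python
class Node:
    """A toy database node: a key/value store plus a set of locked keys."""

    def __init__(self, name, data):
        self.name = name
        self.data = dict(data)
        self.locked = set()


def try_lock_everywhere(nodes, key):
    """Lock `key` on every node, or on none of them if any node refuses."""
    acquired = []
    for node in nodes:
        if key in node.locked:
            # Another transaction holds the record: back out and report failure.
            for held in acquired:
                held.locked.discard(key)
            return False
        node.locked.add(key)
        acquired.append(node)
    return True


def commit_everywhere(nodes, key, delta):
    """Apply the update on all nodes, then release the locks.

    Assumes the caller already holds the lock for `key` on every node.
    """
    new_value = nodes[0].data[key] + delta
    for node in nodes:
        node.data[key] = new_value
        node.locked.discard(key)


node_a = Node("A", {"acct-1": 100})
node_b = Node("B", {"acct-1": 100})
nodes = [node_a, node_b]

# The withdrawal locks the record on both nodes first.
first_locked = try_lock_everywhere(nodes, "acct-1")

# A concurrent deposit cannot get the lock, so it must wait and retry.
second_locked = try_lock_everywhere(nodes, "acct-1")

commit_everywhere(nodes, "acct-1", -30)        # withdrawal commits on both nodes

retry_locked = try_lock_everywhere(nodes, "acct-1")
commit_everywhere(nodes, "acct-1", +50)        # deposit now sees the new balance

print(node_a.data["acct-1"], node_b.data["acct-1"])
```

Because the second transaction is forced to wait until the first has committed on both nodes, it operates on the updated balance, and the two copies end up identical. The cost of this design is that every commit waits on the other system, which is why the sketch treats locking and committing as separate, explicit steps.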

In summary, even though you may already have a business continuity plan in place, it may not be adequate, well tested, or well supported. Worse, it may be giving you a false sense of security and may not work properly when called upon. If the plan relies on an active/passive replication architecture, there are significant issues with this approach that could hamper a fast and successful takeover in the event of an outage. The key point is that you can avoid this risk: other replication technologies, such as sizzling-hot-standby and active/active architectures, are readily available, mitigate the issues with active/passive, and offer a better total cost of ownership (TCO). Further, for the highest levels of availability, with no data collisions and zero data loss, synchronous replication may be used. If your business is relying on an active/passive architecture for service continuity, take another look at whether it really provides a sufficient guarantee of availability. It may now be time to consider moving to one of the higher-level replication architectures.