Disaster Recovery and Active/Passive Replication Systems

Not all applications require the same level of business continuity protection. Less critical applications and data can tolerate longer recovery times and larger amounts of lost data, while highly critical applications may not be able to tolerate any downtime or data loss. To satisfy this range of needs, the Shadowbase business continuity product suite supports both high and continuous availability solutions.

Two parameters are used to characterize a business continuity solution: Recovery Time Objective (RTO) and Recovery Point Objective (RPO). RTO is the time taken to fail over and resume business services following an outage. RPO is the amount of data lost as a result of an outage. The closer these parameters are to zero (faster recovery, less lost data), the more effective the business continuity solution.
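To make the two parameters concrete, the following minimal sketch computes them for a single outage from three timestamps. The function name `recovery_metrics` and its arguments are hypothetical and are not part of any Shadowbase interface. It also assumes that lost data is expressed as a time window (the span of changes that never reached the backup), which is only one common way to state RPO.

```python
from datetime import datetime, timedelta


def recovery_metrics(last_replicated_change, outage_start, service_restored):
    """Return (rto, rpo) as timedeltas for one outage.

    Hypothetical helper for illustration only.
    rto: how long business services were unavailable.
    rpo: the window of source changes that never reached the backup,
         used here as a stand-in for "amount of lost data".
    """
    if service_restored < outage_start:
        raise ValueError("service cannot be restored before the outage starts")
    if last_replicated_change > outage_start:
        raise ValueError("no change can be replicated after the source has failed")
    rto = service_restored - outage_start
    rpo = outage_start - last_replicated_change
    return rto, rpo


# Illustrative numbers: the backup was 200 ms behind and took 4 minutes to take over.
outage = datetime(2024, 1, 1, 12, 0, 0)
rto, rpo = recovery_metrics(
    last_replicated_change=outage - timedelta(milliseconds=200),
    outage_start=outage,
    service_restored=outage + timedelta(minutes=4),
)
print(f"RTO: {rto.total_seconds()} s, RPO: {rpo.total_seconds()} s")
```

Both values shrink toward zero as failover gets faster and the backup stays closer to the source.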

Business Continuity Continuum

Figure 1 — The Shadowbase Business Continuity Continuum

As the Business Continuity RTO/RPO Continuum in Figure 1 shows, Shadowbase solutions support a range of RTO and RPO levels, from recovery times measured in minutes and data loss measured in seconds (high availability) to no loss of service or data at all following an outage (continuous availability).

Data Replication from Source to Target

Figure 2 — Data Replication

Shadowbase software provides this range of business continuity solutions by using data replication technology, as shown in Figure 2. The purpose of data replication is to keep a target database synchronized, in real time, with a source database that is being updated by a source application.

Shadowbase Data Replication Engine

Figure 3 — The Shadowbase Data Replication Engine

The source database is hosted by the source node and the target database is hosted by the target node. The two (or more) nodes comprise a redundant distributed data-processing system. As an application makes changes (inserts, updates, and deletes) to its local database (the source database), those changes are sent immediately over a communication channel by Shadowbase replication to the target system, where they are applied to the target database (Figure 3). The target database typically resides on another independent node that may be hundreds or thousands of miles away. The Shadowbase data replication engine is the facility that gathers changes made to the source database and applies them to the remote target database.
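The flow described above (capture each change at the source, send it over a channel, apply it at the target) can be illustrated with a toy model. This is a sketch under stated assumptions, not the Shadowbase engine: the class `ToyReplicator`, its methods, the dictionaries standing in for databases, and the in-memory queue standing in for the communication channel are all hypothetical.

```python
from collections import deque


class ToyReplicator:
    """Toy uni-directional replication: source changes flow to a target copy."""

    def __init__(self):
        self.source = {}        # stands in for the source database
        self.target = {}        # stands in for the remote target database
        self.channel = deque()  # stands in for the communication channel

    def apply_source(self, op, key, value=None):
        """Apply a change at the source and queue it for replication."""
        self._apply(self.source, op, key, value)
        self.channel.append((op, key, value))

    def replicate(self):
        """Apply all queued changes to the target in their original order."""
        applied = 0
        while self.channel:
            op, key, value = self.channel.popleft()
            self._apply(self.target, op, key, value)
            applied += 1
        return applied

    @staticmethod
    def _apply(db, op, key, value):
        if op in ("insert", "update"):
            db[key] = value
        elif op == "delete":
            db.pop(key, None)
        else:
            raise ValueError(f"unknown operation: {op}")


replicator = ToyReplicator()
replicator.apply_source("insert", "acct-1", 100)
replicator.apply_source("update", "acct-1", 75)
replicator.apply_source("insert", "acct-2", 10)
replicator.apply_source("delete", "acct-2")
replicator.replicate()
print(replicator.target == replicator.source)
```

Applying changes in the order they were made is what keeps the target a consistent copy. Any changes still sitting in the channel when the source fails are the ones at risk of being lost.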

Uni-directional (Active/Passive) Shadowbase Replication for Disaster Recovery (High Availability)

Achieving high levels of service availability requires a backup node that can take over within minutes in the event of an active node failure. Shadowbase replication provides this service by using uni-directional data replication, which is the simplest form of data replication (see Figure 2 and Figure 3). Since an active node processes all transactions and replicates the database changes that it makes to a remote standby database, the two databases are in (or are nearly in) synchronization. If the active node fails, the backup (or passive) node is available with a current copy of the database, ready to take over processing.

A Shadowbase asynchronous uni-directional (active/passive) system has an RPO measured in tens or hundreds of milliseconds (the replication latency of the data replication channel). If Shadowbase synchronous replication is used, no data is lost following a source node failure, and an RPO of zero is achieved. (Synchronous replication is an upcoming Shadowbase feature; please contact Gravic for more details.) The RTO of an active/passive system is measured in minutes or longer because, following a failure of the active node, applications must be started, the databases mounted, and the network reconfigured. Additional recovery time is typically required for management to decide to fail over to the backup system and for testing to ensure that the backup is performing properly.
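As a rough illustration of what a replication latency of tens or hundreds of milliseconds means in practice, the sketch below estimates how many committed changes could still be in flight when the source node fails. The function `changes_at_risk` and the transaction rate are hypothetical, and the estimate assumes a steady transaction rate and a constant latency.

```python
def changes_at_risk(tx_per_second, replication_latency_ms, synchronous=False):
    """Estimate the changes not yet on the target at the moment of failure.

    Hypothetical helper for illustration only. Assumes a steady
    transaction rate and a constant replication latency.
    """
    if tx_per_second < 0 or replication_latency_ms < 0:
        raise ValueError("rate and latency must be non-negative")
    if synchronous:
        # With synchronous replication no data is lost, so nothing is at risk.
        return 0.0
    return tx_per_second * replication_latency_ms / 1000.0


# Illustrative figures: 500 transactions per second, 200 ms replication latency.
print(changes_at_risk(500, 200))
print(changes_at_risk(500, 200, synchronous=True))
```

The estimate scales with both load and latency, so whether a sub-second RPO is an acceptable amount of lost data depends on the application.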

In an active/passive configuration, the passive system is typically idle as far as update processing is concerned. However, applications may also be up and running in read-only mode on the standby node, and the standby database may be actively used for query and reporting purposes. Shadowbase replication keeps the target database a consistent copy of the source database, though delayed by the replication latency. If the active node fails, the applications at the backup node can remount the database for read/write access and take over the role of the original active node. This process typically takes only a few minutes, leading to RTOs measured in minutes. Therefore, uni-directional architectures provide high availability: RTOs measured in minutes and RPOs measured in subseconds (or zero if synchronous replication is used).
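The role change described above can be sketched as a standby that serves queries while read-only and accepts updates only after it has been promoted. The class `StandbyNode` and its methods are hypothetical. A real takeover also involves starting applications, remounting databases, and reconfiguring the network, none of which is modeled here.

```python
class StandbyNode:
    """Toy standby: read-only until promoted to the active role."""

    def __init__(self, database):
        self.database = dict(database)  # replicated copy of the source data
        self.read_only = True

    def query(self, key):
        """Queries and reporting are allowed in either role."""
        return self.database.get(key)

    def update(self, key, value):
        """Updates are rejected while the node is still a passive standby."""
        if self.read_only:
            raise PermissionError("standby is read-only until promoted")
        self.database[key] = value

    def promote(self):
        """Take over the active role by opening the copy for read/write access."""
        self.read_only = False


standby = StandbyNode({"acct-1": 75})
print(standby.query("acct-1"))  # reads work before failover

standby.promote()               # the active node has failed
standby.update("acct-1", 50)    # the former standby now processes updates
print(standby.query("acct-1"))
```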

This replication method is used for classic active/passive disaster recovery configurations. It supports applications that must be highly available but where some small data loss is tolerable. Customer relationship management (CRM) and human resources (HR) corporate applications are examples of this class of application, as are ATM transactions, which have a low value. If an ATM is down, the customer can go to a different ATM serviced by a different bank.

While an active/passive architecture certainly offers high availability, other Shadowbase business continuity solutions should also be considered, particularly the Shadowbase Sizzling-Hot-Takeover (SZT) architecture, which is a small step up from an active/passive configuration but offers significant advantages.

