"The best way to test your Business Continuity plan is to make it real."

Sizzling-Hot-Takeover (SZT) Replication Systems

Everyone Has a Plan - Until Disaster Strikes

One HPE NonStop Technologist noticed a disturbing trend: HPE NonStop users thought they were protected – except they were not.

Too often, HPE NonStop customers rely on uni-directional Disaster Recovery (DR) solutions as their “business continuity plan.” The problem is that these products require too much manual intervention to meet acceptable Recovery Time Objectives (RTOs).

HPE Shadowbase Sizzling-Hot-Takeover (SZT, also known as “Sizzling-Hot-Standby”) solutions support continuous availability (see Figure 1). RTOs measured in seconds or subseconds are possible, which is considerably better than is typically achievable with an active/passive (A/P) architecture.

Real-world Shadowbase Business Continuity Use Cases

Figure 1 — The HPE Shadowbase Business Continuity Continuum

Shadowbase SZT configurations are suitable for applications that require continuous availability but can accept a small amount of data loss. A Shadowbase SZT configuration is also the best fit for applications that cannot avoid or tolerate data collisions. Typical examples include telco applications, which process many call-related transactions worth pennies each, and point-of-sale (POS) applications, whose transactions, like ATM transactions, generally have low value. However, if a POS application goes down, retailers cannot serve customers paying by credit or debit card. Shadowbase SZT should be considered the absolute minimum business continuity architecture for any mission-critical application, since it is only a small step from an A/P architecture and yields a significant improvement in RTO.

In addition, a Shadowbase SZT configuration can achieve a zero Recovery Point Objective (RPO) if synchronous replication is used, or RPOs measured in tens or hundreds of milliseconds if asynchronous replication is used.
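To make the asynchronous case concrete, the data at risk can be approximated as the transaction rate multiplied by the replication latency. The following is a minimal sketch in Python; the function name and the example figures (1,000 transactions per second, 100 ms latency) are illustrative assumptions, not measurements of any Shadowbase deployment.

```python
def transactions_at_risk(tx_per_second: float,
                         replication_latency_ms: float,
                         synchronous: bool = False) -> float:
    """Rough estimate of committed transactions not yet on the backup.

    Hypothetical helper for illustration only. With synchronous
    replication (zero RPO) no committed transaction is missing from the
    backup, so the estimate is zero. With asynchronous replication the
    estimate is simply rate x latency.
    """
    if synchronous:
        return 0.0
    return tx_per_second * replication_latency_ms / 1000.0


# Illustrative figures only (assumed, not measured).
print(transactions_at_risk(1000, 100))                    # asynchronous
print(transactions_at_risk(1000, 100, synchronous=True))  # synchronous
```

Under these assumed figures, roughly 100 low-value transactions would be exposed at the moment of failure in the asynchronous case, which is the kind of small loss the applications described above can tolerate.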


Similarities Between Active/Passive & Sizzling-Hot-Takeover (SZT)

A Shadowbase SZT system is similar to an active/passive (A/P) architecture, except that the backup system is immediately ready to start processing transactions (see Figure 2). The Shadowbase SZT system is configured with reverse replication up and running, so that after a takeover the formerly passive (now active) system has a backup as soon as the failed primary system is recovered. This ensures continued availability protection after an outage of one system. With reverse replication enabled, Shadowbase replication on the backup (now active) node queues the changes that it makes to its copy of the database.

When the formerly active node is recovered, Shadowbase replication replays the queued updates to rapidly resynchronize the two databases. This step restores continuous availability protection in case the now active node fails. To operate in this mode, the replication engine must support bi-directional replication. Shadowbase technology does; some other replication products do not.
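The queue-and-replay behavior described above can be pictured with a toy model. This is a sketch under stated assumptions, not Shadowbase code: the `Node` and `ReverseReplicator` classes, the in-memory dictionary "database," and the key names are all hypothetical, and only inserts and updates are modeled.

```python
from collections import deque


class Node:
    """Toy stand-in for one system and its database copy (hypothetical)."""

    def __init__(self, name):
        self.name = name
        self.db = {}
        self.up = True


class ReverseReplicator:
    """Queues changes made on the now active node while its peer is down."""

    def __init__(self, source, target):
        self.source = source
        self.target = target
        self.queue = deque()

    def apply_local(self, key, value):
        # Apply the change locally, queue it, then try to ship it.
        self.source.db[key] = value
        self.queue.append((key, value))
        self.drain()

    def drain(self):
        # Replay queued changes in their original order, but only while
        # the target node is reachable.
        while self.target.up and self.queue:
            key, value = self.queue.popleft()
            self.target.db[key] = value


# Both copies start synchronized; then the primary fails.
primary, standby = Node("primary"), Node("standby")
primary.db = {"acct:1": 100}
standby.db = dict(primary.db)
primary.up = False

# The standby takes over and keeps processing; changes pile up in the queue.
reverse = ReverseReplicator(source=standby, target=primary)
reverse.apply_local("acct:1", 90)
reverse.apply_local("acct:2", 50)
queued_during_outage = len(reverse.queue)

# The primary is recovered; replaying the queue resynchronizes the copies.
primary.up = True
reverse.drain()
print(queued_during_outage, primary.db == standby.db)
```

Because the queue is replayed in order, the recovered node ends up with the same contents as the node that took over, at which point it can serve as the new backup.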

Figure 2 — Sizzling-Hot-Takeover with Reverse Replication

Essentially, an SZT configuration is an active/active (A/A) architecture, with the significant exception that all user transactions are directed only to the primary node, thereby avoiding the data collisions that can arise in fully A/A systems. With HPE Shadowbase data replication operating, the SZT system can take over transaction processing within seconds, because its local database is synchronized with the active database and is completely consistent and accurate. Also, since the applications are already up and running with the database open for read/write access, no time is needed to start the applications or to switch them from read-only database access.

The Shadowbase SZT configuration has another big advantage over A/P disaster recovery systems: the absence of failover faults. The backup node is known to be working at all times, because it can be exercised without an outage of the primary system by periodically submitting test or verification transactions to the application. Consequently, failover is guaranteed to be successful and can be automated, which is a requirement if very short RTOs are to be satisfied.
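As a rough illustration of exercising the backup with verification transactions, the sketch below submits test requests to a standby application and reports whether every reply matches what was expected. Everything here is assumed for the example: the `standby_is_healthy` function, the `fake_standby_app` stand-in, and the transaction format are hypothetical and are not part of any Shadowbase interface.

```python
from typing import Callable, Iterable, Tuple

Transaction = dict
Probe = Tuple[Transaction, dict]  # (test request, expected reply)


def standby_is_healthy(submit: Callable[[Transaction], dict],
                       probes: Iterable[Probe]) -> bool:
    """Return True only if every test transaction yields the expected reply."""
    for request, expected in probes:
        try:
            reply = submit(request)
        except Exception:
            # Any error while processing a probe counts as a failed check.
            return False
        if reply != expected:
            return False
    return True


def fake_standby_app(request: Transaction) -> dict:
    """Stand-in for the application on the backup node (assumed behavior)."""
    if request.get("type") == "balance_inquiry":
        return {"status": "ok", "account": request["account"]}
    raise ValueError("unsupported test transaction")


# A dedicated test account keeps probes away from real customer data.
PROBES = [
    ({"type": "balance_inquiry", "account": "TEST-0001"},
     {"status": "ok", "account": "TEST-0001"}),
]

print(standby_is_healthy(fake_standby_app, PROBES))
```

Run on a schedule, a check of this kind gives an automated failover procedure a recent signal that the backup application and its database are actually processing transactions.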