Disaster Recovery for Brokerage Firm’s Sizzling-Hot-Takeover (SZT)

Problem:

A brokerage firm located in the Midwestern U.S. needed to implement data replication between a redundant active/backup pair of HPE NonStop systems, so that the backup node could perform a sizzling-hot-takeover (SZT) if the primary node failed.

HPE Shadowbase Solution: Active/Almost Active Backup Disaster Recovery

In Figure 1, brokers are connected to a broker application that routes their requests to the “active” NonStop server, which processes securities buy/sell orders. This NonStop server is connected to a second NonStop server via Shadowbase bi-directional replication in an active/almost-active sizzling-hot-takeover architecture. The application on the “hot backup” node is up and running, waiting for requests. If the active node fails, the hot backup node takes over processing broker requests, and the firm maintains business continuity.


Figure 1 – Shadowbase for Heterogeneous Data and Application Integration and Service Uptime
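The routing behavior described above can be sketched as a toy Python model. This is a minimal illustration under stated assumptions, not how Shadowbase or the firm's routers are implemented: the `Node` and `Router` classes are hypothetical, and a node failure is simulated with a flag rather than detected by a real health check. It shows the two properties the architecture relies on: every request goes to exactly one node, and on failure the router switches to a backup whose application is already running.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical stand-in for a NonStop node running the broker application."""
    name: str
    up: bool = True
    processed: list = field(default_factory=list)

    def handle(self, order: str) -> str:
        # The application is already up and waiting for requests,
        # so no startup work is needed before processing an order.
        if not self.up:
            raise ConnectionError(f"{self.name} is down")
        self.processed.append(order)
        return f"{self.name} processed {order}"


class Router:
    """Hypothetical router: sends every broker request to exactly one node, the active one."""

    def __init__(self, active: Node, backup: Node):
        self.active = active
        self.backup = backup

    def switch(self) -> None:
        # Reverse the roles of the two nodes (failover or planned switchover).
        self.active, self.backup = self.backup, self.active

    def route(self, order: str) -> str:
        try:
            return self.active.handle(order)
        except ConnectionError:
            # Active node failed: switch clients to the hot backup and retry once.
            self.switch()
            return self.active.handle(order)


if __name__ == "__main__":
    a, b = Node("NodeA"), Node("NodeB")
    router = Router(active=a, backup=b)
    print(router.route("BUY 100 XYZ"))   # NodeA processed BUY 100 XYZ
    a.up = False                         # simulate a primary node failure
    print(router.route("SELL 50 XYZ"))   # NodeB processed SELL 50 XYZ
```

Because only the active node ever receives requests, the backup's `processed` list stays empty until a switch occurs, which mirrors why this architecture avoids data collisions.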

  • In this configuration, the active and hot backup nodes are configured identically (or similarly), and active/almost-active bi-directional replication keeps the two databases synchronized.
  • Although the applications are active on both nodes, client interactions (broker requests) are directed only to the primary node.
  • If the primary node fails or is taken offline, routers switch the clients to the hot backup node in under a second.
  • Takeover is virtually instantaneous and has very little impact on user processing, since the application on the backup node is already active (up and running, and waiting for requests).
  • From their configuration alone, it is impossible to tell which node is the primary and which is the backup: both are configured alike, with all processes active at all times (databases opened read/write, external connections enabled, etc.).
  • To avoid failover faults (a backup that cannot take over when it is needed), the brokerage periodically tests the application on the hot standby system by sending test or verification transactions to it and confirming that they are processed correctly end to end. These tests validate that the hot standby environment is ready and available to take over at any time.
  • Data collisions (the same record updated independently on both nodes at the same time) cannot occur, since in this architecture broker requests are sent to only one node at a time.
  • To test the system’s failover capability, management often forces a router switch that reverses the roles of the two nodes.
  • In a failover architecture, it is extremely important to actually test the disaster recovery plan: to ensure that it works, that it stays up to date, and that the operations staff is well-versed in the process. Periodically switching the polarity of the nodes accomplishes these goals.
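The periodic verification and polarity-switching practice can be sketched as follows. This is a simplified sketch, not the brokerage's actual test procedure: it assumes a hypothetical `AppNode` with an in-memory database and a hypothetical reserved test account (`__TEST__`), whereas a real verification transaction would exercise the full application path end to end. The drill reverses the node roles only after the standby has shown it can process a transaction.

```python
from dataclasses import dataclass, field


@dataclass
class AppNode:
    """Hypothetical node whose application is always up and waiting for requests."""
    name: str
    up: bool = True
    db: dict = field(default_factory=dict)
    log: list = field(default_factory=list)

    def process(self, txn_id: str, account: str, qty: int) -> bool:
        # Apply a transaction to this node's database; fail if the node is down.
        if not self.up:
            return False
        self.db[account] = self.db.get(account, 0) + qty
        self.log.append(txn_id)
        return True


TEST_ACCOUNT = "__TEST__"  # hypothetical account reserved for verification traffic


def verify_standby(standby: AppNode, txn_id: str = "VERIFY-001") -> bool:
    """Send a test transaction to the standby and confirm it was applied end to end."""
    before = standby.db.get(TEST_ACCOUNT, 0)
    if not standby.process(txn_id, TEST_ACCOUNT, 1):
        return False
    applied = standby.db.get(TEST_ACCOUNT, 0) == before + 1
    # Back out the test so verification traffic leaves no net change.
    standby.process(txn_id + "-REVERSE", TEST_ACCOUNT, -1)
    return applied


def failover_drill(active: AppNode, standby: AppNode) -> tuple:
    """Reverse the polarity of the nodes, but only if the standby passes verification."""
    if not verify_standby(standby):
        raise RuntimeError(f"{standby.name} failed verification; keep current roles")
    # The former standby becomes active; the former active becomes the standby.
    return standby, active


if __name__ == "__main__":
    a, b = AppNode("NodeA"), AppNode("NodeB")
    new_active, new_standby = failover_drill(a, b)
    print(new_active.name, new_standby.name)  # NodeB NodeA
```

Gating the switch on a successful verification is one reasonable design choice: a failed check surfaces a failover fault during a planned drill rather than during a real outage.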

Additional Shadowbase Benefits:

  • Offloading Queries/Reporting from Production and Avoiding Planned Outages
    • The brokerage firm also offloads queries and reporting from the active node to the standby node, which reduces the load on the active node and puts the standby node’s capacity to productive work.
    • When the brokerage firm needs to upgrade its application or database, or make other changes that would normally require production downtime, it can avoid the planned outage by using Shadowbase bi-directional replication to keep the databases synchronized, even if the data formats/schemas change. This is called a Zero Downtime Migration (ZDM).
    • When such an upgrade is needed, the router feed is switched to the standby node, the original node is taken offline, and replication to it is stopped. The offline node is then upgraded and restarted, and Shadowbase replication resynchronizes its database after it is back online, in preparation for a switchover.
    • The changes can then be rolled through to update the standby node in the same way. Please visit our page on HPE Shadowbase Zero Downtime Migration for more information.
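The resynchronization step of the upgrade can be modeled as a replication link that queues changes while its target is offline and replays them, in commit order, once the target restarts. This is a toy sketch, not Shadowbase's actual mechanism: the `Replicator` class, the dictionaries standing in for databases, and the renaming of a `qty` column to `quantity` are all hypothetical. The `transform` hook illustrates how replication can keep two databases synchronized across a schema change.

```python
from collections import deque
from typing import Callable, Optional


class Replicator:
    """Hypothetical one-way replication link into a target database (a dict here)."""

    def __init__(self, target: dict, transform: Optional[Callable[[dict], dict]] = None):
        self.target = target
        # Maps a source-format row to the target's format; identity by default.
        self.transform = transform or (lambda row: row)
        self.queue = deque()
        self.target_online = True

    def capture(self, key: str, row: dict) -> None:
        # Each committed source change is queued, then applied if the target is up.
        self.queue.append((key, row))
        if self.target_online:
            self.drain()

    def drain(self) -> None:
        # Replay queued changes in commit order, converting each to the target format.
        while self.queue:
            key, row = self.queue.popleft()
            self.target[key] = self.transform(row)


def zero_downtime_upgrade_demo() -> dict:
    db_a, db_b = {}, {}                 # Node A is upgraded; Node B carries production
    link_b_to_a = Replicator(db_a)
    link_b_to_a.target_online = False   # A is taken offline; replication to A is stopped
    for i, qty in enumerate([100, 250]):
        row = {"qty": qty}
        db_b[f"ORD{i}"] = row           # B keeps processing orders during the outage
        link_b_to_a.capture(f"ORD{i}", row)
    # A comes back with a new schema that renames "qty" to "quantity" (hypothetical).
    link_b_to_a.transform = lambda row: {"quantity": row["qty"]}
    link_b_to_a.target_online = True    # A is restarted
    link_b_to_a.drain()                 # resynchronize A with changes made while it was down
    return db_a


if __name__ == "__main__":
    print(zero_downtime_upgrade_demo())
    # {'ORD0': {'quantity': 100}, 'ORD1': {'quantity': 250}}
```

Rows are queued in the source format and converted only when applied, so the upgraded node receives every change made during its outage in its new schema.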

Contact us for more information on this Shadowbase solution.


The above was adapted from the book: Breaking the Availability Barrier, Volume III: Active/Active Systems in Practice by Paul J. Holenstein, Dr. Bruce Holenstein, and Dr. Bill Highleyman.