“Achieving Century Uptimes” as Published in The Connection

Published by Connect, written by Dr. Bill Highleyman, Paul J. Holenstein, and Dr. Bruce Holenstein

  • Part 25: Is the Active/Active Topic Getting Stale? (11/2010) As the active/active message spreads and interest builds in the commodity server community, active/active technology should be poised for rapid adoption.
  • Part 24: Is It Worth the Effort to Move to Active/Active? (9/2010) How to determine whether moving to an active/active architecture is worth the effort, that is, whether the savings in your company's current downtime costs will provide an attractive return on investment.
  • Part 23: Fast Failover with Active/Active Systems (2 of 2) (7/2010) How to rapidly recover from a node failure in an active/active network using server redirection, in which the nodes themselves monitor faults and control network reconfiguration.
  • Part 22: Fast Failover with Active/Active Systems (1 of 2) (5/2010) How to rapidly recover from a node failure in an active/active network using client redirection and network redirection; failover using these techniques can be automatic and very rapid.
  • Part 21: Active/Active NonStop Blades (3/2010) A review of the increased computing density of NonStop Blade Systems, and of how active/active systems built on them can provide even higher availability, greater capacity, and lower cost of ownership.
  • Part 20: Is Your Application Active-Active Ready? (1/2010) A review of the problems that can result when active/active systems achieve their scalability and continuous availability by distributing application and database copies across an application network.
  • Part 19: Reviewing Three Years of The Availability Corner (11/2009) A three-year-anniversary review of the 18 articles published to date in The Connection column “The Availability Corner: Achieving Century Uptimes.”
  • Part 18: Recovering from Synchronous Replication Failures (9/2009) A discussion of the procedures that allow transaction processing to continue in the face of a target database failure and that reinstate the target database as a participant in transactions upon its recovery.
  • Part 17: HP Unveils Its Synchronous Replication API for TMF (7/2009) How HP’s Synchronous Replication Gateway (SRG) API lets TMF safely support gateways to foreign systems through volatile resource managers. This allows replication engines to be integrated with TMF so that updates to remote databases can be replicated synchronously.
  • Part 16: Zero-Downtime Migrations for Active/Backup Configurations (5/2009) How active/backup systems can use fast, reliable failover to perform zero-downtime migrations, eliminating planned downtime for upgrades and migrations.
  • Part 15: Zero-Downtime Migrations: Eliminating Planned Downtime (3/2009) How active/active systems, composed of two or more nodes cooperating in a common application, not only achieve continuous availability in the face of unplanned outages but also eliminate planned downtime through zero-downtime migrations.
  • Part 14: The Evolution of Real-Time Business Intelligence (1/2009) How real-time business intelligence systems provide the information necessary to strategically improve an enterprise’s processes as well as to take tactical advantage of events as they occur.
  • Part 13: Synchronous Replication: Pros, Cons, and Myths (11/2008) A comparison of the two primary methods of data replication, asynchronous and synchronous, and a discussion of the advantages, disadvantages, and common misconceptions regarding synchronous replication.
  • Part 12: Rules of Availability III (09/2008) A discussion of the importance of recovery time and of selected rules from the book Breaking the Availability Barrier III that serve as “best practices” for achieving continuous availability with redundant systems, with a focus on active/active systems. (Rules 41-64)
  • Part 11: Rules of Availability II (07/2008) A discussion of the importance of recovery time and of selected rules from the book Breaking the Availability Barrier III that serve as “best practices” for achieving continuous availability with redundant systems, with a focus on active/active systems. (Rules 18-40)
  • Part 10: Rules of Availability I (05/2008) A discussion of the importance of recovery time and of selected rules from the book Breaking the Availability Barrier III that serve as “best practices” for achieving continuous availability with redundant systems, with a focus on active/active systems. (Rules 2-17)
  • Part 9: Where is My Database of Record? (03/2008) How expansions, mergers, and acquisitions tend to leave companies with many databases, causing data unavailability and data loss, and how active/active systems can largely solve these twin problems.
  • Part 8: Let’s Make Availability a Part of Performance Benchmarking (01/2008) How adding an availability test that measures expected restore time would strengthen performance benchmarks, which currently report only a system's transactions-per-minute (tpm) capacity and its cost per tpm.
  • Part 7: What is the Availability Barrier, Anyway? (11/2007) Why, in commercial data processing, the availability barrier is recovery time, and how recovery time can be reduced to push that barrier back as far as possible.
  • Part 6: Active/Active versus Clusters (09/2007) A comparison of clusters, a mature five-9s technology with thousands of installations, and active/active systems, a relatively new technology capable of six 9s and beyond.
  • Part 5: Modular Redundancy – To Need or Not To Need (07/2007) A review of Neoview, a massively parallel database appliance derived from NonStop technology. Its hardware architecture differs from that of NonStop, and its SQL engine, though derived from SQL/MX, has been significantly enhanced to support BI-specific features such as very large queries.
  • Part 4: Resolving Data Collisions (05/2007) Why it is important to minimize data collisions by using a replication engine with short replication latency, and to minimize the need for manual collision resolution by applying an appropriate set of collision-resolution algorithms.
  • Part 3: Avoiding Data Collisions (03/2007) How to structure an active/active system that uses asynchronous replication to avoid data collisions.
  • Part 2: What Will Active/Active Cost Me? (01/2007) A comparison of the costs of a monolithic system with those of an active/active system, which incurs the costs of redundancy and network management while reducing the cost of downtime and related insurance costs.
  • Part 1: Survivable Systems for Enterprise Computing (11/2006) A description of active/active architectures, which approach 100% uptime, along with their advantages and the issues associated with them.