Compare SQL (for HPE NonStop SQL/MP and SQL/MX Databases)

HPE Shadowbase Compare SQL is a “database compare” utility product suite that compares the contents of HPE NonStop SQL tables with equal or at least similar structure. Shadowbase Compare SQL/MP compares SQL/MP target table data with SQL/MP source table data. Shadowbase Compare SQL/MX compares SQL/MX target table data with SQL/MX source table data. Each product is licensed separately. Heterogeneous table comparisons (e.g., between SQL/MP and SQL/MX tables, or off-platform SQL tables) are being considered for a future product release.

The source and target tables can be located on the same, or different, HPE NonStop systems. Expand connectivity is used if they are located on different systems. If Expand is not available (or the user desires to use TCP/IP for inter-node communication), Shadowbase Compare SQL will use an agent process to read the data to compare. The comparison is done on a row-by-row and column-by-column basis. The range of rows or the range of columns to be compared is defined via user-supplied parameters (for example, the entire table, or a key range supplied by primary key or index path).

If differences in the data are found, actions can be configured and executed. Actions are specific to the type of difference found (for example, missing records, or different values between certain columns). The actions may include steps to assist in repairing the found differences. If required, user functions may be bound into the action process for customized data conversion purposes.

HPE Shadowbase Compare SQL is a powerful tool with many capabilities, but is also very easy to use. To use the product, some general knowledge about HPE NonStop servers, the HPE NonStop SQL database, and the usage of HPE NonStop command line tools like TACL, SQLCI, PATHCOM, or OSS MXCI is required.

Shadowbase Compare SQL may be run while the databases are being used in production as it only opens the source and target tables in shared read-only mode. The tables involved in the comparison can be opened in shared or protected mode for concurrent access activity (e.g., reading or inserting/updating/deleting) by applications while the comparison takes place, but they may not be opened in exclusive mode. A special iteration algorithm handles differences “on the fly” for replication environments where changes occur to one table (the source table), but have not yet been applied to the other table(s) (the target table). This algorithm is explained in more detail below.

The product is used in a batch-like manner via a command line user interface. All of the necessary details of the intended comparison are specified as input parameters to the command, and then the comparison is executed. Parameters specified on the Shadowbase Compare SQL command include:

  • Names of the tables to compare
  • Description of the row-to-row relationship between the tables
  • Range of rows to compare (specified by the “WHERE” clause)
  • Columns to compare (including data conversion rules–actions)
  • Level of output detail required (granularity of comparison differences to provide in the listing, for example, counts versus individual column data values, etc.)

Shadowbase Compare SQL can be used when the databases are open and running in production, enabling some very useful applications which would otherwise not be possible. For example, while real-time data replication is in process between the source and target systems, Shadowbase Compare SQL may be used to ensure the replicated databases are consistent. Since the product never locks any rows in the source or target tables (it uses “BROWSE” access), the production environment is not affected by locked rows due to the comparison running. This method improves data concurrency, however it also means that there can be changes read that later are undone in a TMF transactional environment (e.g., the source transaction aborts). Shadowbase Compare SQL automatically handles this type of mismatch as described next.

In a production environment, some rows of the source and target tables may be different only for a short period of time. This difference happens especially in real-time data replication scenarios: replicated data may be “in-flight” meaning a change has been applied to the source table but not yet replicated and applied into the target table. This time lag is referred to as “data replication latency” or “lag.” If Shadowbase Compare SQL compares source and target rows in such a situation, it will find a difference. Once the change has been applied to the target table this difference disappears.

To handle the situations where differences are found, Shadowbase Compare SQL uses a special iteration algorithm, which includes several steps. First, it looks for differences as it walks through all the rows to be compared. When the rows match, it moves on to the next set of rows to compare. When a difference is found, it writes a row in a specially created “difference” table. This difference table contains only the primary key of those records where a difference was found. As soon as the utility is finished with this first step it starts the comparison again, but this time looks only at the records contained in the difference table. Typically this number will be very small compared to the size of the original Shadowbase Compare SQL run. If the product finds that the previously found difference is gone (i.e., the rows now match), it will delete the row from the difference table. After a configurable number of iterations through the difference table it is assumed that the remaining differences are not caused by “in-flight” data but are instead “real” differences, and they are reported in the final statistics of mismatches.

It is important to note that this algorithm eliminates most cases of real differences due to conditions such as data replication latency. However, if the delay in between difference table executions and number of iterations reports a “real” mismatch, it is still possible that this is a temporary condition and the product has reported a “false positive.” This condition can occur, for example, when the replication latency is longer than the time spent for all of the difference table comparison iterations. To resolve these false positives, simply rerun the comparison for the differences found to see if they have now been resolved or not.

Figure 1 — HPE Shadowbase Compare SQL Configuration – Source-Side Compare

Figure 1 depicts a sample architecture for a source side comparison. In this case, HPE Shadowbase Compare SQL is configured to run on the source system.

Figure 2 — HPE Shadowbase Compare SQL Configuration – Target-side Compare

Figure 2 depicts a sample architecture for a target side comparison. In this case, HPE Shadowbase Compare SQL is configured to run on the target system.

Note that in both figures, a data replication engine can be actively replicating between the source and target environments. The link between the replication engine and the comparison tool represents a future feature for reducing false positive comparison mismatches.


 Related Pages:
Solution Brief: