HPE Shadowbase Compare SQL is a “database compare” utility that compares the contents of HPE NonStop SQL tables with equal or at least similar structure. Shadowbase Compare SQL/MP compares SQL/MP target table data with SQL/MP source table data. Shadowbase Compare SQL/MX compares SQL/MX target table data with SQL/MX source table data.
The source and target tables can be located on the same, or different, HPE NonStop systems. Expand connectivity is used if they are located on different systems. If Expand is not available (or the user desires to use TCP/IP for inter-node communication), Shadowbase Compare SQL will use an agent process to read the data to compare.
The comparison is done on a row-by-row and column-by-column basis. The range of rows or the range of columns to be compared is defined via user-supplied parameters (e.g., the entire table, or a key range supplied by primary key or index path). If differences in the data are found, actions can be configured and executed. Actions are specific to the type of difference found (e.g., missing rows, extra rows, or different values between certain columns). The actions may include steps to assist in repairing the found differences.
Shadowbase Compare SQL is a powerful tool with many capabilities, but is also very easy to use. To use the utility, some general knowledge about HPE NonStop servers, the HPE NonStop SQL database, and the usage of HPE NonStop command line tools like TACL, SQLCI, or OSS MXCI is required.
Shadowbase Compare SQL may be run while the databases are being used in production as it only opens the source and target tables in shared, read-only mode. The tables involved in the comparison can be opened by applications in shared or protected mode for concurrent access activity (e.g., reading or inserting/updating/deleting) while the comparison takes place, but they may not be opened in exclusive mode. A special iteration algorithm handles differences “on the fly” for replication environments where changes occur to one table (the source table), but have not yet been applied to the other table(s) (the target table). This algorithm is explained in more detail below.
The utility is used in a batch-like manner via a command line user interface. All of the necessary details of the intended comparison are specified as input parameters, and then the comparison is executed. Parameters specified on the Shadowbase Compare SQL command include:
Shadowbase Compare SQL can be used when the databases are open and running in production, enabling some very useful applications which would otherwise not be possible. For example, while real-time data replication is in process between the source and target systems, Shadowbase Compare SQL may be used to ensure the replicated databases are consistent. Since the utility never locks any rows in the source or target tables (it uses “BROWSE” access), the production environment is not affected by locked rows due to the comparison running. This method improves data concurrency, however it also means that there can be changes read that later are undone in a TMF transactional environment (e.g., the source transaction aborts). Shadowbase Compare SQL automatically handles this type of mismatch as described next.
In a production environment, some rows of the source and target tables may be different only for a short period of time. This difference happens especially in real-time data replication scenarios: replicated data may be “in-flight” meaning a change has been applied to the source table but not yet replicated and applied into the target table. This time lag is referred to as “data replication latency” or “lag.” If Shadowbase Compare SQL compares source and target rows in such a situation, it will find a difference. Once the change has been applied to the target table this difference disappears.
To handle the situations where differences are found, Shadowbase Compare SQL uses a special iteration algorithm, which includes several steps. First, it looks for differences as it walks through all the rows to be compared. When the rows match, it moves on to the next set of rows to compare. When a difference is found, it writes a row in a specially created “difference” table. This difference table contains only the primary key of those rows where a difference was found. As soon as the utility is finished with this first step it starts the comparison again, but this time looks only at the rows contained in the difference table. Typically this number will be very small compared to the size of the original Shadowbase Compare SQL run. If the utility finds that the previously found difference is gone (i.e., the rows now match), it will delete the row from the difference table. After a configurable number of iterations through the difference table it is assumed that the remaining differences are not caused by “in-flight” data but are instead “real” differences, and they are reported in the final statistics of mismatches.
It is important to note that this algorithm eliminates most cases of real differences due to conditions such as data replication latency. However, if the delay in between difference table executions and number of iterations reports a “real” mismatch, it is still possible that the condition is temporary, and the utility has reported a “false positive.” This condition can occur, for example, when the replication latency is longer than the time spent for all of the difference table comparison iterations. To resolve these false positives, simply rerun the comparison for the differences found to see if they have now been resolved or not.
Figure 1 depicts a sample architecture for a source-side comparison. In this case, Shadowbase Compare SQL is configured to run on the source system.
Figure 2 depicts a sample architecture for a target-side comparison. In this case, Shadowbase Compare SQL is configured to run on the target system.
Note that in both figures, an optional data replication engine can be actively replicating between the source and target environments.