Introduction to IMS replication with Replicate
The IBM IMS database is very different from the typical data sources supported by Replicate. This section provides a high-level overview of IMS as a Replicate source endpoint.
Overview
IMS Database (IMS DB) is a hierarchical database management system developed by IBM. It is used primarily in large-scale mainframe environments for transaction-heavy industries such as banking, insurance, and airlines.
An IMS database organizes its data in a tree-like structure using segments (records) and parent-child relationships, offering extremely high performance, reliability, and scalability for mission-critical applications. Replicate translates segments in a source IMS database into tables in the target database.
Replicate reads IMS segments one at a time, without regard to the segment location in the hierarchy. This enables the full load of multiple segments in parallel, with minor I/O overhead.
Unlike modern relational databases, which have a simple table structure defined in the system catalog, classical IMS databases use files to describe the database structure and the data layout of its segments. These files are:
-
DBD file
Defines the physical database structure (segments, hierarchy, keys, and access method).
-
PSB file
Defines which segments and fields an application program (like Replicate) is allowed to see.
-
COBOL copybooks (or PL/I INCLUDE members)
Define the exact internal layout (byte offsets, data types, lengths, and field names) of each segment as seen by the application program.
Modern IMS databases may store these files in the IMS catalog, in which case, this information will be readily available to Replicate.
For more traditional (legacy) IMS databases, either there is no entry for the database in the IMS catalog, or the entry exists but does not contain all the required information (specifically, the COBOL copybooks may be unavailable or may not describe the data in enough detail to facilitate replication).
For those cases, Qlik recommends the following IBM-provided tools:
- IBM Developer for z/OS
- IMS Explorer for Development (part of IBM Explorer for z/OS)
Using these tools, a user can combine the above-mentioned files into a single XML file, which can then be provided in the Replicate IMS source endpoint settings. With this file, Replicate has all the technical information required to work with the IMS database.
Handling of IMS segment hierarchy
The hierarchy of the segments in an IMS database is described in the DBD file. When replicating to a relational database, Replicate converts each of these segments into a table. The primary key of each table consists of the primary keys of the segment's ancestors, plus the primary key of the segment itself (the “concatenated key” in IMS language).
Consider the following excerpt from the DBD of the HOSPITAL IMS sample database:
PRINT NOGEN
DBD NAME=HOSPDBD,ACCESS=(HISAM,ISAM)
DATASET DD1=PRIME,OVFLW=OVERFLOW,DEVICE=3390
SEGM NAME=HOSPITAL,PARENT=0,BYTES=80
FIELD NAME=(HOSPNAME,SEQ,U),BYTES=20,START=1,TYPE=C
FIELD NAME=ADMIN,BYTES=20,START=61,TYPE=C
SEGM NAME=WARD,PARENT=HOSPITAL,BYTES=31
FIELD NAME=(WARDNO,SEQ,U),BYTES=2,START=1,TYPE=C
FIELD NAME=BEDAVAIL,BYTES=3,START=9,TYPE=C
FIELD NAME=WARDTYPE,BYTES=20,START=12,TYPE=C
SEGM NAME=PATIENT,PARENT=WARD,BYTES=125
FIELD NAME=(BEDIDENT,SEQ,U),BYTES=4,START=61,TYPE=C
FIELD NAME=PATNAME,BYTES=20,START=1,TYPE=C
FIELD NAME=DATEADMT,BYTES=6,START=65,TYPE=C
SEGM NAME=TREATMNT,PARENT=PATIENT,BYTES=113,RULES=(,LAST)
FIELD NAME=(TRDATE,SEQ),BYTES=6,START=21,TYPE=C
FIELD NAME=TRTYPE,BYTES=20,START=1,TYPE=C
With this DBD, Replicate will generate four tables:
-
HOSPITAL with fields HOSPNAME (key), ADMIN
-
WARD with fields HOSPNAME (key from parent), WARDNO (key), BEDAVAIL and WARDTYPE
-
PATIENT with fields HOSPNAME (key from grandparent), WARDNO (key from parent), BEDIDENT (key), PATNAME and DATEADMT
-
TREATMNT with fields HOSPNAME (key from great-grandparent), WARDNO (key from grandparent), BEDIDENT (key from parent), TRDATE (key) and TRTYPE
Replicate will load each of those tables on its own, not necessarily in the order of their hierarchy (although the full load table order can be defined in the Replicate UI).
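The key derivation described above can be sketched in a few lines of Python. This is only an illustration of the rule, not Replicate's implementation: the Segment class, the table_columns function, and the HOSPITAL_DBD dictionary are hypothetical names, and the segment definitions are transcribed by hand from the DBD excerpt.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class Segment:
    """Hypothetical, hand-written summary of one SEGM entry in a DBD."""
    name: str
    parent: Optional[str]   # None for the root segment (PARENT=0)
    key: Optional[str]      # the sequence (SEQ) field, if the segment has one
    fields: List[str]       # the remaining, non-key fields


def table_columns(segments: Dict[str, Segment], name: str) -> Tuple[List[str], List[str]]:
    """Return (key_columns, data_columns) for the table built from a segment."""
    chain = []
    current: Optional[Segment] = segments[name]
    while current is not None:          # walk from the segment up to the root
        chain.append(current)
        current = segments[current.parent] if current.parent else None
    # Concatenated key: ancestor keys from the root down, then the segment's own key.
    keys = [seg.key for seg in reversed(chain) if seg.key]
    return keys, list(segments[name].fields)


HOSPITAL_DBD = {
    "HOSPITAL": Segment("HOSPITAL", None, "HOSPNAME", ["ADMIN"]),
    "WARD": Segment("WARD", "HOSPITAL", "WARDNO", ["BEDAVAIL", "WARDTYPE"]),
    "PATIENT": Segment("PATIENT", "WARD", "BEDIDENT", ["PATNAME", "DATEADMT"]),
    "TREATMNT": Segment("TREATMNT", "PATIENT", "TRDATE", ["TRTYPE"]),
}

for segment_name in HOSPITAL_DBD:
    print(segment_name, *table_columns(HOSPITAL_DBD, segment_name))
```

Walking the parent chain once per segment keeps the sketch short. It also shows why each table can be loaded on its own: every row already carries the keys of all of its ancestors.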
Handling of Complex IMS segments
As the layout of IMS segments is based on COBOL copybooks (or PL/I includes), IMS developers can use various capabilities that make the layout of a segment complex. The following are the complex layouts that Replicate supports, and how it supports them:
Nested structures
It is typical for COBOL developers to aggregate fields in a structure (a nested field level), for example, to copy them as a unit. If the structure is not repeated, Replicate will simply ignore the structure and pick the data fields themselves with their original names (that is, without adding the encompassing structure name to the field name).
OCCURS <count> TIMES [DEPENDING ON <field>]
This construct allows a segment to contain one or more arrays of sub-structures. The full size of the array is always given in <count>. If the DEPENDING ON <field> clause is given, the actual number of items in the array is available in <field>.
An example from the IMS HOSPITAL demo database is:
01 TREATMNT.
03 TRTYPE PIC X(20).
03 TRDATE PIC X(6).
03 NUM-MEDICATIONS PIC 99 COMP.
03 MEDICATION OCCURS 0 TO 3 DEPENDING ON NUM-MEDICATIONS.
05 MEDIC-NAME PIC X(8).
05 MEDIC-QTY PIC 99 COMP.
When the OCCURS construct is used, and the repeating part has more than one item (as in this example, where MEDICATION may occur multiple times, with each item having MEDIC-NAME and MEDIC-QTY), Replicate will split the segment into a parent table (here TREATMNT) and nested tables (here MEDICATION) for the different occurring fields in the segment.
Thus, in this example, the original TREATMNT segment will be converted into two tables at the target:
-
TREATMNT: will have all the fields except for the repeating ones (in this example, TREATMNT will include the fields TRTYPE, TRDATE and NUM-MEDICATIONS) plus the concatenated key.
-
TREATMNT__MEDICATION: will have the segment’s concatenated key, a counter field MEDICATION_Rownum, and the fields of the repeated item (MEDIC-NAME and MEDIC-QTY).
When the array field does not have a DEPENDING ON clause, Replicate will assume that there are <count> actual occurrences of the array item in the segment data record. The only exception is an array item whose data is all nulls or all spaces: Replicate will take that as an indication that there are no more items in this array in the segment data record.
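The rule for deciding how many array items a segment record actually holds can be sketched as follows. This is a minimal illustration under stated assumptions, not Replicate's code: the function names are hypothetical, each array slot is assumed to be available as raw bytes, the data is assumed to be EBCDIC (so a space is 0x40), and the row counter is assumed to start at 1.

```python
from typing import Dict, List, Optional

EBCDIC_SPACE = 0x40  # assumption: segment data is EBCDIC-encoded


def is_blank(item: bytes) -> bool:
    """True if the slot is all nulls or all spaces."""
    return item == bytes(len(item)) or item == bytes([EBCDIC_SPACE]) * len(item)


def actual_items(raw_items: List[bytes], count: int,
                 depending_on: Optional[int] = None) -> List[bytes]:
    """Return the array slots that hold real data.

    count        -- the <count> from the OCCURS clause (the full array size)
    depending_on -- value of the DEPENDING ON field, or None if the clause is absent
    """
    slots = raw_items[:count]
    if depending_on is not None:
        return slots[:depending_on]     # the segment states the item count itself
    kept = []
    for item in slots:
        if is_blank(item):              # first blank slot ends the array
            break
        kept.append(item)
    return kept


def array_rows(concat_key: Dict[str, str], array_name: str,
               items: List[bytes]) -> List[dict]:
    """Build one row per item: concatenated key, counter, and the still-undecoded item."""
    return [
        {**concat_key, f"{array_name}_Rownum": n, "raw": item}
        for n, item in enumerate(items, start=1)
    ]
```

Decoding each item into its individual fields (for example MEDIC-NAME and MEDIC-QTY) is left out on purpose, because it depends on the copybook layout.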
Replicate cannot reload array tables on their own. Only the table associated with the segment itself can be reloaded, and reloading it automatically reloads its array tables.
Another important replication behavior to note is that any update to the segment will automatically delete all rows related to the segment in the target table for the array, and then insert the current data from the array to the target table of the array.
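The delete-then-insert behavior on update can be pictured with a small, self-contained sketch that uses an in-memory SQLite database as a stand-in target. Everything here is illustrative: the function names, the ARRAY_TABLE constant, the column types, and the use of underscores in the column names (MEDIC_NAME, MEDIC_QTY) are assumptions, and the statements Replicate actually issues against a target may differ.

```python
import sqlite3

ARRAY_TABLE = "TREATMNT__MEDICATION"   # illustrative array table name
KEY_COLUMNS = ("HOSPNAME", "WARDNO", "BEDIDENT", "TRDATE")  # concatenated key


def make_target() -> sqlite3.Connection:
    """Create an in-memory stand-in for the target array table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        f"""CREATE TABLE {ARRAY_TABLE} (
                HOSPNAME TEXT, WARDNO TEXT, BEDIDENT TEXT, TRDATE TEXT,
                MEDICATION_Rownum INTEGER, MEDIC_NAME TEXT, MEDIC_QTY INTEGER,
                PRIMARY KEY (HOSPNAME, WARDNO, BEDIDENT, TRDATE, MEDICATION_Rownum))"""
    )
    return conn


def refresh_array_rows(conn, key, medications):
    """Replace all array rows of one segment occurrence with its current items."""
    where = " AND ".join(f"{col} = ?" for col in KEY_COLUMNS)
    with conn:  # one transaction: commit on success, roll back on error
        conn.execute(f"DELETE FROM {ARRAY_TABLE} WHERE {where}", key)
        conn.executemany(
            f"INSERT INTO {ARRAY_TABLE} VALUES (?, ?, ?, ?, ?, ?, ?)",
            [(*key, n, name, qty)
             for n, (name, qty) in enumerate(medications, start=1)],
        )


conn = make_target()
key = ("MERCY", "01", "0001", "240101")
refresh_array_rows(conn, key, [("ASPIRIN", 1), ("IBUPROF", 2)])  # initial state
refresh_array_rows(conn, key, [("ASPIRIN", 4)])                  # segment updated
print(conn.execute(f"SELECT * FROM {ARRAY_TABLE}").fetchall())
```

Replacing the whole set of rows avoids having to work out which individual array items changed, at the cost of rewriting rows that stayed the same.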
Example one
Here is an example of a Customer segment:
- Customer - Segment
  - Customer_Id - String
  - Address - Struct
    - Street - String
    - State - String
    - Country - String
  - Orders_Num - Integer
  - Orders - Array depends on Orders_Num
    - Order_Details - Struct
      - Order_Status - String
      - Total_Amount - Number
      - Order_Date - Date
    - Items_Num - Integer
    - Items - Array depends on Items_Num
      - Quantity - Integer
      - Total_Amount - Number
Example two
In this example segment, Replicate will show three tables: Customer, Orders and Items, with the following fields:
- Customer
  - Customer_Id - PK
  - Street
  - State
  - Country
  - Orders_Num
- Orders
  - Customer_Id - PK
  - Orders_Rownum - PK
  - Order_Status
  - Total_Amount
  - Order_Date
  - Items_Num
- Items
  - Customer_Id - PK
  - Orders_Rownum - PK
  - Items_Rownum - PK
  - Quantity - Integer
  - Total_Amount - Number
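The mapping from the Customer segment to the three tables can be sketched as a short recursive function. This is a hedged illustration only: the flatten function and the SAMPLE_CUSTOMER record are hypothetical, the record is assumed to be already decoded into Python values, and counters are assumed to start at 1.

```python
from typing import Dict, List


def flatten(record: dict, table: str, key: dict,
            tables: Dict[str, List[dict]]) -> Dict[str, List[dict]]:
    """Split one decoded record into rows, one table per array level.

    Non-repeating structs (dict values) are dropped and their fields are kept
    under their original names. Arrays (list values) become child tables keyed
    by the parent key plus a <array>_Rownum counter. Structs nested inside
    other structs are not handled in this sketch.
    """
    row = dict(key)
    for name, value in record.items():
        if isinstance(value, dict):
            row.update(value)                       # hoist the struct's fields
        elif isinstance(value, list):
            for n, item in enumerate(value, start=1):
                flatten(item, name, {**key, f"{name}_Rownum": n}, tables)
        else:
            row[name] = value
    tables.setdefault(table, []).append(row)
    return tables


SAMPLE_CUSTOMER = {
    "Customer_Id": "C1",
    "Address": {"Street": "1 Main St", "State": "NY", "Country": "US"},
    "Orders_Num": 1,
    "Orders": [{
        "Order_Details": {"Order_Status": "OPEN", "Total_Amount": 30,
                          "Order_Date": "2024-01-01"},
        "Items_Num": 2,
        "Items": [{"Quantity": 1, "Total_Amount": 10},
                  {"Quantity": 2, "Total_Amount": 20}],
    }],
}

TABLES = flatten(SAMPLE_CUSTOMER, "Customer", {"Customer_Id": "C1"}, {})
for table_name, table_rows in TABLES.items():
    print(table_name, table_rows)
```

Each nesting level adds one counter column to the key, which is why Items rows carry both Orders_Rownum and Items_Rownum.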
Table selection in Replicate
Replicate requires each selected table to have all its ancestors selected as well.
For instance, in Example one above, Orders cannot be selected without its parent Customer, and the table Items cannot be selected without Orders and Customer.
If this requirement is not met, a fatal error will occur when the task starts, indicating which table is missing its parent.
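A selection can be checked against this rule before starting a task with a few lines of Python. The sketch below is an assumption-laden illustration rather than anything Replicate exposes: the PARENT_OF map and the missing_parents function are hypothetical, and the parent relationships are taken from the Customer example.

```python
from typing import Dict, Optional, Set

# Hypothetical parent map for the tables in the Customer example.
PARENT_OF: Dict[str, Optional[str]] = {
    "Customer": None,
    "Orders": "Customer",
    "Items": "Orders",
}


def missing_parents(selected: Set[str],
                    parent_of: Dict[str, Optional[str]]) -> Dict[str, str]:
    """Map each selected table to its nearest ancestor that is not selected."""
    problems = {}
    for table in sorted(selected):
        ancestor = parent_of[table]
        while ancestor is not None:
            if ancestor not in selected:
                problems[table] = ancestor
                break
            ancestor = parent_of[ancestor]
    return problems


print(missing_parents({"Customer", "Items"}, PARENT_OF))
print(missing_parents({"Customer", "Orders", "Items"}, PARENT_OF))
```

An empty result means the selection satisfies the requirement. Otherwise, each entry names a table together with the ancestor that must be added.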
Full load process
In order to avoid reading the same segment more than once, the IBM IMS endpoint only recognizes segments as Full Load tables.
This means that each sub-task in Full Load reads a segment and sends all the tables of that segment to the target endpoint.
Because target endpoints in Replicate use batch optimization in Full Load, it is not efficient to switch between tables frequently during Full Load. Therefore, the IBM IMS endpoint accumulates segment data in memory, before sending all the data records for each table in bulk.
The root table is sent to the target while segments are still being accumulated, which improves performance and avoids "starving" the target.
To control the amount of memory used to accumulate segments, you can set the Maximum full load segment cache size (MB) parameter in the Advanced tab.
Suspended tables
Suspended tables during full load
In Full Load, if a table is suspended, all the other tables in the same segment will be suspended as well.
Suspended tables during CDC
During CDC, if a table is suspended, the table and all its descendants will be suspended as well, but the ancestors of the table will remain active.
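The two suspension rules can be contrasted in a short sketch. It is illustrative only: the SEGMENT_OF and TABLE_PARENT maps and both functions are hypothetical, the tables come from the Customer example (where all three tables belong to the same segment), and a single parent map is assumed to describe which tables are descendants of which.

```python
from typing import Dict, Optional, Set

# Hypothetical metadata for the Customer example.
SEGMENT_OF: Dict[str, str] = {
    "Customer": "Customer", "Orders": "Customer", "Items": "Customer",
}
TABLE_PARENT: Dict[str, Optional[str]] = {
    "Customer": None, "Orders": "Customer", "Items": "Orders",
}


def suspended_in_full_load(table: str, segment_of: Dict[str, str]) -> Set[str]:
    """Full Load: every table of the same segment is suspended."""
    return {t for t, seg in segment_of.items() if seg == segment_of[table]}


def suspended_in_cdc(table: str, parent_of: Dict[str, Optional[str]]) -> Set[str]:
    """CDC: the table and all its descendants are suspended; ancestors stay active."""
    suspended = {table}
    changed = True
    while changed:                      # repeat until no new descendant is found
        changed = False
        for child, parent in parent_of.items():
            if parent in suspended and child not in suspended:
                suspended.add(child)
                changed = True
    return suspended


print(sorted(suspended_in_full_load("Orders", SEGMENT_OF)))
print(sorted(suspended_in_cdc("Orders", TABLE_PARENT)))
```

In this example, suspending Orders during Full Load takes Customer and Items with it, whereas during CDC only Orders and Items are affected.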