IBM i Disaster Recovery and High Availability
Disaster recovery
Objective
In a disaster recovery (DR) scenario, the objective is to quickly restore application services running within an LPAR on a standby (backup) environment in case of a long-term or disruptive outage of the primary (production) environment. The primary environment can be hosted either on customer premises or in Skytap, and the standby environment is expected to always be hosted in Skytap.
To restore services with minimal downtime, the standby environment is configured with an application stack and system configuration identical to the primary environment. The standby environment receives periodic updates of application and system configuration data from the primary. In the case of an outage on the primary environment, the application stack on the standby environment takes over and starts providing service based on the most recent state of the data it received from the primary. Service is unavailable for the time required to detect the failure of the primary and enable the application stack on the standby site. In addition, any data modified on the primary after the most recent update of the standby environment is lost.
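The outage window and the data-loss window described above can be estimated with simple arithmetic. The following Python sketch is illustrative only: the function name, the field names, and the example figures are assumptions rather than values from Skytap or any replication product, and it assumes each periodic update is applied to the standby instantaneously.

```python
from dataclasses import dataclass


@dataclass
class DrEstimate:
    worst_case_data_loss_min: float  # changes made since the last update are lost
    outage_min: float                # time until the standby starts serving


def estimate_dr_impact(update_interval_min: float,
                       detection_min: float,
                       activation_min: float) -> DrEstimate:
    """Estimate worst-case data loss and outage duration for a DR failover."""
    if min(update_interval_min, detection_min, activation_min) < 0:
        raise ValueError("durations must be non-negative")
    # Worst case: the primary fails just before the next periodic update,
    # so up to one full interval of changes never reaches the standby.
    data_loss = update_interval_min
    # Outage: time to detect the failure plus time to enable the standby stack.
    outage = detection_min + activation_min
    return DrEstimate(data_loss, outage)


# Hypothetical figures, chosen only for illustration.
estimate = estimate_dr_impact(update_interval_min=15,
                              detection_min=5,
                              activation_min=20)
print(estimate)
```

With these hypothetical inputs, up to 15 minutes of changes could be lost and service would be unavailable for about 25 minutes. Shortening the update interval reduces the first figure; faster detection and activation reduce the second.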
High availability
Objective
In a high availability (HA) scenario, the objective is to restore application services on a standby (secondary) LPAR in case of an outage of the primary LPAR. Compared to the DR scenario, HA requires that there be no service outage or data loss during the failover event, and that the secondary LPAR can take over immediately. To achieve this, the secondary LPAR has to receive data and configuration updates from the primary in real time, and the application stack on the secondary LPAR must be running and ready to take over the workload. Because of these tight timing constraints, the network latency between the primary and secondary LPARs must be very low, so both are typically placed within the same Skytap region.
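To illustrate why latency matters, consider a single stream of transactions in which each commit waits for the secondary to acknowledge it before the next one begins. Under that simplifying assumption, which does not describe any specific product, the network round-trip time caps the commit rate. The function name and the latency figures in this Python sketch are hypothetical.

```python
def max_sync_commits_per_second(round_trip_ms: float) -> float:
    """Upper bound on serialized commits per second when each commit
    waits one full network round trip for the secondary's acknowledgment."""
    if round_trip_ms <= 0:
        raise ValueError("round_trip_ms must be positive")
    return 1000.0 / round_trip_ms


# Hypothetical round-trip times, chosen only for illustration.
for label, rtt_ms in [("same region", 1.0), ("distant region", 50.0)]:
    limit = max_sync_commits_per_second(rtt_ms)
    print(f"{label}: at most {limit:.0f} commits/s per stream")
```

Real workloads overlap many transactions, so actual throughput can be higher, but the added delay per synchronous commit still grows with the distance between the two LPARs.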
Implementations
Logical replication-based DR and HA
HA and DR are architecturally similar for IBM i. The majority of IBM i DR and HA solutions use logical replication software to duplicate transactions from one IBM i environment to another, relying on IBM i remote journaling, a reliable and mature technology. There is a rich ecosystem of logical replication software and service providers. The primary difference between DR and HA architectures is whether record replication between primary and secondary is asynchronous (DR) or synchronous (HA). Logical replication software typically requires only a TCP/IP connection between primary and secondary to replicate transactions and doesn’t rely on facilities unavailable in Skytap.
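The difference between synchronous and asynchronous replication can be shown with a toy model. The Python sketch below is not how remote journaling or any vendor product is implemented; the class and method names are hypothetical, and in-memory lists stand in for the two environments.

```python
class ReplicatedPair:
    """Toy model of a primary that replicates records to a secondary."""

    def __init__(self, synchronous: bool):
        self.synchronous = synchronous
        self.primary = []    # records committed on the primary
        self.secondary = []  # records applied on the secondary
        self.pending = []    # records queued for asynchronous delivery

    def write(self, record: str) -> None:
        self.primary.append(record)
        if self.synchronous:
            # Synchronous: the write completes only once the secondary has it.
            self.secondary.append(record)
        else:
            # Asynchronous: the write completes now; delivery happens later.
            self.pending.append(record)

    def flush(self) -> None:
        """Deliver queued records to the secondary (no-op when synchronous)."""
        self.secondary.extend(self.pending)
        self.pending.clear()

    def lost_on_failover(self) -> list:
        """Records on the primary that the secondary never received."""
        return self.primary[len(self.secondary):]


sync_pair = ReplicatedPair(synchronous=True)
async_pair = ReplicatedPair(synchronous=False)
for pair in (sync_pair, async_pair):
    pair.write("order-1")
    pair.flush()
    pair.write("order-2")  # the primary fails before the next flush

print(sync_pair.lost_on_failover())   # []
print(async_pair.lost_on_failover())  # ['order-2']
```

In the synchronous case nothing is lost at failover, which matches the HA objective. In the asynchronous case the records written since the last delivery are lost, which matches the data-loss window described for DR.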
There are five primary providers selling logical replication-based HA for IBM i, and the majority of their solutions can also be used to implement DR by replicating records asynchronously to a remote standby environment. The available logical replication-based HA/DR solutions include:
- Assure MIMIX™ Software Availability and Quick EDD/HA from SyncSort
- Robot HA from HelpSystems
- Maxava HA from Maxava
- Rocket iCluster from Rocket Software
- HA4i from Shield Advanced Solutions