Disaster Recovery planning and Spectrum Scale

There are multiple ways to use Spectrum Scale to prepare for disaster and help with recovery.

An application is said to be active/passive if it runs only on a primary site with a particular data set, and runs on a secondary site with that data set only if the primary site is down. Active/passive configurations may have a specific Recovery Point Objective (RPO), a measure of how much data loss may be tolerated, expressed as a length of time. For example, if a primary site fails and the RPO is 30 minutes, we meet the RPO if the secondary site, once in production, has lost at most the last 30 minutes of data. Loosely speaking, the replication interval must be shorter than the RPO, and each replication run must also finish within the RPO window. It may take longer than the RPO to put the secondary site into production! The Recovery Time Objective (RTO) is a separate measure of how long it takes to fail over to the secondary site; it includes time to reconfigure applications, etc.
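To make the RPO arithmetic concrete, here is a minimal sketch. It assumes a simple model that the post does not spell out: replication runs start at a fixed interval, each run takes a fixed transfer time, and a copy is only usable once its run has finished. Under that assumption the worst-case data loss is the interval plus the transfer time. The function names are made up for illustration.

```python
from datetime import timedelta


def worst_case_data_loss(interval: timedelta, transfer_time: timedelta) -> timedelta:
    """Worst-case age of the newest complete copy on the secondary site.

    Assumed model: runs start every `interval` and take `transfer_time`.
    If the primary fails just before a run completes, the newest usable
    copy reflects the state at the start of the previous run.
    """
    return interval + transfer_time


def meets_rpo(interval: timedelta, transfer_time: timedelta, rpo: timedelta) -> bool:
    """True if the schedule keeps worst-case data loss within the RPO."""
    return worst_case_data_loss(interval, transfer_time) <= rpo


rpo = timedelta(minutes=30)
# 20-minute interval, 5-minute transfer: 25 minutes worst case, within the RPO.
print(meets_rpo(timedelta(minutes=20), timedelta(minutes=5), rpo))
# 25-minute interval, 10-minute transfer: 35 minutes worst case, RPO missed.
print(meets_rpo(timedelta(minutes=25), timedelta(minutes=10), rpo))
```

Note that the second schedule misses the RPO even though its interval is shorter than 30 minutes, which is why the transfer time has to be counted as well.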

An application is running active/active if it is running on more than one site with the same data set, i.e., there is no clear-cut “primary” and “secondary” site.

Options with Spectrum Scale

Besides the choice of configuring an application to be either active/passive or active/active, we have different layers in the file system where we might choose to do the replication. This table shows the possibilities:

| Layer | Active/Passive | Active/Active |
|---|---|---|
| Application layer | Individual applications may have their own means of doing asynchronous replication. Aspera, rsync: these need to run regularly (maybe use cron). | Individual applications may have their own means of doing synchronous replication. |
| File system layer | Spectrum Scale AFM-DR | Spectrum Scale failure groups and replication |
| Block layer | Block synchronous replication, configured for active/passive; block point-in-time copy | Block synchronous replication, configured for active/active |
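The application-layer, active/passive entry in the table above mentions running rsync regularly. As a rough sketch of what such a job might look like, the script below builds an rsync command and prints it. The source directory, host name `dr-site`, and destination path are hypothetical placeholders, and the choice of the `-a` and `--delete` options is an assumption about what a mirror-style copy would want, not something the post prescribes.

```python
import shlex


def build_rsync_command(source_dir: str, dest_host: str, dest_dir: str) -> list:
    """Build an rsync command that mirrors source_dir to dest_host:dest_dir."""
    # -a (archive) preserves permissions, times, and symlinks.
    # --delete removes destination files that no longer exist on the source.
    # The trailing slash copies the directory's contents, not the directory itself.
    source = source_dir.rstrip("/") + "/"
    return ["rsync", "-a", "--delete", source, f"{dest_host}:{dest_dir}"]


# Hypothetical paths and host name, for illustration only.
cmd = build_rsync_command("/gpfs/fs1/projects", "dr-site", "/gpfs/fs1/projects")
print(shlex.join(cmd))
# To actually run it, pass cmd to subprocess.run(cmd, check=True).
```

A cron entry could then invoke a script like this at an interval chosen so that the RPO is met.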

How do we choose where to do the replication? The general rule is: Use replication at the highest layer of the hardware/software stack where it both works well and provides the features required.
1. If the application can effectively do its own replication, use that.
2. Otherwise, if one of Spectrum Scale’s replication modes meets the needs, use that.
3. Otherwise, choose block level replication.

There are several reasons to prefer the higher layers:
1. Replication at lower layers requires upper layers to first be quiesced in a consistent state.
2. Replicating at lower layers generally transfers more data.
3. Replicating at lower layers will also replicate any upper layer corruption.
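The selection rule described above (application first, then Spectrum Scale, then block) can be written as a small decision function. This is only an illustration of the ordering; the function and parameter names are invented here, and deciding whether a layer "works well and provides the features required" remains a judgment made outside the code.

```python
def choose_replication_layer(app_replicates_well: bool, scale_mode_fits: bool) -> str:
    """Pick the highest layer of the stack that is suitable for replication.

    app_replicates_well: the application can effectively do its own replication.
    scale_mode_fits: one of Spectrum Scale's replication modes meets the needs.
    """
    if app_replicates_well:
        return "application"
    if scale_mode_fits:
        return "file system"
    # Fallback when neither higher layer is suitable.
    return "block"


print(choose_replication_layer(app_replicates_well=False, scale_mode_fits=True))
```

The application layer wins whenever it is suitable, even if the file system layer would also work, which reflects the preference for the highest workable layer.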
