XtremIO Snapshot Refresh

XtremIO 4.0 added the ability to refresh a snapshot from either the original source volume or any other snapshot copy of that volume. Just like XtremIO snapshots themselves, this was implemented in an extremely cool way, but also in a way that can be a little difficult to get your head around when it comes to the resulting volumes on the array.

Hopefully the description here will allow things to "click" in your head, and once they do you'll realize just how cool the implementation is!

First, let's get some terms out of the way...

Consistency Group - A Consistency Group (CG for short) is a group of volumes that can be used to create a consistent snapshot across all of the volumes in the group. Whilst there's no need to use a CG to get a consistent snapshot (the same can be done with tags, or by just selecting multiple volumes manually), using a Consistency Group makes things easier to manage - especially when it comes to refreshing snapshots.

Snapshot Set - A Snapshot Set (SS) is one of the results of a snapshot operation - it's similar to a CG (and can actually be used as one), but contains the snapshot volumes that were created when the snapshot was taken. So if you take a snapshot of a CG with 4 volumes in it, then 4 new volumes will be created (the snapshot volumes), and those 4 volumes will be in a newly created snapshot set.

Refreshing a snapshot

There are really two ways to look at what a refresh operation does - from the host perspective, and from the array perspective.

From the Host perspective, a refresh operation results in the contents of the volumes that are being refreshed being instantly updated with the content of the volumes that they are being refreshed from. This is exactly what you should expect - other than the time it takes to occur (instant versus potentially minutes or hours with other snapshot implementations), this is no different to what you'd see for any other snapshot refresh implementation.

The speed of the operation also means that there's no need for things like blocking access to the LUN during the refresh, but obviously from an application perspective there is still a need to stop accessing the LUN on the target side whilst the data changes - such as stopping applications, unmounting filesystems, exporting disk groups, etc.

From the Array perspective what is actually being done is unlike any other snapshot refresh implementation, which is one of the reasons that it can be done so quickly and with no performance impact.
Rather than actually refreshing the existing snapshot, which would require copying (at least) metadata within the array, we simply take a new snapshot - an operation that completes in basically zero time. We then swap the "personality" (or what is sometimes referred to as the "SCSI Face") of the new volume - its volume name, its NAA, any mappings it has (including the corresponding SCSI IDs), and even the creation time - with that of the volume that we are refreshing.

This operation is completely transparent to the host, so there's no need to rescan disks/etc, but when the host makes a request to the LUN the array now directs that request to the newly created snapshot volume. "Refresh" completed!

As a new snapshot has been taken, a new Snapshot Set is also created to contain the new volumes. The name for this Snapshot Set can be specified when doing the "refresh", or if none is given it will use a default name of SnapshotSet.<timestamp>. As the volume names for the snapshots are moved from the old snapshot volumes to the new ones during the refresh operation, it will appear that these volumes have "moved" from their original snapshot set into the new one, when in fact the volumes themselves haven't moved, but their corresponding names/SCSI face/mappings/etc have!

Of course this also means that we now have 2 snapshot sets - the original snapshot set containing the original (now unused) snapshot volumes, and the new snapshot set with the currently-mapped snapshot volumes.
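
If it helps to see the "swap" written down, here is a toy Python model of the idea described above. It is only a sketch for building intuition, not how the array is actually implemented: the class names, the generated "snap-N" names and the NAA strings are all made up for illustration, and a plain string stands in for the volume's data.

```python
from dataclasses import dataclass, field
from itertools import count


@dataclass
class Personality:
    """Host-visible identity of a volume (the "SCSI face"). Hypothetical model."""
    name: str
    naa: str
    mappings: dict = field(default_factory=dict)  # host name -> LUN ID


@dataclass
class Volume:
    data: str          # stands in for the volume's contents
    face: Personality  # the identity the host currently sees for this data


_ids = count(1)  # used only to generate made-up names for new snapshots


def take_snapshot(source: Volume) -> Volume:
    """Create a new, unmapped snapshot volume holding the source's contents."""
    n = next(_ids)
    return Volume(data=source.data,
                  face=Personality(name=f"snap-{n}", naa=f"naa.new{n}"))


def refresh(target: Volume, source: Volume) -> Volume:
    """Toy refresh: snapshot the source, then swap identities with the target."""
    new = take_snapshot(source)
    target.face, new.face = new.face, target.face
    return new  # carries the target's old name, NAA and mappings


prod = Volume("current production data", Personality("prod-vol", "naa.0001"))
dev = Volume("stale copy", Personality("dev-vol", "naa.0002", {"host1": 1}))
new = refresh(dev, prod)
print(new.face.name, "->", new.data)
```

Note that nothing is copied into the old volume: it still holds the stale data, it just no longer owns the name, NAA or mappings that the host is using.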

In some situations it may be desirable to keep the pre-refresh copy of the data (think of it as a "backup" copy of that data) - perhaps to revert to it at a later stage, or just as an actual backup. In other cases this extra copy may not be needed, so it can be deleted.

Thankfully this deletion can be handled automatically as a part of the refresh - either by unselecting the "Keep the Backup..." option when using the GUI, or by passing the "no-backup" option to either the CLI or the REST API.

Before Refresh - a single Snapshot Set containing 4 volumes which are mapped to a host :

After Refresh without using "no-backup". A new Snapshot Set is created with the new snapshot volumes in it. The volumes that were originally in the old snapshot set will appear to have moved to the new set due to their names being reassigned to the new snapshot - along with their host mappings/NAA/etc (notice how the "Mounted" flag has moved) :

If the no-backup option had been selected, then the original snapshot set (SnapshotSet.1452152517217) would have been automatically deleted.
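
The same before/after picture can be sketched as a bit of Python bookkeeping. Again, this is a simplified model and not the array's real logic: the function name is hypothetical, the ".backup" suffix for the old volumes is an invented placeholder (the real names depend on the backup options you pass), and generating the default name from a millisecond timestamp is my assumption based on how the default names look.

```python
import time


def refresh_snapshot_set(sets, target_set, new_set_name=None, no_backup=False):
    """Toy model of what happens to snapshot sets during a refresh.

    `sets` maps a snapshot set name to the list of volume names it contains.
    """
    if new_set_name is None:
        # Assumed format: SnapshotSet.<timestamp in milliseconds>
        new_set_name = f"SnapshotSet.{int(time.time() * 1000)}"
    if new_set_name in sets:
        raise ValueError(f"snapshot set {new_set_name!r} already exists")

    names = sets[target_set]
    # The new snapshot volumes take over the existing volume names, so the
    # names appear to "move" into the newly created snapshot set.
    sets[new_set_name] = list(names)

    if no_backup:
        # The old set and its now-unused volumes are deleted automatically.
        del sets[target_set]
    else:
        # The old volumes stay behind under different names (hypothetical suffix).
        sets[target_set] = [f"{n}.backup" for n in names]
    return new_set_name


sets = {"SnapshotSet.1452152517217": ["vol1", "vol2", "vol3", "vol4"]}
refresh_snapshot_set(sets, "SnapshotSet.1452152517217", "AfterRefresh")
print(sets)
```

Calling it with no_backup=True instead leaves only the new set behind.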

Refresh vs Restore vs Create-snapshot-and-reassign

The actual command used to refresh a snapshot set depends on where you are doing it.

From the GUI there are two related but different options - Refresh and Restore. Refresh is the unconstrained option - it allows you to refresh a snapshot or an entire snapshot set from any other snapshot/snapshot set of the same volumes. You can refresh from a read-write or a read-only volume, and you can refresh any number of levels up or down in the tree - as long as two volumes share the same parent volume, you can refresh between them in either direction.

Restore is a deliberately-constrained version of refresh. Restore only allows refreshing from a direct child volume, and only when that child is a read-only volume. It is intended to be used when restoring a volume from a known-good, unmodified (thus read-only) backup-style copy of the parent volume. Functionally it does exactly the same as refresh, however it will only be available as an option when a read-only child copy exists. If for some reason you can't use restore (e.g. the snapshot you want to restore from is read-write) then you can always fall back to refresh - just without the guarantee that the data on that volume is an immutable copy of the original.
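
To make the difference between the two options concrete, the rules above can be written as a couple of small checks. This is just my reading of the rules expressed in Python, not the GUI's actual validation code; the Vol class and the function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Vol:
    name: str
    parent: Optional["Vol"] = None  # None for the original source volume
    read_only: bool = False


def root(v: Vol) -> Vol:
    """Walk up the snapshot tree to the original source volume."""
    while v.parent is not None:
        v = v.parent
    return v


def can_refresh(target: Vol, source: Vol) -> bool:
    """Refresh: any two different volumes in the same snapshot tree."""
    return target is not source and root(target) is root(source)


def can_restore(target: Vol, source: Vol) -> bool:
    """Restore: the source must be a read-only direct child of the target."""
    return source.parent is target and source.read_only


prod = Vol("prod")
backup = Vol("prod.backup", parent=prod, read_only=True)
dev = Vol("dev", parent=prod)
dev_copy = Vol("dev.copy", parent=dev)

print(can_restore(prod, backup))    # read-only direct child
print(can_restore(prod, dev))       # read-write child, so use refresh instead
print(can_refresh(dev_copy, prod))  # several levels apart, same tree
```

Anything that passes the restore check also passes the refresh check, which matches the description of restore as a constrained version of refresh.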

From the CLI there is only a single refresh-related command, with the somewhat unfriendly name of "create-snapshot-and-reassign". As odd as that command name seems, when you consider what this command does it's actually a good description - it creates a new snapshot, and then "reassigns" the characteristics of the old snapshot (volume name, NAA, SCSI ID, etc) to the new one.

The syntax of the command itself is fairly self-explanatory - just remember that the "from" options relate to the object that data will appear to be copied from, and the "to" options relate to the object it'll be copied to. The "backup" options refer to the name that will be given to the existing snapshot volumes if they are not automatically deleted with the "no-backup" option.

The REST API is similar to the CLI; however, that'll be covered in an upcoming post in my Using the XtremIO REST API series.