
Querying Performance Data from the XtremIO REST API

(If you're not already familiar with using the XtremIO REST API, then I'd suggest reading my series on using it first.)

XtremIO has a very powerful API for accessing performance data from the XMS, but it can be a little difficult to get your head around. The REST API Guide, available on support.emc.com, does a good job of documenting the options, but not really how to use them.

Understanding XtremIO Performance Data

Before accessing the data from the API, it helps to understand a little about how performance data is collected and stored by XtremIO.

Every 5 seconds the XMS attempts to collect hundreds of performance metrics from the array - not just at the array level, but also at the level of things like Targets, Initiators, and Volumes.

This is obviously a lot of data, so over time the data is consolidated to less frequent intervals. The initial "raw data" is kept for 3 days. One minute data is kept for 7 days, ten minute data for 30 days, and so on. (See the XtremIO Users Guide for full details.)

What is interesting is that this consolidation is done soon after the data is collected, not once the data reaches the expiry time for the previous timeframe (eg, 3 days for the 5 second raw data). Thus for (say) data collected yesterday the system will have data available at each of "raw data" (5 seconds), "1 minute", "10 minute", "1 hour" and "1 day" granularity. After 3 days, the raw data will be deleted, but the other points will remain. After 7 days, the 1 minute data will be deleted, and so on.

As we'll see shortly, when querying data we can specify which of the granularities we want to use, or we can let the system decide for us.

When the consolidation occurs, the system will maintain 3 data points for each time period - a minimum, a maximum, and an average. However, for the initial raw data it will have only a single value (which is in itself generally an average over the 5 second collection interval).

Querying the Data

All performance queries are made using a single URI - /api/json/v2/types/performance

At a minimum you need to pass an "entity" parameter to specify which type of object you want performance data for. The list of possible entities is contained in the REST API guide, but you can also get a list by passing in an invalid entity (eg, .../performance?entity=A) which will return a response containing the full list :

{
    "message": "Command Syntax Error: entity property must have one of the following values: [SnapshotGroup, Initiator, Target, VolumeTag, XEnv, DataProtectionGroup, Volume, Cluster, Tag, InitiatorGroup, InitiatorGroupTag, SSD, TargetGroup, Xms]", 
    "error_code": 400
}
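
If you want that list in a script rather than on screen, the error message can be pulled apart. This is only a rough sketch, assuming the message keeps exactly the format shown above; the helper name entities_from_error is my own and not part of the API :

```python
import json
import re

def entities_from_error(body):
    """Extract the entity names from the 'Command Syntax Error' response body."""
    message = json.loads(body)["message"]
    # The valid names sit between the square brackets at the end of the message.
    match = re.search(r"\[([^\]]*)\]", message)
    if match is None:
        return []
    return [name.strip() for name in match.group(1).split(",")]

# The response shown above, as a single JSON string.
sample = (
    '{"message": "Command Syntax Error: entity property must have one of the '
    'following values: [SnapshotGroup, Initiator, Target, VolumeTag, XEnv, '
    'DataProtectionGroup, Volume, Cluster, Tag, InitiatorGroup, '
    'InitiatorGroupTag, SSD, TargetGroup, Xms]", "error_code": 400}'
)
entities = entities_from_error(sample)
print(entities)
```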

Given that many of these entities return a LOT of data by default, let's start with one of the simpler ones - "XEnv", which will return performance data for the "X Environments", otherwise known as the CPUs!

Sending a request for /api/json/v2/types/performance?entity=XEnv (note that the entity name IS CASE SENSITIVE, so in this case the X and E need to be upper case!) returns a JSON response which basically consists of 2 sections. The first is a number of "counter" entries that look like :

"counters": [
    [
        1513555200000, 
        "529d231911a84b0a879a557c250abcd4", 
        "X1-SC1-E1", 
        1, 
        2.2815487587808998
    ], 
    [
        1513555200000, 
        "33ba4bf1e2ea4175905098bdc9a825c2", 
        "X1-SC1-E2", 
        2, 
        2.1723940081082902
    ], 

The second is a "members" entry :

"members": [
    "timestamp", 
    "guid", 
    "name", 
    "index", 
    "avg__cpu_usage"
], 

We need to use the two of these together - the multiple "counters" entries include the actual data, whilst the (single) members entry provides the order of the fields in the counters.

Thus for the example above we've got a data point with a timestamp of 1513555200000, for the XEnv with the name "X1-SC1-E1" (X-brick 1, Storage Controller 1, Environment/CPU 1), for which the performance counter "avg__cpu_usage" has a value of 2.2815487587808998

We've also got a second data point with the same timestamp, but for the XEnv "X1-SC1-E2" (CPU2), with an "avg__cpu_usage" value of 2.1723940081082902. There were of course hundreds of other data points returned - in part because we didn't specify a time frame for the query!
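
In code, pairing the two sections up is a one-liner. Here's a small Python sketch using the two data points above (the function name rows_as_dicts is just for illustration). Note that treating the timestamp as milliseconds since the Unix epoch is my assumption based on how the values look :

```python
from datetime import datetime, timezone

def rows_as_dicts(response):
    """Pair each 'counters' row with the field names listed in 'members'."""
    members = response["members"]
    return [dict(zip(members, row)) for row in response["counters"]]

# The parsed JSON response, trimmed to the two data points shown above.
response = {
    "counters": [
        [1513555200000, "529d231911a84b0a879a557c250abcd4", "X1-SC1-E1", 1, 2.2815487587808998],
        [1513555200000, "33ba4bf1e2ea4175905098bdc9a825c2", "X1-SC1-E2", 2, 2.1723940081082902],
    ],
    "members": ["timestamp", "guid", "name", "index", "avg__cpu_usage"],
}

rows = rows_as_dicts(response)
for row in rows:
    # Assumption: timestamps are milliseconds since the Unix epoch (UTC).
    when = datetime.fromtimestamp(row["timestamp"] / 1000, tz=timezone.utc)
    print(when.isoformat(), row["name"], row["avg__cpu_usage"])
```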

The name of the data field "avg__cpu_usage" actually tells us that we were NOT seeing the raw data in this result, but instead one of the consolidated values, and specifically the "average" value (not minimum or maximum for that consolidated period). We'll see below how to get the raw data or the min/max values. The double-underscore in the name is used to indicate the "avg" comes from the XMS's consolidation of the data, and not an average value from the array itself.

Limiting the Results

By default, the number of data points returned can be very large - especially if you're not specifying a time range (which we'll cover in a moment).

You can filter the returned results in 2 separate ways - either by the entities (eg, one or more specific volumes), or by the properties returned (eg, bandwidth, iops, latency, etc).

To filter by entity you use the "entity-name=XXX" option on the URL. Multiple of these can be used in the same query to return data for multiple entities.

eg, to get data just for the volume "MyVol1" you could use :

/api/json/v2/types/performance?entity=Volume&entity-name=MyVol1

To get data for three separate volumes, simply list them all :

/api/json/v2/types/performance?entity=Volume&entity-name=MyVol1&entity-name=MyVol2&entity-name=MyVol3

Even with only a few entities the results can still be very large. For example, "Volume" entities return around 24 different performance metrics for each time point, including bandwidth (avg__bw), read bandwidth (avg__rd_bw), write bandwidth (avg__wr_bw), IOPS, read IOPS, write IOPS, etc

Most likely you only need a few of those, so you can specify which properties you want returned using the "prop=XXX" option. The best way to use this is to do a query without it to see the full list of properties, and then specify just the ones you want. As before, you can pass multiple "prop" options to get multiple properties returned.

eg, to get the average bandwidth and IOPS for both MyVol1 and MyVol2 :

/api/json/v2/types/performance?entity=Volume&entity-name=MyVol1&entity-name=MyVol2&prop=avg__bw&prop=avg__iops
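
If you're building these URLs in code, the repeated parameters are easiest to handle as a list of pairs rather than a dictionary (which can only hold each key once). A minimal Python sketch, where performance_path is a name I've made up for the example :

```python
from urllib.parse import urlencode

BASE_PATH = "/api/json/v2/types/performance"

def performance_path(entity, names=(), props=()):
    """Build the request path, repeating entity-name and prop once per value."""
    params = [("entity", entity)]
    params += [("entity-name", name) for name in names]
    params += [("prop", prop) for prop in props]
    return BASE_PATH + "?" + urlencode(params)

path = performance_path("Volume", ["MyVol1", "MyVol2"], ["avg__bw", "avg__iops"])
print(path)
```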

Note that there are a few properties returned regardless of whether you ask for them or not, such as timestamp, name, index and guid.

Filtering by Time

As mentioned above, the XMS consolidates data over time, with data being available for up to 2 years (although at a low granularity - one point per day!), or as frequently as every 5 seconds.

When it comes to selecting what data we're interested in, there are a few different options that play a part.

The first is "time-frame", which can be one of four specific ranges "last_hour", "last_day", "last_week" or "last_year", or alternatively "real_time" or "custom_time".

When using custom_time, either or both of "from-time" and "to-time" can be specified to control the range - if either (or both) parameters are skipped then the time of the oldest (from-time) or newest (to-time) data available will be used.

from/to-time are specified in GMT time, in the format "YYYY-MM-DD hh:mm:ss". Note that's a space in the middle, which is not a valid character to have in a URL. If the software you're using to make the request doesn't automatically do so, you should replace it with "%20" which is the encoded equivalent of a space (Space is ASCII code 32 decimal, which is 20 hex).
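
As a rough illustration, this is one way to produce the time string and its encoded form in Python. The helper name xms_time is hypothetical, and it expects a timezone-aware datetime so the conversion to GMT is unambiguous :

```python
from datetime import datetime, timezone
from urllib.parse import quote

def xms_time(moment):
    """Format a timezone-aware datetime as a GMT 'YYYY-MM-DD hh:mm:ss' string."""
    return moment.astimezone(timezone.utc).strftime("%Y-%m-%d %H:%M:%S")

start = datetime(2018, 5, 1, 0, 0, 0, tzinfo=timezone.utc)
raw = xms_time(start)
# safe=":" leaves the colons alone, so only the space is encoded (as %20).
encoded = quote(raw, safe=":")
print(raw)
print(encoded)
```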

In addition to "time-frame" you can also specify which level of data consolidation, or "granularity", you want the data from. ie, whether you want the "raw" data (5 second), "one_minute", "ten_minute", "one_hour" or "one_day" values. Of course, not all data is available for all time ranges, so if you ask for a "time-frame" of "last_year" with a "granularity" of "raw", you'll actually end up getting only the past 3 days' worth of data - as that is all that is available for the "raw" setting.

It's also possible to pass a "granularity" of "auto", in which case the system will automatically determine the granularity based on the time range you've specified.

Aggregation Type

The final parameter you may want to use is "aggregation-type". By default, when querying data that has been consolidated (ie, everything except "raw" data) you will be given the average over the time period, and the name of any fields that have been aggregated will be prefixed with "avg__" (eg, avg__iops). You can instead explicitly specify that you want the "min", "max" or "avg" values - and as with the other options you can specify multiple if needed.

/api/json/v2/types/performance?entity=Volume&entity-name=MyVol1&aggregation-type=min&aggregation-type=max&aggregation-type=avg

This will give you results that include all of min__bw, max__bw and avg__bw (and of course the same for every other property returned!)

This also leads to one of the quirks of the performance API. If you end up querying the "raw" data - either because you explicitly request it (with granularity), or because you specify a time-frame where the system picks that data automatically (eg, last_hour) - then the data isn't aggregated. This means that there are no min/max/avg values, but more importantly it means that the property names returned do NOT include the avg__ at the start like they do when querying other time frames.

eg, if you query XEnv performance data for the past hour, the response will have the following entries :

"members": [
    "timestamp", 
    "guid", 
    "name", 
    "index", 
    "cpu_usage"
],

But if you instead query for the data for the past day (with no other changes), you will receive :

"members": [
    "timestamp", 
    "guid", 
    "name", 
    "index", 
    "avg__cpu_usage"
],

There are 2 potential ways to handle this - either explicitly specify the granularity of the data you want to receive, or programmatically handle the avg__ prefix on the results if it exists.
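
For the second approach, something along these lines would do. It's a sketch rather than anything official : split_member is my own helper, and it assumes the only prefixes you'll ever see are min, max and avg followed by a double underscore :

```python
AGGREGATIONS = ("min", "max", "avg")

def split_member(name):
    """Return (aggregation, property) for a field name from 'members'.

    Names without a recognised prefix come back as (None, name).
    """
    prefix, separator, rest = name.partition("__")
    if separator and prefix in AGGREGATIONS:
        return prefix, rest
    return None, name

# The two 'members' lists shown above: past hour (raw) and past day (consolidated).
raw_members = ["timestamp", "guid", "name", "index", "cpu_usage"]
day_members = ["timestamp", "guid", "name", "index", "avg__cpu_usage"]

raw_names = [split_member(member)[1] for member in raw_members]
day_names = [split_member(member)[1] for member in day_members]
print(raw_names)
print(day_names)
```

With the prefix stripped, both responses end up with the same field names, so the rest of your code doesn't need to care which granularity the XMS picked.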

"null" values

There is one other thing to be aware of when using "raw" data, which is that although the system attempts to collect data every 5 seconds, for various reasons it occasionally fails. eg, there might be a network issue between the XMS and the array that causes the data collection to fail.

In this case, when querying the raw data you will still receive entries, however the value will be "null". For example, for an XEnv query where no data is available you'll see :

   [
        1527861190000, 
        "529d231911a84b0a879a557c250abcd4", 
        "X1-SC1-E1", 
        1, 
        null
    ], 

There is a second case in which "null" will be returned. If a volume or a snapshot has never been mapped to a host (ie, it doesn't have an NAA assigned), then the array doesn't generate performance data for it - as it can't have any! In this case, the data for both the raw data and the consolidated values will be "null".

[
        1527938480000, 
        "a3ea0523b56a49c4a200eb04070bb210", 
        "NewVol1", 
        4, 
        null, 
        null, 
        null, 
        null, 
        null,

These entries should generally be ignored as invalid (or at best, uninteresting) data points.
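
A simple way to skip them in Python is to drop any row where every metric is null (JSON null becomes None once parsed). This is only a sketch : drop_empty_rows is a made-up name, the second data point is invented purely so there's something left to keep, and the list of "identity" fields is taken from the ones mentioned earlier as always being returned :

```python
import json

# Fields that identify the data point rather than carry a metric.
IDENTITY_FIELDS = {"timestamp", "guid", "name", "index"}

def drop_empty_rows(response):
    """Return only the 'counters' rows with at least one non-null metric."""
    members = response["members"]
    metric_columns = [i for i, field in enumerate(members) if field not in IDENTITY_FIELDS]
    return [
        row for row in response["counters"]
        if any(row[i] is not None for i in metric_columns)
    ]

# First row is the null entry shown above; the second row is illustrative only.
body = """
{
  "counters": [
    [1527861190000, "529d231911a84b0a879a557c250abcd4", "X1-SC1-E1", 1, null],
    [1527861195000, "529d231911a84b0a879a557c250abcd4", "X1-SC1-E1", 1, 2.5]
  ],
  "members": ["timestamp", "guid", "name", "index", "cpu_usage"]
}
"""
kept = drop_empty_rows(json.loads(body))
print(kept)
```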

Putting it all together

Let's say I wanted to get the average IOPS and Bandwidth for my two Oracle volumes for the first week of May 2018, with one data point per hour. So we've got :
entity=Volume because I'm looking for data on a volume (remember that the entity name is case sensitive)
entity-name=Oracle1&entity-name=Oracle2 as those are my 2 Oracle volume names
prop=avg__bw&prop=avg__iops because these are the only two properties I'm interested in
time-frame=custom_time&from-time=2018-05-01 00:00:00&to-time=2018-05-07 23:59:59 (Don't forget to replace the spaces with %20 if needed!)
granularity=one_hour

Giving me a full query of

/api/json/v2/types/performance?entity=Volume&entity-name=Oracle1&entity-name=Oracle2&prop=avg__bw&prop=avg__iops&time-frame=custom_time&from-time=2018-05-01%2000:00:00&to-time=2018-05-07%2023:59:59&granularity=one_hour

If I wanted to get both the maximum AND average values, then I'd need to add :
prop=max__bw&prop=max__iops (In addition to the existing prop entries)
aggregation-type=max&aggregation-type=avg in order to get BOTH maximum and average

Giving :

/api/json/v2/types/performance?entity=Volume&entity-name=Oracle1&entity-name=Oracle2&prop=avg__bw&prop=avg__iops&prop=max__bw&prop=max__iops&time-frame=custom_time&from-time=2018-05-01%2000:00:00&to-time=2018-05-07%2023:59:59&granularity=one_hour&aggregation-type=max&aggregation-type=avg
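
If you'd rather not assemble that by hand, here's a sketch of a Python helper that builds the same kind of query. The function name build_query and its arguments are my own invention. It takes the bare property names (bw, iops) and generates one prop entry per aggregation type, which saves remembering to add both. The entity is spelled "Volume" here since the entity name is case sensitive :

```python
from itertools import product
from urllib.parse import quote, urlencode

def build_query(entity, names, props, aggregations, start, end, granularity):
    """Assemble a custom_time performance query path from its parts."""
    params = [("entity", entity)]
    params += [("entity-name", name) for name in names]
    # One prop per aggregation/property pair, eg avg__bw, avg__iops, max__bw ...
    params += [("prop", f"{agg}__{prop}") for agg, prop in product(aggregations, props)]
    params += [
        ("time-frame", "custom_time"),
        ("from-time", start),
        ("to-time", end),
        ("granularity", granularity),
    ]
    params += [("aggregation-type", agg) for agg in aggregations]
    # quote_via=quote encodes spaces as %20 (not '+'); safe=":" keeps the colons.
    return "/api/json/v2/types/performance?" + urlencode(params, quote_via=quote, safe=":")

query = build_query(
    "Volume",
    ["Oracle1", "Oracle2"],
    ["bw", "iops"],
    ["avg", "max"],
    "2018-05-01 00:00:00",
    "2018-05-07 23:59:59",
    "one_hour",
)
print(query)
```

The only difference from the hand-written query is the order of the two aggregation-type entries, which I wouldn't expect to matter.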

Final Thoughts

Whilst this may seem complex, once you get your head around the options it's actually fairly simple. As with all REST API queries, the best option is to simply play around with some queries and see what you get back - either with a tool like Postman or HTTPRequester in Firefox, or even using something like cURL. Try different options, and see what you get back. Use prop and entity-name to limit the number of entries.

There's also another option I haven't mentioned which is "limit=X" which can be used to limit the number of results returned. Using this in a final query probably isn't a good idea, but it can be useful when learning and testing to make sure you don't get too much data back from a query.