Bridgeworks CEO David Trossell speaks to Data Centre Metrics Magazine this month about the metrics needed to assess the efficiency of your data centre.
March 29, 2018
Made To Measure
Traditional datacentre metrics include power usage effectiveness (PUE), uptime, downtime, network performance to and from the datacentre, heat gain/loss, OPEX, solutions provided, back-up and restore times, SLAs, etc. Each metric measures a different aspect of datacentre performance. For example, a datacentre with a good PUE rating will often be one with green credentials, because relatively little of its power is consumed by cooling and other overheads rather than by the IT equipment itself. In turn, that efficiency reduces the datacentre’s operating costs and even its carbon emissions.
Industry group The Green Grid developed PUE as a metric in 2007, and it has now created another called the Performance Indicator (PI). The difference is that PUE is mainly concerned with power usage, while PI is more concerned with datacentre cooling. Both metrics are, however, designed to measure efficiency, performance and resilience. So, if you raise the temperature on the datacentre floor you can gain energy efficiency, but if the temperature is too high or too low, the result could be calamitous, leading to hardware failure.
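As an illustration of what PUE actually measures, here is a minimal sketch using hypothetical power figures rather than measurements from any real facility: PUE is total facility power divided by the power drawn by the IT equipment alone, so a value closer to 1.0 means less energy spent on cooling and other overheads.

```python
def pue(total_facility_kw: float, it_equipment_kw: float) -> float:
    """Power Usage Effectiveness: total facility power / IT equipment power."""
    return total_facility_kw / it_equipment_kw

# Hypothetical figures: the facility draws 1,500kW in total, 1,000kW of which goes to IT equipment.
print(f"PUE = {pue(1500, 1000):.2f}")  # 1.50 (lower is better; 1.0 is the theoretical ideal)
```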
Redundant servers, data and network latency can also impact efficiency in other ways. Nonetheless, each piece of equipment within a datacentre can have a measurable impact on whether a datacentre is efficient, with good performance and resilience. So, for example, if you have slow and outdated servers behind a fast and modern network pipe going into the datacentre, the servers may simply be unable to process data as quickly as the network can deliver it, whereas a pairing of modern servers and modern network infrastructure could.
Metric abundance
Other datacentre metrics that need to be considered when measuring performance and efficiency include the following:
- Compute Units Per Second (CUPS)
- Water Usage Effectiveness (WUE)
- Datacentre Infrastructure Efficiency (DCIE)
- Carbon Usage Effectiveness (CUE)
- CPU Utilisation
- I/O Utilisation
- Server utilisation – which traditionally tends to be very low.
- Total Cost of Ownership
- Capital Expenditure (CAPEX)
- Operational Expenditure (OPEX)
- Transaction Throughput and Response Times
- Return on Investment (ROI)
- Recovery Time Objectives (RTO)
- Recovery Point Objectives (RPO)
It’s worth noting at this juncture that a datacentre can still be inefficient even if it has a great PUE and WUE rating. This emphasises the need to look holistically at how datacentre efficiency and performance are measured. Measuring performance is no easy feat. For example, loading servers with work simply to increase their utilisation, with no financial justification, would increase operating costs. This would arguably make the datacentre inefficient if performance is defined as the ratio of useful work completed to all the resources used to complete it.
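A minimal sketch of that point, using entirely hypothetical figures: two facilities can report an identical PUE yet differ sharply once efficiency is defined as useful work completed per unit of energy consumed.

```python
# Hypothetical comparison: identical PUE, very different useful-work efficiency.
facilities = {
    # name: (total facility kWh/day, IT equipment kWh/day, useful transactions completed/day)
    "Site A": (36_000, 24_000, 480_000_000),  # busy, well-utilised servers
    "Site B": (36_000, 24_000, 60_000_000),   # same power draw, mostly idle or redundant work
}

for name, (facility_kwh, it_kwh, transactions) in facilities.items():
    pue = facility_kwh / it_kwh
    work_per_kwh = transactions / facility_kwh
    print(f"{name}: PUE = {pue:.2f}, useful transactions per kWh = {work_per_kwh:,.0f}")

# Both sites report a PUE of 1.50, yet Site A completes eight times more useful work per kWh.
```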
Back-up, back-up!
With the need to back up, back up and back up again, in many cases in real time, to ensure that uptime, recovery time objectives and recovery point objectives are achieved, the spectre of data and network latency must be considered too. The further away a datacentre or disaster recovery site is located, the greater the impact that latency will have on the datacentre’s ability to send, receive and process the data traffic emanating, most probably, from a variety of data sources.
Packet loss is another factor that needs to be considered whenever anyone talks about datacentre efficiency, performance and resilience. An efficient, performant and resilient datacentre will therefore invest time in addressing these issues. To score well against these markers, operators will also have to analyse the knock-on effects of each part of the datacentre and network infrastructure, cooling strategy and operational process. This should then lead to a comprehensive plan to ensure that a high level of datacentre performance can be maintained.
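To see why distance and packet loss matter so much, the widely used Mathis approximation gives a rough ceiling for a single TCP stream: throughput ≈ MSS / (RTT × √loss). The sketch below uses illustrative figures rather than measurements from any particular network.

```python
import math

def tcp_throughput_mbps(mss_bytes: int, rtt_seconds: float, loss_rate: float) -> float:
    """Mathis et al. approximation of the per-flow TCP throughput ceiling, in Mbit/s."""
    return (mss_bytes * 8) / (rtt_seconds * math.sqrt(loss_rate)) / 1e6

# Illustrative cases: a 1,460-byte segment size on links of varying distance and quality.
for rtt_ms, loss in [(5, 0.0001), (86, 0.0001), (86, 0.01)]:
    mbps = tcp_throughput_mbps(1460, rtt_ms / 1000, loss)
    print(f"RTT {rtt_ms}ms, packet loss {loss:.2%}: ~{mbps:,.1f} Mbit/s per flow")

# A nearby, clean link sustains far more per-flow throughput than a distant, lossy one,
# however much raw bandwidth has been provisioned.
```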
Holy Grail metrics
Eric Smalley, writing for Wired magazine in ‘IT Minds Quest for “Holy Grail” of Data Center Metrics’, explains the challenge: “Settling on a metric is only a step toward solving the even bigger challenge of calculating a data center’s value to the business it serves. In the Internet age, as more and more of the world’s data moves online, the importance of this task will only grow. But some question whether developing a data center business-value metric is even possible. As you move up the chain from power systems to servers, to applications to balance sheets, the variables increase and the values are harder to define.”
The Holy Grail is, therefore, to find a standard way of measuring useful IT work per unit of energy consumed. Yet the overall performance of a datacentre requires much detailed analysis to gain a true picture of whether it is highly efficient and performant. Nothing can be taken for granted or analysed in complete isolation. For example, datacentres are dead without network infrastructure and connectivity, so there needs to be a view of how well they mitigate data and network latency. To achieve increased speed, WAN data acceleration solutions such as PORTrockIT are needed.
Case Study: CVS Healthcare
By accelerating data with the help of machine learning, it becomes possible to increase the efficiency and performance of datacentres, and thereby of their clients. CVS Healthcare is but one organisation that has seen the benefits of taking such an innovative approach. The company’s issues were as follows:
• Back-up RPO and RTO
• 86ms latency over the network (>2,000 miles)
• 1% packet loss
• 430GB daily backup never completed across the WAN
• 50GB incremental taking 12 hours to complete
• Outside RTO SLA – unacceptable commercial risk
• OC-12 pipe (622Mbps)
• Excess Iron Mountain costs
To address these challenges, CVS turned to a data acceleration solution, the installation of which took only 15 minutes. As a result, the original 50GB incremental back-up was reduced from 12 hours to 45 minutes: a 94% reduction in back-up time. This enabled the organisation to complete its 430GB daily back-up in less than 4 hours and, in the face of a calamity, to perform a complete disaster recovery in less than 5 hours.
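Some back-of-the-envelope arithmetic, based only on the figures quoted above, shows what those results imply about effective throughput across the link:

```python
GB_TO_BITS = 8e9  # decimal gigabytes, for simplicity

def effective_mbps(gigabytes: float, hours: float) -> float:
    """Average throughput required to move a given data volume within a given window, in Mbit/s."""
    return gigabytes * GB_TO_BITS / (hours * 3600) / 1e6

before = effective_mbps(50, 12)    # ~9 Mbit/s: the 50GB incremental crawling over 12 hours
after = effective_mbps(50, 0.75)   # ~148 Mbit/s: the same transfer completed in 45 minutes
daily = effective_mbps(430, 4)     # ~239 Mbit/s: the 430GB daily back-up finished within 4 hours

print(f"Before: ~{before:.0f} Mbit/s, after: ~{after:.0f} Mbit/s, daily target: ~{daily:.0f} Mbit/s")
print(f"Reduction in back-up time: {1 - 0.75 / 12:.0%}")  # 94%
```

Even the improved rates sit comfortably within the OC-12 pipe’s capacity, underlining that the bottleneck was latency and packet loss rather than raw bandwidth.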
Amongst other things, the annual cost-savings created by using data acceleration amounted to $350,000. Interestingly, CVS Healthcare is now looking to merge with Aetna, and so it will most probably need to roll this solution out across both merging entities.
Any reduction in network and data latency can lead to improved customer experiences. However, with large, congested data transfers to and from the cloud, latency and packet loss can have a considerably negative effect on data throughput, and without machine intelligence solutions their effects can inhibit data and back-up performance.
Industry disruption
Following its proposed merger with healthcare insurance provider Aetna, CVS may now want to expand these results across both companies. Forbes magazine claims that CVS is therefore the company to watch throughout 2018. The company generated $177.5 billion in net revenue in 2016, and it’s thought that the merger will disrupt the healthcare market by pushing down the price of healthcare plans; it currently offers the third largest healthcare plan in the US. CVS is described as being retail and healthcare industry savvy, and so it’s predicted that it will fend off market outsiders such as Amazon.
WAN data acceleration
With WAN data acceleration, it becomes possible to locate datacentres outside of their own circles of disruption, and much further apart than is traditionally achievable with WAN Optimisation, without data and network latency becoming a barrier. The effects of packet loss can also be mitigated, so it’s now possible to have highly efficient and performant datacentres, as well as disaster recovery sites, situated in different countries across the globe without impinging on their performance.
So, arguably, the proximity of datacentres also needs to be considered whenever datacentre efficiency and performance are calculated. Part of the analysis may also require datacentres, and their clients, to weigh the risks associated with their location against the cost-savings, customer satisfaction and profitability gains that can come with business and service continuity. It’s therefore important to use a wide variety of metrics to define how well a datacentre performs, including its green credentials.