Data centres – Part three: power and protection
Doomsday has come. The earth trembles, the waters gush and your data centre is in imminent peril. While the event of a natural disaster may not be quite as apocalyptic as the prophesied battle of Armageddon, the results can be just as catastrophic for a business. Studies have shown that 40% of businesses that experience a disaster of some sort will go out of business within the following five years, and when the average disaster-hit SMB accrues losses of around $18,000 per hour, it’s easy to see why.
Of course, disasters need not be spectacular to inflict extensive damage; business catastrophe is not only caused by devastating natural disasters or human tragedy, but can also be the result of something as ‘simple’ as user error, a system crash or hardware failure. Whichever way, with IT becoming central to everything a business does, no matter the size of the organisation, a continuous power supply and a comprehensive disaster recovery plan are absolutely essential.
Gartner’s latest statistics on disaster recovery spending support this trend, showing that the percentage of organisations planning to spend at least 4% of their data centre budget on disaster recovery requirements has increased by 39% during the past 24 months. This mirrors the heightened awareness surrounding power availability and efficiency that has sprung up over the same period of time. Gartner said its results “illustrate a significant shift in planned spending between 2006 and 2008” and demonstrate that “organisations have come to the same realisation that requirements for recovery periods are now on the order of hours versus days, which is generating the need for significant physical and logical IT infrastructure changes, as well as a rethinking of the source(s) for DR management”.
Gen-i’s Business Manager, ICT Outsourcing, and Acting Head of ICT Solutions, Ron Murray, agreed: “There is a growing awareness of the importance of disaster recovery”.
However, while awareness may be growing, some of those interviewed by The Channel, including Gen-i’s Murray, felt there was still a fair way to go in terms of organisations fully grasping the need for a comprehensive disaster recovery plan. Microsoft’s Windows Server Marketing Manager, Tovia Va’aelua, said a lot of SMBs in particular are hesitant to implement such a plan due to cost restrictions, adding that SMBs make up about 90% of New Zealand’s market.
Shaun Vosper, Director of Data Centre Technologies, commented, “I know of global companies that don’t have a DR site and are totally reliant on a single site.”
One reason for people’s growing awareness, according to Gen-i’s Murray, is the increasingly sophisticated technology and a growing appreciation for the difficulty involved in looking after it, and for the criticality of the data – the life blood of the business – that is held electronically. Added to this, global compliance laws and the events of 9/11 have highlighted the need for extra security. Mark Buckland, Sales and Marketing Director for Business Continuity Asia Pacific (BCAP), noted that although Sarbanes-Oxley, Basel II and UK and European data protection laws apply directly only to US, UK and European companies, they still require organisations dealing with those companies to meet compliance standards; therefore, local companies have to make sure they have an adequate and implemented disaster recovery plan.
Make a plan
Once a company has decided to implement a disaster recovery plan for their data centre, there is a considerable number of variables that must be taken into account.
According to Microsoft’s Va’aelua, “Disaster recovery is the ability to immediately show some business continuity with little to no impact on the business”. Most businesses, however, focus disaster recovery on their critical systems only, and Greg Wyman, Regional Director Asia Pacific for StorageCraft, believes the challenge for most companies is to understand which systems are actually critical.
Similarly, Gen-i’s Murray pointed out that organisations “have to make sure that the disaster recovery plan is fit for purpose whilst not wanting to over engineer it and being pragmatic about it”.
Aaron Lamond, Server Marketing Manager for HP, voiced the opinion of many, saying that a disaster recovery plan is “more of a business project as opposed to just an IT project”. It should clearly define the scope and objectives of a disaster recovery project, outlining the actions that should be taken prior to, during and after a disaster.
Lamond also suggested organisations need to take their “credibility and the reputation with the customer” into account; and APC’s ANZ Manager, Gordon Makryllos, likewise remarked that maintenance of service levels is of great concern when implementing a disaster recovery plan.
A highly automated, easily used system is preferable when implementing a disaster recovery plan, because, as Buckland of BCAP explained, “As unfortunate as it may sound, in a disaster a company may lose their key IT personnel. The DR plan and system must, therefore, be easy to understand and require minimal user intervention to get the IT systems back up and running.
Selecting a solution which is as autonomous as possible will help alleviate this risk, while also providing the benefit of being faster to restore”.
ANZ Product Marketing Manager for EMC, Shane Moore, agreed, noting, “Many DR/BC solutions have been implemented with too many manual procedures. A lack of automation means that written manual procedures quickly become out of date and further increase the risk of human error when going into DR mode.”
While it is fairly easy to have a fallback site, assuming that finance is not an issue, the difficulty lies in making sure that the data you hold there is up to date and can be restored within an acceptable timeframe. StorageCraft’s Wyman believes that the key to a successful disaster recovery program lies in the quality of the backups. “The reality is that companies today need to address their backup technology or, more importantly, their recovery technology.”
He felt strongly that tape is better suited to archival purposes than backup and recovery needs, because newer, less intrusive backup technology can perform the task far more often and more quickly, providing a much more current set of data as a recovery point.
EMC’s Moore also observed that organisations are required to store more data and for much longer periods; therefore, much more information needs to be protected and replicated, making time efficient backups a must.
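The link between backup frequency, restore time and how current the recovery point is can be put in rough numbers. The following is a minimal Python sketch, not any vendor's method: the function names and every figure in it are hypothetical and chosen purely for illustration.

```python
from datetime import timedelta


def worst_case_data_loss(backup_interval: timedelta) -> timedelta:
    """If disaster strikes just before the next backup runs,
    everything written since the previous backup is lost."""
    return backup_interval


def meets_recovery_targets(backup_interval: timedelta,
                           restore_time: timedelta,
                           max_data_loss: timedelta,
                           max_downtime: timedelta) -> bool:
    """True only if both the recovery point and the restore time are acceptable."""
    return (worst_case_data_loss(backup_interval) <= max_data_loss
            and restore_time <= max_downtime)


# Hypothetical business targets: lose at most 1 hour of data, be back within 4 hours.
targets = dict(max_data_loss=timedelta(hours=1), max_downtime=timedelta(hours=4))

# Hypothetical nightly backup with a slow restore.
nightly_backup = meets_recovery_targets(
    timedelta(hours=24), timedelta(hours=12), **targets)

# Hypothetical frequent backup every 15 minutes with a 2 hour restore.
frequent_backup = meets_recovery_targets(
    timedelta(minutes=15), timedelta(hours=2), **targets)
```

Under these assumed figures the nightly scheme misses both targets, while the more frequent scheme meets them. That matches the point made above: how often a backup can run, and how quickly it can be restored, matter as much as having a fallback site at all.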
Many large organisations run their own secondary disaster recovery site, and while this is the paradigm to which many may aspire, its necessity has been called into question with the advent of new technologies and options. Microsoft’s Va’aelua said the traditional approach has been a physical disaster recovery plan, but now there is the virtual plan to consider; these days a plethora of backup and replication options are available which are “far cheaper than a full backup and secondary site”.
BCAP’s Buckland further explained how “using intelligent software products it is possible to replicate entire server workloads from multiple physical and virtual servers in production to a single server running in a disaster recovery centre”, as opposed to having a duplicate hardware system standing idle. Va’aelua warned, however, that it is important for resellers to understand the business they are implementing the disaster recovery plan for, because “realistically speaking virtualisation isn’t for everyone and, specifically, isn’t for everything”.
One of the most prevalent disaster recovery strategies in multi-site organisations is to replicate or mirror the data at another branch, then use smart tools to manage the data remotely. Many vendors and distributors noted a trend for those with a purpose-built secondary disaster recovery site to run it as part of their infrastructure for test and development purposes so that expensive equipment is not sitting idle. Microsoft’s Va’aelua even mentioned a mobile piece of infrastructure that can be powered up and taken off-site.
Testing and communication
The planning and deployment of a disaster recovery strategy should not be the last that anyone ever hears of it. “First of all you should have a plan and it needs to be communicated,” confirmed APC’s Makryllos emphatically. A DR plan also needs to be tested to demonstrate that all the policies, processes and systems work. Then the test needs to be reversed to ensure that once you are in disaster recovery mode you can successfully return to your primary site.
In the experience of Gen-i’s Murray, organisations “don’t test it [the DRP], they don’t actively ensure their people are aware of it and train them in how to use it and, therefore, it becomes just a bit of brochure-ware”.
Data Centre Technologies’ Vosper agreed, and Vince Renaud, spokesperson for the American vendor-neutral organisation The Uptime Institute, highlighted the lack of plans in existence that consider how to return the systems to the primary site once in disaster recovery mode.
Outsourcing
Many customers choose to avoid such problems by outsourcing their disaster recovery to a specialised disaster recovery provider.
Outsourcing is not only a viable and obvious choice for small businesses; even enterprise level organisations are considering it. As organisations consolidate and virtualise their infrastructure it is increasingly important that they protect it; and for many it makes sense to host it with a specialist organisation.
The general consensus amongst vendors and distributors is that outsourcing is a popular option in terms of increased uptime availability, risk reduction and the provision of a purpose built environment.
Customers are essentially purchasing an environment that is well maintained and operated under strict service level agreements; although in doing so they become reliant on the company doing the right thing and might lose a little bit of security.
Nonetheless, as HP’s Lamond pointed out, specialised disaster recovery sites are purpose built to manage the recovery of systems as well, so they are able to handle a wide range of applications, which should avoid compatibility issues when bringing data onboard and returning it to the primary site.
Equally, APC’s Makryllos highlighted “level of service” as a major concern because “most disaster recovery sites are sort of like an insurance policy, they have multiple customers and contracts for the same piece of equipment and you hope that not all customers are going to need it at the same time”.
Small business
Outsourcing’s popularity may be diluted by the revolution in infrastructure, especially around storage systems, networking and virtualisation, that has given a lot more organisations, small businesses in particular, the ability to implement their own disaster recovery systems, which would explain there being no clear trend either way.
Microsoft’s Va’aelua said that until now the cost of virtualisation has been prohibitive for small to medium sized businesses but that is changing. Previously, the limited deployment of virtualisation has meant “power consumption or reduction of power usage has never been a discussion topic in the small or medium business”, but this too is likely to change.
Of course, power and disaster recovery are intimately linked, as HP’s Lamond explained: “The UPS, to a degree, is part of a disaster recovery plan because it gives you the ability to run on a very limited capability just long enough to shut down systems nicely”.
The UPS
Renaud of The Uptime Institute clarified the role of the UPS, explaining it is there for two reasons: to condition the power and to provide ride through time. Ride through time is the period for which the UPS will continue to provide power in the event of a power failure, and can vary from seconds to many minutes depending on the particular unit.
This built in fallback can be a double edged sword, as Scott Morris, Director of Partner Sales, Australia and New Zealand, for NetApp observed: “I think a lot of organisations try to use the UPS in the wrong role, to ride out a power failure”.
These organisations try to build a UPS that’s big enough to keep the system up and running until the power comes back on; whereas, in Morris’s opinion, “The fundamental role of a UPS is to allow all of your systems to run long enough to get into a safe and protected state and shut themselves down, or to provide enough power to allow alternate power sources to be switched on to allow your infrastructure to continue to run”.
Interestingly, Unisys New Zealand’s Business Manager – Data Centre, John Borthwick, mentioned that, “Whilst UPS provides backup power [to the load], it does not provide power to the cooling infrastructure”, which means that running on UPS as a backup for an extended period of time creates the risk of a thermal runaway and may result in heat damage to the computing infrastructure.
For this reason the UPS should be used for ride through time alone in order to provide time for a graceful system shutdown or for the backup generator to kick in.
Happily, as Stuart King, General Manager of New Zealand Sales Service for ABB, pointed out, a generator needs only to be marginally bigger than some modern UPS topologies these days (although older topologies may require a generator that is almost five times larger than the UPS system). This nicely demonstrates that a power solution, as APC’s Makryllos put it, “isn’t just the UPS, it is the whole”. The solution involves everything from the power distribution units to the software to monitor the environment and provide a graceful shutdown.
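The sizing question raised above, whether ride through time covers a graceful shutdown or the start of a generator, can be expressed as a simple check. This is an illustrative Python sketch only: the function name, the return labels and all the timings are assumptions, not figures from any interviewee or product.

```python
from typing import Optional


def ups_role(ride_through_s: float,
             shutdown_s: float,
             generator_start_s: Optional[float] = None) -> str:
    """Classify what a UPS can safely be relied on to do.

    ride_through_s:    seconds the UPS can carry the load on its own
    shutdown_s:        seconds needed to shut all systems down gracefully
    generator_start_s: seconds for a backup generator to take the load,
                       or None if the site has no generator
    """
    if generator_start_s is not None and ride_through_s >= generator_start_s:
        return "bridge to generator"
    if ride_through_s >= shutdown_s:
        return "graceful shutdown"
    return "insufficient"


# Hypothetical sites, timings in seconds.
with_generator = ups_role(ride_through_s=30, shutdown_s=300, generator_start_s=15)
battery_only = ups_role(ride_through_s=600, shutdown_s=300)
undersized = ups_role(ride_through_s=10, shutdown_s=300, generator_start_s=15)
```

With these assumed timings the first site can hand over to its generator, the second can only shut down cleanly, and the third can do neither. None of the outcomes treats the UPS as a way to ride out the outage itself, which is the misuse Morris warns against.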
Ride through and redundancy
The UPS continues to be an important part of the power solution, and data centre managers must choose the type of UPS (static versus rotary) and configuration they wish to use. The static UPS manipulates the power internally without any moving parts, while the rotary has an alternator type device that spins and doesn’t need batteries. Vosper of Data Centre Technologies suggested “the rotary style tends to be a lot better at larger sites because it becomes more energy efficient”.
“It costs more power to keep batteries charged up [in the static version] because you get a greater loss of power and heat in keeping batteries charged compared to actually keeping a large mass continuing to spin given the momentum, and I think that’s why you get greater efficiencies with a large rotary system,” agreed Gary Hull, Director of Sales for Raritan Australia.
The Uptime Institute’s Renaud said that while he saw the two systems as having comparable efficiency, rotary systems seem “very popular in Asia and in Australia” despite their considerably lower ride through time. “They don’t consider ride through time, they consider that the engines are going to come up so it doesn’t seem to be a concern,” he said of our APAC cousins.
Whether the generators start up or not, both UPS systems should provide enough time for a graceful system shutdown at the very least.
The aim of a UPS system is to provide adequate redundancy, and there are several ways of doing that.
The N+1 configuration, or parallel redundancy, either has redundancy within one UPS, or more than one UPS can be connected together to run in parallel, allowing for redundancy or future expansion (parallel capacity). The 2+0 configuration means that the UPS has two separate mains power feeds to provide the required redundancy in the event of one power supply failing.
UPS modules can also be run in parallel to provide short repair times (modular UPS systems) and are growing in popularity. For dual redundancy, separate UPS systems feed dual fed IT equipment and a static transfer switch is used to feed devices that only have a single feed.
“There are different approaches, and it depends on your investment and whether you have two UPSs or redundancy within the [one] UPS,” said Makryllos of APC. If, like much of New Zealand, you’ve only got one power supply then “N+1 is the more cost effective way”.
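The arithmetic behind the N+1 approach described above is easy to sketch. The Python below is a rough illustration under assumed figures: the function names, the load and the module ratings are hypothetical, and real sizing would also account for power factor, growth and derating.

```python
import math


def modules_required(load_kw: float, module_kw: float, spare_modules: int = 1) -> int:
    """N modules are needed to carry the load; N+1 adds one spare."""
    n = math.ceil(load_kw / module_kw)
    return n + spare_modules


def survives_failures(load_kw: float, module_kw: float,
                      installed: int, failed: int = 1) -> bool:
    """True if the remaining parallel modules can still carry the full load."""
    return (installed - failed) * module_kw >= load_kw


# Hypothetical site: 90 kW load served by 40 kW UPS modules.
n_plus_one = modules_required(load_kw=90, module_kw=40)        # N = 3, plus 1 spare
redundant = survives_failures(90, 40, installed=n_plus_one)    # lose one, 3 remain
no_spare = survives_failures(90, 40, installed=3)              # lose one, 2 remain
```

In this assumed case three modules carry the load, a fourth provides the redundancy, and a three-module installation would fall short as soon as one module failed.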
Craig Newby of Eaton Powerware was the only interviewee to mention the use of DC power instead of a UPS system within the data centre and had some interesting comments.
“Some companies have done a complete swap out of their infrastructure and gone to DC powered setups and they are seeing much higher efficiencies and so on,” he stated.
Traditionally DC power has been seen as more reliable (although modern UPS systems are closing the gap) because the IT equipment is connected directly to the batteries without any power electronics in between, whereas the power electronics in a UPS can be a point of failure.
Newby affirmed, “The big advantage with a DC system is that the data centre equipment is completely isolated from the mains”. He admitted that UPS systems remain the predominant choice for data centres, but suggested that mission critical loads will still be on 48 volt, “because it’s inherently more reliable”.
As Newby mentioned, there have been great advances in UPS and power solutions technologies over the last three to four years that have improved efficiency.
APC’s Makryllos suggested the channel opportunity here is to approach customers that have an older power system and offer to do an audit of their power and infrastructure.
Makryllos also noted that, with equipment becoming “smaller and thinner”, tolerances to power spikes are also decreasing, meaning it is increasingly important to have adequate power filters and conditioners. “You can void warranty on some servers and storage if the vendor believes that your environment has not been a managed environment,” he concluded.
Over the last few years the demand for power has soared and data centre managers are beginning to encounter a raft of different problems, namely an increase in power costs and cooling issues. Peter Spiteri, Senior Marketing Manager for Emerson, illustrated how a 100 m² data centre uses an astounding one megawatt, or the equivalent of 1200 modern households’ worth of power.
Research by Gartner demonstrates that this quest for power will only worsen and it “will increase the energy required to power and cool the ICT hardware infrastructure”.
Independent consultant Vosper, of Data Centre Technologies, observed that some Australian power companies are limiting the amount of power a data centre can be given in an attempt to maintain a balanced power grid. And if a data centre cannot maintain a certain level of power redundancy and capacity, its tier classification could be at risk.
Unisys’ Borthwick said, “The biggest impact of scarcity of power is cost.
Data centre operators are affected by ‘electricity spot prices’ and the increasing cost of power in general – everyone’s struggling. Data centre operators are consciously looking at reducing energy requirements and, therefore, costs by deploying energy efficient technologies from cooling infrastructure to servers.”
Historically data centre pricing was charged per square foot or square metre; however, it now has to be charged by area, plus the price of power.
Virtualisation is the golden promise of reducing power consumption, and it does just that by consolidating a data centre’s physical infrastructure. However, as Hull of Raritan asserted, “It’s a double edged sword: virtualisation can lead to power savings but can also lead to high density heating issues and the subsequent cooling issues as well”.
APC’s Makryllos agreed, suggesting that resellers are yet to understand the full implications and flow on effects of virtualisation within the data centre, particularly in terms of power and cooling. “Resellers have been well educated in server and storage over many years. Power and cooling issues have only come to the fore in the last 12-24 months.”
Having witnessed data centres overload and lose power, Emerson’s Spiteri is taking the scarcity of power and the endless and increasing need for it very seriously. To avoid this happening, he suggests data centre managers “try to stay abreast of the situation” by talking to their power providers to find out what sort of power availability there is.
He added that in many cases “the CIO or the CTO needs to bring the IT manager and mechanic together” so they can work as a team – elementary, but in many data centres not the case, according to Spiteri.
Spiteri has recently promoted the vendor neutral Energy Logic Symposium and white paper that has produced a top ten list of ideas to reduce power consumption by a conservative estimate of over 50 percent!
These are enormous savings, but the ideas behind many of them are simple, said Spiteri. To find out more on how to reduce power usage, read Spiteri’s skills column on page 42 and read/download the Energy Logic white paper.
Cool and green
As we discussed in our first instalment on data centres, cooling is a major power consumer and reduction solutions abound. Some of the most popular ones include leveraging the low external ambient temperature by filtering it and using it to cool the data centre; splitting the data centre into modules to be cooled on an in-use basis only; and implementing hot or cold aisle containment.
It has also been suggested that 20% of existing servers could be switched off without any loss of data or performance and, similarly, most servers are set to normal power use by default, but will perform just as well when set to their ‘power save’ option.
Gartner echoes the Energy Logic paper, proposing “a green data centre will broaden its environmental strategy beyond energy efficiency, gleaning the maximum amount of production from the minimum amount of materials and energy, without compromising performance, resilience and security. Such an approach requires an end-to-end integrated view that includes the configuration of the building, energy efficiency, waste management, asset management, capacity management, technology architecture, support services, energy sources and operations”.
Food for thought
Clearly there is still “a major requirement for education”, as Microsoft’s Va’aelua put it, with regards to power and cooling issues and how to implement the best disaster recovery option for your business.
Emerson’s Energy Logic white paper and The Uptime Institute’s Research Symposium & Expo: Lean, Clean and Green Symposium in New York next April go a long way to help you, as resellers and SIs, to understand the needs of a modern data centre and to pass that knowledge on, using compelling facts to create a cohesive and all encompassing energy efficiency plan.
Just as the power solution needs to take all facets of a business into consideration, so does the disaster recovery plan. Equipment and policies go hand in hand, and a well implemented but poorly understood and non-tested system is not a success.
Microsoft’s Va’aelua suggested that “the challenge that we are finding when we are talking to people about what DR is [...] is being able to talk about the unfortunate event of a disaster but being able to do it without any scaremongering.”
In these fairly tough financial conditions, APC’s Makryllos said disaster recovery and power solutions are “a great opportunity for the channel” because, although the IT industry is experiencing a general slowdown in business, the areas of disaster recovery and power seem to be unaffected because they are not “discretionary areas”. Makryllos sees great potential in this area because “it’s an area they [the channel] haven’t focused enough on”.