From the May 2004 Issue

When One = Zero

 



Bruce Buell is the director of facilities for CyrusOne, an IT infrastructure and services provider in Houston. Contact him at bbuell@cyrusone.com.
 

Single points of failure can lead to revenue and production losses for the unprepared enterprise.

By Bruce Buell, C.P.M.M., C.H.F.M.

Once deemed a luxury, continuous operation of critical applications and infrastructure has become a business imperative for hospitals and other healthcare institutions. With increased government regulation and heightened security concerns, many C-level executives find that they need a strategic business continuity plan that delivers the power and connectivity they require and provides the fastest possible recovery while controlling costs.

While the specific requirements of individual departments may vary, the one constant is the need for continuous availability. The burden of maintaining it lies with the information technology group or department, which must identify the best way to deploy an infrastructure that is seamless, meets the business requirements of each person or department and takes into account the business interruptions that might occur.

All data is built on ones and zeros; that much is well known. In the infrastructure of a Tier 1 data center, however, one single point of failure anywhere in the system can bring about any or all of the following: lost revenue, lost production and lost data.

Healthcare companies and institutions expect a superior level of redundancy in their data center infrastructure. They also expect their data center manager to have the expertise to manage power and connectivity to meet their demands. While most data centers are equipped with some level of redundancy, they often fail to account for some of the most obvious requirements. Redundancy must extend past the backplane of the infrastructure and into the rack configuration as well; precisely because this is so obvious, it is probably one of the most overlooked problems in the data center environment.

Mission Critical Designation
When designing and developing their infrastructure, system engineers generally begin by identifying the different resources (switches, routers and servers) that will be employed, and then designate which devices are mission-critical and will require redundant power supplies.

Often, devices deemed less critical are ordered with one power supply. In some cases, older devices in use have no option for a second power supply, and most 24-port switches are designed as single power supply devices. The consequence is that if such a switch’s power supply fails, or if the circuit it is plugged into fails, the flow of data stops at the switch, no matter how many dual power supply servers are still working. That single point of failure has then cost the institution or company significant dollars in lost revenue, production and data.
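The arithmetic behind that consequence can be sketched in a few lines of Python. This is an illustration only: the availability figures are hypothetical, the names `PSU` and `ELECTRONICS` are invented for the example, and the calculation assumes that failures are independent of one another, which real installations do not guarantee.

```python
from math import prod

def series(*availabilities):
    """Every component must work, so the availabilities multiply."""
    return prod(availabilities)

def parallel(*availabilities):
    """Any one component is enough: 1 minus the chance that all fail."""
    return 1 - prod(1 - a for a in availabilities)

# Hypothetical figures, chosen only for illustration.
PSU = 0.999           # one power supply on one circuit
ELECTRONICS = 0.9999  # everything else inside a device

dual_psu_server = series(parallel(PSU, PSU), ELECTRONICS)
single_psu_switch = series(PSU, ELECTRONICS)

# Data must cross the switch to reach the server, so the path is a series chain.
path = series(single_psu_switch, dual_psu_server)

HOURS_PER_YEAR = 8760
for name, a in [("dual-PSU server", dual_psu_server),
                ("single-PSU switch", single_psu_switch),
                ("server behind switch", path)]:
    print(f"{name}: {a:.6f} available, ~{(1 - a) * HOURS_PER_YEAR:.1f} h/yr down")
```

Under these assumptions the path through the switch is never more available than the switch itself, however much redundancy is built into the server behind it.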

Single Points of Failure
When single points of failure are introduced into the system, failure becomes an issue of when, not if. And if Murphy rules as usual, that failure will occur at the worst possible moment.

More than one data center in our industry has experienced a utility failure. To illustrate the “when” in the statement above: standard procedure is for the uninterruptible power supply (UPS) system to provide emergency power while the backup generators are called to duty. But because of a single undetected, defective battery string, the UPS system fails. Meanwhile, the emergency generator cannot start because one faulty battery in each of the redundant strings fails. This would constitute “a Murphy kind of day.”

The value of providing redundancy in one’s infrastructure goes beyond the need for additional protection. It is also an acknowledgement that electrical failures will occur. While IT managers have no control over when or how often failures occur, they can be proactive in minimizing risk while ensuring the power and availability required around the clock.

Several years ago, a hospital suffered a huge communications failure due to an accident that occurred 10 miles away. The backhoe operator at a construction site got careless and severed the only cable providing communication services into that area from the central office.

That error left the hospital unable to communicate with its staff of doctors by phone or pager. The single point of failure in this case was the phone company’s inability to route calls across another cable, which left the hospital at the mercy of the phone company for several hours, jeopardizing communications and compromising service to its patients.

Unfortunately, the hospital’s systems and infrastructure continued to be tested. Several months later, the utility company lost two of the three phases to the main building’s electrical system. This single-phase condition put an extreme current draw on the remaining hot phase of the hospital’s electrical system.

Thirty days later, one 3,000-amp section of switchgear exploded when a loose connection inside the gear finally got hot enough. The infrared scan done 60 days before the explosion did not reveal a problem, because there was no problem until the single-phase condition occurred. The heat generated during the single-phase condition caused that connection to expand. When service was restored and the feeds cooled down, the connection contracted, starting a vicious cycle of expansion and contraction that built up further resistance and eventually caused the explosion.

In a separate incident, a technician installing dual-powered servers in a rack system plugged both cords from the servers into the same power strip, an action that removed all power redundancy to the servers. Most routers and firewalls are single power supply devices. A failure of any one of these could result in loss of communication and production downtime, and could even expose systems to hackers attempting to break in and cripple them.
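A mistake like the shared power strip is easy to catch if the rack’s cabling is written down and checked. The following Python sketch shows one way such an audit might look. The device names, strip names, feed names and the `STRIP_FEED` map are all hypothetical, and the check assumes someone has recorded which strip each cord is plugged into.

```python
from dataclasses import dataclass

@dataclass
class Device:
    name: str
    cords: list  # the power strip each cord is plugged into

# Hypothetical map from each power strip to the upstream feed that supplies it.
STRIP_FEED = {"strip-1": "feed-A", "strip-2": "feed-A", "strip-3": "feed-B"}

def audit(devices, strip_feed):
    """Return one finding for each device that lacks power redundancy."""
    findings = []
    for d in devices:
        feeds = {strip_feed[s] for s in d.cords}
        if len(d.cords) < 2:
            findings.append(f"{d.name}: fewer than two power cords")
        elif len(feeds) < 2:
            findings.append(f"{d.name}: all cords share {feeds.pop()}")
    return findings

rack = [
    Device("server-1", ["strip-1", "strip-1"]),  # both cords in one strip
    Device("server-2", ["strip-1", "strip-3"]),  # cords on separate feeds
    Device("server-3", ["strip-1", "strip-2"]),  # two strips, same feed
    Device("router-1", ["strip-3"]),             # single power supply
]

for finding in audit(rack, STRIP_FEED):
    print(finding)
```

The check compares upstream feeds rather than strips, because two strips fed from the same circuit offer no more protection than one.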

By using innovative technologies, healthcare enterprises can protect their infrastructures and systems, provide redundancy and minimize business interruptions resulting from single points of failure. For example, installation of a source transfer switch can protect single power supply devices by sensing loss of the preferred source and transferring to a redundant source in less than one-quarter of a cycle. At that speed, the load does not experience a loss of power. Some switches also offer Internet-ready monitoring and alarming capabilities.
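To put “one-quarter of a cycle” in concrete terms, the short Python calculation below converts it to milliseconds. The article does not state a line frequency; 60 Hz is assumed here as the usual North American utility frequency, with 50 Hz shown for comparison.

```python
def quarter_cycle_ms(frequency_hz):
    """Duration of one-quarter of an AC cycle, in milliseconds."""
    period_ms = 1000.0 / frequency_hz
    return period_ms / 4

# 60 Hz is an assumption; 50 Hz is included for comparison.
for hz in (60, 50):
    print(f"{hz} Hz: one cycle = {1000 / hz:.2f} ms, "
          f"quarter cycle = {quarter_cycle_ms(hz):.2f} ms")
```

At 60 Hz a full cycle lasts about 16.67 ms, so a transfer completed within a quarter cycle takes roughly 4.17 ms or less.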

What If?
Generally, design engineers fail to play “what if?” Systems are designed to work, but not to be worked on. Thorough analysis, including identification of the scenarios that could develop when a device is crippled and the ramifications of that loss, can prove invaluable. It can provide end-users with a “bigger picture” view that illustrates how the loss of even a small, seemingly insignificant router can devastate the entire system.
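A basic version of the “what if?” exercise can be automated: remove each device in turn and see what can still be reached. The Python sketch below does this for a small, entirely hypothetical topology. The device names and the `LINKS` map are invented for illustration, and the model considers only whole-device failures, not failed cables or circuits.

```python
from collections import deque

# Hypothetical topology: each key lists the devices it is cabled to.
LINKS = {
    "internet": ["router"],
    "router": ["internet", "firewall"],
    "firewall": ["router", "switch-a", "switch-b"],
    "switch-a": ["firewall", "server-1", "server-2"],
    "switch-b": ["firewall", "server-1", "server-2"],
    "server-1": ["switch-a", "switch-b"],
    "server-2": ["switch-a", "switch-b"],
}

def reachable(links, start, failed):
    """Breadth-first search from start, treating the failed device as absent."""
    if start == failed:
        return set()
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbor in links[node]:
            if neighbor != failed and neighbor not in seen:
                seen.add(neighbor)
                queue.append(neighbor)
    return seen

def what_if(links, source, targets):
    """Map each single-device failure to the targets it cuts off from source."""
    impact = {}
    for device in links:
        if device == source or device in targets:
            continue
        alive = reachable(links, source, failed=device)
        lost = sorted(t for t in targets if t not in alive)
        if lost:
            impact[device] = lost
    return impact

for device, lost in what_if(LINKS, "internet", ["server-1", "server-2"]).items():
    print(f"If {device} fails, unreachable: {', '.join(lost)}")
```

In this example the redundant switches survive the loss of either one, while the lone router and the lone firewall each take both servers offline.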

Providing the power needed to operate 24 hours a day is no easy task, but with thorough preparation, design, deployment and implementation, the power and redundancy required are within reach.

For more information about disaster recovery services from CyrusOne, visit www.rsleads.com/405ht-201.

© 2004 Nelson Publishing, Inc