Making the Most of Server Performance Monitoring

Stephen J. Bigelow, Senior Technology Editor, DataCenter News

It’s just not enough to install new servers, set them up, install applications and then walk away. Servers need regular performance monitoring to ensure that your hardware investment will deliver the service you expect – and provide ample early warning of impending trouble, such as resource shortages or hardware issues. Performance monitoring tools can provide a wealth of useful information, but only when those tools are set up and running properly. Fortunately, a few important insights will help any administrator get the best results from performance monitoring.

Achieving accuracy in performance monitoring
Monitoring is useless if it delivers erroneous information, so ensuring accurate data should be one of your first considerations. There are several aspects to accuracy, including interoperability, sampling window, tool architecture, virtualization awareness and calibration.

Interoperability. For this discussion, interoperability is basically the ability of a performance monitoring tool to access and read data points from the various pieces of hardware within your data center environment. Homogeneous environments focused on a single vendor’s product line can take advantage of performance monitoring tools that use hooks deliberately integrated into the hardware. These hooks can deliver detailed information to the tool.

The situation can be far more challenging for heterogeneous environments, where tools and hardware don’t mesh. A vendor’s tool may look for data that certain pieces of hardware simply cannot provide with the required level of consistency (if at all). It’s a similar problem for third-party performance monitoring tools that often cannot detect every sensor or hardware nuance on every possible device, and instead rely more on operating system-level data, which usually lacks granularity. In either case, the result is missing data or inaccurate data points that reduce the insight gained from performance monitoring.

Ten Steps to Increasing Data Center Efficiency and Availability through Infrastructure Monitoring

A White Paper from the Experts in Business-Critical Continuity

Summary

The first decade of the 21st century was one of rapid growth and change for data centers. For most of the decade, data center managers were forced to react to rapid, continuous changes dictated by the capacity and availability requirements of their organizations, and the density of the equipment being deployed to meet those requirements.

Now, data centers must enter a new stage of maturity marked by a more proactive approach to management to enable increased efficiency, better planning and higher levels of service. Achieving actionable visibility into data center operations requires the ability to collect, consolidate and analyze data across the data center, using advanced devices, sensors and management software.

The ten steps outlined in this paper provide a systematic approach to building the foundation for data center infrastructure management by deploying and leveraging measurement, intelligent controls and centralized monitoring and management. Data centers employing these 10 prescribed point solutions for infrastructure performance monitoring stand to gain an operational, strategic and transformative advantage for their enterprise or business.

Data Center Monitoring System Considerations

Bill Kleymen

Data center monitoring is often focused on computers: monitoring system performance, tracking virtual workloads, and reacting to the inevitable warnings and alerts that spell trouble for servers, network or storage within the architecture. But modern data centers need a more holistic monitoring strategy that embraces environmental factors like temperature and humidity, not just within the room but at a granular level within racks and servers. Let’s cover some key monitoring points for the environment and show you how to deal with environmental monitoring problems.

Aspects of data center environmental monitoring
Many data centers employ sophisticated management tools, but many tools still don’t provide granular insight into environmental conditions; or worse, data center owners simply don’t use the environmental data those tools provide. Part of the problem is heterogeneity. It simply may not be possible to use a single tool that can monitor voltages, fan speeds, temperatures, humidity levels, and other environmental factors across every possible system. In other cases, the availability and placement of necessary environmental sensors may be inadequate for proper monitoring. Yet another part of the problem is a lack of planning and coordination – IT administrators don’t worry about the data center environment as much as they should.

When you’re ready to extend data center monitoring to the environment, take time to consider the following monitoring points:

Sensing and monitoring temperature. One of the most significant results of data center growth is the issue of heat density. It has become much more difficult to manage temperatures on a facility level because rack densities (and corresponding rack heat) may vary widely. As a result, we see hot spots in one zone and cooler spots in another zone. Installing temperature sensors with network connectivity within the data center helps IT administrators look for those hot and cold spots to ensure that all equipment is operating safely. If not, early alerting can allow administrators to boost cooling, shift workloads, or take other pre-emptive action to avert failures.

A good metric to follow is the older ASHRAE recommended temperature range (64.4 to 80.6 degrees Fahrenheit) or the newer guidelines from ASHRAE Technical Committee (TC) 9.9. Data center best practices recommend at least one sensor on every rack. If an environment has a hot-aisle/cold-aisle configuration, it becomes acceptable to place a sensor on every “hot” rack or row. Since heat rises, place sensors near the top of the rack, where temperatures are generally highest. Another recommendation is to place sensors near the end of the row, where they can detect any spillover: hot air entering the cold aisle from the hot aisle.
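As a minimal sketch of how the range above might be applied to networked sensor readings, the following Python checks each reading against the 64.4 to 80.6 degrees Fahrenheit band and reports hot and cold spots. The sensor names and readings are hypothetical, and a real deployment would pull values from whatever monitoring tool is in use.

```python
# ASHRAE recommended range cited in the text, in degrees Fahrenheit.
ASHRAE_LOW_F = 64.4
ASHRAE_HIGH_F = 80.6


def classify_reading(temp_f, low=ASHRAE_LOW_F, high=ASHRAE_HIGH_F):
    """Return 'cold', 'ok', or 'hot' for a single temperature reading."""
    if temp_f < low:
        return "cold"
    if temp_f > high:
        return "hot"
    return "ok"


def find_out_of_range(readings):
    """Map each out-of-range sensor name to 'hot' or 'cold'.

    `readings` maps hypothetical sensor names to degrees Fahrenheit.
    """
    result = {}
    for name, temp_f in readings.items():
        status = classify_reading(temp_f)
        if status != "ok":
            result[name] = status
    return result


# Illustrative readings only; not measured data.
readings = {
    "rack01-top": 78.2,
    "rack02-top": 84.9,  # above the range: a hot spot
    "rack03-top": 61.0,  # below the range: an overcooled zone
    "row-a-end": 80.6,   # a boundary value is treated as in range
}
print(find_out_of_range(readings))
```

Sensors that come back "hot" are candidates for boosted cooling or workload shifts; sensors that come back "cold" may point to wasted cooling capacity.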

  • Establish precision cooling control. With large enterprise data centers, maintaining consistent levels of cooling and room/row air conditions is essential. Deploying intelligent controls, which are sometimes integrated into cooling and monitoring systems, helps data centers run as efficiently as possible. The goal of intelligent control is to allow multiple large systems to complement, rather than compete with, one another. Take humidity control at a large data center as an example. Assume that, for some reason, one unit begins to report a high humidity reading from one of its sensors. Without an intelligent system, that unit’s remediation process may start immediately. However, with an intelligent cooling system in place, the data center monitoring tools will first query the humidity status of all the other units in the facility. If the other units are operating within range, the system will continue to monitor the situation to see if the levels even out. Otherwise, it will send an alert to an administrator or begin a pre-designed remediation process.
  • Fluid and humidity detection. One chiller leak inside a data center can cost thousands, if not millions, of dollars in damage to the facility and critical business hardware. This type of damage will deal a serious blow to enterprise functionality and productivity. Use leak detection sensors strategically located within the data center to detect leaks, trigger alarms, and help prevent water damage. It’s highly recommended that leak sensors be installed at every location where fluids are present in the data center. Depending on the data center environment, leak sensors can operate as a standalone system or can be connected to the central monitoring system to simplify management. In large environments where cooling areas are numerous, leak and fluid sensors can also monitor for areas of condensation and excess humidity. Including humidity sensors in the internal and external rack sensor array helps maintain consistent humidity control. Drip pans and designated areas for liquid run-off will help curb the risk of a major leak. Humidity detection can also help detect excessively dry conditions that might precipitate electrostatic discharge (ESD) problems. Dry air is common when free air-side cooling technologies are adopted for the data center.
  • Integrate the environment with other sensors. Temperature and humidity/liquid sensors are just the beginning of intelligent data center environment monitoring. Smoke/fire alarms are needed at several locations throughout the facility to detect impending fire. While these alarms are usually tied to the building’s fire suppression system, they can also be integrated into the data center monitoring system to give administrators an opportunity for early action before more dramatic gas suppression is released. Monitor power from each power distribution system (PDS) and integrate that data as well. Power monitoring can support a continuous evaluation of the data center’s Power Usage Effectiveness (PUE) and report power faults for early intervention by the IT staff. Some data centers also monitor and integrate data from intelligent uninterruptible power supply (UPS) systems, and can track UPS battery and alarm conditions.

    Room and rack access (security) sensors report on unauthorized access, alerting the IT administrators – and could even summon security assistance if necessary. As a minimum, such simple physical sensors can at least log door openings and closings to help narrow down the personnel present at the time.

  • Managing alarms and notifications. Uptime and data center efficiency have been the main justifications for implementing some sort of environmental monitoring controls. This continues to be a main driver, since viewing immediate notifications of a failure and proactively monitoring a situation to prevent one are critical data center tasks. A centralized and well-managed system allows administrators to respond quickly to emergencies and helps maintain higher uptime. Creating a central alarm system is also very important for data center uptime and health. A good alarm system is able to prioritize issues by criticality, to ensure the most serious incidents receive priority attention. When setting up an alarm-based system, it is important to evaluate and designate every alarm for its impact on business and IT operations.
  • Remote data center monitoring. Large environments often must leverage outside expertise when it comes to data center monitoring. Remote monitoring capabilities can help organizations keep an eye on their secondary or backup environments, or outsource the monitoring and management to a service provider. The ability to see the health of remote facilities can help IT administrators respond to emergencies faster and bring their environments back to a healthy state. By having external visibility into multiple sites, managers can keep track of alerts, alarms and general data center environmental statistics all in one central place.
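The humidity example in the precision cooling item above can be sketched as a small decision function: when one unit reports high humidity, check the other units before reacting. This is only an illustration of the logic described, not any vendor's control algorithm. The unit names, the `decide_action` function, and the 40 to 60 percent relative humidity band are assumptions made for the example.

```python
def decide_action(reporting_unit, readings, low=40.0, high=60.0):
    """Decide how to react when one cooling unit reports high humidity.

    readings: dict mapping hypothetical unit names to relative humidity (%).
    low/high: assumed acceptable band, chosen only for illustration.
    Returns 'ok', 'monitor', or 'alert'.
    """
    if readings[reporting_unit] <= high:
        return "ok"  # nothing to react to

    # Query the humidity status of every other unit in the facility.
    others = [rh for unit, rh in readings.items() if unit != reporting_unit]
    others_in_range = all(low <= rh <= high for rh in others)

    if others and others_in_range:
        # Likely a local or sensor-level anomaly: keep watching to see
        # whether the levels even out before starting remediation.
        return "monitor"

    # Other units are also out of range (or there are none to compare
    # against): alert an administrator or start remediation.
    return "alert"


# Illustrative values only.
print(decide_action("crac-1", {"crac-1": 72.0, "crac-2": 48.0, "crac-3": 51.0}))
print(decide_action("crac-1", {"crac-1": 72.0, "crac-2": 66.0, "crac-3": 51.0}))
```

The point of the design is that a single unit does not start dehumidifying on its own reading while its neighbors see normal conditions, which is how independent units end up competing with one another.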

Data center monitoring best practices
It’s important to remember that a data center monitoring infrastructure will require periodic maintenance and testing – just like any other part of the facility. In addition, the monitoring must change or scale to accommodate the data center’s evolution. Don’t ignore the sensors or allow their placement to remain static as other systems and racks move. Here are some other tips for data center environmental monitoring:

Testing and Maintenance. All sensors within a data center should undergo regular testing and maintenance. Faulty or erratic sensors should immediately be replaced. One way to identify a faulty sensor is to review readings from similar nearby sensors. For example, when several sensors within a rack report one temperature, but another sensor reports a surprising alarm, it should warrant immediate investigation, but should be approached with a modicum of skepticism until the root of the alarm can be identified and confirmed.
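One way to automate the comparison with nearby sensors is to flag any reading that sits far from the median of its peers. The sketch below assumes hypothetical sensor names and a 5-degree tolerance picked purely for illustration; a flagged sensor is a prompt for investigation, since it may be faulty or may be reporting a genuine local problem.

```python
from statistics import median


def flag_suspect_sensors(readings, tolerance_f=5.0):
    """Return names of sensors that disagree with their neighbors.

    readings: dict mapping hypothetical sensor names to degrees Fahrenheit.
    tolerance_f: assumed allowed deviation from the median of the other
                 sensors; tune this for the actual rack and sensor type.
    """
    suspects = []
    for name, value in readings.items():
        # Compare each sensor against the others, not against itself.
        peers = [v for other, v in readings.items() if other != name]
        if not peers:
            continue  # a lone sensor has nothing to be compared with
        if abs(value - median(peers)) > tolerance_f:
            suspects.append(name)
    return suspects


# Illustrative readings from one rack.
rack = {"top": 79.5, "middle": 78.8, "bottom": 77.9, "door": 96.0}
print(flag_suspect_sensors(rack))
```

Using the median rather than the mean keeps one wild reading from dragging the reference value toward itself.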

    • Be ready for emergencies. Sensors do not prevent emergencies, so common-sense emergency planning should still be part of every data center manager’s agenda. A disaster recovery plan must include immediate personnel notification; know who your data center maintenance team is, and how to reach them quickly. When a cooling failure occurs, your first call will be to your data center HVAC engineers. Be detailed in the description of the problem, too. If your engineers need to bring spare parts, this will help them. When it comes to data center environmental emergencies, every second counts.
    • Have a backup plan ready. Monitoring systems have the ability to set off different alarm levels. If your data center is in a hosted environment, it is very important to specify and understand emergencies in your service-level agreement. The hosting provider must have a contingency plan prepared in the case of a sudden disruption. In a private data center, always have sensor monitoring and alert systems operational. Cooling systems may warrant local backup units in the event of an emergency–even if this means using temporary portable cooling systems.
    • Have an automated recovery plan. Some monitoring systems have integrated automation systems. In the event of an isolated rack emergency, some systems are able to shut off non-essential servers. Development servers are often big power users that don’t need to be run during production. Any test server that is not essential can be set to shut down when emergency conditions arise.

As IT data centers continue to evolve, managers will begin to see more automated tools to help keep an environment alive longer and without disruption. Automating and centralizing the management of physical infrastructure components for effective resource usage will be the next step in data center design and implementation. The key will always revolve around strategic uptime capabilities. By proactively monitoring server room environmental variables, IT administrators are able to greatly reduce their risk of having extended downtime. This, in turn, creates a more robust and easier-to-manage data center.

Uptime Devices, Inc. Remote Physical Monitor Console Manager

Austin, TX-based Uptime Devices, Inc. has released the Remote Physical Monitor (RPM). Uptime RPM employs intelligent Daisy Chain Sensor® technology, which the company describes as the most efficient way to leverage sensors, with up to 250 different sensors plugging into one RPM. According to the company, Daisy Chain Sensor® technology sets the standard in data center physical monitoring. Compatible with the leading management solutions and hardware, the RPM system allows customers to simply plug in and protect their IT and infrastructure investments. Uptime Devices created the Daisy Chain Sensor® and developed the Remote Intelligent Multi Sensor® (RIMS) smart sensor technology.

To enter a raffle for a chance to win the RPM unit, go to www.uptimedevices.com/processormag

Data Server Downtime Costs Average of $500K

A study released earlier this year by the Ponemon Institute (www.ponemon.org) highlighted the enormous costs incurred when systems and networks go down. IT equipment failure means big losses, not just in revenue, but in the reputation of the business. Across the 41 data centers surveyed, an unplanned outage cost up to $11,000 a minute. The average cost of an outage incident? $505,502. Some companies were even set back by seven-figure sums.

The expenditures analyzed were:
-detection costs
-containment costs
-recovery costs
-ex-post response costs
-equipment costs
-IT productivity costs
-user productivity loss
-3rd party costs

The costliest cause of these incidents was IT equipment failure. The key factors behind data center outages were IT demands exceeding capacity, rising rack densities, data center efficiency, and the need for infrastructure management and control. Regarding the latter, effective monitoring and control directly influenced the other key factors, and the presence of intelligent management capabilities could have prevented many of the problems that arose.

Download the study here: tinyurl.com/7mxtess