Gulf Breeze Software / Software Partners

Healthcare Industry

Healthcare facilities have mission-critical applications that supply hospitals, physicians, nurses and other staff with information that can directly affect the well-being of their patients.

Overview

One organization has multiple critical applications utilizing a vast assortment of underlying technologies. Existing tools were providing some of the required functionality but there were gaps in their abilities and the maintenance costs were extremely high. In addition, a merger with another healthcare organization was just beginning and they needed their tools to scale without knowing exactly what the future requirements would be.

To implement a new solution

The Challenge

The existing toolset covered network monitoring and some of the server and application monitoring. However, as new technologies were being deployed, there were gaps in the monitoring that could not be fulfilled with the existing tools.

An existing installation of IBM Tivoli Monitoring (ITM) that was used solely for AIX server monitoring also needed to be merged with the new solution.

Three critical applications were part of the initial scope and required monitoring to be put in place for specific technologies, including Citrix, SQL Server and VMware. In addition, synthetic transactions needed to be run against the three applications. Beyond server and application monitoring, network device discovery and monitoring was required, along with a root cause analysis engine that could correlate network and server availability alerts.

All new monitoring put in place also needed to integrate with their existing incident management system, HEAT. However, a new incident management system, ServiceNow, was in the early stages of replacing the existing system so the integration had to account for this transition.

The Solution

SCAPM

The SmartCloud Application Performance Management (SCAPM) suite, comprising the monitoring infrastructure and the server and application agent technologies, was deployed to replace the existing monitoring and fill the monitoring gaps.

Within SCAPM, various agents were utilized to perform the required monitoring:

  • Windows/Linux/UNIX operating system agents
  • Citrix agent
  • VMware agent
  • SQL Server agent.

As the monitoring is rolled out to other areas, additional agents will be utilized, including:

  • Active Directory
  • Microsoft Exchange
  • Internet Information Server
  • .NET
  • WebSphere
  • WebLogic
  • Tomcat
  • JBoss.

Rational Performance Tester

The Rational Performance Tester (RPT) piece of IBM Tivoli Composite Application Manager (ITCAM) for Transactions, which is part of SCAPM, was utilized to record synthetic user transaction scripts. Multiple scripts were created for multiple applications to simulate real user transactions and are played back from various locations to help in pinpointing application issues.
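The value of playing the same script back from several locations can be illustrated with a small sketch. It is not RPT or ITCAM code; the function name, the location names, and the threshold are hypothetical, and it only shows the reasoning: a failure seen from every location points at the application, while a failure seen from one location points at that location or the path to it.

```python
def diagnose(results, threshold_seconds):
    """Classify synthetic playback results gathered from several locations.

    results maps a (hypothetical) location name to the response time in
    seconds, or None when the playback failed outright.
    """
    failing = sorted(
        location
        for location, elapsed in results.items()
        if elapsed is None or elapsed > threshold_seconds
    )
    if not failing:
        return "healthy", failing
    if len(failing) == len(results):
        # Every playback location sees the problem: suspect the application.
        return "application-wide", failing
    # Only some locations are affected: suspect those sites or their network path.
    return "location-specific", failing


# Illustrative data only; these sites and timings are invented.
status, affected = diagnose(
    {"main-campus": 1.2, "clinic-east": 9.8, "data-center": 1.0},
    threshold_seconds=5.0,
)
```

In this example only `clinic-east` exceeds the threshold, so the issue is classified as location-specific rather than as an application fault.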

The existing ITM agents that were part of the existing ITM infrastructure were migrated to the new SCAPM infrastructure and the existing ITM environment was shut down. In addition to migrating the agents, the custom situations, workspaces and views were also migrated to the new environment.

ITNM

IBM Tivoli Network Manager (ITNM) was deployed to discover and monitor the network devices. There was also a requirement to correlate server availability alerts with the availability alerts from the network devices. Therefore, the servers were also discovered by ITNM and included in the topology. With the servers in the topology, ITNM was updated so that its root cause analysis engine could correlate network and server alerts and suppress all downstream alerts, including those from servers.
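The idea behind topology-based suppression can be sketched in a few lines. This is a simplified illustration, not ITNM's root cause analysis engine: the topology, the node names, and the `classify_alerts` function are hypothetical, and the sketch assumes a single polling station that is itself up. A down node that is still reachable from the poller through healthy nodes is treated as a root cause; a down node that sits behind another down node is treated as a downstream symptom.

```python
from collections import deque


def classify_alerts(topology, poller, down):
    """Split down nodes into root causes and suppressed downstream symptoms."""
    down = set(down)
    root_causes = set()
    seen = {poller}
    queue = deque([poller])
    while queue:
        node = queue.popleft()
        for neighbor in topology.get(node, ()):
            if neighbor in seen:
                continue
            seen.add(neighbor)
            if neighbor in down:
                # Reachable through healthy nodes yet down: a root cause.
                # Do not traverse past it, so nodes behind it stay unreached.
                root_causes.add(neighbor)
            else:
                queue.append(neighbor)
    # Down nodes never reached directly are hidden behind a root cause.
    suppressed = down - root_causes
    return root_causes, suppressed


# Hypothetical topology: servers hang off distribution switches.
topology = {
    "poller": ["core-sw"],
    "core-sw": ["poller", "dist-sw-1", "dist-sw-2"],
    "dist-sw-1": ["core-sw", "citrix-01", "sql-01"],
    "dist-sw-2": ["core-sw", "esx-01"],
    "citrix-01": ["dist-sw-1"],
    "sql-01": ["dist-sw-1"],
    "esx-01": ["dist-sw-2"],
}
roots, suppressed = classify_alerts(
    topology, "poller", {"dist-sw-1", "citrix-01", "sql-01"}
)
```

Here the switch `dist-sw-1` is reported as the root cause, and the availability alerts for the two servers behind it are suppressed. This only works because the servers are part of the topology alongside the network devices.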

OMNIbus

OMNIbus was deployed to manage the alerts from the SCAPM and ITNM implementations. Utilizing the EIF and SNMP probes, alerts from both systems were integrated. The SNMP probe deployment also included the implementation of the Netcool Knowledge Library (NcKL) set of rules to correctly interpret the SNMP traps sent from the network devices. The JDBC gateway was installed to provide the facility to archive events.

Impact

Impact, which is tightly integrated with OMNIbus out of the box, was implemented to provide the integration between the OMNIbus alerts and the existing incident management system. In order to correctly identify alerts for auto-ticketing and to enrich the alerts with the required information for the ticketing system, a custom database schema was developed. The schema housed the identifying alert information, as well as the group to which the incident should be assigned.

To provide easy access to the enrichment data, Impact operator views were created. These views provide a web based interface to the enrichment database and allow operators to easily query and update the data.
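A minimal sketch of what such an enrichment lookup might look like follows. The actual schema is not described in detail here, so the table name, column names, and sample rows are all assumptions, and an in-memory SQLite database stands in for whatever database was used. The sketch is plain Python rather than an Impact policy.

```python
import sqlite3


def build_enrichment_db():
    """Create a hypothetical enrichment table with a few invented rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """CREATE TABLE ticket_enrichment (
               node TEXT NOT NULL,              -- identifies the alert source
               alert_group TEXT NOT NULL,       -- identifies the alert type
               auto_ticket INTEGER NOT NULL,    -- 1 = open an incident automatically
               assignment_group TEXT NOT NULL,  -- group that receives the incident
               PRIMARY KEY (node, alert_group)
           )"""
    )
    conn.executemany(
        "INSERT INTO ticket_enrichment VALUES (?, ?, ?, ?)",
        [
            ("sql-01", "SQLServer", 1, "DBA Team"),
            ("citrix-01", "Citrix", 0, "Desktop Team"),
        ],
    )
    return conn


def enrich(conn, alert):
    """Return a copy of the alert with ticketing fields added."""
    row = conn.execute(
        "SELECT auto_ticket, assignment_group FROM ticket_enrichment "
        "WHERE node = ? AND alert_group = ?",
        (alert["node"], alert["alert_group"]),
    ).fetchone()
    enriched = dict(alert)
    if row is None:
        # No matching entry: never auto-ticket an unidentified alert.
        enriched["auto_ticket"] = False
        enriched["assignment_group"] = None
    else:
        enriched["auto_ticket"] = bool(row[0])
        enriched["assignment_group"] = row[1]
    return enriched


conn = build_enrichment_db()
ticketable = enrich(conn, {"node": "sql-01", "alert_group": "SQLServer"})
unknown = enrich(conn, {"node": "web-09", "alert_group": "IIS"})
```

Keeping the auto-ticketing flag and the assignment group in a table rather than in policy code means operators can change them through a view without touching the integration logic.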

As previously mentioned, a new incident management system was being deployed at the same time as the Tivoli implementation. Therefore, Impact was actually utilized to integrate with both the old and new ticketing systems at the same time. Each integration was separately controlled, which enabled the company to turn off the integration with the old system as soon as the new system was live.
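The separately controlled integrations can be pictured as independent switches in front of each ticketing system. The sketch below is an assumption-laden illustration in Python, not the Impact policy that was deployed: the `TicketingIntegration` class, the `dispatch` function, and the stub ticket creators are hypothetical and make no real calls to HEAT or ServiceNow.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class TicketingIntegration:
    name: str
    create_ticket: Callable[[dict], str]  # returns a ticket identifier
    enabled: bool = True                  # independent on/off switch


def dispatch(alert: dict, integrations: List[TicketingIntegration]) -> Dict[str, str]:
    """Send the alert to every enabled integration and collect ticket IDs."""
    return {
        integration.name: integration.create_ticket(alert)
        for integration in integrations
        if integration.enabled
    }


# Stub creators standing in for the real system calls (hypothetical).
legacy = TicketingIntegration("legacy", lambda alert: "OLD-" + alert["node"])
replacement = TicketingIntegration("replacement", lambda alert: "NEW-" + alert["node"])

alert = {"node": "sql-01", "summary": "SQL Server service down"}
during_transition = dispatch(alert, [legacy, replacement])

# Once the new system is live, the old integration is simply switched off.
legacy.enabled = False
after_cutover = dispatch(alert, [legacy, replacement])
```

Because each integration has its own switch, cutting over requires no change to the alert flow itself.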

With all of the data collected into the Tivoli Data Warehouse (TDW) by the SCAPM agents, including the ITNM agent that captures network polling data, and with the events being collected in the event archive, a reporting tool was needed. Tivoli Common Reporting (TCR) was installed in conjunction with the Jazz for Service Management (JazzSM) component. TCR provides out-of-the-box reports for the SCAPM agents, the ITNM polling data and the event archive data. The JazzSM infrastructure not only provides the foundation for TCR but also provides additional functionality that will be utilized in future projects, such as capacity planning and dashboards for the VMware infrastructure and dashboards for SCAPM metrics. JazzSM will also facilitate the exchange of information between SCAPM and future implementations of Tivoli Business Service Manager (TBSM) and Tivoli Application Dependency Discovery Manager (TADDM).

Conclusion

By implementing the SCAPM suite of tools, the monitoring needs of this healthcare company were met: existing monitoring was replaced and the monitoring gaps were filled. Utilizing ITNM to discover and monitor the network, and to provide root cause diagnostics both between network devices and between network and server, fulfilled yet another monitoring requirement. OMNIbus and Impact were deployed to manage and correlate the events from the new monitoring tools and integrate with the incident management system.

TCR was implemented to provide the required reporting capabilities for all of the metrics collected by SCAPM and ITNM in the TDW and the archived events. By implementing the JazzSM infrastructure along with TCR, the capability exists for future integrations with other Tivoli products, as well as providing deeper functionality with existing products.

As the merger between the two healthcare companies moves forward, the new Tivoli infrastructure can easily be scaled to accommodate additional networks, servers and applications. In addition, the new implementation is positioned to be made highly available when this becomes a requirement.