Entities within the U.S. government are similar to those in the corporate world in that they need to understand what their infrastructure consists of, the state of their infrastructure and how the systems are functioning individually and as a unit.
Government networks and systems are typically highly secure. In this project, that meant working with systems administrators to determine exactly which policies needed to be modified to allow the proper access to perform discoveries of those systems and the applications running on them.
It also meant working with the network administrators to ensure the credentials supplied had the necessary access to not only discover the network devices but to read the necessary configuration objects from the network devices.
Once the proper monitoring is in place and the systems have been discovered, it is always a challenge to develop dashboards that accurately represent the systems since this is normally the first time the consumers of the dashboards have been able to visualize their systems in such a manner.
Tivoli Application Dependency Discovery Manager (TADDM) was used to find the solution to these problems.
Discovering the computer systems was not required to meet the monitoring or dashboard requirements of the project, but the data discovered from these systems had to be loaded into the asset management database. Because of the strict security requirements, the minimal set of privileges and system policy changes required for discovery had to be determined and tested. Tivoli Application Dependency Discovery Manager (TADDM) was used to perform these discoveries. Supplying the administrators with the credentials, libraries and binaries required was not difficult, but determining the exact changes needed to the government policies in place on the systems required extensive testing.
IBM Tivoli Network Manager (ITNM) was used to perform the network discoveries. Again, it was a routine task to provide the network administrators with the access requirements that ITNM needed in order to discover the devices as well as the configuration data within them. It was a much more difficult task for the network administrators to determine the minimal access required, the correct access control lists (ACLs) and the correct configuration changes to enable successful discoveries.
As mentioned previously, the network had existing rudimentary monitoring tools in place. This monitoring was left in place and supplemented by the new monitoring within ITNM. Therefore, the new alerts from ITNM and the alerts from the existing tools had to be integrated into the new event management system, OMNIbus. Prior to the implementation, operations utilized administrative tools from each vendor to gain a glimpse into the health of the network. With the implementation of OMNIbus, all events were fed into the same system and were easily visualized in one common location, the WebGUI within the Dashboard Application Services Hub (DASH).
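The idea of feeding alerts from several tools into one event list can be illustrated with a minimal sketch. It is not OMNIbus probe or ObjectServer code. The source names (`legacy_tool`, `itnm`), the raw field names, and the 0 to 5 severity scale are all assumptions made for illustration. The sketch maps each tool's record onto one common shape and counts repeats of the same problem instead of storing them twice.

```python
from dataclasses import dataclass


@dataclass
class Event:
    identifier: str   # key used to detect repeats of the same problem
    node: str
    summary: str
    severity: int     # 0 (clear) to 5 (critical); this scale is an assumption
    source: str
    tally: int = 1    # how many times this event has been received


def normalize(source: str, raw: dict) -> Event:
    """Map a tool-specific record onto the common Event shape.

    The source names and raw field names are hypothetical.
    """
    if source == "legacy_tool":
        node, summary, severity = raw["host"], raw["msg"], raw["level"]
    elif source == "itnm":
        node, summary, severity = raw["entity"], raw["description"], raw["sev"]
    else:
        raise ValueError(f"unknown source: {source}")
    return Event(f"{node}:{summary}", node, summary, severity, source)


def ingest(store: dict, event: Event) -> None:
    """Insert a new event, or fold a repeat into the existing entry."""
    existing = store.get(event.identifier)
    if existing is None:
        store[event.identifier] = event
    else:
        existing.tally += 1
        existing.severity = max(existing.severity, event.severity)
```

With this shape, a "Link down" report for the same device from both the existing tool and the new monitoring collapses into one row with a tally of 2, which is the kind of single, de-duplicated view the operators gained.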
New WebGUI tools were also created to provide launch-in-context capabilities from the events to external third-party log analytics and network monitoring tools.
ITNM, coupled with OMNIbus, also provides Root Cause Analysis (RCA) functionality without any customization, which enables operators to focus on only the most critical alerts. Depending upon the role of the individual, the alerts can be viewed within the topology of the network devices in ITNM or, together with alerts from all monitoring tools, within DASH. Impact was also used to enrich the events with additional information for the network operators.
In addition to viewing the alerts with ITNM and DASH, the operators were also provided with dashboards created using a combination of Tivoli Business Service Manager (TBSM) and DASH. ITNM was used to perform the network discoveries, and this data was then exported using the built-in Discovery Library Adapter functionality and imported into TADDM. The underlying service model for the dashboards was then configured in TBSM, and the XML Toolkit was used to automatically import the resources now in TADDM into TBSM to provide a hierarchical view.
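A hierarchical service model is useful on a dashboard because status can propagate upward from devices to the services that depend on them. The sketch below is a generic illustration, not TBSM configuration. The `ServiceNode` class, the example names, and the "worst child wins" rule are assumptions; in a real service model the propagation rules are configurable.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ServiceNode:
    name: str
    own_severity: int = 0                      # 0 = healthy, 5 = critical (assumed scale)
    children: List["ServiceNode"] = field(default_factory=list)

    def status(self) -> int:
        """Return the worst severity found in this node or anything below it."""
        return max([self.own_severity] + [c.status() for c in self.children])


# Hypothetical hierarchy: a site contains two switches.
switch_a = ServiceNode("switch-a", own_severity=0)
switch_b = ServiceNode("switch-b", own_severity=4)
site = ServiceNode("site-1", children=[switch_a, switch_b])
network = ServiceNode("network", children=[site])
```

Here `network.status()` evaluates to 4 because the problem on `switch-b` rolls up through `site-1`, so a single top-level tile can signal that something beneath it needs attention.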
Providing the system and network administrators with the exact requirements of the tools is essential to a successful implementation.
Once the discovery and monitoring tools were in place and functioning, the process of designing the dashboards was similar to other projects. Ensuring that the consumers of the dashboards are involved from the very first iteration of the dashboards until the final design is imperative. The dashboards should be designed by those consumers and the consultant's role is to guide them and help them stay within the confines of the tools.
Using ITNM, OMNIbus, Impact, TADDM and TBSM enabled this organization to understand the state of all network devices, to quickly determine the root cause of an outage and to visualize the devices in easy-to-use, easily understood dashboards. Importing the discovered data into SmartCloud Control Desk (SCCD) provided the foundation for their future asset management needs.