Network problems are some of the most difficult to fix because there are so many “moving parts” within every network.
Tools are available that let operators quickly detect, isolate, and troubleshoot abnormal network behavior, which makes these problems easier to identify. With these tools, operators can also record the steps taken so far to troubleshoot or resolve a problem. An operator can:
Rapidly detect, isolate, and correct network problems:
Monitor table views that contain critical nodes and interfaces
Watch incident views for incidents with status other than normal
Watch map views for icons that change color to yellow or red
Investigate and diagnose problems
Annotate information for future diagnosis
Look for historical information to proactively monitor the network
Event pipeline and causal engine provide deterministic root cause analysis
NNMi’s causal engine plays a major role in its ability to resolve problems quickly.
NNMi includes modules for automated event correlation and root cause analysis (RCA). These capabilities are powered by the event pipeline—the event correlation module. The event pipeline contains multiple processes that act on the event stream to correlate and condition events for the causal engine.
The causal engine is a virtual machine that continuously analyzes the condition of monitored objects. It relies on the continuous spiral discovery process, so it can adapt to topology changes as they happen. Ultimately, the causal engine provides automated, deterministic RCA (as opposed to probabilistic RCA) to reduce mean time to repair (MTTR) and increase operator efficiency, while letting you define which events are relevant and service-impacting.
An example of NNMi’s deterministic RCA is how it determines that a node is down. The first indication that a node is down comes from an unsuccessful poll or an SNMP trap. Before the causal engine declares the node down, it checks that the interfaces connecting to the device also report it as down. This filters out false alarms caused by a single failed poll. It also prevents reporting a node as down when it is actually up but unreachable because something between that node and the network management station is down and blocking communication.
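The node-down reasoning described above can be illustrated with a small sketch. This is not NNMi code or its API; the names `Node`, `NodeStatus`, `classify`, and the `neighbor_link_up` field are hypothetical, and the rule is a simplified reading of the paragraph: a failed poll only becomes "down" when a reachable neighbor confirms that its link to the node is down.

```python
from dataclasses import dataclass, field
from enum import Enum


class NodeStatus(Enum):
    UP = "up"
    DOWN = "down"
    UNREACHABLE = "unreachable"  # poll failed, but "down" is not confirmed


@dataclass
class Node:
    name: str
    poll_ok: bool  # did the last poll from the management station succeed?
    # neighbor name -> does that neighbor's interface facing this node report link up?
    neighbor_link_up: dict = field(default_factory=dict)


def classify(node, nodes):
    """Deterministic rule: a failed poll alone never means DOWN."""
    if node.poll_ok:
        return NodeStatus.UP
    # Only neighbors that still answer polls can provide trustworthy evidence.
    evidence = [
        link_up
        for neighbor, link_up in node.neighbor_link_up.items()
        if nodes[neighbor].poll_ok
    ]
    # DOWN only if at least one reachable neighbor exists and all of them
    # report the connecting interface as down.
    if evidence and not any(evidence):
        return NodeStatus.DOWN
    # Otherwise the fault may lie between the station and the node.
    return NodeStatus.UNREACHABLE


# Assumed topology: management station -> gw -> core -> edge
nodes = {
    "gw": Node("gw", poll_ok=True),
    "core": Node("core", poll_ok=False, neighbor_link_up={"gw": False, "edge": False}),
    "edge": Node("edge", poll_ok=False, neighbor_link_up={"core": False}),
}
results = {name: classify(node, nodes) for name, node in nodes.items()}
```

In this assumed topology, `core` is classified as down because the reachable neighbor `gw` confirms the link is down, while `edge` is only unreachable: its sole neighbor cannot be polled, so nothing confirms that `edge` itself has failed.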