Guest post by Ramakrishna Baipadithaya Kenchabhatre and Sunil Lingappa
HP Operations Agent R&D Leads
As a system expert, have you ever been approached by users complaining that their systems are slow or unresponsive, only to find that system tools like Task Manager still leave the actual bottleneck untraceable? If so, HP GlancePlus and related HP products are the flagship tools for troubleshooting and debugging system performance problems in standalone and virtualized environments. These tools offer a rich metric set that spans system elements such as CPU, disk, memory, and network.
Now let us consider a real-world scenario: Tom Smith, a software programmer, complains that his test system has been slow for the past few days and refuses to run any newer applications. Martin, the system expert, logs in to the system and invokes his favorite tool, glance (part of the HP GlancePlus package).
On the first screen, Martin gets an overview of the entire system, including the top processes.
He immediately figures out that this is a standalone Linux system and that there is pressure on both CPU and memory resources. Using the integrated help of glance, he determines that the metric GBL_CPU_TOTAL_UTIL is at about 70 percent. Martin also sees the ‘top’ processes listed on the home screen, ordered by CPU utilization.
He immediately sees that the process ‘dbmon’ is listed right at the top, which hints that something is wrong with the DB application running on the system. He switches over to the Application View (an application is a user-defined group of related processes collected under a single bucket). His guess is correct: DBServer is listed as the top CPU-consuming application on the system. (Applications are sorted by the value of the metric APP_CPU_TOTAL_UTIL.)
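To make the idea of an application as a bucket of related processes concrete, here is a minimal Python sketch. It is not GlancePlus's own configuration syntax or implementation: the pattern table, the helper names, and the sample CPU figures are all made up for illustration. It only shows the general technique of matching process names to a user-defined group, summing their CPU utilization, and sorting the groups from highest to lowest.

```python
from collections import defaultdict
from fnmatch import fnmatchcase

# Hypothetical application definitions: application name -> process-name patterns.
APP_PATTERNS = {
    "DBServer": ["db*"],
    "WebServer": ["httpd", "nginx"],
}


def classify(process_name, app_patterns):
    """Return the first application whose patterns match, else 'other'."""
    for app, patterns in app_patterns.items():
        if any(fnmatchcase(process_name, pattern) for pattern in patterns):
            return app
    return "other"


def rank_applications(processes, app_patterns):
    """Sum per-process CPU utilization per application, highest first."""
    totals = defaultdict(float)
    for name, cpu_pct in processes:
        totals[classify(name, app_patterns)] += cpu_pct
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


# Made-up sample of (process name, CPU %) pairs, not real glance output.
sample = [("dbmon", 45.0), ("dbwriter", 12.5), ("httpd", 6.0), ("bash", 0.5)]
ranking = rank_applications(sample, APP_PATTERNS)
print(ranking)
```

With this sample data the DBServer bucket lands at the top of the ranking, which mirrors what Martin sees in the Application View.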
The screen below shows the list of applications running on the system:
Martin’s curiosity doesn’t stop here. He drills down further to the list of processes that contribute to this application’s resource usage.
The screen below shows the list of processes under the application ‘DBServer’:
He sees the list of all processes associated with DBServer. He is now interested in understanding the complete behavior of the ‘dbmon’ process.
Detailed drill-down screen of the ‘dbmon’ process:
Drilling down further into the ‘dbmon’ process, he can view the different metrics associated with it, such as process CPU utilization, RSS and VSS memory patterns, I/O rate, and page fault rate.
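Rate-style metrics such as I/O rate and page fault rate are typically derived by sampling cumulative counters twice and dividing the difference by the elapsed time. The Python sketch below illustrates that general technique only. The `ProcSample` fields, the `rates` helper, and the numbers are hypothetical, and this is not a statement of how glance itself computes its process metrics.

```python
from dataclasses import dataclass


@dataclass
class ProcSample:
    """One hypothetical snapshot of a process's cumulative counters."""
    timestamp: float    # seconds since some fixed reference point
    cpu_seconds: float  # cumulative CPU time consumed by the process
    io_ops: int         # cumulative I/O operations
    page_faults: int    # cumulative page faults


def rates(prev, curr):
    """Turn two cumulative samples into per-interval rates."""
    interval = curr.timestamp - prev.timestamp
    if interval <= 0:
        raise ValueError("samples must be in increasing time order")
    return {
        # CPU seconds used per wall-clock second, as a percentage of one CPU.
        "cpu_util_pct": 100.0 * (curr.cpu_seconds - prev.cpu_seconds) / interval,
        "io_rate": (curr.io_ops - prev.io_ops) / interval,
        "page_fault_rate": (curr.page_faults - prev.page_faults) / interval,
    }


# Made-up samples taken five seconds apart.
before = ProcSample(timestamp=0.0, cpu_seconds=120.0, io_ops=5000, page_faults=800)
after = ProcSample(timestamp=5.0, cpu_seconds=123.5, io_ops=5250, page_faults=900)
result = rates(before, after)
print(result)
```

A process that keeps a high CPU percentage across many intervals while its I/O and page fault rates stay modest points toward a compute-bound problem in the application rather than a disk or memory shortage.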
Martin concludes that the ‘dbmon’ process is consuming more CPU than expected and that the potential issue lies with the application itself, not with the system configuration.
Martin collates all this information and gets back to Tom with the analysis, so that further action can be taken from an application perspective.
In the above troubleshooting scenario, we see how the HP GlancePlus tool can be used effectively in a top-down manner (from the global level down to the application level and finally to the process level) to drill down and troubleshoot performance bottlenecks. GlancePlus helps to reduce the average triage time (from hours to minutes) and the mean time to repair (MTTR) by pinpointing issues in a systematic manner. The screens of GlancePlus are arranged logically by resource type to make system troubleshooting easier. In the next blog post, we will see some tips and tricks for using HP system performance tools to proactively detect system bottlenecks.
Have you had a similar troubleshooting experience? What was the most direct way you found to solve the problem? Feel free to share your experience in the comments section below. I am sure other readers want to hear about it too.