
Troubleshooting system performance bottlenecks using GlancePlus

HPE-SW-Guest

Guest post by Ramakrishna Baipadithaya Kenchabhatre and Sunil Lingappa

HP Operations Agent R&D Leads 

 

Have you, as a system expert, ever been approached by users complaining that their systems are slow or unresponsive, only to find that system tools like Task Manager cannot trace the actual bottleneck? If so, HP GlancePlus and its related suite of products are the flagship tools for troubleshooting and debugging system performance problems in standalone and virtualized environments. These tools offer a rich metric set spanning system elements such as CPU, disk, memory, and network.

Now let us consider a real-world scenario: Tom Smith, a software programmer, complains that his test system has been slow for the past few days and refuses to run any newer applications. Martin, the system expert, logs in to the system and invokes his favorite tool, glance (part of the HP GlancePlus package).

 

On the first screen, Martin gets an overview of the entire system, including the top processes.

 

He immediately figures out that this is a standalone Linux system and that both CPU and memory resources are under pressure. Using glance's integrated help, he identifies the metric GBL_CPU_TOTAL_UTIL, which stands at about 70 percent. Martin also sees the ‘top’ processes listed on the home screen, ordered by CPU utilization.

 

He immediately sees that the process ‘dbmon’ is listed right at the top, which hints that something is wrong with the DB application running on the system. He switches to the Application view (an application is a user-defined group of related processes collected under a single bucket). His guess is correct: DBServer is listed as the top CPU-consuming application on the system. (Applications are sorted by the value of the metric APP_CPU_TOTAL_UTIL.)
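To make the idea of an application as a bucket of processes concrete, here is a minimal Python sketch. It is not how GlancePlus is implemented or configured: the pattern table, the helper names, and the sample figures are all hypothetical, and it assumes for illustration that an application's CPU figure is simply the sum of its member processes' CPU percentages.

```python
from collections import defaultdict
from fnmatch import fnmatchcase

# Hypothetical application definitions: application name -> process-name patterns.
# GlancePlus keeps its real definitions in its own configuration, not reproduced here.
APP_PATTERNS = {
    "DBServer": ["db*"],
    "WebServer": ["httpd", "nginx"],
}


def app_for(proc_name):
    """Return the first application whose patterns match, else 'other'."""
    for app, patterns in APP_PATTERNS.items():
        if any(fnmatchcase(proc_name, pattern) for pattern in patterns):
            return app
    return "other"


def rank_applications(processes):
    """Sum per-process CPU percentages by application, highest total first."""
    totals = defaultdict(float)
    for name, cpu_pct in processes:
        totals[app_for(name)] += cpu_pct
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


# Made-up sample of (process name, CPU %) pairs, for illustration only.
sample = [("dbmon", 41.0), ("dbwriter", 12.5), ("httpd", 6.0), ("sshd", 0.5)]
ranking = rank_applications(sample)
for app, cpu in ranking:
    print(f"{app:10s} {cpu:5.1f}")
```

With this sample, the DBServer bucket ranks first because two of its processes contribute to the total, which is the same kind of signal Martin reads off the Application view.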

 

The screen below shows the list of applications running on the system:

 

 

Martin’s curiosity doesn’t stop here: he drills down further to the list of processes that contribute to this application’s resource usage.

 

The screen below shows the list of processes under the application ‘DBServer’:

 

 

He sees the list of all processes associated with DBServer. He now wants to understand the complete behavior of the ‘dbmon’ process.

 

Detailed drill-down screen of the ‘dbmon’ process:

 

Drilling down further into the ‘dbmon’ process, he can view the metrics associated with it, such as process CPU utilization, RSS and VSS memory patterns, I/O rate, and page fault rate.
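For readers who want a feel for what these per-process figures mean, the Python sketch below parses one line in the format of Linux's /proc/<pid>/stat file. This is only an illustration of where comparable raw numbers come from on Linux, not a description of how glance collects its metrics. The function name and the sample line are made up, and the 4096-byte page size and 100 ticks per second are assumed defaults that vary by system.

```python
def parse_proc_stat(line, page_size=4096, ticks_per_sec=100):
    """Extract a few per-process counters from a /proc/<pid>/stat line.

    page_size and ticks_per_sec are assumed defaults; on a real system they
    can be read with os.sysconf("SC_PAGE_SIZE") and os.sysconf("SC_CLK_TCK").
    """
    # The command name (field 2) is parenthesised and may contain spaces,
    # so split on the last ')' before tokenising the remaining fields.
    head, _, tail = line.rpartition(")")
    pid_str, _, comm = head.partition("(")
    fields = tail.split()  # fields[0] is field 3 (process state)

    def field(n):
        """Return 1-based stat field n (n >= 4) as an integer."""
        return int(fields[n - 3])

    return {
        "pid": int(pid_str),
        "name": comm,
        "cpu_seconds": (field(14) + field(15)) / ticks_per_sec,  # utime + stime
        "minor_faults": field(10),
        "major_faults": field(12),
        "vss_bytes": field(23),              # virtual size, already in bytes
        "rss_bytes": field(24) * page_size,  # resident set size, in pages
    }


# Made-up sample line, for illustration only.
sample_line = (
    "1234 (dbmon) R 1 1234 1234 0 -1 4194304 500 0 3 0 "
    "7000 1500 0 0 20 0 4 0 100000 104857600 2560"
)
stats = parse_proc_stat(sample_line)
print(stats)
```

These values are cumulative counters. Rates such as CPU utilization or page faults per second come from sampling twice and dividing the difference by the interval.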

 

Martin concludes that the ‘dbmon’ process is consuming more CPU than expected and that the issue likely lies with the application itself rather than with the system configuration.

 

Martin collates all this information and gets back to Tom with the analysis, so that further action can be taken from an application perspective.

 

In the troubleshooting scenario above, we see how the HP GlancePlus tool can be used effectively in a top-down manner (from the global level down to the application level and finally to the process level) to drill down and troubleshoot performance bottlenecks. GlancePlus helps reduce the average triage time (from hours to minutes) and the mean time to repair (MTTR) by pinpointing issues in a systematic manner. Its screens are arranged logically by resource type to make system troubleshooting easier. In the next blog post, we will see some tips and tricks for using HP system performance tools to proactively detect system bottlenecks.

 

This is the second part of our three-part series on GlancePlus. Read the first article, "Performance analysis in virtualized environments using GlancePlus." Look for the third part next week.

 

Have you had a similar troubleshooting experience? What was the most direct way you found to solve the problem? Feel free to share your experience in the comments section below; I am sure other readers want to hear about it too.

 

  • infrastructure management
Comments
Honored Contributor

(I tried posting earlier, but it crashed. Let's try again, but keep it shorter).

 

I think this is a very poor example of the potential of Glance. The problem has existed for days, Operations is only responding due to customer complaints, and then they don't actually resolve the problem. All they find out is the same information they could have seen with "top" - available for free, and usually installed by default. 

 

This looks like the troubleshooting we did 15+ years ago. Times have moved on. Now we need to be seeing early warnings of these sorts of issues, in one central place - not manually going out and running commands. It's also not enough to just say "oh it's that process, you better go and fix it" - we need to be investigating why the CPU is high - e.g. is it because GC is running constantly in the Java app, due to underallocation of memory?

 

The approach shown here works when you have 10 servers, but doesn't work when you have 500. 

 

PS: As an aside, no developer I know refuses to use newer tools - usually quite the opposite.

HPE Blogger

Thank you Lindsay for your comment,

On-server performance diagnostics and a central console for monitoring performance issues across multiple machines are separate use cases.

HP Performance Manager, along with HP Operations agents, can provide real-time central performance diagnostic capabilities, plus on-demand performance graphs and tables for usage trend analysis and correlation. That’s a great tool for monitoring performance issues from a central location across thousands of nodes.

But for a system administrator looking to do on-server performance diagnostics, especially on UNIX and Linux, glance is really handy. It is still a best-in-industry UNIX performance diagnostic tool, with the ability to drill down to the system calls.

 

 Sanjay Chaudhary <csanjay@hp.com>
