Performance Engineering Best Practices and Methodology

[Best Practices] Performance Monitoring - overview and terminology


Performance monitoring gives you up-to-date information about how your application operates under load. It helps identify bottlenecks and verify whether the application meets its performance objectives by collecting metrics that characterize the application’s behavior under different workload conditions (load, stress, or single-user operation). These metrics should map directly to those defined in the performance objectives. Typical examples are response time, throughput, and resource utilization (e.g. CPU, memory, disk I/O, network bandwidth).

Without a good understanding of these metrics, it is very difficult to draw the right conclusions and/or pinpoint the bottleneck when analyzing performance results.


Performance Terminology
Quantitative aspects of performance testing are gathered during the monitoring phase. Let’s take a closer look at the main terms used in performance monitoring.


Two of the most important measures of system behavior are bandwidth and throughput. Bandwidth is a measure of capacity: the maximum rate at which work can be completed. Throughput measures the actual rate at which work requests are completed.
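
To make the distinction concrete, the following minimal Python sketch compares a measured throughput against an assumed bandwidth. The function names and the sample figures (4,500 requests in 60 seconds, a capacity of 100 requests per second) are hypothetical and chosen only for illustration.

```python
def throughput_rps(completed_requests: int, elapsed_seconds: float) -> float:
    """Actual rate at which work requests were completed."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    return completed_requests / elapsed_seconds


def capacity_in_use(throughput: float, bandwidth: float) -> float:
    """Fraction of the available capacity (bandwidth) consumed by the workload."""
    if bandwidth <= 0:
        raise ValueError("bandwidth must be positive")
    return throughput / bandwidth


# Hypothetical measurement: 4,500 requests completed in a 60-second window
# on a system assumed to complete at most 100 requests per second.
measured = throughput_rps(4500, 60.0)
print(f"throughput: {measured:.1f} req/s")
print(f"capacity in use: {capacity_in_use(measured, 100.0):.0%}")
```

In this made-up run the system completes 75 requests per second, which is three quarters of its assumed capacity. In practice the bandwidth figure is rarely known up front and usually has to be estimated from a stress test.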

Throughput can vary depending on the number of users applied to the system under test. It is usually measured in requests per second. In some systems, throughput drops when there are many concurrent users; in others, it remains constant under pressure but latency begins to suffer, usually due to queuing. How busy the various resources of a computer system get is known as their utilization, typically expressed as the share of time a resource spends doing work.
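
As a rough illustration, the Python sketch below computes throughput and utilization for several load levels and reports the level at which throughput peaks. The `LoadStep` record and all numbers are invented for this example; they are not measurements from a real system.

```python
from typing import NamedTuple


class LoadStep(NamedTuple):
    """One step of a hypothetical load test."""
    users: int                 # concurrent users applied to the system
    completed_requests: int    # requests finished during the step
    duration_seconds: float    # length of the observation window
    busy_seconds: float        # time the resource spent doing work


def throughput(step: LoadStep) -> float:
    """Completed requests per second for this step."""
    return step.completed_requests / step.duration_seconds


def utilization(step: LoadStep) -> float:
    """Fraction of the observation window during which the resource was busy."""
    return step.busy_seconds / step.duration_seconds


# Invented results: throughput rises with load, then falls once the
# resource is saturated.
steps = [
    LoadStep(users=10, completed_requests=3000, duration_seconds=60.0, busy_seconds=15.0),
    LoadStep(users=50, completed_requests=12000, duration_seconds=60.0, busy_seconds=48.0),
    LoadStep(users=100, completed_requests=13200, duration_seconds=60.0, busy_seconds=57.0),
    LoadStep(users=200, completed_requests=10800, duration_seconds=60.0, busy_seconds=60.0),
]

for step in steps:
    print(f"{step.users:>4} users: {throughput(step):6.1f} req/s, "
          f"utilization {utilization(step):.0%}")

peak = max(steps, key=throughput)
print(f"throughput peaks at {peak.users} users")
```

With this sample data, throughput peaks at 100 users and declines at 200 users, where the resource is busy for the whole window. That is the first of the two behaviors described above.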


The key measures of the time it takes to perform specific tasks are service time, queue time, and response time:

  • Service time measures how long it takes to process a specific customer work request.
  • Queue time is the delay a request incurs while it waits to be serviced. When a work request arrives at a busy resource and cannot be serviced immediately, it is queued and waits until the resource becomes available.
  • Response time is the most important metric. It can be measured at the server or at the client. Latency measured at the server is the time the server takes to complete the execution of a request; it does not include client-to-server latency, which is the additional time the request and response take to cross the network. Latency measured at the client includes the time spent in the request queue, the time the server takes to execute the request, and the network latency.
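
The relationship between these measures can be sketched in a few lines of Python. The `RequestTiming` class and its field names are hypothetical, and the sketch assumes the definitions given in the list: server-side latency covers only the execution of the request, while client-side latency adds queue time and network time. Whether queue time shows up in a server-side measurement in practice depends on where the instrumentation sits.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RequestTiming:
    """Timing breakdown of a single request, in milliseconds (illustrative)."""
    queue_ms: int     # time spent waiting in the request queue
    service_ms: int   # time the server spent executing the request
    network_ms: int   # time for request and response to cross the network

    @property
    def server_latency_ms(self) -> int:
        # Latency as seen at the server: execution time only.
        return self.service_ms

    @property
    def client_latency_ms(self) -> int:
        # Latency as seen at the client: queue + execution + network.
        return self.queue_ms + self.service_ms + self.network_ms


# Made-up example: 40 ms queued, 120 ms executing, 30 ms on the network.
timing = RequestTiming(queue_ms=40, service_ms=120, network_ms=30)
print(f"server-side latency: {timing.server_latency_ms} ms")
print(f"client-side latency: {timing.client_latency_ms} ms")
```

For this example the server reports 120 ms while the client observes 190 ms. A gap like this is a hint to look at queuing and the network before tuning the request handler itself.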

    This post is part of the Performance Monitoring Best Practices series - you can find all of the posts under the PerfMonitoring tag.