HP AppPulse mobile is a Big Data, Software as a Service (SaaS) application for monitoring real user experience in mobile applications. It focuses on what matters most: users using the application on their mobile devices.
We chose Vertica as the application database platform because it suits our big data needs and can handle a mixed workload. It also offers advanced analytics on top of standard SQL.
Our application’s Vertica mixed workload consists mainly of two activities:
Load: High volume streams of events broadcast by the mobile devices of the application tenants pour into the Vertica high-end cluster, where the data is transformed and aggregated, before ending up as multi-dimensional views in the HP AppPulse mobile dashboard.
Reporting: HP AppPulse mobile end users concurrently run “near” real-time and historical reports from the application web UI. Reporting covers summary reports, drill-down use cases, and analytics reports with various multi-dimensional slice-and-dice views.
In addition to these activities, Vertica had to handle:
Optimal compression for reducing Total Cost of Ownership.
Offline tasks such as purging, statistics collection and aggregation of the application big data.
Vertica Application-Focused Monitoring using SiteScope
Typical database monitors can usually monitor database host resources and service health. Online dashboards showing database load activity and health are also quite common (Vertica Management Console is a good tool to this end).
However, in our use case, we found it essential to add an “application-focused monitoring” perspective that concentrates on the application's mixed workload challenge while taking into account the special architecture and physical design of the Vertica cluster.
We took HP SiteScope’s out-of-the-box HP Vertica JDBC Monitor and enriched it with the following application-focused counters to serve our challenging use cases:
ROS containers growth during load per projection per node
Partitions growth and purging by table
Top slow reports
Invalid objects in a schema
Detection of resource management bottlenecks
The personas using the monitor are performance engineers, DBAs, and application operators (“DevOps” stakeholders).
Our customized SiteScope counters are generic enough to be used by other applications that use Vertica in a similar manner to HP AppPulse mobile. This motivated us to develop a template that can be deployed in any SiteScope system to monitor any Vertica server with applications similar to ours, either in a cluster or in a single-node environment.
In this example, we will show a typical workflow for using SiteScope together with other monitoring tools and techniques to detect, isolate, and resolve resource management bottlenecks in a running system. SiteScope is used as the first layer for reporting a resource management issue. Thereafter, we use other application monitoring tools and steps to drill down into the information collected by SiteScope.
In a SaaS Cluster with many applications, each with its own mixed workload and access patterns to Vertica, it is important to detect resource management bottlenecks. As per Vertica’s best practices (Admin Guide), it is common to use dedicated Vertica resource pools for different users/workloads and applications, instead of sharing global pools.
We used the HP AppPulse mobile customized "Resource Pool Waits" monitor counter to show the number of transactions that waited in a specific resource pool in a Vertica cluster during the last hour. The counter indicates either an issue with a certain workload resource (for example, queries waiting) or a general resource issue within the Vertica cluster, such as a database host running out of memory.
The counter is only displayed in the dashboard if there are resource waits in a certain pool (the Error Threshold is 100 delayed transactions per hour).
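The display rule above can be illustrated with a minimal Python sketch. It assumes the per-pool wait counts for the last hour have already been collected (the post does not show how the counter is computed), and it treats a count equal to the threshold as an error, which is also an assumption. The names `PoolStatus` and `evaluate_pools` and the pool names in the usage example are hypothetical.

```python
from dataclasses import dataclass

# Error threshold quoted in the post: 100 delayed transactions per hour.
ERROR_THRESHOLD = 100


@dataclass(frozen=True)
class PoolStatus:
    pool_name: str
    waits: int
    status: str  # "error" or "below_threshold"


def evaluate_pools(waits_by_pool, threshold=ERROR_THRESHOLD):
    """Return one entry per pool that had at least one wait in the last hour.

    Pools with no waits are skipped, mirroring the rule that the counter
    is only displayed when a pool actually has resource waits.
    """
    statuses = []
    for pool_name, waits in sorted(waits_by_pool.items()):
        if waits <= 0:
            continue  # nothing waited in this pool, so nothing to display
        # Assumption: reaching the threshold already counts as an error.
        status = "error" if waits >= threshold else "below_threshold"
        statuses.append(PoolStatus(pool_name, waits, status))
    return statuses


if __name__ == "__main__":
    # Hypothetical pool names and counts, for illustration only.
    for entry in evaluate_pools({"reporting_pool": 250, "load_pool": 0}):
        print(entry)
```

Only pools in the "error" state would raise an alert in this sketch; pools below the threshold are listed but not alerted on.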
Resource Pool - Problem Isolation Flow:
1. The SiteScope error status is displayed in the application operator's SiteScope Dashboard. The counter shows many waits (delayed transactions), along with the name of the resource pool that encountered the issue. An alert is triggered and sent to application operators and DBAs.
2. Application owners correlate with other monitoring tools (SiteScope or Vertica Management Console) to view database node resource metrics (memory usage, CPU utilization, file handles, I/O waits) for the period during which the transactions waited. No issue is shown.
Vertica MC Dashboard:
3. Now it is time for a deeper drill down into Vertica’s system tables.
The DBA uses Vertica system tables to drill down into the resource pool issue and determine which workload is affected, and by which resource. The DBA filters by the resource pool name reported by SiteScope.
The “RESOURCE_QUEUES” system table shows many resource waits for reporting system queries. The “RESOURCE_REJECTIONS” system table points to “Memory(KB)” as the problematic resource in this specific pool, as demonstrated by the results of the following query:
select reason, resource_type from resource_rejections where pool_name = '<resource pool name>';
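As a hedged illustration of the drill-down step, the sketch below runs the same query with the pool name passed as a bound parameter rather than pasted into the SQL text. To keep it runnable anywhere, an in-memory `sqlite3` table stands in for Vertica's system table; the demo rows, the pool names, and the helper names are hypothetical, and the `?` placeholder is `sqlite3`'s style (a Vertica driver may use a different one).

```python
import sqlite3

# The drill-down query from the post, with the pool name as a bound parameter.
REJECTIONS_SQL = (
    "SELECT reason, resource_type FROM resource_rejections WHERE pool_name = ?"
)


def rejections_for_pool(conn, pool_name):
    """Return (reason, resource_type) rows for a single resource pool."""
    cur = conn.cursor()
    cur.execute(REJECTIONS_SQL, (pool_name,))
    return cur.fetchall()


def make_demo_connection():
    """Build an in-memory stand-in for the system table (demo data only)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE resource_rejections "
        "(pool_name TEXT, reason TEXT, resource_type TEXT)"
    )
    conn.executemany(
        "INSERT INTO resource_rejections VALUES (?, ?, ?)",
        [
            ("reporting_pool", "demo reason text", "Memory(KB)"),
            ("load_pool", "demo reason text", "demo_other_resource"),
        ],
    )
    return conn


if __name__ == "__main__":
    conn = make_demo_connection()
    for reason, resource_type in rejections_for_pool(conn, "reporting_pool"):
        print(resource_type, "-", reason)
```

Filtering on the pool name that SiteScope reported keeps the result focused on the workload that actually waited.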
Using the SiteScope Vertica monitor together with other monitoring tools and techniques, we were able to:
Detect the resource pool memory issue in the Vertica cluster.
Resolve the issue by increasing the resource pool memory allocation to handle the workload.
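The post does not say which pool parameter was changed or to what value. Purely as a sketch, assuming the fix was applied through Vertica's `ALTER RESOURCE POOL ... MEMORYSIZE` statement, a small helper could build the statement while validating its inputs. The helper name, the pool name, and the size in the usage example are hypothetical.

```python
import re

# Conservative patterns: a plain SQL identifier, and a size such as 512M or 8G.
_IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")
_SIZE = re.compile(r"\d+[KMGT]")


def build_memory_increase(pool_name, new_memory_size):
    """Build an ALTER RESOURCE POOL statement that sets MEMORYSIZE.

    Assumption: MEMORYSIZE is the parameter to raise; the post only says
    that the pool's memory allocation was increased.
    """
    if not _IDENTIFIER.fullmatch(pool_name):
        raise ValueError(f"unexpected pool name: {pool_name!r}")
    if not _SIZE.fullmatch(new_memory_size):
        raise ValueError(f"unexpected memory size: {new_memory_size!r}")
    return f"ALTER RESOURCE POOL {pool_name} MEMORYSIZE '{new_memory_size}';"


if __name__ == "__main__":
    # Hypothetical pool name and size, for illustration only.
    print(build_memory_increase("reporting_pool", "8G"))
```

Validating the name and size before building the statement avoids sending malformed or injected SQL to the cluster; the right size depends on the workload and the memory available on each node.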
Here are a few other blog posts where you can learn about HP AppPulse mobile: