I visited a customer earlier this year who over-allocates up to 500 percent on hosts and arrays, with no consolidated host and array capacity management tools in place. After seeing their environment, it was clear that a disaster was waiting to happen. I predicted downtime on three large arrays during the retailer’s busiest time of the year.
About Thin Provisioning
As customers aim to control the cost of growing storage capacity demands, they move away from physical environments toward virtual storage environments. As the demand for productivity continues to increase, hardware and resource budgets decrease, forcing many organizations to re-evaluate their data center designs in the midst of rapidly accelerating data growth. With business-critical data protection challenging storage teams, many IT decision makers direct their attention toward maximizing storage return on investment (ROI) and reducing costs. Hence, customers place much greater demands on storage vendors for hardware solutions with greater efficiency and advanced features at attractive, competitive pricing.
Virtualization of your environment is not done to improve performance; it's done to make the environment cheaper and easier to manage.
Thin provisioning, in a shared-storage environment, provides a method for optimizing utilization of available storage. It relies on on-demand allocation of blocks of data rather than the traditional method of allocating all of the blocks up front. This methodology eliminates almost all whitespace, helping to avoid the poor utilization rates, often as low as 10 percent, that occur with the traditional allocation method, where large pools of storage capacity are allocated to individual servers but remain unused (never written to). This traditional model is called thick provisioning.
Over-allocation or over-subscription is a mechanism that allows a server to view more storage capacity than has been physically reserved on the storage array itself. This allows flexibility in the growth of storage volumes, without having to predict accurately how much a volume will grow. Instead, volumes grow block by block as data is written. Physical storage capacity on the array is dedicated only when data is actually written by the application, not when the storage volume is initially allocated. The servers, and by extension the applications that reside on them, view a full-size volume from the storage, but the storage itself only allocates the blocks of data when they are written. The whole point of over-allocation, whether at the array or at the host level, is to allow a host to run with the storage that it needs, and to avoid giving it the storage that it might need in the future. After all, you are paying for this storage, so the last thing you want is to pay for capacity you will never use.
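To put a number on over-allocation, the over-subscription ratio is simply the capacity advertised to hosts divided by the capacity physically installed. A minimal sketch (the function and figures are illustrative, not part of any vendor tool):

```python
def oversubscription_ratio(advertised_tb, physical_tb):
    """Ratio of capacity presented to hosts vs. capacity actually installed."""
    return advertised_tb / physical_tb

# An array presenting 500 TB of thin volumes backed by 100 TB of
# physical disk is over-allocated 5:1 -- the 500 percent situation
# described in the opening anecdote.
print(oversubscription_ratio(500, 100))  # -> 5.0
```

Anything above 1.0 means the array is making a promise it cannot keep if every host writes to its full allocation at once.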
Much of the thin vs. thick provisioning decision comes down to how much and how fast you expect your data to grow, how much physical disk space you have right now, and how many hosts you expect to add in the near future.
Thick on thin
If your array supports thin provisioning, you’ll generally get more efficiency from array-level thin provisioning in most operational models. If you thick provision at the LUN or file system level, there will always be large amounts of unused space until utilization grows high. This is true unless you start small and keep extending the volume, which is operationally very time-consuming. When you use thin provisioning at the array level, with NFS or VMFS and block storage, you always benefit. The larger the scale of the “thin pool” (i.e., the more oversubscribed objects), the more efficient thin provisioning tends to be.
Thin on thick
Obviously, if your storage system doesn’t support thin provisioning at the array level, use thin provisioning at the host level as much as possible.
Thin on thin
Wouldn't this give you the best of both worlds? While there is nothing inherently wrong with thin-on-thin, the approach adds management overhead. Technically it can be the most efficient configuration, but only if you carefully monitor capacity usage. Possibly the biggest issue with thin provisioning is running out of space on a device that is thin provisioned at both the front end and the back end. Thin provisioning has to be managed at the host (hypervisor) level as well as at the storage array level. Keep in mind that this level of over-commitment could lead to out-of-space conditions sooner rather than later.
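One way to see why thin-on-thin hits out-of-space conditions sooner is that the over-commitment factors at each layer multiply. A hypothetical illustration (not vendor code):

```python
def effective_overcommit(host_factor, array_factor):
    """Over-commitment compounds when thin provisioning runs at both levels."""
    return host_factor * array_factor

# A modest 2x over-commit at the hypervisor, layered on a 2x
# over-commit at the array, yields 4x effective over-commitment
# of the physical capacity underneath.
print(effective_overcommit(2.0, 2.0))  # -> 4.0
```

Each layer looks conservative in isolation; it is the product that determines how little real headroom remains.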
Why is data growing so big?
The biggest driver of storage growth these days is "secondary" data, aka copies of original data or primary storage. Secondary data includes snapshots, mirrors, replication and even data warehouse applications. It would seem the obvious solution is to reduce the number of data copies. However, the secondary copies were likely created for a reason, such as for data protection or to reduce contention for specific sets of data. Storage managers must be aware that there's an inverse relationship between data recovery, performance and capacity management; if you improve one, you're likely to impact the others.
Capacity management can be optimized only to the point that other service levels aren't jeopardized.
Managing “out of space” conditions
Capacity used on virtualized hosts and arrays is growing exponentially. Customers these days can choose to provision thin on thick, thick on thin, or even thin on thin. Therefore, thin provisioning on the host side as well as the storage array side needs to be carefully managed for “out of space” conditions. Thin provisioning can be very efficient, but it can also accelerate the transition to oversubscription: you can advertise a ton of capacity while truly having only a small fraction of that space. Thin provisioning allocates flexible disk capacity to hosts, allowing them to grow as needed. With this in mind, it is important to continuously monitor and manage the environment.
When using thin provisioning, use host capacity usage reports in conjunction with array-level reports, and set thresholds with notification and automated actions for both the host side and the array level.
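The threshold logic behind such notifications can be sketched in a few lines. This is a minimal illustration; the function name and the 70/90 percent thresholds are hypothetical, and real tools let you tune them per pool:

```python
def capacity_alert(used_gb, allocated_gb, warn_pct=70, critical_pct=90):
    """Classify a volume's or pool's usage against alert thresholds.

    The thresholds are illustrative; choose values that leave enough
    lead time to add physical capacity before a thin pool fills up.
    """
    pct = 100.0 * used_gb / allocated_gb
    if pct >= critical_pct:
        return "critical"
    if pct >= warn_pct:
        return "warning"
    return "ok"

# Run the same check against the host-side file system and the
# array-side thin pool -- either one can fill up first.
print(capacity_alert(460, 500))  # 92% -> critical
print(capacity_alert(120, 500))  # 24% -> ok
```

The point of pairing host and array reports is that a pool can be "critical" at the array while every guest file system still looks healthy.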
About space reclamation
Thin provisioning has some complications, including how space from deleted data is handled. Block storage only knows about areas of a volume that have ever been written to. If an application later frees up space or deletes files, that space is not marked as unused on the storage side, because the storage cannot see changes made at the file system level. Therefore, many storage vendors introduced thin reclamation technologies, which provide a means for the host to tell the storage array which blocks are no longer in use and can be reclaimed by the shared storage system for use in other volumes. Microsoft SDelete was the tool of choice for early reclamation schemes, since it can zero out deleted space in a volume. Later came support from VMware with the VAAI technologies. Then came support for the SCSI UNMAP command in Red Hat Enterprise Linux 6, integrated into the ext4 file system. With the SCSI UNMAP/TRIM feature at the operating system (file system) level, the array can now reclaim space, triggered by the host.
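The space that reclamation recovers is exactly the gap between what the array has allocated over a volume's lifetime and what the file system currently uses. A hypothetical estimate (not a vendor API):

```python
def reclaimable_gb(array_allocated_gb, filesystem_used_gb):
    """Estimate space the array still holds for blocks the host has freed.

    On arrays without UNMAP/TRIM support this gap only grows over time,
    because deletes at the file system level are invisible to the block
    storage underneath.
    """
    return max(array_allocated_gb - filesystem_used_gb, 0)

# The array allocated 800 GB to a thin volume over its lifetime, but
# the file system currently holds only 300 GB of live data: 500 GB of
# "dead" space is eligible for reclamation.
print(reclaimable_gb(800, 300))  # -> 500
```

Tracking this number per volume is a quick way to decide where running a reclamation pass will pay off most.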
A lack of storage capacity management tools leads to underused storage systems.
Thin provisioning is a great concept for using only what you need and not letting you waste valuable storage. However, it can have a significant effect on database performance: there is a slight performance penalty when a thin provisioned disk expands. This doesn't mean that you should thick provision storage on every host or guest. You should right-size the database virtual disk from day one, the same way you should right-size your database files from day one.
In some situations, you may run out of I/O sooner than you run out of disk capacity.
You’ve probably heard the quote from Gartner’s Thomas Bittman, “Virtualization without good management is more dangerous than not using virtualization in the first place.” That goes double for thin-provisioned virtual disks. Without comprehensive accounting and monitoring in place, your virtual infrastructure may be heading for disaster. Fortunately, storage managers have numerous tools to assist them in tackling capacity management. These include two general categories: utilities and reporting tools. With SRM (storage resource management) tools, storage managers can balance and optimize performance, data protection and capacity utilization simultaneously.
HP Storage Essentials (SE) ensures that IT capacity meets current and future business requirements in a cost-effective manner.
Alerts, Notifications and Thresholds
HP Storage Essentials captures three different kinds of events in the Event Manager:
Events generated by the devices supported by SE (SNMP traps, SMIS indications, etc.)
Internal events that are generated by internal operations (discovery, report cache refresh, etc.)
Storage Essentials policy-generated events
HP Storage Essentials consolidates information that would otherwise have to be aggregated manually. Storage host and array capacity utilization can be monitored using policy templates.
Three actions are possible based on capacity management policies:
Forward the policy-based event to an email address
Log an event in the local operating system event log
Run a batch script that executes an action
Capacity planning, trending and forecasting
Allocation and over-allocation can be monitored via the HP Storage Essentials Capacity Manager. The following example shows array-based over-allocated capacity on an HP 3PAR storage system:
The following example shows ESX datastore trending information for the coming 70 weeks, based on the last month’s data points:
The following example shows a storage-based chargeback report:
The Storage System Physical Capacity report shows capacity statistics for the storage systems and their storage pools. When capacity is over-allocated, arrays that support thin provisioning may show allocated capacity greater than the total physical size.
The Storage System Capacity Forecast report displays the next eighteen months of forecasting information for the Storage System total capacity and allocated capacity.
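Trending and forecast reports like these typically fit a line to historical usage and extrapolate it forward. A minimal sketch of the idea using an ordinary least-squares fit (this is an illustration of the general technique, not the Storage Essentials algorithm):

```python
def linear_forecast(samples, periods_ahead):
    """Fit y = a + b*x by least squares and extrapolate past the last sample.

    samples: capacity measurements taken at equally spaced intervals.
    """
    n = len(samples)
    xs = range(n)
    mean_x = sum(xs) / n
    mean_y = sum(samples) / n
    b = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, samples)) \
        / sum((x - mean_x) ** 2 for x in xs)
    a = mean_y - b * mean_x
    return a + b * (n - 1 + periods_ahead)

# Usage grew 10 TB per week over four weekly samples;
# extrapolate six weeks past the last sample.
print(linear_forecast([100, 110, 120, 130], 6))  # -> 190.0
```

Comparing the forecast against the pool's physical capacity gives the "time until full" figure that drives purchasing decisions.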
The HP Storage Essentials software suite offers enhanced visibility into host- and array-based storage over-allocation, allowing customers to monitor the benefits of thin provisioning. The monitoring features include:
Setting up alerts and thresholds
Monitoring capacity trends and forecasts
Thin provisioning dead-space deficiencies are no longer a concern, and IT administrators can better manage data capacity demands and control storage growth. Moreover, storage administrators gain visibility into multi-level physical and virtual space reclamation benefits, which helps mitigate growing capacity concerns. Storage optimization and efficiency improvements allow IT administrators to maximize storage ROI while helping to reduce costs.