Data gravity and how its pull is transforming data storage infrastructure
As enterprises aim to overcome the cost and complexity of storing, moving, and activating data at scale, they should seek better economics, less friction, and a simpler experience.
Data analytics, AI, and IoT are driving unprecedented growth in enterprise data. In fact, over the next year, enterprise data is projected to increase at a 42.2% annual growth rate, yet only 32% of the data available to enterprises is put to work, according to the IDC report ‘Rethink Data: Put More of Your Business Data to Work — From Edge to Cloud’. The remaining 68% goes unleveraged. With data volume growing exponentially, this accumulation of data has given rise to data gravity.
What is data gravity? It is the tendency of data and applications to attract each other, much as physical objects attract each other under the law of gravity. As enterprise data sets grow, they become harder to move, and business processes and software build up around them, which attracts still more data. Generally speaking, data gravity is a consequence of data’s volume and level of activation. According to a recent IDC whitepaper, Future-proofing Storage: Modernizing Infrastructure for Data Growth Across Hybrid, Edge and Cloud Ecosystems, “Workloads with the largest volumes of stored data exhibit the largest mass within their ‘universe,’ attracting applications, services, and other infrastructure resources into their orbit.”
The business challenges of data gravity
Data gravity affects the entire IT infrastructure; it should be a major consideration when planning data management strategies. Data is now an essential asset to businesses in every vertical. The growth of data—both of the structured and unstructured kind—will continue at unprecedented rates in the coming years. Meanwhile, data sprawl—the increasing degree to which business data is scattered across data centers and geographies—adds complexity to the challenges of managing data’s growth, movement, and activation.
What worked for terabytes may not work for petabytes. As enterprises aim to overcome the cost and complexity of storing, moving, and activating data at scale, they should seek better economics, less friction, and a simpler experience—easy, open, limitless, and built for the data-driven, distributed enterprise.
What does it take to manage massive datasets?
According to the Future-Proofing Storage report, as storage associated with massive data sets continues to grow, so does its gravitational force on other elements within the IT universe. Consider two data sets: one of 1 petabyte and one of 1 gigabyte. To integrate the two, it is far more efficient to move the smaller data set to the location of the larger one. Because large data sets “attract” smaller data sets, services, and applications, large databases tend to accrete data, further increasing their overall data gravity. By reflecting these data lifecycle dynamics, data gravity helps inform IT architecture decisions.
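The asymmetry between moving the two data sets can be shown with simple arithmetic. The sketch below is a rough back-of-the-envelope calculation, assuming an idealized 10 Gbps link with no protocol overhead (an illustrative assumption, not a figure from the report):

```python
def transfer_seconds(size_bytes: float, link_gbps: float) -> float:
    """Ideal transfer time in seconds, ignoring protocol overhead and retries."""
    return (size_bytes * 8) / (link_gbps * 1e9)

GB = 1e9   # 1 gigabyte in bytes
PB = 1e15  # 1 petabyte in bytes

small = transfer_seconds(GB, 10)  # ~0.8 seconds
large = transfer_seconds(PB, 10)  # ~800,000 seconds, i.e. more than nine days
```

Even on a fast dedicated link, moving the petabyte takes days while moving the gigabyte takes under a second, which is why the smaller data set travels to the larger one.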
The more massive a data set grows, the harder it is to make use of that data unless it is close to the applications and services that manage or activate it. For this reason, applications and services are often moved to, or kept near, the data sets themselves.
But such massive data sets can trap stored data, applications, and services in a single location, forming data “black holes” that make it hard to put that data to use. From data centers to public clouds and edge computing, data gravity is a property that spans the entire IT infrastructure. IDC analysts recommend ensuring that no single data set exerts an uncontrollable force on the rest of the IT and application ecosystem.
Ensuring applications have access to data, regardless of location
One way to mitigate the impact of data gravity is to ensure that stored data sits alongside the applications that use it, regardless of location. This can be accomplished by leveraging colocation data centers that bring together multiple private and public cloud service providers, allowing enterprises to pair their mass data storage with the best available solutions for application, computing, and networking needs. By optimizing data location, a data-centered architecture brings applications, services, and user interaction closer to where data resides, rather than relying on time-consuming and often costly long-distance transfers of mass data to and from centralized service providers.
Center data in your IT strategy
Putting data at the center of IT architecture can positively impact application performance optimization, transfer latency, access and egress charges, and security and compliance needs. The overall reliability and durability of the data is also an important focus. Planning data-centric workloads and jobs means accounting for data gravity. Key parameters for such an assessment include the volume of data being generated and consumed; the distribution of data across data centers, private and public clouds, edge devices, and remote/branch offices; and the velocity at which data is transmitted. Addressing these considerations will increase the efficiency of the data infrastructure and can significantly reduce costly data pipeline issues. Because data has gravity, it is important to automate data movement to reduce storage costs, and to consider moving lower-performing data sets that are not immediately or actively needed to backup repositories.
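One common way to automate such movement is an age-based rule: data sets that have gone unaccessed past a threshold become candidates for a backup tier. The sketch below is a minimal, hypothetical illustration of that idea; the 90-day threshold, the tier names, and the `target_tier` function are illustrative assumptions, not a specific product's policy:

```python
from datetime import datetime, timedelta

# Hypothetical policy: data sets untouched for longer than this window
# are candidates for movement to a cheaper backup tier.
COLD_AFTER = timedelta(days=90)  # illustrative threshold, not a recommendation

def target_tier(last_access: datetime, now: datetime) -> str:
    """Return the tier a data set should occupy under the age-based rule."""
    return "backup" if now - last_access > COLD_AFTER else "primary"

# A data set last read five months ago is a backup candidate;
# one read last month stays on primary storage.
print(target_tier(datetime(2024, 1, 1), datetime(2024, 6, 1)))  # backup
print(target_tier(datetime(2024, 5, 1), datetime(2024, 6, 1)))  # primary
```

Real tiering systems typically weigh more signals (access frequency, data set size, compliance constraints), but the same assessment parameters listed above feed the decision.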
Enterprises must implement a strategy to efficiently manage mass data across cloud, edge, and endpoint environments. It’s critical to develop a comprehensive data-centric strategy when designing data storage infrastructure at scale, and every data management system must be able to evolve to accommodate new data requirements. Data management, and the data architecture that supports it, must be agile and able to adapt to shifting business needs and emerging technical opportunities, including those brought on by data gravity.
This article has been written by Grace Liu, Senior Vice President, Information Technology at Seagate Technology.