How to create data management policies for unstructured data

This article was contributed by Randy Hopkins, VP of global systems engineering & enablement at Komprise.

If data is the lifeblood of your organization — dictating the success of customer relationships, product launches, cross-selling and upselling, and employee productivity — then you need to manage it strategically.  That strategy should include a program for data management policies. This ensures that data is always stored in the appropriate environment according to its usage, age, value, and business priority.  And those parameters are always changing.  

For instance, an electric car manufacturer wants to understand how its vehicles perform under different climate conditions. It may therefore create a data management policy that pulls trace files from cars at regular intervals into data lakes for analysis. Once the study has been completed, that policy can be retired and the moved data either deleted or sent to deep archive storage. A hospital may have a policy to retain medical images for the life of the patient, and the policy could dictate where and when those images move to cold storage. Some organizations are required to delete the files of ex-employees immediately or after a set period of time. Doing any of this manually is no longer a viable option given the volume of data stored in enterprises today.

With data growing at an unprecedented rate, and its storage consuming 30% or more of the overall IT budget, now is the time to get serious about data management policy automation. The lion’s share of all data is unstructured: files of all types, including images, log files, trace files, output files, video, and audio, and these are spreading like wildfire. Data management policies must address the efficient movement and management of unstructured data.

The benefits of adopting a systematic way to create, execute and manage policies for data include:

  • Aligns data strategy with business goals through automated policies;
  • Simplifies data management by reducing manual effort and ad hoc decision-making;
  • Delivers the means to maximize cost savings by continuously tiering cold data to less expensive storage;
  • Ensures compliance with industry regulations;
  • Adds ransomware protection by copying data from primary storage into object lock storage where it cannot be compromised;  
  • Automatically feeds data pipelines into data lakes and tools for analytics and AI programs.

The notion of data management policies isn’t new, but historically, this activity took place within storage vendor technology. A storage vendor-centric approach was all well and good before data volumes reached today’s petabyte scale and before organizations were using multiple storage vendors and clouds to manage their data. But now, the storage-centric approach to policy management creates vendor lock-in and silos, making it onerous to manage data cost-effectively and move it quickly to different storage technologies and services as needed to support users, big data analytics initiatives and cost-saving mandates. IT leaders know that in the digital age, data needs more than protection. It needs full lifecycle management, and that is where modern data management policies come into play.

Considerations for unstructured data management policies

Access anywhere. Distributed workforces now require instant access to data — regardless of where it’s stored — with a transparent user experience. Data professionals should prioritize these needs as they create policies for saving money, protecting data, and enabling access controls.

Automate as much as you can. A declarative approach is the goal: you state the outcome you want, and the software works out which files to act on. While there are many options available now using independent data management software to manage policies across storage, many organizations still rely on IT managers and spreadsheets to create and track policies. The worst part of this bespoke manual effort is searching for files with certain attributes and then moving or deleting them. These efforts are inefficient and incomplete, and they undermine the goals of having policies in the first place; maintaining them is painful, and IT professionals have too many competing priorities. Plus, this approach limits the potential of using policies to continuously curate and move data to data lakes for strategic AI and ML projects. Instead, look for a solution with an intuitive interface for building policies, one that executes them on a schedule and runs in the background without human intervention.
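To make the declarative idea concrete, here is a minimal sketch in Python. It is an illustration under stated assumptions, not how any particular product works: the `FileRecord` and `Policy` types, the `plan_actions` function, and the example paths and thresholds are all hypothetical. Policies are plain data describing what should happen, and one generic function decides which files they apply to.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Tuple


@dataclass(frozen=True)
class FileRecord:
    """Hypothetical metadata for one file, as a scanner might report it."""
    path: str
    size_bytes: int
    last_accessed: datetime


@dataclass(frozen=True)
class Policy:
    """A declarative rule: what should happen, not how to find the files."""
    name: str
    min_idle_days: int     # match files untouched for at least this many days
    action: str            # illustrative labels such as "tier" or "delete"
    path_prefix: str = ""  # optional scope; empty string matches every path


def plan_actions(files: List[FileRecord],
                 policies: List[Policy],
                 now: datetime) -> List[Tuple[str, str, str]]:
    """Return (path, action, policy name) using the first matching policy per file."""
    plan = []
    for f in files:
        idle = now - f.last_accessed
        for p in policies:
            in_scope = f.path.startswith(p.path_prefix)
            if in_scope and idle >= timedelta(days=p.min_idle_days):
                plan.append((f.path, p.action, p.name))
                break  # earlier policies take priority over later ones
    return plan
```

In this sketch, a scheduler would call `plan_actions` periodically and hand the resulting plan to whatever component actually moves or deletes data. Changing behavior means editing the list of policies, not rewriting search scripts.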

Measure outcomes and refine. Any data management policy should be mapped to specific goals, such as cost savings on storage and backups. The tooling should measure those outcomes and report their status so that if the goals are not being met, you can change the plan accordingly. This is akin to a smoke detector that continually checks its own battery and alerts you when it’s time to change it. Data management tools should do the heavy lifting for you and let you know when something is not working or there’s an issue to fix.

For instance, if you have a data management plan that tiers data into cloud object storage once it reaches one year of age, you’ll expect a certain percentage of savings. However, if this cold data ends up being frequently pulled back into local applications and storage, you’re going to face high egress fees that will counteract those savings. At that point, you would want to consider a different tiering model or, better yet, a data management solution that recognizes recent access trends in cold data and applies the declarative action to right-place it.
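The trade-off between tiering savings and egress fees can be checked with simple arithmetic. The sketch below is a rough model only: the function names are hypothetical, and the prices in the usage note are made-up figures for illustration, not real vendor rates. It assumes a flat monthly storage price per terabyte and a flat egress price per terabyte recalled.

```python
def net_monthly_savings(cold_tb: float,
                        primary_cost_per_tb: float,
                        cloud_cost_per_tb: float,
                        recalled_tb: float,
                        egress_cost_per_tb: float) -> float:
    """Storage savings from tiering minus the egress fees for recalled data."""
    storage_savings = cold_tb * (primary_cost_per_tb - cloud_cost_per_tb)
    egress_fees = recalled_tb * egress_cost_per_tb
    return storage_savings - egress_fees


def breakeven_recall_tb(cold_tb: float,
                        primary_cost_per_tb: float,
                        cloud_cost_per_tb: float,
                        egress_cost_per_tb: float) -> float:
    """Monthly recall volume (TB) at which egress fees cancel the savings."""
    if egress_cost_per_tb <= 0:
        raise ValueError("egress_cost_per_tb must be positive")
    return cold_tb * (primary_cost_per_tb - cloud_cost_per_tb) / egress_cost_per_tb
```

With illustrative figures of 100 TB of cold data, 30 per TB on primary storage, 5 per TB in the cloud, and 90 per TB of egress, recalling 10 TB a month still nets 1,600 in savings, while recalling 40 TB a month turns that into a 1,100 loss. Under those assumed prices the break-even point sits just under 28 TB of recalls per month, which is the kind of threshold a policy tool could watch for.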

Align staff roles. Data management policies should be managed by a team within the organization that identifies how policies are created, accessed, and used. The team is also responsible for managing, enforcing, and refining policies and communicating them to employees with a need to know. Large enterprises should consider creating a data management policy team including top executives who contribute to discussions concerning data governance, protection, and monetization. This team will align with business units to ensure retention and protection considerations are consistent. 

Metadata management for unstructured data. Another consideration is to simplify searches across all file metadata from a unified file index. The technology should also enable actions to copy, move, archive, tier, and report on unstructured data files.
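As a rough picture of what a unified file index offers, the following Python sketch pools metadata from several storage systems into one searchable structure. Everything here is hypothetical and simplified: the `IndexEntry` fields, the `FileIndex` class, and its methods are assumptions for illustration, and a real index would live in a database or search engine rather than in memory.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, List


@dataclass(frozen=True)
class IndexEntry:
    """Hypothetical metadata record for one file in one storage system."""
    storage: str      # which silo the file lives in, e.g. "nas-01"
    path: str
    extension: str
    owner: str
    size_bytes: int


class FileIndex:
    """In-memory stand-in for a unified index spanning storage silos."""

    def __init__(self) -> None:
        self._entries: List[IndexEntry] = []

    def add_source(self, entries: Iterable[IndexEntry]) -> None:
        """Merge metadata scanned from one storage system into the index."""
        self._entries.extend(entries)

    def search(self, **criteria: object) -> List[IndexEntry]:
        """Return entries whose fields equal every given criterion."""
        return [e for e in self._entries
                if all(getattr(e, k) == v for k, v in criteria.items())]

    @staticmethod
    def bytes_by_storage(entries: Iterable[IndexEntry]) -> Dict[str, int]:
        """Summarize total size per storage system for a set of entries."""
        totals: Dict[str, int] = defaultdict(int)
        for e in entries:
            totals[e.storage] += e.size_bytes
        return dict(totals)
```

A single query such as `index.search(owner="jdoe")` would then return that person's files regardless of which system holds them, and the result could be passed to a copy, move, or archive step, or to `bytes_by_storage` for a quick report.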

In closing, enterprise data is not owned by any individual or business unit; it’s owned by the enterprise and needs to be managed holistically and strategically to meet the needs of critical stakeholders and to align with broad organizational goals. Users should not have to worry about where data lives. Data should be accessible to users no matter where it resides. Ultimately, a data management policy should guide your organization’s philosophy toward managing both structured and unstructured data as a valued enterprise asset. 
