The digital revolution and the rise of big data have created an enormous volume of data for the average business. In a 2017 study, Data Age 2025: The Evolution of Data to Life-Critical, IDC predicted that global data would grow to 163 zettabytes (ZB) by 2025. One zettabyte equals one trillion gigabytes.
The emergence of big data has spawned a wide range of data types that companies must manage and secure. These data types include structured, unstructured and semi-structured data.
Each of these data types poses unique challenges in terms of creating a data governance strategy that stores the information, protects privacy and security, and complies with government regulations about data.
Most businesses have a solid understanding of structured data, which usually has a row-column format and very explicit metadata elements, such as month/day/year. Largely numeric, structured data comes from transactional systems, databases and back-office applications (for example, ERP systems). While businesses have an overwhelming amount of structured data, they generally know how to manage, analyze and apply it because of how well it’s defined.
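As a rough illustration of why row-column data is comparatively easy to manage, the sketch below stores a few records in an in-memory SQLite table and totals them by month. The table name, column names and values are hypothetical and chosen only for this example.

```python
import sqlite3

# Hypothetical table and columns, used only to illustrate row-column data.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders ("
    "order_id INTEGER PRIMARY KEY, "
    "order_date TEXT NOT NULL, "  # ISO date string: year-month-day
    "amount REAL NOT NULL)"
)
conn.executemany(
    "INSERT INTO orders (order_id, order_date, amount) VALUES (?, ?, ?)",
    [(1, "2024-01-15", 120.50), (2, "2024-01-16", 80.00), (3, "2024-02-01", 45.25)],
)

# Every row has the same explicit columns, so aggregation is a single query.
monthly_totals = conn.execute(
    "SELECT substr(order_date, 1, 7) AS month, SUM(amount) "
    "FROM orders GROUP BY month ORDER BY month"
).fetchall()

for month, total in monthly_totals:
    print(month, total)
```

Because the schema is declared up front, the database itself can reject rows that do not fit, which is part of what makes structured data predictable to govern.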
The bigger challenge for the majority of organizations lies in understanding and extracting value from unstructured data. Unstructured data comes in many formats, each with varying degrees of complexity, such as images, audio files, office or productivity files, and handwritten notes that have been scanned. This data can, and does, originate from anywhere: internally, externally, from third parties, via edge devices, and from other sources.
Because unstructured data is not governed by strict rules or shared formats, it can be difficult to manage and apply a consistent data governance strategy. Still, it can contain critical insights that organizations need to leverage in today’s highly competitive, always-on business world.
For example, consider an important customer’s complaint left on voicemail. Mining value from the audio file requires a software application capable of playing it, a person to physically listen to it, and another person to determine what information is valuable and what isn’t. Converting the audio to text as part of the data processing strategy creates a consistent view of the recording that can be interpreted as necessary by anyone with authorized access to the recording. It also allows the voicemail to be blended with other forms of analytics, without compromising the original source.
Other unstructured data that contains key insights could include the handwritten notes of a maintenance technician servicing an essential piece of production equipment. With regard to third-party data, a long-range weather forecast or a negative social media post by an influencer can significantly impact demand for some products. It’s easy to see the huge potential value of this type of data.
Finally, semi-structured data represents a hybrid of these two data types. This group could include Excel spreadsheets that contain important financial information in a form that is hard to extract. These data objects may have structure within them, but they lack the external structure needed for standard data management processes. Like unstructured data, these objects contain important insights that can be hard to extract and apply without an intelligent data governance strategy.
Semi-structured data refers to any information that uses a self-describing schema, such as XML or JSON. These types of data have an open-ended schema that enables application data flexibility. Sometimes, this type of data is combined with structured data to record additional properties for specific types of records within a structured data store.
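A minimal sketch of what "self-describing" can look like in practice, using JSON and Python's standard `json` module. The records and field names below are invented for illustration; the point is only that each record names its own fields, so different records can carry different properties.

```python
import json

# Hypothetical equipment records. Each one names its own fields, and the
# fields differ from record to record (the "open-ended schema").
raw_records = """
[
  {"id": 1, "name": "Pump A", "location": "Plant 1"},
  {"id": 2, "name": "Pump B", "warranty": {"years": 3, "provider": "Acme"}},
  {"id": 3, "name": "Valve C", "tags": ["critical", "inspected"]}
]
"""
records = json.loads(raw_records)

# The field names travel with the data, so they can be discovered by reading it.
fields_per_record = [sorted(record.keys()) for record in records]
all_fields = sorted({key for record in records for key in record})

print(fields_per_record)
print(all_fields)
```

A structured store would typically need a schema change to add `warranty` or `tags`. Here the extra properties simply appear on the records that have them.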
The open-ended schema means that semi-structured data does not rely on the application that created it to define the embedded structure. By contrast, an Oracle database would be considered structured data: the rules governing the data are defined and enforced by the application that creates it, in this case the database itself.
With semi-structured datasets, the definitions and constraints are embedded within the file, regardless of the application that created them. For example, XML files and cascading style sheets for web pages are both forms of semi-structured data. They can be created by almost any kind of application — such as Notepad, a website builder app or an Office app like Word — so there is no way for the application to apply structure or rules to these data types.
Semi-structured data is challenging for organizations to manage because it does not necessarily have the same level of organization and predictability as structured data. It does not reside in fixed fields or records. At the same time, it is more rigid than unstructured data, because it contains elements that can separate the data into various hierarchies (think of comma-delimited or tab-delimited files).
Unlike structured data, which represents data as a flat table, semi-structured data can contain n-level hierarchies of nested information. This means that it can be difficult to apply standard data management processes to semi-structured data, and difficult to extract insights from it. The real issue is making sure your business has the tools and technology necessary to load the data into structured or unstructured data models, which can then be managed via data governance.
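One common way to load nested data into a flat, table-like model is to flatten each record into a single level of dotted keys. The helper below is a hypothetical sketch of that idea, not a description of any particular product, and the sample record is invented.

```python
def flatten(obj, prefix=""):
    """Flatten nested dicts and lists into one level, using dotted key paths."""
    if isinstance(obj, dict):
        items = obj.items()
    elif isinstance(obj, list):
        items = enumerate(obj)  # list positions become part of the key path
    else:
        return {prefix: obj}  # a leaf value: emit it under the path so far

    flat = {}
    for key, value in items:
        path = f"{prefix}.{key}" if prefix else str(key)
        flat.update(flatten(value, path))
    return flat


# Hypothetical nested record with a two-level hierarchy and a list.
record = {
    "customer": {
        "name": "Example Co",
        "contacts": [{"email": "a@example.com"}, {"email": "b@example.com"}],
    },
    "total": 99.5,
}

row = flatten(record)
for column, value in row.items():
    print(column, "=", value)
```

Each dotted key can then be treated as a column name. Note that this simple version drops empty dictionaries and lists, and that records with different shapes produce different column sets, which is exactly the variability a governance process has to account for.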
In simplest terms, intelligent data governance means bringing data under control, keeping it protected and enabling access to it, to carry out the top-level business strategy. But data governance also means knowing where data originated, where it is currently located, who can access it, what it contains, and how long it should be retained. Intelligent data governance also implies that trivial data is distinguished from strategically important information.
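The questions listed above (where data originated, where it lives, who can access it, what it contains and how long to keep it) can be captured as metadata attached to each data asset. The sketch below is one hypothetical way to model that; the class name, field names and sample values are assumptions made for illustration.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class GovernanceRecord:
    """Hypothetical governance metadata for a single data asset."""
    origin: str           # where the data originated
    location: str         # where it is currently stored
    allowed_roles: set    # who can access it
    description: str      # what it contains
    created_on: date
    retention_days: int   # how long it should be retained

    def can_access(self, role: str) -> bool:
        return role in self.allowed_roles

    def retention_expired(self, today: date) -> bool:
        return today > self.created_on + timedelta(days=self.retention_days)


# Example asset: a transcribed customer voicemail (all values are invented).
voicemail = GovernanceRecord(
    origin="customer voicemail",
    location="archive/voicemail/transcripts",
    allowed_roles={"support", "compliance"},
    description="Transcript of a customer complaint",
    created_on=date(2024, 1, 1),
    retention_days=365,
)

print(voicemail.can_access("support"))                  # role is in allowed_roles
print(voicemail.can_access("marketing"))                # role is not in allowed_roles
print(voicemail.retention_expired(date(2025, 1, 15)))   # past the retention window
```

Keeping this metadata alongside every asset, regardless of whether the asset is structured, unstructured or semi-structured, is what makes consistent policy checks possible.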
Once data is centralized and thoughtfully managed, its true strategic potential can be unleashed. Businesses can easily identify customer needs, anticipate emerging issues, explore new business opportunities and respond to regulatory inquiries. They can optimize the costs of storing and administering these information assets, while still allowing key stakeholders in the business to leverage data for improved decision-making.
When it comes to data governance, striking an appropriate balance is key. Data of all types must be closely managed, but the organization still needs to make it accessible, supporting the high degrees of flexibility and speed that are essential in today’s fast-moving world.
The good news is that there are innovative, automated solutions that can help streamline and accelerate the process of data governance, saving your organization valuable time and costs.
Hitachi Vantara literally wrote the book on data governance. With established leadership in data storage and management, the experts at Hitachi can make the complex task of data governance easy and straightforward, via automated solutions that help your company to:
By implementing automated solutions that cleanse, identify and centralize your structured, unstructured and semi-structured data, Hitachi can help you create a “single source of truth” that has enormous strategic value. You can gain new insights about your daily operations, your customers and trading partners, your finances and emerging trends that will impact your company and its financial results.
Intelligent data governance provides a range of strategic benefits for the typical company, including:
Because data volumes are growing exponentially, Hitachi Vantara recommends that your company review its data governance policies and practices on a quarterly basis. By looking at the “big picture” with regard to data every three months, your company can identify emerging trends, troubleshoot problems and ensure that data continues to function as a strategic resource.
In addition to establishing a cadence for data governance, Hitachi recommends that every organization include the position of chief data officer or CDO. Within the organization, the CDO serves as “the voice of the data,” protecting it and maximizing its strategic contribution on an ongoing basis.
An emerging concept, DataOps — or data operations — is enterprise-level data management for the artificial intelligence era. By implementing an overarching DataOps strategy, you can seamlessly connect your data consumers and creators, to rapidly find and use all the value in your data.
Data operations is not a product, service or solution. It’s a methodology, a technological and cultural change aimed at improving your organization’s use of data through better data quality, shorter cycle time and superior data management.
Since DataOps spans the entire cycle of gathering and applying information, it’s absolutely essential that your organization manages every type of data efficiently. By having data cleansed, well managed and immediately accessible, your DataOps initiative can be supported with the right information you need to make strategic decisions based on facts, not guesswork.
Because Hitachi Vantara has proven expertise in both DataOps and data governance, across every type of data, Hitachi is a natural partner. By instilling a data-driven culture and mindset, Hitachi can help make data a focus for your business every day.