Data analytics is the science of analyzing raw data to uncover patterns and extract valuable insights from it. Its aim is to apply those insights to inform larger organizational decisions and, in turn, achieve real-world goals. For this reason, it is also known as Business Intelligence.
Technically, data analysis is a process of cleaning, preparing, transforming, modeling, and processing data with the goal of discovering meaningful information, drawing informed conclusions, and supporting decision-making. Because of this technical nature, many data analytics techniques and processes have been automated using computer algorithms to prepare raw data for human consumption. Depending on the workflow stage and the data analysis requirements, there are four main kinds of analytics that provide varying depths of analysis: Descriptive, Diagnostic, Predictive, and Prescriptive.
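To make those stages concrete, here is a minimal Python sketch of a clean, transform, and summarize pass over a small set of hypothetical sales records (the data, column names, and use of pandas are illustrative assumptions, not taken from the text above):

```python
# A minimal sketch of the clean -> transform -> model stages using pandas.
# The sales records and column names are hypothetical, for illustration only.
import pandas as pd

raw = pd.DataFrame({
    "region": ["North", "North", "South", "South", None],
    "units":  [10, 12, None, 8, 5],
    "price":  [2.5, 2.5, 3.0, 3.0, 3.0],
})

# Clean: drop incomplete records.
clean = raw.dropna()

# Transform: derive a revenue column from units and price.
clean = clean.assign(revenue=clean["units"] * clean["price"])

# Model (descriptive): summarize revenue by region to support a decision.
summary = clean.groupby("region")["revenue"].agg(["count", "mean", "sum"])
print(summary)
```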
Data sets in analytics scenarios are very large and complex, so these four types of data analytics have been defined to help uncover different patterns and stories within the data. The first two, descriptive analytics and diagnostic analytics, focus on analyzing the past and are the simplest to perform. The next two, predictive analytics and prescriptive analytics, attempt to anticipate the future from the data. These forward-looking analyses tend to be more complex while potentially offering the most valuable insights.
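The contrast between the backward-looking and forward-looking types can be sketched in a few lines of Python; the monthly sales figures below are invented purely for illustration, and the linear trend stands in for far richer predictive models:

```python
# Descriptive vs. predictive analytics on a hypothetical monthly sales series.
import numpy as np

sales = np.array([100.0, 104.0, 110.0, 115.0, 121.0, 128.0])
months = np.arange(len(sales))

# Descriptive: summarize what already happened.
print("mean:", sales.mean(), "latest:", sales[-1])

# Predictive: fit a simple linear trend and project the next month.
slope, intercept = np.polyfit(months, sales, 1)
forecast = slope * len(sales) + intercept
print("forecast for next month:", round(forecast, 1))

# A prescriptive step would go further and recommend an action,
# e.g. adjusting inventory to match the forecast.
```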
Data analytics applications vary by business and industry. The most common is web analytics, which draws conclusions about website user behavior from website traffic; another is financial data analytics, which produces the necessary reports from financial data sets. Additionally, technology trends continue to push analytic capabilities into new areas such as edge computing, where automated remote analysis dramatically reduces latency and helps overcome the data glut created by today's edge technologies.
Although there are many roles involved in business data analytics (data suppliers, data consumers, and data preparers), the role of developing and engineering the data pipeline between suppliers and consumers belongs to data preparers. Within this category, several specific roles contribute to ensuring that raw data becomes usable insight.
High-speed processing has given sophisticated data analytics software the ability to analyze data in real time as well as to look back at past performance. But real-time and historical analysis are not the same process, and each serves a different purpose depending on the application.
Take, for instance, network monitoring analytics, where historical and real-time data are used in different ways. As traffic passes over a network, routers and switches can monitor data packets, identifying unwanted ones by comparing their signatures against a database of known threats. Likewise, enabled by automation, real-time intelligent network monitoring can reroute traffic, reconfigure settings, and even complete minor tasks that 'self-heal' the network. Analyzing data in real time requires enough speed and compute power to ingest large volumes of data at high velocity, and it often sacrifices deep analysis for speed and automation.
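At its simplest, the signature check described above amounts to comparing a fingerprint of each packet's payload against a set of known-bad fingerprints. The sketch below assumes hash-based signatures and invented payloads; production systems match on far richer patterns (byte sequences, headers, behavior):

```python
# Toy signature-based packet screening: hash each payload and compare it
# against a set of known-threat fingerprints. Signatures are hypothetical.
import hashlib

KNOWN_THREAT_SIGNATURES = {
    hashlib.sha256(b"malicious-payload-example").hexdigest(),
}

def is_unwanted(payload: bytes) -> bool:
    """Return True if the payload's fingerprint matches a known threat."""
    return hashlib.sha256(payload).hexdigest() in KNOWN_THREAT_SIGNATURES

for packet in [b"ordinary traffic", b"malicious-payload-example"]:
    action = "drop" if is_unwanted(packet) else "forward"
    print(action, packet[:20])
```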
However, in the case of network intrusion or criminal activity, which sometimes prompts an in-depth network forensics investigation, historical network data records can prove to be the only source of truth for analysts. Yet maintaining a source of truth has its challenges. A routine practice is to purge traffic logs older than a few weeks in order to mitigate data storage costs. Although a summary of network traffic may be kept, details are lost, which can make deep investigations impossible.
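That retention tradeoff can be pictured as a small batch job: raw records older than a cutoff are rolled up into daily totals and then purged. The record layout and three-week window below are assumptions for illustration only:

```python
# Sketch of a log-retention job: summarize raw traffic records older than the
# retention window, then purge them. Fields and sample rows are hypothetical.
from collections import defaultdict
from datetime import date, timedelta

RETENTION = timedelta(weeks=3)
today = date(2024, 6, 1)
cutoff = today - RETENTION

raw_logs = [  # (day, source_ip, bytes_transferred)
    (date(2024, 4, 20), "10.0.0.5", 1200),
    (date(2024, 4, 20), "10.0.0.9", 800),
    (date(2024, 5, 30), "10.0.0.5", 450),
]

daily_summary = defaultdict(lambda: {"flows": 0, "bytes": 0})
for day, _source, nbytes in raw_logs:
    if day < cutoff:  # old enough to be summarized and purged
        daily_summary[day]["flows"] += 1
        daily_summary[day]["bytes"] += nbytes

# Only recent raw records survive; older detail remains only as daily totals.
raw_logs = [row for row in raw_logs if row[0] >= cutoff]
print(dict(daily_summary))
print(raw_logs)
```

Once the raw rows are gone, only the daily totals remain; an analyst investigating a months-old intrusion would have nothing finer-grained to work with.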
Data analytics vendors abound, providing businesses with ample solutions for their data analysis needs. There are stand-alone data tools, but analytics platforms offer businesses full capabilities to absorb, organize, discover, and analyze their data.
Some platforms require IT expertise to set up the analytics environment, connect data sources, and prepare data for use, while others are designed with the non-expert in mind. These user-friendly platforms are known as self-service, and they allow data consumers to prepare, model, and transform data as needed to make business decisions.
Data analytics software that offers such end-to-end capabilities can be classified as a platform.
Data analytics and big data are terms that often appear together and can be confused to mean the same thing. Data analytics is about finding patterns within data, typically structured data, in sets significantly smaller than big data sets. Statistical analysis is its primary tool, and its purpose is usually oriented toward a specific business problem.
Big Data analytics, however, is characterized by a high variety of structured, semi-structured, and unstructured data drawn from sources like social media, mobile and smart devices, text, voice, IoT sensors, and the web, and further by the high velocity and high volume at which its data pipelines ingest that data.
Though there is no official threshold for big data, big data operations can be measured in terabytes and petabytes for organizations like eBay and Walmart, and in zettabytes for Google or Amazon. Once collected, data can reside in unstructured form in data lakes, available for processing by data preparers. After processing, the filtered and structured data is maintained in data warehouses for use by data consumers.
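A minimal sketch of that lake-to-warehouse flow, with invented event records standing in for lake content (the fields, schema, and use of pandas are assumptions for illustration):

```python
# Toy sketch of moving semi-structured "data lake" records into a structured,
# warehouse-style table. Records, fields, and schema are hypothetical.
import pandas as pd

data_lake = [  # loosely structured events as they might land in a lake
    {"user": "a1", "event": "click", "ts": "2024-06-01T10:00:00"},
    {"user": "a2", "event": "view", "ts": "2024-06-01T10:01:00", "extra": "?"},
    {"event": "click", "ts": "2024-06-01T10:02:00"},  # missing user
]

# Prepare: enforce a fixed schema and drop records that do not fit it.
warehouse_table = (
    pd.DataFrame(data_lake)[["user", "event", "ts"]]
    .dropna(subset=["user"])
    .assign(ts=lambda df: pd.to_datetime(df["ts"]))
)
print(warehouse_table)
```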