
Speed the Path to Vastly More Data Insights With Pentaho 9.2 and DataOps

Jason Tiret
Senior Product Manager for Pentaho and Dataflow Studio

August 26, 2021


In our modern world, accelerating the process of extracting insights from data is a complex challenge. Compounding that challenge are colossal data volumes, the expanding use of multiple cloud platforms, and growing demand for self-service that still maintains compliance. Enterprises attempting to tackle the problem encounter various forms of friction everywhere they turn.

Friction impedes the journey from data to insight by making it hard to:

  • Streamline the process of acquiring and adding data to your repository.
  • Make sure you can both understand and deliver that data to the right people.
  • Ensure your downstream users can use and support the data in complex workloads.

To quicken the time it takes to get insights from data, you need to remove the friction throughout the process: Streamline the pipeline to handle massive data volumes across hybrid and multicloud landscapes while enabling self-service for data consumers.

Part of Hitachi’s Lumada digital innovation solutions portfolio, Pentaho’s raison d’etre has always been to help solve these data orchestration problems in as versatile and powerful a way as possible. As a result, Pentaho has been on the data-to-insight acceleration journey from day one, supporting every phase as a leader in open-source data integration and analytics. In the current environment, Pentaho is helping organizations manage the data explosion, data modernization efforts, the move from on premises to the cloud, data storage in multiple data lakes and clouds, and core-to-edge data collection. Simultaneously, it is enabling enterprises to embrace new approaches, such as DataOps.

As the market transitions from on premises to cloud, enterprises need to evolve to integrate the data flowing in from multiple data lakes and from edge to core to cloud. In addition to the data volumes mentioned above, some of the most critical transitions are driven by shifting investments in Hadoop infrastructure.

As Hadoop becomes more cloud-centric, and as Hadoop Distributed File System (HDFS) is being replaced by object storage, different types of cloud-based Hadoop and big data infrastructure have emerged. These data infrastructure modernization efforts need to be supported wherever they live and on whatever cloud. At the same time, enterprises are exploring DataOps as an antidote to friction and as a way to automate data pipeline management.

DataOps allows organizations to build complex, secure and governed systems from the outset. Expanding automation across the board, especially in the construction and evolution of data pipelines, dramatically increases agility, mirroring the successes of continuous integration and continuous delivery (CI/CD) in software development. As a result, enterprises are better equipped to move faster while remaining resilient, safe and compliant. Hitachi’s strategy to help enterprises achieve these objectives is two-pronged, serving the needs of Pentaho customers and Lumada DataOps Suite users.

For decades, Pentaho’s open-source roots have served the core mission of data integration and analysis. Customers familiar with Pentaho understand its breadth of capability. By investing in the standalone solution, existing enterprise customers can benefit from the enhanced DataOps capabilities they need, in the product they already know. In tandem, since Pentaho is now integrated as a powerful engine within the Lumada DataOps Suite, it is a win-win strategy for Pentaho’s existing customers and users of the DataOps Suite.

Pentaho 9.2 Improvements

For customers who want industry-leading data integration and business analytics capabilities, the newest release, Pentaho 9.2, offers:

Full Azure Cloud Support. Pentaho 9.2 offers extensive support for the Azure ecosystem, including Azure HDInsight, Azure Blob Storage, Azure Data Lake Storage, and Azure SQL Database (that is, SQL Server in the Azure cloud). This improvement addresses customers’ changing needs as they transition their infrastructure from on-premises solutions, such as HDFS, to Azure cloud services, including Azure Blob Storage, Azure HDInsight, and Azure Data Lake Storage. Azure connectivity, in addition to the existing support for Google Cloud Platform (GCP) and Amazon Web Services (AWS) data sources, means that no matter how customers choose to store their data, Pentaho gives them control of that journey. Pentaho enables them to move from Hadoop to object storage and to use object storage or Hadoop on any cloud.

Updated Data Support for Cloudera Data Platform and MapR. Pentaho 9.2 is now up to speed and certified to support the recent architecture changes to the Cloudera Platform. In the same vein, Pentaho 9.2 supports MapR and is fully certified. Furthermore, Pentaho 9.2 offers multicluster support across the board, supporting enterprises working across clusters, including Cloudera, MapR, Azure HDInsight, Google, or Amazon Elastic MapReduce (EMR) clusters. Now, it’s easier than ever to configure, integrate and manage multiple Hadoop clusters. For example, you can easily pull files from one cluster, interact with a Hive table on another cluster, do the transformations you need, combine and blend data, and then put that data in whatever cluster you want.
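To make the multicluster example above concrete, here is a minimal Python sketch of the same flow: read files from one cluster, read a table from a second, blend the two, and write the result to a third. It is only an illustration of the pattern, not Pentaho’s API. The cluster names, the `read` and `write` helpers, and the sample rows are all hypothetical in-memory stand-ins.

```python
from typing import Dict, List

# Hypothetical in-memory stand-ins for three clusters.
# Nothing here is a Pentaho API; the names and rows are made up for illustration.
CLUSTERS: Dict[str, Dict[str, List[dict]]] = {
    "cluster_a": {  # plays the role of the cluster holding raw files
        "orders.csv": [
            {"order_id": 1, "customer_id": 10, "amount": 25.0},
            {"order_id": 2, "customer_id": 11, "amount": 40.0},
            {"order_id": 3, "customer_id": 10, "amount": 15.0},
        ]
    },
    "cluster_b": {  # plays the role of the cluster holding a Hive table
        "customers": [
            {"customer_id": 10, "region": "EMEA"},
            {"customer_id": 11, "region": "APAC"},
        ]
    },
    "cluster_c": {},  # target cluster for the blended output
}


def read(cluster: str, name: str) -> List[dict]:
    """Return a copy of the rows stored under `name` on `cluster`."""
    return list(CLUSTERS[cluster][name])


def write(cluster: str, name: str, rows: List[dict]) -> None:
    """Store `rows` under `name` on `cluster`."""
    CLUSTERS[cluster][name] = rows


def blend_orders_with_regions(orders: List[dict], customers: List[dict]) -> List[dict]:
    """Join orders to customers on customer_id and total the amounts per region."""
    region_by_customer = {c["customer_id"]: c["region"] for c in customers}
    totals: Dict[str, float] = {}
    for order in orders:
        region = region_by_customer.get(order["customer_id"], "UNKNOWN")
        totals[region] = totals.get(region, 0.0) + order["amount"]
    return [{"region": r, "total_amount": t} for r, t in sorted(totals.items())]


def run_pipeline() -> List[dict]:
    """Pull from two clusters, blend, and land the result on a third."""
    orders = read("cluster_a", "orders.csv")
    customers = read("cluster_b", "customers")
    result = blend_orders_with_regions(orders, customers)
    write("cluster_c", "sales_by_region", result)
    return result


if __name__ == "__main__":
    for row in run_pipeline():
        print(row)
```

In a real deployment each `read` and `write` would be a connection to a different named cluster; the point of the sketch is only that the blend step does not care which cluster a given input or output lives on.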

Logging and Performance Enhancements. Pentaho 9.2 includes performance and logging updates, as well as tuning of the dashboard capability. We added Mondrian improvements that span general performance with certain types of roll-ups, such as sorting and slicing. As a result, customers can drill into more detailed information, including explain plans for the queries a dashboard issues, to tune specific queries. Not only is the dashboard more performant, but it also logs what a user is accessing. This gives the administrator better visibility into the underlying datasets and columns being accessed, enabling them to quickly monitor whether sensitive data shows up in a report.

Pentaho and Lumada DataOps Suite

Pentaho 9.2 accelerates core-to-multicloud pipelines while building resiliency. The latest release offers enterprises productivity boosts that improve performance while also building in automation. Taking advantage of Pentaho 9.2 improvements helps rapidly build and deploy data pipelines, at scale and across multiple clouds. As a result, enterprises can accelerate the data-to-insight process by bringing data in, blending, integrating and moving it wherever they want it analyzed.

However, because data integration and analytics are done in different contexts and with different methods, DataOps is becoming a powerful way for enterprises to organize their efforts. In this setting, the Lumada DataOps Suite supports the application of DataOps, solving a range of problems for people trying to accelerate the data-to-insight journey. Here, Pentaho 9.2 enhancements play a crucial role in supplying the data integration and data pipeline building capabilities for the DataOps Suite.

Recent improvements to Pentaho Data Integration’s execution capabilities have dramatically boosted the DataOps Suite’s power. Now, Pentaho can both run on a Kubernetes foundation and be deployed and managed as a microservice. As a result, customers gain the elasticity needed to spin instances up and down, no matter what environment they choose. So, if you are running at a massive scale across environments where you want to shift data around, rapidly create more instances, or shrink the instances handling a workload, Kubernetes flexibility is essential. You can deploy in Microsoft Azure or Amazon Web Services, and Lumada DataOps Suite provides the autoscaling you need by plugging right into your environment.
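For readers curious about what spinning up and down means in numbers, here is a small Python sketch of the replica calculation documented for the Kubernetes Horizontal Pod Autoscaler: desired replicas = ceil(current replicas × current metric ÷ target metric). This post does not describe how Lumada DataOps Suite implements its autoscaling, so treat this as background on Kubernetes in general. The function name, the default bounds, and the sample numbers are assumptions for illustration, and the sketch leaves out the tolerance and stabilization behavior of a real autoscaler.

```python
import math


def desired_replicas(
    current_replicas: int,
    current_metric: float,
    target_metric: float,
    min_replicas: int = 1,   # assumed lower bound, for illustration only
    max_replicas: int = 10,  # assumed upper bound, for illustration only
) -> int:
    """Scale the replica count in proportion to metric load, clamped to bounds."""
    if target_metric <= 0:
        raise ValueError("target_metric must be positive")
    raw = math.ceil(current_replicas * current_metric / target_metric)
    return max(min_replicas, min(max_replicas, raw))


if __name__ == "__main__":
    # Four workers averaging 90% CPU against a 60% target: scale up.
    print(desired_replicas(4, 90, 60))
    # The same four workers averaging 20% CPU: scale down.
    print(desired_replicas(4, 20, 60))
```

With four workers at 90 percent utilization and a 60 percent target, the rule asks for six workers; at 20 percent utilization it asks for two.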

It’s a Wrap

In both the standalone software and within the Lumada DataOps Suite, Pentaho’s latest enhancements address core customer issues, including better performance, new data sources, and new cloud vendors. In addition, customers gain more choice in how they want to source these capabilities: They can use Pentaho to solve data integration and analytics challenges as a standalone product, or they can choose the Lumada DataOps Suite, powered by Pentaho, to solve other problems.

In either case, Pentaho adheres to its primary mission: It ensures that customers benefit from the flexibility, expanded cloud choice, and expanded data fabric choice needed to navigate in an ever-changing data landscape.

For more information about the capabilities that Pentaho delivers to the Lumada platform to help enterprises accelerate their data-to-insight journey, visit our Lumada webpage.

Jason Tiret is Senior Product Manager for Pentaho and Dataflow Studio at Hitachi Vantara.

