The Intricacies of Data Accessibility for AI

Mark Katz
CTO, Financial Services

May 15, 2024

Making Data Accessible: Unraveling the Complexities

Artificial intelligence (AI) has become a focal point for innovation across industries, with businesses exploring ways to harness its power for commercial success. The prominence of technologies like ChatGPT has sparked a new wave of creativity and experimentation, and companies are now exploring diverse applications of AI. While generative AI applications such as virtual assistants offer users a seamless digital experience, more advanced sectors like financial services have started deploying recommendation engines and portfolio optimization to detect credit card fraud and provide better insight into trading decisions.

However, as businesses integrate AI models into their operations, they are dealing with an explosion of datasets, most of them unstructured. Organizations therefore need a robust data infrastructure to support the new architectures that generative AI applications demand. This was a focal point of discussion at the Hitachi Vantara Exchange event held in New York in February 2024.

As enterprises look to apply advanced generative AI models, here are some key considerations for managing data, along with strategies you can employ to mitigate data accessibility risks.

Responsible AI and data management

The explosion of unstructured data in organizations has created a shift in how we approach data management. Simply hosting data in the cloud is no longer sufficient; data management tools must reach into the content of the data itself to discern personally identifiable information (PII), control access, and establish a chain of custody. With generative AI, organizations must balance adopting new data management tools while ensuring stability, security, and strategic alignment with business objectives.
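
To make that idea concrete, here is a minimal sketch of what content-aware data management can look like: a regex-based scan for a couple of PII categories plus a simple chain-of-custody entry. The patterns and the custody_record helper are illustrative assumptions, not a description of any particular product.

```python
import re
import hashlib
from datetime import datetime, timezone

# Illustrative patterns only; real PII detection needs far broader coverage.
PII_PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
    "us_ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}

def scan_for_pii(text: str) -> dict:
    """Return the PII categories found in a piece of unstructured text."""
    return {name: pat.findall(text) for name, pat in PII_PATTERNS.items() if pat.search(text)}

def custody_record(doc_id: str, text: str, accessed_by: str) -> dict:
    """Build a simple chain-of-custody entry: who touched what, when, and what it contained."""
    return {
        "doc_id": doc_id,
        "sha256": hashlib.sha256(text.encode()).hexdigest(),
        "accessed_by": accessed_by,
        "accessed_at": datetime.now(timezone.utc).isoformat(),
        "pii_found": list(scan_for_pii(text).keys()),
    }

if __name__ == "__main__":
    sample = "Contact Jane at jane.doe@example.com, SSN 123-45-6789."
    print(custody_record("doc-001", sample, accessed_by="training-pipeline"))
```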

When generative AI models are trained, they absorb not only the knowledge base of data drawn from sources such as social media platforms, but also the biases implicit in the society that produced it. These models can also produce 'hallucinations', confident but incorrect outputs, and biased training data makes such errors harder to detect and correct; neither is something you want to end up with. This is where responsible and explainable AI come in: by requiring models to explain their outputs, you promote transparency and accountability in AI deployment.

Responsible AI underscores the need to mitigate biases and errors in AI models by fostering inclusive AI training practices. Explainable AI is a similar initiative that aims to enhance visibility into AI decision-making processes, enabling greater control and understanding of AI outputs. As businesses grapple with the computational demands of generative AI, scaling down from large language models to small language models is also emerging as a practical solution to enhance efficiency and focus on specific use cases.
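
As a small illustration of what explainability can look like in practice, the sketch below uses scikit-learn's permutation importance on a toy classifier to show which inputs actually drive a model's predictions. The dataset and model are synthetic stand-ins; the point is simply that model-agnostic explanation techniques of this kind make AI decision-making more visible.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

# Toy tabular model standing in for a production decision model.
X, y = make_classification(n_samples=1000, n_features=6, n_informative=3, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = RandomForestClassifier(random_state=0).fit(X_train, y_train)

# Permutation importance: shuffle each feature and measure how much accuracy drops,
# giving a model-agnostic view of which inputs actually influence the predictions.
result = permutation_importance(model, X_test, y_test, n_repeats=10, random_state=0)
for i, importance in enumerate(result.importances_mean):
    print(f"feature_{i}: {importance:.3f}")
```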

Data management and privacy in a multi-cloud environment

As we streamline data accessibility, it is crucial to ensure proper segmentation so the organization can meet the regulatory and customer expectations placed upon it. For instance, customers covered by the EU's GDPR now hold an expectation of privacy regarding their data, amplifying scrutiny of data usage. Customers demand responsible data handling, leading to increased constraints on data use, including residency requirements and the right to delete.
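
One way to picture these constraints is as a policy check applied before data ever reaches a training pipeline. The sketch below assumes hypothetical residency tags, region names, and a deletion flag; it is meant only to show how residency requirements and the right to delete can be enforced programmatically.

```python
from dataclasses import dataclass

# Hypothetical policy: records tagged with an EU residency requirement must stay
# in EU regions, and records under a deletion request must be excluded entirely.
ALLOWED_REGIONS = {"EU": {"eu-west-1", "eu-central-1"}}

@dataclass
class Record:
    record_id: str
    residency: str           # e.g. "EU"
    stored_region: str       # e.g. "eu-west-1"
    deletion_requested: bool

def usable_for_training(rec: Record) -> bool:
    """Return True only if the record satisfies residency rules and has no pending deletion."""
    if rec.deletion_requested:
        return False
    allowed = ALLOWED_REGIONS.get(rec.residency)
    return allowed is None or rec.stored_region in allowed

records = [
    Record("r1", "EU", "eu-west-1", False),
    Record("r2", "EU", "us-east-1", False),    # violates residency requirement
    Record("r3", "EU", "eu-central-1", True),  # right to delete exercised
]
print([r.record_id for r in records if usable_for_training(r)])  # ['r1']
```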

Non-compliance results in a breach of trust, especially in sectors like finance and healthcare, where trust is foundational and hard to restore once lost. Regulatory bodies are also expanding their focus beyond privacy to cyber resilience, exemplified by initiatives like the Digital Operational Resilience Act (DORA) influencing the landscape.

In the transition to cyber resiliency, the importance of immutable data becomes evident. However, immutability alone is insufficient in the face of ransomware threats. Understanding the last known good recovery point becomes crucial, necessitating a comprehensive approach to data recovery at scale.
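
One way to think about the "last known good" question is as a selection over snapshots that are both immutable and verified clean. The sketch below uses hypothetical snapshot metadata, with an integrity flag standing in for whatever malware scanning or checksum validation an organization actually runs.

```python
from dataclasses import dataclass
from datetime import datetime

@dataclass
class Snapshot:
    name: str
    taken_at: datetime
    immutable: bool
    integrity_ok: bool   # e.g. result of a malware scan or checksum validation

def last_known_good(snapshots: list[Snapshot]) -> Snapshot | None:
    """Pick the newest snapshot that is both immutable and verified clean."""
    candidates = [s for s in snapshots if s.immutable and s.integrity_ok]
    return max(candidates, key=lambda s: s.taken_at, default=None)

snapshots = [
    Snapshot("nightly-01", datetime(2024, 2, 10), True, True),
    Snapshot("nightly-02", datetime(2024, 2, 11), True, False),  # encrypted by ransomware
    Snapshot("nightly-03", datetime(2024, 2, 12), False, True),  # mutable, not trustworthy
]
print(last_known_good(snapshots))  # nightly-01
```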

Key components of a cyber resilience posture encompass data immutability, knowledge of recovery points, and the ability to recover at scale. Products capable of recovering thousands or tens of thousands of instances are needed to ensure business continuity in the face of cyber threats.

Actionable insights for digital transformation

The AI use cases we’ve explored here span extremely complex environments: generative AI, cyber resilience and privacy, and cloud and hybrid cloud. The digital transformation strategy of every organization is going to be a journey, and choosing the right technology partner is critical for success. In a landscape with diverse challenges, a partner capable of delivering a full suite of capabilities ensures a more robust and holistic approach to digital transformation.

As we continue this journey, the key takeaway is clear: with great power and great data comes the responsibility to navigate the future with wisdom and resilience.

Visit our Hitachi iQ solution page to learn how to Transform Business Operations and Beyond with AI.

Additional Resources:

INSIGHTS ARTICLE: Building an Unbreakable Data Infrastructure in the Age of AI and Hybrid Cloud

VIDEO: Streamline Infrastructure With Hitachi Converged and Hyperconverged Solutions

SOLUTION PROFILE: Hitachi iQ: An AI Solution Suite Tailored for Industry Needs: Ideal for Organizations Investing in AI/ML Workloads