Survey Says: Data Quality Management Isn’t Evolving Fast Enough for AI

Barr Moses
4 min read · Jun 25, 2024


Each year, Monte Carlo surveys real data professionals about the state of their data quality. This year, we turned our gaze to the shadow of AI—and the message was clear.

Data quality risks are evolving — and data quality management isn’t.

Of the 200 data professionals polled about the state of enterprise AI, a staggering 91% said they were actively building AI applications, but 2 out of 3 admitted they don't completely trust the data those applications are built on.

And “not completely” leaves a lot of room for error in the world of AI.

Far from pushing the industry towards better habits — and more trustworthy outputs — the introduction of GenAI seems to have exacerbated the scope and severity of data quality problems.

So, why is this happening? And what can we do about it?

Read on to see what else we uncovered in the 2024 State of Reliable AI Survey, and find out how it impacts your own AI initiatives in 2024 and beyond.

Fast facts about enterprise AI today

The Wakefield Research survey — which polled 200 data leaders and professionals — was commissioned by Monte Carlo in April 2024, and comes as data teams are grappling with the adoption of generative AI.

Among the findings are several statistics that indicate the current state of the AI race and professional sentiment about the technology:

  • 100% of data professionals feel pressure from their leadership to implement a GenAI strategy and/or build GenAI products
  • 91% of data leaders (VP or above) have built or are currently building a GenAI product
  • 82% of respondents rated the potential usefulness of GenAI at least an 8 on a scale of 1–10, but 90% believe their leaders do not have realistic expectations for its technical feasibility or ability to drive business value
  • 84% of respondents indicate that it is the data team’s responsibility to implement a GenAI strategy, versus 12% whose organizations have built dedicated GenAI teams

While AI is widely expected to be among the most transformative technological advancements of the last decade, these findings suggest a troubling disconnect between data teams and business stakeholders.

More importantly, they suggest a risk of top-down pressure to ship AI initiatives without a clear understanding of the data and infrastructure that power them.

Unfortunately, when it comes to data and AI, time and pressure don’t always deliver diamonds.

The state of AI infrastructure—and the risks it’s hiding

Even before the advent of GenAI, organizations were dealing with an exponentially greater volume of data than in decades past.

Since adopting GenAI programs, 91% of data leaders report that both the number of AI applications and the number of critical data sources have increased even further, deepening the complexity and scale of their data estates in the process.

More than that, there’s no clear solution for a successful enterprise AI architecture. Here’s how data teams are approaching AI based on survey results:

  • 49% building their own LLM
  • 49% using model-as-a-service providers like OpenAI or Anthropic
  • 48% implementing a retrieval-augmented generation (RAG) architecture
  • 48% fine-tuning models-as-a-service or their own LLMs

Talk about a divided industry.
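Of the approaches above, retrieval-augmented generation is the one most often explained in the abstract, so here is what the pattern actually looks like in code. This is a minimal, self-contained sketch: the document store, the bag-of-words "embedding," and the prompt template are all illustrative stand-ins for a real vector database, embedding model, and LLM call, not any specific vendor's API.

```python
from collections import Counter
import math

# Illustrative in-house document store; in practice this would be a vector DB.
DOCS = [
    "Data observability monitors freshness, volume, and schema changes.",
    "Fine-tuning adapts a pretrained model to a narrow domain.",
    "RAG retrieves relevant documents and adds them to the prompt.",
]

def embed(text: str) -> Counter:
    """Toy bag-of-words 'embedding' standing in for a real embedding model."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query: str, docs: list[str], k: int = 1) -> list[str]:
    """Rank documents by similarity to the query and keep the top k."""
    q = embed(query)
    return sorted(docs, key=lambda d: cosine(q, embed(d)), reverse=True)[:k]

def build_prompt(query: str, docs: list[str]) -> str:
    """Assemble the retrieved context plus the question into one LLM prompt."""
    context = "\n".join(retrieve(query, docs))
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"

prompt = build_prompt("What does RAG do?", DOCS)
```

The key point the survey numbers gloss over: the retrieval step makes the model's answer only as trustworthy as the documents it pulls in, which is exactly where data quality enters the AI stack.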

So, what’s the big issue?

As the complexity of AI architectures, and of the data that powers them, continues to expand, one perennial problem is expanding with it: data quality issues.

The modern data quality problem

While data quality has always been a challenge for data teams, this year’s survey results suggest the introduction of GenAI has exacerbated both the scope and severity of the problem.

More than half of respondents reported experiencing a data incident that cost their organization more than $100K. And we didn’t even ask how MANY they experienced. (Previous surveys suggest an average of 67 data incidents per month of varying severity.)

This is a shocking figure when you consider that 70% of data leaders surveyed also reported that it takes longer than 4 hours to find a data incident—and at least another 4 hours to resolve it.

But the real nail in the coffin is this: even with 91% of teams reporting that their critical data sources are expanding, an alarming 54% of teams surveyed still rely on manual testing or have no initiative in place at all to address data quality in their AI.

This anemic approach to data quality will have a demonstrable impact on enterprise AI applications and data products in the coming months — allowing more data incidents to slip through the cracks, multiplying hallucinations, diminishing the safety of outputs, and eroding confidence in both the AI applications and the companies that build them.
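To make "manual testing vs. automated checks" concrete, here is a minimal sketch of the kind of programmatic check that catches an incident before a stakeholder does. The field names, thresholds, and sample rows are hypothetical; real data observability tooling runs checks like these continuously against warehouse tables rather than in-memory lists.

```python
from datetime import datetime, timedelta, timezone

# Illustrative records; in practice these would come from a warehouse query.
ROWS = [
    {"order_id": 1, "amount": 42.0, "loaded_at": datetime.now(timezone.utc)},
    {"order_id": 2, "amount": None, "loaded_at": datetime.now(timezone.utc)},
]

def null_rate(rows: list[dict], field: str) -> float:
    """Fraction of rows where the given field is missing."""
    return sum(r[field] is None for r in rows) / len(rows)

def is_fresh(rows: list[dict], field: str,
             max_age: timedelta = timedelta(hours=4)) -> bool:
    """True if the newest row landed within the allowed window."""
    newest = max(r[field] for r in rows)
    return datetime.now(timezone.utc) - newest <= max_age

def run_checks(rows: list[dict]) -> list[str]:
    """Return a list of failed checks instead of waiting for a user report."""
    failures = []
    if null_rate(rows, "amount") > 0.05:
        failures.append("amount null rate above 5%")
    if not is_fresh(rows, "loaded_at"):
        failures.append("table not refreshed in the last 4 hours")
    return failures
```

Even a handful of scheduled checks like this shrinks the 4-hours-to-find window the survey describes, because the failure surfaces the moment the bad data lands rather than when a downstream AI output looks wrong.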

Is your data AI-ready?

While a lot has certainly changed over the last 12 months, one thing is still absolutely clear: if AI is going to succeed, data quality needs to be front and center.

This quote from Lior Solomon, VP of Data at Drata, says it all.

“Data is the lifeblood of all AI — without secure, compliant, and reliable data, enterprise AI initiatives will fail before they get off the ground…The most advanced AI projects will prioritize data reliability at each stage of the model development life cycle, from ingestion in the database to fine-tuning or RAG.”

The success of AI depends on the data — and the success of the data depends on your team’s ability to efficiently detect and resolve the data quality issues that impact it.

By curating and pairing your own first-party context data with modern data quality management solutions like data observability, your team can mitigate the risks of building fast and deliver reliable business value for your stakeholders at every stage of your AI adventure.

Check out the full report.
