Over the past few weeks, I’ve found myself in a familiar pattern: call after call with fellow architects, partners, and industry leaders who say, “Your data needs to be AI-ready,” and ask, “Is your data AI-ready?”
It lands with confidence. Everyone nods, and the conversation just moves forward.
But do we actually know what that means?
I’ve been in data long enough (close to 30 years now) to recognize when something is being rebranded when, in reality, it’s something we should have been doing all along.
This idea of “AI-ready data” isn’t revolutionary. It’s not even recent. In fact, the foundational practices to prepare data for AI are the same ones we should have been following all along. The same practices that were drilled into me back in the mid-90s when I was learning about data. So, what do I think it actually means to have data that’s AI-ready?
Here’s my take:
It Starts with Structure and Business Meaning
When I began my career, data modeling wasn’t optional; it was essential. We spent time with business stakeholders, defining the core business concepts and how they related. We weren’t just creating databases; we were building a shared understanding of how the business operated and communicated.
These days, we hear seemingly new terms like “semantic layers,” “ontologies,” and “knowledge graphs.” The terminology has evolved, but the core principles remain unchanged:

“If your data doesn’t reflect your business, your AI models will learn the wrong lessons.” — Keith Belanger
One of the best explanations I’ve seen comes from an industry expert and friend John Giles, in his recent book The Data Elephant in the Board Room. He draws a powerful comparison between data modeling and urban planning:
“Imagine building a new city (or significantly extending an existing one) without a ‘town plan’. Yet some people build IT solutions without the data equivalent.”
That “town plan” is your map, your shared vision, your business’s language expressed in data concepts. Without it, your AI models are working from GPS coordinates without any road names, landmarks, or sense of direction.
Let’s make this concrete. Take SAP, one of the most widely used enterprise systems in the world. The table that holds general customer information is called KNA1. Within that table, the customer number is stored in a column named KUNNR, the city in ORT01, and the country in LAND1. You’ll also find names like KTOKD (customer account group), STCD1 (tax number), and SPRAS (language key).
If you’re a developer, you might recognize those. But if you’re training an AI model or trying to build a business-facing semantic layer, this structure is close to meaningless. Worse, what if your business doesn’t even use the term “Customer”? Maybe your organization calls them Clients, Members, or Partners. That’s the real-world concept your business operates on, yet none of it is reflected in the data structures.
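One way to picture a semantic layer is as a translation table between technical column names and business terms. Here’s a minimal sketch: the SAP field names are real, but the business-side names (`member_id`, `member_city`, and so on) are hypothetical stand-ins for whatever vocabulary your organization actually uses.

```python
# Minimal sketch of a semantic mapping layer: SAP's technical column
# names are translated into the terms the business speaks.
# The target names below are illustrative assumptions, not a standard.
SEMANTIC_MAP = {
    "KUNNR": "member_id",       # customer number -> your word for "customer"
    "ORT01": "member_city",     # city
    "LAND1": "member_country",  # country key
    "KTOKD": "account_group",   # customer account group
    "STCD1": "tax_number",      # tax number
    "SPRAS": "language_key",    # language key
}

def to_business_terms(record: dict) -> dict:
    """Rename technical keys to business terms; keep unmapped keys as-is."""
    return {SEMANTIC_MAP.get(k, k): v for k, v in record.items()}

raw = {"KUNNR": "0000012345", "ORT01": "Boston", "LAND1": "US"}
print(to_business_terms(raw))
# {'member_id': '0000012345', 'member_city': 'Boston', 'member_country': 'US'}
```

The mapping itself is the valuable artifact here: it is the agreed, documented bridge between how systems store data and how the business talks about it.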
Unfortunately, this discipline has eroded over time. Structure is often sacrificed for speed. I’ve heard the excuses:
- “We’ll figure it out later.”
- “We don’t have time to model.”
- “Let’s just get something working.”
- “The tools can infer that.”
But here we are now, trying to layer AI on top of fragmented, inconsistent, and poorly defined data and wondering why the results feel off. AI doesn’t thrive in chaos. It needs structure. It needs clarity. It needs a town plan. While a plan is often framed as an AI requirement, truthfully, it should have always been part of your data strategy. AI just made the cracks more visible.
But it doesn’t stop at individual entities. Relationships matter, too.
In conceptual modeling, we spend time defining how entities relate: one customer places many orders, each order contains many products, and each product is supplied by one vendor. These connections tell a story about how business actually works.
Knowledge graphs take that idea even further, allowing us to explicitly model relationships, not just store them. They can represent relationships that evolve over time, follow complex hierarchies, or capture abstract connections such as influence, similarity, or lineage. AI thrives on relationships. Patterns. Context. It’s not enough to know what the data is; it needs to know how everything connects. That’s where structure becomes intelligence. Relationships are how we give data context. And context is how AI learns.
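The “relationships as first-class data” idea can be sketched in a few lines. Below, connections are stored as (subject, predicate, object) triples, the core shape behind most knowledge graphs, so the kind of relationship is explicit rather than buried in a foreign key. The entity and predicate names are illustrative.

```python
# Minimal sketch of explicit relationship modeling: each edge is a
# (subject, predicate, object) triple, so the relationship type is data.
triples = [
    ("customer:42", "places", "order:1001"),
    ("order:1001", "contains", "product:widget"),
    ("product:widget", "supplied_by", "vendor:acme"),
]

def neighbors(node, predicate=None):
    """Follow outgoing edges from a node, optionally filtered by predicate."""
    return [o for s, p, o in triples
            if s == node and (predicate is None or p == predicate)]

# Traverse the graph: which vendors ultimately supply what customer 42 ordered?
vendors = [v
           for order in neighbors("customer:42", "places")
           for product in neighbors(order, "contains")
           for v in neighbors(product, "supplied_by")]
print(vendors)  # ['vendor:acme']
```

A real graph store would add indexing, time-varying edges, and inference, but the principle is the same: once relationships are modeled explicitly, questions about context become simple traversals.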
Garbage In... Well, You Know the Rest
Data quality isn’t new. But with AI, the stakes have never been higher. We all know that phrase: “Garbage in, garbage out.” What’s changed is the cost of garbage. AI models don’t get confused like humans do. They don’t stop and ask questions. They just learn confidently and relentlessly from whatever data you give them. And if that data is incomplete, inconsistent, outdated, or full of silent errors? Well, AI will come to the wrong conclusions and tell the wrong stories with absolute confidence.
We often talk about hallucinations like they’re some mysterious model flaw. But in many cases, they’re just symptoms of bad data. When models encounter conflicting values, ambiguous definitions, or erratic patterns, they guess. They fill in the blanks. That’s hallucination, and in the world of business, those hallucinations aren’t harmless; they can be expensive, misleading, even dangerous.
Data Quality Isn’t a Project. It’s Discipline.
Before AI, bad data broke dashboards and reports. Today, it can misguide strategy, customer interactions, and automated decisions at scale, and often invisibly. The impact is bigger, faster, and harder to trace. That’s why clean, consistent, trustworthy data is now non-negotiable.
Furthermore, no LLM can correct raw misinformation at scale.
Having spent three decades in the data intelligence space, I can say for sure that every successful data-driven initiative, whether in analytics, AI, machine learning, or operations, has one thing in common:
A good data culture treats data quality as continuous discipline, not a one-time cleanup.
What this means:
- Validating data at every stage, not just at the end
- Building feedback loops to catch drift, anomalies, and silent failures
- Creating ownership and accountability, not just handing it to IT
- And yes: testing, not occasionally, but continuously
Just like we test code, we need to test our data. Trust, but verify. (We’ll get more into that shortly.)
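“Test our data like code” can be as simple as running plain assertions over every batch. A minimal sketch, with illustrative field names and rules; real checks would come from your business definitions:

```python
# Minimal sketch of data tests: each rule is a plain check over a batch,
# and failures are returned as readable messages instead of silently passing.
def check_batch(records):
    """Return a list of human-readable failures; an empty list means the batch passes."""
    failures = []
    ids = [r.get("customer_id") for r in records]
    if len(ids) != len(set(ids)):
        failures.append("duplicate customer_id values")
    for r in records:
        if not r.get("customer_id"):
            failures.append(f"missing customer_id: {r}")
        if r.get("country") is not None and len(r["country"]) != 2:
            failures.append(f"country is not an ISO-2 code: {r}")
    return failures

batch = [
    {"customer_id": "C1", "country": "US"},
    {"customer_id": "C2", "country": "USA"},  # silently wrong in most systems
]
print(check_batch(batch))  # one failure: the 'USA' row
```

Tools like dbt tests or Great Expectations productionize this idea, but the discipline, not the tool, is the point: every rule the business cares about becomes an executable check.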
You Can’t Automate Your Way Out of Bad Data
There’s a misconception that modern tools will magically clean or correct your data. But automation is only as good as the assumptions you bake into it. If your definitions are wrong, your structure is vague, or your source systems are full of silent errors, automation just scales the mess. AI amplifies whatever you feed it. So, if you feed it noise, it will generate noise with confidence. This is why I believe data quality isn’t just a technical concern, it’s an ethical one.
If we expect people to trust AI, then we need to take responsibility for what we feed it. You can’t trust outcomes from untrustworthy inputs. And you can’t add trust later in the process; you have to build it in from the start. Bottom line: If you want reliable outcomes, you need reliable data.
Trust, but Verify: Why Testing Is Non-Negotiable
Testing has always been part of good data engineering. We test software before deploying it (I hope you do). We test infrastructure before relying on it. And yet, when it comes to data, testing often gets treated like an afterthought or a one-time step before release into production. That’s a big problem, because AI doesn’t just consume data; it depends on it. You can’t trust AI outputs if you haven’t verified the integrity of the data inputs.
And testing isn’t just something you do when you deploy a new transformation or pipeline. It must happen as data flows through the system: every day, in every batch, in every stream.
That doesn’t necessarily mean testing every record inline in real time (though in some cases, that’s possible). It means that validation, assertions, and thresholds are built into your pipeline, and that metadata about what passed, what failed, and why is logged, analyzed, and acted on. You shouldn’t let data move to the next step in the pipeline without checking if it’s trustworthy enough to go forward.
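Putting those pieces together, a validation gate can be sketched as a function that checks a batch against a threshold, logs metadata about what passed and what failed, and refuses to pass a bad batch downstream. The threshold, field names, and validation rule below are assumptions for illustration.

```python
# Minimal sketch of a pipeline gate: a batch only moves forward if its
# failure rate stays under a threshold, and every decision is recorded
# as metadata so drift and silent failures can be analyzed later.
from datetime import datetime, timezone

FAILURE_THRESHOLD = 0.05  # assumption: tolerate at most 5% bad rows per batch

def validate_row(row):
    # Illustrative rule: an order must have an id and a non-negative amount.
    return bool(row.get("order_id")) and row.get("amount", 0) >= 0

def gate(batch, log):
    failed = [r for r in batch if not validate_row(r)]
    rate = len(failed) / len(batch) if batch else 0.0
    passed = rate <= FAILURE_THRESHOLD
    log.append({                       # metadata: what ran, what failed, why
        "ts": datetime.now(timezone.utc).isoformat(),
        "rows": len(batch),
        "failed": len(failed),
        "failure_rate": rate,
        "passed": passed,
    })
    if not passed:
        raise ValueError(f"batch quarantined: {rate:.1%} of rows failed validation")
    return [r for r in batch if validate_row(r)]
```

The key design choice is that the log entry is written whether the batch passes or not: the gate produces evidence either way, which is what makes “trust but verify” auditable.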
Think of it like ingredients in a kitchen. If a shipment of vegetables shows up spoiled, you wouldn’t just keep cooking with them because they arrived on time. You’d stop. You’d inspect. You’d send them back. The same mindset applies to data pipelines. If the inputs are questionable, the downstream models, dashboards, and decisions will be too.
DataOps: The Modern Engine Behind AI-Ready Data
If structure, quality, and testing are the pillars of AI-ready data, DataOps is the discipline that holds it all together.
Early in my career, we didn’t have Git for SQL or CI/CD for ETL. We deployed manually, relied on tribal knowledge, and updated data monthly. But today’s data ecosystems are fast, distributed, and cloud native, making DataOps essential.

Modern data pipelines run continuously, with changes pushed daily or even hourly. Engineering teams must deliver new models, enable self-service analytics, and support data products without breaking anything. Meanwhile, business leaders expect data to be accurate, fresh, and always available. Meeting these demands requires automation, governance, and operational rigor: the foundation of DataOps. That’s why new tools are bringing DevOps principles into the data world, helping teams automate deployments, enforce testing, manage version control, and orchestrate dependencies without slowing down.
For example, DataOps.live (the company I work at) supports advanced data operations at scale, integrating CI/CD workflows, automated testing, and safe change management, all while keeping engineers in control. The platform was built by data engineers, for data engineers, to address the real-world challenges of the modern data ecosystem.
As data becomes more central to business and AI strategies, DataOps isn’t just helpful, it’s the backbone that lets teams move fast, stay reliable, and fully realize the value of their data.
Industry Alignment and Leadership Responsibility
If you’re wondering whether this emphasis on structure, quality, testing, and operational rigor is just my opinion, take a look at the broader industry signals.
Analyst firms like Gartner are publishing entire frameworks around what it means to be “AI-ready.” Their recent guidance calls for trustworthy data foundations, semantic consistency, and proactive governance as core pillars. They highlight the need for data observability, data product thinking, and automation, not as future ambitions, but as current necessities.

And this isn’t just a technical concern, it’s strategic. The organizations best positioned to succeed with AI aren’t the ones with the most data. They’re the ones with the most trusted, well-managed, and well-understood data.
Which brings us to a critical point: It’s up to today’s data leaders, CIOs, CDOs, VPs, and Directors to take the lead.
Not just in buying tools or launching AI pilots, but in shaping the culture. Setting expectations. Demanding the disciplines that turn data chaos into clarity. Because preparing your data for AI is not just a technical step, it’s a leadership decision.
Full Circle: Foundational, Not Futuristic
I didn’t have a crystal ball over the years. I didn’t predict AI would land here, at this scale, this fast. Honestly, it felt like science fiction for much of my career.
But here we are… living it.
What’s funny (and a little ironic) is that after all these years, the industry has come full circle. We’re talking about AI-ready data like it’s a brand-new concept… but it’s really not:
It’s the same foundational practices I learned many years ago. The same principles we followed to design better data solutions, build trust, and model the business clearly and meaningfully. We’re just blowing the dust off. And maybe that’s fine, if it finally gets people to pay attention to what truly matters.
Because the truth is this:
- AI doesn’t need magic. It needs structure.
- AI doesn’t need hype. It needs quality.
- AI doesn’t need reinventing. It needs remembering.
If we get that right, we won’t just have AI-ready data. We’ll have decision-ready data, trustworthy data, and future-ready organizations.
Then you may find your AI Gold.