Let’s talk about the usual suspects: incorrect, missing, and unstructured data.
Incorrect data often slips past validations because users just want to move on with their lives. Fields get filled with plausible nonsense that a human can spot instantly. Here, AI can help in a few ways: you can use it to design smarter validation rules that make it harder to enter garbage data in the first place. Or use it to detect anomalies and flag suspicious entries. Give it examples, and it can even learn to recognize patterns of "looks okay but definitely isn't."
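To make that concrete, here's a minimal sketch of the kind of plausibility rules an LLM can help you draft from examples of real and junk submissions. The field names and thresholds are hypothetical, not from any particular system:

```python
import re

# Hypothetical plausibility checks for a contact form -- the kind of
# "looks okay but definitely isn't" rules an LLM can help you draft.
def looks_like_garbage(record: dict) -> list[str]:
    problems = []
    name = record.get("name", "")
    phone = record.get("phone", "")
    email = record.get("email", "")

    # Keyboard mashing like "aaaa" has very low character variety.
    if name and len(set(name.lower().replace(" ", ""))) < 3:
        problems.append("name looks like keyboard mashing")
    # "1111111111" passes many format checks but is clearly fake.
    if phone and len(set(re.sub(r"\D", "", phone))) <= 1:
        problems.append("phone is a single repeated digit")
    # Plausible shape, throwaway domain.
    if email.endswith(("@example.com", "@test.com")):
        problems.append("email uses a placeholder domain")
    return problems

print(looks_like_garbage({"name": "aaaa", "phone": "1111111111", "email": "a@test.com"}))
print(looks_like_garbage({"name": "Jane Doe", "phone": "555-0173", "email": "jane@corp.example"}))
```

None of these rules is clever on its own; the point is that a model shown a pile of real junk entries can propose dozens of them faster than you can.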
Once you know what’s wrong, you can choose whether to clean, flag, or remove it. At the very least, you'll know what you're working with – and more importantly, what you're not.
Missing data? AI can help there too. You might already have the same data elsewhere, just under a different name or in another system. AI can match, validate, and even generate import scripts to backfill the blanks. In some cases, it can infer missing values from existing data. It’s not magic, just probability and context.
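The matching step is often just fuzzy string comparison under the hood. Here's a sketch using only the standard library, with two made-up systems (a CRM missing emails, a billing system that has them but formats names differently):

```python
from difflib import SequenceMatcher

# Hypothetical data: the CRM is missing emails; billing has them,
# but the two systems format names differently.
crm = [{"name": "Smith, John", "email": None},
       {"name": "Doe, Jane", "email": None}]
billing = [{"name": "john smith", "email": "j.smith@corp.example"},
           {"name": "jane doe", "email": "jane.doe@corp.example"}]

def normalize(name: str) -> str:
    # "Smith, John" -> "john smith"
    parts = [p.strip().lower() for p in name.split(",")]
    return " ".join(reversed(parts)) if len(parts) == 2 else parts[0]

def backfill(crm_rows, billing_rows, threshold=0.85):
    for row in crm_rows:
        if row["email"]:
            continue
        target = normalize(row["name"])
        # Pick the closest fuzzy match, but only trust it above a threshold.
        best = max(billing_rows,
                   key=lambda b: SequenceMatcher(None, target, b["name"]).ratio())
        if SequenceMatcher(None, target, best["name"]).ratio() >= threshold:
            row["email"] = best["email"]
    return crm_rows

backfill(crm, billing)
print(crm[0]["email"])
```

The threshold is the part worth tuning: too low and you backfill wrong values, too high and you leave blanks you could have filled. That trade-off is exactly where flagging for human review beats silent auto-filling.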
Unstructured data? Now we’re speaking an LLM’s native language. Whether it’s free text, mixed-format fields, or names in odd orders, modern LLMs can extract structured data with impressive accuracy. Typos? Inconsistent order? Doesn't matter. It usually does as well as (or better than) a human who didn’t have enough coffee.
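The usual pattern is: ask for JSON only, then parse. Here's the shape of it, with `call_llm` stubbed out for illustration; in practice you'd swap in your provider's chat-completion call, and you should verify on your own data that the model actually returns clean JSON when prompted this way:

```python
import json

PROMPT = """Extract the person's first name, last name, and city from the
text below. Reply with JSON only, using keys first_name, last_name, city.
Use null for anything missing.

Text: {text}"""

def call_llm(prompt: str) -> str:
    # Stand-in for a real API call -- stubbed with a canned response
    # so the parsing step below can be shown end to end.
    return '{"first_name": "Maria", "last_name": "Garcia", "city": "Lisbon"}'

def extract(text: str) -> dict:
    raw = call_llm(PROMPT.format(text=text))
    return json.loads(raw)

record = extract("garcia, maria -- lives in lisbon, works remotely")
print(record["city"])
```

Note what the prompt does: it pins down the keys and tells the model what to do when a value is missing, so downstream code never has to guess at the schema.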
One thing to keep in mind: don’t throw your biggest, flashiest model at the problem right away. Test with smaller LLMs first. Often, they're good enough. You'll save money and reduce the environmental impact. Use Batch APIs too. They’re cheaper and easier on the infrastructure. Your CFO and the planet will thank you.
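Batch use is mostly a file-format exercise: instead of firing requests one at a time, you write them to a JSONL file and submit the whole thing. The line shape below follows OpenAI's Batch API as documented at the time of writing; other providers use similar but not identical formats, so check your vendor's docs. The model name and prompt are placeholders:

```python
import json

rows = ["Smith, John -- Boston", "Doe, Jane -- Austin"]

def make_batch_file(rows, path="batch_input.jsonl", model="gpt-4o-mini"):
    # One JSON object per line; each becomes one request in the batch.
    with open(path, "w") as f:
        for i, text in enumerate(rows):
            request = {
                "custom_id": f"row-{i}",  # lets you match results back to rows
                "method": "POST",
                "url": "/v1/chat/completions",
                "body": {
                    "model": model,  # start small; upgrade only if quality demands it
                    "messages": [{"role": "user",
                                  "content": f"Extract name and city as JSON: {text}"}],
                },
            }
            f.write(json.dumps(request) + "\n")
    return path

make_batch_file(rows)
```

Because batches run asynchronously, they suit cleanup jobs perfectly: nobody is waiting on the response, so you trade latency you don't need for a discount you do.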