Data Quality
The quality of your data directly determines the quality of your decisions — and ultimately, your organisation's success. Discover the framework that separates trusted data from noise.
The challenge
Every organisation holds vast amounts of data — used, with varying degrees of effectiveness, to draw insights and guide decisions. Yet the value of those insights is only as strong as the quality of the data behind them.
The key challenge lies in identifying which data is essential to your organisation's operations. This begins with understanding the decisions that need to be made and the specific data required to support them — then defining minimum quality standards for each dataset.
As Butterfly Data has observed through extensive work in this area, poor data inevitably leads to poor decision-making. But the process of assessing data quality is itself valuable: it reveals where gaps exist, which formats work best, and what users truly need.
Butterfly Data · What Does Good Data Look Like? 2025
The DAMA Framework
The Data Management Association (DAMA) framework, now recommended by the UK government as best practice, defines six dimensions for assessing data quality. Together they provide a comprehensive picture of what good data looks like.
Completeness
No fields are illegitimately missing across a dataset. Missing data can appear as truly empty fields or as placeholders such as null, N/A, or 0. Some blank fields are legitimate — a conditionally mandatory field only needs a value when another field triggers it.
If 'UK Born?' is 'N', then 'Country of Birth' must be populated. If 'UK Born?' is 'Y', a blank country field is legitimate — not a quality failure.
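The conditional rule above can be sketched as a small check. This is a minimal illustration, assuming records are dictionaries with the field names used in the example:

```python
# Completeness check sketch (hypothetical record layout): a blank
# 'Country of Birth' only counts as a failure when 'UK Born?' is 'N'.
def completeness_failures(records):
    """Return indices of records with an illegitimately missing field."""
    failures = []
    for i, rec in enumerate(records):
        born_uk = rec.get("UK Born?")
        country = rec.get("Country of Birth")
        if born_uk == "N" and not country:
            failures.append(i)  # conditionally mandatory field left empty
    return failures

records = [
    {"UK Born?": "Y", "Country of Birth": ""},        # legitimate blank
    {"UK Born?": "N", "Country of Birth": ""},        # quality failure
    {"UK Born?": "N", "Country of Birth": "France"},  # populated as required
]
print(completeness_failures(records))  # → [1]
```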
Validity
Values are reasonable and conform to defined rules for that field — covering length, format, permitted characters, and allowed values. Rules can be externally defined (e.g. UK postcodes) or set internally by your organisation (e.g. department codes).
A UK phone number must be exactly 11 digits, start with 0, and contain only numeric characters.
Similarly, multiple date formats (21/12/1995, May 12 2005, 12-4-01) appearing in a single column signal a validity failure.
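Both validity rules above lend themselves to simple pattern checks. A sketch, with the date-format patterns as illustrative assumptions:

```python
import re

# Validity sketch: the UK phone rule from the text as a regex —
# exactly 11 digits, starting with 0, numeric characters only.
PHONE_RE = re.compile(r"^0\d{10}$")

def valid_phone(value):
    return bool(PHONE_RE.fullmatch(value))

# Hypothetical patterns for the mixed-date-format example.
DATE_PATTERNS = {
    "dd/mm/yyyy": re.compile(r"^\d{2}/\d{2}/\d{4}$"),
    "Month d yyyy": re.compile(r"^[A-Z][a-z]+ \d{1,2} \d{4}$"),
    "d-m-yy": re.compile(r"^\d{1,2}-\d{1,2}-\d{2}$"),
}

def date_formats_used(column):
    """Report which formats appear; more than one signals a validity failure."""
    return {name for v in column
            for name, pat in DATE_PATTERNS.items() if pat.match(v)}

print(valid_phone("01234567890"))  # True
print(valid_phone("1234567890"))   # False: 10 digits, no leading 0
print(date_formats_used(["21/12/1995", "May 12 2005", "12-4-01"]))  # three formats
```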
Consistency
Data aligns with other records within the same dataset or across different datasets. An address must correspond to its postcode area. A person's stated birthplace must not contradict their country of origin recorded elsewhere.
A record showing 'UK Born?: Y' and 'Country of Birth: Germany' fails on consistency — Germany is not part of the UK, so the fields contradict each other.
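A cross-field consistency rule like this one can be expressed directly. A minimal sketch, assuming the same hypothetical field names as above:

```python
# Consistency sketch: flag records where 'UK Born?: Y' contradicts
# a non-UK country of birth (field names are hypothetical).
UK_COUNTRIES = {"England", "Scotland", "Wales", "Northern Ireland"}

def consistent(rec):
    if rec.get("UK Born?") == "Y":
        country = rec.get("Country of Birth")
        # a blank country is legitimate here; a non-UK one is not
        return not country or country in UK_COUNTRIES
    return True

print(consistent({"UK Born?": "Y", "Country of Birth": "Germany"}))  # False
print(consistent({"UK Born?": "Y", "Country of Birth": "Wales"}))    # True
```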
Accuracy
The data reflects reality — the most important and most difficult dimension to assess. Accuracy can sometimes be verified through common-sense checks, or by comparing against authoritative external sources such as Companies House or banking records.
An adult patient's weight recorded as 50g is clearly inaccurate.
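A common-sense check like this often takes the form of a plausibility range. A sketch — the bounds below are illustrative assumptions, not clinical thresholds:

```python
# Accuracy sketch: a plausibility range for adult weight in kilograms.
ADULT_WEIGHT_KG = (30.0, 300.0)  # assumed bounds for illustration

def plausible_weight(value_kg):
    low, high = ADULT_WEIGHT_KG
    return low <= value_kg <= high

print(plausible_weight(0.05))  # False — 50 g is clearly a recording error
print(plausible_weight(72.5))  # True
```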
Timeliness
Information is available when needed. Critical operational datasets may require real-time feeds; analytical datasets may only need annual refreshes. Data quality deteriorates as circumstances change — stale data can be as damaging as inaccurate data.
A housing association allocating properties needs live data updates. Relying on ten-year-old census data to estimate current populations leads to poor planning decisions.
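Staleness can be checked mechanically against a maximum acceptable age per dataset. A sketch, where the thresholds are illustrative assumptions:

```python
from datetime import date, timedelta

# Timeliness sketch: a record is stale once it exceeds the maximum
# acceptable age agreed for that dataset (thresholds are assumptions).
def is_stale(last_updated, max_age):
    return date.today() - last_updated > max_age

# Ten-year-old census data against a one-day freshness requirement:
print(is_stale(date.today() - timedelta(days=3650), timedelta(days=1)))  # True
print(is_stale(date.today(), timedelta(days=1)))                         # False
```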
Uniqueness
No record is duplicated in a way that introduces conflicting information. Duplication goes beyond wasted storage — it creates unreliable, contradictory records. Unique identifiers are invaluable for detection; composite keys handle legitimate historical duplicates.
A National Insurance number appearing twice with conflicting data likely means one record is outdated. An analyst must capture date metadata to determine which entry is reliable.
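Using date metadata to resolve duplicates can be sketched as follows. The record layout and field names are hypothetical:

```python
# Uniqueness sketch: detect duplicate National Insurance numbers and
# keep the most recently updated record (field names are hypothetical).
def deduplicate(records):
    latest = {}
    for rec in records:
        ni = rec["ni_number"]
        # ISO date strings compare correctly as plain strings
        if ni not in latest or rec["updated"] > latest[ni]["updated"]:
            latest[ni] = rec
    return list(latest.values())

records = [
    {"ni_number": "QQ123456C", "postcode": "CF10 1AA", "updated": "2015-06-01"},
    {"ni_number": "QQ123456C", "postcode": "SA1 1AA",  "updated": "2024-03-12"},
]
print(deduplicate(records))  # keeps only the 2024 entry
```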
Who's responsible?
Just as everyone is responsible for data security, everyone is responsible for data quality. If you work with data, you are responsible for assessing whether it is fit for purpose — and there must be organisational processes to address issues when they arise.
In an ideal world, data is validated at collection, validated again on load, and monitored continuously thereafter. In practice, a tiered approach works best.
Practical steps
Replace free-text address fields with standardised dropdown or postcode-lookup inputs to eliminate format inconsistencies at the point of entry.
A calendar selector removes the ambiguity of free-text date fields entirely — no more confusion between dd/mm/yyyy and mm/dd/yyyy formats.
Disable form submission until all required fields are complete, and provide clear inline validation guidance when validity criteria aren't met.
Establish an organisation-wide standard for empty or unknown values — differentiating between numeric and text fields — so missing data is always obvious and consistent.
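Such a standard can be enforced with a small normalisation step at load time. A sketch — the placeholder list and sentinel values are assumptions an organisation would agree for itself:

```python
# Sketch of an organisation-wide missing-value standard: map the many
# ad-hoc placeholders to one agreed sentinel per field type.
PLACEHOLDERS = {"", "null", "n/a", "na", "none", "unknown"}  # assumed list

def standardise_missing(value, field_type):
    """Return the agreed sentinel for missing data, else the value unchanged."""
    if isinstance(value, str) and value.strip().lower() in PLACEHOLDERS:
        # differentiate numeric and text fields, per the standard
        return None if field_type == "numeric" else "NOT PROVIDED"
    return value

print(standardise_missing("N/A", "text"))      # 'NOT PROVIDED'
print(standardise_missing("null", "numeric"))  # None
print(standardise_missing("Cardiff", "text"))  # 'Cardiff'
```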
Large datasets require automated scripts or COTS tools managed by data quality experts. Start by identifying poor quality, then apply comprehensive remediation to bring data up to standard.
The MoSCoW method divides data needs into four categories: Must haves, Should haves, Could haves, and Won't haves. This allows critical datasets to be prioritised for assessment, while providing a hierarchy for less-essential data.
The most challenging aspect is agreeing on shared data standards. Best practice is to define organisation-wide baselines — covering formats for dates, addresses, country codes, and identifiers — while allowing teams to use more granular data where necessary.
The Data Management Association's framework is recommended by the UK government as best practice for data quality assessment and management. It provides a flexible, comprehensive basis for evaluating data quality across different datasets and contexts.
Whether Commercial Off-The-Shelf (COTS) tools or a custom-built solution suits you best depends on organisational size and in-house technical capability. COTS tools reduce development time; bespoke solutions offer greater flexibility. Butterfly Data can help you evaluate both options.
Ready to act?
Book a free discovery call with our experts. We'll help you understand where your data stands today and build a practical roadmap for improvement.