Data Quality: A Journey Through Time
In today's digital age, where decisions depend on data, the importance of Data Quality cannot be overstated. Take, for instance, the major error made by Public Health England in its daily COVID-19 contact tracing in 2020. Nearly 16,000 positive coronavirus cases were omitted because of the use of an outdated Excel format (XLS) that can hold only around 65,000 rows per sheet. The newer format (XLSX), with a significantly higher capacity, could have prevented this oversight. This example underscores the critical role of Data Quality in ensuring accurate and reliable information. Read on as we unravel the history of Data Quality, from ancient civilisations to the cutting-edge technologies of the 21st century.
First Tracks of Data Quality
In ancient Sumer (Mesopotamia), clay tablets served as the precursors to today’s databases. These tablets were made and copied by hand, and during copying information could easily be left out or misspelled, making this one of the earliest signs of Data Quality issues. Another trace of Data Quality can be found in the monasteries of medieval Europe. Monks documented vital information in manuscripts, setting the stage for more structured Data Management and illustrating an early recognition of the need for accurate, high-quality record-keeping.
Middle Ages of Data Quality
Following in the footsteps of these ancient records, the "middle ages" of Data Quality began with the Industrial Revolution. This period brought increased business activity, encouraging textile mills, for example, to keep accurate, high-quality records so they could manage operations and resources more efficiently.
In the 19th century, handwritten ledgers posed early Data Quality challenges: illegible handwriting and miscalculations caused errors that exposed the difficulties of manual data entry.
The advent of computers in the mid-20th century ushered in early databases like the Integrated Data Store (IDS), revolutionising how organisations stored and retrieved data. CODASYL’s work around 1960 produced COBOL, a language widely used to build business applications on top of early databases like the IBM Information Management System (IMS). As computers and data volumes grew, the quality of that data became an important factor to consider.
The 1970s and 1980s witnessed the prominence of mainframe computing, with its emphasis on data integrity and the reduction of redundancy, laying the groundwork for modern Data Quality practices. This paved the way for frameworks like Total Data Quality Management (TDQM), which offer standardised approaches for ensuring Data Quality across organisations. Data Quality became an important topic not only in industry but also in academia: in the 1980s, academic papers about Data Quality were being published (e.g. Brodie, 1980; Woodward & Masters, 1989).
In the 1990s, advanced data profiling tools such as those from Trillium Software emerged, enabling organisations to thoroughly analyse and improve the overall quality of their data.
Data Quality in the Modern Era
Globalisation introduced new challenges, with multinational corporations grappling with variations in data formats, currencies, and languages. Today, cloud-based Data Quality platforms like SODA, Collibra DQ, Ataccama, and Informatica exemplify the evolution of Data Quality tools, providing scalable solutions for organisations dealing with vast and diverse data.
Today, e-commerce platforms, for example, use advanced machine learning for precise product recommendations, underscoring the pivotal role of high-quality data in shaping our digital experiences. Without high-quality data, these models cannot make accurate predictions or offer personalised recommendations, which compromises the effectiveness of the platform and diminishes the user experience.
In this time-travelling journey, we have witnessed the evolution of Data Quality from ancient clay tablets to sophisticated cloud-based solutions. Its essence is captured in six dimensions: accuracy, consistency, completeness, validity, timeliness, and uniqueness. Delving into these dimensions sets the stage for a deeper understanding of Data Quality.
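To make a few of these dimensions concrete, here is a minimal Python sketch of how completeness, uniqueness, and validity might be measured on a small set of records. The sample records, the field names, and the helper functions (`completeness`, `uniqueness`, `validity`, `looks_like_email`) are all illustrative assumptions, not part of any particular Data Quality tool, and the email rule is deliberately crude.

```python
# Hypothetical sample records; the fields are illustrative assumptions.
records = [
    {"id": 1, "email": "ann@example.com"},
    {"id": 2, "email": None},             # missing email
    {"id": 2, "email": "bob@example"},    # duplicate id, malformed email
    {"id": 4, "email": "cy@example.com"},
]

def completeness(rows, field):
    """Share of rows where `field` is present and not None."""
    filled = sum(1 for r in rows if r.get(field) is not None)
    return filled / len(rows)

def uniqueness(rows, field):
    """Share of distinct values among the non-missing values of `field`."""
    values = [r[field] for r in rows if r.get(field) is not None]
    return len(set(values)) / len(values)

def validity(rows, field, is_valid):
    """Share of non-missing values of `field` that pass `is_valid`."""
    values = [r[field] for r in rows if r.get(field) is not None]
    return sum(1 for v in values if is_valid(v)) / len(values)

def looks_like_email(value):
    # Crude rule for illustration only, not a real email validator.
    local, _, domain = value.partition("@")
    return bool(local) and "." in domain

print("email completeness:", completeness(records, "email"))
print("id uniqueness:", uniqueness(records, "id"))
print("email validity:", validity(records, "email", looks_like_email))
```

Each function returns a score between 0 and 1, so the results can be compared against a threshold that an organisation chooses for itself. The sketch assumes the input is non-empty and that the checked field has at least one non-missing value; a real implementation would need to handle those cases.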
If you would like to know more about how Clever Republic can help you with your Data Quality program, contact us.