Data is often called the new oil of the digital economy, and it is crucial to the everyday operations of every enterprise. Yet data is frequently of poor quality. When poor data feeds business operations, it produces results that alienate customers and clients, driving up maintenance costs and reducing business efficiency, effectiveness, and profitability.
A practical way to address data quality is to work along its dimensions. A data quality dimension is a characteristic used to classify information and data requirements; each dimension captures a specific attribute of the data that matters to its users and offers a way to measure and manage data and information quality.
In the data quality model of ISO/IEC 25012:2008(E), the International Organization for Standardization defines a rather technical set of data quality characteristics as part of its standards for software quality, distinguishing between two overarching data quality categories: inherent and system-dependent data quality.
Inherent characteristics describe the intrinsic potential of the data itself to satisfy stated requirements, whereas system-dependent characteristics depend on the IT system used to manage the data. This post presents ten frequently used data quality dimensions, which may apply at different data levels.
1. Accessibility or Coverage
This dimension measures the availability of required data records: the extent to which the data exist and the ease with which users can access them. Coverage refers to the breadth and depth of data that exist but are not made available by a data provider. The ISO standard ISO/IEC 25012:2008(E) assesses accessibility with respect to the data’s intended use and the absence of barriers, in particular for people with disabilities. No quantitative measure is recommended for accessibility; instead, the dimension should be assessed qualitatively or by a grade.
2. Accuracy
Accuracy measures the veracity of data against an authoritative source. It can be validated against defined business rules and checked against original documents or authoritative sources. Accuracy describes the degree of agreement between data and the real-world objects they represent: how error-free and reliable the data are and how closely the data map to (or approach) the true values of the items. To calculate accuracy, divide the number of accurate items or records by the total number of items or records. A local population register, for example, contains 932,904 phone numbers, of which 813,942 have been confirmed, giving an accuracy of 87.25 percent (813,942/932,904 * 100).
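As a minimal sketch, the same calculation can be written in Python; the counts are the example figures from the register above, and the variable names are illustrative:

```python
# Accuracy = accurate (verified) records / total records, as a percentage.
total_records = 932_904     # phone numbers in the example register
verified_records = 813_942  # confirmed against an authoritative source

accuracy = verified_records / total_records * 100
print(f"Accuracy: {accuracy:.2f}%")  # -> Accuracy: 87.25%
```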
3. Completeness
Completeness, one of the most commonly used data quality dimensions, measures the presence of required data attributes in a population of data records. The dimension represents the extent to which all expected and required data (records, attributes) are present, as well as the degree of resolution required for the intended use. Dividing the number of available items or records by the expected total yields a fraction; multiplying by 100 gives a percentage. For example, a local population register contains 932,904 people, but only 930,611 have a birth date. This yields a completeness of 99.75 percent (930,611/932,904 * 100).
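A small Python sketch of the same check, assuming records are simple dictionaries in which None marks a missing birth date (a toy representation, not a real register schema):

```python
# Completeness = records with the required attribute / expected total.
records = [
    {"name": "A", "birth_date": "1984-03-01"},
    {"name": "B", "birth_date": None},  # missing birth date
    {"name": "C", "birth_date": "1990-11-23"},
]

present = sum(1 for r in records if r["birth_date"] is not None)
completeness = present / len(records) * 100
print(f"Completeness: {completeness:.2f}%")  # -> 66.67% for this toy set
```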
4. Consistency
Consistency measures compliance with required formats, values, or definitions. It ensures that the data values, formats, and definitions in one population are consistent with those in another. Consistency, also known as consistent representation, refers to how free the data are of internal contradictions, whether they follow a set of rules, and whether they are presented in the same format as previous data. Consistency can be expressed as the proportion of items or records found to be consistent. For example, assume that in a population register the date of birth should be stored in the “YYYY-MM-DD” (year-month-day) format. The date of birth was stored inverted as “DD-MM-YYYY” in 61,196 of 930,611 total instances, giving a consistency of 93.42 percent ((930,611 − 61,196)/930,611 * 100).
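A minimal Python sketch of such a format check, assuming dates are stored as strings and using a regular expression for the expected “YYYY-MM-DD” pattern (the sample values are illustrative):

```python
import re

# Consistency here means conformance to the expected YYYY-MM-DD format.
DATE_FORMAT = re.compile(r"^\d{4}-\d{2}-\d{2}$")

dates = ["1984-03-01", "23-11-1990", "2001-07-14"]  # second one is inverted

consistent = sum(1 for d in dates if DATE_FORMAT.match(d))
consistency = consistent / len(dates) * 100
print(f"Consistency: {consistency:.2f}%")  # -> 66.67% for this toy set
```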
5. Currency or Currentness
Currency, or currentness, is the degree to which the data are sufficiently or reasonably up to date for the intended task. Currency can be evaluated qualitatively: a dataset of bird observations from the summer of 1969, for example, is insufficient for forecasting bird populations in 2022. Alternatively, the percentage of current records in a population register can be calculated by dividing the number of recently validated entries (764,111) by the total population (932,904), yielding 81.91 percent (764,111/932,904 * 100).
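As a sketch in Python, one way to operationalize “recently validated” is a freshness window; the 180-day window and the sample dates below are assumptions for illustration only:

```python
from datetime import date, timedelta

# Currency: share of records validated within a freshness window.
today = date(2022, 6, 1)
window = timedelta(days=180)  # assumed definition of "recent"

last_validated = [date(2022, 3, 10), date(2019, 8, 2), date(2022, 5, 30)]

current = sum(1 for d in last_validated if today - d <= window)
currency = current / len(last_validated) * 100
print(f"Currency: {currency:.2f}%")  # -> 66.67% for this toy set
```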
6. Relevancy or Relevance
Relevancy, or relevance, maps the degree to which the data meet the expectations and requirements of the user [43,45]. Like currency, relevancy may be evaluated qualitatively, at the user’s discretion and with respect to the user’s requirements, e.g., using a scorecard.
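One lightweight way to make such a qualitative assessment repeatable is a simple scorecard; the criteria and grades in this Python sketch are purely hypothetical:

```python
# Hypothetical relevancy scorecard: grade each user requirement from 1 to 5,
# then report the average. Criteria and grades are illustrative only.
scorecard = {
    "covers required attributes": 4,
    "matches reporting granularity": 3,
    "fits the intended use case": 5,
}

average = sum(scorecard.values()) / len(scorecard)
print(f"Relevancy score: {average:.1f} / 5")  # -> 4.0 / 5
```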
7. Reliability
In some cases, reliability is used interchangeably with accuracy; others, however, define the dimension as the degree to which an initial data value matches a subsequent data value. Under that definition, the quantification is the same as for accuracy.
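A minimal Python sketch under that second definition, comparing initial captures of some values with subsequent captures (the sample values are illustrative):

```python
# Reliability: share of values whose initial capture matches a later capture.
initial    = ["0176-111", "0176-222", "0176-333"]
subsequent = ["0176-111", "0176-999", "0176-333"]  # second entry changed

matches = sum(1 for a, b in zip(initial, subsequent) if a == b)
reliability = matches / len(initial) * 100
print(f"Reliability: {reliability:.2f}%")  # -> 66.67% for this toy set
```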
8. Timeliness
Timeliness describes whether the data’s age is suitable for the intended use, or the time difference between a real-world event and the moment the data are captured or verified. It assesses how accurately content reflects current market and business conditions and whether data are functionally available when needed. Timeliness can be measured as a duration, such as the time between data collection and data entry. If employees of a population register enter addresses into the database nine days after the addresses were collected, the timeliness of the data is nine days.
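A short Python sketch of timeliness as the collection-to-entry lag, using the nine-day example above (the specific dates are assumptions):

```python
from datetime import date

# Timeliness as the lag between collection and database entry.
collected = date(2022, 3, 1)   # when the address was collected
entered   = date(2022, 3, 10)  # when it was keyed into the register

lag = entered - collected
print(f"Timeliness: {lag.days} days")  # -> Timeliness: 9 days
```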
9. Validity
The validity of data is the degree to which the data agree with established rules, and it is quantified in the same way as accuracy. Validity assesses how well data conform to internal, external, and industry-wide standards.
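A minimal Python sketch of a rule-based validity check; the phone-number pattern below is an illustrative assumption, not a rule from any standard:

```python
import re

# Validity: conformance to an agreed rule set. This German-style phone
# pattern is an illustrative assumption for the sketch.
PHONE_RULE = re.compile(r"^\+49-\d{3,5}-\d{4,8}$")

phones = ["+49-521-1234567", "12345", "+49-301-98765"]

valid = sum(1 for p in phones if PHONE_RULE.match(p))
validity = valid / len(phones) * 100
print(f"Validity: {validity:.2f}%")  # -> 66.67% for this toy set
```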
10. Uniqueness
Uniqueness is the degree to which no record or attribute is recorded more than once. The goal is a single (unique) recording of each record and attribute.
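One common quantification is the share of distinct values among all values for a given key; a minimal Python sketch, with illustrative person IDs:

```python
# Uniqueness: distinct values / total values for a key, as a percentage.
ids = ["P-001", "P-002", "P-002", "P-003"]  # "P-002" is duplicated

uniqueness = len(set(ids)) / len(ids) * 100
print(f"Uniqueness: {uniqueness:.2f}%")  # -> 75.00%
```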