Start trefwoorden te typen om de site te doorzoeken. Druk enter om te verzenden.
Generative AI
Cloud
Testing
Artificial intelligence
Security
A big part of testing in a data driven project such as for BI testing is testing the quality of the data. The data quality needs to measured at several steps in the data landscape; at the source, but also after the migration/transformation (ETL-process). See building block: BI Testing.
Quality can be defined as: Quality is the totality of features and characteristics of a product or service that bear on its ability to satisfy stated or implied needs. (Source: TMAP Quality for DevOps teams)
The TMAP method described several quality characteristics to help measure the quality of software products, but these are not well suited for data driven environments. For testing specifically data quality the UK chapter of the DAMA (Data Management Association) wrote a paper describing new measures for data quality. (Source: The Six Primary Dimensions for Data Quality. DAMA UK, October 2013).
Six primary dimensions of data quality These are the six primary dimensions of data quality:
These dimensions are primarily designed to be used to check the quality of the data at the target point, but most of them can already be used to look at the quality of the source data (see also building block: Data Profiling), the only exception being ‘timeliness’. How to calculate these dimensions is further explained in the next section.
The listed data quality dimensions can be used as KPI’s for the quality of the data, but the data quality assessment needs to be done in a methodical fashion. Not all dimensions have to weighed equally. Some dimensions might be more important in the project than others and maybe some data items need a different weighting from other data items. It’s also important to define the requirements for acceptance. In total it is important to go through the following steps (as advised by DAMA UK):
In step 4 you will need to calculate the score on the data dimension:
* Note that because real-world data is not always available, the accuracy cannot always be tested. Examples to check for are: impossible birthdates, e-mail addresses that don’t contain the @ sign or that have spaces in them and non-existent combinations of postal code and city (postcode/huisnummer check).
Next to the primary data dimensions there are other data dimensions that might be taken into consideration. This list is in no way complete, but it gives an idea on what else to consider:
Building Blocks