Start trefwoorden te typen om de site te doorzoeken. Druk enter om te verzenden.
Generative AI
Cloud
Testing
Artificial intelligence
Security
With data profiling you look at the source data and use those aspects of the data for testing purposes. As a BI tester or a tester in a data migration project you will have as your testbase a source-to-target mapping (STM) and you will have access to the source databases, files, etc. With the use of data profiling you can start testing and building up your testcases before the datawarehouse (DWH) or datamigration is done.
The STM will hold information about which source field will go to what target field in the new DWH and whether any transformations are done or if it’s a direct mapping. It will also hold information about the target field type, length, null values, etc.. Data profiling in this case is a form of static testing where a (BI) tester can assess the quality of the STM by checking if the source data matches with what is expected in the STM.
There are three major benefits a tester can have from data profiling:
These can be made clear with an example.
Figure 1 contains an example STM. Data profiling is “looking at the source data and collecting statistics”, these statistics can be such things as maximum length, missing values, character sets, uniqueness, etc. From this example a BI tester can gather the following points of interest:
After doing data profiling you might have the following findings: in the source the last name is sometimes missing, the gender field also has values “Man” and “Woman” and the customer ID sometimes ends with a letter. This is important information for the data modeler so these issues can be fixed early in the project instead of finding out the ETL fails after it’s all built!For Email address there is a direct mapping so in theory all data should be good, but with the value pattern check you could find that there are addresses that are missing the ‘@’ or have written out ‘[at]’ instead or maybe have space in them. This means they cannot be used for automatic email distribution as they are not valid email addresses. This lack of quality should be addressed with the business or product owner. See also building block: Data Quality.
A few items have already been mentioned in the previous part and there is much more to be found online, but some of the more common things to look for are:
Data profiling can be done with SQL, but the use of (data cleaning) tools is more common. This building block only discusses data profiling on a single column at a time. Another approach is to also look at relationships between columns in a table or between columns from different tables. For a (BI) tester however this sort of advanced data profiling is not needed. Understanding the possibilities and being able to do single column data profiling will already provide a lot of insight into the data and data quality.
Building Blocks