Data Migration Part IV : Data Cleansing

Data quality is determined by 3 key factors: Accuracy, Completeness and Relevancy/Validity. Data Quality is the most important aspect for any enterprise as it enables a business to make more informed decisions, better audience target, effective usage, and better user experience.

As an enterprise, any opportunity to improve on Data quality should be leveraged and Data Migration is one the best time to do a reset and clean the data. So let’s look into what are a few common data quality issues and how we deal with them when using Dynamics 365.

  1. Inconsistent and incomplete Data: inconsistency is a big indicator that there’s a data quality problem. When data has not been entered in the system correctly like missing zip code, province or other information remaining data could be of less value.
  2. Duplicates: In many circumstances, the same records might exist multiple times in a database, Common culprits are Account, Contacts.
  3. Inconsistent Formats: if the source application did not have much validation the data will have more typos of errors, Address is the most common place to find with City and provinces being named differently.
  4. Obsolete Data:  Every day, people move, marry, and change their names. Since Peoples’ information is rarely static, obsolete data is a common challenge that is hard to rectify as well.

Common entities to Look at are :

    • Accounts: Duplicate, Inaccurate
    • Contacts: Duplicate, Inaccurate
    • Address: Inconsistent, incomplete
    • Obsolete data 

Data Cleaning Strategy: we all agree that data quality needs to be optimized and data has to be cleaned, The key question is when to clean the data?

    • Pre Migration?
    • Post Migration?
    • Source?
    • Destination?

Options:

    • Clean Data at source: by cleaning data at the source itself migration team can focus on data migration, it can also reduce the data set. But, this is often hards as the Legacy system is not generally equipped to help identify data quality issues and rectify them in bulk.
    • Clean during transit: doing all data cleanup during transformation in package or in the staging database could be a cumbersome exercise, primarily for deduplication as all the reference to related record needs to be maintained while deduplicating.

My Recommendation:

Data cleaning is a continuous exercise and the cleaning different types of data cleaning are best suited at different stages, like optimizing data is best done at source while merge could be easily handled at the destination. A hybrid approach is often the best.

Data Quality optimization, Hybrid approach for continuous optimization.

A hybrid strategy is great as it helps segregate the responsibility of data cleanup and well as assisting the best use of technical resources to optimize the process and gain efficiencies.  here are some of the reasons why I suggest so.

  • Identifying the data issues and enriching the data or correcting the data is best done by the business SME or application users. engaging them and empowering them to do this at the source reduces the complexity of data migration packages. this is also the best time to identify and clean any data which was created in error.
  • Populating defaults for missing field data, enforcing format and consistency are best done during transformation as a bulk rule could be applied easily.
  • During Migration or post-migration is the best time to identify duplicates, as your duplicates could be caused or get amplified by data collision between different sources.
  • Merging duplicates, as duplicates are often User data(Accounts/COntacts). merging them is far easier in Dynamics 365, Thanks to duplicate detention tool and Merge utility.
  • For Bulk Deduplication and merge, you may use below tools or options by Kingswaysoft :
    • KingswaySoft Duplicate Detector
    • KingswaySoft Address Verification
    • KingswaySoft Diff Detector
Duplicate detection and merge Package sample flow using Kingswaysoft and SSIS

Poor Data quality impacts a solution /project greatly, it hinders user adaptation, causes management/user overhead and leads to incorrect analytics. Cleaning data is important and Data Migration is one of the best times to optimize it. It is a time-intensive task, but it could be greatly simplified by the use of tools and technology.  

Let me know your thoughts on the data cleanup approach for data migration and what has worked for you.

Leave a Reply

Your email address will not be published. Required fields are marked *