
Data Migration Validation Tool

Migrate with confidence

When migrating a data warehouse from one platform to another, it is crucial to avoid data loss and to keep visibility into the migration process. How can you guarantee data integrity throughout the migration and gain insight into the project's progress?

Who needs the Data Migration Validation Tool?

Enterprises that want to move parts of their data warehouse to a different platform, or to redesign an existing data platform on-site.

How does it work?

Our solution builds on the open-source (and Google-backed) Data-Validation-Tool (DVT). With minimal input from you, we integrate this tool into your infrastructure and automate complex validation tasks, so you can focus on evaluating the generated reports.

  • Specification. For the solution to work, data owners need to specify: the entities to be evaluated; the keys that identify rows for in-depth validation; type-casting configurations; contextual information for reporting; and acceptable margins of error.
  • Automatic configuration. Based on this specification, all required technical configuration is generated automatically, including intelligent partitioning for larger datasets.
  • Automatic runs. The infrastructure ensures that only essential jobs are executed, either triggered by CI/CD pipelines or scheduled to run periodically. It scales automatically and monitors the load on the source and target data systems to handle workload fluctuations efficiently.
  • Intermediate validation data. All outputs are stored persistently in a BigQuery dataset. The reports are built on this data, and the raw data remains available to data engineers for future reference.
  • Reports. A well-organized Google Sheets document consolidates all the data into separate tabs with varying levels of detail, from high-level progress gauges to granular column-level figures. Access to this sensitive information is restricted to authorized users, in line with Google IAM guidelines.
  • Feedback. Use the same Google Sheets report to share feedback (notes, context, error margins, exclusions) with colleagues and other stakeholders.
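To make the specification and partitioning steps above more concrete, here is a minimal Python sketch. It is not DVT's configuration format or the service's actual implementation: `EntitySpec`, `plan_partitions` and all field names are hypothetical, and the fixed-size range split merely stands in for the "intelligent partitioning" mentioned above, whose details the page does not describe.

```python
from dataclasses import dataclass, field


@dataclass
class EntitySpec:
    """Hypothetical description of one entity (table) supplied by a data owner."""
    name: str                                   # entity to be evaluated
    primary_keys: tuple                         # columns that identify a row
    casts: dict = field(default_factory=dict)   # column -> type to cast to before comparing
    context: str = ""                           # free-text note shown in reports
    error_margin: float = 0.0                   # accepted relative difference (0.01 = 1%)


def plan_partitions(row_count: int, max_rows_per_job: int) -> list:
    """Split [0, row_count) into contiguous (start, end) ranges of bounded size.

    Assumption: a simple fixed-size split, used here only for illustration.
    """
    if max_rows_per_job <= 0:
        raise ValueError("max_rows_per_job must be positive")
    return [
        (start, min(start + max_rows_per_job, row_count))
        for start in range(0, row_count, max_rows_per_job)
    ]


# Example specification for a made-up "orders" entity.
orders = EntitySpec(
    name="orders",
    primary_keys=("order_id",),
    casts={"amount": "decimal"},
    context="Finance reporting",
    error_margin=0.001,
)

# 2.5 million rows, at most 1 million rows per validation job.
partitions = plan_partitions(row_count=2_500_000, max_rows_per_job=1_000_000)
print(orders.name, partitions)
```

Each resulting range could then be validated as a separate job, which keeps individual runs small on large entities.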
Customer Story

Single Source of Truth (SSOT) for Just Eat's Data & Analytics ecosystem.

Just Eat Takeaway migrated its data warehouse from AWS Redshift to Google BigQuery.

Key Benefits

Our service provides comprehensive insight into the progress of your data migration project, serving both high-level management reporting and the detailed technical information that data engineers need. While your data engineers migrate and (re-)design the data infrastructure, our service provides the following benefits:

Track migration progress

Once the data quality in the target platform matches your quality standards, the data can be deemed migrated. Because quality is assessed for each entity, the share of entities that meet the standard gives a direct measure of how far the migration project has progressed.
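As an illustration of how per-entity quality can be turned into a progress figure, the sketch below counts the entities whose score meets a required threshold. The function name `migration_progress`, the 0-to-1 score scale and the sample scores are assumptions made for this example, not part of the service.

```python
def migration_progress(entity_scores: dict, required_score: float) -> float:
    """Return the fraction of entities whose quality score meets the standard."""
    if not entity_scores:
        return 0.0
    migrated = sum(1 for score in entity_scores.values() if score >= required_score)
    return migrated / len(entity_scores)


# Made-up quality scores per entity (1.0 = source and target match fully).
scores = {"orders": 1.0, "customers": 0.97, "payments": 0.80, "refunds": 0.99}

progress = migration_progress(scores, required_score=0.95)
print(f"{progress:.0%} of entities migrated")  # prints "75% of entities migrated"
```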

Continuous reporting

Automated updates keep your data quality reports current. Triggered by CI/CD pipelines or as needed, these reports provide accurate and timely information, eliminating the risk of relying on outdated data.

Multi-level quality monitoring

Data quality checks can be customized to cover various levels of scrutiny, from rapid high-level assessments such as schema and row count checks to detailed cell-level equality validations.
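The three levels named above can be sketched in plain Python as follows. This is a simplified, in-memory illustration under stated assumptions: rows are dictionaries, schemas are column-to-type mappings, and the function names are hypothetical. It does not show how DVT runs these checks against real databases.

```python
def schema_check(source_schema: dict, target_schema: dict) -> bool:
    """Fast check: same column names with the same types."""
    return source_schema == target_schema


def row_count_check(source_rows: list, target_rows: list) -> bool:
    """Fast check: same number of rows on both sides."""
    return len(source_rows) == len(target_rows)


def cell_mismatches(source_rows: list, target_rows: list, key: str) -> list:
    """Detailed check: compare every cell of rows matched on `key`.

    Returns (key value, column, source value, target value) tuples.
    Rows that exist only in the target are not reported here; the
    row count check is what flags those.
    """
    target_by_key = {row[key]: row for row in target_rows}
    mismatches = []
    for row in source_rows:
        other = target_by_key.get(row[key])
        if other is None:
            mismatches.append((row[key], None, "row present", "row missing"))
            continue
        for column, value in row.items():
            if other.get(column) != value:
                mismatches.append((row[key], column, value, other.get(column)))
    return mismatches


source = [{"id": 1, "amount": 10}, {"id": 2, "amount": 20}]
target = [{"id": 1, "amount": 10}, {"id": 2, "amount": 25}]

print(schema_check({"id": "int", "amount": "int"}, {"id": "int", "amount": "int"}))
print(row_count_check(source, target))
print(cell_mismatches(source, target, key="id"))
```

In this example the schema and row count checks pass, while the cell-level check reports the differing `amount` for the row with `id` 2.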

Designed with data engineers in mind

Data engineers can dig deeper into potential issues and obtain detailed information about failed validations. Clues are provided to help pinpoint the areas that need further investigation. To support this, a dataset with examples of mismatched data is made available.

Scalable to large datasets

For large datasets, extensive validations can be scaled out either through serverless services such as Cloud Functions and Cloud Run or through autoscaling Kubernetes clusters.

Variable quality standards

Data quality contracts can be specified for each entity at different levels. Prioritize your most valuable data.
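A per-entity contract with its own tolerance could look like the following sketch. The `QualityContract` class, its fields and the entity names are hypothetical and chosen only to illustrate the idea of stricter rules for more valuable data.

```python
from dataclasses import dataclass


@dataclass
class QualityContract:
    """Hypothetical quality contract for one entity."""
    entity: str
    level: str                  # e.g. "schema", "row_count" or "cell"
    max_relative_error: float   # 0.0 means source and target must match exactly


def within_contract(contract: QualityContract, source_value: float, target_value: float) -> bool:
    """Check whether a target aggregate stays within the contract's margin."""
    if source_value == 0:
        return target_value == 0
    relative_error = abs(target_value - source_value) / abs(source_value)
    return relative_error <= contract.max_relative_error


# Strict contract for high-value data, looser contract for less critical data.
payments = QualityContract("payments", level="cell", max_relative_error=0.0)
clickstream = QualityContract("clickstream", level="row_count", max_relative_error=0.01)

print(within_contract(payments, 1_000_000, 995_000))     # False: no deviation allowed
print(within_contract(clickstream, 1_000_000, 995_000))  # True: 0.5% is within the 1% margin
```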