Data Migration Validation Tool
When migrating a data warehouse from one platform to another, it is crucial to ensure zero data loss and maintain visibility into the migration process. How can you guarantee data integrity throughout the migration and gain insights into the project's progress?
Who needs the Data Migration Validation Tool?
Enterprises seeking to transfer elements of their data warehouse to different platforms, or redesign existing data platforms on-site.
How does it work?
Our solution leverages the open-source (and Google-backed) Data-Validation-Tool (DVT). With minimal information, we can integrate this tool into your infrastructure to automate complex validation tasks, allowing you to focus solely on evaluating the generated reports.
- Specification. For the solution to function properly, data owners need to specify the following: the entities within the data to be evaluated; the keys that identify rows for in-depth validation; type-casting configurations; contextual information for reporting; and acceptable margins of error.
- Automatic configuration. Based on the above specification, all the required technical configuration is generated automatically, including intelligent partitioning for larger datasets.
- Automatic runs. The infrastructure is designed to ensure that only essential jobs are executed, triggered by CI/CD pipelines or scheduled to run periodically. It scales automatically, monitoring the load on the source and target data systems, to handle workload fluctuations efficiently.
- Intermediate validation data. All outputs are stored persistently in a BigQuery dataset, which serves as the foundation for the reports. Retaining the raw data is also useful for future reference by data engineers.
- Reports. A well-organized Google Sheets document consolidates all the data into separate tabs, each offering a different level of detail, from high-level progress gauges to granular column-level figures. Access to this sensitive information is restricted to authorized users, in accordance with Google IAM guidelines.
- Feedback. Enhance collaboration by using this Google Sheets report to share various types of feedback (notes, context, error margins, exclusions) with colleagues and other stakeholders.
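As a rough illustration of the specification and automatic configuration steps above, the following minimal Python sketch shows how a per-entity specification could be expanded into partitioned validation jobs. The names `EntitySpec`, `partition_ranges`, and `build_jobs`, the job fields, and the default partition size are hypothetical and chosen for this example only; they do not reflect DVT's configuration format or the actual implementation of the service.

```python
from dataclasses import dataclass, field


@dataclass
class EntitySpec:
    """Hypothetical per-entity specification supplied by a data owner."""
    name: str                       # entity (table) to validate
    keys: list[str]                 # columns that identify a row
    casts: dict[str, str] = field(default_factory=dict)  # column -> target type
    context: str = ""               # free-text note shown in reports
    error_margin: float = 0.0       # accepted relative difference, 0.01 = 1%


def partition_ranges(row_count: int, max_rows: int) -> list[tuple[int, int]]:
    """Split row offsets [0, row_count) into half-open chunks of at most max_rows."""
    if max_rows <= 0:
        raise ValueError("max_rows must be positive")
    return [(start, min(start + max_rows, row_count))
            for start in range(0, row_count, max_rows)]


def build_jobs(spec: EntitySpec, row_count: int, max_rows: int = 1_000_000) -> list[dict]:
    """Expand one specification into one job description per partition."""
    return [
        {
            "entity": spec.name,
            "keys": spec.keys,
            "casts": spec.casts,
            "error_margin": spec.error_margin,
            "row_range": rng,
        }
        for rng in partition_ranges(row_count, max_rows)
    ]


# Example: a 2.5 million row entity is split into three jobs.
orders = EntitySpec(name="orders", keys=["order_id"], error_margin=0.01)
jobs = build_jobs(orders, row_count=2_500_000)
```

Splitting by fixed row offsets is only the simplest possible scheme; a real partitioning strategy would more likely split on key ranges or dates so that source and target partitions line up.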
Single Source of Truth (SSOT) for Just Eat Takeaway.com's Data & Analytics ecosystem.
Just Eat Takeaway.com migrated its data warehouse from AWS Redshift to Google BigQuery.
Key Benefits
Our service provides comprehensive insights into the progress of your data migration project, catering to both high-level management reporting and detailed technical information for data engineers. As your data engineers work on migrating and (re-)designing data infrastructure, our service can provide the following benefits in the meantime:
Track migration progress
Continuous reporting
Multi-level quality monitoring
Designed with data engineers in mind
Scalable to large datasets
For large datasets, extensive validations can be scaled out either through serverless services such as Cloud Functions and Cloud Run or by employing autoscaling Kubernetes clusters.
Variable quality standard
Data quality contracts can be specified for each entity at different levels. Prioritize your most valuable data.
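To make the idea of per-entity contracts at different levels more concrete, here is a minimal Python sketch, assuming a contract is a nested dictionary with a default margin, optional entity-level margins, and optional column-level overrides. The structure and the function names `resolve_margin` and `within_margin` are hypothetical, used here for illustration rather than taken from the service itself.

```python
def within_margin(source: float, target: float, margin: float) -> bool:
    """True when target deviates from source by at most `margin` (relative)."""
    if source == 0:
        return target == 0
    return abs(target - source) / abs(source) <= margin


def resolve_margin(contract: dict, entity: str, column: str) -> float:
    """Pick the most specific margin: column, then entity, then the default."""
    entity_rules = contract.get("entities", {}).get(entity, {})
    columns = entity_rules.get("columns", {})
    if column in columns:
        return columns[column]
    return entity_rules.get("margin", contract.get("default_margin", 0.0))


# Hypothetical contract: strict by default, 1% tolerance for "orders",
# but no tolerance at all for the high-value "total_amount" column.
contract = {
    "default_margin": 0.0,
    "entities": {
        "orders": {"margin": 0.01, "columns": {"total_amount": 0.0}},
    },
}

margin = resolve_margin(contract, "orders", "discount")
passed = within_margin(source=1000.0, target=1005.0, margin=margin)
```

Resolving from the most specific level to the most general keeps the contract short: only the entities and columns that deviate from the default need an explicit entry.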