This document provides detailed, professional-grade code examples and architectural guidance for a complete data migration. It covers field mapping, transformation rules, validation scripts, rollback procedures, and a structured approach to timeline estimates. The provided code is clean, well-commented, and production-ready, designed to be directly actionable for your data migration project.
Field mapping is the foundational step, defining how each source field translates to a target field. This structure allows for clear documentation and programmatic access during the migration process.
Concept: A structured dictionary representing the mapping between source and target systems, including data types, descriptions, and any specific notes.
Code Example (Python): field_mapping.py
**Explanation:** * The `FIELD_MAPPING` list of dictionaries provides a structured, human-readable, and machine-parsable definition of the migration scope. * Each field entry specifies source and target names, data types, nullability, and a reference to a `transformation_rule`. * The `get_field_mapping` function allows for easy retrieval of specific entity mappings. * The `if __name__ == "__main__":` block demonstrates how to use this mapping and hints at generating DDL (Data Definition Language) statements for the target database directly from this configuration. --- ### 2. Transformation Rules Transformation rules define the logic applied to source data to conform it to the target system's requirements. This section provides a central module for these rules. **Concept:** A collection of Python functions, each representing a specific transformation logic. The `field_mapping.py` references these functions by name. **Code Example (Python): `transformation_rules.py`**
As a professional AI assistant, I understand you're requesting a comprehensive output for the "Data Migration Planner" workflow, specifically for the "plan_architecture" step. Your request also includes a detailed study plan. I will address both aspects, providing a robust Data Migration Architecture Plan and a Study Plan focused on mastering Data Migration Planning, interpreting the latter as a skill development plan relevant to the workflow.
This document outlines the architectural plan for a complete data migration, encompassing key phases, components, and considerations to ensure a successful, secure, and efficient transition of data from source to target systems.
Purpose: To define the technical architecture, processes, and governance for migrating data from existing source systems to new target systems. This plan serves as a foundational blueprint for subsequent execution phases.
Key Objectives:
In-Scope Data: [Specify data sets, e.g., Customer Master Data, Order History, Product Catalog, Financial Transactions, etc.]
Out-of-Scope Data: [Specify any data sets explicitly excluded, e.g., Archived data older than X years, temporary logs.]
Source Systems:
Target Systems:
Data Types: Structured (relational databases), Semi-structured (XML, JSON), Unstructured (documents, images, videos).
Data Volume: Detailed breakdown by source system, table/object, and estimated record counts.
Data Quality: Initial assessment of data quality issues (duplicates, incompleteness, inconsistencies, outdated records). This will inform transformation rules.
Data Sensitivity: Identification of PII, PCI, PHI, or other sensitive data requiring special handling, encryption, and access controls.
Chosen Strategy: [Select one and justify]
Rationale: [Explain why the chosen strategy is most suitable for the project's specific constraints, risks, and business impact.]
* Direct Database Access: SQL queries, stored procedures.
* API Calls: REST/SOAP APIs for SaaS sources.
* Flat File Exports: CSV, XML, JSON from source systems.
* Change Data Capture (CDC): For continuous or incremental extraction.
* Data Cleansing: De-duplication, standardization, null handling, error correction.
* Data Enrichment: Adding external data, deriving new fields.
* Data Aggregation: Summarizing data.
* Data Type Conversion: Adapting to target system data types.
* Data Masking/Tokenization: For sensitive data in non-production environments.
* Referential Integrity Enforcement: Mapping foreign keys, ensuring consistency.
* Direct Database Inserts/Updates: Bulk inserts, upserts.
* API Calls: Target system APIs (e.g., Salesforce Data Loader API, Dynamics 365 Web API).
* Flat File Imports: For systems supporting bulk file uploads.
* Streaming: For real-time or near real-time data flows.
* Record Count Verification: Compare source, staging, and target record counts.
* Checksum/Hash Verification: Ensure data integrity at a granular level.
* Data Type and Format Checks: Confirm adherence to target system schema.
* Business Rule Validation: Verify transformed data against target system business rules.
* Sample Data Verification: Manual review of a statistically significant sample set.
* Reconciliation Reports: Detailed comparison reports between source and target for key fields.
This section provides a high-level overview; a detailed matrix will be a separate deliverable.
Source Field | Target Field | Transformation Rule | Notes
:----------------|:-----------------|:------------------------|:-------------------------------------------------------
CRM.Customer.Name | SFDC.Account.Name | Direct Map |
CRM.Customer.Addr1, Addr2, City, State, Zip | SFDC.Account.BillingAddress | Concatenate and Standardize | Use USPS standardization service.
CRM.Customer.Status | SFDC.Account.AccountStatus | IF 'Active' THEN 'Open' ELSE 'Closed' | Map legacy statuses to new system.
CRM.Order.Amount | SFDC.Opportunity.Amount | Direct Map (Currency Conversion if needed) | Ensure currency conversion rates are applied if source/target currencies differ.
CRM.Product.ID | SFDC.Product2.ProductCode | Pad with leading zeros to 10 chars | Example: 123 -> 0000000123
CRM.Customer.CreatedDate | SFDC.Account.CreatedDate | Direct Map |
CRM.Customer.LastModified | SFDC.Account.LastModifiedDate | Direct Map |
CRM.Customer.Email | SFDC.Contact.Email | Validate format, De-duplicate | Remove invalid email formats, handle multiple contacts with same email.
Type of Validation | Description | Example Script Snippet (Conceptual)
:--------------------|:----------------------------------------------------------------------------------------------------------------|:----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
python
import re
from datetime import datetime
import pandas as pd
TRANSFORMATION_FUNCTIONS = {}
def register_transformation(func):
"""Decorator to register transformation functions."""
TRANSFORMATION_FUNCTIONS[func.__name__] = func
return func
@register_transformation
def capitalize_string(value: str) -> str:
"""Capitalizes the first letter of each word in a string."""
if pd.isna(value):
return None
return str(value).title()
@register_transformation
def lowercase_string(value: str) -> str:
"""Converts a string to lowercase."""
if pd.isna(value):
return None
return str(value).lower()
@register_transformation
def normalize_phone_number(value: str) -> str:
"""
Normalizes a phone number to a standard E.164 format (e.g., +15551234567).
Removes non-digit characters and prepends a default country code if missing.
"""
if pd.isna(value):
return None
phone_str = str(value)
digits = re.sub(r'\D', '', phone_str) # Remove non-digits
if not digits:
return None
# Example: Assume US numbers if no country code is present
if len(digits) == 10: # US 10-digit number
return f"+1{digits}"
elif len(digits) > 10 and digits.startswith('1'): # Already has country code 1
return f"+{digits}"
elif len(digits) > 7: # Other international numbers, or just digits without clear country code
# This is a simplification; a robust solution might involve a dedicated library
return f"+{digits}" # Prepend '+' for E.164, assuming digits include country code
return digits # Return as is if unable to normalize effectively
@register_transformation
def validate_date_format(value, input_format="%Y-%m-%d", output_format="%Y-%m-%d"):
"""
Validates and converts a date string to a consistent format.
Returns None if validation fails.
"""
if pd.isna(value):
return None
try:
if isinstance(value, datetime):
return value.strftime(output_format)
date_obj = datetime.strptime(str(value), input_format)
return date_obj.strftime(output_format)
except ValueError:
print(f"Warning: Could not parse date '{value}' with format '{input_format}'. Returning None.")
return None
@register_transformation
def convert_datetime_to_utc(value, source_tz='America/New_York', target_tz='UTC'):
"""
Converts a datetime object or string from a source timezone to a target timezone (default UTC).
Requires 'pytz' library: pip install pytz
"""
try:
import pytz
except ImportError:
print("Error: 'pytz' library not found. Please install it (pip install pytz) for timezone conversions.")
return value # Return original value if library is not available
if pd.isna(value):
return None
if isinstance(value, str):
try:
# Attempt to parse common datetime formats
dt_obj = pd.to_datetime(value)
except ValueError:
print(f"Warning: Could not parse datetime string '{value}'. Returning None.")
return None
elif isinstance(value, datetime):
dt_obj = value
else:
print(f"Warning: Unsupported datetime type '{type(value)}'. Returning None.")
return None
# If datetime object is naive, assume it's in the source_tz
if dt_obj.tzinfo is None:
source_timezone = pytz
Project: \[Customer Project Name - e.g., Legacy CRM to Cloud ERP Migration]
Date: October 26, 2023
Version: 1.0
Prepared For: \[Customer Name/Department]
Prepared By: PantheraHive Data Migration Team
This document outlines the comprehensive plan for the data migration from \[Source System Name, e.g., Legacy CRM] to \[Target System Name, e.g., Cloud ERP]. The objective of this migration is to accurately and efficiently transfer critical business data, ensuring data integrity, minimal downtime, and full operational readiness in the new system. This plan details the strategy, field mapping, transformation rules, validation procedures, rollback protocols, and estimated timelines required to achieve a successful data transfer.
This Data Migration Plan serves as the guiding document for the entire migration process. It defines the methodologies, tools, and steps necessary to move specified data sets from the source system to the target system.
2.1. Project Goals:
2.2. Scope of Data Migration:
The following key data entities are in scope for migration:
* Customers / Accounts
* Contacts
* Opportunities
* Products / Services
* Historical Sales Orders (limited scope, e.g., last 3 years)
* User Profiles (relevant subset)
2.3. Out of Scope:
Our chosen data migration strategy is a Phased Incremental Migration combined with a Big Bang Cutover for the final go-live. This approach allows for iterative testing and refinement while ensuring a clean final transition.
* Name: \[e.g., Legacy CRM]
* Database/Platform: \[e.g., SQL Server 2012]
* Key Characteristics: Highly customized, on-premise, data inconsistencies identified.
* Name: \[e.g., Salesforce Sales Cloud]
* Database/Platform: \[e.g., Salesforce's proprietary database]
* Key Characteristics: Cloud-based, standardized data model, API-driven integration.
Field mapping is the cornerstone of a successful migration, documenting the relationship between source and target data fields, including data types and constraints.
5.1. Mapping Methodology:
Each source field will be mapped to its corresponding target field. If no direct match exists, a transformation rule will be applied, or the field will be flagged for review (e.g., to be dropped or populated with a default value).
5.2. Field Mapping Template (Example for 'Customers/Accounts'):
| Source System (Legacy CRM) | Target System (Salesforce Account) | Transformation Rule (if any) | Notes / Comments |
| :------------------------- | :--------------------------------- | :--------------------------- | :--------------- |
| CustomerID (INT, PK) | External_ID__c (Text, External ID) | Direct Map | Used for linking during updates. |
| CompanyName (VARCHAR(255)) | Name (Text) | Direct Map | Required field in Salesforce. |
| AddressLine1 (VARCHAR(255)) | BillingStreet (Text) | Concatenate with AddressLine2 if not null. | Target has single street field. |
| AddressLine2 (VARCHAR(255)) | BillingStreet (Text) | Part of concatenation | See above. |
| City (VARCHAR(100)) | BillingCity (Text) | Direct Map | |
| State (CHAR(2)) | BillingState (Text) | Lookup: Map 2-letter code to full state name. | e.g., 'CA' -> 'California'. |
| ZipCode (VARCHAR(10)) | BillingPostalCode (Text) | Direct Map | |
| PhoneNumber (VARCHAR(20)) | Phone (Phone) | Format: (XXX) XXX-XXXX | Standardize format. |
| FaxNumber (VARCHAR(20)) | Fax (Phone) | Direct Map | |
| Email (VARCHAR(255)) | PersonEmail (Email) | Direct Map | For Person Accounts. |
| AccountType (VARCHAR(50)) | Type (Picklist) | Map: 'Corp' -> 'Enterprise', 'SMB' -> 'Small Business', 'Individual' -> 'Individual'. | Default to 'Other' if no match. |
| CreationDate (DATETIME) | CreatedDate (DateTime) | Direct Map | Salesforce auto-populates, but we'll override. |
| LastUpdated (DATETIME) | LastModifiedDate (DateTime) | Direct Map | Salesforce auto-populates, but we'll override. |
| SalesRegionID (INT) | Region__c (Picklist) | Lookup: SalesRegion table to Region__c picklist value. | Custom picklist in Salesforce. |
| Notes (TEXT) | Description (Long Text Area) | Truncate if > 32,000 chars. | Salesforce field limit. |
| LegacyStatus (VARCHAR(50)) | (Not Mapped) | N/A | Replaced by Salesforce AccountStatus__c with different logic. |
| AccountOwnerID (INT) | OwnerId (Lookup(User)) | Lookup: Map LegacyUser to SalesforceUser ID. | Ensure owner exists in Salesforce. |
5.3. Key Mapping Considerations:
Data transformation rules are critical for adapting source data to the target system's requirements, ensuring data quality, and aligning with new business processes.
6.1. Common Transformation Types:
AccountType is 'Prospect', set Status to 'New Lead').6.2. Transformation Rule Documentation (Examples):
| Entity/Field | Source Field(s) | Target Field | Transformation Rule Description | Example Input (Source) | Example Output (Target) |
| :-------------------- | :---------------------------- | :------------------------- | :---------------------------------------------------------------- | :-------------------------- | :-------------------------- |
| Account.BillingStreet | AddressLine1, AddressLine2 | BillingStreet | Concatenate AddressLine1 and AddressLine2 with a newline character if AddressLine2 is not null. | 123 Main St, Suite 100 | 123 Main St\nSuite 100 |
| Account.Type | AccountType | Type (Picklist) | Lookup mapping: 'Corp' -> 'Enterprise', 'SMB' -> 'Small Business', 'Individual' -> 'Individual'. Default to 'Other' if no match. | Corp | Enterprise |
| Contact.Phone | PhoneNumber | Phone | Reformat phone number to (XXX) XXX-XXXX. Handle missing country codes by assuming US. | 12125551234 | (212) 555-1234 |
| Opportunity.CloseDate | ProjectedCloseDate | CloseDate | If ProjectedCloseDate is in the past, set to current date + 30 days. | 2023-01-15 | 2023-11-25 (assuming current date is 2023-10-26) |
| Product.Active | IsActive (BIT) | IsActive (Boolean) | Convert 1 to TRUE, 0 to FALSE. If null, default to FALSE. | NULL | FALSE |
Robust data validation is critical to ensure the migrated data is complete, accurate, and consistent. This involves pre-migration, in-migration, and post-migration validation steps.
7.1. Pre-Migration Validation (Source Data Profiling & Cleansing):
* Data Profiling: Analyze data types, formats, completeness (null rates), uniqueness, and distributions.
* Duplicate Detection: Identify and resolve duplicate records in the source (e.g., duplicate customer entries).
* Referential Integrity Checks: Verify relationships between tables in the source.
* Data Cleansing: Work with business users to correct identified issues, or define rules for automated cleansing during transformation.
7.2. Post-Migration Validation (Target System):
* Record Count Verification:
Script: SELECT COUNT() FROM Source.Accounts; vs. SELECT COUNT(*) FROM Target.Account;
* Goal: Total record counts for each entity should match or be within an acceptable variance (e.g., if some records were intentionally excluded).
* Data Integrity & Accuracy Checks:
* Script: Random sampling of records to compare field-level values between source and target.
* Goal: Verify transformed data matches expected output for specific fields.
* Script: SELECT Source.CustomerID, Source.CompanyName, Target.External_ID__c, Target.Name FROM Source.Accounts S JOIN Target.Account T ON S.CustomerID = T.External_ID__c WHERE S.CompanyName <> T.Name; (for direct maps)
* Referential Integrity Checks:
* Script: Verify that child records (e.g., Contacts) are correctly linked to their parent records (e.g., Accounts) in the target system.
* Goal: All lookup fields and relationships are correctly established.
Script: SELECT COUNT() FROM Target.Contact WHERE AccountId IS NULL; (should be 0 for mandatory links)
* Business Rule Validation:
* Script: Run queries based on target system business rules.
* Goal: Ensure data adheres to new system constraints (e.g., all Opportunities have a valid Close Date, all Products have an assigned Category).
Script: SELECT COUNT() FROM Target.Opportunity WHERE CloseDate < TODAY(); (if future close dates are a rule).
* User Acceptance Testing (UAT):
* Activity: Key business users will directly access the migrated data in the target system to perform their daily tasks and validate data correctness from a business perspective.
* Goal: Ensure the data supports business operations and meets user expectations.
7.3. Validation Tools:
A robust rollback plan is essential to mitigate risks and ensure business continuity in the event of a critical failure during or immediately after migration.
8.1. Rollback Scenarios:
\n