Project: Data Migration Planner
Workflow Step: Generate Code for Data Migration Plan
Deliverable Date: October 26, 2023
This document outlines a comprehensive plan for the upcoming data migration, encompassing detailed field mapping, transformation rules, robust validation scripts, clear rollback procedures, and realistic timeline estimates. The goal is to ensure a smooth, accurate, and secure transfer of data from the source system to the target system with minimal downtime and maximum data integrity. This plan provides the foundational blueprint and illustrative code examples to guide the migration process, serving as a critical deliverable for successful execution.
A successful data migration requires meticulous planning and execution. This strategy focuses on minimizing risks, ensuring data accuracy, and providing clear operational procedures. We will adopt a phased approach, including analysis, design, development (including scripting), testing, execution, and post-migration validation.
The following sections detail the core components of our migration strategy, including illustrative code examples where applicable. These snippets are adaptable starting points rather than production-ready implementations: database connection details, table names, and field names must be customized for your environment.
Field mapping defines the precise relationship between source system fields and target system fields. It's the foundation for understanding how data will flow and where transformations will be required.
Explanation:
The field mapping is represented as a dictionary (or JSON structure) where keys are source field identifiers (e.g., source_table.source_column_name) and values are target field identifiers (e.g., target_table.target_column_name). This document also details the data type mapping and any specific notes or constraints.
Illustrative Code: field_mapping.py
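A minimal sketch of what `field_mapping.py` could look like. The table and column names below are placeholders drawn from the example mapping table later in this document; replace them with identifiers from your actual schemas.

```python
# field_mapping.py
# Illustrative field mapping: source identifiers -> target identifiers.
# All table/column names are placeholders; customize for your environment.

FIELD_MAPPING = {
    "legacy_customers.Legacy_CustomerID": {
        "target": "customers.CustomerID",
        "source_type": "INT",
        "target_type": "UUID",
        "transformation": "GENERATE_UUID",
        "notes": "Old INT id mapped to a new UUID; keep a lookup table.",
    },
    "legacy_customers.Legacy_FirstName": {
        "target": "customers.FirstName",
        "source_type": "VARCHAR(50)",
        "target_type": "VARCHAR(100)",
        "transformation": "DIRECT_MAP",
        "notes": "Direct copy; target field is wider.",
    },
    "legacy_customers.Legacy_Status": {
        "target": "customers.CustomerStatus",
        "source_type": "INT",
        "target_type": "ENUM",
        "transformation": "MAP_STATUS_CODE",
        "notes": "1->'ACTIVE', 2->'INACTIVE', 3->'PENDING'.",
    },
}


def get_target_field(source_field: str) -> str:
    """Return the target identifier for a source field; raises KeyError if unmapped."""
    return FIELD_MAPPING[source_field]["target"]
```

Keeping the mapping as data (rather than hard-coded logic) lets the ETL scripts, validation scripts, and documentation all consume the same single source of truth.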
#### 3.2. Transformation Rules

Transformation rules define how data is modified during the migration process to fit the target system's schema, data types, and business logic.

**Explanation:** A Python module with a collection of functions, each responsible for a specific data transformation. These functions will be called during the ETL (Extract, Transform, Load) process.

**Illustrative Code: `transformation_rules.py`**
This document outlines the architectural plan for the complete data migration for [Client/Project Name]. The objective is to seamlessly transition critical data from [Source System(s)] to [Target System(s)], ensuring data integrity, minimal downtime, and adherence to business requirements. This plan covers the strategy, scope, technical architecture, data mapping, transformation rules, validation, error handling, rollback procedures, security, performance, and high-level timeline estimates necessary for a successful migration.
The primary goals of this data migration are to preserve data integrity, minimize downtime, and meet the business requirements defined for the target platform.

Source Systems:
* [System 1 Name] (e.g., Legacy CRM Database - Oracle 12c)
* [System 2 Name] (e.g., On-premise ERP System - SQL Server 2016)
* [List specific modules/tables/datasets to be migrated, e.g., Customer Master Data, Order History (last 5 years), Product Catalog]
Target Systems:
* [System 1 Name] (e.g., Salesforce CRM)
* [System 2 Name] (e.g., SAP S/4HANA)
* [List specific modules/tables/datasets to receive data]
A detailed data inventory will be created, listing all tables, fields, and their characteristics from source systems. Data profiling will be conducted to assess data quality, identify anomalies (nulls, duplicates, format violations), and estimate the cleansing effort required.
Deliverables: Data Dictionary for Source Systems, Data Profiling Reports.
* **Phase 1:** [e.g., Static reference data, product catalog]
* **Phase 2:** [e.g., Customer master data]
* **Phase 3:** [e.g., Historical orders/transactions]
* **Phase N:** [Final cutover]
This is the core of the migration, defining how each piece of data moves and changes.
A comprehensive Field Mapping Document will be created for each source-to-target entity relationship. It will include:
Example Mapping (Simplified):
| Source System | Source Table | Source Field | Source Type | Target System | Target Table | Target Field | Target Type | Mapping Type | Transformation Rule ID | Business Rule/Comments |
| :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- |
```python
import uuid
import hashlib
from datetime import datetime

# Populated by the user migration script: old INT user ID -> new customer UUID.
USER_ID_MAP = {}


def generate_uuid_from_int(old_id: int) -> uuid.UUID:
    """
    Generates a consistent UUID based on an integer ID.
    This is useful for maintaining referential integrity when old INT IDs become UUIDs.
    Note: for production, consider a more robust UUID generation strategy
    if collision risk or true randomness is required. A common approach is
    to store the generated UUIDs in a lookup table.
    """
    if not isinstance(old_id, int):
        raise TypeError("Input must be an integer.")
    # Create a deterministic UUID from the integer for consistency across runs.
    namespace_uuid = uuid.UUID('f8000000-0000-4000-8000-000000000000')  # a fixed namespace UUID
    return uuid.uuid5(namespace_uuid, str(old_id))


def map_user_id_to_customer_uuid(old_user_id: int) -> uuid.UUID:
    """
    Looks up the new customer UUID based on the old user ID.
    This function assumes USER_ID_MAP has been populated by the user migration script.
    """
    if old_user_id not in USER_ID_MAP:
        # In a real scenario, this should log an error or raise an exception,
        # since a missing user ID indicates a data integrity issue.
        print(f"WARNING: Old user ID {old_user_id} not found in USER_ID_MAP. Generating new UUID.")
        new_uuid = generate_uuid_from_int(old_user_id)  # fallback
        USER_ID_MAP[old_user_id] = new_uuid
        return new_uuid
    return USER_ID_MAP[old_user_id]


def standardize_status(source_status: str) -> str:
    """
    Transforms source status strings to target ENUM values.
    Example: 'Active' -> 'active', 'Deactivated' -> 'inactive'
    """
    if not isinstance(source_status, str):
        return 'unknown'  # or raise an error
    # Keys are lowercase because the lookup normalizes input with .strip().lower().
    status_map = {
        'active': 'active',
        'inactive': 'inactive',
        'deactivated': 'inactive',
        'suspended': 'suspended',
        'pending': 'pending',
    }
    return status_map.get(source_status.strip().lower(), 'unknown')


def format_date_to_timestamp(date_str: str) -> datetime:
    """
    Converts a date string (e.g., 'YYYY-MM-DD') into a datetime object
    suitable for a TIMESTAMP column (sets time to 00:00:00).
    """
    if not isinstance(date_str, str) or not date_str:
        return None
    try:
        # Assuming the source date format is 'YYYY-MM-DD'.
        return datetime.strptime(date_str, '%Y-%m-%d')
    except ValueError:
        print(f"WARNING: Could not parse date string '{date_str}'. Returning None.")
        return None


def ensure_decimal_precision(value: float, precision: int = 12, scale: int = 4):
    """
    Formats a float to the desired scale as a string for database insertion.
    (The precision argument documents the target column DDL; only scale
    affects string formatting here.) For actual database operations, prefer
    Python's Decimal type to avoid floating-point inaccuracies.
    """
    if value is None:
        return None
    try:
        return f"{value:.{scale}f}"
    except (ValueError, TypeError):
        print(f"WARNING: Could not format value '{value}' to decimal. Returning original.")
        return value


def hash_sensitive_data(data: str, salt: str = "") -> str:
    """
    Returns a SHA-256 hex digest of a sensitive value (e.g., for
    pseudonymizing identifiers in non-production environments).
    If a salt is used, it must be stored securely and applied consistently.
    """
    if data is None:
        return None
    return hashlib.sha256((salt + str(data)).encode('utf-8')).hexdigest()
```
This document outlines a comprehensive plan for the data migration from [Legacy System X] to [New System Y]. It details the critical components required for a successful migration, including field mapping, data transformation rules, validation procedures, robust rollback strategies, and a projected timeline. The goal is to ensure data integrity, minimize downtime, and provide a seamless transition to the new system, thereby supporting [specific business objectives, e.g., enhanced operational efficiency, improved data analytics capabilities].
Data migration is a critical process involving the transfer of data between storage types, formats, or computer systems. This plan addresses the migration of essential business data from [Legacy System X] (the source system) to [New System Y] (the target system). A meticulous approach is vital to prevent data loss, corruption, or inconsistencies, which could adversely impact business operations. This document serves as a foundational blueprint for all migration activities.
The scope of this migration encompasses the following key data entities and their associated attributes:
Out of Scope:
Source System:
Target System:
The migration will follow a phased approach, emphasizing thorough testing and validation at each stage:
A "big bang" approach is planned for the final cutover to ensure data consistency, with multiple dry runs conducted prior to minimize risk.
Field mapping is the process of defining how each relevant data field from the source system corresponds to a field in the target system. This includes identifying data types, constraints, and any required transformations.
Actionable Steps:
Example Field Mapping Table Structure:
| Source System (Legacy System X) | Target System (New System Y) | Transformation Rule | Notes / Business Logic |
| :------------------------------ | :--------------------------- | :------------------ | :--------------------- |
| Legacy_CustomerID (INT) | CustomerID (UUID) | GENERATE_UUID() | Unique identifier for customer. Legacy INT will be mapped to a new UUID. |
| Legacy_FirstName (VARCHAR(50)) | FirstName (VARCHAR(100)) | DIRECT_MAP | Direct copy. Target field length is greater. |
| Legacy_LastName (VARCHAR(50)) | LastName (VARCHAR(100)) | DIRECT_MAP | Direct copy. |
| Legacy_AddressLine1 (VARCHAR(100)) | Address_Street (VARCHAR(200)) | DIRECT_MAP | Direct copy. |
| Legacy_City (VARCHAR(50)) | Address_City (VARCHAR(100)) | DIRECT_MAP | Direct copy. |
| Legacy_StateCode (VARCHAR(2)) | Address_State (VARCHAR(50)) | LOOKUP_STATE_NAME() | Convert 2-letter code to full state name (e.g., "CA" -> "California"). |
| Legacy_ZipCode (VARCHAR(10)) | Address_PostalCode (VARCHAR(10)) | DIRECT_MAP | Direct copy. |
| Legacy_CreationDate (DATETIME) | CreatedOn (TIMESTAMP) | CONVERT_TO_UTC | Convert local time to UTC timestamp. |
| Legacy_Status (INT) | CustomerStatus (ENUM) | MAP_STATUS_CODE() | Map 1->'ACTIVE', 2->'INACTIVE', 3->'PENDING'. Default to 'ACTIVE'. |
| Legacy_Amount (DECIMAL(10,2)) | OrderTotal (DECIMAL(12,2)) | DIRECT_MAP | Direct copy. Target precision allows for larger values. |
| Legacy_Comments (TEXT) | N/A | ARCHIVE_ONLY | Not migrated to New System Y; stored in an archive. |
| N/A | LastModifiedBy (VARCHAR(50)) | DEFAULT_VALUE("MigrationUser") | Target field required; set a default value for migrated records. |
(A complete field mapping document will be provided as an appendix once discovery is finalized.)
Transformation rules define the specific logic applied to source data to make it compatible with the target system's schema and business rules.
Categories of Transformations:
*Example:* Legacy_CreationDate (DATETIME) to CreatedOn (TIMESTAMP UTC).
*Example:* Trimming whitespace from Legacy_FirstName; converting all Legacy_Email values to lowercase.
*Example:* Standardizing phone numbers to E.164 format.
*Example:* Legacy_StateCode ('CA') to Address_State ('California') using a predefined lookup table.
*Example:* Legacy_Status (INT) to CustomerStatus (ENUM: 'ACTIVE', 'INACTIVE').
*Example (Concatenation):* Combining Legacy_FirstName and Legacy_LastName into FullName in the target (if required).
*Example (Splitting):* Splitting Legacy_FullAddress into Address_Street, Address_City, Address_State, Address_PostalCode.
*Example:* Calculating CustomerAge from Legacy_DateOfBirth and the current date.
*Example:* Setting the IsActive flag based on Legacy_LastActivityDate.
*Example:* Handling NULL values appropriately: if Legacy_PhoneNumber is NULL, set PhoneNumber in the target to an empty string or a specific default.
*Example:* Assigning DEFAULT_VALUE("MigrationUser") to LastModifiedBy for all migrated records.
*Example:* Summing Legacy_OrderLineItem_Amount for a given Legacy_OrderID to populate OrderTotal in the target.
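The aggregation example above (summing line items into OrderTotal) can be sketched as follows; the field names are illustrative, and `Decimal` is used to avoid floating-point rounding drift in monetary sums:

```python
from collections import defaultdict
from decimal import Decimal


def aggregate_order_totals(line_items):
    """Sum Legacy_OrderLineItem_Amount per Legacy_OrderID to populate OrderTotal."""
    totals = defaultdict(Decimal)
    for item in line_items:
        # Convert via str() so float inputs round-trip exactly into Decimal.
        totals[item["Legacy_OrderID"]] += Decimal(str(item["Legacy_OrderLineItem_Amount"]))
    return dict(totals)


# Illustrative input rows from the legacy order line-item extract.
items = [
    {"Legacy_OrderID": 1, "Legacy_OrderLineItem_Amount": 10.25},
    {"Legacy_OrderID": 1, "Legacy_OrderLineItem_Amount": 4.75},
    {"Legacy_OrderID": 2, "Legacy_OrderLineItem_Amount": 99.99},
]
print(aggregate_order_totals(items))  # {1: Decimal('15.00'), 2: Decimal('99.99')}
```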
Transformation Rule Examples (based on Field Mapping Table):
* **GENERATE_UUID() for CustomerID:** For each record, generate a new universally unique identifier (UUID) for the CustomerID field in the target system. A mapping table will be maintained to link the old Legacy_CustomerID to the new CustomerID for historical reference and relational integrity during migration.
* **LOOKUP_STATE_NAME() for Address_State:** Utilize a predefined lookup table (e.g., State_Codes.csv or a database table) to convert the 2-character state code from Legacy_StateCode (e.g., "NY") to its full name ("New York") for Address_State.
* **MAP_STATUS_CODE() for CustomerStatus:** Apply conditional logic:
    * If Legacy_Status = 1, then CustomerStatus = 'ACTIVE'.
    * If Legacy_Status = 2, then CustomerStatus = 'INACTIVE'.
    * If Legacy_Status = 3, then CustomerStatus = 'PENDING'.
    * Else (unrecognized code), default CustomerStatus = 'ACTIVE' and log an error/warning.
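The status-code mapping and the UTC date conversion in this section can be sketched as below. The source time zone `America/Los_Angeles` is an assumption (the plan says only "e.g., PST"); confirm the actual zone of the legacy server before implementation.

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo  # Python 3.9+

# MAP_STATUS_CODE lookup, per the rule table above.
STATUS_CODE_MAP = {1: 'ACTIVE', 2: 'INACTIVE', 3: 'PENDING'}


def map_status_code(legacy_status: int) -> str:
    """Unrecognized codes default to 'ACTIVE' and are logged as warnings."""
    if legacy_status not in STATUS_CODE_MAP:
        print(f"WARNING: unrecognized Legacy_Status {legacy_status!r}; defaulting to ACTIVE")
    return STATUS_CODE_MAP.get(legacy_status, 'ACTIVE')


def convert_to_utc(local_dt: datetime, source_tz: str = "America/Los_Angeles") -> datetime:
    """Interpret a naive legacy DATETIME in the source zone and emit a UTC timestamp."""
    return local_dt.replace(tzinfo=ZoneInfo(source_tz)).astimezone(timezone.utc)


print(map_status_code(2))  # INACTIVE
print(convert_to_utc(datetime(2023, 10, 26, 9, 30)).isoformat())
```

Using `zoneinfo` (rather than a fixed UTC offset) correctly handles daylight-saving transitions in historical data.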
* **CONVERT_TO_UTC for CreatedOn:** Convert the DATETIME value from Legacy_CreationDate (assumed to be in the local time zone [e.g., PST]) to a UTC TIMESTAMP for CreatedOn in the target system.

Robust validation is crucial to ensure the accuracy, completeness, and integrity of the migrated data. Validation will occur at multiple stages:
6.3.1. Pre-Migration Validation (Source Data Profiling & Cleansing)
* Data Type Conformance: Identify fields with data that doesn't match its declared type (e.g., text in a numeric field).
* Completeness Checks: Identify records with missing mandatory fields (e.g., Legacy_CustomerID is NULL).
* Uniqueness Checks: Verify primary keys and unique identifiers (e.g., duplicate Legacy_CustomerID).
* Referential Integrity Checks: Identify orphaned records (e.g., an order referencing a non-existent customer).
* Data Range/Format Checks: Verify values are within expected ranges (e.g., Legacy_Amount is positive) or conform to specific formats (e.g., Legacy_Email regex).
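In practice these pre-migration checks typically run as SQL against the source database; the sketch below illustrates the same checks over extracted rows in memory. Field names follow the example mapping table and are placeholders.

```python
import re
from collections import Counter

# Basic shape check only; not a full RFC 5322 validator.
EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def profile_source_rows(rows):
    """Pre-migration profiling: completeness, uniqueness, format, and range checks."""
    issues = []
    ids = [r.get("Legacy_CustomerID") for r in rows]
    # Completeness: mandatory key must be present.
    issues += [f"row {i}: missing Legacy_CustomerID" for i, v in enumerate(ids) if v is None]
    # Uniqueness: flag duplicated primary keys.
    dupes = [k for k, c in Counter(v for v in ids if v is not None).items() if c > 1]
    issues += [f"duplicate Legacy_CustomerID: {k}" for k in dupes]
    for i, row in enumerate(rows):
        # Format: email regex check.
        email = row.get("Legacy_Email")
        if email and not EMAIL_RE.match(email):
            issues.append(f"row {i}: malformed Legacy_Email {email!r}")
        # Range: amounts must be positive.
        amount = row.get("Legacy_Amount")
        if amount is not None and amount <= 0:
            issues.append(f"row {i}: non-positive Legacy_Amount {amount}")
    return issues


rows = [
    {"Legacy_CustomerID": 1, "Legacy_Email": "a@example.com", "Legacy_Amount": 10.0},
    {"Legacy_CustomerID": 1, "Legacy_Email": "not-an-email", "Legacy_Amount": -5.0},
    {"Legacy_CustomerID": None},
]
for issue in profile_source_rows(rows):
    print(issue)
```

The profiling report produced this way feeds directly into the cleansing backlog before the first dry run.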
6.3.2. During-Migration Validation (ETL Process Checks)
* Record Counts: Compare the number of records extracted from the source to the number loaded into the target (per table/entity).
* Error Logging: Capture and log any records that fail transformation rules or target system constraints.
* Skipped Record Tracking: Identify records that were intentionally skipped based on migration rules.
* Performance Monitoring: Track migration speed and resource utilization.
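The record-count check above can be sketched as follows; the invariant is that loaded plus intentionally skipped records must equal extracted records, per table:

```python
def check_record_counts(extracted: dict, loaded: dict, expected_skips: dict = None):
    """
    During-migration reconciliation: for each table, loaded + intentionally
    skipped must equal extracted. Returns {table: (extracted, loaded, skipped)}
    for every mismatch; an empty dict means the counts reconcile.
    """
    expected_skips = expected_skips or {}
    mismatches = {}
    for table, n_extracted in extracted.items():
        n_loaded = loaded.get(table, 0)
        n_skipped = expected_skips.get(table, 0)
        if n_loaded + n_skipped != n_extracted:
            mismatches[table] = (n_extracted, n_loaded, n_skipped)
    return mismatches


# Example: 98 loaded + 2 intentionally skipped reconciles against 100 extracted.
print(check_record_counts({"customers": 100}, {"customers": 98}, {"customers": 2}))  # {}
```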
6.3.3. Post-Migration Validation (Target Data Verification)
* Record Count Comparison: Final verification of total records migrated vs. expected.
* Data Integrity Checks:
* Random Data Sampling: Select a statistically significant sample of records from both source and target and compare field-by-field.
* Key Field Comparison: Verify primary keys and foreign keys are correctly linked.
* Checksum/Hash Comparison: For critical fields or entire records, compare hashes to ensure exact data transfer.
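The checksum comparison can be sketched as below. It assumes sampled rows from both systems have already been projected onto a common set of field names (after transformation), so that equal content yields equal hashes:

```python
import hashlib


def row_checksum(row: dict, fields) -> str:
    """SHA-256 over the selected fields, joined in a fixed, canonical order."""
    payload = "|".join(str(row.get(f, "")) for f in fields)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def find_checksum_mismatches(source_rows, target_rows, key, fields):
    """Return keys whose checksums differ (or are missing) between the samples."""
    src = {r[key]: row_checksum(r, fields) for r in source_rows}
    tgt = {r[key]: row_checksum(r, fields) for r in target_rows}
    return sorted(k for k in src if src[k] != tgt.get(k))


# Example: identical samples produce no mismatches.
fields = ["CustomerID", "FirstName", "OrderTotal"]
a = [{"CustomerID": 1, "FirstName": "Ada", "OrderTotal": "15.00"}]
b = [{"CustomerID": 1, "FirstName": "Ada", "OrderTotal": "15.00"}]
print(find_checksum_mismatches(a, b, "CustomerID", fields))  # []
```

Any keys returned are candidates for field-by-field diffing and root-cause analysis in the transformation rules.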
* Referential Integrity: Ensure all foreign key relationships resolve correctly in the target system, with no orphaned child records.