Workflow Name: Database Schema Designer
Category: Development
Description: Test run
Topic: AI Technology
Execution Time: 5 minutes
Credits Consumed: 100 cr
This workflow has successfully generated a database schema tailored for managing projects, datasets, models, experiments, and their associated metrics within the domain of AI Technology.
This document outlines a relational database schema designed to track and manage various aspects of AI technology development and experimentation. The schema provides a structured way to store information about AI projects, the datasets used, the models developed, individual experiments conducted, and the performance metrics generated from these experiments.
The primary goal of this schema is to enable end-to-end tracking of AI development work: which datasets and models were used in which experiments, under what hyperparameters and code version, and with what resulting metrics.
This design is flexible and scalable, providing a solid foundation for MLOps platforms, research tracking systems, or internal AI development tools.
The core entities and their relationships are as follows:
* Projects: One-to-Many with Datasets, Models, and Experiments.
* Datasets: Many-to-One with Projects; One-to-Many with Experiments.
* Models: Many-to-One with Projects; One-to-Many with Experiments.
* Experiments: Many-to-One with Projects, Models, and Datasets; One-to-Many with Metrics.
* Metrics: Many-to-One with Experiments.
Below are the detailed table definitions, including columns, data types, constraints, and relationships. For data types, we assume a PostgreSQL-compatible environment.
Table: projects

| Column Name | Data Type | Constraints | Description |
| :---------- | :----------------------------- | :-------------------------- | :--------------------------------------------- |
| project_id| SERIAL | PRIMARY KEY | Unique identifier for the project. |
| name | VARCHAR(255) | NOT NULL, UNIQUE | Name of the project. |
| description| TEXT | | Detailed description of the project. |
| start_date| DATE | NOT NULL, DEFAULT CURRENT_DATE | Date when the project started. |
| end_date | DATE | | Optional date when the project ended. |
| status | VARCHAR(50) | NOT NULL, DEFAULT 'Active', CHECK | Current status of the project (e.g., 'Planning', 'Active', 'Completed', 'Archived'). |
| created_at| TIMESTAMP WITH TIME ZONE | DEFAULT CURRENT_TIMESTAMP | Timestamp of project creation. |
| updated_at| TIMESTAMP WITH TIME ZONE | DEFAULT CURRENT_TIMESTAMP | Timestamp of last update. |
Table: datasets

| Column Name | Data Type | Constraints | Description |
| :------------ | :----------------------------- | :-------------------------- | :--------------------------------------------- |
| dataset_id | SERIAL | PRIMARY KEY | Unique identifier for the dataset. |
| project_id | INT | NOT NULL, FOREIGN KEY | Foreign key referencing projects.project_id. |
| name | VARCHAR(255) | NOT NULL | Name of the dataset. |
| version | VARCHAR(50) | DEFAULT '1.0' | Version of the dataset. |
| description | TEXT | | Detailed description of the dataset. |
| source_url | VARCHAR(512) | | URL or path to the dataset's source. |
| data_type | VARCHAR(100) | NOT NULL, CHECK | Type of data (e.g., 'Image', 'Text', 'Tabular', 'Audio', 'Video', 'Other'). |
| size_bytes | BIGINT | | Size of the dataset in bytes. |
| num_records | BIGINT | | Number of records/samples in the dataset. |
| created_at | TIMESTAMP WITH TIME ZONE | DEFAULT CURRENT_TIMESTAMP | Timestamp of dataset record creation. |
| updated_at | TIMESTAMP WITH TIME ZONE | DEFAULT CURRENT_TIMESTAMP | Timestamp of last update. |
Unique constraint: (project_id, name, version)

Table: models

| Column Name | Data Type | Constraints | Description |
| :------------ | :----------------------------- | :-------------------------- | :--------------------------------------------- |
| model_id | SERIAL | PRIMARY KEY | Unique identifier for the model. |
| project_id | INT | NOT NULL, FOREIGN KEY | Foreign key referencing projects.project_id. |
| name | VARCHAR(255) | NOT NULL | Name of the model. |
| version | VARCHAR(50) | DEFAULT '1.0' | Version of the model. |
| architecture| VARCHAR(255) | | Model architecture (e.g., 'ResNet-50', 'BERT'). |
| framework | VARCHAR(100) | NOT NULL | AI framework used (e.g., 'PyTorch', 'TensorFlow', 'Scikit-learn'). |
| description | TEXT | | Detailed description of the model. |
| created_at | TIMESTAMP WITH TIME ZONE | DEFAULT CURRENT_TIMESTAMP | Timestamp of model record creation. |
| updated_at | TIMESTAMP WITH TIME ZONE | DEFAULT CURRENT_TIMESTAMP | Timestamp of last update. |
Unique constraint: (project_id, name, version)

Table: experiments

| Column Name | Data Type | Constraints | Description |
| :---------------- | :----------------------------- | :-------------------------- | :--------------------------------------------- |
| experiment_id | SERIAL | PRIMARY KEY | Unique identifier for the experiment. |
| project_id | INT | NOT NULL, FOREIGN KEY | Foreign key referencing projects.project_id. |
| model_id | INT | NOT NULL, FOREIGN KEY | Foreign key referencing models.model_id. |
| dataset_id | INT | NOT NULL, FOREIGN KEY | Foreign key referencing datasets.dataset_id. |
| name | VARCHAR(255) | NOT NULL | Name of the experiment. |
| description | TEXT | | Detailed description of the experiment. |
| start_time | TIMESTAMP WITH TIME ZONE | NOT NULL, DEFAULT CURRENT_TIMESTAMP | Start time of the experiment. |
| end_time | TIMESTAMP WITH TIME ZONE | | End time of the experiment. |
| status | VARCHAR(50) | NOT NULL, DEFAULT 'Pending', CHECK | Current status (e.g., 'Pending', 'Running', 'Completed', 'Failed', 'Cancelled'). |
| hyperparameters | JSONB | | JSON object storing hyperparameters used. |
| code_version | VARCHAR(255) | | Git commit hash or version of the code used. |
| notes | TEXT | | Any additional notes for the experiment. |
| created_at | TIMESTAMP WITH TIME ZONE | DEFAULT CURRENT_TIMESTAMP | Timestamp of experiment record creation. |
| updated_at | TIMESTAMP WITH TIME ZONE | DEFAULT CURRENT_TIMESTAMP | Timestamp of last update. |
Table: metrics

| Column Name | Data Type | Constraints | Description |
| :------------- | :----------------------------- | :-------------------------- | :--------------------------------------------- |
| metric_id | SERIAL | PRIMARY KEY | Unique identifier for the metric. |
| experiment_id| INT | NOT NULL, FOREIGN KEY | Foreign key referencing experiments.experiment_id. |
| name | VARCHAR(100) | NOT NULL | Name of the metric (e.g., 'accuracy', 'precision', 'loss', 'RMSE'). |
| value | DECIMAL(18, 9) | NOT NULL | Numeric value of the metric. |
| step | INT | | Optional: Training step/epoch when metric was recorded. |
| evaluation_set| VARCHAR(50) | NOT NULL, CHECK | The dataset split on which the metric was evaluated (e.g., 'train', 'validation', 'test', 'inference'). |
| recorded_at | TIMESTAMP WITH TIME ZONE | DEFAULT CURRENT_TIMESTAMP | Timestamp when the metric was recorded. |
Unique constraint: (experiment_id, name, step, evaluation_set)

Recommended indexes:

* projects: idx_projects_name ON projects (name); idx_projects_status ON projects (status)
* datasets: idx_datasets_project_id ON datasets (project_id); idx_datasets_name_version ON datasets (name, version); idx_datasets_data_type ON datasets (data_type)
* models: idx_models_project_id ON models (project_id); idx_models_name_version ON models (name, version); idx_models_framework ON models (framework)
* experiments: idx_experiments_project_id ON experiments (project_id); idx_experiments_model_id ON experiments (model_id); idx_experiments_dataset_id ON experiments (dataset_id); idx_experiments_status ON experiments (status)
* metrics: idx_metrics_experiment_id ON metrics (experiment_id); idx_metrics_name ON metrics (name); idx_metrics_evaluation_set ON metrics (evaluation_set)
```sql
-- Enable UUID generation if desired for primary keys instead of SERIAL
-- CREATE EXTENSION IF NOT EXISTS "uuid-ossp";

-- Table: projects
CREATE TABLE projects (
    project_id SERIAL PRIMARY KEY,
    name VARCHAR(255) NOT NULL UNIQUE,
    description TEXT,
    start_date DATE NOT NULL DEFAULT CURRENT_DATE,
    end_date DATE,
    status VARCHAR(50) NOT NULL DEFAULT 'Active' CHECK (status IN ('Planning', 'Active', 'Completed', 'Archived')),
    created_at TIMESTAMP WITH TIME ZONE DEFAULT CURRENT_TIMESTAMP,
    updated_at TIMESTAMP WITH TIME ZONE DEFAULT CURRENT_TIMESTAMP
);

-- Table: datasets
CREATE TABLE datasets (
    dataset_id SERIAL PRIMARY KEY,
    project_id INT NOT NULL,
    name VARCHAR(255) NOT NULL,
    version VARCHAR(50) DEFAULT '1.0',
    description TEXT,
    source_url VARCHAR(512),
    data_type VARCHAR(100) NOT NULL CHECK (data_type IN ('Image', 'Text', 'Tabular', 'Audio', 'Video', 'Other')),
    size_bytes BIGINT,
    num_records BIGINT,
    created_at TIMESTAMP WITH TIME ZONE DEFAULT CURRENT_TIMESTAMP,
    updated_at TIMESTAMP WITH TIME ZONE DEFAULT CURRENT_TIMESTAMP,
    CONSTRAINT fk_dataset_project
        FOREIGN KEY (project_id)
        REFERENCES projects (project_id) ON DELETE CASCADE,
    CONSTRAINT uq_dataset_project_name_version UNIQUE (project_id, name, version)
);

-- Table: models
CREATE TABLE models (
    model_id SERIAL PRIMARY KEY,
    project_id INT NOT NULL,
    name VARCHAR(255) NOT NULL,
    version VARCHAR(50) DEFAULT '1.0',
    architecture VARCHAR(255),
    framework VARCHAR(100) NOT NULL,
    description TEXT,
    created_at TIMESTAMP WITH TIME ZONE DEFAULT CURRENT_TIMESTAMP,
    updated_at TIMESTAMP WITH TIME ZONE DEFAULT CURRENT_TIMESTAMP,
    CONSTRAINT fk_model_project
        FOREIGN KEY (project_id)
        REFERENCES projects (project_id) ON DELETE CASCADE,
    CONSTRAINT uq_model_project_name_version UNIQUE (project_id, name, version)
);

-- Table: experiments
CREATE TABLE experiments (
    experiment_id SERIAL PRIMARY KEY,
    project_id INT NOT NULL,
    model_id INT NOT NULL,
    dataset_id INT NOT NULL,
    name VARCHAR(255) NOT NULL,
    description TEXT,
    start_time TIMESTAMP WITH TIME ZONE NOT NULL DEFAULT CURRENT_TIMESTAMP,
    end_time TIMESTAMP WITH TIME ZONE,
    status VARCHAR(50) NOT NULL DEFAULT 'Pending' CHECK (status IN ('Pending', 'Running', 'Completed', 'Failed', 'Cancelled')),
    hyperparameters JSONB, -- For storing key-value pairs of hyperparameters
    code_version VARCHAR(255), -- e.g., Git commit hash
    notes TEXT,
    created_at TIMESTAMP WITH TIME ZONE DEFAULT CURRENT_TIMESTAMP,
    updated_at TIMESTAMP WITH TIME ZONE DEFAULT CURRENT_TIMESTAMP,
    CONSTRAINT fk_experiment_project
        FOREIGN KEY (project_id)
        REFERENCES projects (project_id) ON DELETE CASCADE,
    CONSTRAINT fk_experiment_model
        FOREIGN KEY (model_id)
        REFERENCES models (model_id) ON DELETE RESTRICT, -- Prevent deleting models if experiments exist
    CONSTRAINT fk_experiment_dataset
        FOREIGN KEY (dataset_id)
        REFERENCES datasets (dataset_id) ON DELETE RESTRICT -- Prevent deleting datasets if experiments exist
);

-- Table: metrics
CREATE TABLE metrics (
    metric_id SERIAL PRIMARY KEY,
    experiment_id INT NOT NULL,
    name VARCHAR(100) NOT NULL,
    value DECIMAL(18, 9) NOT NULL,
    step INT, -- For metrics recorded over training steps/epochs
    evaluation_set VARCHAR(50) NOT NULL CHECK (evaluation_set IN ('train', 'validation', 'test', 'inference')),
    recorded_at TIMESTAMP WITH TIME ZONE DEFAULT CURRENT_TIMESTAMP,
    CONSTRAINT fk_metric_experiment
        FOREIGN KEY (experiment_id)
        REFERENCES experiments (experiment_id) ON DELETE CASCADE,
    CONSTRAINT uq_metric_experiment_name_step_evalset UNIQUE (experiment_id, name, step, evaluation_set)
);

-- Create Indexes
CREATE INDEX idx_projects_name ON projects (name);
CREATE INDEX idx_projects_status ON projects (status);
CREATE INDEX idx_datasets_project_id ON datasets (project_id);
CREATE INDEX idx_datasets_name_version ON datasets (name, version);
CREATE INDEX idx_datasets_data_type ON datasets (data_type);
CREATE INDEX idx_models_project_id ON models (project_id);
CREATE INDEX idx_models_name_version ON models (name, version);
CREATE INDEX idx_models_framework ON models (framework);
CREATE INDEX idx_experiments_project_id ON experiments (project_id);
CREATE INDEX idx_experiments_model_id ON experiments (model_id);
CREATE INDEX idx_experiments_dataset_id ON experiments (dataset_id);
CREATE INDEX idx_experiments_status ON experiments (status);
CREATE INDEX idx_metrics_experiment_id ON metrics (experiment_id);
CREATE INDEX idx_metrics_name ON metrics (name);
CREATE INDEX idx_metrics_evaluation_set ON metrics (evaluation_set);
```
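To illustrate the kind of cross-table analysis this schema is designed for (e.g., "best validation accuracy per model"), here is a minimal sketch. It uses SQLite as a stand-in for PostgreSQL (INTEGER PRIMARY KEY in place of SERIAL) and rebuilds only a fragment of the tables above; all model names, experiment names, and metric values are illustrative.

```python
import sqlite3

# SQLite stand-in for a fragment of the schema above; INTEGER PRIMARY KEY
# replaces PostgreSQL's SERIAL.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE models (model_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE experiments (
    experiment_id INTEGER PRIMARY KEY,
    model_id INTEGER NOT NULL REFERENCES models(model_id),
    name TEXT NOT NULL
);
CREATE TABLE metrics (
    metric_id INTEGER PRIMARY KEY,
    experiment_id INTEGER NOT NULL REFERENCES experiments(experiment_id),
    name TEXT NOT NULL,
    value REAL NOT NULL,
    evaluation_set TEXT NOT NULL
);
""")

# Illustrative sample data: two models, three experiments, one metric each.
conn.executemany("INSERT INTO models (model_id, name) VALUES (?, ?)",
                 [(1, "resnet50"), (2, "bert-base")])
conn.executemany("INSERT INTO experiments (experiment_id, model_id, name) VALUES (?, ?, ?)",
                 [(1, 1, "run-a"), (2, 1, "run-b"), (3, 2, "run-c")])
conn.executemany(
    "INSERT INTO metrics (experiment_id, name, value, evaluation_set) "
    "VALUES (?, 'accuracy', ?, 'validation')",
    [(1, 0.88), (2, 0.91), (3, 0.83)],
)

# Best validation accuracy per model, via the models -> experiments -> metrics joins.
best = conn.execute("""
    SELECT m.name, MAX(mt.value) AS best_accuracy
    FROM models m
    JOIN experiments e ON e.model_id = m.model_id
    JOIN metrics mt ON mt.experiment_id = e.experiment_id
    WHERE mt.name = 'accuracy' AND mt.evaluation_set = 'validation'
    GROUP BY m.model_id, m.name
    ORDER BY best_accuracy DESC
""").fetchall()
print(best)  # [('resnet50', 0.91), ('bert-base', 0.83)]
```

The same join shape works unchanged against the full PostgreSQL schema, since it only touches columns defined in the DDL.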
Design notes:

* Primary keys: SERIAL (auto-incrementing integer) primary keys are common and efficient for relational databases. For distributed systems or larger scales, UUIDs could be considered (as noted in the DDL comments).
* Foreign key constraints (FOREIGN KEY ... REFERENCES ... ON DELETE ...) enforce referential integrity:
    * ON DELETE CASCADE on project_id in datasets and models means that if a project is deleted, all associated datasets and models are also deleted.
    * ON DELETE RESTRICT on model_id and dataset_id in experiments prevents deletion of models or datasets while experiments still reference them, ensuring historical data integrity.
    * ON DELETE CASCADE on experiment_id in metrics ensures that all metrics related to an experiment are removed if the experiment itself is deleted.
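These delete rules can be exercised with a small script. As a minimal sketch, the snippet below rebuilds a fragment of the schema in SQLite (INTEGER PRIMARY KEY in place of SERIAL; SQLite requires foreign keys to be enabled per connection) and shows RESTRICT blocking a dataset delete while an experiment references it, then CASCADE cleaning up datasets when their project is deleted. Table and row names are illustrative.

```python
import sqlite3

# SQLite stand-in for a fragment of the schema; foreign key enforcement is
# off by default in SQLite and must be enabled per connection.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE projects (project_id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE);
CREATE TABLE datasets (
    dataset_id INTEGER PRIMARY KEY,
    project_id INTEGER NOT NULL REFERENCES projects(project_id) ON DELETE CASCADE,
    name TEXT NOT NULL
);
CREATE TABLE experiments (
    experiment_id INTEGER PRIMARY KEY,
    dataset_id INTEGER NOT NULL REFERENCES datasets(dataset_id) ON DELETE RESTRICT,
    name TEXT NOT NULL
);
""")
conn.execute("INSERT INTO projects VALUES (1, 'demo')")
conn.execute("INSERT INTO datasets VALUES (1, 1, 'mnist-v1')")
conn.execute("INSERT INTO experiments VALUES (1, 1, 'run-1')")

# RESTRICT: deleting a dataset fails while an experiment still references it.
try:
    conn.execute("DELETE FROM datasets WHERE dataset_id = 1")
    restricted = False
except sqlite3.IntegrityError:
    restricted = True

# Remove the experiment first, so the project delete can cascade to datasets.
conn.execute("DELETE FROM experiments WHERE experiment_id = 1")
conn.execute("DELETE FROM projects WHERE project_id = 1")  # CASCADE removes the dataset
remaining = conn.execute("SELECT COUNT(*) FROM datasets").fetchone()[0]
print(restricted, remaining)  # True 0
```

Note the ordering: because datasets are still protected by RESTRICT from experiments, a cascading project delete would also fail until the referencing experiments are removed.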
Data type and constraint choices:

* VARCHAR vs. TEXT: VARCHAR(N) is used for shorter, well-defined strings (e.g., names, versions), while TEXT is used for potentially longer descriptive fields (e.g., description, notes) for flexibility.
* JSONB for hyperparameters: this column in the experiments table allows flexible storage of varied hyperparameters without requiring schema changes for every new model or experiment type. JSONB in PostgreSQL offers efficient indexing and querying of JSON data.
* TIMESTAMP WITH TIME ZONE: recommended for created_at, updated_at, start_time, and end_time to handle time consistently across different geographical locations.
* CHECK constraints: enforce valid values for status and data-type fields, improving data quality.
* UNIQUE constraints: enforce uniqueness where appropriate (e.g., project names, specific dataset/model versions within a project, or a specific metric for an experiment at a given step).

Possible extensions:

* Add users and teams tables to associate projects, experiments, and models with specific individuals or groups, along with roles and permissions.
* Extend the models table with fields such as model_path (to stored artifacts), model_type (e.g., 'classification', 'regression'), input_schema, and output_schema.
* Add experiment_runs or model_versions tables to track granular changes or iterations more explicitly.
* Add a tags table and a tag_assignments join table to categorize projects, datasets, models, or experiments with arbitrary keywords.

Next steps:

* Use a database engine with JSONB support (PostgreSQL is assumed throughout).
* Execute the provided SQL DDL script in your chosen database to create the schema.
* Consider using an ORM (Object-Relational Mapper) like SQLAlchemy (Python), Hibernate (Java), or Entity Framework (.NET) to interact with the database in your application code, simplifying data operations.
* Develop scripts or APIs to populate these tables. For example, when an AI experiment starts, record an entry in experiments. When it completes, update its status and insert relevant metrics.
* Automate the recording of hyperparameters and code_version directly from your training scripts.
* Regularly back up your database.
* Monitor database performance, especially for frequently queried tables like experiments and metrics.
* Review and optimize indexes as your data grows and query patterns evolve.
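The population workflow described in the steps above (record an experiment when it starts, insert metrics as they are produced, update the status on completion) can be sketched end to end. The snippet below uses SQLite as a stand-in (INTEGER PRIMARY KEY instead of SERIAL; hyperparameters stored as JSON text and queried with SQLite's json_extract, where PostgreSQL's JSONB would use the ->> operator). The experiment name, hyperparameters, and the commit hash 'abc1234' are illustrative.

```python
import json
import sqlite3

# SQLite stand-in for the experiments and metrics tables; column names follow the DDL.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE experiments (
    experiment_id INTEGER PRIMARY KEY,
    name TEXT NOT NULL,
    status TEXT NOT NULL DEFAULT 'Pending',
    hyperparameters TEXT,
    code_version TEXT
);
CREATE TABLE metrics (
    metric_id INTEGER PRIMARY KEY,
    experiment_id INTEGER NOT NULL REFERENCES experiments(experiment_id),
    name TEXT NOT NULL,
    value REAL NOT NULL,
    step INTEGER,
    evaluation_set TEXT NOT NULL
);
""")

# 1. When a training run starts, register the experiment
#    ('abc1234' is an illustrative commit hash).
cur = conn.execute(
    "INSERT INTO experiments (name, status, hyperparameters, code_version) "
    "VALUES (?, 'Running', ?, ?)",
    ("baseline", json.dumps({"learning_rate": 0.001, "batch_size": 32}), "abc1234"),
)
exp_id = cur.lastrowid

# 2. Record metrics as they are produced during the run.
conn.execute(
    "INSERT INTO metrics (experiment_id, name, value, step, evaluation_set) "
    "VALUES (?, 'accuracy', 0.91, 10, 'validation')",
    (exp_id,),
)

# 3. On completion, update the experiment's status.
conn.execute("UPDATE experiments SET status = 'Completed' WHERE experiment_id = ?", (exp_id,))

# Hyperparameters remain queryable: SQLite's JSON1 json_extract here,
# hyperparameters->>'batch_size' in PostgreSQL.
row = conn.execute(
    "SELECT status, json_extract(hyperparameters, '$.batch_size') "
    "FROM experiments WHERE experiment_id = ?",
    (exp_id,),
).fetchone()
print(row)  # ('Completed', 32)
```

Automating steps 1–3 from the training script itself keeps code_version and hyperparameters accurate without manual bookkeeping.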