explorer-monorepo/docs/specs/database/postgres-schema.md

PostgreSQL Database Schema Specification

Overview

This document specifies the complete PostgreSQL database schema for the explorer platform. The schema is designed to support multi-chain operation, high-performance queries, and data consistency.

Schema Design Principles

  1. Multi-chain Support: Every chain-scoped table includes chain_id for chain isolation (account tables such as users and api_keys are global)
  2. Normalization: Normalized structure to avoid data duplication
  3. Performance: Strategic indexing for common query patterns
  4. Consistency: Foreign key constraints where appropriate
  5. Extensibility: JSONB columns for flexible data storage
  6. Partitioning: Large tables partitioned by chain_id

Core Tables

Blocks Table

See ../indexing/data-models.md for detailed block schema.

Partitioning: Partition by chain_id for large deployments.

Key Indexes:

  • Primary: (chain_id, number)
  • Unique: (chain_id, hash)
  • Index: (chain_id, timestamp) for time-range queries
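The key indexes above translate into DDL roughly as follows. This is a minimal sketch, not the canonical definition: it assumes a stand-in blocks table holding only the indexed columns (the full column list lives in ../indexing/data-models.md), a throwaway schema named explorer_sketch, and hypothetical index names.

```sql
-- Throwaway schema (hypothetical name) so the sketch never touches real tables.
DROP SCHEMA IF EXISTS explorer_sketch CASCADE;
CREATE SCHEMA explorer_sketch;
SET search_path TO explorer_sketch;

-- Stand-in blocks table: only the indexed columns are shown.
CREATE TABLE blocks (
    chain_id  INTEGER     NOT NULL,
    number    BIGINT      NOT NULL,
    hash      VARCHAR(66) NOT NULL,
    timestamp TIMESTAMP   NOT NULL,
    PRIMARY KEY (chain_id, number),   -- Primary: (chain_id, number)
    UNIQUE (chain_id, hash)           -- Unique: (chain_id, hash)
) PARTITION BY LIST (chain_id);

-- Index: (chain_id, timestamp) for time-range queries.
CREATE INDEX idx_blocks_chain_timestamp ON blocks (chain_id, timestamp);

-- All three indexes are created on this partition automatically.
CREATE TABLE blocks_chain_138 PARTITION OF blocks FOR VALUES IN (138);

-- Time-range query served by idx_blocks_chain_timestamp.
SELECT number, hash
FROM blocks
WHERE chain_id = 138
  AND timestamp >= '2024-01-01'
  AND timestamp <  '2024-02-01'
ORDER BY timestamp;
```

Because the primary key and the unique constraint both include chain_id, PostgreSQL accepts them on the LIST-partitioned parent and enforces them per partition.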

Transactions Table

See ../indexing/data-models.md for detailed transaction schema.

Key Indexes:

  • Primary: (chain_id, hash)
  • Index: (chain_id, block_number, transaction_index) for block queries
  • Index: (chain_id, from_address) for address queries
  • Index: (chain_id, to_address) for address queries
  • Index: (chain_id, block_number, from_address) for compound queries
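A minimal sketch of the transaction indexes plus an address-history query. It assumes a stand-in table with only the indexed columns, a nullable to_address (contract creations have no recipient), a throwaway schema explorer_sketch, and hypothetical index names; '0xaaaa' is a placeholder address.

```sql
-- Throwaway schema (hypothetical name) so the sketch never touches real tables.
DROP SCHEMA IF EXISTS explorer_sketch CASCADE;
CREATE SCHEMA explorer_sketch;
SET search_path TO explorer_sketch;

-- Stand-in transactions table: only the indexed columns are shown.
CREATE TABLE transactions (
    chain_id          INTEGER     NOT NULL,
    hash              VARCHAR(66) NOT NULL,
    block_number      BIGINT      NOT NULL,
    transaction_index INTEGER     NOT NULL,
    from_address      VARCHAR(42) NOT NULL,
    to_address        VARCHAR(42),          -- assumed NULL for contract creation
    PRIMARY KEY (chain_id, hash)
) PARTITION BY LIST (chain_id);

CREATE INDEX idx_tx_chain_block_index ON transactions (chain_id, block_number, transaction_index);
CREATE INDEX idx_tx_chain_from        ON transactions (chain_id, from_address);
CREATE INDEX idx_tx_chain_to          ON transactions (chain_id, to_address);
CREATE INDEX idx_tx_chain_block_from  ON transactions (chain_id, block_number, from_address);

CREATE TABLE transactions_chain_138 PARTITION OF transactions FOR VALUES IN (138);

-- Address history: each UNION branch can use its own index.
-- UNION (not UNION ALL) removes the duplicate when from = to.
SELECT hash, block_number, transaction_index
FROM transactions
WHERE chain_id = 138 AND from_address = '0xaaaa'
UNION
SELECT hash, block_number, transaction_index
FROM transactions
WHERE chain_id = 138 AND to_address = '0xaaaa'
ORDER BY block_number DESC, transaction_index DESC
LIMIT 50;
```

The planner can also combine both address indexes for a single OR predicate through a bitmap scan, so the two forms are worth comparing with EXPLAIN on real data.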

Logs Table

See ../indexing/data-models.md for detailed log schema.

Key Indexes:

  • Primary: (chain_id, transaction_hash, log_index)
  • Index: (chain_id, address) for contract event queries
  • Index: (chain_id, topic0) for event type queries
  • Index: (chain_id, address, topic0) for filtered event queries
  • Index: (chain_id, block_number) for block-based queries
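A minimal sketch of the log indexes and the filtered event query they serve. It assumes a stand-in table with only the indexed columns, a throwaway schema explorer_sketch, hypothetical index names, and a placeholder contract address; the topic0 value is the standard keccak256 hash of Transfer(address,address,uint256).

```sql
-- Throwaway schema (hypothetical name) so the sketch never touches real tables.
DROP SCHEMA IF EXISTS explorer_sketch CASCADE;
CREATE SCHEMA explorer_sketch;
SET search_path TO explorer_sketch;

-- Stand-in logs table: only the indexed columns are shown.
CREATE TABLE logs (
    chain_id         INTEGER     NOT NULL,
    transaction_hash VARCHAR(66) NOT NULL,
    log_index        INTEGER     NOT NULL,
    block_number     BIGINT      NOT NULL,
    address          VARCHAR(42) NOT NULL,
    topic0           VARCHAR(66),           -- NULL when a log has no topics
    PRIMARY KEY (chain_id, transaction_hash, log_index)
) PARTITION BY LIST (chain_id);

CREATE INDEX idx_logs_chain_address        ON logs (chain_id, address);
CREATE INDEX idx_logs_chain_topic0         ON logs (chain_id, topic0);
CREATE INDEX idx_logs_chain_address_topic0 ON logs (chain_id, address, topic0);
CREATE INDEX idx_logs_chain_block          ON logs (chain_id, block_number);

CREATE TABLE logs_chain_138 PARTITION OF logs FOR VALUES IN (138);

-- Transfer events from one contract, served by idx_logs_chain_address_topic0.
-- topic0 = keccak256("Transfer(address,address,uint256)").
SELECT transaction_hash, log_index, block_number
FROM logs
WHERE chain_id = 138
  AND address = '0x1111111111111111111111111111111111111111'
  AND topic0  = '0xddf252ad1be2c89b69c2b068fc378daa952ba7f163c4a11628f55a4df523b3ef'
ORDER BY block_number DESC
LIMIT 100;
```

(chain_id, address) is a leading prefix of (chain_id, address, topic0), so the composite index can also serve address-only lookups; whether the narrower index earns its write cost is a question for the index-usage statistics.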

Traces Table

See ../indexing/data-models.md for detailed trace schema.

Key Indexes:

  • Primary: (chain_id, transaction_hash, trace_address)
  • Index: (chain_id, action_from) for address queries
  • Index: (chain_id, action_to) for address queries
  • Index: (chain_id, block_number) for block queries

Internal Transactions Table

See ../indexing/data-models.md for detailed internal transaction schema.

Key Indexes:

  • Primary: (chain_id, transaction_hash, trace_address)
  • Index: (chain_id, from_address)
  • Index: (chain_id, to_address)
  • Index: (chain_id, block_number)

Token Tables

Tokens Table

CREATE TABLE tokens (
    id BIGSERIAL,
    chain_id INTEGER NOT NULL,
    address VARCHAR(42) NOT NULL,
    type VARCHAR(10) NOT NULL CHECK (type IN ('ERC20', 'ERC721', 'ERC1155')),
    name VARCHAR(255),
    symbol VARCHAR(50),
    decimals INTEGER CHECK (decimals >= 0 AND decimals <= 255), -- ERC-20 types decimals() as uint8
    total_supply NUMERIC(78, 0),
    holder_count INTEGER DEFAULT 0,
    transfer_count INTEGER DEFAULT 0,
    logo_url TEXT,
    website_url TEXT,
    description TEXT,
    verified BOOLEAN DEFAULT false,
    created_at TIMESTAMP DEFAULT NOW(),
    updated_at TIMESTAMP DEFAULT NOW(),
    PRIMARY KEY (chain_id, id), -- must include the partition key
    UNIQUE (chain_id, address)
) PARTITION BY LIST (chain_id);

-- (chain_id, address) is already indexed by the UNIQUE constraint
CREATE INDEX idx_tokens_chain_type ON tokens(chain_id, type);
CREATE INDEX idx_tokens_chain_symbol ON tokens(chain_id, symbol);
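One way an indexer could write token metadata is an idempotent upsert, so that re-processing a block does not fail on UNIQUE (chain_id, address). The sketch assumes a trimmed stand-in of the tokens table in a throwaway schema explorer_sketch and a placeholder address. PostgreSQL requires every primary key or unique constraint on a partitioned table to include the partition key, so the stand-in keys id together with chain_id.

```sql
-- Throwaway schema (hypothetical name) so the sketch never touches real tables.
DROP SCHEMA IF EXISTS explorer_sketch CASCADE;
CREATE SCHEMA explorer_sketch;
SET search_path TO explorer_sketch;

-- Trimmed stand-in tokens table.
CREATE TABLE tokens (
    id         BIGSERIAL,
    chain_id   INTEGER     NOT NULL,
    address    VARCHAR(42) NOT NULL,
    type       VARCHAR(10) NOT NULL CHECK (type IN ('ERC20', 'ERC721', 'ERC1155')),
    name       VARCHAR(255),
    symbol     VARCHAR(50),
    decimals   INTEGER,
    updated_at TIMESTAMP DEFAULT NOW(),
    PRIMARY KEY (chain_id, id),   -- must include the partition key
    UNIQUE (chain_id, address)
) PARTITION BY LIST (chain_id);

CREATE TABLE tokens_chain_138 PARTITION OF tokens FOR VALUES IN (138);

-- Idempotent write: the first run inserts, later runs refresh the metadata.
INSERT INTO tokens (chain_id, address, type, name, symbol, decimals)
VALUES (138, '0x2222222222222222222222222222222222222222', 'ERC20', 'Example Token', 'EXT', 18)
ON CONFLICT (chain_id, address) DO UPDATE
SET name       = EXCLUDED.name,
    symbol     = EXCLUDED.symbol,
    decimals   = EXCLUDED.decimals,
    updated_at = NOW();
```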

Token Transfers Table

CREATE TABLE token_transfers (
    id BIGSERIAL,
    chain_id INTEGER NOT NULL,
    transaction_hash VARCHAR(66) NOT NULL,
    block_number BIGINT NOT NULL,
    log_index INTEGER NOT NULL,
    token_address VARCHAR(42) NOT NULL,
    token_type VARCHAR(10) NOT NULL CHECK (token_type IN ('ERC20', 'ERC721', 'ERC1155')),
    from_address VARCHAR(42) NOT NULL,
    to_address VARCHAR(42) NOT NULL,
    amount NUMERIC(78, 0),
    token_id VARCHAR(78),
    operator VARCHAR(42),
    created_at TIMESTAMP DEFAULT NOW(),
    PRIMARY KEY (chain_id, id), -- must include the partition key
    FOREIGN KEY (chain_id, transaction_hash) REFERENCES transactions(chain_id, hash),
    FOREIGN KEY (chain_id, token_address) REFERENCES tokens(chain_id, address),
    UNIQUE (chain_id, transaction_hash, log_index)
) PARTITION BY LIST (chain_id);

CREATE INDEX idx_token_transfers_chain_token ON token_transfers(chain_id, token_address);
CREATE INDEX idx_token_transfers_chain_from ON token_transfers(chain_id, from_address);
CREATE INDEX idx_token_transfers_chain_to ON token_transfers(chain_id, to_address);
CREATE INDEX idx_token_transfers_chain_tx ON token_transfers(chain_id, transaction_hash);
CREATE INDEX idx_token_transfers_chain_block ON token_transfers(chain_id, block_number);
CREATE INDEX idx_token_transfers_chain_token_from ON token_transfers(chain_id, token_address, from_address);
CREATE INDEX idx_token_transfers_chain_token_to ON token_transfers(chain_id, token_address, to_address);

Token Holders Table (Optional)

Purpose: Maintain current token balances for efficient queries.

CREATE TABLE token_holders (
    id BIGSERIAL,
    chain_id INTEGER NOT NULL,
    token_address VARCHAR(42) NOT NULL,
    address VARCHAR(42) NOT NULL,
    balance NUMERIC(78, 0) NOT NULL DEFAULT 0,
    token_id VARCHAR(78), -- For ERC-721/1155
    updated_at TIMESTAMP DEFAULT NOW(),
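    PRIMARY KEY (chain_id, id), -- must include the partition key
    FOREIGN KEY (chain_id, token_address) REFERENCES tokens(chain_id, address)
) PARTITION BY LIST (chain_id);

-- Table-level UNIQUE constraints cannot contain expressions, so the NULL-safe
-- rule (a NULL token_id is treated as '') is declared as a unique expression index.
CREATE UNIQUE INDEX uq_token_holders_chain_token_holder
    ON token_holders(chain_id, token_address, address, COALESCE(token_id, ''));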

CREATE INDEX idx_token_holders_chain_token ON token_holders(chain_id, token_address);
CREATE INDEX idx_token_holders_chain_address ON token_holders(chain_id, address);
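A minimal sketch of keeping token_holders current from a decoded ERC-20 transfer, assuming a trimmed stand-in table in a throwaway schema explorer_sketch and placeholder addresses ('0xtoken', '0xsender', '0xreceiver'). The stand-in expresses the one-row-per-holder rule as a unique expression index on COALESCE(token_id, ''), and the ON CONFLICT target repeats that expression so PostgreSQL can infer the index.

```sql
-- Throwaway schema (hypothetical name) so the sketch never touches real tables.
DROP SCHEMA IF EXISTS explorer_sketch CASCADE;
CREATE SCHEMA explorer_sketch;
SET search_path TO explorer_sketch;

-- Trimmed stand-in token_holders table.
CREATE TABLE token_holders (
    chain_id      INTEGER        NOT NULL,
    token_address VARCHAR(42)    NOT NULL,
    address       VARCHAR(42)    NOT NULL,
    balance       NUMERIC(78, 0) NOT NULL DEFAULT 0,
    token_id      VARCHAR(78),   -- NULL for ERC-20
    updated_at    TIMESTAMP DEFAULT NOW()
) PARTITION BY LIST (chain_id);

-- One row per (token, holder, token_id); a NULL token_id is treated as ''.
CREATE UNIQUE INDEX uq_token_holders
    ON token_holders (chain_id, token_address, address, COALESCE(token_id, ''));

CREATE TABLE token_holders_chain_138 PARTITION OF token_holders FOR VALUES IN (138);

-- Seed: the sender starts with 100 units.
INSERT INTO token_holders (chain_id, token_address, address, balance)
VALUES (138, '0xtoken', '0xsender', 100);

-- Apply a 40-unit transfer: debit the sender, credit the receiver.
INSERT INTO token_holders (chain_id, token_address, address, balance)
VALUES (138, '0xtoken', '0xsender',   -40),
       (138, '0xtoken', '0xreceiver',  40)
ON CONFLICT (chain_id, token_address, address, COALESCE(token_id, ''))
DO UPDATE SET balance    = token_holders.balance + EXCLUDED.balance,
              updated_at = NOW();
```

A multi-row INSERT ... ON CONFLICT DO UPDATE cannot update the same row twice, so a self-transfer (sender equals receiver) would need to be skipped or applied as separate statements.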

Contract Tables

Contracts Table

CREATE TABLE contracts (
    id BIGSERIAL,
    chain_id INTEGER NOT NULL,
    address VARCHAR(42) NOT NULL,
    name VARCHAR(255),
    compiler_version VARCHAR(50),
    optimization_enabled BOOLEAN,
    optimization_runs INTEGER,
    evm_version VARCHAR(20),
    source_code TEXT,
    abi JSONB,
    constructor_arguments TEXT,
    verification_status VARCHAR(20) NOT NULL CHECK (verification_status IN ('pending', 'verified', 'failed')),
    verified_at TIMESTAMP,
    verification_method VARCHAR(50),
    license VARCHAR(50),
    created_at TIMESTAMP DEFAULT NOW(),
    updated_at TIMESTAMP DEFAULT NOW(),
    PRIMARY KEY (chain_id, id), -- must include the partition key
    UNIQUE (chain_id, address)
) PARTITION BY LIST (chain_id);

-- (chain_id, address) is already indexed by the UNIQUE constraint
CREATE INDEX idx_contracts_chain_verified ON contracts(chain_id, verification_status);
CREATE INDEX idx_contracts_abi_gin ON contracts USING GIN (abi); -- For ABI queries
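The GIN index on abi supports JSONB containment queries. A minimal sketch, assuming a trimmed stand-in contracts table in a throwaway schema explorer_sketch, a placeholder address, and a one-entry ABI, that finds contracts whose ABI declares a function named transfer:

```sql
-- Throwaway schema (hypothetical name) so the sketch never touches real tables.
DROP SCHEMA IF EXISTS explorer_sketch CASCADE;
CREATE SCHEMA explorer_sketch;
SET search_path TO explorer_sketch;

-- Trimmed stand-in contracts table.
CREATE TABLE contracts (
    id                  BIGSERIAL,
    chain_id            INTEGER     NOT NULL,
    address             VARCHAR(42) NOT NULL,
    abi                 JSONB,
    verification_status VARCHAR(20) NOT NULL
        CHECK (verification_status IN ('pending', 'verified', 'failed')),
    PRIMARY KEY (chain_id, id),
    UNIQUE (chain_id, address)
) PARTITION BY LIST (chain_id);

CREATE INDEX idx_contracts_abi_gin ON contracts USING GIN (abi);
CREATE TABLE contracts_chain_138 PARTITION OF contracts FOR VALUES IN (138);

INSERT INTO contracts (chain_id, address, abi, verification_status)
VALUES (138, '0x3333333333333333333333333333333333333333',
        '[{"type": "function", "name": "transfer",
           "inputs": [{"name": "to", "type": "address"},
                      {"name": "amount", "type": "uint256"}]}]',
        'verified');

-- Contracts whose ABI declares a function named "transfer".
-- @> (containment) is supported by the default jsonb_ops GIN operator class.
SELECT address
FROM contracts
WHERE chain_id = 138
  AND abi @> '[{"type": "function", "name": "transfer"}]';
```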

Contract ABIs Table

CREATE TABLE contract_abis (
    id BIGSERIAL,
    chain_id INTEGER NOT NULL,
    address VARCHAR(42) NOT NULL,
    abi JSONB NOT NULL,
    source VARCHAR(50) NOT NULL,
    verified BOOLEAN DEFAULT false,
    created_at TIMESTAMP DEFAULT NOW(),
    updated_at TIMESTAMP DEFAULT NOW(),
    PRIMARY KEY (chain_id, id), -- must include the partition key
    UNIQUE (chain_id, address)
) PARTITION BY LIST (chain_id);

-- (chain_id, address) is already indexed by the UNIQUE constraint
CREATE INDEX idx_abis_abi_gin ON contract_abis USING GIN (abi);

Contract Verifications Table

CREATE TABLE contract_verifications (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    chain_id INTEGER NOT NULL,
    address VARCHAR(42) NOT NULL,
    status VARCHAR(20) NOT NULL CHECK (status IN ('pending', 'processing', 'verified', 'failed', 'partially_verified')),
    compiler_version VARCHAR(50),
    optimization_enabled BOOLEAN,
    optimization_runs INTEGER,
    evm_version VARCHAR(20),
    source_code TEXT,
    abi JSONB,
    constructor_arguments TEXT,
    verification_method VARCHAR(50),
    error_message TEXT,
    verified_at TIMESTAMP,
    version INTEGER DEFAULT 1,
    is_active BOOLEAN DEFAULT true,
    created_at TIMESTAMP DEFAULT NOW(),
    updated_at TIMESTAMP DEFAULT NOW(),
    FOREIGN KEY (chain_id, address) REFERENCES contracts(chain_id, address)
);

CREATE INDEX idx_verifications_chain_address ON contract_verifications(chain_id, address);
CREATE INDEX idx_verifications_status ON contract_verifications(status);

Address Labels Table

CREATE TABLE address_labels (
    id BIGSERIAL,
    chain_id INTEGER NOT NULL,
    address VARCHAR(42) NOT NULL,
    label VARCHAR(255) NOT NULL,
    label_type VARCHAR(20) NOT NULL CHECK (label_type IN ('user', 'public', 'contract_name')),
    user_id UUID,
    source VARCHAR(50),
    created_at TIMESTAMP DEFAULT NOW(),
    updated_at TIMESTAMP DEFAULT NOW(),
    PRIMARY KEY (id),
    UNIQUE (chain_id, address, label_type, user_id),
    FOREIGN KEY (user_id) REFERENCES users(id) ON DELETE CASCADE
);

CREATE INDEX idx_labels_chain_address ON address_labels(chain_id, address);
CREATE INDEX idx_labels_chain_user ON address_labels(chain_id, user_id);

Address Tags Table

CREATE TABLE address_tags (
    id BIGSERIAL,
    chain_id INTEGER NOT NULL,
    address VARCHAR(42) NOT NULL,
    tag VARCHAR(50) NOT NULL,
    tag_type VARCHAR(20) NOT NULL CHECK (tag_type IN ('category', 'risk', 'protocol')),
    user_id UUID,
    created_at TIMESTAMP DEFAULT NOW(),
    PRIMARY KEY (id),
    UNIQUE (chain_id, address, tag, user_id),
    FOREIGN KEY (user_id) REFERENCES users(id) ON DELETE CASCADE
);

CREATE INDEX idx_tags_chain_address ON address_tags(chain_id, address);
CREATE INDEX idx_tags_chain_tag ON address_tags(chain_id, tag);

User Tables

Users Table

CREATE TABLE users (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    email VARCHAR(255) UNIQUE,
    username VARCHAR(100) UNIQUE,
    password_hash TEXT,
    api_key_hash TEXT,
    created_at TIMESTAMP DEFAULT NOW(),
    updated_at TIMESTAMP DEFAULT NOW(),
    last_login_at TIMESTAMP
);

-- email and username are already indexed by their UNIQUE constraints

Watchlists Table

CREATE TABLE watchlists (
    id BIGSERIAL,
    user_id UUID NOT NULL,
    chain_id INTEGER NOT NULL,
    address VARCHAR(42) NOT NULL,
    label VARCHAR(255),
    created_at TIMESTAMP DEFAULT NOW(),
    PRIMARY KEY (id),
    UNIQUE (user_id, chain_id, address),
    FOREIGN KEY (user_id) REFERENCES users(id) ON DELETE CASCADE
);

CREATE INDEX idx_watchlists_user ON watchlists(user_id);
CREATE INDEX idx_watchlists_chain_address ON watchlists(chain_id, address);

API Keys Table

CREATE TABLE api_keys (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    user_id UUID NOT NULL,
    key_hash TEXT NOT NULL UNIQUE,
    name VARCHAR(255),
    tier VARCHAR(20) NOT NULL CHECK (tier IN ('free', 'pro', 'enterprise')),
    rate_limit_per_second INTEGER,
    rate_limit_per_minute INTEGER,
    ip_whitelist TEXT[], -- Array of CIDR blocks
    last_used_at TIMESTAMP,
    expires_at TIMESTAMP,
    revoked BOOLEAN DEFAULT false,
    created_at TIMESTAMP DEFAULT NOW(),
    FOREIGN KEY (user_id) REFERENCES users(id) ON DELETE CASCADE
);

CREATE INDEX idx_api_keys_user ON api_keys(user_id);
-- key_hash is already indexed by its UNIQUE constraint
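A minimal sketch of authenticating a request against api_keys. The hashing scheme is an assumption (the schema only names key_hash): here key_hash is the hex SHA-256 of the raw key, computed with the sha256() function built into PostgreSQL 11+. Treating a NULL ip_whitelist as "no restriction", the raw key 'demo-raw-key', and the client IP are also assumptions; the tables are trimmed stand-ins in a throwaway schema explorer_sketch.

```sql
-- Throwaway schema (hypothetical name) so the sketch never touches real tables.
DROP SCHEMA IF EXISTS explorer_sketch CASCADE;
CREATE SCHEMA explorer_sketch;
SET search_path TO explorer_sketch;

-- Trimmed stand-ins for users and api_keys.
CREATE TABLE users (id UUID PRIMARY KEY DEFAULT gen_random_uuid());

CREATE TABLE api_keys (
    id           UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    user_id      UUID NOT NULL REFERENCES users(id) ON DELETE CASCADE,
    key_hash     TEXT NOT NULL UNIQUE,
    tier         VARCHAR(20) NOT NULL CHECK (tier IN ('free', 'pro', 'enterprise')),
    ip_whitelist TEXT[],            -- assumed: NULL means no IP restriction
    expires_at   TIMESTAMP,
    revoked      BOOLEAN DEFAULT false
);

-- Issue a key: store only the hex SHA-256 of the raw key (assumed scheme).
WITH new_user AS (INSERT INTO users DEFAULT VALUES RETURNING id)
INSERT INTO api_keys (user_id, key_hash, tier, ip_whitelist)
SELECT id, encode(sha256(convert_to('demo-raw-key', 'UTF8')), 'hex'), 'pro',
       ARRAY['10.0.0.0/8']
FROM new_user;

-- Authenticate a request: hash the presented key, then check revocation,
-- expiry, and the client IP against the CIDR whitelist.
SELECT id, user_id, tier
FROM api_keys
WHERE key_hash = encode(sha256(convert_to('demo-raw-key', 'UTF8')), 'hex')
  AND NOT revoked
  AND (expires_at IS NULL OR expires_at > NOW())
  AND (ip_whitelist IS NULL
       OR inet '10.1.2.3' <<= ANY (ip_whitelist::cidr[]));
```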

Multi-Chain Partitioning

Partitioning Strategy

Large Tables: Partition by chain_id using LIST partitioning.

Tables to Partition:

  • blocks
  • transactions
  • logs
  • traces
  • internal_transactions
  • token_transfers
  • tokens
  • token_holders (if used)
  • contracts
  • contract_abis

Partition Creation

Example for blocks table:

-- Create parent table
CREATE TABLE blocks (
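    -- columns as defined in ../indexing/data-models.md; every PRIMARY KEY
    -- and UNIQUE constraint must include chain_id, the partition key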
) PARTITION BY LIST (chain_id);

-- Create partitions
CREATE TABLE blocks_chain_138 PARTITION OF blocks
    FOR VALUES IN (138);

CREATE TABLE blocks_chain_1 PARTITION OF blocks
    FOR VALUES IN (1);

-- Indexes created on the parent are created on every existing and future
-- partition automatically (PostgreSQL 11+)
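With LIST partitioning, inserting a chain_id that has no partition fails with an error. A minimal sketch of a DEFAULT partition as a safety net, and of onboarding a new chain later, assuming trimmed stand-in columns, a throwaway schema explorer_sketch, and chain 56 as an arbitrary example:

```sql
-- Throwaway schema (hypothetical name) so the sketch never touches real tables.
DROP SCHEMA IF EXISTS explorer_sketch CASCADE;
CREATE SCHEMA explorer_sketch;
SET search_path TO explorer_sketch;

-- Trimmed stand-in blocks table.
CREATE TABLE blocks (
    chain_id INTEGER NOT NULL,
    number   BIGINT  NOT NULL,
    PRIMARY KEY (chain_id, number)
) PARTITION BY LIST (chain_id);

CREATE TABLE blocks_chain_138 PARTITION OF blocks FOR VALUES IN (138);
CREATE TABLE blocks_chain_1   PARTITION OF blocks FOR VALUES IN (1);

-- Catches rows for chains that have no dedicated partition yet.
CREATE TABLE blocks_default PARTITION OF blocks DEFAULT;

-- Onboard chain 56: build the table, optionally backfill it, then attach.
CREATE TABLE blocks_chain_56 (LIKE blocks);
-- A matching CHECK lets ATTACH skip validating this table's rows.
ALTER TABLE blocks_chain_56
    ADD CONSTRAINT blocks_chain_56_chain_check CHECK (chain_id = 56);
-- ATTACH still scans blocks_default and fails if it already holds chain 56 rows.
ALTER TABLE blocks ATTACH PARTITION blocks_chain_56 FOR VALUES IN (56);
```

Creating the table separately and attaching it afterwards lets a backfill load data before the partition becomes visible; CREATE TABLE ... PARTITION OF works too when there is nothing to preload.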

Benefits:

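  • Faster queries: partition pruning skips other chains' partitions when a query filters on chain_id
  • Easier maintenance: VACUUM, REINDEX, backfills, and removal (DETACH or DROP) can run per chain
  • Parallel processing: per-chain partitions can be scanned, vacuumed, and indexed independently
  • Data isolation: each chain's rows live in a separate physical table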

Indexing Strategy

Index Types

  1. B-tree: Default for most indexes (equality, range, sorting)
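  2. Hash: Equality lookups only (rarely used; B-tree is usually the better choice)
  3. GIN: For JSONB columns (ABIs, decoded data), supporting containment (@>) and key-existence (?) operators
  4. BRIN: For very large tables whose column values follow physical row order (block numbers, timestamps in append-only tables)
  5. Partial: Any index restricted to rows matching a WHERE predicate (e.g., verified contracts only)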
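A minimal sketch of the BRIN and partial index types against trimmed, unpartitioned stand-in tables in a throwaway schema explorer_sketch; the index names are hypothetical. BRIN keeps only a min/max summary per range of heap pages (128 pages by default), so it stays small but only helps while values track physical row order.

```sql
-- Throwaway schema (hypothetical name) so the sketch never touches real tables.
DROP SCHEMA IF EXISTS explorer_sketch CASCADE;
CREATE SCHEMA explorer_sketch;
SET search_path TO explorer_sketch;

-- Trimmed, unpartitioned stand-ins to keep the sketch short.
CREATE TABLE blocks (
    chain_id  INTEGER   NOT NULL,
    number    BIGINT    NOT NULL,
    timestamp TIMESTAMP NOT NULL,
    PRIMARY KEY (chain_id, number)
);

CREATE TABLE contracts (
    chain_id            INTEGER     NOT NULL,
    address             VARCHAR(42) NOT NULL,
    verification_status VARCHAR(20) NOT NULL,
    PRIMARY KEY (chain_id, address)
);

-- BRIN: one min/max summary per page range; effective only while timestamps
-- follow insertion order, as they do for append-only block data.
CREATE INDEX idx_blocks_timestamp_brin ON blocks USING BRIN (timestamp);

-- Partial: only verified contracts are indexed. A query can use it only when
-- its WHERE clause implies verification_status = 'verified'.
CREATE INDEX idx_contracts_verified_only ON contracts (chain_id, address)
    WHERE verification_status = 'verified';

SELECT address
FROM contracts
WHERE chain_id = 138 AND verification_status = 'verified';
```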

Index Maintenance

Regular Maintenance:

  • Keep autovacuum enabled; run VACUUM ANALYZE manually after bulk loads
  • REINDEX bloated or corrupted indexes (REINDEX CONCURRENTLY, PostgreSQL 12+, avoids blocking writes)
  • Monitor index usage (pg_stat_user_indexes)

Index Monitoring:

  • Track index sizes
  • Monitor index bloat
  • Remove unused indexes
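Unused indexes can be found in pg_stat_user_indexes. A sketch of such a query: it lists indexes with no recorded scans, largest first, and skips unique indexes because they enforce constraints even when never scanned. The counters are per server and are cleared when statistics are reset, so an index idle on the primary may still be serving a read replica.

```sql
-- Indexes never scanned since statistics were last reset, largest first.
-- Unique indexes (including primary keys) are excluded: they back constraints.
SELECT s.schemaname,
       s.relname      AS table_name,
       s.indexrelname AS index_name,
       s.idx_scan,
       pg_size_pretty(pg_relation_size(s.indexrelid)) AS index_size
FROM pg_stat_user_indexes AS s
JOIN pg_index AS i ON i.indexrelid = s.indexrelid
WHERE s.idx_scan = 0
  AND NOT i.indisunique
ORDER BY pg_relation_size(s.indexrelid) DESC;
```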

Data Retention and Archiving

Retention Policies

Hot Data: Recent data (last 1 year)

  • Fast access required
  • All indexes maintained

Warm Data: Older data (1-5 years)

  • Archive to slower storage
  • Reduced indexing

Cold Data: Very old data (5+ years)

  • Archive to object storage
  • Minimal indexing

Archiving Strategy

Approach:

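  1. Sub-partition each per-chain partition by time range (monthly or yearly)
  2. Detach old partitions and move their data to archive storage
  3. Query archive when needed (slower but available)

Implementation:

  • Use PostgreSQL range sub-partitioning on a date column beneath the chain_id LIST partitions
  • Export detached partitions (e.g., with COPY or pg_dump) to object storage such as S3, then drop them from the database
  • Query via foreign data wrappers if needed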
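A minimal sketch of time-based archiving layered on chain_id partitioning, assuming a trimmed stand-in logs table with a hypothetical block_time column, a throwaway schema explorer_sketch, and monthly partitions. Because block_time becomes a partition key, it has to be part of every primary key and unique constraint on the table, which changes the logs primary key compared with the one listed earlier.

```sql
-- Throwaway schema (hypothetical name) so the sketch never touches real tables.
DROP SCHEMA IF EXISTS explorer_sketch CASCADE;
CREATE SCHEMA explorer_sketch;
SET search_path TO explorer_sketch;

-- Trimmed stand-in logs table; block_time is a hypothetical column.
-- Both partition keys (chain_id, block_time) must appear in the primary key.
CREATE TABLE logs (
    chain_id         INTEGER     NOT NULL,
    transaction_hash VARCHAR(66) NOT NULL,
    log_index        INTEGER     NOT NULL,
    block_time       TIMESTAMP   NOT NULL,
    PRIMARY KEY (chain_id, transaction_hash, log_index, block_time)
) PARTITION BY LIST (chain_id);

-- Chain 138's partition is itself range-partitioned by month.
CREATE TABLE logs_chain_138 PARTITION OF logs
    FOR VALUES IN (138)
    PARTITION BY RANGE (block_time);

CREATE TABLE logs_chain_138_2023_01 PARTITION OF logs_chain_138
    FOR VALUES FROM ('2023-01-01') TO ('2023-02-01');
CREATE TABLE logs_chain_138_2024_01 PARTITION OF logs_chain_138
    FOR VALUES FROM ('2024-01-01') TO ('2024-02-01');

-- Archive an old month: detach it into a standalone table, export it
-- (e.g. COPY ... TO or pg_dump, outside this sketch), then drop it.
ALTER TABLE logs_chain_138 DETACH PARTITION logs_chain_138_2023_01;
```

On PostgreSQL 14+, DETACH PARTITION ... CONCURRENTLY avoids blocking queries on the parent, but it cannot run inside a transaction block or when the parent has a default partition.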

Migration Strategy

Versioning

Migration Tool: Use a migration tool (Flyway, Liquibase, or a custom tool).

Versioning Format: YYYYMMDDHHMMSS_description.sql

Example:

20240101000001_initial_schema.sql
20240115000001_add_token_holders.sql
20240201000001_add_partitioning.sql

Migration Best Practices

  1. Backward Compatible: Additive changes preferred
  2. Reversible: All migrations should be reversible
  3. Tested: Test on staging before production
  4. Documented: Document breaking changes
  5. Rollback Plan: Have rollback strategy

Schema Evolution

Adding Columns:

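  • Use ALTER TABLE ... ADD COLUMN with a default value; since PostgreSQL 11, a constant default is recorded in the catalog without rewriting the table
  • Avoid adding a NOT NULL column without a default; use a two-step migration instead (add it as nullable and backfill, then set NOT NULL)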
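A minimal sketch of both paths on a trimmed stand-in tokens table in a throwaway schema explorer_sketch; the new columns is_spam and source are hypothetical. On PostgreSQL 12+, a validated CHECK (col IS NOT NULL) constraint lets SET NOT NULL skip its full-table scan, and VALIDATE CONSTRAINT takes a weaker lock than that scan would.

```sql
-- Throwaway schema (hypothetical name) so the sketch never touches real tables.
DROP SCHEMA IF EXISTS explorer_sketch CASCADE;
CREATE SCHEMA explorer_sketch;
SET search_path TO explorer_sketch;

-- Trimmed stand-in tokens table with two existing rows.
CREATE TABLE tokens (
    chain_id INTEGER     NOT NULL,
    address  VARCHAR(42) NOT NULL,
    PRIMARY KEY (chain_id, address)
);
INSERT INTO tokens VALUES (138, '0xaaaa'), (138, '0xbbbb');

-- Path 1: a constant default makes NOT NULL safe in one statement.
ALTER TABLE tokens ADD COLUMN is_spam BOOLEAN NOT NULL DEFAULT false;

-- Path 2 (no sensible default): add nullable, backfill, then enforce.
ALTER TABLE tokens ADD COLUMN source VARCHAR(50);
UPDATE tokens SET source = 'indexer' WHERE source IS NULL;  -- batch this on large tables
ALTER TABLE tokens ADD CONSTRAINT tokens_source_not_null
    CHECK (source IS NOT NULL) NOT VALID;
ALTER TABLE tokens VALIDATE CONSTRAINT tokens_source_not_null;
-- PostgreSQL 12+ uses the validated CHECK to skip the full-table scan here.
ALTER TABLE tokens ALTER COLUMN source SET NOT NULL;
ALTER TABLE tokens DROP CONSTRAINT tokens_source_not_null;
```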

Removing Columns:

  • Mark as deprecated first
  • Remove after migration period

Changing Types:

  • Create new column
  • Migrate data
  • Drop old column
  • Rename new column
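A minimal sketch of the four steps, using a hypothetical change of token_transfers.token_id from VARCHAR(78) to NUMERIC(78, 0) on a trimmed stand-in table in a throwaway schema explorer_sketch. A single ALTER COLUMN ... TYPE ... USING would also work, but it rewrites the table while holding an ACCESS EXCLUSIVE lock; the step-by-step route holds that lock only for the brief drop and rename.

```sql
-- Throwaway schema (hypothetical name) so the sketch never touches real tables.
DROP SCHEMA IF EXISTS explorer_sketch CASCADE;
CREATE SCHEMA explorer_sketch;
SET search_path TO explorer_sketch;

-- Trimmed stand-in token_transfers table.
CREATE TABLE token_transfers (
    id       BIGINT PRIMARY KEY,
    token_id VARCHAR(78)
);
INSERT INTO token_transfers VALUES (1, '42'), (2, NULL);

-- 1. Create the new column.
ALTER TABLE token_transfers ADD COLUMN token_id_new NUMERIC(78, 0);

-- 2. Migrate data (in id-range batches on large tables). Writers must fill
--    both columns until the switch, e.g. in application code or a trigger.
UPDATE token_transfers
SET token_id_new = token_id::NUMERIC(78, 0)
WHERE token_id IS NOT NULL;

-- 3 and 4. Drop the old column and rename the new one in one transaction,
--          so readers never see the column missing.
BEGIN;
ALTER TABLE token_transfers DROP COLUMN token_id;
ALTER TABLE token_transfers RENAME COLUMN token_id_new TO token_id;
COMMIT;
```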

Performance Optimization

Query Optimization

Common Query Patterns:

  1. Get block by number: Use (chain_id, number) index
  2. Get transaction by hash: Use (chain_id, hash) index
  3. Get address transactions: Use (chain_id, from_address) or (chain_id, to_address) index
  4. Filter logs by address and event: Use (chain_id, address, topic0) index

Connection Pooling

Configuration:

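  • Use a connection pooler (PgBouncer, pgpool-II)
  • Pool size: 20-100 connections per application server
  • Transaction-level pooling (PgBouncer pool_mode = transaction) for better concurrency; statement-level pooling disallows multi-statement transactions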

Read Replicas

Strategy:

  • Primary: Write operations
  • Replicas: Read operations (load balanced)
  • Async replication (small lag acceptable)

Backup and Recovery

Backup Strategy

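Full Backups: Daily physical base backups (e.g., pg_basebackup); point-in-time recovery replays WAL on top of a base backup, so logical dumps (pg_dump) alone cannot serve as its starting point

Incremental Backups: Continuous WAL archiving

Point-in-Time Recovery: Enabled by replaying archived WAL on top of the most recent base backup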

Recovery Procedures

RTO Target: 1 hour

RPO Target: 5 minutes (maximum acceptable data loss)
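A sketch of a read-only check of the settings that point-in-time recovery depends on. WAL archiving requires wal_level = replica (or logical) and archive_mode = on. archive_timeout forces a WAL segment switch after the given number of seconds, so when recovery relies on archived WAL alone, a value between 1 and 300 (0 disables the timeout) is needed to stay within the 5-minute RPO.

```sql
-- Settings that WAL archiving and point-in-time recovery depend on.
SELECT name, setting, unit
FROM pg_settings
WHERE name IN ('wal_level', 'archive_mode', 'archive_command', 'archive_timeout')
ORDER BY name;

-- archive_timeout is reported in seconds; 0 means a switch is never forced.
SELECT setting::integer BETWEEN 1 AND 300 AS archive_timeout_within_rpo
FROM pg_settings
WHERE name = 'archive_timeout';
```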

References

  • Data Models: See ../indexing/data-models.md
  • Indexer Architecture: See ../indexing/indexer-architecture.md
  • Search Index Schema: See search-index-schema.md
  • Multi-chain Architecture: See ../multichain/multichain-indexing.md