Add full monorepo: virtual-banker, backend, frontend, docs, scripts, deployment

Co-authored-by: Cursor <cursoragent@cursor.com>
2026-02-10 11:32:49 -08:00
parent aafcd913c2
commit 88bc76da91
815 changed files with 125522 additions and 264 deletions
--- a/docs/specs/indexing/data-models.md
+++ b/docs/specs/indexing/data-models.md
@@ -0,0 +1,566 @@
+# Data Models Specification
+
+## Overview
+
+This document specifies the data models used throughout the indexing pipeline and stored in the database. All chain-data models support multi-chain operation via a `chain_id` field; the `users` table is chain-agnostic.
+
+## Core Data Models
+
+### Block Schema
+
+**Table**: `blocks`
+
+**Fields**:
+```sql
+blocks (
+    id BIGSERIAL PRIMARY KEY,
+    chain_id INTEGER NOT NULL,
+    number BIGINT NOT NULL,
+    hash VARCHAR(66) NOT NULL,
+    parent_hash VARCHAR(66) NOT NULL,
+    nonce VARCHAR(18),
+    sha3_uncles VARCHAR(66),
+    logs_bloom TEXT,
+    transactions_root VARCHAR(66),
+    state_root VARCHAR(66),
+    receipts_root VARCHAR(66),
+    miner VARCHAR(42),
+    difficulty NUMERIC,
+    total_difficulty NUMERIC,
+    size BIGINT,
+    extra_data TEXT,
+    gas_limit BIGINT,
+    gas_used BIGINT,
+    timestamp TIMESTAMP NOT NULL,
+    transaction_count INTEGER DEFAULT 0,
+    base_fee_per_gas BIGINT, -- EIP-1559
+    created_at TIMESTAMP DEFAULT NOW(),
+    updated_at TIMESTAMP DEFAULT NOW(),
+    UNIQUE(chain_id, number),
+    UNIQUE(chain_id, hash)
+)
+```
+
+**Indexes**:
+- `idx_blocks_chain_number` ON (chain_id, number)
+- `idx_blocks_chain_hash` ON (chain_id, hash)
+- `idx_blocks_chain_timestamp` ON (chain_id, timestamp)
+
+**Relationships**:
+- One-to-many with `transactions`
+- One-to-many with `logs`
+
+### Transaction Schema
+
+**Table**: `transactions`
+
+**Fields**:
+```sql
+transactions (
+    id BIGSERIAL PRIMARY KEY,
+    chain_id INTEGER NOT NULL,
+    hash VARCHAR(66) NOT NULL,
+    block_number BIGINT NOT NULL,
+    block_hash VARCHAR(66) NOT NULL,
+    transaction_index INTEGER NOT NULL,
+    from_address VARCHAR(42) NOT NULL,
+    to_address VARCHAR(42), -- NULL for contract creation
+    value NUMERIC NOT NULL DEFAULT 0,
+    gas_price BIGINT,
+    max_fee_per_gas BIGINT, -- EIP-1559
+    max_priority_fee_per_gas BIGINT, -- EIP-1559
+    gas_limit BIGINT NOT NULL,
+    gas_used BIGINT,
+    nonce BIGINT NOT NULL,
+    input_data TEXT, -- Contract call data
+    status INTEGER, -- 0 = failed, 1 = success
+    contract_address VARCHAR(42), -- NULL if not contract creation
+    cumulative_gas_used BIGINT,
+    effective_gas_price BIGINT, -- Actual gas price paid
+    created_at TIMESTAMP DEFAULT NOW(),
+    updated_at TIMESTAMP DEFAULT NOW(),
+    FOREIGN KEY (chain_id, block_number) REFERENCES blocks(chain_id, number),
+    UNIQUE(chain_id, hash)
+)
+```
+
+**Indexes**:
+- `idx_transactions_chain_hash` ON (chain_id, hash)
+- `idx_transactions_chain_block` ON (chain_id, block_number, transaction_index)
+- `idx_transactions_chain_from` ON (chain_id, from_address)
+- `idx_transactions_chain_to` ON (chain_id, to_address)
+- `idx_transactions_chain_block_from` ON (chain_id, block_number, from_address)
+
+**Relationships**:
+- Many-to-one with `blocks`
+- One-to-many with `logs`
+- One-to-many with `internal_transactions`
+- One-to-many with `token_transfers`
+
+### Receipt Schema
+
+**Note**: Receipt data is stored denormalized in the `transactions` table for efficiency. If separate storage is needed, use the following table:
+
+**Table**: `transaction_receipts`
+
+```sql
+transaction_receipts (
+    id BIGSERIAL PRIMARY KEY,
+    chain_id INTEGER NOT NULL,
+    transaction_hash VARCHAR(66) NOT NULL,
+    transaction_index INTEGER NOT NULL,
+    block_number BIGINT NOT NULL,
+    block_hash VARCHAR(66) NOT NULL,
+    from_address VARCHAR(42) NOT NULL,
+    to_address VARCHAR(42),
+    gas_used BIGINT,
+    cumulative_gas_used BIGINT,
+    contract_address VARCHAR(42),
+    logs_bloom TEXT,
+    status INTEGER,
+    root VARCHAR(66), -- Pre-Byzantium
+    created_at TIMESTAMP DEFAULT NOW(),
+    FOREIGN KEY (chain_id, transaction_hash) REFERENCES transactions(chain_id, hash),
+    UNIQUE(chain_id, transaction_hash)
+)
+```
+
+### Log Schema
+
+**Table**: `logs`
+
+**Fields**:
+```sql
+logs (
+    id BIGSERIAL PRIMARY KEY,
+    chain_id INTEGER NOT NULL,
+    transaction_hash VARCHAR(66) NOT NULL,
+    block_number BIGINT NOT NULL,
+    block_hash VARCHAR(66) NOT NULL,
+    log_index INTEGER NOT NULL,
+    address VARCHAR(42) NOT NULL,
+    topic0 VARCHAR(66), -- Event signature
+    topic1 VARCHAR(66), -- First indexed parameter
+    topic2 VARCHAR(66), -- Second indexed parameter
+    topic3 VARCHAR(66), -- Third indexed parameter
+    data TEXT, -- Non-indexed parameters
+    decoded_data JSONB, -- Decoded event data (if ABI available)
+    created_at TIMESTAMP DEFAULT NOW(),
+    FOREIGN KEY (chain_id, transaction_hash) REFERENCES transactions(chain_id, hash),
+    UNIQUE(chain_id, transaction_hash, log_index)
+)
+```
+
+**Indexes**:
+- `idx_logs_chain_tx` ON (chain_id, transaction_hash)
+- `idx_logs_chain_address` ON (chain_id, address)
+- `idx_logs_chain_topic0` ON (chain_id, topic0)
+- `idx_logs_chain_block` ON (chain_id, block_number)
+- `idx_logs_chain_address_topic0` ON (chain_id, address, topic0) -- For event filtering
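+
+As a minimal sketch, this is the kind of event-filtering query the composite `(chain_id, address, topic0)` index is meant to serve. The contract address is a placeholder; the `topic0` value is the well-known hash of `Transfer(address,address,uint256)`.
+
+```sql
+-- Most recent Transfer events emitted by one contract on chain 138.
+SELECT block_number, transaction_hash, log_index, topic1, topic2, data
+FROM logs
+WHERE chain_id = 138
+  AND address = '0x0000000000000000000000000000000000000001' -- placeholder address
+  AND topic0 = '0xddf252ad1be2c89b69c2b068fc378daa952ba7f163c4a11628f55a4df523b3ef'
+ORDER BY block_number DESC, log_index DESC
+LIMIT 50;
+```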
+
+**Relationships**:
+- Many-to-one with `transactions`
+
+### Trace Schema
+
+**Table**: `traces`
+
+**Fields**:
+```sql
+traces (
+    id BIGSERIAL PRIMARY KEY,
+    chain_id INTEGER NOT NULL,
+    transaction_hash VARCHAR(66) NOT NULL,
+    block_number BIGINT NOT NULL,
+    block_hash VARCHAR(66) NOT NULL,
+    trace_address INTEGER[], -- Array representing call hierarchy [0,1,2]
+    subtraces INTEGER, -- Number of child calls
+    action_type VARCHAR(20) NOT NULL, -- 'call', 'create', 'suicide', 'delegatecall'
+    action_from VARCHAR(42),
+    action_to VARCHAR(42),
+    action_value NUMERIC DEFAULT 0,
+    action_input TEXT,
+    action_gas BIGINT,
+    action_call_type VARCHAR(20), -- 'call', 'delegatecall', 'staticcall'
+    result_type VARCHAR(20), -- 'callresult', 'createresult'
+    result_gas_used BIGINT,
+    result_output TEXT,
+    result_address VARCHAR(42), -- For create results
+    result_code TEXT, -- For create results
+    error TEXT, -- Error message if trace failed
+    created_at TIMESTAMP DEFAULT NOW(),
+    FOREIGN KEY (chain_id, transaction_hash) REFERENCES transactions(chain_id, hash)
+)
+```
+
+**Indexes**:
+- `idx_traces_chain_tx` ON (chain_id, transaction_hash)
+- `idx_traces_chain_block` ON (chain_id, block_number)
+- `idx_traces_chain_from` ON (chain_id, action_from)
+- `idx_traces_chain_to` ON (chain_id, action_to)
+
+**Note**: Trace data can be large. Consider partitioning or separate storage for historical traces.
+
+### Internal Transaction Schema
+
+**Table**: `internal_transactions`
+
+**Purpose**: Track value transfers that occur within transactions (via calls).
+
+**Fields**:
+```sql
+internal_transactions (
+    id BIGSERIAL PRIMARY KEY,
+    chain_id INTEGER NOT NULL,
+    transaction_hash VARCHAR(66) NOT NULL,
+    block_number BIGINT NOT NULL,
+    trace_address INTEGER[] NOT NULL,
+    from_address VARCHAR(42) NOT NULL,
+    to_address VARCHAR(42) NOT NULL,
+    value NUMERIC NOT NULL,
+    call_type VARCHAR(20), -- 'call', 'delegatecall', 'staticcall', 'create'
+    gas_limit BIGINT,
+    gas_used BIGINT,
+    input_data TEXT,
+    output_data TEXT,
+    error TEXT,
+    created_at TIMESTAMP DEFAULT NOW(),
+    FOREIGN KEY (chain_id, transaction_hash) REFERENCES transactions(chain_id, hash)
+)
+```
+
+**Indexes**:
+- `idx_internal_tx_chain_tx` ON (chain_id, transaction_hash)
+- `idx_internal_tx_chain_from` ON (chain_id, from_address)
+- `idx_internal_tx_chain_to` ON (chain_id, to_address)
+- `idx_internal_tx_chain_block` ON (chain_id, block_number)
+
+**Relationships**:
+- Many-to-one with `transactions`
+
+### Token Transfer Schema
+
+**Table**: `token_transfers`
+
+**Purpose**: Track ERC-20, ERC-721, and ERC-1155 token transfers.
+
+**Fields**:
+```sql
+token_transfers (
+    id BIGSERIAL PRIMARY KEY,
+    chain_id INTEGER NOT NULL,
+    transaction_hash VARCHAR(66) NOT NULL,
+    block_number BIGINT NOT NULL,
+    log_index INTEGER NOT NULL,
+    token_address VARCHAR(42) NOT NULL,
+    token_type VARCHAR(10) NOT NULL, -- 'ERC20', 'ERC721', 'ERC1155'
+    from_address VARCHAR(42) NOT NULL,
+    to_address VARCHAR(42) NOT NULL,
+    amount NUMERIC, -- For ERC-20 and ERC-1155
+    token_id VARCHAR(78), -- For ERC-721 and ERC-1155 (can be large)
+    operator VARCHAR(42), -- For ERC-1155
+    created_at TIMESTAMP DEFAULT NOW(),
+    FOREIGN KEY (chain_id, transaction_hash) REFERENCES transactions(chain_id, hash),
+    FOREIGN KEY (chain_id, token_address) REFERENCES tokens(chain_id, address)
+)
+```
+
+**Indexes**:
+- `idx_token_transfers_chain_token` ON (chain_id, token_address)
+- `idx_token_transfers_chain_from` ON (chain_id, from_address)
+- `idx_token_transfers_chain_to` ON (chain_id, to_address)
+- `idx_token_transfers_chain_tx` ON (chain_id, transaction_hash)
+- `idx_token_transfers_chain_block` ON (chain_id, block_number)
+- `idx_token_transfers_chain_token_from` ON (chain_id, token_address, from_address)
+- `idx_token_transfers_chain_token_to` ON (chain_id, token_address, to_address)
+
+**Relationships**:
+- Many-to-one with `transactions`
+- Many-to-one with `tokens`
+
+### Token Schema
+
+**Table**: `tokens`
+
+**Purpose**: Store token metadata (ERC-20, ERC-721, ERC-1155).
+
+**Fields**:
+```sql
+tokens (
+    id BIGSERIAL PRIMARY KEY,
+    chain_id INTEGER NOT NULL,
+    address VARCHAR(42) NOT NULL,
+    type VARCHAR(10) NOT NULL, -- 'ERC20', 'ERC721', 'ERC1155'
+    name VARCHAR(255),
+    symbol VARCHAR(50),
+    decimals INTEGER, -- For ERC-20
+    total_supply NUMERIC,
+    holder_count INTEGER DEFAULT 0,
+    transfer_count INTEGER DEFAULT 0,
+    logo_url TEXT,
+    website_url TEXT,
+    description TEXT,
+    verified BOOLEAN DEFAULT false,
+    created_at TIMESTAMP DEFAULT NOW(),
+    updated_at TIMESTAMP DEFAULT NOW(),
+    UNIQUE(chain_id, address)
+)
+```
+
+**Indexes**:
+- `idx_tokens_chain_address` ON (chain_id, address)
+- `idx_tokens_chain_type` ON (chain_id, type)
+- `idx_tokens_chain_symbol` ON (chain_id, symbol) -- For search
+
+**Relationships**:
+- One-to-many with `token_transfers`
+- One-to-many with `token_holders` (if maintained)
+
+### Contract Metadata Schema
+
+**Table**: `contracts`
+
+**Purpose**: Store verified contract information.
+
+**Fields**:
+```sql
+contracts (
+    id BIGSERIAL PRIMARY KEY,
+    chain_id INTEGER NOT NULL,
+    address VARCHAR(42) NOT NULL,
+    name VARCHAR(255),
+    compiler_version VARCHAR(50),
+    optimization_enabled BOOLEAN,
+    optimization_runs INTEGER,
+    evm_version VARCHAR(20),
+    source_code TEXT,
+    abi JSONB,
+    constructor_arguments TEXT,
+    verification_status VARCHAR(20) NOT NULL, -- 'pending', 'verified', 'failed'
+    verified_at TIMESTAMP,
+    verification_method VARCHAR(50), -- 'standard_json', 'sourcify', 'multi_file'
+    license VARCHAR(50),
+    created_at TIMESTAMP DEFAULT NOW(),
+    updated_at TIMESTAMP DEFAULT NOW(),
+    UNIQUE(chain_id, address)
+)
+```
+
+**Indexes**:
+- `idx_contracts_chain_address` ON (chain_id, address)
+- `idx_contracts_chain_verified` ON (chain_id, verification_status)
+
+**Relationships**:
+- One-to-one with `contract_abis` (if separate ABI storage)
+
+### Contract ABI Schema
+
+**Table**: `contract_abis`
+
+**Purpose**: Store contract ABIs for decoding (can be separate from verification).
+
+**Fields**:
+```sql
+contract_abis (
+    id BIGSERIAL PRIMARY KEY,
+    chain_id INTEGER NOT NULL,
+    address VARCHAR(42) NOT NULL,
+    abi JSONB NOT NULL,
+    source VARCHAR(50) NOT NULL, -- 'verification', 'sourcify', 'public', 'user_submitted'
+    verified BOOLEAN DEFAULT false,
+    created_at TIMESTAMP DEFAULT NOW(),
+    updated_at TIMESTAMP DEFAULT NOW(),
+    UNIQUE(chain_id, address)
+)
+```
+
+**Indexes**:
+- `idx_abis_chain_address` ON (chain_id, address)
+
+## Address-Related Models
+
+### Address Labels Schema
+
+**Table**: `address_labels`
+
+**Purpose**: User-defined and public labels for addresses.
+
+**Fields**:
+```sql
+address_labels (
+    id BIGSERIAL PRIMARY KEY,
+    chain_id INTEGER NOT NULL,
+    address VARCHAR(42) NOT NULL,
+    label VARCHAR(255) NOT NULL,
+    label_type VARCHAR(20) NOT NULL, -- 'user', 'public', 'contract_name'
+    user_id UUID, -- NULL for public labels
+    source VARCHAR(50), -- 'user', 'etherscan', 'blockscout', etc.
+    created_at TIMESTAMP DEFAULT NOW(),
+    updated_at TIMESTAMP DEFAULT NOW(),
+    UNIQUE(chain_id, address, label_type, user_id)
+)
+```
+
+**Indexes**:
+- `idx_labels_chain_address` ON (chain_id, address)
+- `idx_labels_chain_user` ON (chain_id, user_id)
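+
+One point worth illustrating: by default PostgreSQL treats `NULL` values as distinct in unique constraints, so `UNIQUE(chain_id, address, label_type, user_id)` does not stop duplicate public labels (rows where `user_id` is `NULL`). The sketch below shows one possible way to close that gap with a partial unique index; the index name is hypothetical and not part of the schema above.
+
+```sql
+-- Hypothetical index: at most one public label per (chain, address, label type).
+CREATE UNIQUE INDEX idx_labels_public_unique
+    ON address_labels (chain_id, address, label_type)
+    WHERE user_id IS NULL;
+```
+
+On PostgreSQL 15 or later, declaring the constraint as `UNIQUE NULLS NOT DISTINCT (...)` is an alternative. The same consideration applies to `address_tags`.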
+
+### Address Tags Schema
+
+**Table**: `address_tags`
+
+**Purpose**: Categorize addresses (e.g., "exchange", "defi", "wallet").
+
+**Fields**:
+```sql
+address_tags (
+    id BIGSERIAL PRIMARY KEY,
+    chain_id INTEGER NOT NULL,
+    address VARCHAR(42) NOT NULL,
+    tag VARCHAR(50) NOT NULL,
+    tag_type VARCHAR(20) NOT NULL, -- 'category', 'risk', 'protocol'
+    user_id UUID, -- NULL for public tags
+    created_at TIMESTAMP DEFAULT NOW(),
+    UNIQUE(chain_id, address, tag, user_id)
+)
+```
+
+**Indexes**:
+- `idx_tags_chain_address` ON (chain_id, address)
+- `idx_tags_chain_tag` ON (chain_id, tag)
+
+## User-Related Models
+
+### User Accounts Schema
+
+**Table**: `users`
+
+**Purpose**: User accounts for watchlists, alerts, preferences.
+
+**Fields**:
+```sql
+users (
+    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
+    email VARCHAR(255) UNIQUE,
+    username VARCHAR(100) UNIQUE,
+    password_hash TEXT, -- If using password auth
+    api_key_hash TEXT, -- Hashed API key
+    created_at TIMESTAMP DEFAULT NOW(),
+    updated_at TIMESTAMP DEFAULT NOW(),
+    last_login_at TIMESTAMP
+)
+```
+
+**Indexes**:
+- `idx_users_email` ON (email)
+- `idx_users_username` ON (username)
+
+### Watchlists Schema
+
+**Table**: `watchlists`
+
+**Purpose**: User-defined lists of addresses to monitor.
+
+**Fields**:
+```sql
+watchlists (
+    id BIGSERIAL PRIMARY KEY,
+    user_id UUID NOT NULL,
+    chain_id INTEGER NOT NULL,
+    address VARCHAR(42) NOT NULL,
+    label VARCHAR(255),
+    created_at TIMESTAMP DEFAULT NOW(),
+    FOREIGN KEY (user_id) REFERENCES users(id) ON DELETE CASCADE,
+    UNIQUE(user_id, chain_id, address)
+)
+```
+
+**Indexes**:
+- `idx_watchlists_user` ON (user_id)
+- `idx_watchlists_chain_address` ON (chain_id, address)
+
+## Data Type Definitions
+
+### Numeric Types
+
+- **BIGINT**: Used for block numbers, gas values, nonces (64-bit integers)
+- **NUMERIC**: Used for token amounts, ETH values (arbitrary precision decimals)
+  - Precision: up to 78 digits, enough to hold any `uint256` value (and therefore any wei amount)
+  - Scale: 0 (integers) or configurable for token decimals
+
+### Address Types
+
+- **VARCHAR(42)**: Ethereum addresses (0x + 40 hex chars)
+- Normalize to lowercase for consistency
+
+### Hash Types
+
+- **VARCHAR(66)**: Transaction/block hashes (0x + 64 hex chars)
+- **TEXT**: For very long hashes or variable-length data
+
+### JSONB Types
+
+- Used for: ABIs, decoded event data, complex nested structures
+- Benefits: Indexing, querying, efficient storage
+
+## Multi-Chain Considerations
+
+### Chain ID Partitioning
+
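+All tables that store chain data include `chain_id` near the start of the column list, directly after the primary key in most cases (`users` has no `chain_id`; `watchlists` places it after `user_id`):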
+- Enables efficient partitioning by chain_id
+- Ensures data isolation between chains
+- Simplifies multi-chain queries
+
+### Partitioning Strategy
+
+**Recommended**: Partition large tables by `chain_id`:
+- `blocks`, `transactions`, `logs` partitioned by chain_id
+- Benefits: Faster queries, easier maintenance, parallel processing
+
+**Implementation** (PostgreSQL):
+```sql
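+-- Example partitioning
+-- Note: PostgreSQL requires primary key and unique constraints on a
+-- partitioned table to include the partition key (chain_id), so a
+-- single-column `id` primary key would need to become (chain_id, id).
+CREATE TABLE blocks (
+    -- columns
+) PARTITION BY LIST (chain_id);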
+
+CREATE TABLE blocks_chain_138 PARTITION OF blocks FOR VALUES IN (138);
+CREATE TABLE blocks_chain_1 PARTITION OF blocks FOR VALUES IN (1);
+```
+
+## Data Consistency
+
+### Foreign Key Constraints
+
+- Enforce referential integrity where possible
+- Consider performance impact for high-throughput inserts
+- May disable for initial backfill, enable after catch-up
+
+### Unique Constraints
+
+- Prevent duplicate blocks, transactions, logs
+- Enable idempotent processing
+- Use ON CONFLICT for upserts
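+
+A minimal sketch of an idempotent block insert that relies on the `UNIQUE(chain_id, number)` constraint. The block number and hash values are placeholders, and which columns to refresh on conflict is an assumption rather than something this spec prescribes.
+
+```sql
+-- Re-processing the same block updates the existing row instead of failing.
+INSERT INTO blocks (chain_id, number, hash, parent_hash, timestamp, transaction_count)
+VALUES (138, 1000, '0x' || repeat('a', 64), '0x' || repeat('b', 64), NOW(), 0)
+ON CONFLICT (chain_id, number) DO UPDATE
+SET hash = EXCLUDED.hash,
+    parent_hash = EXCLUDED.parent_hash,
+    timestamp = EXCLUDED.timestamp,
+    transaction_count = EXCLUDED.transaction_count,
+    updated_at = NOW();
+```
+
+For append-only rows such as `logs`, `ON CONFLICT (chain_id, transaction_hash, log_index) DO NOTHING` is usually sufficient.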
+
+## Indexing Strategy
+
+### Index Types
+
+1. **B-tree**: Default for most indexes (equality, range queries)
+2. **Hash**: For exact match lookups (addresses, hashes); PostgreSQL hash indexes cover a single column and support only equality comparisons
+3. **GIN**: For JSONB columns (ABIs, decoded data)
+4. **BRIN**: For large ordered columns (block numbers, timestamps)
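+
+The non-default index types above could be declared roughly as follows. The index names are hypothetical, and whether each index pays off depends on the actual query mix.
+
+```sql
+-- GIN: containment queries on decoded event data (decoded_data @> '{...}').
+CREATE INDEX idx_logs_decoded_data_gin ON logs USING gin (decoded_data);
+
+-- BRIN: compact index for append-mostly columns that grow with insertion order.
+CREATE INDEX idx_logs_block_number_brin ON logs USING brin (block_number);
+
+-- Hash: single-column, equality-only lookups.
+CREATE INDEX idx_transactions_hash_hash ON transactions USING hash (hash);
+```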
+
+### Index Maintenance
+
+- Regular VACUUM and ANALYZE
+- Monitor index bloat
+- Consider partial indexes for filtered queries
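+
+As an illustration of a partial index, the sketch below indexes only failed transactions, assuming that "list failed transactions" is a frequent filtered query. The index name is hypothetical.
+
+```sql
+-- Only rows with status = 0 (failed) are stored in the index.
+CREATE INDEX idx_transactions_failed
+    ON transactions (chain_id, block_number)
+    WHERE status = 0;
+```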
+
+## References
+
+- Indexer Architecture: See `indexer-architecture.md`
+- Database Schema: See `../database/postgres-schema.md`
+- Search Index Schema: See `../database/search-index-schema.md`
+
--- a/docs/specs/indexing/indexer-architecture.md
+++ b/docs/specs/indexing/indexer-architecture.md
@@ -0,0 +1,518 @@
+# Indexer Architecture Specification
+
+## Overview
+
+This document specifies the architecture for the blockchain indexing pipeline that ingests, processes, and stores blockchain data from ChainID 138 and other supported chains. The indexer is responsible for maintaining a complete, queryable database of blocks, transactions, logs, traces, and token transfers.
+
+## Architecture
+
+```mermaid
+flowchart TB
+    subgraph Input[Input Layer]
+        Node[RPC Node<br/>ChainID 138]
+        WS[WebSocket<br/>New Block Events]
+    end
+    
+    subgraph Ingest[Ingestion Layer]
+        BL[Block Listener<br/>Real-time]
+        BW[Backfill Worker<br/>Historical]
+        Q[Message Queue<br/>Kafka/RabbitMQ]
+    end
+    
+    subgraph Process[Processing Layer]
+        BP[Block Processor]
+        TP[Transaction Processor]
+        LP[Log Processor]
+        TrP[Trace Processor]
+        TokenP[Token Transfer Processor]
+    end
+    
+    subgraph Decode[Decoding Layer]
+        ABI[ABI Registry]
+        SigDB[Signature Database]
+        Decoder[Event Decoder]
+    end
+    
+    subgraph Persist[Persistence Layer]
+        PG[(PostgreSQL<br/>Canonical Data)]
+        ES[(Elasticsearch<br/>Search Index)]
+        TS[(TimescaleDB<br/>Metrics)]
+    end
+    
+    subgraph Materialize[Materialization Layer]
+        Agg[Aggregator<br/>TPS, Gas Stats]
+        Cache[Cache Layer<br/>Redis]
+    end
+    
+    Node --> BL
+    Node --> BW
+    WS --> BL
+    
+    BL --> Q
+    BW --> Q
+    
+    Q --> BP
+    BP --> TP
+    BP --> LP
+    BP --> TrP
+    
+    TP --> TokenP
+    LP --> Decoder
+    Decoder --> ABI
+    Decoder --> SigDB
+    
+    BP --> PG
+    TP --> PG
+    LP --> PG
+    TrP --> PG
+    TokenP --> PG
+    
+    BP --> ES
+    TP --> ES
+    LP --> ES
+    
+    BP --> TS
+    TP --> TS
+    
+    PG --> Agg
+    Agg --> Cache
+```
+
+## Block Ingestion Pipeline
+
+### Block Listener (Real-time)
+
+**Purpose**: Monitor blockchain for new blocks and ingest them immediately.
+
+**Implementation**:
+- Subscribe to `newHeads` via WebSocket
+- Poll `eth_blockNumber` as fallback (every 2 seconds)
+- Handle WebSocket reconnection automatically
+
+**Flow**:
+1. Receive block header event
+2. Fetch full block data via `eth_getBlockByNumber`
+3. Enqueue block to processing queue
+4. Acknowledge receipt
+
+**Error Handling**:
+- Retry on network errors (exponential backoff)
+- Handle reorgs (see reorg handling section)
+- Log errors for monitoring
+
+### Backfill Worker (Historical)
+
+**Purpose**: Index historical blocks from genesis or a specific starting point.
+
+**Implementation**:
+- Parallel workers for faster indexing
+- Configurable batch size (e.g., 100 blocks per batch)
+- Rate limiting to avoid overloading RPC node
+- Checkpoint system for resuming interrupted backfills
+
+**Flow**:
+1. Determine starting block (checkpoint or genesis)
+2. Fetch batch of blocks
+3. Enqueue each block to processing queue
+4. Update checkpoint
+5. Repeat until caught up with chain head
+
+**Optimization Strategies**:
+- Parallel workers process different block ranges
+- Skip blocks already indexed (idempotent processing)
+- Batch RPC requests where possible
+
+### Message Queue
+
+**Purpose**: Decouple ingestion from processing, enable scaling, ensure durability.
+
+**Technology**: Kafka or RabbitMQ
+
+**Topics/Queues**:
+- `blocks`: New blocks to process
+- `transactions`: Transactions to decode
+- `traces`: Traces to process (async)
+
+**Configuration**:
+- Durability: Persistent storage
+- Replication: 3 replicas for high availability
+- Partitioning: By chain_id and block number (for ordering)
+
+## Transaction Processing Flow
+
+### Block Processing
+
+**Steps**:
+1. **Validate Block**: Verify block hash, parent hash, block number
+2. **Extract Transactions**: Get transaction list from block
+3. **Fetch Receipts**: Get transaction receipts for all transactions
+4. **Process Each Transaction**:
+   - Store transaction data
+   - Process receipt (logs, status)
+   - Extract token transfers (ERC-20/721/1155)
+   - Link to contract interactions
+
+**Data Extracted**:
+- Transaction fields (hash, from, to, value, gas, etc.)
+- Receipt fields (status, gasUsed, logs, etc.)
+- Contract creation detection
+- Token transfer events
+
+### Transaction Decoding
+
+**Purpose**: Decode event logs and transaction data using ABIs.
+
+**Process**:
+1. Identify contract address (to field or created address)
+2. Look up ABI in registry (verified contracts)
+3. Decode function calls and events
+4. Store decoded data for search and filtering
+
+**Fallback Strategies**:
+- Signature database for unknown functions/events (4-byte function selectors, 32-byte event topics)
+- Heuristic detection for common patterns (Transfer events)
+- Store raw data when decoding fails
+
+### ABI Registry
+
+**Purpose**: Store contract ABIs for decoding transactions and events.
+
+**Data Sources**:
+- Contract verification submissions
+- Sourcify integration
+- Public ABI repositories (4byte.directory, etc.)
+
+**Storage**:
+- Database table: `contract_abis`
+- Cache layer: Redis for frequently accessed ABIs
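+- Versioning: Support multiple ABI versions per contract (this requires relaxing the `UNIQUE(chain_id, address)` constraint in the schema below, for example by adding a version column)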
+
+**Schema**:
+```sql
+contract_abis (
+    id BIGSERIAL PRIMARY KEY,
+    chain_id INTEGER NOT NULL,
+    address VARCHAR(42) NOT NULL,
+    abi JSONB NOT NULL,
+    verified BOOLEAN DEFAULT false,
+    source VARCHAR(50), -- 'verification', 'sourcify', 'public'
+    created_at TIMESTAMP,
+    updated_at TIMESTAMP,
+    UNIQUE(chain_id, address)
+)
+```
+
+### Signature Database
+
+**Purpose**: Map 4-byte function signatures and 32-byte event signatures to function/event names.
+
+**Data Sources**:
+- Public signature databases (4byte.directory)
+- User submissions
+- Automatic extraction from verified contracts
+
+**Usage**:
+- Lookup function name from signature (e.g., `0x095ea7b3` → `approve(address,uint256)`)
+- Lookup event name from topic[0] (e.g., `0xddf252...` → `Transfer(address,address,uint256)`)
+- Partial decoding when full ABI unavailable
+
+## Event Log Indexing
+
+### Log Processing
+
+**Purpose**: Index event logs for efficient querying and filtering.
+
+**Process**:
+1. Extract logs from transaction receipts
+2. Decode log topics and data using ABI
+3. Index by:
+   - Contract address
+   - Event signature (topic[0])
+   - Indexed parameters (topic[1..3])
+   - Block number and transaction hash
+   - Log index
+
+**Indexing Strategy**:
+- PostgreSQL table: `logs` with indexes on (address, topic0, block_number)
+- Elasticsearch index: Full-text search on decoded event data
+- Time-series: Aggregate log counts per contract/event
+
+### Event Decoding
+
+**Decoding Flow**:
+1. Identify event signature from topic[0]
+2. Look up event definition in ABI registry
+3. Decode indexed parameters (topics 1-3)
+4. Decode non-indexed parameters (data field)
+5. Store decoded parameters as JSONB
+
+**Common Events to Index**:
+- ERC-20: `Transfer(address,address,uint256)`
+- ERC-721: `Transfer(address,address,uint256)`
+- ERC-1155: `TransferSingle`, `TransferBatch`
+- Approval events: `Approval(address,address,uint256)`
+
+## Trace Processing
+
+### Call Trace Extraction
+
+**Purpose**: Extract detailed call traces for transaction debugging and internal transaction tracking.
+
+**Trace Types**:
+- `call`: Contract calls
+- `create`: Contract creation
+- `suicide`: Contract self-destruct
+- `delegatecall`: Delegate calls (Parity-style `trace_*` responses report these as `call` traces whose call type is `delegatecall`)
+
+**Process**:
+1. Request trace via `trace_transaction` or `trace_block`
+2. Parse trace result structure
+3. Extract:
+   - Call hierarchy (parent-child relationships)
+   - Internal transactions (value transfers)
+   - Gas usage per call
+   - Revert information
+
+### Internal Transaction Tracking
+
+**Purpose**: Track value transfers that occur inside transactions (not just top-level).
+
+**Data Extracted**:
+- From address (caller)
+- To address (callee)
+- Value transferred
+- Call type (call, delegatecall, etc.)
+- Success/failure status
+- Gas used
+
+**Storage**:
+- Separate table: `internal_transactions`
+- Link to parent transaction via `transaction_hash`
+- Link to parent call via `trace_address` array
+
+## Token Transfer Extraction
+
+### ERC-20 Transfer Detection
+
+**Detection Method**:
+1. Look for `Transfer(address,address,uint256)` event
+2. Decode event parameters (from, to, value)
+3. Store in `token_transfers` table
+4. Update token holder balances
+
+**Data Stored**:
+- Token contract address
+- From address
+- To address
+- Amount (raw integer value; apply the token's `decimals` for display)
+- Block number
+- Transaction hash
+- Log index
+
+### ERC-721 Transfer Detection
+
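+**Differences from ERC-20**:
+- Uses the same `Transfer(address,address,uint256)` signature, but the third parameter (token ID) is indexed, so the log carries four topics instead of three
+- Token ID is tracked (unique NFT)
+- Transfer can be from zero address (mint) or to zero address (burn)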
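+
+Because ERC-20 and ERC-721 share the same `Transfer` event signature, the two can be told apart by the number of topics. The query below is a heuristic sketch over the `logs` table from `data-models.md`; it assumes standard-compliant contracts and lowercase hex storage.
+
+```sql
+-- Classify Transfer logs: ERC-721 indexes the token ID (topic3), ERC-20 does not.
+SELECT chain_id,
+       transaction_hash,
+       log_index,
+       address AS token_address,
+       CASE WHEN topic3 IS NULL THEN 'ERC20' ELSE 'ERC721' END AS token_type,
+       '0x' || right(topic1, 40) AS from_address, -- last 20 bytes of the topic
+       '0x' || right(topic2, 40) AS to_address
+FROM logs
+WHERE topic0 = '0xddf252ad1be2c89b69c2b068fc378daa952ba7f163c4a11628f55a4df523b3ef'
+  AND topic1 IS NOT NULL
+  AND topic2 IS NOT NULL;
+```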
+
+### ERC-1155 Transfer Detection
+
+**Events**:
+- `TransferSingle`: Single token transfer
+- `TransferBatch`: Batch token transfer
+
+**Challenges**:
+- Multiple token IDs and amounts per transfer
+- Batch operations require array decoding
+
+### Token Holder Tracking
+
+**Purpose**: Maintain list of addresses holding each token.
+
+**Strategy**:
+- Real-time updates: Update on each transfer
+- Periodic reconciliation: Verify balances via RPC
+- Balance snapshots: Store balance at each block (for historical queries)
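+
+As a sketch of how holder balances can be derived from indexed data alone, the query below sums incoming and outgoing ERC-20 transfers per address. The token address is a placeholder. Tokens that change balances without emitting `Transfer` events (for example rebasing tokens) will drift from this result, which is what the periodic RPC reconciliation is for.
+
+```sql
+-- Current holders of one ERC-20 token, derived from token_transfers.
+SELECT holder, SUM(delta) AS balance
+FROM (
+    SELECT to_address AS holder, amount AS delta
+    FROM token_transfers
+    WHERE chain_id = 138
+      AND token_type = 'ERC20'
+      AND token_address = '0x0000000000000000000000000000000000000001' -- placeholder
+    UNION ALL
+    SELECT from_address AS holder, -amount AS delta
+    FROM token_transfers
+    WHERE chain_id = 138
+      AND token_type = 'ERC20'
+      AND token_address = '0x0000000000000000000000000000000000000001' -- placeholder
+) AS deltas
+WHERE holder <> '0x0000000000000000000000000000000000000000' -- skip mint/burn address
+GROUP BY holder
+HAVING SUM(delta) > 0;
+```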
+
+## Indexer Worker Scaling and Partitioning
+
+### Horizontal Scaling
+
+**Strategy**: Multiple indexer workers processing different blocks/chains.
+
+**Partitioning Methods**:
+1. **By Chain**: Each worker handles one chain
+2. **By Block Range**: Workers split block ranges (for backfill)
+3. **By Processing Stage**: Separate workers for blocks, traces, token transfers
+
+### Worker Coordination
+
+**Mechanisms**:
+- Message queue: Workers consume from shared queue
+- Database locks: Prevent duplicate processing
+- Leader election: For single-worker tasks (reorg handling)
+
+### Load Balancing
+
+**Distribution**:
+- Round-robin for backfill workers
+- Sticky sessions for chain-specific workers
+- Priority queuing: Real-time blocks before historical blocks
+
+### Performance Targets
+
+**Throughput**:
+- Process 100 blocks/minute per worker
+- Process 1000 transactions/minute per worker
+- Process 100 traces/minute per worker (trace operations are slower)
+
+**Latency**:
+- Real-time blocks: Indexed within 5 seconds of block production
+- Historical blocks: Catch up to chain head within reasonable time
+
+## Data Consistency
+
+### Transaction Isolation
+
+**Strategy**: Process blocks atomically (all or nothing).
+
+**Implementation**:
+- Database transactions for block-level operations
+- Idempotent processing (can safely retry)
+- Checkpoint system to track last processed block
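+
+A minimal sketch of block-level atomicity: the block, its transactions, and the checkpoint are written in one database transaction, so a failure leaves no partial block behind. The `indexer_checkpoints` table is hypothetical (assumed to have a unique `chain_id` and a `last_block` column), and all values are placeholders.
+
+```sql
+BEGIN;
+
+INSERT INTO blocks (chain_id, number, hash, parent_hash, timestamp)
+VALUES (138, 1000, '0x' || repeat('a', 64), '0x' || repeat('b', 64), NOW())
+ON CONFLICT (chain_id, number) DO NOTHING;
+
+INSERT INTO transactions (chain_id, hash, block_number, block_hash,
+                          transaction_index, from_address, gas_limit, nonce)
+VALUES (138, '0x' || repeat('c', 64), 1000, '0x' || repeat('a', 64),
+        0, '0x' || repeat('1', 40), 21000, 0)
+ON CONFLICT (chain_id, hash) DO NOTHING;
+
+-- Hypothetical checkpoint table; never moves the checkpoint backwards.
+INSERT INTO indexer_checkpoints (chain_id, last_block)
+VALUES (138, 1000)
+ON CONFLICT (chain_id) DO UPDATE
+SET last_block = GREATEST(indexer_checkpoints.last_block, EXCLUDED.last_block);
+
+COMMIT;
+```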
+
+### Idempotency
+
+**Requirements**:
+- Processing same block multiple times should not create duplicates
+- Use unique constraints in database
+- Upsert operations where applicable
+
+## Error Handling and Retry Logic
+
+### Error Types
+
+1. **Transient Errors**: Network issues, temporary RPC failures
+   - Retry with exponential backoff
+   - Max retries: 10
+   - Max backoff: 5 minutes
+
+2. **Permanent Errors**: Invalid data, unsupported features
+   - Log error and skip
+   - Alert for investigation
+
+3. **Reorg Errors**: Block replaced by different block
+   - Handle via reorg detection (see reorg handling spec)
+
+### Retry Strategy
+
+**Exponential Backoff**:
+- Initial delay: 1 second
+- Multiplier: 2x
+- Max delay: 5 minutes
+- Jitter: Random ±20% to avoid thundering herd
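+
+To make the schedule concrete, the query below tabulates the delay for each of the 10 allowed retries under the parameters above. Before jitter, the delays are 1, 2, 4, ..., 256 seconds for attempts 1 through 9, and the tenth (512 seconds uncapped) is limited to 300 seconds. Applying the cap after the jitter is an assumption made here so that no delay exceeds 5 minutes.
+
+```sql
+-- Backoff schedule: initial delay 1 s, multiplier 2x, cap 300 s, jitter of +/-20%.
+SELECT attempt,
+       LEAST(POWER(2, attempt - 1), 300) AS base_delay_seconds,
+       LEAST(POWER(2, attempt - 1) * (0.8 + 0.4 * random()), 300) AS jittered_delay_seconds
+FROM generate_series(1, 10) AS attempt;
+```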
+
+## Monitoring and Observability
+
+### Key Metrics
+
+**Throughput**:
+- Blocks processed per minute
+- Transactions processed per minute
+- Logs indexed per minute
+
+**Latency**:
+- Time from block production to index completion
+- Time to process block (p50, p95, p99)
+
+**Lag**:
+- Block height lag (current block - last indexed block)
+- Time lag (current time - last indexed block time)
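+
+The time lag can be read directly from the `blocks` table, as sketched below. Block height lag additionally needs the current chain head from `eth_blockNumber`, which is not available in SQL. The sketch assumes block timestamps are stored in UTC and the session time zone is UTC.
+
+```sql
+-- Last indexed block and time lag per chain.
+SELECT b.chain_id,
+       b.number AS last_indexed_block,
+       NOW() - b.timestamp AS time_lag
+FROM blocks AS b
+WHERE b.number = (SELECT MAX(number) FROM blocks WHERE chain_id = b.chain_id);
+```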
+
+**Errors**:
+- Error rate by type
+- Retry count
+- Failed blocks
+
+### Alerting Rules
+
+- Block lag > 10 blocks: Warning
+- Block lag > 100 blocks: Critical
+- Error rate > 1%: Warning
+- Error rate > 5%: Critical
+- Worker down: Critical
+
+## Integration Points
+
+### RPC Node Integration
+
+- See `../infrastructure/node-rpc-architecture.md`
+- Connection pooling
+- Rate limiting awareness
+- Failover handling
+
+### Database Integration
+
+- See `../database/postgres-schema.md`
+- Connection pooling
+- Batch inserts for performance
+- Transaction management
+
+### Search Integration
+
+- See `../database/search-index-schema.md`
+- Async indexing to Elasticsearch
+- Bulk indexing for efficiency
+
+## Implementation Guidelines
+
+### Technology Stack
+
+**Recommended**:
+- **Language**: Go, Rust, or Python (performance considerations)
+- **Queue**: Kafka (high throughput) or RabbitMQ (simpler setup)
+- **Database**: PostgreSQL with connection pooling
+- **Caching**: Redis for frequently accessed data
+
+### Code Structure
+
+```
+indexer/
+├── cmd/
+│   ├── block-listener/      # Real-time block listener
+│   ├── backfill-worker/     # Historical indexing worker
+│   └── processor/           # Block/transaction processor
+├── internal/
+│   ├── ingestion/           # Ingestion logic
+│   ├── processing/          # Processing logic
+│   ├── decoding/            # ABI/signature decoding
+│   └── persistence/         # Database operations
+└── pkg/
+    ├── abi/                 # ABI registry
+    └── rpc/                 # RPC client
+```
+
+### Testing Strategy
+
+**Unit Tests**:
+- Decoding logic
+- Data transformation
+- Error handling
+
+**Integration Tests**:
+- End-to-end block processing
+- Database operations
+- Queue integration
+
+**Load Tests**:
+- Process historical blocks
+- Simulate high block production rate
+- Test worker scaling
+
+## References
+
+- Data Models: See `data-models.md`
+- Reorg Handling: See `reorg-handling.md`
+- Database Schema: See `../database/postgres-schema.md`
+- RPC Architecture: See `../infrastructure/node-rpc-architecture.md`
+
--- a/docs/specs/indexing/reorg-handling.md
+++ b/docs/specs/indexing/reorg-handling.md
@@ -0,0 +1,353 @@
+# Reorg Handling Specification
+
+## Overview
+
+This document specifies how the indexer handles blockchain reorganizations (reorgs), where the canonical chain changes and previously indexed blocks become invalid. Reorg handling ensures data consistency and maintains accurate blockchain state.
+
+## Reorg Detection
+
+### Detection Methods
+
+**1. Block Hash Mismatch**
+- Compare stored block hash with RPC node block hash
+- If mismatch detected, reorg has occurred
+
+**2. Parent Hash Validation**
+- Verify each block's parent_hash matches previous block's hash
+- Chain break indicates reorg point
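+
+Parent hash validation can be run against already-indexed data. The sketch below returns the lowest block whose `parent_hash` does not match the stored hash of the block before it, assuming the `blocks` table from `data-models.md` and consistently lowercased hashes.
+
+```sql
+-- First block on chain 138 whose parent link is broken, if any.
+SELECT child.number AS first_broken_block
+FROM blocks AS child
+JOIN blocks AS parent
+  ON parent.chain_id = child.chain_id
+ AND parent.number = child.number - 1
+WHERE child.chain_id = 138
+  AND child.parent_hash <> parent.hash
+ORDER BY child.number
+LIMIT 1;
+```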
+
+**3. Block Height Comparison**
+- Monitor chain head block number
+- Sudden decrease indicates potential reorg
+
+**4. Chain Head Monitoring**
+- Poll `eth_blockNumber` periodically
+- Compare with last indexed block number
+- Detect rollback scenarios
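+
+The hash-mismatch and head-rollback checks can be sketched as follows. This is an illustrative Python sketch, not the indexer's implementation: `first_mismatch` and `head_rolled_back` are hypothetical helpers, and the two dictionaries stand in for hashes read from the database and from the RPC node.
+
+```python
+from typing import Dict, Optional
+
+
+def first_mismatch(stored: Dict[int, str], node: Dict[int, str]) -> Optional[int]:
+    """Return the lowest block number whose stored hash differs from the node's.
+
+    Both arguments map block number -> block hash (hex string). Block numbers
+    the node did not report are skipped. Returns None when nothing differs.
+    """
+    for number in sorted(stored):
+        node_hash = node.get(number)
+        if node_hash is not None and node_hash.lower() != stored[number].lower():
+            return number
+    return None
+
+
+def head_rolled_back(last_indexed: int, node_head: int) -> bool:
+    """True when the node's head is behind the last block we indexed."""
+    return node_head < last_indexed
+```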
+
+### Detection Strategy
+
+**Real-time Monitoring**:
+- Check block hash after each new block ingestion
+- Compare with RPC node's block hash for same block number
+- Immediate detection for recent blocks
+
+**Periodic Validation**:
+- Validate last N blocks (e.g., 100 blocks) every M seconds
+- Detect deep reorgs that may have been missed
+- Background validation job
+
+**Checkpoint Validation**:
+- Validate checkpoint blocks (every 1000 blocks)
+- Ensure checkpoint blocks still exist in canonical chain
+- Detect deep reorgs quickly
+
+## Reorg Handling Flow
+
+```mermaid
+flowchart TB
+    Detect[Detect Reorg]
+    Identify[Identify Reorg Point<br/>Find Common Ancestor]
+    Mark[Mark Blocks as Orphaned]
+    Delete[Delete Orphaned Data]
+    Reindex[Re-index New Chain]
+    Verify[Verify Data Consistency]
+    
+    Detect --> Identify
+    Identify --> Mark
+    Mark --> Delete
+    Delete --> Reindex
+    Reindex --> Verify
+    Verify --> Done[Complete]
+    
+    Verify -->|Inconsistency| Delete
+```
+
+### Step 1: Identify Reorg Point
+
+**Goal**: Find the common ancestor block (last block that's still valid).
+
+**Algorithm**:
+1. Start from the last indexed block
+2. Walk backwards, comparing the stored hash with the RPC node's hash at each block number
+3. The first block whose hashes match is the common ancestor
+4. All blocks after the common ancestor need reorg handling
+
+**Optimization**:
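+- Binary search to find reorg point quickly (valid because stored blocks are linked by parent hash: if block N still matches, all of its ancestors match too)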
+- Start from recent blocks and work backwards
+- Cache last N block hashes for faster comparison
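+
+A minimal sketch of the binary-search variant, assuming the stored chain is internally consistent (each stored block's `parent_hash` equals the hash of the stored block before it). `find_common_ancestor` and the two lookup callables are hypothetical names used only for illustration.
+
+```python
+from typing import Callable
+
+
+def find_common_ancestor(
+    low: int,
+    high: int,
+    stored_hash: Callable[[int], str],
+    node_hash: Callable[[int], str],
+) -> int:
+    """Return the highest block number in [low, high] whose hashes agree.
+
+    `low` should be a block known to be safe (for example a validated
+    checkpoint); `high` is the last indexed block.
+    """
+    if stored_hash(low) != node_hash(low):
+        raise ValueError("no common ancestor inside the search window")
+    while low < high:
+        mid = (low + high + 1) // 2  # round up so the loop always shrinks
+        if stored_hash(mid) == node_hash(mid):
+            low = mid       # mid is still canonical, look higher
+        else:
+            high = mid - 1  # mid was reorged out, look lower
+    return low
+```
+
+Each probe costs one RPC call, so a window of 1000 blocks needs about 10 lookups instead of up to 1000 for a linear walk.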
+
+### Step 2: Mark Blocks as Orphaned
+
+**Strategy**: Mark blocks as orphaned before deletion (for audit trail).
+
+**Database Update**:
+```sql
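+-- Assumes `orphaned` and `orphaned_at` columns exist on `blocks`
+-- (they are not part of the `blocks` schema in data-models.md)
+UPDATE blocks
+SET orphaned = true, orphaned_at = NOW()
+WHERE chain_id = ?
+  AND number > ? -- common ancestor block number
+  AND orphaned = false;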
+```
+
+**Benefits**:
+- Audit trail of reorgs
+- Ability to query orphaned blocks
+- Easier debugging
+
+### Step 3: Delete Orphaned Data
+
+**Cascade Deletion Order**:
+1. Token transfers (depends on transactions)
+2. Logs (depends on transactions)
+3. Traces (depends on transactions)
+4. Internal transactions (depends on transactions)
+5. Transactions (depends on blocks)
+6. Blocks (orphaned blocks)
+
+**Implementation**:
+- Use database transactions for atomicity
+- Cascade deletes via foreign keys (if enabled)
+- Or explicit deletion in correct order
+
+**Performance Considerations**:
+- Batch deletions for large reorgs
+- Index on `block_number` for efficient deletion
+- Consider soft deletes (mark as deleted) vs hard deletes
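+
+The explicit deletion order can be sketched as below. This sketch uses Python's built-in `sqlite3` only so that it is self-contained; the spec targets PostgreSQL. The child table names and the presence of `chain_id` and `block_number` columns on each of them are assumptions made for illustration.
+
+```python
+import sqlite3
+
+# Dependents first, then transactions; blocks are deleted last.
+CHILD_TABLES = [
+    "token_transfers",
+    "logs",
+    "traces",
+    "internal_transactions",
+    "transactions",
+]
+
+
+def delete_orphaned(conn: sqlite3.Connection, chain_id: int, ancestor: int) -> None:
+    """Delete all data above the common ancestor for one chain, atomically."""
+    with conn:  # commits on success, rolls back if any statement raises
+        for table in CHILD_TABLES:
+            # Table names come from the constant list above, never from input.
+            conn.execute(
+                f"DELETE FROM {table} WHERE chain_id = ? AND block_number > ?",
+                (chain_id, ancestor),
+            )
+        conn.execute(
+            "DELETE FROM blocks WHERE chain_id = ? AND number > ?",
+            (chain_id, ancestor),
+        )
+```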
+
+### Step 4: Re-index New Chain
+
+**Process**:
+1. Fetch new blocks from RPC node (from reorg point onward)
+2. Process blocks through normal indexing pipeline
+3. Verify each block before marking as indexed
+
+**Optimization**:
+- Parallel processing for multiple blocks (if safe)
+- Use existing indexer workers
+- Priority queue: reorg blocks before new blocks
+
+### Step 5: Verify Data Consistency
+
+**Validation Checks**:
+- Block hashes match RPC node
+- Transaction counts match
+- Parent hashes form valid chain
+- No gaps in block numbers
+
+**Metrics**:
+- Reorg depth (number of blocks affected)
+- Reorg duration (time to handle)
+- Data consistency verification
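+
+Two of the validation checks above (parent hashes form a valid chain, no gaps in block numbers) can be sketched as a pure function. The `Block` tuple and `chain_problems` are hypothetical names; a real check would also compare hashes and transaction counts against the RPC node.
+
+```python
+from typing import List, NamedTuple
+
+
+class Block(NamedTuple):
+    number: int
+    hash: str
+    parent_hash: str
+
+
+def chain_problems(blocks: List[Block]) -> List[str]:
+    """Return a description of every gap or broken parent link.
+
+    `blocks` must be sorted by block number. An empty list means the run of
+    blocks is contiguous and correctly linked.
+    """
+    problems = []
+    for prev, cur in zip(blocks, blocks[1:]):
+        if cur.number != prev.number + 1:
+            problems.append(f"gap between {prev.number} and {cur.number}")
+        elif cur.parent_hash != prev.hash:
+            problems.append(f"block {cur.number} does not link to block {prev.number}")
+    return problems
+```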
+
+## Data Consistency Guarantees
+
+### Transaction Isolation
+
+**Strategy**: Use database transactions to ensure atomic reorg handling.
+
+**Implementation**:
+```sql
+BEGIN;
+-- Mark blocks as orphaned
+-- Delete orphaned data
+-- Insert new blocks
+-- Verify consistency
+COMMIT;
+```
+
+**Rollback**: If any step fails, roll back the entire operation.
+
+### Idempotency
+
+**Requirement**: Reorg handling must be idempotent (safe to retry).
+
+**Mechanisms**:
+- Check block hash before processing
+- Skip blocks already correctly indexed
+- Use unique constraints to prevent duplicates
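+
+One way to get the "skip blocks already correctly indexed" behaviour is to compute the work list from a hash comparison, so a retry after a partial run only touches what is still wrong. `blocks_to_index` is a hypothetical helper shown as a sketch.
+
+```python
+from typing import Dict, List
+
+
+def blocks_to_index(stored: Dict[int, str], canonical: Dict[int, str]) -> List[int]:
+    """Return block numbers that are missing or stored with a stale hash.
+
+    Both arguments map block number -> block hash. Blocks whose stored hash
+    already equals the canonical hash are skipped.
+    """
+    return sorted(n for n, h in canonical.items() if stored.get(n) != h)
+```
+
+Running the function again after the listed blocks have been re-indexed returns an empty list, which is what makes a retry safe.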
+
+### Finality Considerations
+
+**Reorg Depth Limits**:
+- Only handle reorgs within finality window
+- For PoW: Typically 12-100 blocks deep
+- For PoS: Typically 1-2 epochs (32-64 blocks)
+- For BFT: Typically immediate finality
+
+**Configuration**:
+- Configurable reorg depth limit per chain
+- Skip reorgs beyond finality window (log for investigation)
+
+## Performance Optimization
+
+### Handling Frequent Reorgs
+
+**Problem**: Some chains have frequent small reorgs (1-2 blocks).
+
+**Solution**:
+- Batch reorg handling (wait for stability)
+- Detect reorgs but delay handling for short period
+- Only handle if reorg persists
+
+**Configuration**:
+- Reorg confirmation period: 30 seconds
+- Maximum reorg depth to handle immediately: 5 blocks
+- Deeper reorgs: Manual investigation
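+
+The configuration above can be read as a small decision rule. The sketch below encodes one possible reading (reorgs of up to 5 blocks are handled automatically once they have persisted for the confirmation period, deeper ones are escalated); `Action` and `decide` are hypothetical names.
+
+```python
+from enum import Enum
+
+
+class Action(Enum):
+    WAIT = "wait"          # reorg seen, confirmation period not over yet
+    HANDLE = "handle"      # reorg persisted, run automatic handling
+    ESCALATE = "escalate"  # too deep, flag for manual investigation
+
+
+def decide(
+    depth: int,
+    seconds_since_detected: float,
+    *,
+    immediate_max_depth: int = 5,
+    confirmation_period: float = 30.0,
+) -> Action:
+    """Choose what to do with a detected reorg of the given depth."""
+    if depth > immediate_max_depth:
+        return Action.ESCALATE
+    if seconds_since_detected < confirmation_period:
+        return Action.WAIT
+    return Action.HANDLE
+```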
+
+### Handling Deep Reorgs
+
+**Problem**: Deep reorgs require deleting and re-indexing many blocks.
+
+**Optimization Strategies**:
+1. **Parallel Processing**: Process new chain blocks in parallel
+2. **Batch Operations**: Batch database operations
+3. **Incremental Updates**: Only update changed data
+4. **Selective Deletion**: Only delete affected data (not entire blocks if possible)
+
+### Index Maintenance
+
+**During Reorg**:
+- Pause index updates temporarily
+- Resume after reorg handling complete
+- Rebuild affected indexes if necessary
+
+## Monitoring and Alerting
+
+### Metrics
+
+**Reorg Metrics**:
+- Reorg count (per chain, per time period)
+- Reorg depth distribution
+- Reorg handling duration (p50, p95, p99)
+- Data consistency check results
+
+**Alerting Rules**:
+- Reorg depth > 10 blocks: Warning (investigate)
+- Reorg depth > 100 blocks: Critical (potential chain issue)
+- Reorg handling duration > 5 minutes: Warning
+- Data consistency check failure: Critical
+
+### Logging
+
+**Log Events**:
+- Reorg detection (block number, depth)
+- Reorg point identification (common ancestor)
+- Blocks orphaned (count, block numbers)
+- Re-indexing progress
+- Data consistency verification results
+
+**Log Levels**:
+- INFO: Normal reorgs (< 5 blocks)
+- WARN: Unusual reorgs (5-10 blocks)
+- ERROR: Deep reorgs (> 10 blocks) or failures
+
+## Edge Cases
+
+### Multiple Reorgs in Quick Succession
+
+**Scenario**: Chain reorgs, then reorgs again before first reorg is handled.
+
+**Handling**:
+- Cancel in-progress reorg handling
+- Start new reorg handling from latest state
+- Ensure idempotency
+
+### Reorg During Backfill
+
+**Scenario**: Historical block indexing encounters a reorg.
+
+**Handling**:
+- Pause backfill
+- Handle reorg
+- Resume backfill from reorg point
+
+### Concurrent Reorg Handling
+
+**Prevention**:
+- Use database locks to prevent concurrent reorg handling
+- Single reorg handler per chain
+- Queue reorg events if handler is busy
+
+## Recovery Procedures
+
+### Manual Reorg Handling
+
+**When to Use**:
+- Automatic handling fails
+- Deep reorgs beyond normal limits
+- Data corruption detected
+
+**Procedure**:
+1. Identify reorg point manually
+2. Verify with RPC node
+3. Mark blocks as orphaned
+4. Delete orphaned data
+5. Trigger re-indexing
+6. Verify data consistency
+
+### Data Recovery
+
+**Backup Strategy**:
+- Regular database backups
+- Point-in-time recovery capability
+- Ability to restore to pre-reorg state
+
+**Recovery Steps**:
+1. Restore database to point before reorg
+2. Re-run indexer from that point
+3. Let automatic reorg handling process naturally
+
+## Testing Strategy
+
+### Unit Tests
+
+- Reorg detection logic
+- Common ancestor identification
+- Orphan marking
+- Data deletion logic
+
+### Integration Tests
+
+- Simulate reorgs (testnet with known reorgs)
+- Verify data consistency after reorg
+- Test concurrent reorg handling
+- Test deep reorgs
+
+### Load Tests
+
+- Simulate frequent reorgs
+- Measure performance impact
+- Test reorg handling during high load
+
+## Configuration
+
+### Reorg Handling Configuration
+
+```yaml
+reorg:
+  detection:
+    check_interval_seconds: 10
+    validation_depth: 100
+    checkpoint_interval: 1000
+  
+  handling:
+    max_depth: 100
+    confirmation_period_seconds: 30
+    batch_size: 1000
+    parallel_processing: true
+  
+  finality:
+    chain_138:
+      type: "poa" # or "pos", "pow", "bft"
+      depth_limit: 50
+      finality_blocks: 12
+```
+
+## References
+
+- Indexer Architecture: See `indexer-architecture.md`
+- Data Models: See `data-models.md`
+- Database Schema: See `../database/postgres-schema.md`
+
--- a/docs/specs/indexing/verification-pipeline.md
+++ b/docs/specs/indexing/verification-pipeline.md
@@ -0,0 +1,523 @@
+# Contract Verification Pipeline Specification
+
+## Overview
+
+This document specifies the pipeline for verifying smart contracts on the explorer platform. Contract verification allows users to submit source code, which is compiled and compared against deployed bytecode to enable source code viewing, debugging, and ABI extraction.
+
+## Architecture
+
+```mermaid
+flowchart TB
+    subgraph Submit[Submission]
+        User[User Submits<br/>Source Code]
+        UI[Verification UI]
+        API[Verification API]
+    end
+    
+    subgraph Validate[Validation]
+        Val[Validate Input]
+        Check[Check Contract Exists]
+        Dup[Check Duplicate]
+    end
+    
+    subgraph Compile[Compilation]
+        Comp[Compiler Service]
+        Versions[Compiler Version<br/>Registry]
+        Build[Build Artifacts]
+    end
+    
+    subgraph Verify[Verification]
+        Match[Bytecode Matching]
+        Construct[Constructor Args<br/>Extraction]
+        MatchResult[Match Result]
+    end
+    
+    subgraph Store[Storage]
+        DB[(Database)]
+        Artifacts[Artifact Storage<br/>S3/Immutable]
+        ABI[ABI Registry]
+    end
+    
+    User --> UI
+    UI --> API
+    API --> Val
+    Val --> Check
+    Check --> Dup
+    Dup --> Comp
+    Comp --> Versions
+    Comp --> Build
+    Build --> Match
+    Match --> Construct
+    Construct --> MatchResult
+    MatchResult --> DB
+    MatchResult --> Artifacts
+    MatchResult --> ABI
+```
+
+## Source Code Submission Workflow
+
+### Submission Methods
+
+**1. Standard JSON Input** (Recommended)
+- Submit Solidity compiler's standard JSON input format
+- Includes source files, compiler settings, optimization
+- Most reliable for complex contracts
+
+**2. Multi-file Upload**
+- Upload individual source files
+- Specify compiler version and settings
+- Compiler constructs standard JSON input
+
+**3. Sourcify Integration**
+- Verify via Sourcify API
+- Automatic source code and metadata retrieval
+- Supports verified contracts from Sourcify registry
+
+**4. Flattened Source**
+- Single flattened source file
+- All imports inlined
+- Simpler but less flexible
+
+### Submission API
+
+**Endpoint**: `POST /api/v1/contracts/{address}/verify`
+
+**Request Body**:
+```json
+{
+  "chain_id": 138,
+  "address": "0x...",
+  "compiler_version": "v0.8.19+commit.7dd6d404",
+  "optimization_enabled": true,
+  "optimization_runs": 200,
+  "evm_version": "london",
+  "source_code": "...",
+  "constructor_arguments": "0x...",
+  "library_addresses": {
+    "Lib1": "0x..."
+  },
+  "verification_method": "standard_json"
+}
+```
+
+`standard_json_input` may be supplied in place of `source_code`.
+
+**Response**:
+```json
+{
+  "status": "pending",
+  "verification_id": "uuid",
+  "message": "Verification submitted"
+}
+```
+
+### Input Validation
+
+**Validation Rules**:
+1. **Contract Address**: Must be valid Ethereum address, must exist on chain
+2. **Compiler Version**: Must be supported compiler version
+3. **Source Code**: Must be valid Solidity/Vyper code
+4. **Constructor Arguments**: Must match deployed contract (if provided)
+5. **Library Addresses**: Must match deployed libraries (if provided)
+
+**Error Handling**:
+- Invalid address: 400 Bad Request
+- Unsupported compiler: 400 Bad Request
+- Invalid source code: 400 Bad Request
+- Contract not found: 404 Not Found
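+
+The syntactic part of these rules (the checks that map to 400 responses) can be sketched as below. Field names follow the request body above; `validate_submission` is a hypothetical helper, and on-chain existence of the contract (the 404 case) is deliberately left out because it needs an RPC call.
+
+```python
+import re
+from typing import List, Set
+
+ADDRESS_RE = re.compile(r"0x[0-9a-fA-F]{40}")
+HEX_RE = re.compile(r"0x(?:[0-9a-fA-F]{2})*")
+
+
+def validate_submission(body: dict, supported_versions: Set[str]) -> List[str]:
+    """Return a list of validation errors; an empty list means the input passes."""
+    errors = []
+    if not ADDRESS_RE.fullmatch(str(body.get("address", ""))):
+        errors.append("invalid address")
+    if body.get("compiler_version") not in supported_versions:
+        errors.append("unsupported compiler")
+    if not body.get("source_code") and not body.get("standard_json_input"):
+        errors.append("missing source code")
+    args = body.get("constructor_arguments")
+    if args is not None and not HEX_RE.fullmatch(str(args)):
+        errors.append("constructor_arguments must be 0x-prefixed hex")
+    return errors
+```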
+
+## Compiler Version Management
+
+### Compiler Registry
+
+**Purpose**: Manage available compiler versions and their metadata.
+
+**Storage**:
+```sql
+compiler_versions (
+    id SERIAL PRIMARY KEY,
+    version VARCHAR(50) UNIQUE NOT NULL,
+    compiler_type VARCHAR(20) NOT NULL, -- 'solidity', 'vyper'
+    evm_version VARCHAR(20),
+    optimizer_available BOOLEAN DEFAULT true,
+    download_url TEXT,
+    checksum VARCHAR(64),
+    installed BOOLEAN DEFAULT false,
+    installed_path TEXT,
+    created_at TIMESTAMP DEFAULT NOW()
+)
+```
+
+### Compiler Installation
+
+**Methods**:
+1. **Pre-installed**: Common versions pre-installed on compilation servers
+2. **On-demand**: Download and install when needed
+3. **Docker**: Use compiler Docker images (isolated, reproducible)
+
+**Recommended**: Docker-based compilation for isolation and reproducibility.
+
+**Docker Setup**:
+```dockerfile
+FROM ethereum/solc:0.8.19
+# Or use solc-select for version management
+```
+
+### Version Selection
+
+**Strategy**:
+- Exact match: User specifies exact version
+- Pragma matching: Extract version from source code pragma
+- Latest compatible: Use latest compatible version if exact not available
+
+**Pragma Parsing**:
+- Extract `pragma solidity ^0.8.0;` or `>=0.8.0 <0.9.0`
+- Resolve to specific compiler version
+- Handle caret (^), tilde (~), and range operators
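+
+A minimal sketch of pragma resolution, assuming installed versions are plain `major.minor.patch` strings and that the pragma uses only the operators listed above with full three-part versions (no `||`, no wildcards). All function names are hypothetical.
+
+```python
+import re
+from typing import List, Optional, Tuple
+
+Version = Tuple[int, int, int]
+CONSTRAINT_RE = re.compile(r"(\^|~|>=|<=|>|<|=)?\s*(\d+\.\d+\.\d+)")
+
+
+def parse_version(text: str) -> Version:
+    major, minor, patch = (int(part) for part in text.split("."))
+    return (major, minor, patch)
+
+
+def caret_upper(v: Version) -> Version:
+    """Exclusive upper bound of ^v: bump the leftmost non-zero component."""
+    if v[0] > 0:
+        return (v[0] + 1, 0, 0)
+    if v[1] > 0:
+        return (0, v[1] + 1, 0)
+    return (0, 0, v[2] + 1)
+
+
+def satisfies(version: Version, expression: str) -> bool:
+    """True when `version` meets every constraint in the pragma expression."""
+    constraints = CONSTRAINT_RE.findall(expression)
+    if not constraints:
+        return False
+    for op, text in constraints:
+        bound = parse_version(text)
+        if op == "^":
+            ok = bound <= version < caret_upper(bound)
+        elif op == "~":
+            ok = bound <= version < (bound[0], bound[1] + 1, 0)
+        elif op == ">=":
+            ok = version >= bound
+        elif op == "<=":
+            ok = version <= bound
+        elif op == ">":
+            ok = version > bound
+        elif op == "<":
+            ok = version < bound
+        else:  # bare version or "="
+            ok = version == bound
+        if not ok:
+            return False
+    return True
+
+
+def resolve(source: str, installed: List[str]) -> Optional[str]:
+    """Return the highest installed version allowed by the source's pragma."""
+    match = re.search(r"pragma\s+solidity\s+([^;]+);", source)
+    if match is None:
+        return None
+    candidates = [v for v in installed if satisfies(parse_version(v), match.group(1))]
+    return max(candidates, key=parse_version, default=None)
+```
+
+Picking the highest matching version is only a starting point: different compiler versions generally emit different bytecode, so a mismatch may simply mean another version in the allowed range was used at deployment.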
+
+## Compilation Process
+
+### Standard JSON Input Format
+
+**Structure**:
+```json
+{
+  "language": "Solidity",
+  "sources": {
+    "Contract.sol": {
+      "content": "pragma solidity ^0.8.0; ..."
+    }
+  },
+  "settings": {
+    "optimizer": {
+      "enabled": true,
+      "runs": 200
+    },
+    "evmVersion": "london",
+    "outputSelection": {
+      "*": {
+        "*": ["abi", "evm.bytecode", "evm.deployedBytecode"]
+      }
+    }
+  }
+}
+```
+
+### Compilation Steps
+
+1. **Prepare Input**: Construct standard JSON input from user submission
+2. **Select Compiler**: Choose appropriate compiler version
+3. **Resolve Imports**: Handle import statements (local files, external URLs)
+4. **Compile**: Execute compiler with standard JSON input
+5. **Extract Artifacts**: Extract ABI, bytecode, deployed bytecode
+6. **Handle Errors**: Parse compilation errors and return to user
+
+### Import Resolution
+
+**Import Types**:
+- **Local Files**: Included in submission
+- **External URLs**: Fetch from URL (GitHub, IPFS, etc.)
+- **Standard Libraries**: Known library packages (OpenZeppelin, etc.)
+
+**Resolution Strategy**:
+1. Check local files first
+2. Try external URL fetching
+3. Check standard library registry
+4. Fail if cannot resolve
+
+### Optimization Settings
+
+**Optimizer Configuration**:
+- **Enabled**: Boolean flag
+- **Runs**: Optimization runs (affects bytecode size vs gas cost)
+- **EVM Version**: Target EVM version (affects bytecode generation)
+
+**Matching Strategy**:
+- Must match deployed contract's optimization settings exactly
+- Try multiple optimization combinations if initial match fails
+
+## Bytecode Matching
+
+### Matching Process
+
+**Goal**: Compare compiled bytecode with deployed bytecode.
+
+**Steps**:
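+1. Fetch the deployed (runtime) bytecode from chain via `eth_getCode(address)`
+2. Take the runtime bytecode (`evm.deployedBytecode`) from the compilation artifacts
+3. Compare the two after normalization; the remaining bytes must match exactly
+4. Handle constructor arguments separately: they are appended to the creation code in the deployment transaction and do not appear in the runtime bytecode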
+
+### Bytecode Normalization
+
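+**Normalization Steps**:
+1. Remove the metadata trailer (the last 53 bytes for the default `ipfs` + `solc` layout; the final two bytes encode the length of the CBOR data before them, so other layouts can be handled by reading that length)
+2. When comparing creation bytecode, remove the appended constructor arguments
+3. Compare remaining bytecode
+
+**Metadata Hash**:
+- Solidity appends a CBOR-encoded metadata section, which contains the metadata hash, to the bytecode
+- Default format: `a2646970667358221220...` (10-byte prefix) + 43 bytes, 53 bytes in total including the 2-byte length suffix
+- Should be excluded from comparison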
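+
+A length-based way to strip the trailer is sketched below. It relies on the Solidity convention that the final two bytes are the big-endian length of the CBOR section preceding them; `strip_metadata` and `runtime_matches` are hypothetical helpers, and the CBOR-map check is a heuristic guard rather than a full CBOR parse.
+
+```python
+def strip_metadata(code: bytes) -> bytes:
+    """Return `code` without its trailing CBOR metadata section, if one is found."""
+    if len(code) < 3:
+        return code
+    cbor_len = int.from_bytes(code[-2:], "big")
+    start = len(code) - 2 - cbor_len
+    if cbor_len == 0 or start < 0:
+        return code  # length suffix is not plausible
+    if (code[start] & 0xE0) != 0xA0:
+        return code  # section does not begin with a CBOR map header
+    return code[:start]
+
+
+def runtime_matches(deployed: bytes, compiled: bytes) -> bool:
+    """Compare two runtime bytecodes while ignoring their metadata trailers."""
+    return strip_metadata(deployed) == strip_metadata(compiled)
+```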
+
+### Constructor Arguments Extraction
+
+**Purpose**: Recover the constructor arguments used at deployment.
+
+**Process**:
+1. Creation transaction input: `creation_code + constructor_args`
+2. Deployed bytecode: `runtime_code` (contains no constructor args)
+3. Extract constructor args: the bytes of the creation transaction input that follow the first `creation_code.length` bytes
+
+**Validation**:
+- Verify extracted constructor args match user-provided args (if provided)
+- Decode constructor args if ABI available
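+
+Extraction amounts to slicing the creation transaction input after the compiled creation code. The sketch below assumes the compiled creation bytecode is an exact prefix of that input (in practice the metadata trailer inside the creation code may need normalizing first); `extract_constructor_args` is a hypothetical helper.
+
+```python
+def extract_constructor_args(creation_input: bytes, compiled_creation: bytes) -> bytes:
+    """Return the ABI-encoded constructor arguments appended to the creation code."""
+    if not creation_input.startswith(compiled_creation):
+        raise ValueError("creation input does not start with the compiled creation code")
+    return creation_input[len(compiled_creation):]
+```
+
+For a constructor taking a single `uint256`, the returned value is 32 bytes and `int.from_bytes(args, "big")` recovers the number.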
+
+### Library Linking
+
+**Problem**: Contracts using libraries have placeholders in bytecode.
+
+**Solution**:
+1. Identify library placeholders in compiled bytecode
+2. Replace placeholders with actual library addresses
+3. Compare linked bytecode with deployed bytecode
+
+**Library Placeholder Format**:
+- `__$...$__` (Solidity)
+- Must match user-provided library addresses
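+
+Linking can be sketched as a placeholder substitution over the hex bytecode. The sketch assumes the 40-character `__$<34 hex chars>$__` placeholder form and a caller-supplied mapping from the 34-character placeholder hash to a library address; deriving that hash from a library name requires Keccak-256, which is not in the Python standard library and is left out here. `link` is a hypothetical helper.
+
+```python
+import re
+from typing import Dict
+
+PLACEHOLDER_RE = re.compile(r"__\$([0-9a-f]{34})\$__")
+ADDRESS_RE = re.compile(r"0x[0-9a-fA-F]{40}")
+
+
+def link(bytecode_hex: str, addresses_by_hash: Dict[str, str]) -> str:
+    """Replace every library placeholder with the matching 20-byte address."""
+
+    def substitute(match: "re.Match[str]") -> str:
+        key = match.group(1)
+        address = addresses_by_hash.get(key)
+        if address is None:
+            raise KeyError(f"no library address supplied for placeholder {key}")
+        if not ADDRESS_RE.fullmatch(address):
+            raise ValueError(f"invalid library address: {address}")
+        return address[2:].lower()  # 40 hex chars, same width as the placeholder
+
+    return PLACEHOLDER_RE.sub(substitute, bytecode_hex)
+```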
+
+## Verification Status Tracking
+
+### Status States
+
+**States**:
+1. **pending**: Verification submitted, queued for processing
+2. **processing**: Compilation/verification in progress
+3. **verified**: Bytecode matches, contract verified
+4. **failed**: Verification failed (mismatch, compilation error, etc.)
+5. **partially_verified**: Some source files verified (multi-file contracts)
+
+### Status Updates
+
+**Database Schema**:
+```sql
+contract_verifications (
+    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
+    chain_id INTEGER NOT NULL,
+    address VARCHAR(42) NOT NULL,
+    status VARCHAR(20) NOT NULL,
+    compiler_version VARCHAR(50),
+    optimization_enabled BOOLEAN,
+    optimization_runs INTEGER,
+    evm_version VARCHAR(20),
+    source_code TEXT,
+    abi JSONB,
+    constructor_arguments TEXT,
+    verification_method VARCHAR(50),
+    error_message TEXT,
+    verified_at TIMESTAMP,
+    created_at TIMESTAMP DEFAULT NOW(),
+    updated_at TIMESTAMP DEFAULT NOW(),
+    FOREIGN KEY (chain_id, address) REFERENCES contracts(chain_id, address)
+)
+```
+
+**Status Transitions**:
+- `pending` → `processing` → `verified` or `failed`
+- Webhook/notification on status change (optional)
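+
+The transitions can be enforced with a small lookup table. In this sketch `partially_verified` is treated as a third outcome of `processing`, which is an assumption: the transition list above names only `verified` and `failed`. `transition` is a hypothetical helper.
+
+```python
+ALLOWED_TRANSITIONS = {
+    "pending": {"processing"},
+    "processing": {"verified", "partially_verified", "failed"},
+    "verified": set(),
+    "partially_verified": set(),
+    "failed": set(),
+}
+
+
+def transition(current: str, new: str) -> str:
+    """Return `new` if the status change is allowed, otherwise raise ValueError."""
+    if new not in ALLOWED_TRANSITIONS.get(current, set()):
+        raise ValueError(f"illegal status transition: {current} -> {new}")
+    return new
+```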
+
+## Build Artifact Storage
+
+### Artifact Types
+
+**Artifacts to Store**:
+1. **Source Code**: Original submitted source files
+2. **Standard JSON Input**: Compiler input
+3. **Compiler Output**: Full compiler JSON output
+4. **ABI**: Extracted ABI
+5. **Bytecode**: Creation and runtime bytecode
+6. **Metadata**: Compiler metadata
+
+### Storage Strategy
+
+**Immutable Storage**:
+- Use S3-compatible storage (AWS S3, MinIO, etc.)
+- Immutable after verification (no updates)
+- Versioned storage if updates needed
+
+**Storage Path Structure**:
+```
+contracts/{chain_id}/{address}/verification_{id}/
+  - source_code.sol
+  - standard_json_input.json
+  - compiler_output.json
+  - abi.json
+  - bytecode.txt
+  - metadata.json
+```
+
+**Database Reference**:
+- Store artifact storage path in database
+- Link to contract record
+
+### Access Control
+
+**Public Access**:
+- Verified contracts: Public read access
+- Source code: Public read access
+- Artifacts: Public read access
+
+**Private Access**:
+- Pending verifications: Owner only
+- Failed verifications: Owner only (optional public)
+
+## Sourcify Integration
+
+### Sourcify API
+
+**Endpoint**: `GET /api/v1/verify/{chain_id}/{address}`
+
+**Process**:
+1. Query Sourcify API for contract verification
+2. Retrieve source files and metadata
+3. Verify match with deployed bytecode
+4. Store in our database if match
+
+**Benefits**:
+- Leverage existing verified contracts
+- Automatic verification for popular contracts
+- Reduces manual verification workload
+
+### Sourcify Format
+
+**Structure**:
+```
+contracts/
+  - {chain_id}/
+    - {address}/
+      - metadata.json
+      - sources/
+        - Contract.sol
+```
+
+**Metadata Format**:
+- Compiler version
+- Settings
+- Source file mapping
+
+## Multi-Compiler Version Support
+
+### Supported Compilers
+
+**Solidity**:
+- Versions: 0.4.x through latest
+- Multiple versions per contract (updates)
+
+**Vyper**:
+- Versions: 0.1.x through latest
+- Similar workflow to Solidity
+
+### Version Compatibility
+
+**Handling**:
+- Support multiple verification attempts with different versions
+- Store all verification attempts (history)
+- Mark latest successful verification as active
+
+**Database Schema**:
+```sql
+contract_verifications (
+    -- ... fields ...
+    version INTEGER DEFAULT 1, -- Increment for each new verification
+    is_active BOOLEAN DEFAULT true -- Latest successful verification
+)
+```
+
+## Error Handling
+
+### Compilation Errors
+
+**Error Types**:
+- Syntax errors
+- Type errors
+- Import resolution errors
+- Optimization errors
+
+**Response**:
+- Return detailed error messages to user
+- Include file and line number
+- Suggest fixes when possible
+
+### Verification Failures
+
+**Failure Reasons**:
+- Bytecode mismatch
+- Constructor arguments mismatch
+- Library address mismatch
+- Optimization settings mismatch
+
+**Response**:
+- Return specific mismatch reason
+- Suggest correct settings if possible
+- Allow retry with corrected input
+
+## Performance Considerations
+
+### Compilation Performance
+
+**Optimization**:
+- Cache compilation results (same source + settings)
+- Parallel compilation for multiple contracts
+- Compiler server pool for load distribution
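+
+Caching by "same source + settings" needs a deterministic key. One option, sketched here with the hypothetical helper `cache_key`, is to hash the compiler version together with a canonical serialization of the standard JSON input.
+
+```python
+import hashlib
+import json
+
+
+def cache_key(compiler_version: str, standard_json_input: dict) -> str:
+    """Return a SHA-256 hex digest identifying one compilation request."""
+    # sort_keys and fixed separators make the serialization independent of
+    # dictionary ordering and whitespace.
+    canonical = json.dumps(standard_json_input, sort_keys=True, separators=(",", ":"))
+    return hashlib.sha256(f"{compiler_version}\n{canonical}".encode("utf-8")).hexdigest()
+```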
+
+### Queue Management
+
+**Queue System**:
+- Use message queue (RabbitMQ, Kafka) for verification jobs
+- Priority queue: User submissions before automated checks
+- Rate limiting per user/IP
+
+**Processing Time**:
+- Target: < 30 seconds for simple contracts
+- Target: < 5 minutes for complex contracts
+- Timeout: 10 minutes maximum
+
+## Security Considerations
+
+### Source Code Validation
+
+**Validation**:
+- Validate source code size (max 10MB)
+- Sanitize input to prevent injection attacks
+- Validate compiler version (whitelist known versions)
+
+### Artifact Storage Security
+
+**Access Control**:
+- Verify ownership before allowing updates
+- Audit log all verification submissions
+- Rate limit submissions per user/IP
+
+## API Endpoints
+
+### Submit Verification
+
+`POST /api/v1/contracts/{address}/verify`
+
+### Check Status
+
+`GET /api/v1/contracts/{address}/verification/{verification_id}`
+
+### Get Verified Contract
+
+`GET /api/v1/contracts/{address}`
+
+### List Verification History
+
+`GET /api/v1/contracts/{address}/verifications`
+
+## References
+
+- Indexer Architecture: See `indexer-architecture.md`
+- Data Models: See `data-models.md`
+- Database Schema: See `../database/postgres-schema.md`
+- API Specification: See `../api/rest-api.md`
+