Add full monorepo: virtual-banker, backend, frontend, docs, scripts, deployment

Co-authored-by: Cursor <cursoragent@cursor.com>
defiQUG
2026-02-10 11:32:49 -08:00
parent aafcd913c2
commit 88bc76da91
815 changed files with 125522 additions and 264 deletions


@@ -0,0 +1,566 @@
# Data Models Specification
## Overview
This document specifies the data models used throughout the indexing pipeline and stored in the database. All chain-scoped models support multi-chain operation via a `chain_id` field; the `users` table is the only model without one.
## Core Data Models
### Block Schema
**Table**: `blocks`
**Fields**:
```sql
blocks (
id BIGSERIAL PRIMARY KEY,
chain_id INTEGER NOT NULL,
number BIGINT NOT NULL,
hash VARCHAR(66) NOT NULL,
parent_hash VARCHAR(66) NOT NULL,
nonce VARCHAR(18),
sha3_uncles VARCHAR(66),
logs_bloom TEXT,
transactions_root VARCHAR(66),
state_root VARCHAR(66),
receipts_root VARCHAR(66),
miner VARCHAR(42),
difficulty NUMERIC,
total_difficulty NUMERIC,
size BIGINT,
extra_data TEXT,
gas_limit BIGINT,
gas_used BIGINT,
timestamp TIMESTAMP NOT NULL,
transaction_count INTEGER DEFAULT 0,
base_fee_per_gas BIGINT, -- EIP-1559
created_at TIMESTAMP DEFAULT NOW(),
updated_at TIMESTAMP DEFAULT NOW(),
UNIQUE(chain_id, number),
UNIQUE(chain_id, hash)
)
```
**Indexes**:
- `idx_blocks_chain_number` ON (chain_id, number)
- `idx_blocks_chain_hash` ON (chain_id, hash)
- `idx_blocks_chain_timestamp` ON (chain_id, timestamp)
**Relationships**:
- One-to-many with `transactions`
- One-to-many with `logs`
### Transaction Schema
**Table**: `transactions`
**Fields**:
```sql
transactions (
id BIGSERIAL PRIMARY KEY,
chain_id INTEGER NOT NULL,
hash VARCHAR(66) NOT NULL,
block_number BIGINT NOT NULL,
block_hash VARCHAR(66) NOT NULL,
transaction_index INTEGER NOT NULL,
from_address VARCHAR(42) NOT NULL,
to_address VARCHAR(42), -- NULL for contract creation
value NUMERIC NOT NULL DEFAULT 0,
gas_price BIGINT,
max_fee_per_gas BIGINT, -- EIP-1559
max_priority_fee_per_gas BIGINT, -- EIP-1559
gas_limit BIGINT NOT NULL,
gas_used BIGINT,
nonce BIGINT NOT NULL,
input_data TEXT, -- Contract call data
status INTEGER, -- 0 = failed, 1 = success
contract_address VARCHAR(42), -- NULL if not contract creation
cumulative_gas_used BIGINT,
effective_gas_price BIGINT, -- Actual gas price paid
created_at TIMESTAMP DEFAULT NOW(),
updated_at TIMESTAMP DEFAULT NOW(),
FOREIGN KEY (chain_id, block_number) REFERENCES blocks(chain_id, number),
UNIQUE(chain_id, hash)
)
```
**Indexes**:
- `idx_transactions_chain_hash` ON (chain_id, hash)
- `idx_transactions_chain_block` ON (chain_id, block_number, transaction_index)
- `idx_transactions_chain_from` ON (chain_id, from_address)
- `idx_transactions_chain_to` ON (chain_id, to_address)
- `idx_transactions_chain_block_from` ON (chain_id, block_number, from_address)
**Relationships**:
- Many-to-one with `blocks`
- One-to-many with `logs`
- One-to-many with `internal_transactions`
- One-to-many with `token_transfers`
### Receipt Schema
**Note**: Receipt data is stored denormalized in the `transactions` table for efficiency. If separate storage is needed:
**Table**: `transaction_receipts`
```sql
transaction_receipts (
id BIGSERIAL PRIMARY KEY,
chain_id INTEGER NOT NULL,
transaction_hash VARCHAR(66) NOT NULL,
transaction_index INTEGER NOT NULL,
block_number BIGINT NOT NULL,
block_hash VARCHAR(66) NOT NULL,
from_address VARCHAR(42) NOT NULL,
to_address VARCHAR(42),
gas_used BIGINT,
cumulative_gas_used BIGINT,
contract_address VARCHAR(42),
logs_bloom TEXT,
status INTEGER,
root VARCHAR(66), -- Pre-Byzantium
created_at TIMESTAMP DEFAULT NOW(),
FOREIGN KEY (chain_id, transaction_hash) REFERENCES transactions(chain_id, hash),
UNIQUE(chain_id, transaction_hash)
)
```
### Log Schema
**Table**: `logs`
**Fields**:
```sql
logs (
id BIGSERIAL PRIMARY KEY,
chain_id INTEGER NOT NULL,
transaction_hash VARCHAR(66) NOT NULL,
block_number BIGINT NOT NULL,
block_hash VARCHAR(66) NOT NULL,
log_index INTEGER NOT NULL,
address VARCHAR(42) NOT NULL,
topic0 VARCHAR(66), -- Event signature
topic1 VARCHAR(66), -- First indexed parameter
topic2 VARCHAR(66), -- Second indexed parameter
topic3 VARCHAR(66), -- Third indexed parameter
data TEXT, -- Non-indexed parameters
decoded_data JSONB, -- Decoded event data (if ABI available)
created_at TIMESTAMP DEFAULT NOW(),
FOREIGN KEY (chain_id, transaction_hash) REFERENCES transactions(chain_id, hash),
UNIQUE(chain_id, transaction_hash, log_index)
)
```
**Indexes**:
- `idx_logs_chain_tx` ON (chain_id, transaction_hash)
- `idx_logs_chain_address` ON (chain_id, address)
- `idx_logs_chain_topic0` ON (chain_id, topic0)
- `idx_logs_chain_block` ON (chain_id, block_number)
- `idx_logs_chain_address_topic0` ON (chain_id, address, topic0) -- For event filtering
**Relationships**:
- Many-to-one with `transactions`
### Trace Schema
**Table**: `traces`
**Fields**:
```sql
traces (
id BIGSERIAL PRIMARY KEY,
chain_id INTEGER NOT NULL,
transaction_hash VARCHAR(66) NOT NULL,
block_number BIGINT NOT NULL,
block_hash VARCHAR(66) NOT NULL,
trace_address INTEGER[], -- Array representing call hierarchy [0,1,2]
subtraces INTEGER, -- Number of child calls
action_type VARCHAR(20) NOT NULL, -- 'call', 'create', 'suicide', 'delegatecall'
action_from VARCHAR(42),
action_to VARCHAR(42),
action_value NUMERIC DEFAULT 0,
action_input TEXT,
action_gas BIGINT,
action_call_type VARCHAR(20), -- 'call', 'delegatecall', 'staticcall'
result_type VARCHAR(20), -- 'callresult', 'createresult'
result_gas_used BIGINT,
result_output TEXT,
result_address VARCHAR(42), -- For create results
result_code TEXT, -- For create results
error TEXT, -- Error message if trace failed
created_at TIMESTAMP DEFAULT NOW(),
FOREIGN KEY (chain_id, transaction_hash) REFERENCES transactions(chain_id, hash)
)
```
**Indexes**:
- `idx_traces_chain_tx` ON (chain_id, transaction_hash)
- `idx_traces_chain_block` ON (chain_id, block_number)
- `idx_traces_chain_from` ON (chain_id, action_from)
- `idx_traces_chain_to` ON (chain_id, action_to)
**Note**: Trace data can be large. Consider partitioning or separate storage for historical traces.
### Internal Transaction Schema
**Table**: `internal_transactions`
**Purpose**: Track value transfers that occur within transactions (via calls).
**Fields**:
```sql
internal_transactions (
id BIGSERIAL PRIMARY KEY,
chain_id INTEGER NOT NULL,
transaction_hash VARCHAR(66) NOT NULL,
block_number BIGINT NOT NULL,
trace_address INTEGER[] NOT NULL,
from_address VARCHAR(42) NOT NULL,
to_address VARCHAR(42) NOT NULL,
value NUMERIC NOT NULL,
call_type VARCHAR(20), -- 'call', 'delegatecall', 'staticcall', 'create'
gas_limit BIGINT,
gas_used BIGINT,
input_data TEXT,
output_data TEXT,
error TEXT,
created_at TIMESTAMP DEFAULT NOW(),
FOREIGN KEY (chain_id, transaction_hash) REFERENCES transactions(chain_id, hash)
)
```
**Indexes**:
- `idx_internal_tx_chain_tx` ON (chain_id, transaction_hash)
- `idx_internal_tx_chain_from` ON (chain_id, from_address)
- `idx_internal_tx_chain_to` ON (chain_id, to_address)
- `idx_internal_tx_chain_block` ON (chain_id, block_number)
**Relationships**:
- Many-to-one with `transactions`
### Token Transfer Schema
**Table**: `token_transfers`
**Purpose**: Track ERC-20, ERC-721, and ERC-1155 token transfers.
**Fields**:
```sql
token_transfers (
id BIGSERIAL PRIMARY KEY,
chain_id INTEGER NOT NULL,
transaction_hash VARCHAR(66) NOT NULL,
block_number BIGINT NOT NULL,
log_index INTEGER NOT NULL,
token_address VARCHAR(42) NOT NULL,
token_type VARCHAR(10) NOT NULL, -- 'ERC20', 'ERC721', 'ERC1155'
from_address VARCHAR(42) NOT NULL,
to_address VARCHAR(42) NOT NULL,
amount NUMERIC, -- For ERC-20 and ERC-1155
token_id VARCHAR(78), -- For ERC-721 and ERC-1155 (can be large)
operator VARCHAR(42), -- For ERC-1155
created_at TIMESTAMP DEFAULT NOW(),
FOREIGN KEY (chain_id, transaction_hash) REFERENCES transactions(chain_id, hash),
FOREIGN KEY (chain_id, token_address) REFERENCES tokens(chain_id, address)
)
```
**Indexes**:
- `idx_token_transfers_chain_token` ON (chain_id, token_address)
- `idx_token_transfers_chain_from` ON (chain_id, from_address)
- `idx_token_transfers_chain_to` ON (chain_id, to_address)
- `idx_token_transfers_chain_tx` ON (chain_id, transaction_hash)
- `idx_token_transfers_chain_block` ON (chain_id, block_number)
- `idx_token_transfers_chain_token_from` ON (chain_id, token_address, from_address)
- `idx_token_transfers_chain_token_to` ON (chain_id, token_address, to_address)
**Relationships**:
- Many-to-one with `transactions`
- Many-to-one with `tokens`
### Token Schema
**Table**: `tokens`
**Purpose**: Store token metadata (ERC-20, ERC-721, ERC-1155).
**Fields**:
```sql
tokens (
id BIGSERIAL PRIMARY KEY,
chain_id INTEGER NOT NULL,
address VARCHAR(42) NOT NULL,
type VARCHAR(10) NOT NULL, -- 'ERC20', 'ERC721', 'ERC1155'
name VARCHAR(255),
symbol VARCHAR(50),
decimals INTEGER, -- For ERC-20
total_supply NUMERIC,
holder_count INTEGER DEFAULT 0,
transfer_count INTEGER DEFAULT 0,
logo_url TEXT,
website_url TEXT,
description TEXT,
verified BOOLEAN DEFAULT false,
created_at TIMESTAMP DEFAULT NOW(),
updated_at TIMESTAMP DEFAULT NOW(),
UNIQUE(chain_id, address)
)
```
**Indexes**:
- `idx_tokens_chain_address` ON (chain_id, address)
- `idx_tokens_chain_type` ON (chain_id, type)
- `idx_tokens_chain_symbol` ON (chain_id, symbol) -- For search
**Relationships**:
- One-to-many with `token_transfers`
- One-to-many with `token_holders` (if maintained)
### Contract Metadata Schema
**Table**: `contracts`
**Purpose**: Store verified contract information.
**Fields**:
```sql
contracts (
id BIGSERIAL PRIMARY KEY,
chain_id INTEGER NOT NULL,
address VARCHAR(42) NOT NULL,
name VARCHAR(255),
compiler_version VARCHAR(50),
optimization_enabled BOOLEAN,
optimization_runs INTEGER,
evm_version VARCHAR(20),
source_code TEXT,
abi JSONB,
constructor_arguments TEXT,
verification_status VARCHAR(20) NOT NULL, -- 'pending', 'verified', 'failed'
verified_at TIMESTAMP,
verification_method VARCHAR(50), -- 'standard_json', 'sourcify', 'multi_file'
license VARCHAR(50),
created_at TIMESTAMP DEFAULT NOW(),
updated_at TIMESTAMP DEFAULT NOW(),
UNIQUE(chain_id, address)
)
```
**Indexes**:
- `idx_contracts_chain_address` ON (chain_id, address)
- `idx_contracts_chain_verified` ON (chain_id, verification_status)
**Relationships**:
- One-to-one with `contract_abis` (if separate ABI storage)
### Contract ABI Schema
**Table**: `contract_abis`
**Purpose**: Store contract ABIs for decoding (can be separate from verification).
**Fields**:
```sql
contract_abis (
id BIGSERIAL PRIMARY KEY,
chain_id INTEGER NOT NULL,
address VARCHAR(42) NOT NULL,
abi JSONB NOT NULL,
source VARCHAR(50) NOT NULL, -- 'verification', 'sourcify', 'public', 'user_submitted'
verified BOOLEAN DEFAULT false,
created_at TIMESTAMP DEFAULT NOW(),
updated_at TIMESTAMP DEFAULT NOW(),
UNIQUE(chain_id, address)
)
```
**Indexes**:
- `idx_abis_chain_address` ON (chain_id, address)
## Address-Related Models
### Address Labels Schema
**Table**: `address_labels`
**Purpose**: User-defined and public labels for addresses.
**Fields**:
```sql
address_labels (
id BIGSERIAL PRIMARY KEY,
chain_id INTEGER NOT NULL,
address VARCHAR(42) NOT NULL,
label VARCHAR(255) NOT NULL,
label_type VARCHAR(20) NOT NULL, -- 'user', 'public', 'contract_name'
user_id UUID, -- NULL for public labels
source VARCHAR(50), -- 'user', 'etherscan', 'blockscout', etc.
created_at TIMESTAMP DEFAULT NOW(),
updated_at TIMESTAMP DEFAULT NOW(),
UNIQUE(chain_id, address, label_type, user_id) -- PostgreSQL treats NULL user_id values as distinct, so this does not deduplicate public labels
)
```
**Indexes**:
- `idx_labels_chain_address` ON (chain_id, address)
- `idx_labels_chain_user` ON (chain_id, user_id)
### Address Tags Schema
**Table**: `address_tags`
**Purpose**: Categorize addresses (e.g., "exchange", "defi", "wallet").
**Fields**:
```sql
address_tags (
id BIGSERIAL PRIMARY KEY,
chain_id INTEGER NOT NULL,
address VARCHAR(42) NOT NULL,
tag VARCHAR(50) NOT NULL,
tag_type VARCHAR(20) NOT NULL, -- 'category', 'risk', 'protocol'
user_id UUID, -- NULL for public tags
created_at TIMESTAMP DEFAULT NOW(),
UNIQUE(chain_id, address, tag, user_id) -- PostgreSQL treats NULL user_id values as distinct, so this does not deduplicate public tags
)
```
**Indexes**:
- `idx_tags_chain_address` ON (chain_id, address)
- `idx_tags_chain_tag` ON (chain_id, tag)
## User-Related Models
### User Accounts Schema
**Table**: `users`
**Purpose**: User accounts for watchlists, alerts, preferences.
**Fields**:
```sql
users (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
email VARCHAR(255) UNIQUE,
username VARCHAR(100) UNIQUE,
password_hash TEXT, -- If using password auth
api_key_hash TEXT, -- Hashed API key
created_at TIMESTAMP DEFAULT NOW(),
updated_at TIMESTAMP DEFAULT NOW(),
last_login_at TIMESTAMP
)
```
**Indexes**:
- `idx_users_email` ON (email)
- `idx_users_username` ON (username)
### Watchlists Schema
**Table**: `watchlists`
**Purpose**: User-defined lists of addresses to monitor.
**Fields**:
```sql
watchlists (
id BIGSERIAL PRIMARY KEY,
user_id UUID NOT NULL,
chain_id INTEGER NOT NULL,
address VARCHAR(42) NOT NULL,
label VARCHAR(255),
created_at TIMESTAMP DEFAULT NOW(),
FOREIGN KEY (user_id) REFERENCES users(id) ON DELETE CASCADE,
UNIQUE(user_id, chain_id, address)
)
```
**Indexes**:
- `idx_watchlists_user` ON (user_id)
- `idx_watchlists_chain_address` ON (chain_id, address)
## Data Type Definitions
### Numeric Types
- **BIGINT**: Used for block numbers, gas values, nonces (64-bit integers)
- **NUMERIC**: Used for token amounts, ETH values (arbitrary precision decimals)
- Precision: 78 digits (sufficient for wei)
- Scale: 0 (integers) or configurable for token decimals
### Address Types
- **VARCHAR(42)**: Ethereum addresses (0x + 40 hex chars)
- Normalize to lowercase for consistency
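A minimal sketch of the lowercase normalization, assuming addresses arrive as `0x`-prefixed hex strings. The helper name `normalize_address` is hypothetical and not part of this specification, and EIP-55 checksum validation is deliberately left out.

```python
import re

# 0x followed by exactly 40 hex characters (42 characters in total).
_ADDRESS_RE = re.compile(r"0x[0-9a-fA-F]{40}")


def normalize_address(address: str) -> str:
    """Return the lowercase form of an address, or raise ValueError."""
    if not _ADDRESS_RE.fullmatch(address):
        raise ValueError(f"not a valid address: {address!r}")
    return address.lower()
```

Normalizing before every insert and every lookup keeps equality comparisons on `VARCHAR(42)` columns reliable, because mixed-case and lowercase forms of the same address would otherwise compare as different strings.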
### Hash Types
- **VARCHAR(66)**: Transaction/block hashes (0x + 64 hex chars)
- **TEXT**: For very long hashes or variable-length data
### JSONB Types
- Used for: ABIs, decoded event data, complex nested structures
- Benefits: Indexing, querying, efficient storage
## Multi-Chain Considerations
### Chain ID Partitioning
All chain-scoped tables (every table except `users`) include a `chain_id` column, usually as the first column after the primary key:
- Enables efficient partitioning by chain_id
- Ensures data isolation between chains
- Simplifies multi-chain queries
### Partitioning Strategy
**Recommended**: Partition large tables by `chain_id`:
- `blocks`, `transactions`, `logs` partitioned by chain_id
- Benefits: Faster queries, easier maintenance, parallel processing
**Implementation** (PostgreSQL):
```sql
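-- Example partitioning
-- Note: PostgreSQL requires every PRIMARY KEY and UNIQUE constraint on a
-- partitioned table to include the partition key, so the primary key would
-- need to be (chain_id, id) rather than id alone.
CREATE TABLE blocks (
-- columns
) PARTITION BY LIST (chain_id);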
CREATE TABLE blocks_chain_138 PARTITION OF blocks FOR VALUES IN (138);
CREATE TABLE blocks_chain_1 PARTITION OF blocks FOR VALUES IN (1);
```
## Data Consistency
### Foreign Key Constraints
- Enforce referential integrity where possible
- Consider performance impact for high-throughput inserts
- May disable for initial backfill, enable after catch-up
### Unique Constraints
- Prevent duplicate blocks, transactions, logs
- Enable idempotent processing
- Use ON CONFLICT for upserts
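The idempotent-upsert idea can be sketched as follows. This is only an illustration: it uses Python's built-in `sqlite3` module as a stand-in for PostgreSQL (both accept `ON CONFLICT ... DO UPDATE` with the `excluded` pseudo-table), the `blocks` table is reduced to three columns, and `upsert_block` is a hypothetical helper name.

```python
import sqlite3


def upsert_block(conn, chain_id, number, block_hash):
    """Insert a block, or overwrite the stored hash if the row already exists."""
    conn.execute(
        """
        INSERT INTO blocks (chain_id, number, hash)
        VALUES (?, ?, ?)
        ON CONFLICT (chain_id, number) DO UPDATE SET hash = excluded.hash
        """,
        (chain_id, number, block_hash),
    )


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE blocks ("
    "chain_id INTEGER NOT NULL, number INTEGER NOT NULL, hash TEXT NOT NULL, "
    "UNIQUE (chain_id, number))"
)

upsert_block(conn, 138, 100, "0xaaa")
upsert_block(conn, 138, 100, "0xaaa")  # retry of the same block: no duplicate row
upsert_block(conn, 138, 100, "0xbbb")  # same height, different hash: row is replaced
rows = conn.execute("SELECT chain_id, number, hash FROM blocks").fetchall()
```

Whether a conflicting row should be overwritten (`DO UPDATE`) or left alone (`DO NOTHING`) is a design choice; overwriting is shown here only because it also covers a block being replaced at the same height.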
## Indexing Strategy
### Index Types
1. **B-tree**: Default for most indexes (equality, range queries)
2. **Hash**: For exact match lookups (addresses, hashes)
3. **GIN**: For JSONB columns (ABIs, decoded data)
4. **BRIN**: For large ordered columns (block numbers, timestamps)
### Index Maintenance
- Regular VACUUM and ANALYZE
- Monitor index bloat
- Consider partial indexes for filtered queries
## References
- Indexer Architecture: See `indexer-architecture.md`
- Database Schema: See `../database/postgres-schema.md`
- Search Index Schema: See `../database/search-index-schema.md`


@@ -0,0 +1,518 @@
# Indexer Architecture Specification
## Overview
This document specifies the architecture for the blockchain indexing pipeline that ingests, processes, and stores blockchain data from ChainID 138 and other supported chains. The indexer is responsible for maintaining a complete, queryable database of blocks, transactions, logs, traces, and token transfers.
## Architecture
```mermaid
flowchart TB
subgraph Input[Input Layer]
Node[RPC Node<br/>ChainID 138]
WS[WebSocket<br/>New Block Events]
end
subgraph Ingest[Ingestion Layer]
BL[Block Listener<br/>Real-time]
BW[Backfill Worker<br/>Historical]
Q[Message Queue<br/>Kafka/RabbitMQ]
end
subgraph Process[Processing Layer]
BP[Block Processor]
TP[Transaction Processor]
LP[Log Processor]
TrP[Trace Processor]
TokenP[Token Transfer Processor]
end
subgraph Decode[Decoding Layer]
ABI[ABI Registry]
SigDB[Signature Database]
Decoder[Event Decoder]
end
subgraph Persist[Persistence Layer]
PG[(PostgreSQL<br/>Canonical Data)]
ES[(Elasticsearch<br/>Search Index)]
TS[(TimescaleDB<br/>Metrics)]
end
subgraph Materialize[Materialization Layer]
Agg[Aggregator<br/>TPS, Gas Stats]
Cache[Cache Layer<br/>Redis]
end
Node --> BL
Node --> BW
WS --> BL
BL --> Q
BW --> Q
Q --> BP
BP --> TP
BP --> LP
BP --> TrP
TP --> TokenP
LP --> Decoder
Decoder --> ABI
Decoder --> SigDB
BP --> PG
TP --> PG
LP --> PG
TrP --> PG
TokenP --> PG
BP --> ES
TP --> ES
LP --> ES
BP --> TS
TP --> TS
PG --> Agg
Agg --> Cache
```
## Block Ingestion Pipeline
### Block Listener (Real-time)
**Purpose**: Monitor blockchain for new blocks and ingest them immediately.
**Implementation**:
- Subscribe to `newHeads` via WebSocket
- Poll `eth_blockNumber` as fallback (every 2 seconds)
- Handle WebSocket reconnection automatically
**Flow**:
1. Receive block header event
2. Fetch full block data via `eth_getBlockByNumber`
3. Enqueue block to processing queue
4. Acknowledge receipt
**Error Handling**:
- Retry on network errors (exponential backoff)
- Handle reorgs (see reorg handling section)
- Log errors for monitoring
### Backfill Worker (Historical)
**Purpose**: Index historical blocks from genesis or a specific starting point.
**Implementation**:
- Parallel workers for faster indexing
- Configurable batch size (e.g., 100 blocks per batch)
- Rate limiting to avoid overloading RPC node
- Checkpoint system for resuming interrupted backfills
**Flow**:
1. Determine starting block (checkpoint or genesis)
2. Fetch batch of blocks
3. Enqueue each block to processing queue
4. Update checkpoint
5. Repeat until caught up with chain head
**Optimization Strategies**:
- Parallel workers process different block ranges
- Skip blocks already indexed (idempotent processing)
- Batch RPC requests where possible
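The batching and checkpoint-resume behaviour described above could look roughly like this. The function name `backfill_batches` and the convention that a checkpoint holds the last fully processed block number are assumptions made for illustration.

```python
from typing import Iterator, Optional, Tuple


def backfill_batches(
    start_block: int,
    head_block: int,
    batch_size: int = 100,
    checkpoint: Optional[int] = None,
) -> Iterator[Tuple[int, int]]:
    """Yield inclusive (first, last) block ranges up to head_block.

    If a checkpoint is given, indexing resumes at the block after it.
    """
    next_block = start_block if checkpoint is None else checkpoint + 1
    while next_block <= head_block:
        last = min(next_block + batch_size - 1, head_block)
        yield next_block, last
        next_block = last + 1
```

Each yielded range can be handed to a separate worker; storing `last` as the new checkpoint after a range completes lets an interrupted backfill restart without re-fetching finished ranges.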
### Message Queue
**Purpose**: Decouple ingestion from processing, enable scaling, ensure durability.
**Technology**: Kafka or RabbitMQ
**Topics/Queues**:
- `blocks`: New blocks to process
- `transactions`: Transactions to decode
- `traces`: Traces to process (async)
**Configuration**:
- Durability: Persistent storage
- Replication: 3 replicas for high availability
- Partitioning: By chain_id and block number (for ordering)
## Transaction Processing Flow
### Block Processing
**Steps**:
1. **Validate Block**: Verify block hash, parent hash, block number
2. **Extract Transactions**: Get transaction list from block
3. **Fetch Receipts**: Get transaction receipts for all transactions
4. **Process Each Transaction**:
- Store transaction data
- Process receipt (logs, status)
- Extract token transfers (ERC-20/721/1155)
- Link to contract interactions
**Data Extracted**:
- Transaction fields (hash, from, to, value, gas, etc.)
- Receipt fields (status, gasUsed, logs, etc.)
- Contract creation detection
- Token transfer events
### Transaction Decoding
**Purpose**: Decode event logs and transaction data using ABIs.
**Process**:
1. Identify contract address (to field or created address)
2. Look up ABI in registry (verified contracts)
3. Decode function calls and events
4. Store decoded data for search and filtering
**Fallback Strategies**:
- Signature database for unknown functions/events (4-byte signatures)
- Heuristic detection for common patterns (Transfer events)
- Store raw data when decoding fails
### ABI Registry
**Purpose**: Store contract ABIs for decoding transactions and events.
**Data Sources**:
- Contract verification submissions
- Sourcify integration
- Public ABI repositories (4byte.directory, etc.)
**Storage**:
- Database table: `contract_abis`
- Cache layer: Redis for frequently accessed ABIs
- Versioning: Support multiple ABI versions per contract
**Schema**:
```sql
contract_abis (
id BIGSERIAL PRIMARY KEY,
chain_id INTEGER NOT NULL,
address VARCHAR(42) NOT NULL,
abi JSONB NOT NULL,
verified BOOLEAN DEFAULT false,
source VARCHAR(50), -- 'verification', 'sourcify', 'public'
created_at TIMESTAMP,
updated_at TIMESTAMP,
UNIQUE(chain_id, address)
)
```
### Signature Database
**Purpose**: Map 4-byte function signatures and 32-byte event signatures to function/event names.
**Data Sources**:
- Public signature databases (4byte.directory)
- User submissions
- Automatic extraction from verified contracts
**Usage**:
- Lookup function name from signature (e.g., `0x095ea7b3` → `approve(address,uint256)`)
- Lookup event name from topic[0] (e.g., `0xddf252...` → `Transfer(address,address,uint256)`)
- Partial decoding when full ABI unavailable
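A small sketch of the function-signature lookup, assuming the signature database is loaded into an in-memory mapping. `KNOWN_SELECTORS` and `lookup_function` are hypothetical names; a real deployment would read from the database or cache rather than a hard-coded dictionary.

```python
from typing import Optional

# Tiny in-memory stand-in for the signature database (selector -> signature).
KNOWN_SELECTORS = {
    "0x095ea7b3": "approve(address,uint256)",
    "0xa9059cbb": "transfer(address,uint256)",
}


def lookup_function(input_data: str) -> Optional[str]:
    """Return the function signature for a transaction's input data, if known.

    The selector is the first 4 bytes of the input: "0x" plus 8 hex characters.
    """
    if not input_data.startswith("0x") or len(input_data) < 10:
        return None  # plain value transfer or malformed input
    return KNOWN_SELECTORS.get(input_data[:10].lower())
```

A `None` result is the signal to fall back to storing the raw input data.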
## Event Log Indexing
### Log Processing
**Purpose**: Index event logs for efficient querying and filtering.
**Process**:
1. Extract logs from transaction receipts
2. Decode log topics and data using ABI
3. Index by:
- Contract address
- Event signature (topic[0])
- Indexed parameters (topic[1..3])
- Block number and transaction hash
- Log index
**Indexing Strategy**:
- PostgreSQL table: `logs` with indexes on (address, topic0, block_number)
- Elasticsearch index: Full-text search on decoded event data
- Time-series: Aggregate log counts per contract/event
### Event Decoding
**Decoding Flow**:
1. Identify event signature from topic[0]
2. Look up event definition in ABI registry
3. Decode indexed parameters (topics 1-3)
4. Decode non-indexed parameters (data field)
5. Store decoded parameters as JSONB
**Common Events to Index**:
- ERC-20: `Transfer(address,address,uint256)`
- ERC-721: `Transfer(address,address,uint256)`
- ERC-1155: `TransferSingle`, `TransferBatch`
- Approval events: `Approval(address,address,uint256)`
## Trace Processing
### Call Trace Extraction
**Purpose**: Extract detailed call traces for transaction debugging and internal transaction tracking.
**Trace Types**:
- `call`: Contract calls
- `create`: Contract creation
- `suicide`: Contract self-destruct
- `delegatecall`: Delegate calls
**Process**:
1. Request trace via `trace_transaction` or `trace_block`
2. Parse trace result structure
3. Extract:
- Call hierarchy (parent-child relationships)
- Internal transactions (value transfers)
- Gas usage per call
- Revert information
### Internal Transaction Tracking
**Purpose**: Track value transfers that occur inside transactions (not just top-level).
**Data Extracted**:
- From address (caller)
- To address (callee)
- Value transferred
- Call type (call, delegatecall, etc.)
- Success/failure status
- Gas used
**Storage**:
- Separate table: `internal_transactions`
- Link to parent transaction via `transaction_hash`
- Link to parent call via `trace_address` array
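The `trace_address` linkage and value-transfer filtering can be sketched as below. The input shape (`traceAddress`, `action.from`, `action.to`, `action.value`, `action.callType`) is an assumption modelled on Parity-style `trace_transaction` output; other tracers return different structures. Both function names are hypothetical.

```python
from typing import List, Optional


def parent_trace_address(trace_address: List[int]) -> Optional[List[int]]:
    """Return the parent call's trace address, or None for the top-level call."""
    if not trace_address:
        return None
    return trace_address[:-1]


def extract_internal_transactions(traces: List[dict]) -> List[dict]:
    """Keep nested calls that move value; the top-level call is the transaction itself."""
    internal = []
    for trace in traces:
        address = trace["traceAddress"]
        if not address:
            continue
        action = trace["action"]
        value = int(action.get("value", "0x0"), 16)
        if value == 0:
            continue
        internal.append({
            "trace_address": address,
            "from_address": action.get("from"),
            "to_address": action.get("to"),
            "value": value,
            "call_type": action.get("callType"),
        })
    return internal
```

Skipping zero-value calls is a choice made for this sketch; an indexer that also wants to show value-less internal calls would drop that filter.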
## Token Transfer Extraction
### ERC-20 Transfer Detection
**Detection Method**:
1. Look for `Transfer(address,address,uint256)` event
2. Decode event parameters (from, to, value)
3. Store in `token_transfers` table
4. Update token holder balances
**Data Stored**:
- Token contract address
- From address
- To address
- Amount (raw integer value; the token's `decimals` is applied for display)
- Block number
- Transaction hash
- Log index
### ERC-721 Transfer Detection
**Similar to ERC-20 but**:
- Token ID is tracked (unique NFT)
- Transfer can be from zero address (mint) or to zero address (burn)
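ERC-20 and ERC-721 share the same `Transfer(address,address,uint256)` signature, so a decoder needs some way to tell them apart. The sketch below uses the topic count as a heuristic (ERC-721 indexes the token ID, giving four topics; ERC-20 puts the amount in `data`, giving three). That heuristic, the function names, and the log dictionary shape are assumptions; the topic constant is the widely published keccak256 hash of the signature and should be verified against your signature database.

```python
from typing import Optional

# keccak256("Transfer(address,address,uint256)"), as commonly published.
TRANSFER_TOPIC = "0xddf252ad1be2c89b69c2b068fc378daa952ba7f163c4a11628f55a4df523b3ef"


def topic_to_address(topic: str) -> str:
    """An indexed address is left-padded to 32 bytes; keep the last 20 bytes."""
    return "0x" + topic[-40:].lower()


def decode_transfer(log: dict) -> Optional[dict]:
    """Decode a Transfer log into token_transfers-style fields, or return None."""
    topics = log["topics"]
    if not topics or topics[0].lower() != TRANSFER_TOPIC:
        return None
    if len(topics) == 3:  # ERC-20 style: amount is in the data field
        token_type, amount, token_id = "ERC20", int(log["data"], 16), None
    elif len(topics) == 4:  # ERC-721 style: token ID is the third indexed topic
        token_type, amount, token_id = "ERC721", None, str(int(topics[3], 16))
    else:
        return None  # non-standard layout: leave for raw storage
    return {
        "token_type": token_type,
        "from_address": topic_to_address(topics[1]),
        "to_address": topic_to_address(topics[2]),
        "amount": amount,
        "token_id": token_id,
    }
```

The token ID is returned as a decimal string so that it fits a `VARCHAR(78)` column without losing precision.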
### ERC-1155 Transfer Detection
**Events**:
- `TransferSingle`: Single token transfer
- `TransferBatch`: Batch token transfer
**Challenges**:
- Multiple token IDs and amounts per transfer
- Batch operations require array decoding
### Token Holder Tracking
**Purpose**: Maintain list of addresses holding each token.
**Strategy**:
- Real-time updates: Update on each transfer
- Periodic reconciliation: Verify balances via RPC
- Balance snapshots: Store balance at each block (for historical queries)
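The real-time update strategy amounts to applying each transfer as a debit and a credit. A minimal in-memory sketch, assuming fungible (ERC-20 style) transfers already decoded into dictionaries with `from_address`, `to_address`, and `amount`; the name `apply_transfers` is hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable

ZERO_ADDRESS = "0x" + "0" * 40


def apply_transfers(balances: Dict[str, int], transfers: Iterable[dict]) -> Dict[str, int]:
    """Apply transfers to holder balances in place and return them.

    The zero address is skipped on both sides, so mints only credit the
    recipient and burns only debit the sender.
    """
    for transfer in transfers:
        amount = transfer["amount"]
        if transfer["from_address"] != ZERO_ADDRESS:
            balances[transfer["from_address"]] -= amount
        if transfer["to_address"] != ZERO_ADDRESS:
            balances[transfer["to_address"]] += amount
    return balances
```

Balances derived this way can drift for tokens with non-standard behaviour (rebasing, fee-on-transfer), which is why the periodic reconciliation against the RPC node remains useful.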
## Indexer Worker Scaling and Partitioning
### Horizontal Scaling
**Strategy**: Multiple indexer workers processing different blocks/chains.
**Partitioning Methods**:
1. **By Chain**: Each worker handles one chain
2. **By Block Range**: Workers split block ranges (for backfill)
3. **By Processing Stage**: Separate workers for blocks, traces, token transfers
### Worker Coordination
**Mechanisms**:
- Message queue: Workers consume from shared queue
- Database locks: Prevent duplicate processing
- Leader election: For single-worker tasks (reorg handling)
### Load Balancing
**Distribution**:
- Round-robin for backfill workers
- Sticky sessions for chain-specific workers
- Priority queuing: Real-time blocks before historical blocks
### Performance Targets
**Throughput**:
- Process 100 blocks/minute per worker
- Process 1000 transactions/minute per worker
- Process 100 traces/minute per worker (trace operations are slower)
**Latency**:
- Real-time blocks: Indexed within 5 seconds of block production
- Historical blocks: Catch up to chain head within reasonable time
## Data Consistency
### Transaction Isolation
**Strategy**: Process blocks atomically (all or nothing).
**Implementation**:
- Database transactions for block-level operations
- Idempotent processing (can safely retry)
- Checkpoint system to track last processed block
### Idempotency
**Requirements**:
- Processing same block multiple times should not create duplicates
- Use unique constraints in database
- Upsert operations where applicable
## Error Handling and Retry Logic
### Error Types
1. **Transient Errors**: Network issues, temporary RPC failures
- Retry with exponential backoff
- Max retries: 10
- Max backoff: 5 minutes
2. **Permanent Errors**: Invalid data, unsupported features
- Log error and skip
- Alert for investigation
3. **Reorg Errors**: Block replaced by different block
- Handle via reorg detection (see reorg handling spec)
### Retry Strategy
**Exponential Backoff**:
- Initial delay: 1 second
- Multiplier: 2x
- Max delay: 5 minutes
- Jitter: Random ±20% to avoid thundering herd
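The parameters above translate into a delay function along these lines. This is a sketch rather than a prescribed implementation: `backoff_delay` is a hypothetical name, `attempt` is assumed to be zero-based, and the cap is applied after jitter so the 5-minute maximum is never exceeded.

```python
import random
from typing import Callable


def backoff_delay(
    attempt: int,
    initial: float = 1.0,
    multiplier: float = 2.0,
    max_delay: float = 300.0,
    jitter: float = 0.2,
    rng: Callable[[], float] = random.random,
) -> float:
    """Return the delay in seconds before retry number `attempt` (0-based)."""
    base = initial * multiplier ** attempt
    # Scale by a random factor in [1 - jitter, 1 + jitter].
    factor = 1.0 + jitter * (2.0 * rng() - 1.0)
    return min(base * factor, max_delay)
```

With the defaults, the un-jittered delays run 1, 2, 4, 8, ... seconds and reach the 300-second cap from the tenth attempt (index 9) onward.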
## Monitoring and Observability
### Key Metrics
**Throughput**:
- Blocks processed per minute
- Transactions processed per minute
- Logs indexed per minute
**Latency**:
- Time from block production to index completion
- Time to process block (p50, p95, p99)
**Lag**:
- Block height lag (current block - last indexed block)
- Time lag (current time - last indexed block time)
**Errors**:
- Error rate by type
- Retry count
- Failed blocks
### Alerting Rules
- Block lag > 10 blocks: Warning
- Block lag > 100 blocks: Critical
- Error rate > 1%: Warning
- Error rate > 5%: Critical
- Worker down: Critical
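The threshold rules can be expressed as two small classifiers. The function names and the `"ok"` / `"warning"` / `"critical"` return values are assumptions for illustration, and the error rate is assumed to be a fraction (0.01 for 1%). The "worker down" rule is not modelled here because it depends on the health-check mechanism in use.

```python
def lag_severity(block_lag: int) -> str:
    """Classify block height lag (chain head minus last indexed block)."""
    if block_lag > 100:
        return "critical"
    if block_lag > 10:
        return "warning"
    return "ok"


def error_rate_severity(error_rate: float) -> str:
    """Classify the error rate, given as a fraction of processed items."""
    if error_rate > 0.05:
        return "critical"
    if error_rate > 0.01:
        return "warning"
    return "ok"
```

Checking the critical threshold first ensures a lag of, say, 500 blocks is never reported as a mere warning.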
## Integration Points
### RPC Node Integration
- See `../infrastructure/node-rpc-architecture.md`
- Connection pooling
- Rate limiting awareness
- Failover handling
### Database Integration
- See `../database/postgres-schema.md`
- Connection pooling
- Batch inserts for performance
- Transaction management
### Search Integration
- See `../database/search-index-schema.md`
- Async indexing to Elasticsearch
- Bulk indexing for efficiency
## Implementation Guidelines
### Technology Stack
**Recommended**:
- **Language**: Go, Rust, or Python (performance considerations)
- **Queue**: Kafka (high throughput) or RabbitMQ (simpler setup)
- **Database**: PostgreSQL with connection pooling
- **Caching**: Redis for frequently accessed data
### Code Structure
```
indexer/
├── cmd/
│ ├── block-listener/ # Real-time block listener
│ ├── backfill-worker/ # Historical indexing worker
│ └── processor/ # Block/transaction processor
├── internal/
│ ├── ingestion/ # Ingestion logic
│ ├── processing/ # Processing logic
│ ├── decoding/ # ABI/signature decoding
│ └── persistence/ # Database operations
└── pkg/
├── abi/ # ABI registry
└── rpc/ # RPC client
```
### Testing Strategy
**Unit Tests**:
- Decoding logic
- Data transformation
- Error handling
**Integration Tests**:
- End-to-end block processing
- Database operations
- Queue integration
**Load Tests**:
- Process historical blocks
- Simulate high block production rate
- Test worker scaling
## References
- Data Models: See `data-models.md`
- Reorg Handling: See `reorg-handling.md`
- Database Schema: See `../database/postgres-schema.md`
- RPC Architecture: See `../infrastructure/node-rpc-architecture.md`


@@ -0,0 +1,353 @@
# Reorg Handling Specification
## Overview
This document specifies how the indexer handles blockchain reorganizations (reorgs), where the canonical chain changes and previously indexed blocks become invalid. Reorg handling ensures data consistency and maintains accurate blockchain state.
## Reorg Detection
### Detection Methods
**1. Block Hash Mismatch**
- Compare stored block hash with RPC node block hash
- If mismatch detected, reorg has occurred
**2. Parent Hash Validation**
- Verify each block's parent_hash matches previous block's hash
- Chain break indicates reorg point
**3. Block Height Comparison**
- Monitor chain head block number
- Sudden decrease indicates potential reorg
**4. Chain Head Monitoring**
- Poll `eth_blockNumber` periodically
- Compare with last indexed block number
- Detect rollback scenarios
### Detection Strategy
**Real-time Monitoring**:
- Check block hash after each new block ingestion
- Compare with RPC node's block hash for same block number
- Immediate detection for recent blocks
**Periodic Validation**:
- Validate last N blocks (e.g., 100 blocks) every M seconds
- Detect deep reorgs that may have been missed
- Background validation job
**Checkpoint Validation**:
- Validate checkpoint blocks (every 1000 blocks)
- Ensure checkpoint blocks still exist in canonical chain
- Detect deep reorgs quickly
## Reorg Handling Flow
```mermaid
flowchart TB
Detect[Detect Reorg]
Identify[Identify Reorg Point<br/>Find Common Ancestor]
Mark[Mark Blocks as Orphaned]
Delete[Delete Orphaned Data]
Reindex[Re-index New Chain]
Verify[Verify Data Consistency]
Detect --> Identify
Identify --> Mark
Mark --> Delete
Delete --> Reindex
Reindex --> Verify
Verify --> Done[Complete]
Verify -->|Inconsistency| Delete
```
### Step 1: Identify Reorg Point
**Goal**: Find the common ancestor block (last block that's still valid).
**Algorithm**:
1. Start from the last indexed block and walk backwards
2. At each block number, compare the RPC node's block hash with the stored hash
3. The first block whose hashes match is the common ancestor
4. All stored blocks after the common ancestor need reorg handling
**Optimization**:
- Binary search to find reorg point quickly
- Start from recent blocks and work backwards
- Cache last N block hashes for faster comparison
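A sketch of the binary-search variant. It relies on the fact that once two chains diverge they cannot re-converge (each block hash commits to its parent), so "stored hash matches canonical hash" is true up to the common ancestor and false afterwards. The function name and the two hash-lookup callables are hypothetical; in practice one would read from the `blocks` table and the other from the RPC node.

```python
from typing import Callable, Optional


def find_common_ancestor(
    stored_hash: Callable[[int], str],
    canonical_hash: Callable[[int], str],
    last_indexed: int,
    lowest: int = 0,
) -> Optional[int]:
    """Return the highest block number in [lowest, last_indexed] whose stored
    hash equals the canonical hash, or None if even `lowest` has diverged."""
    if stored_hash(lowest) != canonical_hash(lowest):
        return None
    lo, hi = lowest, last_indexed  # invariant: block `lo` matches
    while lo < hi:
        mid = (lo + hi + 1) // 2
        if stored_hash(mid) == canonical_hash(mid):
            lo = mid
        else:
            hi = mid - 1
    return lo
```

Setting `lowest` to `last_indexed` minus the configured reorg depth limit bounds the number of RPC calls; a `None` result then means the reorg is deeper than that window.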
### Step 2: Mark Blocks as Orphaned
**Strategy**: Mark blocks as orphaned before deletion (for audit trail).
**Database Update**:
```sql
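-- Assumes `orphaned` and `orphaned_at` columns have been added to `blocks`;
-- they are not part of the `blocks` schema in data-models.md.
UPDATE blocks
SET orphaned = true, orphaned_at = NOW()
WHERE chain_id = ?
AND number > ? -- common ancestor block number
AND orphaned = false;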
```
**Benefits**:
- Audit trail of reorgs
- Ability to query orphaned blocks
- Easier debugging
### Step 3: Delete Orphaned Data
**Cascade Deletion Order**:
1. Token transfers (depends on transactions)
2. Logs (depends on transactions)
3. Traces (depends on transactions)
4. Internal transactions (depends on transactions)
5. Transactions (depends on blocks)
6. Blocks (orphaned blocks)
**Implementation**:
- Use database transactions for atomicity
- Cascade deletes via foreign keys (if enabled)
- Or explicit deletion in correct order
**Performance Considerations**:
- Batch deletions for large reorgs
- Index on `block_number` for efficient deletion
- Consider soft deletes (mark as deleted) vs hard deletes
### Step 4: Re-index New Chain
**Process**:
1. Fetch new blocks from RPC node (from reorg point onward)
2. Process blocks through normal indexing pipeline
3. Verify each block before marking as indexed
**Optimization**:
- Parallel processing for multiple blocks (if safe)
- Use existing indexer workers
- Priority queue: reorg blocks before new blocks
### Step 5: Verify Data Consistency
**Validation Checks**:
- Block hashes match RPC node
- Transaction counts match
- Parent hashes form valid chain
- No gaps in block numbers
**Metrics**:
- Reorg depth (number of blocks affected)
- Reorg duration (time to handle)
- Data consistency verification
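The validation checks can be expressed as a small pure function. This is a sketch under assumptions: `BlockHeader` is a hypothetical in-memory type, and `canonical` is assumed to hold headers already fetched from the RPC node.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class BlockHeader:
    number: int
    hash: str
    parent_hash: str
    transaction_count: int


def check_consistency(
    indexed: List[BlockHeader], canonical: Dict[int, BlockHeader]
) -> List[str]:
    """Return a list of human-readable problems; empty means consistent."""
    problems = []
    prev = None
    for block in sorted(indexed, key=lambda b: b.number):
        ref = canonical.get(block.number)
        if ref is None or ref.hash != block.hash:
            problems.append(f"block {block.number}: hash differs from node")
        elif ref.transaction_count != block.transaction_count:
            problems.append(f"block {block.number}: transaction count differs")
        if prev is not None:
            if block.number != prev.number + 1:
                problems.append(f"gap between {prev.number} and {block.number}")
            elif block.parent_hash != prev.hash:
                problems.append(f"block {block.number}: parent hash does not link")
        prev = block
    return problems
```

A non-empty result corresponds to the `Inconsistency` edge in the flow, which sends the handler back to deletion.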
## Data Consistency Guarantees
### Transaction Isolation
**Strategy**: Use database transactions to ensure atomic reorg handling.
**Implementation**:
```sql
BEGIN;
-- Mark blocks as orphaned
-- Delete orphaned data
-- Insert new blocks
-- Verify consistency
COMMIT;
```
**Rollback**: If any step fails, rollback entire operation.
### Idempotency
**Requirement**: Reorg handling must be idempotent (safe to retry).
**Mechanisms**:
- Check block hash before processing
- Skip blocks already correctly indexed
- Use unique constraints to prevent duplicates
### Finality Considerations
**Reorg Depth Limits**:
- Only handle reorgs within finality window
- For PoW: Typically 12-100 blocks deep
- For PoS: Typically 1-2 epochs (32-64 blocks)
- For BFT: Typically immediate finality
**Configuration**:
- Configurable reorg depth limit per chain
- Skip reorgs beyond finality window (log for investigation)
## Performance Optimization
### Handling Frequent Reorgs
**Problem**: Some chains have frequent small reorgs (1-2 blocks).
**Solution**:
- Batch reorg handling (wait for stability)
- Detect reorgs but delay handling for short period
- Only handle if reorg persists
**Configuration**:
- Reorg confirmation period: 30 seconds
- Maximum reorg depth to handle immediately: 5 blocks
- Deeper reorgs: Manual investigation
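One way to implement the confirmation period is a small debouncer that only releases a reorg once the same common ancestor has been observed for the full period. The class name and return values are hypothetical; the defaults mirror the configuration values listed above. Restarting the timer when the ancestor changes is an assumption of this sketch.

```python
from typing import Optional, Tuple


class ReorgDebouncer:
    """Decide whether an observed reorg should be handled now."""

    def __init__(self, confirmation_period: float = 30.0, max_auto_depth: int = 5):
        self.confirmation_period = confirmation_period
        self.max_auto_depth = max_auto_depth
        self._pending: Optional[Tuple[int, float]] = None  # (ancestor, first_seen)

    def observe(self, ancestor: Optional[int], depth: int, now: float) -> str:
        """Return 'none', 'wait', 'handle', or 'manual' for this observation."""
        if ancestor is None:  # chain and index agree again
            self._pending = None
            return "none"
        if depth > self.max_auto_depth:
            return "manual"
        if self._pending is None or self._pending[0] != ancestor:
            self._pending = (ancestor, now)  # new or changed reorg: restart timer
            return "wait"
        if now - self._pending[1] >= self.confirmation_period:
            self._pending = None
            return "handle"
        return "wait"
```

Passing `now` explicitly (for example from `time.monotonic()`) keeps the logic deterministic in tests.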
### Handling Deep Reorgs
**Problem**: Deep reorgs require deleting and re-indexing many blocks.
**Optimization Strategies**:
1. **Parallel Processing**: Process new chain blocks in parallel
2. **Batch Operations**: Batch database operations
3. **Incremental Updates**: Only update changed data
4. **Selective Deletion**: Only delete affected data (not entire blocks if possible)
### Index Maintenance
**During Reorg**:
- Pause index updates temporarily
- Resume after reorg handling complete
- Rebuild affected indexes if necessary
## Monitoring and Alerting
### Metrics
**Reorg Metrics**:
- Reorg count (per chain, per time period)
- Reorg depth distribution
- Reorg handling duration (p50, p95, p99)
- Data consistency check results
**Alerting Rules**:
- Reorg depth > 10 blocks: Warning (investigate)
- Reorg depth > 100 blocks: Critical (potential chain issue)
- Reorg handling duration > 5 minutes: Warning
- Data consistency check failure: Critical
### Logging
**Log Events**:
- Reorg detection (block number, depth)
- Reorg point identification (common ancestor)
- Blocks orphaned (count, block numbers)
- Re-indexing progress
- Data consistency verification results
**Log Levels**:
- INFO: Normal reorgs (< 5 blocks)
- WARN: Unusual reorgs (5-10 blocks)
- ERROR: Deep reorgs (> 10 blocks) or failures
## Edge Cases
### Multiple Reorgs in Quick Succession
**Scenario**: Chain reorgs, then reorgs again before first reorg is handled.
**Handling**:
- Cancel in-progress reorg handling
- Start new reorg handling from latest state
- Ensure idempotency
### Reorg During Backfill
**Scenario**: Historical block indexing encounters a reorg.
**Handling**:
- Pause backfill
- Handle reorg
- Resume backfill from reorg point
### Concurrent Reorg Handling
**Prevention**:
- Use database locks to prevent concurrent reorg handling
- Single reorg handler per chain
- Queue reorg events if handler is busy
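The single-handler-per-chain rule with queuing can be sketched in-process as shown here. This is an illustrative analogue only: `ReorgCoordinator` is a hypothetical name, and a multi-process deployment would need a database-level lock (for example a PostgreSQL advisory lock) instead of `threading.Lock`.

```python
import threading
from collections import defaultdict, deque
from typing import Any, Callable


class ReorgCoordinator:
    """Run at most one reorg handler per chain; queue events that arrive meanwhile."""

    def __init__(self, handler: Callable[[int, Any], None]):
        self._handler = handler
        self._guard = threading.Lock()
        self._busy = set()
        self._queues = defaultdict(deque)

    def submit(self, chain_id: int, event: Any) -> bool:
        """Return True if this call processed the queue, False if it only queued."""
        with self._guard:
            self._queues[chain_id].append(event)
            if chain_id in self._busy:
                return False  # the active handler will pick this event up
            self._busy.add(chain_id)
        self._drain(chain_id)
        return True

    def _drain(self, chain_id: int) -> None:
        while True:
            with self._guard:
                if not self._queues[chain_id]:
                    self._busy.discard(chain_id)
                    return
                event = self._queues[chain_id].popleft()
            try:
                self._handler(chain_id, event)  # runs outside the guard
            except Exception:
                with self._guard:
                    self._busy.discard(chain_id)  # do not leave the chain locked
                raise
```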
## Recovery Procedures
### Manual Reorg Handling
**When to Use**:
- Automatic handling fails
- Deep reorgs beyond normal limits
- Data corruption detected
**Procedure**:
1. Identify reorg point manually
2. Verify with RPC node
3. Mark blocks as orphaned
4. Delete orphaned data
5. Trigger re-indexing
6. Verify data consistency
### Data Recovery
**Backup Strategy**:
- Regular database backups
- Point-in-time recovery capability
- Ability to restore to pre-reorg state
**Recovery Steps**:
1. Restore database to point before reorg
2. Re-run indexer from that point
3. Let automatic reorg handling process naturally
## Testing Strategy
### Unit Tests
- Reorg detection logic
- Common ancestor identification
- Orphan marking
- Data deletion logic
### Integration Tests
- Simulate reorgs (testnet with known reorgs)
- Verify data consistency after reorg
- Test concurrent reorg handling
- Test deep reorgs
### Load Tests
- Simulate frequent reorgs
- Measure performance impact
- Test reorg handling during high load
## Configuration
### Reorg Handling Configuration
```yaml
reorg:
detection:
check_interval_seconds: 10
validation_depth: 100
checkpoint_interval: 1000
handling:
max_depth: 100
confirmation_period_seconds: 30
batch_size: 1000
parallel_processing: true
finality:
chain_138:
type: "poa" # or "pos", "pow", "bft"
depth_limit: 50
finality_blocks: 12
```
## References
- Indexer Architecture: See `indexer-architecture.md`
- Data Models: See `data-models.md`
- Database Schema: See `../database/postgres-schema.md`


@@ -0,0 +1,523 @@
# Contract Verification Pipeline Specification
## Overview
This document specifies the pipeline for verifying smart contracts on the explorer platform. Contract verification allows users to submit source code, which is compiled and compared against deployed bytecode to enable source code viewing, debugging, and ABI extraction.
## Architecture
```mermaid
flowchart TB
subgraph Submit[Submission]
User[User Submits<br/>Source Code]
UI[Verification UI]
API[Verification API]
end
subgraph Validate[Validation]
Val[Validate Input]
Check[Check Contract Exists]
Dup[Check Duplicate]
end
subgraph Compile[Compilation]
Comp[Compiler Service]
Versions[Compiler Version<br/>Registry]
Build[Build Artifacts]
end
subgraph Verify[Verification]
Match[Bytecode Matching]
Construct[Constructor Args<br/>Extraction]
MatchResult[Match Result]
end
subgraph Store[Storage]
DB[(Database)]
Artifacts[Artifact Storage<br/>S3/Immutable]
ABI[ABI Registry]
end
User --> UI
UI --> API
API --> Val
Val --> Check
Check --> Dup
Dup --> Comp
Comp --> Versions
Comp --> Build
Build --> Match
Match --> Construct
Construct --> MatchResult
MatchResult --> DB
MatchResult --> Artifacts
MatchResult --> ABI
```
## Source Code Submission Workflow
### Submission Methods
**1. Standard JSON Input** (Recommended)
- Submit Solidity compiler's standard JSON input format
- Includes source files, compiler settings, optimization
- Most reliable for complex contracts
**2. Multi-file Upload**
- Upload individual source files
- Specify compiler version and settings
- Compiler constructs standard JSON input
**3. Sourcify Integration**
- Verify via Sourcify API
- Automatic source code and metadata retrieval
- Supports verified contracts from Sourcify registry
**4. Flattened Source**
- Single flattened source file
- All imports inlined
- Simpler but less flexible
### Submission API
**Endpoint**: `POST /api/v1/contracts/{address}/verify`
**Request Body** (provide either `source_code` or `standard_json_input`):
```json
{
"chain_id": 138,
"address": "0x...",
"compiler_version": "v0.8.19+commit.7dd6d404",
"optimization_enabled": true,
"optimization_runs": 200,
"evm_version": "london",
"source_code": "...",
"constructor_arguments": "0x...",
"library_addresses": {
"Lib1": "0x..."
},
"verification_method": "standard_json"
}
```
**Response**:
```json
{
"status": "pending",
"verification_id": "uuid",
"message": "Verification submitted"
}
```
### Input Validation
**Validation Rules**:
1. **Contract Address**: Must be valid Ethereum address, must exist on chain
2. **Compiler Version**: Must be supported compiler version
3. **Source Code**: Must be valid Solidity/Vyper code
4. **Constructor Arguments**: Must match deployed contract (if provided)
5. **Library Addresses**: Must match deployed libraries (if provided)
**Error Handling**:
- Invalid address: 400 Bad Request
- Unsupported compiler: 400 Bad Request
- Invalid source code: 400 Bad Request
- Contract not found: 404 Not Found
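The syntactic part of these rules can be checked before any RPC call or compilation. The sketch below is an assumption-laden illustration: `validate_submission` is a hypothetical helper, field names follow the request body above, and the on-chain existence check (which maps to 404) is deliberately left out because it needs an RPC client.

```python
import re
from typing import Any, Dict, List, Set

ADDRESS_RE = re.compile(r"0x[0-9a-fA-F]{40}")
HEX_BYTES_RE = re.compile(r"0x(?:[0-9a-fA-F]{2})*")


def validate_submission(body: Dict[str, Any], supported_compilers: Set[str]) -> List[str]:
    """Return validation errors; an empty list means the request may proceed."""
    errors = []
    if not ADDRESS_RE.fullmatch(str(body.get("address", ""))):
        errors.append("address: expected 0x followed by 40 hex characters")
    if body.get("compiler_version") not in supported_compilers:
        errors.append("compiler_version: unsupported")
    if not (body.get("source_code") or body.get("standard_json_input")):
        errors.append("source: source_code or standard_json_input is required")
    args = body.get("constructor_arguments")
    if args is not None and not HEX_BYTES_RE.fullmatch(str(args)):
        errors.append("constructor_arguments: expected 0x-prefixed hex bytes")
    for name, address in (body.get("library_addresses") or {}).items():
        if not ADDRESS_RE.fullmatch(str(address)):
            errors.append(f"library_addresses.{name}: invalid address")
    return errors
```

Any non-empty result would be returned to the caller as a 400 Bad Request.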
## Compiler Version Management
### Compiler Registry
**Purpose**: Manage available compiler versions and their metadata.
**Storage**:
```sql
compiler_versions (
id SERIAL PRIMARY KEY,
version VARCHAR(50) UNIQUE NOT NULL,
compiler_type VARCHAR(20) NOT NULL, -- 'solidity', 'vyper'
evm_version VARCHAR(20),
optimizer_available BOOLEAN DEFAULT true,
download_url TEXT,
checksum VARCHAR(64),
installed BOOLEAN DEFAULT false,
installed_path TEXT,
created_at TIMESTAMP DEFAULT NOW()
)
```
### Compiler Installation
**Methods**:
1. **Pre-installed**: Common versions pre-installed on compilation servers
2. **On-demand**: Download and install when needed
3. **Docker**: Use compiler Docker images (isolated, reproducible)
**Recommended**: Docker-based compilation for isolation and reproducibility.
**Docker Setup**:
```dockerfile
FROM ethereum/solc:0.8.19
# Or use solc-select for version management
```
### Version Selection
**Strategy**:
- Exact match: User specifies exact version
- Pragma matching: Extract version from source code pragma
- Latest compatible: Use latest compatible version if exact not available
**Pragma Parsing**:
- Extract `pragma solidity ^0.8.0;` or `>=0.8.0 <0.9.0`
- Resolve to specific compiler version
- Handle caret (^), tilde (~), and range operators
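Pragma resolution can be sketched as follows. This is a simplified illustration that assumes full `major.minor.patch` versions and space-separated comparators; it does not handle `||` alternatives, partial versions such as `^0.8`, or pre-release tags. All function names are hypothetical.

```python
import re
from typing import List, Optional, Tuple

Version = Tuple[int, int, int]


def parse_version(text: str) -> Version:
    major, minor, patch = (int(part) for part in text.split("."))
    return (major, minor, patch)


def _caret_upper(v: Version) -> Version:
    # ^1.2.3 allows < 2.0.0; ^0.8.0 allows < 0.9.0; ^0.0.3 allows < 0.0.4
    if v[0] > 0:
        return (v[0] + 1, 0, 0)
    if v[1] > 0:
        return (0, v[1] + 1, 0)
    return (0, 0, v[2] + 1)


def satisfies(version: Version, constraint: str) -> bool:
    """True if `version` meets every comparator in `constraint`."""
    for op, text in re.findall(r"(\^|~|>=|<=|>|<|=)?\s*(\d+\.\d+\.\d+)", constraint):
        v = parse_version(text)
        if op == "^":
            ok = v <= version < _caret_upper(v)
        elif op == "~":
            ok = v <= version < (v[0], v[1] + 1, 0)
        elif op == ">=":
            ok = version >= v
        elif op == "<=":
            ok = version <= v
        elif op == ">":
            ok = version > v
        elif op == "<":
            ok = version < v
        else:  # "=" or a bare version means an exact match
            ok = version == v
        if not ok:
            return False
    return True


def extract_pragma(source: str) -> Optional[str]:
    match = re.search(r"pragma\s+solidity\s+([^;]+);", source)
    return match.group(1).strip() if match else None


def resolve_compiler(source: str, available: List[str]) -> Optional[str]:
    """Pick the highest available version that satisfies the source's pragma."""
    constraint = extract_pragma(source)
    if constraint is None:
        return None
    candidates = [v for v in available if satisfies(parse_version(v), constraint)]
    return max(candidates, key=parse_version, default=None)
```

Choosing the highest satisfying version is the "latest compatible" fallback; an exact user-specified version should still take precedence when one is given.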
## Compilation Process
### Standard JSON Input Format
**Structure**:
```json
{
"language": "Solidity",
"sources": {
"Contract.sol": {
"content": "pragma solidity ^0.8.0; ..."
}
},
"settings": {
"optimizer": {
"enabled": true,
"runs": 200
},
"evmVersion": "london",
"outputSelection": {
"*": {
"*": ["abi", "evm.bytecode", "evm.deployedBytecode"]
}
}
}
}
```
### Compilation Steps
1. **Prepare Input**: Construct standard JSON input from user submission
2. **Select Compiler**: Choose appropriate compiler version
3. **Resolve Imports**: Handle import statements (local files, external URLs)
4. **Compile**: Execute compiler with standard JSON input
5. **Extract Artifacts**: Extract ABI, bytecode, deployed bytecode
6. **Handle Errors**: Parse compilation errors and return to user
### Import Resolution
**Import Types**:
- **Local Files**: Included in submission
- **External URLs**: Fetch from URL (GitHub, IPFS, etc.)
- **Standard Libraries**: Known library packages (OpenZeppelin, etc.)
**Resolution Strategy**:
1. Check local files first
2. Try external URL fetching
3. Check standard library registry
4. Fail if cannot resolve
### Optimization Settings
**Optimizer Configuration**:
- **Enabled**: Boolean flag
- **Runs**: Optimization runs (affects bytecode size vs gas cost)
- **EVM Version**: Target EVM version (affects bytecode generation)
**Matching Strategy**:
- Must match deployed contract's optimization settings exactly
- Try multiple optimization combinations if initial match fails
## Bytecode Matching
### Matching Process
**Goal**: Compare compiled bytecode with deployed bytecode.
**Steps**:
1. Fetch deployed bytecode from chain via `eth_getCode(address)`
2. Extract deployed bytecode from compilation artifacts
3. Compare bytecodes (exact match required)
4. Handle constructor arguments (appended to the creation code at deployment; not present in deployed bytecode)
### Bytecode Normalization
**Normalization Steps**:
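1. Remove metadata hash (trailing CBOR section; 53 bytes for recent Solidity versions, with the exact length encoded in the final 2 bytes)
2. Remove constructor arguments (only when comparing creation bytecode)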
3. Compare remaining bytecode
**Metadata Hash**:
- Solidity appends metadata hash to bytecode
- Format (recent versions, IPFS hash): `0xa2646970667358221220...` + 43 bytes, 53 bytes in total; older compiler versions use a shorter layout
- Should be excluded from comparison
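Because the metadata section's length varies by compiler version, a more robust approach than a fixed byte count is to read the length from the last two bytes of the bytecode, which Solidity uses to encode the size of the CBOR section. A minimal sketch, with `strip_metadata` as a hypothetical helper name:

```python
def strip_metadata(bytecode: bytes) -> bytes:
    """Drop the trailing CBOR metadata section, if one appears to be present."""
    if len(bytecode) < 2:
        return bytecode
    cbor_length = int.from_bytes(bytecode[-2:], "big")
    total = cbor_length + 2  # CBOR payload plus the 2-byte length suffix
    if total > len(bytecode):
        return bytecode  # length field is implausible: assume no metadata
    # The payload is a CBOR map, i.e. its first byte has major type 5.
    if bytecode[-total] >> 5 != 5:
        return bytecode
    return bytecode[:-total]
```

Both the compiled and the deployed runtime bytecode would be passed through the same function before comparison. The plausibility checks are heuristics, so contracts compiled without metadata could in rare cases be truncated incorrectly.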
### Constructor Arguments Extraction
**Purpose**: Extract constructor arguments from the contract creation transaction input.
**Process**:
1. Creation transaction input: `creation_code + constructor_args`
2. Deployed bytecode: `runtime_code` (contains no constructor args)
3. Extract constructor args: the bytes of the creation input that follow the compiled creation code, i.e. the trailing `creation_input.length - creation_code.length` bytes
**Validation**:
- Verify extracted constructor args match user-provided args (if provided)
- Decode constructor args if ABI available
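A minimal sketch of the extraction, assuming the creation transaction input is available (for example from the indexed deployment transaction) along with the compiled creation bytecode. The function name is hypothetical. The prefix is not compared byte for byte here because the embedded metadata hash may legitimately differ.

```python
def extract_constructor_args(creation_input: bytes, compiled_creation_code: bytes) -> bytes:
    """Return the bytes appended after the creation code in the deployment input."""
    code_length = len(compiled_creation_code)
    if len(creation_input) < code_length:
        raise ValueError("creation input is shorter than compiled creation code")
    args = creation_input[code_length:]
    # ABI-encoded arguments are always a multiple of 32 bytes long.
    if len(args) % 32 != 0:
        raise ValueError("trailing data is not a whole number of 32-byte words")
    return args
```

The result can then be compared with the user-provided `constructor_arguments` value or decoded against the constructor's ABI entry.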
### Library Linking
**Problem**: Contracts using libraries have placeholders in bytecode.
**Solution**:
1. Identify library placeholders in compiled bytecode
2. Replace placeholders with actual library addresses
3. Compare linked bytecode with deployed bytecode
**Library Placeholder Format**:
- `__$...$__` (Solidity)
- Must match user-provided library addresses
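Linking can be sketched as a textual substitution on the hex-encoded bytecode. In Solidity's format the 34 hex characters between `__$` and `$__` are a prefix of the keccak256 hash of the fully qualified library name, so the placeholder is exactly 40 characters, the length of an address. This sketch assumes the caller has already mapped library names to those hash prefixes, since keccak256 is not in Python's standard library (`hashlib.sha3_256` is a different function).

```python
import re
from typing import Dict

PLACEHOLDER_RE = re.compile(r"__\$([0-9a-f]{34})\$__")
ADDRESS_HEX_RE = re.compile(r"[0-9a-f]{40}")


def link_libraries(unlinked_hex: str, addresses_by_hash: Dict[str, str]) -> str:
    """Replace each library placeholder with the matching 20-byte address."""

    def substitute(match: "re.Match[str]") -> str:
        key = match.group(1)
        if key not in addresses_by_hash:
            raise KeyError(f"no address supplied for library placeholder {key}")
        address = addresses_by_hash[key].lower()
        if address.startswith("0x"):
            address = address[2:]
        if not ADDRESS_HEX_RE.fullmatch(address):
            raise ValueError(f"invalid library address for placeholder {key}")
        return address

    return PLACEHOLDER_RE.sub(substitute, unlinked_hex)
```

Because placeholder and address have the same length, the linked bytecode keeps all offsets intact.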
## Verification Status Tracking
### Status States
**States**:
1. **pending**: Verification submitted, queued for processing
2. **processing**: Compilation/verification in progress
3. **verified**: Bytecode matches, contract verified
4. **failed**: Verification failed (mismatch, compilation error, etc.)
5. **partially_verified**: Some source files verified (multi-file contracts)
### Status Updates
**Database Schema**:
```sql
contract_verifications (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
chain_id INTEGER NOT NULL,
address VARCHAR(42) NOT NULL,
status VARCHAR(20) NOT NULL,
compiler_version VARCHAR(50),
optimization_enabled BOOLEAN,
optimization_runs INTEGER,
evm_version VARCHAR(20),
source_code TEXT,
abi JSONB,
constructor_arguments TEXT,
verification_method VARCHAR(50),
error_message TEXT,
verified_at TIMESTAMP,
created_at TIMESTAMP DEFAULT NOW(),
updated_at TIMESTAMP DEFAULT NOW(),
FOREIGN KEY (chain_id, address) REFERENCES contracts(chain_id, address)
)
```
**Status Transitions**:
- `pending` → `processing` → `verified` or `failed`
- Webhook/notification on status change (optional)
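The allowed transitions can be enforced with a small lookup table. This is a sketch only: `advance` and `InvalidTransition` are hypothetical names, and `partially_verified` is omitted because the specification does not state which transitions lead to it.

```python
TRANSITIONS = {
    "pending": {"processing"},
    "processing": {"verified", "failed"},
    "verified": set(),  # terminal
    "failed": set(),  # terminal; a retry creates a new verification record
}


class InvalidTransition(ValueError):
    """Raised when a status change is not permitted."""


def advance(current: str, new: str) -> str:
    """Return `new` if the transition is allowed, otherwise raise."""
    allowed = TRANSITIONS.get(current)
    if allowed is None:
        raise InvalidTransition(f"unknown status: {current}")
    if new not in allowed:
        raise InvalidTransition(f"{current} -> {new} is not allowed")
    return new
```

Treating retries as new records is an assumption that fits the verification history described elsewhere in this document.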
## Build Artifact Storage
### Artifact Types
**Artifacts to Store**:
1. **Source Code**: Original submitted source files
2. **Standard JSON Input**: Compiler input
3. **Compiler Output**: Full compiler JSON output
4. **ABI**: Extracted ABI
5. **Bytecode**: Creation and runtime bytecode
6. **Metadata**: Compiler metadata
### Storage Strategy
**Immutable Storage**:
- Use S3-compatible storage (AWS S3, MinIO, etc.)
- Immutable after verification (no updates)
- Versioned storage if updates needed
**Storage Path Structure**:
```
contracts/{chain_id}/{address}/verification_{id}/
- source_code.sol
- standard_json_input.json
- compiler_output.json
- abi.json
- bytecode.txt
- metadata.json
```
**Database Reference**:
- Store artifact storage path in database
- Link to contract record
### Access Control
**Public Access**:
- Verified contracts: Public read access
- Source code: Public read access
- Artifacts: Public read access
**Private Access**:
- Pending verifications: Owner only
- Failed verifications: Owner only (optional public)
## Sourcify Integration
### Sourcify API
**Endpoint**: `GET /api/v1/verify/{chain_id}/{address}`
**Process**:
1. Query Sourcify API for contract verification
2. Retrieve source files and metadata
3. Verify match with deployed bytecode
4. Store in our database if match
**Benefits**:
- Leverage existing verified contracts
- Automatic verification for popular contracts
- Reduces manual verification workload
### Sourcify Format
**Structure**:
```
contracts/
- {chain_id}/
- {address}/
- metadata.json
- sources/
- Contract.sol
```
**Metadata Format**:
- Compiler version
- Settings
- Source file mapping
## Multi-Compiler Version Support
### Supported Compilers
**Solidity**:
- Versions: 0.4.x through latest
- Multiple versions per contract (updates)
**Vyper**:
- Versions: 0.1.x through latest
- Similar workflow to Solidity
### Version Compatibility
**Handling**:
- Support multiple verification attempts with different versions
- Store all verification attempts (history)
- Mark latest successful verification as active
**Database Schema**:
```sql
contract_verifications (
-- ... fields ...
version INTEGER DEFAULT 1, -- Increment for each new verification
is_active BOOLEAN DEFAULT true -- Latest successful verification
)
```
## Error Handling
### Compilation Errors
**Error Types**:
- Syntax errors
- Type errors
- Import resolution errors
- Optimization errors
**Response**:
- Return detailed error messages to user
- Include file and line number
- Suggest fixes when possible
### Verification Failures
**Failure Reasons**:
- Bytecode mismatch
- Constructor arguments mismatch
- Library address mismatch
- Optimization settings mismatch
**Response**:
- Return specific mismatch reason
- Suggest correct settings if possible
- Allow retry with corrected input
## Performance Considerations
### Compilation Performance
**Optimization**:
- Cache compilation results (same source + settings)
- Parallel compilation for multiple contracts
- Compiler server pool for load distribution
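Caching by "same source + settings" needs a deterministic key. One possible sketch hashes the compiler version together with a canonical serialization of the standard JSON input; `compilation_cache_key` is a hypothetical helper, and the cache store itself is out of scope here.

```python
import hashlib
import json
from typing import Any, Dict


def compilation_cache_key(compiler_version: str, standard_json_input: Dict[str, Any]) -> str:
    """Derive a stable key from the compiler version and its full input."""
    # sort_keys makes the serialization independent of dict insertion order.
    canonical = json.dumps(standard_json_input, sort_keys=True, separators=(",", ":"))
    digest = hashlib.sha256()
    digest.update(compiler_version.encode("utf-8"))
    digest.update(b"\x00")  # separator between version and input
    digest.update(canonical.encode("utf-8"))
    return digest.hexdigest()
```

Since the standard JSON input already contains sources, optimizer settings, and EVM version, any change to those produces a different key.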
### Queue Management
**Queue System**:
- Use message queue (RabbitMQ, Kafka) for verification jobs
- Priority queue: User submissions before automated checks
- Rate limiting per user/IP
**Processing Time**:
- Target: < 30 seconds for simple contracts
- Target: < 5 minutes for complex contracts
- Timeout: 10 minutes maximum
## Security Considerations
### Source Code Validation
**Validation**:
- Validate source code size (max 10MB)
- Sanitize input to prevent injection attacks
- Validate compiler version (whitelist known versions)
### Artifact Storage Security
**Access Control**:
- Verify ownership before allowing updates
- Audit log all verification submissions
- Rate limit submissions per user/IP
## API Endpoints
### Submit Verification
`POST /api/v1/contracts/{address}/verify`
### Check Status
`GET /api/v1/contracts/{address}/verification/{verification_id}`
### Get Verified Contract
`GET /api/v1/contracts/{address}`
### List Verification History
`GET /api/v1/contracts/{address}/verifications`
## References
- Indexer Architecture: See `indexer-architecture.md`
- Data Models: See `data-models.md`
- Database Schema: See `../database/postgres-schema.md`
- API Specification: See `../api/rest-api.md`