Disaster Recovery Runbook

Overview

This runbook provides procedures for disaster recovery in the DeFi Oracle Meta Mainnet (ChainID 138) network.

Recovery Objectives

RTO (Recovery Time Objective)

  • Critical Services: 1 hour
  • Non-Critical Services: 4 hours
  • Full Recovery: 24 hours

RPO (Recovery Point Objective)

  • Chaindata: 24 hours
  • Configuration: 1 hour
  • Keys: Real-time (Key Vault)

Disaster Scenarios

Scenario 1: Complete Cluster Failure

Symptoms:

  • All pods unavailable
  • Cluster unresponsive
  • No network connectivity

Recovery Steps:

  1. Assess damage
  2. Restore infrastructure
  3. Restore chaindata from backups
  4. Restore configuration
  5. Restore keys from Key Vault
  6. Restart services
  7. Verify network operation
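
Step 7 can be scripted. A minimal sketch, assuming a JSON-RPC endpoint is reachable; the `RPC_URL` default below is a placeholder for the cluster's actual RPC ingress:

```shell
#!/bin/sh
# Sketch for step 7: confirm the restored network answers JSON-RPC.
# RPC_URL is a placeholder; point it at the cluster's RPC ingress.

# Pull the hex block number out of an eth_blockNumber response.
extract_block_number() {
  sed -n 's/.*"result" *: *"\(0x[0-9a-fA-F]*\)".*/\1/p'
}

check_network() {
  rpc_url="${1:-http://localhost:8545}"
  hex=$(curl -s -X POST "$rpc_url" \
    -H 'Content-Type: application/json' \
    -d '{"jsonrpc":"2.0","method":"eth_blockNumber","params":[],"id":1}' \
    | extract_block_number)
  [ -n "$hex" ] || { echo "RPC endpoint not responding" >&2; return 1; }
  echo "latest block: $((hex))"
}
```

A non-empty block number that advances between runs indicates the chain is producing blocks again.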

Scenario 2: Data Loss

Symptoms:

  • Chaindata corrupted
  • Blocks missing
  • Database errors

Recovery Steps:

  1. Stop affected services
  2. Restore from backup
  3. Verify data integrity
  4. Restart services
  5. Verify synchronization
  6. Monitor for issues
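
Step 3 (verify data integrity) is simplest when backups ship with checksums. A sketch, assuming each archive has a sibling `.sha256` file; that sidecar convention is an assumption about the backup layout, not a documented one:

```shell
#!/bin/sh
# Sketch for step 3: check a backup archive against its checksum
# before restoring it. The .sha256 sidecar file is an assumed convention.
verify_backup() {
  archive="$1"
  [ -f "$archive" ] || { echo "missing archive: $archive" >&2; return 1; }
  [ -f "$archive.sha256" ] || { echo "missing checksum: $archive.sha256" >&2; return 1; }
  # sha256sum -c expects to run in the directory the checksum was created in.
  ( cd "$(dirname "$archive")" && sha256sum -c "$(basename "$archive").sha256" )
}
```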

Scenario 3: Key Compromise

Symptoms:

  • Unauthorized transactions
  • Suspicious activity
  • Key exposure

Recovery Steps:

  1. Isolate affected components
  2. Rotate compromised keys
  3. Update validator set
  4. Update configuration
  5. Restart services
  6. Monitor for issues
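
If the validators run Besu's QBFT consensus (an assumption; substitute `ibft_proposeValidatorVote` for IBFT 2.0), step 3 can be done by voting the compromised validator out of the set. A sketch; the RPC URL and validator address are placeholders:

```shell
#!/bin/sh
# Sketch for step 3: vote a compromised validator out of the set.
# Assumes QBFT consensus; URL and address passed in are placeholders.

# Build the JSON-RPC payload; "false" votes to remove, "true" to add.
vote_payload() {
  printf '{"jsonrpc":"2.0","method":"qbft_proposeValidatorVote","params":["%s",%s],"id":1}' "$1" "$2"
}

propose_removal() {
  rpc_url="$1"; validator="$2"
  curl -s -X POST "$rpc_url" \
    -H 'Content-Type: application/json' \
    -d "$(vote_payload "$validator" false)"
}
```

Note that a majority of the remaining validators must cast the same vote before the validator set actually changes.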

Scenario 4: Network Partition

Symptoms:

  • Validators split into groups
  • Conflicting blocks
  • Network instability

Recovery Steps:

  1. Identify partition
  2. Stop minority partition
  3. Continue with majority
  4. Resolve conflicts
  5. Restart stopped validators
  6. Verify consensus
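
Step 1 can be approached by comparing each validator's peer count: nodes in the minority partition report fewer peers. A sketch; the pod naming assumes a `besu-validator` StatefulSet and should be adjusted to the actual deployment:

```shell
#!/bin/sh
# Sketch for step 1: compare peer counts across validator pods.
# Pod names assume a besu-validator StatefulSet; adjust as needed.

hex_to_dec() { printf '%d\n' "$(($1))"; }

peer_count() {
  pod="$1"
  hex=$(kubectl exec -n besu-network "$pod" -- \
    curl -s -X POST http://localhost:8545 \
    -H 'Content-Type: application/json' \
    -d '{"jsonrpc":"2.0","method":"net_peerCount","params":[],"id":1}' \
    | sed -n 's/.*"result" *: *"\(0x[0-9a-fA-F]*\)".*/\1/p')
  hex_to_dec "$hex"
}

# Usage: for i in 0 1 2 3; do
#   echo "besu-validator-$i: $(peer_count besu-validator-$i)"
# done
```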

Recovery Procedures

Infrastructure Recovery

  1. Restore Terraform State

    terraform init -backend-config="..."
    terraform plan
    terraform apply
    
  2. Restore Kubernetes Cluster

    # Restore from backup
    kubectl apply -f backup/cluster-backup.yaml
    
  3. Restore Network Configuration

    kubectl apply -f k8s/base/
    

Data Recovery

  1. Restore Chaindata

    ./scripts/backup/restore-chaindata.sh <backup-file> <pod-name>
    
  2. Restore Database

    # Restore Blockscout database
    kubectl exec -i -n besu-network blockscout-db-0 -- \
      psql -U blockscout -d blockscout < backup/blockscout-backup.sql
    
  3. Restore Configuration

    kubectl apply -f config/
    

Key Recovery

  1. Restore from Key Vault

    az keyvault secret show --vault-name defi-oracle-kv \
      --name validator-key-1 --query value -o tsv
    
  2. Update Kubernetes Secrets

    # Pipe through apply so the command also succeeds when the secret exists
    kubectl create secret generic besu-validator-keys \
      --from-literal=key-1=<key-from-vault> \
      -n besu-network --dry-run=client -o yaml | kubectl apply -f -
    
  3. Restart Validators

    kubectl rollout restart statefulset/besu-validator -n besu-network
    

Backup Procedures

Daily Backups

  1. Chaindata Backup

    ./scripts/backup/backup-chaindata.sh
    
  2. Database Backup

    kubectl exec -n besu-network blockscout-db-0 -- \
      pg_dump -U blockscout blockscout > backup/blockscout-$(date +%Y%m%d).sql
    
  3. Configuration Backup

    kubectl get all -n besu-network -o yaml > backup/cluster-$(date +%Y%m%d).yaml
    

Backup Retention

  • Daily: 7 days
  • Weekly: 4 weeks
  • Monthly: 12 months
  • Yearly: 7 years
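
The retention tiers above can be enforced with a pruning helper. A minimal sketch, assuming each tier is kept in its own directory (an assumption about layout); the daily tier uses a 7-day window, and the weekly, monthly, and yearly tiers would use roughly -mtime +28, +365, and +2555:

```shell
#!/bin/sh
# Sketch: delete backups in one tier directory older than its
# retention window. One-directory-per-tier layout is an assumption.
prune_tier() {
  dir="${1:?backup dir required}"
  days="${2:?retention days required}"
  find "$dir" -maxdepth 1 -type f -mtime "+$days" -print -delete
}

# Usage:
#   prune_tier /backups/daily 7
#   prune_tier /backups/weekly 28
```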

Testing Recovery

Test Procedures

  1. Test Backup Restoration

    • Monthly backup restoration test
    • Verify data integrity
    • Document results
  2. Test Disaster Recovery

    • Quarterly disaster recovery drill
    • Simulate failure scenarios
    • Measure recovery time
    • Document lessons learned
  3. Test Key Rotation

    • Monthly key rotation test
    • Verify key update process
    • Document results

Monitoring Recovery

Recovery Metrics

  • Recovery time
  • Data loss
  • Service availability
  • Error rates

Post-Recovery Checklist

  • All services running
  • Network operational
  • Blocks being produced
  • RPC endpoints responding
  • Monitoring working
  • Alerts configured
  • Documentation updated
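
The "network operational" and "blocks being produced" items can be checked together by sampling the chain head twice. A sketch; `RPC_URL` is a placeholder for the JSON-RPC endpoint:

```shell
#!/bin/sh
# Sketch: checklist items "network operational" / "blocks being produced".
# The RPC URL passed in is a placeholder for your JSON-RPC endpoint.

block_number() {
  curl -s -X POST "$1" \
    -H 'Content-Type: application/json' \
    -d '{"jsonrpc":"2.0","method":"eth_blockNumber","params":[],"id":1}' \
    | sed -n 's/.*"result" *: *"\(0x[0-9a-fA-F]*\)".*/\1/p'
}

# Succeeds only if the head advanced between two samples.
blocks_advancing() {
  rpc_url="$1"; wait_s="${2:-15}"
  b1=$(block_number "$rpc_url")
  sleep "$wait_s"
  b2=$(block_number "$rpc_url")
  [ -n "$b1" ] && [ -n "$b2" ] && [ "$((b2))" -gt "$((b1))" ]
}
```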

Contacts

Resources