# Grafana Dashboards

This directory contains Grafana dashboard JSON files for monitoring the DBIS Core Banking System.

## Dashboard List

### 1. System Health Dashboard (`system-health.json`)

**Purpose**: Overall system health and status monitoring

**Key Metrics**:
- Service health status
- Overall system availability
- Error rates (5xx, 4xx)
- CPU and memory usage by service
- Database connection pool status
- Active sessions
- Queue lengths

**Refresh Interval**: 30s

**Tags**: `system`, `health`, `overview`

---

### 2. API Performance Dashboard (`api-performance.json`)

**Purpose**: API endpoint performance and latency monitoring

**Key Metrics**:
- Request rate by endpoint
- Response time percentiles (P50, P95, P99)
- Error rate by endpoint
- Top endpoints by request volume
- Request distribution by method and status code
- SLO compliance (availability, latency)
- Request duration distribution

**Refresh Interval**: 30s

**Tags**: `api`, `performance`, `latency`

---

### 3. Ledger Operations Dashboard (`ledger-operations.json`)

**Purpose**: Ledger entry and settlement operations monitoring

**Key Metrics**:
- Ledger entry rate by ledger ID
- Ledger entry amount by ledger and currency
- Settlement rate by status
- Settlement duration percentiles
- Outbox queue status and processing rate
- Balance updates by currency
- Failed posting operations
- Total ledger entries, active accounts, pending settlements

**Refresh Interval**: 30s

**Tags**: `ledger`, `transactions`, `settlement`

---

### 4. Security & Compliance Dashboard (`security-compliance.json`)

**Purpose**: Security event and compliance monitoring

**Key Metrics**:
- Authentication failures by reason
- Authorization failures by resource and action
- Sanctions screening results
- AML risk score distribution
- Audit log events by type
- Policy violations by type
- Failed transactions by reason
- Encryption key rotation status
- Data access events (PII, Financial)
- Security incidents and compliance violations (24h)

**Refresh Interval**: 30s

**Tags**: `security`, `compliance`, `audit`

---

## Installation

### Import Dashboards to Grafana

1. **Via Grafana UI**:
   - Navigate to Grafana → Dashboards → Import
   - Upload the JSON file or paste the JSON content
   - Select the Prometheus data source and adjust settings
   - Save the dashboard

2. **Via Grafana Provisioning**:

   Create a provisioning configuration file:

   ```yaml
   # grafana/provisioning/dashboards/dashboards.yml
   apiVersion: 1

   providers:
     - name: 'DBIS Core Dashboards'
       orgId: 1
       folder: 'DBIS Core'
       type: file
       disableDeletion: false
       updateIntervalSeconds: 10
       allowUiUpdates: true
       options:
         path: /etc/grafana/dashboards
   ```

   Copy the dashboard files to the provisioned path:

   ```bash
   cp dbis_core/monitoring/grafana/dashboards/*.json /etc/grafana/dashboards/
   ```

3. **Via Grafana API**:

   ```bash
   # Import dashboard via API
   curl -X POST \
     -H "Content-Type: application/json" \
     -H "Authorization: Bearer " \
     -d @system-health.json \
     http://grafana:3000/api/dashboards/db
   ```

   The `Authorization` header needs a Grafana API or service account token after `Bearer`. The `/api/dashboards/db` endpoint expects the dashboard model wrapped in an envelope object, for example `{"dashboard": {...}, "overwrite": true}`, so a bare exported dashboard file must be wrapped before it is posted.

---

## Configuration

### Data Source Configuration

Ensure a Prometheus data source is configured in Grafana:

1. Navigate to Configuration → Data Sources (Connections → Data sources in newer Grafana versions)
2. Add a Prometheus data source
3. Set the URL to `http://prometheus:9090`
4. Set the scrape interval and query timeout to match your Prometheus configuration

### Variable Configuration

Some dashboards may use variables for filtering:

- `$datasource`: Prometheus data source
- `$service`: Service name filter (optional)
- `$environment`: Environment filter (optional)

---

## Metrics Requirements

### Prometheus Metrics

These dashboards expect the following Prometheus metrics to be exported:

#### System Metrics
- `up{job="dbis-core"}`
- `process_cpu_seconds_total{job="dbis-core"}`
- `process_resident_memory_bytes{job="dbis-core"}`
- `db_pool_size{job="dbis-core"}`
- `db_pool_active{job="dbis-core"}`
- `db_pool_idle{job="dbis-core"}`

#### API Metrics
- `http_requests_total{job="dbis-core",endpoint,method,status}`
- `http_request_duration_seconds_bucket{job="dbis-core",endpoint,le}`

#### Ledger Metrics
- `ledger_entries_total{ledger_id}`
- `ledger_entry_amount_total{ledger_id,currency_code}`
- `settlement_total{status}`
- `settlement_duration_seconds_bucket{le}`
- `dbis_outbox_queue_length`
- `outbox_processed_total{status}`
- `balance_updates_total{currency_code}`
- `ledger_posting_errors_total{error_type}`

#### Security Metrics
- `authentication_failures_total{reason}`
- `authorization_failures_total{resource,action}`
- `sanctions_screening_total{result}`
- `aml_risk_score_bucket{le}`
- `audit_log_events_total{event_type}`
- `policy_violations_total{policy_type,violation_type}`
- `transaction_failures_total{reason}`
- `data_access_events_total{data_type,operation}`
- `security_incidents_total`
- `compliance_violations_total`

---

## Alerting

### Recommended Alerts

Based on these dashboards, configure alerts for:

1. **System Health**:
   - Service down (`up{job="dbis-core"} == 0`)
   - High 5xx error ratio, above 5% of requests (`sum(rate(http_requests_total{job="dbis-core",status=~"5.."}[5m])) / sum(rate(http_requests_total{job="dbis-core"}[5m])) > 0.05`)
   - High memory usage, above 8 GB (`process_resident_memory_bytes{job="dbis-core"} > 8e9`)
   - Database connection pool near exhaustion, 90% or more of connections active (`db_pool_active >= db_pool_size * 0.9`)

2. **API Performance**:
   - P95 latency > 500ms
   - Availability < 99.9%
   - Error rate > 0.1%

3. **Ledger Operations**:
   - Outbox queue length > 1000
   - Settlement failure rate > 1%
   - Failed posting operations > 10/min

4. **Security & Compliance**:
   - Authentication failure rate > 5%
   - Sanctions match detected
   - AML risk score > 80
   - Security incident detected
   - Compliance violation detected

---

## References

- Metrics Specification: `explorer-monorepo/docs/specs/observability/metrics-monitoring.md`
- Tracing Dashboard: `smom-dbis-138/monitoring/grafana/dashboards/tracing.json`
- OpenTelemetry Configuration: `smom-dbis-138/monitoring/opentelemetry/otel-collector.yaml`
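The "Via Grafana API" import method under Installation can be scripted for every dashboard file at once. Below is a minimal Bash sketch, assuming each `*.json` file holds a bare dashboard model (as exported from the Grafana UI) and that a token is supplied in the environment. The names `GRAFANA_URL`, `GRAFANA_TOKEN`, `DASHBOARD_DIR`, `wrap_dashboard`, and `import_all` are hypothetical and not part of this repository. The script wraps each file in the `{"dashboard": ..., "overwrite": true}` envelope that `/api/dashboards/db` expects before posting it.

```bash
#!/usr/bin/env bash
# Sketch only: bulk-import DBIS dashboards through the Grafana HTTP API.
# Assumed (hypothetical) inputs: GRAFANA_URL, GRAFANA_TOKEN, DASHBOARD_DIR.
set -euo pipefail

# Wrap a bare dashboard model file in the envelope /api/dashboards/db expects.
wrap_dashboard() {
  local file="$1"
  printf '{"dashboard":%s,"overwrite":true}' "$(cat "$file")"
}

# POST every dashboard JSON file in DASHBOARD_DIR to Grafana.
import_all() {
  local url="${GRAFANA_URL:-http://grafana:3000}"
  local dir="${DASHBOARD_DIR:-dbis_core/monitoring/grafana/dashboards}"
  # Abort early with a message if no token is provided.
  : "${GRAFANA_TOKEN:?set GRAFANA_TOKEN to a Grafana API or service account token}"

  local f
  for f in "$dir"/*.json; do
    [[ -e "$f" ]] || continue  # skip the literal pattern when no files match
    echo "Importing $f"
    # --fail makes curl return non-zero on HTTP errors, stopping the loop under set -e.
    wrap_dashboard "$f" | curl -sS --fail -X POST \
      -H "Content-Type: application/json" \
      -H "Authorization: Bearer ${GRAFANA_TOKEN}" \
      --data-binary @- \
      "$url/api/dashboards/db"
    echo
  done
}
```

Because the script enables `set -euo pipefail`, run it in a child shell rather than sourcing it into an interactive session, for example `bash -c 'source ./import-dashboards.sh && import_all'` with `GRAFANA_TOKEN` exported (the file name is only an example). If an exported file carries a numeric `id` from another Grafana instance, the import may be rejected, and clearing that field first is a common workaround.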