feat: GPU/TensorCore integration — TensorFlow backend, GPU-accelerated reasoning, training, and memory
- New fusionagi/gpu/ module with TensorBackend protocol abstraction
  - TensorFlowBackend: GPU-accelerated ops with TensorCore mixed-precision
  - NumPyBackend: CPU fallback (always available, no extra deps)
  - Auto-selects best available backend at runtime
- GPU-accelerated operations:
  - Cosine similarity matrix (batched, XLA-compiled)
  - Multi-head attention for consensus scoring
  - Batch hypothesis scoring on GPU
  - Semantic similarity search (pairwise, nearest-neighbor, deduplication)
- New TensorFlowAdapter (fusionagi/adapters/):
  - LLMAdapter for local TF/Keras model inference
  - TensorCore mixed-precision support
  - GPU-accelerated embedding synthesis fallback
- Reasoning pipeline integration:
  - gpu_scoring.py: drop-in GPU replacement for multi_path scoring
  - Super Big Brain: use_gpu config flag, GPU scoring when available
- Memory integration:
  - gpu_search.py: GPU-accelerated semantic search for SemanticGraphMemory
- Self-improvement integration:
  - gpu_training.py: gradient-based heuristic weight optimization
  - Reflective memory training loop with loss tracking
- Dependencies: gpu extra (tensorflow>=2.16, numpy>=1.26)
- 64 new tests (276 total), all passing
- Architecture spec: docs/gpu_tensorcore_integration.md

Co-Authored-By: Nakamoto, S <defi@defi-oracle.io>
docs/gpu_tensorcore_integration.md (new file, 105 lines)
# GPU / TensorCore Integration — Architecture Spec

## Overview

FusionAGI integrates GPU-accelerated compute via TensorFlow, CUDA TensorCores, and JAX
to transform reasoning, similarity scoring, consensus, and training from CPU-bound
symbolic operations into massively parallel tensor operations.
## Design Principles

1. **Optional dependency** — GPU support is an extra (`pip install fusionagi[gpu]`).
   All GPU-accelerated code paths have CPU fallbacks.
2. **Module boundary** — GPU compute lives in `fusionagi/gpu/` (new module). Other modules
   import from `fusionagi.gpu` only when GPU acceleration is needed.
3. **Backend abstraction** — `TensorBackend` protocol abstracts TensorFlow, JAX, and
   pure-NumPy backends. The system auto-selects the best available backend.
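The protocol-plus-fallback pattern can be sketched as follows. This is a minimal illustration, assuming a `typing.Protocol` with hypothetical method names (`matmul`, `cosine_similarity`); the real `TensorBackend` surface may differ:

```python
from typing import Protocol, runtime_checkable
import numpy as np

@runtime_checkable
class TensorBackend(Protocol):
    """Illustrative backend protocol; method names are assumptions."""
    name: str

    def matmul(self, a: np.ndarray, b: np.ndarray) -> np.ndarray: ...
    def cosine_similarity(self, a: np.ndarray, b: np.ndarray) -> np.ndarray: ...

class NumPyBackend:
    """CPU fallback: always available, no extra dependencies."""
    name = "numpy"

    def matmul(self, a, b):
        return a @ b

    def cosine_similarity(self, a, b):
        # Row-normalize, then one matmul yields the full similarity matrix.
        a = a / np.linalg.norm(a, axis=-1, keepdims=True)
        b = b / np.linalg.norm(b, axis=-1, keepdims=True)
        return a @ b.T

backend = NumPyBackend()
assert isinstance(backend, TensorBackend)  # structural (duck-typed) check
```

A hypothetical `TensorFlowBackend` would implement the same methods with `tf.linalg` ops, so callers never branch on which backend they received.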
## Module: `fusionagi/gpu/`

```
fusionagi/gpu/
├── __init__.py           # Public API, auto-detection
├── backend.py            # TensorBackend protocol + backend registry
├── tensorflow_ops.py     # TF/TensorCore similarity, attention, scoring
├── tensor_similarity.py  # GPU-accelerated embedding similarity
├── tensor_attention.py   # Multi-head attention for consensus
├── tensor_scoring.py     # Batch hypothesis scoring on GPU
└── training.py           # GPU-accelerated training loop for self-improvement
```
## Integration Points

### 1. Reasoning Pipeline (`reasoning/`)

**Current:** `multi_path.py` scores hypotheses sequentially with word-overlap heuristics.
**GPU:** Batch embed hypotheses → cosine similarity matrix on GPU → parallel scoring.

**Current:** `consensus_engine.py` uses Jaccard word overlap for similarity.
**GPU:** Dense embedding vectors + GPU cosine similarity for semantic matching.
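The batched similarity step can be sketched on CPU: one matrix multiply over row-normalized embeddings produces the full hypothesis-by-hypothesis cosine matrix that the sequential scorer would compute pairwise. Function name and shapes are illustrative, not the actual `gpu_scoring.py` API:

```python
import numpy as np

def cosine_similarity_matrix(embeddings: np.ndarray) -> np.ndarray:
    """All-pairs cosine similarity in one shot (CPU sketch of the GPU kernel).

    embeddings: (n_hypotheses, dim), one row per hypothesis embedding.
    Returns an (n, n) matrix; entry [i, j] is cos(e_i, e_j).
    """
    norms = np.linalg.norm(embeddings, axis=1, keepdims=True)
    unit = embeddings / np.clip(norms, 1e-12, None)  # guard zero vectors
    return unit @ unit.T  # one matmul replaces n*n sequential comparisons

# Three toy "hypothesis embeddings": the first two point the same way.
emb = np.array([[1.0, 0.0], [2.0, 0.0], [0.0, 1.0]])
sim = cosine_similarity_matrix(emb)
# sim[0, 1] ≈ 1.0 (parallel), sim[0, 2] ≈ 0.0 (orthogonal)
```

On GPU the same contraction is a single `tf.matmul` that XLA can fuse with the normalization, which is what makes batching worthwhile.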
### 2. Super Big Brain (`core/super_big_brain.py`)

**Current:** `generate_and_score_parallel` uses ThreadPoolExecutor.
**GPU:** Tensor-parallel scoring with batched dot-products on TensorCore.
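The batched replacement for per-thread scoring can be illustrated with a single einsum contraction; on GPU the same contraction lowers to one TensorCore-eligible matmul. Names and shapes here are assumptions for illustration:

```python
import numpy as np

def score_hypotheses(query: np.ndarray, hypotheses: np.ndarray) -> np.ndarray:
    """Score every hypothesis against the query in one batched dot-product.

    query: (dim,); hypotheses: (n, dim). Returns (n,) scores.
    One contraction replaces n independently scheduled scoring tasks.
    """
    return np.einsum("nd,d->n", hypotheses, query)

query = np.array([1.0, 2.0])
hyps = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
scores = score_hypotheses(query, hyps)  # → [1., 2., 3.]
best = int(np.argmax(scores))          # → 2
```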
### 3. Memory Subsystem (`memory/`)

**Current:** `semantic_graph.py` is pure Python dict/adjacency list.
**GPU:** Vector similarity search via GPU-accelerated embedding lookup.
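A CPU sketch of the nearest-neighbor lookup that `gpu_search.py` could accelerate: stack node embeddings into one matrix, score all of them against the query at once, and take the top-k. The function name and `argpartition` top-k trick are illustrative assumptions, not the module's actual API:

```python
import numpy as np

def nearest_neighbors(query: np.ndarray, store: np.ndarray, k: int) -> np.ndarray:
    """Indices of the k stored embeddings most similar to the query.

    store: (n, dim) node-embedding matrix; query: (dim,).
    argpartition finds the top-k without sorting all n similarities.
    """
    sims = (store / np.linalg.norm(store, axis=1, keepdims=True)) @ (
        query / np.linalg.norm(query)
    )
    top_k = np.argpartition(-sims, k - 1)[:k]
    return top_k[np.argsort(-sims[top_k])]  # order the k hits by similarity

store = np.array([[1.0, 0.0], [0.0, 1.0], [0.9, 0.1]])
hits = nearest_neighbors(np.array([1.0, 0.0]), store, k=2)
# hits → [0, 2]: exact match first, then the near-parallel vector
```

The same pattern also covers pairwise deduplication: threshold the full similarity matrix instead of taking a top-k slice.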
### 4. Self-Improvement (`self_improvement/`)

**Current:** `AutoTrainer` suggests heuristic updates, no actual neural training.
**GPU:** GPU-backed fine-tuning loops, gradient-based heuristic optimization.
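Gradient-based heuristic optimization can be sketched on CPU as plain gradient descent over linear heuristic weights; a GPU version would run the same loop under `tf.GradientTape`. The feature/target framing and squared-error loss are illustrative assumptions about what `gpu_training.py` optimizes:

```python
import numpy as np

def train_heuristic_weights(features: np.ndarray, targets: np.ndarray,
                            lr: float = 0.1, steps: int = 200) -> np.ndarray:
    """Fit linear heuristic weights by gradient descent on mean squared error.

    features: (n, d) per-hypothesis feature matrix.
    targets:  (n,) observed quality scores to regress onto.
    """
    w = np.zeros(features.shape[1])
    for _ in range(steps):
        pred = features @ w
        grad = 2 * features.T @ (pred - targets) / len(targets)  # dMSE/dw
        w -= lr * grad
    return w

X = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
y = X @ np.array([0.7, 0.3])  # synthetic targets with known true weights
w = train_heuristic_weights(X, y)
# w converges toward [0.7, 0.3]
```

Tracking the loss per step (as the reflective training loop does) is one extra line inside the loop.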
### 5. Adapter Layer (`adapters/`)

**New:** `TensorFlowAdapter` — local model inference via TF/Keras with TensorCore.
## Data Flow

```
User Prompt
    │
    ▼
Decomposition (CPU — symbolic)
    │
    ▼
Embedding (GPU — TF/TensorCore)
    │
    ├──► Similarity Matrix (GPU — batched cosine)
    │              │
    │              ▼
    │         Consensus Scoring (GPU — attention)
    │
    ├──► Hypothesis Scoring (GPU — batched inference)
    │
    ▼
Recomposition (CPU — symbolic + GPU scores)
    │
    ▼
Final Response
```
## Backend Selection

```python
from fusionagi.gpu import get_backend, TensorBackend

backend: TensorBackend = get_backend()  # Auto-selects best available
# Returns: TensorFlowBackend > NumPyBackend (fallback)
```
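The selection logic itself is not specified here; one plausible sketch, assuming availability is decided by whether `tensorflow` is importable, is:

```python
import importlib.util

def get_backend() -> str:
    """Illustrative auto-selection, mirroring the documented fallback order.

    Prefers TensorFlow when the package is installed; otherwise falls back
    to the always-available NumPy backend. Returns a name as a stand-in
    for the real backend object.
    """
    if importlib.util.find_spec("tensorflow") is not None:
        return "TensorFlowBackend"
    return "NumPyBackend"

chosen = get_backend()
```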
## Dependencies

```toml
[project.optional-dependencies]
gpu = ["tensorflow>=2.16", "numpy>=1.26"]
```

TensorFlow 2.16+ includes:

- TensorCore (FP16/BF16 mixed-precision) via `tf.keras.mixed_precision`
- XLA compilation for GPU kernel fusion
- `tf.linalg` for batched linear algebra
- TensorRT integration for inference optimization