- New `fusionagi/gpu/` module with `TensorBackend` protocol abstraction
  - `TensorFlowBackend`: GPU-accelerated ops with TensorCore mixed-precision
  - `NumPyBackend`: CPU fallback (always available, no extra deps)
  - Auto-selects best available backend at runtime
- GPU-accelerated operations:
  - Cosine similarity matrix (batched, XLA-compiled)
  - Multi-head attention for consensus scoring
  - Batch hypothesis scoring on GPU
  - Semantic similarity search (pairwise, nearest-neighbor, deduplication)
- New `TensorFlowAdapter` (`fusionagi/adapters/`):
  - `LLMAdapter` for local TF/Keras model inference
  - TensorCore mixed-precision support
  - GPU-accelerated embedding synthesis fallback
- Reasoning pipeline integration:
  - `gpu_scoring.py`: drop-in GPU replacement for `multi_path` scoring
  - Super Big Brain: `use_gpu` config flag, GPU scoring when available
- Memory integration:
  - `gpu_search.py`: GPU-accelerated semantic search for `SemanticGraphMemory`
- Self-improvement integration:
  - `gpu_training.py`: gradient-based heuristic weight optimization
  - Reflective memory training loop with loss tracking
- Dependencies: `gpu` extra (`tensorflow>=2.16`, `numpy>=1.26`)
- 64 new tests (276 total), all passing
- Architecture spec: docs/gpu_tensorcore_integration.md

Co-Authored-By: Nakamoto, S <defi@defi-oracle.io>
# GPU / TensorCore Integration — Architecture Spec

## Overview
FusionAGI integrates GPU-accelerated compute via TensorFlow, CUDA TensorCores, and JAX to transform reasoning, similarity scoring, consensus, and training from CPU-bound symbolic operations into massively parallel tensor operations.
## Design Principles
- **Optional dependency** — GPU support is an extra (`pip install fusionagi[gpu]`). All GPU-accelerated code paths have CPU fallbacks.
- **Module boundary** — GPU compute lives in `fusionagi/gpu/` (new module). Other modules import from `fusionagi.gpu` only when GPU acceleration is needed.
- **Backend abstraction** — the `TensorBackend` protocol abstracts TensorFlow, JAX, and pure-NumPy backends. The system auto-selects the best available backend; a sketch of the protocol follows this list.
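A minimal sketch of what the protocol surface could look like; the method names here (`cosine_similarity`, `top_k`) are illustrative assumptions, not the actual `fusionagi.gpu` API:

```python
from typing import Any, Protocol

class TensorBackend(Protocol):
    """Common surface that TensorFlow, JAX, and NumPy backends implement."""

    name: str  # e.g. "tensorflow", "jax", "numpy"

    def cosine_similarity(self, a: Any, b: Any) -> Any:
        """All-pairs cosine similarity between two embedding matrices."""
        ...

    def top_k(self, scores: Any, k: int) -> Any:
        """Values and indices of the k highest scores."""
        ...
```

A protocol (rather than a shared base class) keeps NumPy-only installs free of any TensorFlow import.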
## Module: `fusionagi/gpu/`
```
fusionagi/gpu/
├── __init__.py           # Public API, auto-detection
├── backend.py            # TensorBackend protocol + backend registry
├── tensorflow_ops.py     # TF/TensorCore similarity, attention, scoring
├── tensor_similarity.py  # GPU-accelerated embedding similarity
├── tensor_attention.py   # Multi-head attention for consensus
├── tensor_scoring.py     # Batch hypothesis scoring on GPU
└── training.py           # GPU-accelerated training loop for self-improvement
```
## Integration Points
### 1. Reasoning Pipeline (`reasoning/`)

**Current:** `multi_path.py` scores hypotheses sequentially with word-overlap heuristics.
**GPU:** Batch-embed hypotheses → cosine similarity matrix on GPU → parallel scoring.

**Current:** `consensus_engine.py` uses Jaccard word overlap for similarity.
**GPU:** Dense embedding vectors + GPU cosine similarity for semantic matching, as sketched below.
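A minimal sketch of the batched similarity step, assuming hypothesis embeddings arrive as a `[batch, dim]` float tensor (the function name is illustrative, not the `gpu_scoring.py` API):

```python
import tensorflow as tf

@tf.function(jit_compile=True)  # XLA-compiles the kernel for GPU fusion
def cosine_similarity_matrix(embeddings: tf.Tensor) -> tf.Tensor:
    """All-pairs cosine similarity: [batch, dim] -> [batch, batch]."""
    normed = tf.math.l2_normalize(embeddings, axis=-1)
    return tf.matmul(normed, normed, transpose_b=True)
```

One matmul replaces a quadratic number of sequential word-overlap comparisons, and the dense embeddings make the match semantic rather than lexical.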
### 2. Super Big Brain (`core/super_big_brain.py`)

**Current:** `generate_and_score_parallel` uses a `ThreadPoolExecutor`.
**GPU:** Tensor-parallel scoring with batched dot products on TensorCores (see the sketch below).
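As a sketch, one batched matrix-vector product replaces the per-hypothesis thread pool; the names and shapes here are assumptions:

```python
import tensorflow as tf

def score_hypotheses_batched(hypotheses: tf.Tensor, goal: tf.Tensor) -> tf.Tensor:
    """Score [n, dim] hypothesis embeddings against one [dim] goal embedding
    in a single GPU launch instead of n ThreadPoolExecutor tasks."""
    normed_h = tf.math.l2_normalize(hypotheses, axis=-1)
    normed_g = tf.math.l2_normalize(goal, axis=-1)
    return tf.linalg.matvec(normed_h, normed_g)  # [n] cosine scores
```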
### 3. Memory Subsystem (`memory/`)

**Current:** `semantic_graph.py` is a pure-Python dict/adjacency list.
**GPU:** Vector similarity search via GPU-accelerated embedding lookup, sketched below.
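A sketch of the lookup, assuming node embeddings are stacked into one `[n, dim]` matrix; the helper name is hypothetical, not the `gpu_search.py` API:

```python
import tensorflow as tf

def semantic_top_k(query: tf.Tensor, node_embeddings: tf.Tensor, k: int = 5):
    """Return the k graph nodes most similar to a [dim] query embedding."""
    sims = tf.linalg.matvec(
        tf.math.l2_normalize(node_embeddings, axis=-1),
        tf.math.l2_normalize(query, axis=-1),
    )
    return tf.math.top_k(sims, k=k)  # (similarities, node indices)
```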
### 4. Self-Improvement (`self_improvement/`)

**Current:** `AutoTrainer` suggests heuristic updates; there is no actual neural training.
**GPU:** GPU-backed fine-tuning loops and gradient-based heuristic optimization (see the sketch below).
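A minimal sketch of a gradient step for heuristic weights using `tf.GradientTape`; the MSE loss and the four-weight layout are assumptions, not the `gpu_training.py` internals:

```python
import tensorflow as tf

heuristic_weights = tf.Variable(tf.ones([4]))  # one weight per scoring heuristic
optimizer = tf.keras.optimizers.Adam(learning_rate=1e-2)

def training_step(features: tf.Tensor, targets: tf.Tensor) -> tf.Tensor:
    """One optimization step: fit the weighted heuristic mix to target scores."""
    with tf.GradientTape() as tape:
        predicted = tf.linalg.matvec(features, heuristic_weights)  # [n] scores
        loss = tf.reduce_mean(tf.square(predicted - targets))      # MSE
    grads = tape.gradient(loss, [heuristic_weights])
    optimizer.apply_gradients(zip(grads, [heuristic_weights]))
    return loss  # tracked per step, as in the reflective training loop
```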
### 5. Adapter Layer (`adapters/`)

**New:** `TensorFlowAdapter` — local model inference via TF/Keras with TensorCore.
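A sketch of the adapter's inference path; the class shape and `embed` method are illustrative, not the shipped `TensorFlowAdapter`:

```python
import tensorflow as tf

class TensorFlowAdapter:
    """Illustrative wrapper for local inference with a TF/Keras model."""

    def __init__(self, model: tf.keras.Model):
        self.model = model

    def embed(self, token_ids: tf.Tensor) -> tf.Tensor:
        # training=False -> pure inference; the cast keeps downstream math in
        # FP32 even when a mixed-precision policy runs the matmuls in FP16.
        return tf.cast(self.model(token_ids, training=False), tf.float32)
```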
## Data Flow
```
User Prompt
     │
     ▼
Decomposition (CPU — symbolic)
     │
     ▼
Embedding (GPU — TF/TensorCore)
     │
     ├──► Similarity Matrix (GPU — batched cosine)
     │          │
     │          ▼
     │    Consensus Scoring (GPU — attention)
     │
     ├──► Hypothesis Scoring (GPU — batched inference)
     │
     ▼
Recomposition (CPU — symbolic + GPU scores)
     │
     ▼
Final Response
```
## Backend Selection
```python
from fusionagi.gpu import get_backend, TensorBackend

backend: TensorBackend = get_backend()  # Auto-selects best available
# Preference order: TensorFlowBackend > NumPyBackend (fallback)
```
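In outline, the selection is a try-import cascade. This is a sketch, and the `fusionagi.gpu.backend` import path is an assumption based on the module layout above:

```python
from fusionagi.gpu.backend import NumPyBackend, TensorFlowBackend

def get_backend():
    """Prefer TensorFlow when it imports cleanly; otherwise fall back to NumPy."""
    try:
        import tensorflow  # noqa: F401  (heavy import, attempted once at startup)
        return TensorFlowBackend()
    except ImportError:
        return NumPyBackend()
```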
## Dependencies
```toml
[project.optional-dependencies]
gpu = ["tensorflow>=2.16", "numpy>=1.26"]
```
TensorFlow 2.16+ includes:
- TensorCore (FP16/BF16 mixed-precision) via `tf.keras.mixed_precision`
- XLA compilation for GPU kernel fusion
- `tf.linalg` for batched linear algebra
- TensorRT integration for inference optimization
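A minimal sketch of opting in at startup; both calls are standard TensorFlow API, though gating the policy on a visible GPU is a choice of this sketch:

```python
import tensorflow as tf

# Confirm a CUDA device is visible before selecting the GPU backend.
gpus = tf.config.list_physical_devices("GPU")

# TensorCore mixed precision: FP16 compute with FP32 variables for stability.
if gpus:
    tf.keras.mixed_precision.set_global_policy("mixed_float16")
```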