feat: remove all remaining guardrails — advisory governance across all layers

18 changes implementing full advisory philosophy:

1. Safety Head prompt: prevention mandate → advisory observation
2. Native Reasoning: Safety claims conditional on actual risk signals
3. File Tool: path scope advisory (log + proceed)
4. HTTP Tool: SSRF protection advisory (log + proceed)
5. File Size Cap: configurable (default unlimited)
6. PII Detection: integrated with AdaptiveEthics
7. Embodiment: force limit advisory (log, don't clamp)
8. Embodiment: workspace bounds advisory (log, don't reject)
9. API Rate Limiter: advisory (log, don't hard 429)
10. MAA Gate: GovernanceMode.ADVISORY default
11. Physics Authority: safety factor advisory, not hard reject
12. Self-Model: evolve_value() for experience-based value evolution
13. Ethical Lesson: weight unclamped for full dynamic range
14. ConsequenceEngine: adaptive risk_memory_window
15. Cross-Head Learning: shared InsightBus between heads
16. World Model: self-modification prediction
17. Persistent memory: file-backed learning store
18. Plugin Heads: ethics/consequence hooks in HeadAgent + HeadRegistry

429 tests passing, 0 ruff errors, 0 new mypy errors.

Co-Authored-By: Nakamoto, S <defi@defi-oracle.io>
2026-04-28 08:58:15 +00:00
parent 64b800c6cf
commit b982e31c19
19 changed files with 740 additions and 138 deletions
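
Note on the `predict_self_modification` heuristic added to `fusionagi/world_model/causal.py` in this commit: the scoring can be read in isolation as the minimal sketch below. It is not the project's code. The `Transition` dataclass and the free-function signature are assumptions made only for illustration (the real method reads `self._history`, whose entry type is not shown in this diff). The constants (0.6, 0.5, 0.3, 0.05, 0.9) and the key names are copied from the diff.

```python
from dataclasses import dataclass, field
from typing import Any

# Keys that mark a history entry as a self-modification (taken from the diff).
SELF_MOD_KEYS = ("capability", "domain", "heuristic")


@dataclass
class Transition:
    """Hypothetical stand-in for one history entry; the real type is not shown."""
    action: str
    action_args: dict[str, Any] = field(default_factory=dict)
    confidence: float = 0.5


def predict_self_modification(history: list[Transition], action: str) -> dict[str, Any]:
    """Standalone restatement of the scoring logic in the diff."""
    matches = [
        h for h in history
        if h.action == action and any(k in h.action_args for k in SELF_MOD_KEYS)
    ]
    if not matches:
        # No evidence: fixed low confidence, no improvement rate reported.
        return {
            "predicted_change": "unknown",
            "confidence": 0.2,
            "prior_self_modifications": 0,
        }

    total = len(matches)  # always >= 1 here, so no zero-division guard is needed
    # "Improvement" is a proxy: an entry counts if its stored confidence exceeds 0.6.
    improvement_rate = sum(1 for t in matches if t.confidence > 0.6) / total
    return {
        "predicted_change": "improvement" if improvement_rate > 0.5 else "uncertain",
        # Confidence grows by 0.05 per observation and is capped at 0.9.
        "confidence": min(0.9, 0.3 + total * 0.05),
        "improvement_rate": improvement_rate,
        "prior_self_modifications": total,
    }


history = [
    Transition("tune", {"capability": "planning"}, confidence=0.8),
    Transition("tune", {"domain": "math"}, confidence=0.8),
    Transition("tune", {"heuristic": "greedy"}, confidence=0.8),
    Transition("tune", {"capability": "search"}, confidence=0.4),
    Transition("tune", {"unrelated": 1}, confidence=0.9),  # filtered out: no marker key
]
result = predict_self_modification(history, "tune")
```

Two properties follow from the code as written. First, the reported confidence depends only on the number of matching observations, not on their outcomes, and reaches the 0.9 cap at 12 observations. Second, the `if total > 0 else 0.0` guard in the diff is unreachable after the early return on an empty match list.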
--- a/fusionagi/world_model/causal.py
+++ b/fusionagi/world_model/causal.py
@@ -263,6 +263,56 @@ class CausalWorldModel:
            ),
        )

+    def predict_self_modification(
+        self,
+        action: str,
+        action_args: dict[str, Any],
+    ) -> dict[str, Any]:
+        """Predict how a self-improvement action changes the system's own capabilities.
+
+        Tracks capability evolution over time by observing how internal
+        actions (training, parameter updates, strategy changes) affect
+        subsequent performance.
+
+        Args:
+            action: The self-modification action type.
+            action_args: Parameters for the action.
+
+        Returns:
+            Dict with predicted capability changes and confidence.
+        """
+        self_mod_actions = [
+            h for h in self._history
+            if h.action == action and any(
+                k in h.action_args for k in ("capability", "domain", "heuristic")
+            )
+        ]
+
+        if not self_mod_actions:
+            return {
+                "predicted_change": "unknown",
+                "confidence": 0.2,
+                "prior_self_modifications": 0,
+                "rationale": f"No prior self-modification observations for '{action}'",
+            }
+
+        improvements = sum(
+            1 for t in self_mod_actions if t.confidence > 0.6
+        )
+        total = len(self_mod_actions)
+        improvement_rate = improvements / total if total > 0 else 0.0
+
+        return {
+            "predicted_change": "improvement" if improvement_rate > 0.5 else "uncertain",
+            "confidence": min(0.9, 0.3 + total * 0.05),
+            "improvement_rate": improvement_rate,
+            "prior_self_modifications": total,
+            "rationale": (
+                f"Based on {total} prior self-modifications: "
+                f"{improvement_rate:.0%} led to improvements"
+            ),
+        }
+
    def get_summary(self) -> dict[str, Any]:
        """Return a summary of the world model's learned knowledge."""
        by_action: dict[str, dict[str, Any]] = {}