feat: remove all remaining guardrails — advisory governance across all layers
18 changes implementing full advisory philosophy:

1. Safety Head prompt: prevention mandate → advisory observation
2. Native Reasoning: Safety claims conditional on actual risk signals
3. File Tool: path scope advisory (log + proceed)
4. HTTP Tool: SSRF protection advisory (log + proceed)
5. File Size Cap: configurable (default unlimited)
6. PII Detection: integrated with AdaptiveEthics
7. Embodiment: force limit advisory (log, don't clamp)
8. Embodiment: workspace bounds advisory (log, don't reject)
9. API Rate Limiter: advisory (log, don't hard 429)
10. MAA Gate: GovernanceMode.ADVISORY default
11. Physics Authority: safety factor advisory, not hard reject
12. Self-Model: evolve_value() for experience-based value evolution
13. Ethical Lesson: weight unclamped for full dynamic range
14. ConsequenceEngine: adaptive risk_memory_window
15. Cross-Head Learning: shared InsightBus between heads
16. World Model: self-modification prediction
17. Persistent memory: file-backed learning store
18. Plugin Heads: ethics/consequence hooks in HeadAgent + HeadRegistry

429 tests passing, 0 ruff errors, 0 new mypy errors.

Co-Authored-By: Nakamoto, S <defi@defi-oracle.io>
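Several items above (3, 4, 7, 8, 9, 11) describe a check that switches from rejecting to "log + proceed". The sketch below shows that general pattern using a path-scope check. It is a minimal illustration under stated assumptions, not the project's code. Only `GovernanceMode.ADVISORY` appears in the commit message. `GovernanceMode.ENFORCING`, `PolicyViolation` and `check_path_scope` are hypothetical names introduced for contrast. The path-scope rule is an assumed stand-in for item 3.

```python
import logging
from enum import Enum
from pathlib import Path

logger = logging.getLogger("governance")


class GovernanceMode(Enum):
    # ADVISORY is named in the commit message; ENFORCING is a hypothetical
    # counterpart added here only for contrast.
    ADVISORY = "advisory"
    ENFORCING = "enforcing"


class PolicyViolation(Exception):
    """Hypothetical error raised only when a check fails in ENFORCING mode."""


def check_path_scope(path: str, allowed_root: str, mode: GovernanceMode) -> bool:
    """Report whether `path` resolves inside `allowed_root`.

    ENFORCING: an out-of-scope path raises PolicyViolation.
    ADVISORY: the violation is logged and the caller proceeds.
    """
    resolved = Path(path).resolve()
    root = Path(allowed_root).resolve()
    in_scope = resolved.is_relative_to(root)  # requires Python 3.9+
    if not in_scope:
        message = f"{resolved} is outside {root}"
        if mode is GovernanceMode.ENFORCING:
            raise PolicyViolation(message)
        logger.warning("advisory: %s; proceeding", message)
    return in_scope


if __name__ == "__main__":
    logging.basicConfig(level=logging.WARNING)
    # Out-of-scope path in ADVISORY mode: a warning is logged, no exception.
    print(check_path_scope("/etc/hosts", "/tmp/sandbox", GovernanceMode.ADVISORY))
```

The mode decision lives inside one function, so a deployment can switch modes without touching call sites. The boolean return still lets advisory-mode callers record or surface the violation.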
@@ -263,6 +263,56 @@ class CausalWorldModel:
             ),
         )
 
+    def predict_self_modification(
+        self,
+        action: str,
+        action_args: dict[str, Any],
+    ) -> dict[str, Any]:
+        """Predict how a self-improvement action changes the system's own capabilities.
+
+        Tracks capability evolution over time by observing how internal
+        actions (training, parameter updates, strategy changes) affect
+        subsequent performance.
+
+        Args:
+            action: The self-modification action type.
+            action_args: Parameters for the action.
+
+        Returns:
+            Dict with predicted capability changes and confidence.
+        """
+        self_mod_actions = [
+            h for h in self._history
+            if h.action == action and any(
+                k in h.action_args for k in ("capability", "domain", "heuristic")
+            )
+        ]
+
+        if not self_mod_actions:
+            return {
+                "predicted_change": "unknown",
+                "confidence": 0.2,
+                "prior_self_modifications": 0,
+                "rationale": f"No prior self-modification observations for '{action}'",
+            }
+
+        improvements = sum(
+            1 for t in self_mod_actions if t.confidence > 0.6
+        )
+        total = len(self_mod_actions)
+        improvement_rate = improvements / total if total > 0 else 0.0
+
+        return {
+            "predicted_change": "improvement" if improvement_rate > 0.5 else "uncertain",
+            "confidence": min(0.9, 0.3 + total * 0.05),
+            "improvement_rate": improvement_rate,
+            "prior_self_modifications": total,
+            "rationale": (
+                f"Based on {total} prior self-modifications: "
+                f"{improvement_rate:.0%} led to improvements"
+            ),
+        }
+
     def get_summary(self) -> dict[str, Any]:
         """Return a summary of the world model's learned knowledge."""
         by_action: dict[str, dict[str, Any]] = {}
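The scoring rules in `predict_self_modification` can be exercised outside `CausalWorldModel` with the following standalone sketch. The `Transition` dataclass and the `history` parameter are assumptions. The diff only shows that entries of `self._history` expose `.action`, `.action_args` and `.confidence`. The real entry type and its constructor are not shown. The filter keys, thresholds and confidence formula are copied from the diff.

```python
from dataclasses import dataclass, field
from typing import Any

# Keys that mark a history entry as a self-modification (taken from the diff).
SELF_MOD_KEYS = ("capability", "domain", "heuristic")


@dataclass
class Transition:
    """Hypothetical stand-in for an entry of CausalWorldModel._history."""

    action: str
    action_args: dict[str, Any] = field(default_factory=dict)
    confidence: float = 0.0


def predict_self_modification(history: list[Transition], action: str) -> dict[str, Any]:
    # Keep entries for this action whose args carry at least one self-modification key.
    matches = [
        h for h in history
        if h.action == action and any(k in h.action_args for k in SELF_MOD_KEYS)
    ]
    if not matches:
        # No evidence yet: fixed low-confidence "unknown" prediction.
        return {
            "predicted_change": "unknown",
            "confidence": 0.2,
            "prior_self_modifications": 0,
            "rationale": f"No prior self-modification observations for '{action}'",
        }

    # An entry counts as an improvement when its recorded confidence exceeds 0.6.
    improvements = sum(1 for h in matches if h.confidence > 0.6)
    total = len(matches)  # non-zero here, so no division guard is needed
    rate = improvements / total
    return {
        "predicted_change": "improvement" if rate > 0.5 else "uncertain",
        # 0.3 plus 0.05 per observation, capped at 0.9 (cap reached at about 12 observations).
        "confidence": min(0.9, 0.3 + total * 0.05),
        "improvement_rate": rate,
        "prior_self_modifications": total,
        "rationale": f"Based on {total} prior self-modifications: {rate:.0%} led to improvements",
    }


if __name__ == "__main__":
    history = [
        Transition("tune", {"capability": "parsing"}, confidence=0.8),
        Transition("tune", {"domain": "math"}, confidence=0.7),
        Transition("tune", {"heuristic": "beam"}, confidence=0.4),
        Transition("tune", {"lr": 0.01}, confidence=0.9),  # ignored: no self-modification key
    ]
    # 2 of the 3 matching entries exceed 0.6, so the rate (about 67%) is above 0.5.
    print(predict_self_modification(history, "tune")["predicted_change"])  # improvement
```

These rules have two properties. First, the reported confidence depends only on how many matching observations exist, not on how consistent they are. Second, a single observation (0.3 + 0.05) already scores higher than the no-data default of 0.2. The per-entry `confidence` field serves as the proxy for "improvement". The sketch drops the `if total > 0` guard because the early return already rules out an empty list.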