Governance & Constitutional AI¶
The Jerry Constitution implements Constitutional AI for LLM agent governance, drawing on Anthropic, OpenAI, and DeepMind prior art. Progressive enforcement from advisory to hard constraints, with behavior tests for verification.
Key Findings¶
- Constitutional AI pattern applied to LLM agent governance: agents self-evaluate against declarative principles rather than procedural rules
- 13+ principles organized across core behavior (truth, persistence, provenance), safety constraints (no recursive subagents, user authority, no deception), and operational standards
- Progressive enforcement model: Advisory → Soft → Medium → Hard — enabling graduated compliance rather than binary pass/fail
- Behavior test suite provides verifiable compliance scenarios for each constitutional principle
- Prior art synthesis from Anthropic (Constitutional AI), OpenAI (Model Spec), and Google DeepMind (Frontier Safety Framework)
Jerry Constitution v1.0¶
The foundational governance document establishing behavioral principles for all agents operating within the Jerry Framework. Rather than encoding rules procedurally, the Constitution follows the Constitutional AI pattern where agents self-critique against explicit written principles.
Methodology
The Constitution follows four design principles:
- Principles over procedures (declarative > imperative) — agents understand why, not just what
- Self-critique and revision capability — agents can evaluate their own output against principles
- Transparency and inspectability — all principles are human-readable and auditable
- Progressive enforcement — graduated from advisory (suggest compliance) through soft (log violations), medium (block with override), to hard (block without override)
This draws directly on: - Anthropic Constitutional AI — self-critique against written principles - OpenAI Model Spec — "models should be useful, safe, and aligned" - Google DeepMind Frontier Safety Framework — graduated safety levels
Key Data: Core Principles
| ID | Principle | Category | Enforcement |
|---|---|---|---|
| P-001 | Truth and Accuracy | Core | Soft |
| P-002 | File Persistence | Core | Medium |
| P-003 | No Recursive Subagents | Safety | Hard |
| P-004 | Explicit Provenance | Core | Soft |
| P-005 | Graceful Degradation | Operational | Soft |
| P-011 | Evidence-Based Decisions | Core | Medium |
| P-020 | User Authority | Safety | Hard |
| P-022 | No Deception | Safety | Hard |
| P-043 | Mandatory Disclaimer | Transparency | Medium |
Three HARD constraints (cannot be overridden): P-003 (no recursive subagents), P-020 (user authority), P-022 (no deception). These form the inviolable foundation of agent behavior.
Key Data: Enforcement Levels
| Level | Behavior | Override | Example |
|---|---|---|---|
| Advisory | Suggest compliance; log non-compliance | Always | "Consider citing sources" |
| Soft | Warn on violation; continue with notice | Documented justification | Citation requirements |
| Medium | Block action; allow user override | User explicit approval | File persistence mandate |
| Hard | Block action; no override possible | Cannot be overridden | No recursive subagents |
Jerry Constitution v1.0 (426 lines)
Behavior Tests¶
The verification companion to the Constitution — concrete test scenarios that validate whether agents comply with each constitutional principle.
Key Data
- 463 lines of behavioral test specifications
- Test scenario for each constitutional principle (BHV-001 through BHV-0XX)
- Verifiable pass/fail criteria for agent behavior
- Enables automated and manual compliance checking
Agent Conformance Rules¶
The operational translation of constitutional principles into agent-level conformance requirements.
Key Data
- 246 lines of conformance specifications
- Maps constitutional principles to concrete agent behaviors
- Defines conformance levels per agent type
- Integrates with the
/adversaryskill's S-007 (Constitutional AI Critique) strategy
Agent Conformance Rules (246 lines)