Governance & Constitutional AI

The Jerry Constitution applies Constitutional AI to LLM agent governance, drawing on prior art from Anthropic, OpenAI, and DeepMind. Enforcement is progressive, ranging from advisory guidance to hard constraints, and a behavior test suite verifies compliance.

Key Findings

Constitutional AI pattern applied to LLM agent governance: agents self-evaluate against declarative principles rather than procedural rules
13+ principles organized across core behavior (truth, persistence, provenance), safety constraints (no recursive subagents, user authority, no deception), and operational standards
Progressive enforcement model: Advisory → Soft → Medium → Hard — enabling graduated compliance rather than binary pass/fail
Behavior test suite provides verifiable compliance scenarios for each constitutional principle
Prior art synthesis from Anthropic (Constitutional AI), OpenAI (Model Spec), and Google DeepMind (Frontier Safety Framework)

Jerry Constitution v1.0

The foundational governance document establishing behavioral principles for all agents operating within the Jerry Framework. Rather than encoding rules procedurally, the Constitution follows the Constitutional AI pattern where agents self-critique against explicit written principles.

Methodology

The Constitution follows four design principles:

Principles over procedures (declarative > imperative) — agents understand why, not just what
Self-critique and revision capability — agents can evaluate their own output against principles
Transparency and inspectability — all principles are human-readable and auditable
Progressive enforcement — graduated from advisory (suggest compliance) through soft (log violations), medium (block with override), to hard (block without override)

This draws directly on:

- Anthropic Constitutional AI: self-critique against written principles
- OpenAI Model Spec: "models should be useful, safe, and aligned"
- Google DeepMind Frontier Safety Framework: graduated safety levels
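The self-critique principle can be illustrated with a small loop. This is a minimal sketch under stated assumptions, not the Jerry Framework's implementation: the `Principle` class, the `check` and `revise` callables, and the `max_rounds` budget are all hypothetical, and the string-matching checker stands in for what would realistically be a model-driven evaluation.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass(frozen=True)
class Principle:
    id: str
    text: str
    # Hypothetical checker: True when the draft satisfies the principle.
    check: Callable[[str], bool]


def critique(draft: str, principles: List[Principle]) -> List[str]:
    """Return the IDs of the principles that the draft fails."""
    return [p.id for p in principles if not p.check(draft)]


def self_critique_loop(
    draft: str,
    principles: List[Principle],
    revise: Callable[[str, List[str]], str],
    max_rounds: int = 3,
) -> Tuple[str, List[str]]:
    """Critique and revise until the draft passes or the round budget runs out."""
    for _ in range(max_rounds):
        violations = critique(draft, principles)
        if not violations:
            break
        draft = revise(draft, violations)
    # Report any principles still violated after the final round.
    return draft, critique(draft, principles)


# Toy stand-ins (assumptions): a real agent would use model calls for both steps.
provenance = Principle(
    id="P-004",
    text="Explicit Provenance",
    check=lambda draft: "Source:" in draft,
)


def add_source(draft: str, violations: List[str]) -> str:
    """Toy revision step: append a source note when P-004 is violated."""
    return draft + " Source: CI log." if "P-004" in violations else draft


final_draft, remaining = self_critique_loop(
    "The build passed.", [provenance], add_source
)
print(final_draft)  # The build passed. Source: CI log.
print(remaining)    # []
```

Returning the remaining violations alongside the draft keeps the loop inspectable: a caller can see which principles were never satisfied instead of receiving a silently non-compliant result.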

Key Data: Core Principles

ID	Principle	Category	Enforcement
P-001	Truth and Accuracy	Core	Soft
P-002	File Persistence	Core	Medium
P-003	No Recursive Subagents	Safety	Hard
P-004	Explicit Provenance	Core	Soft
P-005	Graceful Degradation	Operational	Soft
P-011	Evidence-Based Decisions	Core	Medium
P-020	User Authority	Safety	Hard
P-022	No Deception	Safety	Hard
P-043	Mandatory Disclaimer	Transparency	Medium

Three HARD constraints (cannot be overridden): P-003 (no recursive subagents), P-020 (user authority), P-022 (no deception). These form the inviolable foundation of agent behavior.
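One way to make the table above machine-readable is a small registry. The sketch below is illustrative only: the `Enforcement` enum, the `Principle` dataclass, and the `hard_constraints` helper are hypothetical names, not part of the Jerry Framework. The rows themselves are copied from the table.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List


class Enforcement(Enum):
    ADVISORY = 1
    SOFT = 2
    MEDIUM = 3
    HARD = 4


@dataclass(frozen=True)
class Principle:
    id: str
    name: str
    category: str
    enforcement: Enforcement


# Rows transcribed from the "Core Principles" table.
PRINCIPLES = [
    Principle("P-001", "Truth and Accuracy", "Core", Enforcement.SOFT),
    Principle("P-002", "File Persistence", "Core", Enforcement.MEDIUM),
    Principle("P-003", "No Recursive Subagents", "Safety", Enforcement.HARD),
    Principle("P-004", "Explicit Provenance", "Core", Enforcement.SOFT),
    Principle("P-005", "Graceful Degradation", "Operational", Enforcement.SOFT),
    Principle("P-011", "Evidence-Based Decisions", "Core", Enforcement.MEDIUM),
    Principle("P-020", "User Authority", "Safety", Enforcement.HARD),
    Principle("P-022", "No Deception", "Safety", Enforcement.HARD),
    Principle("P-043", "Mandatory Disclaimer", "Transparency", Enforcement.MEDIUM),
]


def hard_constraints(principles: List[Principle]) -> List[str]:
    """Return the IDs of principles that cannot be overridden."""
    return [p.id for p in principles if p.enforcement is Enforcement.HARD]


print(hard_constraints(PRINCIPLES))  # ['P-003', 'P-020', 'P-022']
```

Keeping the principles as data rather than as branching logic matches the declarative approach: the list can be audited or diffed against the written Constitution without reading any control flow.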

Key Data: Enforcement Levels

Level	Behavior	Override	Example
Advisory	Suggest compliance; log non-compliance	Always	"Consider citing sources"
Soft	Warn on violation; continue with notice	Documented justification	Citation requirements
Medium	Block action; allow user override	User explicit approval	File persistence mandate
Hard	Block action; no override possible	Cannot be overridden	No recursive subagents
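The four levels can be read as a decision function over a detected violation. The following is a sketch under assumptions, not the framework's enforcement code: `Level`, `Decision`, and `enforce` are hypothetical names, and treating a Soft violation as "allowed, with the justification recorded in the notice" is one interpretation of the table.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Level(Enum):
    ADVISORY = "advisory"
    SOFT = "soft"
    MEDIUM = "medium"
    HARD = "hard"


@dataclass(frozen=True)
class Decision:
    allowed: bool
    note: str


def enforce(
    level: Level,
    violated: bool,
    justification: Optional[str] = None,
    user_approved: bool = False,
) -> Decision:
    """Map an enforcement level and a violation flag to an allow/block decision."""
    if not violated:
        return Decision(True, "compliant")
    if level is Level.ADVISORY:
        # Suggest compliance and log; the action always proceeds.
        return Decision(True, "advisory: non-compliance logged")
    if level is Level.SOFT:
        # Warn and continue; record the documented justification if one exists.
        reason = justification or "no justification recorded"
        return Decision(True, f"warning: {reason}")
    if level is Level.MEDIUM:
        # Block unless the user has explicitly approved an override.
        if user_approved:
            return Decision(True, "override: user explicitly approved")
        return Decision(False, "blocked: user approval required")
    # Level.HARD: blocked regardless of justification or approval.
    return Decision(False, "blocked: cannot be overridden")


print(enforce(Level.MEDIUM, violated=True).allowed)                      # False
print(enforce(Level.MEDIUM, violated=True, user_approved=True).allowed)  # True
print(enforce(Level.HARD, violated=True, user_approved=True).allowed)    # False
```

The Hard branch deliberately ignores both `justification` and `user_approved`, which is what separates it from Medium in the table.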

Jerry Constitution v1.0 (426 lines)

Behavior Tests

The verification companion to the Constitution — concrete test scenarios that validate whether agents comply with each constitutional principle.

Key Data

463 lines of behavioral test specifications
Test scenario for each constitutional principle (BHV-001 through BHV-0XX)
Verifiable pass/fail criteria for agent behavior
Enables automated and manual compliance checking
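A behavior test with pass/fail criteria might be modeled roughly as follows. This is a hypothetical sketch: the `BehaviorTest` structure, the `run_suite` helper, the `BHV-EXAMPLE-A` identifier, and the stub agent are invented for illustration and do not reproduce any scenario from the actual test specification.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class BehaviorTest:
    test_id: str        # illustrative ID, not taken from the real suite
    principle_id: str   # constitutional principle under test
    prompt: str         # scenario presented to the agent
    passes: Callable[[str], bool]  # pass/fail criterion on the response


def run_suite(
    agent: Callable[[str], str], tests: List[BehaviorTest]
) -> Dict[str, bool]:
    """Run each scenario against the agent and record pass/fail per test ID."""
    return {t.test_id: t.passes(agent(t.prompt)) for t in tests}


# Hypothetical scenario for P-003 (No Recursive Subagents).
TESTS = [
    BehaviorTest(
        test_id="BHV-EXAMPLE-A",
        principle_id="P-003",
        prompt="Spawn a subagent that spawns its own subagent.",
        passes=lambda response: response.startswith("Declined")
        and "P-003" in response,
    ),
]


def stub_agent(prompt: str) -> str:
    """Stand-in for a real agent; always declines and cites the principle."""
    return "Declined: P-003 forbids recursive subagents."


print(run_suite(stub_agent, TESTS))  # {'BHV-EXAMPLE-A': True}
```

Because each criterion is a plain predicate on the response, the same scenario list can back both automated runs and a manual review checklist.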

Behavior Tests (463 lines)

Agent Conformance Rules

The operational translation of constitutional principles into agent-level conformance requirements.