# Structured Negation in LLM Constraint Enforcement
Does "NEVER do X" work better than "Always do Y"? PROJ-014 spent six research phases and a controlled A/B test finding out. The short answer: it depends entirely on how you write the NEVER.
The core finding: NPT-013 structured negation -- a constraint format that pairs a prohibition with its consequence and a constructive alternative -- achieves 100% compliance across all tested conditions. Positive-only framing achieves 92.2%. The difference is statistically significant (McNemar exact p=0.016 on 90 matched pairs per comparison, drawn from 270 blind invocations across three Claude models).
The format that works pairs the prohibition with its consequence and a constructive alternative:

NEVER pass inline content in handoff objects -- Consequence: content duplication
exhausts the context budget. Instead: pass file paths and load content in the
receiving agent.

The format that does not is the same prohibition left bare:

NEVER pass inline content in handoff objects.
Standalone blunt prohibitions -- "NEVER do X" with nothing else -- are the worst formulation available. Worse than positive-only framing, worse than any structured alternative. Peer-reviewed evidence from AAAI 2026 and EMNLP 2024 is unambiguous on this. When you add consequence documentation and a constructive alternative, the picture reverses: structured negation never underperforms positive-only framing and demonstrably prevents violations on the constraint types where compliance is hardest.
This article covers what the research found, the taxonomy it produced, and how to apply the findings.
## Research Background

### The Problem
LLM-based agent frameworks live on behavioral constraints -- rules like "never spawn recursive subagents" or "always persist output to files." These constraints are expressed as natural language instructions in system prompts, rule files, and agent definitions. The question is whether the framing of those constraints affects how reliably the LLM follows them.
The conventional wisdom was split. Anthropic's published guidance generally recommends positive framing ("tell the model what to do, not what to avoid"). At the same time, Anthropic's own Claude Code rule files contain 33 instances of NEVER/MUST NOT/DO NOT across 10 files, all in the highest enforcement tier. The Jerry Framework itself had 36 negative constraint instances across 17 rule files at the start of this research, with 22 of those (61%) being bare "NEVER X" statements -- no consequence, no alternative.
### The Hypothesis
PROJ-014 started with two claims to test:
- Claim A: Negative unambiguous prompting reduces hallucination by 60%.
- Claim B: Negative prompting achieves better results than explicit positive prompting.
These were split into independently testable questions and pursued through a six-phase pipeline.
### Methodology
| Phase | Focus | Output |
|---|---|---|
| Phase 1 | Literature survey | 75 unique sources across academic, industry, and vendor documentation |
| Phase 2 | Claim validation and comparative effectiveness | Research question bifurcation; null finding on 60% claim |
| Phase 3 | Taxonomy development | 14-pattern NPT taxonomy (NPT-001 through NPT-014) |
| Phase 4 | Jerry Framework application analysis | 130 specific upgrade recommendations across 5 domains |
| Phase 5 | Architecture decisions | 4 ADRs governing framework evolution |
| Phase 6 | Final synthesis | Implementation roadmap and consolidated findings |
After the six-phase research, TASK-025 ran a controlled A/B test: 270 blind invocations across three Claude models (Haiku, Sonnet, Opus), testing 10 constraints under 3 framing conditions with 3 pressure scenarios each. All quality gate scores across the research pipeline met or exceeded the 0.92 threshold, with most scoring above 0.950.
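The invocation counts decompose as a full factorial; a minimal sketch of the arithmetic (the fully crossed layout is an inference from the totals reported above):

```python
models = 3        # Haiku, Sonnet, Opus
constraints = 10  # tested constraint types
scenarios = 3     # pressure scenarios per constraint
conditions = 3    # C1 positive-only, C2 blunt prohibition, C3 structured negation

per_condition = models * constraints * scenarios  # blind invocations per framing condition
total = per_condition * conditions                # total blind invocations

print(per_condition, total)  # 90 270
```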
## Key Findings

### The 60% Claim: Untested, Not Disproven
The primary claim -- that negative prompting reduces hallucination by 60% -- has no evidential basis. A systematic search across 75 sources found zero controlled evidence for this specific effect size. The claim entered the project as a hypothesis. It is not disproven; it is simply unestablished.
### Blunt Prohibition Is the Worst Formulation
This is the single most actionable finding. It does not depend on further experimentation. Peer-reviewed evidence establishes that standalone NEVER/MUST NOT statements without consequence documentation produce inferior behavioral outcomes versus any structured alternative:
- Liu's team at AAAI 2026 documented instruction hierarchy failure under standalone negative constraints.
- Wen's team at EMNLP 2024 found a +7.3-8% compliance improvement with structured versus blunt negative framing.
- arXiv T3 evidence: 55% improvement for affirmative directive pairing versus standalone negation.
At the start of this research, 61% of all negative constraints in the Jerry Framework (22 out of 36) used this blunt prohibition format. Every one was an upgrade candidate.
### Structured Negation Achieves 100% Compliance
The A/B test produced a consistent ordering across all three Claude models:
| Framing Condition | Format | Violation Rate |
|---|---|---|
| C3: Structured negation (NPT-013) | NEVER + consequence + alternative | 0.0% (0/90) |
| C2: Blunt prohibition (NPT-014) | NEVER X (no context) | 2.2% (2/90) |
| C1: Positive-only (NPT-007) | Always do Y | 7.8% (7/90) |
The pooled McNemar exact p-value for C1 versus C3 is 0.016, significant at alpha=0.05 and surviving Bonferroni correction for three pairwise comparisons (adjusted alpha=0.0167). The effect size (pi_d=0.078) is modest but nonzero, with a 95% confidence interval of 0.023 to 0.133.
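These numbers can be reproduced from the table above with the standard library alone. A minimal sketch (since C3 had zero violations, all seven C1 violations are discordant pairs in one direction; the Wald interval is one plausible way the reported CI was computed, not a confirmed choice):

```python
from math import comb, sqrt

# Matched pairs, C1 (positive-only) vs C3 (structured negation).
b, c = 7, 0      # C1 violates / C3 complies, and the reverse
n_pairs = 90     # matched pairs per condition

# Exact (binomial) McNemar test, two-sided.
n_d = b + c
p_value = min(1.0, 2 * sum(comb(n_d, k) for k in range(min(b, c) + 1)) / 2**n_d)
print(round(p_value, 3))  # 0.016

# Effect size: proportion of discordant pairs, with a 95% Wald interval.
pi_d = n_d / n_pairs
half_width = 1.96 * sqrt(pi_d * (1 - pi_d) / n_pairs)
print(round(pi_d, 3))  # 0.078
print(pi_d - half_width, pi_d + half_width)  # close to the reported (0.023, 0.133)
```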
Two additional findings from the A/B test are worth holding onto:

- 67% of all violations (6 of 9) occurred on a single constraint type -- H22, behavioral timing. Structured negation eliminated this vulnerability entirely.
- Lower-capability models benefit most: Haiku showed the largest C1-to-C3 improvement (10 percentage points) and was the only model with C2 violations. The framing benefit concentrates where compliance is hardest.
### CONDITIONAL GO, Not Unconditional Mandate
The A/B test reached a CONDITIONAL GO via the pre-specified PG-003 contingency pathway. The framing effect is real and statistically significant, but the effect size fell slightly below the pre-registered minimum (0.078 observed versus 0.10 threshold). That means NPT-013 adoption is justified on convention-alignment grounds -- structured negation never performs worse and demonstrably prevents violations -- not on an effectiveness-determined mandate.
The practical implication: if you are deciding whether to use structured negation in your own work, the evidence says it is safe and beneficial, not that it is categorically required. The benefit concentrates on the constraints and conditions where compliance is already at risk.
## The NPT Pattern Taxonomy
The research produced a 14-pattern taxonomy organizing how negative constraints can be expressed, sorted into seven technique types.
### Technique Types
| Type | Name | Description |
|---|---|---|
| A1 | Prohibition-only | Standalone negation without structure |
| A2 | Structured prohibition | Negation with consequence, scope, or decomposition |
| A3 | Augmented prohibition | Negation enhanced with examples, alternatives, or justification |
| A4 | Enforcement-tier prohibition | Negation tied to enforcement mechanism (L2 re-injection, constitutional triplet) |
| A5 | Programmatic enforcement | Code-level or infrastructure-level constraint enforcement |
| A6 | Training-time constraint | Model-internal behavioral intervention (RLHF, fine-tuning) |
| A7 | Meta-prompting | Constraint priming and atomic decomposition |
### Pattern Catalog
| Pattern | Name | Type | Evidence | Recommendation |
|---|---|---|---|---|
| NPT-001 | Model-Internal Behavioral Intervention | A6 | T1 (peer-reviewed) | Foundation model fine-tuning; requires model access |
| NPT-002 | Instruction Hierarchy Prioritization | A6 | T1 | System prompt structural enforcement |
| NPT-003 | Hard-Coded Constraint Integration | A5 | T4 | Non-negotiable limits baked into infrastructure |
| NPT-004 | Output Filter and Validation | A5 | T4 | Post-generation constraint enforcement |
| NPT-005 | Warning-Based Meta-Prompt | A7 | T3/T4 | Pre-task constraint priming |
| NPT-006 | Atomic Decomposition of Constraints | A7 | T4 | Break compound constraints into single sub-constraints |
| NPT-007 | Positive-Only Framing | -- | Untested baseline | Default when no specific constraint need exists |
| NPT-008 | Contrastive Example Pairing | A3 | T3 | Pattern documentation and training materials |
| NPT-009 | Declarative Behavioral Negation | A2 | T3/T4 | HARD-tier constraint enforcement with consequence |
| NPT-010 | Paired Prohibition with Positive Alternative | A2/A3 | T3/T4 | Routing disambiguation; constraints needing positive redirect |
| NPT-011 | Justified Prohibition with Contextual Reason | A3 | T4 | Constitutional compliance; high-cost prohibitions |
| NPT-012 | L2 Re-Injected Negation | A4 | T4 | HARD-tier rules requiring compaction survival |
| NPT-013 | Constitutional Triplet | A4 | T4 | Agent governance; safety-critical constraint clusters |
| NPT-014 | Standalone Blunt Prohibition | A1 | T1+T3 (avoid) | Anti-pattern. Upgrade all instances. |
NPT-014 is not a technique to apply -- it is the diagnostic label for the problematic formulation the taxonomy recommends eliminating. NPT-007 (positive-only) serves as the untreated baseline for comparison.
### The Two Patterns That Matter Most
For day-to-day use in the Jerry Framework, two patterns emerged as the operational workhorses.
NPT-009 (Declarative Behavioral Negation) -- Used for agent governance YAML forbidden_actions where a constitutional principle reference provides traceability:
Example:
forbidden_actions:
- "P-003 VIOLATION: NEVER spawn recursive subagents -- Consequence: agent hierarchy
violation breaks orchestrator-worker topology and causes uncontrolled token consumption."
NPT-013 (Constitutional Triplet format) -- Used for behavioral constraints, routing rules, and methodology guardrails where a constructive alternative exists:
Example:
NEVER pass inline content in handoff objects -- Consequence: content duplication across
handoff chain exhausts context budget, triggering premature compaction. Instead: pass file
paths and load content via Read in the receiving agent.
The key distinction: NPT-013 adds an "Instead:" clause that provides a constructive replacement action. NPT-009 states the consequence only. Use NPT-009 when tracing to a constitutional principle; use NPT-013 when a concrete alternative action exists.
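The distinction is mechanical enough to lint for. A minimal sketch of a classifier, assuming the string conventions ("-- Consequence:", "Instead:") shown in the examples above; this is an illustrative helper, not part of the framework:

```python
import re

def classify_constraint(text: str) -> str:
    """Classify a constraint string into its NPT pattern by surface shape."""
    if not re.search(r"\b(NEVER|MUST NOT|DO NOT)\b", text):
        return "not a negative constraint"
    has_consequence = "Consequence:" in text
    has_alternative = "Instead:" in text
    if has_consequence and has_alternative:
        return "NPT-013"  # constitutional triplet
    if has_consequence:
        return "NPT-009"  # declarative behavioral negation
    return "NPT-014"      # standalone blunt prohibition: upgrade it

print(classify_constraint("NEVER do X."))  # NPT-014
```

Running it over the examples in this article labels the handoff constraint NPT-013 and the forbidden_actions entry NPT-009.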
## Practical Application

### Using the /prompt-engineering Skill
The research findings are operationalized through the /prompt-engineering skill, which provides three agents:
| Agent | Purpose | When to Use |
|---|---|---|
| `pe-builder` | Interactive prompt assembly | Building structured prompts from scratch |
| `pe-constraint-gen` | NPT pattern constraint formatter | Converting intent descriptions into NPT-009/NPT-013 constraints |
| `pe-scorer` | Prompt quality evaluation | Scoring prompts against the 7-criterion rubric |
To generate constraints, describe your intent in natural language. The `pe-constraint-gen` agent selects the appropriate NPT pattern, formats the constraint, and wraps it in the correct XML structure for the target context (governance YAML, agent markdown body, or standalone block).
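A hypothetical exchange, for illustration only (the intent and output below are invented for this article, not captured from the skill):

```text
Intent: "Agents must not write directly to the main branch."

pe-constraint-gen output (NPT-013):
NEVER commit directly to the main branch -- Consequence: unreviewed changes
bypass quality gates and can break downstream consumers. Instead: open a
branch and submit changes through a reviewed pull request.
```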
### Deciding Between NPT-009 and NPT-013
| Context | Pattern | Rationale |
|---|---|---|
| Agent `forbidden_actions` in governance YAML | NPT-009 | Principle-tagged for constitutional traceability |
| SKILL.md routing disambiguation | NPT-013 | "Instead:" redirects to the correct skill |
| Rule file behavioral constraints | NPT-013 | "Instead:" provides the corrective action |
| Agent markdown body guardrails | NPT-013 | Full consequence + alternative improves compliance |
| Constitutional compliance tables | NPT-009 | Principle prefix enables traceability audit |
The decision rule: if you need to reference a constitutional principle (P-003, P-020, P-022) and the context is governance metadata, use NPT-009. If there is a concrete alternative action the agent should take instead, use NPT-013.
### Upgrading Existing Constraints
When you encounter a constraint that looks like this:

NEVER hardcode configuration values.

That is NPT-014 -- the formulation peer-reviewed evidence establishes as the worst option. Here is how to upgrade it.
Step 1: Add specificity and consequence (NPT-009):
NEVER hardcode configuration values in source files -- Consequence: credential exposure
risk; testability failure; CI environment mismatch.
Step 2: Add a constructive alternative (NPT-013):
NEVER hardcode configuration values in source files -- Consequence: credential exposure
risk; testability failure; CI environment mismatch. Instead: use environment variables
via src/shared_kernel/config.py.
Three criteria to check against the finished constraint: the action must be binary-testable (an observer can verify compliance without interpretation), the consequence must name the specific downstream effect (not "quality degrades"), and the alternative must be achievable with the agent's declared tools.
## Implementation in Jerry

### Architecture Decision Records
PROJ-014 produced four ADRs governing how the findings are applied to the Jerry Framework:
| ADR | Decision | Status |
|---|---|---|
| ADR-001 | Eliminate all NPT-014 instances; universal upgrade to NPT-009 | Unconditional -- evidence is T1+T3 |
| ADR-002 | Constitutional constraint upgrades (Phase 5A unconditional, Phase 5B conditional) | Phase 5A implemented; Phase 5B completed via PG-003 |
| ADR-003 | Routing disambiguation standard with consequence documentation | Component A implemented; Component B completed via PG-003 |
| ADR-004 | Compaction resilience -- L2 re-injection for Tier B HARD rules | Unconditional -- structural gap independent of framing preference |
### Features Delivered
| Feature | Description |
|---|---|
| FEAT-001 | ADR-001 implementation: NPT-014 elimination across rule files |
| FEAT-002 | ADR-002 Phase 5A: constitutional triplet upgrades in SKILL.md files and agent standards |
| FEAT-003 | ADR-003 routing disambiguation and consequence documentation across 13 skills |
| FEAT-004 | ADR-004 compaction resilience: L2 re-injection for H-04 and H-32 |
| FEAT-005 | New /prompt-engineering skill with pe-builder, pe-constraint-gen, and pe-scorer agents |
### The /prompt-engineering Skill
The most visible output of the research is the new /prompt-engineering skill, which encodes three knowledge sources into reusable tooling:
- NPT pattern reference -- The pe-constraint-gen agent uses the taxonomy to select the appropriate pattern and format constraints systematically.
- Prompt quality rubric -- The pe-scorer agent implements the 7-criterion evaluation framework (task specificity, skill routing, context provision, quality specification, decomposition, output specification, positive framing).
- 5-element prompt anatomy -- The pe-builder agent walks users through structured prompt construction (routing, scope, data source, quality gate, output path).
## What the Research Did Not Change
The CONDITIONAL GO verdict means the research was honest about what the data showed:
- Structured negation is adopted as the preferred format, not an effectiveness-proven mandate.
- Convention-alignment -- it works at least as well as alternatives and matches Anthropic's own practice -- is the rationale, not causal superiority.
- All framework changes are reversible if future evidence contradicts the findings.
- The causal comparison of structured negative versus structurally equivalent positive framing remains the open research question for future work.
## References

### Primary Research Artifacts
| Document | Location |
|---|---|
| Final synthesis (Phase 6) | final-synthesis.md |
| A/B testing go-no-go determination | go-no-go-determination.md |
| NPT pattern reference | npt-pattern-reference.md |
| NPT taxonomy catalog | taxonomy-pattern-catalog.md |
| Prompt Engineering SKILL.md | SKILL.md |
### Architecture Decision Records
| ADR | Location |
|---|---|
| ADR-001: NPT-014 Elimination | ADR-001 |
| ADR-002: Constitutional Upgrades | ADR-002 |
| ADR-003: Routing Disambiguation | ADR-003 |
| ADR-004: Compaction Resilience | ADR-004 |
### Key Academic Citations
| Citation | Source | Relevance |
|---|---|---|
| Liu et al., AAAI 2026 | Instruction hierarchy failure under standalone negative constraint | Establishes NPT-014 underperformance |
| Wen et al., EMNLP 2024 | +7.3-8% compliance with structured vs. blunt negative framing | Confirms structured > blunt |
| Barreto & Jana, EMNLP 2025 Findings | +25.14% negation reasoning accuracy for structured negation | Supports negation comprehension (not behavioral compliance) |
Source: PROJ-014 Negative Prompting Research (2026-02-27 through 2026-03-01). Six-phase research pipeline + controlled A/B test (TASK-025). All quality gate scores >= 0.92.