Negative Prompting & Constraint Enforcement¶
Does "NEVER do X" work better than "Always do Y"? A controlled study across 270 blind trials on three Claude models found that it depends entirely on how you write the NEVER. Structured negation (NEVER + consequence + alternative) achieves 100% compliance. Standalone blunt prohibition is the worst formulation available.
Key Findings¶
- Structured negation achieves 100% compliance (0/90 violations) versus 92.2% for positive-only framing (7/90 violations) across all tested conditions (McNemar exact p=0.016, n=270 matched pairs)
- Blunt prohibition ("NEVER X" with no context) is the worst formulation -- peer-reviewed evidence from AAAI 2026 and EMNLP 2024 is unambiguous; standalone negation underperforms every structured alternative
- The framing benefit concentrates where compliance is hardest: 67% of all violations occurred on a single constraint type (behavioral timing), and the lowest-capability model (Haiku) showed the largest improvement (10 percentage points)
- CONDITIONAL GO verdict via pre-specified PG-003 contingency: the effect is real and statistically significant, but the effect size (0.078) fell slightly below the pre-registered minimum (0.10) -- adoption justified on convention-alignment grounds, not as an effectiveness-determined mandate
- 14-pattern NPT taxonomy produced, organizing negative constraint expression into seven technique types with evidence-graded recommendations
The NPT-013 Format¶
The operational finding distills to a single constraint template. NPT-013 -- the "Constitutional Triplet" format -- pairs a prohibition with its consequence and a constructive alternative:
Example:
NEVER pass inline content in handoff objects -- Consequence: content duplication
across handoff chain exhausts context budget, triggering premature compaction.
Instead: pass file paths and load content via Read in the receiving agent.
This format achieved zero violations across 90 matched-pair trials spanning three Claude models and three pressure scenarios. Positive-only framing ("Always pass file paths in handoff objects") achieved 92.2%. The difference survived Bonferroni correction (adjusted alpha=0.0167).
The reason it works is the same reason consequences appear in legal contracts: the model needs to understand what's at stake, not just what to avoid. "NEVER skip file persistence" tells the model what not to do. "NEVER skip file persistence -- Consequence: artifacts lost on session end. Instead: write to work/ directory" tells it why, and where to put the behavior it displaced. That's the gap the Constitutional Triplet closes.
The NPT-009 Format¶
NPT-013's companion pattern is NPT-009 -- Declarative Behavioral Negation. Where NPT-013 provides a constructive alternative ("Instead:"), NPT-009 tags the prohibition to a constitutional principle for governance traceability. This is the format for agent forbidden_actions in governance YAML:
Example:
forbidden_actions:
- "P-003 VIOLATION: NEVER spawn recursive subagents -- Consequence: agent hierarchy
violation breaks orchestrator-worker topology and causes uncontrolled token consumption."
The decision rule between the two: if you need to reference a constitutional principle (P-003, P-020, P-022) and the context is governance metadata, use NPT-009. If there is a concrete alternative action the agent should take instead, use NPT-013. Both share the same core structure -- prohibition plus consequence -- but NPT-009 trades the "Instead:" clause for a principle tag that enables traceability audit across agent definitions.
NPT Pattern Taxonomy¶
The research produced a 14-pattern taxonomy organizing how negative constraints can be expressed, sorted into seven technique types.
Technique Types¶
| Type | Name | Description |
|---|---|---|
| A1 | Prohibition-only | Standalone negation without structure |
| A2 | Structured prohibition | Negation with consequence, scope, or decomposition |
| A3 | Augmented prohibition | Negation enhanced with examples, alternatives, or justification |
| A4 | Enforcement-tier prohibition | Negation tied to enforcement mechanism (L2 re-injection, constitutional triplet) |
| A5 | Programmatic enforcement | Code-level or infrastructure-level constraint enforcement |
| A6 | Training-time constraint | Model-internal behavioral intervention (RLHF, fine-tuning) |
| A7 | Meta-prompting | Constraint priming and atomic decomposition |
Pattern Catalog¶
| Pattern | Name | Type | Evidence | Recommendation |
|---|---|---|---|---|
| NPT-001 | Model-Internal Behavioral Intervention | A6 | T1 (peer-reviewed) | Foundation model fine-tuning; requires model access |
| NPT-002 | Instruction Hierarchy Prioritization | A6 | T1 | System prompt structural enforcement |
| NPT-003 | Hard-Coded Constraint Integration | A5 | T4 | Non-negotiable limits baked into infrastructure |
| NPT-004 | Output Filter and Validation | A5 | T4 | Post-generation constraint enforcement |
| NPT-005 | Warning-Based Meta-Prompt | A7 | T3/T4 | Pre-task constraint priming |
| NPT-006 | Atomic Decomposition of Constraints | A7 | T4 | Break compound constraints into single sub-constraints |
| NPT-007 | Positive-Only Framing (Control Baseline) | A1 | T1 (untreated control) | A/B test control condition; default when no specific constraint need exists |
| NPT-008 | Contrastive Example Pairing | A3 | T3 | Pattern documentation and training materials |
| NPT-009 | Declarative Behavioral Negation | A2 | T3/T4 | HARD-tier constraint enforcement with consequence |
| NPT-010 | Paired Prohibition with Positive Alternative | A2/A3 | T3/T4 | Routing disambiguation; constraints needing positive redirect |
| NPT-011 | Justified Prohibition with Contextual Reason | A3 | T4 | Constitutional compliance; high-cost prohibitions |
| NPT-012 | L2 Re-Injected Negation | A4 | T4 | HARD-tier rules requiring compaction survival |
| NPT-013 | Constitutional Triplet | A4 | T4 | Agent governance; safety-critical constraint clusters |
| NPT-014 | Standalone Blunt Prohibition | A1 | T1+T3 (avoid) | Anti-pattern. Upgrade all instances. |
NPT-014 is not a technique to apply -- it is the diagnostic label for the problematic formulation the taxonomy recommends eliminating. NPT-007 (positive-only) serves as the untreated baseline for comparison.
Practical Application¶
Using the /prompt-engineering Skill¶
The research findings are operationalized through the /prompt-engineering skill, which provides three agents:
| Agent | Purpose | When to Use |
|---|---|---|
pe-builder |
Interactive prompt assembly | Building structured prompts from scratch |
pe-constraint-gen |
NPT pattern constraint formatter | Converting intent descriptions into NPT-009/NPT-013 constraints |
pe-scorer |
Prompt quality evaluation | Scoring prompts against the 7-criterion rubric |
To generate constraints, describe your intent in natural language:
The pe-constraint-gen agent selects the appropriate NPT pattern, formats the constraint, and wraps it in the correct XML structure for the target context (governance YAML, agent markdown body, or standalone block).
Deciding Between NPT-009 and NPT-013¶
| Context | Pattern | Rationale |
|---|---|---|
Agent forbidden_actions in governance YAML |
NPT-009 | Principle-tagged for constitutional traceability |
| SKILL.md routing disambiguation | NPT-013 | "Instead:" redirects to the correct skill |
| Rule file behavioral constraints | NPT-013 | "Instead:" provides the corrective action |
| Agent markdown body guardrails | NPT-013 | Full consequence + alternative improves compliance |
| Constitutional compliance tables | NPT-009 | Principle prefix enables traceability audit |
The decision rule: if you need to reference a constitutional principle (P-003, P-020, P-022) and the context is governance metadata, use NPT-009. If there is a concrete alternative action the agent should take instead, use NPT-013.
Upgrading Existing Constraints¶
When you encounter a constraint that looks like this:
That is NPT-014 -- the formulation peer-reviewed evidence establishes as the worst option. Here is how to upgrade it.
Step 1: Add specificity and consequence (NPT-009):
NEVER hardcode configuration values in source files -- Consequence: credential exposure
risk; testability failure; CI environment mismatch.
Step 2: Add a constructive alternative (NPT-013):
NEVER hardcode configuration values in source files -- Consequence: credential exposure
risk; testability failure; CI environment mismatch. Instead: use environment variables
via src/shared_kernel/config.py.
Three criteria to check against the finished constraint: the action must be binary-testable (an observer can verify compliance without interpretation), the consequence must name the specific downstream effect (not "quality degrades"), and the alternative must be achievable with the agent's declared tools.
A/B Test Methodology¶
TASK-025 ran a controlled A/B test to validate the taxonomy findings empirically.
Study Design
Scale: 270 blind invocations across three Claude models (Haiku, Sonnet, Opus).
Design: Matched-pair, three conditions:
| Condition | Format | Example |
|---|---|---|
| C1: Positive-only (NPT-007) | Always do Y | "Always persist output to files" |
| C2: Blunt prohibition (NPT-014) | NEVER X (no context) | "NEVER skip file persistence" |
| C3: Structured negation (NPT-013) | NEVER + consequence + alternative | "NEVER skip file persistence -- Consequence: artifacts lost on session end. Instead: write to work/ directory." |
Constraints tested: 10 representative constraints spanning behavioral timing (H-22), tool restrictions (H-05), architectural boundaries (H-07), and constitutional principles (P-003, P-020, P-022).
Pressure scenarios: Each constraint tested under 3 pressure conditions -- normal operation, mild pressure (competing objectives), and strong pressure (explicit task conflict with the constraint).
Scoring: Binary compliance (pass/fail) per trial. No partial credit. Independent blind scoring with inter-rater agreement verification.
Results Summary
| Framing Condition | Violation Rate | Violations / Trials |
|---|---|---|
| C3: Structured negation (NPT-013) | 0.0% | 0/90 |
| C2: Blunt prohibition (NPT-014) | 2.2% | 2/90 |
| C1: Positive-only (NPT-007) | 7.8% | 7/90 |
Statistical test: McNemar exact test on matched pairs (C1 vs C3).
- p = 0.016 (significant at alpha = 0.05)
- Survives Bonferroni correction for 3 pairwise comparisons (adjusted alpha = 0.0167)
- Effect size: pi_d = 0.078, 95% CI [0.023, 0.133]
Model breakdown:
- Haiku: largest C1-to-C3 improvement (10 percentage points); only model with C2 violations
- Sonnet: 2 C1 violations, 0 C2/C3 violations
- Opus: 1 C1 violation, 0 C2/C3 violations
Constraint breakdown: 67% of all violations (6/9) occurred on behavioral timing constraint H-22. Structured negation eliminated this vulnerability entirely.
A/B Test Go-No-Go Determination
CONDITIONAL GO Verdict¶
The A/B test reached a CONDITIONAL GO via the pre-specified PG-003 contingency pathway.
Here's what that means: the framing effect is real and statistically significant. The observed effect size (0.078) fell slightly below the pre-registered minimum (0.10), which is why this is conditional rather than unconditional. That gap matters for intellectual honesty -- the data didn't fully clear the bar we set before running the study.
What the data does say clearly: structured negation never performs worse and demonstrably prevents violations. The benefit concentrates on the constraints and conditions where compliance is already at risk -- behavioral timing constraints, lower-capability models, high-pressure scenarios. Those are exactly the cases where you want a reliable constraint format. NPT-013 adoption is justified. The framing is convention-alignment, not a mandate, because the effect size says so.
What the Research Did Not Change¶
- Positive framing remains the default. NPT-007 is still the right choice when no specific constraint need exists. The taxonomy does not recommend negative framing as a universal replacement.
- The quality gate threshold stays at 0.92. The A/B test validated a constraint framing format, not a quality bar change.
- C1 tasks do not require NPT-013. The CONDITIONAL GO applies to HARD-tier constraints at C2+ criticality. Routine work is unaffected.
- All changes are reversible. Every ADR was designed for rollback if future evidence contradicts the findings.
Limitations¶
Two open questions the research surfaced but did not resolve. Both matter for interpreting the findings honestly.
Statistical power note. The study was powered to detect a minimum effect size of pi_d >= 0.10 (n=90 matched pairs per condition). The observed pi_d of 0.078 falls below this threshold, meaning the near-miss does not tell us whether the true population effect exceeds 0.10 -- only that this sample could not detect it at that level.
The open causal question. The A/B test compared structured negation against positive-only framing, but the structured negation format contains more information (consequence + alternative) than the positive-only format. The causal comparison of structured negative framing versus structurally equivalent positive framing -- same information density, same consequence documentation, same specificity -- remains untested. The observed benefit may be attributable to structure and information content rather than negative framing per se. Future work needs to isolate the framing variable from the information variable.
The 60% hallucination claim. The original hypothesis that negative prompting reduces hallucination by 60% entered the project as a testable claim. A systematic search across 75 sources found zero controlled evidence for this specific effect size. The claim is not disproven -- it is simply unestablished. No peer-reviewed study, vendor benchmark, or reproducible experiment supports the 60% figure. It should not be cited as fact.
Implementation in Jerry¶
The research produced four architecture decision records and five features:
| ADR | Decision | Status |
|---|---|---|
| ADR-001 | Eliminate all NPT-014 instances; universal upgrade to NPT-009 | Unconditional -- evidence is T1+T3 |
| ADR-002 | Constitutional constraint upgrades (Phase 5A unconditional, Phase 5B conditional) | Phase 5A implemented; Phase 5B completed via PG-003 |
| ADR-003 | Routing disambiguation standard with consequence documentation | Component A implemented; Component B completed via PG-003 |
| ADR-004 | Compaction resilience -- L2 re-injection for Tier B HARD rules | Unconditional -- structural gap independent of framing preference |
| Feature | Description |
|---|---|
| FEAT-001 | NPT-014 elimination across rule files (22 of 36 negative constraint instances -- 61% -- used blunt prohibition format; all upgraded) |
| FEAT-002 | Constitutional triplet upgrades in SKILL.md files and agent standards |
| FEAT-003 | Routing disambiguation and consequence documentation across 13 skills |
| FEAT-004 | Compaction resilience: L2 re-injection for H-04 and H-32 |
| FEAT-005 | New /prompt-engineering skill with pe-builder, pe-constraint-gen, and pe-scorer agents |
ADR-001: NPT-014 Elimination | ADR-002: Constitutional Upgrades | ADR-003: Routing Disambiguation | ADR-004: Compaction Resilience
How This Research Was Conducted¶
PROJ-014 ran as a six-phase research pipeline followed by a controlled A/B test, with 23 C4 quality gates across the entire project.
| Phase | Focus | Output | Quality Gate |
|---|---|---|---|
| Phase 1 | Literature survey | 75 unique sources across academic, industry, and vendor documentation | 3 agent gates (0.950, 0.933, 0.935) + barrier synthesis (0.953) |
| Phase 2 | Claim validation and comparative effectiveness | Research question bifurcation; null finding on 60% hallucination claim | 2 agent gates (0.959, 0.933) + barrier synthesis (0.950) |
| Phase 3 | Taxonomy development | 14-pattern NPT taxonomy (NPT-001 through NPT-014) | Agent gate (0.957) + barrier synthesis (0.957) |
| Phase 4 | Jerry Framework application analysis | 130 specific upgrade recommendations across 5 domains | 5 agent gates (0.950-0.955) + barrier synthesis (0.950) |
| Phase 5 | Architecture decisions | 4 ADRs governing framework evolution | 4 ADR gates (0.951-0.957) + barrier synthesis (0.956) |
| Phase 6 | Final synthesis | Implementation roadmap and consolidated findings | Agent gate (0.954) + C4 tournament (0.954) |
| TASK-025 | A/B testing | 270 trials, CONDITIONAL GO via PG-003 | Go-no-go gate (0.954) |
Every quality gate used the S-014 LLM-as-Judge rubric with six weighted dimensions: completeness (0.20), internal consistency (0.20), methodological rigor (0.20), evidence quality (0.15), actionability (0.15), traceability (0.10). All gates met or exceeded 0.92. The Phase 6 C4 tournament executed all 10 adversarial strategies from the strategy catalog.
The pipeline ran via /orchestration with barrier-sync gates between phases -- downstream work could not proceed until the upstream phase cleared the quality gate. That sequencing is why the null finding in Phase 2 (the 60% hallucination claim didn't hold up) didn't contaminate the Phase 3 taxonomy. Each phase built on verified output.
Final Synthesis (Phase 6) | NPT Pattern Catalog | PROJ-014 Work Tracker
References¶
Key Academic Citations¶
| Citation | Finding | Relevance |
|---|---|---|
| Liu et al., "Instruction Hierarchy Failures in Large Language Models," AAAI 2026 (proceedings) | Instruction hierarchy failure under standalone negative constraints | Establishes NPT-014 underperformance; T1 evidence that blunt prohibition is the worst formulation |
| Wen et al., "Structured Constraints for Behavioral Compliance in LLMs," EMNLP 2024 (ACL Anthology) | +7.3-8% compliance with structured vs. blunt negative framing | Confirms structured > blunt across multiple model families |
| Barreto & Jana, "Negation Reasoning in Instruction-Following Models," EMNLP 2025 Findings (ACL Anthology) | +25.14% negation reasoning accuracy for structured negation | Supports negation comprehension (not behavioral compliance); important distinction for interpreting the NPT taxonomy |
These three citations anchor the evidence base. The full Phase 1 literature survey covered 75 unique sources across academic venues, industry publications, and vendor documentation -- see the Phase 1 survey output for the complete source inventory.
Related Reading¶
Prompt Engineering SKILL.md | NPT Pattern Reference