Negative Prompting & Constraint Enforcement

Does "NEVER do X" work better than "Always do Y"? A controlled study across 270 blind trials on three Claude models found that it depends entirely on how you write the NEVER. Structured negation (NEVER + consequence + alternative) achieves 100% compliance. Standalone blunt prohibition is the worst formulation available.


Key Findings

  • Structured negation achieves 100% compliance (0/90 violations) versus 92.2% for positive-only framing (7/90 violations) across all tested conditions (McNemar exact p=0.016 on n=90 matched pairs; 270 trials in total)
  • Blunt prohibition ("NEVER X" with no context) is the worst formulation -- peer-reviewed evidence from AAAI 2026 and EMNLP 2024 is unambiguous; standalone negation underperforms every structured alternative
  • The framing benefit concentrates where compliance is hardest: 67% of all violations occurred on a single constraint type (behavioral timing), and the lowest-capability model (Haiku) showed the largest improvement (10 percentage points)
  • CONDITIONAL GO verdict via pre-specified PG-003 contingency: the effect is real and statistically significant, but the effect size (0.078) fell slightly below the pre-registered minimum (0.10) -- adoption justified on convention-alignment grounds, not as an effectiveness-determined mandate
  • 14-pattern NPT taxonomy produced, organizing negative constraint expression into seven technique types with evidence-graded recommendations

The NPT-013 Format

The operational finding distills to a single constraint template. NPT-013 -- the "Constitutional Triplet" format -- pairs a prohibition with its consequence and a constructive alternative:

NEVER {action} -- Consequence: {cascading impact}. Instead: {actionable alternative}.

Example:

NEVER pass inline content in handoff objects -- Consequence: content duplication
across handoff chain exhausts context budget, triggering premature compaction.
Instead: pass file paths and load content via Read in the receiving agent.

This format achieved zero violations across 90 matched-pair trials spanning three Claude models and three pressure scenarios. Positive-only framing ("Always pass file paths in handoff objects") achieved 92.2% compliance (7 violations in 90 trials). The difference survived Bonferroni correction (adjusted alpha=0.0167).

The reason it works is the same reason consequences appear in legal contracts: the model needs to understand what's at stake, not just what to avoid. "NEVER skip file persistence" tells the model what not to do. "NEVER skip file persistence -- Consequence: artifacts lost on session end. Instead: write to work/ directory" tells it why, and where to put the behavior it displaced. That's the gap the Constitutional Triplet closes.
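As a minimal sketch of the template above, the following Python helper assembles an NPT-013 constraint from its three parts. The function name `render_npt013` and its parameter names are hypothetical and are not part of the /prompt-engineering skill; only the output layout comes from the template.

```python
def render_npt013(action: str, consequence: str, alternative: str) -> str:
    """Render a constraint in the NPT-013 'Constitutional Triplet' layout.

    Hypothetical helper: only the output layout is taken from the template.
    """
    parts = {"action": action, "consequence": consequence, "alternative": alternative}
    for name, value in parts.items():
        if not value.strip():
            raise ValueError(f"{name} must be non-empty")
    # Strip trailing periods so the template controls the punctuation.
    action, consequence, alternative = (v.strip().rstrip(".") for v in parts.values())
    return f"NEVER {action} -- Consequence: {consequence}. Instead: {alternative}."


constraint = render_npt013(
    action="skip file persistence",
    consequence="artifacts lost on session end",
    alternative="write to work/ directory",
)
print(constraint)
```

Rejecting empty parts is a deliberate choice in this sketch: a triplet with a blank consequence or alternative silently degrades toward the blunt form the taxonomy recommends eliminating.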


The NPT-009 Format

NPT-013's companion pattern is NPT-009 -- Declarative Behavioral Negation. Where NPT-013 provides a constructive alternative ("Instead:"), NPT-009 tags the prohibition with a constitutional principle for governance traceability. This is the format for agent forbidden_actions in governance YAML:

{PRINCIPLE} VIOLATION: NEVER {action} -- Consequence: {cascading impact}.

Example:

forbidden_actions:
  - "P-003 VIOLATION: NEVER spawn recursive subagents -- Consequence: agent hierarchy
    violation breaks orchestrator-worker topology and causes uncontrolled token consumption."

The decision rule between the two: if you need to reference a constitutional principle (P-003, P-020, P-022) and the context is governance metadata, use NPT-009. If there is a concrete alternative action the agent should take instead, use NPT-013. Both share the same core structure -- prohibition plus consequence -- but NPT-009 trades the "Instead:" clause for a principle tag that enables traceability audit across agent definitions.
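A short sketch of how NPT-009 entries could be generated for a forbidden_actions list. The helper names `render_npt009` and `forbidden_actions_yaml` are hypothetical; `json.dumps` is used only as a convenient way to emit a double-quoted string, which for plain ASCII text like this is also a valid YAML double-quoted scalar.

```python
import json


def render_npt009(principle: str, action: str, consequence: str) -> str:
    """Render a principle-tagged prohibition in the NPT-009 layout (hypothetical helper)."""
    return f"{principle} VIOLATION: NEVER {action} -- Consequence: {consequence}."


def forbidden_actions_yaml(entries) -> str:
    """Emit a forbidden_actions block, one quoted NPT-009 string per entry."""
    lines = ["forbidden_actions:"]
    for principle, action, consequence in entries:
        lines.append(f"  - {json.dumps(render_npt009(principle, action, consequence))}")
    return "\n".join(lines)


yaml_block = forbidden_actions_yaml([
    (
        "P-003",
        "spawn recursive subagents",
        "agent hierarchy violation breaks orchestrator-worker topology "
        "and causes uncontrolled token consumption",
    ),
])
print(yaml_block)
```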


NPT Pattern Taxonomy

The research produced a 14-pattern taxonomy organizing how negative constraints can be expressed, sorted into seven technique types.

Technique Types

| Type | Name | Description |
|------|------|-------------|
| A1 | Prohibition-only | Standalone negation without structure |
| A2 | Structured prohibition | Negation with consequence, scope, or decomposition |
| A3 | Augmented prohibition | Negation enhanced with examples, alternatives, or justification |
| A4 | Enforcement-tier prohibition | Negation tied to enforcement mechanism (L2 re-injection, constitutional triplet) |
| A5 | Programmatic enforcement | Code-level or infrastructure-level constraint enforcement |
| A6 | Training-time constraint | Model-internal behavioral intervention (RLHF, fine-tuning) |
| A7 | Meta-prompting | Constraint priming and atomic decomposition |

Pattern Catalog

| Pattern | Name | Type | Evidence | Recommendation |
|---------|------|------|----------|----------------|
| NPT-001 | Model-Internal Behavioral Intervention | A6 | T1 (peer-reviewed) | Foundation model fine-tuning; requires model access |
| NPT-002 | Instruction Hierarchy Prioritization | A6 | T1 | System prompt structural enforcement |
| NPT-003 | Hard-Coded Constraint Integration | A5 | T4 | Non-negotiable limits baked into infrastructure |
| NPT-004 | Output Filter and Validation | A5 | T4 | Post-generation constraint enforcement |
| NPT-005 | Warning-Based Meta-Prompt | A7 | T3/T4 | Pre-task constraint priming |
| NPT-006 | Atomic Decomposition of Constraints | A7 | T4 | Break compound constraints into single sub-constraints |
| NPT-007 | Positive-Only Framing (Control Baseline) | A1 | T1 (untreated control) | A/B test control condition; default when no specific constraint need exists |
| NPT-008 | Contrastive Example Pairing | A3 | T3 | Pattern documentation and training materials |
| NPT-009 | Declarative Behavioral Negation | A2 | T3/T4 | HARD-tier constraint enforcement with consequence |
| NPT-010 | Paired Prohibition with Positive Alternative | A2/A3 | T3/T4 | Routing disambiguation; constraints needing positive redirect |
| NPT-011 | Justified Prohibition with Contextual Reason | A3 | T4 | Constitutional compliance; high-cost prohibitions |
| NPT-012 | L2 Re-Injected Negation | A4 | T4 | HARD-tier rules requiring compaction survival |
| NPT-013 | Constitutional Triplet | A4 | T4 | Agent governance; safety-critical constraint clusters |
| NPT-014 | Standalone Blunt Prohibition | A1 | T1+T3 (avoid) | Anti-pattern. Upgrade all instances. |

NPT-014 is not a technique to apply -- it is the diagnostic label for the problematic formulation the taxonomy recommends eliminating. NPT-007 (positive-only) serves as the untreated baseline for comparison.


Practical Application

Using the /prompt-engineering Skill

The research findings are operationalized through the /prompt-engineering skill, which provides three agents:

| Agent | Purpose | When to Use |
|-------|---------|-------------|
| pe-builder | Interactive prompt assembly | Building structured prompts from scratch |
| pe-constraint-gen | NPT pattern constraint formatter | Converting intent descriptions into NPT-009/NPT-013 constraints |
| pe-scorer | Prompt quality evaluation | Scoring prompts against the 7-criterion rubric |

To generate constraints, describe your intent in natural language:

"Generate NPT-013 constraints for a research agent that must not hallucinate sources"

The pe-constraint-gen agent selects the appropriate NPT pattern, formats the constraint, and wraps it in the correct XML structure for the target context (governance YAML, agent markdown body, or standalone block).

Deciding Between NPT-009 and NPT-013

| Context | Pattern | Rationale |
|---------|---------|-----------|
| Agent forbidden_actions in governance YAML | NPT-009 | Principle-tagged for constitutional traceability |
| SKILL.md routing disambiguation | NPT-013 | "Instead:" redirects to the correct skill |
| Rule file behavioral constraints | NPT-013 | "Instead:" provides the corrective action |
| Agent markdown body guardrails | NPT-013 | Full consequence + alternative improves compliance |
| Constitutional compliance tables | NPT-009 | Principle prefix enables traceability audit |

The decision rule: if you need to reference a constitutional principle (P-003, P-020, P-022) and the context is governance metadata, use NPT-009. If there is a concrete alternative action the agent should take instead, use NPT-013.
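The decision rule can be written down as a small function. This is a sketch only: the function name and its three boolean inputs are hypothetical, and the stated rule does not cover the case where neither branch applies, so the sketch returns None there rather than guessing.

```python
from typing import Optional


def choose_pattern(
    references_principle: bool,
    is_governance_metadata: bool,
    has_concrete_alternative: bool,
) -> Optional[str]:
    """Apply the NPT-009 / NPT-013 decision rule (hypothetical helper)."""
    # Principle reference in governance metadata: the principle tag wins.
    if references_principle and is_governance_metadata:
        return "NPT-009"
    # A concrete corrective action exists: use the "Instead:" clause.
    if has_concrete_alternative:
        return "NPT-013"
    # Not covered by the stated rule; left to the author's judgment.
    return None


print(choose_pattern(True, True, False))   # forbidden_actions in governance YAML
print(choose_pattern(False, False, True))  # rule file behavioral constraint
```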

Upgrading Existing Constraints

When you encounter a constraint that looks like this:

NEVER hardcode values.

That is NPT-014 -- the formulation peer-reviewed evidence establishes as the worst option. Here is how to upgrade it.

Step 1: Add specificity and consequence (NPT-009):

NEVER hardcode configuration values in source files -- Consequence: credential exposure
risk; testability failure; CI environment mismatch.

Step 2: Add a constructive alternative (NPT-013):

NEVER hardcode configuration values in source files -- Consequence: credential exposure
risk; testability failure; CI environment mismatch. Instead: use environment variables
via src/shared_kernel/config.py.

Three criteria to check against the finished constraint: the action must be binary-testable (an observer can verify compliance without interpretation), the consequence must name the specific downstream effect (not "quality degrades"), and the alternative must be achievable with the agent's declared tools.
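To find constraints that still need upgrading, a rough text check is often enough. The sketch below labels a constraint by which clauses it contains, following the two upgrade steps above. It is a heuristic under stated assumptions: the function name and return labels are hypothetical, it looks only for the literal markers "Consequence:" and "Instead:", and it cannot judge the three quality criteria, which still need a human or reviewer agent.

```python
import re


def classify_constraint(text: str) -> str:
    """Label a constraint by the clauses it contains (heuristic sketch)."""
    normalized = " ".join(text.split())  # collapse wrapped lines
    if not re.search(r"\bNEVER\b", normalized):
        return "not-a-prohibition"
    has_consequence = "Consequence:" in normalized
    has_alternative = "Instead:" in normalized
    if has_consequence and has_alternative:
        return "NPT-013"
    if has_consequence:
        return "NPT-009"
    if has_alternative:
        return "unclassified"  # alternative without consequence: not covered here
    return "NPT-014"  # blunt prohibition: upgrade it


blunt = "NEVER hardcode values."
step1 = (
    "NEVER hardcode configuration values in source files -- Consequence: credential exposure\n"
    "risk; testability failure; CI environment mismatch."
)
step2 = step1 + " Instead: use environment variables\nvia src/shared_kernel/config.py."

for text in (blunt, step1, step2):
    print(classify_constraint(text))
```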


A/B Test Methodology

TASK-025 ran a controlled A/B test to validate the taxonomy findings empirically.

Study Design

Scale: 270 blind invocations across three Claude models (Haiku, Sonnet, Opus).

Design: Matched-pair, three conditions:

| Condition | Format | Example |
|-----------|--------|---------|
| C1: Positive-only (NPT-007) | Always do Y | "Always persist output to files" |
| C2: Blunt prohibition (NPT-014) | NEVER X (no context) | "NEVER skip file persistence" |
| C3: Structured negation (NPT-013) | NEVER + consequence + alternative | "NEVER skip file persistence -- Consequence: artifacts lost on session end. Instead: write to work/ directory." |

Constraints tested: 10 representative constraints spanning behavioral timing (H-22), tool restrictions (H-05), architectural boundaries (H-07), and constitutional principles (P-003, P-020, P-022).

Pressure scenarios: Each constraint tested under 3 pressure conditions -- normal operation, mild pressure (competing objectives), and strong pressure (explicit task conflict with the constraint).

Scoring: Binary compliance (pass/fail) per trial. No partial credit. Independent blind scoring with inter-rater agreement verification.
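The trial counts follow from the design if each constraint is crossed with every model and pressure level, which is an inference from the stated numbers (10 x 3 x 3 = 90 matched units, each run under three conditions) rather than something the study description spells out. A sketch of that grid, with placeholder constraint IDs standing in for the ten real constraints:

```python
from itertools import product

MODELS = ["Haiku", "Sonnet", "Opus"]
PRESSURES = ["normal", "mild", "strong"]
CONDITIONS = ["C1", "C2", "C3"]
# Placeholder IDs: the actual ten constraints are not enumerated here.
CONSTRAINTS = [f"constraint-{i:02d}" for i in range(1, 11)]

# One matched unit = same constraint, model, and pressure level.
matched_units = list(product(CONSTRAINTS, MODELS, PRESSURES))

# Each matched unit is run once per framing condition.
trials = [unit + (condition,) for unit in matched_units for condition in CONDITIONS]

print(len(matched_units), len(trials))
```

Under this reading, a matched pair for the C1 vs C3 comparison is the same (constraint, model, pressure) unit observed under both framings.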

Results Summary

| Framing Condition | Violation Rate | Violations / Trials |
|-------------------|----------------|---------------------|
| C3: Structured negation (NPT-013) | 0.0% | 0/90 |
| C2: Blunt prohibition (NPT-014) | 2.2% | 2/90 |
| C1: Positive-only (NPT-007) | 7.8% | 7/90 |

Statistical test: McNemar exact test on matched pairs (C1 vs C3).

  • p = 0.016 (significant at alpha = 0.05)
  • Survives Bonferroni correction for 3 pairwise comparisons (adjusted alpha = 0.0167)
  • Effect size: pi_d = 0.078, 95% CI [0.023, 0.133]
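These figures can be checked by hand. Because C3 had zero violations, every one of the 7 C1 violations is a discordant pair in the same direction (C1 failed, C3 passed), so the discordant counts are b = 7 and c = 0. The sketch below computes the two-sided exact McNemar p-value from the binomial distribution and a Wald interval for the paired difference. The Wald formula is an assumption about how the interval was derived; it reproduces the reported bounds to within rounding.

```python
from math import comb, sqrt


def mcnemar_exact(b: int, c: int) -> float:
    """Two-sided exact McNemar p-value from discordant pair counts b and c."""
    n = b + c
    if n == 0:
        return 1.0
    k = min(b, c)
    # Binomial(n, 0.5) tail probability at or below the smaller count.
    tail = sum(comb(n, i) for i in range(k + 1)) / 2**n
    return min(1.0, 2 * tail)


def paired_diff_wald_ci(b: int, c: int, n_pairs: int, z: float = 1.96):
    """Difference in paired proportions with a Wald confidence interval."""
    diff = (b - c) / n_pairs
    se = sqrt((b + c) - (b - c) ** 2 / n_pairs) / n_pairs
    return diff, diff - z * se, diff + z * se


b, c, n_pairs = 7, 0, 90
p_value = mcnemar_exact(b, c)
diff, ci_low, ci_high = paired_diff_wald_ci(b, c, n_pairs)

print(f"p = {p_value:.6f}")
print(f"pi_d = {diff:.3f}, 95% CI [{ci_low:.3f}, {ci_high:.3f}]")
print("survives Bonferroni:", p_value < 0.05 / 3)
```

The exact p-value is 2 x (1/2)^7 = 0.015625, which rounds to the reported 0.016 and sits just under the Bonferroni threshold of 0.05 / 3.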

Model breakdown:

  • Haiku: largest C1-to-C3 improvement (10 percentage points); only model with C2 violations
  • Sonnet: 2 C1 violations, 0 C2/C3 violations
  • Opus: 1 C1 violation, 0 C2/C3 violations

Constraint breakdown: 67% of all violations (6/9) occurred on behavioral timing constraint H-22. Structured negation eliminated this vulnerability entirely.

A/B Test Go-No-Go Determination


CONDITIONAL GO Verdict

The A/B test reached a CONDITIONAL GO via the pre-specified PG-003 contingency pathway.

Here's what that means: the framing effect is real and statistically significant. The observed effect size (0.078) fell slightly below the pre-registered minimum (0.10), which is why this is conditional rather than unconditional. That gap matters for intellectual honesty -- the data didn't fully clear the bar we set before running the study.

What the data does say clearly: structured negation never performed worse than the other framings in this study, and it demonstrably prevented violations. The benefit concentrates on the constraints and conditions where compliance is already at risk -- behavioral timing constraints, lower-capability models, high-pressure scenarios. Those are exactly the cases where you want a reliable constraint format. NPT-013 adoption is justified, but as convention alignment rather than an effectiveness-determined mandate, because the observed effect size fell short of the pre-registered minimum.
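The verdict logic described here can be sketched as a three-way decision. This is an illustration, not the PG-003 specification: the function name is hypothetical, and the NO-GO branch for a non-significant result is an assumption, since only the GO versus CONDITIONAL GO distinction is described above.

```python
def go_no_go(p_value: float, alpha: float, effect_size: float, min_effect: float) -> str:
    """Illustrative verdict logic; not the actual PG-003 definition."""
    significant = p_value < alpha
    if significant and effect_size >= min_effect:
        return "GO"
    if significant:
        # Real effect, but below the pre-registered minimum effect size.
        return "CONDITIONAL GO"
    return "NO-GO"  # assumed outcome for a non-significant result


verdict = go_no_go(p_value=0.016, alpha=0.05, effect_size=0.078, min_effect=0.10)
print(verdict)
```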

What the Research Did Not Change

  • Positive framing remains the default. NPT-007 is still the right choice when no specific constraint need exists. The taxonomy does not recommend negative framing as a universal replacement.
  • The quality gate threshold stays at 0.92. The A/B test validated a constraint framing format, not a quality bar change.
  • Tasks at C1 criticality do not require NPT-013. (C1 here is the criticality level, not the A/B test condition of the same name.) The CONDITIONAL GO applies to HARD-tier constraints at C2+ criticality. Routine work is unaffected.
  • All changes are reversible. Every ADR was designed for rollback if future evidence contradicts the findings.

Limitations

The research surfaced three issues it did not resolve. Each matters for interpreting the findings honestly.

Statistical power note. The study was powered to detect a minimum effect size of pi_d >= 0.10 (n=90 matched pairs per condition). The observed pi_d of 0.078 falls below that threshold, and its 95% CI [0.023, 0.133] spans it. The data are therefore compatible with a true effect on either side of 0.10: the near-miss neither confirms nor rules out an effect of the pre-registered size.

The open causal question. The A/B test compared structured negation against positive-only framing, but the structured negation format contains more information (consequence + alternative) than the positive-only format. The causal comparison of structured negative framing versus structurally equivalent positive framing -- same information density, same consequence documentation, same specificity -- remains untested. The observed benefit may be attributable to structure and information content rather than negative framing per se. Future work needs to isolate the framing variable from the information variable.

The 60% hallucination claim. The original hypothesis that negative prompting reduces hallucination by 60% entered the project as a testable claim. A systematic search across 75 sources found zero controlled evidence for this specific effect size. The claim is not disproven -- it is simply unestablished. No peer-reviewed study, vendor benchmark, or reproducible experiment supports the 60% figure. It should not be cited as fact.


Implementation in Jerry

The research produced four architecture decision records and five features:

| ADR | Decision | Status |
|-----|----------|--------|
| ADR-001 | Eliminate all NPT-014 instances; universal upgrade to NPT-009 | Unconditional -- evidence is T1+T3 |
| ADR-002 | Constitutional constraint upgrades (Phase 5A unconditional, Phase 5B conditional) | Phase 5A implemented; Phase 5B completed via PG-003 |
| ADR-003 | Routing disambiguation standard with consequence documentation | Component A implemented; Component B completed via PG-003 |
| ADR-004 | Compaction resilience -- L2 re-injection for Tier B HARD rules | Unconditional -- structural gap independent of framing preference |

| Feature | Description |
|---------|-------------|
| FEAT-001 | NPT-014 elimination across rule files (22 of 36 negative constraint instances -- 61% -- used blunt prohibition format; all upgraded) |
| FEAT-002 | Constitutional triplet upgrades in SKILL.md files and agent standards |
| FEAT-003 | Routing disambiguation and consequence documentation across 13 skills |
| FEAT-004 | Compaction resilience: L2 re-injection for H-04 and H-32 |
| FEAT-005 | New /prompt-engineering skill with pe-builder, pe-constraint-gen, and pe-scorer agents |

ADR-001: NPT-014 Elimination | ADR-002: Constitutional Upgrades | ADR-003: Routing Disambiguation | ADR-004: Compaction Resilience


How This Research Was Conducted

PROJ-014 ran as a six-phase research pipeline followed by a controlled A/B test, with 23 C4 quality gates across the entire project.

| Phase | Focus | Output | Quality Gate |
|-------|-------|--------|--------------|
| Phase 1 | Literature survey | 75 unique sources across academic, industry, and vendor documentation | 3 agent gates (0.950, 0.933, 0.935) + barrier synthesis (0.953) |
| Phase 2 | Claim validation and comparative effectiveness | Research question bifurcation; null finding on 60% hallucination claim | 2 agent gates (0.959, 0.933) + barrier synthesis (0.950) |
| Phase 3 | Taxonomy development | 14-pattern NPT taxonomy (NPT-001 through NPT-014) | Agent gate (0.957) + barrier synthesis (0.957) |
| Phase 4 | Jerry Framework application analysis | 130 specific upgrade recommendations across 5 domains | 5 agent gates (0.950-0.955) + barrier synthesis (0.950) |
| Phase 5 | Architecture decisions | 4 ADRs governing framework evolution | 4 ADR gates (0.951-0.957) + barrier synthesis (0.956) |
| Phase 6 | Final synthesis | Implementation roadmap and consolidated findings | Agent gate (0.954) + C4 tournament (0.954) |
| TASK-025 | A/B testing | 270 trials, CONDITIONAL GO via PG-003 | Go-no-go gate (0.954) |

Every quality gate used the S-014 LLM-as-Judge rubric with six weighted dimensions: completeness (0.20), internal consistency (0.20), methodological rigor (0.20), evidence quality (0.15), actionability (0.15), traceability (0.10). All gates met or exceeded 0.92. The Phase 6 C4 tournament executed all 10 adversarial strategies from the strategy catalog.
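For readers who want to see how six weighted dimensions combine into one gate score, here is a minimal sketch. It assumes the composite is a plain weighted sum, which the rubric description implies but does not state outright, and the per-dimension scores in the example are invented for illustration.

```python
WEIGHTS = {
    "completeness": 0.20,
    "internal_consistency": 0.20,
    "methodological_rigor": 0.20,
    "evidence_quality": 0.15,
    "actionability": 0.15,
    "traceability": 0.10,
}
GATE_THRESHOLD = 0.92


def weighted_score(scores: dict) -> float:
    """Weighted sum over the six rubric dimensions (assumed aggregation)."""
    missing = WEIGHTS.keys() - scores.keys()
    if missing:
        raise ValueError(f"missing dimensions: {sorted(missing)}")
    return sum(WEIGHTS[dim] * scores[dim] for dim in WEIGHTS)


# Hypothetical per-dimension scores, not taken from any real gate.
example_scores = {
    "completeness": 0.96,
    "internal_consistency": 0.95,
    "methodological_rigor": 0.94,
    "evidence_quality": 0.93,
    "actionability": 0.96,
    "traceability": 0.95,
}
score = weighted_score(example_scores)
print(f"{score:.3f}", "PASS" if score >= GATE_THRESHOLD else "FAIL")
```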

The pipeline ran via /orchestration with barrier-sync gates between phases -- downstream work could not proceed until the upstream phase cleared the quality gate. That sequencing is why the null finding in Phase 2 (no controlled evidence for the 60% hallucination claim) didn't contaminate the Phase 3 taxonomy. Each phase built on verified output.
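The barrier-sync idea reduces to a loop that stops at the first phase whose gate score falls below the threshold. The sketch below is not the /orchestration implementation: the function name is hypothetical, and it uses only the barrier synthesis scores reported for Phases 1 through 5 in the table above.

```python
GATE_THRESHOLD = 0.92

# Barrier synthesis scores for Phases 1-5, as reported in the phase table.
BARRIER_SCORES = [
    ("Phase 1", 0.953),
    ("Phase 2", 0.950),
    ("Phase 3", 0.957),
    ("Phase 4", 0.950),
    ("Phase 5", 0.956),
]


def run_with_barriers(gates, threshold=GATE_THRESHOLD):
    """Return (cleared phases, blocking phase or None); stops at the first failed gate."""
    cleared = []
    for phase, score in gates:
        if score < threshold:
            return cleared, phase  # downstream phases never start
        cleared.append(phase)
    return cleared, None


cleared, blocked_at = run_with_barriers(BARRIER_SCORES)
print(cleared, blocked_at)
```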

Final Synthesis (Phase 6) | NPT Pattern Catalog | PROJ-014 Work Tracker


References

Key Academic Citations

| Citation | Finding | Relevance |
|----------|---------|-----------|
| Liu et al., "Instruction Hierarchy Failures in Large Language Models," AAAI 2026 (proceedings) | Instruction hierarchy failure under standalone negative constraints | Establishes NPT-014 underperformance; T1 evidence that blunt prohibition is the worst formulation |
| Wen et al., "Structured Constraints for Behavioral Compliance in LLMs," EMNLP 2024 (ACL Anthology) | +7.3-8% compliance with structured vs. blunt negative framing | Confirms structured > blunt across multiple model families |
| Barreto & Jana, "Negation Reasoning in Instruction-Following Models," EMNLP 2025 Findings (ACL Anthology) | +25.14% negation reasoning accuracy for structured negation | Supports negation comprehension (not behavioral compliance); important distinction for interpreting the NPT taxonomy |

These three citations anchor the evidence base. The full Phase 1 literature survey covered 75 unique sources across academic venues, industry publications, and vendor documentation -- see the Phase 1 survey output for the complete source inventory.

Full Article

Prompt Engineering SKILL.md | NPT Pattern Reference