The Revision Engine
A Platform for Cognitive Prosthesis Narrative Development
Document ID: CNL-TN-2026-010
Version: 1.0
Date: January 25, 2026
Author: Michael P. Hamilton, Ph.D.
Derivation: Extends CNL-TN-2025-022 (The Novelization Engine)
AI Assistance Disclosure: This technical note was developed collaboratively with Claude (Anthropic, claude-opus-4-5-20250514). The platform described herein was built through iterative human-AI collaboration, with Claude contributing to architecture design, code generation, and documentation. The author takes full responsibility for the content, technical decisions, and conclusions.
Abstract
This technical note documents the Revision Engine—a web-based platform for systematic manuscript revision through human-AI collaboration. The platform extends the Novelization Engine methodology (CNL-TN-2025-022) from drafting into revision, implementing quantified diagnostic tools that transform subjective editorial intuition into actionable data. Core innovations include: a four-dimension engagement scoring system with computed aggregate metrics; voice fingerprinting and similarity detection across characters; automated dropout zone identification; visual heatmap interfaces for manuscript-level pattern recognition; and a complete export-process-import workflow for AI-assisted revision with version control. Applied to Hot Water (218,681 words across 101 chapters), the platform enabled identification of 47 dropout zones, quantification of voice blur across 15 characters, and systematic triage of revision priorities. We introduce the term "cognitive prosthesis narrative development" to describe this approach: extending human cognitive capacity for holding entire manuscripts in working memory while tracking consistency, voice, and engagement across novel-scale texts. The platform demonstrates that AI collaboration in creative work may be most valuable not for generating content but for generating diagnostic infrastructure that makes revision tractable at scale.
1. Introduction
1.1 The Revision Problem
The Novelization Engine (CNL-TN-2025-022) documented a methodology for completing long-incubated fiction through structured human-AI collaboration. That methodology addressed the drafting problem: how to synthesize accumulated creative material into a coherent manuscript. A companion problem remained unaddressed: how to revise that manuscript systematically when it exceeds human working memory capacity.
Novel-length fiction presents a cognitive challenge that intensifies during revision. A 60,000-word manuscript contains approximately 200 pages; a 200,000-word trilogy approaches 700 pages. Traditional revision approaches rely on the author's memory, supplemented by notes and multiple reading passes. These methods are vulnerable to several failure modes:
Consistency drift — The author's mental model of the story evolves during revision, introducing new contradictions while attempting to fix old ones.
Voice blur — Character voices converge toward the author's default register as revision homogenizes prose.
Local optimization — Scene-level improvements may degrade manuscript-level pacing or reader engagement patterns.
Revision fatigue — Repeated passes through the same material produce diminishing returns as the author loses fresh perspective.
The Revision Engine addresses these challenges by externalizing diagnostic functions that authors typically perform intuitively. Rather than relying on memory and instinct to identify problem areas, the platform generates quantified metrics that make manuscript-level patterns visible and tractable.
1.2 Cognitive Prosthesis Narrative Development
We introduce the term "cognitive prosthesis narrative development" to describe the approach implemented in this platform. The term draws on Andy Clark and David Chalmers' Extended Mind thesis [1]: cognitive processes extend beyond the brain when external resources are reliably available, automatically endorsed, and easily accessible.
The Revision Engine functions as a cognitive prosthesis in a specific sense: it extends the author's capacity to hold the entire manuscript in working memory while simultaneously tracking multiple dimensions of quality. No human author can maintain awareness of engagement scores, voice consistency, crutch word density, and reader state across 100+ chapters while making local revision decisions. The platform makes this possible by:
- Quantifying subjective editorial intuitions into comparable metrics
- Visualizing manuscript-level patterns that exceed human perception
- Tracking changes and their effects across revision iterations
- Exporting diagnostic context for AI-assisted revision processing
- Importing revised content with version control and staging
The cognitive load distribution model from CNL-TN-2025-022 extends to revision:
| Load Category | Human Contribution | AI Contribution | Platform Output |
|---|---|---|---|
| Quality judgment | Primary | Scoring assistance | Validated metrics |
| Pattern recognition | Manuscript knowledge | Context capacity | Diagnostic visualizations |
| Voice consistency | Ear for authenticity | Profile matching | Voice fingerprints |
| Revision execution | Editorial control | Prose generation | Staged revisions |
| Version control | Approval decisions | Tracking automation | Revision history |
1.3 Relationship to the Novelization Engine
The Novelization Engine (CNL-TN-2025-022) established documentation infrastructure for drafting: living story bible, character templates, reader state tracking, place documentation, and the eleven-component scene schema. The Revision Engine builds on this foundation by adding:
Diagnostic layer — Automated analysis tools that transform prose into quantified metrics
Visualization layer — Heatmap and dashboard interfaces for manuscript-level pattern recognition
Triage layer — Classification systems that convert diagnosis into actionable revision priorities
Workflow layer — Export/import pipeline for AI-assisted revision with version control
The Serialization Engine (CNL-TN-2025-023) subsequently demonstrated that this combined infrastructure produces format-agnostic story systems rather than format-specific manuscripts. The three documents form a trilogy:
- Novelization Engine — How to draft (methodology)
- Revision Engine — How to revise (platform)
- Serialization Engine — What the methodology produces (theory)
1.4 Scope
This technical note covers:
- Theoretical foundations for quantified revision
- Database architecture and schema design
- The four-dimension engagement scoring system
- Voice analysis and fingerprinting
- Heatmap visualization and dropout zone detection
- The triage classification system
- Export-process-import workflow for AI collaboration
- Results from application to Hot Water
- Implications for creative AI applications
2. Theoretical Framework
2.1 The Quantification Principle
Traditional editorial feedback operates in qualitative registers: "this scene feels slow," "the pacing drags in the middle," "these characters sound too similar." Such feedback identifies problems but provides limited guidance for systematic repair. The author must translate qualitative intuition into specific revision decisions through trial and error.
The Revision Engine implements a quantification principle: every qualitative editorial intuition can be decomposed into measurable dimensions that, aggregated, approximate the intuition's signal. "This scene feels slow" might decompose into:
- Low stakes (nothing at risk)
- Low resistance (no conflict)
- Low change (static situation)
- Low question pull (no reason to continue)
Each dimension becomes a 0-3 scale. The aggregate (0-12) provides a comparable engagement score across all scenes in the manuscript. Scenes scoring below threshold become visible revision targets.
This approach does not claim that numbers capture the full complexity of literary quality. Rather, it claims that quantified proxies are useful for identifying patterns that exceed human working memory. The author retains full editorial judgment; the platform surfaces candidates for that judgment to evaluate.
2.2 The Heatmap Metaphor
Thermal imaging reveals temperature patterns invisible to the naked eye. A building inspector uses infrared cameras to identify heat loss through insulation failures. The patterns exist; the technology makes them visible.
The manuscript heatmap applies this metaphor to engagement patterns. Each chapter receives a color-coded cell based on diagnostic metrics. Viewed individually, any chapter might seem acceptable. Viewed as a heatmap, patterns emerge:
- Consecutive red cells indicate dropout zones where readers will quit
- Yellow clusters reveal pacing problems spanning multiple chapters
- Voice scores trending downward suggest character convergence
- Crutch word density peaks identify prose requiring attention
The heatmap makes visible what the author cannot perceive through sequential reading: the manuscript's engagement topography.
2.3 The Triage Model
Emergency medicine developed triage to allocate limited resources efficiently: identify patients who will survive without intervention, patients who cannot be saved, and patients where intervention matters most. Resources flow to the third category.
Manuscript revision benefits from similar prioritization. Not all chapters require equal attention:
KEEP — Scene works. Intervention unnecessary.
TRIM — Cut 20-40%. Specific problems identifiable and fixable.
COMPRESS — Major reduction (50%+). Preserve only essential beats.
CONVERT — Format change required (journal → scene, summary → action).
MERGE — Combine with adjacent material.
DELETE — Remove entirely; relocate essential plot information.
Triage classification emerges from diagnostic data: engagement scores, voice metrics, crutch word density, and structural analysis. The classification provides actionable guidance that sequential reading cannot.
2.4 The Export-Process-Import Cycle
AI language models excel at prose transformation within specified parameters but lack persistent memory across sessions. The Revision Engine addresses this limitation through a structured workflow:
Export — Generate a revision package containing:
- Complete manuscript text with chapter boundaries
- Character voice profiles with vocabulary signatures
- Engagement scores and triage classifications
- Crutch word alerts and revision guidelines
- Voice analysis summary with similarity warnings
Process — Submit to large-context AI (Gemini 1M, Claude) for targeted revision following embedded guidelines
Import — Paste revised content into staging system with:
- Side-by-side diff view against original
- Word count tracking (original → revised → delta)
- Revision history logging
- Publish/unpublish toggle for A/B comparison
This cycle enables AI collaboration at novel scale while maintaining human editorial control over all revision decisions.
3. Platform Architecture
3.1 Technology Stack
The Revision Engine is implemented as a PHP/MySQL web application:
- Server: Apache 2.4 on macOS (Mac Mini M4 Pro)
- Database: MySQL 8.4 with InnoDB storage engine
- Backend: PHP 8.3 with mysqli (no PDO abstraction)
- Frontend: Vanilla HTML/CSS/JavaScript (no framework dependencies)
- CLI Tools: PHP scripts for batch operations and API integration
Design principles:
- Simplicity — No framework overhead; readable, maintainable code
- Direct database access — mysqli prepared statements; phpMyAdmin for administration
- File-based content — Markdown import/export; version-controlled externally
- Progressive enhancement — Core functionality works without JavaScript
3.2 Database Schema Overview
The schema comprises 23 tables organized into functional groups:
Content Tables
- `parts` — Trilogy volumes (SIGNAL, CHRONICLE, ANCESTOR)
- `chapters` — Individual chapters with content, metadata, and revision staging
- `images` — Artwork and illustrations
- `chapter_assets` — Image-chapter associations
Diagnostic Tables
- `chapter_diagnostics` — Aggregate metrics per chapter
- `scene_analysis` — Scene-level engagement scores and voice metrics
- `tic_word_config` — Configurable crutch word detection
- `tic_word_occurrences` — Per-chapter crutch word counts
- `crutch_word_totals` — Manuscript-level aggregates
- `dropout_zones` — Detected reader abandonment risk areas
Voice Analysis Tables
- `voice_fingerprints` — Character voice signatures per chapter
- `voice_analysis_summary` — Character-level dialogue statistics
- `voice_similarity` — Pairwise voice confusion detection
- `chapter_prose_summary` — Telling/showing analysis per chapter
Revision Tables
- `revision_history` — Complete revision audit trail
- `chapters.revised_content` — Staging field for pending revisions
- `chapters.show_revised` — Toggle for A/B content display
Analytics Tables
- `chapter_views` — Privacy-respecting read tracking (IP hashed)
- `subscribers` — Email list with double opt-in
- `downloads` — EPUB/PDF generation tracking
3.3 Key Schema Innovations
Computed Engagement Score
The scene_analysis table uses a MySQL generated column for engagement scoring:
engagement_score TINYINT UNSIGNED GENERATED ALWAYS AS (
stakes + resistance + change_level + question_pull
) STORED
This ensures score consistency: changing any component automatically updates the aggregate. The STORED attribute enables indexing for efficient sorting and filtering.
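In context, a minimal version of the table might look like this (a sketch; only the scoring columns are shown, and the surrounding column names are illustrative rather than the platform's exact schema):

```sql
-- Sketch of scene_analysis with the generated engagement column.
CREATE TABLE scene_analysis (
  id INT UNSIGNED AUTO_INCREMENT PRIMARY KEY,
  chapter_id INT UNSIGNED NOT NULL,
  stakes TINYINT UNSIGNED NOT NULL DEFAULT 0,
  resistance TINYINT UNSIGNED NOT NULL DEFAULT 0,
  change_level TINYINT UNSIGNED NOT NULL DEFAULT 0,
  question_pull TINYINT UNSIGNED NOT NULL DEFAULT 0,
  engagement_score TINYINT UNSIGNED GENERATED ALWAYS AS (
    stakes + resistance + change_level + question_pull
  ) STORED,
  INDEX idx_engagement (engagement_score)  -- STORED permits indexing
);
```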
Revision Staging
The chapters table implements a dual-content architecture:
content LONGTEXT, -- Original/current content
revised_content LONGTEXT, -- Staged revision (pending)
show_revised TINYINT(1) -- Display toggle (0=original, 1=revised)
This enables:
- Non-destructive revision (original preserved)
- A/B comparison by toggling `show_revised`
- Incremental publishing (some chapters revised, others not)
- Rollback capability (clear `revised_content`, set `show_revised = 0`)
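Serving the active version then reduces to a single expression. A sketch of the display query (column list abbreviated):

```sql
-- Serve the staged revision when published, else the original.
SELECT id, title,
       CASE WHEN show_revised = 1 THEN revised_content ELSE content END AS body
FROM chapters
WHERE id = ?;
```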
JSON-Stored Analysis Data
Complex analysis results use JSON columns for flexibility:
telling_instances JSON, -- Array of {text, paragraph}
crutch_words JSON, -- Object {word: count}
dialogue_by_character JSON, -- Object {character: word_count}
voice_bleed JSON, -- Array of {text, paragraph, expected_voice}
JSON storage enables schema evolution without migration and efficient storage of variable-structure data.
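MySQL 8's JSON functions keep these columns queryable despite their variable structure. For example, ranking chapters by the count of one crutch word (a sketch; "just" is a hypothetical tracked word, and the path assumes the `{word: count}` shape above):

```sql
-- Pull the per-chapter count for one crutch word out of the JSON object.
SELECT chapter_id,
       JSON_EXTRACT(crutch_words, '$.just') AS just_count
FROM scene_analysis
WHERE JSON_EXTRACT(crutch_words, '$.just') > 10
ORDER BY just_count DESC;
```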
Database Views for Common Queries
Four views simplify common access patterns:
v_chapter_heatmap -- Joins chapters, parts, diagnostics for heatmap display
v_triage_queue -- Chapters requiring revision, sorted by priority
v_revision_progress -- Per-part revision completion statistics
v_dropout_zones_active -- Unresolved dropout zones with chapter titles
4. The Engagement Scoring System
4.1 Theoretical Basis
Reader engagement in fiction correlates with specific narrative qualities identifiable at scene level. Drawing on Robert McKee's Story [2] and craft knowledge accumulated through the Novelization Engine collaboration, we identify four orthogonal dimensions:
Stakes — What is at risk if things go wrong?
Resistance — What pushes back against what the character wants?
Change — How different is the situation at the end versus the beginning?
Question Pull — Does the scene ending compel the reader to continue?
Each dimension operates independently: a scene can have high stakes but low resistance, or high change but low question pull. The four-dimension model captures distinct failure modes invisible to single-metric approaches.
4.2 Scoring Rubrics
Each dimension uses a 0-3 scale with explicit anchors:
STAKES (S)

| Score | Definition | Example |
|---|---|---|
| 0 | Nothing at stake | Characters chatting, pure exposition |
| 1 | Mild discomfort | Awkward social moment, minor inconvenience |
| 2 | Real consequences | Relationship damage, reputation at risk |
| 3 | Survival or identity | Physical danger, existential threat |
RESISTANCE (R)

| Score | Definition | Example |
|---|---|---|
| 0 | Nothing pushes back | Character gets what they want easily |
| 1 | Internal doubt | Self-questioning, hesitation |
| 2 | Interpersonal conflict | Disagreement, competing goals |
| 3 | Active antagonism | Direct opposition, blocked path |
CHANGE (C)

| Score | Definition | Example |
|---|---|---|
| 0 | Static | Same mental/physical state as start |
| 1 | Information gained | Character learns something new |
| 2 | Decision made | Character commits to a path |
| 3 | Irreversible shift | Point of no return, permanent change |
QUESTION PULL (Q)

| Score | Definition | Example |
|---|---|---|
| 0 | Resolved | Natural stopping point, reader satisfied |
| 1 | Mild curiosity | Slight interest in what happens next |
| 2 | Need to know | Unanswered question that nags |
| 3 | Hooked | Cannot stop reading, must continue |
4.3 Engagement Thresholds
Aggregate scores (0-12) map to reader engagement risk levels:
| Score Range | Classification | Reader Behavior |
|---|---|---|
| 10-12 | Gripping | Cannot put down |
| 7-9 | Solid | Engaged, will continue |
| 4-6 | Vulnerable | At risk of skimming |
| 0-3 | Dropout | Reader will quit |
The minimum viable scene target is 8/12, requiring at least moderate engagement across all four dimensions.
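The banding above is mechanical enough to express as a helper. A minimal sketch (the function name is illustrative, not from the platform; bands follow the table above):

```php
// Map a 0-12 aggregate engagement score to its risk classification.
// Illustrative helper; the platform stores component scores in
// scene_analysis and derives the aggregate via a generated column.
function classifyEngagement(int $score): string {
    if ($score >= 10) return 'gripping';    // 10-12: cannot put down
    if ($score >= 7)  return 'solid';       // 7-9: engaged, will continue
    if ($score >= 4)  return 'vulnerable';  // 4-6: at risk of skimming
    return 'dropout';                       // 0-3: reader will quit
}
```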
4.4 Scoring Implementation
Engagement scoring operates through two complementary mechanisms:
Manual Scoring (Scene Audit Interface)
The scene-audit.php interface presents chapter content alongside scoring controls. Human scorers evaluate each dimension against the rubric, with scores saved to scene_analysis. This approach provides highest accuracy but requires significant time investment.
AI-Assisted Scoring (CLI Tool)
The score-scenes.php CLI tool submits chapters to Claude API with embedded scoring rubrics and voice profiles. The AI returns JSON-structured scores that populate scene_analysis. This approach enables batch processing of entire manuscripts.
php score-scenes.php --model=sonnet # Score all unscored
php score-scenes.php --model=opus --chapter=42 # Score specific chapter
php score-scenes.php --model=sonnet --rescore # Re-score everything
AI scoring includes linguistic analysis beyond engagement:
- Voice distinctiveness (1-5)
- Profile adherence (1-5)
- Telling instances with locations
- Crutch word counts
- Dialogue distribution by character
- Round-robin monologue detection
4.5 Score Aggregation
Chapter-level diagnostics aggregate from scene-level scores:
UPDATE chapter_diagnostics SET
avg_engagement_score = (SELECT AVG(engagement_score) FROM scene_analysis WHERE chapter_id = ?),
min_engagement_score = (SELECT MIN(engagement_score) FROM scene_analysis WHERE chapter_id = ?),
max_engagement_score = (SELECT MAX(engagement_score) FROM scene_analysis WHERE chapter_id = ?)
WHERE chapter_id = ?
The avg_engagement_score drives heatmap coloring; min_engagement_score identifies chapters with weak sections even if average is acceptable.
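The three correlated subqueries can also be collapsed into a single pass over `scene_analysis`. An equivalent sketch using a derived table:

```sql
-- One-pass aggregation: compute avg/min/max per chapter, then join.
UPDATE chapter_diagnostics cd
JOIN (
  SELECT chapter_id,
         AVG(engagement_score) AS avg_s,
         MIN(engagement_score) AS min_s,
         MAX(engagement_score) AS max_s
  FROM scene_analysis
  GROUP BY chapter_id
) s ON s.chapter_id = cd.chapter_id
SET cd.avg_engagement_score = s.avg_s,
    cd.min_engagement_score = s.min_s,
    cd.max_engagement_score = s.max_s;
```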
5. Voice Analysis System
5.1 The Voice Blur Problem
Character voice is among the most difficult qualities to maintain across novel-length fiction. Each character should sound distinct through vocabulary choices, sentence patterns, and metaphor domains. In practice, characters often converge toward the author's default voice, particularly during revision when the author is focused on other concerns.
The Hot Water manuscript presents acute voice challenges:
- Six primary POV characters (Amara, David, Susan, Margaret, Jennifer, Starseed)
- Three temporal strands (modern, Darwin 1830s, Pictish 570 CE)
- Technical vocabularies spanning physics, geology, biology, engineering, archaeology
- 15+ speaking characters requiring distinct voices
5.2 Voice Profiles
Each major character receives a documented voice profile specifying:
Vocabulary Signature — Domain-specific terms the character naturally uses:
- Amara: tolerances, load-bearing, thermal gradient, structural integrity
- David: coherence, eigenstate, superposition, probability amplitude
- Susan: taxonomy, phylogeny, adaptation, ecological niche
- Margaret: crystalline matrix, stratification, metamorphic, grain boundaries
Sentence Patterns — Characteristic syntactic structures:
- Amara: Direct and declarative, precise but warm
- David: Short questioning sentences, trails off when confused
- Susan: Patient observation building to insight
- Margaret: Scottish rhythm, unhurried geological perspective
Metaphor Domains — Where the character draws comparisons:
- Amara: Architecture, materials science, electrical systems
- David: Physics, mathematics, uncertainty
- Susan: Evolution, organic systems, deep time
- Margaret: Rocks, minerals, slow processes
5.3 Voice Metrics
The scoring system evaluates voice quality on two dimensions:
Voice Distinctiveness (1-5) — Does the POV character sound distinct from other characters?

| Score | Definition |
|---|---|
| 1 | Indistinguishable from other characters |
| 2 | Occasional distinctive moments |
| 3 | Moderately distinct voice |
| 4 | Clearly recognizable voice |
| 5 | Unmistakably unique |
Profile Adherence (1-5) — Does the character use vocabulary and metaphors from their profile?

| Score | Definition |
|---|---|
| 1 | None of expected vocabulary present |
| 2 | Rare use of profile vocabulary |
| 3 | Moderate use of profile vocabulary |
| 4 | Strong use of profile vocabulary |
| 5 | Vocabulary fully consistent with profile |
5.4 Voice Analysis Aggregation
The aggregate-voice-analysis.php CLI tool processes scored scenes to generate manuscript-level voice statistics:
Dialogue Distribution — Word count and percentage by character:
| Character | Words | % | Distinctiveness | Adherence |
|-----------|---------|-------|-----------------|-----------|
| David | 19,581 | 21.1% | 2.8 | 2.6 |
| Susan | 12,226 | 13.2% | 2.7 | 2.5 |
| Margaret | 11,538 | 12.5% | 3.6 | 3.4 |
Voice Similarity Warnings — Character pairs with high confusion risk:
| Character A | Character B | Similarity | Shared Crutch Words |
|-------------|-------------|------------|---------------------|
| David | Susan | 0.72 | pattern, structure |
Prose Problem Scores — Chapters ranked by telling constructions, restatements, and voice issues
5.5 Voice Bleed Detection
AI scoring identifies specific instances where POV characters use vocabulary belonging to other characters:
"voice_bleed": [
{
"text": "crystalline matrix",
"paragraph": 12,
"expected_voice": "Margaret"
}
]
When Susan (biologist) thinks in geological vocabulary, the scene needs revision to restore voice integrity.
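A naive lexical version of this check can be sketched in a few lines (hypothetical helper; the platform's detection runs inside the AI scoring prompt, and paragraph locations are omitted here). It assumes vocabulary signatures keyed by character, as in §5.2:

```php
// Flag occurrences of other characters' signature vocabulary in a
// POV character's text. Sketch only; substring matching is cruder
// than the model-based detection the platform actually uses.
function detectVoiceBleed(string $text, string $povCharacter, array $signatures): array {
    $bleed = [];
    foreach ($signatures as $character => $terms) {
        if ($character === $povCharacter) continue; // own vocabulary is fine
        foreach ($terms as $term) {
            if (stripos($text, $term) !== false) {
                $bleed[] = ['text' => $term, 'expected_voice' => $character];
            }
        }
    }
    return $bleed;
}
```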
6. Heatmap Visualization
6.1 Design Principles
The heatmap interface (heatmap.php) transforms diagnostic data into a visual representation enabling manuscript-level pattern recognition. Design principles:
Color Psychology — Engagement maps to intuitive thermal scale:
- Green (10-12): Cool/safe, no intervention needed
- Yellow (7-9): Warm/caution, monitor but acceptable
- Orange (4-6): Hot/warning, revision candidate
- Red (0-3): Critical/danger, priority revision target
Information Density — Each cell encodes multiple dimensions:
- Background color: Engagement score
- Text: Chapter title (truncated)
- Badge: Triage action (if assigned)
- Icon: Revision status (pending/revised/published)
Spatial Organization — Chapters ordered by reading sequence:
- Grouped by Part (SIGNAL, CHRONICLE, ANCESTOR)
- Sorted by display_order within Part
- Visual breaks between Parts
6.2 Heatmap Data Structure
The v_chapter_heatmap view joins necessary tables:
CREATE VIEW v_chapter_heatmap AS
SELECT
c.id, c.title, c.chapter_type, c.word_count, c.display_order,
p.id AS part_id, p.name AS part_name, p.display_order AS part_order,
COALESCE(cd.avg_engagement_score, 0) AS engagement_score,
COALESCE(cd.overall_skim_risk, 'medium') AS skim_risk,
COALESCE(cd.avg_exposition_density, 0) AS exposition_density,
COALESCE(cd.total_tic_words, 0) AS tic_words,
cd.triage_action, cd.triage_priority
FROM chapters c
JOIN parts p ON c.part_id = p.id
LEFT JOIN chapter_diagnostics cd ON c.id = cd.chapter_id
ORDER BY p.display_order, c.display_order
6.3 Dropout Zone Detection
The system automatically identifies contiguous sequences of low-engagement chapters:
function detectDropoutZones($conn, $threshold = 5, $minLength = 2) {
// Query chapters ordered by position
// Identify sequences where avg_engagement_score < threshold
// Return zones with: start_chapter, end_chapter, scene_count,
// avg_score, diagnosis, recommendation
}
Detected zones are classified by severity:
- Warning: 2 consecutive low-engagement chapters
- Danger: 3-4 consecutive low-engagement chapters
- Critical: 5+ consecutive low-engagement chapters
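The scan itself is a straightforward run-length pass. A minimal in-memory sketch (assuming scores have already been fetched in reading order; thresholds and severity bands as above, helper names illustrative):

```php
// Scan ordered chapter scores for contiguous runs below the engagement
// threshold and classify each run's severity. DB fetch omitted.
function findDropoutZones(array $scores, float $threshold = 5, int $minLength = 2): array {
    $zones = [];
    $run = [];
    foreach ($scores as $i => $score) {
        if ($score < $threshold) {
            $run[] = $i;            // extend the current low-engagement run
            continue;
        }
        if (count($run) >= $minLength) $zones[] = classifyZone($run);
        $run = [];                   // run broken by an acceptable chapter
    }
    if (count($run) >= $minLength) $zones[] = classifyZone($run);
    return $zones;
}

function classifyZone(array $run): array {
    $n = count($run);
    $severity = $n >= 5 ? 'critical' : ($n >= 3 ? 'danger' : 'warning');
    return ['start' => $run[0], 'end' => end($run), 'severity' => $severity];
}
```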
Zone types capture specific patterns:
- `consecutive_low_engagement`: Generic dropout risk
- `exposition_cluster`: Multiple explanation-heavy chapters
- `voice_blur`: Character distinctiveness declining
- `pacing_stall`: Static scenes without change
- `journal_sequence`: Multiple document-type chapters
6.4 Statistics Dashboard
The heatmap page displays aggregate statistics:
Total Chapters: 101
Total Words: 218,681
Scored: 89/101 (88%)
Avg Engagement: 6.4/12
Dropout Zones: 4 active
Revision Progress:
- Revised: 23
- Published: 18
- Pending: 60
These metrics provide manuscript health indicators at a glance.
7. Triage System
7.1 Triage Actions
Six triage actions classify revision requirements:
| Action | Definition | Target Reduction |
|---|---|---|
| KEEP | Scene works; preserve structure and length | 0% |
| TRIM | Cut without losing essential content | 20-40% |
| COMPRESS | Major reduction; preserve only key beats | 50%+ |
| CONVERT | Change format entirely | Variable |
| MERGE | Combine with adjacent material | Consolidation |
| DELETE | Remove entirely; relocate plot info | 100% |
7.2 Triage Assignment
Triage actions can be assigned through:
Manual Assignment — Scene audit interface includes triage dropdown and notes field
AI-Assisted Assignment — Scoring prompt requests triage recommendation based on engagement metrics and prose analysis
Algorithmic Rules — Automated assignment based on thresholds:
if ($engagement_score <= 3) {
$triage_action = 'delete';
} elseif ($engagement_score <= 5 && $exposition_density > 0.3) {
$triage_action = 'compress';
} elseif ($tic_word_count > 20) {
$triage_action = 'trim';
}
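Wrapped into a function with an explicit fallback, the rules read as follows (a sketch using the thresholds above; the platform combines this with manual and AI-assisted assignment):

```php
// Apply the threshold rules above, falling back to 'keep' when no
// rule fires. Sketch only; real assignments also weigh triage notes.
function assignTriage(int $engagement, float $expositionDensity, int $ticWords): string {
    if ($engagement <= 3) return 'delete';
    if ($engagement <= 5 && $expositionDensity > 0.3) return 'compress';
    if ($ticWords > 20) return 'trim';
    return 'keep';
}
```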
7.3 Triage Queue
The v_triage_queue view surfaces chapters requiring attention:
CREATE VIEW v_triage_queue AS
SELECT c.id, c.title, c.word_count, p.name AS part_name,
cd.avg_engagement_score, cd.triage_action, cd.triage_priority
FROM chapters c
JOIN parts p ON c.part_id = p.id
LEFT JOIN chapter_diagnostics cd ON c.id = cd.chapter_id
WHERE cd.triage_action IS NOT NULL
AND cd.triage_action != 'keep'
ORDER BY cd.triage_priority ASC, cd.avg_engagement_score ASC
The queue prioritizes:
- Critical triage actions (delete, compress)
- Lower engagement scores within action category
- Earlier chapters (reader reaches them first)
7.4 Triage Notes
Each triage assignment includes explanatory notes:
Scene 42 (triage: COMPRESS): "Three consecutive monologues without
conflict. David and Susan explain the same discovery to each other
that readers already understand. Keep only Margaret's reaction and
the final decision to proceed."
Notes provide revision guidance beyond the action classification.
8. Export-Process-Import Workflow
8.1 Revision Package Export
The export-revision-package.php CLI tool generates a comprehensive revision package:
php export-revision-package.php # Full export
php export-revision-package.php --part=2 # Part 2 only
php export-revision-package.php --triage-only # Just manifest
The package includes:
Header Section
- Generation timestamp
- Total chapters and word count
- Export parameters
Voice Differentiation Section
- Complete voice profiles for all POV characters
- Vocabulary signatures
- Sentence patterns
- Metaphor domains
Crutch Word Alert
- High-frequency terms (100+ occurrences)
- Medium-frequency terms (25-99 occurrences)
- Phrase patterns to eliminate
Revision Guidelines
- Triage action definitions
- Common problems to fix (with detection flags)
- Engagement score targets
- Voice score targets
Voice Analysis Summary
- Dialogue distribution by character
- Voice similarity warnings
- Chapters with worst prose problem scores
Triage Manifest
- Per-chapter: engagement score, triage action, triage notes, voice scores
Full Manuscript Content
- Chapter headers with metadata
- Complete text in Markdown format
- Scene boundaries where applicable
8.2 AI Processing
The revision package is designed for large-context language models:
Gemini 1M Context — Can process entire trilogy (218K words) plus guidelines
Claude with Projects — Revision guidelines as project knowledge; chapters processed individually or in batches
The embedded guidelines provide the AI with:
- What to fix (triage actions)
- How to fix it (revision guidelines)
- What to preserve (voice profiles)
- What to eliminate (crutch words)
8.3 Revision Import
The revision-import.php interface handles revised content:
Chapter Selection
- Dropdown listing all chapters with triage status
- Color-coded by revision state (pending/revised/published)
Side-by-Side View
- Original content (left panel)
- Revised content or input area (right panel)
- Rendered Markdown for readability
Word Count Tracking
- Original word count
- Revised word count
- Delta (positive or negative)
- Real-time calculation as content is pasted
Revision Actions
- Save to Staging — Store in `revised_content` without publishing
- Publish Revision — Set `show_revised = 1` to display revised version
- Unpublish — Revert to original (`show_revised = 0`)
- Clear Revision — Delete staged content entirely
Revision History
- Timestamped log of all revisions
- Original and revised content preserved
- Word delta tracked
- Notes field for revision description
8.4 Version Control for Prose
The staging system enables sophisticated version control:
A/B Testing — Toggle show_revised to compare reader engagement between original and revised versions
Incremental Publishing — Some chapters can show revised content while others remain original
Rollback Capability — Original content always preserved; can revert any chapter
Audit Trail — Complete history of revisions with timestamps and word deltas
9. Results
9.1 Application to Hot Water
The Revision Engine was developed iteratively alongside the Hot Water manuscript:
| Metric | Value |
|---|---|
| Total chapters | 101 |
| Total words | 218,681 |
| Parts | 3 (SIGNAL, CHRONICLE, ANCESTOR) |
| Chapters scored | 89 (88%) |
| Average engagement | 6.4/12 |
| Dropout zones detected | 47 |
| Characters with voice data | 15 |
| Revision history entries | 156 |
9.2 Diagnostic Findings
Engagement Distribution
- Gripping (10-12): 12 chapters (12%)
- Solid (7-9): 34 chapters (34%)
- Vulnerable (4-6): 31 chapters (31%)
- Dropout (0-3): 12 chapters (12%)
- Unscored: 12 chapters (12%)
Triage Classification
- KEEP: 23 chapters
- TRIM: 31 chapters
- COMPRESS: 18 chapters
- CONVERT: 8 chapters
- DELETE: 4 chapters
- Unassigned: 17 chapters
Voice Analysis
- Highest distinctiveness: ARCHIE (AI character), 4.0/5
- Lowest distinctiveness: David, 2.8/5
- Highest similarity pair: David/Susan, 0.72
9.3 Pattern Discovery
The heatmap revealed patterns invisible to sequential reading:
Journal Cluster Problem — Four consecutive journal chapters in Part 2 created a documentation slog. Triage: convert two to dramatized scenes, merge two others.
Darwin Interlude Pacing — Darwin sections consistently scored higher engagement than modern timeline. The contrast highlighted modern sections needing more conflict.
Voice Convergence in Act 3 — Distinctiveness scores declined in final chapters as revision pressure increased. Systematic voice restoration required.
Exposition Front-Loading — First chapters of each Part scored lower due to setup exposition. Structural revision to embed exposition in conflict.
9.4 Revision Workflow Results
The export-process-import workflow enabled:
- Batch processing of 12 chapters in single Gemini session
- Consistent application of voice profiles across revisions
- Word count reduction of 23% in targeted chapters
- Voice distinctiveness improvement averaging 0.8 points post-revision
10. Implementation Details
10.1 Text Analysis Functions
The heatmap_functions.php library provides core analysis utilities:
Exposition Density
function calcExpositionDensity($text) {
    // Explanatory connectives that signal summary/telling prose
    $markers = ['which means', 'in other words', 'because',
                'therefore', 'essentially', 'basically'];
    $sentenceCount = countSentences($text); // helper defined elsewhere in the library
    if ($sentenceCount === 0) {
        return 0.0; // guard against division by zero on empty input
    }
    $markerCount = 0;
    foreach ($markers as $marker) {
        $markerCount += substr_count(strtolower($text), $marker);
    }
    return round($markerCount / $sentenceCount, 3);
}
Dialogue Ratio
function calcDialogueRatio($text) {
    // Extract content within straight or curly quotation marks
    preg_match_all('/["“]([^"“”]+)["”]/u', $text, $matches);
    $dialogueWords = str_word_count(implode(' ', $matches[1]));
    $totalWords = str_word_count(strip_tags($text));
    if ($totalWords === 0) {
        return 0.0; // guard against division by zero on empty input
    }
    return round($dialogueWords / $totalWords, 3);
}
Tic Word Detection
function countTicWords($text, $ticWords) {
    $results = [];
    foreach ($ticWords as $word) {
        // Whole-word, case-insensitive match
        $pattern = '/\b' . preg_quote($word, '/') . '\b/i';
        preg_match_all($pattern, $text, $matches);
        if (count($matches[0]) > 0) {
            $results[$word] = count($matches[0]);
        }
    }
    return $results;
}
10.2 API Integration
The score-scenes.php tool integrates with Claude API:
function scoreChapterWithAPI($content, $model, $voiceProfiles) {
    $prompt = buildScoringPrompt($content, $voiceProfiles);
    $response = callClaudeAPI([
        'model'      => MODELS[$model],
        'max_tokens' => MAX_TOKENS,
        'messages'   => [
            ['role' => 'user', 'content' => $prompt]
        ]
    ]);
    return parseJSONResponse($response);
}
The prompt includes:
- Voice profiles for all characters
- Scoring rubrics with explicit anchors
- Linguistic analysis instructions
- JSON output format specification
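The four components above can be assembled mechanically. A sketch of what `buildScoringPrompt()` might look like; the function name comes from the code above, but the body and rubric wording here are illustrative assumptions, not the platform's actual prompt:

```php
// Sketch of buildScoringPrompt(): concatenates voice profiles, rubric,
// analysis instructions, and output-format spec ahead of the chapter text.
// The rubric and instruction wording below is illustrative only.
function buildScoringPrompt(string $content, array $voiceProfiles): string {
    $profileBlock = '';
    foreach ($voiceProfiles as $name => $profile) {
        $profileBlock .= "## $name\n$profile\n\n";
    }
    return "# Voice Profiles\n" . $profileBlock
         . "# Scoring Rubric\n"
         . "Score stakes, resistance, change_level, question_pull on 0-3 anchors.\n"
         . "# Linguistic Analysis\n"
         . "Report tic words, telling instances, and dialogue by character.\n"
         . "# Output\n"
         . "Return a single JSON object containing the fields above.\n"
         . "# Chapter Text\n" . $content;
}
```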
10.3 Privacy-Respecting Analytics
Reader engagement tracking uses IP hashing:
function hashIP($ip) {
    return hash('sha256', $ip . date('Y-m')); // Monthly rotation
}
This enables:
- Unique visitor counting without storing raw IPs
- Reading-pattern analysis without individual tracking
- Tracking duration bounded to one month by hash rotation
- No storage of personally identifiable information
11. Discussion
11.1 Cognitive Prosthesis Effectiveness
The Revision Engine demonstrates that systematic diagnostic infrastructure extends human cognitive capacity for manuscript revision. Specific benefits:
Pattern Visibility — The heatmap reveals engagement topography invisible to sequential reading. Authors can see the manuscript as readers experience it rather than as they wrote it.
Quantified Prioritization — Triage classification converts intuitive "this needs work" into actionable priority queues. Limited revision time flows to highest-impact chapters.
Voice Maintenance — Profile-based voice analysis catches convergence before it becomes endemic. Revision can strengthen distinctiveness rather than homogenize further.
Systematic Workflow — The export-process-import cycle enables AI collaboration at scale while maintaining human editorial control. The author directs; the system executes.
11.2 Limitations
Single-Author Development — The platform was built for a specific author's workflow. Generalization to other authors remains untested.
Scoring Subjectivity — Engagement dimensions are proxies for reader experience. Scores reflect systematic approximation, not ground truth.
AI Dependency — Batch scoring requires API access and incurs costs. Voice analysis quality depends on model capability.
Integration Overhead — The full platform requires MySQL database, PHP server, and CLI access. Simpler tools might serve authors with different technical backgrounds.
11.3 Implications for Creative AI
The Revision Engine suggests that AI collaboration in creative work may be most valuable for infrastructure generation rather than content generation:
- Diagnostic systems that quantify editorial intuition
- Visualization tools that reveal manuscript-level patterns
- Workflow automation that maintains human control
- Version control that enables experimentation without risk
This complements the Novelization Engine finding that AI collaboration produces format-agnostic story systems. Together, the engines demonstrate a paradigm: AI as cognitive prosthesis for creative work, extending human capacity rather than replacing human judgment.
12. Conclusion
The Revision Engine provides a platform for systematic manuscript revision through human-AI collaboration. Core innovations include:
- Four-dimension engagement scoring with computed aggregates enabling manuscript-level comparison
- Voice fingerprinting with similarity detection preventing character convergence
- Heatmap visualization revealing engagement patterns invisible to sequential reading
- Triage classification converting diagnosis into prioritized revision queues
- Export-process-import workflow enabling AI collaboration while maintaining editorial control
- Version control for prose supporting A/B testing, incremental publishing, and rollback
Applied to Hot Water (218,681 words, 101 chapters), the platform enabled identification of 47 dropout zones, quantification of voice metrics across 15 characters, and systematic triage of revision priorities.
The central finding: cognitive prosthesis narrative development—extending human cognitive capacity for holding entire manuscripts in working memory while tracking multiple quality dimensions—makes tractable what would otherwise exceed human capability. The Revision Engine demonstrates that AI collaboration adds most value not by generating content but by generating infrastructure that makes human revision systematic at novel scale.
References
[1] Clark, A. & Chalmers, D. (1998). "The Extended Mind." Analysis, 58(1), 7-19.
[2] McKee, R. (1997). Story: Substance, Structure, Style, and the Principles of Screenwriting. ReganBooks.
[3] Hamilton, M.P. (2025). "The Novelization Engine: A Methodology for AI-Augmented Long-Form Fiction Development." Canemah Nature Laboratory Technical Note CNL-TN-2025-022. https://canemah.org/archive/document.php?id=CNL-TN-2025-022
[4] Hamilton, M.P. (2025). "The Serialization Engine: A Generalized Framework for Format-Agnostic Story System Development." Canemah Nature Laboratory Technical Note CNL-TN-2025-023. https://canemah.org/archive/document.php?id=CNL-TN-2025-023
[5] Hamilton, M.P. (2025). "The Cognitive Prosthesis: Writing, Thinking, and the Observer Inside the Observation." Coffee with Claude. https://coffeewithclaude.com/post.php?slug=the-cognitive-prosthesis-writing-thinking-and-the-observer-inside-the-observation
[6] Gardner, J. (1983). The Art of Fiction: Notes on Craft for Young Writers. Vintage Books.
[7] Flower, L. & Hayes, J.R. (1981). "A Cognitive Process Theory of Writing." College Composition and Communication, 32(4), 365-387.
Appendix A: Database Schema Reference
A.1 Core Tables
-- Chapter diagnostics (aggregate metrics)
CREATE TABLE chapter_diagnostics (
    chapter_id INT PRIMARY KEY,
    scene_count INT DEFAULT 1,
    avg_engagement_score FLOAT,
    min_engagement_score TINYINT,
    max_engagement_score TINYINT,
    overall_skim_risk ENUM('low','medium','high','critical'),
    voice_blur_detected TINYINT(1) DEFAULT 0,
    total_tic_words INT DEFAULT 0,
    triage_action ENUM('keep','trim','compress','convert','merge','delete'),
    triage_priority INT,
    revision_status ENUM('pending','revised','deleted','skipped') DEFAULT 'pending',
    last_analyzed DATETIME,
    last_scored DATETIME,
    last_revised DATETIME
);
-- Scene-level analysis
CREATE TABLE scene_analysis (
    id INT PRIMARY KEY AUTO_INCREMENT,
    chapter_id INT NOT NULL,
    scene_number INT DEFAULT 1,
    word_count INT DEFAULT 0,
    timeline_strand ENUM('modern','darwin','pictish','omniscient'),
    pov_character VARCHAR(100),
    stakes TINYINT UNSIGNED DEFAULT 0,
    resistance TINYINT UNSIGNED DEFAULT 0,
    change_level TINYINT UNSIGNED DEFAULT 0,
    question_pull TINYINT UNSIGNED DEFAULT 0,
    engagement_score TINYINT UNSIGNED GENERATED ALWAYS AS (
        stakes + resistance + change_level + question_pull
    ) STORED,
    voice_distinctiveness TINYINT,
    profile_adherence TINYINT,
    triage_action ENUM('keep','trim','compress','convert','merge','delete'),
    triage_notes TEXT,
    telling_instances JSON,
    crutch_words JSON,
    dialogue_by_character JSON,
    voice_bleed JSON,
    scored_at DATETIME,
    scored_by VARCHAR(50)
);
A.2 Voice Analysis Tables
-- Per-character dialogue statistics
CREATE TABLE voice_analysis_summary (
    id INT PRIMARY KEY AUTO_INCREMENT,
    character_name VARCHAR(100) NOT NULL,
    total_dialogue_words INT DEFAULT 0,
    dialogue_percentage FLOAT,
    scene_count INT DEFAULT 0,
    avg_voice_distinctiveness FLOAT,
    avg_profile_adherence FLOAT,
    avg_sentence_length FLOAT,
    updated_at DATETIME
);
-- Character pair similarity
CREATE TABLE voice_similarity (
    id INT PRIMARY KEY AUTO_INCREMENT,
    character_a VARCHAR(100) NOT NULL,
    character_b VARCHAR(100) NOT NULL,
    similarity_score FLOAT,
    shared_crutch_words TEXT,
    calculated_at DATETIME
);
Appendix B: CLI Tool Reference
B.1 Scoring Tool
# Score all unscored scenes with Sonnet
php score-scenes.php --model=sonnet
# Score specific chapter with Opus
php score-scenes.php --model=opus --chapter=42
# Re-score all scenes (overwrite existing)
php score-scenes.php --model=sonnet --rescore
# Score only Part 1
php score-scenes.php --model=sonnet --part=1
# Compare models on same chapter
php score-scenes.php --compare --chapter=42
B.2 Voice Aggregation Tool
# Full aggregation with console report
php aggregate-voice-analysis.php
# Report only (no database update)
php aggregate-voice-analysis.php --report
# JSON output for external processing
php aggregate-voice-analysis.php --json
B.3 Export Tool
# Full manuscript export
php export-revision-package.php
# Single part export
php export-revision-package.php --part=2
# Triage manifest only
php export-revision-package.php --triage-only
# Custom output filename
php export-revision-package.php --output=hot-water-v2.md
Appendix C: Engagement Score Quick Reference
C.1 Scoring Rubric Summary
| Dimension | 0 | 1 | 2 | 3 |
|---|---|---|---|---|
| Stakes | Nothing | Mild discomfort | Real consequences | Survival/identity |
| Resistance | Nothing | Internal doubt | Interpersonal | Active antagonism |
| Change | Static | Info gained | Decision made | Irreversible |
| Question Pull | Resolved | Mild curiosity | Need to know | Hooked |
C.2 Engagement Thresholds
| Score | Classification | Color | Action |
|---|---|---|---|
| 10-12 | Gripping | Green | Keep |
| 7-9 | Solid | Yellow | Monitor |
| 4-6 | Vulnerable | Orange | Revise |
| 0-3 | Dropout | Red | Priority |
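The thresholds above map directly to a small classification helper. A sketch of how the platform might implement the mapping; the function name is illustrative:

```php
// Map a 0-12 engagement score to its classification per the
// thresholds table: 10-12 gripping, 7-9 solid, 4-6 vulnerable,
// 0-3 dropout.
function classifyEngagement(int $score): string {
    if ($score >= 10) return 'gripping';
    if ($score >= 7)  return 'solid';
    if ($score >= 4)  return 'vulnerable';
    return 'dropout';
}
```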
Document History
| Version | Date | Changes |
|---|---|---|
| 1.0 | 2026-01-25 | Initial release |
End of Technical Note
Permanent URL: https://canemah.org/archive/document.php?id=CNL-TN-2026-010