CNL-TN-2026-046 Technical Note

The STRATA Substrate

Michael P. Hamilton, Ph.D.
Published: April 8, 2026
Version: 1

Continuous Ecological Context Architecture

Canemah Nature Laboratory Technical Note Series

Document ID: CNL-TN-2026-046
Version: 0.1 (Draft)
Date: April 8, 2026
Author: Michael P. Hamilton, Ph.D.
Affiliation: Canemah Nature Laboratory, Oregon City, Oregon


AI Assistance Disclosure: This technical note was developed collaboratively with Claude (Anthropic, claude-opus-4-6) via Cowork. Claude contributed to architectural analysis, system design, and document drafting based on extended conversation about the evolution of STRATA's intelligence pipeline. The author takes full responsibility for the content, accuracy, and conclusions.


Abstract

STRATA 2.0's Observatory and Intelligence dashboards currently generate all sensor summaries, narrative text, and ecological context on demand -- executing 30-50 database queries per page load, discarding the results when the page closes, and rebuilding everything from scratch on the next request. The Chat interface compounds this by dumping the entire sensor state into an LLM system prompt as unstructured text. This pull-on-demand architecture works at development scale but cannot sustain real-time ecological awareness, efficient AI conversations, or the always-on pattern detection required by SOMA.

This technical note specifies the Substrate -- a continuously maintained ecological context layer that keeps a current, structured, curated picture of the world ready to serve. Modeled on the same launchd collector pattern that already gathers raw sensor data on Galatea, the Substrate operates as a second-tier process: collectors deposit raw readings into the macroscope database; the Substrate reads those readings and maintains derived intelligence state. Observatory, Intelligence, Chat, and SOMA become consumers of maintained state rather than independent query engines.

The Substrate maintains three layers of context: Place Identity (from MNG's ecological address model), Sensor State (per-platform summaries refreshed on natural data rhythms), and Ecological Frame (species detection states, phenological context, seasonal norms). Together these layers provide the structured, curated context that STRATA IQ requires for place-based ecological intelligence.

The architecture positions the Substrate as the foundation for convergence between MNG's spatial identity framework and STRATA's temporal intelligence framework, and as the sensory input layer for SOMA's pattern detection meshes.


1. The Problem

1.1 Pull-on-Demand Intelligence

The current STRATA 2.0 architecture generates intelligence reactively. When a user loads the Observatory page, the system:

  1. Calls discover_domain_platforms() to find active sensor platforms across four domains
  2. Loads sensor definitions from 13 registry files via getPlatformSensors()
  3. Executes 13 micro-agent summary functions, each running 2-8 database queries
  4. Builds narrative text and key readings arrays
  5. Renders the page
  6. Discards all computed state

The Intelligence dashboard repeats the identical pipeline. The Chat interface calls temporal_state.php, which runs the same pipeline a third time, serializes the results as a flat text block, and passes it to the browser. The browser then sends this text block back to the server with every chat message, where it gets prepended to the LLM system prompt as unstructured data.

On a typical page load, this means 30-50 database queries, 13 summary computations, and complete context assembly -- all thrown away when the tab closes.

1.2 The Firehose Problem

The Chat interface suffers from an additional architectural weakness: context quality. The temporal_state.php endpoint dumps every platform's summary into a single text block with no curation, no trust metadata, no ecological framing, and no semantic structure. The LLM receives 26 platform summaries whether the user's question requires one or all of them. There is no distinction between a calibrated Tempest weather station and an uncalibrated BirdWeather PUC environmental sensor. There is no temporal species framing -- a bird detected 10 minutes ago and one detected 6 months ago appear with equal weight.

The hallucination tests documented in CNL-TN-2026-045 demonstrated that LLMs require explicit geographic grounding and analytical guidelines to avoid fabricating context. The current firehose approach provides data but not the interpretive structure that prevents hallucination.

1.3 The Always-On Gap

SOMA's pattern detection architecture (CNL-TN-2026-043) requires continuous access to current ecological state across all four domains. RBM meshes cannot query the database on demand -- they need a maintained state representation that updates on natural data rhythms and provides structured input for energy landscape computation. The pull-on-demand architecture offers no persistent state for SOMA to consume.

1.4 The Collector Precedent

Galatea already runs a fleet of launchd-managed collector processes that continuously gather raw sensor data from external APIs (Tempest, Ecowitt, BirdWeather, AmbientWeather, AirLink, Airthings, iNaturalist). These collectors operate on independent schedules matched to each data source's natural update rhythm. They write to the macroscope database and require no user interaction. The architecture is proven, operationally stable, and well-understood.

The Substrate applies this same pattern to derived intelligence: a background process that reads what the collectors have deposited and maintains a continuously current, structured context layer.


2. The Substrate Architecture

2.1 Design Principles

Low simmer, not full boil. The Substrate should impose minimal processor overhead. On most cycles it checks freshness thresholds, finds nothing stale, and sleeps. Only changed data triggers recomputation.

API call conservation. The Substrate performs almost entirely local database reads. External API calls (Open-Meteo, BirdWeather network) are handled by existing collectors or cached aggressively with long TTLs. Place identity data (bioregion, watershed, climate classification) changes on the scale of years, not minutes.

Natural rhythms. Each data type has a natural update frequency. Weather stations push every few minutes. Bird detections accumulate over hours. Health metrics arrive daily. Clinical data changes monthly. The Substrate respects these rhythms rather than polling everything at a single interval.

Ready to serve. When Observatory loads, Chat starts, or SOMA wakes, the context is already assembled. Consumers read maintained state; they do not trigger computation.

2.2 Three-Layer Context Model

The Substrate maintains three layers of ecological context, each with distinct update rhythms and data sources:

Layer 1: Place Identity (refresh: daily or on-demand)

Sourced from macroscope_nexus via MNG's ecological address model. For each monitored site, this layer maintains:

  • Geographic coordinates, elevation, aspect
  • Bioregion and watershed classification
  • Climate type (Köppen) and seasonal norms
  • Land use history and habitat description
  • Site relationships (Canemah and Owl Farm are both in Oregon City; do not invent geographic separation)

This is the stable frame that prevents hallucination. It changes rarely but provides the interpretive foundation for all other data.

Layer 2: Sensor State (refresh: per-platform rhythm)

For each active sensor platform, this layer maintains:

  • Current readings summary (narrative text + structured key readings)
  • Freshness timestamp and staleness classification
  • Platform trust metadata: calibration status, context role, known limitations
  • Domain assignment (EARTH/LIFE/HOME/SELF)

Freshness thresholds by data type:

Data Type            Typical Source                     Refresh Threshold   Notes
Weather              Tempest, Ecowitt, AmbientWeather   5 minutes           Matches collector intervals
Indoor air quality   Airthings, AirLink                 10 minutes          Sensor reporting interval
Bird detections      BirdWeather                        15 minutes          Accumulation window
Biodiversity         iNaturalist                        1 hour              Community observation lag
Health metrics       Apple Health, Withings             6 hours             Daily sync cadence
Clinical data        Manual entry                       24 hours            Rarely changes
Place identity       MNG ecological address             24 hours            Nearly static
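
As a rough illustration of how the thresholds in the table above might drive a staleness check, the following Python sketch encodes them as seconds and compares a component's age against its threshold. The dictionary keys and the is_stale helper are hypothetical names chosen for this example; they are not identifiers from the STRATA codebase.

```python
from datetime import datetime, timedelta

# Refresh thresholds from the table above, in seconds.
# The keys are illustrative labels, not identifiers from STRATA.
STALE_AFTER_SECONDS = {
    "weather": 5 * 60,
    "indoor_air_quality": 10 * 60,
    "bird_detections": 15 * 60,
    "biodiversity": 60 * 60,
    "health_metrics": 6 * 60 * 60,
    "clinical_data": 24 * 60 * 60,
    "place_identity": 24 * 60 * 60,
}


def is_stale(data_type: str, last_refreshed: datetime, now: datetime) -> bool:
    """Return True when the component has outlived its refresh threshold."""
    threshold = timedelta(seconds=STALE_AFTER_SECONDS[data_type])
    return now - last_refreshed > threshold


now = datetime(2026, 4, 8, 12, 0, 0)
six_minutes_ago = now - timedelta(minutes=6)
print(is_stale("weather", six_minutes_ago, now))          # True
print(is_stale("bird_detections", six_minutes_ago, now))  # False
```

A reading that is six minutes old is already stale for a weather platform but still fresh for bird detections, which is the per-rhythm behavior the table describes.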

Layer 3: Ecological Frame (refresh: hourly with event-driven updates)

The interpretive layer that provides temporal and ecological context for raw data:

  • Species detection states: active (detected now), recent (last 24h), seasonal (expected for current season), historical (recorded at site but not recently), expected-but-absent (should be here based on range/season but not detected)
  • Phenological markers: migration status, breeding season indicators, seasonal transitions
  • Cross-domain correlations: weather conditions that explain detection patterns
  • Anomaly flags: readings or patterns outside seasonal norms (SOMA integration point)
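
One possible reading of the species detection states is sketched below in Python. The note does not define how long "detected now" lasts, so the 15-minute window is an assumption, as is the order in which the states are tested and the "unknown" fallback for a species that is neither recorded nor expected. Treat this as one plausible interpretation of the list above rather than the classification STRATA will use.

```python
from datetime import datetime, timedelta
from typing import Optional

# Hypothetical window for "detected now"; the note does not define one.
ACTIVE_WINDOW = timedelta(minutes=15)
# "Recent" follows the note: detected within the last 24 hours.
RECENT_WINDOW = timedelta(hours=24)


def detection_state(last_detected: Optional[datetime],
                    expected_this_season: bool,
                    now: datetime) -> str:
    """Classify one species at one site into a Layer 3 detection state."""
    if last_detected is not None:
        age = now - last_detected
        if age <= ACTIVE_WINDOW:
            return "active"
        if age <= RECENT_WINDOW:
            return "recent"
        # Recorded at the site before, but not within the last 24 hours.
        return "seasonal" if expected_this_season else "historical"
    # Never recorded at this site.
    # "unknown" is a placeholder added for this sketch, not a state from the note.
    return "expected-but-absent" if expected_this_season else "unknown"


now = datetime(2026, 4, 8, 12, 0, 0)
print(detection_state(now - timedelta(minutes=10), True, now))  # active
print(detection_state(now - timedelta(days=180), False, now))   # historical
print(detection_state(None, True, now))                         # expected-but-absent
```

Under this reading, a bird detected 10 minutes ago and one detected 6 months ago no longer carry equal weight in the context document.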

2.3 Persistence Model

The Substrate writes to a substrate_state table in strata_db:

substrate_state
  id              INT AUTO_INCREMENT PRIMARY KEY
  site_id         INT                -- FK to macroscope_nexus sites
  layer           ENUM('place','sensor','frame')
  component_key   VARCHAR(100)       -- e.g., 'tempest_12', 'species_frame', 'place_identity'
  content_json    JSON               -- structured state for application queries
  content_text    TEXT               -- pre-rendered text for chat context assembly
  freshness       DATETIME           -- when this component was last refreshed
  stale_after     INT                -- seconds until this component needs refresh
  updated_at      DATETIME
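
The following Python sketch exercises the substrate_state layout against an in-memory SQLite database, purely to illustrate one row per component with overwrite-on-refresh semantics. Several things here are assumptions: SQLite stands in for the production database, the ENUM column is approximated with a CHECK constraint, the unique key on (site_id, layer, component_key) is not stated in the note, and the reading values are placeholders.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
# Assumption: one row per (site_id, layer, component_key); the note does not
# state a uniqueness constraint. ENUM is approximated with CHECK for SQLite.
conn.execute("""
    CREATE TABLE substrate_state (
        id            INTEGER PRIMARY KEY AUTOINCREMENT,
        site_id       INTEGER NOT NULL,
        layer         TEXT NOT NULL CHECK (layer IN ('place', 'sensor', 'frame')),
        component_key TEXT NOT NULL,
        content_json  TEXT,
        content_text  TEXT,
        freshness     TEXT,
        stale_after   INTEGER,
        updated_at    TEXT,
        UNIQUE (site_id, layer, component_key)
    )
""")


def upsert_component(site_id, layer, key, content, text, freshness, stale_after):
    """Insert a component, or overwrite it if the row already exists."""
    conn.execute(
        """
        INSERT INTO substrate_state
            (site_id, layer, component_key, content_json, content_text,
             freshness, stale_after, updated_at)
        VALUES (?, ?, ?, ?, ?, ?, ?, ?)
        ON CONFLICT (site_id, layer, component_key) DO UPDATE SET
            content_json = excluded.content_json,
            content_text = excluded.content_text,
            freshness    = excluded.freshness,
            stale_after  = excluded.stale_after,
            updated_at   = excluded.updated_at
        """,
        (site_id, layer, key, json.dumps(content), text,
         freshness, stale_after, freshness),
    )


# Placeholder values; two refreshes of the same component leave a single row.
upsert_component(1, "sensor", "tempest_12", {"temperature_c": 11.5},
                 "placeholder summary", "2026-04-08 12:00:00", 300)
upsert_component(1, "sensor", "tempest_12", {"temperature_c": 12.0},
                 "newer placeholder summary", "2026-04-08 12:05:00", 300)
print(conn.execute(
    "SELECT COUNT(*), MAX(freshness) FROM substrate_state").fetchone())
# (1, '2026-04-08 12:05:00')
```

Keeping both content_json and content_text on the same row means a refresh updates the structured and pre-rendered forms together, so consumers never see one without the other.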

Additionally, the Substrate maintains a pre-assembled context document per site:

substrate_context
  id              INT AUTO_INCREMENT PRIMARY KEY
  site_id         INT
  context_type    ENUM('full','chat','soma')
  content         TEXT               -- assembled context document
  token_estimate  INT                -- approximate token count for LLM budgeting
  updated_at      DATETIME

The chat context type contains the three-layer structured document optimized for LLM system prompts. The soma context type contains the numerical state vector optimized for RBM mesh input. The full context type contains the complete state for Observatory and Intelligence rendering.
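
A minimal sketch of how a chat context document and its token_estimate might be assembled from per-component text follows. The section headings, the function names, and the four-characters-per-token heuristic are all assumptions made for illustration; a real tokenizer would give different counts, and the sensor and frame strings are placeholders.

```python
from typing import List, Tuple


def estimate_tokens(text: str) -> int:
    """Rough estimate: about four characters per token (a common heuristic)."""
    return max(1, len(text) // 4)


def assemble_chat_context(place_text: str,
                          sensor_texts: List[str],
                          frame_text: str) -> Tuple[str, int]:
    """Join the three layers into one document and estimate its token cost."""
    sections = [
        "PLACE IDENTITY\n" + place_text,
        "SENSOR STATE\n" + "\n".join(sensor_texts),
        "ECOLOGICAL FRAME\n" + frame_text,
    ]
    content = "\n\n".join(sections)
    return content, estimate_tokens(content)


content, token_estimate = assemble_chat_context(
    "Canemah and Owl Farm are both in Oregon City.",
    ["tempest_12: placeholder weather summary"],
    "placeholder species frame",
)
print(content)
print("token_estimate:", token_estimate)
```

Because the estimate is stored alongside the document, a consumer can budget context without re-reading or re-tokenizing the text.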

2.4 Process Architecture

A single PHP process, managed by launchd on Galatea (prototyped on Data):

com.strata.substrate.plist
  StartInterval: 60          -- wake every 60 seconds
  RunAtLoad: true             -- backfill on boot
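
For reference, a complete launchd property list also needs a Label and ProgramArguments. The Python sketch below uses the standard plistlib module to emit such a file with the two keys shown above. The Label value is inferred from the plist filename, and the interpreter and script paths are hypothetical placeholders that would need to be replaced with the real locations on the target machine.

```python
import plistlib

job = {
    # Assumption: the label mirrors the plist filename.
    "Label": "com.strata.substrate",
    # Hypothetical interpreter and script paths; substitute the real ones.
    "ProgramArguments": ["/usr/bin/php", "/path/to/substrate.php"],
    "StartInterval": 60,   # wake every 60 seconds
    "RunAtLoad": True,     # also run once when the job is loaded
}

plist_bytes = plistlib.dumps(job)  # XML format by default
print(plist_bytes.decode("utf-8"))
```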

Each cycle:

  1. Query substrate_state for components where NOW() > freshness + INTERVAL stale_after SECOND (stale_after is stored in seconds)
  2. For each stale component, regenerate from source data
  3. Write updated component back to substrate_state
  4. If any components changed, regenerate affected substrate_context documents
  5. Log cycle time and components refreshed

Most cycles should complete in under a second because nothing is stale and there is nothing to do. Weather components trigger recomputation every 5 minutes, the species frame every hour, and place identity once a day.
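
The cycle can be sketched in a few lines of Python against an in-memory SQLite table. This is an illustration of the control flow only, not the planned PHP implementation: freshness is stored as epoch seconds to keep the comparison simple, the regenerate callback stands in for the micro-agent summary functions, and regeneration of substrate_context documents (step 4) is omitted.

```python
import sqlite3


def run_cycle(conn, regenerate, now):
    """One maintenance pass; returns the component keys that were refreshed."""
    # Step 1: find components whose freshness window has expired.
    stale = conn.execute(
        "SELECT id, component_key FROM substrate_state "
        "WHERE ? > freshness + stale_after",
        (now,),
    ).fetchall()
    refreshed = []
    for row_id, key in stale:
        # Steps 2-3: regenerate from source data and write the result back.
        text = regenerate(key)
        conn.execute(
            "UPDATE substrate_state "
            "SET content_text = ?, freshness = ?, updated_at = ? WHERE id = ?",
            (text, now, now, row_id),
        )
        refreshed.append(key)
    conn.commit()
    # Step 5: a real process would log cycle time and the refreshed keys here.
    return refreshed


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE substrate_state ("
    "id INTEGER PRIMARY KEY, component_key TEXT, content_text TEXT, "
    "freshness INTEGER, stale_after INTEGER, updated_at INTEGER)"
)
now = 1_775_649_600  # arbitrary epoch second used as "now" in this demo
conn.executemany(
    "INSERT INTO substrate_state "
    "(component_key, content_text, freshness, stale_after, updated_at) "
    "VALUES (?, ?, ?, ?, ?)",
    [
        ("tempest_12", "old summary", now - 400, 300, now - 400),
        ("place_identity", "old place text", now - 400, 86400, now - 400),
    ],
)

first = run_cycle(conn, lambda key: "refreshed " + key, now)
second = run_cycle(conn, lambda key: "refreshed " + key, now)
print(first)   # ['tempest_12']
print(second)  # []
```

The second pass finds nothing stale and does no work, which is the "low simmer" behavior the design principles call for.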

2.5 Consumer Integration

Observatory: Replace the 30-50 query pipeline with reads from substrate_state where layer = 'sensor' and site_id = ?. Domain headlines come from the maintained state. Platform cards render from cached summaries. Page load drops to 4-8 queries (one per domain section plus metadata).

Intelligence: Same pattern. Agent cards render from maintained summaries. The "Generated" timestamp reflects when the Substrate last refreshed each component, not when the page loaded.

Chat: Load the pre-assembled substrate_context document (type = 'chat') for the active site. Token count is pre-computed for model selection and context budgeting. No temporal_state.php call, no firehose dump. The context is structured, curated, and current.

SOMA: Consume the substrate_context (type = 'soma') as input to RBM mesh updates. The Substrate provides exactly the state vector SOMA needs, refreshed on natural data rhythms. SOMA's job becomes pattern detection on maintained state, not data gathering.
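
The Observatory read path might look like the following Python sketch, which loads every sensor-layer component for one site and groups the cards by domain. The assumption that each component's content_json carries a "domain" field is ours, as are the component keys other than tempest_12 and the placeholder summaries; SQLite is used here only so the example is self-contained.

```python
import json
import sqlite3
from collections import defaultdict

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE substrate_state (site_id INTEGER, layer TEXT, "
    "component_key TEXT, content_json TEXT, content_text TEXT)"
)
# Placeholder rows; the "domain" field inside content_json is an assumption.
conn.executemany(
    "INSERT INTO substrate_state VALUES (?, ?, ?, ?, ?)",
    [
        (1, "sensor", "tempest_12", json.dumps({"domain": "EARTH"}),
         "placeholder weather summary"),
        (1, "sensor", "birdweather_3", json.dumps({"domain": "LIFE"}),
         "placeholder detection summary"),
        (1, "place", "place_identity", json.dumps({}),
         "placeholder place text"),
        (2, "sensor", "tempest_40", json.dumps({"domain": "EARTH"}),
         "placeholder summary for another site"),
    ],
)


def load_platform_cards(conn, site_id):
    """Read maintained sensor state for one site, grouped by domain."""
    cards = defaultdict(list)
    query = (
        "SELECT component_key, content_json, content_text "
        "FROM substrate_state WHERE layer = 'sensor' AND site_id = ? "
        "ORDER BY component_key"
    )
    for key, content_json, text in conn.execute(query, (site_id,)):
        domain = json.loads(content_json).get("domain", "UNASSIGNED")
        cards[domain].append({"key": key, "summary": text})
    return dict(cards)


cards = load_platform_cards(conn, 1)
print(sorted(cards))             # ['EARTH', 'LIFE']
print(cards["EARTH"][0]["key"])  # tempest_12
```

In this sketch a single query replaces the per-platform summary computation; the page only formats what the Substrate has already maintained.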


3. Place-Centric Observatory

3.1 From Domain-Centric to Place-Centric

The current Observatory organizes data by domain: EARTH tab, LIFE tab, HOME tab, SELF tab. This makes sense for a single-site prototype, but it inverts the natural hierarchy. A field ecologist thinks "what is happening at Canemah right now?" not "what are all the EARTH readings across all sites?"

The Substrate enables a place-centric Observatory where the primary navigation is by site, and domains organize the data within each site's view. Canemah Nature Lab has its own Observatory panel; Owl Farm has its own. Each shows the four domains for that place, powered by the Substrate's maintained state for that site.

3.2 MNG Convergence

MNG already implements place-centric organization through its five-panel place view (Identity, Physical Place, Ecological Setting, Living Systems, Field Notebook). The monitoring widgets within MNG show sensor data for the current place. With the Substrate maintaining continuous context, MNG's monitoring widgets and STRATA's Observatory panels become functionally redundant -- both render sensor state for a place.

The convergence path: STRATA's Observatory intelligence (micro-agent summaries, key readings, freshness tracking, domain organization) replaces MNG's monitoring widgets. MNG provides the place identity framework, the category model, the site registry. STRATA provides the sensor intelligence and ecological framing. The Substrate is the architectural seam.

The result is curated Place Monitoring Panels -- Observatory panels that live within MNG's place context, powered by the Substrate, with STRATA intelligence. Not two systems doing the same job; one system with clear separation between spatial identity (MNG) and temporal intelligence (STRATA).

3.3 Virtual Sensors and Expanded Observatories

Physical sensor platforms (Tempest, Ecowitt, Airthings) are installed at Canemah and Owl Farm. But many data sources are coordinate-based and available for any location:

  • Open-Meteo: Weather data for any coordinate on Earth
  • BirdWeather: Detection data from the network's nearest stations
  • iNaturalist: Observation data within any geographic radius
  • NOAA/NWS: Forecasts and alerts for any US location

Any place registered in MNG's site table with coordinates could have an Observatory panel built from these virtual sensors. A site visited during fieldwork in the Sierra Nevada, a research station in the desert Southwest, a collaborator's property -- all could maintain lightweight ecological awareness through the Substrate, with the same three-layer context model but thinner sensor state.

The Substrate's freshness threshold model handles this naturally. Virtual sensors for non-local sites would use longer refresh intervals (hourly rather than every 5 minutes) to conserve API calls while maintaining awareness.
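
The API savings from a longer refresh interval are simple arithmetic, sketched here in Python. The function name is illustrative, and the count of four sources simply matches the four virtual sensor sources listed above; actual request counts would depend on how each API batches data.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def calls_per_day(refresh_interval_seconds: int, sources: int = 1) -> int:
    """Upper bound on external requests per site per day at a given interval."""
    return sources * (SECONDS_PER_DAY // refresh_interval_seconds)


print(calls_per_day(5 * 60))               # 288
print(calls_per_day(60 * 60))              # 24
print(calls_per_day(60 * 60, sources=4))   # 96
```

Moving a remote site from a 5-minute to an hourly rhythm cuts the per-source request ceiling from 288 to 24 per day, a twelvefold reduction.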


4. SOMA Integration

4.1 The Substrate as Sensory Input

SOMA's architecture (described in the reference codebase at Projects/Reference/SOMA/) defines domain-specialized RBM meshes that learn energy landscapes from continuous data streams. The Substrate provides exactly the structured, continuously maintained state that SOMA requires:

  • Weather mesh: Consumes Layer 2 sensor state for EARTH platforms
  • Acoustic mesh: Consumes Layer 2 state for BirdWeather platforms + Layer 3 species detection states
  • Ecosystem mesh: Consumes cross-domain correlations from Layer 3

The Substrate's soma context type provides a numerical state vector optimized for RBM input -- no narrative text, no formatting, just the structured values that the meshes need for energy computation.
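
One way to turn structured key readings into a fixed-order numerical vector is sketched below in Python. The feature names, their value ranges, the min-max scaling to [0, 1], and the 0.5 placeholder for missing readings are all assumptions made for illustration; the note does not specify the vector layout or how SOMA expects its inputs to be scaled.

```python
from typing import Dict, List, Tuple

# Hypothetical features and ranges, chosen only for illustration.
FEATURE_RANGES: Dict[str, Tuple[float, float]] = {
    "temperature_c": (-20.0, 45.0),
    "pressure_hpa": (950.0, 1050.0),
    "humidity_pct": (0.0, 100.0),
    "detections_last_hour": (0.0, 200.0),
}
# Assumption: a missing reading maps to the middle of the scaled range.
MISSING = 0.5


def to_state_vector(readings: Dict[str, float]) -> List[float]:
    """Scale each known feature to [0, 1] in a fixed order."""
    vector = []
    for name, (low, high) in FEATURE_RANGES.items():
        if name not in readings:
            vector.append(MISSING)
            continue
        scaled = (float(readings[name]) - low) / (high - low)
        vector.append(min(1.0, max(0.0, scaled)))  # clamp out-of-range values
    return vector


print(to_state_vector(
    {"temperature_c": -20.0, "pressure_hpa": 1050.0, "humidity_pct": 50.0}))
# [0.0, 1.0, 0.5, 0.5]
```

A fixed feature order matters more than the particular scaling: the mesh can only learn a stable energy landscape if position i in the vector always means the same measurement.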

4.2 Anomaly Feedback Loop

When SOMA detects anomalies (tension in the energy landscape that deviates from learned norms), it writes anomaly flags back to Layer 3 of the Substrate. These flags then appear in the chat and full context documents, enabling the Chat interface to report: "SOMA has flagged unusual barometric pressure decline coinciding with atypical bird activity patterns."

This creates a closed loop: collectors gather raw data, the Substrate maintains derived state, SOMA detects patterns in that state, and anomaly flags feed back into the Substrate for interpretation by downstream consumers.
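
The write-back half of the loop could be as small as the following Python sketch, which appends a flag to a Layer 3 component's JSON and renders it for a context document. The "anomaly_flags" key, the flag fields, and both function names are hypothetical; the flag text is the example sentence from this section.

```python
import json
from typing import List


def add_anomaly_flag(frame_json: str, flag: dict) -> str:
    """Append one SOMA anomaly flag to a Layer 3 component's JSON."""
    frame = json.loads(frame_json) if frame_json else {}
    frame.setdefault("anomaly_flags", []).append(flag)
    return json.dumps(frame)


def render_flags(frame_json: str) -> List[str]:
    """Produce the sentences a chat context document would include."""
    frame = json.loads(frame_json)
    return [
        "SOMA has flagged " + flag["description"] + "."
        for flag in frame.get("anomaly_flags", [])
    ]


# Start from an empty frame component and record one flag.
frame_json = add_anomaly_flag("", {
    "domains": ["EARTH", "LIFE"],
    "description": ("unusual barometric pressure decline coinciding "
                    "with atypical bird activity patterns"),
})
for sentence in render_flags(frame_json):
    print(sentence)
```

Because the flag lives in Layer 3 alongside the rest of the ecological frame, the next context regeneration picks it up without any special path for SOMA output.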


5. Implementation Roadmap

Phase 1: Persistence Layer (Prototype on Data)

Create the substrate_state and substrate_context tables in strata_db. Write the initial Substrate maintenance script that refreshes Layer 2 (sensor state) from the existing micro-agent summary functions. Verify that Observatory can render from maintained state instead of live queries. No new functionality -- just decoupling computation from rendering.

Deliverable: Observatory loads from cached state. Page load query count drops from ~50 to ~8.

Phase 2: Place Identity Integration

Connect Layer 1 to MNG's macroscope_nexus database. Pull site descriptions, geographic context, and ecological address data into the Substrate. Update the Chat system prompt to include structured place identity rather than hardcoded geographic strings.

Deliverable: Chat context includes place-aware framing from MNG data.

Phase 3: Ecological Frame

Implement Layer 3: species detection state classification, phenological context, seasonal norms. This requires defining "expected" species lists per site per season -- initially manual, eventually derived from historical detection data.

Deliverable: Chat can distinguish between expected and anomalous species detections.

Phase 4: Background Process

Package the Substrate maintenance script as a launchd agent on Data for testing, then deploy to Galatea alongside the existing collectors. Implement the freshness-threshold check cycle. Verify low-overhead operation.

Deliverable: com.strata.substrate.plist running on Galatea, maintaining state 24/7.

Phase 5: Place-Centric Observatory

Refactor Observatory navigation from domain-centric to place-centric. Each site gets its own Observatory view, with domains organizing data within the site. Begin convergence with MNG monitoring widgets.

Deliverable: Place-centric Observatory panels, MNG monitoring widgets deprecated.

Phase 6: SOMA Connection

Implement the soma context type in substrate_context. Connect SOMA mesh inputs to the maintained state vector. Implement the anomaly feedback loop.

Deliverable: SOMA consuming Substrate state, anomaly flags appearing in Chat context.


6. Relationship to Other Documents

  • CNL-TN-2026-027: Macroscope/STRATA and MNG Convergence Plan. The Substrate is the architectural seam for convergence -- MNG provides spatial identity, STRATA provides temporal intelligence, the Substrate connects them.
  • CNL-TN-2026-043: STRATA 2.0 Distributed Intelligence Architecture. The Substrate replaces the pull-on-demand intelligence pipeline described in 043 with a continuous maintenance model.
  • CNL-TN-2026-044: STRATA Sensor Plugin Architecture. The plugin contract defined in 044 specifies what the Substrate needs from each sensor platform. Plugins register; the Substrate maintains their state.
  • CNL-TN-2026-045: STRATA IQ: Place-Based Ecological Intelligence. The three-layer context model in 045 becomes the Substrate's three-layer persistence model. The IQ vision document describes what the Substrate maintains.

7. Conclusion

The Substrate transforms STRATA from a reactive query engine into a continuously aware ecological intelligence system. By applying the proven collector pattern to derived intelligence -- maintaining state on natural data rhythms rather than computing it on demand -- the architecture achieves three goals simultaneously: efficient resource use (low-simmer background processing vs. 50-query page loads), high-quality AI context (structured, curated, place-aware vs. unstructured firehose), and always-on awareness (persistent state for SOMA pattern detection vs. ephemeral computation).

The geological metaphor is precise. STRATA's layers are Observatory (the visible surface), Intelligence (the analytical stratum), and the Substrate (the foundation everything rests on). The Substrate is not the most visible component, but it is the one that makes everything above it possible.

Cite This Document

Michael P. Hamilton, Ph.D. (2026). "The STRATA Substrate." Canemah Nature Laboratory Technical Note CNL-TN-2026-046. https://canemah.org/archive/CNL-TN-2026-046

BibTeX

@techreport{hamilton2026strata,
  author      = {Hamilton, Michael P.},
  title       = {The STRATA Substrate},
  institution = {Canemah Nature Laboratory},
  year        = {2026},
  number      = {CNL-TN-2026-046},
  month       = {april},
  url         = {https://canemah.org/archive/document.php?id=CNL-TN-2026-046},
  abstract    = {STRATA 2.0's Observatory and Intelligence dashboards currently generate all sensor summaries, narrative text, and ecological context on demand -- executing 30-50 database queries per page load, discarding the results when the page closes, and rebuilding everything from scratch on the next request. The Chat interface compounds this by dumping the entire sensor state into an LLM system prompt as unstructured text. This pull-on-demand architecture works at development scale but cannot sustain real-time ecological awareness, efficient AI conversations, or the always-on pattern detection required by SOMA. This technical note specifies the Substrate -- a continuously maintained ecological context layer that keeps a current, structured, curated picture of the world ready to serve. Modeled on the same launchd collector pattern that already gathers raw sensor data on Galatea, the Substrate operates as a second-tier process: collectors deposit raw readings into the macroscope database; the Substrate reads those readings and maintains derived intelligence state. Observatory, Intelligence, Chat, and SOMA become consumers of maintained state rather than independent query engines. The Substrate maintains three layers of context: Place Identity (from MNG's ecological address model), Sensor State (per-platform summaries refreshed on natural data rhythms), and Ecological Frame (species detection states, phenological context, seasonal norms). Together these layers provide the structured, curated context that STRATA IQ requires for place-based ecological intelligence. The architecture positions the Substrate as the foundation for convergence between MNG's spatial identity framework and STRATA's temporal intelligence framework, and as the sensory input layer for SOMA's pattern detection meshes.}
}

Permanent URL: https://canemah.org/archive/document.php?id=CNL-TN-2026-046