The Macroscope Data Logger Designer
Substrate-Native Provisioning of Virtual Instruments Across Earth, Life, Home, and Self Domains
Document ID: CNL-TN-2026-052
Version: v2.0 (Draft)
Date: April 25, 2026
Author: Michael P. Hamilton, Ph.D.
Affiliation: Canemah Nature Laboratory, Oregon City, Oregon
AI Assistance Disclosure: This technical note was developed collaboratively with Claude (Anthropic) via Cowork in Laboratory Mode. Claude contributed to substrate translation analysis, cross-domain generalization, architectural synthesis across predecessor documents, and manuscript drafting. The author takes full responsibility for the content, accuracy, and conclusions.
Supersedes: CNL-TN-2026-052 v0.1 (April 16, 2026), which specified a Climate-Analyst-internal wizard targeting the legacy MySQL macroscope and macroscope_nexus databases.
Revision history:
| Version | Date | Changes |
|---|---|---|
| v0.1 | 2026-04-16 | Initial draft. Climate-Analyst sidebar wizard targeting MySQL substrate; three platform types (openmeteo, wunderground, tempest_api); single launchd PHP collector with INSERT-IGNORE idempotency. |
| v2.0 | 2026-04-25 | Substrate transition to Postgres + TimescaleDB + PostGIS per CNL-TN-2026-056 / CNL-TN-2026-066; promotion from "wizard inside Climate Analyst" to Lab 07 Workflow Designer callable from any Lab; cross-domain generalization across all four Macroscope domains (Earth · Life · Home · Self) with worked plugin examples in each; reading vs. detection event-type distinction made first-class; new §6 "Privacy, ownership, and access" specifies subject_type as the primary privacy classifier with five visibility tiers, a grants table for cross-tier sharing, Postgres row-level security as the enforcement layer, two-flavored DLD wizard (admin-facing for places, participant-facing for persons/households), and DLD as the consent boundary for the MySQL → macroscope_v2 migration; Owl Farm threaded throughout as a worked multi-posture site. Collector runtime decision deferred to §7.2 (forthcoming) and the Phase 2 work in CNL-FN-2026-054 §6.3. |
Abstract
The original Climate Analyst Data Logger Designer (CNL-TN-2026-052 v0.1) specified a Climate-Analyst-internal wizard that provisioned virtual weather instruments by inserting rows into MySQL sensor_platforms and monitoring_sources tables; a single launchd collector on Galatea then discovered the new platforms and began hourly ingestion into vendor-specific reading tables. That design has been overtaken by two converging changes. First, the Macroscope substrate has migrated from MySQL to Postgres + TimescaleDB + PostGIS (CNL-TN-2026-056, CNL-TN-2026-066), collapsing the dual-database YEA / MNG architecture into a unified nexus + public schema layout, replacing vendor-wide reading tables with two canonical hypertables (public.reading and public.detection), and replacing INSERT IGNORE with ON CONFLICT idempotency. Second, the rest of the Macroscope Collaboratory has matured into a multi-Lab research environment whose other instruments (Site Finder, Climate Analyst, Habitat Analyst, Panorama, Biodiversity Analyst, and the in-development Acoustic Phenology Lab 06) all want to provision new data sources from inside their own analytical context — not by leaving for a sidebar in a different Lab.
This v2 specification responds to both pressures. The Data Logger Designer is reframed as a substrate-native, cross-Lab callable surface — the Macroscope Workflow Designer's first occupant — that any Lab can invoke as a modal/drawer with pre-filled provisioning context. The three-tier instrument hierarchy from v0.1 (physical → community → model-derived) is preserved as a generalizable pattern and re-grounded with worked examples in four domains: Earth (weather, climate, soil), Life (acoustic detection, observation feeds, range models), Home (built-environment sensors, household air quality, household energy), and Self (wearables, social activity feeds, derived health models). The single-collector architecture survives the substrate transition with two changes: the runtime decision is explicitly deferred to Phase 2 of the Macroscope migration plan (CNL-FN-2026-054 §6.3), and the collector now writes to two canonical hypertables differentiated by event semantics — continuous metric readings versus event-typed detections. v2 additionally treats privacy as a first-class architectural concern (§6): the four-domain spread divides cleanly along an access axis (Earth/Life are place-subject and shareable by default; Home/Self are household/person-subject and private by default), and DLD encodes this through subject_type, ownership, visibility tiers, a grants table, and Postgres row-level security. The result is an operational pattern in which provisioning a new monitoring instrument anywhere in the curated catalog, in any of the four Macroscope domains, is a database transaction mediated by a wizard the user reaches without leaving the Lab they were working in — and with the correct privacy posture from the moment the substrate write completes.
1. Introduction
1.1 Scope of v2 revision
This revision serves three purposes:
- Substrate transition. Translate every database reference, table name, schema location, and SQL idiom in v0.1 to the post-migration Postgres + TimescaleDB + PostGIS world established by CNL-TN-2026-056 and locked into permanent operation by CNL-TN-2026-066. Where v0.1 named MySQL tables that no longer exist (sensor_platforms, monitoring_sources in the dual-database sense, openmeteo_hourly, wunderground_readings), v2 names the canonical Postgres equivalents.
- Cross-domain generalization. v0.1 was a Climate Analyst document: its tier hierarchy, worked examples, and STRATA integration all spoke the language of weather data. The Macroscope's reach is broader. This revision generalizes the Data Logger Designer pattern across the four project domains (Earth, Life, Home, and Self) with first-class worked examples in each. The pattern survives the generalization unchanged in its core; only the menu of platforms it provisions expands. Privacy posture differs sharply across the four (§6): Earth and Life are place-subject and shareable by default; Home and Self are household/person-subject and private by default. The architectural division along the privacy axis is what makes the four-domain framing load-bearing rather than decorative.
- Promotion to a Workflow Designer Lab. v0.1 located DLD as a sidebar wizard inside the Climate Analyst. The arrival of additional Labs that all want to provision instruments (most acutely Lab 06 Acoustic Phenology, which needs to capture neighborhood BirdWeather PUCs at any curated place) exposes the limit of "wizard inside one Lab." v2 promotes DLD to be the first occupant of a new Lab 07 in the Collaboratory, the Workflow Designer, which is callable from any other Lab as a modal with pre-filled context. The cross-Lab call API is specified in §3.2. Lab 07 is positioned to host additional workflow patterns over time (scheduled batch jobs, investigation pipelines, STRATA agent runbooks); DLD is the first tool inside it.
This v2 does not respecify the STRATA intelligence envelope (CNL-TN-2026-044), the place catalog (CNL-TN-2026-030), or the bridge collector (CNL-TN-2026-066). It assumes those as load-bearing context and integrates with them at the seams established in CNL-FN-2026-054 §2.5 (Schema Convergence).
1.2 What carries forward from v0.1
Five v0.1 commitments survive intact and are the architectural backbone of v2:
- The instrument-versus-widget distinction (v0.1 §1.1, §9.1). A widget fetches and displays data on demand, then discards it. An instrument records continuous data to the substrate. The Data Logger Designer creates instruments. STRATA's temporal intelligence requires recorded data; the recording is what DLD produces.
- Database-as-configuration (v0.1 §2.1). Adding an instrument is an INSERT against the substrate, not a deployment. No new cron jobs, no new collector scripts, no SSH sessions. The collector discovers new platforms by querying the substrate.
- The three-tier hierarchy (v0.1 §3). Tier 1 physical / owned, Tier 2 community / shared, Tier 3 model-derived / aggregated. This taxonomy works in every Macroscope domain. v2 preserves it and demonstrates that generality with worked examples in each domain (§4).
- Multi-tier stacking as a feature (v0.1 §3.4). A site or person can host multiple co-located instruments at different tiers. This is not redundancy; it is the substrate for cross-validation and anomaly triangulation. v2 keeps this principle and broadens it to non-spatial co-location (e.g. multiple personal-health platforms reporting on the same individual).
- Single-collector pattern (v0.1 §4). One collector cycle queries the substrate for active platforms and dispatches to platform-specific fetch functions. v2 preserves this pattern; the runtime question is deferred to §7.2.
1.3 What's new in v2
Five additions distinguish v2 from v0.1:
- Substrate-native data model (§5). Identity now lives in public.monitoring_source (UUID), with the legacy nexus.monitoring_sources int-keyed shim retained for backward compatibility per CNL-FN-2026-054 §6.1. Reading streams write to the public.reading hypertable; event streams write to the public.detection hypertable. Idempotency is ON CONFLICT DO NOTHING against composite primary keys.
- The reading-versus-detection distinction (§5.3). v0.1 implicitly assumed all platforms produce continuous metric rows (readings). The Postgres substrate makes a first-class distinction between regular-cadence metric readings and irregular-cadence typed events. A platform plugin declares which it produces, or both. This distinction matters because phenology, acoustic detection, camera traps, DNA sampling, manual field-notebook events, and many Self-domain platforms produce detections, not readings.
- Cross-domain tier examples (§4). Tier 1, 2, 3 are populated with concrete worked examples in Earth, Life, Home, and Self. The cross-domain table is not aspirational — each cell is intended to map to a real or near-term plugin.
- Workflow Designer Lab (§3). DLD is now hosted inside Lab 07 — the Workflow Designer — alongside any future workflow tools that emerge. Other Labs invoke DLD as a modal/drawer with pre-filled context via a documented cross-Lab call API; the modal is the same interface a user would reach by visiting Lab 07 directly. Lab 06 (CNL-SP-2026-067) is the first declared consumer.
- Acquisition cadence and offline-resume as first-class configuration (§7.3, §7.4). v0.1 specified collection intervals informally per platform type. v2 makes per-instrument cadence a stored configuration parameter, exposes it in the wizard UI, and specifies the collector's offline-then-resume semantics. This matters during the Data → Galatea transition, when Data may be offline overnight or for development reasons but Galatea is targeted for 24/7 operation.
1.4 Glossary
The terms below are used precisely throughout v2; the same words sometimes have looser meanings elsewhere.
- Platform: a class of data source identified by platform_type (e.g. openmeteo, wunderground, tempest_api, birdweather_station, apple_health). One platform may have many instances; identity at the platform level is the plugin contract.
- Instrument: a provisioned instance of a platform, bound to a specific subject (a place, a person, a study). Identity at the instrument level is public.monitoring_source.id (UUID). Every instrument has a platform_type, an observer_class, a subject reference, and a configuration JSON.
- Source: synonym for instrument when emphasizing the data-flow perspective (the source produces readings or detections that arrive in the substrate). The substrate column public.detection.monitoring_source_id carries this identity.
- Subject: the entity an instrument monitors. For Earth and Life domains, this is a nexus.places.id (int). For Home, this is a nexus.households.id (int; see §6.4 for the new table). For Self, this is a nexus.users.id (int; the person whose body the instrument observes, and the same person who logs in to see their data). Subject type is not just a routing field; it is the primary privacy classifier (§6.3). The platform plugin declares which subject types it accepts.
- Tier: Tier 1 physical, Tier 2 community, Tier 3 model. A property of an instrument (and inherited from its platform's typical use), not of the platform itself. For example, a Tempest station is Tier 1 when it's the user's own and Tier 2 when it's a stranger's WU-aggregated reading.
- Reading: a row in public.reading. Continuous-cadence metric observation. Keyed by (monitoring_source_id, time).
- Detection: a row in public.detection. Irregular-cadence typed event with optional taxon, location, confidence, and media references. Keyed by (monitoring_source_id, time, external_observation_id).
- Workflow Designer (Lab 07): the Collaboratory Lab that hosts the DLD wizard and (in the future) related provisioning, scheduling, and lifecycle tools. A user can reach DLD by visiting Lab 07 directly, or by being handed off to it via a cross-Lab modal call from any other Lab.
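The composite keys named in the Reading and Detection entries are what make repeated ingestion safe. The following is a minimal in-memory sketch of ON CONFLICT DO NOTHING semantics, not the substrate implementation: the helper names (readingKey, detectionKey, insertIfAbsent) and the value field are assumptions introduced for illustration, and the real enforcement happens in Postgres.

```javascript
// Illustrative only: models ON CONFLICT DO NOTHING with an in-memory Map.
// Helper names and the `value` field are hypothetical.

// Composite key for public.reading: (monitoring_source_id, time)
function readingKey(row) {
  return JSON.stringify([row.monitoring_source_id, row.time]);
}

// Composite key for public.detection:
// (monitoring_source_id, time, external_observation_id)
function detectionKey(row) {
  return JSON.stringify([
    row.monitoring_source_id,
    row.time,
    row.external_observation_id,
  ]);
}

// Insert a row unless its key already exists; returns true if inserted.
function insertIfAbsent(table, keyFn, row) {
  const key = keyFn(row);
  if (table.has(key)) return false; // conflict: do nothing
  table.set(key, row);
  return true;
}

const readingTable = new Map();
const sampleReading = {
  monitoring_source_id: '7b21',
  time: '2026-04-25T15:00:00Z',
  value: 11.2,
};
const firstInsert = insertIfAbsent(readingTable, readingKey, sampleReading);
// A collector re-fetching the same window replays the same row harmlessly.
const replayInsert = insertIfAbsent(readingTable, readingKey, sampleReading);
```

Under these assumptions, two detections that share a source and timestamp but differ in external_observation_id are both kept, which is why the detection key carries the third component.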
2. Architecture
2.1 Design principles
The v0.1 design principles carry forward, refined for v2, and one new principle (cross-Lab provisioning) joins them:
Database-as-configuration. Adding an instrument anywhere in the substrate is a write transaction against public.monitoring_source (and the legacy nexus.monitoring_sources shim, per CNL-TN-2026-066 §3). No deployment. The collector discovers the new instrument on its next cycle by querying public.monitoring_source WHERE status = 'active'. This principle is the architectural enabler for cross-domain reach: the same wizard can provision an Open-Meteo instrument over the Patagonia coast, a BirdNET-Pi capture in Oregon City, and an Apple Watch logger for a research participant — all by writing the same shape of substrate row.
Instrument, not widget. Recording continuous data to the substrate is what makes downstream temporal intelligence possible. STRATA's nine-window temporal analysis (CNL-TN-2026-044), Lab 06's phenology drift plots (CNL-SP-2026-067 §4.7), and any future Self-domain circadian or HRV trend analysis all require the recording. DLD's distinguishing feature against display-only widgets is that what it provisions persists.
Multi-tier stacking as a feature. A subject — place or person — may host multiple co-located instruments at different tiers. Cross-tier divergence is a signal, not noise. Persistent disagreement between an Open-Meteo gridded estimate and a co-located Tempest reading is microclimate detection (v0.1 §3.4). Persistent disagreement between an Apple Watch HRV reading and a co-located Oura HRV reading is sensor calibration drift (or, more interestingly, a measurement-modality artifact). In every domain, multi-tier stacking is the substrate for cross-validation.
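One way to make "persistent disagreement" concrete is a mean absolute difference over the timestamps two co-located instruments share. The sketch below is an assumption-laden illustration, not STRATA's method: the function name, the time and value fields, and the sample numbers are all hypothetical.

```javascript
// Hypothetical helper: mean absolute difference between two series,
// computed only over timestamps present in both.
function meanAbsDivergence(seriesA, seriesB) {
  const valueByTime = new Map(seriesB.map((row) => [row.time, row.value]));
  let total = 0;
  let matched = 0;
  for (const row of seriesA) {
    if (valueByTime.has(row.time)) {
      total += Math.abs(row.value - valueByTime.get(row.time));
      matched += 1;
    }
  }
  // No shared timestamps means no evidence either way.
  return matched === 0 ? null : total / matched;
}

// Made-up values: a Tier 1 station versus a Tier 3 gridded estimate.
const tier1Series = [
  { time: '2026-04-25T15:00:00Z', value: 10 },
  { time: '2026-04-25T16:00:00Z', value: 12 },
  { time: '2026-04-25T17:00:00Z', value: 9 },
];
const tier3Series = [
  { time: '2026-04-25T15:00:00Z', value: 12 },
  { time: '2026-04-25T16:00:00Z', value: 14 },
];
const divergence = meanAbsDivergence(tier1Series, tier3Series);
```

Whether a given divergence counts as microclimate, calibration drift, or noise depends on a threshold and a persistence window that this sketch deliberately leaves unspecified.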
Virtual sensors are first-class citizens. The STRATA plugin contract (CNL-TN-2026-044 §9) treats anything that produces timestamped substrate rows as a sensor. Whether the rows arrive from a physical thermometer, a community-shared station, or a global gridded model is a tier classification, not an identity classification. This principle is what allows DLD to provision an Open-Meteo instrument with the same wizard flow as a Tempest instrument or an Apple Watch instrument.
Cross-Lab provisioning, single source of truth. New in v2. Any Lab in the Collaboratory may invoke DLD to provision an instrument against the subject the user is currently investigating — Climate Analyst against a place's weather, Lab 06 against a place's neighborhood acoustic stations, a future Self Lab against the current user's wearables. Provisioning logic, schema writes, validation, and lifecycle management live exclusively in DLD. Labs are consumers, not implementers, of this surface.
2.2 The four-layer flow
v0.1's three-layer flow (wizard → substrate → collector) gains a fourth front layer in v2: the originating Lab.
┌─────────────────────────────────────────────────────────────┐
│ ANY LAB IN THE COLLABORATORY │
│ User encounters a discovered data source within their │
│ analytical context (a place, a person, a study). │
│ Clicks "Capture in Data Logger" → opens DLD modal │
│ with prefill context. │
└─────────────────────────┬───────────────────────────────────┘
│ DLD.open({platform_type, subject_ref,
│ prefill, on_provision})
v
┌─────────────────────────────────────────────────────────────┐
│ WORKFLOW DESIGNER (LAB 07) — Data Logger Designer modal │
│ Confirm subject → confirm/edit prefill → review available │
│ tiers → set acquisition cadence → provision instrument(s). │
└─────────────────────────┬───────────────────────────────────┘
│ INSERT INTO public.monitoring_source,
│ nexus.monitoring_sources
v
┌─────────────────────────────────────────────────────────────┐
│ MACROSCOPE COLLECTOR (Phase 2 runtime, see §7.2) │
│ SELECT platform_type, configuration │
│ FROM public.monitoring_source WHERE status = 'active'; │
│ Dispatch to platform plugin → fetch → write readings or │
│ detections to canonical hypertables with ON CONFLICT. │
└─────────────────────────┬───────────────────────────────────┘
│ Continuous data accumulates in
│ public.reading and public.detection
v
┌─────────────────────────────────────────────────────────────┐
│ CONSUMERS │
│ STRATA intelligence envelope (temporal analysis, narrative │
│ generation, anomaly detection) │
│ Labs reading from the same hypertables for analysis │
│ Public Observatory pages displaying current and historical │
│ state │
└─────────────────────────────────────────────────────────────┘
Figure 1. Four-layer Macroscope Data Logger Designer flow. The originating Lab provides the analytical context that prefills the wizard. DLD writes substrate identity. The collector populates substrate data. Consumers — STRATA, other Labs, public Observatory — read from the resulting hypertables.
The four-layer separation matters because each layer evolves independently: new Labs arrive as new originating contexts without DLD changes; new platforms arrive as plugins without collector framework changes; new substrate optimizations (continuous aggregates, compression policies) land without affecting any layer above.
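The collector layer in Figure 1 can be sketched as a single dispatch loop. This is a simplified, synchronous illustration under stated assumptions: the registry shape, the fetch return value ({ readings, detections }), the metric and value fields, and every name below are hypothetical, and a real collector would fetch asynchronously and write with ON CONFLICT idempotency rather than pushing to arrays.

```javascript
// Hypothetical plugin registry keyed by platform_type.
// Each plugin's fetch returns { readings, detections }; either may be empty.
const examplePlugins = new Map([
  ['openmeteo', {
    fetch: (configuration) => ({
      readings: [{ time: '2026-04-25T15:00:00Z', metric: 'air_temp_c', value: 11.2 }],
      detections: [],
    }),
  }],
  ['birdweather_station', {
    fetch: (configuration) => ({
      readings: [],
      detections: [{ time: '2026-04-25T15:03:12Z', external_observation_id: 'obs-1' }],
    }),
  }],
]);

// One collector cycle: discover active sources, dispatch, write rows.
function runCollectorCycle(sources, pluginRegistry, substrate) {
  const unhandled = [];
  for (const source of sources) {
    if (source.status !== 'active') continue; // mirrors WHERE status = 'active'
    const plugin = pluginRegistry.get(source.platform_type);
    if (!plugin) {
      unhandled.push(source.id); // no plugin registered for this platform_type
      continue;
    }
    const { readings, detections } = plugin.fetch(source.configuration);
    for (const row of readings) {
      substrate.reading.push({ monitoring_source_id: source.id, ...row });
    }
    for (const row of detections) {
      substrate.detection.push({ monitoring_source_id: source.id, ...row });
    }
  }
  return unhandled;
}

const exampleSources = [
  { id: 's1', platform_type: 'openmeteo', status: 'active', configuration: {} },
  { id: 's2', platform_type: 'openmeteo', status: 'inactive', configuration: {} },
  { id: 's3', platform_type: 'birdweather_station', status: 'active', configuration: {} },
  { id: 's4', platform_type: 'not_yet_written', status: 'active', configuration: {} },
];
const exampleSubstrate = { reading: [], detection: [] };
const unhandledIds = runCollectorCycle(exampleSources, examplePlugins, exampleSubstrate);
```

Adding a platform in this sketch means adding one registry entry; the loop itself does not change, which is the property the layer separation is meant to protect.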
2.3 Cross-domain spread
The Macroscope Data Logger Designer addresses four project domains, each with its own subject types, default privacy posture, tier examples, and consumer Labs:
| Domain | Subject type | Default visibility | Example consumer Labs | Example platforms (Tier 1 / 2 / 3) |
|---|---|---|---|---|
| Earth (geography, climate, environment) | nexus.places.id | public_within_macroscope | Climate Analyst (Lab 02), Habitat Analyst (Lab 03), Site Finder (Lab 01) | Tempest (outdoor), Ecowitt (outdoor) / WU PWS / Open-Meteo, NOAA gridded |
| Life (biodiversity, taxonomy, ecology) | nexus.places.id | public_within_macroscope | Biodiversity Analyst (Lab 05), Acoustic Phenology (Lab 06) | Macroscope BirdWeather PUC, Macroscope camera trap / external BirdNET-Pi, iNat project, eBird hotspot / GBIF aggregated, BirdCast model |
| Home (built environment, household sensors) | nexus.households.id | owner_only | Future household-environment Lab; relevant to Climate/Habitat Analysts when a place includes indoor monitoring | Airthings, indoor Tempest, indoor Ecowitt, AmbientWeather indoor / PurpleAir neighborhood air quality, utility consumer APIs / modeled indoor comfort, modeled energy from billing |
| Self (personal health, behavior) | nexus.users.id | owner_only | Personal MCP Collaboratory (operational today); future MNG-2.1 personal-health Lab | Apple Watch, Withings, Oura, Garmin HRM / Strava activity feeds, study-cohort summaries / Apple Health Kit aggregations, sleep model fits, HRV-derived stress |
Three observations follow from this spread.
The pattern is genuinely domain-portable, not a coincidence. The DLD design works across Earth / Life / Home / Self because the architectural shape it implements — a subject hosts one or more instruments at varying tiers, each producing substrate rows on a configurable cadence with a configurable backfill window, each registered through a plugin that declares its own discovery and ingest semantics — is the shape of distributed monitoring as a class, not the shape of weather monitoring specifically.
Some platforms cross domains via subject_type. A WeatherFlow Tempest deployed outdoors at a research site is Earth-domain (subject_type = 'place'). A WeatherFlow Tempest deployed indoors at Owl Farm is Home-domain (subject_type = 'household'). The platform plugin doesn't change; the subject_type carried by the deployment row determines the domain and the privacy posture. This is the feature that lets a single plugin serve any Lab that finds its data useful while keeping each deployment's access posture correct.
The four-domain spread is partially operational, partially in migration. Earth-domain DLD is operational today (Tempest and Ecowitt at Canemah). Life-domain DLD is in active design (Lab 06's BirdWeather plugin is the first concrete need). Home and Self are operational in the personal MCP Collaboratory's MySQL substrate today (Owl Farm Home streams; Mike's Self platforms via intelligence/tools/health_*.php) and are pending migration into macroscope_v2 — DLD is the consent boundary for that migration (§6.8).
The architecture is built once. The plugins, subjects, and consumer Labs accumulate over time across all four domains.
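The default-visibility column in the table above can be read as a lookup on subject_type. The sketch below is illustrative only: the function name is hypothetical, and the 'person' value for Self-domain subjects is an assumption based on the call API payload rather than a confirmed substrate enum.

```javascript
// Default visibility per subject_type, following the §2.3 table.
// The 'person' key is an assumed value for Self-domain subjects.
const DEFAULT_VISIBILITY = new Map([
  ['place', 'public_within_macroscope'], // Earth and Life
  ['household', 'owner_only'],           // Home
  ['person', 'owner_only'],              // Self
]);

function defaultVisibility(subjectType) {
  if (!DEFAULT_VISIBILITY.has(subjectType)) {
    throw new Error(`Unknown subject_type: ${subjectType}`);
  }
  return DEFAULT_VISIBILITY.get(subjectType);
}

// Same plugin, two deployments, two postures.
const outdoorTempest = { platform_type: 'tempest_api', subject_type: 'place' };
const indoorTempest = { platform_type: 'tempest_api', subject_type: 'household' };
const outdoorVisibility = defaultVisibility(outdoorTempest.subject_type);
const indoorVisibility = defaultVisibility(indoorTempest.subject_type);
```

Failing loudly on an unknown subject_type is a deliberate choice in this sketch: a silent fallback to a shareable default would be the wrong failure mode for Home and Self data.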
3. The Workflow Designer (Lab 07)
3.1 Placement and rationale
In v0.1, DLD was a sidebar wizard inside the Climate Analyst (Lab 02). That placement was correct for the Climate-only scope of the original document and incorrect for v2's cross-domain reach. v2 promotes DLD into a new Collaboratory Lab — Lab 07: Workflow Designer — that hosts DLD as its first occupant and accepts new workflow tools over time.
Lab 07 occupies an unusual position in the Collaboratory: every other Lab is analytical (it consumes the substrate), while Lab 07 is configurational (it creates and manages substrate identity). Both are research-essential activities, but they are different in kind. Placing the Workflow Designer inside the Collaboratory reflects the editorial choice that provisioning is part of the research practice — the user designs an investigation, identifies what needs to be captured, configures the capture, then returns to analysis. The wall between "doing research" and "configuring infrastructure" is artificial; Lab 07 dissolves it by making provisioning a first-class Collaboratory activity.
The Lab card sits at lab_number = 7 in nexus.lab_instruments with slug = 'workflow-designer', href = 'workflow-designer.php'. The internal layout deviates from the standard three-pane Lab pattern because the work it supports is procedural (a wizard) rather than analytical (a viewport plus filters). The Workflow Designer's home page presents a list of provisioned instruments grouped by subject and a primary "Provision a new instrument" call-to-action that opens the DLD wizard against an empty subject.
Future occupants of Lab 07 beyond DLD are out of scope for v2, but the placement anticipates them. Examples: a Scheduled Job Designer for batch analytical pipelines, an Investigation Designer for the workflow that CNL-SP-2026-062 hints at as a future UI, and an Agent Runbook Designer for STRATA agent orchestration. Lab 07 is the home for Macroscope's procedural authoring tools, not just for DLD.
3.2 The cross-Lab call API
Any Lab in the Collaboratory may invoke DLD with pre-filled context. The intent is that a researcher discovers, in the course of analytical work, a data source worth capturing — and reaches the provisioning wizard without leaving the analytical context. Lab 06's neighborhood-PUC selection is the first concrete consumer; the Climate Analyst's "add another weather instrument here" affordance is the next.
The frontend call surface is exposed as a global Macroscope.WorkflowDesigner.openDLD(payload) function loaded by every Lab via admin/lab/js/workflow-designer-client.js. The payload schema:
{
  // Required: which platform to provision
  platform_type: 'birdweather_station',   // or 'openmeteo', 'tempest_api', etc.

  // Required: subject reference; subject_type determines the table
  subject_type: 'place',                  // 'place' | 'household' | 'person' | (future) 'self_subject'
  subject_id: 42,                         // nexus.places.id (int) for places;
                                          // nexus.households.id (int) for households;
                                          // nexus.users.id (int) for persons

  // Optional: pre-filled discovery context to skip wizard Step 2
  prefill: {
    // Platform-specific keys interpreted by the platform plugin
    birdweather_station_id: 'abc123xyz',
    station_name: 'Backyard Cascadia',
    coords: { lat: 45.34, lon: -122.61 },
    observer_class: 'community',
    discovery_origin: 'lab06_neighborhood'
  },

  // Optional: instrument metadata to pre-set
  suggested_cadence_minutes: 15,
  suggested_backfill_days: 90,

  // Optional: callbacks
  on_provision: (result) => { /* result.monitoring_source_uuid, ... */ },
  on_cancel: () => { /* user dismissed wizard */ },
  on_error: (err) => { /* substrate write or validation failed */ }
}
The function returns a Promise that resolves with the provisioning result or rejects on cancel / error, supporting await-style usage as well as callback-style. The result payload:
{
  monitoring_source_id: 143,                       // nexus.monitoring_sources.id (int)
  monitoring_source_uuid: '7b21...4f',             // public.monitoring_source.id (uuid)
  platform_type: 'birdweather_station',
  subject_type: 'place',
  subject_id: 42,
  expected_first_data_at: '2026-04-25T15:42:00Z',  // next collector cycle
  instrument_status: 'active'
}
The calling Lab adds monitoring_source_uuid to its active selection and refreshes its plots. Lab 06's left panel updates the corresponding row's badge from "neighborhood, not selected" → "active, ingesting" → "active, last detection: N min ago" as DLD reports lifecycle progression.
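A calling Lab might assemble and sanity-check the payload before handing it to the wizard. The sketch below assumes two hypothetical helpers (buildBirdWeatherPayload, validateDLDPayload); only the payload field names and the openDLD entry point come from the schema above, and the actual client may validate differently.

```javascript
// Hypothetical helpers; only the payload field names come from §3.2.
const REQUIRED_PAYLOAD_FIELDS = ['platform_type', 'subject_type', 'subject_id'];

// Returns a list of human-readable problems; empty means the payload looks usable.
function validateDLDPayload(payload) {
  const errors = REQUIRED_PAYLOAD_FIELDS
    .filter((field) => payload[field] === undefined || payload[field] === null)
    .map((field) => `missing required field: ${field}`);
  if (payload.subject_id != null && !Number.isInteger(payload.subject_id)) {
    errors.push('subject_id must be an integer');
  }
  return errors;
}

// Builds a Lab 06 style payload for a discovered neighborhood station.
function buildBirdWeatherPayload(placeId, station) {
  return {
    platform_type: 'birdweather_station',
    subject_type: 'place',
    subject_id: placeId,
    prefill: {
      birdweather_station_id: station.id,
      station_name: station.name,
      coords: station.coords,
      observer_class: 'community',
      discovery_origin: 'lab06_neighborhood',
    },
    suggested_cadence_minutes: 15,
    suggested_backfill_days: 90,
  };
}

const lab06Payload = buildBirdWeatherPayload(42, {
  id: 'abc123xyz',
  name: 'Backyard Cascadia',
  coords: { lat: 45.34, lon: -122.61 },
});
const lab06Errors = validateDLDPayload(lab06Payload);

// In a Lab page where the client script is loaded, the call might look like:
//   const result = await Macroscope.WorkflowDesigner.openDLD(lab06Payload);
//   activeSelection.add(result.monitoring_source_uuid); // activeSelection is hypothetical
```

Because openDLD rejects on cancel as well as on error, an await-style caller would likely wrap the call in try/catch and treat a cancel as a non-event rather than a failure.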
3.3 The DLD wizard UX
The wizard is a four-step modal opened either from inside Lab 07 directly or from another Lab via the call API in §3.2. The same modal serves both entry points — DLD does not maintain two flows.
Step 1: Subject Confirmation. Display the subject (place or person) with its identifying context — for a place, name + coordinates + elevation + ecological summary; for a person, name + study role + relevant cohort tags. The user confirms this is the correct subject. When entered via cross-Lab call, this step is auto-confirmed if the calling context is unambiguous; the user sees the subject panel briefly with a "Continue" button.
Step 2: Source Discovery. The platform plugin's discovery method runs against the subject's identifying context (place coordinates for spatial platforms, person profile for personal-health platforms). For Tier 3 platforms that are universally available (Open-Meteo for any place, Apple Health Kit for any person who has connected the integration), discovery is a no-op. For Tier 2 platforms with subject-specific candidates (Weather Underground stations near a place, BirdWeather stations in a place's neighborhood, Strava connections for a person), the plugin returns a candidate list.
When entered via cross-Lab call with a prefill payload, Step 2 is replaced by the prefilled candidate. The user sees one row pre-selected rather than a list. The wizard may still re-run discovery in the background to catch staleness (a station might have gone offline since the cache was populated) and surface warnings if the prefilled candidate is no longer available.
Step 3: Instrument Configuration. The user reviews and may adjust:
- Acquisition cadence: how often the collector polls this instrument. Defaults from the platform plugin (Open-Meteo hourly, Tempest 5-minute, BirdWeather 15-minute) but explicitly user-editable. This is the place Mike's "rate of data acquisition" framing surfaces.
- Backfill window: how far back to attempt historical ingest on first activation. Defaults from the platform plugin (Open-Meteo: years; BirdWeather: bounded by the source station's retention tier). Constrained by the platform plugin's declared maximum.
- Instrument name: auto-generated as {Subject Name} — {Platform Display Name} but editable.
- Status on creation: active (start collecting on next cycle) or inactive (provision but pause).
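The four Step 3 settings can be sketched as plugin defaults overlaid with user edits, with the backfill window clamped to the plugin's declared maximum. All names below (buildInstrumentConfig and the plugin descriptor fields) are assumptions for illustration; the numeric limits are invented, not BirdWeather's actual retention figures.

```javascript
// Hypothetical: merge plugin defaults with user overrides from wizard Step 3.
function buildInstrumentConfig(plugin, subjectName, overrides = {}) {
  const cadence = overrides.cadence_minutes ?? plugin.default_cadence_minutes;
  if (!Number.isInteger(cadence) || cadence <= 0) {
    throw new RangeError('cadence_minutes must be a positive integer');
  }
  const status = overrides.status ?? 'active';
  if (status !== 'active' && status !== 'inactive') {
    throw new RangeError("status must be 'active' or 'inactive'");
  }
  const requestedBackfill = overrides.backfill_days ?? plugin.default_backfill_days;
  return {
    // Name template from the spec: {Subject Name} — {Platform Display Name}
    name: overrides.name ?? `${subjectName} — ${plugin.display_name}`,
    cadence_minutes: cadence,
    // Never exceed what the plugin says the source can supply.
    backfill_days: Math.min(requestedBackfill, plugin.max_backfill_days),
    status,
  };
}

// Invented descriptor values for illustration.
const birdweatherDescriptor = {
  display_name: 'BirdWeather Station',
  default_cadence_minutes: 15,
  default_backfill_days: 90,
  max_backfill_days: 180,
};
const defaultConfig = buildInstrumentConfig(birdweatherDescriptor, 'Canemah');
const editedConfig = buildInstrumentConfig(birdweatherDescriptor, 'Canemah', {
  cadence_minutes: 5,
  backfill_days: 365, // exceeds the plugin maximum, so it is clamped
  status: 'inactive',
});
```

Clamping silently is one possible design; a wizard could equally surface a warning when the requested backfill exceeds the plugin maximum.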
Step 4: Provisioning. The wizard calls the DLD provisioning endpoint, which writes the substrate identity rows (§5), invokes the platform plugin's register_instrument hook for any platform-side setup (e.g. authenticating a Strava connection), and inserts an audit log row. The wizard displays success with the resulting monitoring_source_uuid and the expected first-data timestamp. On dismiss, control returns to the calling context (Lab 07 home page if direct entry; calling Lab if via cross-Lab call) and the on_provision callback fires.
3.4 Authentication, permissions, and audit
Provisioning is an administrative action. Reading existing provisioned instruments is not.
Read access to the Workflow Designer (Lab 07 home page, instrument list, status indicators) follows the same rule as every other Lab in the Collaboratory: any authenticated MNG user. Cross-Lab call into the DLD wizard is similarly available to any authenticated user, but a non-admin user reaches Step 4 and sees a "Request provisioning" affordance rather than a direct provisioning button. Their request is queued for an admin to approve or reject from the Workflow Designer's admin queue (a future capability tracked in §11).
Write access — actually provisioning a new instrument or modifying an existing one (cadence change, pause, retire) — is restricted to MNG users with the admin role per the existing nexus.user_permissions model.
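The read/write split above determines which control a user sees at Step 4. A minimal sketch, assuming a hypothetical user object with authenticated and roles fields (the actual nexus.user_permissions lookup is not shown in this section):

```javascript
// Hypothetical: decide which Step 4 control to render for a user.
function step4Affordance(user) {
  if (!user || !user.authenticated) return 'none';
  // Admins provision directly; everyone else files a request for the admin queue.
  return user.roles.includes('admin') ? 'provision' : 'request_provisioning';
}

const adminAffordance = step4Affordance({ authenticated: true, roles: ['admin'] });
const memberAffordance = step4Affordance({ authenticated: true, roles: ['researcher'] });
const anonymousAffordance = step4Affordance(null);
```

A client-side check like this only shapes the interface; the provisioning endpoint would still need to enforce the admin requirement on the server.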
Audit — every provisioning action writes a row to a new nexus.workflow_audit_log table:
CREATE TABLE nexus.workflow_audit_log (
    id                   BIGSERIAL PRIMARY KEY,
    actor_user_id        INTEGER NOT NULL REFERENCES nexus.users(id),
    action               TEXT NOT NULL,  -- 'provision' | 'pause' | 'resume' | 'retire' | 'reconfigure'
    monitoring_source_id INTEGER REFERENCES nexus.monitoring_sources(id),
    platform_type        TEXT NOT NULL,
    subject_type         TEXT NOT NULL,
    subject_id           INTEGER NOT NULL,
    payload              JSONB NOT NULL DEFAULT '{}'::jsonb,  -- before/after configuration
    occurred_at          TIMESTAMPTZ NOT NULL DEFAULT now()
);
The audit log is the durable record of what was provisioned and by whom. Substrate sources can be soft-deleted; audit history is append-only.
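An audit row for this table could be assembled as follows. This is a sketch under assumptions: buildAuditRow is a hypothetical helper, and the { before, after } shape of payload is inferred from the column comment rather than specified. The id and occurred_at columns are left to their database defaults.

```javascript
// Actions permitted by the comment on the `action` column.
const AUDIT_ACTIONS = new Set(['provision', 'pause', 'resume', 'retire', 'reconfigure']);

// Hypothetical: build one append-only audit row for nexus.workflow_audit_log.
function buildAuditRow(actorUserId, action, source, before, after) {
  if (!AUDIT_ACTIONS.has(action)) {
    throw new Error(`Unknown audit action: ${action}`);
  }
  return {
    actor_user_id: actorUserId,
    action,
    monitoring_source_id: source.monitoring_source_id, // legacy int id
    platform_type: source.platform_type,
    subject_type: source.subject_type,
    subject_id: source.subject_id,
    payload: { before, after }, // assumed shape for before/after configuration
  };
}

const cadenceChangeRow = buildAuditRow(
  1,
  'reconfigure',
  { monitoring_source_id: 143, platform_type: 'birdweather_station', subject_type: 'place', subject_id: 42 },
  { cadence_minutes: 15 },
  { cadence_minutes: 5 },
);
```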
4. Instrument tiers across Earth, Life, Home, and Self
The three-tier hierarchy from v0.1 §3 generalizes cleanly across all four Macroscope domains. Each tier is defined by the relationship between the instrument and the data it reports — not by the technology behind it. A WeatherFlow Tempest is Tier 1 when it is the user's own calibrated instrument and Tier 2 when its readings reach us through Weather Underground's community aggregation; the hardware is the same, the tier reflects ownership and trust posture. The same Tempest is Earth-domain when deployed outdoors at a curated place and Home-domain when deployed indoors at a household — the hardware is the same; subject_type and resulting privacy posture differ (§6.3).
This section presents each tier with worked examples in Earth, Life, Home, and Self, then discusses multi-tier stacking with concrete cross-domain scenarios.
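The relationship-based definition of tier can be sketched as a small classifier. The three boolean fields below are hypothetical stand-ins for whatever the substrate actually records about ownership and provenance; the point is only that the same hardware can land in different tiers.

```javascript
// Hypothetical: classify tier from the instrument's relationship to its data,
// not from the hardware that produced it.
function classifyTier(instrument) {
  if (instrument.model_derived) return 3; // gridded or aggregated model output
  if (instrument.operated_by_user && !instrument.via_aggregator) return 1; // owned, direct
  return 2; // community or shared, including readings relayed by an aggregator
}

const ownTempest = { operated_by_user: true, via_aggregator: false, model_derived: false };
const strangerTempestViaWU = { operated_by_user: false, via_aggregator: true, model_derived: false };
const openMeteoGrid = { operated_by_user: false, via_aggregator: false, model_derived: true };

const tiers = [ownTempest, strangerTempestViaWU, openMeteoGrid].map(classifyTier);
```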
4.1 Tier 1 — Physical / owned
A Tier 1 instrument is owned, calibrated, and operated by the Macroscope user or their immediate research network. It produces the highest-quality observations for the subject because the operator controls placement, maintenance, calibration intervals, and data fidelity. It also imposes the highest cost: hardware purchase, deployment effort, and ongoing care.
| Domain | Worked Tier 1 examples |
|---|---|
| Earth | WeatherFlow Tempest (outdoor) at Canemah Nature Laboratory (1-min cadence, 13 atmospheric variables, free station-owner API). Ecowitt GW1200B (outdoor) at Canemah and Owl Farm (5-min cadence, outdoor temperature, humidity, soil moisture, local network push). Future: Davis Instruments Vantage Pro at a Pacific NW collaborator's site. |
| Life | Macroscope BirdWeather PUC AVR-1 / AVR-2 / AVR-3 at Canemah (continuous acoustic detection events to public.detection, BirdNET model). Macroscope BirdWeather PUCs at Owl Farm. Future: Macroscope-deployed camera traps (image detection events). Future: project-deployed eDNA sequencer outputs (DNA detection events). |
| Home | Indoor Tempest (where deployed indoors as a household sensor rather than outdoors at a research site). Indoor Ecowitt console (Owl Farm dwelling, indoor temperature/humidity, indoor air quality where the unit supports it). Airthings Wave Plus (radon, CO₂, VOCs, temperature, humidity — pending migration from MySQL). AmbientWeather indoor consoles. Future: smart-meter direct integrations for household electricity and water. |
| Self | Apple Watch (1-min HR samples, sleep stages, activity classifications via Apple HealthKit; pending migration into macroscope_v2). Withings Body+ scale (event-typed: weight, body composition per measurement). Oura ring (sleep architecture, HRV, body temperature trend). Garmin HRM-Pro (workout-bound HR + HRV during structured exercise). The personal MCP Collaboratory's `intelligence/tools/health*.php` family operationalizes this tier today against the legacy MySQL substrate; DLD provisions the same instruments into macroscope_v2 with correct subject_type and visibility (§6.8). |
Provisioning posture. The wizard prompts for credentials, device IDs, and validation steps appropriate to the platform: API key for Tempest, station MAC for Ecowitt, BirdWeather station ID for Macroscope PUCs, OAuth flow for Apple HealthKit / Strava / Withings / Oura. On successful credential validation, the instrument enters service immediately.
Coverage. Tier 1 coverage is sparse by definition — it requires hardware-and-operator presence at the subject. For places, this means Canemah, Owl Farm, and any future explicit deployment. For people, this means each individual research participant. The DLD design treats Tier 1 as the gold standard but the rare standard.
4.2 Tier 2 — Community / shared
A Tier 2 instrument is owned by a third party — a community-station hobbyist, a citizen-science project, a public agency — and reaches the Macroscope substrate through an aggregator API. The third party did not purchase the hardware for Macroscope's purposes; the data is available rather than commissioned. Quality varies, calibration is rarely guaranteed, and persistence is at the third party's discretion. In return, coverage is dramatically denser than Tier 1.
| Domain | Worked Tier 2 examples |
|---|---|
| Earth | Weather Underground PWS network — consumer weather stations from Davis, Tempest, Ambient Weather, Netatmo, Ecowitt aggregated through api.weather.com/v3/location/near. CNL-TN-2026-023 documented 9–10 stations within 7 km of ecologically diverse test sites including the James San Jacinto Mountains Reserve. |
| Life | External BirdWeather PUCs and BirdNET-Pi installations registered through the BirdWeather GraphQL API — discoverable via lat/lon bounding-box query from any place. iNaturalist project feeds (curated observation streams from a project of interest, e.g. a regional bioblitz). eBird hotspot recent observation feeds (community sightings near a place). |
| Home | PurpleAir neighborhood air quality network — consumer-owned indoor and outdoor PM2.5/PM10 sensors aggregated through PurpleAir's API; especially useful when a household-subject Airthings doesn't measure particulates. Municipal utility consumer APIs (electric, gas, water) where the utility exposes a customer-data portal with OAuth. Community noise monitoring networks where they exist. |
| Self | Strava activity feeds shared by friends or by a study cohort (community-shared workout streams). Public health datasets aggregated for a person's geographic context (CDC weekly indicators, regional flu activity). Apple HealthKit shared cohort summaries when a research study has consented participants. |
Provisioning posture. The wizard runs platform discovery against the subject (places: nearby community stations; people: connected community accounts) and presents candidates with quality indicators. For Earth-domain Weather Underground, this includes the v0.1 elevation-matching logic (CNL-TN-2026-023): stations within 100 m elevation of the place are flagged excellent, 100–300 m acceptable, >300 m cautionary. For Life-domain BirdWeather neighborhoods, distance-from-place is the primary indicator; station kind (PUC vs. BirdNET-Pi) is shown for transparency. For Self-domain shared feeds, the indicators are platform-specific (study cohort affiliation, last-active recency).
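The elevation-matching flags described above can be sketched as a small classifier. This is a minimal illustration, not the wizard's actual code: the function names and the `elevation_m` / `distance_km` dictionary keys are hypothetical, and the treatment of the exact 100 m and 300 m boundaries (counted in the better band) is an assumption the text does not settle.

```python
def elevation_match_quality(place_elev_m: float, station_elev_m: float) -> str:
    """Classify a community station by its elevation offset from the place."""
    diff = abs(station_elev_m - place_elev_m)
    if diff <= 100:      # within 100 m of the place
        return "excellent"
    if diff <= 300:      # 100 to 300 m offset
        return "acceptable"
    return "cautionary"  # more than 300 m offset


def rank_candidates(place_elev_m: float, stations: list) -> list:
    """Annotate candidate stations with a quality flag, best match first.

    Ties within a quality band are broken by distance from the place.
    """
    order = {"excellent": 0, "acceptable": 1, "cautionary": 2}
    annotated = [
        dict(s, quality=elevation_match_quality(place_elev_m, s["elevation_m"]))
        for s in stations
    ]
    return sorted(annotated, key=lambda s: (order[s["quality"]], s["distance_km"]))
```

A wizard step could call `rank_candidates` with the place's elevation and the discovery results, then render the `quality` field next to each candidate.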
Coverage. Tier 2 reach is the v0.1 surprise: in populated regions, community coverage exceeds purpose-built research infrastructure for many measurement classes. The Weather Underground PWS network's North American density, the BirdWeather + BirdNET-Pi global network's continuing growth, and the iNaturalist-eBird community-science ecosystems each instantiate this. Tier 2 is how DLD scales to subjects where Tier 1 is impractical.
4.3 Tier 3 — Model-derived / aggregated
A Tier 3 instrument is not an observation at all; it is a model output or aggregation available globally for any subject. The data is uniform in coverage, low in marginal cost, and known-imperfect in fidelity (the model has biases; the aggregation smooths over local variation). Tier 3 is the universal fallback — always available, never the only thing you want.
| Domain | Worked Tier 3 examples |
|---|---|
| Earth | Open-Meteo gridded weather (ERA5 reanalysis, ICON, GFS — ~1 km resolution, hourly cadence, no key required for basic tier; Macroscope holds Pro tier as donation). NOAA gridded products (NDFD, RTMA) for North America. ESA CCI surface temperature for global retrospective analysis. |
| Life | BirdCast nocturnal migration density forecasts (continental US, refreshed nightly during migration seasons). Map of Life predicted species ranges for a place. GBIF aggregated occurrence density grids for a taxon across a geographic window. |
| Home | Modeled indoor comfort estimates derived from outdoor weather conditions plus household envelope characteristics (insulation, HVAC type, dwelling square footage). Modeled energy consumption from utility billing patterns blended with regional weather (degree-day decomposition). Modeled indoor air quality from outdoor PurpleAir + ventilation assumptions when no indoor sensor exists. |
| Self | Apple HealthKit aggregated activity summaries (weekly steps, monthly resting HR trends; Apple's roll-ups, not the raw samples). Sleep model fits from Whoop or Oura that produce a modeled sleep architecture from raw HR / accelerometer. HRV-derived stress scores from any platform that synthesizes them. Predictive nutrition models derived from logged intake and demographic norms. |
Provisioning posture. Tier 3 is the simplest case: the wizard auto-selects the platform without discovery (it is universally available for any subject of the matching type) and proceeds directly to Step 3 configuration. Acquisition cadence and variable selection are the only meaningful user-facing decisions.
Coverage. Tier 3 is global by construction. Every place on Earth has Open-Meteo data; every person who has connected Apple HealthKit has aggregated summaries. This makes Tier 3 the right baseline for catalog-scale instrumentation: provisioning Open-Meteo across all 540+ published nexus.places is a batch script (CNL-FN-2026-054 §6.2 Track 2).
4.4 Multi-tier stacking — cross-domain examples
The architectural payoff of the tier hierarchy is not classification but stacking. A subject hosts instruments at multiple tiers simultaneously, and the divergences between them are signal.
Earth-domain stacking at Canemah. Open-Meteo (Tier 3) running hourly, plus Weather Underground neighbors (Tier 2) running hourly, plus Tempest and Ecowitt (Tier 1) running 1-min and 5-min respectively. Persistent disagreement between the Tempest reading and the Open-Meteo grid is microclimate detection — the model is predicting one valley when the actual station is in a different topographic regime. v0.1 §3.4 framed this as the mechanism for the bias characterization proposed in CNL-TN-2026-023; v2 keeps the framing.
Life-domain stacking at a Pacific NW Macroscope place. Macroscope BirdWeather PUC (Tier 1, calibrated, operator-known) plus a neighborhood BirdNET-Pi 600 m away (Tier 2, third-party-owned) plus BirdCast nocturnal migration density (Tier 3, modeled). Cross-tier signal: when BirdCast predicts elevated migration density and both acoustic stations detect a corresponding species spike, confidence that an actual migration event is occurring is higher than any single source could establish alone. When BirdCast predicts elevated density but neither station sees it, either the model is wrong for this microregion or the migration corridor avoided this site, which is itself a signal.
Home-domain stacking at Owl Farm Dwelling. Indoor Tempest (Tier 1, household-subject, owner_only visibility — Merry's instrument) plus Airthings Wave Plus (Tier 1, household-subject, alternative manufacturer measuring radon/CO₂ that the Tempest doesn't) plus modeled indoor comfort estimates from outdoor weather + dwelling envelope characteristics (Tier 3). Cross-tier signal: persistent divergence between indoor Tempest temperature and the model's prediction reveals the dwelling's thermal-mass behavior (slow to warm, slow to cool); divergence between Airthings CO₂ and outdoor PurpleAir reveals indoor occupancy and ventilation patterns. For the household members, this is information about how their dwelling actually behaves — not a research output but a personal-environment intelligence stream. The privacy posture (owner_only by default) means this analysis stays inside the household unless a member explicitly grants research access.
Self-domain stacking for a research participant. Apple Watch (Tier 1, raw HR + sleep stages) plus Oura ring (Tier 1, alternative raw measurement modality on the same person) plus Apple HealthKit aggregated summaries (Tier 3, modeled roll-ups from the same raw data). Cross-tier signal: persistent disagreement between Watch HRV and Oura HRV on the same nights is a measurement-modality artifact, interesting in itself for cross-validation studies and a flag for downstream consumers that the HRV signal is contested. Persistent agreement between Watch + Oura but divergence from Apple HealthKit's roll-up reveals the model's smoothing assumptions.
Cross-domain stacking becomes possible when the same user owns or has been granted access to instruments across multiple subjects. Mike's personal MCP Collaboratory regularly composes Earth + Life + Home + Self data in a single investigation: outdoor pollen counts (Earth) + dawn chorus species detections (Life) + indoor air quality (Home) + his own respiratory HRV pattern (Self) for a personal allergy-response inquiry. The substrate's row-level security (§6.6) ensures cross-domain composition only succeeds where the user has the right to see all the contributing instruments — Mike has god-level access to his own Self/Home and to Macroscope's Earth/Life, so the composition succeeds. A different researcher attempting the same composition without grants would get back only the Earth/Life rows, transparently. Cross-domain investigation is a personal capability when subject-ownership coincides; it is a study capability when consent grants coincide.
In every domain, cross-tier divergence is the cheapest available signal that something interesting is happening. Multi-tier stacking is the architecture for finding it. Cross-domain stacking is what makes the four-domain Macroscope larger than the sum of its parts.
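One way to make "persistent divergence" concrete is to compare two tiers on their shared timestamps and report how often the gap exceeds a tolerance. The sketch below is illustrative only: the function name, the dictionary-of-timestamps input shape, and the `threshold` / `min_fraction` parameters are assumptions, not part of the DLD design.

```python
from statistics import mean


def persistent_divergence(tier1: dict, tier3: dict, threshold: float,
                          min_fraction: float = 0.8):
    """Compare two series keyed by timestamp on their shared timestamps.

    Returns None when the series do not overlap; otherwise a summary with
    the mean signed bias (tier1 minus tier3) and the fraction of shared
    samples whose absolute difference exceeds `threshold`.
    """
    shared = sorted(set(tier1) & set(tier3))
    if not shared:
        return None
    diffs = [tier1[t] - tier3[t] for t in shared]
    exceeding = sum(1 for d in diffs if abs(d) > threshold)
    fraction = exceeding / len(diffs)
    return {
        "n": len(diffs),
        "mean_bias": mean(diffs),
        "fraction_exceeding": fraction,
        "persistent": fraction >= min_fraction,
    }
```

For the Canemah case, `tier1` would hold Tempest temperatures resampled to the hour and `tier3` the Open-Meteo grid values; a consistently negative `mean_bias` with `persistent` set would be the microclimate flag described above.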
5. Postgres-native data flow
This section maps the v0.1 MySQL data flow onto the post-migration Postgres + TimescaleDB + PostGIS substrate. Where v0.1 named MySQL tables, columns, or syntactic patterns that no longer apply, v2 names the canonical Postgres equivalents established by CNL-TN-2026-056 (Schema Design) and CNL-TN-2026-066 (Bridge Permanent Completion).
5.1 Identity — the UUID/int dual key
The v0.1 substrate placed instrument identity in two MySQL tables: sensor_platforms (in the macroscope database) for the platform-level identity and monitoring_sources (in macroscope_nexus) for the place-binding identity. The Postgres substrate consolidates these into:
- public.monitoring_source (UUID): the canonical substrate-side identity. Every reading and detection row references this UUID via monitoring_source_id. Schema:
  id uuid PRIMARY KEY DEFAULT gen_random_uuid()
  place_id uuid NOT NULL -- public.place.id
  source_type text NOT NULL -- 'birdweather_station', 'openmeteo', etc.
  external_id text -- platform-side identifier
  method_provenance_id uuid NOT NULL
  macroscope_platform_id bigint -- legacy bridge id
  name text
  deployed_at timestamptz
  retired_at timestamptz
  metadata jsonb DEFAULT '{}'::jsonb
- nexus.monitoring_sources (int): preserved as a backward-compatibility shim per CNL-TN-2026-066. Carries int IDs that the existing widget API surfaces (e.g. mw-birdweather.php?source_id=3) need; bridges to the UUID identity via bridge_nexus_monitoring_source_by_id().
DLD writes to both tables on provisioning to maintain the dual-key invariant. The shim table is the authoritative read source for legacy widget APIs; the UUID table is the authoritative source for the new substrate (Lab 06's queries, STRATA reads, Observatory pages).
The provisioning function returns both identities — the int nexus.monitoring_sources.id for legacy widget compatibility and the UUID public.monitoring_source.id for substrate queries — so the calling Lab can use whichever it needs (Lab 06 uses the UUID in bridge_detection_temporal_grid queries; the Climate Analyst's existing widget code uses the int).
5.2 Reading versus Detection — which hypertable for which platform
The Postgres substrate makes a first-class architectural distinction that v0.1 did not:
- public.reading: continuous-cadence metric observations. Temperature, humidity, wind, HR, HRV, pressure, soil moisture. Regular intervals; one row per (monitoring_source_id, time). Hypertable partitioned on time.
- public.detection: irregular-cadence typed events. Acoustic species detections, camera trap captures, eDNA hits, manual field-notebook events. Each row carries a detection_type (acoustic|image|dna|visual), a confidence, an external_observation_id, and optionally a media reference. Hypertable partitioned on time.
A platform plugin declares which it produces:
| Platform | Produces | Domain(s) | Notes |
|---|---|---|---|
| openmeteo | reading | Earth | Hourly metric rows; subject_type = 'place' only. |
| tempest_api | reading | Earth (outdoor), Home (indoor) | 1-min metric rows; subject_type = 'place' or 'household'. |
| ecowitt | reading | Earth (outdoor), Home (indoor) | 5-min metric rows; subject_type = 'place' or 'household'. |
| wunderground | reading | Earth | Hourly metric rows from a community station; subject_type = 'place' only. |
| airthings | reading | Home | Indoor air quality (radon, CO₂, VOCs, T, RH); subject_type = 'household' only. |
| ambientweather | reading | Earth (outdoor), Home (indoor) | Outdoor or indoor consoles depending on deployment. |
| purpleair | reading | Earth (community network), Home (when subject is the household and a unit is co-located) | PM2.5/PM10 community network. |
| birdweather_station | detection | Life | Acoustic events, irregular cadence, species-typed; subject_type = 'place'. |
| inat_project | detection | Life | Observation events from a curated iNaturalist project; subject_type = 'place'. |
| birdcast | reading | Life | Nightly migration density (a per-night metric). |
| apple_health | reading | Self | HR samples, HRV samples; subject_type = 'person'. |
| strava_activity | detection | Self | Workout-event rows (start/end/duration as metadata). |
| withings_scale | detection | Self | Weighing events. |
| oura_ring | reading + detection | Self | Sleep architecture (reading) plus night-event detections (sleep onset/offset). |
| utility_meter | reading | Home | Smart-meter electric/gas/water readings. |
A platform may produce both reading and detection rows — e.g. a future generation of BirdWeather PUC that also exposes ambient temperature would produce detection rows to public.detection and reading rows to public.reading from the same instrument. The plugin declares this multiplicity.
This distinction is the core difference between v0.1 (Earth-only, all platforms produced "readings") and v2 (cross-domain, where many Life, Home, and Self platforms produce typed events). It is not cosmetic: the two hypertables have different schemas, different compression profiles, different consumer query patterns (continuous aggregates work naturally on reading; species-keyed and event-keyed grouping is the dominant pattern on detection). DLD is the layer that routes a platform's output to the correct hypertable.
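The routing step can be pictured as a lookup from a plugin's declared output kinds to target hypertables. The sketch below is an assumption about shape only: `PLATFORM_PRODUCES` and `target_hypertables` are hypothetical names, and the registry copies just a few rows from the platform table in this section.

```python
# Hypothetical registry: platform_type -> kinds of rows the plugin declares.
PLATFORM_PRODUCES = {
    "openmeteo": {"reading"},
    "tempest_api": {"reading"},
    "birdweather_station": {"detection"},
    "withings_scale": {"detection"},
    "oura_ring": {"reading", "detection"},
}

# The two substrate hypertables named in this section.
HYPERTABLE_FOR_KIND = {
    "reading": "public.reading",
    "detection": "public.detection",
}


def target_hypertables(platform_type: str) -> list:
    """Return the hypertables a platform's rows should be written to."""
    try:
        kinds = PLATFORM_PRODUCES[platform_type]
    except KeyError:
        raise ValueError(f"unknown platform_type: {platform_type!r}") from None
    return sorted(HYPERTABLE_FOR_KIND[kind] for kind in kinds)
```

A platform that declares both kinds, such as `oura_ring` here, yields both table names, which is the multiplicity the plugin contract is meant to carry.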
5.3 ON CONFLICT idempotency
v0.1's INSERT IGNORE was MySQL idiom. The Postgres equivalent — and the substrate-wide convention — is ON CONFLICT DO NOTHING against the canonical primary key for each hypertable:
- public.reading: (monitoring_source_id, time). Duplicate readings are silently dropped.
- public.detection: (monitoring_source_id, time, external_observation_id). Duplicate detections (same source, same time, same upstream event id) are silently dropped.
The collector framework (§7) wraps every fetch-and-write cycle in this idempotency contract. A collector cycle that is interrupted mid-write and restarted produces no duplicates; a backfill that overlaps an already-ingested window produces no duplicates; a manual replay of a known-suspect time window produces no duplicates. The substrate stays duplicate-free under all of these retry scenarios.
The cost is occasional silent skipping of rows the collector thought were new. The platform plugin declares its idempotency key derivation; if that derivation is wrong (e.g. two genuinely different events that happen to share a key), data is lost. v2 makes the idempotency key declaration explicit in the plugin contract (§7) so this risk is auditable.
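The retry behaviour can be demonstrated end to end with an in-memory SQLite database standing in for Postgres; SQLite 3.24 and later accept the same `ON CONFLICT DO NOTHING` clause. This is a sketch under that substitution: the reduced `detection` table, the `species_key` column, and the helper names are illustrative, not the substrate schema.

```python
import sqlite3


def make_store() -> sqlite3.Connection:
    """Create an in-memory stand-in for the detection hypertable."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """
        CREATE TABLE detection (
            monitoring_source_id    TEXT NOT NULL,
            time                    TEXT NOT NULL,
            external_observation_id TEXT NOT NULL,
            species_key             TEXT,
            PRIMARY KEY (monitoring_source_id, time, external_observation_id)
        )
        """
    )
    return conn


def write_detections(conn: sqlite3.Connection, rows: list) -> int:
    """Insert rows idempotently; return how many were actually new."""
    before = conn.execute("SELECT COUNT(*) FROM detection").fetchone()[0]
    conn.executemany(
        "INSERT INTO detection "
        "(monitoring_source_id, time, external_observation_id, species_key) "
        "VALUES (?, ?, ?, ?) ON CONFLICT DO NOTHING",
        rows,
    )
    conn.commit()
    after = conn.execute("SELECT COUNT(*) FROM detection").fetchone()[0]
    return after - before
```

Replaying a batch that overlaps an already-ingested window adds only the unseen rows. The same mechanism is what makes a badly derived key dangerous: two distinct events that collide on the key would be reduced to one row without any error.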
5.4 PostGIS geom population
The Postgres substrate uses PostGIS for spatial identity. Both public.detection and public.monitoring_source carry geom columns (geometry(Point, 4326)) that DLD must populate at provisioning time for spatial platforms.
For place-bound instruments (Earth-domain weather, Life-domain acoustic), the geom is derived from nexus.places.latitude / longitude at provisioning:
INSERT INTO public.monitoring_source (..., geom)
VALUES (..., ST_SetSRID(ST_MakePoint($lon, $lat), 4326));
For person-bound instruments (Self-domain wearables), there is no spatial identity — the geom remains NULL. The platform plugin declares whether it is spatial; DLD respects the declaration.
For detection rows produced by spatial platforms, the per-detection geom is platform-specific. Acoustic detections inherit the instrument's location (the bird was in earshot of the PUC, so the PUC's geom is the right approximation). Camera-trap detections inherit similarly. A future GPS-collared-animal platform would produce per-detection geom from the device's reported coordinates, not the deployment site. The plugin's write_detection function carries this responsibility.
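A provisioning helper might build the geometry as EWKT text, which PostGIS accepts as input for a geometry column. The sketch below assumes a hypothetical helper name and an `is_spatial` flag taken from the plugin declaration; the point it illustrates is the longitude-first ordering and the NULL geom for non-spatial platforms.

```python
def monitoring_source_geom_ewkt(is_spatial: bool, lat=None, lon=None,
                                srid: int = 4326):
    """Return an EWKT point for a spatial platform, or None otherwise.

    PostGIS points are ordered (x, y), i.e. longitude first, then latitude.
    Returning None maps to a NULL geom for person-bound instruments.
    """
    if not is_spatial:
        return None
    if lat is None or lon is None:
        raise ValueError("spatial platforms require both lat and lon")
    if not -90 <= lat <= 90:
        raise ValueError(f"latitude out of range: {lat}")
    if not -180 <= lon <= 180:
        raise ValueError(f"longitude out of range: {lon}")
    return f"SRID={srid};POINT({lon} {lat})"
```

For example, `monitoring_source_geom_ewkt(True, lat=45.35, lon=-122.6)` (illustrative coordinates) produces `SRID=4326;POINT(-122.6 45.35)`, while a wearable plugin with `is_spatial=False` yields `None`.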
5.5 TimescaleDB infrastructure as DLD-aware
The Postgres substrate uses TimescaleDB hypertables with chunking on time. v0.1 predates this architectural choice. v2's DLD takes advantage of three TimescaleDB capabilities the substrate already supports — but does not yet exploit:
- Continuous aggregates: materialized views that incrementally maintain time-bucketed summaries. A cagg_detection_5min(monitoring_source_id, time_bucket('5 min', time), species_key, presence) defined at provisioning time of the first BirdWeather instrument (or already, project-wide) accelerates Lab 06's phenology queries by orders of magnitude. DLD does not create caggs directly, but the platform plugin's metadata declares which caggs should exist for its data; a project-level setup script materializes them.
- Compression policies: chunks older than a threshold (e.g. 90 days) are columnar-compressed. Phenology queries scan large time ranges and benefit materially from compressed reads. DLD respects whatever project-level policy is in force; the platform plugin can declare an optimal compression-after threshold for its data class.
- Retention policies: chunks older than a longer threshold (e.g. 5 years) can be dropped. v2 takes no position on retention but exposes a per-platform retention recommendation in the plugin contract; the operator decides whether to enable it.
The pattern here: TimescaleDB infrastructure is a project-level concern that DLD informs but does not own. The plugin contract carries platform-specific recommendations; project setup scripts materialize them.
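As a sketch of how a project setup script might turn plugin recommendations into policy calls, the helper below renders TimescaleDB's `add_compression_policy` and `add_retention_policy` statements as text. The helper name and its parameters are hypothetical, and it assumes compression has already been enabled on the hypertable; nothing here is part of the plugin contract as specified.

```python
def policy_statements(hypertable: str, compress_after_days=None,
                      drop_after_days=None) -> list:
    """Render TimescaleDB policy calls from per-platform recommendations.

    Either recommendation may be None, meaning "no policy requested".
    """
    if (compress_after_days is not None and drop_after_days is not None
            and drop_after_days <= compress_after_days):
        raise ValueError("retention threshold must exceed compression threshold")
    statements = []
    if compress_after_days is not None:
        statements.append(
            f"SELECT add_compression_policy('{hypertable}', "
            f"compress_after => INTERVAL '{compress_after_days} days');"
        )
    if drop_after_days is not None:
        statements.append(
            f"SELECT add_retention_policy('{hypertable}', "
            f"drop_after => INTERVAL '{drop_after_days} days');"
        )
    return statements
```

With the example thresholds from the list above, `policy_statements("public.reading", 90, 1825)` yields one compression call and one retention call; passing `None` for `drop_after_days` leaves retention to the operator.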
6. Privacy, ownership, and access
The four Macroscope domains divide along a privacy axis as cleanly as they divide along a topical one. Earth and Life data is about places — natural areas, ecosystems, monitoring sites — and is shareable by default within an authenticated research community. Home and Self data is about people and dwellings — bodies, households, daily lives — and is private by default to the data subject. The correlation is not coincidence: it tracks the subject of the instrument. v2 makes this alignment explicit and load-bearing in the substrate, the wizard, and the consumer query path.
This section specifies the access model. It is the first revision of TN-052 that addresses privacy as a first-class architectural concern; v0.1 had no Self or Home domain to consider, and the prior revisions of TN-052 v2 in this drafting session treated the four domains as a topical taxonomy without recognizing the privacy axis underneath it.
6.1 The privacy/domain alignment
The substrate hosts four domains. Their data subjects fall into three categories, and the categories carry their own access posture:
| Domain | Typical subject | Default visibility | Rationale |
|---|---|---|---|
| Earth (climate, weather, environment) | A curated place | Public-within-Macroscope | Natural-area data is research infrastructure; dense access enables cross-validation, comparative studies, planetary-science context |
| Life (biodiversity, taxonomy, ecology) | A curated place | Public-within-Macroscope | Same — community-science data benefits from broad researcher access; the species at a place are not anyone's private information |
| Home (built environment, household sensors) | A household / dwelling | Owner-only | Indoor air quality, energy use, occupancy — these reveal household routines and members' lives; they belong to the household |
| Self (personal health, behavior) | An individual person | Owner-only | HRV, sleep, weight, activity — these are the person's body data; they belong to the person |
This alignment is exact: every Earth or Life instrument has a place as its subject; every Home instrument has a household; every Self instrument has a person. The subject type is therefore a sufficient privacy classifier — DLD does not need a separate "domain" field on the instrument to determine default visibility.
6.2 The five visibility tiers
Every instrument carries a visibility value drawn from a closed enum. Four stored tiers, ordered from most open to most restricted, plus an implicit superuser bypass that serves as the fifth:
- public_within_macroscope: any user authenticated to the MNG-2.1 platform can read this instrument's data. Default for subject_type = 'place'. The natural posture for Earth/Life infrastructure that the project chooses to share with its research community.
- org_internal: restricted to users belonging to a named organization (a research lab, a stewardship team, a specific institution). Visibility metadata names the organization. Useful for unpublished study data, in-progress instrument calibration, organizational infrastructure not yet ready for community-wide release.
- study_cohort: restricted to users participating in a named research study, plus the study's investigators. Visibility metadata names the study. Time-bounded by the study's lifetime. Useful for personal-health research where participants opt their instruments into a specific study for a specific duration.
- owner_only: restricted to the instrument's owner (and superusers, and explicit grantees per §6.5). Default for subject_type IN ('person', 'household'). The natural posture for Self/Home data.
- superuser (implicit): Mike's god-level access. Not a stored visibility tier; an attribute of the requesting user that bypasses visibility filtering across all rows. Load-bearing for development, debugging, substrate operation, migration, and operational support. Implemented by checking nexus.users.role IN ('superuser', 'admin') ahead of all visibility predicates.
The wizard exposes the first four tiers in Step 3 (Instrument Configuration). The default is computed from subject_type and the user can override only if their role permits it (a non-admin researcher provisioning a Self instrument cannot set its visibility to public_within_macroscope; an admin provisioning a Macroscope-managed place instrument can choose any of the first four).
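The default-plus-override rule can be sketched as follows. The permission matrix is only partly specified in the text, so this sketch makes an explicit assumption: admins may choose any stored tier, and non-admins may choose any stored tier except widening an instrument to public_within_macroscope when that is not already its default. The names `DEFAULT_VISIBILITY` and `resolve_visibility` are hypothetical.

```python
STORED_TIERS = (
    "public_within_macroscope",
    "org_internal",
    "study_cohort",
    "owner_only",
)

DEFAULT_VISIBILITY = {
    "place": "public_within_macroscope",
    "person": "owner_only",
    "household": "owner_only",
}


def resolve_visibility(subject_type: str, requested=None,
                       is_admin: bool = False) -> str:
    """Return the visibility to store for a newly provisioned instrument."""
    default = DEFAULT_VISIBILITY[subject_type]
    if requested is None or requested == default:
        return default
    if requested not in STORED_TIERS:
        raise ValueError(f"unknown visibility tier: {requested!r}")
    if is_admin:
        return requested
    # Assumption: a non-admin cannot widen an instrument to the public tier.
    if requested == "public_within_macroscope":
        raise PermissionError("role does not permit public visibility")
    return requested
```

Under this reading, a participant can move their own wearable from owner_only to study_cohort, but an attempt to publish it project-wide is rejected before anything is written.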
6.3 subject_type as the primary privacy classifier
subject_type was introduced in §3.2 as a routing field for the cross-Lab call API. v2 elevates it to the primary privacy classifier. The closed enum:
| subject_type | Subject table | Default visibility | Default owner |
|---|---|---|---|
| 'place' | nexus.places | public_within_macroscope | Project user (typically the admin who provisioned) |
| 'person' | nexus.users (the person whose body is monitored) | owner_only | The person themselves |
| 'household' | nexus.households (new; see §6.4) | owner_only | A designated household head, with member grants |
Cross-domain platforms (a Tempest indoors at Owl Farm vs. outdoors at Canemah) carry different subject_types in the two deployments. The platform plugin doesn't change; the deployment row in public.monitoring_source carries the subject_type that determines its access posture.
A platform plugin declares the subject_types it accepts. Open-Meteo accepts 'place' only. A consumer wearable like Apple Watch accepts 'person' only. An indoor air-quality monitor like Airthings accepts 'household' (its primary intended use) and may also accept 'place' (when deployed at a research site for environmental monitoring of an indoor habitat). The plugin's accepts_subject_types declaration in §7 is what DLD uses to filter platforms in Step 2 of the wizard.
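The Step 2 filter amounts to a membership test against each plugin's declaration. In the sketch below the registry dictionary and the function name are hypothetical; the entries follow the subject types stated for these four platforms in §5.2.

```python
# Hypothetical registry: platform name -> declared accepts_subject_types.
PLUGIN_ACCEPTS = {
    "openmeteo": {"place"},
    "apple_health": {"person"},
    "tempest_api": {"place", "household"},
    "ecowitt": {"place", "household"},
}


def platforms_for_subject(subject_type: str, registry=PLUGIN_ACCEPTS) -> list:
    """List the platforms the wizard should offer for a given subject type."""
    return sorted(
        name for name, accepted in registry.items() if subject_type in accepted
    )
```

A household subject would be offered `ecowitt` and `tempest_api`; a person subject would be offered only `apple_health`.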
6.4 Substrate schema additions
One existing table gains columns (followed by a backfill of its existing rows), and two new tables are added.
public.monitoring_source — add ownership and visibility:
ALTER TABLE public.monitoring_source
-- owner_user_id, visibility, and subject_type are added nullable: Postgres rejects
-- ADD COLUMN ... NOT NULL without a default on a table that already has rows.
-- Tighten them to NOT NULL once existing rows are backfilled.
ADD COLUMN owner_user_id integer REFERENCES nexus.users(id),
ADD COLUMN visibility text CHECK (visibility IN
('public_within_macroscope', 'org_internal', 'study_cohort', 'owner_only')),
ADD COLUMN visibility_metadata jsonb DEFAULT '{}'::jsonb NOT NULL,
ADD COLUMN subject_type text CHECK (subject_type IN
('place', 'person', 'household')),
ADD COLUMN subject_id_int integer; -- for place/person/household refs
CREATE INDEX idx_monitoring_source_owner ON public.monitoring_source (owner_user_id);
CREATE INDEX idx_monitoring_source_subject ON public.monitoring_source (subject_type, subject_id_int);
subject_id_int is the int reference to the appropriate subject table (resolved by subject_type). The existing place_id uuid column is preserved for backward compatibility but new instruments use subject_id_int exclusively; the migration in §6.8 backfills both for existing rows.
public.monitoring_source — backfill on existing rows:
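-- Every pre-existing row is a place-subject Earth/Life instrument
-- under the project user's ownership, public-within-Macroscope visibility.
-- nexus.place_id_map is a placeholder name for the UUID-to-int mapping;
-- a uuid cannot be cast to integer directly.
UPDATE public.monitoring_source AS ms
SET owner_user_id = (SELECT id FROM nexus.users WHERE username = 'mhamilton'),
visibility = 'public_within_macroscope',
subject_type = 'place',
subject_id_int = (SELECT m.place_int_id FROM nexus.place_id_map m WHERE m.place_uuid = ms.place_id)
WHERE ms.owner_user_id IS NULL;
-- With every row populated, enforce the constraints.
ALTER TABLE public.monitoring_source
ALTER COLUMN owner_user_id SET NOT NULL,
ALTER COLUMN visibility SET NOT NULL,
ALTER COLUMN subject_type SET NOT NULL;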
The exact backfill query depends on how the existing UUID place_id maps to the int nexus.places.id — likely via a dictionary table populated during the bridge migration. The point is: pre-existing instruments get the correct posture without manual intervention.
nexus.households — new table:
CREATE TABLE nexus.households (
id SERIAL PRIMARY KEY,
name VARCHAR(255) NOT NULL, -- e.g. "Owl Farm Dwelling"
slug VARCHAR(100) NOT NULL UNIQUE,
address_obfuscated TEXT, -- city/state level only; no street address in substrate
primary_owner_user_id INTEGER NOT NULL REFERENCES nexus.users(id),
place_id INTEGER REFERENCES nexus.places(id), -- optional link to the curated place hosting this household
created_at TIMESTAMPTZ NOT NULL DEFAULT now()
);
Households are first-class but light: just enough identity to serve as a subject for Home-domain instruments. Member grants (housemates, family, study collaborators) live in the grants table (§6.5) rather than in the household row. Future revisions may add a nexus.household_members join table; v2 keeps it minimal.
nexus.organizations — add a users membership table:
CREATE TABLE nexus.organization_members (
id SERIAL PRIMARY KEY,
organization_id INTEGER NOT NULL REFERENCES nexus.organizations(id),
user_id INTEGER NOT NULL REFERENCES nexus.users(id),
role TEXT NOT NULL DEFAULT 'member',
granted_by INTEGER REFERENCES nexus.users(id),
granted_at TIMESTAMPTZ NOT NULL DEFAULT now(),
UNIQUE (organization_id, user_id)
);
Required for visibility = 'org_internal' enforcement: the row-level security policy (§6.6) must be able to test "is this user a member of the organization referenced in visibility_metadata->>'organization_id'?"
6.5 The grants table — cross-tier sharing
Default visibility tiers cover the common cases. The grants table covers the exceptions:
CREATE TABLE nexus.monitoring_source_grants (
id BIGSERIAL PRIMARY KEY,
monitoring_source_id uuid NOT NULL REFERENCES public.monitoring_source(id),
grantee_user_id integer NOT NULL REFERENCES nexus.users(id),
grant_type text NOT NULL CHECK (grant_type IN
('explicit', 'study_cohort', 'household_member')),
grant_metadata jsonb DEFAULT '{}'::jsonb NOT NULL, -- e.g. {"study_id": 42}
granted_by_user_id integer NOT NULL REFERENCES nexus.users(id),
granted_at TIMESTAMPTZ NOT NULL DEFAULT now(),
expires_at TIMESTAMPTZ, -- NULL = no expiration
revoked_at TIMESTAMPTZ,
revoked_by_user_id integer REFERENCES nexus.users(id),
notes text,
UNIQUE (monitoring_source_id, grantee_user_id, grant_type)
);
CREATE INDEX idx_msg_grantee ON nexus.monitoring_source_grants (grantee_user_id) WHERE revoked_at IS NULL;
CREATE INDEX idx_msg_expires ON nexus.monitoring_source_grants (expires_at) WHERE expires_at IS NOT NULL AND revoked_at IS NULL;
Three grant types:
- explicit: a one-off grant by the owner to a specific user. Mike grants his cardiologist read access to his HRV stream for three months. Time-bounded (expires_at), revocable.
- study_cohort: a participant opts an instrument into a named study. The study's investigators inherit access for the study's duration. A research participant adds their Apple Watch to "Sleep & Pollen Exposure 2026" by setting visibility = 'study_cohort' plus a grant of type study_cohort linking their user_id to the study.
- household_member: a household head grants a co-resident access to a household instrument. Mike grants Merry access to Owl Farm's smart-meter readings as a household_member. Long-lived; usually revoked only when membership changes.
The expires_at column provides cheap time-bounding. The revoked_at column provides cheap revocation without losing the audit history of when access was granted in the first place.
A note on retroactive access. When a grant is created, does the grantee gain access to data from before the grant existed? v2's default is yes — the grant covers all data from the instrument, past and future. Studies can opt for a different policy by including {"data_window_start": "2026-01-01"} in grant_metadata and relying on the row-level security policy to enforce the window. This is documented but not enforced by default; the substrate and the row-level security policy do the right thing for the common case (full access) without per-row policy logic for the exceptional case.
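The expiry, revocation, and optional data-window rules can be modeled as two small predicates. This is a sketch of the intended semantics rather than the row-level security policy itself: grants are plain dictionaries shaped like rows of nexus.monitoring_source_grants, the function names are hypothetical, and `data_window_start` is assumed to be a date interpreted as midnight UTC.

```python
from datetime import datetime, timezone


def grant_is_active(grant: dict, now: datetime) -> bool:
    """A grant is active if it is not revoked and has not yet expired."""
    if grant.get("revoked_at") is not None:
        return False
    expires_at = grant.get("expires_at")
    return expires_at is None or expires_at > now


def grant_covers_row(grant: dict, row_time: datetime, now: datetime) -> bool:
    """Apply the default (all data) or an opt-in data_window_start."""
    if not grant_is_active(grant, now):
        return False
    window_start = grant.get("grant_metadata", {}).get("data_window_start")
    if window_start is None:
        return True  # default policy: past and future data are both covered
    start = datetime.fromisoformat(window_start).replace(tzinfo=timezone.utc)
    return row_time >= start
```

A grant with no `data_window_start` covers every row of the instrument while it is active; adding the key narrows coverage to rows at or after that date without changing the expiry or revocation logic.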
6.6 Row-level security as the enforcement layer
Application-layer access checks are fragile: every new query path has to remember to filter, and every forgotten filter is a leak. The Postgres-native answer is row-level security policies on the substrate hypertables themselves — once enabled, no query (regardless of which Lab issued it, regardless of which bridge primitive ran it, regardless of which API endpoint composed it) can return rows the requesting user shouldn't see.
DLD requires three policy patterns to be in place. The exact SQL is verbose; what follows sketches the shape.
On public.monitoring_source (catalog table):
ALTER TABLE public.monitoring_source ENABLE ROW LEVEL SECURITY;
CREATE POLICY monitoring_source_visible_to_user ON public.monitoring_source
  FOR SELECT
  USING (
    -- superuser bypass
    (SELECT role FROM nexus.users WHERE id = current_setting('macroscope.current_user_id')::int)
      IN ('superuser', 'admin')
    OR
    -- public-within-Macroscope: any authenticated user
    (visibility = 'public_within_macroscope')
    OR
    -- org_internal: user must be a member of the named organization
    (visibility = 'org_internal' AND EXISTS (
      SELECT 1 FROM nexus.organization_members om
      WHERE om.user_id = current_setting('macroscope.current_user_id')::int
        AND om.organization_id = (visibility_metadata->>'organization_id')::int
    ))
    OR
    -- study_cohort: user must have a grant of type study_cohort for this source
    (visibility = 'study_cohort' AND EXISTS (
      SELECT 1 FROM nexus.monitoring_source_grants g
      WHERE g.monitoring_source_id = monitoring_source.id
        AND g.grantee_user_id = current_setting('macroscope.current_user_id')::int
        AND g.grant_type = 'study_cohort'
        AND g.revoked_at IS NULL
        AND (g.expires_at IS NULL OR g.expires_at > now())
    ))
    OR
    -- owner_only: user must be the owner OR have an explicit/household_member grant
    (visibility = 'owner_only' AND (
      owner_user_id = current_setting('macroscope.current_user_id')::int
      OR EXISTS (
        SELECT 1 FROM nexus.monitoring_source_grants g
        WHERE g.monitoring_source_id = monitoring_source.id
          AND g.grantee_user_id = current_setting('macroscope.current_user_id')::int
          AND g.grant_type IN ('explicit', 'household_member')
          AND g.revoked_at IS NULL
          AND (g.expires_at IS NULL OR g.expires_at > now())
      )
    ))
  );
On public.reading and public.detection (data hypertables): the policy delegates to the catalog policy:
ALTER TABLE public.reading ENABLE ROW LEVEL SECURITY;
CREATE POLICY reading_visible_via_source ON public.reading
  FOR SELECT
  USING (
    EXISTS (SELECT 1 FROM public.monitoring_source ms
            WHERE ms.id = reading.monitoring_source_id)
  );
The monitoring_source policy enforces visibility; the reading and detection policies inherit it via the EXISTS subquery. A query against public.reading returns only rows whose source the user can see, transparently. Lab 06 doesn't need to filter; the bridge primitives don't need to filter; the consumer Labs don't need to filter. The substrate is the enforcement layer.
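For unit-testing consumer expectations without a database, the catalog policy's OR-chain can be mirrored in plain Python. The sketch below is a hypothetical reference model: `Source`, `Viewer`, and `can_see` are illustrative names, grants are assumed to be pre-filtered to unrevoked and unexpired ones, and user id 7 in the usage note is made up. The substrate policy remains the actual enforcement layer.

```python
from dataclasses import dataclass, field
from typing import Optional, Set, Tuple


@dataclass
class Source:
    id: str
    visibility: str
    owner_user_id: int
    organization_id: Optional[int] = None  # from visibility_metadata


@dataclass
class Viewer:
    user_id: int
    role: str = "user"
    org_ids: Set[int] = field(default_factory=set)
    # (monitoring_source_id, grant_type) pairs that are unrevoked and unexpired
    active_grants: Set[Tuple[str, str]] = field(default_factory=set)


def can_see(viewer: Viewer, src: Source) -> bool:
    """Evaluate the same branches as the monitoring_source SELECT policy."""
    if viewer.role in ("superuser", "admin"):
        return True
    if src.visibility == "public_within_macroscope":
        return True
    if src.visibility == "org_internal":
        return src.organization_id in viewer.org_ids
    if src.visibility == "study_cohort":
        return (src.id, "study_cohort") in viewer.active_grants
    if src.visibility == "owner_only":
        if src.owner_user_id == viewer.user_id:
            return True
        return any((src.id, t) in viewer.active_grants
                   for t in ("explicit", "household_member"))
    return False  # unknown tier: deny by default
```

A researcher (`Viewer(user_id=7)`) sees a `public_within_macroscope` source but not an `owner_only` one until an `explicit` grant is added. The model also makes one property of the policy as written easy to spot: the owner clause appears only under `owner_only`, so an owner who switches a source to `study_cohort` would need a matching grant of their own to keep seeing it.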
The current_setting('macroscope.current_user_id') pattern. The application sets a Postgres session variable at the start of every request, identifying the requesting user, and the RLS policies read it. Because the third argument to set_config is false, the value lasts for the whole session rather than a single transaction; concurrent requests on separate connections therefore do not interfere, but a connection that is reused or pooled across users must set the variable again before it runs any query. The application has to authenticate the user before setting the variable, which auth_check.php already does. Its job is to call SELECT set_config('macroscope.current_user_id', '$user_id', false) once per session, with $user_id passed as a bound or validated integer rather than raw input; the substrate handles the rest.
6.7 DLD's two wizard flavors
Provisioning an Earth/Life instrument and provisioning a Self/Home instrument differ in who is doing the provisioning and for whose data. DLD's modal UX accommodates both with the same plumbing but different framings:
Admin-facing flavor (subject_type = 'place'):
- Headline: "Provision a new instrument"
- Subject panel: shows the place with map, ecological context, existing instruments
- Tone: institutional, technical
- Default owner: the project user (Macroscope's admin account)
- Default visibility: public_within_macroscope
- Visibility selector: admin can choose any of the four stored tiers
- Trust posture: admin is acting on behalf of the project; data is research infrastructure
Participant-facing flavor (subject_type IN ('person', 'household')):
- Headline: "Connect a new device" (Self) or "Add a household sensor" (Home)
- Subject panel: shows the participant's name and a brief description of what the data will be used for; a household subject panel shows the dwelling name and the participant's role (head / member)
- Tone: personal, consent-aware
- Default owner: the participant themselves (their nexus.users.id)
- Default visibility: owner_only
- Visibility selector: participant can choose owner_only (default) or study_cohort (if they're in a study); admin/researcher cannot raise visibility above what the participant chooses
- Trust posture: the participant is the data subject; their consent is the substrate write's authority
The plumbing — substrate writes, collector registration, reading-vs-detection routing, idempotency — is identical. The framing differs because the meaning of the action differs. Admin-facing provisioning is project infrastructure. Participant-facing provisioning is informed consent.
A non-admin researcher cannot provision a participant's instrument. A participant can provision their own. Mike (superuser) can do either. The wizard's permission model gates which flavor is reachable based on the requesting user's role and the chosen subject_type + subject_id.
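The gating rule can be written as a small decision function. This is a sketch under stated assumptions: the function name, the role strings, and the `subject_owner_id` argument (the person, or the household head, that the subject belongs to) are hypothetical. Two choices go beyond what the text specifies and are assumptions: place provisioning requires an admin or superuser role, and a plain admin who is not the subject owner cannot reach the participant flavor.

```python
from typing import Optional

ADMIN_ROLES = {"superuser", "admin"}


def reachable_flavor(role: str, user_id: int, subject_type: str,
                     subject_owner_id: Optional[int]) -> Optional[str]:
    """Return 'admin', 'participant', or None if the wizard is not reachable.

    subject_owner_id is None for place subjects.
    """
    if subject_type == "place":
        # Assumption: only admin-level roles provision project infrastructure.
        return "admin" if role in ADMIN_ROLES else None
    if subject_type in ("person", "household"):
        # The data subject provisions their own instruments; superuser can do either.
        if user_id == subject_owner_id or role == "superuser":
            return "participant"
        return None
    raise ValueError(f"unknown subject_type: {subject_type!r}")
```

Returning `None` rather than raising for the "not reachable" case lets the UI hide the entry point instead of surfacing an error.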
6.8 DLD as the consent boundary for the MySQL → macroscope_v2 migration
The pre-existing Owl Farm Home-domain streams currently in MySQL — and any other personal-data streams that exist in the legacy substrate — should land in macroscope_v2 with the correct subject_type, owner, and visibility from day one. The wrong default is "import as place-subject, public-within-Macroscope, owned by the project" because it would silently expose private data to every authenticated user until someone remembered to revisit each row.
DLD is the natural place for the migration to do the right thing. The proposed pattern:
- The bridge collector's MySQL → Postgres replication path (CNL-TN-2026-066) detects pre-existing rows in the legacy monitoring_sources table that lack a subject_type assignment.
- Rather than importing them with defaults, the bridge holds them in a quarantine table (nexus.migration_pending_classification) with the legacy metadata.
- An admin (Mike) walks each quarantined row through a streamlined DLD migration wizard: confirm subject_type, confirm owner, confirm visibility, confirm any grants. The DLD wizard surfaces the legacy metadata to inform each decision but does not auto-fill subject_type for rows it cannot be confident about.
- On confirm, DLD writes the row to public.monitoring_source with the correct posture, marks the quarantine entry as resolved, and lets the bridge resume normal replication for the source's data.
This makes the migration's privacy correctness a deliberate human-in-the-loop step rather than an implicit consequence of import defaults. Mike sees every row before it lands in the unified substrate. Wrong classifications are caught at the boundary, not after they've leaked.
For Owl Farm's Home streams specifically: the indoor Tempest, indoor Ecowitt, and any Airthings or AmbientWeather indoor instruments would each pass through the migration wizard, get classified as subject_type = 'household', get owner-assigned to Merry's nexus.users row (after Merry is provisioned as a user), and get visibility = owner_only. Mike's superuser bypass means he sees the data; nobody else does until grants are created.
6.9 Worked example — Owl Farm as a multi-posture site
Owl Farm is the canonical worked example because it hosts all three subject_types simultaneously. Following migration:
nexus.places.id = 12 — "Owl Farm" (curated place)
nexus.households.id = 1 — "Owl Farm Dwelling"
primary_owner_user_id → nexus.users (Merry)
place_id → nexus.places.id = 12
nexus.users.id = 2 — Merry
nexus.users.id = 1 — Mike (superuser)
Six instruments across two subject_types (place, household) and two visibilities (public_within_macroscope, owner_only):
| Instrument | platform_type | subject_type | subject_id | owner_user_id | visibility | Who can see |
|---|---|---|---|---|---|---|
| AVR-1 (BirdWeather PUC) | birdweather_station | place | 12 | 1 (Mike, project user) | public_within_macroscope | Any authenticated user |
| AVR-2 (BirdWeather PUC) | birdweather_station | place | 12 | 1 | public_within_macroscope | Any authenticated user |
| AVR-3 (BirdWeather PUC) | birdweather_station | place | 12 | 1 | public_within_macroscope | Any authenticated user |
| Outdoor Tempest | tempest_api | place | 12 | 1 | public_within_macroscope | Any authenticated user |
| Indoor Ecowitt | ecowitt | household | 1 | 2 (Merry) | owner_only | Merry; Mike (superuser) |
| Airthings (radon, CO₂) | airthings | household | 1 | 2 (Merry) | owner_only | Merry; Mike (superuser) |
A researcher logging in to MNG-2.1 and opening Lab 06 sees the three BirdWeather PUCs at Owl Farm in the place neighborhood, can include them in phenology analyses, gets a clean answer. The same researcher running a query that would combine the indoor Ecowitt's temperature with the BirdWeather acoustic data — a microclimate-vs-acoustic-activity study, say — gets BirdWeather data only; the Ecowitt rows are filtered out by the substrate before they reach the consumer query. No leak. No special-case code in Lab 06. The bridge primitive that ran the join silently respected RLS without knowing it was doing so.
Merry, logged in to MNG-2.1, can see her household instruments alongside the place instruments — her query returns all six. If she wants to share her Ecowitt with a building-science researcher for a month, she creates a grant: INSERT INTO nexus.monitoring_source_grants (monitoring_source_id, grantee_user_id, grant_type, expires_at, ...). The researcher's queries now return her Ecowitt data until the grant expires.
Mike, as superuser, sees everything. His personal MCP Collaboratory (running as Mike's user) has full access to the substrate including Merry's household data — a posture that is acceptable because the personal Collaboratory is Mike's own cognitive workspace and the household is a co-resident relationship. If at any point Merry wishes to revoke Mike's access to her data specifically, the existing grant model accommodates it (Mike's superuser bypass is itself revocable in the substrate's role assignments).
This is the architecture. One substrate, one DLD, one access model, three subject_types, three visibility tiers in active use. The privacy posture is built into the data, not bolted onto the consumers.
6.10 What this revision does not yet do
Three known gaps that v2 does not close, recorded for future revisions:
- Per-row grants. All grants in v2 are at the instrument level. A future revision might support per-row time-window grants (a researcher gets access to a participant's HRV data only for the dates of a sleep study, not the full history). This is enforceable through the row-level security policy by extending grant metadata; not specified here.
- Audit logging on data reads. Provisioning actions are audited (§3.4). Data reads are not. For some research and clinical contexts, read auditing is required; the substrate could log via Postgres triggers but v2 does not specify the schema.
- Consent revocation propagation. When a participant revokes a study grant, downstream artifacts (cached aggregations, exported figures, AI-generated narratives) may still contain the data. v2 specifies revocation at the substrate level; downstream cleanup is left to the consumer applications.
These gaps are recorded in §11 (open questions).
7. The collector
The collector is the operational tier that turns DLD's substrate writes into actual data flowing into public.reading and public.detection. v0.1 specified a single launchd PHP collector against MySQL on Galatea. v2 preserves the single-collector pattern but adapts every operational detail to the post-migration Postgres world, the cross-domain platform population, the privacy-aware substrate, and the still-deferred Phase 2 runtime decision.
7.1 Single collector vs. per-domain collectors
The naive scaling answer is "one collector per platform type": an Open-Meteo daemon, a BirdWeather daemon, an Apple HealthKit daemon, and so on. At three platforms this is manageable; at fifteen it becomes operational debt. v0.1 §4.1 argued for a single collector that iterates over all active platforms and dispatches to platform-specific fetch functions. v2 keeps that argument: one collector process, one launchd registration, one log directory, one set of monitoring dashboards, one place to look when something is wrong.
The dispatch loop, expressed against the v2 substrate:
SELECT ms.id          AS monitoring_source_uuid,
       ms.source_type AS platform_type,
       ms.subject_type,
       ms.subject_id_int,
       ms.metadata    AS instrument_config,
       COALESCE(pl.id, hh.id, us.id) AS subject_internal_id,  -- whichever subject join matched
       COALESCE(pl.lat, hpl.lat)     AS lat,                  -- assumes lat/lon live on nexus.places
       COALESCE(pl.lon, hpl.lon)     AS lon                   -- NULL for person subjects
FROM public.monitoring_source ms
LEFT JOIN nexus.places     pl  ON ms.subject_type = 'place'     AND pl.id = ms.subject_id_int
LEFT JOIN nexus.households hh  ON ms.subject_type = 'household' AND hh.id = ms.subject_id_int
LEFT JOIN nexus.places     hpl ON hpl.id = hh.place_id          -- a household is located via its place
LEFT JOIN nexus.users      us  ON ms.subject_type = 'person'    AND us.id = ms.subject_id_int
WHERE ms.status = 'active'
  AND ms.source_type IN (SELECT platform_type FROM public.platform_registry WHERE active = true);
For each row the collector dispatches to the platform plugin's fetch_and_write(monitoring_source_uuid, instrument_config, subject_context) function, which performs the platform-specific work and writes results to the appropriate hypertable. Plugin discovery is filesystem-based (one directory per platform; the plugin's manifest declares its platform_type); the platform_registry table is the on/off switch the operator uses to disable a platform without removing its plugin code.
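A minimal sketch of the dispatch step, assuming Python as the runtime (which §7.2 leaves open). `run_cycle`, the `plugins` dictionary, and the shape of `subject_context` are hypothetical; rows are assumed to arrive as dictionaries keyed by the dispatch query's column aliases.

```python
from typing import Callable, Dict, Iterable, List

# Hypothetical plugin entry point:
# (monitoring_source_uuid, instrument_config, subject_context) -> rows written
FetchFn = Callable[[str, dict, dict], int]


def run_cycle(rows: Iterable[dict], plugins: Dict[str, FetchFn]) -> List[dict]:
    """Dispatch each active instrument row to its platform plugin."""
    results = []
    for row in rows:
        fetch = plugins.get(row["platform_type"])
        if fetch is None:
            # Registry says active, but no plugin directory was discovered.
            results.append({"id": row["monitoring_source_uuid"], "skipped": True})
            continue
        subject_context = {"subject_type": row["subject_type"],
                           "lat": row.get("lat"), "lon": row.get("lon")}
        written = fetch(row["monitoring_source_uuid"],
                        row["instrument_config"], subject_context)
        results.append({"id": row["monitoring_source_uuid"],
                        "skipped": False, "rows_written": written})
    return results
```

Keeping the plugin lookup as a plain dictionary keyed by `platform_type` means that adding a platform changes the dictionary's contents, not the loop.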
7.2 Runtime decision (Phase 2)
v0.1 specified PHP plus launchd because the legacy MySQL substrate and the Macroscope worker idiom were both PHP. v2 does not re-affirm that choice. CNL-FN-2026-054 §6.3 explicitly defers the Phase 2 direct-collector runtime decision to be made on the merits of vendor API library ecosystems (BirdWeather has a Python GraphQL client; Apple HealthKit has a Swift SDK that's awkward to call from anything; Open-Meteo has lightweight HTTP clients in every language). The Macroscope project has Python operational on Data and Galatea (used by the olmOCR-2 worker, the bridge collector recon scripts, and the lookup-cache cutover scripts). The Phase 2 decision is most likely Python; v2 will not pre-commit.
What v2 does commit to: whatever runtime is chosen, it executes a single collector process on a single host (Data during development, Galatea in production), uses launchd for lifecycle management on macOS, exposes a structured log to /var/log/macroscope/dld_collector/, and surfaces health metrics via Postgres rows (a public.collector_cycle_log table written at the end of each cycle with rows touched, errors, duration). A future revision picks the runtime when a real platform that needs it lands in scope.
7.3 Acquisition cadence and backfill
This is where the rate of data acquisition surfaces architecturally. Every instrument carries a cadence_minutes and a backfill_window_days value in its instrument_config metadata, set at provisioning time in DLD Step 3 and editable later through DLD's instrument-management UI. The collector consults these per row:
- cadence_minutes: the inter-fetch interval. If the last successful fetch for an instrument was less than cadence_minutes ago, skip it this cycle. Defaults come from the platform plugin (Open-Meteo 60, Tempest 5, BirdWeather 15) but are explicitly user-editable to accommodate research needs (a study might want sub-minute Tempest readings for a thermal-mass investigation; a casual Earth observatory might be content with hourly Tempest to reduce traffic).
- backfill_window_days: on first activation of an instrument or after extended downtime, how far back to attempt historical ingest. Bounded by the platform's declared max_backfill_days (BirdWeather GraphQL has tier-dependent retention; Apple HealthKit allows years; Open-Meteo allows decades).
- last_successful_fetch_at: collector-managed timestamp on public.monitoring_source, advanced after each successful cycle. Drives the cadence check above and serves as the resume point after offline periods.
The cadence is consulted on a base loop frequency that's faster than any individual instrument's cadence (default: collector wakes every 60 seconds). At each wake, the collector queries for instruments whose last_successful_fetch_at + cadence_minutes <= now() and processes them. An instrument with a 5-minute cadence is touched roughly every 5 minutes; an instrument with an hourly cadence is touched roughly every hour. The collector's per-cycle work is bounded; it does not try to do all instruments on every wake.
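The due check is a single comparison. The sketch below assumes Python and treats a never-fetched instrument as immediately due; `is_due` and `touches_in` are illustrative names, and the simulation helper exists only to show the "roughly every N minutes" behaviour on a 60-second wake loop.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def is_due(last_successful_fetch_at: Optional[datetime],
           cadence_minutes: int, now: datetime) -> bool:
    """True when the instrument should be fetched on this wake."""
    if last_successful_fetch_at is None:
        return True  # never fetched: due immediately (assumption)
    return last_successful_fetch_at + timedelta(minutes=cadence_minutes) <= now


def touches_in(wakes: int, cadence_minutes: int, start: datetime) -> int:
    """Count fetches over `wakes` one-minute wakes, assuming every fetch succeeds."""
    last, count = start, 0
    for k in range(1, wakes + 1):
        now = start + timedelta(seconds=60 * k)
        if is_due(last, cadence_minutes, now):
            last, count = now, count + 1
    return count
```

Over 60 wakes, a 5-minute cadence yields 12 fetches and a 60-minute cadence yields one.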
7.4 Offline-then-resume semantics
Data (the development host) is not 24/7. Galatea (the production target per CNL-FN-2026-054) is intended to be. v2 specifies the collector's behavior when the host has been offline:
On startup, the collector identifies instruments whose now() - last_successful_fetch_at > cadence_minutes (i.e. fell behind during the offline period). For each, it attempts a backfill fetch covering the gap, bounded by the platform's max_backfill_days and the instrument's backfill_window_days. The platform plugin's fetch_and_write function is called with an explicit time-range argument rather than the implicit "since last fetch" assumption. INSERT-IGNORE-style idempotency (ON CONFLICT DO NOTHING) ensures that re-fetching an already-ingested window produces no duplicates; this is the substrate-level guarantee that makes backfill safe.
After backfill, the instrument's last_successful_fetch_at advances to the end of the backfilled window, and the collector enters its normal cadence loop.
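The gap computation on startup can be sketched as follows, assuming Python. The function name is hypothetical; the clamping rule (the tighter of `backfill_window_days` and `max_backfill_days`) follows the description above, and treating a never-fetched instrument as "go back as far as allowed" is an assumption.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional, Tuple


def backfill_window(last_successful_fetch_at: Optional[datetime], now: datetime,
                    backfill_window_days: int,
                    max_backfill_days: int) -> Tuple[datetime, datetime]:
    """Return (start, end) of the gap to re-fetch, clamped to both limits."""
    limit_days = min(backfill_window_days, max_backfill_days)
    earliest = now - timedelta(days=limit_days)
    if last_successful_fetch_at is None:
        start = earliest  # first activation: go back as far as allowed
    else:
        start = max(last_successful_fetch_at, earliest)
    return start, now
```

An overnight gap is fetched in full; a 90-day outage on an instrument with a 30-day window is truncated to the most recent 30 days, leaving an explicit hole rather than an unbounded fetch.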
For Galatea production, this offline-resume capability is mostly a safety net (against power loss, OS updates, network outages). For Data development, it's the primary operating mode — Mike turns Data on in the morning, the collector wakes up, backfills the overnight gap from every active instrument, then resumes normal cadence. The same code path serves both cases.
7.5 Provenance flagging
Every reading or detection row written by the collector carries provenance metadata in the row's metadata jsonb field:
{
  "ingested_by": "dld_collector",
  "collector_cycle_id": 178293,
  "platform_plugin_version": "birdweather/1.4.2",
  "fetch_window_start": "2026-04-25T14:00:00Z",
  "fetch_window_end": "2026-04-25T14:15:00Z"
}
This is light, costs essentially nothing in storage, and provides downstream consumers a clear audit trail: every row knows which collector cycle wrote it, which plugin version was responsible, and what window the fetch covered. When a plugin bug is discovered and a window of bad data needs to be re-ingested, the affected rows are identifiable by metadata->>'platform_plugin_version'. When a research finding traces back to a specific cycle's data, the collector_cycle_id is the identifier that joins to the cycle log for full operational context.
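Because the collector owns provenance, one small helper can stamp every row the same way. A sketch assuming Python; `build_provenance` and `iso_z` are hypothetical names, and the field names are the ones shown in the example above.

```python
import json
from datetime import datetime, timezone


def iso_z(ts: datetime) -> str:
    """Format an aware datetime as UTC with a trailing Z, second precision."""
    return ts.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


def build_provenance(cycle_id: int, plugin_version: str,
                     window_start: datetime, window_end: datetime) -> dict:
    """Build the provenance block stored in each row's metadata jsonb."""
    if window_end < window_start:
        raise ValueError("fetch window ends before it starts")
    return {
        "ingested_by": "dld_collector",
        "collector_cycle_id": cycle_id,
        "platform_plugin_version": plugin_version,
        "fetch_window_start": iso_z(window_start),
        "fetch_window_end": iso_z(window_end),
    }
```

The returned dictionary serializes with `json.dumps` and can be merged into whatever other metadata a plugin attaches to the row.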
7.6 Error handling, retry, circuit breakers
Three failure modes deserve specification:
- Transient fetch failure (network blip, 503 from upstream, rate limit). The collector logs the failure, leaves last_successful_fetch_at unchanged, and tries again on the next wake. No data loss; the instrument simply trails behind by one cadence interval.
- Persistent fetch failure (credentials expired, upstream API removed, station deregistered). After N consecutive failures (default N=12, configurable per platform), the collector marks the instrument's status = 'error', writes an error_message to its row, and stops trying until an operator intervenes via DLD's instrument-management UI. This is the circuit breaker: it prevents one broken instrument from filling the logs and consuming retry budget indefinitely.
- Plugin runtime exception (the plugin's own code crashed). The collector catches at the dispatch layer, logs full traceback, increments the per-instrument failure counter, and continues to the next instrument. One broken plugin does not stop the cycle; the failure surfaces in the next cycle log row and in the instrument-management UI's status indicator.
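The three failure modes share one mechanism: a per-instrument counter of consecutive failures that trips a breaker. A sketch assuming Python; `InstrumentState` and `attempt` are hypothetical names, and resetting the counter on any success is an assumption the text implies but does not state.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class InstrumentState:
    status: str = "active"
    consecutive_failures: int = 0
    error_message: Optional[str] = None


def attempt(state: InstrumentState, fetch: Callable[[], int],
            max_failures: int = 12) -> Optional[int]:
    """Run one fetch; trip the breaker after max_failures consecutive errors."""
    if state.status != "active":
        return None  # breaker open: wait for operator intervention
    try:
        rows = fetch()
    except Exception as exc:  # upstream failure or plugin crash
        state.consecutive_failures += 1
        if state.consecutive_failures >= max_failures:
            state.status = "error"
            state.error_message = f"{type(exc).__name__}: {exc}"
        return None
    state.consecutive_failures = 0  # any success resets the counter
    return rows
```

Catching at this layer is what keeps one broken plugin from stopping the cycle: the exception is recorded on the instrument and the loop moves on.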
Operator visibility into all three modes lives in the DLD instrument-management UI (TN-052 v0.1 §5.3, preserved in v2): green/yellow/red/gray indicators tied to a per-instrument view of recent collector activity. An instrument that has been red for 24 hours is the operator's signal to investigate.
7.7 RLS-aware collector
§6.6 establishes Postgres row-level security as the access enforcement layer for consumer queries. The collector is a producer — it writes to substrate tables, and the writes themselves carry no privacy posture (every row inherits the visibility of its parent monitoring_source). The collector does not need to read access-controlled data to do its job; it reads its own dispatch list (publicly visible to the collector's role) and writes new rows to hypertables.
The collector therefore runs under a dedicated Postgres role (dld_collector) that has:
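- SELECT, UPDATE on public.monitoring_source (to read the dispatch list and update last_successful_fetch_at)
- INSERT on public.reading, public.detection, public.collector_cycle_log (to write new data)
- Membership in a bypass_rls group that exempts it from the row-level security policies on the source table (the collector needs to see every active instrument, including private ones, to fetch their data). Postgres does not inherit the BYPASSRLS role attribute through group membership, so in practice the exemption has to be set on dld_collector itself or written into the policies as an explicit clause for the group.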
The bypass_rls exemption is the load-bearing privilege; without it the collector could not service Self/Home instruments (whose visibility excludes the collector role from default reads). The exemption is justified: the collector is infrastructure, not a user. It does not surface data to anyone — it only writes data into the substrate where the row-level security policies then govern who can read it back.
This separation — production privileged, consumption gated — is the cleanest version of the privacy story. The collector reads what it needs to ingest; the consumer queries see only what their user is permitted to see. Same substrate, different roles, different access posture.
8. Platform plugins
A platform plugin is the unit of code that knows how to talk to one upstream data source. v0.1 §3.1–3.3 named three plugins (openmeteo, wunderground, tempest_api) by example without specifying their interface. v2 specifies the plugin contract formally and provides four worked examples spanning Earth, Life, Home, and Self.
8.1 The substrate-native plugin contract
A plugin is a directory under dld/platforms/{platform_type}/ containing:
- manifest.json: declarative metadata read by DLD and by the collector at startup.
- fetch_and_write.<ext>: the runtime entry point called by the collector for each scheduled instrument.
- discover.<ext>: the discovery routine called by DLD's wizard Step 2 to list candidate instruments for a subject.
- register_instrument.<ext>: optional; called by DLD at provisioning time for any platform-side setup (OAuth, webhook registration, validation).
- README.md: human-readable description, credentials, known issues.
The file extension depends on the Phase 2 runtime decision (.py if Python, .php if PHP). The interfaces are the same regardless.
The manifest:
{
  "platform_type": "birdweather_station",
  "platform_name": "BirdWeather PUC / BirdNET-Pi",
  "description": "Acoustic species detection from BirdWeather-registered stations.",
  "produces": ["detection"],
  "accepts_subject_types": ["place"],
  "tier_default": 1,
  "default_cadence_minutes": 15,
  "default_backfill_window_days": 30,
  "max_backfill_days": 365,
  "discovery_kind": "spatial",
  "credential_kind": "none",
  "strata_plugin_path": "strata/plugins/birdweather.plugin.<ext>",
  "version": "1.4.2"
}
Field meanings:
- produces: array of 'reading' and/or 'detection' indicating which hypertables the plugin writes to.
- accepts_subject_types: array of 'place' | 'person' | 'household' indicating which subject types are valid for this platform.
- tier_default: 1, 2, or 3, the canonical tier for a typical deployment of this platform (informational; the actual tier of an instance is implicit in its ownership posture).
- discovery_kind: 'spatial' (subject must be place- or household-located, discovery uses lat/lon), 'identity' (subject must be person, discovery uses identity proof such as OAuth), 'global' (universally available; no discovery needed), or 'none' (manual entry only).
- credential_kind: 'none' | 'api_key' | 'oauth' | 'device_id', drives the wizard's Step 4 credential prompt.
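The enumerated fields lend themselves to a load-time check. The sketch below assumes Python and validates only the fields whose allowed values are listed above; `validate_manifest` is a hypothetical name, and the rule that the default backfill window must not exceed `max_backfill_days` is an assumption drawn from §7.3.

```python
ALLOWED = {
    "produces": {"reading", "detection"},
    "accepts_subject_types": {"place", "person", "household"},
    "discovery_kind": {"spatial", "identity", "global", "none"},
    "credential_kind": {"none", "api_key", "oauth", "device_id"},
}


def validate_manifest(manifest: dict) -> list:
    """Return a list of human-readable problems; empty means the manifest passes."""
    problems = []
    for key in ("produces", "accepts_subject_types"):
        values = manifest.get(key)
        if not isinstance(values, list) or not values:
            problems.append(f"{key} must be a non-empty list")
        elif not set(values) <= ALLOWED[key]:
            unknown = sorted(set(values) - ALLOWED[key])
            problems.append(f"{key} has unknown values: {unknown}")
    for key in ("discovery_kind", "credential_kind"):
        if manifest.get(key) not in ALLOWED[key]:
            problems.append(f"{key} must be one of {sorted(ALLOWED[key])}")
    if manifest.get("tier_default") not in (1, 2, 3):
        problems.append("tier_default must be 1, 2, or 3")
    if manifest.get("default_backfill_window_days", 0) > manifest.get("max_backfill_days", 0):
        problems.append("default_backfill_window_days exceeds max_backfill_days")
    return problems
```

Returning every problem at once, rather than raising on the first, lets a plugin author fix a manifest in one pass.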
8.2 Required runtime interfaces
Every plugin implements up to three callable units: discover and fetch_and_write are required, and register_instrument is optional (§8.1). The signatures are runtime-agnostic; the bodies are runtime-specific.
discover(subject_type, subject_context, search_params) -> [candidate_instrument]
    subject_type: 'place' | 'person' | 'household'
    subject_context: dict including lat, lon, timezone, name, etc. (subject-type-specific)
    search_params: dict including radius_km (for spatial), study_id (for cohort), etc.
    Returns a list of candidate dicts ready for DLD wizard Step 3 display.
register_instrument(monitoring_source_uuid, candidate, instrument_config) -> {success, message}
    Called once at provisioning time. Performs platform-side setup (OAuth flow,
    webhook registration, credential validation). Idempotent on re-call.
    Returns success status and human-readable message for the wizard.
fetch_and_write(monitoring_source_uuid, instrument_config, subject_context, fetch_window) -> {rows_written, errors}
    Called on every collector cycle for an active, scheduled instrument.
    Fetches data for the requested window (or the implicit since-last-fetch
    window if fetch_window is None), writes to public.reading and/or
    public.detection with ON CONFLICT DO NOTHING, returns counts.
The plugin owns the platform-specific logic — what API to call, how to paginate, how to map upstream fields to substrate columns, what idempotency key to use. The collector owns the cycle scheduling, error handling, and provenance metadata. Clean separation; small surface area; replaceable plugins.
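If the Phase 2 runtime turns out to be Python, the contract could be expressed as a structural type. This is a sketch only: `PlatformPlugin` and `NullPlugin` are hypothetical names, and modelling `fetch_window` as an optional (start, end) tuple is an assumption.

```python
from datetime import datetime
from typing import Optional, Protocol, Tuple, runtime_checkable


@runtime_checkable
class PlatformPlugin(Protocol):
    def discover(self, subject_type: str, subject_context: dict,
                 search_params: dict) -> list: ...

    def register_instrument(self, monitoring_source_uuid: str, candidate: dict,
                            instrument_config: dict) -> dict: ...

    def fetch_and_write(self, monitoring_source_uuid: str, instrument_config: dict,
                        subject_context: dict,
                        fetch_window: Optional[Tuple[datetime, datetime]]) -> dict: ...


class NullPlugin:
    """Do-nothing plugin used to exercise the contract."""

    def discover(self, subject_type, subject_context, search_params):
        return []

    def register_instrument(self, monitoring_source_uuid, candidate, instrument_config):
        return {"success": True, "message": "nothing to set up"}

    def fetch_and_write(self, monitoring_source_uuid, instrument_config,
                        subject_context, fetch_window=None):
        return {"rows_written": 0, "errors": []}
```

`isinstance(plugin, PlatformPlugin)` checks only that the three methods exist, not their signatures, so it works as a cheap load-time sanity check rather than a full conformance test.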
8.3 Worked example A — openmeteo (Earth, Tier 3)
The simplest case. discovery_kind = 'global' — any place is a valid subject; no upstream discovery needed. credential_kind = 'none' for the free tier; 'api_key' if using Macroscope's Pro-tier key (read from ~/Sites-secure/dld/openmeteo.key).
discover returns a single candidate per call: {platform_type: 'openmeteo', name: '{place_name} — Open-Meteo', external_id: '{lat},{lon}'}. There is nothing to choose; the wizard skips Step 2's candidate-list view and jumps to Step 3 with this one candidate pre-selected.
fetch_and_write calls the Open-Meteo current-conditions API for {lat, lon, hourly_variables_list}, parses the JSON response into rows, writes them to public.reading keyed by (monitoring_source_id, time) with ON CONFLICT DO NOTHING. Backfill mode calls the Open-Meteo Historical API instead of current-conditions, which has different pagination semantics; the plugin handles both.
Variable mapping is plugin-internal: Open-Meteo's temperature_2m → substrate's temperature_c; Open-Meteo's relative_humidity_2m → substrate's relative_humidity. Drift in Open-Meteo's response shape requires a plugin update, not a substrate migration.
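The mapping step can be sketched as a pivot from Open-Meteo's column-oriented `hourly` block (parallel arrays keyed by variable name) into one row per timestamp. Assumptions: Python as the runtime, the function name, and a row shape with one column per mapped variable; the substrate's actual reading schema is not specified in this section.

```python
VARIABLE_MAP = {
    "temperature_2m": "temperature_c",
    "relative_humidity_2m": "relative_humidity",
}


def hourly_to_rows(monitoring_source_id: str, payload: dict) -> list:
    """Pivot the 'hourly' block into one dict per timestamp, renaming variables."""
    hourly = payload["hourly"]
    rows = []
    for i, ts in enumerate(hourly["time"]):
        row = {"monitoring_source_id": monitoring_source_id, "time": ts}
        for upstream, column in VARIABLE_MAP.items():
            series = hourly.get(upstream)
            # A variable missing from the response becomes NULL, not an error.
            row[column] = series[i] if series is not None else None
        rows.append(row)
    return rows
```

Variables that are present upstream but absent from `VARIABLE_MAP` are dropped, which keeps drift in the upstream response from reaching the substrate until the plugin is updated.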
8.4 Worked example B — birdweather_station (Life, Tier 1 or 2)
The platform plugin Lab 06 needs. Spatial discovery: given a place subject with lat/lon, query BirdWeather GraphQL for stations in a bounding box. discover returns all candidates (Macroscope-managed and external) with distance from subject and last-seen freshness; the wizard's Step 2 presents the list and lets the user select one or more.
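The geometry behind spatial discovery (a bounding box around the subject, then a distance per candidate) can be sketched with the standard library. Assumptions: Python as the runtime, hypothetical function names, a spherical Earth, and subjects far from the poles and the antimeridian; how the box is passed to BirdWeather's GraphQL API is not shown.

```python
import math

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius


def bounding_box(lat: float, lon: float, radius_km: float):
    """Return (south, west, north, east) enclosing a circle of radius_km."""
    dlat = math.degrees(radius_km / EARTH_RADIUS_KM)
    # Longitude degrees shrink with the cosine of latitude.
    dlon = math.degrees(radius_km / (EARTH_RADIUS_KM * math.cos(math.radians(lat))))
    return lat - dlat, lon - dlon, lat + dlat, lon + dlon


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance between two points, in kilometres."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))
```

The box over-selects at its corners, so candidates returned by the upstream query would still be filtered or sorted by `haversine_km` before the wizard displays them.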
Idempotency key: (monitoring_source_id, time, external_observation_id) where external_observation_id is BirdWeather's per-detection identifier. Re-fetching an already-ingested time window produces no duplicates.
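The effect of that key can be demonstrated without Postgres. The sketch below uses Python's built-in sqlite3 as a stand-in (SQLite 3.24 or later accepts the same ON CONFLICT DO NOTHING clause); the trimmed-down table and the `obs-1` / `obs-2` identifiers are illustrative, not the real public.detection schema or real BirdWeather ids.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE detection (
        monitoring_source_id    TEXT NOT NULL,
        time                    TEXT NOT NULL,
        external_observation_id TEXT NOT NULL,
        confidence              REAL,
        UNIQUE (monitoring_source_id, time, external_observation_id)
    )
""")


def write_detections(rows) -> int:
    """Insert rows, skipping any that collide with the idempotency key."""
    before = conn.total_changes
    conn.executemany(
        "INSERT INTO detection VALUES (?, ?, ?, ?) ON CONFLICT DO NOTHING",
        rows,
    )
    return conn.total_changes - before  # rows actually inserted


batch = [
    ("avr-1", "2026-04-25T14:03:00Z", "obs-1", 0.91),
    ("avr-1", "2026-04-25T14:07:00Z", "obs-2", 0.84),
]
first = write_detections(batch)   # both rows are new
second = write_detections(batch)  # same window re-fetched: nothing inserted
```

Re-running the same batch inserts nothing the second time, which is the property that makes overlapping backfill windows safe.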
fetch_and_write calls BirdWeather GraphQL paginated through detection events for the instrument's external station id within the requested time window; for each detection, it writes a row to public.detection with detection_type = 'acoustic', taxon_id from BirdWeather's species mapping (or NULL with metadata->>'scientific_name' populated when the species hasn't been resolved into the local taxon table), confidence from BirdWeather, external_url linking back to the detection on app.birdweather.com, and geom set to the station's coordinates.
Tier-1 deployments (Macroscope's own PUCs) use a credential-authenticated GraphQL endpoint with full historical access. Tier-2 deployments (external stations) use the public GraphQL endpoint with retention bounded by the source station's BirdWeather subscription tier — the plugin discovers and surfaces this constraint to the wizard's Step 3 so the user understands what backfill window is achievable.
This plugin is the dependency Lab 06 (CNL-SP-2026-067) names as a precondition.
8.5 Worked example C — apple_health (Self, Tier 1)
The trickiest case, because Apple HealthKit is not a network API; it is an iOS-side framework. The plugin is therefore a receiver rather than a fetcher: a Swift companion app on the participant's iPhone (built once, distributed via TestFlight or sideload) periodically exports recent HealthKit samples to a Macroscope endpoint, which lands them in a staging area; the plugin's fetch_and_write reads from that staging area and writes to public.reading.
discovery_kind = 'identity'. discover does not enumerate candidates from an external API; it returns one candidate per authenticated user who has the companion app paired with the Macroscope account. The wizard's Step 2 lists "Apple Watch (paired with iPhone XXXX...)" if pairing is detected, otherwise prompts the user to install the companion app and rerun.
credential_kind = 'oauth' in spirit — the pairing handshake establishes a per-user secret used to authenticate companion-app uploads.
register_instrument registers the user's pairing token on the Macroscope side and returns instructions for the user to grant HealthKit permissions inside the iPhone app.
fetch_and_write reads from a staging table (populated by the companion app's uploads) and idempotently writes to public.reading. The companion-app design is out of scope for v2; this section specifies the plugin's substrate-side contract.
Subject_type for this platform is 'person'; the owner is the user themselves; default visibility is 'owner_only'. The DLD wizard runs in participant-facing flavor (§6.7) — tone is "Connect your Apple Watch," not "Provision a sensor."
8.6 Worked example D — airthings (Home, Tier 1)
The Home-domain anchor. Airthings exposes a cloud API (api.airthings.com) with OAuth authentication and per-device endpoints for radon, CO₂, VOCs, PM2.5 (on View Plus units), temperature, and humidity. The plugin authenticates against the API on behalf of the household owner and pulls readings on a 30-minute cadence (Airthings doesn't update faster than that).
Spatial discovery is not the right model for Airthings — the Airthings cloud doesn't expose other people's devices to nearby queries. discovery_kind = 'identity': discover returns the list of devices the authenticated household owner owns in their Airthings account. accepts_subject_types = ['household'] only.
Provisioning flow runs in participant-facing flavor (§6.7) — the household owner (Merry, in Owl Farm's case) opens the DLD wizard from inside the Workflow Designer, connects her Airthings account via OAuth, picks the device(s) she wants to provision into Macroscope, sets cadence, and confirms. Each Airthings device becomes one public.monitoring_source row with subject_type = 'household', owner_user_id = Merry, visibility = 'owner_only'.
fetch_and_write calls Airthings' device-readings endpoint for the requested window, writes one row per measurement to public.reading keyed by (monitoring_source_id, time). Idempotent on re-fetch.
This plugin is the concrete grounding of the Owl Farm worked example in §6.9.
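The idempotent write described for fetch_and_write can be sketched as follows. This is a minimal illustration, not the plugin's implementation: it uses an in-memory SQLite table as a stand-in for public.reading, and it assumes the uniqueness key includes a metric column alongside (monitoring_source_id, time) so that several measurements can share one timestamp. The function names make_store and write_readings are hypothetical. On Postgres the equivalent statement would be an INSERT with ON CONFLICT DO NOTHING.

```python
import sqlite3


def make_store():
    """Create an in-memory stand-in for public.reading (SQLite, not Postgres)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """
        CREATE TABLE reading (
            monitoring_source_id TEXT NOT NULL,
            time TEXT NOT NULL,
            metric TEXT NOT NULL,   -- assumption: metric is part of the key
            value REAL,
            PRIMARY KEY (monitoring_source_id, time, metric)
        )
        """
    )
    return conn


def count_rows(conn):
    return conn.execute("SELECT COUNT(*) FROM reading").fetchone()[0]


def write_readings(conn, source_id, samples):
    """Insert samples, skipping rows already present; return how many were new."""
    before = count_rows(conn)
    conn.executemany(
        "INSERT OR IGNORE INTO reading VALUES (?, ?, ?, ?)",
        [(source_id, s["time"], s["metric"], s["value"]) for s in samples],
    )
    conn.commit()
    return count_rows(conn) - before


if __name__ == "__main__":
    store = make_store()
    window = [
        {"time": "2026-04-25T10:00:00Z", "metric": "radon", "value": 42.0},
        {"time": "2026-04-25T10:00:00Z", "metric": "co2", "value": 612.0},
    ]
    print(write_readings(store, "src-1", window))  # first fetch of the window
    print(write_readings(store, "src-1", window))  # re-fetch of the same window
```

Re-fetching an overlapping window is then safe: rows that already exist are skipped rather than duplicated or rejected.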
9. STRATA integration
The STRATA intelligence envelope (CNL-TN-2026-044) wraps any registered sensor with temporal analysis, narrative generation, anomaly detection, and tool-calling endpoints. v0.1 §7 specified how virtual instruments provisioned by DLD enter the STRATA contract. v2 re-articulates this for the post-migration substrate, the cross-domain platform population, and the privacy-aware access model.
9.1 The plugin contract on the new substrate
The STRATA sensor plugin contract from CNL-TN-2026-044 §9 takes a manifest declaring how STRATA should treat a platform's data:
return [
    'platform_type' => 'birdweather_station',
    'platform_name' => 'BirdWeather PUC',
    'domain' => 'LIFE',
    'table' => 'public.detection', // v0.1: vendor-specific table; v2: canonical hypertable
    'timestamp_col' => 'time',
    'platform_col' => 'monitoring_source_id', // v0.1: 'platform_id'; v2: 'monitoring_source_id'
    'event_class' => 'detection', // new in v2: 'reading' or 'detection'
    'freshness_minutes' => 30,
    'observer_class' => 'community', // adapts per-instrument; default from manifest
    'metrics_or_events' => [ // v0.1: 'metrics'; v2: same shape, broader name
        'detection' => [
            'label' => 'Acoustic detection',
            'type' => 'event',
            'category' => 'biodiversity',
            'narrative' => 'species_observed',
            'soma_weight' => 1.0,
            'taxon_field' => 'taxon_id', // for detection-type events
            'confidence_field' => 'confidence',
        ],
    ],
];
Three v0.1-vs-v2 changes: table references canonical hypertables instead of vendor-specific ones; platform_col is monitoring_source_id (UUID) instead of platform_id (int); event_class is new and indicates whether STRATA's downstream analysis should treat this platform's rows as continuous metrics (reading) or discrete events (detection). The metrics schema generalizes to event schemas as appropriate (taxon, confidence for acoustic; activity_type, distance for Strava workouts; etc.).
DLD does not implement STRATA plugins itself. DLD's platform plugin (§8) declares a strata_plugin_path in its manifest; the STRATA plugin lives at that path and is registered with STRATA at provisioning time via STRATA's existing plugin-loading mechanism.
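The v0.1-to-v2 manifest changes lend themselves to a mechanical check at registration time. The sketch below is one way such a check might look, written in Python for illustration (the manifest itself is PHP). The function name validate_manifest, the set of required keys, and the rule that event_class 'reading' pairs with public.reading and 'detection' with public.detection are assumptions drawn from the example manifest, not part of the STRATA contract.

```python
# Assumed pairing of event_class to canonical hypertable.
CANONICAL_TABLE = {"reading": "public.reading", "detection": "public.detection"}
DOMAINS = {"EARTH", "LIFE", "HOME", "SELF"}
# Assumed minimum key set, taken from the example manifest.
REQUIRED_KEYS = {
    "platform_type", "domain", "table", "timestamp_col",
    "platform_col", "event_class", "metrics_or_events",
}


def validate_manifest(manifest):
    """Return a list of problems; an empty list means the manifest looks like v2."""
    problems = []
    missing = sorted(REQUIRED_KEYS - set(manifest))
    if missing:
        problems.append("missing keys: " + ", ".join(missing))
    if manifest.get("domain") not in DOMAINS:
        problems.append("domain must be one of " + ", ".join(sorted(DOMAINS)))
    if manifest.get("platform_col") != "monitoring_source_id":
        problems.append("platform_col should be monitoring_source_id in v2")
    event_class = manifest.get("event_class")
    if event_class not in CANONICAL_TABLE:
        problems.append("event_class must be 'reading' or 'detection'")
    elif manifest.get("table") != CANONICAL_TABLE[event_class]:
        problems.append(
            "table should be " + CANONICAL_TABLE[event_class]
            + " for event_class " + event_class
        )
    return problems


if __name__ == "__main__":
    v2_manifest = {
        "platform_type": "birdweather_station",
        "domain": "LIFE",
        "table": "public.detection",
        "timestamp_col": "time",
        "platform_col": "monitoring_source_id",
        "event_class": "detection",
        "metrics_or_events": {"detection": {"type": "event"}},
    }
    print(validate_manifest(v2_manifest))
```

A v0.1-style manifest (int platform_id, no event_class, a 'metrics' key) would come back with several problems listed, which makes the migration work in Phase G easy to enumerate.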
9.2 Cross-tier validation
v0.1 §7.2 specified cross-tier validation for Earth platforms (Open-Meteo vs. Tempest, Tempest vs. WU neighbor). v2 generalizes to all four domains:
- Earth: Tempest (Tier 1) vs. WU PWS (Tier 2) vs. Open-Meteo (Tier 3) — already operational at Canemah; the bias characterization in CNL-TN-2026-023 §4.4 is the prototype for what STRATA does automatically across domains.
- Life: Macroscope BirdWeather PUC (Tier 1) vs. neighborhood BirdNET-Pi (Tier 2) vs. BirdCast migration density (Tier 3). When all three agree on a species spike at the right time of year, confidence in the migration event is high; when they diverge, the divergence is itself the signal worth investigating.
- Home: Indoor Tempest (Tier 1) vs. modeled indoor comfort (Tier 3). Persistent divergence reveals the dwelling's thermal-mass behavior — an automatic, continuous version of what would otherwise be a one-off energy audit.
- Self: Apple Watch HRV (Tier 1) vs. Oura HRV (Tier 1, alternative modality) vs. Apple HealthKit aggregated stress (Tier 3). Cross-modality divergence is a measurement-artifact signal; cross-modality agreement strengthens any downstream inference.
STRATA performs cross-tier validation as a derived analysis whenever multiple tiers exist for a single subject. The mechanism is the same in every domain: subscribe to public.reading and public.detection rows that share a subject_type + subject_id, compute domain-appropriate divergence statistics, surface persistent divergence as anomaly events, surface persistent agreement as confidence boosts on derived narratives.
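A minimal sketch of the divergence computation for two tiers that share a subject follows. It assumes readings have already been aligned to common timestamps and reduced to one value per timestamp; the function name divergence_report, the fixed threshold, and the "longest consecutive run" definition of persistence are illustrative choices, not STRATA's specified statistics.

```python
from statistics import mean


def divergence_report(tier_a, tier_b, threshold, min_run):
    """Compare two tiers given as {timestamp: value} mappings.

    Only timestamps present in both tiers are compared. Divergence is
    "persistent" when at least min_run consecutive shared timestamps differ
    by more than threshold.
    """
    shared = sorted(tier_a.keys() & tier_b.keys())
    diffs = [tier_a[t] - tier_b[t] for t in shared]
    longest = run = 0
    for d in diffs:
        run = run + 1 if abs(d) > threshold else 0
        longest = max(longest, run)
    return {
        "n": len(diffs),
        "mean_bias": mean(diffs) if diffs else None,
        "longest_run": longest,
        "persistent": longest >= min_run,
    }


if __name__ == "__main__":
    # Hypothetical hourly temperatures: station (Tier 1) vs. modeled (Tier 3).
    station = {1: 10.5, 2: 12.0, 3: 12.5, 4: 13.0, 5: 9.0}
    modeled = {1: 10.0, 2: 10.0, 3: 10.0, 4: 10.0}
    print(divergence_report(station, modeled, threshold=1.5, min_run=3))
```

A report with persistent set would be surfaced as an anomaly event; a long stretch with no run above threshold would count toward the confidence boost on derived narratives.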
9.3 Domain-specific intelligence envelopes
STRATA's nine-window temporal analysis (CNL-TN-2026-044 §4.1) was specified against weather data. It generalizes — but not uniformly — across the four domains:
- Earth: the original specification. Hourly to multi-decadal windows, climate baselines, anomaly detection against historical norms.
- Life: phenology windows. The relevant temporal axes are seasonal-cycle position and year-over-year drift, not hour-of-day. Lab 06's plot menu (CNL-SP-2026-067 §4) is the consumer-facing surface; STRATA's downstream analysis would compute phenological-shift statistics on the same data.
- Home: circadian and thermal-cycle windows. Daily occupancy patterns, weekly ventilation rhythms, seasonal heating and cooling cycles. STRATA characterizations of "this household is unusual relative to its history" are the household-scale analog of climate anomaly detection.
- Self: circadian and ultradian windows. Sleep cycles, HRV diurnal variation, activity bouts. STRATA characterizations of "this person's sleep architecture is unusual relative to their history" are the personal-scale analog. Critically, these characterizations are computed only against the data the STRATA worker has access to under §6.6 RLS — Mike's STRATA narratives over Mike's Self data work; a generic STRATA worker cannot compose them across users without explicit grants.
The intelligence envelope is the same; the temporal axes and the domain-specific narratives differ. STRATA platform plugins declare their domain ('EARTH' | 'LIFE' | 'HOME' | 'SELF') and STRATA's downstream analyzers use that declaration to pick the right envelope.
9.4 RLS interaction with STRATA queries
STRATA reads the same hypertables as any other consumer. It runs under a Postgres role (strata_worker) with no row-level security exemption — which means it sees only the data the requesting user is permitted to see. When a user opens a STRATA narrative for "this place" or "this person," the application sets macroscope.current_user_id to the requesting user's ID before issuing STRATA's queries, and the row-level security policy filters accordingly.
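The per-request scoping can be sketched as a small wrapper around a DB-API cursor. This is an illustration under stated assumptions: the helper name run_as_user is hypothetical, the %s placeholder style follows psycopg, and the third argument to Postgres's set_config is set to true so the setting is scoped to the current transaction (passing false would give the per-session behavior). RecordingCursor is a stand-in that records calls so the sketch runs without a database.

```python
class RecordingCursor:
    """Hypothetical stand-in for a DB-API cursor; records calls, returns no rows."""

    def __init__(self):
        self.calls = []

    def execute(self, sql, params=()):
        self.calls.append((sql, tuple(params)))

    def fetchall(self):
        return []


# set_config(name, value, is_local): is_local=true limits the setting to the
# current transaction (an assumption here; see the note above).
SET_USER_SQL = "SELECT set_config('macroscope.current_user_id', %s, true)"


def run_as_user(cursor, user_id, sql, params=()):
    """Set the requesting user, then run one query under row-level security."""
    cursor.execute(SET_USER_SQL, (str(user_id),))
    cursor.execute(sql, params)
    return cursor.fetchall()


if __name__ == "__main__":
    cur = RecordingCursor()
    run_as_user(
        cur,
        7,
        "SELECT time, value FROM public.reading WHERE monitoring_source_id = %s",
        ("00000000-0000-0000-0000-000000000000",),
    )
    for call in cur.calls:
        print(call)
```

The ordering is the point of the sketch: the user ID is always set before the query is issued, so no STRATA query runs without a requesting identity.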
Practical consequences:
- A STRATA narrative composed for a research place returns the public-within-Macroscope readings and detections from that place's instruments — the same data any researcher would see.
- A STRATA narrative composed for Mike's personal Self data returns Mike's data when Mike is the requesting user (he's the owner); returns the same data when his cardiologist is requesting (because of an explicit grant); returns nothing when an unrelated researcher is requesting.
- A STRATA narrative composed for a research-cohort study returns the cohort's data to the study's investigators (study_cohort visibility plus matching grants); returns nothing to non-investigators.
The substrate is the enforcement layer. STRATA does not need to know about the access model — it issues queries and gets the rows it's allowed to see.
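The practical consequences above can be modeled in a few lines. The sketch below is a simplified Python model of the visibility decision, not the Postgres policy from §6.6: it covers only owner access, superuser access, public_within_macroscope visibility, and explicit grants, and it leaves out organization membership. The names Instrument and can_see are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Instrument:
    source_id: str
    owner_user_id: str
    visibility: str  # e.g. 'owner_only' or 'public_within_macroscope'


def can_see(instrument, user_id, grants, superusers=frozenset()):
    """Simplified visibility model; grants is a set of (source_id, grantee) pairs."""
    if user_id in superusers:
        return True
    if user_id == instrument.owner_user_id:
        return True
    if instrument.visibility == "public_within_macroscope":
        return True
    # Any other tier needs an explicit grant (org membership is not modeled here).
    return (instrument.source_id, user_id) in grants


if __name__ == "__main__":
    hrv = Instrument("hrv-1", owner_user_id="mike", visibility="owner_only")
    grants = {("hrv-1", "cardiologist")}
    for requester in ("mike", "cardiologist", "researcher"):
        print(requester, can_see(hrv, requester, grants))
```

Because the decision depends only on the requesting user and the instrument row, STRATA's own code never has to branch on it.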
9.5 Cross-domain composition and consent
The most powerful STRATA narratives compose data across domains: "Mike's HRV (Self) drops on high-pollen days (Earth) when his Airthings indoor CO₂ is also elevated (Home), and the dawn chorus species mix is shifted (Life)." This composition is trivially possible when one user has access to all four contributing instruments — Mike has god-level access plus owns his Self/Home, so the composition succeeds end-to-end.
For other users, cross-domain composition is gated by which instruments their access spans. A study designed to compose participant Self + their household Home + their local Earth would require participants to opt in their relevant instruments to the study cohort (visibility = 'study_cohort', plus grants); the study's investigators then have the access required to compose. STRATA does not enforce this; it simply runs the composed query, and the substrate's row-level security returns only the rows the investigator can see. If the cohort's grants are incomplete, the composition degrades without raising an error: the narrative reflects whatever data is visible, and a freshness/coverage indicator shows what is missing.
Cross-domain composition is therefore a function of (a) STRATA's analytical capability, (b) the substrate's row-level security, and (c) the deliberate consent posture of subject owners. The architecture supports it; the policy decides when it happens.
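One way to picture the coverage indicator is as a summary of which contributing domains actually returned rows to the requester. The sketch below assumes the composed query's results have been grouped by domain; compose_with_coverage and its return shape are hypothetical, not STRATA's interface.

```python
def compose_with_coverage(required_domains, visible_rows):
    """Summarize which domains contributed rows to a composed narrative.

    visible_rows maps a domain name to the rows row-level security returned
    for the requesting user; an absent or empty entry counts as missing.
    """
    present = [d for d in required_domains if visible_rows.get(d)]
    missing = [d for d in required_domains if not visible_rows.get(d)]
    total = len(required_domains)
    return {
        "domains_used": present,
        "domains_missing": missing,
        "coverage": len(present) / total if total else 0.0,
        "complete": not missing,
    }


if __name__ == "__main__":
    # Hypothetical investigator view: Self and Earth granted, Home empty, Life absent.
    rows = {"SELF": [{"hrv": 48}], "EARTH": [{"pollen": "high"}], "HOME": []}
    print(compose_with_coverage(["SELF", "EARTH", "HOME", "LIFE"], rows))
```

An investigator whose cohort grants are incomplete sees a narrative built from the available domains, with the missing ones named explicitly rather than silently omitted.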
10. Schema additions for DLD v2
This section consolidates every schema change v2 introduces. Each migration is self-contained; the order matters because later migrations reference earlier ones. All migrations are intended to apply to a substrate that already has the nexus.* and public.* schemas from CNL-TN-2026-056 in place.
10.1 What persists from v0.1 schema
The following pre-existing tables carry over from the existing substrate. Most are used unchanged; v2 additions are noted per table:
- nexus.users: auth identity (already exists; serves as the owner reference for every instrument).
- nexus.places: curated place catalog (already exists; serves as the subject_type = 'place' target).
- nexus.monitoring_sources: legacy int-keyed shim (already exists; v2 preserves it for backward compatibility per CNL-TN-2026-066).
- public.monitoring_source: UUID-keyed substrate identity (already exists; v2 adds columns per §10.3 below).
- public.reading and public.detection: canonical hypertables (already exist; v2 adds RLS policies per §10.4 below).
- public.taxon: taxonomic registry (already exists; unchanged).
- nexus.organizations: already exists; v2 adds an organization_members join table per §10.2.
10.2 Newly required tables
Five new migrations, creating six tables. Filed under schema/ per the existing convention:
- schema/006_dld_v2_households.sql: nexus.households (subject table for Home-domain instruments). Specified in §6.4.
- schema/007_dld_v2_organization_members.sql: nexus.organization_members (membership join table required for org_internal visibility enforcement). Specified in §6.4.
- schema/008_dld_v2_monitoring_source_grants.sql: nexus.monitoring_source_grants (cross-tier sharing). Specified in §6.5.
- schema/009_dld_v2_workflow_audit_log.sql: nexus.workflow_audit_log (audit trail for provisioning actions). Specified in §3.4.
- schema/010_dld_v2_collector_metadata.sql: public.platform_registry (collector's platform on/off switch) and public.collector_cycle_log (per-cycle health metrics). Specified in §7.1 and §7.2.
10.3 Newly required column additions
Two migrations: one ALTER and one backfill. Filed as their own migrations because they touch existing data:
- schema/011_dld_v2_monitoring_source_columns.sql: adds owner_user_id, visibility, visibility_metadata, subject_type, subject_id_int to public.monitoring_source. Specified in §6.4.
- schema/012_dld_v2_monitoring_source_backfill.sql: backfills the new columns for pre-existing rows (every row becomes a place-subject Earth/Life instrument under the project user with public_within_macroscope visibility, except those flagged for migration-wizard review per §6.8). Specified in §6.4 and §6.8.
10.4 Row-level security policies
Three policy installations. Each enables RLS on its target table and installs the policy that delegates to public.monitoring_source visibility:
- schema/013_dld_v2_rls_monitoring_source.sql: primary policy on public.monitoring_source. The five-tier visibility check sketched in §6.6.
- schema/014_dld_v2_rls_reading.sql: delegating policy on public.reading (EXISTS check against monitoring_source).
- schema/015_dld_v2_rls_detection.sql: delegating policy on public.detection (EXISTS check against monitoring_source).
10.5 Roles and grants
Two Postgres roles need to exist for the v2 architecture:
- dld_collector: the collector's identity. Has SELECT, UPDATE on public.monitoring_source, INSERT on hypertables and public.collector_cycle_log, and is a member of the bypass_rls group. Specified in §7.7.
- strata_worker: STRATA's identity. Has SELECT on hypertables and substrate catalog tables but no RLS bypass. Sets macroscope.current_user_id per session to the requesting user before issuing queries. Specified in §9.4.
These are filed as schema/016_dld_v2_roles.sql.
10.6 Quarantine table for the migration wizard
One additional table supports DLD-as-consent-boundary (§6.8) during the MySQL → macroscope_v2 migration. Lifecycle: populated by the bridge collector when it encounters a row of unclear privacy classification; drained by the migration wizard as Mike walks through each pending row:
- schema/017_dld_v2_migration_pending_classification.sql: nexus.migration_pending_classification table holding (legacy_source_id, legacy_metadata jsonb, awaiting_review_since timestamptz, resolved_by_user_id nullable, resolved_at nullable, resolved_to_monitoring_source_id nullable).
This table is dropped after the migration completes.
11. Implementation plan
The work ahead is sequenced into eight phases. Phases A–C deliver the substrate and the first end-to-end pattern; phases D–E unblock specific consumers (Lab 06, Owl Farm migration); phases F–H broaden coverage. Each phase ends with a verification step and a "proceed?" pause.
Phase A — Substrate readiness. Apply migrations §10.2–§10.6 to the macroscope_v2 substrate on Data. Create the dld_collector and strata_worker roles. Verify RLS policies against synthetic data: a place-subject row visible to any authenticated user, a household-subject row visible only to its owner and superusers, an org_internal row visible only to organization members. Verification scripts go in tests/dld_v2/rls/.
Phase B — Lab 07 shell + cross-Lab call API. Build the Workflow Designer Lab page at admin/lab/workflow-designer.php following the existing Lab pattern (header, breadcrumb, registration in nexus.lab_instruments). Build the JavaScript module admin/lab/js/workflow-designer-client.js exposing Macroscope.WorkflowDesigner.openDLD(payload) per §3.2. Validate the cross-Lab call by wiring a stub button into Lab 06 that opens an empty DLD modal.
Phase C — Open-Meteo plugin (proof of pattern). Build the openmeteo plugin per §8.3 in whichever runtime Phase 2 selects (likely Python). Build the collector framework that dispatches against public.monitoring_source per §7.1. Provision an Open-Meteo instrument for one curated place via the DLD wizard. Verify rows arriving in public.reading on the next cycle. This is the smallest end-to-end deliverable; it validates the substrate, the wizard, the collector, and the plugin contract together.
Phase D — BirdWeather plugin (unblocks Lab 06). Build the birdweather_station plugin per §8.4. Provision a Macroscope-managed PUC at Canemah (Tier 1 deployment) via DLD; verify acoustic detections arriving in public.detection. Provision an external neighborhood PUC (Tier 2) via DLD; verify the same. Wire Lab 06's external-PUC selection path to actually invoke DLD (closing the §7 hand-off in CNL-SP-2026-067 v0.4).
Phase E — Airthings plugin + Owl Farm Home migration. Build the airthings plugin per §8.6. Use it to migrate Owl Farm's existing Home-domain MySQL streams into macroscope_v2 via the migration wizard (§6.8). Validate the participant-facing wizard flavor by having Merry walk through the provisioning (or, in development, simulate Merry's role). Verify owner_only visibility — a non-superuser non-Merry test account cannot see her household data.
Phase F — Migration wizard for the broader MySQL → macroscope_v2 transition. Build the streamlined DLD migration wizard per §6.8 against nexus.migration_pending_classification. Walk every pre-existing legacy row through it. Validate that public Earth/Life rows land with public_within_macroscope visibility and personal Home/Self rows land with owner_only visibility. Drop the quarantine table.
Phase G — STRATA plugin updates for the new substrate. Update STRATA's existing plugin loaders to consume the v2 manifest format (canonical hypertables, UUID monitoring_source_id, event_class, RLS-aware queries). Validate that STRATA's nine-window analysis works against public.reading and public.detection for at least one platform per domain. This is the bridge between DLD v2's instrument provisioning and STRATA's intelligence envelope.
Phase H — Apple Health Kit + remaining Self platforms (longer horizon). Out of scope for the immediate Hatfield May 2026 horizon. Tracked here so the implementation plan is complete: design the iPhone companion app per §8.5 worked example, ship the apple_health plugin, then progressively add oura_ring, withings_scale, strava_activity, and the other Self platforms named in §4.1–§4.3.
Phases A–E are the critical path for the public-facing MNG-2.1 + Lab 06 launch. Phases F–G are necessary for the substrate unification but can run in parallel with consumer development. Phase H is post-Hatfield.
12. Open questions
Eleven open questions, organized by section.
Substrate and migration
- Phase 2 collector runtime decision. Python is most likely; the decision is deferred to whatever vendor SDK ecosystem dominates the first batch of plugins. Affects §7 specifications materially; revisit when Phase A is underway.
- Bridge collector handoff for personal-data streams. §6.8 specifies that the bridge holds personal-data rows in nexus.migration_pending_classification for human review. The exact detection logic (what makes a legacy row "personal-data, needs review" vs. "public, import directly") needs spec coordination with CNL-TN-2026-066. Likely heuristic: any row from a home_* or self_* MySQL table is personal; any row from a station with a known is_household flag is personal; everything else is public.
- Migration of nexus.user_permissions semantics. The existing permissions table grants users access to (place, domain) tuples. v2's grants are at the instrument level. A backfill conversion turns each existing user_permissions row into a set of monitoring_source_grants, but only for instruments at the granted place×domain. Verification needed.
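The likely classification heuristic for the bridge collector is simple enough to sketch directly. The function names and the shape of the station flags below are hypothetical; the rules are the three named in the open question, and the real detection logic remains to be coordinated with CNL-TN-2026-066.

```python
def classify_legacy_row(table_name, station_flags):
    """Return 'personal' or 'public' for a legacy MySQL row.

    Rules, in order: home_* or self_* tables are personal; stations with a
    known is_household flag are personal; everything else is public.
    """
    if table_name.startswith(("home_", "self_")):
        return "personal"
    if station_flags.get("is_household"):
        return "personal"
    return "public"


def route_legacy_row(table_name, station_flags):
    """Personal rows wait for human review; public rows import directly."""
    if classify_legacy_row(table_name, station_flags) == "personal":
        return "nexus.migration_pending_classification"
    return "import_directly"


if __name__ == "__main__":
    # Table names below are made up for illustration.
    print(route_legacy_row("home_airquality", {}))
    print(route_legacy_row("weather_obs", {"is_household": True}))
    print(route_legacy_row("weather_obs", {}))
```

Erring toward "personal" is the safe failure mode here: a public row held for review costs a few minutes in the wizard, while a personal row imported as public is a consent violation.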
DLD wizard and UX
- Lab 07 internal layout. §3.1 notes Lab 07 deviates from the standard three-pane Lab pattern because the work is procedural. The actual layout — instrument list + provisioning button on the home page; modal wizard for the provisioning flow itself — needs a lightweight design pass before Phase B.
- Cross-Lab call API ergonomics. §3.2 sketches a JavaScript Promise-based API. Whether modal, drawer, or inline panel is the right interaction pattern depends on calling-Lab context (Lab 06's left-panel might prefer a drawer; Climate Analyst's place-detail might prefer a modal). Decision deferred until Phase B has live testing.
- Participant-facing wizard discovery. §6.7 specifies a participant-facing wizard flavor. How does a participant reach it for the first time? Email invite link? Self-service signup at a known URL? Embedded in a study-onboarding flow? Out of scope for the immediate work but a real question for any non-Mike Self-domain user.
Privacy and consent
- Per-row time-window grants. §6.5 supports instrument-level grants only. A future revision should support row-level grants ("you can see my HRV data only for the dates of this study"). The substrate column to support this would live on the grants table; the row-level security policy would extend to check it.
- Read-event audit logging. §6.10 noted this gap. For some clinical or regulatory contexts, every read of personal data must be logged. Postgres triggers can do this; the schema and retention policy are unspecified.
- Consent revocation propagation. §6.10 again. When a participant revokes a study grant, downstream artifacts (cached aggregations, exported figures, AI narratives) may still contain the data. The substrate enforces revocation; consumer applications need to re-validate any cached state. Specifying this propagation is consumer-application-specific work.
Operational and edge cases
- Household membership model. §6.4 deferred multi-resident household membership to a future revision. Owl Farm has Mike and Merry; presumably Merry is the household head and Mike has a household_member grant on her household instruments. Whether Mike's superuser status implies he should also have explicit household_member grants for audit cleanliness, or whether superuser bypass is sufficient, is a design call.
- User deletion semantics. When a nexus.users row is deleted (hard delete or status='retired'), what happens to instruments they own? Their grants? Currently ON DELETE semantics on the foreign keys are unspecified. Likely answer: ON DELETE RESTRICT for owner_user_id (you must reassign ownership before deleting); ON DELETE CASCADE for grants where the deleted user was the grantee.
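The "likely answer" for deletion semantics can be exercised in a small runnable sketch. It uses in-memory SQLite as a stand-in for Postgres, with simplified tables and integer keys; the table shapes and the helper names are assumptions for illustration, and only the RESTRICT and CASCADE choices come from the text.

```python
import sqlite3


def build_demo():
    """Build a tiny schema with the proposed ON DELETE behavior (SQLite stand-in)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
    conn.executescript(
        """
        CREATE TABLE users (id INTEGER PRIMARY KEY);
        CREATE TABLE monitoring_source (
            id INTEGER PRIMARY KEY,
            owner_user_id INTEGER NOT NULL
                REFERENCES users(id) ON DELETE RESTRICT
        );
        CREATE TABLE monitoring_source_grants (
            monitoring_source_id INTEGER NOT NULL
                REFERENCES monitoring_source(id),
            grantee_user_id INTEGER NOT NULL
                REFERENCES users(id) ON DELETE CASCADE
        );
        INSERT INTO users VALUES (1), (2);              -- 1 owns, 2 is a grantee
        INSERT INTO monitoring_source VALUES (10, 1);
        INSERT INTO monitoring_source_grants VALUES (10, 2);
        """
    )
    return conn


def try_delete_user(conn, user_id):
    """Attempt the delete; return False if a foreign key blocks it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute("DELETE FROM users WHERE id = ?", (user_id,))
        return True
    except sqlite3.IntegrityError:
        return False


if __name__ == "__main__":
    db = build_demo()
    print(try_delete_user(db, 2))  # grantee: allowed, grant rows cascade away
    print(try_delete_user(db, 1))  # owner: blocked until ownership is reassigned
```

Under this arrangement a departing grantee leaves no dangling grants, and an owner cannot be removed while instruments still point at them.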
13. Out of scope for v2
Recorded so future revisions know what v2 deliberately did not address.
- Specific platform plugins beyond the four worked examples in §8. wunderground, ecowitt, tempest_api, inat_project, birdcast, purpleair, oura_ring, withings_scale, strava_activity, utility_meter, and the dozen-plus other named platforms in §4 and §5.2 are expected but not specified in v2. Each requires its own plugin manifest and runtime entry points; the contract is in §8 but the implementations are deferred to phased adoption.
- The STRATA intelligence envelope itself (CNL-TN-2026-044). v2 specifies how DLD-provisioned instruments enter the STRATA contract; it does not respecify STRATA. The two documents are siblings.
- Public-facing (non-authenticated) data access. v2 assumes every consumer is an authenticated MNG user. A future "public Observatory" surface that exposes Earth/Life data without authentication is out of scope.
- Mobile app for participants. §8.5 (apple_health) refers to an iPhone companion app; that app's design is out of scope for v2.
- Email/notification system for instrument status. §7.6 specifies operator-visible status indicators in the DLD UI but does not specify push notifications, email digests, or pager-style alerting on persistent failures. A useful future addition.
- Public investigation workflow (the surface CNL-SP-2026-062 §11 hints at). v2 puts DLD inside Lab 07 as the first occupant; the broader Workflow Designer's other future tools (scheduled batch jobs, investigation pipelines, agent runbooks) are deferred to their own specifications.
- Migration of the personal MCP Collaboratory's MCP server. The personal MCP Collaboratory (Projects/Workbench/Collaboratory/) currently runs against MySQL with its own MCP server. Its eventual migration to read from the unified Postgres substrate is out of scope for TN-052 v2 — this is a separate planning conversation about how Mike's personal cognitive workspace transitions while preserving the workflow patterns documented in CNL-TN-2026-047 v0.3.
- STRATA's domain-specific intelligence-envelope implementations beyond Earth. §9.3 sketches Life, Home, and Self envelope concepts; the actual analytical code for phenology windows, circadian windows, etc. is STRATA's work to specify in its own future revisions.
References
[1] Hamilton, M. P. (2026). The Climate Analyst Data Logger Designer. CNL-TN-2026-052 v0.1, Canemah Nature Laboratory. (Superseded by this document.)
[2] Hamilton, M. P. (2026). MNG 2.0 Schema Design. CNL-TN-2026-056, Canemah Nature Laboratory.
[3] Hamilton, M. P. (2026). MNG-2.1 Bridge-Permanent Completion. CNL-TN-2026-066, Canemah Nature Laboratory.
[4] Hamilton, M. P. (2026). MNG 2.0 Bridge Collector Spike (v0.2). CNL-FN-2026-054, Canemah Nature Laboratory.
[5] Hamilton, M. P. (2026). MNG-2.1 Phase 4 Step 1: Lookup Cache Cutover. CNL-TN-2026-063, Canemah Nature Laboratory.
[6] Hamilton, M. P. (2026). Galatea Infrastructure Transition. CNL-TN-2026-064, Canemah Nature Laboratory.
[7] Hamilton, M. P. (2026). Macroscope Engine Replacement. CNL-TN-2026-065, Canemah Nature Laboratory.
[8] Hamilton, M. P. (2026). MNG UI Functional Specification (v0.1). CNL-SP-2026-062, Canemah Nature Laboratory.
[9] Hamilton, M. P. (2026). Lab 06 — Acoustic Phenology Instrument (v0.4). CNL-SP-2026-067, Canemah Nature Laboratory. (First declared consumer of DLD v2 cross-Lab call API.)
[10] Hamilton, M. P. (2026). The Macroscope Collaboratory: Architectural Pivot to MCP-Hosted Investigation Framework (v0.3). CNL-TN-2026-047, Canemah Nature Laboratory. (Personal MCP Collaboratory architecture; informs DLD's eventual cross-architecture consumer model.)
[11] Hamilton, M. P. (2026). Ground-Truthing the Sky: Evaluating Free Weather APIs and Community Station Networks Against Field Station Instruments for Citizen Science Ecological Monitoring. CNL-TN-2026-023, Canemah Nature Laboratory. (Empirical grounding for the Earth-domain three-tier hierarchy.)
[12] Hamilton, M. P. (2026). STRATA Sensor Plugin Architecture. CNL-TN-2026-044, Canemah Nature Laboratory. (Plugin contract DLD-provisioned instruments enter at substrate write completion.)
[13] Hamilton, M. P. (2026). YEA Labs: A Research Instrument Suite for Quantitative Ecological Analysis. CNL-TN-2026-030, Canemah Nature Laboratory. (Place catalog and user-interface lineage.)
[14] Open-Meteo (2026). Open-Meteo Weather API. https://open-meteo.com (accessed April 2026).
[15] Weather Underground (2026). PWS Network Overview. IBM / The Weather Company. https://www.wunderground.com/pws/overview (accessed April 2026).
[16] BirdWeather (2026). BirdWeather Stations and GraphQL API. https://app.birdweather.com (accessed April 2026).
[17] WeatherFlow (2026). Tempest Weather System. WeatherFlow-Tempest, Inc. https://weatherflow.com/tempest-weather-system/ (accessed April 2026).
[18] Allen Institute (2025). olmOCR-2: Document Ingestion for Academic Apparatus. https://allenai.org (cited in CNL-TN-2026-049 for the Collaboratory document pipeline).
End of CNL-TN-2026-052 v2.0.
Cite This Document
Permanent URL: https://canemah.org/archive/document.php?id=CNL-TN-2026-052