CNL-WP-2026-022 Working Paper

Toward a Large Sensor Model for Ecological Perception

Published: February 12, 2026
Version: 2

Leveraging Crowdsourced Environmental Infrastructure for a Foundation Model of Ecosystem Dynamics

Document ID: CNL-WP-2026-022
Date: February 12, 2026
Author: Michael P. Hamilton, Ph.D.
Project: Macroscope Ecological Observatory


AI Assistance Disclosure: This research proposal was developed with assistance from Claude (Anthropic). The AI contributed to literature synthesis, conceptual framework development, and manuscript drafting through extended dialogue. The author takes full responsibility for the content, accuracy, and conclusions.


Abstract

Google DeepMind’s GraphCast demonstrated that machine learning models trained on historical atmospheric data can outperform physics-based weather forecasting on 90% of verification targets. SOMA (Stochastic Observatory for Mesh Awareness), operating at Canemah Nature Laboratory, demonstrated that energy-based models trained on ecological sensor data can detect cross-domain anomalies invisible to single-domain analysis. This proposal argues that these results converge on an unrealized architectural target: a Large Sensor Model (LSM) for multi-domain ecological perception—a foundation model trained not on text tokens but on continuous environmental sensor streams, learning the joint probability distribution over atmospheric and biological variables across geographic space.

Critically, the infrastructure to train such a model already exists. WeatherFlow’s Tempest network operates 85,000+ standardized weather stations worldwide. BirdWeather maintains approximately 2,000 active acoustic monitoring stations running BirdNET species classification. iNaturalist has accumulated over 250 million verifiable observations—170 million at research grade—spanning all major taxonomic groups with geolocated timestamps. These three platforms collectively provide the training corpus for a foundation model of ecosystem dynamics, requiring no new hardware deployment.

We propose a phased development path: beginning with paired-site validation (Canemah, Oregon and Bellingham, Washington), expanding through the UC Natural Reserve System’s 41-site network with existing climate monitoring infrastructure, and scaling to the full crowdsourced network. The resulting model would learn weather-biodiversity coupling across continental ecological gradients, producing predictions of ecosystem state whose deviations from observation constitute an anomaly detection system of unprecedented scope.


1. Introduction

1.1 The Problem

Ecological monitoring generates vast quantities of data with limited integration. Weather stations report atmospheric conditions. Acoustic monitors detect bird species. Camera traps record mammals. Citizen scientists photograph plants, insects, and fungi. Each stream is analyzed independently or, at best, correlated post hoc through statistical methods that require explicit hypothesis specification. No system learns the joint distribution over environmental and biological variables—the statistical structure of how ecosystems actually behave.

This limitation matters because ecological dynamics are fundamentally cross-domain. Barometric pressure affects bird vocalization patterns. Soil moisture modulates insect emergence. Photoperiod drives phenological transitions that cascade through food webs. These couplings are nonlinear, context-dependent, and vary across geography and season. Rule-based systems cannot enumerate them. Statistical correlation requires knowing what to look for. What is needed is a model that learns the structure from data, the way a field ecologist accumulates intuition through decades of observation.

1.2 The GraphCast Precedent

GraphCast (Lam et al. 2023), published in Science, demonstrated that a graph neural network trained on 39 years of atmospheric reanalysis data can predict 227 weather variables at 0.25° global resolution for 10-day horizons, outperforming physics-based forecasting on 90% of 1,380 verification targets. The model generates forecasts in under one minute on a single machine, versus hours on supercomputers for conventional numerical weather prediction.

GraphCast’s significance extends beyond weather. It showed that the statistical structure of a complex physical system, the atmosphere, can be learned from historical observations alone, without encoding any physics equations. The model is a graph neural network operating over an icosahedral mesh, where nodes represent spatial locations and edges encode how atmospheric state propagates between them. The learned weights embody the dynamics of a far-from-equilibrium thermodynamic system.

GenCast (Price et al. 2023) extended this with diffusion-based ensemble forecasting, generating probabilistic predictions that quantify forecast uncertainty—a capability directly aligned with the energy-based probabilistic framework already demonstrated in ecological contexts.

1.3 SOMA: Proof of Concept at Site Scale

SOMA (Hamilton 2026) implements three Restricted Boltzmann Machine meshes at Canemah Nature Laboratory, trained on 118 days of weather and acoustic biodiversity data. The ecosystem mesh—65 visible nodes encoding weather variables and species detection patterns, connected to 100 hidden nodes—successfully detected a cross-domain anomaly where weather and species conditions were individually unremarkable but their combination violated learned expectations.

This result established a critical principle: joint distribution modeling across ecological domains captures structure that domain-specific monitoring misses. The RBM weights encode learned correlations between atmospheric conditions and biological activity. When incoming sensor data violates those learned expectations, the mesh registers mathematical tension rather than calculating a derived metric—embodied perception rather than representational comparison.
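As a minimal sketch of how such tension could be quantified, the standard free energy of a binary Restricted Boltzmann Machine is one candidate: configurations the mesh has learned to expect receive low free energy, and unexpected combinations receive high free energy. This is an assumption for illustration, not SOMA's documented formula, and the weights below are hand-set toy values rather than trained parameters.

```python
import math

def softplus(x):
    # Numerically stable log(1 + exp(x)).
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))

def free_energy(v, W, b, c):
    """Binary RBM free energy: F(v) = -b.v - sum_j softplus(c_j + W_j.v).

    v: visible vector, W: one weight row per hidden unit,
    b: visible biases, c: hidden biases. Lower F means more expected.
    """
    visible_term = sum(b_i * v_i for b_i, v_i in zip(b, v))
    hidden_term = sum(
        softplus(c_j + sum(w * v_i for w, v_i in zip(row, v)))
        for row, c_j in zip(W, c)
    )
    return -visible_term - hidden_term

# Hypothetical toy mesh, NOT SOMA's trained weights: two visible units
# ("heavy rain", "dawn chorus active") and two hidden units, hand-set so
# that either condition alone is familiar but both together are not.
W = [[4.0, -4.0], [-4.0, 4.0]]
b = [0.0, 0.0]
c = [-2.0, -2.0]

rain_only = free_energy([1, 0], W, b, c)
chorus_only = free_energy([0, 1], W, b, c)
both = free_energy([1, 1], W, b, c)
print(rain_only, chorus_only, both)
```

In this toy case each condition on its own sits in a low-energy state, while the combination has higher free energy, which is the kind of cross-domain signal described above.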

1.4 The Convergence

GraphCast is, in essence, a Large Sensor Model scoped to a single domain (atmosphere) at planetary scale. SOMA is a Large Sensor Model scoped to multiple domains (atmosphere and biodiversity) at site scale. The architectural trajectory connecting them—from site-scale Boltzmann machines through stacked deep networks to graph neural networks operating across geographic space—is well-established in the machine learning literature.

What has been missing is the recognition that the training infrastructure for an ecological foundation model already exists, deployed and operating, generating data continuously, accessible through standardized APIs.


2. The Existing Infrastructure

2.1 WeatherFlow Tempest Network

WeatherFlow-Tempest operates over 85,000 consumer weather stations worldwide, producing billions of observational data points daily. Each station reports identical variables through a standardized API: temperature, humidity, barometric pressure, wind speed and direction, solar radiation, UV index, and precipitation at one-minute intervals. The stations are factory-calibrated consumer hardware—not research grade individually, but collectively powerful. Known biases (solar radiation heating from poor siting, wind obstruction) are characterizable, and a foundation model trained on thousands of stations learns to see through these biases because the signal is consistent while the noise is random across deployments. The network is dense in North America, Europe, and Australia, with growing coverage globally.

2.2 BirdWeather Acoustic Network

BirdWeather operates approximately 2,000 active acoustic monitoring stations globally, running continuous BirdNET neural network classification against audio streams. Each station reports species detections with timestamps and confidence scores, drawing on a classifier that covers over 6,000 recognized species. The PUC (Physical Universe Codec) hardware includes dual microphones, environmental sensors, GPS, and on-board neural processing. BirdNET-Pi stations running on Raspberry Pi hardware extend the network further. The system produces continuous presence/absence data for avian species, arguably the most ecologically informative vertebrate taxon for phenological and community monitoring.

The first large-scale scientific use of the BirdWeather detection library—a study of light pollution effects on bird vocalization timing across species, space, and seasons—demonstrates the platform’s research utility.

2.3 iNaturalist Biodiversity Observations

iNaturalist has accumulated over 250 million verifiable observations from nearly 4 million observers worldwide, with approximately 170 million observations achieving research-grade identification through community consensus. The platform’s computer vision model recognizes over 112,000 taxa.

For the purposes of an ecological foundation model, iNaturalist provides what acoustic monitoring cannot: observations across all major taxonomic groups—plants, insects, fungi, amphibians, reptiles, and mammals—with geolocated timestamps and, critically, phenological annotations. Flowering dates, fruiting times, insect emergence, migration arrivals—the temporal structure of ecological communities is encoded in the collective observation record. The iNaturalist API provides programmatic access to research-grade observations filtered by taxon, location, date, and quality grade.

The data are episodic rather than continuous: they cluster around population centers and weekends and are biased toward charismatic taxa. These biases are characterizable, however, and the sheer volume provides statistical power. A foundation model trained on these data does not need every observation to be equally reliable; it needs the aggregate distribution to be ecologically meaningful.

2.4 UC Natural Reserve System

The UC Natural Reserve System (NRS) comprises 41 reserves covering over 750,000 acres across California, representing most of the state’s major habitat types. The NRS Climate Monitoring Network operates 35 standardized weather stations at 30 reserves, all feeding into the Dendra cyberinfrastructure platform for real-time data storage, retrieval, and management. Additionally, the California Heartbeat Initiative (CHI) has deployed research-grade sensor packages measuring weather, photosynthetically active radiation, leaf wetness, soil moisture, and sap flow across multiple reserves.

The NRS is actively developing AI-powered wildlife monitoring—acoustic recorders paired with camera traps using on-board species identification, with real-time data transmission via satellite uplink. The California Department of Fish and Wildlife has adopted this methodology for 42 sentinel sites statewide.

The NRS represents a bridge between crowdsourced consumer infrastructure and research-grade scientific monitoring. Its reserves span California’s ecological gradients—from coastal tidepools to alpine peaks, from redwood forests to inland deserts—with standardized instrumentation, professional maintenance, and long-term data archives. For a foundation model, NRS reserves provide ground truth against which crowdsourced data can be calibrated.


3. Architecture

3.1 Domain Scope: EARTH and LIFE

We restrict the initial model to two domains: EARTH (atmospheric and environmental conditions) and LIFE (biodiversity patterns across taxa). This scoping decision is deliberate. Weather and biodiversity sensors produce relatively standardized, well-characterized data streams. The physics of weather-biology coupling is universal—every ecosystem on Earth experiences it. And the crowdsourced infrastructure described above provides dense coverage for precisely these two domains.

Excluding the Macroscope’s HOME and SELF domains eliminates idiosyncratic, non-generalizable data streams while focusing on what transfers across sites, ecosystems, and investigators. A model that learns how atmosphere and biosphere couple is scientifically general. A model that includes one person’s indoor temperature and sleep patterns is not.

3.2 Geographic Graph Structure

Each monitoring location becomes a node in a geographic graph. The graph structure is provided by biogeography itself—sites within the same ecoregion share species pools, climate drivers, and seasonal patterns. Edges connect nearby stations, with edge weights reflecting ecological similarity (shared species, correlated climate) rather than geographic distance alone.

A Tempest station in Portland and a Tempest station in Medford share Pacific Northwest weather patterns but differ in species communities. A BirdWeather station at sea level and one at 5,000 feet may be geographically close but ecologically distant. The graph neural network learns these relationships from data, discovering which connections carry ecological information.
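One way to seed such a graph before any learning takes place is a hand-built prior that multiplies a distance decay by the overlap in species lists. The sketch below assumes a Jaccard index for "shared species" and an exponential decay with a 200 km length scale; both choices, the field names, and the example species lists are hypothetical, and the proposal's intent is that the network ultimately learns these weights from data.

```python
import math

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points in kilometres."""
    r = 6371.0  # mean Earth radius
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

def jaccard(a, b):
    """Share of species common to both lists (0 = none, 1 = identical)."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)

def edge_weight(site_a, site_b, length_scale_km=200.0):
    """Prior edge weight in [0, 1]: distance decay times species overlap."""
    d = haversine_km(site_a["lat"], site_a["lon"], site_b["lat"], site_b["lon"])
    proximity = math.exp(-d / length_scale_km)
    return proximity * jaccard(site_a["species"], site_b["species"])

# Approximate city coordinates; species lists are invented for illustration.
portland = {"lat": 45.52, "lon": -122.68,
            "species": ["Pacific Wren", "Varied Thrush", "Steller's Jay"]}
medford = {"lat": 42.33, "lon": -122.87,
           "species": ["Steller's Jay", "Oak Titmouse", "Acorn Woodpecker"]}

print(round(haversine_km(45.52, -122.68, 42.33, -122.87)), "km apart")
print(edge_weight(portland, medford))
```

Two sites with identical coordinates and species lists receive weight 1.0; the weight falls toward 0 as either distance grows or species overlap shrinks.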

This is the GraphCast architecture adapted to ecology. Where GraphCast’s icosahedral mesh tiles Earth’s surface uniformly, the ecological graph is irregular—dense where monitoring stations cluster, sparse where they do not. The model must learn to interpolate across gaps, a capability that graph neural networks handle naturally through message passing between connected nodes.
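The gap-filling idea can be illustrated with the simplest possible form of message passing: an unobserved node takes the edge-weighted mean of its observed neighbors. This is a fixed aggregation rule chosen for clarity; a trained graph neural network would instead learn the message and update functions. Node names and values are hypothetical.

```python
def message_pass(values, edges, rounds=1):
    """Fill unobserved nodes from their neighbors.

    values: dict node -> float, or None where there is no observation.
    edges:  dict (node_a, node_b) -> weight, treated as undirected.
    Observed nodes are left unchanged.
    """
    neighbors = {}
    for (a, b), w in edges.items():
        neighbors.setdefault(a, []).append((b, w))
        neighbors.setdefault(b, []).append((a, w))

    state = dict(values)
    for _ in range(rounds):
        updated = dict(state)
        for node, val in state.items():
            if val is not None:
                continue
            known = [(state.get(n), w) for n, w in neighbors.get(node, [])
                     if state.get(n) is not None]
            total = sum(w for _, w in known)
            if total > 0:
                updated[node] = sum(v * w for v, w in known) / total
        state = updated
    return state

# Hypothetical three-node chain with a gap in the middle.
values = {"coast": 10.0, "gap": None, "valley": 20.0}
edges = {("coast", "gap"): 1.0, ("gap", "valley"): 3.0}
filled = message_pass(values, edges)
print(filled["gap"])  # 17.5, pulled toward the more strongly connected node
```

Running more rounds lets information travel across several consecutive gaps, which mirrors how repeated message-passing layers widen a node's receptive field.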

3.3 Multi-Stream Input Encoding

Each node receives three classes of input, each with different temporal characteristics:

Continuous streams (Tempest): Weather variables at one-minute intervals. Temperature, humidity, pressure, wind, solar radiation, precipitation. These form the backbone temporal signal—the heartbeat of the EARTH domain.

Continuous acoustic streams (BirdWeather): Species detections at irregular intervals, aggregated into activity profiles per hour or per 15-minute window. Species presence/absence, detection confidence, vocalization timing. The LIFE domain’s continuous signal.

Episodic observations (iNaturalist): Species occurrences with timestamps, geolocated but temporally sparse. Plants, insects, fungi, mammals—taxa invisible to acoustic monitoring. Aggregated into monthly or seasonal phenological profiles per grid cell.
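The aggregation of irregular acoustic detections into fixed windows might look like the following sketch. The record layout (`timestamp`, `species`, `confidence`) and the 0.7 confidence cutoff are assumptions made for illustration and do not reflect the BirdWeather API schema.

```python
from collections import Counter
from datetime import datetime

def bin_detections(detections, window_minutes=15, min_confidence=0.7):
    """Count detections per (window start, species).

    window_minutes should divide 60 evenly so windows align to the hour.
    Detections below min_confidence are dropped.
    """
    counts = Counter()
    for det in detections:
        if det["confidence"] < min_confidence:
            continue
        ts = datetime.fromisoformat(det["timestamp"])
        floored = ts.minute - ts.minute % window_minutes
        window_start = ts.replace(minute=floored, second=0, microsecond=0)
        counts[(window_start, det["species"])] += 1
    return counts

# Hypothetical detections around dawn.
detections = [
    {"timestamp": "2026-02-12T06:47:10", "species": "Pacific Wren", "confidence": 0.91},
    {"timestamp": "2026-02-12T06:52:30", "species": "Pacific Wren", "confidence": 0.84},
    {"timestamp": "2026-02-12T06:58:00", "species": "Varied Thrush", "confidence": 0.40},
    {"timestamp": "2026-02-12T07:01:05", "species": "Pacific Wren", "confidence": 0.88},
]
profile = bin_detections(detections)
for (start, species), n in sorted(profile.items()):
    print(start.isoformat(), species, n)
```

The resulting counts per window form the activity profile; windows with no entry are implicit zeros, which matters later for absence detection.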

The architectural challenge is fusing these streams—continuous weather, continuous acoustics, and episodic community observations—into a coherent representation. Transformer architectures with learned temporal encodings handle variable sampling rates and irregular gaps naturally. Each input type receives its own positional encoding scheme: absolute timestamps for Tempest, event-based encoding for BirdWeather, and seasonal cyclical encoding for iNaturalist aggregates.
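A seasonal cyclical encoding is commonly built by mapping day of year onto the unit circle, so that late December and early January sit next to each other rather than at opposite ends of a number line. The sketch below shows that standard construction; whether the eventual model uses fixed or learned encodings is left open here.

```python
import calendar
import math
from datetime import date

def seasonal_encoding(d):
    """Map a date to (sin, cos) of its position in the annual cycle."""
    days_in_year = 366 if calendar.isleap(d.year) else 365
    angle = 2 * math.pi * (d.timetuple().tm_yday - 1) / days_in_year
    return (math.sin(angle), math.cos(angle))

new_year = seasonal_encoding(date(2026, 1, 1))
year_end = seasonal_encoding(date(2025, 12, 31))
midsummer = seasonal_encoding(date(2026, 7, 2))

# Adjacent calendar days stay close in encoded space across the year boundary.
print(math.dist(new_year, year_end))
print(math.dist(new_year, midsummer))
```

The same construction applies to hour of day for the diurnal cycle by replacing day of year with minutes since midnight.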

3.4 Temporal Hierarchy

Following the temporal topology described in CNL-TN-2026-014, the model embeds multiple timescales in its architecture:

Surface layer: Current conditions—the state of the atmosphere and biosphere right now. Updated with each sensor reading.

Diurnal layer: Daily patterns—dawn chorus timing, temperature cycling, nocturnal activity. Encodes what this time of day should feel like.

Seasonal layer: Phenological rhythms—when species should be active, what temperatures are normal for this week of the year. Encodes what this season should feel like.

Interannual layer: Climate context—ENSO state, long-term trends, multi-year baselines. Encodes what this year should feel like relative to the historical record.

Each layer provides context for the layers above it. A temperature anomaly at the surface layer is evaluated against the diurnal norm, the seasonal expectation, and the interannual trend simultaneously. The model does not report “temperature is 72°F” but rather “this February afternoon, in this La Niña year, feels warmer than it should.”
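A deliberately simple stand-in for this layered evaluation is to score one reading against a separate baseline per timescale. The sketch uses plain z-scores and invented baseline numbers; the proposed model would learn these expectations jointly rather than look them up.

```python
def layered_anomaly(observed, baselines):
    """Return a z-score per temporal layer.

    baselines maps layer name -> (expected_mean, expected_std).
    """
    return {layer: (observed - mean) / std
            for layer, (mean, std) in baselines.items()}

# Hypothetical baselines in degrees F for one node; illustrative numbers only.
baselines = {
    "diurnal": (58.0, 4.0),      # this hour of day over recent weeks
    "seasonal": (52.0, 5.0),     # this week of the year, all years
    "interannual": (50.0, 5.5),  # this week of the year, La Nina years only
}
scores = layered_anomaly(72.0, baselines)
for layer, z in scores.items():
    print(f"{layer}: {z:+.1f} standard deviations")
```

A reading that is unusual at every layer, as in this example, is a stronger signal than one that is unusual only against the diurnal norm.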

3.5 Output: Prediction and Anomaly

The model’s primary output is a predicted state vector for each node at the next time step—expected weather conditions and expected biological activity given the current state and all contextual layers. The predicted state spans all input dimensions: expected temperature, expected pressure, expected species activity by hour, expected phenological state.

The deviation between prediction and observation is the anomaly signal. Large deviations indicate surprise: the ecosystem is doing something the model has not encountered in this context. The deviation can be decomposed by domain (is the surprise in weather, in species, or in their coupling?), by timescale (is this a daily anomaly or a seasonal one?), and by spatial extent (is this local to one node or propagating across the graph?).
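The decomposition by domain can be sketched as a partition of squared prediction error. The example assumes all variables are already standardized to comparable units, and the variable names and domain labels are hypothetical. Note that this simple partition attributes error to EARTH or LIFE only; isolating a pure coupling term would need something more, such as comparing the joint model against single-domain models.

```python
def decompose_anomaly(predicted, observed, domains):
    """Share of total squared error contributed by each domain.

    predicted, observed: dict variable -> standardized value.
    domains: dict variable -> domain label.
    """
    errors = {}
    for name, pred in predicted.items():
        squared = (observed[name] - pred) ** 2
        errors[domains[name]] = errors.get(domains[name], 0.0) + squared
    total = sum(errors.values())
    return {d: (e / total if total else 0.0) for d, e in errors.items()}

# Hypothetical standardized state for one node at one time step.
predicted = {"temperature": 0.0, "pressure": 0.0, "wren_activity": 0.0}
observed = {"temperature": 1.0, "pressure": 0.0, "wren_activity": 3.0}
domains = {"temperature": "EARTH", "pressure": "EARTH", "wren_activity": "LIFE"}

shares = decompose_anomaly(predicted, observed, domains)
print(shares)  # most of the surprise here sits in the LIFE domain
```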

This is SOMA’s tension signal scaled to continental scope with deep temporal context.


4. Training Corpus

4.1 Scale

The combined data corpus is substantial:

Tempest: 85,000+ stations × one-minute readings × years of archive. At 15 variables per station, this represents billions of weather state vectors.

BirdWeather: 2,000+ stations × continuous detection streams × years of operation. Millions of species detection events with temporal and environmental context.

iNaturalist: 170+ million research-grade observations spanning all major taxa, geolocated and timestamped, with phenological annotations for plants.

UC NRS: 35 climate stations with professional-grade, quality-controlled data from 2013 onward, plus Dendra-managed sensor archives from individual reserves extending back decades in some cases.
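The Tempest figure can be checked with direct arithmetic. The calculation below is an upper bound that assumes every one of the 85,000 stations reports every minute for a full year, which real networks will not achieve because of outages and staggered deployment dates.

```python
STATIONS = 85_000
READINGS_PER_DAY = 24 * 60   # one-minute cadence
VARIABLES = 15               # per-station variable count used in the text

readings_per_year = STATIONS * READINGS_PER_DAY * 365
values_per_year = readings_per_year * VARIABLES

print(f"{readings_per_year:,} state vectors per year")   # 44,676,000,000
print(f"{values_per_year:,} scalar values per year")     # 670,140,000,000
```

Even a fraction of this ceiling supports the "billions of weather state vectors" figure above.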

4.2 Comparison to GraphCast

GraphCast trained on 39 years of ERA5 reanalysis data: 227 variables at ~1 million grid points, sampled every 6 hours. The ecological training corpus differs in structure—irregular spatial distribution, heterogeneous sampling rates, mixed continuous and episodic streams—but is comparable or larger in total information content.
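For reference, the size of the GraphCast training grid follows from the figures quoted above. The sketch assumes a 0.25° latitude-longitude grid that includes both poles and a 365.25-day average year. Raw value counts of this kind measure volume only and are not the same thing as information content.

```python
# 0.25-degree global grid: 180/0.25 + 1 latitudes (both poles), 360/0.25 longitudes.
lat_points = int(180 / 0.25) + 1
lon_points = int(360 / 0.25)
grid_points = lat_points * lon_points

# Six-hourly sampling over 39 years.
time_steps = int(39 * 365.25 * 4)

variables = 227
total_values = grid_points * time_steps * variables

print(f"{grid_points:,} grid points")   # 1,038,240
print(f"{time_steps:,} time steps")     # 56,979
print(f"{total_values:.2e} scalar values")
```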

The key advantage of ecological data is its lower entropy relative to atmospheric dynamics. The number of meaningfully different ecosystem states at a given location is far smaller than the number of meaningful atmospheric configurations. Weather is chaotic; ecosystems, while complex, are heavily constrained by biogeography, phenology, and energetics. The learnable manifold is more compact, suggesting that effective training may require less data per location than atmospheric modeling demands.

4.3 Data Quality and Bias

Consumer weather stations have known biases. BirdNET classifications carry false positive rates. iNaturalist observations cluster around cities and weekends. These limitations are real but manageable for three reasons.

First, systematic biases are characterizable and consistent within each platform. A foundation model trained on thousands of Tempest stations learns the systematic offset between consumer and research-grade measurements.

Second, random noise averages out across the network. Any individual station may have poor siting, but the statistical signal across thousands of stations in a region reflects actual atmospheric state.

Third, the NRS network provides calibration anchors. Research-grade stations at 30 reserves across California’s ecological gradients serve as ground truth against which nearby consumer stations can be implicitly calibrated through the learned model.
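The distinction between the first two points, and the role of the third, can be shown with a small simulation. Averaging many stations shrinks random noise roughly as 1/sqrt(N) but leaves a shared systematic offset untouched; comparison with an unbiased anchor is what exposes that offset. All numbers are hypothetical, and the explicit subtraction stands in for what the proposal expects the model to learn implicitly.

```python
import random
import statistics

rng = random.Random(0)  # fixed seed so the sketch is repeatable

TRUE_TEMP_C = 10.0     # hypothetical true regional temperature
SHARED_BIAS_C = 1.5    # hypothetical systematic consumer-station offset
NOISE_SD_C = 2.0       # hypothetical per-station random siting noise

def network_mean(n_stations):
    """Mean reading across n simulated consumer stations."""
    readings = [TRUE_TEMP_C + SHARED_BIAS_C + rng.gauss(0.0, NOISE_SD_C)
                for _ in range(n_stations)]
    return statistics.fmean(readings)

many = network_mean(2500)

# Averaging removes noise but not the shared bias.
print(f"network mean: {many:.2f} (truth {TRUE_TEMP_C}, biased truth {TRUE_TEMP_C + SHARED_BIAS_C})")

# A research-grade anchor, assumed unbiased here, reveals the offset.
anchor_reading = TRUE_TEMP_C
estimated_bias = many - anchor_reading
print(f"estimated consumer bias: {estimated_bias:.2f}")

# Expected noise level of the mean at different network sizes.
for n in (1, 100, 10_000):
    print(n, NOISE_SD_C / n ** 0.5)
```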


5. Development Phases

Phase 1: Paired-Site Validation (Current–Near Term)

Sites: Canemah Nature Laboratory (Oregon City, OR) and Owl Farm (Bellingham, WA).

Data: Tempest weather stations and BirdWeather acoustic monitors at both locations. Two years of Macroscope archive data at Canemah; new deployment at Bellingham.

Architecture: Extend SOMA from single-site RBMs to a two-node graph with shared hidden layers. Train a joint model that learns what is common to Pacific Northwest ecology (shared weather-biology coupling dynamics) and what is site-specific (different species pools, different coastal influence, different latitude).

Validation: Does the joint model outperform site-specific models at anomaly detection? Does knowledge transfer occur—does training on Canemah data improve predictions at Bellingham?

Compute: M4 Max laptop (Data), Mac Mini M4 Pro (Galatea). CPU and Metal-accelerated JAX.

Phase 2: UC Natural Reserve System Expansion (Medium Term)

Sites: 5–10 UC NRS reserves selected for sensor infrastructure quality and ecological diversity. Priority candidates include James San Jacinto Mountains Reserve (montane), Blue Oak Ranch Reserve (oak savanna), Angelo Coast Range Reserve (temperate rainforest), Sedgwick Reserve (coastal), and Sagehen Creek Field Station (subalpine).

Data: NRS Climate Monitoring Network (Dendra), supplemented with BirdWeather and/or BirdNET-Pi deployments at selected reserves. iNaturalist observations within reserve boundaries and surrounding landscapes.

Architecture: Graph neural network with NRS reserves as nodes. Edges defined by ecological similarity and geographic proximity. Expand visible dimensions to include soil moisture, sap flow, and leaf wetness from CHI sensor packages where available.

Validation: Cross-reserve prediction accuracy. Can the model predict phenological timing at one reserve given weather and species data from neighboring reserves? Can it detect climate-driven anomalies (atmospheric rivers, heat domes, drought onset) through their cross-domain ecological signatures?

Collaboration: The NRS’s existing Dendra infrastructure, professional station maintenance, and data quality protocols provide the institutional framework for this phase. The author’s 36-year history with the NRS—including 26 years directing the James Reserve and 10 years at Blue Oak Ranch—provides the relationships and institutional knowledge necessary to negotiate data access and collaborative deployment.

Phase 3: Crowdsourced Network Integration (Longer Term)

Sites: All Tempest stations, BirdWeather stations, and iNaturalist observations within selected geographic regions, starting with the Pacific Coast of North America (dense monitoring coverage, strong ecological gradients, high iNaturalist activity).

Data: Full API access to Tempest, BirdWeather, and iNaturalist platforms. Tens of thousands of nodes with heterogeneous data streams.

Architecture: Scaled graph neural network with learned spatial embeddings. Consumer stations connect to their regional neighbors. NRS research stations serve as calibration anchors with higher confidence weights. iNaturalist observations attach to the nearest geographic node as episodic phenological context.
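Attaching an episodic observation to its nearest node could be sketched as a great-circle nearest-neighbor lookup with a distance cutoff. The node coordinates below are rough approximations and the 25 km cutoff is an arbitrary illustrative choice; a production system would likely use a spatial index rather than a linear scan.

```python
import math

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points in kilometres."""
    r = 6371.0
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

def nearest_node(obs, nodes, max_km=25.0):
    """Return the id of the closest node, or None if none is within max_km."""
    best_id, best_d = None, float("inf")
    for node_id, (lat, lon) in nodes.items():
        d = haversine_km(obs["lat"], obs["lon"], lat, lon)
        if d < best_d:
            best_id, best_d = node_id, d
    return best_id if best_d <= max_km else None

# Approximate, illustrative coordinates for two monitoring nodes.
nodes = {"canemah": (45.36, -122.61), "bellingham": (48.75, -122.48)}

print(nearest_node({"lat": 45.40, "lon": -122.60}, nodes))  # canemah
print(nearest_node({"lat": 47.00, "lon": -122.50}, nodes))  # None, too far from both
```

Observations that fall outside every node's radius are left unattached rather than forced onto a distant station.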

Capabilities at scale:

Continental anomaly detection. The model knows what February should feel and sound like from San Diego to Bellingham. When spring arrives two weeks early in one region but not its neighbor, the model detects the spatial boundary of the phenological shift.

Climate-ecology coupling discovery. Cross-domain relationships emerge from training—the model discovers that Pacific Decadal Oscillation state modulates breeding chronology across the coast range, or that atmospheric river events trigger invertebrate emergence pulses that cascade through avian communities. These discoveries arise from the model’s learned weights, not from investigator hypotheses.

Absence and silence detection. Following SOMA’s demonstrated capability for absence-as-signal, the scaled model detects what should be present but is not. Missing species at expected phenological windows. Silent dawn choruses. Failed fruiting events. These absences are ecologically significant and invisible to presence-only monitoring.

Transfer learning across ecosystems. The deep layers encode general ecological dynamics that transfer across sites. The shallow layers encode local character. A new monitoring station—a single Tempest and BirdWeather deployment—can join the network and begin receiving contextualized anomaly detection within days of deployment, bootstrapped by the foundation model’s general ecological knowledge.
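The absence and silence capability listed above can be made concrete with surprisal: if the model expects a species with probability p in a given window and nothing is detected, the surprise is -log(1 - p). The sketch treats species independently and uses invented probabilities; both are simplifying assumptions, since the proposed model would predict these expectations jointly.

```python
import math

def absence_surprise(expected_prob, detected):
    """Surprisal in nats for each species that was expected but not detected.

    expected_prob: dict species -> predicted detection probability for the window.
    detected: set of species actually detected in the window.
    """
    surprise = {}
    for species, p in expected_prob.items():
        if species not in detected:
            p = min(p, 1.0 - 1e-9)  # avoid log(0) for p == 1
            surprise[species] = -math.log(1.0 - p)
    return surprise

# Hypothetical dawn-window expectations for one station.
expected = {"Pacific Wren": 0.95, "Varied Thrush": 0.10, "Song Sparrow": 0.80}
detected = {"Song Sparrow"}

result = absence_surprise(expected, detected)
for species, s in sorted(result.items(), key=lambda kv: -kv[1]):
    print(f"{species}: {s:.2f} nats")
```

A missing wren that was expected with 95% probability carries far more surprise than a missing thrush that was unlikely anyway, which is why absence only becomes informative once there is a learned expectation to compare against.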

Phase 4: Open Platform

Specification: Publish standardized sensor deployment protocols, data format specifications, API integration requirements, and model architecture documentation.

Access: Any site that deploys a Tempest station and a BirdWeather monitor (total cost under $1,000) can join the network. The foundation model fine-tunes on their local data. The graph grows.

Community: Leverage the existing communities—150,000+ Tempest users, thousands of BirdWeather operators, nearly 4 million iNaturalist observers—as both data contributors and model validators. Citizen scientists who know their local ecosystems provide ground truth that no remote system can match.


6. Relationship to Existing Work

6.1 What Exists

Foundation models for remote sensing are under active development. NASA’s Prithvi, IBM’s GeospatialFM, and Clay have demonstrated pre-training on satellite imagery for land cover classification and change detection. The ESA-sponsored “Foundation Models for Climate and Society” initiative targets ice, drought, and flood-zone mapping.

BirdCast, a collaboration between Cornell Lab of Ornithology, Colorado State University, and University of Massachusetts, uses radar data and machine learning to forecast nocturnal bird migration across the United States in real time.

The PNAS perspective “A synergistic future for AI and ecology” (2023) calls explicitly for convergence between ecological science and AI, noting that “challenges that are commonplace in multiscale, context-dependent, and imperfectly observed ecological systems offer a panoply of problems through which AI moves closer to realizing its full potential.”

6.2 What Does Not Exist

No existing system learns the joint distribution over ground-truth atmospheric and biological variables across geographic space. Remote sensing foundation models observe from above—they detect land cover change but not species vocalization patterns. BirdCast predicts migration volume from radar, not species-specific activity from acoustic detection. Weather AI predicts atmospheric state but not its ecological consequences.

The proposed LSM operates at the intersection: ground-truth weather coupled with ground-truth biodiversity, learned jointly, at the spatial resolution of individual monitoring stations rather than satellite pixels. It would be the first model capable of predicting not just what the atmosphere will do tomorrow, but what the ecosystem will do—which species will be active, when the dawn chorus should begin, whether the phenological calendar is on track.

6.3 The DeepMind Observation

The GraphCast developers noted that their technology could be extended to “climate and ecology, energy, agriculture, and human and biological activity.” They described the vision of this proposal without building it. The reason it has not been built is not technical—the architecture exists, the compute is accessible, the training data is available. The reason is that it requires someone who understands both the ecological systems and the machine learning architecture, and who has access to the institutional relationships necessary to integrate research-grade and crowdsourced data networks.


7. Technical Requirements

7.1 Compute

Phases 1–2 operate within the capacity of Apple Silicon hardware with Metal acceleration. The M4 Max with 128GB unified memory supports models with thousands of input dimensions and millions of parameters. Phase 3 may require cloud compute for initial training (comparable to GraphCast’s training requirements scaled down by the ratio of spatial nodes), but inference—the ongoing monitoring operation—runs on modest hardware.

7.2 Software

JAX provides the computational framework, consistent with both SOMA’s existing implementation and GraphCast’s open-source codebase. Flax or Haiku provides neural network building blocks, including attention mechanisms. The entire stack runs on the same platform.

7.3 Data Access

All three crowdsourced platforms provide API access. Tempest offers REST and WebSocket APIs for real-time and historical data. BirdWeather provides open APIs for detection data. iNaturalist supports programmatic access to research-grade observations with geographic and taxonomic filtering. The NRS Dendra platform provides API-based access to climate monitoring data.
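As an illustration of the iNaturalist case, the sketch below builds a query URL for research-grade observations within a radius of a point and between two dates, using the public v1 observations endpoint. It only constructs the URL and makes no network request; the coordinates are approximate and the helper name is hypothetical. Rate limits, pagination, and terms of use should be checked against the current API documentation before any bulk retrieval.

```python
from urllib.parse import urlencode

INAT_OBSERVATIONS = "https://api.inaturalist.org/v1/observations"

def inat_query_url(lat, lng, radius_km, start, end, taxon_id=None, per_page=200):
    """Build a URL for research-grade observations near a point.

    start and end are ISO dates (YYYY-MM-DD). taxon_id is optional.
    """
    params = {
        "quality_grade": "research",
        "lat": lat,
        "lng": lng,
        "radius": radius_km,
        "d1": start,
        "d2": end,
        "per_page": per_page,
    }
    if taxon_id is not None:
        params["taxon_id"] = taxon_id
    return f"{INAT_OBSERVATIONS}?{urlencode(params)}"

# Approximate coordinates near Oregon City, OR; 10 km radius; one month.
url = inat_query_url(45.36, -122.61, 10, "2026-01-01", "2026-01-31")
print(url)
```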

7.4 No New Hardware

This is the proposal’s most distinctive feature. The training corpus exists. The sensors are deployed. The APIs are live. The compute fits on a desk. The only infrastructure that must be built is the model itself.


8. Expected Outcomes

A trained Large Sensor Model for ecological perception would produce:

Continental-scale anomaly detection. Real-time identification of ecosystem departures from learned expectations, decomposed by domain, timescale, and spatial extent. Early warning for phenological shifts, population crashes, invasive species establishment, and climate regime transitions.

Discovered ecological relationships. Cross-domain couplings encoded in the model’s learned weights, discoverable through attribution analysis. Relationships that field ecologists suspected but could not quantify, and relationships that no one anticipated.

Predictive ecological state. Next-day, next-week, next-season predictions of biodiversity activity at every monitored location. Not weather forecasting, but ecosystem forecasting—what the landscape should feel and sound like tomorrow.

Scalable citizen science integration. A framework that turns every $300 Tempest station and every iNaturalist observation into a node in a continental perception system. The value of each observation increases as the network grows, because the model provides context that makes local data more interpretable.

A new paradigm for ecological observation. The transition from representational monitoring (measuring, storing, comparing) to embodied monitoring (learning, predicting, perceiving) at the scale of a continent.


9. The Personal Dimension

This proposal emerges from a specific intellectual trajectory. Thirty-six years of directing UC biological field stations, from the James San Jacinto Mountains Reserve to Blue Oak Ranch, taught me that ecological perception is fundamentally cross-domain: you cannot understand the birds without understanding the weather, the soil, the season, the history. The Center for Embedded Networked Sensing (CENS) era (2002–2012) demonstrated that distributed sensor networks could capture ecological dynamics at scales impossible for human observers. The Macroscope synthesizes these lessons into a personal research observatory.

SOMA proved the concept at the scale of one backyard. The question is whether the same architecture—learning the statistical structure of how ecosystems behave from observation data alone—works at continental scale. The infrastructure says yes. The mathematics says yes. GraphCast says yes for the atmosphere. What remains is to build it.

The idea crystallized on the drive from Oregon City to Bellingham on February 12, 2026, connecting three threads that had been developing independently: the thermodynamic sensing framework, the GraphCast precedent, and the realization that the training data already exists in the crowdsourced networks I had been drawing from for my own backyard. This proposal is an attempt to capture that convergence before the threads separate again.


References

  • Hamilton, M.P. (2026). “Embodied Ecological Sensing via Thermodynamic Models.” Canemah Nature Laboratory Technical Note CNL-TN-2026-014. https://canemah.org/archive/document.php?id=CNL-TN-2026-014
  • Lam, R. et al. (2023). “GraphCast: Learning skillful medium-range global weather forecasting.” Science. https://doi.org/10.1126/science.adi2336
  • Price, I. et al. (2023). “GenCast: Diffusion-based ensemble forecasting for medium-range weather.” arXiv:2312.15796.
  • Jelinčič, A. et al. (2025). “An efficient probabilistic hardware architecture for diffusion-like models.” arXiv:2510.23972.
  • Wolpert, D.H. et al. (2024). “Is stochastic thermodynamics the key to understanding the energy costs of computation?” Proceedings of the National Academy of Sciences, 121(45), e2321112121.
  • Hinton, G.E. (2012). “A Practical Guide to Training Restricted Boltzmann Machines.” Neural Networks: Tricks of the Trade, Springer.
  • Han, B.A. et al. (2023). “A synergistic future for AI and ecology.” Proceedings of the National Academy of Sciences. https://doi.org/10.1073/pnas.2220283120

End of Research Proposal

Cite This Document

Hamilton, M.P. (2026). "Toward a Large Sensor Model for Ecological Perception." Canemah Nature Laboratory Working Paper CNL-WP-2026-022v2. https://canemah.org/archive/CNL-WP-2026-022v2

Permanent URL: https://canemah.org/archive/document.php?id=CNL-WP-2026-022