CNL-WP-2026-022 Working Paper

Toward a Large Sensor Model for Ecological Perception

Published: February 12, 2026
Version: 6

Abstract

Google DeepMind’s GraphCast demonstrated that machine learning models trained on historical atmospheric data can outperform physics-based weather forecasting on 90% of verification targets. BioAnalyst (Trantas et al. 2025) demonstrated that multimodal foundation models can learn joint species-climate representations from satellite and occurrence data at continental scale. SOMA (Stochastic Observatory for Mesh Awareness), operating at Canemah Nature Laboratory, demonstrated that energy-based models trained on ecological sensor data can detect cross-domain anomalies invisible to single-domain analysis. This proposal argues that these results converge on an unrealized architectural target: a Large Sensor Model (LSM) for multi-domain ecological perception—a foundation model trained not on gridded reanalysis products or satellite imagery but on continuous ground-truth environmental sensor streams, learning the joint probability distribution over atmospheric and biological variables at the temporal and spatial resolution where ecological interactions actually occur.

Critically, the infrastructure to train such a model already exists. Consumer weather station networks—WeatherFlow-Tempest, Ambient Weather, and Davis Instruments—collectively operate over half a million standardized stations worldwide with API access. BirdWeather maintains approximately 2,000 active acoustic monitoring stations running BirdNET species classification. iNaturalist has accumulated over 250 million verifiable observations—170 million at research grade—spanning all major taxonomic groups with geolocated timestamps. These platforms collectively provide the training corpus for a foundation model of ecosystem dynamics, requiring no new hardware deployment.

We propose a phased development path: beginning with paired-site validation (Canemah, Oregon and Bellingham, Washington), expanding through regional recruitment of existing weather station and BirdWeather operators in the Pacific Northwest, and scaling organically as the network demonstrates value—following the grassroots trajectory that built iNaturalist from a graduate student project into a global biodiversity platform with 4 million observers. The resulting model would learn weather-biodiversity coupling across ecological gradients, producing predictions of ecosystem state whose deviations from observation constitute an anomaly detection system of unprecedented scope.

-----

AI Collaboration Disclosure

Claude (Anthropic): Analysis

This working paper was developed with assistance from Claude (Anthropic). The AI contributed to literature synthesis and manuscript drafting through extended dialogue. The author takes full responsibility for the content, accuracy, and conclusions.

Human review: full

Version History

Version  Date               Notes
v6       February 12, 2026  Latest
v5       February 12, 2026
v4       February 12, 2026
v3       February 12, 2026
v2       February 12, 2026
v1       February 12, 2026  Initial publication

Cite This Document

(2026). "Toward a Large Sensor Model for Ecological Perception." Canemah Nature Laboratory Working Paper CNL-WP-2026-022. https://canemah.org/archive/CNL-WP-2026-022

BibTeX

@unpublished{cnl2026toward,
  author      = {},
  title       = {Toward a Large Sensor Model for Ecological Perception},
  institution = {Canemah Nature Laboratory},
  year        = {2026},
  number      = {CNL-WP-2026-022},
  month       = {february},
  url         = {https://canemah.org/archive/document.php?id=CNL-WP-2026-022},
  abstract    = {Google DeepMind's GraphCast demonstrated that machine learning models trained on historical atmospheric data can outperform physics-based weather forecasting on 90\% of verification targets. BioAnalyst (Trantas et al. 2025) demonstrated that multimodal foundation models can learn joint species-climate representations from satellite and occurrence data at continental scale. SOMA (Stochastic Observatory for Mesh Awareness), operating at Canemah Nature Laboratory, demonstrated that energy-based models trained on ecological sensor data can detect cross-domain anomalies invisible to single-domain analysis. This proposal argues that these results converge on an unrealized architectural target: a Large Sensor Model (LSM) for multi-domain ecological perception---a foundation model trained not on gridded reanalysis products or satellite imagery but on continuous ground-truth environmental sensor streams, learning the joint probability distribution over atmospheric and biological variables at the temporal and spatial resolution where ecological interactions actually occur. Critically, the infrastructure to train such a model already exists. Consumer weather station networks---WeatherFlow-Tempest, Ambient Weather, and Davis Instruments---collectively operate over half a million standardized stations worldwide with API access. BirdWeather maintains approximately 2,000 active acoustic monitoring stations running BirdNET species classification. iNaturalist has accumulated over 250 million verifiable observations---170 million at research grade---spanning all major taxonomic groups with geolocated timestamps. These platforms collectively provide the training corpus for a foundation model of ecosystem dynamics, requiring no new hardware deployment. We propose a phased development path: beginning with paired-site validation (Canemah, Oregon and Bellingham, Washington), expanding through regional recruitment of existing weather station and BirdWeather operators in the Pacific Northwest, and scaling organically as the network demonstrates value---following the grassroots trajectory that built iNaturalist from a graduate student project into a global biodiversity platform with 4 million observers. The resulting model would learn weather-biodiversity coupling across ecological gradients, producing predictions of ecosystem state whose deviations from observation constitute an anomaly detection system of unprecedented scope.}
}

Permanent URL: https://canemah.org/archive/document.php?id=CNL-WP-2026-022