Toward a Large Sensor Model for Ecological Perception
Leveraging Crowdsourced Environmental Infrastructure for a Foundation Model of Ecosystem Dynamics
Document ID: CNL-FN-2026-022
Date: February 12, 2026
Author: Michael P. Hamilton, Ph.D.
Project: Macroscope Ecological Observatory
AI Assistance Disclosure: This research proposal was developed with assistance from Claude (Anthropic). The AI contributed to literature synthesis, conceptual framework development, and manuscript drafting through extended dialogue. The author takes full responsibility for the content, accuracy, and conclusions.
Abstract
Google DeepMind’s GraphCast demonstrated that machine learning models trained on historical atmospheric data can outperform physics-based weather forecasting on 90% of verification targets. BioAnalyst (Trantas et al. 2025) demonstrated that multimodal foundation models can learn joint species-climate representations from satellite and occurrence data at continental scale. SOMA (Stochastic Observatory for Mesh Awareness), operating at Canemah Nature Laboratory, demonstrated that energy-based models trained on ecological sensor data can detect cross-domain anomalies invisible to single-domain analysis. This proposal argues that these results converge on an unrealized architectural target: a Large Sensor Model (LSM) for multi-domain ecological perception—a foundation model trained not on gridded reanalysis products or satellite imagery but on continuous ground-truth environmental sensor streams, learning the joint probability distribution over atmospheric and biological variables at the temporal and spatial resolution where ecological interactions actually occur.
Critically, the infrastructure to train such a model already exists. WeatherFlow’s Tempest network operates 85,000+ standardized weather stations worldwide. BirdWeather maintains approximately 2,000 active acoustic monitoring stations running BirdNET species classification. iNaturalist has accumulated over 250 million verifiable observations—170 million at research grade—spanning all major taxonomic groups with geolocated timestamps. These three platforms collectively provide the training corpus for a foundation model of ecosystem dynamics, requiring no new hardware deployment.
We propose a phased development path: beginning with paired-site validation (Canemah, Oregon and Bellingham, Washington), expanding through the UC Natural Reserve System’s 41-site network with existing climate monitoring infrastructure, and scaling to the full crowdsourced network. The resulting model would learn weather-biodiversity coupling across continental ecological gradients, producing predictions of ecosystem state whose deviations from observation constitute an anomaly detection system of unprecedented scope.
1. Introduction
1.1 The Problem
Ecological monitoring generates vast quantities of data with limited integration. Weather stations report atmospheric conditions. Acoustic monitors detect bird species. Camera traps record mammals. Citizen scientists photograph plants, insects, and fungi. Each stream is analyzed independently or, at best, correlated post hoc through statistical methods that require explicit hypothesis specification. No system learns the joint distribution over environmental and biological variables—the statistical structure of how ecosystems actually behave.
This limitation matters because ecological dynamics are fundamentally cross-domain. Barometric pressure affects bird vocalization patterns. Soil moisture modulates insect emergence. Photoperiod drives phenological transitions that cascade through food webs. These couplings are nonlinear, context-dependent, and vary across geography and season. Rule-based systems cannot enumerate them. Statistical correlation requires knowing what to look for. What is needed is a model that learns the structure from data, the way a field ecologist accumulates intuition through decades of observation.
1.2 The GraphCast Precedent
GraphCast (Lam et al. 2023), published in Science, demonstrated that a graph neural network trained on 39 years of atmospheric reanalysis data can predict 227 weather variables at 0.25° global resolution for 10-day horizons, outperforming physics-based forecasting on 90% of 1,380 verification targets. The model generates forecasts in under one minute on a single machine, versus hours on supercomputers for conventional numerical weather prediction.
GraphCast’s significance extends beyond weather. It proved that the statistical structure of a complex physical system—the atmosphere—can be learned from historical observations alone, without encoding any physics equations. The model operates on a graph neural network over an icosahedral mesh, where nodes represent spatial locations and edges encode how atmospheric state propagates between them. The learned weights embody the dynamics of a far-from-equilibrium thermodynamic system.
GenCast (Price et al. 2023) extended this with diffusion-based ensemble forecasting, generating probabilistic predictions that quantify forecast uncertainty—a capability directly aligned with the energy-based probabilistic framework already demonstrated in ecological contexts.
1.3 SOMA: Proof of Concept at Site Scale
SOMA (Hamilton 2026) implements three Restricted Boltzmann Machine meshes at Canemah Nature Laboratory, trained on 118 days of weather and acoustic biodiversity data. The ecosystem mesh—65 visible nodes encoding weather variables and species detection patterns, connected to 100 hidden nodes—successfully detected a cross-domain anomaly where weather and species conditions were individually unremarkable but their combination violated learned expectations.
This result established a critical principle: joint distribution modeling across ecological domains captures structure that domain-specific monitoring misses. The RBM weights encode learned correlations between atmospheric conditions and biological activity. When incoming sensor data violates those learned expectations, the mesh registers mathematical tension rather than calculating a derived metric—embodied perception rather than representational comparison.
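For concreteness, the sketch below shows how an RBM's free energy can serve as the tension signal described above: observations that fit the learned joint distribution score low, while observations that violate it score high. This is a minimal JAX illustration, not SOMA's implementation; the dimensions follow the text, but the variable names, initialization, and the idea of comparing against a running baseline are assumptions.

```python
import jax
import jax.numpy as jnp

def rbm_free_energy(v, W, b_vis, b_hid):
    """Free energy of a Bernoulli RBM for one visible vector v.

    F(v) = -b_vis . v - sum_j softplus(b_hid_j + (v W)_j)
    Low free energy: the observation fits the learned joint distribution.
    Unusually high free energy: cross-domain 'tension'.
    """
    hidden_input = b_hid + jnp.dot(v, W)          # shape (n_hidden,)
    return -jnp.dot(v, b_vis) - jnp.sum(jax.nn.softplus(hidden_input))

# Hypothetical dimensions matching the ecosystem mesh described above.
n_visible, n_hidden = 65, 100
key = jax.random.PRNGKey(0)
W = 0.01 * jax.random.normal(key, (n_visible, n_hidden))
b_vis = jnp.zeros(n_visible)
b_hid = jnp.zeros(n_hidden)

v = jnp.zeros(n_visible).at[::3].set(1.0)         # stand-in encoded sensor state
tension = rbm_free_energy(v, W, b_vis, b_hid)     # compared against a running baseline
```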
1.4 The Convergence
GraphCast is, in essence, a Large Sensor Model scoped to a single domain (atmosphere) at planetary scale. BioAnalyst is a foundation model that integrates biodiversity and climate from satellite and occurrence data at continental scale and monthly resolution. SOMA is a Large Sensor Model scoped to multiple domains (atmosphere and biodiversity) at site scale, operating on ground-truth sensor streams at minute-level resolution. The architectural trajectory connecting SOMA to a continental-scale ground-truth ecological foundation model—from site-scale Boltzmann machines through stacked deep networks to graph neural networks operating across geographic space—is well-established in the machine learning literature.
What has been missing is the recognition that the training infrastructure for such a model already exists, deployed and operating, generating data continuously, accessible through standardized APIs—and that it captures ecological dynamics at a fundamentally different resolution than the satellite-and-reanalysis approach that existing foundation models employ.
2. The Existing Infrastructure
2.1 WeatherFlow Tempest Network
WeatherFlow-Tempest operates over 85,000 consumer weather stations worldwide, producing billions of observational data points daily. Each station reports identical variables through a standardized API: temperature, humidity, barometric pressure, wind speed and direction, solar radiation, UV index, and precipitation at one-minute intervals. The stations are factory-calibrated consumer hardware—not research grade individually, but collectively powerful. Known biases (solar radiation heating from poor siting, wind obstruction) are characterizable, and a foundation model trained on thousands of stations learns to see through these biases because the signal is consistent while the noise is random across deployments. The network is dense in North America, Europe, and Australia, with growing coverage globally.
2.2 BirdWeather Acoustic Network
BirdWeather operates approximately 2,000 active acoustic monitoring stations globally, running continuous BirdNET neural network classification against audio streams. Each station reports timestamped species detections with confidence scores, drawing on a classifier that recognizes more than 6,000 species. The PUC (Physical Universe Codec) hardware includes dual microphones, environmental sensors, GPS, and on-board neural processing. BirdNET-Pi stations running on Raspberry Pi hardware extend the network further. The system produces continuous presence/absence data for avian species, the most ecologically informative vertebrate taxon for phenological and community monitoring.
The first large-scale scientific use of the BirdWeather detection library—a study of light pollution effects on bird vocalization timing across species, space, and seasons—demonstrates the platform’s research utility.
2.3 iNaturalist Biodiversity Observations
iNaturalist has accumulated over 250 million verifiable observations from nearly 4 million observers worldwide, with approximately 170 million observations achieving research-grade identification through community consensus. The platform’s computer vision model recognizes over 112,000 taxa.
For the purposes of an ecological foundation model, iNaturalist provides what acoustic monitoring cannot: observations across all major taxonomic groups—plants, insects, fungi, amphibians, reptiles, and mammals—with geolocated timestamps and, critically, phenological annotations. Flowering dates, fruiting times, insect emergence, migration arrivals—the temporal structure of ecological communities is encoded in the collective observation record. The iNaturalist API provides programmatic access to research-grade observations filtered by taxon, location, date, and quality grade.
The data are episodic rather than continuous—clustered around population centers and weekends, biased toward charismatic taxa—but these biases are characterizable and the sheer volume provides statistical power. A foundation model trained on this data does not need every observation to be equally reliable; it needs the aggregate distribution to be ecologically meaningful.
2.4 UC Natural Reserve System
The UC Natural Reserve System (NRS) comprises 41 reserves covering over 750,000 acres across California, representing most of the state’s major habitat types. The NRS Climate Monitoring Network operates 35 standardized weather stations at 30 reserves, all feeding into the Dendra cyberinfrastructure platform for real-time data storage, retrieval, and management.
The NRS is actively developing AI-powered wildlife monitoring—acoustic recorders paired with camera traps using on-board species identification, with real-time data transmission via satellite uplink. The California Department of Fish and Wildlife has adopted this methodology for 42 sentinel sites statewide.
The NRS represents a bridge between crowdsourced consumer infrastructure and research-grade scientific monitoring. Its reserves span California’s ecological gradients—from coastal tidepools to alpine peaks, from redwood forests to inland deserts—with standardized instrumentation, professional maintenance, and long-term data archives. For a foundation model, NRS reserves provide ground truth against which crowdsourced data can be calibrated.
3. Architecture
3.1 Domain Scope: EARTH and LIFE
We restrict the initial model to two domains: EARTH (atmospheric and environmental conditions) and LIFE (biodiversity patterns across taxa). This scoping decision is deliberate. Weather and biodiversity sensors produce relatively standardized, well-characterized data streams. The physics of weather-biology coupling is universal—every ecosystem on Earth experiences it. And the crowdsourced infrastructure described above provides dense coverage for precisely these two domains.
Excluding the Macroscope’s HOME and SELF domains eliminates idiosyncratic, non-generalizable data streams while focusing on what transfers across sites, ecosystems, and investigators. A model that learns how atmosphere and biosphere couple is scientifically general. A model that includes one person’s indoor temperature and sleep patterns is not.
3.2 Geographic Graph Structure
Each monitoring location becomes a node in a geographic graph. The graph structure is provided by biogeography itself—sites within the same ecoregion share species pools, climate drivers, and seasonal patterns. Edges connect nearby stations, with edge weights reflecting ecological similarity (shared species, correlated climate) rather than geographic distance alone.
A Tempest station in Portland and a Tempest station in Medford share Pacific Northwest weather patterns but differ in species communities. A BirdWeather station at sea level and one at 5,000 feet may be geographically close but ecologically distant. The graph neural network learns these relationships from data, discovering which connections carry ecological information.
This is the GraphCast architecture adapted to ecology. Where GraphCast’s icosahedral mesh tiles Earth’s surface uniformly, the ecological graph is irregular—dense where monitoring stations cluster, sparse where they do not. The model must learn to interpolate across gaps, a capability that graph neural networks handle naturally through message passing between connected nodes.
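As a concrete sketch of this construction, the fragment below builds ecological-similarity edge weights from species overlap and climate correlation, then performs one round of mean-aggregation message passing over the site graph. The blend of the two similarity terms, the variables used, and the function names are illustrative assumptions, not a finalized design.

```python
import jax.numpy as jnp

def edge_weight(species_a, species_b, climate_a, climate_b, alpha=0.5):
    """Ecological similarity between two monitoring sites.

    species_*: sets of species codes observed at each site.
    climate_*: arrays of a shared climate variable (e.g., daily mean temperature).
    alpha blends species overlap against climate correlation; the blend
    and the chosen variables are assumptions for illustration.
    """
    jaccard = len(species_a & species_b) / max(len(species_a | species_b), 1)
    corr = jnp.corrcoef(climate_a, climate_b)[0, 1]
    return alpha * jaccard + (1 - alpha) * jnp.clip(corr, 0.0, 1.0)

def one_message_pass(node_states, adjacency):
    """Single round of mean-aggregation message passing over the site graph.

    node_states: (n_sites, n_features); adjacency: (n_sites, n_sites),
    weighted by edge_weight. Returns neighbor-informed node states.
    """
    row_sums = adjacency.sum(axis=1, keepdims=True) + 1e-8
    messages = adjacency @ node_states / row_sums
    return jnp.concatenate([node_states, messages], axis=-1)
```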
3.3 Multi-Stream Input Encoding
Each node receives three classes of input, each with different temporal characteristics:
Continuous streams (Tempest): Weather variables at one-minute intervals. Temperature, humidity, pressure, wind, solar radiation, precipitation. These form the backbone temporal signal—the heartbeat of the EARTH domain.
Continuous acoustic streams (BirdWeather): Species detections at irregular intervals, aggregated into activity profiles per hour or per 15-minute window. Species presence/absence, detection confidence, vocalization timing. The LIFE domain’s continuous signal.
Episodic observations (iNaturalist): Species occurrences with timestamps, geolocated but temporally sparse. Plants, insects, fungi, mammals—taxa invisible to acoustic monitoring. Aggregated into monthly or seasonal phenological profiles per grid cell.
The architectural challenge is fusing these streams—continuous weather, continuous acoustics, and episodic community observations—into a coherent representation. Transformer architectures with learned temporal encodings handle variable sampling rates and irregular gaps naturally. Each input type receives its own positional encoding scheme: absolute timestamps for Tempest, event-based encoding for BirdWeather, and seasonal cyclical encoding for iNaturalist aggregates.
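A minimal sketch of per-stream temporal encodings under the scheme described above: absolute within-day and within-year position for Tempest readings, event timing relative to sunrise for BirdWeather detections, and a purely seasonal phase for iNaturalist aggregates. The specific features and periods are assumptions for illustration.

```python
import jax.numpy as jnp

def cyclic_encoding(value, period):
    """Map a periodic quantity (hour of day, day of year) onto the unit circle."""
    angle = 2.0 * jnp.pi * value / period
    return jnp.stack([jnp.sin(angle), jnp.cos(angle)], axis=-1)

def encode_tempest(minute_of_day, day_of_year):
    # Continuous stream: absolute position within the day and the year.
    return jnp.concatenate([cyclic_encoding(minute_of_day, 1440.0),
                            cyclic_encoding(day_of_year, 365.25)], axis=-1)

def encode_birdweather(seconds_since_sunrise, day_of_year):
    # Event stream: dawn-chorus-relative timing matters more than clock time.
    return jnp.concatenate([jnp.atleast_1d(seconds_since_sunrise / 3600.0),
                            cyclic_encoding(day_of_year, 365.25).ravel()], axis=-1)

def encode_inaturalist(week_of_year):
    # Episodic aggregate: seasonal phase only.
    return cyclic_encoding(week_of_year, 52.18)
```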
3.4 Temporal Hierarchy
Following the temporal topology described in CNL-TN-2026-014, the model embeds multiple timescales in its architecture:
Surface layer: Current conditions—the state of the atmosphere and biosphere right now. Updated with each sensor reading.
Diurnal layer: Daily patterns—dawn chorus timing, temperature cycling, nocturnal activity. Encodes what this time of day should feel like.
Seasonal layer: Phenological rhythms—when species should be active, what temperatures are normal for this week of the year. Encodes what this season should feel like.
Interannual layer: Climate context—ENSO state, long-term trends, multi-year baselines. Encodes what this year should feel like relative to the historical record.
Each layer provides context for the layers above it. A temperature anomaly at the surface layer is evaluated against the diurnal norm, the seasonal expectation, and the interannual trend simultaneously. The model does not report “temperature is 72°F” but rather “this February afternoon, in this La Niña year, feels warmer than it should.”
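A toy illustration of this layered evaluation: one reading expressed as standardized deviations from diurnal, seasonal, and interannual baselines. In the proposed model these baselines are implicit in learned weights rather than tabulated statistics; the numbers and names below are assumptions.

```python
def layered_anomaly(reading, baselines):
    """Express one observation against each temporal context layer.

    baselines: dict mapping layer name -> (mean, std) for this station,
    hour of day, week of year, and climate phase. Returns a z-score per
    layer, so '72 F in February' is judged against all contexts at once.
    """
    return {layer: (reading - mu) / (sigma + 1e-6)
            for layer, (mu, sigma) in baselines.items()}

# Hypothetical values for a warm February afternoon at one station.
baselines = {
    "diurnal":     (58.0, 4.0),   # typical 2 PM temperature, any season
    "seasonal":    (52.0, 5.0),   # typical mid-February afternoon
    "interannual": (50.0, 6.0),   # mid-February in La Nina years
}
z = layered_anomaly(72.0, baselines)   # large positive z at every layer
```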
3.5 Output: Prediction and Anomaly
The model’s primary output is a predicted state vector for each node at the next time step—expected weather conditions and expected biological activity given the current state and all contextual layers. The predicted state spans all input dimensions: expected temperature, expected pressure, expected species activity by hour, expected phenological state.
The deviation between prediction and observation is the anomaly signal. Large deviations indicate surprise: the ecosystem is doing something it has never done before in this context. The deviation can be decomposed by domain (is the surprise in weather, in species, or in their coupling?), by timescale (is this a daily anomaly or a seasonal one?), and by spatial extent (is this local to one node or propagating across the graph?).
This is SOMA’s tension signal scaled to continental scope with deep temporal context.
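A minimal sketch of the domain decomposition, assuming the state vector carries index sets marking EARTH and LIFE dimensions; timescale and spatial decompositions would follow the same pattern across context layers and graph neighborhoods.

```python
import jax.numpy as jnp

def decompose_anomaly(predicted, observed, earth_idx, life_idx):
    """Split the prediction residual into domain-level surprise scores.

    predicted / observed: full state vectors for one node and time step.
    earth_idx / life_idx: integer index arrays marking weather vs. biological
    dimensions. Returns per-domain RMS error plus a whole-vector score, so an
    alert can indicate whether the surprise is atmospheric, biological, or
    visible only in their combination.
    """
    residual = observed - predicted
    rms = lambda x: jnp.sqrt(jnp.mean(x ** 2))
    return {
        "earth": rms(residual[earth_idx]),
        "life": rms(residual[life_idx]),
        "joint": rms(residual),
    }
```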
4. Training Corpus
4.1 Scale
The combined data corpus is substantial:
Tempest: 85,000+ stations × one-minute readings × years of archive. At 15 variables per station, this represents billions of weather state vectors.
BirdWeather: 2,000+ stations × continuous detection streams × years of operation. Millions of species detection events with temporal and environmental context.
iNaturalist: 170+ million research-grade observations spanning all major taxa, geolocated and timestamped, with phenological annotations for plants.
UC NRS: 35 climate stations with professional-grade, quality-controlled data from 2013 onward, plus Dendra-managed sensor archives from individual reserves extending back decades in some cases.
4.2 Comparison to GraphCast
GraphCast trained on 39 years of ERA5 reanalysis data: 227 variables at ~1 million grid points, sampled every 6 hours. The ecological training corpus differs in structure—irregular spatial distribution, heterogeneous sampling rates, mixed continuous and episodic streams—but is comparable or larger in total information content.
The key advantage of ecological data is its lower entropy relative to atmospheric dynamics. The number of meaningfully different ecosystem states at a given location is far smaller than the number of meaningful atmospheric configurations. Weather is chaotic; ecosystems, while complex, are heavily constrained by biogeography, phenology, and energetics. The learnable manifold is more compact, suggesting that effective training may require less data per location than atmospheric modeling demands.
4.3 Data Quality and Bias
Consumer weather stations have known biases. BirdNET classifications carry false positive rates. iNaturalist observations cluster around cities and weekends. These limitations are real but manageable for three reasons.
First, systematic biases are characterizable and consistent within each platform. A foundation model trained on thousands of Tempest stations learns the systematic offset between consumer and research-grade measurements.
Second, random noise averages out across the network. Any individual station may have poor siting, but the statistical signal across thousands of stations in a region reflects actual atmospheric state.
Third, the NRS network provides calibration anchors. Research-grade stations at 30 reserves across California's ecological gradients serve as ground truth against which nearby consumer stations can be implicitly calibrated through the learned model; a minimal sketch of this anchoring idea follows.
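One way the anchoring could be realized is a learnable per-station offset that is pinned to zero for research-grade stations, so the consumer network is pulled toward the NRS reference during training. The sketch below is an assumption about mechanism, not a commitment to this parameterization.

```python
import jax.numpy as jnp

def calibrated_readings(raw, station_bias, is_anchor):
    """Apply learned per-station offsets, pinning research-grade anchors.

    raw:          (n_stations,) raw readings of one variable at one time step
    station_bias: (n_stations,) learnable offsets (model parameters)
    is_anchor:    (n_stations,) 1.0 for NRS research-grade stations, whose
                  bias is forced to zero so they anchor the regional network.
    """
    effective_bias = station_bias * (1.0 - is_anchor)
    return raw - effective_bias
```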
5. Development Phases
Phase 1: Paired-Site Validation (Current–Near Term)
Sites: Canemah Nature Laboratory (Oregon City, OR) and Owl Farm (Bellingham, WA).
Data: Tempest weather stations and BirdWeather acoustic monitors at both locations. Two years of Macroscope archive data at Canemah; new deployment at Bellingham.
Architecture: Extend SOMA from single-site RBMs to a two-node graph with shared hidden layers. Train a joint model that learns what is common to Pacific Northwest ecology (shared weather-biology coupling dynamics) and what is site-specific (different species pools, different coastal influence, different latitude). A minimal sketch of the shared-hidden-layer formulation follows this phase description.
Validation: Does the joint model outperform site-specific models at anomaly detection? Does knowledge transfer occur—does training on Canemah data improve predictions at Bellingham?
Compute: M4 Max laptop (Data), Mac Mini M4 Pro (Galatea). CPU and Metal-accelerated JAX.
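The Phase 1 extension can be sketched as a joint energy function in which both sites' visible vectors feed a single shared hidden layer, so the shared hidden units must account for coupling dynamics common to both sites while per-site visible biases absorb local baselines. Parameter names and shapes below are assumptions.

```python
import jax
import jax.numpy as jnp

def two_site_free_energy(v_canemah, v_bellingham, params):
    """Free energy of a joint RBM over two sites with shared hidden units.

    Both visible vectors feed the same hidden layer; per-site visible
    biases capture site-specific character. params is assumed to hold
    'W' of shape (2 * n_visible, n_hidden), per-site visible biases,
    and a shared hidden bias 'b_hid'.
    """
    v = jnp.concatenate([v_canemah, v_bellingham])              # (2 * n_visible,)
    b_vis = jnp.concatenate([params["b_canemah"], params["b_bellingham"]])
    hidden_input = params["b_hid"] + v @ params["W"]            # (n_hidden,)
    return -jnp.dot(v, b_vis) - jnp.sum(jax.nn.softplus(hidden_input))
```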
Phase 2: UC Natural Reserve System Expansion (Medium Term)
Sites: 5–10 UC NRS reserves selected for sensor infrastructure quality and ecological diversity. Priority candidates include James San Jacinto Mountains Reserve (montane), Blue Oak Ranch Reserve (oak savanna), Angelo Coast Range Reserve (temperate rainforest), Sedgwick Reserve (coastal), and Sagehen Creek Field Station (subalpine).
Data: NRS Climate Monitoring Network (Dendra), supplemented with BirdWeather and/or BirdNET-Pi deployments at selected reserves. iNaturalist observations within reserve boundaries and surrounding landscapes.
Architecture: Graph neural network with NRS reserves as nodes. Edges defined by ecological similarity and geographic proximity. Expand visible dimensions to include soil moisture and other environmental variables where reserve instrumentation supports it.
Validation: Cross-reserve prediction accuracy. Can the model predict phenological timing at one reserve given weather and species data from neighboring reserves? Can it detect climate-driven anomalies (atmospheric rivers, heat domes, drought onset) through their cross-domain ecological signatures?
Collaboration: The NRS’s existing Dendra infrastructure, professional station maintenance, and data quality protocols provide the institutional framework for this phase. The author’s 36-year history with the NRS—including 26 years directing the James Reserve and 10 years at Blue Oak Ranch—provides the relationships and institutional knowledge necessary to negotiate data access and collaborative deployment.
Phase 3: Crowdsourced Network Integration (Longer Term)
Sites: All Tempest stations, BirdWeather stations, and iNaturalist observations within selected geographic regions, starting with the Pacific Coast of North America (dense monitoring coverage, strong ecological gradients, high iNaturalist activity).
Data: Full API access to Tempest, BirdWeather, and iNaturalist platforms. Tens of thousands of nodes with heterogeneous data streams.
Architecture: Scaled graph neural network with learned spatial embeddings. Consumer stations connect to their regional neighbors. NRS research stations serve as calibration anchors with higher confidence weights. iNaturalist observations attach to the nearest geographic node as episodic phenological context.
Capabilities at scale:
Continental anomaly detection. The model knows what February should feel and sound like from San Diego to Bellingham. When spring arrives two weeks early in one region but not its neighbor, the model detects the spatial boundary of the phenological shift.
Climate-ecology coupling discovery. Cross-domain relationships emerge from training—the model discovers that Pacific Decadal Oscillation state modulates breeding chronology across the coast range, or that atmospheric river events trigger invertebrate emergence pulses that cascade through avian communities. These discoveries arise from the model’s learned weights, not from investigator hypotheses.
Absence and silence detection. Following SOMA’s demonstrated capability for absence-as-signal, the scaled model detects what should be present but is not. Missing species at expected phenological windows. Silent dawn choruses. Failed fruiting events. These absences are ecologically significant and invisible to presence-only monitoring.
Transfer learning across ecosystems. The deep layers encode general ecological dynamics that transfer across sites. The shallow layers encode local character. A new monitoring station—a single Tempest and BirdWeather deployment—can join the network and begin receiving contextualized anomaly detection within days of deployment, bootstrapped by the foundation model’s general ecological knowledge.
Phase 4: Open Platform
Specification: Publish standardized sensor deployment protocols, data format specifications, API integration requirements, and model architecture documentation.
Access: Any site that deploys a Tempest station and a BirdWeather monitor (total cost under $1,000) can join the network. The foundation model fine-tunes on their local data. The graph grows.
Community: Leverage the existing communities—150,000+ Tempest users, thousands of BirdWeather operators, nearly 4 million iNaturalist observers—as both data contributors and model validators. Citizen scientists who know their local ecosystems provide ground truth that no remote system can match.
6. Relationship to Existing Work
6.1 What Exists
Foundation models for remote sensing are under active development. NASA’s Prithvi, IBM’s GeospatialFM, and Clay have demonstrated pre-training on satellite imagery for land cover classification and change detection. The ESA-sponsored “Foundation Models for Climate and Society” initiative targets ice, drought, and flood-zone mapping.
Most significantly, BioAnalyst (Trantas et al. 2025) describes itself as “the first multimodal Foundation Model tailored to biodiversity analysis and conservation planning.” Using a Perceiver IO encoder and 3D Swin Transformer backbone, BioAnalyst ingests 10 data modalities—species occurrence records, remote sensing indicators, climate variables, and environmental covariates—across 20 years of European spatiotemporal data at 0.25° (approximately 28 km) resolution. The model demonstrates competence at joint species distribution modeling for 500 vascular plant species and monthly climate linear probing, establishing that foundation model architectures can indeed learn cross-domain ecological representations. BioAnalyst is open-source, with published weights and fine-tuning pipelines.
BirdCast, a collaboration between Cornell Lab of Ornithology, Colorado State University, and University of Massachusetts, uses radar data and machine learning to forecast nocturnal bird migration across the United States in real time. Foundation models for bioacoustics are also emerging, with multiple architectures demonstrating transfer learning for species classification from passive acoustic monitoring data.
The PNAS perspective “A synergistic future for AI and ecology” (2023) calls explicitly for convergence between ecological science and AI, noting that “challenges that are commonplace in multiscale, context-dependent, and imperfectly observed ecological systems offer a panoply of problems through which AI moves closer to realizing its full potential.”
6.2 What Does Not Exist—and Why It Matters
BioAnalyst represents an important advance, but it approaches ecological intelligence from above. Its data sources are satellite-derived remote sensing indices, gridded climate reanalysis products, and species occurrence records aggregated to 28-km grid cells at monthly temporal resolution. It learns correlations between what satellites see and what occurrence databases report. This is powerful for continental-scale conservation planning, habitat suitability assessment, and population trend forecasting—the tasks for which it was designed.
What BioAnalyst cannot do—what no existing system does—is learn from the ground up. A Tempest weather station measures actual conditions at a specific point every 60 seconds. A BirdWeather microphone hears actual birds vocalizing in real time—capturing behavioral signals like dawn chorus timing, raptor-induced silence, and nocturnal flight calls that no satellite or occurrence database records. An iNaturalist observer photographs an actual organism at an actual moment in its phenological cycle. The temporal resolution is minutes, not months. The spatial resolution is the individual station, not a 28-km grid cell. And critically, acoustic data encodes behavior—when species vocalize, when they fall silent, when their activity patterns shift—information that exists at no other observational scale.
The difference is the difference between reading about a forest and standing in one. BioAnalyst can model that species richness in grid cell X correlates with NDVI and mean annual temperature. The proposed LSM can model that the towhees went quiet at 8:15 AM when the barometer dropped 2 mb/hour and no raptor was detected—and learn that this silence carries ecological meaning.
No existing system learns the joint distribution over ground-truth atmospheric and biological variables from distributed sensor networks at the temporal and spatial resolution of individual monitoring stations. Remote sensing foundation models observe from above. Weather AI predicts atmospheric state but not ecological consequences. BirdCast forecasts migration volume from radar, not species-specific behavioral patterns from acoustic detection. BioAnalyst integrates biodiversity and climate but at coarse spatiotemporal resolution from aggregated records, not real-time sensor streams.
The proposed LSM occupies a fundamentally different position: ground-truth weather coupled with ground-truth biodiversity, learned jointly, at the resolution where ecological interactions actually occur. It would be the first model trained on the continuous sensor streams that field ecologists recognize as the actual fabric of ecosystem dynamics—the minute-by-minute pulse of weather and the hour-by-hour rhythm of biological activity at specific places. The goal is not to predict species ranges across Europe, but to predict what the ecosystem at a given monitoring station should sound and feel like tomorrow—and to detect, in real time, when it doesn’t.
6.3 The DeepMind Observation
The GraphCast developers noted that their technology could be extended to “climate and ecology, energy, agriculture, and human and biological activity.” BioAnalyst has begun to realize this vision from the satellite perspective. But the ground-truth sensor approach—learning ecosystem dynamics from the instruments that actually measure weather and hear birds at specific places in real time—has not been attempted. The reason is not technical—the architecture exists, the compute is accessible, the training data is available. The reason is that it requires someone who understands both the ecological systems and the machine learning architecture, and who has access to the institutional relationships necessary to integrate research-grade and crowdsourced data networks.
7. Technical Requirements
7.1 Compute
Phases 1–2 operate within the capacity of Apple Silicon hardware with Metal acceleration. The M4 Max with 128GB unified memory supports models with thousands of input dimensions and millions of parameters. Phase 3 may require cloud compute for initial training (comparable to GraphCast’s training requirements scaled down by the ratio of spatial nodes), but inference—the ongoing monitoring operation—runs on modest hardware.
7.2 Software
JAX provides the computational framework, consistent with both SOMA’s existing implementation and GraphCast’s open-source codebase. Flax or Haiku provide neural network building blocks including attention mechanisms. The entire stack runs on the same platform.
7.3 Data Access
All three crowdsourced platforms provide API access. Tempest offers REST and WebSocket APIs for real-time and historical data. BirdWeather provides open APIs for detection data. iNaturalist supports programmatic access to research-grade observations with geographic and taxonomic filtering. The NRS Dendra platform provides API-based access to climate monitoring data.
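As one example of this access, the sketch below queries research-grade iNaturalist observations near a monitoring site through the public v1 observations endpoint. Parameter names follow the published API, but rate limits, paging, and terms of use should be confirmed before any large-scale harvest; the taxon and coordinates shown are hypothetical.

```python
import requests

def fetch_research_grade(taxon_id, lat, lng, radius_km, d1, d2, per_page=200):
    """Query research-grade iNaturalist observations near a monitoring site.

    Uses the public v1 observations endpoint with standard query filters;
    verify current limits and attribution requirements before bulk use.
    """
    resp = requests.get(
        "https://api.inaturalist.org/v1/observations",
        params={
            "quality_grade": "research",
            "taxon_id": taxon_id,
            "lat": lat, "lng": lng, "radius": radius_km,
            "d1": d1, "d2": d2,               # observation date range
            "per_page": per_page,
            "order_by": "observed_on",
        },
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json()["results"]

# Hypothetical example: spring plant observations within 10 km of Canemah.
obs = fetch_research_grade(taxon_id=47126, lat=45.34, lng=-122.61,
                           radius_km=10, d1="2025-03-01", d2="2025-06-30")
```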
7.4 No New Hardware
This is the proposal’s most distinctive feature. The training corpus exists. The sensors are deployed. The APIs are live. The compute fits on a desk. The only infrastructure that must be built is the model itself.
8. Expected Outcomes
A trained Large Sensor Model for ecological perception would produce:
Continental-scale anomaly detection. Real-time identification of ecosystem departures from learned expectations, decomposed by domain, timescale, and spatial extent. Early warning for phenological shifts, population crashes, invasive species establishment, and climate regime transitions.
Discovered ecological relationships. Cross-domain couplings encoded in the model’s learned weights, discoverable through attribution analysis. Relationships that field ecologists suspected but could not quantify, and relationships that no one anticipated.
Predictive ecological state. Next-day, next-week, next-season predictions of biodiversity activity at every monitored location. Not weather forecasting, but ecosystem forecasting—what the landscape should feel and sound like tomorrow.
Scalable citizen science integration. A framework that turns every $300 Tempest station and every iNaturalist observation into a node in a continental perception system. The value of each observation increases as the network grows, because the model provides context that makes local data more interpretable.
A new paradigm for ecological observation. The transition from representational monitoring (measuring, storing, comparing) to embodied monitoring (learning, predicting, perceiving) at the scale of a continent.
9. The Personal Dimension
This proposal emerges from a specific intellectual trajectory. Thirty-six years of directing UC biological field stations, from the James San Jacinto Mountains Reserve to Blue Oak Ranch, taught me that ecological perception is fundamentally cross-domain—you cannot understand the birds without understanding the weather, the soil, the season, the history. The CENS era (2002–2012) demonstrated that distributed sensor networks could capture ecological dynamics at scales impossible for human observers. The Macroscope synthesizes these lessons into a personal research observatory.
SOMA proved the concept at the scale of one backyard. The question is whether the same architecture—learning the statistical structure of how ecosystems behave from observation data alone—works at continental scale. The infrastructure says yes. The mathematics says yes. GraphCast says yes for the atmosphere. What remains is to build it.
The idea crystallized on the drive from Oregon City to Bellingham on February 12, 2026, connecting three threads that had been developing independently: the thermodynamic sensing framework, the GraphCast precedent, and the realization that the training data already exists in the crowdsourced networks I had been drawing from for my own backyard. This proposal is an attempt to capture that convergence before the threads separate again.
References
- Hamilton, M.P. (2026). “Embodied Ecological Sensing via Thermodynamic Models.” Canemah Nature Laboratory Technical Note CNL-TN-2026-014. https://canemah.org/archive/document.php?id=CNL-TN-2026-014
- Lam, R. et al. (2023). “GraphCast: Learning skillful medium-range global weather forecasting.” Science. https://doi.org/10.1126/science.adi2336
- Price, I. et al. (2023). “GenCast: Diffusion-based ensemble forecasting for medium-range weather.” arXiv:2312.15796.
- Jelinčič, A. et al. (2025). “An efficient probabilistic hardware architecture for diffusion-like models.” arXiv:2510.23972.
- Wolpert, D.H. et al. (2024). “Is stochastic thermodynamics the key to understanding the energy costs of computation?” Proceedings of the National Academy of Sciences, 121(45), e2321112121.
- Hinton, G.E. (2012). “A Practical Guide to Training Restricted Boltzmann Machines.” Neural Networks: Tricks of the Trade, Springer.
- Ravi, S. et al. (2023). “A synergistic future for AI and ecology.” Proceedings of the National Academy of Sciences. https://doi.org/10.1073/pnas.2220283120
- Trantas, A. et al. (2025). “BioAnalyst: A Foundation Model for Biodiversity.” arXiv:2507.09080. https://arxiv.org/abs/2507.09080
End of Research Proposal
Permanent URL: https://canemah.org/archive/document.php?id=CNL-WP-2026-022