THRML Proof-of-Concept Specification: Embodied Sensing Experiments
Document ID: CNL-SP-2026-001
Version: 1.0
Date: February 1, 2026
Status: Draft
Author: Michael P. Hamilton, Ph.D.
Project: Macroscope Ecological Observatory
Reference Doc: CNL-TN-2026-014
AI Assistance Disclosure: This specification was developed with assistance from Claude (Anthropic, Opus 4.5). The AI contributed to technical architecture design and document drafting. The author takes full responsibility for the content and technical decisions.
1. Abstract
This specification defines two proof-of-concept experiments for validating the embodied sensing framework described in CNL-TN-2026-014. Using the THRML library on Apple M4 Max hardware, we implement minimal Boltzmann machine meshes trained on Macroscope sensor data. Experiment A uses Tempest weather station readings (continuous environmental variables). Experiment B uses BirdWeather detections (discrete species presence/absence with relational structure). Success criterion: anomalous input states produce energy measurably higher than normal ones (more than 2σ above baseline; see Sections 3.6 and 4.6).
2. Development Environment
2.1 Hardware
| System | Role | Specs |
|---|---|---|
| Data | Development, iteration | MacBook Pro M4 Max, 128GB unified memory |
| Galatea | Production data source | Mac Mini M4 Pro, 1Gb fiber, continuous streams |
2.2 Software Dependencies
# Python environment (Python 3.12+)
python3 -m venv ~/thrml-poc
source ~/thrml-poc/bin/activate
# Core dependencies
pip install jax jaxlib
pip install thrml
pip install mysql-connector-python
pip install numpy pandas
# Optional visualization
pip install matplotlib seaborn
2.3 JAX Configuration for Apple Silicon
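# Set memory allocation (must be set before JAX initializes its backend)
import os
os.environ['XLA_PYTHON_CLIENT_PREALLOCATE'] = 'false'
os.environ['XLA_PYTHON_CLIENT_MEM_FRACTION'] = '0.8'
# Verify Metal backend
import jax
print(jax.devices()) # Should show a Metal device; a CPU-only list means the Metal plugin (e.g. jax-metal) is not active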
2.4 Database Connection
# config.py
DB_CONFIG = {
    'host': 'localhost',
    'database': 'macroscope',
    'user': 'mikehamilton',
    'password': '***', # From secure config
    'charset': 'utf8mb4',
}
3. Experiment A: Tempest Environmental Mesh
3.1 Objective
Train a Boltzmann machine on multivariate environmental time series. Validate that the mesh settles to low energy on typical readings and exhibits measurable tension on anomalous inputs.
3.2 Data Extraction
Source Table: tempest_readings
Selected Variables:
| Field | Description | Range | Encoding |
|---|---|---|---|
| temperature_f | Air temperature | 20-100°F | 8 bits |
| humidity | Relative humidity | 0-100% | 7 bits |
| pressure_inhg | Barometric pressure | 28-32 inHg | 6 bits |
| wind_mph | Wind speed | 0-50 mph | 6 bits |
| solar_radiation | Solar irradiance | 0-1200 W/m² | 8 bits |
Total visible nodes: 35 bits
Query:
SELECT
recorded_at,
temperature_f,
humidity,
pressure_inhg,
wind_mph,
solar_radiation
FROM tempest_readings
WHERE recorded_at >= DATE_SUB(NOW(), INTERVAL 30 DAY)
ORDER BY recorded_at ASC;
3.3 Binary Encoding
Continuous values are discretized into binary representations using uniform quantization:
import numpy as np

def encode_continuous(value, min_val, max_val, n_bits):
    """Encode continuous value to binary array (least-significant bit first)."""
    # Clamp to range
    value = max(min_val, min(max_val, value))
    # Normalize to [0, 1]
    normalized = (value - min_val) / (max_val - min_val)
    # Convert to integer in [0, 2^n_bits - 1]
    int_val = int(normalized * (2**n_bits - 1))
    # Convert to binary array
    return np.array([(int_val >> i) & 1 for i in range(n_bits)], dtype=np.int32)

# Encoding specification
TEMPEST_ENCODING = {
    'temperature_f': {'min': 20, 'max': 100, 'bits': 8},
    'humidity': {'min': 0, 'max': 100, 'bits': 7},
    'pressure_inhg': {'min': 28, 'max': 32, 'bits': 6},
    'wind_mph': {'min': 0, 'max': 50, 'bits': 6},
    'solar_radiation': {'min': 0, 'max': 1200, 'bits': 8},
}
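To illustrate how the encoding table yields the 35 visible bits, the sketch below concatenates the per-field encodings for a single reading. The helpers `encode_reading` and `quantization_step` and the sample values are hypothetical; the encoder and ranges are repeated from above only so that the block runs on its own.

```python
import numpy as np

def encode_continuous(value, min_val, max_val, n_bits):
    """Uniformly quantize value into n_bits (least-significant bit first)."""
    value = max(min_val, min(max_val, value))
    normalized = (value - min_val) / (max_val - min_val)
    int_val = int(normalized * (2**n_bits - 1))
    return np.array([(int_val >> i) & 1 for i in range(n_bits)], dtype=np.int32)

# Ranges and bit widths copied from the encoding specification above
TEMPEST_ENCODING = {
    'temperature_f': {'min': 20, 'max': 100, 'bits': 8},
    'humidity': {'min': 0, 'max': 100, 'bits': 7},
    'pressure_inhg': {'min': 28, 'max': 32, 'bits': 6},
    'wind_mph': {'min': 0, 'max': 50, 'bits': 6},
    'solar_radiation': {'min': 0, 'max': 1200, 'bits': 8},
}

def encode_reading(reading, spec=TEMPEST_ENCODING):
    """Hypothetical helper: concatenate the field encodings in spec order."""
    return np.concatenate([
        encode_continuous(reading[name], p['min'], p['max'], p['bits'])
        for name, p in spec.items()
    ])

def quantization_step(name, spec=TEMPEST_ENCODING):
    """Width of one quantization level, in the field's own units."""
    p = spec[name]
    return (p['max'] - p['min']) / (2**p['bits'] - 1)

# Illustrative values only, not taken from the Macroscope database
sample = {
    'temperature_f': 45.0,
    'humidity': 80.0,
    'pressure_inhg': 30.0,
    'wind_mph': 5.0,
    'solar_radiation': 200.0,
}
visible = encode_reading(sample)
print(visible.shape)                       # (35,)
print(quantization_step('temperature_f'))  # roughly 0.31 °F per level
```

One property of this plain binary code is that neighboring values can differ in many bits at once (for example 127 and 128 in an 8-bit field), which is part of why Section 8 lists thermometer encoding as an alternative to try.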
3.4 Mesh Architecture
from thrml import SpinNode, Block, SamplingSchedule, sample_states
from thrml.models import IsingEBM, IsingSamplingProgram, hinton_init
import jax.numpy as jnp
# Configuration
N_VISIBLE = 35 # Input nodes (encoded sensor values)
N_HIDDEN = 100 # Hidden layer nodes
N_TOTAL = N_VISIBLE + N_HIDDEN
# Create nodes
visible_nodes = [SpinNode() for _ in range(N_VISIBLE)]
hidden_nodes = [SpinNode() for _ in range(N_HIDDEN)]
all_nodes = visible_nodes + hidden_nodes
# Bipartite connectivity: each visible connects to all hidden
edges = [(v, h) for v in visible_nodes for h in hidden_nodes]
# Initialize weights (will be learned)
biases = jnp.zeros((N_TOTAL,))
weights = jnp.zeros((len(edges),)) # Initialize flat, learn structure
beta = jnp.array(1.0) # Inverse temperature
model = IsingEBM(all_nodes, edges, biases, weights, beta)
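The success criteria rely on reading an energy off the mesh. As a reference point, the following NumPy sketch evaluates the conventional Ising energy, E(s) = -β(Σ_i b_i s_i + Σ_(i,j) w_ij s_i s_j), over an explicit edge list. It assumes spins in {-1, +1} and this standard sign convention; it is not taken from THRML's source, the function names are hypothetical, and the library's own energy routines should be preferred in the actual experiments. Under that assumption, the 0/1 bits from Section 3.3 need to be mapped to ±1 before evaluation.

```python
import numpy as np

def bits_to_spins(bits):
    """Map 0/1 bits to -1/+1 spins."""
    return 2 * np.asarray(bits) - 1

def ising_energy(spins, edge_index, biases, weights, beta=1.0):
    """Conventional Ising energy for one spin configuration.

    edge_index: integer array of shape (n_edges, 2) with node indices.
    weights:    array of shape (n_edges,), one coupling per edge.
    """
    s = np.asarray(spins, dtype=np.float64)
    i, j = edge_index[:, 0], edge_index[:, 1]
    field_term = np.dot(biases, s)
    pair_term = np.dot(weights, s[i] * s[j])
    return -beta * (field_term + pair_term)

# Toy mesh: two visible nodes (0, 1) both coupled to one hidden node (2)
edge_index = np.array([[0, 2], [1, 2]])
biases = np.zeros(3)
weights = np.array([1.0, 1.0])

aligned = bits_to_spins([1, 1, 1])      # all spins agree
frustrated = bits_to_spins([1, 0, 1])   # one visible spin disagrees

e_aligned = ising_energy(aligned, edge_index, biases, weights)
e_frustrated = ising_energy(frustrated, edge_index, biases, weights)
print(e_aligned < e_frustrated)  # True
```

With positive couplings, the configuration in which every spin agrees sits at lower energy than the one with a disagreeing spin, which is the "tension" the experiments aim to measure.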
3.5 Training Procedure
Using Contrastive Divergence (CD-k):
def train_rbm(model, data, n_epochs=100, learning_rate=0.01, k=1):
    """Train RBM using CD-k (pseudocode).

    data_loader, sample_hidden, gibbs_sample, outer, and compute_energy are
    placeholders to be implemented against THRML. JAX arrays are immutable,
    so the in-place updates below stand for functional updates
    (e.g. biases.at[:N_VISIBLE].add(...)).
    """
    for epoch in range(n_epochs):
        for batch in data_loader(data, batch_size=32):
            # Positive phase: clamp visible, sample hidden
            v_pos = batch
            h_pos = sample_hidden(model, v_pos)
            # Negative phase: k steps of Gibbs sampling
            v_neg, h_neg = gibbs_sample(model, v_pos, k=k)
            # Update weights
            model.weights += learning_rate * (
                outer(v_pos, h_pos) - outer(v_neg, h_neg)
            ).mean(axis=0)
            # Update biases
            model.biases[:N_VISIBLE] += learning_rate * (v_pos - v_neg).mean(axis=0)
            model.biases[N_VISIBLE:] += learning_rate * (h_pos - h_neg).mean(axis=0)
        # Log energy statistics
        energy = compute_energy(model, data)
        print(f"Epoch {epoch}: mean_energy={energy.mean():.4f}")
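The pseudocode above leaves its helpers undefined. For orientation, here is a self-contained NumPy sketch of one CD-k update for a plain bipartite RBM. It is a stand-in under stated assumptions, not the THRML implementation: units are 0/1 rather than spins, weights are a dense visible-by-hidden matrix rather than a flat edge list, hidden probabilities (not samples) are used in the gradient statistics, and the function name `cd_k_update` and the synthetic data are hypothetical.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cd_k_update(W, b_v, b_h, v_data, rng, k=1, lr=0.01):
    """One CD-k step on a batch of 0/1 visible vectors (requires k >= 1)."""
    # Positive phase: hidden probabilities with visibles clamped to data
    p_h_data = sigmoid(v_data @ W + b_h)
    h = (rng.random(p_h_data.shape) < p_h_data).astype(np.float64)
    # Negative phase: k alternating Gibbs steps starting from the data
    for _ in range(k):
        p_v = sigmoid(h @ W.T + b_v)
        v_model = (rng.random(p_v.shape) < p_v).astype(np.float64)
        p_h_model = sigmoid(v_model @ W + b_h)
        h = (rng.random(p_h_model.shape) < p_h_model).astype(np.float64)
    # Gradient estimate: data statistics minus model statistics
    n = v_data.shape[0]
    W = W + lr * (v_data.T @ p_h_data - v_model.T @ p_h_model) / n
    b_v = b_v + lr * (v_data - v_model).mean(axis=0)
    b_h = b_h + lr * (p_h_data - p_h_model).mean(axis=0)
    return W, b_v, b_h

rng = np.random.default_rng(0)
n_visible, n_hidden = 35, 100
W = 0.01 * rng.standard_normal((n_visible, n_hidden))
b_v = np.zeros(n_visible)
b_h = np.zeros(n_hidden)

# Synthetic stand-in for encoded sensor vectors (not Macroscope data)
data = (rng.random((64, n_visible)) < 0.3).astype(np.float64)

for epoch in range(5):
    for start in range(0, len(data), 32):
        batch = data[start:start + 32]
        W, b_v, b_h = cd_k_update(W, b_v, b_h, batch, rng, k=1, lr=0.01)
```

The sketch starts from small random weights. With an all-zero start and probability-based statistics, every hidden unit receives the same gradient, so the hidden layer cannot differentiate; this may be worth keeping in mind for the zero initialization in Section 3.4.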
3.6 Success Criteria
- Baseline Energy: After training, compute the mean and standard deviation (σ) of energy over held-out February data
- Anomaly Injection: Test with:
  - July temperatures (85°F) injected into February context
  - Pressure at 29.0 inHg (storm) with solar radiation at 1000 W/m² (clear sky)
  - Wind at 40 mph with humidity at 10% (unusual combination)
- Threshold: Anomalous inputs should produce energy more than 2σ above the baseline mean
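The threshold test reduces to a z-score against the held-out baseline. A minimal sketch follows, assuming the sample standard deviation (ddof=1) is the intended σ; the function names are hypothetical and the energies are placeholder numbers, not measurements.

```python
import numpy as np

def baseline_stats(energies):
    """Mean and sample standard deviation of held-out baseline energies."""
    e = np.asarray(energies, dtype=np.float64)
    return e.mean(), e.std(ddof=1)

def energy_zscore(energy, mean, std):
    """How many baseline standard deviations an energy lies above the mean."""
    return (energy - mean) / std

def is_anomalous(energy, mean, std, threshold=2.0):
    """True when the energy exceeds the baseline mean by more than threshold σ."""
    return energy_zscore(energy, mean, std) > threshold

# Placeholder numbers for illustration only
baseline = [-12.0, -11.0, -10.0, -9.0, -8.0]
mean, std = baseline_stats(baseline)

print(is_anomalous(-5.0, mean, std))  # True
print(is_anomalous(-9.0, mean, std))  # False
```

The same mean and standard deviation are what the `baseline_energy_mean` and `baseline_energy_std` columns in Section 5.1 would store.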
4. Experiment B: BirdWeather Species Mesh
4.1 Objective
Train a Boltzmann machine on species co-occurrence patterns. Validate that the mesh encodes relational structure and exhibits tension when expected species are absent.
4.2 Data Extraction
Source Table: birdweather_detections
Species Selection: Rather than top-N by count, we explicitly select:
| Category | Species | Rationale |
|---|---|---|
| Raptors (predators) | Cooper's Hawk, Red-tailed Hawk, Merlin | Trigger community-wide alarm/silence |
| Resident songbirds | Spotted Towhee, Song Sparrow, Dark-eyed Junco, Bewick's Wren, Black-capped Chickadee, Chestnut-backed Chickadee, Bushtit, American Robin, Northern Flicker, California Scrub-Jay, American Crow | React to raptor presence |
| Additional context | Lesser Goldfinch, House Finch, Anna's Hummingbird, Golden-crowned Sparrow | Seasonal/behavioral variation |
Total species nodes: 18 (3 raptors + 15 songbirds)
Aggregation: 15-minute presence/absence windows (finer than hourly to capture silence events)
Query:
-- Species list with categories
SELECT
s.id as species_id,
s.common_name,
CASE
WHEN s.common_name IN ('Cooper''s Hawk', 'Red-tailed Hawk', 'Merlin') THEN 'raptor'
ELSE 'songbird'
END as category,
COUNT(*) as detection_count
FROM birdweather_detections bd
JOIN species s ON bd.species_id = s.id
WHERE bd.detected_at >= DATE_SUB(NOW(), INTERVAL 90 DAY)
AND bd.confidence >= 0.7
AND s.common_name IN (
'Cooper''s Hawk', 'Red-tailed Hawk', 'Merlin',
'Spotted Towhee', 'Song Sparrow', 'Dark-eyed Junco',
'Bewick''s Wren', 'Black-capped Chickadee', 'Chestnut-backed Chickadee',
'Bushtit', 'American Robin', 'Northern Flicker',
'California Scrub-Jay', 'American Crow',
'Lesser Goldfinch', 'House Finch', 'Anna''s Hummingbird',
'Golden-crowned Sparrow'
)
GROUP BY s.id, s.common_name
ORDER BY category, detection_count DESC;
-- 15-minute presence matrix
SELECT
DATE_FORMAT(detected_at, '%Y-%m-%d %H:') as hour_part,
LPAD(FLOOR(MINUTE(detected_at) / 15) * 15, 2, '0') as minute_bucket,
species_id,
1 as present
FROM birdweather_detections
WHERE detected_at >= DATE_SUB(NOW(), INTERVAL 90 DAY)
AND confidence >= 0.7
AND species_id IN (/* selected species_ids */)
GROUP BY hour_part, minute_bucket, species_id;
4.3 Binary Encoding
Species presence is naturally binary. Raptors and songbirds are encoded separately to facilitate analysis:
def build_presence_matrix(detections_df, species_list, time_buckets):
    """Build binary presence matrix [time_buckets × species].

    detections_df needs a 'time_bucket' column (for example hour_part and
    minute_bucket from the query above, concatenated) and a 'species_id' column.
    """
    matrix = np.zeros((len(time_buckets), len(species_list)), dtype=np.int32)
    for idx, bucket in enumerate(time_buckets):
        present = detections_df[detections_df['time_bucket'] == bucket]['species_id'].unique()
        for sp_idx, sp_id in enumerate(species_list):
            if sp_id in present:
                matrix[idx, sp_idx] = 1
    return matrix

# Node allocation
N_RAPTORS = 3 # Cooper's Hawk, Red-tailed Hawk, Merlin
N_SONGBIRDS = 15 # Resident and seasonal songbirds
N_SPECIES = N_RAPTORS + N_SONGBIRDS # 18 total
HOUR_OF_DAY = 5 # 5 bits for hour (0-23)
MONTH = 4 # 4 bits for month (1-12)
N_VISIBLE = N_SPECIES + HOUR_OF_DAY + MONTH # 27 visible nodes
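The node allocation implies that each 15-minute window becomes an 18-bit presence row followed by 5 hour bits and 4 month bits. The sketch below shows one way to assemble that 27-bit vector. The helper names, the least-significant-bit-first order (chosen to match `encode_continuous` in Section 3.3), and the sample presence pattern are assumptions.

```python
from datetime import datetime
import numpy as np

N_SPECIES, HOUR_BITS, MONTH_BITS = 18, 5, 4

def floor_to_bucket(ts, minutes=15):
    """Round a timestamp down to the start of its 15-minute window."""
    return ts.replace(minute=(ts.minute // minutes) * minutes,
                      second=0, microsecond=0)

def int_to_bits(value, n_bits):
    """Binary code for a non-negative integer, least-significant bit first."""
    if not 0 <= value < 2**n_bits:
        raise ValueError(f"{value} does not fit in {n_bits} bits")
    return np.array([(value >> i) & 1 for i in range(n_bits)], dtype=np.int32)

def build_visible_vector(presence_row, bucket_start):
    """Concatenate species presence, hour bits, and month bits."""
    presence_row = np.asarray(presence_row, dtype=np.int32)
    return np.concatenate([
        presence_row,
        int_to_bits(bucket_start.hour, HOUR_BITS),
        int_to_bits(bucket_start.month, MONTH_BITS),
    ])

bucket = floor_to_bucket(datetime(2026, 2, 1, 7, 44, 30))
presence = np.zeros(N_SPECIES, dtype=np.int32)
presence[[3, 4, 5]] = 1  # three songbird columns present (illustrative)
v = build_visible_vector(presence, bucket)

print(bucket)   # 2026-02-01 07:30:00
print(v.shape)  # (27,)
```

A plain binary hour code places 23:00 and 00:00 far apart in bit space even though they are adjacent in time; whether that matters for this mesh is an open question the experiment could probe.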
4.4 Mesh Architecture
# Configuration
N_VISIBLE = 27 # 18 species + 5 hour + 4 month
N_HIDDEN = 50 # Hidden layer
N_TOTAL = N_VISIBLE + N_HIDDEN
# Node indices for analysis
RAPTOR_INDICES = [0, 1, 2] # First 3 nodes
SONGBIRD_INDICES = list(range(3, 18)) # Next 15 nodes
# Create nodes
visible_nodes = [SpinNode() for _ in range(N_VISIBLE)]
hidden_nodes = [SpinNode() for _ in range(N_HIDDEN)]
all_nodes = visible_nodes + hidden_nodes
# Bipartite edges (visible-hidden)
edges = [(v, h) for v in visible_nodes for h in hidden_nodes]
# Raptor-songbird lateral connections (capture suppression relationship)
for raptor_idx in RAPTOR_INDICES:
    for songbird_idx in SONGBIRD_INDICES:
        edges.append((visible_nodes[raptor_idx], visible_nodes[songbird_idx]))
# Songbird-songbird lateral connections (capture co-occurrence)
for i in SONGBIRD_INDICES:
    for j in SONGBIRD_INDICES:
        if i < j:
            edges.append((visible_nodes[i], visible_nodes[j]))
# Parameters sized for this mesh (the Experiment A arrays have different shapes)
biases = jnp.zeros((N_TOTAL,))
weights = jnp.zeros((len(edges),))
beta = jnp.array(1.0) # Inverse temperature
model = IsingEBM(all_nodes, edges, biases, weights, beta)
Edge count:
- Bipartite: 27 × 50 = 1,350
- Raptor-songbird: 3 × 15 = 45
- Songbird-songbird: C(15,2) = 105
- Total: 1,500 edges
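The edge arithmetic can be checked by enumerating the same three edge groups over integer node indices. This is only a counting sketch; it uses plain index pairs instead of THRML node objects.

```python
from itertools import combinations

N_VISIBLE, N_HIDDEN = 27, 50
RAPTOR_INDICES = [0, 1, 2]
SONGBIRD_INDICES = list(range(3, 18))

# Hidden nodes are numbered after the visible ones
bipartite = [(v, N_VISIBLE + h)
             for v in range(N_VISIBLE) for h in range(N_HIDDEN)]
raptor_songbird = [(r, s) for r in RAPTOR_INDICES for s in SONGBIRD_INDICES]
songbird_pairs = list(combinations(SONGBIRD_INDICES, 2))

edges = bipartite + raptor_songbird + songbird_pairs
print(len(bipartite), len(raptor_songbird), len(songbird_pairs), len(edges))
# 1350 45 105 1500
```

Because of the lateral visible-visible edges, this mesh is no longer strictly bipartite: visible units are not conditionally independent given the hidden layer, so the two-block Gibbs step used for a standard RBM does not carry over unchanged.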
4.5 Training Procedure
Same CD-k procedure as Experiment A, but with:
- Batch size: 64 (more samples per update)
- k=5 (more Gibbs steps for discrete structure)
- Learning rate: 0.001 (smaller for stability)
4.6 Success Criteria
- Baseline Energy: Compute mean energy for:
  - Typical February morning (7-9 AM): songbirds active, no raptors
  - Typical raptor event: raptor=1, most songbirds=0 (learned silence)
- Relational Tests:
| Scenario | Raptor | Songbirds | Expected Energy | Interpretation |
|---|---|---|---|---|
| Normal morning | 0 | Active (many=1) | Low | Equilibrium |
| Raptor hunting | 1 | Silent (most=0) | Low | Learned normal response |
| Unexplained silence | 0 | Silent (most=0) | High | Tension: why silent? |
| Unusual boldness | 1 | Active (many=1) | High | Tension: should be hiding |
- Threshold: Anomalous scenarios (unexplained silence, unusual boldness) should produce energy > 2σ above the appropriate baseline
- Absence-as-Signal Validation: The mesh should register higher tension for "raptor=0, songbirds=0" than for simple low-activity periods (e.g., nighttime). The context of silence matters.
5. Database Integration
5.1 New Tables
-- Model registry
CREATE TABLE thrml_models (
id INT NOT NULL AUTO_INCREMENT,
model_name VARCHAR(100) NOT NULL,
model_type ENUM('tempest', 'birdweather', 'combined') NOT NULL,
description TEXT,
n_visible INT NOT NULL,
n_hidden INT NOT NULL,
n_edges INT NOT NULL,
training_start DATETIME,
training_end DATETIME,
training_samples INT,
weights_path VARCHAR(500),
baseline_energy_mean DECIMAL(10,4),
baseline_energy_std DECIMAL(10,4),
status ENUM('training', 'ready', 'archived') DEFAULT 'training',
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
PRIMARY KEY (id),
KEY idx_model_type (model_type),
KEY idx_status (status)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci;
-- Inference results (time series)
CREATE TABLE thrml_inference (
id BIGINT NOT NULL AUTO_INCREMENT,
model_id INT NOT NULL,
inferred_at DATETIME(6) NOT NULL,
energy DECIMAL(12,6) NOT NULL,
energy_zscore DECIMAL(8,4),
mixing_time_ms INT,
tension_level ENUM('normal', 'elevated', 'high', 'critical'),
input_hash CHAR(32),
input_summary JSON,
notes TEXT,
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
PRIMARY KEY (id),
KEY idx_model_inferred (model_id, inferred_at),
KEY idx_tension (tension_level),
CONSTRAINT fk_inference_model FOREIGN KEY (model_id)
REFERENCES thrml_models(id) ON DELETE CASCADE
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci;
-- Anomaly events (flagged for review)
CREATE TABLE thrml_anomalies (
id INT NOT NULL AUTO_INCREMENT,
inference_id BIGINT NOT NULL,
model_id INT NOT NULL,
detected_at DATETIME NOT NULL,
anomaly_type VARCHAR(50),
energy DECIMAL(12,6),
energy_zscore DECIMAL(8,4),
description TEXT,
input_snapshot JSON,
reviewed TINYINT(1) DEFAULT 0,
review_notes TEXT,
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
PRIMARY KEY (id),
KEY idx_model_detected (model_id, detected_at),
KEY idx_reviewed (reviewed),
CONSTRAINT fk_anomaly_inference FOREIGN KEY (inference_id)
REFERENCES thrml_inference(id) ON DELETE CASCADE,
CONSTRAINT fk_anomaly_model FOREIGN KEY (model_id)
REFERENCES thrml_models(id) ON DELETE CASCADE
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci;
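The schema does not say how `input_hash` is computed. Since CHAR(32) matches the width of an MD5 hex digest, one plausible reading is an MD5 over a canonical JSON form of the input summary. The sketch below assumes that, and prepares a parameter tuple for a `thrml_inference` insert without touching the database. The function names, the choice of MD5, and the sample values are all assumptions.

```python
import hashlib
import json

def input_hash(input_summary):
    """32-character hex digest of the canonical JSON form of the input."""
    canonical = json.dumps(input_summary, sort_keys=True, separators=(',', ':'))
    # usedforsecurity=False: the digest is a lookup key, not a security measure
    return hashlib.md5(canonical.encode('utf-8'), usedforsecurity=False).hexdigest()

# Parameterized statement in the %s placeholder style used by mysql-connector-python
INSERT_INFERENCE = (
    "INSERT INTO thrml_inference "
    "(model_id, inferred_at, energy, energy_zscore, tension_level, "
    "input_hash, input_summary) "
    "VALUES (%s, %s, %s, %s, %s, %s, %s)"
)

def inference_row(model_id, inferred_at, energy, zscore, tension_level, input_summary):
    """Build the parameter tuple matching INSERT_INFERENCE."""
    return (
        model_id,
        inferred_at,
        round(energy, 6),   # DECIMAL(12,6)
        round(zscore, 4),   # DECIMAL(8,4)
        tension_level,
        input_hash(input_summary),
        json.dumps(input_summary, sort_keys=True),
    )

# Illustrative values only
summary = {'temperature_f': 45.0, 'humidity': 80.0}
row = inference_row(1, '2026-02-01 07:30:00.000000', -10.25, 0.4, 'normal', summary)
print(len(row[5]))  # 32
```

With a live connection, the tuple would be passed as `cursor.execute(INSERT_INFERENCE, row)`. Sorting keys before hashing keeps the digest stable when the same readings arrive in a different dictionary order.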
5.2 Integration Pattern
┌─────────────────────────────────────────────────────────────────┐
│                              MySQL                              │
│  ┌──────────────┐  ┌──────────────┐  ┌───────────────────────┐  │
│  │ tempest_     │  │ birdweather_ │  │ thrml_models          │  │
│  │ readings     │  │ detections   │  │ thrml_inference       │  │
│  └──────┬───────┘  └──────┬───────┘  │ thrml_anomalies       │  │
│         │                 │          └───────────┬───────────┘  │
└─────────┼─────────────────┼──────────────────────┼──────────────┘
          │ read            │ read                 │ write/read
          ▼                 ▼                      ▼
┌─────────────────────────────────────────────────────────────────┐
│                          Python Layer                           │
│        ┌────────────┐   ┌────────────┐   ┌────────────┐         │
│        │ Training   │   │ Inference  │   │ Anomaly    │         │
│        │ (manual)   │   │ (cron)     │   │ Detection  │         │
│        └────────────┘   └────────────┘   └────────────┘         │
│              │                │                │                │
│              └────────────────┴────────────────┘                │
│                               │                                 │
│                      THRML / JAX / M4 Max                       │
└─────────────────────────────────────────────────────────────────┘
                                ▲
                                │ read only
                                ▼
┌─────────────────────────────────────────────────────────────────┐
│                         PHP/LAMP Layer                          │
│        ┌────────────┐   ┌────────────┐   ┌────────────┐         │
│        │ Dashboard  │   │ Energy     │   │ Anomaly    │         │
│        │ Overview   │   │ Time Series│   │ Review     │         │
│        └────────────┘   └────────────┘   └────────────┘         │
└─────────────────────────────────────────────────────────────────┘
5.3 Operational Flow
Training (manual, ~monthly):
cd ~/Macroscope/THRML_POC
source ~/thrml-poc/bin/activate
python training/train_tempest.py --days 90
python training/train_bird.py --days 90
Writes to thrml_models, stores weights in filesystem.
Inference (cron, every 15 minutes):
*/15 * * * * /Users/mikehamilton/thrml-poc/bin/python /Users/mikehamilton/Macroscope/THRML_POC/inference/run_inference.py
Reads latest sensor data, runs mesh, writes to thrml_inference.
If the energy z-score exceeds 2.0 (more than 2σ above the baseline mean), also writes to thrml_anomalies.
Display (PHP, on request):
- Query thrml_inference for recent energy time series
- Query thrml_anomalies for unreviewed events
- Display tension level, trends, flagged events
5.4 Tension Level Thresholds
| Level | Z-Score | Interpretation |
|---|---|---|
| normal | < 1.5 | Typical conditions |
| elevated | ≥ 1.5 and < 2.0 | Minor deviation |
| high | ≥ 2.0 and ≤ 3.0 | Notable anomaly |
| critical | > 3.0 | Significant event, flag for review |
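The threshold table maps directly onto a small classifier for the `tension_level` column. The sketch below assumes that each band includes its lower bound, that exactly 3.0 still counts as high (the table reserves critical for values above 3.0), and that negative z-scores (energy below baseline) are normal. The function names are hypothetical.

```python
def classify_tension(zscore):
    """Map an energy z-score to a tension_level value from the table above."""
    if zscore < 1.5:
        return 'normal'
    if zscore < 2.0:
        return 'elevated'
    if zscore <= 3.0:
        return 'high'
    return 'critical'

def should_record_anomaly(zscore, threshold=2.0):
    """Section 5.3 writes to thrml_anomalies when energy exceeds 2σ."""
    return zscore > threshold

for z in (0.4, 1.7, 2.5, 3.2):
    print(z, classify_tension(z), should_record_anomaly(z))
# 0.4 normal False
# 1.7 elevated False
# 2.5 high True
# 3.2 critical True
```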
6. Implementation Files
6.1 Directory Structure
~/Macroscope/THRML_POC/
├── config.py # Database credentials, encoding params
├── data/
│ ├── tempest_loader.py # Tempest data extraction
│ └── bird_loader.py # BirdWeather data extraction
├── models/
│ ├── tempest_rbm.py # Experiment A mesh
│ └── bird_rbm.py # Experiment B mesh
├── training/
│ ├── train_tempest.py # Training script A
│ └── train_bird.py # Training script B
├── evaluation/
│ ├── anomaly_test.py # Anomaly injection tests
│ └── energy_plots.py # Visualization
└── notebooks/
└── poc_exploration.ipynb # Interactive development
6.2 Deliverables
| Artifact | Description |
|---|---|
| tempest_model.pkl | Trained Tempest RBM weights |
| bird_model.pkl | Trained BirdWeather RBM weights |
| baseline_energies.json | Normal state energy statistics |
| anomaly_results.json | Anomaly injection test results |
| energy_distribution.png | Histogram of normal vs anomaly energies |
7. Scaling Path
7.1 Phase 1: Local Validation (This Spec)
- Data source: Data (real-time) or recent Galatea export
- Training window: 30-90 days
- Mesh size: 135 nodes (Tempest), 77 nodes (BirdWeather)
7.2 Phase 2: Full Baseline Training
- Data source: Galatea (full archive)
- Training window: 12+ months (capture seasonal structure)
- Mesh size: Scale to 1,000-5,000 nodes
7.3 Phase 3: Multi-Stream Integration
- Combine Tempest + BirdWeather into unified mesh
- Add temporal hierarchy (daily/seasonal/annual layers)
- Target: 50,000+ nodes as described in CNL-TN-2026-014
8. Risk Factors
| Risk | Mitigation |
|---|---|
| JAX/Metal compatibility issues | Fall back to CPU; THRML supports both |
| Insufficient training data on Data | Pull 90-day export from Galatea |
| Encoding loses ecological signal | Experiment with thermometer vs binary encoding |
| Mesh fails to learn structure | Start with known-good Ising parameters; tune beta |
9. References
[1] Jelinčič, A., et al. (2025). "An efficient probabilistic hardware architecture for diffusion-like models." arXiv:2510.23972.
[2] Extropic Corp. (2025). "THRML: Thermodynamic Hypergraphical Model Library." https://github.com/extropic-ai/thrml
[3] Hamilton, M. P. (2026). "Embodied Ecological Sensing via Denoising Thermodynamic Models." CNL-TN-2026-014.
10. Document History
| Version | Date | Changes |
|---|---|---|
| 1.0 | 2026-02-01 | Initial specification |
Cite This Document
Permanent URL: https://canemah.org/archive/document.php?id=CNL-SP-2026-015