Single-Image 3D Reconstruction from 360° Imagery: Experimental Findings Using Apple SHARP
Document ID: CNL-FN-2026-XXX (auto-assigned)
Date: January 17, 2026
Author: Michael P. Hamilton, Ph.D.
Version: 0.2 (Post-experiment)
Status: Initial findings documented
AI Assistance Disclosure: This field note was developed with assistance from Claude (Anthropic, Opus 4.5). The AI contributed to literature review of the SHARP repository documentation, experimental design discussion, protocol development, software development, and manuscript drafting. The author takes full responsibility for the content, accuracy, and conclusions.
Abstract
This field note documents experimental findings from testing Apple's SHARP model for 3D reconstruction from 360° imagery. The original hypothesis—that cubemap faces from a spherical capture could be processed independently through SHARP and merged into a unified scene model—proved incorrect. SHARP generates independent coordinate systems for each input image with inconsistent depth scales, making geometric fusion infeasible. However, the experiment yielded a productive reframing: each cubemap face produces a valid "terrarium"—a measurable 3D frustum suitable for per-view analysis. We developed a complete toolkit including high-resolution cubemap extraction, a custom WebGL2 Gaussian splatting renderer matching commercial quality, and format converters for GIS/CAD integration. Key finding: input image resolution significantly impacts splat quality; 1536px cubemap faces (from 6080×3040 source imagery) produce substantially better results than 512px extractions.
1. Background and Motivation
The original experimental design proposed using GPS-tagged 360° imagery as a registration framework for SHARP-based 3D reconstruction. The reasoning: cubemap decomposition extracts perspective views with mathematically defined angular relationships, potentially enabling deterministic alignment without feature matching.
This hypothesis required testing before field deployment.
2. Experimental Process
2.1 Pipeline Development
We built a processing pipeline with the following components:
Cubemap Extraction (equirect_to_cubemap.py): Extracts six perspective faces from equirectangular 360° images. Supports automatic resolution detection: for a 6080×3040 source, we use 1536px faces (approximately source width ÷ 4, matching SHARP's internal 1536×1536 input size).
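The core of the extraction is the mapping from a face-pixel ray to equirectangular coordinates; a minimal sketch of that mapping (function name and axis conventions are illustrative, not the script's actual API):

```python
import numpy as np

def equirect_uv(direction):
    """Map a unit 3D direction to (u, v) in an equirectangular image.

    u spans longitude (0..1 around the sphere), v spans latitude (0..1
    top to bottom). Illustrative convention: +z forward, +y up.
    """
    x, y, z = direction
    lon = np.arctan2(x, z)               # -pi..pi, 0 = straight ahead
    lat = np.arcsin(np.clip(y, -1, 1))   # -pi/2..pi/2, +y = up
    u = lon / (2 * np.pi) + 0.5
    v = 0.5 - lat / np.pi
    return u, v

# The center of the "front" face looks straight along +z,
# which lands at the center of the equirectangular image:
u, v = equirect_uv((0.0, 0.0, 1.0))
```

Each face pixel is converted to such a direction via the face's rotation, mapped to (u, v), and sampled with bilinear interpolation.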
SHARP Processing: Each cubemap face processed independently:
sharp predict -i front.jpg -o front.ply
Output: ~1.18 million 3D Gaussians per face (fixed by the SHARP architecture, regardless of input resolution).
Visualization: Custom WebGL2 Gaussian splatting renderer (splat_viewer.html) achieving visual quality comparable to SuperSplat.
Format Conversion (splat_convert.py): Export to .splat (web), .xyz/.las (GIS), .obj (CAD), .csv (analysis).
2.2 The Merge Attempt
Initial attempts to merge six cubemap faces into a unified spherical model failed. Investigation revealed the core problem:
Each SHARP prediction exists in its own coordinate system with independent depth scale.
| Face | Z Range (meters) | Span |
|---|---|---|
| down | 0.8 – 3.4 | 2.6m |
| up | 1.4 – 17.5 | 16.1m |
| left | 1.1 – 66.7 | 65.6m |
| back | 0.9 – 80.2 | 79.3m |
| right | 1.3 – 132.3 | 131.0m |
| front | 1.2 – 135.9 | 134.7m |
The side faces look through forest to distant points 100+ meters away; the up and down faces have compact ranges. These scales cannot be reconciled by rotation alone; the depth predictions are fundamentally incompatible at face boundaries.
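The incompatibility can be shown directly: rotation is an isometry, so it can reorient a face's point cloud but never change its extent. A small numpy demonstration with toy clouds mimicking the table's down and front faces (values illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy point clouds standing in for two SHARP faces: one compact
# ("down", z = 0.8-3.4 m) and one deep ("front", z = 1.2-135.9 m).
down  = rng.uniform([-1, -1, 0.8], [1, 1, 3.4],   size=(1000, 3))
front = rng.uniform([-1, -1, 1.2], [1, 1, 135.9], size=(1000, 3))

# A 90-degree rotation about y, as a naive merge step would apply:
R = np.array([[0.0, 0.0, 1.0],
              [0.0, 1.0, 0.0],
              [-1.0, 0.0, 0.0]])
rotated = down @ R.T

# Rotation preserves every point's distance from the origin, so it can
# reorient the ~2.6 m "down" frustum but never stretch it to meet the
# ~135 m "front" frustum: the scale mismatch survives any rotation.
dist_before = np.linalg.norm(down, axis=1)
dist_after  = np.linalg.norm(rotated, axis=1)
```

Reconciling the faces would require per-face, depth-dependent rescaling that SHARP does not expose.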
2.3 Reframing: The Terrarium Model
Rather than a failed merge, we reframed the output as six independent measurement windows. Each cubemap face produces a valid 3D frustum—a "terrarium" with analyzable structure:
- Ground plane detection
- Height distribution analysis
- Vegetation strata classification
- Canopy cover estimation (for "up" face)
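As a sketch of the strata analysis above (the bin edges and percentile-based ground estimate are illustrative assumptions, not calibrated thresholds):

```python
import numpy as np

def classify_strata(heights, edges=(0.5, 2.0, 10.0)):
    """Bucket point heights (m above estimated ground) into strata.

    Illustrative edges: ground cover <0.5 m, shrub 0.5-2 m,
    understory 2-10 m, canopy >10 m. Returns fraction per stratum.
    """
    labels = ["ground cover", "shrub", "understory", "canopy"]
    idx = np.digitize(heights, edges)
    counts = np.bincount(idx, minlength=len(labels))
    return dict(zip(labels, counts / len(heights)))

# Ground plane as a robust low percentile of z, then heights above it:
z = np.array([0.1, 0.2, 1.5, 3.0, 12.0, 15.0])   # toy z values (m)
ground = np.percentile(z, 5)
fractions = classify_strata(z - ground)
```

The same logic applies unchanged to the .xyz exports from the converter.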
This reframing aligns with how SHARP was designed: single-image view synthesis for nearby camera movements, not omnidirectional scene reconstruction.
2.4 Resolution Discovery
A critical finding: input resolution significantly impacts output quality.
Initial tests used 512px cubemap faces (from a downscaled 1568×784 test image). Results were visibly inferior to the SuperSplat reference viewer.
Re-extracting at 1536px (from the full 6080×3040 source) produced dramatically sharper splats, matching SuperSplat's rendering quality in our custom viewer.
SHARP internally resizes all inputs to 1536×1536 and outputs a fixed ~1.18M Gaussians. However, starting with higher-resolution input preserves finer detail before this internal resize.
Recommendation: extract cubemap faces at approximately source width ÷ 4, with a minimum of 1024px; 1536px (SHARP's internal input size) is optimal.
3. Toolkit Developed
3.1 Cubemap Extractor
equirect_to_cubemap.py
python3 equirect_to_cubemap.py source.jpg ./output --size 1536
python3 equirect_to_cubemap.py source.jpg ./output --size max # auto-detect
Features:
- Bilinear interpolation for quality extraction
- Metadata JSON with rotation parameters
- Support for standard cubemap (6 faces) or denser grids
3.2 Gaussian Splat Viewer
splat_viewer.html
Custom WebGL2 renderer implementing proper Gaussian splatting:
- Vertex shader projects 3D covariance → 2D screen-space ellipse
- Fragment shader computes Gaussian alpha falloff
- Back-to-front depth sorting for correct blending
- Uses all Gaussian attributes: position, scale (3D), rotation (quaternion), color, opacity
- Retina/HiDPI support via devicePixelRatio
- Orbit controls with zoom range 0.1–50
Performance: 60 FPS on M4 Max with 1.18M Gaussians.
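The projection step in the vertex shader follows the standard 3D Gaussian splatting math (Kerbl et al. 2023): build the 3D covariance from scale and quaternion, then push it through the Jacobian of the perspective projection. A numpy sketch of that math (variable names illustrative):

```python
import numpy as np

def quat_to_rot(q):
    """Rotation matrix from a (w, x, y, z) unit quaternion."""
    w, x, y, z = q
    return np.array([
        [1 - 2*(y*y + z*z), 2*(x*y - w*z),     2*(x*z + w*y)],
        [2*(x*y + w*z),     1 - 2*(x*x + z*z), 2*(y*z - w*x)],
        [2*(x*z - w*y),     2*(y*z + w*x),     1 - 2*(x*x + y*y)],
    ])

def screen_covariance(scale, quat, mean_cam, fx, fy):
    """Project a Gaussian's 3D covariance to a 2x2 screen-space covariance.

    Sigma = R S S^T R^T; the perspective Jacobian J linearizes the
    projection at the Gaussian's camera-space mean.
    """
    R = quat_to_rot(quat)
    S = np.diag(scale)
    sigma = R @ S @ S.T @ R.T                 # 3D covariance
    x, y, z = mean_cam
    J = np.array([[fx / z, 0,      -fx * x / z**2],
                  [0,      fy / z, -fy * y / z**2]])
    return J @ sigma @ J.T                    # 2x2 screen-space ellipse

cov2d = screen_covariance(scale=(0.05, 0.05, 0.05),
                          quat=(1.0, 0.0, 0.0, 0.0),
                          mean_cam=(0.0, 0.0, 2.0),
                          fx=1000.0, fy=1000.0)
```

The fragment shader then evaluates the Gaussian falloff against this 2×2 covariance to get per-pixel alpha.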
3.3 Format Converter
splat_convert.py
python3 splat_convert.py input.ply output.splat # Web viewer format
python3 splat_convert.py input.ply output.xyz # Point cloud (ASCII)
python3 splat_convert.py input.ply output.las # LIDAR/GIS format
python3 splat_convert.py input.ply output.obj # Blender/CAD
python3 splat_convert.py input.ply output.csv # Full data for analysis
python3 splat_convert.py input.ply output.ply --standard # MeshLab compatible
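For reference, the .splat web format packs each Gaussian into a fixed 32-byte record; a minimal packing sketch following the widely used web-viewer layout (confirm against splat_convert.py before relying on it):

```python
import struct

def pack_gaussian(pos, scale, rgba, quat):
    """Pack one Gaussian into a 32-byte .splat record:
    3 float32 position, 3 float32 scale, 4 uint8 RGBA, 4 uint8
    quaternion (each component mapped from [-1, 1] to [0, 255]).
    """
    q8 = [min(255, max(0, int(round(c * 128 + 128)))) for c in quat]
    return struct.pack("<3f3f4B4B", *pos, *scale, *rgba, *q8)

record = pack_gaussian(pos=(1.0, 2.0, 3.0),
                       scale=(0.05, 0.05, 0.05),
                       rgba=(200, 180, 160, 255),
                       quat=(1.0, 0.0, 0.0, 0.0))
```

The fixed record size is what lets the web viewer stream and sort millions of Gaussians without parsing a PLY header.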
4. Key Findings
4.1 What Works
- Per-face 3D reconstruction: Each cubemap face produces valid, analyzable 3D structure
- Rapid processing: <1 second per face on Apple Silicon (MPS)
- High visual quality: Custom renderer matches commercial SuperSplat quality
- Format flexibility: Export chain to GIS, CAD, and web platforms
- Resolution scaling: Higher input resolution → finer splat detail
4.2 What Doesn't Work
- Spherical merge: Independent depth scales prevent geometric fusion
- Cross-face consistency: Adjacent faces don't align at boundaries
- Metric accuracy: Depth is plausible but not calibrated to real-world scale without reference objects
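If a reference object of known size is visible in a face, a single scalar can calibrate that face after the fact, assuming SHARP's depth is internally consistent within the face; a hedged sketch:

```python
import numpy as np

def calibrate_scale(points, measured_len, true_len):
    """Rescale one face's point cloud so a reference object of known
    size (e.g. a survey rod) measures correctly. One scalar suffices
    only under the assumption that relative depth within the face is
    consistent, which SHARP appears to provide per view.
    """
    return points * (true_len / measured_len)

pts = np.array([[0.0, 0.0, 2.0],
                [0.0, 0.0, 4.0]])
# Rod spans 0.8 model units but is 1.0 m in reality:
calibrated = calibrate_scale(pts, measured_len=0.8, true_len=1.0)
```

Cross-face calibration still fails, since each face needs its own factor.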
4.3 Implications for Field Use
SHARP + 360° imagery is not a replacement for photogrammetric reconstruction of unified scene models. However, it offers:
- Rapid single-view 3D documentation
- Per-direction habitat structure analysis
- Visual exploration of forest structure
- Educational/outreach 3D content
- Preliminary site assessment before committing to full photogrammetry
5. Future Directions
5.1 WebXR Integration
The custom viewer could be extended with Three.js WebXR support for immersive exploration. Alternatively, GaussianSplats3D library provides ready-made WebXR compatibility with our PLY files.
5.2 Analysis Tools
Height profiling, vegetation strata classification, and canopy cover analysis could be integrated into the viewer or developed as CLI tools operating on the Gaussian data.
5.3 Multi-Station Capture
While single-sphere merge fails, between-station registration via GPS + ICP might still enable corridor or transect documentation, treating each station as a separate 6-terrarium sampling point rather than attempting unified reconstruction.
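A plausible seed for that between-station registration is the GPS baseline, converted to local meters with a flat-earth approximation before ICP refinement; a sketch (constants are standard; the approximation is only adequate for short transect baselines):

```python
import math

def gps_offset_m(lat0, lon0, lat1, lon1):
    """Approximate east/north offset in meters between two GPS fixes,
    using a local flat-earth (equirectangular) approximation. Good
    enough for stations tens of meters apart, as an ICP initial guess.
    """
    m_per_deg_lat = 111_320.0                               # ~constant
    m_per_deg_lon = 111_320.0 * math.cos(math.radians(lat0))
    east = (lon1 - lon0) * m_per_deg_lon
    north = (lat1 - lat0) * m_per_deg_lat
    return east, north

# Two hypothetical stations ~11 m apart along a north-south transect:
east, north = gps_offset_m(45.0, -122.0, 45.0001, -122.0)
```

Consumer GPS error (meters) would dominate at this spacing, which is why ICP refinement on the overlapping ground-plane points would still be needed.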
6. Files Delivered
| File | Purpose |
|---|---|
| equirect_to_cubemap.py | Extract perspective faces from 360° images |
| splat_viewer.html | WebGL2 Gaussian splat renderer |
| splat_convert.py | Format conversion (PLY → splat/xyz/las/obj/csv) |
7. Connection to Macroscope
This experiment exemplifies the Macroscope approach: rapid prototyping to evaluate emerging tools, honest documentation of both successes and limitations, and productive reframing when initial hypotheses fail.
The "terrarium" model—six measurable windows rather than one unified sphere—may prove more useful for certain ecological questions than the originally envisioned seamless reconstruction. Vertical structure analysis (down face), canopy openness (up face), and directional habitat characterization (cardinal faces) are valid measurement targets even without geometric fusion.
The toolkit developed here joins the Macroscope instrument collection: sensors and software for technology-mediated environmental observation.
References
[1] Apple Machine Learning Research (2025). "SHARP: Single-image High-Accuracy Real-time Parallax." GitHub repository. https://github.com/apple/ml-sharp
[2] Kerbl, B., Kopanas, G., Leimkühler, T., & Drettakis, G. (2023). "3D Gaussian Splatting for Real-Time Radiance Field Rendering." SIGGRAPH 2023.
Document History
| Version | Date | Changes |
|---|---|---|
| 0.1 | 2026-01-17 | Initial draft, pre-field trial |
| 0.2 | 2026-01-17 | Post-experiment: documented merge failure, terrarium reframing, toolkit development, resolution findings |
Cite This Document
Permanent URL: https://canemah.org/archive/document.php?id=CNL-TN-2026-005