Online Scene Change Detection (SCD) is an extremely challenging problem that requires an agent to detect relevant changes on the fly while observing the scene from unconstrained viewpoints. Existing online SCD methods are significantly less accurate than offline approaches. We present the first online SCD approach that is pose-agnostic, label-free, and ensures multi-view consistency, while operating at over 10 FPS and achieving new state-of-the-art performance, surpassing even the best offline approaches.
Our method introduces a new self-supervised fusion loss that infers scene changes from multiple cues and observations, fast PnP-based pose estimation against the reference scene, and a fast change-guided update strategy for the 3D Gaussian Splatting representation. Extensive experiments on complex real-world datasets demonstrate that our approach outperforms both online and offline baselines.
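A minimal sketch of how such a lightweight PnP-based pose estimator could register an incoming frame against the reference scene, assuming 2D-2D matches between the incoming image and a rendered reference view with known depth; the helper names and the OpenCV-based setup are illustrative assumptions, not the authors' implementation.

```python
# Hedged sketch: register an incoming frame against a reference view rendered
# from the reference representation. Matching and depth rendering are assumed
# to be provided by the surrounding pipeline.
import cv2
import numpy as np

def estimate_pose_pnp(kpts_2d_inf, kpts_2d_ref, depth_ref, K, T_ref_world):
    """kpts_2d_inf : (N, 2) matched keypoints in the incoming inference image
    kpts_2d_ref : (N, 2) corresponding keypoints in the rendered reference view
    depth_ref   : (H, W) depth rendered from the reference representation
    K           : (3, 3) camera intrinsics
    T_ref_world : (4, 4) world-from-camera pose of the reference view
    """
    # Back-project reference keypoints to 3D using the rendered depth.
    u, v = kpts_2d_ref[:, 0], kpts_2d_ref[:, 1]
    z = depth_ref[v.astype(int), u.astype(int)]
    valid = z > 0
    x = (u[valid] - K[0, 2]) * z[valid] / K[0, 0]
    y = (v[valid] - K[1, 2]) * z[valid] / K[1, 1]
    pts_cam = np.stack([x, y, z[valid]], axis=1)
    pts_world = (T_ref_world[:3, :3] @ pts_cam.T).T + T_ref_world[:3, 3]

    # Robust PnP: 3D world points vs. their 2D observations in the new frame.
    ok, rvec, tvec, inliers = cv2.solvePnPRansac(
        pts_world.astype(np.float64),
        kpts_2d_inf[valid].astype(np.float64),
        K.astype(np.float64), distCoeffs=None,
        reprojectionError=3.0, flags=cv2.SOLVEPNP_ITERATIVE)
    if not ok:
        return None
    R, _ = cv2.Rodrigues(rvec)
    T_cam_world = np.eye(4)
    T_cam_world[:3, :3], T_cam_world[:3, 3] = R, tvec.ravel()
    return T_cam_world  # camera-from-world pose of the incoming frame
```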
Our online Scene Change Detection method establishes a new state of the art, detecting changes more reliably than all prior methods, including the strongest offline baselines. It operates at a runtime comparable to the fastest online approaches while achieving substantially higher F1 scores. These gains are enabled by a self-supervised loss enforcing multi-view consistency and a lightweight PnP-based pose estimation module.
We register an incoming inference image I_k^inf to an existing reference representation ℛ_ref with a lightweight PnP-based pose estimator. Using the estimated pose P_k^inf and ℛ_ref to render an aligned image I_k^ren, we extract change cues C_k as a combination of pixel- and feature-level cues. Our novel self-supervised fusion loss L_SSF guides the fusion of all observed change cues to build a change representation ℛ_change that collectively learns change information from multiple viewpoints and infers change masks M_k. Finally, we selectively reconstruct the changed regions to update the 3D Gaussian Splatting representation to the current state ℛ_inf.
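As a hedged illustration of the cue-fusion step, the sketch below combines a pixel-level photometric cue and a feature-level cosine-distance cue into a per-pixel change probability, and pairs the mask of view k with a mask warped from another view through the reference geometry to form a multi-view consistency term. The cue definitions, the tiny fusion head, and the loss weighting are assumptions standing in for L_SSF, not the paper's exact formulation.

```python
# Hedged sketch of multi-cue fusion with a multi-view consistency objective.
import torch
import torch.nn.functional as F

def change_cues(I_inf, I_ren, feat_inf, feat_ren):
    """Pixel-level cue: photometric difference between the incoming image and
    the view rendered from the reference scene. Feature-level cue: cosine
    distance between dense features of the two images (e.g. a frozen ViT)."""
    pixel_cue = (I_inf - I_ren).abs().mean(dim=1, keepdim=True)            # (B,1,H,W)
    feat_cue = 1.0 - F.cosine_similarity(feat_inf, feat_ren, dim=1).unsqueeze(1)
    feat_cue = F.interpolate(feat_cue, size=pixel_cue.shape[-2:],
                             mode="bilinear", align_corners=False)
    return torch.cat([pixel_cue, feat_cue], dim=1)                         # (B,2,H,W)

class FusionHead(torch.nn.Module):
    """Tiny conv head that fuses the stacked cues into a change probability M_k."""
    def __init__(self):
        super().__init__()
        self.net = torch.nn.Sequential(
            torch.nn.Conv2d(2, 16, 3, padding=1), torch.nn.ReLU(),
            torch.nn.Conv2d(16, 1, 1))

    def forward(self, cues):
        return torch.sigmoid(self.net(cues))

def self_supervised_fusion_loss(M_k, M_j_warped, valid):
    """Consistency between the change mask of view k and the mask of another
    view j warped into view k via the reference geometry; `valid` marks pixels
    with a reliable warp. A sparsity term discourages trivial all-change masks."""
    consistency = F.binary_cross_entropy(M_k[valid], M_j_warped[valid].detach())
    sparsity = M_k.mean()
    return consistency + 0.01 * sparsity
```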
Qualitative comparison with MV3DCD. MV3DCD's hard thresholding and intersection heuristic lead to missed or spurious detections, especially for subtle appearance changes in semantically similar objects (red-to-blue T-shaped object in Meeting Room, blue-to-black bench in Porch). Hard thresholding risks discarding subtle but important changes, while the intersection fails to capture true changes unless they appear in both masks. Our method jointly learns the complementary change information in pixel- and feature-level cues via our novel self-supervised loss, capturing fine-grained changes and achieving state-of-the-art performance in both online and offline settings.
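A toy numeric example (with made-up cue values and illustrative fusion weights) of the failure mode described above: hard thresholding followed by intersection drops changes that only one cue responds to, whereas a soft learned fusion can recover them.

```python
# Toy illustration with assumed values; not measured data.
import numpy as np

pixel_cue = np.array([0.05, 0.80, 0.10])   # strong photometric response at index 1
feat_cue  = np.array([0.60, 0.15, 0.05])   # strong semantic response at index 0

# Hard-threshold-and-intersect heuristic: both cues must fire at the same pixel.
tau = 0.5
heuristic_mask = (pixel_cue > tau) & (feat_cue > tau)
print(heuristic_mask)          # [False False False] -> both true changes are dropped

# Learned soft fusion (sketch): either cue can flag a change, with weights
# that would be learned via the self-supervised loss rather than hand-tuned.
w_pix, w_feat, bias = 2.5, 2.5, -1.0       # illustrative "learned" parameters
fused = 1.0 / (1.0 + np.exp(-(w_pix * pixel_cue + w_feat * feat_cue + bias)))
print(fused > 0.5)             # [ True  True False] -> both changes recovered
```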
Quantitative results for SCD on PASLCD averaged over all 20 instances. LF: Label-Free, PA: Pose-Agnostic, MV: Multi-View consistency for change detection. We report total runtime for offline methods and operating frame rate (FPS) for online methods. Our method achieves the best performance in both settings.
Qualitative comparison of rendered views from the updated representation with CLNeRF and 3DGS (from scratch). Our method reconstructs changed regions (red boxes) more accurately while reusing primitives from ℛ_ref to preserve high fidelity in unchanged areas (yellow boxes), compared to naïvely reconstructing the scene from scratch at each time step.
Quantitative comparison of scene representation update on PASLCD and CL-Splats. Our method achieves comparable or higher reconstruction quality than approaches that fully re-optimize the evolved scene from scratch, while providing updated representations in under 60 s, up to 8–9× faster than these baselines. Results are averaged over all instances and scenes.
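A hedged sketch of what such a change-guided update could look like: Gaussians whose centers project into any detected change mask are re-optimized, while the rest of ℛ_ref is frozen and reused. The projection helper, parameter layout, and gradient-masking scheme below are assumptions, not the released implementation.

```python
# Hedged sketch of a change-guided 3DGS update.
import torch

def select_changed_gaussians(means_3d, change_masks, cameras, project_fn):
    """Mark a Gaussian as 'changed' if its center projects into the change mask
    of any registered inference view.

    means_3d     : (N, 3) Gaussian centers from the reference representation
    change_masks : list of (H, W) boolean masks M_k
    cameras      : list of camera poses/intrinsics for the inference views
    project_fn   : callable (means_3d, cam) -> ((N, 2) pixel coords, (N,) visibility)
    """
    changed = torch.zeros(means_3d.shape[0], dtype=torch.bool, device=means_3d.device)
    for mask, cam in zip(change_masks, cameras):
        uv, visible = project_fn(means_3d, cam)
        u = uv[:, 0].round().long().clamp(0, mask.shape[1] - 1)
        v = uv[:, 1].round().long().clamp(0, mask.shape[0] - 1)
        changed |= visible & mask[v, u]
    return changed

def freeze_unchanged(gaussian_params, changed):
    """Zero the gradients of unchanged primitives so optimization only touches
    Gaussians inside the detected change regions; everything else keeps its
    reference values, preserving fidelity in unchanged areas."""
    keep = changed.float().unsqueeze(-1)            # (N, 1)
    for p in gaussian_params.values():              # each param assumed shaped (N, D)
        p.register_hook(lambda g, k=keep: g * k.expand_as(g))
```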
@article{galappaththige2025online,
title={Changes in Real Time: Online Scene Change Detection with Multi-View Fusion},
author={Galappaththige, Chamuditha Jayanga and Lai, Jason and Windrim, Lloyd and Dansereau, Donald and S{\"u}nderhauf, Niko and Miller, Dimity},
journal={arXiv preprint arXiv:2511.12370},
year={2025}
}
This work was supported by the ARC Research Hub in Intelligent Robotic Systems for Real-Time Asset Management (IH210100030) and Abyss Solutions. C.J., N.S., and D.M. also acknowledge ongoing support from the QUT Centre for Robotics.