Abstract

Active vision enables dynamic and robust visual perception, offering an alternative to the static, passive nature of the feedforward architectures commonly used in computer vision, which depend on large datasets and high computational resources. Biological selective attention mechanisms allow agents to focus on salient regions of interest (ROIs), reducing computational demand while maintaining real-time responsiveness. Event-based cameras, inspired by the mammalian retina, further enhance this capability by capturing asynchronous scene changes, enabling efficient, low-latency processing. To distinguish moving objects while the event-based camera is itself in motion, the agent requires an object motion segmentation mechanism to accurately detect targets and position them at the centre of the visual field (fovea). Integrating event-based sensors with neuromorphic algorithms represents a paradigm shift, using spiking neural networks (SNNs) to parallelise computation and adapt to dynamic environments. This work presents a bio-inspired attention system, based on a spiking convolutional neural network, that achieves selective attention through object motion sensitivity. The system generates events via fixational eye movements using a dynamic vision sensor integrated into the Speck neuromorphic hardware, mounted on a pan-tilt unit, to identify the ROI and saccade toward it. Characterised using ideal gratings and benchmarked against the event camera motion segmentation dataset, the system reaches a mean IoU of 82.2% and a mean structural similarity index of 96% in multi-object motion segmentation. Additionally, salient-object detection reaches an accuracy of 88.8% in office scenarios and 89.8% in challenging indoor and outdoor low-light conditions, as evaluated on the event-assisted low-light video object segmentation dataset. A real-time demonstrator showcases the system's ability to detect the salient object through object motion sensitivity in 0.124 s in dynamic scenes. Its learning-free design ensures robustness across diverse perceptual scenes, making it a reliable foundation for real-time robotic applications and a basis for more complex architectures.

Media: The accompanying video can be found online: https://youtu.be/dcAJlDgVR0o
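The abstract reports segmentation quality as mean intersection-over-union (IoU) and mean structural similarity (SSIM) between predicted and ground-truth motion masks. As a rough illustration of how such scores are conventionally computed (a minimal sketch, not the authors' evaluation code; the mask arrays and thresholding convention are assumptions), the following Python snippet uses NumPy and scikit-image:

```python
import numpy as np
from skimage.metrics import structural_similarity


def mean_iou(pred_masks, gt_masks):
    """Mean IoU over pairs of binary segmentation masks (2D arrays)."""
    ious = []
    for pred, gt in zip(pred_masks, gt_masks):
        pred, gt = pred.astype(bool), gt.astype(bool)
        union = np.logical_or(pred, gt).sum()
        if union == 0:
            ious.append(1.0)  # both masks empty: count as perfect agreement
        else:
            ious.append(np.logical_and(pred, gt).sum() / union)
    return float(np.mean(ious))


def mean_ssim(pred_masks, gt_masks):
    """Mean SSIM over mask pairs, treating each binary mask as an image."""
    scores = [
        structural_similarity(p.astype(float), g.astype(float), data_range=1.0)
        for p, g in zip(pred_masks, gt_masks)
    ]
    return float(np.mean(scores))


if __name__ == "__main__":
    # Toy example: one 64x64 prediction/ground-truth pair with partial overlap.
    gt = np.zeros((64, 64), dtype=np.uint8)
    gt[10:40, 10:40] = 1
    pred = np.zeros_like(gt)
    pred[15:45, 15:45] = 1
    print("mean IoU :", mean_iou([pred], [gt]))
    print("mean SSIM:", mean_ssim([pred], [gt]))
```

Reported per-dataset numbers such as the 82.2% mean IoU would then be averages of these per-frame scores over the benchmark sequences.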