Video surveillance has become necessary to ensure public safety and to protect critical physical infrastructure such as banks, airports, and retail stores. Current surveillance systems are usually restricted to 2D cameras. Their effectiveness is limited by the sensitivity of 2D cameras to illumination and occlusions, which unavoidably leads to segmentation and tracking failures. With the latest developments in 3D sensing technologies, it is now possible to complement 2D video with 3D information in near real time. This added third dimension, i.e., depth information, enables more accurate sensing of the scene. We therefore propose to explore fusion approaches to address the shortfalls of current 2D video surveillance systems. Our first objective is to exploit fused data from different sources of visual information. These sources may differ in modality, e.g., 2D cameras, Time-of-Flight range cameras, 3D laser scanners, and infrared cameras, and/or in level of abstraction. Our second objective, following data acquisition, is to model the fused data for intelligent sensing tailored to real-world security scenarios. In the context of high-level security systems that require full control over the surveyed area, we investigate multi-view 2D/3D fusion methods. Furthermore, to support accurate feature extraction for automatic scene understanding, we aim to achieve high-resolution space-time depth maps. Privacy enforcement in digital imaging systems is another important design component, one that can no longer be ignored given the heated debate about privacy violations. An interdisciplinary effort will hence be undertaken to define and design privacy-aware imaging systems.