We present a novel approach for vehicle detection in urban surveillance videos that handles unstructured, crowded environments with large occlusions, varied vehicle shapes, and challenging environmental conditions such as lighting changes, rain, shadows, and reflections. This is achieved with virtually no manual labeling effort. The system runs at an average of 66 Hz on a conventional laptop computer. Our approach relies on three key contributions: 1) a co-training scheme in which data is automatically captured based on motion and shape cues and used to train a detector based on appearance information; 2) an occlusion handling technique based on synthetically generated training samples obtained through Poisson image reconstruction from image gradients; 3) massively parallel feature selection over multiple feature planes, which makes the final detector both more accurate and more efficient. We perform a comprehensive quantitative analysis to validate our approach, showing its usefulness in realistic urban surveillance settings. © 2010 IEEE.