Existing techniques for object tracking with Multiple Instance Learning (MIL) extract low-level patches of fixed size and aspect ratio from each image and rely on many simplistic assumptions. In this work, we propose an approach that automatically uses image segments as input primitives within a multi-level, segmentation-based system, and we build a target model refinement procedure that learns the optimal model of the target object. To go beyond these restrictive assumptions, we further develop automatic scene environment models that assign each segment instance a prior probability of belonging to the target versus the scene. We demonstrate strong qualitative and quantitative results on tracking sequences from typical outdoor surveillance settings. © 2012 ICPR Org Committee.