In this paper we present our current work on automatic speaker recognition using keyword-conditioned phone N-gram modeling. We propose the use of contextual information around keywords in modeling a speaker's pronunciation characteristics at a phonetic level. Our approach is to add time margins around keywords when aligning keyword regions with keyword-specific phone events for feature vector generation. Including such additional information by incorporating time margins can capture idiosyncratic pronunciation information and is shown to help our keyword-conditioned phonetic speaker verification system achieve more than 50% (relative) performance improvement. This leads our high-level speaker verification system (i.e., fusion of non-conditioned and keyword-conditioned phonetic speaker verification systems) to currently achieve the best published result for the English 8-conversation enrollment telephony task of the 2008 NIST Speaker Recognition Evaluation for systems utilizing features not based directly on low-level acoustic information. © 2012 IEEE.