Conditional random fields (CRFs) are a flexible yet powerful probabilistic approach and have shown advantages for popular applications in various areas, including text analysis, bioinformatics, and computer vision. Traditional CRF models, however, are incapable of selecting relevant features as well as suppressing noise from noisy original features. Moreover, conventional optimization methods often converge slowly in solving the training procedure of CRFs, and will degrade significantly for tasks with a large number of samples and features. In this paper, we propose robust CRFs (RCRFs) to simultaneously select relevant features. An optimal gradient method (OGM) is further designed to train RCRFs efficiently. Specifically, the proposed RCRFs employ the ℓ<inf>1</inf> norm of the model parameters to regularize the objective used by traditional CRFs, therefore enabling discovery of the relevant unary features and pairwise features of CRFs. In each iteration of OGM, the gradient direction is determined jointly by the current gradient together with the historical gradients, and the Lipschitz constant is leveraged to specify the proper step size. We show that an OGM can tackle the RCRF model training very efficiently, achieving the optimal convergence rate O(1/k<sup>2</sup>) (where k is the number of iterations). This convergence rate is theoretically superior to the convergence rate O(1/k) of previous first-order optimization methods. Extensive experiments performed on three practical image segmentation tasks demonstrate the efficacy of OGM in training our proposed RCRFs.