In this work, we study the performance of a two-stage ensemble visual machine learning framework for the classification of medical images. In the first stage, models are built on subsets of features and data; in the second stage, these models are combined. We demonstrate the performance of this framework in four contexts: 1) the public ImageCLEF (Cross Language Evaluation Forum) 2013 medical modality recognition benchmark, 2) echocardiography view and mode recognition, 3) dermatology disease recognition across two datasets, and 4) a broad medical image dataset, merged from multiple data sources into a collection of 158 categories covering both general and specific medical concepts, including modalities, body regions, views, and disease states. In the first context, the presented system achieves state-of-the-art performance of 82.2% multiclass accuracy. In the second context, the system attains 90.48% multiclass accuracy. In the third, state-of-the-art performance of 90% specificity and 90% sensitivity is obtained on a small standardized dataset of 200 images using a leave-one-out strategy; for a larger dataset of 2,761 images, 95% specificity and 98% sensitivity are obtained on a 20% held-out test set. Finally, in the fourth context, the system achieves sensitivity and specificity of 94.7% and 98.4%, respectively, demonstrating its ability to generalize across domains.
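The two-stage design can be illustrated with a minimal sketch. This is not the authors' implementation; it assumes hypothetical nearest-centroid base models, each trained on a different feature subset (stage one), whose predictions are combined by majority vote (stage two):

```python
# Minimal sketch of a two-stage ensemble: stage 1 trains simple
# nearest-centroid classifiers on different feature subsets; stage 2
# combines their predictions by majority vote. (Toy example only;
# the base learners and combiner are illustrative assumptions.)
from collections import Counter

def train_centroid_model(X, y, feat_idx):
    """Stage 1: nearest-centroid classifier on a subset of features."""
    sums, counts = {}, {}
    for x, label in zip(X, y):
        sub = [x[i] for i in feat_idx]
        if label not in sums:
            sums[label] = [0.0] * len(sub)
            counts[label] = 0
        sums[label] = [s + v for s, v in zip(sums[label], sub)]
        counts[label] += 1
    centroids = {l: [s / counts[l] for s in sums[l]] for l in sums}

    def predict(x):
        sub = [x[i] for i in feat_idx]
        # Return the class whose centroid is closest (squared distance).
        return min(centroids, key=lambda l: sum(
            (a - b) ** 2 for a, b in zip(centroids[l], sub)))
    return predict

def ensemble_predict(models, x):
    """Stage 2: majority vote over the stage-1 model predictions."""
    votes = Counter(m(x) for m in models)
    return votes.most_common(1)[0][0]

# Toy 2-D data: class 0 near the origin, class 1 near (5, 5).
X = [[0.0, 0.1], [0.2, 0.0], [5.0, 5.1], [4.9, 5.0]]
y = [0, 0, 1, 1]
# Three base models, each on a different feature subset.
models = [train_centroid_model(X, y, idx) for idx in ([0], [1], [0, 1])]
print(ensemble_predict(models, [0.1, 0.1]))  # -> 0
print(ensemble_predict(models, [5.0, 4.8]))  # -> 1
```

In a realistic pipeline the base learners would be stronger classifiers over image feature subsets, and the combiner could itself be a learned model rather than a vote, but the two-stage structure is the same.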