Current large-scale implementations of deep learning and data mining require thousands of processors, massive amounts of off-chip memory, and consume gigajoules of energy. New memory technologies, such as nanoscale two-terminal resistive switching memory devices, offer a compact, scalable, and low-power alternative that permits on-chip colocated processing and memory in fine-grain distributed parallel architecture. Here, we report the first use of resistive memory devices for implementing and training a restricted Boltzmann machine (RBM), a generative probabilistic graphical model as a key component for unsupervised learning in deep networks. We experimentally demonstrate a 45-synapse RBM realized with 90 resistive phase change memory (PCM) elements trained with a bioinspired variant of the contrastive divergence algorithm, implementing Hebbian and anti-Hebbian weight updates. The resistive PCM devices show a twofold to tenfold reduction in error rate in a missing pixel pattern completion task trained over 30 epochs, compared with untrained case. Measured programming energy consumption is 6.1 nJ per epoch with the PCM devices, a factor of 150 times lower than the conventional processor-memory systems. We analyze and discuss the dependence of learning performance on cycle-to-cycle variations and number of gradual levels in the PCM analog memory devices.