Human action, as the basic unit of most human-relevant video content, bridges the gap between low-level visual features and high-level semantics. Human action recognition is of great significance in the applications of human-computer interaction, intelligent video surveillance, video retrieval and search. In this paper, we propose a novel unsupervised approach to mining categories from action video sequences, which consists of two modules: action representation for video data structurization and learning model for unsupervised categorization. In action representation, a novel view of video decomposition is presented. Videos are regarded as spatially distributed dynamic pixel time series, and these dynamic pixels are first quantized into pixel prototypes. After replacing the pixel time series with their corresponding prototype labels, the video sequences are compressed into two-dimensional action matrices. In the learning model, we put these matrices together to form an multi-action tensor, and propose the joint matrix factorization method to simultaneously cluster the pixel prototypes into pixel signatures, and matrices into action classes with the consideration of the duality between pixel clustering and action clustering. The approach is tested on public and popular Weizmann, and KTH datasets, and promising results are achieved. © 2006 IEEE.