Subsequence search, especially subsequence similarity search, is one of the most important subroutines in time series data mining algorithms, and there is increasing evidence that Dynamic Time Warping (DTW) is the best distance metric. However, in spite of the great effort in software speedup techniques, including early abandoning strategies, lower bound, indexing, computation-reuse, DTW still cost too much time for many applications, e.g. 80% of the total time. Since DTW is a 2-Dimension sequential dynamic search with quite high data dependency, it is hard to use parallel hardware to accelerate it. In this work, we propose a novel framework for FPGA based subsequence similarity search and a novel PE-ring structure for DTW calculation. This framework utilizes the data reusability of continuous DTW calculations to reduce the bandwidth and exploit the coarse-grain parallelism; meanwhile guarantees the accuracy with a two-phase precision reduction. The PE-ring supports on-line updating patterns of arbitrary lengths, and utilizes the hard-wired synchronization of FPGA to realize the fine-grained parallelism. It also achieves flexible parallelism degree to do performance-cost trade-off. The experimental results show that we can achieve several orders of magnitude speedup in accelerating subsequence similarity search compared with the best software and current GPU/FPGA implementations in different datasets. © 2013 ACM.