Long short-term memory (LSTM) is an extension of the recurrent neural network (RNN) that has achieved excellent performance in a variety of tasks, especially sequential problems. The LSTM is chain-structured, so its architecture is limited by sequential information propagation, and in practice it struggles with problems that involve very long-term dependencies. Recent studies indicate that adding recurrent skip connections across multiple timescales can help an RNN handle long-term dependencies, and that capturing local features can further improve its performance. In this paper, we propose a novel LSTM architecture (Att-LSTM) that connects the hidden states of several consecutive previous time steps to the current time step and applies an attention mechanism to these hidden states. This architecture not only captures local features effectively but also helps learn very long-distance correlations in an input sequence. We evaluate Att-LSTM on several sequential tasks: the adding problem, sequence classification, and character-level language modeling. In addition, to demonstrate the generality and practicality of the architecture, we design a character-level hierarchical Att-LSTM and refine the word representation with a highway network. This hierarchical model achieves excellent performance on question classification.
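The core idea above, attending over a window of previous hidden states before each LSTM update, can be illustrated with a minimal NumPy sketch. It is not the paper's exact formulation: the abstract does not specify the scoring function or how the attended state enters the cell, so the bilinear score, the choice to feed the attended context in place of the previous hidden state, the window size `K`, and all function and parameter names (`attend_window`, `lstm_step`, `run_sequence`, `W_score`) are assumptions made for illustration.

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax over a 1-D array.
    e = np.exp(x - np.max(x))
    return e / e.sum()

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def attend_window(hidden_window, x_t, W_score):
    """Attend over the last K hidden states (assumed bilinear scoring).

    hidden_window: (K, d) previous hidden states, oldest first
    x_t:           (m,)   current input, used here as the query
    W_score:       (d, m) hypothetical scoring matrix
    """
    scores = hidden_window @ (W_score @ x_t)   # (K,) one score per state
    weights = softmax(scores)                  # (K,) attention weights
    context = weights @ hidden_window          # (d,) weighted sum of states
    return context, weights

def lstm_step(x_t, h_in, c_prev, W, U, b):
    """Standard LSTM update; h_in is whatever recurrent state is fed in."""
    d = h_in.shape[0]
    z = W @ x_t + U @ h_in + b                 # (4d,) stacked pre-activations
    i = sigmoid(z[:d])                         # input gate
    f = sigmoid(z[d:2 * d])                    # forget gate
    o = sigmoid(z[2 * d:3 * d])                # output gate
    g = np.tanh(z[3 * d:])                     # candidate cell state
    c_t = f * c_prev + i * g
    h_t = o * np.tanh(c_t)
    return h_t, c_t

def run_sequence(xs, d, K, seed=0):
    """Run the sketch over xs of shape (T, m) with random, untrained weights."""
    rng = np.random.default_rng(seed)
    m = xs.shape[1]
    W = rng.normal(scale=0.1, size=(4 * d, m))
    U = rng.normal(scale=0.1, size=(4 * d, d))
    b = np.zeros(4 * d)
    W_score = rng.normal(scale=0.1, size=(d, m))
    history = [np.zeros(d)]                    # initial hidden state h_0
    c = np.zeros(d)
    for x_t in xs:
        window = np.stack(history[-K:])        # up to K most recent states
        context, _ = attend_window(window, x_t, W_score)
        # Assumption: the attended context replaces h_{t-1} in the update.
        h, c = lstm_step(x_t, context, c, W, U, b)
        history.append(h)
    return np.stack(history[1:])               # (T, d) hidden states

hidden_states = run_sequence(np.ones((6, 3)), d=4, K=3)
```

With `K = 1` the window holds only the most recent hidden state, the attention weight is trivially 1, and the sketch reduces to an ordinary LSTM. Larger `K` gives each step a direct, weighted path to older states, which is the skip-connection effect the abstract refers to.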