The Web has evolved into a dominant digital medium for conducting many types of online transactions, such as shopping, paying bills, and making travel plans. Such transactions typically involve a number of steps spanning several Web pages. For sighted users, these steps are relatively straightforward to perform with graphical Web browsers, but they pose tremendous challenges for visually impaired individuals. This is because screen readers, the dominant assistive technology used by visually impaired users, function by speaking out the screen's content serially. Consequently, using them to conduct transactions can cause considerable information overload. Usually, however, one needs to browse only a small fragment of a Web page to complete a step of a transaction (e.g., choosing an item from a search results list). Based on this observation, this paper presents a model-directed transaction framework to identify, extract, and aurally render only the "relevant" page fragments in each step of a transaction. The framework uses a process model to encode the state of the transaction and a concept model to identify the page fragments relevant to the transaction in that state. We also present algorithms to mine such models from click stream data generated by transactions, along with experimental evidence of the practical effectiveness of our models in improving user experience when conducting online transactions with non-visual modalities. © 2011 Springer Science+Business Media, LLC.