Transformer language models trained with self-supervision have recently revolutionized natural language processing and show tremendous potential when applied to text-based representations of chemical reactions. These models learn the patterns in chemical reactions by predicting masked parts of reaction SMILES. The pretrained models can then be fine-tuned on tasks such as reaction classification, where they reach unprecedented accuracies. Specific outputs of the transformer models can serve as fingerprints to map the chemical reaction space without the need to know the reaction center or to distinguish between reactants and reagents, and they can also be used to recover the rearrangement between reactant and product atoms. By opening the black box through detailed visual analysis, we discovered that the transformer models learned atom mapping without supervision. Atom mapping, known to be an NP-hard problem, is necessary to make chemical reaction data more machine-accessible and is crucial for graph- and template-based reaction prediction and synthesis planning approaches. Here, we present an attention-guided reaction mapper that shows remarkable performance in terms of speed and accuracy, even for strongly imbalanced reactions such as those typically found in patents. This work is the first demonstration of knowledge extraction from a self-supervised language model with a direct practical application in the chemical reaction domain.
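To make the idea of an attention-guided mapper concrete, the sketch below turns a matrix of attention scores between product atoms and reactant atoms into an atom mapping by greedy assignment. It is a simplified illustration under stated assumptions, not the published mapper: the function name `greedy_attention_mapping`, the layout of `attn` (rows are product atoms, columns are reactant atoms), and the rule that only atoms of the same element may be paired are all hypothetical choices made for this example. Reactant atoms that receive no partner are simply left unmapped, which is how surplus atoms in imbalanced reactions (leaving groups, reagents) would be handled here.

```python
from typing import Dict, List, Sequence, Set


def greedy_attention_mapping(
    attn: Sequence[Sequence[float]],
    product_elems: Sequence[str],
    reactant_elems: Sequence[str],
) -> Dict[int, int]:
    """Map each product atom index to one reactant atom index.

    Hypothetical input: attn[p][r] is an attention score from product atom p
    to reactant atom r (e.g. averaged over some heads of a trained model).
    """
    # Rank every (product, reactant) pair from highest to lowest score.
    pairs: List[tuple] = sorted(
        (
            (attn[p][r], p, r)
            for p in range(len(product_elems))
            for r in range(len(reactant_elems))
        ),
        reverse=True,
    )
    mapping: Dict[int, int] = {}
    used: Set[int] = set()
    for _score, p, r in pairs:
        # Skip product atoms already mapped, reactant atoms already used,
        # and pairs whose element symbols differ (assumed constraint).
        if p in mapping or r in used or product_elems[p] != reactant_elems[r]:
            continue
        mapping[p] = r
        used.add(r)
    return mapping


if __name__ == "__main__":
    # Two product carbons both attend most strongly to reactant atom 0;
    # the greedy pass gives atom 0 to the stronger pair and the other
    # product atom falls back to reactant atom 1.
    print(greedy_attention_mapping([[0.9, 0.1], [0.8, 0.2]], ["C", "C"], ["C", "C"]))
```

A global greedy pass is used here because it guarantees each reactant atom is used at most once while staying simple; a real mapper could additionally exploit molecular-graph information, which this sketch deliberately omits.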