Addressing Limitations of Encoder-Decoder Based Approach to Text-to-SQL
Abstract
Most encoder-decoder approaches to the Text-to-SQL task suffer a dramatic decline in performance on new databases. On the popular Spider dataset, models that achieve 70\% accuracy on the development or test sets drop below 20\% accuracy on unseen databases. This problem cannot be resolved by adding new training data, since creating training examples for a new database is expensive and the number of available examples is limited. In this paper we address the problem and propose a hybrid system that uses an automated training-data augmentation technique. Our system consists of a rule-based component and a deep learning component that interact to identify the crucial information in a given query and produce the correct SQL. It consistently achieves a double-digit percentage improvement on databases that are not part of the Spider corpus.