Conference paper

Addressing Limitations of Encoder-Decoder Based Approach to Text-to-SQL

Download paper


Most attempts on Text-to-SQL task using encoder-decoder approach show a big problem of dramatic decline in performance for new databases. Models trained on Spider dataset, despite achieving 75% accuracy on Spider development or test sets, show a huge decline below 20% accuracy for databases not in Spider. We present a system that combines automated training-data augmentation and ensemble technique. We achieve double-digit percentage improvement for databases that are not part of the Spider corpus.