Jaehyun Kim, Heesu Kim, et al.
Neurocomputing
Today's common practice in developing conversational agents is pipelining off-the-shelf modularized services as ready-made building blocks. However, the discrete and sequential nature of the modules yields long response latency. We introduce Sci-Fii, a speculative inference framework accelerating conversational agent systems built with off-the-shelf modules, while keeping the modules unchanged.