Run-time performance optimization of a BigData query language

Yanbin Liu; Parijat Dube; Scott C. Gray

doi:10.1145/2568088.2576800

ICPE 2014

Conference paper

22 Mar 2014

Abstract

JAQL is a query language for large-scale data that connects BigData analytics with the MapReduce framework. JAQL is also an IBM product, and its performance is critical for IBM InfoSphere BigInsights, a BigData analytics platform. In this paper, we report our work on improving JAQL performance from multiple perspectives. We explore the parallelism of JAQL, profile JAQL for performance analysis, identify I/O as the dominant performance bottleneck, and improve JAQL performance with an emphasis on reducing I/O data size and increasing (de)serialization efficiency. Using the TPC-H benchmark on a simple Hadoop cluster, we report up to 2x performance improvements in JAQL with our optimization fixes. Copyright is held by the owner/author(s). Publication rights licensed to ACM.
