Supporting incremental join queries on ranked inputs
Abstract
This paper investigates the problem of incremental joins of multiple ranked data sets when the join condition is a list of arbitrary user-defined predicates on the input tuples. This problem arises in many important applications dealing with ordered inputs and multiple ranked data sets, and requiring the top k solutions. We use multimedia applications as the motivating examples but the problem is equally applicable to traditional database applications involving optimal resource allocation, scheduling, decision making, ranking, etc. We propose an algorithm J∗ that enables querying of ordered data sets by imposing arbitrary userdefined join predicates. The basic version of the algorithm does not use any random access but a J∗ PA variation can exploit available indexes for efficient random access based on the join predicates. A special case includes the join scenario considered by Fagin [1] for joins based on identical keys, and in that case, our algorithms perform as efficiently as Fagin's. Our main contribution, however, is the generalization to join scenarios that were previously unsupported, including cases where random access in the algorithm is not possible due to lack of unique keys. In addition, J∗ can support multiple join levels, or nested join hierarchies, which are the norm for modeling multimedia data. We also give ε-approximation versions of both of the above algorithms. Finally, we give strong optimality results for some of the proposed algorithms, and we study their performance empirically.