Publication
BigData Congress 2014
Conference paper

Big R: Large-scale analytics on Hadoop using R

Abstract

As the volume of data available from a variety of sources continues to grow rapidly, scalable and performant analytics solutions have become essential tools for enhancing business productivity and revenue. Existing data analysis environments, such as R, are constrained by the size of main memory and cannot scale to many applications. This paper introduces Big R, a new platform that enables accessing, manipulating, analyzing, and visualizing data residing on a Hadoop cluster from the R user interface. Big R is inspired by R semantics and overloads a number of R primitives to support big data. Hence, users can quickly prototype big data analytics routines without having to learn a new programming paradigm. The current Big R implementation works on two main fronts: (1) data exploration, which enables R as a query language for Hadoop, and (2) partitioned execution, which allows any R function to be executed on smaller pieces of a large dataset across the nodes in the cluster.
