Publication
MEMSYS 2017
Conference paper
Identifying the potential of near data processing for Apache Spark
Abstract
While cluster computing frameworks are continuously evolving to provide real-time data analysis capabilities, Apache Spark has managed to be at the forefront of big data analytics by being a unified framework for both batch and stream data processing. There is also a renewed interest in Near Data Processing (NDP) due to technological advancements in the last decade. However, it is not known whether NDP architectures can improve the performance of big data processing frameworks such as Apache Spark. In this paper, we build the case for an NDP architecture comprising programmable-logic-based hybrid 2D integrated processing-in-memory and in-storage processing for Apache Spark, by extensive profiling of Apache Spark based workloads on an Ivy Bridge server.