Exploiting Multicore Architectures in Big Data Applications: The JUNIPER Approach

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

The volume of data that is stored and processed is increasing exponentially, and a wide variety of systems, from commodity clusters to supercomputers, are being assembled to handle it. This demands new and efficient approaches to parallel programming in distributed environments. This paper describes several new features in the upcoming Java 8 release and how they enable bulk operations to be performed in parallel. We evaluated the performance of Java 8 programs that use these features against Hadoop, a popular Java-based Big Data framework, on a single multicore machine. Results show that although the Java 8 programs run significantly faster than the equivalent Hadoop programs, they fail for large inputs because the entire dataset has to fit in main memory. This weakness makes built-in collections unsuitable for Big Data processing. From this observation, we propose the idea of a stored collection: a Java collection that reads data on demand from the file system. Experimental results show that equivalent programs that use stored collections are much more memory-efficient and faster than those that use the default Java collections. They are also faster than the equivalent Hadoop programs. While stored collections may improve memory and time efficiency, they still do not address fundamental Big Data issues such as time constraints and data locality. We report work in progress in the JUNIPER project, which provides a Java-based platform for Big Data processing.
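
The contrast drawn in the abstract, between a built-in collection that must hold the whole dataset in memory and a source that is read on demand, can be loosely illustrated with standard Java 8 APIs. The sketch below is not the stored collection proposed in the paper and not part of the JUNIPER platform; the class name WordCountSketch and its methods are hypothetical, and Files.lines is used only as a stand-in for a lazily read data source. Both variants run the same parallel bulk operation (a word count).

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.regex.Pattern;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical illustration only; not the paper's stored-collection API.
public class WordCountSketch {
    private static final Pattern WHITESPACE = Pattern.compile("\\s+");

    // Shared bulk operation: split each line into words and count
    // occurrences of each word, using a parallel stream.
    static Map<String, Long> count(Stream<String> lines) {
        return lines.parallel()
                .flatMap(WHITESPACE::splitAsStream)
                .filter(word -> !word.isEmpty())
                .collect(Collectors.groupingBy(Function.identity(),
                                               Collectors.counting()));
    }

    // Built-in collection: every line is loaded into a List before
    // processing starts, so the whole file must fit in main memory.
    static Map<String, Long> countInMemory(Path file) throws IOException {
        List<String> lines = Files.readAllLines(file, StandardCharsets.UTF_8);
        return count(lines.stream());
    }

    // Lazily read source: lines are pulled from the file as the stream
    // is consumed, so the input is not materialised as a collection first.
    static Map<String, Long> countLazily(Path file) throws IOException {
        try (Stream<String> lines = Files.lines(file, StandardCharsets.UTF_8)) {
            return count(lines);
        }
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("words", ".txt");
        Files.write(file, Arrays.asList("big data big", "data big"),
                    StandardCharsets.UTF_8);
        System.out.println(countInMemory(file));
        System.out.println(countLazily(file));
        Files.delete(file);
    }
}
```

Both methods return the same counts; they differ only in whether the input is fully loaded before the pipeline runs. How much memory or time the lazy variant actually saves depends on the JVM, the input, and how well the source splits for parallel execution, so this sketch makes no claim about the measurements reported in the paper.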
Original language: English
Title of host publication: Programmability Issues for Heterogeneous Multicores (MULTIPROG)
Publication status: Published - 2014
