TY - GEN
T1 - On the locality of Java 8 streams in real-time big data applications
AU - Chan, Yu
AU - Gray, Ian
AU - Wellings, Andy
AU - Audsley, Neil
PY - 2014/10/13
Y1 - 2014/10/13
N2 - Typical Big Data frameworks do not consider the architecture of the servers that make up the cluster. However, these computers are increasingly heterogeneous and are based on a ccNUMA architecture. In such architectures, main memory access times differ depending on the core on which access is requested. Hence, as well as locality of data access throughout a cluster of servers, locality of memory access within individual servers can have an impact on performance. Java is a commonly used language for Big Data applications (through the popularity of Hadoop) and the newly released Java 8 introduces streams to simplify data-parallel programming. However, this paper argues that there are no built-in parallel stream sources that can efficiently operate on very large datasets and take data locality into account. This paper details recent work from the JUNIPER project, an EU Framework 7 Project, which is investigating how the Java 8 platform (augmented by the Real-Time Specification for Java) can be used for real-time Big Data applications. JUNIPER introduces architecture-aware stream sources which are suitable for Big Data systems and which preserve locality of data. Our results show that when reading data from disk, thread affinity can seriously degrade the performance of standard Java streams, but JUNIPER's architecture-aware streams maintain their performance.
AB - Typical Big Data frameworks do not consider the architecture of the servers that make up the cluster. However, these computers are increasingly heterogeneous and are based on a ccNUMA architecture. In such architectures, main memory access times differ depending on the core on which access is requested. Hence, as well as locality of data access throughout a cluster of servers, locality of memory access within individual servers can have an impact on performance. Java is a commonly used language for Big Data applications (through the popularity of Hadoop) and the newly released Java 8 introduces streams to simplify data-parallel programming. However, this paper argues that there are no built-in parallel stream sources that can efficiently operate on very large datasets and take data locality into account. This paper details recent work from the JUNIPER project, an EU Framework 7 Project, which is investigating how the Java 8 platform (augmented by the Real-Time Specification for Java) can be used for real-time Big Data applications. JUNIPER introduces architecture-aware stream sources which are suitable for Big Data systems and which preserve locality of data. Our results show that when reading data from disk, thread affinity can seriously degrade the performance of standard Java streams, but JUNIPER's architecture-aware streams maintain their performance.
UR - http://www.scopus.com/inward/record.url?scp=84937801674&partnerID=8YFLogxK
U2 - 10.1145/2661020.2661028
DO - 10.1145/2661020.2661028
M3 - Conference contribution
AN - SCOPUS:84937801674
T3 - ACM International Conference Proceeding Series
SP - 20
EP - 28
BT - JTRES '14
PB - ACM
T2 - 12th International Workshop on Java Technologies for Real-Time and Embedded Systems, JTRES 2014
Y2 - 13 October 2014 through 14 October 2014
ER -