Distributed Data Locality-Aware Job Allocation

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Scheduling tasks close to their associated data is crucial in distributed systems to minimize network traffic and latency. Some Big Data frameworks like Apache Spark employ locality functions and job allocation algorithms to minimize network traffic and execution times. However, these frameworks rely on centralized mechanisms, where the master node determines data locality by allocating tasks to available workers with minimal data transfer time, ignoring variances in worker configurations and availability. To address these limitations, we propose a decentralized approach to locality-driven scheduling that grants workers autonomy in the job allocation process while factoring in workers' configurations, such as network and CPU speed differences. Our approach is developed and evaluated on Crossflow, a distributed stream processing platform with data-aware independent worker nodes. Preliminary evaluation experiments indicate that our approach can yield up to 3.57x faster execution times when compared to the baseline centralized approach where the master controls data locality.
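The idea in the abstract can be illustrated with a minimal sketch. It is not the Crossflow implementation: the names `Worker`, `Job`, `estimate_seconds`, and `allocate`, as well as the cost model (transfer time plus compute time), are assumptions made for illustration only. Each worker estimates its own completion time from its local cache, network speed, and CPU speed, and the job goes to the lowest estimate.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Job:
    data_id: str          # identifier of the data the job needs
    data_mb: float        # size of that data in megabytes
    ref_compute_s: float  # compute time on a reference machine (cpu_speed == 1.0)


@dataclass
class Worker:
    name: str
    net_mb_per_s: float   # download bandwidth of this worker (assumed known)
    cpu_speed: float      # CPU speed relative to the reference machine
    cached: set = field(default_factory=set)  # ids of data already held locally

    def estimate_seconds(self, job: Job) -> float:
        """Worker-side estimate: transfer time (zero if data is local) plus compute time."""
        transfer = 0.0 if job.data_id in self.cached else job.data_mb / self.net_mb_per_s
        compute = job.ref_compute_s / self.cpu_speed
        return transfer + compute


def allocate(job: Job, workers: list) -> Worker:
    """Give the job to the worker with the lowest self-reported estimate."""
    winner = min(workers, key=lambda w: w.estimate_seconds(job))
    winner.cached.add(job.data_id)  # the winner now holds the data locally
    return winner


# Hypothetical cluster: one slow worker that already has the data,
# one fast worker that would have to download it first.
slow_local = Worker("slow-local", net_mb_per_s=10.0, cpu_speed=0.5, cached={"repo-a"})
fast_remote = Worker("fast-remote", net_mb_per_s=100.0, cpu_speed=2.0)
job = Job(data_id="repo-a", data_mb=200.0, ref_compute_s=20.0)

chosen = allocate(job, [slow_local, fast_remote])
print(chosen.name)
```

In this made-up scenario a purely locality-based choice would pick the worker that already holds the data, whereas the combined estimate favours the faster machine despite the download. The point is only that locality and worker configuration can pull in different directions.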

Original language: English
Title of host publication: Proceedings of 2023 SC Workshops of the International Conference on High Performance Computing, Network, Storage, and Analysis, SC Workshops 2023
Publisher: Association for Computing Machinery, Inc
Pages: 2089-2096
Number of pages: 8
ISBN (Electronic): 9798400707858
Publication status: Published - 12 Nov 2023
Event: 2023 International Conference on High Performance Computing, Network, Storage, and Analysis, SC Workshops 2023 - Denver, United States
Duration: 12 Nov 2023 to 17 Nov 2023

Publication series

Name: ACM International Conference Proceeding Series

Conference

Conference: 2023 International Conference on High Performance Computing, Network, Storage, and Analysis, SC Workshops 2023
Country/Territory: United States
City: Denver
Period: 12/11/23 to 17/11/23

Bibliographical note

Publisher Copyright:
© 2023 ACM. This is an author-produced version of the published paper. Uploaded in accordance with the University’s Research Publications and Open Access policy.

Keywords

  • Big Data processing
  • Data-aware job scheduling
  • Distributed processing
  • Locality scheduler
