Abstract
Scheduling tasks close to their associated data is crucial in distributed systems to minimize network traffic and latency. Some Big Data frameworks like Apache Spark employ locality functions and job allocation algorithms to minimize network traffic and execution times. However, these frameworks rely on centralized mechanisms, where the master node determines data locality by allocating tasks to available workers with minimal data transfer time, ignoring variances in worker configurations and availability. To address these limitations, we propose a decentralized approach to locality-driven scheduling that grants workers autonomy in the job allocation process while factoring in workers' configurations, such as network and CPU speed differences. Our approach is developed and evaluated on Crossflow, a distributed stream processing platform with data-aware independent worker nodes. Preliminary evaluation experiments indicate that our approach can yield up to 3.57x faster execution times when compared to the baseline centralized approach where the master controls data locality.
Original language | English |
---|---|
Title of host publication | Proceedings of 2023 SC Workshops of the International Conference on High Performance Computing, Network, Storage, and Analysis, SC Workshops 2023 |
Publisher | Association for Computing Machinery, Inc |
Pages | 2089-2096 |
Number of pages | 8 |
ISBN (Electronic) | 9798400707858 |
DOIs | |
Publication status | Published - 12 Nov 2023 |
Event | 2023 International Conference on High Performance Computing, Network, Storage, and Analysis, SC Workshops 2023 - Denver, United States Duration: 12 Nov 2023 → 17 Nov 2023 |
Publication series
Name | ACM International Conference Proceeding Series |
---|
Conference
Conference | 2023 International Conference on High Performance Computing, Network, Storage, and Analysis, SC Workshops 2023 |
---|---|
Country/Territory | United States |
City | Denver |
Period | 12/11/23 → 17/11/23 |
Bibliographical note
Publisher Copyright:© 2023 ACM. This is an author-produced version of the published paper. Uploaded in accordance with the University’s Research Publications and Open Access policy.
Keywords
- Big Data processing
- Data-aware job scheduling
- Distributed processing
- Locality scheduler