Quantifying the Effects of Contention on Parallel File Systems

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Standard

Quantifying the Effects of Contention on Parallel File Systems. / Wright, Steven A.; Jarvis, Stephen A.

Proceedings - 2015 IEEE 29th International Parallel and Distributed Processing Symposium Workshops, IPDPSW 2015. Institute of Electrical and Electronics Engineers Inc., 2015. p. 932-940 7284412.

Harvard

Wright, SA & Jarvis, SA 2015, Quantifying the Effects of Contention on Parallel File Systems. in Proceedings - 2015 IEEE 29th International Parallel and Distributed Processing Symposium Workshops, IPDPSW 2015, 7284412, Institute of Electrical and Electronics Engineers Inc., pp. 932-940, 29th IEEE International Parallel and Distributed Processing Symposium Workshops, IPDPSW 2015, Hyderabad, India, 25/05/15. https://doi.org/10.1109/IPDPSW.2015.8

APA

Wright, S. A., & Jarvis, S. A. (2015). Quantifying the Effects of Contention on Parallel File Systems. In Proceedings - 2015 IEEE 29th International Parallel and Distributed Processing Symposium Workshops, IPDPSW 2015 (pp. 932-940). [7284412] Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1109/IPDPSW.2015.8

Vancouver

Wright SA, Jarvis SA. Quantifying the Effects of Contention on Parallel File Systems. In Proceedings - 2015 IEEE 29th International Parallel and Distributed Processing Symposium Workshops, IPDPSW 2015. Institute of Electrical and Electronics Engineers Inc. 2015. p. 932-940. 7284412 https://doi.org/10.1109/IPDPSW.2015.8

Author

Wright, Steven A. ; Jarvis, Stephen A. / Quantifying the Effects of Contention on Parallel File Systems. Proceedings - 2015 IEEE 29th International Parallel and Distributed Processing Symposium Workshops, IPDPSW 2015. Institute of Electrical and Electronics Engineers Inc., 2015. pp. 932-940

BibTeX

@inproceedings{9562e180835341a4bf1813edca86d514,
title = "Quantifying the Effects of Contention on Parallel File Systems",
abstract = "As we move towards the Exascale era of supercomputing, node-level failures are becoming more commonplace; frequent checkpointing is currently used to recover from such failures in long-running science applications. While compute performance has steadily improved year-on-year, parallel I/O performance has stalled, meaning checkpointing is fast becoming a bottleneck to performance. Using current file systems in the most efficient way possible will alleviate some of these issues and will help prepare developers and system designers for Exascale; unfortunately, many domain-scientists simply submit their jobs with the default file system configuration. In this paper, we analyse previous work on finding optimality on Lustre file systems, demonstrating that by exposing parallelism in the parallel file system, performance can be improved by up to 49×. However, we demonstrate that on systems where many applications are competing for a finite number of object storage targets (OSTs), competing tasks may reduce optimal performance considerably. We show that reducing each job's request for OSTs by 40% decreases performance by only 13%, while increasing the availability and quality of service of the file system. Further, we present a series of metrics designed to analyse and explain the effects of contention on parallel file systems. Finally, we re-evaluate our previous work with the Parallel Log-structured File System (PLFS), comparing it to Lustre at various scales. We show that PLFS may perform better than Lustre in particular configurations, but that at large scale PLFS becomes a bottleneck to performance. We extend the metrics proposed in this paper to explain these performance deficiencies that exist in PLFS, demonstrating that the software creates high levels of self-contention at scale.",
keywords = "Data storage systems, File servers, File systems, High performance computing, Optimization, Performance analysis, Supercomputers",
author = "Wright, {Steven A.} and Jarvis, {Stephen A.}",
note = "{\textcopyright} IEEE, 2015. This is an author-produced version of the published paper. Uploaded in accordance with the publisher{\textquoteright}s self-archiving policy. Further copying may not be permitted; contact the publisher for details; 29th IEEE International Parallel and Distributed Processing Symposium Workshops, IPDPSW 2015 ; Conference date: 25-05-2015 Through 29-05-2015",
year = "2015",
month = sep,
day = "29",
doi = "10.1109/IPDPSW.2015.8",
language = "English",
pages = "932--940",
booktitle = "Proceedings - 2015 IEEE 29th International Parallel and Distributed Processing Symposium Workshops, IPDPSW 2015",
publisher = "Institute of Electrical and Electronics Engineers Inc.",
address = "United States",
}

RIS (suitable for import to EndNote)

TY - GEN

T1 - Quantifying the Effects of Contention on Parallel File Systems

AU - Wright, Steven A.

AU - Jarvis, Stephen A.

N1 - © IEEE, 2015. This is an author-produced version of the published paper. Uploaded in accordance with the publisher’s self-archiving policy. Further copying may not be permitted; contact the publisher for details

PY - 2015/9/29

Y1 - 2015/9/29

N2 - As we move towards the Exascale era of supercomputing, node-level failures are becoming more commonplace; frequent checkpointing is currently used to recover from such failures in long-running science applications. While compute performance has steadily improved year-on-year, parallel I/O performance has stalled, meaning checkpointing is fast becoming a bottleneck to performance. Using current file systems in the most efficient way possible will alleviate some of these issues and will help prepare developers and system designers for Exascale; unfortunately, many domain-scientists simply submit their jobs with the default file system configuration. In this paper, we analyse previous work on finding optimality on Lustre file systems, demonstrating that by exposing parallelism in the parallel file system, performance can be improved by up to 49×. However, we demonstrate that on systems where many applications are competing for a finite number of object storage targets (OSTs), competing tasks may reduce optimal performance considerably. We show that reducing each job's request for OSTs by 40% decreases performance by only 13%, while increasing the availability and quality of service of the file system. Further, we present a series of metrics designed to analyse and explain the effects of contention on parallel file systems. Finally, we re-evaluate our previous work with the Parallel Log-structured File System (PLFS), comparing it to Lustre at various scales. We show that PLFS may perform better than Lustre in particular configurations, but that at large scale PLFS becomes a bottleneck to performance. We extend the metrics proposed in this paper to explain these performance deficiencies that exist in PLFS, demonstrating that the software creates high levels of self-contention at scale.

AB - As we move towards the Exascale era of supercomputing, node-level failures are becoming more commonplace; frequent checkpointing is currently used to recover from such failures in long-running science applications. While compute performance has steadily improved year-on-year, parallel I/O performance has stalled, meaning checkpointing is fast becoming a bottleneck to performance. Using current file systems in the most efficient way possible will alleviate some of these issues and will help prepare developers and system designers for Exascale; unfortunately, many domain-scientists simply submit their jobs with the default file system configuration. In this paper, we analyse previous work on finding optimality on Lustre file systems, demonstrating that by exposing parallelism in the parallel file system, performance can be improved by up to 49×. However, we demonstrate that on systems where many applications are competing for a finite number of object storage targets (OSTs), competing tasks may reduce optimal performance considerably. We show that reducing each job's request for OSTs by 40% decreases performance by only 13%, while increasing the availability and quality of service of the file system. Further, we present a series of metrics designed to analyse and explain the effects of contention on parallel file systems. Finally, we re-evaluate our previous work with the Parallel Log-structured File System (PLFS), comparing it to Lustre at various scales. We show that PLFS may perform better than Lustre in particular configurations, but that at large scale PLFS becomes a bottleneck to performance. We extend the metrics proposed in this paper to explain these performance deficiencies that exist in PLFS, demonstrating that the software creates high levels of self-contention at scale.

KW - Data storage systems

KW - File servers

KW - File systems

KW - High performance computing

KW - Optimization

KW - Performance analysis

KW - Supercomputers

UR - http://www.scopus.com/inward/record.url?scp=84962234189&partnerID=8YFLogxK

U2 - 10.1109/IPDPSW.2015.8

DO - 10.1109/IPDPSW.2015.8

M3 - Conference contribution

AN - SCOPUS:84962234189

SP - 932

EP - 940

BT - Proceedings - 2015 IEEE 29th International Parallel and Distributed Processing Symposium Workshops, IPDPSW 2015

PB - Institute of Electrical and Electronics Engineers Inc.

T2 - 29th IEEE International Parallel and Distributed Processing Symposium Workshops, IPDPSW 2015

Y2 - 25 May 2015 through 29 May 2015

ER -