Title: Storage Reliability and I/O Observability in High-performance Computing Systems
Abstract: High-performance computing (HPC) processes data and performs complex calculations at high speeds using clusters of powerful processors. The parallel file system (PFS) is the backbone of HPC, as it maintains huge amounts of data and handles the intense I/O generated by HPC applications. Significant effort has been made to study both PFS reliability and the I/O behavior of HPC workloads. The first half of this presentation will discuss reliability issues in PFS checkers, which are developed to bring a corrupted PFS back to a healthy state. The second half will discuss how to improve the observability of I/O behavior in HPC workloads by using data provenance.
Bio: Runzhou Han is a fourth-year Ph.D. candidate in the ECpE department. His research is related to large-scale storage systems, data provenance, and serverless computing. Before coming to Iowa State University, he received an MS degree from Boston University and a BS degree from Wuhan University.
