A Study of Client-based Caching for Parallel I/O

Bradley W. Settlemyer.
“A Study of Client-based Caching for Parallel I/O.”
Clemson University Doctoral Dissertation, Aug 2009.

Abstract — The trend in parallel computing toward large-scale cluster computers running thousands
of cooperating processes per application has led to an I/O bottleneck that has only gotten
more severe as the number of processing cores per CPU has increased. Current parallel
file systems are able to provide high bandwidth file access for large contiguous file region
accesses; however, applications repeatedly accessing small file regions on unaligned file
region boundaries continue to experience poor I/O throughput due to the high overhead
associated with accessing parallel file system data.
In this dissertation we demonstrate how client-side file data caching can improve
parallel file system throughput for applications performing frequent small and unaligned
file I/O. We explore the impacts of cache page size and cache capacity using the popular
FLASH I/O benchmark and explore a novel cache sharing approach that leverages
the trend toward multi-core processors. We also explore a technique we call progressive
page caching that represents cache data using dynamic data structures rather than fixed-size
pages of file data. Finally, we explore a cache aggregation scheme that leverages the high-level
file I/O interfaces provided by the PVFS file system to provide further performance
enhancements.
In summary, our results indicate that a correctly configured middleware-based file
data cache can dramatically improve the performance of I/O workloads dominated by small
unaligned file accesses. Further, we demonstrate that a well-designed cache can offer stable
performance even when the selected cache page granularity is not well matched to
the provided workload. Finally, we have shown that high-level file system interfaces can
significantly accelerate application performance, and interfaces beyond those currently envisioned
by the MPI-IO standard could provide further performance benefits.
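
To make the fixed-size page idea in the abstract concrete, the following is a minimal sketch of how a client-side cache might map small, unaligned reads onto aligned cache pages. It is not the dissertation's implementation: the names `pages_touched` and `PageCache`, the in-memory `backing` buffer standing in for a parallel file system, and the fetch counter are all assumptions made for illustration.

```python
def pages_touched(offset, length, page_size):
    """Return the indices of the fixed-size pages that a byte range overlaps."""
    if length <= 0:
        return []
    first = offset // page_size
    last = (offset + length - 1) // page_size
    return list(range(first, last + 1))


class PageCache:
    """Toy read cache; `backing` is a bytes object standing in for a remote file."""

    def __init__(self, backing, page_size):
        self.backing = backing
        self.page_size = page_size
        self.pages = {}    # page index -> cached page bytes
        self.fetches = 0   # number of simulated file system requests

    def _load(self, index):
        # Fetch a whole aligned page only on a cache miss.
        if index not in self.pages:
            start = index * self.page_size
            self.pages[index] = self.backing[start:start + self.page_size]
            self.fetches += 1
        return self.pages[index]

    def read(self, offset, length):
        # Assemble the requested range from the cached pages it overlaps.
        end = offset + length
        out = bytearray()
        for index in pages_touched(offset, length, self.page_size):
            page = self._load(index)
            page_start = index * self.page_size
            lo = max(offset, page_start) - page_start
            hi = min(end, page_start + self.page_size) - page_start
            out += page[lo:hi]
        return bytes(out)
```

In this sketch, a 10-byte read at offset 60 with 64-byte pages straddles a page boundary and triggers two page fetches; later small reads that fall inside those same pages are served from memory. The page size controls this trade-off: larger pages absorb more subsequent small accesses but transfer more unrequested data on each miss.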
