June 1, 2007-December 31, 2010
As the data sets manipulated by data-intensive scientific applications approach the petabyte scale and beyond, scalable I/O techniques become both more important and more difficult to achieve. Much of the research on this issue has been performed within the context of MPI-IO, the de facto standard parallel I/O interface for data-intensive applications. Its popularity stems from the fact that MPI-IO provides applications with a rich and flexible parallel I/O API coupled with highly efficient implementations of that API. The scalability problem is being further addressed by the development of powerful parallel I/O subsystems and state-of-the-art file systems that can access this infrastructure efficiently. However, even with such advances, I/O continues to be a significant bottleneck in application performance.
The goal of this research is to provide high-performance I/O for data-intensive applications. A key insight is that a major obstacle to this goal is the legacy view of a file as a linear sequence of bytes. Scientific applications rarely access data in a way that matches this file model; their accesses are more accurately described by an object model. In fact, the runtime translation between these two data models is a major contributor to poor I/O performance. To address this issue, this research will develop a more powerful object-based file model for MPI applications, and an object-based caching system to serve as an interface between MPI applications and object-based files. Objects will be carefully defined to encapsulate information about an application's I/O access patterns, and this information will be used to increase the parallelism of file accesses and decrease the cost of maintaining global cache coherence.
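The mismatch between the two data models can be illustrated with a small sketch. It is not taken from the project itself: the function name `block_extents`, its parameters, and the row-major 2D array layout are all assumptions chosen for illustration. The sketch computes the byte ranges that a process must touch in a linear file in order to read one rectangular block of a 2D array, which is the kind of translation the abstract identifies as costly.

```python
def block_extents(global_shape, start, sub_shape, elem_size):
    """Return the (offset, length) byte ranges covering a rectangular block.

    Assumes (for illustration only) a 2D array stored in row-major order
    in a flat file, with fixed-size elements of elem_size bytes.
    """
    rows, cols = global_shape
    r0, c0 = start
    nr, nc = sub_shape
    if r0 < 0 or c0 < 0 or r0 + nr > rows or c0 + nc > cols:
        raise ValueError("block does not fit inside the global array")

    extents = []
    for r in range(r0, r0 + nr):
        offset = (r * cols + c0) * elem_size  # first byte of this row segment
        length = nc * elem_size               # bytes in this row segment
        if extents and extents[-1][0] + extents[-1][1] == offset:
            # Segment is adjacent to the previous one: merge them.
            prev_offset, prev_length = extents[-1]
            extents[-1] = (prev_offset, prev_length + length)
        else:
            extents.append((offset, length))
    return extents


# One logical object (the right half of a 4 x 8 array of 8-byte values)
# becomes four separate byte ranges in the linear file.
column_block = block_extents((4, 8), (0, 4), (4, 4), 8)

# A block made of whole rows collapses to a single contiguous range.
row_block = block_extents((4, 8), (1, 0), (2, 8), 8)
```

Under these assumptions, a single logical object in the application maps to one byte range per row whenever the block does not span full rows, so the number of separate file requests grows with the size of the array. A file model that records the object itself, rather than only the bytes, retains the access-pattern information that this translation discards.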
Dickens, Phillip M., "Object-Based Caching for MPI-IO" (2011). University of Maine Office of Research and Sponsored Programs: Grant Reports. 301.