Object-Based Caching for MPI-IO
The goal of this research is to provide high-performance, scalable I/O for large-scale scientific computing applications. The I/O requirements of such applications are, in many cases, outpacing the capabilities of even the most powerful file systems available today and are becoming a significant bottleneck in application performance. The most frequently cited reasons for this poor performance include the I/O access patterns exhibited by scientific applications (e.g., noncontiguous I/O), poor file system support for parallel I/O optimizations, the high cost of enforcing strict file consistency semantics, and the latency of accessing I/O devices across a network. Our hypothesis, however, has been that a more fundamental problem, whose solution would help alleviate all of these challenges, is the legacy view of a file as a linear sequence of bytes. Application processes rarely access data in a way that matches this file data model, and a large component of the scalability problem is the cost of dynamically translating between the process data model and the file data model at runtime. In fact, the data model used by applications is more accurately described as an object model, in which each process maintains a set of perhaps unrelated objects. Our belief has been that aligning these two data models will significantly enhance the performance of parallel I/O for data-intensive scientific applications.
To address this issue, we have developed a more powerful object-based file model that is much more closely aligned with the application's I/O access patterns. We have also integrated the support system for this new file model into ROMIO, a widely used implementation of the MPI-IO standard. A critical component of this support system is the object-based caching system, which provides the interface between MPI applications and object-based files. Both object-based files and the caching system are built on MPI file views or, more precisely, on the intersections of these views. Such intersections, which we term objects, identify all of the file regions within which conflicting accesses are possible and, by extension, all of the regions in which no conflicts can occur. The underlying runtime system leverages this information to significantly increase the parallelism of file accesses and to decrease the cost of enforcing file consistency semantics and global cache coherence. This, in turn, provides new opportunities to support the I/O requirements of current and next-generation data-intensive applications.
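The idea of deriving objects from intersecting file views can be illustrated with a minimal Python sketch. It is not the ROMIO implementation: it assumes each process's file view has already been flattened into half-open byte ranges (real MPI file views are defined through MPI datatypes), and the names `partition_into_objects` and `views` are hypothetical. The sketch splits the file into maximal regions that are visible to the same set of ranks; regions visible to more than one rank are where conflicting accesses are possible, and regions visible to exactly one rank cannot conflict.

```python
from typing import Dict, FrozenSet, List, Tuple

Interval = Tuple[int, int]               # half-open byte range [start, end)
Region = Tuple[int, int, FrozenSet[int]]  # (start, end, ranks that can see it)


def partition_into_objects(views: Dict[int, List[Interval]]) -> List[Region]:
    """Split the file into maximal regions seen by the same set of ranks.

    `views` maps a rank to its (assumed pre-flattened) list of byte ranges.
    """
    # Every start or end of a view range is a potential region boundary.
    cuts = sorted({b for ivs in views.values() for s, e in ivs for b in (s, e)})
    regions: List[Region] = []
    for start, end in zip(cuts, cuts[1:]):
        # A rank sees this elementary segment if one of its ranges covers it.
        owners = frozenset(
            rank
            for rank, ivs in views.items()
            if any(s <= start and end <= e for s, e in ivs)
        )
        if not owners:
            continue  # a gap that no view covers
        # Merge with the previous region when it is adjacent and has the
        # same owners, so that each region is maximal.
        if regions and regions[-1][1] == start and regions[-1][2] == owners:
            regions[-1] = (regions[-1][0], end, owners)
        else:
            regions.append((start, end, owners))
    return regions


if __name__ == "__main__":
    # Hypothetical views: ranks 0 and 1 overlap, rank 2 is disjoint.
    views = {0: [(0, 100)], 1: [(50, 150)], 2: [(200, 250)]}
    for start, end, owners in partition_into_objects(views):
        kind = "conflicts possible" if len(owners) > 1 else "conflict-free"
        print(f"[{start}, {end}) ranks={sorted(owners)} -> {kind}")
```

Under these assumptions, a runtime would need to apply consistency and coherence machinery only to the regions shared by several ranks, while the single-rank regions could be cached and accessed without coordination.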
Rights and Access Note
This Item is protected by copyright and/or related rights. You are free to use this Item in any way that is permitted by the copyright and related rights legislation that applies to your use. In addition, no permission is required from the rights-holder(s) for educational uses. For other uses, you need to obtain permission from the rights-holder(s).
Dickens, Phillip, "Object-Based Caching for MPI-IO" (2014). University of Maine Office of Research Administration: Grant Reports. 11.