Additional Participants

Graduate Students

Zhao Cai
Jianhui Yue
Ali Shareef
Lin Lin

Organizational Partners

Department of Energy Argonne National Laboratory, Argonne, IL
Huazhong University of Science and Technology, China
Fermi National Accelerator Laboratory, Batavia, IL
Sun Microsystems

Other Collaborators

Bob Ross, Senior Scientist, Northwestern-Argonne Institute for Science and Engineering
Rajeev Thakur, Deputy Director of the Mathematics and Computer Science Division at Argonne National Laboratory
Dr. Dan Feng, Wuhan National Laboratory for Optoelectronics and Huazhong University of Science and Technology, China

Project Period

August 15, 2006-July 31, 2010

Level of Access

Open-Access Report

Grant Number


Submission Date



The increasing demand for exabyte-scale storage capacity by high-end computing (HEC) applications requires a higher level of scalability and dependability than current file and storage systems provide. The proposal addresses file systems research on metadata management for scalable cluster-based parallel and distributed file storage systems in the HEC environment. It aims to develop a scalable and adaptive metadata management (SAM2) toolkit that extends the features of, and fully leverages the peak performance promised by, state-of-the-art cluster-based parallel and distributed file storage systems used by the high-performance computing community. There is a large body of research on scaling data movement and management; however, the need to scale up the attributes of cluster-based file systems and I/O, that is, metadata, has been underestimated. Understanding the characteristics of metadata traffic, and applying appropriate load-balancing, caching, prefetching, and grouping mechanisms to metadata management, should lead to high scalability. It is anticipated that plugging the scalable and adaptive metadata management components into state-of-the-art cluster-based parallel and distributed file storage systems could increase the performance of applications and file systems, and help translate the high peak performance of such systems into real application performance improvements.

The project involves the following components:

1. Develop multi-variable forecasting models to analyze and predict file metadata access patterns.
2. Develop scalable and adaptive file name mapping schemes using the duplicative Bloom filter array technique to enforce load balance and increase scalability.
3. Develop decentralized, locality-aware metadata grouping schemes to facilitate bulk metadata operations such as prefetching.
4. Develop an adaptive cache coherence protocol using a distributed shared object model for client-side and server-side metadata caching.
5. Prototype the SAM2 components in the state-of-the-art parallel virtual file system PVFS2 and in a distributed storage data caching system, set up an experimental framework for a DOE CMS Tier 2 site at the University of Nebraska-Lincoln, and conduct benchmarking, evaluation, and validation studies.
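
As an illustration of the file name mapping idea in component 2, the following is a minimal sketch of a generic Bloom filter array, assuming one filter per metadata server and a replica of the whole array at every node that resolves file names. It is not the project's duplicative Bloom filter array scheme, whose details are not given here. The class names, filter size, hash construction, and the example path are all hypothetical choices made for the sketch.

```python
import hashlib


class BloomFilter:
    """Fixed-size Bloom filter; probe positions come from double hashing a SHA-256 digest."""

    def __init__(self, num_bits=8192, num_hashes=4):
        # Sizes are illustrative assumptions, not values from the project.
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, key):
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # force an odd, non-zero stride
        return [(h1 + i * h2) % self.num_bits for i in range(self.num_hashes)]

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, key):
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(key))


class BloomFilterArray:
    """Hypothetical array with one filter per metadata server."""

    def __init__(self, num_servers):
        self.filters = [BloomFilter() for _ in range(num_servers)]

    def register(self, path, server_id):
        # Record that server_id holds the metadata for path.
        self.filters[server_id].add(path)

    def candidates(self, path):
        # Servers whose filter reports a hit. The true home server is always
        # included; false positives may add extra servers that must be ruled
        # out by asking them directly.
        return [sid for sid, f in enumerate(self.filters) if path in f]


array = BloomFilterArray(num_servers=4)
array.register("/cms/run42/events.root", 2)
print(array.candidates("/cms/run42/events.root"))
```

A lookup that returns exactly one candidate can be sent straight to that server. Zero or several candidates would need a fallback such as querying the candidates or broadcasting, which this sketch leaves out.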