Newswise — A team of researchers led by Pavan Balaji of Argonne National Laboratory and Wu Feng of Virginia Tech won an international competition for the most effective approach to using large-scale storage for high-performance computing. The award was presented November 15 at SC|07, the world's premier conference on high-performance computing and networking.
Using a novel software framework for distributed I/O called ParaMEDIC (www.newswise.com/articles/view/535270/), the team of researchers from Argonne National Laboratory, Virginia Tech, and North Carolina State University searched the sequences of all completed microbial genomes against each other. The aim was twofold: to discover missing genes and to speed future searches by generating a complete genome similarity tree. The ParaMEDIC software framework used a semantics-based approach to create a metadata representation that was four orders of magnitude smaller than the actual output data.
"Using ParaMEDIC, the entire genome similarity tree, corresponding to a petabyte of data, can fit into a 4-gigabyte iPod nano," said Balaji.
This entire task required many millions of CPU-hours of computational capability and generated a petabyte of uncompressed output. Because few supercomputer centers can provide both the computational and the storage resources this task required at the same time, the research team relied on a worldwide supercomputer that aggregated compute resources from various locations within the U.S. with the TSUBAME storage resources at the Tokyo Institute of Technology in Japan, with technical support from Sun Microsystems. The largest portion of the compute cycles was provided by Virginia Tech's System X supercomputer.
"In total, we relied on six U.S. supercomputing institutions and accessed over 12,000 processors across eight supercomputers. The ParaMEDIC framework then improved compute utilization from 10 percent to nearly 100 percent for the compute resources and storage bandwidth utilization from 0.04 percent to 90 percent for the storage resources," said Feng.
The ParaMEDIC team is indebted to the following people, whose support made the impossible possible:
Virginia Tech (System X): J. Setubal, A. Warren, K. Shinpaugh, L. Scharf, G. Zelenka, T. Herdman
Argonne National Laboratory (Jazz, SiCortex, BlueGene/L and Breadboard): R. Stevens, E. Lusk, S. Coghlan
Tokyo Institute of Technology (TSUBAME): S. Matsuoka, T. Yamanashi, S. Ono, R. Fukushima
U. Chicago (TeraGrid): I. Foster, M. Papka
Center for Computation & Technology at Louisiana State University (Oliver): D. Katz, S. Jha, H. Liu
Renaissance Computing Institute (Open Science Grid): D. Reed, J. McGee, M. Rynge
Sun Microsystems: T. Kujiraoka, S. Ihara, S. Vail, S. Cochrane, C. Kingwood, S. See, A. Katz
Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the National Science Foundation or Department of Energy.
About Argonne National Laboratory
The nation's first national laboratory, Argonne National Laboratory (http://www.anl.gov) conducts basic and applied scientific research across a wide spectrum of disciplines, ranging from high-energy physics to climatology and biotechnology. Argonne works with numerous companies, federal agencies, and other organizations to help advance America's scientific leadership and prepare the nation for the future. Argonne is managed by U. Chicago Argonne, LLC for the U.S. Department of Energy's Office of Science.
About Virginia Tech
Founded in 1872 as a land-grant college, Virginia Tech is a leading comprehensive research university. Today, Virginia Tech's eight colleges invent the future through teaching, research, and engagement. At its 2,600-acre main campus in Blacksburg and other campus centers in Northern Virginia, Southwest Virginia, Hampton Roads, Richmond, and Roanoke, Virginia Tech enrolls more than 27,000 undergraduate and graduate students from all 50 states and more than 100 countries in 180 academic degree programs.