Newswise — A team of scientists at The Johns Hopkins University has received a grant for $9.5 million over five years to develop, build and maintain large-scale data sets that will allow for greater access and better usability of the information for the science community.

Alexander Szalay, the Alumni Centennial Professor of Astronomy and professor in the Department of Computer Science at the university, is the principal investigator on the Data Infrastructure Building Blocks, or DIBBs, project.

The funding was awarded earlier this month and is part of a larger collaborative agreement between the university and the National Science Foundation’s Advanced Cyberinfrastructure division. Partners on the project include the Sloan Digital Sky Survey (SDSS), the Virtual Astronomical Observatory, the Galaxy Zoo project, the San Diego Supercomputer Center and Towson University. Additional collaborators include scientists from Microsoft and Google.

The SDSS collaboration has been operating a dedicated 2.5-meter telescope at Apache Point Observatory in New Mexico, surveying a large part of the sky and making data available online to astronomers and non-scientists alike. In the 12 years the telescope has been in operation, it has captured deep, multi-color images covering more than a quarter of the sky and created three-dimensional maps that contain more than 1.8 million galaxies and 320,000 quasars. Johns Hopkins became a part of the SDSS collaboration in 1992.

The data obtained from the project has gone into SkyServer, a public database managed by SDSS and designed and built at Johns Hopkins. Currently about 40 percent of the world’s professional astronomy community is using the JHU team’s software. Based on citations, the project has become the most-used astronomy facility in the world and has transformed the way astronomers work. Its database has attracted an additional 4 million non-scientist astronomy fans since its launch.

“Open data is not necessarily accessible,” said Szalay. “We have to overcome several important challenges before a data set that is public is really usable and useful.”

Advances in technology allow researchers to collect and store large data sets. As these data sets continue to grow to unprecedented sizes, however, researchers face new challenges in making them accessible, usable and useful. There is a need for a flexible and reusable framework that allows for more efficient viewing and analysis of the data and a platform to better facilitate new discoveries within these data sets.

Scientists like Szalay have to make sense of the overabundance of information by asking intelligent questions with the hope of receiving more refined answers. The goal of the new project is to operate the SkyServer for the community and update and modify components of the system so that it can be easily reused by other areas of science like turbulence research, environmental science, neuroscience, genomics and radiation oncology.

“How to ask a question of such data sets is a science in itself,” Szalay said. “There’s lots of data, so it’s a little like drinking from a fire hose. It’s not just about computing. We’re trying to build a new kind of scientific instrument, a virtual telescope and a microscope of data, one that can observe data and find and extract knowledge to help you see the patterns.”

Of the funding, $7.6 million has officially been awarded. The remaining $1.9 million for the fifth year is contingent on successful 18-month and 36-month reviews by NSF.

The DIBBs project also has a community outreach component that will build on the existing online educational materials and teacher guides available on the SDSS website and the Galaxy Zoo citizen science project. The new framework will make it easier to integrate existing and new educational tools and lesson plans into SDSS as well as other websites, and to launch new big data sites with the tools already in place. Data exploration tools for citizen scientists will be optimized and seamlessly integrated into current and future citizen science projects.

Szalay, who is also director of the university’s Institute for Data Intensive Engineering and Science, received the DIBBs funding along with the following collaborators: Randal Burns, associate professor of computer science; Charles Meneveau, the Louis M. Sardella Professor of Mechanical Engineering; Steven Salzberg, professor at the School of Medicine, Bloomberg School of Public Health and Whiting School of Engineering; and Aniruddha Thakar, principal research scientist with the Center for Astrophysical Sciences.

More than 20 people from across the university are working on SDSS and DIBBs.

“The SDSS project is the astronomy version of the Human Genome Project,” Szalay said. “And now with DIBBs, it is beginning to have an impact on other branches of science as well.”

This research was supported by the National Science Foundation grant number ACI-1261715.