Scientists Link Cutting-Edge Biodiversity Genomics with Environmental Metadata Through New Public Database

Article ID: 679019

Released: 3-Aug-2017 2:00 PM EDT

Source Newsroom: California State University, Monterey Bay

Newswise — A new publicly available database will catalogue metadata associated with biological samples, making it easier for researchers to share and reuse genetic data for environmental and ecological analyses. The resource, called the Genomic Observatories Metadatabase (GeOMe), was developed by researchers at the Smithsonian Institution’s National Museum of Natural History in collaboration with researchers at California State University Monterey Bay (CSUMB), UC Berkeley and six other museums and research institutions.

GeOMe links publicly available genetic data to records of where and when samples were collected, providing contextual information that has been missing from widely shared databases. Such information about the environment, location and date of each biological sample is critical for comparing biodiversity in different locations worldwide and tracking it across time. Despite calls for more data sharing within the research community, researchers have until now lacked the tools to make this information readily available.

The developers of the database, described August 3 in the journal PLOS Biology, say standardizing and preserving this metadata will greatly enhance the value of the genetic sequence data that researchers are already collecting. Researchers might investigate, for example, how the species living at a particular altitude around the world have shifted as the planet’s climate has changed, or assess the stability of microbial communities facing increasingly acidic marine environments.

“Studying the world’s biodiversity requires collaborative effort by laboratories around the world, in developed and developing countries alike,” said Eric Crandall, assistant professor at CSUMB’s School of Natural Sciences and the senior author on the paper. “As the name suggests, GeOMe unites geospatial and environmental metadata with genomic data, and allows it to be shared in a structured, searchable manner.”

The database enables a “genomic observatory” model of data collection, wherein individual researchers and institutions can easily share and reuse data in a global pool. With GeOMe, researchers will be able to find and access genetic data collected at specific times and places anywhere in the world, enabling them to ask big questions about the structure and sustainability of life on the planet. 

Scientists who analyze ecological samples, whether plants, animals or entire communities of microbes gathered from the oceans, freshwater or land, have their own individual systems for keeping track of when and where those samples were collected. But for the broader research community, such information has been difficult to share and obtain, and impossible to search comprehensively. GeOMe provides a solution by permanently linking information about samples’ temporal, environmental, geospatial and scholarly context to genetic sequence data stored by the National Center for Biotechnology Information (NCBI).

The researchers say they invested time and resources in developing GeOMe because they knew it would be a powerful tool for accelerating discovery. As museum and biodiversity scientists, they recognize the value of tracking and preserving information.

GeOMe’s developers, including Eric Crandall at California State University Monterey Bay, Michelle Gaither at CSUMB and the Hawai'i Institute of Marine Biology, Chris Meyer at the Smithsonian National Museum of Natural History, and John Deck at the Berkeley Natural History Museums, have worked to ensure that the resource is easy to use and adaptable to a wide range of needs. Because the database and toolkit are freely available to the research community, scientific journals can now require authors to make their metadata available in a searchable, standardized format, just as they have long done for genetic sequence data, the developers say.

“Genomic data are the foundational layer for our understanding of biodiversity – but until now it has been difficult to put them in their environmental and geospatial context,” Crandall said. “Biodiversity scientists put a lot of time and effort into developing genomic datasets, and these data deserve to be stored in a way that will maximize their potential for reuse and further discovery.”

The team notes that data in GeOMe will conform to standards developed by the Genomic Standards Consortium and the Biodiversity Information Standards organization, ensuring that submitters capture and record the same essential information about every sample. Employing these standards is essential to ensure that researchers will be able to conduct analyses across datasets in the future.

GeOMe’s development was a collaboration between researchers and computer scientists at the following institutions: the Smithsonian Institution’s National Museum of Natural History; Berkeley Natural History Museums at the University of California, Berkeley; the Hawai'i Institute of Marine Biology at the University of Hawai'i; Biocode; Texas A&M University; the University of California’s Gump South Pacific Research Station, in Moorea, French Polynesia; the Berkeley Institute for Data Science at the University of California, Berkeley; the University of Queensland in Australia; and California State University Monterey Bay.

Funding for this study was provided by the National Science Foundation, the Gordon and Betty Moore Foundation and the National Oceanic and Atmospheric Administration. GeOMe reached its current level of development with NSF funding to the Diversity of the Indo-Pacific Network.