Newswise — ATLANTA—Georgia State University researchers have compiled a large solar dataset from the National Aeronautics and Space Administration (NASA), making several hundred thousand solar events found on high-resolution solar images available to the public.

The massive dataset from NASA’s Solar Dynamics Observatory (SDO) mission gives unprecedented insight into the sun’s activity. The mission has captured 70,000 images a day since February 2010, creating one of the richest and largest repositories of solar image data available to mankind. The data have led to advances in the detection of solar events.

The researchers standardized and curated the large-scale dataset to reduce the time scientists spend on data acquisition and curation. Their work has improved the quality of the data and will accelerate computer vision research on these solar images. Details about the dataset are published in the journal Scientific Data.

The SDO is a spacecraft in an inclined geosynchronous orbit around Earth that will capture full-disk images of the sun for up to 10 years. The mission aims to help scientists understand the sun’s influence on Earth. The spacecraft carries three instruments; this dataset was built from images produced by one of them, the Atmospheric Imaging Assembly (AIA), which captures high-definition, full-disk images of the sun every 10 to 12 seconds in eight different wavelengths using four AIA telescopes.

“The dataset is 284 gigabytes of highly compressed, labeled data,” said Dr. Rafal Angryk, professor of computer science at Georgia State and co-author of the study. “This is the largest benchmark ever released in solar physics. It’s high-resolution data, 4,000 by 4,000 pixels, which is really high resolution. It’s better quality data than your high-definition television (HDTV) or Blu-ray. The typical TV is 1,000 by 1,000 pixels.

“The problem with the SDO data, if you’re not a solar expert, you don’t know where to look for labels or events and you don’t know what mergers are meaningful. We’ve built an image retrieval system from this data. In addition to searching for similar events or data, ranking the images (which requires a far more challenging task of ordering them based on their similarities), can be a problem. That’s a really good application of this benchmark. I think that’s what makes this benchmark unique. Most benchmarks are focused on labeling, but not ranking.”

Ahmet Kucuk, a Ph.D. student in the Department of Computer Science at Georgia State, is the lead author of the study. The second author is Dr. Juan M. Banda, a Ph.D. graduate of Angryk’s Data Mining Research Lab, who is employed by Stanford University and will soon join Georgia State as a Next Generation Initiative hire in astroinformatics.

The study is funded by the National Science Foundation and NASA.