Newswise — Data storage and management present an increasing challenge for data-intensive research facilities, especially as artificial intelligence (AI) and machine learning projects generate rapidly growing volumes of data that must be processed and analyzed ever faster. Argonne National Laboratory is no different: as an emerging leader in data science and exascale computing, its researchers conduct leading-edge research in virtually every scientific discipline, collaborating with scientists in industry, at universities, and at government agencies. This work demands lab-wide tools and capabilities that move data storage and management tasks into the background for researchers.
To address the challenge of streamlining data movement, storage and access, the Globus team worked with Argonne Leadership Computing Facility (ALCF) scientists to develop a lab-wide service for storing and sharing data among distributed collaborators.
Petrel: A New Approach to Research Data Storage and Management
The Petrel Data Service allows Argonne researchers to store large-scale datasets and easily share them with collaborators, without requiring local account management.
Developed and operated via a collaboration between ALCF and Globus, Petrel provides a uniform, discoverable data fabric across the lab with seamless access to data for use in computation, collaboration and distribution. Petrel leverages storage and infrastructure located at Argonne along with Globus data transfer and sharing services. Researchers access Petrel using their campus or institution federated login; Globus manages authentication and access, eliminating the need to create temporary local user accounts and manage permissions and thus enabling data-centric research collaboration. Researchers can use Petrel to move large datasets in and out of storage and can easily, securely share data with external research partners. Petrel also offers fine-grained access control so permissions can be aligned with data access policies and requirements.
With Petrel, researchers can rapidly move data from Argonne supercomputers to Petrel and then share it with collaborators who don't have (and don't need) system access. This approach ties authentication to the data rather than to the system.
If Petrel were not available, Argonne research computing groups where users perform computing tasks, such as ALCF, the Laboratory Computing Resource Center (LCRC), and the Advanced Photon Source (APS), would need to create temporary accounts on a regular basis simply to enable collaboration on shared data.
Thanks to Globus, any Petrel user with an allocation can manage data access for their project completely independently of system administrators; there is no need to request local accounts for collaborators or worry about managing permissions.
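The data-centric sharing model described above can be illustrated with a small sketch. It is a toy model only, not the Globus API or Petrel's implementation: the class names, the method names, the "r"/"rw" permission strings, and the example identities are all hypothetical and chosen for illustration. It shows a project owner granting access to a folder by federated identity, with no local account involved.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class AccessRule:
    """One sharing rule (hypothetical model, not a Globus data structure)."""
    identity: str     # federated login of the collaborator
    path: str         # folder prefix the rule covers, ending in "/"
    permissions: str  # "r" for read-only, "rw" for read and write


@dataclass
class SharedProject:
    """A project allocation whose owner manages access without an admin."""
    owner: str
    rules: List[AccessRule] = field(default_factory=list)

    def grant(self, granted_by: str, identity: str, path: str, permissions: str) -> None:
        # Only the allocation owner may change who can see the data.
        if granted_by != self.owner:
            raise PermissionError("only the project owner can grant access")
        if permissions not in ("r", "rw"):
            raise ValueError("permissions must be 'r' or 'rw'")
        if not path.endswith("/"):
            raise ValueError("path must be a folder ending in '/'")
        self.rules.append(AccessRule(identity, path, permissions))

    def can(self, identity: str, path: str, action: str) -> bool:
        # action is "r" (read) or "w" (write); the owner can always do both.
        if identity == self.owner:
            return True
        return any(
            rule.identity == identity
            and path.startswith(rule.path)
            and action in rule.permissions
            for rule in self.rules
        )


# Hypothetical identities and paths, for illustration only.
project = SharedProject(owner="researcher@anl.example")
project.grant("researcher@anl.example", "partner@university.example", "/results/", "r")

print(project.can("partner@university.example", "/results/run1.h5", "r"))  # True
print(project.can("partner@university.example", "/results/run1.h5", "w"))  # False
print(project.can("partner@university.example", "/raw/run1.h5", "r"))      # False
```

In this sketch the permission is attached to the data path and the collaborator's identity, which mirrors the idea of aligning access with data policies instead of with system accounts.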
Making a Difference for Researchers
With Petrel, Argonne researchers can focus on their research instead of worrying about how to share and store their data. The following examples illustrate how a few projects are benefiting from the Petrel Data Service:
- Argonne neuroscientists use Petrel in their work creating high-resolution 3D brain images using resources from the laboratory’s Advanced Photon Source (APS). Scientists use Globus to manage the data once it is acquired from APS, perform analysis, then place the results on Petrel to share with other researchers. (Read the ALCF story on using Petrel at APS for brain research)
- Petrel provides the data management and storage platform for RAMSES, an Argonne-led Department of Energy (DOE) project for which scientists at multiple DOE labs are collaborating to develop a new science of end-to-end analytical performance modeling.
- Commercial auto manufacturers use Petrel in projects at APS studying the impact of fuel spray on engine performance, which involves streaming data from APS to Cooley, the ALCF's visualization and analysis cluster, to explore and analyze their experimental findings.
Key Features
Key Petrel features include:
- Usability: Petrel is simple and intuitive for researchers to use, with point-and-click access from any web browser to move and share data. And with HTTPS support now available via Globus, it’s even easier to access data or build distribution portals. Data is remotely accessible to applications directly from the storage system, with inline viewing, previews, and other robust functionality users have come to expect from web-based services.
- Accessibility: Unlike other storage at Argonne, which can require local accounts and security access mechanisms like CRYPTOCards, researchers can get access to Petrel storage simply by applying for an allocation.
- Manageability: Users can manage Petrel data access independently, without burdening IT admins or requesting local accounts for collaborators.
- Performance: Petrel is a high-speed, high-capacity data store that offers seamless, fast access to very large datasets at up to 100 Gbps.
- Scale: Petrel offers petabytes of usable GPFS storage, typically allocating 100 TB of storage per project.
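To put the performance and scale figures above in perspective, the following back-of-the-envelope sketch estimates how long it would take to move one typical 100 TB project allocation at the quoted peak rate of 100 Gbps. It assumes decimal units (1 TB = 10^12 bytes, 1 Gbps = 10^9 bits per second) and a sustained peak rate with no protocol or file system overhead, so real transfers would likely take longer. The function name is hypothetical.

```python
def transfer_time_seconds(size_bytes: float, rate_bits_per_second: float) -> float:
    """Ideal transfer time: total bits divided by the link rate."""
    if rate_bits_per_second <= 0:
        raise ValueError("rate must be positive")
    return size_bytes * 8 / rate_bits_per_second


TB = 10**12    # bytes per terabyte (decimal units, an assumption)
GBPS = 10**9   # bits per second per Gbps

allocation_bytes = 100 * TB  # typical per-project allocation cited above
peak_rate = 100 * GBPS       # quoted upper bound on access speed

seconds = transfer_time_seconds(allocation_bytes, peak_rate)
print(f"{seconds:.0f} s (~{seconds / 3600:.1f} h)")  # 8000 s (~2.2 h)
```

Under these idealized assumptions, a full 100 TB allocation could be moved in a little over two hours.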
Quotes:
- “The future is all about data. ALCF is exploring what is needed, and how to achieve it, for enabling data access beyond facility users and make data available to a wider science audience. With capabilities like Petrel with Globus, we can ensure in-house and visiting researchers have the tools they need to facilitate data sharing and make collaboration as easy and productive as possible.” —Mike Papka, ALCF division director and deputy associate laboratory director for Computing, Environment, and Life Sciences (CELS)
- “To support the data-intensive projects being allocated for the Petrel data sharing service, we needed the reliable, high-performance capabilities Globus offers. Thanks to Globus sharing and Globus Auth, Petrel makes it possible for research teams at Argonne to collaborate seamlessly with onsite or remote colleagues.” —Ian Foster, director of Argonne's Data Science and Learning Division
Read the case study: https://www.globus.org/globus-user-story-petrel-argonne
Read about a machine learning project at Argonne which provided a testbed for developing this service: https://www.globus.org/Argonne-DSL-Machine-Learning