OCTOBER 18, 2021 | IAN FOSTER
Newswise — The petabytes produced and consumed by exascale computers must often be moved among elements of the international scientific infrastructure. Scientists have long relied upon Globus for efficient, reliable, and secure data movement, and indeed as of 10/18/2021, Globus has been used to move more than 10^18 bytes (1,343,546,072,100,387,000,000 bytes to be precise) among more than 20,000 labs worldwide.
As computers and networks get faster, we see researchers creating larger and larger files. At one U.S. research computing center, for example, we find 20% of users creating terabyte or larger files. In response, Globus, with support from the DOE Exascale Computing Project, recently deployed new features that accelerate one- or few-file transfers by striping across data transfer nodes at source and destination. Studies performed by Zhengchun Liu of Argonne National Laboratory’s Data Science and Learning division demonstrate the benefits: a single one-terabyte file can now be moved from NERSC in California to the Argonne Leadership Computing Facility in Illinois at more than 30 gigabits per second, more than four times faster than before. That’s little more than four minutes to move a terabyte across the country.
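The "little more than four minutes" figure can be checked with a few lines of arithmetic. The sketch below assumes a decimal terabyte (10^12 bytes) and a sustained rate of exactly 30 gigabits per second; the helper name `transfer_seconds` is hypothetical, and the earlier rate is only the upper bound implied by "more than four times faster", not a measured value.

```python
def transfer_seconds(size_bytes: float, rate_gbps: float) -> float:
    """Seconds needed to move size_bytes at a sustained rate of rate_gbps gigabits/s."""
    bits = size_bytes * 8
    return bits / (rate_gbps * 1e9)

ONE_TB = 10**12  # assumption: decimal terabyte (10^12 bytes)

# Striped transfer at the reported rate of 30 Gb/s.
striped = transfer_seconds(ONE_TB, 30.0)

# "More than four times faster" implies the earlier rate was at most 30/4 = 7.5 Gb/s.
earlier = transfer_seconds(ONE_TB, 30.0 / 4)

print(f"striped: {striped / 60:.1f} min")   # striped: 4.4 min
print(f"earlier: {earlier / 60:.1f} min")   # earlier: 17.8 min
```

Under these assumptions, the same one-terabyte file would previously have taken roughly 18 minutes or longer.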
Says Rachana Ananthakrishnan, who leads the Globus team at the University of Chicago, “Globus has long been able to process many-file tasks at close to line rates, thanks to our extensive use of parallelism. One- or few-file transfers introduce new challenges, including efficient overlapping of data movement and checksums. But with the hybrid Globus architecture the complexities are handled by the transfer service, making it surprisingly easy for researchers to use the striping option.”
The striping option is most useful when you are transferring only a few files (typically, fewer files than there are servers) and the files are large (typically, greater than 100 MB). Striping may be enabled upon request for a particular user account, or per endpoint, for any POSIX storage system. Please contact [email protected] to enable this feature for your environment, or if you’d like to learn more about how to accelerate single- or few-file data transfers.
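The rule of thumb above can be written as a small check. This is an illustrative sketch only, not part of any Globus API: the function name, its parameters, and the choice to require that every file exceed the 100 MB threshold (taken here as 100 × 10^6 bytes) are assumptions.

```python
def striping_likely_helps(file_sizes_bytes, num_servers,
                          min_size_bytes=100 * 10**6):
    """Rule of thumb: fewer files than servers, and every file is large.

    Hypothetical helper; thresholds follow the "typical" values in the text.
    """
    if not file_sizes_bytes:
        return False
    few_files = len(file_sizes_bytes) < num_servers
    large_files = all(size > min_size_bytes for size in file_sizes_bytes)
    return few_files and large_files

# One 1 TB file across four data transfer nodes: a good candidate.
print(striping_likely_helps([10**12], num_servers=4))        # True

# A thousand 1 MB files: ordinary many-file parallelism already applies.
print(striping_likely_helps([10**6] * 1000, num_servers=4))  # False
```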