SDSC Researchers Invited to Attend White House Information Technology Event
Initiative to focus on collaborations in data-enabled science and engineering
Source Newsroom: University of California, San Diego
Newswise — The directors of two big data “centers of excellence” at the San Diego Supercomputer Center (SDSC) at the University of California, San Diego – the Center for Large-scale Data Systems research (CLDS) and the Predictive Analytics Center of Excellence (PACE) – have been invited by the White House Office of Science and Technology Policy to attend a two-day event focused on accelerating research, development, and collaborations in data-enabled science and engineering.
The event, called ‘Data to Knowledge to Action: Building New Partnerships’ and scheduled for October 2-3, 2013, in Washington, D.C., is being held by the Obama Administration’s Networking and Information Technology R&D (NITRD) program, which represents the information technology portfolios of 18 federal agencies.
In early 2012 the Administration announced a new initiative focused on research and development in data science and engineering. At that time six federal departments and agencies made commitments of more than $200 million in new investments that together are aimed at developing new tools, techniques, and expertise needed to “move from data to knowledge to action,” according to the NITRD program. This year the Administration is encouraging multiple stakeholders, such as federal/state/local agencies, private industry, academia, non-profits, and foundations, to forge innovative partnerships, including collaborations that support advanced data management and data analytic techniques.
“It is indeed gratifying to see that some of our best and brightest are being recognized at such a high level for their work in big data applications,” said SDSC Director Michael Norman. “This reflects the Center’s focus on addressing both the management and technical aspects of big data and other data-enabled applications such as predictive analytics, which are now becoming pervasive among academia, industry, and government.”
In a recent program solicitation, the National Science Foundation described big data as “large, diverse, complex, longitudinal, and/or distributed datasets generated from instruments, sensors, Internet transactions, email, video, click streams, and/or all other digital sources available today and in the future.” Additionally, big data applications are characterized by the need to provide timely analytics while dealing with large data volumes, high data rates, and a wide range of data sources. Many of those datasets are so voluminous or numerous that most conventional computers and software cannot effectively process them.
‘Big Data’ Benchmarking
Chaitan Baru, an SDSC Distinguished Scientist and director of CLDS, was invited to attend the White House event in recognition of his work coordinating a program among industry, academia, and government to develop industry-standard, application-level benchmarks for evaluating hardware and software systems for big data applications. The BigData Top100 List is a new open, community-based big data benchmarking initiative coordinated by a board of directors that includes representation from SDSC, Pivotal, Cisco, Oracle, Intel, Brocade, Seagate, NetApp, Mellanox, Facebook, IBM, Google, and the University of Toronto.
“This initiative marks the start of the Big Data Benchmark Challenge to seek community input in defining big data benchmarks and metrics,” said Baru. “The creation of objective standards for application-level performance and price/performance fosters competition and innovation in the marketplace. During the past 20 years, for example, the benchmark performance of commercial database software has improved by about a million times, while the price/performance ratio has improved by a factor of a couple of hundred thousand.”
With support from the National Science Foundation, the initiative has been hosting a series of community workshops, with a fourth workshop to be held next month (October 2013). As part of this effort, the National Institute of Standards and Technology (NIST) recently funded SDSC researchers to study different strategies for synthetic data generation for big data.
Data Infrastructure-based Sustainable Communities Project
Natasha Balac, director of SDSC’s Predictive Analytics Center of Excellence, was recognized for a project she is coordinating with Cleantech San Diego and OSIsoft to develop a “sustainable communities” infrastructure for downtown San Diego.
“We envision deploying a data infrastructure that connects physical systems such as those managing electricity, gas, water, waste, buildings, transportation, and traffic,” said Balac, whose SDSC group is a non-profit public educational organization focused on leveraging the potential of predictive data analytics and developing a comprehensive, sustainable, and secure cyberinfrastructure. “This project will enable the city of San Diego to use city-scale applications that will result in reduced electricity consumption and cost, while at the same time anticipating or uncovering grid instabilities, educating the public, and improving both the quality of life and economic development.”
OSIsoft’s software system will connect to and acquire significant volumes of detailed data streams, which will be published in a cyber-secure private cloud accessible only via signed and approved access protocols. Presently, San Diego Gas & Electric (SDG&E) and UC San Diego are beta-testing the OSIsoft software, and UC San Diego researchers are using the campus’ microgrid system to analyze data from both the main SDG&E grid and the UC San Diego Smart Grid.
“A key goal of this project and its broad collaboration is to develop a model for the collection and refinement of data that is transportable to other communities and applications,” said Balac. “Processes developed, as well as their results, will be published to help enable other communities on their own path to sustainability.”