Iowa State Is First U.S. Student Team to Win International Data Mining Competition
Source Newsroom: Iowa State University
Newswise — AMES, Iowa -- A team of Iowa State University graduate students topped 98 other universities from 28 countries to capture first place in the 15th annual Data Mining Cup. The winner was announced July 2 in Berlin. It is the first time a team from the United States has won the competition.
Prudsys AG, a leading European data mining company, sponsors the intelligent-data analysis competition for universities. According to Prudsys, the competition is meant to be a "bridge between university and industry to identify the best up-and-coming data miners."
Teams had six weeks to develop a solution to a data mining problem on optimal return prognosis. This year, teams used historical purchase data from an unidentified online store to build a model that predicts the probability that a new order will be returned.
"The motivation for this contest data is that some online retailers offering free return shipping have almost half of their orders returned," said Iowa State's team leader and statistics Ph.D. candidate Cory Lanker.
"We could advance our ideas to create an application that helps online retailers reduce returned shipments and increase profit margins," he said.
Between April 2 and May 14, teams worked at their respective universities to develop their probability predictions.
"Teams submitted return probabilities for approximately 50,000 purchases made in one month using data from approximately 481,000 orders from the previous 12 months," Lanker said.
"They used 12 variables that characterize the customer information — such as age, location and purchase history — and information about ordered items — such as size, color, price, etc."
Lanker said that the basis of Iowa State's technical solution was "to fully characterize customer behavior, which we did using advanced statistical learning concepts on the provided history of purchases. Once we successfully characterized customer behavior, we could then best predict whether a new purchase would be returned."
"This was specifically a student contest," said Steve Vardeman, University Professor of statistics and industrial engineering. "The team had no direct faculty input on the problem. They organized and executed their solution entirely on their own."
A jury scored all 57 submitted solutions, and invited the top 10 teams to Berlin to present their solution methods at the Prudsys User Days conference. Each team gave a 10-minute presentation.
The first-place Iowa State team received 2,000 euros in prize money (about $2,700) and a plaque. No other American university placed in the top 20. The next-highest U.S. finishers were Northwestern University (24th place) and the University of Southern California (36th place).
Iowa State team members and their departments are Guillermo Basulto-Elias (statistics), Fan Cao (statistics), Xiaoyue Cheng (statistics), Marius Dragomiroiu (computer science), Jessica Hicks (bioinformatics and computational biology), Cory Lanker (statistics), Ian Mouzon (statistics), Lanfeng Pan (statistics) and Xin Yin (bioinformatics and computational biology/statistics). Lanker and Mouzon were on last year's Iowa State team, which finished in fifth place.
Basulto-Elias, Yin and Lanker went to Berlin for the presentation and announcement. Final team rankings were announced beginning with 10th place.
"Before long, fifth place was announced and it wasn't us, so I knew we did better this year," Lanker said. "When it was down to two teams, (Prudsys organizer) Jens Scholz said, 'The United States lost in the World Cup last night,' and I thought, 'Well this is us, we finished second,' but then he added, 'But a United States team has won the 2014 Data Mining Cup!'"
Lanker said the shock has not worn off yet. He attributes the team's success to holding several team meetings each week, which were well attended at the end of the semester, demonstrating the "dedication we all had to our team's success."
"As a leader, I stressed sticking to a schedule so we didn't run out of time, and involving everyone in discussions about making the many important statistical decisions," Lanker said. "The level of teamwork was extraordinary ... with many large contributions from all members."