Students Crunch and Conquer Datasets at Ohio State's First DataFest
DataFest, founded at UCLA in 2011 and now sponsored by the American Statistical Association and hosted by several of the country’s top colleges and universities, takes mastering data analysis to a level far beyond the classroom, and offers prizes to the winning teams.
One of the coolest things about this unique competition is the emphasis on the art of storytelling with data. Competitors have complete autonomy regarding their approach to their analysis problem. Stakes are low (no grades), and rewards are high (prizes and a great résumé boost), inspiring students to generate risky, innovative ideas to solve the problem they’re given.
This year, Ohio State’s Department of Statistics and undergraduate data analytics major hosted DataFest for the first time, along with 14 other institutions nationwide. The competition is open to all undergraduates across campus, regardless of major.
“While ASA helps secure the data set from a provider and offers a ‘how-to guide’ with helpful tips and tricks, they don’t help in planning, judging or other major ways,” said Matt Miller, data analytics major academic planning specialist and DataFest organizer. “Each event is free to take their own path. The goal is to help grow interest in working with data and statistical analysis. Most of my work involved the logistics of finding space and funding to pay for everything from food to prizes.
“One of the most compelling narratives for me was the interdisciplinary nature of the teams,” Miller said. “For example, the team that won ‘Best Use of Outside Data’ consisted of four first-year students with majors in data analytics, atmospheric science, French/environment and natural resources, and international studies. Their work really showed how effectively students can think outside the box when combining talent from different disciplines.”
Ninety-three undergraduates participated, representing 25 different majors and five colleges: the College of Arts and Sciences; the College of Engineering; the College of Food, Agricultural, and Environmental Sciences; the College of Medicine; and the Fisher College of Business. The top five majors participating: data analytics, accounting, computer science, math and finance.
During the two-day, weekend event on April 9-10, 22 teams of up to five students worked feverishly to extract insights from a large, rich data set. Then, permitted a limited number of slides, teams were given just a few minutes to present their findings to the judges — professors and data analytics professionals from local and national businesses.
Three teams won prizes, one in each of three categories: “Best Analysis,” “Best Visualization,” and “Best Use of Outside Data.”
Alisa Noll, third-year integrated systems engineering (ISE) student, said of her experience, “DataFest allowed me the opportunity to expand on my analytics skills while working with a multifaceted team of peers, many of whom were originally strangers. It was exciting to draw profit-increasing insights from colossal amounts of Ticketmaster data. I learned from extremely talented people, including those from Capital One, Ford and Ohio State’s stat department. I know many of these students had never used Python, R or Tableau before, but turned out winning prizes! Definitely worth the 24-plus hours.”
Categories, winning teams and summary findings (* indicates team name):
Best Overall Analysis: Imagineers*
Cameron Priest (Business)
Brett Bejcek (Data Analytics)
Alisa Noll (ISE)
Jack Schroder (ISE)
Grant Savage (ISE)
Our team discovered that the hip-hop genre had the highest ticket sales and also the most presale tickets purchased. Rock had much higher average sales but fewer presale offerings, during which customers will usually pay more. We then determined which states had the highest average cost of tickets for the rock genre in order to determine where to present more presale options. We also determined the average disposable income per county and overlaid that on a map with the ticket prices to determine which areas are willing to pay more.
Best Use of Outside Data: Statistical Anomaly*
Mae Hutchison (Data Analytics)
James White (Atmospheric Science)
Kathleen Fillingim (French & Environment and Natural Resources)
Mohamed Taha Meziane-Tani (International Studies)
We found a slight negative correlation between the density of events per year in the Gulf region of the U.S. and the density of hurricanes of category 3 and higher in the same area. The more severe tropical events there were in a year, the fewer concerts or other events were held in that year. We concluded that with further study, Ticketmaster may be able to develop a model to predict the relative number of events in a region based on major events such as extreme weather.
Best Visualization: Team MK*
Kenji Gerhardt (Data Analytics)
Morgan Phillips (Integrated Systems Engineering)
Our goal was to produce a method of targeting populations according to their musical interests by region and adjust ticket pricing to meet levels acceptable to the populations of those regions in order to maximize profits. We found what we believe is a useful link between region of the country and preferred type of music genre as well as an analysis of the price sensitivity of the same regions. By targeting advertising to the most interested groups and finding one method of optimizing pricing for max profit, we believe we proposed a plan that would achieve our goal.
While the data and the challenge differ each year, the goal — making sense of big data — remains the same. Ticketmaster provided this year’s data set, which was not unveiled until the start of the DataFest competition. The data set — over 3 GB of online advertising, user-behavior and purchasing data — is real-world data that Ticketmaster analysts are currently using to better understand and improve their business practices.
“The students did a great job handling the difficulties associated with working with such a large amount of data,” said Christopher Hans, associate professor of statistics and co-director of the data analytics major. “I was very impressed with the energy and enthusiasm the students put into their work on the challenge.”
Here’s what Bob Thomas, manager and technical leader of Enterprise Risk and Engineering Analytics at Ford Global Data Insights & Analytics, reported to his colleagues:
Let me start by simply saying what an awesome event!
Over 100+ undergraduate students, yes, I said undergraduate students, from across The Ohio State University gathered to tackle a real-world data challenge that was undoubtedly beyond the scope any student could have encountered in the classroom.
I had the wonderful opportunity to act as a mentor to 18 teams comprised of 2-5 students on data preparation, problem formation, coding considerations, and presentation tips or “story-telling”. The funny thing was not only was I amazed at what undergraduate students could do and how quickly they learned but I also actually became more and more energized as the event progressed.
These students gave up their entire weekend with several teams pulling “all-nighters” to formulate their problem, import a 1.8 GB dataset, learn various statistical techniques on-the-fly to perform an analysis (e.g. classification and regression trees which I’m confident is not part of an undergraduate program), and cleverly visualize and communicate their story in just 4 minutes using 3 slides. My only regret was I didn’t get to work with all 100+ students BUT I was happy to learn I would also participate as a judge.
I have to admit, towards the end of my mentoring I was starting to wonder as bright and energetic as the students were could they get through such a complex dataset and problem formulation and put a useful story together that made sense. I can honestly say after watching 18 team presentations I was floored at how well the teams could state the problem, describe their analysis, visually present results, gather and use data outside of the competition to support their analysis and finally, articulate a business solution.
It was truly an incredible event and I hope to get invited to mentor and/or judge in the future and would encourage anyone else who gets a chance to participate as a mentor and/or judge to seriously consider it as well.
Miller summed it up from his perspective: “The event was a huge success; it really exceeded my expectations. And, yes, we will do it again in 2017!”
Primary event sponsors, in order of level of support: the P&G Fund of the Greater Cincinnati Foundation, Ford GDIA, Victoria’s Secret, Capital One Technology, PwC, TDA@OSU, Google, Prevedere and DataCamp.