Team Erdio: Audio Classification for Urban Sounds

written by

Olivia Haimerl

Thursday, February 16, 2023

Congratulations to Team Erdio for being a Top 5 Project of The Erdős Institute’s Spring 2022 Data Science Bootcamp with their project Erdio: Audio Classification for Urban Sounds!

Composed of McGill University students Matthew Frick, Matthew Heffernan, and Paul Jreidini, Team Erdio successfully created an open-source gunshot identification system trained on realistic audio that also classifies additional urban sounds and provides information on first response via siren detection. In creating this system, the team noted that “timely identification of safety-critical events, such as gunshots, is of great importance to public safety stakeholders. However, existing systems only deliver limited value by not classifying additional urban sounds.” To build it, Team Erdio used data from UrbanSound8K to classify urban field recordings such as air conditioners, children playing, drilling, jackhammers, street music, car horns, and others while also classifying gunshots as a separate urban sound. Through data cleaning, feature engineering (including breaking down each audio file into human-audible frequency bins, decomposing it into harmonic and percussive components, and extracting relative power and other features), feature selection, and classifier training, Team Erdio created a system that “achieved F ~ 85% for top models [on identifying gunshots], balancing recall and precision” when tested on episodes of Futurama that contained multiple urban sounds, including gunshots.
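To make the "relative power in frequency bins" idea above more concrete, here is a minimal sketch of how such a feature could be computed. It is not the team's code: the function name `band_powers`, the band edges, and the synthetic test tone are all hypothetical, and a naive discrete Fourier transform is used only to keep the example dependency-free (a real pipeline would more likely rely on an FFT from an audio or numerical library).

```python
import cmath
import math


def band_powers(signal, sample_rate, band_edges_hz):
    """Return the fraction of spectral power that falls in each frequency band.

    Hypothetical helper for illustration; band i covers
    band_edges_hz[i] <= f < band_edges_hz[i + 1].
    """
    n = len(signal)
    half = n // 2 + 1  # non-negative frequencies only (real-valued input)

    # Naive DFT: power at each frequency index k.
    power = []
    for k in range(half):
        coeff = sum(
            x * cmath.exp(-2j * math.pi * k * t / n)
            for t, x in enumerate(signal)
        )
        power.append(abs(coeff) ** 2)

    # Frequency in Hz associated with each index.
    freqs = [k * sample_rate / n for k in range(half)]

    # Sum the power inside each band.
    totals = []
    for low, high in zip(band_edges_hz[:-1], band_edges_hz[1:]):
        totals.append(sum(p for f, p in zip(freqs, power) if low <= f < high))

    grand_total = sum(totals)
    if grand_total == 0:
        return [0.0] * len(totals)
    # Normalize so the features describe relative, not absolute, power.
    return [t / grand_total for t in totals]


# Assumed example input: a pure 1000 Hz tone sampled at 8000 Hz.
sample_rate = 8000
tone = [math.sin(2 * math.pi * 1000 * t / sample_rate) for t in range(256)]

# Assumed band edges in Hz; the last edge sits just above the Nyquist frequency.
edges = [0, 500, 2000, 4001]
rel = band_powers(tone, sample_rate, edges)
print(rel)  # nearly all of the power lands in the 500-2000 Hz band
```

Normalizing by the total power makes the feature insensitive to overall loudness, which is one plausible reason to prefer relative over absolute power when recordings vary in volume.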

When deciding what project to develop over the course of the Data Science Bootcamp, Matthew H. noted that the team wanted to find a good existing data set to work with, as they “wanted to focus on the machine learning and data science component” of the project rather than on data acquisition. Paul added that the team wanted to challenge themselves: “we wanted to work with something outside of our comfort zone, but not so much so that we didn’t know where to start.” Given the fast and intensive timeline of the Spring Bootcamp, Team Erdio highlighted the importance of splitting up the work according to each student’s strengths, with Matthew H. having experience in machine learning and Python, Paul having experience in coding, and Matthew F. having experience in data analysis. However, completing the project often required them to look for information elsewhere. As Matthew F. described, “I had previous knowledge of analysis on classifier data, but never in the audio medium, so I went and found a textbook on audio analysis to find some directions and inspiration.” The team also had to overcome difficulties when constructing the system: slight code mismatches that persisted, and challenges specific to audio, such as substantial foreground noise in the audio clips, competing sounds that were often confused with one another (drills, engines, and jackhammers), and class imbalance for gunshots in the training data.
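The article does not say how the team handled the gunshot class imbalance. As an illustration only, one common remedy is to weight each class inversely to its frequency so that rare classes count more during training. The sketch below uses the "balanced" heuristic, weight = n_samples / (n_classes × class_count); the function name and the label counts are made up for the example.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Weight each class inversely to its frequency (hypothetical helper).

    Uses weight = n_samples / (n_classes * class_count), so a class that is
    rarer than average receives a weight above 1.
    """
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {
        label: n_samples / (n_classes * count)
        for label, count in counts.items()
    }


# Made-up label counts for illustration; not the real data set distribution.
labels = ["gun_shot"] * 2 + ["siren"] * 6 + ["drilling"] * 8
weights = balanced_class_weights(labels)

for label, weight in sorted(weights.items()):
    print(f"{label}: {weight:.3f}")
```

Many classifiers accept per-class weights of this kind; resampling the rare class is an alternative with different trade-offs.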

Beyond completing a gunshot identification system that also recognizes other urban sounds, Team Erdio engineered general features of the system for offline classification tasks and demonstrated how the system could provide additional value for stakeholders beyond the government and first responders, such as the film and television industry. Moreover, if given additional time to work on the audio classification system, Matthew H. noted that they “would like to expand the underlying data set, as it was very realistic but cleaner data would be better to train the system with.” Matthew F. noted that they “would like to streamline and speed up some of the data analysis so that we could more realistically make an app or something where an individual could input an audio recording and it could be used for a variety of applications.” The team suggested that it could even be used to help with Google traffic updates, as individuals could upload audio clips with live traffic and construction sounds.

The whole team noted that the most rewarding part of the project was seeing the system work and classify gunshots in the episodes of Futurama, which also contained a variety of other urban sounds. Matthew F. highlighted that “seeing the classifiers actually pick up on the gunshots in completely new data that had nothing to do with our data training was a really great moment.” Although the team agrees that their success was mostly due to hard work and intense focus during the bootcamp, they noted that, to be successful, a team should start with a good data set that doesn’t require intensive data cleaning. Moreover, Team Erdio noted that future bootcamp participants shouldn’t be afraid to jump ahead and pick any classifier while running test cases for feature engineering, as machine learning may not always correspond with human intuition.

Congratulations again to Team Erdio for being a Top 5 Project of The Erdős Institute’s Spring 2022 Data Science Bootcamp!

TEAM

Predicting faults in Olympic sports (preliminary plan: Show Jumping)

Olympic show jumping consists of horse and rider pairs competing over a course of 12-17 jumps that are between 1.40 and 1.65 meters tall. Faults occur when the horse knocks down the obstacle, refuses to jump the obstacle, or steps a foot in a water obstacle, as well as if the pair exceeds the allotted time or the rider falls. Eliminations occur after a second refusal, fall of the horse, or the second time the rider falls. Using Fédération Équestre Internationale (FEI) data, several studies have classified when faults and falls are most likely to occur based on the type of jump, whether the jump is in a turn, how late in the course a jump occurs, etc. However, the FEI data only lists total faults, not when the faults occurred, and the studies have relied on hand-coding from videos.
This project will first train a model to detect knockdowns from event footage and then, if that succeeds, use the tool to analyze the faults at an FEI event. While live replays are available on the FEI YouTube channel, there may be a terms-of-service issue around using these videos.

Video overview of the sport https://www.youtube.com/watch?v=9N5nnon1Qbw
Paper exploring where faults are most likely to occur, with diagrams of different jump types https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8926200/

If you have an idea for another sport, and a data set is publicly available, that is another option for the project.

THE ERDŐS INSTITUTE

Helping PhDs get and create jobs they love at every stage of their career.
