Team Lime: Spotify Podcast Recommender

Written by Elizabeth Campolongo

Thursday, February 16, 2023

Congratulations to Team Lime on winning The Erdős Institute’s Fall 2022 Data Science Boot Camp with their project: Spotify Podcast Recommender!

 

Composed of Music Theory Ph.D. Candidate Aditya Chander (Yale University), Economics Postdoc Ritika Khurana (University of Delaware), Sociology Ph.D. Student Yuchen Luo (New York University), and recent Linguistics Ph.D. and Market Researcher Taylor Mahler (The Ohio State University), Team Lime used Spotify’s podcast dataset to build a podcast recommendation system. Their model takes one of two inputs, either the name of a podcast episode or a description of a podcast or podcast episode of interest, and suggests similar podcast episodes to the user. The team confirmed the relevance of the suggestions by measuring the similarity between podcasts within user-tagged categories and comparing it with the similarity between categories. With more time, the team would also like to add a user feedback option so that the model can be continuously retrained for improved recommendations. Team Lime further suggests that the system’s applications are not limited to maintaining user engagement: advertisers could also use it to increase revenue by targeting connected podcasts to advertise diverse products, avoiding repetitive advertising to the same listeners. Ultimately, of the two models they tried, the pre-trained transformer model performed better: with it, 88.3% of the ordered category pairs had lower between-category than within-category similarity scores, compared to 75.1% with the other model. They therefore selected the pre-trained transformer model for their recommendation app.
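To make the similarity-based approach more concrete, here is a minimal Python sketch of the two ideas described above: ranking episodes by their similarity to a query, and checking how often between-category similarity falls below within-category similarity. It is an illustration under stated assumptions, not Team Lime's actual code. The tiny two-dimensional vectors stand in for text embeddings, the category names and the function names `recommend` and `fraction_separated` are hypothetical, and the use of cosine similarity and of mean pairwise scores is an assumed reading of the metric.

```python
from itertools import permutations
from math import sqrt


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v)))


def recommend(query_vec, catalog, top_k=3):
    """Return the top_k titles in catalog ranked by similarity to query_vec."""
    ranked = sorted(catalog, key=lambda t: cosine(query_vec, catalog[t]), reverse=True)
    return ranked[:top_k]


def mean_similarity(group_a, group_b, same_group=False):
    """Mean pairwise cosine similarity between two groups of vectors.

    With same_group=True, self-pairs and mirrored pairs are skipped,
    so the group must contain at least two vectors.
    """
    sims = []
    for i, u in enumerate(group_a):
        for j, v in enumerate(group_b):
            if same_group and j <= i:
                continue
            sims.append(cosine(u, v))
    return sum(sims) / len(sims)


def fraction_separated(by_category):
    """Share of ordered pairs (A, B) where sim(A, B) is below sim within A."""
    within = {c: mean_similarity(v, v, same_group=True) for c, v in by_category.items()}
    pairs = list(permutations(by_category, 2))
    separated = sum(
        1 for a, b in pairs
        if mean_similarity(by_category[a], by_category[b]) < within[a]
    )
    return separated / len(pairs)


# Toy stand-ins for episode embeddings (assumed data, not from the project).
toy = {
    "comedy": [[1.0, 0.1], [0.9, 0.2]],
    "science": [[0.1, 1.0], [0.2, 0.9]],
    "news": [[0.7, 0.7], [0.6, 0.8]],
}
catalog = {
    f"{cat} episode {i}": vec
    for cat, vecs in toy.items()
    for i, vec in enumerate(vecs, start=1)
}

top_two = recommend([1.0, 0.0], catalog, top_k=2)
score = fraction_separated(toy)
```

In a real system the vectors would come from a text model applied to episode descriptions, and a score such as the 88.3% reported above would correspond to `fraction_separated` returning roughly 0.883 under this assumed definition.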

 

When discussing how the team settled on this dataset and specific project, Ritika explained that she wanted to try something different; Aditya has a music background and also wanted to expand his horizons. Since he had worked with Spotify’s API in the past and had some familiarity with Natural Language Processing (NLP), this project was a natural extension of both their interests. Taylor and Yuchen joined the group later, both drawn to the NLP aspect of the project. Taylor’s Ph.D. is in linguistics, but she had not previously worked with NLP. Yuchen’s NLP experience had been more theoretical, as part of her sociology studies, and she was excited to apply that knowledge to something practical that a company would value.

 

At the end of the project, they were excited to have a finished product. For Yuchen, the most rewarding part was “when we see the finished project and we realize, wait, it actually works, that I think the recommended episodes make a lot of intuitive sense.” Ritika said that “learning new skills, and definitely, at the end, when we realized that we won the project,” was great, “but for me, the biggest or most rewarding part was that this was my first Python project.” Taylor found that “to sit down and think about what this would mean for an actual business and actual users, because I have very limited experience outside of academia, [to] realize that it actually has business value, I think was rewarding.” Aditya agreed that it was exciting to have a product at the end: “From that perspective, knowing that, what Yuchen said, we had a product at the end of it, it wasn’t just a series of insights that maybe would have led to something else, we had a concrete deliverable app.”

 

The team noted that with more time and computational resources, they envision adding more features to the model and improving their app. For instance, they would like to continually retrain the model on user feedback about the generated recommendations, and to include episode descriptions both in the results and in the modeling process. Since completing the project, Aditya still mostly listens to music, but he now listens to more podcasts and has used their app for recommendations. Taylor plans to try it to help her husband find a new podcast now that his favorite one has ended; she mostly listens to interviews or podcasts on topics she’s interested in learning about. Ritika likes to listen to Hidden Brain and Trained, whose topics vary widely depending on the speaker, from science to philosophy. Yuchen enjoys podcasts about anime and book summaries since she doesn’t have much time to read outside of work.

 

Team Lime attributes much of their success to organization and clear delegation of tasks. They highly recommend holding weekly meetings to keep each other accountable and making clear to-do lists after each meeting so that everyone knows their task(s). Furthermore, though it is good to consider small details, it is important not to lose sight of the big picture or the end goals and deliverables of the project. Two other factors in their success that they highlight are pair programming and great advice from their project mentor, Gleb Zhelezov.

 

Congratulations again to Team Lime as well as all of the other teams who completed a Fall 2022 Data Science Boot Camp project!

TEAM

Music Subgenre Classification

Anthony Kling, Ramachandra Rahul Taduri, Reid Harris


Music genres are essential for organizing and categorizing music, making it easier for listeners to discover, enjoy, and connect with styles that resonate with them. Genres also carry historical, cultural, and sonic significance. Playlists, which often focus on a single subgenre, have become an increasingly popular way to discover new music.

We address the multi-label classification problem of identifying a song's genre(s) from acoustic features extracted from audio files. Rather than focusing on broad genres (e.g., jazz, hip hop, electronic), we concentrate on four subgenres of electronic music: techno, house, trance, and drum and bass. While these subgenres are distinct and well-defined, they can be challenging to differentiate. We train a variety of supervised learning models, including XGBoost and neural networks, on data obtained from AcousticBrainz.
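As a small illustration of what "multi-label" means here, the following Python sketch encodes each track's subgenre tags as a multi-hot target vector, so a single track can belong to more than one subgenre at once. It is a sketch under stated assumptions: the track names and tag lists are invented, the helper `multi_hot` is hypothetical, and the blurb does not say how the labels were actually represented.

```python
# The four subgenres named above, in a fixed order that defines the columns.
SUBGENRES = ["techno", "house", "trance", "drum and bass"]


def multi_hot(tags, classes=SUBGENRES):
    """Encode a list of subgenre tags as a 0/1 vector, one slot per class."""
    unknown = set(tags) - set(classes)
    if unknown:
        raise ValueError(f"unknown subgenre tags: {sorted(unknown)}")
    return [1 if c in tags else 0 for c in classes]


# Invented example tracks (assumed data, for illustration only).
tracks = {
    "track_1": ["techno"],
    "track_2": ["house", "trance"],
    "track_3": ["drum and bass"],
}

targets = {name: multi_hot(tags) for name, tags in tracks.items()}
```

With targets in this form, a multi-label model predicts each of the four slots separately instead of choosing exactly one class per track.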
