Computer Vision for Music Identification

Yan Ke, Derek Hoiem, and Rahul Sukthankar
IEEE Conference on Computer Vision and Pattern Recognition, June 2005, pp. 597-604.


We describe how certain tasks in the audio domain can be effectively addressed using computer vision approaches. This paper focuses on the problem of music identification, where the goal is to reliably identify a song given a few seconds of noisy audio. Our approach treats the spectrogram of each music clip as a 2-D image and transforms music identification into a corrupted sub-image retrieval problem. By employing pairwise boosting on a large set of Viola-Jones features, our system learns compact, discriminative, local descriptors that are amenable to efficient indexing. During the query phase, we retrieve the set of song snippets that locally match the noisy sample and employ geometric verification in conjunction with an EM-based "occlusion" model to identify the song that is most consistent with the observed signal. We have implemented our algorithm in a practical system that can quickly and accurately recognize music from short audio samples in the presence of distortions such as poor recording quality and significant ambient noise. Our experiments demonstrate that this approach significantly outperforms the current state-of-the-art in content-based music identification.
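
The retrieval idea in the abstract can be illustrated with a rough sketch. This is not the authors' implementation. It assumes hand-picked two-rectangle box filters evaluated on an integral image (the paper learns its filters by pairwise boosting), exact hash lookups of the resulting bit codes, and a simple vote over time offsets as a crude stand-in for geometric verification. The EM-based occlusion model is omitted entirely. All function names, the parameter values, and the white-noise "songs" are hypothetical, and the query is assumed to start on a frame boundary.

```python
from collections import Counter, defaultdict

import numpy as np

# Illustrative values only; none of these come from the paper.
FRAME_LEN, HOP, N_BANDS, WIDTH = 256, 128, 16, 8


def spectrogram(signal):
    """Log-energy spectrogram: one row per time frame, one column per coarse band."""
    n_frames = 1 + (len(signal) - FRAME_LEN) // HOP
    window = np.hanning(FRAME_LEN)
    frames = np.stack([signal[i * HOP:i * HOP + FRAME_LEN] * window
                       for i in range(n_frames)])
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2
    # Drop the DC bin, then pool the remaining 128 bins into N_BANDS bands.
    bands = np.array_split(power[:, 1:], N_BANDS, axis=1)
    return np.log1p(np.stack([b.sum(axis=1) for b in bands], axis=1))


def box_sum(ii, t0, t1, b0, b1):
    """Sum of spec[t0:t1, b0:b1], read from the zero-padded integral image ii."""
    return ii[t1, b1] - ii[t0, b1] - ii[t1, b0] + ii[t0, b0]


def descriptors(signal):
    """One (N_BANDS - 1)-bit code per frame position, from fixed box filters."""
    spec = spectrogram(signal)
    ii = np.pad(spec, ((1, 0), (1, 0))).cumsum(axis=0).cumsum(axis=1)
    codes = []
    for t in range(spec.shape[0] - WIDTH + 1):
        code = 0
        for b in range(N_BANDS - 1):
            # Two-rectangle filter: band b minus band b + 1 over WIDTH frames.
            diff = (box_sum(ii, t, t + WIDTH, b, b + 1)
                    - box_sum(ii, t, t + WIDTH, b + 1, b + 2))
            code = (code << 1) | int(diff > 0)
        codes.append(code)
    return codes


def build_index(songs):
    """Hash table from descriptor code to (song_id, frame position) entries."""
    index = defaultdict(list)
    for song_id, signal in songs.items():
        for t, code in enumerate(descriptors(signal)):
            index[code].append((song_id, t))
    return index


def identify(index, query):
    """Vote for (song, time offset) pairs that agree with the query's codes."""
    votes = Counter()
    for q, code in enumerate(descriptors(query)):
        for song_id, t in index.get(code, ()):
            votes[(song_id, t - q)] += 1
    if not votes:
        return None
    (song_id, offset), count = votes.most_common(1)[0]
    return song_id, offset, count


# Synthetic stand-in data: white-noise "songs" and a lightly corrupted excerpt.
rng = np.random.default_rng(0)
songs = {name: rng.standard_normal(400 * HOP) for name in ("song_a", "song_b")}
index = build_index(songs)
start, length = 100 * HOP, 70 * HOP + FRAME_LEN
query = songs["song_b"][start:start + length] + 0.01 * rng.standard_normal(length)
best = identify(index, query)  # (song_id, frame_offset, votes)
```

In this sketch a query frame contributes a vote only when its whole code matches an indexed code exactly, so heavier noise reduces the vote count at the true offset. Voting on the time offset means that scattered accidental matches rarely agree with each other, which is the same intuition that motivates checking candidate matches for consistency rather than counting them.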

Keywords: music identification, computer vision

Number of pages: 8

Text Reference
Yan Ke, Derek Hoiem, and Rahul Sukthankar, "Computer Vision for Music Identification," IEEE Conference on Computer Vision and Pattern Recognition, June 2005, pp. 597-604.

BibTeX Reference
   author = "Yan Ke and Derek Hoiem and Rahul Sukthankar",
   title = "Computer Vision for Music Identification",
   booktitle = "IEEE Conference on Computer Vision and Pattern Recognition",
   pages = "597--604",
   month = "June",
   year = "2005",
   volume = "1",