Content-based auto-tagging of audios using deep learning


In recent years, deep learning and feature learning have drawn significant attention in Music Information Retrieval (MIR) research, inspired by strong results in speech recognition and computer vision. Here, we tackle content-based automatic tagging of audio clips, which is a multi-label classification task. Deep neural network architectures such as Convolutional Neural Networks (CNNs) and Convolutional Recurrent Neural Networks (CRNNs) are used to learn hierarchical features from musical audio signals, and the experiments are performed on the MagnaTagATune (MTT) dataset. Our focus is on achieving state-of-the-art performance using Mel-spectrogram input. Tags such as genre, instrument, and emotion can then be predicted automatically for new tracks, with the emphasis on accurate classification of clips. These tags convey high-level information from a listener’s perspective and can therefore be used for organizing music libraries, efficient music browsing, personalized recommendations, playlist generation, and other applications.
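
Because tagging is a multi-label task, each tag is scored independently rather than competing with the others in a single softmax. The sketch below illustrates that output stage only, using NumPy. It assumes a sigmoid output per tag with a binary cross-entropy loss, which is a common choice for multi-label problems; the abstract does not state the loss actually used. The tag names, logits, and the 0.5 threshold are hypothetical values chosen for illustration, not taken from MagnaTagATune or from the trained models.

```python
import numpy as np


def sigmoid(z):
    """Map raw network outputs (logits) to independent per-tag probabilities."""
    return 1.0 / (1.0 + np.exp(-z))


def binary_cross_entropy(probs, targets, eps=1e-7):
    """Mean binary cross-entropy over tags (assumed loss, not confirmed by the source)."""
    probs = np.clip(probs, eps, 1.0 - eps)  # avoid log(0)
    per_tag = targets * np.log(probs) + (1.0 - targets) * np.log(1.0 - probs)
    return float(-np.mean(per_tag))


def predict_tags(logits, tag_names, threshold=0.5):
    """Return every tag whose probability reaches the threshold (zero, one, or many)."""
    probs = sigmoid(np.asarray(logits, dtype=float))
    return [name for name, p in zip(tag_names, probs) if p >= threshold]


# Hypothetical tag vocabulary and network outputs for a single clip.
TAGS = ["guitar", "classical", "vocal", "techno"]
logits = np.array([2.0, -1.0, 0.5, -3.0])
targets = np.array([1.0, 0.0, 1.0, 0.0])  # ground-truth tags: guitar and vocal

predicted = predict_tags(logits, TAGS)
loss = binary_cross_entropy(sigmoid(logits), targets)
print(predicted, round(loss, 3))
```

A clip can receive several tags at once here, which is what separates this setup from single-label genre classification. In a full system the logits would come from the CNN or CRNN applied to the clip's Mel-spectrogram.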

In International Conference on Big Data, IoT and Data Science, 2017