The number of music listeners is growing rapidly, and large volumes of new songs are continually being released and stored in digital music libraries. As a result, emotion recognition from music content has received increasing attention. Music Emotion Recognition (MER) systems have been introduced to meet the ever-increasing demand for easy and efficient access to music information. However, existing systems struggle to maintain accuracy and have difficulty predicting multi-label emotion types. To address these shortcomings, this article develops a novel MER system that interlinks pre-processing, feature extraction, and classification steps. Initially, a pre-processing step segments long audio files into shorter audio frames. Music-related temporal, spectral, and energy features are then extracted from the pre-processed frames and fed to the proposed gradient-descent-based Spiking Neural Network (SNN) classifier. When training the SNN, the weight values must be optimized to reduce the training error, so a gradient descent optimization approach is adopted. To demonstrate the effectiveness of the proposed approach, the model is compared with conventional classification algorithms. Evaluated experimentally on various metrics, the proposed methodology achieves 94.55% accuracy and outperforms the other algorithms.
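As a rough illustration of the pipeline described above, the sketch below frames an audio signal and computes one temporal, one spectral, and one energy feature per frame using NumPy. The frame length, hop size, and synthetic input signal are assumptions for illustration only, and the gradient-descent-trained SNN classifier itself is not shown; this is not the authors' implementation.

```python
# Minimal sketch (assumed parameters, not the authors' code): frame an audio
# signal and extract temporal, spectral, and energy features per frame.
import numpy as np

def frame_signal(x, frame_len=1024, hop=512):
    """Split a 1-D signal into overlapping frames (one frame per row)."""
    n_frames = 1 + max(0, (len(x) - frame_len) // hop)
    return np.stack([x[i * hop : i * hop + frame_len] for i in range(n_frames)])

def extract_features(frames, sr):
    """Per-frame zero-crossing rate (temporal), spectral centroid (spectral), RMS (energy)."""
    # Temporal: fraction of adjacent samples whose sign changes
    zcr = np.mean(np.abs(np.diff(np.sign(frames), axis=1)) > 0, axis=1)
    # Spectral: magnitude-weighted mean frequency of each frame's spectrum
    mag = np.abs(np.fft.rfft(frames, axis=1))
    freqs = np.fft.rfftfreq(frames.shape[1], d=1.0 / sr)
    centroid = (mag * freqs).sum(axis=1) / (mag.sum(axis=1) + 1e-12)
    # Energy: root-mean-square amplitude of each frame
    rms = np.sqrt(np.mean(frames ** 2, axis=1))
    return np.column_stack([zcr, centroid, rms])

if __name__ == "__main__":
    sr = 22050                                 # assumed sampling rate
    t = np.arange(0, 2.0, 1.0 / sr)
    x = 0.5 * np.sin(2 * np.pi * 440 * t)      # synthetic stand-in for an audio clip
    feats = extract_features(frame_signal(x), sr)
    print(feats.shape)                         # (n_frames, 3) feature matrix
```

In practice, the resulting per-frame feature matrix would be passed to the classifier described in the abstract; the features chosen here are only common representatives of the temporal, spectral, and energy categories the paper names.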