MUSIC GENRE CLASSIFICATION SYSTEM BASED ON MACHINE LEARNING AND DEEP LEARNING

ORDINARY APPLICATION

Published

Filed on 7 November 2024

Abstract

The present invention discloses a music genre classification system utilizing machine learning and deep learning methodologies, specifically employing a Convolutional Neural Network (CNN). The system categorizes music types from audio files through a structured process involving dossier preprocessing, feature extraction via Wavelet Transform Analysis, and genre categorization. By integrating spectrogram and wavelet data, the system enhances the accuracy of genre differentiation. Additionally, it features a personalized recommendation system that tailors suggestions based on user preferences, promoting user engagement in music streaming services and digital libraries. Built on the scalable PyTorch framework, the model processes audio inputs in real-time, ensuring efficient classification and improved user experience. The system is evaluated using the GTZAN dataset, demonstrating superior performance compared to traditional methods. This innovation addresses limitations in existing music classification technologies while fostering music discovery and enhancing the overall listening experience. Accompanied Drawing [Fig. 1-4]

Patent Information

Application ID: 202411085337
Invention Field: COMPUTER SCIENCE
Date of Application: 07/11/2024
Publication Number: 47/2024

Inventors

Name | Address | Country | Nationality
Sachin Jain | Assistant Professor, Computer Science and Engineering, Ajay Kumar Garg Engineering College, Ghaziabad | India | India
Siddhi Agrawal | Computer Science and Engineering, Ajay Kumar Garg Engineering College, Ghaziabad | India | India
Sneha Jaiswal | Computer Science and Engineering, Ajay Kumar Garg Engineering College, Ghaziabad | India | India
Soumya Maheshwari | Computer Science and Engineering, Ajay Kumar Garg Engineering College, Ghaziabad | India | India
Vivek Nautiyal | Computer Science and Engineering, Ajay Kumar Garg Engineering College, Ghaziabad | India | India

Applicants

Name | Address | Country | Nationality
Ajay Kumar Garg Engineering College | 27th KM Milestone, Delhi - Meerut Expy, Ghaziabad, Uttar Pradesh 201015 | India | India

Specification

Description:
[001] The present invention pertains to the field of music genre classification and recommendation systems, specifically focusing on the categorization of music genres using machine learning and deep learning techniques. This invention addresses the need for efficient classification of music genres in contexts involving visual and audio content transmitted through various channels, including radio waves, as well as digital media and computer-based data storage and retrieval systems.
BACKGROUND OF THE INVENTION
[002] Background description includes information that may be useful in understanding the present disclosure. It is not an admission that any of the information provided herein is prior art or relevant to the presently claimed disclosure, or that any publication specifically or implicitly referenced is prior art.
[003] The classification of music genres has become increasingly important in the modern digital age, where vast amounts of audio data are generated and consumed daily. As the variety of musical genres expands, organizing and categorizing these audio files poses a significant challenge. Traditional methods of manually identifying and sorting music can be time-consuming and prone to error, particularly when dealing with overlapping characteristics among genres. The present invention addresses the complexity of audio data organization by providing an automated music genre classification system that categorizes music files into ten distinct genres, streamlining the process for listeners, artists, and digital platforms. By leveraging machine learning and deep learning techniques, this invention aims to enhance the accuracy and efficiency of music genre classification.
[004] In the realm of music genre classification, several approaches have been developed, with varying degrees of success. Existing state-of-the-art techniques, such as Music Information Retrieval (MIR), utilize machine learning algorithms to extract features from audio files, including Mel-Frequency Cepstral Coefficients (MFCCs) and spectral contrast. Deep learning models, particularly Convolutional Neural Networks (CNNs) and Recurrent Neural Networks (RNNs), have gained traction for their ability to identify complex genre patterns in spectrograms and audio sequences. Notable prior patents, such as US10657584B2, which focuses on real-time genre classification using deep neural networks, and US8199873B2, which emphasizes automatic genre classification based on audio content, represent significant advancements in this field.
[005] Despite these advancements, several shortcomings persist in the current methodologies for music genre classification. Traditional algorithms such as Support Vector Machines (SVM), K-Nearest Neighbors (KNN), and Decision Trees often struggle with high-dimensional data and require manual feature extraction, resulting in lower classification accuracy. Furthermore, while CNNs are widely recognized for their effectiveness, they can be computationally expensive and may misclassify similar genres, such as jazz and classical music. RNNs, on the other hand, are challenged by long-term dependencies and computational demands, leading to inefficiencies in real-time applications. Additionally, many existing systems are limited to single input types, primarily focusing on either spectrograms or audio waveforms, thereby reducing the richness of feature extraction.
[006] The present invention overcomes the limitations of the prior art through several innovative features. First, by utilizing a CNN for automated feature extraction from spectrograms, this system improves accuracy and scalability compared to traditional methods that rely on manual input. The integration of wavelet transforms and multi-modal inputs enhances genre distinction, allowing for more precise classification of similar genres. Furthermore, the invention implements an automated deep learning pipeline that minimizes manual bias, enabling real-time genre classification capabilities, an advancement over many existing batch-processing solutions.
[007] By incorporating personalized music recommendations based on user preferences, this system goes beyond mere classification, enhancing user engagement. Additionally, the use of wavelet transforms enhances robustness against noisy data, addressing a common limitation in existing technologies. Built on the flexible PyTorch framework, this system is scalable and adaptable to large datasets, supporting comprehensive evaluation metrics for a more holistic performance assessment. Ultimately, the proposed invention not only addresses the drawbacks of current systems but also supports academic research and music education, targeting a niche area that has been underexplored in the field of music classification.
SUMMARY OF THE INVENTION
[008] This section is provided to introduce certain objects and aspects of the present disclosure in a simplified form that are further described below in the detailed description. This summary is not intended to identify the key features or the scope of the claimed subject matter.
[009] The present invention discloses a novel music genre classification system that leverages advanced machine learning and deep learning techniques, specifically utilizing a Convolutional Neural Network (CNN). This system facilitates the categorization of music types from audio files by implementing a structured three-step process: dossier preprocessing, feature extraction, and categorization. The dossier preprocessing unit prepares raw audio data by converting it into a standardized format, removing noise, and ensuring consistency in sample rates. Following this, the feature extraction module employs Wavelet Transform Analysis to capture essential temporal and spectral characteristics of audio signals, enabling a robust representation of audio features. These features, derived from both spectrograms and wavelet transforms, serve as the input for the CNN, which classifies the music into distinct genres based on learned patterns and hierarchies.
[010] Furthermore, the invention incorporates a personalized recommendation system that tailors music suggestions based on users' preferences, thereby enhancing user engagement and satisfaction. Real-time processing capabilities enable instantaneous classification of audio inputs, streamlining the user experience in music streaming services and digital music libraries. The model is built on the PyTorch framework, ensuring scalability and adaptability to handle large datasets efficiently. Comprehensive performance evaluations are conducted using the GTZAN dataset, which comprises 1,000 audio tracks spanning ten genres, ensuring rigorous training and testing of the classification model. The system's ability to process multi-modal input, deliver automated genre predictions, and provide personalized recommendations sets it apart from existing methodologies, addressing the limitations of traditional music classification systems while fostering music discovery and exploration.
BRIEF DESCRIPTION OF DRAWINGS
[011] The accompanying drawings are included to provide a further understanding of the present disclosure, and are incorporated in, and constitute a part of this specification. The drawings illustrate exemplary embodiments of the present disclosure, and together with the description, serve to explain the principles of the present disclosure.
[012] In the figures, similar components, and/or features may have the same reference label. Further, various components of the same type may be distinguished by following the reference label with a second label that distinguishes among the similar components. If only the first reference label is used in the specification, the description is applicable to any one of the similar components having the same first reference label irrespective of the second reference label.
[013] Figs. 1-4 illustrate various schematic diagrams associated with the proposed system, in accordance with an embodiment of the present invention.
DETAILED DESCRIPTION OF THE INVENTION
[014] The following is a detailed description of embodiments of the disclosure depicted in the accompanying drawings. The embodiments are described in sufficient detail to clearly communicate the disclosure. However, the amount of detail offered is not intended to limit the anticipated variations of embodiments. On the contrary, the intention is to cover all modifications, equivalents, and alternatives falling within the spirit and scope of the present disclosure as defined by the appended claims.
[015] In the following description, numerous specific details are set forth in order to provide a thorough understanding of embodiments of the present invention. It will be apparent to one skilled in the art that embodiments of the present invention may be practiced without some of these specific details.
[016] Specific details are given in the following description to provide a thorough understanding of the embodiments. However, it will be understood by one of ordinary skill in the art that the embodiments may be practiced without these specific details. For example, circuits, systems, networks, processes, and other components may be shown as components in block diagram form in order not to obscure the embodiments in unnecessary detail. In other instances, well-known circuits, processes, algorithms, structures, and techniques may be shown without unnecessary detail to avoid obscuring the embodiments.
[017] Also, it is noted that individual embodiments may be described as a process that is depicted as a flowchart, a flow diagram, a data flow diagram, a structure diagram, or a block diagram. Although a flowchart may describe the operations as a sequential process, many of the operations can be performed in parallel or concurrently. In addition, the order of the operations may be re-arranged. A process is terminated when its operations are completed but could have additional steps not included in a figure. A process may correspond to a method, a function, a procedure, a subroutine, a subprogram, etc. When a process corresponds to a function, its termination can correspond to a return of the function to the calling function or the main function.
[018] The word "exemplary" and/or "demonstrative" is used herein to mean serving as an example, instance, or illustration. For the avoidance of doubt, the subject matter disclosed herein is not limited by such examples. In addition, any aspect or design described herein as "exemplary" and/or "demonstrative" is not necessarily to be construed as preferred or advantageous over other aspects or designs, nor is it meant to preclude equivalent exemplary structures and techniques known to those of ordinary skill in the art. Furthermore, to the extent that the terms "includes," "has," "contains," and other similar words are used in either the detailed description or the claims, such terms are intended to be inclusive in a manner similar to the term "comprising" as an open transition word without precluding any additional or other elements.
[019] Reference throughout this specification to "one embodiment" or "an embodiment" or "an instance" or "one instance" means that a particular feature, structure, or characteristic described in connection with the embodiment is included in at least one embodiment of the present disclosure. Thus, the appearances of the phrases "in one embodiment" or "in an embodiment" in various places throughout this specification are not necessarily all referring to the same embodiment. Furthermore, the particular features, structures, or characteristics may be combined in any suitable manner in one or more embodiments.
[020] In an embodiment of the invention and referring to Figures 1-4, the present invention relates to a novel system for music genre classification leveraging advanced machine learning and deep learning methodologies. This system is designed to automatically categorize music types from audio files transmitted through various media, facilitating efficient organization, retrieval, and recommendation processes in digital music libraries and streaming services. The classification system utilizes a three-step process: dossier preprocessing, feature extraction, and categorization, with a Convolutional Neural Network (CNN) architecture at its core. The significance of this invention lies not only in its innovative approach to music classification but also in its potential to enhance user experience through personalized recommendations and real-time processing capabilities.
[021] At the heart of this invention is the music genre classification system, which incorporates several key components and methodologies. The first component is the Dossier Preprocessing Unit, which is responsible for handling raw audio data. This unit processes the audio files to ensure they are suitable for analysis by converting them into a consistent format, removing noise, and standardizing the sample rate. The preprocessing unit ensures that the subsequent feature extraction process operates on clean and uniform audio data, thereby enhancing the overall performance of the system.
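By way of illustration only, the following is a minimal sketch of the kind of preprocessing this unit performs, assuming the torchaudio library; the target sample rate, mono downmix, and peak normalization are illustrative assumptions, not parameters disclosed in the specification.

```python
import torch
import torchaudio

TARGET_SR = 22050  # assumed target sample rate; the specification does not fix one

def preprocess(path: str) -> torch.Tensor:
    """Load an audio file, downmix to mono, resample, and peak-normalize."""
    waveform, sr = torchaudio.load(path)              # (channels, samples)
    waveform = waveform.mean(dim=0, keepdim=True)     # downmix to a single channel
    if sr != TARGET_SR:
        waveform = torchaudio.functional.resample(waveform, sr, TARGET_SR)
    peak = waveform.abs().max().clamp(min=1e-8)       # guard against silent files
    return waveform / peak                            # consistent amplitude range
```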
[022] Following preprocessing, the Feature Extraction Module comes into play. This module employs Wavelet Transform Analysis to capture both temporal and spectral characteristics of audio signals, which are crucial for distinguishing between different music genres. By applying wavelet transforms, the system effectively analyzes audio signals at various frequency scales, enabling it to extract rich features that represent the audio's structure and content. This dual perspective on audio signals, covering both the time domain and the frequency domain, provides a comprehensive understanding of the audio characteristics essential for accurate genre classification.
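A minimal sketch of wavelet-based feature extraction, assuming the PyWavelets library; the Daubechies-4 wavelet, the decomposition depth, and the per-sub-band statistics are illustrative choices rather than details disclosed in the specification.

```python
import numpy as np
import pywt

def wavelet_features(signal: np.ndarray, wavelet: str = "db4", level: int = 5) -> np.ndarray:
    """Multi-level discrete wavelet decomposition summarized by sub-band statistics."""
    coeffs = pywt.wavedec(signal, wavelet, level=level)   # approximation + detail bands
    feats = []
    for band in coeffs:
        # Mean, spread, and energy of each sub-band capture coarse temporal/spectral shape.
        feats.extend([band.mean(), band.std(), np.mean(band ** 2)])
    return np.asarray(feats, dtype=np.float32)            # 3 statistics x (level + 1) bands
```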
[023] The output of the feature extraction module is then fed into the Convolutional Neural Network (CNN). The CNN is specifically designed to leverage the extracted features from spectrograms and wavelet transforms. This architecture consists of multiple convolutional layers, pooling layers, and fully connected layers that work collaboratively to learn complex patterns and hierarchies within the audio data. Each convolutional layer applies various filters to the input data, capturing spatial hierarchies and local features that are critical for identifying genre-specific characteristics. The use of pooling layers reduces the dimensionality of the feature maps, thereby increasing the model's computational efficiency and resilience to overfitting.
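The specification does not disclose the exact layer configuration, so the following PyTorch sketch merely illustrates the described pattern of convolutional, pooling, and fully connected layers; channel counts and kernel sizes are assumptions.

```python
import torch
import torch.nn as nn

class GenreCNN(nn.Module):
    """Illustrative spectrogram classifier: stacked conv/pool blocks, then a linear head."""

    def __init__(self, n_genres: int = 10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),        # pool each feature map to a single value
        )
        self.classifier = nn.Linear(64, n_genres)

    def forward(self, spec: torch.Tensor) -> torch.Tensor:
        # spec: (batch, 1, freq_bins, time_frames), e.g. a log-mel spectrogram
        return self.classifier(self.features(spec).flatten(1))  # logits per genre
```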
[024] Multi-Modal Input Processing is a defining characteristic of this system, as it combines spectrogram data and wavelet-transformed data to form a rich dataset for the CNN. By integrating these two input types, the system can leverage complementary information, enhancing its ability to differentiate between similar genres effectively. This multi-faceted approach to data processing allows the CNN to perform a more nuanced analysis of the audio input, ultimately improving classification accuracy.
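One plausible realization of the multi-modal input described above is late fusion, sketched here under the assumption of a convolutional branch for the spectrogram and a small fully connected branch for the wavelet statistics, concatenated before the final classifier; the branch sizes are hypothetical.

```python
import torch
import torch.nn as nn

class FusionGenreNet(nn.Module):
    """Late-fusion sketch: CNN over spectrograms plus an MLP over wavelet statistics."""

    def __init__(self, n_wavelet_feats: int = 18, n_genres: int = 10):
        super().__init__()
        self.spec_branch = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),        # -> (batch, 32)
        )
        self.wavelet_branch = nn.Sequential(nn.Linear(n_wavelet_feats, 32), nn.ReLU())
        self.classifier = nn.Linear(32 + 32, n_genres)

    def forward(self, spec: torch.Tensor, wavelet_feats: torch.Tensor) -> torch.Tensor:
        fused = torch.cat([self.spec_branch(spec), self.wavelet_branch(wavelet_feats)], dim=1)
        return self.classifier(fused)                     # logits over the ten genres
```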
[025] To evaluate the performance of the proposed system, the GTZAN Dataset is utilized. This dataset comprises 1,000 audio tracks categorized across ten distinct genres, providing a robust framework for training and testing the model. By leveraging a well-established dataset, the invention not only ensures comprehensive training but also facilitates comparative evaluations against existing classification systems. The effectiveness of the model is gauged using various metrics, including accuracy, recall, and precision, which are calculated using a Confusion Matrix Evaluation framework.
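The confusion-matrix evaluation described above could be computed as in this sketch, assuming scikit-learn; macro averaging across the ten GTZAN genres is an illustrative choice.

```python
import numpy as np
from sklearn.metrics import accuracy_score, confusion_matrix, precision_score, recall_score

def evaluate(y_true: np.ndarray, y_pred: np.ndarray) -> dict:
    """Accuracy, macro precision/recall, and the full confusion matrix."""
    return {
        "accuracy": accuracy_score(y_true, y_pred),
        "precision": precision_score(y_true, y_pred, average="macro"),
        "recall": recall_score(y_true, y_pred, average="macro"),
        "confusion_matrix": confusion_matrix(y_true, y_pred),  # 10x10 for GTZAN
    }
```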
[026] One of the primary advantages of the present invention is its Automated Genre Prediction capability. This allows for real-time processing of audio inputs, making it feasible for the system to classify music genres instantly upon upload. This immediate feedback mechanism enhances the user experience, particularly in applications such as music streaming services where users expect quick results. By streamlining the classification process, the system supports efficient music discovery and personalized recommendations based on the identified genres.
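An end-to-end inference sketch of this kind of instant classification, reusing the hypothetical `preprocess` helper and `TARGET_SR` constant from the preprocessing sketch above; the mel-spectrogram front end and the genre list (the ten GTZAN labels) are assumptions.

```python
import torch
import torchaudio

GENRES = ["blues", "classical", "country", "disco", "hiphop",
          "jazz", "metal", "pop", "reggae", "rock"]       # the ten GTZAN genres

@torch.no_grad()
def classify(path: str, model: torch.nn.Module) -> str:
    """Preprocess one uploaded file and return the predicted genre label."""
    model.eval()
    waveform = preprocess(path)                            # from the sketch above
    spec = torchaudio.transforms.MelSpectrogram(sample_rate=TARGET_SR)(waveform)
    logits = model(spec.log1p().unsqueeze(0))              # (1, 1, n_mels, frames)
    return GENRES[logits.argmax(dim=1).item()]
```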
[027] Furthermore, the invention incorporates a Personalized Recommendation System that utilizes user preferences to tailor music suggestions based on predicted genres. This system actively engages users by analyzing their listening habits and adapting recommendations to align with their interests. Such personalization fosters a deeper connection between users and the music they enjoy, thereby enhancing engagement and satisfaction within digital music platforms.
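The specification does not detail the recommendation algorithm; purely as a hedged illustration, this sketch ranks catalog tracks by how often their predicted genre appears in a user's listening history. The `catalog` mapping and the history format are hypothetical.

```python
from collections import Counter

def recommend(history_genres: list[str], catalog: dict[str, str], k: int = 5) -> list[str]:
    """Rank catalog tracks by the user's affinity for each predicted genre."""
    affinity = Counter(history_genres)                     # plays per genre
    ranked = sorted(catalog, key=lambda track: affinity.get(catalog[track], 0), reverse=True)
    return ranked[:k]
```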
[028] The architecture of the system is designed to facilitate Real-Time Processing Capability, allowing for immediate classification and recommendation functionalities. This characteristic sets the invention apart from many existing systems, which predominantly rely on batch processing methods that can delay user interactions. Real-time processing not only improves user experience but also optimizes the overall workflow within music classification systems.
[029] Additionally, the invention addresses issues related to Robustness to Noisy Data. The incorporation of wavelet transforms significantly enhances the system's performance when faced with noisy audio inputs, which are common in real-world scenarios. By utilizing wavelet transforms, the system is better equipped to extract relevant features even in the presence of noise, thus ensuring reliable genre classification under diverse conditions.
[030] The entire system is built on the PyTorch Framework, known for its scalability and flexibility in handling large datasets. This choice of framework allows for efficient model training and adaptation to varying data sizes and complexities. The use of PyTorch also facilitates the integration of advanced deep learning techniques, enabling the implementation of more complex models that can further improve classification performance.
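A minimal PyTorch training-loop sketch consistent with the framework choice described above; the optimizer, loss function, and batching details are assumptions, since the specification does not disclose them.

```python
import torch
from torch.utils.data import DataLoader

def train_one_epoch(model: torch.nn.Module, loader: DataLoader,
                    optimizer: torch.optim.Optimizer, device: str = "cpu") -> float:
    """One pass over the training set; returns the mean cross-entropy loss."""
    criterion = torch.nn.CrossEntropyLoss()
    model.train()
    total = 0.0
    for specs, labels in loader:                  # (batch, 1, mel, time), (batch,)
        specs, labels = specs.to(device), labels.to(device)
        optimizer.zero_grad()
        loss = criterion(model(specs), labels)    # logits vs. integer genre labels
        loss.backward()
        optimizer.step()
        total += loss.item() * specs.size(0)
    return total / len(loader.dataset)
```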
[031] Table 1 below summarizes the interconnection and working of each component within the music genre classification system, highlighting the efficacy of their collaborative functioning:

Table 1: Interconnection of Components
[032] Through the collaborative functioning of these components, the system effectively addresses existing limitations in music genre classification methodologies. Traditional approaches often suffer from the need for manual feature extraction, high computational costs, and limited capabilities in noisy environments. The present invention alleviates these issues by employing automated feature extraction methods and integrating advanced deep learning techniques, leading to enhanced accuracy and efficiency in classification tasks.
[033] Additionally, the invention supports various applications across different domains. For instance, in Music Streaming Services, the system can automatically categorize and recommend songs based on their genres, significantly enhancing the user experience. In Digital Music Libraries, the automated tagging of songs by genre streamlines the organization and management of extensive music collections, making it easier for users to locate and enjoy their favorite tracks.
[034] The system is also valuable in Music Education, providing a resource for teaching music theory and genre characteristics. By categorizing different musical styles, educators can enhance students' understanding of music diversity and history. Furthermore, the invention aids in Research and Analysis by offering a robust framework for genre classification studies, facilitating academic exploration in musicology and audio analysis.
[035] In accordance with another embodiment of the present invention, the system can assist industry professionals in analyzing trends and preferences in music consumption. By providing insights into genre popularity and user preferences, stakeholders can make informed decisions regarding marketing strategies and content curation.
[036] In accordance with another embodiment of the present invention, the invention presents opportunities for the integration of Recurrent Neural Networks (RNNs), such as GRU or LSTM models, which could further improve the system's ability to classify music based on time-based features, including rhythm and melody. Additionally, efforts can be directed toward curating high-quality datasets that reduce redundancy and ensure broader musical diversity for improved training outcomes.
[037] In conclusion, this music genre classification system represents a significant advancement in the field of audio analysis and music information retrieval. By combining novel hardware and software components, employing state-of-the-art deep learning techniques, and prioritizing user experience through personalization and real-time capabilities, this invention effectively addresses the shortcomings of prior arts. Its comprehensive design not only improves the accuracy and efficiency of music genre classification but also fosters user engagement and discovery, ultimately enriching the music listening experience.
Claims:
1. A music genre classification system, comprising:
a) a dossier preprocessing unit configured to preprocess audio files by converting them into a consistent format, removing noise, and standardizing the sample rate;
b) a feature extraction module utilizing wavelet transform analysis to extract temporal and spectral characteristics of audio signals;
c) a convolutional neural network (CNN) that receives extracted features from the feature extraction module and processes these features to classify the music into distinct genres;
d) a personalized recommendation system that tailors music suggestions to user preferences based on classified genres; and
e) a real-time processing capability allowing instant genre classification upon audio input.
2. The music genre classification system as claimed in Claim 1, wherein the feature extraction module combines spectrogram data with wavelet-transformed data to provide multi-modal input for the CNN, enhancing genre differentiation accuracy.
3. The music genre classification system as claimed in Claim 1, wherein the convolutional neural network is implemented using the PyTorch framework, enabling scalability and efficient training on large datasets.
4. The music genre classification system as claimed in Claim 1, wherein the dossier preprocessing unit employs advanced noise reduction techniques to improve the quality of audio inputs before feature extraction.
5. The music genre classification system as claimed in Claim 1, further including an automated genre prediction module that classifies music genres instantly upon audio upload, enhancing user experience in music streaming services.
6. The music genre classification system as claimed in Claim 1, wherein the personalized recommendation system analyzes user listening habits to adapt music suggestions, fostering user engagement and satisfaction.
7. The music genre classification system as claimed in Claim 1, wherein the performance evaluation of the CNN is conducted using a confusion matrix to measure classification accuracy, recall, and precision.
8. The music genre classification system as claimed in Claim 1, further including a user-friendly interface that allows users to upload audio files easily and receive classification results promptly.
9. The music genre classification system as claimed in Claim 1, wherein the system is robust to noisy data inputs due to the incorporation of wavelet transforms, which facilitate effective feature extraction even in adverse conditions.
10. The music genre classification system as claimed in Claim 1, wherein the integration of recurrent neural networks (RNNs) is proposed for future enhancements to improve classification based on temporal features such as rhythm and melody.

Documents

Name | Date
202411085337-COMPLETE SPECIFICATION [07-11-2024(online)].pdf | 07/11/2024
202411085337-DECLARATION OF INVENTORSHIP (FORM 5) [07-11-2024(online)].pdf | 07/11/2024
202411085337-DRAWINGS [07-11-2024(online)].pdf | 07/11/2024
202411085337-FORM 1 [07-11-2024(online)].pdf | 07/11/2024
202411085337-FORM 18 [07-11-2024(online)].pdf | 07/11/2024
202411085337-FORM-9 [07-11-2024(online)].pdf | 07/11/2024
202411085337-REQUEST FOR EARLY PUBLICATION(FORM-9) [07-11-2024(online)].pdf | 07/11/2024
