A METHOD FOR EFFICIENT TEXT CLASSIFICATION USING LONG SHORT-TERM MEMORY (LSTM) NEURAL NETWORKS


ORDINARY APPLICATION

Published

Filed on 9 November 2024

Abstract

The invention presents a method for efficient multi-label text classification using Long Short-Term Memory (LSTM) neural networks. The method involves preprocessing text data, applying GloVe word embeddings, and utilizing LSTM architecture with an integrated attention mechanism for enhanced classification accuracy. The model is fine-tuned and tested through eight configurations, with the sixth configuration yielding an accuracy rate of 95.17%. Designed for scalability, the system handles large-scale datasets efficiently, achieving high precision, recall, and F1-scores across various text classification tasks such as news categorization and sentiment analysis. The approach is adaptable for other NLP applications, offering a robust and efficient solution for text categorization.

Patent Information

Application ID: 202411086309
Invention Field: COMPUTER SCIENCE
Date of Application: 09/11/2024
Publication Number: 47/2024

Inventors

Name | Address | Country | Nationality
Mr. Neeraj Sirohi | Department of IT, IMS Engineering College, Ghaziabad, Uttar Pradesh, India | India | India
Lakshya Gupta | Department of IT, IMS Engineering College, Ghaziabad, Uttar Pradesh, India | India | India
Mahima Yadav | Department of IT, IMS Engineering College, Ghaziabad, Uttar Pradesh, India | India | India
Aman Chaudhary | Department of IT, IMS Engineering College, Ghaziabad, Uttar Pradesh, India | India | India
Anushka Chaudhary | Department of IT, IMS Engineering College, Ghaziabad, Uttar Pradesh, India | India | India

Applicants

Name | Address | Country | Nationality
IMS Engineering College | National Highway 24, Near Dasna, Adhyatmik Nagar, Ghaziabad, Uttar Pradesh - 201015 | India | India

Specification

Description:
[0001] The present invention pertains to the field of Artificial Intelligence (AI) and Natural Language Processing (NLP). It specifically focuses on text classification techniques leveraging deep learning models, particularly Long Short-Term Memory (LSTM) neural networks. These techniques are used to classify text documents based on their content, enabling efficient categorization across various applications such as news articles, reviews, emails, social media posts, and more. The invention also applies to industries such as media, finance, customer support, and e-commerce, where accurate text classification is crucial for automating and improving business processes.

Background of the Invention
[0002] Text classification, a fundamental task in NLP, is the process of assigning predefined categories to text documents based on their content. It is essential for a wide range of applications, including sentiment analysis, spam detection, content filtering, and automated tagging. Traditional approaches to text classification, such as Naive Bayes, Decision Trees, and Support Vector Machines (SVMs), have proven effective when applied to small or moderately sized datasets. However, these methods face significant limitations when applied to larger datasets due to issues such as high feature dimensionality, sparse data, and an inability to capture long-term dependencies in textual data.
[0003] In recent years, Deep Learning models, particularly Recurrent Neural Networks (RNNs) and their variant, LSTM, have emerged as effective solutions for handling sequential data. LSTMs address the vanishing gradient problem often encountered in RNNs, allowing them to maintain and update long-term dependencies in data sequences. By leveraging word embeddings like Global Vectors for Word Representation (GloVe), which convert words into continuous vector representations capturing semantic relationships, LSTMs can better understand the context and nuances of text. This invention builds on these developments by proposing an optimized multi-label LSTM-based model for high-accuracy text classification, demonstrating its application in the context of classifying news articles into predefined categories.

Objects of the Invention
[0004] An object of the present invention is to create an advanced LSTM-based text classification system that efficiently handles multi-label classification tasks, enhancing the accuracy of categorization for large-scale text datasets.
[0005] Another object of the present invention is to incorporate word embedding techniques, specifically Global Vectors for Word Representation (GloVe), to convert text into continuous vector representations.
[0006] Yet another object of the present invention is to optimize the LSTM model for large-scale datasets and address the challenges of long-range dependencies in textual data.
[0007] Another object of the present invention is to demonstrate high accuracy, precision, recall, and F1-score averages in text categorization tasks.
[0008] Another object of the present invention is to provide a scalable and adaptable deep learning model for various NLP applications.

Summary of the Invention
[0009] The invention provides a method and system for efficient text classification using a deep learning architecture based on LSTM neural networks. The proposed method leverages the strengths of LSTM models to handle the sequential nature of text data, making it particularly suitable for tasks requiring the understanding of long-term dependencies and contextual information.
[0010] The invention begins with the preprocessing of text data, where input text documents are cleaned, tokenized, and converted into sequences for compatibility with the LSTM model. Word embedding techniques, specifically GloVe, are employed to transform words into continuous vector representations of 300 dimensions, capturing their semantic and contextual meanings. These embeddings are fed into the LSTM layers, which are responsible for processing the sequential information and identifying patterns.
[0011] The model includes an attention mechanism, which highlights the most relevant parts of the input sequence, improving classification accuracy by allowing the model to focus on important aspects of the text. The LSTM architecture is fine-tuned and tested through multiple configurations, with eight variations analyzed to identify the most effective setup. The sixth model demonstrates the highest classification accuracy of 95.17%, with optimal precision, recall, and F1-score averages.
[0012] The method is suitable for various applications, including news categorization, spam filtering, and sentiment analysis. It is adaptable to other NLP tasks, leveraging the flexibility and scalability of LSTM networks combined with word embeddings. In this respect, before explaining at least one object of the invention in detail, it is to be understood that the invention is not limited in its application to the details of the set of rules and to the arrangements of the various models set forth in the following description or illustrated in the drawings. The invention is capable of other objects and of being practiced and carried out in various ways, according to the needs of the industry. Also, it is to be understood that the phraseology and terminology employed herein are for the purpose of description and should not be regarded as limiting.
[0013] These together with other objects of the invention, along with the various features of novelty which characterize the invention, are pointed out with particularity in the disclosure. For a better understanding of the invention, its operating advantages and the specific objects attained by its uses, reference should be made to the accompanying drawings and descriptive matter in which there are illustrated preferred embodiments of the invention.

Detailed description of the Invention
[0014] An embodiment of this invention, illustrating its features, will now be described in detail. The words "comprising," "having," "containing," and "including," and other forms thereof are intended to be equivalent in meaning and be open-ended in that an item or items following any one of these words is not meant to be an exhaustive listing of such item or items or meant to be limited to only the listed item or items.
[0015] The terms "first," "second," and the like, herein do not denote any order, quantity, or importance, but rather are used to distinguish one element from another, and the terms "a" and "an" herein do not denote a limitation of quantity, but rather denote the presence of at least one of the referenced item.

[0016] The invention provides a method for efficient and high-accuracy multi-label text classification utilizing Long Short-Term Memory (LSTM) neural networks integrated with Global Vectors for Word Representation (GloVe) word embeddings. The method is designed to handle large-scale text datasets, ensuring optimal performance by capturing long-term dependencies and semantic relationships within the text data. The detailed steps of the invention are described as follows:
1. Data Preprocessing:
[0017] The first stage involves preparing the text data for processing. This step includes:
[0018] Cleaning the Text: Removing unnecessary characters such as punctuation, numbers, and special symbols. This helps in reducing noise in the dataset and focuses the model's learning on meaningful textual information.
[0019] Lowercasing: Converting all text to lowercase ensures uniformity, reducing the complexity arising from the same word appearing in different cases (e.g., "Apple" vs. "apple").
[0020] Tokenization: The text is broken down into smaller units called tokens, usually words or phrases. Tokenization converts each document into a sequence of tokens that can be processed further.
[0021] Removing Stop Words: Common words like "and," "is," "the," which do not contribute much meaning to the text classification, are removed. This step helps reduce the dimensionality of the input data.
[0022] Stemming and Lemmatization: These techniques are optionally applied to reduce words to their root forms, standardizing variations of the same word (e.g., "running," "runs," and "ran" to "run").
[0023] After these preprocessing steps, each document is represented as a clean sequence of meaningful words, ready for further transformation.
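By way of a non-limiting illustration, the preprocessing pipeline described above may be sketched as follows. The sketch assumes Python with NLTK supplying the stop-word list and lemmatizer; the function name `preprocess`, the regular expression, and the whitespace tokenizer are illustrative choices rather than requirements of the invention.

```python
import re
import nltk
from nltk.corpus import stopwords
from nltk.stem import WordNetLemmatizer

nltk.download("stopwords", quiet=True)
nltk.download("wordnet", quiet=True)

STOP_WORDS = set(stopwords.words("english"))
lemmatizer = WordNetLemmatizer()

def preprocess(document: str) -> list[str]:
    # Cleaning: strip punctuation, digits, and special symbols
    text = re.sub(r"[^a-zA-Z\s]", " ", document)
    # Lowercasing for uniformity ("Apple" and "apple" become one token)
    text = text.lower()
    # Tokenization: split the cleaned text into word tokens
    tokens = text.split()
    # Stop-word removal and lemmatization to root forms
    return [lemmatizer.lemmatize(t) for t in tokens if t not in STOP_WORDS]

# e.g. preprocess("The runners were running quickly!") -> ['runner', 'running', 'quickly']
```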
2. Word Embedding:
[0024] An essential part of the model architecture is the word embedding layer, which transforms the pre-processed text into a format suitable for neural network input:
[0025] The GloVe (Global Vectors for Word Representation) embedding technique is employed to convert words into continuous vector representations. GloVe embeddings capture semantic and contextual relationships between words, allowing the model to understand similarity and context. For example, words like "king" and "queen" or "car" and "vehicle" have similar vector representations, indicating their relationship.
[0026] The GloVe embeddings used in this model are 300-dimensional, meaning each word is represented as a vector of 300 values. This dimensionality provides a balance between computational efficiency and capturing the semantic richness of the text data.
[0027] The embeddings are pre-trained on large corpora, which ensures that the model starts with a robust understanding of word meanings and relationships, even before being trained on the specific text dataset.
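A non-limiting sketch of constructing the 300-dimensional GloVe embedding matrix is given below. It assumes a pre-trained GloVe text file (for example, the publicly available glove.6B.300d.txt) and a `word_index` mapping produced by the tokenizer; the function names and file handling are illustrative.

```python
import numpy as np

EMBEDDING_DIM = 300

def load_glove(path: str) -> dict[str, np.ndarray]:
    # Each line of the GloVe file is "<word> <300 float values>"
    vectors = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            parts = line.rstrip().split(" ")
            vectors[parts[0]] = np.asarray(parts[1:], dtype="float32")
    return vectors

def build_embedding_matrix(word_index: dict[str, int],
                           glove: dict[str, np.ndarray]) -> np.ndarray:
    # Row i holds the GloVe vector of the word whose integer index is i;
    # out-of-vocabulary words keep a zero vector
    matrix = np.zeros((len(word_index) + 1, EMBEDDING_DIM), dtype="float32")
    for word, idx in word_index.items():
        if word in glove:
            matrix[idx] = glove[word]
    return matrix
```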
3. LSTM Layers:
[0028] The core component of the architecture is the LSTM network, which is designed to handle sequential data by maintaining and updating long-term dependencies across time steps:
[0029] LSTM Architecture: The model consists of one or more LSTM layers, each containing a sequence of LSTM cells. These cells are designed to manage the flow of information through the network using a gating mechanism (input, forget, and output gates). These gates control which information is added to the cell state, which information is discarded, and what part of the information is passed on to the next layer or time step.
[0030] The LSTM layers are configured with a specified number of hidden units, which determines the capacity of the network to learn complex patterns. The hidden units capture dependencies in the text sequences, allowing the model to recognize context and relationships between words, even those far apart in the sequence.
[0031] Mitigating the Vanishing Gradient Problem: Traditional Recurrent Neural Networks (RNNs) often suffer from the vanishing gradient problem, where the gradients become too small during backpropagation, preventing effective learning of long-term dependencies. The LSTM architecture addresses this by retaining information through its gating mechanisms, enabling the network to capture long-term dependencies and effectively process lengthy sequences.
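The embedding and stacked LSTM layers may, for example, be expressed in Keras as in the following sketch. TensorFlow/Keras, the padded sequence length, and the hidden-unit counts (128 and 64) are assumptions for illustration; `embedding_matrix` is the GloVe matrix produced in the preceding sketch.

```python
from tensorflow.keras import layers, initializers

MAX_LEN = 200  # padded sequence length (assumption)

inputs = layers.Input(shape=(MAX_LEN,))
# Embedding layer initialized with the pre-trained 300-d GloVe vectors and frozen
x = layers.Embedding(input_dim=embedding_matrix.shape[0], output_dim=300,
                     embeddings_initializer=initializers.Constant(embedding_matrix),
                     trainable=False)(inputs)
# Stacked LSTM layers whose input, forget, and output gates retain long-range
# context; return_sequences=True keeps one output per time step so that the
# attention layer can weight individual words
x = layers.LSTM(128, return_sequences=True)(x)
x = layers.LSTM(64, return_sequences=True)(x)
```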
4. Attention Mechanism:
[0032] To enhance the model's ability to focus on relevant information in the text, an attention mechanism is integrated into the LSTM layers:
[0033] The attention mechanism calculates weights for each word in the input sequence, allowing the network to prioritize important parts of the text while minimizing the influence of less relevant parts. For instance, in a news article about politics, words like "election," "candidate," and "policy" may receive higher weights.
[0034] This mechanism enables the model to dynamically adjust its focus, improving classification accuracy, especially for long documents where only certain segments are critical for determining the category.
[0035] The attention layer outputs a weighted sum of the LSTM outputs, emphasizing significant words and enhancing the network's interpretability and performance.
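One possible realization of such an attention layer is the additive formulation sketched below; the invention does not fix a particular attention variant, so the scoring function and the class name `SimpleAttention` are illustrative.

```python
import tensorflow as tf
from tensorflow.keras import layers

class SimpleAttention(layers.Layer):
    def build(self, input_shape):
        # One scoring weight per LSTM output dimension
        self.w = self.add_weight(name="att_weight",
                                 shape=(input_shape[-1], 1),
                                 initializer="glorot_uniform",
                                 trainable=True)

    def call(self, lstm_outputs):
        # Score every time step and normalize the scores into attention weights
        scores = tf.matmul(tf.tanh(lstm_outputs), self.w)     # (batch, steps, 1)
        weights = tf.nn.softmax(scores, axis=1)               # (batch, steps, 1)
        # Weighted sum of the LSTM outputs emphasizes the significant words
        return tf.reduce_sum(weights * lstm_outputs, axis=1)  # (batch, units)
```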
5. Output Layer and Multi-Label Classification:
[0036] The final layer of the model is a dense layer with a softmax activation function, used for multi-label classification. This layer takes the output from the LSTM and attention layers and assigns probability scores to each category:
[0037] Softmax Activation: The softmax function outputs a probability distribution over all possible categories, where each category receives a score between 0 and 1, representing the likelihood that the text belongs to that category. The model is trained to maximize the accuracy of these predictions, ensuring that the highest probability corresponds to the correct label.
[0038] The multi-label nature of the classification means that the model can assign multiple categories to a single text document if applicable. For example, a news article might be categorized as both "politics" and "economics" if it covers relevant topics from both domains.
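A sketch completing the network with attention pooling, dropout, and the dense output layer follows. The softmax output is shown as described above; the number of categories and the dropout rate are illustrative assumptions (a sigmoid output is a common alternative when a document may carry several labels simultaneously).

```python
from tensorflow.keras import Model, layers

NUM_CLASSES = 5  # number of target categories (assumption)

context = SimpleAttention()(x)            # weighted sum of LSTM outputs
context = layers.Dropout(0.3)(context)    # dropout against overfitting
outputs = layers.Dense(NUM_CLASSES, activation="softmax")(context)
model = Model(inputs, outputs)
```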


6. Model Optimization and Training:
[0039] The model is optimized through hyperparameter tuning to achieve the best performance. Several parameters are adjusted, including:
[0040] Number of LSTM Layers: The depth of the network, determined by the number of LSTM layers, is optimized to balance the model's capacity and the risk of overfitting.
[0041] Learning Rate: The learning rate controls how quickly the model updates its weights during training. A suitable learning rate is chosen to ensure convergence without overshooting the optimal values.
[0042] Batch Size: The size of batches used during training affects how frequently the model updates its weights. Smaller batches allow for more frequent updates, while larger batches provide more stable gradients.
[0043] Dropout Rate: Dropout is used to prevent overfitting by randomly deactivating neurons during training. The dropout rate is adjusted to find the right balance between generalization and retention of relevant information.
[0044] The training process involves feeding the model large-scale datasets with labelled text documents. The LSTM model is trained using backpropagation and gradient descent, with the objective of minimizing the cross-entropy loss function, which measures the difference between the predicted and actual labels.
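An illustrative compile-and-fit sketch using the hyperparameters discussed above is shown below. The learning rate, batch size, and epoch count are example values rather than the tuned configuration claimed by the invention; `X_train` and `y_train` denote the padded token sequences and one-hot label matrix of the labelled dataset.

```python
from tensorflow.keras.optimizers import Adam

model.compile(optimizer=Adam(learning_rate=1e-3),   # learning rate
              loss="categorical_crossentropy",      # cross-entropy loss
              metrics=["accuracy"])

history = model.fit(X_train, y_train,
                    validation_split=0.1,
                    batch_size=64,                   # batch size
                    epochs=10)
```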


7. Evaluation and Performance Metrics:
[0045] The model is evaluated based on several key metrics: accuracy, precision, recall, and F1-score:
[0046] Accuracy: Measures the overall correctness of the model's predictions. The optimized configuration of the LSTM model achieves an accuracy of 95.17%, indicating high performance in categorizing text correctly.
[0047] Precision: Represents the proportion of correctly predicted positive observations out of all predicted positives, reflecting the model's ability to avoid false positives.
[0048] Recall: Indicates the proportion of correctly predicted positive observations out of all actual positives, measuring the model's ability to identify all relevant cases.
[0049] F1-Score: The harmonic mean of precision and recall, providing a balanced measure of the model's performance. The model demonstrates an F1-score average of over 95%, showcasing its effectiveness in handling multi-label text classification.
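These metrics may, for instance, be computed with scikit-learn as sketched below. Macro averaging and the single-label argmax decoding are illustrative choices, not those fixed by the invention; `X_test` and `y_test` denote held-out sequences and their one-hot labels.

```python
import numpy as np
from sklearn.metrics import accuracy_score, precision_recall_fscore_support

y_pred = np.argmax(model.predict(X_test), axis=1)
y_true = np.argmax(y_test, axis=1)

accuracy = accuracy_score(y_true, y_pred)
precision, recall, f1, _ = precision_recall_fscore_support(
    y_true, y_pred, average="macro")
print(f"accuracy={accuracy:.4f}  precision={precision:.4f}  "
      f"recall={recall:.4f}  f1={f1:.4f}")
```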
8. Visual Output Analysis:
[0050] The LSTM model's predictions are analyzed visually to ensure they are well-fitted to the dataset. The model produces visual outputs such as confusion matrices and attention heatmaps that highlight the model's focus during the classification process. These visual tools confirm the accuracy and reliability of the LSTM with GloVe features, indicating that the model is a good fit for the data.
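A non-limiting sketch of the confusion-matrix visualization is given below; matplotlib and seaborn are assumed tooling, and `y_true`/`y_pred` are the arrays computed in the evaluation sketch above.

```python
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.metrics import confusion_matrix

cm = confusion_matrix(y_true, y_pred)
sns.heatmap(cm, annot=True, fmt="d", cmap="Blues")
plt.xlabel("Predicted category")
plt.ylabel("True category")
plt.title("Confusion matrix of the LSTM + GloVe classifier")
plt.show()
```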
9. Scalability and Adaptability:
[0051] The invention is designed for scalability, capable of handling different text classification tasks beyond the initial dataset used for development. The architecture is flexible enough to be adapted for applications such as sentiment analysis, spam detection, email classification, and content recommendation.
[0052] The LSTM model, coupled with GloVe embeddings, forms a powerful tool for various NLP tasks, demonstrating adaptability for other language processing requirements.
[0053] This detailed description outlines the invention's components, processes, and configurations, providing a robust solution for efficient and high-accuracy text classification using LSTM and GloVe.
[0054] The foregoing descriptions of specific embodiments of the present invention have been presented for purposes of illustration and description. They are not intended to be exhaustive or to limit the present invention to the precise forms disclosed, and obviously many modifications and variations are possible in light of the above teaching. The embodiments were chosen and described to best explain the principles of the present invention and its practical application, thereby enabling others skilled in the art to best utilize the present invention and various embodiments with various modifications as are suited to the particular use contemplated. It is understood that various omissions and substitutions of equivalents are contemplated as circumstances may suggest or render expedient, but such are intended to cover the application or implementation without departing from the spirit or scope of the claims of the present invention.
Claims:
1. A method for multi-label text classification, comprising the steps of:
a) preprocessing input text data to clean, tokenize, and remove stop words;
b) generating word embeddings using Global Vectors for Word Representation (GloVe) to create continuous vector representations of the pre-processed text;
c) feeding the generated embeddings into a Long Short-Term Memory (LSTM) neural network comprising one or more LSTM layers for capturing sequential dependencies;
d) applying an attention mechanism to prioritize important segments of the input text;
e) producing a probability distribution for multiple labels through a multi-label output layer;
f) training the LSTM network using backpropagation and gradient descent to minimize prediction errors based on the labelled dataset.

2. A system for multi-label text classification, comprising:
a data preprocessing module configured to clean and tokenize input text data;
a word embedding module utilizing Global Vectors for Word Representation (GloVe) to generate continuous vector representations of words;
an LSTM neural network comprising one or more LSTM layers configured to capture sequential dependencies and contextual information within the input text;
an attention mechanism integrated into the LSTM layers to highlight relevant portions of the input text for improved classification accuracy;
a multi-label output layer configured to assign probability scores to multiple categories for each text document based on the outputs of the LSTM layers;
a training module for optimizing the parameters of the LSTM network using a labelled dataset to minimize prediction error.

3. The method as claimed in claim 1, wherein the preprocessing step includes stemming and lemmatization for standardizing word forms.

4. The method as claimed in claim 1, wherein the attention mechanism is applied after the LSTM layers to refine the output based on contextual relevance.

5. The method as claimed in claim 1, wherein the training step involves evaluating model performance using metrics such as accuracy, precision, recall, and F1-score.
6. The method as claimed in claim 1, further comprising a post-processing step to visualize classification results through confusion matrices and attention heatmaps.

7. The system as claimed in claim 2, wherein the word embedding module employs pre-trained GloVe embeddings of 300 dimensions.

8. The system as claimed in claim 2, wherein the attention mechanism dynamically adjusts weights based on the importance of individual words in the context of the input text.

9. The system as claimed in claim 2, wherein the training module employs hyperparameter tuning to optimize the performance of the LSTM neural network.

Documents

Name | Date
202411086309-COMPLETE SPECIFICATION [09-11-2024(online)].pdf | 09/11/2024
202411086309-DECLARATION OF INVENTORSHIP (FORM 5) [09-11-2024(online)].pdf | 09/11/2024
202411086309-FORM 1 [09-11-2024(online)].pdf | 09/11/2024
202411086309-FORM-9 [09-11-2024(online)].pdf | 09/11/2024
202411086309-REQUEST FOR EARLY PUBLICATION(FORM-9) [09-11-2024(online)].pdf | 09/11/2024
