Intelligent Stacked Ensemble Framework for Optimized Heart Disease Prediction Using Random Forest, XGBoost, MLP, and LightGBM with Logistic Regression as Meta-Estimator

ORDINARY APPLICATION

Published

Filed on 20 November 2024

Abstract

Heart disease is a leading cause of mortality globally, accounting for approximately 17.9 million deaths annually. Over the past three decades, global deaths from heart disease have increased by 60%, primarily due to limited healthcare resources. Early detection is essential for effective management through timely interventions such as counseling and medication. Prior research has identified key diagnostic factors for heart disease. These include genetic factors, lifestyle factors such as smoking habits and stress, demographic factors such as age and gender, and clinical indicators such as blood pressure, troponin levels, and electrocardiogram (ECG) readings. The present invention seeks to develop a machine learning (ML)-based predictive model that accurately identifies heart disease risk from a range of medical features. The invention evaluates several ML techniques for heart disease prediction and identifies the algorithm that gives the highest accuracy with the fewest misclassifications. Among the tested methods, a stacking classifier performed best. It uses Random Forest (RF), Multi-layer Perceptron (MLP), XGBoost, and LightGBM as base models and Logistic Regression (LR) as the meta-model, and it achieved the highest accuracy, 95.8%. This method outperformed the other techniques evaluated and provides a promising framework for the early prediction of heart disease, with the aim of reducing the global mortality associated with these conditions.

Patent Information

Application ID: 202441090057
Invention Field: BIO-MEDICAL ENGINEERING
Date of Application: 20/11/2024
Publication Number: 48/2024

Inventors

Name | Address | Country | Nationality
NOMULA NAGARJUNA REDDY | Department of Computer Science and Engineering, B V Raju Institute of Technology, Narsapur, Telangana - 502313 | India | India
THOUTIREDDY SHILPA | Department of Computer Science and Engineering, B V Raju Institute of Technology, Narsapur, Telangana - 502313 | India | India
LINGADALLY NIPUN | Department of Computer Science and Engineering, B V Raju Institute of Technology, Narsapur, Telangana - 502313 | India | India
MD UZAIR BABA | Department of Computer Science and Engineering, B V Raju Institute of Technology, Narsapur, Telangana - 502313 | India | India
NYALAKANTI RISHINDRA | Department of Computer Science and Engineering, B V Raju Institute of Technology, Narsapur, Telangana - 502313 | India | India

Applicants

Name | Address | Country | Nationality
B V Raju Institute of Technology | Department of Computer Science and Engineering, B V Raju Institute of Technology, Narsapur, Telangana - 502313 | India | India

Specification

Description:
2. FIELD OF THE INVENTION:
This invention relates to the field of medical diagnostics and artificial intelligence. Specifically, it pertains to a machine learning-based system that provides accurate and early prediction of heart disease by using an optimized stacked ensemble framework. This framework combines multiple machine learning algorithms to maximize prediction accuracy, robustness, and generalizability, enabling early diagnosis and intervention in clinical settings.
________________________________________
3. BACKGROUND OF THE INVENTION:
Heart disease is a leading cause of mortality worldwide, responsible for approximately 17.9 million deaths annually. Early diagnosis of heart disease can lead to timely medical intervention, significantly reducing the risk of severe complications. Traditional diagnostic methods rely on clinical assessments and physical examinations, which may not fully capture the risk factors associated with heart disease, particularly in asymptomatic patients.
Recent advancements in machine learning (ML) provide new opportunities for accurate and scalable diagnostic systems that can process large, multidimensional datasets, capturing complex relationships among various clinical and lifestyle factors. Ensemble learning techniques, which combine multiple algorithms to improve predictive performance, are particularly promising for heart disease prediction. However, challenges remain in balancing accuracy with generalizability and minimizing misclassification errors.
This invention addresses these challenges by proposing a unique stacked ensemble model incorporating Random Forest, XGBoost, Multi-Layer Perceptron (MLP), and LightGBM as base models, with logistic regression as a meta-estimator. This approach optimizes prediction accuracy, minimizes the error rate, and offers a reliable solution for early diagnosis of heart disease in diverse patient populations.
________________________________________
4. OBJECTIVES OF THE INVENTION:
• To Enhance Prediction Accuracy for Heart Disease.
• To Provide an Optimized Framework for Clinical Data Analysis.
• To Facilitate Early Diagnosis and Intervention.
• To Ensure Robustness and Generalizability Across Patient Populations.
• To Create a Scalable Solution for Clinical Deployment.
• To Reduce Computational Complexity While Maximizing Predictive Power.
• To Improve Healthcare Outcomes Through Data-Driven Decision Making.
________________________________________
5. SUMMARY OF THE INVENTION:
This invention provides a novel stacked ensemble framework for the early prediction of heart disease that combines multiple machine learning algorithms. The framework uses Random Forest, XGBoost, Multi-Layer Perceptron, and LightGBM as base models to capture different aspects of the clinical data. Logistic regression serves as a meta-estimator that aggregates the base models' predictions. The invention applies an advanced feature selection technique, combining Kullback-Leibler (KL) divergence and the Fisher Index, to identify the most relevant clinical features and so enhance the model's predictive power.
The key objectives and advantages of this invention are:
• Increased Diagnostic Accuracy: The framework achieves high accuracy in predicting heart disease by integrating multiple models, each contributing unique strengths.
• Enhanced Feature Selection: The feature selection process uses KL divergence and Fisher Index scores, averaged to identify the most informative features, which improves the model's efficiency and interpretability.
• Adaptability to Clinical Settings: The model's design allows for deployment in real-world healthcare environments, where early and accurate heart disease prediction is essential for improving patient outcomes.
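By way of non-limiting illustration, the averaged feature score can be written out as follows. The specification names the two criteria but gives no formulas, so the definitions below are the standard forms, and several details are assumptions: the KL direction (disease versus no disease), the use of histograms over a shared set of B bins, and the min-max normalization applied before averaging. Here P_j^{(1)} and P_j^{(0)} are the distributions of feature j among patients with and without heart disease. The quantities n_c, μ_{j,c}, and σ²_{j,c} are the count, mean, and variance of feature j in class c, and μ_j is its overall mean. Features are ranked by S_j and the top-ranked ones are retained.

```latex
\begin{aligned}
D_j &= \sum_{b=1}^{B} P_j^{(1)}(b)\,\log\frac{P_j^{(1)}(b)}{P_j^{(0)}(b)} \\
F_j &= \frac{\sum_{c\in\{0,1\}} n_c\,(\mu_{j,c}-\mu_j)^2}{\sum_{c\in\{0,1\}} n_c\,\sigma_{j,c}^2} \\
\tilde{D}_j &= \frac{D_j-\min_k D_k}{\max_k D_k-\min_k D_k}, \qquad
\tilde{F}_j = \frac{F_j-\min_k F_k}{\max_k F_k-\min_k F_k} \\
S_j &= \tfrac{1}{2}\bigl(\tilde{D}_j+\tilde{F}_j\bigr)
\end{aligned}
```

Normalizing before averaging matters because raw KL values and Fisher ratios lie on different scales. Without it, whichever criterion has the larger range would dominate the "average".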
________________________________________

6. DETAILED DESCRIPTION OF THE INVENTION:
I. System Overview
The proposed system consists of a stacked ensemble model optimized for predicting heart disease risk. The system operates by preprocessing patient data, selecting important features, training multiple machine learning models, and combining their outputs through a logistic regression meta-estimator. This structure leverages the strengths of each model type, enhancing both prediction accuracy and model interpretability.
II. Data Preprocessing and Feature Selection
Data preprocessing is a crucial step in preparing patient data for machine learning, as real-world datasets often contain noise, missing values, and categorical data requiring encoding. The data preprocessing steps in this system include:
• Handling Missing Values: Missing data is either imputed or removed to ensure dataset consistency.
• Encoding Categorical Variables: Clinical variables that are categorical in nature are converted into numerical formats to facilitate processing by machine learning models.
• Feature Scaling: Features are standardized to prevent any single variable from dominating due to its scale.
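By way of non-limiting illustration, the three preprocessing steps above can be sketched in plain Python. Several choices here are assumptions, because the specification does not name the imputation or encoding method: mean imputation for missing numeric values, one-hot encoding for categorical variables, and z-score standardization. The column names `age`, `chol`, and `cp` and all helper functions are hypothetical.

```python
from statistics import mean, pstdev


def impute_mean(column):
    """Replace missing entries (None) in a numeric column with the mean of the observed values."""
    observed = [v for v in column if v is not None]
    fill = mean(observed)
    return [fill if v is None else v for v in column]


def one_hot(column):
    """Map each category to a 0/1 indicator list; categories are sorted for a stable column order."""
    return {c: [1 if v == c else 0 for v in column] for c in sorted(set(column))}


def standardize(column):
    """z-score a numeric column: subtract the mean, divide by the population standard deviation."""
    mu, sigma = mean(column), pstdev(column)
    if sigma == 0:
        return [0.0] * len(column)  # a constant column carries no information
    return [(v - mu) / sigma for v in column]


def preprocess(rows, numeric, categorical):
    """Return a dict of processed columns: imputed and scaled numerics, one-hot categoricals."""
    columns = {}
    for name in numeric:
        columns[name] = standardize(impute_mean([r.get(name) for r in rows]))
    for name in categorical:
        for category, indicator in one_hot([r[name] for r in rows]).items():
            columns[f"{name}={category}"] = indicator
    return columns


# Hypothetical patient records: age and cholesterol (chol) are numeric, chest-pain type (cp) is categorical.
rows = [
    {"age": 63, "chol": 233, "cp": "typical"},
    {"age": None, "chol": 250, "cp": "atypical"},
    {"age": 41, "chol": 204, "cp": "typical"},
    {"age": 56, "chol": None, "cp": "asymptomatic"},
]
cols = preprocess(rows, numeric=["age", "chol"], categorical=["cp"])
print(sorted(cols))  # → ['age', 'chol', 'cp=asymptomatic', 'cp=atypical', 'cp=typical']
```

In a deployed pipeline, the imputation means and scaling parameters would be computed on the training split only and then reused unchanged on validation and new patient data. That way, no information from held-out records leaks into training. For brevity, this sketch fits them on whatever rows it is given.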
Feature selection combines two criteria: Kullback-Leibler (KL) divergence and the Fisher Index. Each criterion scores every feature independently. KL divergence measures how differently a feature is distributed in patients with and without heart disease, which reflects its probabilistic significance. The Fisher Index measures how far apart the class means of the feature are relative to its spread within each class, which reflects its statistical significance. The final feature score is the average of the KL divergence and Fisher Index scores, which gives a balanced assessment of feature relevance. The highest-scoring features are retained, with the aim of maximizing their contribution to the model's accuracy while discarding irrelevant or redundant information.
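A minimal sketch of the combined KL/Fisher scoring in plain Python, for a binary label y (1 = heart disease). The specification gives no formulas, so the following are assumptions:

- KL divergence is estimated between smoothed histograms of the feature in the disease and no-disease groups, over a shared 10-bin grid.
- The Fisher score is between-class scatter divided by within-class scatter.
- Each score is min-max normalized across features before the two are averaged.

All function names and the toy data are hypothetical.

```python
import math
from statistics import mean, pvariance


def histogram(values, lo, hi, bins, eps=1e-6):
    """Smoothed, normalized equal-width histogram of `values` on [lo, hi]."""
    width = (hi - lo) / bins if hi > lo else 1.0
    counts = [0] * bins
    for v in values:
        counts[min(int((v - lo) / width), bins - 1)] += 1
    total = len(values) + eps * bins
    return [(c + eps) / total for c in counts]  # eps keeps every bin > 0 so the log is defined


def kl_score(x, y, bins=10):
    """KL(P(x | y=1) || P(x | y=0)), with both histograms on the same bin grid."""
    lo, hi = min(x), max(x)
    p = histogram([v for v, t in zip(x, y) if t == 1], lo, hi, bins)
    q = histogram([v for v, t in zip(x, y) if t == 0], lo, hi, bins)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))


def fisher_score(x, y):
    """Between-class scatter over within-class scatter for a single feature."""
    mu = mean(x)
    between = within = 0.0
    for c in set(y):
        xc = [v for v, t in zip(x, y) if t == c]
        between += len(xc) * (mean(xc) - mu) ** 2
        within += len(xc) * pvariance(xc)
    return between / (within + 1e-12)  # small constant avoids division by zero


def min_max(scores):
    """Rescale a list of scores to [0, 1]; all-equal scores map to 0."""
    lo, hi = min(scores), max(scores)
    return [(s - lo) / (hi - lo) if hi > lo else 0.0 for s in scores]


def combined_scores(features, y, bins=10):
    """Average of the min-max-normalized KL and Fisher scores, keyed by feature name."""
    names = list(features)
    kl = min_max([kl_score(features[n], y, bins) for n in names])
    fi = min_max([fisher_score(features[n], y) for n in names])
    return {n: (k + f) / 2 for n, k, f in zip(names, kl, fi)}


def select_top_k(features, y, k, bins=10):
    """Names of the k features with the highest combined score."""
    scores = combined_scores(features, y, bins)
    return sorted(scores, key=scores.get, reverse=True)[:k]


# Toy data: x1 separates the classes, x2 has the same distribution in both classes.
y = [0, 0, 0, 0, 1, 1, 1, 1]
features = {"x1": [1, 2, 1, 2, 8, 9, 8, 9], "x2": [5, 3, 4, 6, 6, 4, 3, 5]}
print(combined_scores(features, y))  # → {'x1': 1.0, 'x2': 0.0}
print(select_top_k(features, y, k=1))  # → ['x1']
```

Both criteria score one feature at a time. Two strongly correlated features can therefore both rank highly, and an additional correlation check after ranking is one way to drop such duplicates.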
III. Model Architecture
A. Base Models
The system incorporates four diverse machine learning algorithms as base models, each trained independently on the selected features:
• Random Forest (RF): RF is an ensemble of decision trees. Each tree is trained on a bootstrap sample of the data, with a random subset of features considered at each split. Aggregating the outputs of many such trees reduces overfitting. RF provides robust predictive power on both categorical and continuous data.
• XGBoost: XGBoost is a gradient-boosting library that is highly effective on structured (tabular) data. It captures complex patterns by adding weak learners (shallow decision trees) one at a time, each fitted to correct the errors of the ensemble built so far.
• Multi-Layer Perceptron (MLP): An MLP is a feedforward neural network with one or more hidden layers of non-linear units. These layers let it learn intricate, non-linear relationships in the data.
• LightGBM: LightGBM is a gradient-boosting framework for decision trees, optimized for speed and memory efficiency. It uses histogram-based split finding and leaf-wise (best-first) tree growth, which can reach a lower loss than level-wise growth for the same number of leaves.
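Each model's hyperparameters are tuned to maximize accuracy on the training dataset, as estimated by cross-validation so that the tuning does not simply reward overfitting to the training records. This lets each model contribute its own strengths to the final prediction.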
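As a hedged sketch of tuning one base model, the example below runs a cross-validated grid search over a Random Forest with scikit-learn's `GridSearchCV`. The search space and the synthetic data from `make_classification` are assumptions chosen for illustration. The specification does not list which hyperparameters were tuned or what ranges were searched.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# Hypothetical search space; the specification does not list the tuned hyperparameters.
param_grid = {"n_estimators": [100, 200], "max_depth": [None, 5]}

# Synthetic stand-in for the training split of a clinical dataset.
X, y = make_classification(n_samples=300, n_features=13, n_informative=6, random_state=0)

search = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid,
    scoring="accuracy",
    cv=5,  # accuracy is averaged over held-out folds, not measured on the rows used for fitting
)
search.fit(X, y)
print(search.best_params_, round(search.best_score_, 3))
```

The same pattern applies to the other base models with their own grids. In a full pipeline, the search would be run on the training split only.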
B. Meta-Estimator (Logistic Regression)
The meta-estimator, logistic regression, is a linear model chosen for its efficiency and interpretability. It takes the base models' predictions as inputs and produces a single final prediction. Because it learns only one weight per base-model output plus an intercept, it can combine the outputs of the more complex base models with little risk of overfitting at the meta level. This balances accuracy against generalization.
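For illustration, the meta-estimator can be written in the standard logistic-regression form below. In this notation, p_m(x_i) is base model m's predicted probability of heart disease for patient i, where m = 1, ..., 4 indexes RF, XGBoost, MLP, and LightGBM. The true label is y_i ∈ {0, 1}, and w_0, ..., w_4 are the learned weights. Some details are assumptions, because the specification says only that the meta-estimator aggregates the base models' "predictions": the use of probabilities (rather than hard class labels) as meta-features, the optional L2 penalty of strength λ, and a 0.5 decision threshold on ŷ_i.

```latex
\begin{aligned}
z_i &= w_0 + \sum_{m=1}^{4} w_m\, p_m(\mathbf{x}_i) \\
\hat{y}_i &= \sigma(z_i) = \frac{1}{1 + e^{-z_i}} \\
\mathcal{L}(\mathbf{w}) &= -\sum_{i=1}^{N} \Bigl[ y_i \log \hat{y}_i + (1 - y_i) \log\bigl(1 - \hat{y}_i\bigr) \Bigr] + \frac{\lambda}{2} \sum_{m=1}^{4} w_m^2
\end{aligned}
```

Each fitted weight w_m indicates how much the meta-estimator relies on base model m, which is the source of the interpretability attributed to this layer.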
IV. Training and Testing Process
• Dataset Splitting: The data is divided into training and validation sets, typically in an 80/20 ratio. Because the validation set is held out, performance is measured on records the models were not trained on, which allows overfitting to be detected.
• Model Training: The base models are independently trained on the preprocessed dataset, with hyperparameter optimization performed for each model.
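• Stacking Ensemble Construction: After training, the predictions of the base models are used as inputs to the logistic regression meta-estimator, which is fitted to combine them into the final prediction. In a conventional stacking setup, these inputs are out-of-fold predictions obtained by cross-validation. This keeps the meta-estimator from learning from predictions that the base models made on their own training records.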
• Performance Evaluation: The model is evaluated on several metrics, including accuracy, precision, recall, and F1 score, ensuring a comprehensive assessment of predictive performance.
V. System Workflow
The system workflow consists of the following steps:
• Data Collection and Preprocessing: Raw data is collected, cleaned, and processed for modeling.
• Feature Selection: Important clinical and demographic features are selected based on combined KL divergence and Fisher Index scores.
• Model Training: The base models are independently trained, and a logistic regression meta-estimator is used to finalize the predictions.
• Deployment: The model is deployed in a clinical setting for real-time heart disease prediction, facilitating early intervention for patients at risk.
Claims:
• Claim 1: A system for heart disease prediction comprising a stacked ensemble framework, wherein Random Forest, XGBoost, MLP, and LightGBM are used as base models and logistic regression serves as the meta-estimator for final prediction.
• Claim 2: The system of claim 1, wherein the data preprocessing includes handling missing values, encoding categorical variables, and scaling features to ensure model compatibility.
• Claim 3: The system of claim 1, wherein feature selection is performed by computing and averaging feature scores from Kullback-Leibler (KL) divergence and Fisher Index techniques to identify the most relevant features for heart disease prediction.
• Claim 4: The system of claim 1, wherein the logistic regression meta-estimator combines the predictions from the base models, providing a final prediction with improved accuracy and robustness.
• Claim 5: The system of claim 1, further configured to be deployed in real-world clinical environments for early diagnosis of heart disease, enabling timely intervention and improving patient outcomes.

Documents

Name | Date
202441090057-COMPLETE SPECIFICATION [20-11-2024(online)].pdf | 20/11/2024
202441090057-DECLARATION OF INVENTORSHIP (FORM 5) [20-11-2024(online)].pdf | 20/11/2024
202441090057-FORM 1 [20-11-2024(online)].pdf | 20/11/2024
202441090057-REQUEST FOR EARLY PUBLICATION(FORM-9) [20-11-2024(online)].pdf | 20/11/2024

