ADAPTIVE DATA QUALITY-DRIVEN SCALING AND ENCODING SYSTEM

ORDINARY APPLICATION

Published

Filed on 6 November 2024

Abstract

The invention introduces an adaptive data quality-driven scaling and encoding system designed for machine learning and artificial intelligence applications. This innovative system dynamically adjusts preprocessing techniques based on real-time assessments of data quality, addressing common challenges such as outliers, missing values, and distribution changes. By continuously monitoring data characteristics and leveraging feedback from model performance, the system optimizes scaling and encoding methods to enhance model accuracy and efficiency. Its modular architecture ensures scalability and compatibility with various machine learning frameworks. Furthermore, the explainable decisions interface provides transparency in preprocessing choices, allowing users to understand and adjust techniques based on domain knowledge. This invention significantly improves the effectiveness of machine learning models by aligning data transformations with the current state of the data, thereby overcoming the limitations of traditional static preprocessing methods.

Patent Information

Application ID: 202411084983
Invention Field: COMPUTER SCIENCE
Date of Application: 06/11/2024
Publication Number: 47/2024

Inventors

Name | Address | Country | Nationality
Prikshat Kumar Angra | Lovely Professional University, Delhi-Jalandhar GT Road, Phagwara - 144411 | India | India
Gagandeep Singh Cheema | Lovely Professional University, Delhi-Jalandhar GT Road, Phagwara - 144411 | India | India
Ashwani Kumar | Lovely Professional University, Delhi-Jalandhar GT Road, Phagwara - 144411 | India | India
Pritpal Singh | Lovely Professional University, Delhi-Jalandhar GT Road, Phagwara - 144411 | India | India

Applicants

Name | Address | Country | Nationality
Lovely Professional University | Lovely Professional University, Delhi-Jalandhar GT Road, Phagwara - 144411 | India | India

Specification

Description: The following specification particularly describes the invention and the manner in which it is to be performed.
CROSS-REFERENCE TO RELATED APPLICATIONS AND PRIORITY
[001] The present application does not claim priority from any patent application.
TECHNICAL FIELD
[002] The present subject matter described herein, in general, relates to a scaling and encoding system that uses adaptive techniques.
BACKGROUND
[003] In the realm of machine learning, effective data pre-processing is essential for enhancing model performance and ensuring accurate predictions. Traditional data scaling and encoding techniques are predominantly static, applying predefined methods uniformly across datasets without adapting to real-time data quality issues. This inflexibility can lead to suboptimal outcomes, particularly when handling data with outliers, skewed distributions, or varying cardinalities over time. Further, conventional methods often inadequately address data quality challenges such as missing values or inconsistencies, typically relying on manual interventions or fixed procedures that may not be suitable for all scenarios.
[004] This lack of real-time adaptation limits the effectiveness of machine learning models, resulting in unreliable predictions and inefficient processing workflows. Also, the absence of transparency in pre-processing decisions makes it challenging for users to validate and trust the methods employed, highlighting the need for an advanced system that dynamically adjusts pre-processing techniques based on ongoing data quality assessments.
[005] US2016073106A1 describes a video coding system in which a common video sequence is coded multiple times to yield respective instances of coded video data. Each instance may be coded according to a set of coding parameters derived from the target bit rate of a respective tier of service, under a constraint that limits the maximum coding rate of each tier to less than the target bit rate of another predetermined tier of service. Coding according to this constraint facilitates dynamic switching among tiers as a requesting client device's processing resources or communication bandwidth change. Such improved coding systems, which switch among different coded streams, can increase the quality of streamed video while minimizing the transmission and storage size of the content. However, that invention deals mainly with adaptive techniques for video streaming.
[006] US11228766B2 describes target bitrate prediction methods in which one or more objective metrics that quantify video quality serve as inputs to a model that predicts a mean opinion score (MOS), a measure of perceptual quality, as a function of metric values. The model may be derived by generating training data through subjective tests on a set of video encodings, obtaining MOS data from those tests, and correlating the MOS data with metric measurements on the training data. The MOS predictions may be extended to predict the target (encoding) bitrate that achieves a desired MOS value; the prediction methods may be applied to segments of a video and made computationally faster by temporal subsampling. They may also be extended to adaptive bitrate (ABR) applications by applying scaling factors, or a dynamic scaling algorithm, to predicted bitrates at one frame size to determine predicted bitrates at different frame sizes. However, that invention deals mainly with dynamic scaling for videos.
OBJECT
[007] The primary objective of this invention is to develop an adaptive data quality-driven scaling and encoding system that dynamically adjusts pre-processing techniques based on real-time assessments of data quality and evolving data characteristics. By leveraging artificial intelligence (AI) algorithms, the invention aims to enhance the accuracy and efficiency of machine learning models, ensuring that preprocessing methods are optimally aligned with the specific needs of the data and model requirements. This system is designed to improve the handling of data quality issues, promote transparency in decision-making, and facilitate scalability and flexibility in processing large datasets, ultimately advancing the effectiveness of machine learning applications across various domains.
SUMMARY
[008] This invention presents an innovative adaptive data quality-driven scaling and encoding system designed to enhance data pre-processing in machine learning applications. By continuously assessing data quality and model performance, the system dynamically adjusts pre-processing techniques to address challenges such as outliers, missing values, and distribution shifts. Key components include a data quality assessment module that identifies issues, an adaptive scaling engine that modifies scaling methods in real-time, and an adaptive encoding module tailored for categorical data.
[009] The system features a feedback loop to refine strategies based on performance metrics, promoting continuous improvement. Also, an explainable decisions interface provides transparency, enabling users to understand and manually adjust pre-processing choices when necessary. With a modular and scalable architecture, this invention seamlessly integrates with various machine learning frameworks, making it suitable for diverse applications, including predictive modeling, natural language processing, and computer vision, ultimately improving model accuracy and efficiency.
BRIEF DESCRIPTION OF DRAWINGS
[0010] The foregoing detailed description of embodiments is better understood when read in conjunction with the appended drawings. For the purpose of illustrating the present subject matter, an example of the construction of the present subject matter is provided as figures; however, the invention is not limited to the specific method disclosed in the document and the figures.
[0011] FIG 1: Illustrates the working of the model with the help of a chart
DETAILED DESCRIPTION
[0012] Some embodiments of this disclosure, illustrating all its features, will now be discussed in detail. The words "comprising," "having," "containing," and "including," and other forms thereof, are intended to be equivalent in meaning and be open-ended in that an item or items following any one of these words is not meant to be an exhaustive listing of such item or items, or meant to be limited to only the listed item or items. It must also be noted that as used herein and in the appended claims, the singular forms "a," "an," and "the" include plural references unless the context clearly dictates otherwise. Although any product that encodes and scales data through adaptive techniques may be used in the practice or testing of embodiments of the present disclosure, exemplary systems and methods are now described.
[0013] The data quality assessment module is a sophisticated component designed to ensure the integrity and reliability of the input data utilized in machine learning processes. It employs a range of statistical methods for comprehensive data profiling, enabling the identification of essential characteristics such as central tendency, variability, and distribution patterns within the dataset. This profiling process involves calculating descriptive statistics like mean, median, standard deviation, and quartiles to create a robust overview of the data's structure and behaviour.
[0014] In parallel, the module integrates advanced algorithms for anomaly detection, which are pivotal for identifying outliers and irregular patterns that may indicate data quality issues. These algorithms can include techniques such as z-score analysis, interquartile range (IQR) methods, and machine learning-based approaches like isolation forests or auto-encoders. By analyzing deviations from expected data distributions, the module effectively highlights anomalies that could skew model performance. Furthermore, the data quality assessment module operates in real-time, allowing for the continuous monitoring of incoming data streams, ensuring timely detection of quality issues and enabling immediate corrective actions to maintain the integrity of the pre-processing pipeline. This proactive approach not only enhances the overall data quality but also contributes significantly to the performance and reliability of subsequent machine learning models.
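As a non-limiting illustration, the profiling and anomaly detection described in paragraphs [0013] and [0014] might be sketched in Python as follows; the function name, the 3-sigma z-score rule, and the 1.5 x IQR rule are assumptions made for illustration rather than parameters fixed by the specification.

```python
import pandas as pd
from sklearn.ensemble import IsolationForest

def assess(df: pd.DataFrame) -> dict:
    """Profile numeric columns and flag likely data quality issues."""
    report = {}
    for col in df.select_dtypes(include="number").columns:
        x = df[col].dropna()
        q1, q3 = x.quantile([0.25, 0.75])
        iqr = q3 - q1
        z = (x - x.mean()) / x.std(ddof=0)
        report[col] = {
            "missing_ratio": df[col].isna().mean(),  # quality issue: missing values
            "mean": x.mean(), "median": x.median(), "std": x.std(),
            "skew": x.skew(),                        # distribution profiling
            "z_outliers": int((z.abs() > 3).sum()),  # z-score rule (assumed 3-sigma)
            "iqr_outliers": int(((x < q1 - 1.5 * iqr) | (x > q3 + 1.5 * iqr)).sum()),
        }
    # Model-based anomaly detection over complete numeric rows (isolation forest)
    numeric = df.select_dtypes(include="number").dropna()
    if len(numeric) > 10:
        labels = IsolationForest(random_state=0).fit_predict(numeric)
        report["anomalous_rows"] = int((labels == -1).sum())
    return report
```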
[0015] The adaptive scaling engine is a dynamic component of the data pre-processing pipeline, designed to optimize the transformation of input features based on their statistical properties and identified data quality issues. This engine utilizes a selection of scaling techniques, including min-max scaling, z-score normalization, and robust scaling, to accommodate various data distributions and ensure the features are appropriately prepared for machine learning algorithms.
[0016] Min-max scaling transforms the data to a specified range, typically [0, 1], ensuring that all features contribute equally to model training. Z-score normalization, on the other hand, standardizes the data by centering it around the mean with a unit standard deviation, which is particularly useful for datasets that follow a normal distribution. Robust scaling focuses on the median and the interquartile range, making it ideal for datasets with significant outliers, as it minimizes the influence of extreme values.
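In symbols, under their conventional definitions (the specification itself fixes no notation), the three transforms are:

\[
x'_{\text{minmax}} = \frac{x - x_{\min}}{x_{\max} - x_{\min}}, \qquad
z = \frac{x - \mu}{\sigma}, \qquad
x'_{\text{robust}} = \frac{x - \operatorname{median}(x)}{Q_3 - Q_1}
\]

where \(\mu\) and \(\sigma\) denote the feature mean and standard deviation, and \(Q_3 - Q_1\) is the interquartile range.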
[0017] The adaptive scaling engine continuously monitors the input data for quality issues such as missing values, outliers, and skewed distributions. Based on the analysis of these issues, it intelligently adjusts the scaling parameters and selects the most suitable technique for the current dataset. For example, if outliers are detected, the engine may prioritize robust scaling to ensure that the resulting feature transformations remain stable and reliable. This adaptability not only enhances the performance of machine learning models by providing them with well-scaled features but also improves overall data handling efficiency, allowing the pre-processing pipeline to respond in real-time to changing data characteristics and quality challenges.
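A minimal sketch of this selection logic follows; the 5% outlier-ratio and 0.5 skewness cutoffs are assumed thresholds that the specification does not prescribe.

```python
import pandas as pd
from sklearn.preprocessing import MinMaxScaler, RobustScaler, StandardScaler

def select_scaler(x: pd.Series):
    """Choose a scaling technique from the statistical profile of one feature."""
    q1, q3 = x.quantile([0.25, 0.75])
    iqr = q3 - q1
    outlier_ratio = ((x < q1 - 1.5 * iqr) | (x > q3 + 1.5 * iqr)).mean()
    if outlier_ratio > 0.05:    # significant outliers: prefer robust scaling
        return RobustScaler()
    if abs(x.skew()) < 0.5:     # approximately symmetric: z-score normalization
        return StandardScaler()
    return MinMaxScaler()       # otherwise map to a bounded [0, 1] range
```

The returned scikit-learn transformer is then fitted and applied in the usual way, e.g. select_scaler(df["age"]).fit_transform(df[["age"]]).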
[0018] The adaptive encoding module serves as a critical component in the data preprocessing pipeline, designed to facilitate the effective transformation of categorical variables into numerical formats suitable for machine learning models. This module employs various encoding techniques, including one-hot encoding, label encoding, target encoding, and embeddings, to accommodate diverse data characteristics and enhance model performance.
[0019] One-hot encoding transforms categorical variables into binary vectors, where each category is represented as a separate binary feature, ensuring that no ordinal relationships are implied among the categories. This approach is particularly useful for nominal data where categories are distinct and do not hold any inherent order. Conversely, label encoding assigns unique integer values to each category, making it suitable for ordinal data where a clear ranking exists among the categories.
[0020] In scenarios where the categorical variable is associated with a target variable, target encoding becomes advantageous. This technique replaces the categorical variable with the average target value for each category, allowing the model to capture the relationship between the categorical features and the target variable more effectively.
[0021] Embeddings are utilized for high cardinality categorical features, where traditional encoding methods may result in excessive dimensionality. This technique maps each category to a continuous vector space, preserving semantic relationships and allowing the model to generalize better.
[0022] The adaptive encoding module continuously monitors the incoming data for shifts in categorical distributions, enabling it to dynamically adjust its encoding strategies. For instance, if a previously unseen category is detected, the module can seamlessly incorporate it through one-hot encoding or embedding, depending on the context. Additionally, if the distribution of existing categories changes significantly, the module may switch from target encoding to label encoding to maintain the integrity of the transformations. This adaptive capability ensures that the encoding processes remain aligned with the evolving data characteristics, ultimately leading to improved model accuracy and robustness.
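One possible shape for this selection among encoders is sketched below, with an assumed cardinality cutoff (the specification fixes no thresholds); an embedding layer is noted only in a comment because it requires a neural-network framework.

```python
from typing import Optional
import pandas as pd

def encode(col: pd.Series, target: Optional[pd.Series] = None,
           max_onehot: int = 15) -> pd.DataFrame:
    """Pick an encoding for one categorical column based on its characteristics."""
    if target is not None:
        # Target encoding: replace each category with its mean target value
        means = target.groupby(col).mean()
        return col.map(means).to_frame(f"{col.name}_target")
    if col.nunique() <= max_onehot:
        # One-hot encoding for low-cardinality nominal data (no implied order)
        return pd.get_dummies(col, prefix=col.name)
    # High cardinality: label encoding as a compact fallback; a learned
    # embedding would replace this step in a neural-network pipeline
    codes, _ = pd.factorize(col)
    return pd.DataFrame({f"{col.name}_label": codes}, index=col.index)
```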
[0023] The real-time feedback loop is an essential mechanism that integrates machine learning algorithms to enhance the pre-processing methods applied to the data. This loop functions by continuously monitoring the impact of various scaling and encoding techniques on the performance of the predictive models. Initially, a baseline model is established using standard pre-processing methods, which include techniques such as min-max scaling, standardization, one-hot encoding, and label encoding.
[0024] Once the baseline model is set, the real-time feedback loop collects performance metrics, such as accuracy, precision, recall, and F1 score, during the model training and validation phases. As the model iterates through multiple training cycles, the feedback loop analyzes these performance metrics to identify any correlations between specific pre-processing techniques and the model's predictive capabilities.
[0025] Machine learning algorithms, particularly reinforcement learning and optimization algorithms, are employed to assess the effectiveness of different pre-processing strategies. For instance, if the performance metrics indicate that one-hot encoding is yielding higher accuracy than label encoding for certain categorical variables, the feedback loop dynamically adjusts the pre-processing pipeline to favor the more effective technique. Conversely, if a particular scaling method is found to negatively impact model performance, the feedback loop can trigger a transition to an alternative scaling method, such as robust scaling, to improve results.
[0026] This iterative process allows for continuous refinement of the pre-processing methods, adapting to changes in the data distribution and evolving model requirements. The real-time feedback loop thus ensures that the data pre-processing pipeline remains agile and responsive, optimizing the overall model performance through data-driven decision-making. By implementing this dynamic approach, the system can effectively leverage insights from past iterations to inform future pre-processing strategies, leading to a more robust and accurate predictive model.
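The core of this loop can be illustrated with scikit-learn; the synthetic dataset, the candidate set, and the F1 selection criterion below are simplified stand-ins for the reinforcement learning and optimization formulation described above.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler, RobustScaler, StandardScaler

X, y = make_classification(n_samples=500, random_state=0)
candidates = {"minmax": MinMaxScaler(), "zscore": StandardScaler(),
              "robust": RobustScaler()}

# Score each pre-processing strategy by the model's cross-validated F1
scores = {name: cross_val_score(make_pipeline(scaler,
                                              LogisticRegression(max_iter=1000)),
                                X, y, scoring="f1").mean()
          for name, scaler in candidates.items()}

best = max(scores, key=scores.get)  # fed back into the pipeline configuration
print(scores, "->", best)
```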
[0027] The explainable decisions interface serves as a pivotal component within the overall data pre-processing system, designed to enhance user understanding and transparency regarding the choices made during data preparation. This interface operates by generating comprehensive summaries and visualizations that elucidate the rationale behind the selected pre-processing techniques.
[0028] Upon the completion of each pre-processing stage, the interface compiles data on the methods applied, such as normalization, encoding, or imputation, alongside the specific parameters utilized for these techniques. Users are presented with intuitive visual representations, such as flowcharts or decision trees, illustrating the sequence of pre-processing steps taken. Each technique is accompanied by descriptive summaries explaining its purpose and expected impact on the model's performance.
[0029] Additionally, the interface utilizes visual analytics, including bar graphs or heat maps, to convey the comparative effectiveness of various pre-processing methods based on performance metrics like accuracy, recall, or F1 score. For instance, if one-hot encoding leads to a marked improvement in model accuracy over label encoding for a particular dataset, this will be visually represented, enabling users to grasp the implications of their pre-processing decisions quickly.
[0030] Furthermore, the interface allows users to interactively explore different pre-processing configurations. Users can simulate alternative techniques and immediately observe the projected impact on model performance through real-time feedback. This interactivity not only empowers users to make informed decisions but also fosters a deeper understanding of how pre-processing choices affect the predictive outcomes.
[0031] Ultimately, the explainable decisions interface enhances the usability of the data pre-processing system by transforming complex decisions into accessible insights, thereby promoting trust and confidence in the automated processes while facilitating a more collaborative approach between data scientists and domain experts.
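A decision record of the following hypothetical shape could back such an interface; the class and field names are assumptions made for illustration.

```python
from dataclasses import dataclass, field

@dataclass
class PreprocessingDecision:
    step: str               # e.g. "scaling" or "encoding"
    technique: str          # e.g. "RobustScaler"
    rationale: str          # human-readable reason for the choice
    parameters: dict = field(default_factory=dict)

log: list[PreprocessingDecision] = []
log.append(PreprocessingDecision(
    step="scaling", technique="RobustScaler",
    rationale="outlier ratio 0.072 exceeded the 0.05 threshold",
    parameters={"quantile_range": (25.0, 75.0)}))
# A presentation layer would render these records as the flowcharts,
# summaries, and comparative metric charts described above.
```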
[0032] The modular architecture of the data pre-processing system is engineered to provide seamless integration of new pre-processing techniques, thus ensuring the system remains adaptable to the continuously evolving landscape of machine learning frameworks. This architecture is structured into distinct modules, each responsible for specific pre-processing tasks such as data cleaning, normalization, feature extraction, and encoding.
[0033] Each module operates independently, allowing developers to introduce new pre-processing techniques or update existing ones without necessitating a complete overhaul of the system. For instance, when a novel normalization method or a new encoding strategy emerges, it can be encapsulated within a new module designed to interact with the existing framework through standardized interfaces. This ensures compatibility while maintaining the integrity and functionality of the overall system.
[0034] Moreover, the architecture incorporates version control mechanisms, enabling the easy tracking of updates and changes made to individual modules. This feature facilitates experimentation with multiple versions of a pre-processing technique, allowing users to compare performance metrics and select the most effective approach for their specific datasets.
[0035] To enhance user engagement, the system includes a plug-and-play functionality that allows data scientists and engineers to customize their pre-processing pipelines. Users can easily enable or disable specific modules based on their project requirements, tailoring the pre-processing steps to suit different datasets and machine learning tasks.
[0036] The modular architecture not only fosters innovation by permitting the inclusion of cutting-edge pre-processing techniques but also ensures that the system can evolve in tandem with advancements in machine learning practices. This adaptability enhances the system's longevity and relevance, empowering users to leverage the most effective pre-processing methods available while maintaining operational efficiency and ease of use.
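One way to realize the standardized interfaces and plug-and-play behaviour is sketched below; the Protocol and registry names are illustrative assumptions, not structures named by the specification.

```python
from typing import Protocol
import pandas as pd

class PreprocessingModule(Protocol):
    """Standardized interface every pre-processing module must satisfy."""
    def fit(self, df: pd.DataFrame) -> "PreprocessingModule": ...
    def transform(self, df: pd.DataFrame) -> pd.DataFrame: ...

REGISTRY: dict[str, type] = {}

def register(name: str):
    """Register a module class under a stable name so that new techniques
    plug in without changes to the core pipeline."""
    def wrap(cls):
        REGISTRY[name] = cls
        return cls
    return wrap

def build_pipeline(enabled: list[str]) -> list:
    # Users enable or disable modules per project, as described above
    return [REGISTRY[name]() for name in enabled]
```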
[0037] The method for dynamically adjusting data pre-processing techniques in machine learning applications involves several key steps that ensure optimal data preparation based on real-time conditions and feedback. Initially, raw data is ingested into the system, where a real-time quality assessment is performed to detect issues such as missing values, outliers, or data inconsistencies. Based on the identified issues, appropriate pre-processing techniques, such as scaling methods (e.g., normalization, standardization) or encoding techniques (e.g., one-hot encoding, label encoding), are automatically selected and applied.
[0038] This process is not static; it incorporates continuous feedback from the performance of the machine learning models. As the model trains and generates performance metrics, the system evaluates the impact of the pre-processing methods on model accuracy, precision, and other relevant factors. This feedback loop enables the dynamic refinement of pre-processing techniques, ensuring they are tailored to the model's evolving needs.
[0039] Moreover, the system provides explanations for each pre-processing decision, offering transparency and empowering users to understand the rationale behind the choices. Users can also make manual adjustments if necessary, further enhancing the system's flexibility and control. This method fosters a robust and adaptive pre-processing process that improves model performance while maintaining user insight and intervention opportunities.
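Composing the hypothetical helpers sketched earlier (assess, select_scaler, encode), the ingest-assess-transform portion of this method might read as follows; the imputation step is a simple stand-in.

```python
import pandas as pd

def preprocess(df: pd.DataFrame, target: pd.Series) -> pd.DataFrame:
    report = assess(df)  # step (b): real-time quality assessment; the report
                         # would also feed the decision log and explanations
    parts = []
    for col in df.columns:
        if pd.api.types.is_numeric_dtype(df[col]):
            filled = df[[col]].fillna(df[col].median())  # simple imputation stand-in
            scaler = select_scaler(filled[col])          # step (c): choose scaling
            parts.append(pd.DataFrame(scaler.fit_transform(filled),
                                      columns=[col], index=df.index))
        else:
            parts.append(encode(df[col], target))        # step (c): choose encoding
    return pd.concat(parts, axis=1)
# Steps (d) and (e), metric feedback and explanations, would wrap this call,
# re-selecting techniques when validation scores degrade.
```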
Claims:
1. A data pre-processing system for machine learning and artificial intelligence applications, comprising:
a. a data quality assessment module configured to continuously monitor and evaluate data characteristics;
b. an adaptive scaling engine dynamically selecting and adjusting data scaling techniques;
c. an adaptive encoding module automatically selecting and adapting encoding techniques for categorical data;
d. a real-time feedback loop that incorporates model performance metrics; and
e. an explainable decisions interface providing transparency and rationale.
2. The system as claimed in claim 1, wherein the data quality assessment module utilizes statistical methods for anomaly detection.
3. The system as claimed in claim 1, wherein the adaptive scaling engine selects from techniques and adjusts scaling parameters based on data quality issues.
4. The system as claimed in claim 1, wherein the adaptive encoding module employs methods and dynamically adjusts encoding strategies as data characteristics change.
5. The system as claimed in claim 1, wherein the real-time feedback loop employs machine learning algorithms to iteratively update pre-processing methods.
6. The system as claimed in claim 1, wherein the explainable decisions interface generates summaries and visualizations of pre-processing decisions.
7. The system as claimed in claim 1, wherein the modular architecture allows for the easy integration of new pre-processing techniques.
8. A method for dynamically adjusting data pre-processing techniques in machine learning applications, comprising:
a. ingesting data into the system;
b. performing real-time quality assessment to identify data issues;
c. selecting and applying appropriate scaling and encoding techniques based on the assessment;
d. integrating feedback from model performance to refine pre-processing techniques;
e. providing explanations for pre-processing decisions to facilitate user understanding and manual adjustments.
9. The method as claimed in claim 8, wherein the scaling and encoding techniques are adjusted dynamically ensuring that the pre-processing aligns with the current state of the data and model requirements.

Documents

Name | Date
202411084983-COMPLETE SPECIFICATION [06-11-2024(online)].pdf | 06/11/2024
202411084983-DECLARATION OF INVENTORSHIP (FORM 5) [06-11-2024(online)].pdf | 06/11/2024
202411084983-DRAWINGS [06-11-2024(online)].pdf | 06/11/2024
202411084983-EDUCATIONAL INSTITUTION(S) [06-11-2024(online)].pdf | 06/11/2024
202411084983-EVIDENCE FOR REGISTRATION UNDER SSI(FORM-28) [06-11-2024(online)].pdf | 06/11/2024
202411084983-FORM 1 [06-11-2024(online)].pdf | 06/11/2024
202411084983-FORM FOR SMALL ENTITY [06-11-2024(online)].pdf | 06/11/2024
202411084983-FORM FOR SMALL ENTITY(FORM-28) [06-11-2024(online)].pdf | 06/11/2024
202411084983-FORM-9 [06-11-2024(online)].pdf | 06/11/2024
202411084983-REQUEST FOR EARLY PUBLICATION(FORM-9) [06-11-2024(online)].pdf | 06/11/2024
