A SYSTEM AND A METHOD FOR FORECASTING MICROALGAE CULTIVATION IN SUSTAINABLE WASTEWATER TREATMENT

ORDINARY APPLICATION

Published

Filed on 18 November 2024

Abstract

The present disclosure discloses a system (100) and a method (200) for forecasting microalgae cultivation in sustainable wastewater treatment. The system comprises a data collection module (102) that measures the growth rate of microalgal cultures based on Optical Density (OD) values using an ultraviolet (UV) spectrophotometer at predefined intervals. A data processing module (104) processes the input data, while a training module (106) splits the processed data into training and testing datasets. A time series modelling module (108) trains at least four distinct machine learning models to predict microalgae growth patterns and biomass concentrations. An optimization module (110) performs hyperparameter optimization on the predicted growth patterns and biomass concentrations, generating optimized predictions. A validation module (112) validates the optimized predictions using testing data, and an evaluation module (114) performs residual analysis on the validated predictions to identify the most accurate model. Figure 1

Patent Information

Application ID: 202441089265
Invention Field: PHYSICS
Date of Application: 18/11/2024
Publication Number: 48/2024

Inventors

Name: KARTHIKEYAN MEENATCHI SUNDARAM; Address: Department of Environmental Science and Engineering, School of Engineering and Sciences, SRM University-AP, Neerukonda, Mangalagiri Mandal, Guntur-522502, Andhra Pradesh, India; Country: India; Nationality: India
Name: KARTHIK RAJENDRAN; Address: Department of Environmental Science and Engineering, School of Engineering and Sciences, SRM University-AP, Neerukonda, Mangalagiri Mandal, Guntur-522502, Andhra Pradesh, India; Country: India; Nationality: India

Applicants

Name: SRM UNIVERSITY; Address: Amaravati, Mangalagiri, Andhra Pradesh-522502, India; Country: India; Nationality: India

Specification

Description:
FIELD OF INVENTION
The present disclosure generally relates to the field of biotechnology. More particularly, the present disclosure relates to a system and method for forecasting microalgae cultivation in sustainable wastewater treatment.
BACKGROUND
The background information herein below relates to the present disclosure but is not necessarily prior art.
Microalgae cultivation plays a pivotal role in sustainable wastewater treatment by facilitating nutrient removal and biomass production. Accurate forecasting of biomass yield is essential for optimizing the process, especially in large-scale industrial applications. Despite existing methods, the dynamic and complex nature of microalgae cultivation poses significant challenges in predicting biomass yield reliably. Traditional models have been used to estimate productivity, but they often fall short of capturing the time-dependent behavior of biomass growth.
The current state of the art lacks a robust time series forecasting model specifically tailored for microalgae biomass yield. Existing statistical and machine learning models focus on input-output relationships but fail to account for the dynamic growth patterns of microalgae over time. This limitation reduces the reliability of forecasts, especially for real-time applications. Additionally, the inability to incorporate time-dependent factors in biomass prediction hinders effective management of resources, including nutrients, workspace, and labor, in industrial-scale operations. Therefore, there is a pressing need for a dedicated time series model to improve the accuracy of biomass yield predictions and address these challenges in microalgae cultivation systems.
Therefore, there is felt a need for a system and method for forecasting microalgae cultivation in sustainable wastewater treatment that alleviates the aforementioned drawbacks.
OBJECTS
Some of the objects of the present disclosure, which at least one embodiment herein satisfies, are as follows:
It is an object of the present disclosure to ameliorate one or more problems of the prior art or to at least provide a useful alternative.
An object of the present disclosure is to provide a system and a method for time series forecasting for microalgae cultivation in sustainable wastewater treatment.
Another object of the present disclosure is to provide a system that integrates real-time data monitoring and advanced forecasting models to optimize microalgae biomass management in wastewater treatment, thereby enhancing operational efficiency and environmental sustainability.
Still another object of the present disclosure is to provide a system that utilizes LSTM models for accurate temporal pattern recognition in microalgae growth, enabling dynamic adjustments in nutrient inputs to maintain optimal conditions in wastewater treatment, thereby improving treatment efficiency and resource utilization.
Yet another object of the present disclosure is to provide a system that employs advanced data preprocessing techniques to handle outliers and missing values, ensuring reliable data for forecasting microalgae production in wastewater treatment, thereby optimizing operational outcomes and environmental sustainability.
Still another object of the present disclosure is to provide a system that integrates AI-driven forecasting models with real-time monitoring of microalgae biomass, facilitating precise nutrient management in wastewater treatment for enhanced efficiency and sustainable resource utilization.
Another object of the present disclosure is to provide a system that enhances wastewater treatment by utilizing AI-powered forecasting models to predict microalgae growth patterns, enabling proactive adjustments in nutrient inputs to maintain optimal conditions and maximize treatment efficiency.
Other objects and advantages of the present disclosure will be more apparent from the following description, which is not intended to limit the scope of the present disclosure.
SUMMARY
The present disclosure envisages a system for time series forecasting for microalgae cultivation in sustainable wastewater treatment.
The system comprises a data collection module, a data processing module, a training module, a time series modelling module, an optimization module, a validation module, an evaluation module, a repository, and a processor.
The data collection module is configured to collect input data by measuring the growth rate of microalgal cultures, wherein said growth rate is determined based on Optical Density (OD) values obtained from an ultraviolet (UV) spectrophotometer at a predefined time interval with continuous timestamps until the OD values reach a stationary phase.
The data processing module is configured to cooperate with said data collection module to receive said input data of microalgal cultures and further configured to implement a set of pre-processing techniques on said input data of microalgal cultures to generate processed data.
The training module is configured to cooperate with said data processing module to receive the processed data and split the processed data into training data and testing data in a 90:10 ratio.
The time series modelling module is configured to cooperate with said training module to receive data for training at least four distinct time series machine learning models to predict growth patterns and biomass concentrations of microalgal cultures over time.
The optimization module is configured to cooperate with said time series modelling module to receive said predicted growth patterns and biomass concentrations of microalgal cultures from at least four distinct time series machine learning models, and further configured to perform hyperparameter optimization of said predicted growth patterns and biomass concentrations of microalgal cultures to generate optimized predicted data for each of the at least four distinct time series machine learning models.
The validation module is configured to cooperate with said optimization module and said training module to receive said testing data for validating the optimized predicted data for each of the at least four distinct time series machine learning models, to generate validated predicted data for each of the at least four distinct time series machine learning models.
The evaluation module is configured to cooperate with said validation module to receive said validated predicted data, and further configured to perform residual analysis of the validated predicted data of the at least four distinct time series machine learning models for identifying one of the at least four distinct time series machine learning models for performing time series forecasting for microalgae cultivation in sustainable wastewater treatment.
In an aspect, the system further comprises a repository and a processor.
In an aspect, the repository is configured to store a set of predefined commands, a set of preprocessing rules, the set of pre-processing techniques, the input data of microalgal cultures, the processed data, the testing data, the training data, the predicted growth patterns, the biomass concentrations, the optimized predicted data.
In an aspect, the processor is configured to fetch the predefined commands from the repository to execute and operate one or more modules of the system.
In an aspect, the set of pre-processing techniques includes interpolation techniques that handle missing values and outlier removal techniques that handle outliers in the input data.
In an aspect, the system further includes a normalization module configured to normalize the pre-processed data of microalgal cultures to facilitate consistent analysis across at least four distinct time series machine learning models.
In an aspect, the at least four distinct time series machine learning models include the Autoregressive Integrated Moving Average (ARIMA) model, Prophet model, Long Short-Term Memory (LSTM) model, and eXtreme Gradient Boosting (XGBoost) model, to predict growth patterns of microalgal cultures based on the processed data.
In an aspect, the evaluation module calculates R-squared (R2), Mean Absolute Error (MAE), and Root Mean Square Error (RMSE), to perform residual analysis of the validated predicted data of the at least four distinct time series machine learning models.
In an aspect, the evaluation module identifies the Long Short-Term Memory (LSTM) model for showing higher R-squared (R2) values and minimal errors.
In an aspect, the validation module validates the forecast accuracy by comparing predicted biomass concentrations with experimental results from sustainable wastewater treatment systems.
The present disclosure also envisages a method for time series forecasting for microalgae cultivation in sustainable wastewater treatment. The method comprises the following steps:
• collecting, by a data collection module, input data by measuring a growth rate of microalgal cultures, wherein said growth rate is determined based on Optical Density (OD) values obtained from an ultraviolet (UV) spectrophotometer at a predefined time interval with continuous timestamps until the OD values reach a stationary phase;
• implementing, by a data processing module, a set of pre-processing techniques on said input data of microalgal cultures to generate processed data;
• receiving, by a training module, the processed data and splitting the processed data into training data and testing data in a 90:10 ratio;
• receiving, by a time series modeling module, data for training at least four distinct time series machine learning models to predict growth patterns and biomass concentrations of microalgal cultures over time;
• generating, by an optimization module, optimized predicted data for each of the at least four distinct time series machine learning models;
• validating, by a validation module, the optimized predicted data for each of the at least four distinct time series machine learning models, so as to generate validated predicted data for each of the at least four distinct time series machine learning models; and
• performing, by an evaluation module, residual analysis of the validated predicted data of at least four distinct time series machine learning models for identifying one of the at least four distinct time series machine learning models for performing time series forecasting for microalgae cultivation in sustainable wastewater treatment.
BRIEF DESCRIPTION OF THE ACCOMPANYING DRAWING
A system and method for forecasting microalgae cultivation in sustainable wastewater treatment of the present disclosure will now be described with the help of the accompanying drawing, in which:
Figure 1 illustrates the architecture of a system for forecasting microalgae cultivation in sustainable wastewater treatment in accordance with an embodiment of the present disclosure;
Figures 2a-2b illustrate a flow chart depicting the steps involved in a method for forecasting microalgae cultivation in sustainable wastewater treatment in accordance with an embodiment of the present disclosure;
Figure 3 illustrates a flow chart of the various time series models in accordance with an embodiment of the present disclosure;
Figure 4 illustrates a graph of the growth of microbial biomass over time in accordance with an embodiment of the present disclosure;
Figure 5 illustrates a Quantile-Quantile (Q-Q) plot that compares the sample quantiles from the time series data with the theoretical quantiles of a normal distribution in accordance with an embodiment of the present disclosure;
Figure 6 illustrates a graph of the trend, seasonal, and residual components of the time series data in accordance with an embodiment of the present disclosure;
Figure 7 illustrates a graph of optimization of p, d, q parameters in the ARIMA model to minimize Akaike Information Criterion (AIC) and Bayesian Information Criterion (BIC) values in accordance with an embodiment of the present disclosure;
Figure 8 illustrates a graph of the ARIMA model's performance in forecasting biomass levels in accordance with an embodiment of the present disclosure;
Figure 9 illustrates a graph of the forecasting of biomass yield in mg/L over time using a Long Short-Term Memory (LSTM) model in accordance with an embodiment of the present disclosure;
Figure 10 illustrates a graph of the time series forecasting of biomass in mg/L over time using the Extreme Gradient Boosting (XGBoost) model in accordance with an embodiment of the present disclosure;
Figure 11 illustrates a graph of the time series forecasting of biomass using the Prophet model over time in accordance with an embodiment of the present disclosure;
Figure 12 illustrates a graph of the comparison of true and predicted biomass yield for ARIMA, LSTM, Prophet, and XGBoost models in accordance with an embodiment of the present disclosure; and
Figure 13 illustrates a graph of the comparison between experimental biomass data and predictions made by ARIMA, LSTM, and XGBoost models, including a detailed inset of forecasted data in accordance with an embodiment of the present disclosure.
LIST OF REFERENCE NUMERALS
100 - System
102 - Data Collection Module
104 - Data Processing Module
106 - Training Module
108 - Time Series Modelling Module
110 - Optimization Module
112 - Validation Module
114 - Evaluation Module
116 - Repository
118 - Processor
120 - Normalization Module
122 - Biomass
124 - Sample Quantiles
126 - Theoretical Quantiles
128 - Trend
130 - Seasonal
132 - Residual
134 - Time (h)
136 - AIC Value(d=0)
136a - AIC Value (d =1)
138 - BIC Value (d=0)
138a - BIC Value (d=1)
140 - p,d,q Combinations
142 - Residuals
144 - Train
146 - Test
148 - Predicted Value
150 - True Value
152 - Auto-Regressive Integrated Moving Average (ARIMA) Model
154 - Long Short-Term Memory (LSTM) Model
156 - XGBoost Model
158 - Prophet
160 - Forecasted Data
162 - Train-Test Split Module
164 - Stationary Phase
166 - Differencing
168 - Coefficient Estimation
170 - Model Fitting
172 - Parameter
174 - Hyper-Parameter Optimization
176 - Grid Search
176a - Learning Rate
176b - Max Depth
176c - Subsample
178 - Model Validation
180 - Error Value
182 - Residual Analysis
184 - Best Model
DETAILED DESCRIPTION
Embodiments of the present disclosure will now be described with reference to the accompanying drawing.
Embodiments are provided so as to thoroughly and fully convey the scope of the present disclosure to the person skilled in the art. Numerous details are set forth, relating to specific components and methods, to provide a complete understanding of embodiments of the present disclosure. It will be apparent to the person skilled in the art that the details provided in the embodiments should not be construed to limit the scope of the present disclosure. In some embodiments, well-known processes, well-known apparatus structures, and well-known techniques are not described in detail.
The terminology used, in the present disclosure, is only for the purpose of explaining a particular embodiment and such terminology shall not be considered to limit the scope of the present disclosure. As used in the present disclosure, the forms "a," "an," and "the" may be intended to include the plural forms as well, unless the context clearly suggests otherwise. The terms "including," and "having," are open ended transitional phrases and therefore specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not forbid the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof. The particular order of steps disclosed in the method and process of the present disclosure is not to be construed as necessarily requiring their performance as described or illustrated. It is also to be understood that additional or alternative steps may be employed.
When an element is referred to as being "engaged to," "connected to," or "coupled to" another element, it may be directly engaged, connected, or coupled to the other element. As used herein, the term "and/or" includes any and all combinations of one or more of the associated listed elements.
Microalgae cultivation plays a pivotal role in sustainable wastewater treatment by facilitating nutrient removal and biomass production. Accurate forecasting of biomass yield is essential for optimizing the process, especially in large-scale industrial applications. Despite existing methods, the dynamic and complex nature of microalgae cultivation poses significant challenges in predicting biomass yield reliably. Traditional models have been used to estimate productivity, but they often fall short of capturing the time-dependent behaviour of biomass growth.
The current state of the art lacks a robust time series forecasting model specifically tailored for microalgae biomass yield. Existing statistical and machine learning models focus on input-output relationships but fail to account for the dynamic growth patterns of microalgae over time. This limitation reduces the reliability of forecasts, especially for real-time applications. Additionally, the inability to incorporate time-dependent factors in biomass prediction hinders effective management of resources, including nutrients, workspace, and labour, in industrial-scale operations. Therefore, there is a pressing need for a dedicated time series model to improve the accuracy of biomass yield predictions and address these challenges in microalgae cultivation systems.
To address the issues of the existing systems and methods, the present disclosure envisages a system (hereinafter referred to as "system 100") and a method (hereinafter referred to as "method 200") for forecasting microalgae cultivation in sustainable wastewater treatment. The system 100 will now be described with reference to Figure 1, and the method 200 will be described with reference to Figures 2a-2b to Figure 13.
Referring to Figure 1, the system 100 for time series forecasting for microalgae cultivation in sustainable wastewater treatment comprises a data collection module 102, a data processing module 104, a training module 106, a time series modelling module 108, an optimization module 110, a validation module 112, an evaluation module 114, a repository 116, and a processor 118.
The data collection module 102 is configured to collect input data by measuring the growth rate of microalgal cultures, wherein said growth rate is determined based on Optical Density (OD) values obtained from an ultraviolet (UV) spectrophotometer at a predefined time interval with continuous timestamps until the OD values reach a stationary phase.
The data processing module 104 is configured to cooperate with said data collection module 102 to receive said input data of microalgal cultures and further configured to implement a set of pre-processing techniques on said input data of microalgal cultures to generate processed data.
In an aspect, the set of pre-processing techniques includes interpolation techniques that handle missing values and outlier removal techniques that handle outliers in the input data.
In an aspect, the system 100 further includes a normalization module 120 configured to normalize the pre-processed data of microalgal cultures to facilitate consistent analysis across at least four distinct time series machine learning models.
The training module 106 is configured to cooperate with said data processing module 104 to receive the processed data and split the processed data into training data and testing data in a 90:10 ratio.
The time series modelling module 108 is configured to cooperate with said training module 106 to receive data for training at least four distinct time series machine learning models to predict growth patterns and biomass concentrations of microalgal cultures over time.
In an aspect, the at least four distinct time series machine learning models include the Autoregressive Integrated Moving Average (ARIMA) model, Prophet model, Long Short-Term Memory (LSTM) model, and Extreme Gradient Boosting (XGBoost) model, to predict growth patterns of microalgal cultures based on the processed data.
The optimization module 110 is configured to cooperate with said time series modelling module 108 to receive said predicted growth patterns and biomass concentrations of microalgal cultures from the at least four distinct time series machine learning models, and further configured to perform hyperparameter optimization of said predicted growth patterns and biomass concentrations of microalgal cultures to generate optimized predicted data for each of the at least four distinct time series machine learning models.
The validation module 112 is configured to cooperate with said optimization module 110 and said training module 106 to receive said testing data for validating the optimized predicted data for each of the at least four distinct time series machine learning models, so as to generate validated predicted data for each of the at least four distinct time series machine learning models.
In an aspect, the validation module 112 validates the forecast accuracy by comparing predicted biomass concentrations with experimental results from sustainable wastewater treatment systems.
The evaluation module 114 is configured to cooperate with said validation module 112 to receive said validated predicted data, and further configured to perform residual analysis of the validated predicted data of at least four distinct time series machine learning models for identifying one of the at least four distinct time series machine learning models for performing time series forecasting for microalgae cultivation in sustainable wastewater treatment.
In an aspect, the evaluation module 114 calculates R-squared (R2), Mean Absolute Error (MAE), and Root Mean Square Error (RMSE), to perform residual analysis of the validated predicted data of the at least four distinct time series machine learning models.
In an aspect, the evaluation module 114 identifies the Long Short-Term Memory (LSTM) model for showing higher R-squared (R2) values and minimal errors.
In an aspect, the system 100 further comprises a repository 116 and a processor 118.
In an aspect, repository 116 is configured to store a set of predefined commands, a set of preprocessing rules, the set of pre-processing techniques, the input data of microalgal cultures, the processed data, the testing data, the training data, the predicted growth patterns, the biomass concentrations, the optimized predicted data.
The repository 116 may be a memory that can store one or more computer-readable instructions or routines, for time series forecasting for microalgae cultivation in sustainable wastewater treatment. The memory may include any non-transitory storage device including, for example, volatile memory such as RAM, or non-volatile memory such as EPROM, flash memory, and the like.
In an alternative aspect, the repository 116 may be an external data storage device coupled to the system 100 directly or through one or more data servers.
In an aspect, the processor 118 is configured to fetch the predefined commands from repository 116 to execute and operate one or more modules of the system 100.
In an aspect, the processor 118 may be implemented as processing units to standardize and normalize the dataset. Among other capabilities, the processor 118 may fetch and execute computer-readable instructions stored in memory. The functions of the processor 118 may be provided through the use of dedicated hardware as well as hardware capable of executing machine-readable instructions. The processor 118 may be configured to execute functions of various modules of the system 100 such as the data collection module 102, the data processing module 104, the training module 106, the time series modeling module 108, the optimization module 110, the validation module 112, and the evaluation module 114.
In an aspect, the system 100 may also include a communication interface. The communication interface may include a variety of interfaces. Also, the system 100 or the processor 118 may include, or be coupled with, one or more transceivers to communicate with various devices coupled to the system 100 or the processor 118.
Figures 2a-2b illustrate a flow chart depicting the steps involved in a method for time series forecasting for microalgae cultivation in sustainable wastewater treatment in accordance with an embodiment of the present disclosure. The order in which method 200 is described is not intended to be construed as a limitation, and any number of the described method steps may be combined in any order to implement method 200, or an alternative method. Furthermore, method 200 may be implemented by a processing resource or computing device(s) through any suitable hardware, non-transitory machine-readable medium/instructions, or a combination thereof. The method 200 comprises the following steps:
At step 202, the method 200 includes collecting, by a data collection module 102, input data by measuring a growth rate of microalgal cultures, wherein said growth rate is determined based on Optical Density (OD) values obtained from an ultraviolet (UV) spectrophotometer at a predefined time interval with continuous timestamps until the OD values reach a stationary phase.
At step 204, the method 200 includes implementing, by a data processing module 104, a set of pre-processing techniques on said input data of microalgal cultures to generate processed data.
At step 206, the method 200 includes receiving, by a training module 106, the processed data and splitting the processed data into training data and testing data in a 90:10 ratio.
At step 208, the method 200 includes receiving, by a time series modeling module 108, data for training at least four distinct time series machine learning models to predict growth patterns and biomass concentrations of microalgal cultures over time.
At step 210, the method 200 includes generating, by an optimization module 110, optimized predicted data for each of the at least four distinct time series machine learning models.
At step 212, the method 200 includes validating, by a validation module 112, the optimized predicted data for each of the at least four distinct time series machine learning models, so as to generate validated predicted data for each of the at least four distinct time series machine learning models.
At step 214, the method 200 includes performing, by an evaluation module 114, residual analysis of the validated predicted data of at least four distinct time series machine learning models for identifying one of the at least four distinct time series machine learning models for performing time series forecasting for microalgae cultivation in sustainable wastewater treatment.
Figure 3 illustrates a flow chart of the various time series models in accordance with an embodiment of the present disclosure. As shown, the system 100 comprises a data collection module 102, a data processing module 104, a normalization module 120, a train-test split module 162, an Auto-Regressive Integrated Moving Average (ARIMA) model 152, a Long Short-Term Memory (LSTM) model 154, an XGBoost model 156, a training set 144, a testing set 146, a stationary phase 164, a differencing 166, p-d-q combinations 140, a coefficient estimation 168, a model fitting 170, a parameter 172, a hyper-parameter optimization 174, a grid search 176, a learning rate 176a, a max depth 176b, a subsample 176c, a validation 178, an error value 180, a residual analysis 182, and a best model 184.
The data collection module 102 measures the growth rate of microalgal cultures through Optical Density (OD) readings taken every 3 hours using a UV spectrophotometer at 750 nm until the cultures reach the stationary phase 164. The onset of the death phase, marked by a decline in live cells, concludes the experiments and completes the dataset used for model training 144. The collected data is pre-processed by the data processing module 104 to ensure consistency and quality. Missing values occurring between the 3-hour sampling intervals are handled using methods such as fillna(), replace(), and interpolate(). Outliers are removed directly to reduce data uncertainty arising from the random nature of the input variables. The pre-processed data is then normalized by the normalization module 120 so that the input features are on a comparable scale, which supports effective training 144 of the machine learning models. The dataset is then split into training 144 and testing 146 sets in a 90:10 ratio. This high train-test split 162 ratio is chosen to maximize the learning potential from the available data while still reserving unseen data for an unbiased evaluation of the final model fit.
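The pre-processing order described above (gap filling, normalization, then a 90:10 split) can be sketched as follows. This is a minimal, dependency-free illustration rather than the disclosed implementation: the function names, the choice of linear interpolation and min-max scaling, and the sample OD values are all assumptions, and outlier removal is left out because the text does not name the rule used.

```python
from typing import List, Optional, Tuple


def interpolate_missing(values: List[Optional[float]]) -> List[float]:
    """Fill None gaps linearly between the nearest observed neighbours."""
    known = [i for i, v in enumerate(values) if v is not None]
    if not known:
        raise ValueError("series has no observed values")
    filled = list(values)
    for i, v in enumerate(values):
        if v is not None:
            continue
        left = max((k for k in known if k < i), default=None)
        right = min((k for k in known if k > i), default=None)
        if left is None:        # leading gap: copy the first observation
            filled[i] = values[right]
        elif right is None:     # trailing gap: copy the last observation
            filled[i] = values[left]
        else:
            frac = (i - left) / (right - left)
            filled[i] = values[left] + frac * (values[right] - values[left])
    return filled


def min_max_normalize(values: List[float]) -> List[float]:
    """Scale values to [0, 1] (assumed scheme; the text only says 'normalized')."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def chronological_split(values: List[float],
                        train_fraction: float = 0.9) -> Tuple[List[float], List[float]]:
    """Split without shuffling so the test set is the most recent data."""
    cut = int(len(values) * train_fraction)
    return values[:cut], values[cut:]


# Hypothetical OD readings with gaps (None marks a missed measurement).
od = [0.10, None, 0.30, 0.40, None, None, 0.70, 0.80, 0.90, 1.00]
train, test = chronological_split(min_max_normalize(interpolate_missing(od)))
print(len(train), len(test))
```

The split is chronological on purpose: shuffling a time series before splitting would leak future observations into the training set.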
Model training 144 involves configuring three distinct time series models, namely the ARIMA model 152, the LSTM model 154, and the XGBoost model 156, using normalized training data. Each model follows a training 144 procedure tailored to its characteristics. The ARIMA model 152 is tailored for stationary data and utilizes techniques such as differencing 166, decomposition, and detrending to achieve stationarity. Model coefficients 168 (p, d, q) are determined through statistical methods to ensure the data maintains a constant mean, variance, and covariance.
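To make the differencing step concrete, the sketch below applies d-th order differencing to a short series and shows how a first-order difference is inverted to recover the original scale. It is an illustration only; the helper names and the sample values are assumptions and are not taken from the disclosure.

```python
from typing import List


def difference(series: List[float], order: int = 1) -> List[float]:
    """Apply differencing `order` times: y[t] - y[t-1] at each pass."""
    out = list(series)
    for _ in range(order):
        out = [b - a for a, b in zip(out, out[1:])]
    return out


def undifference(first_value: float, diffs: List[float]) -> List[float]:
    """Invert a first-order difference by cumulative summation."""
    out = [first_value]
    for d in diffs:
        out.append(out[-1] + d)
    return out


# Hypothetical biomass readings with an accelerating upward trend.
biomass = [10, 12, 15, 19, 24]
d1 = difference(biomass, order=1)   # removes a linear trend
d2 = difference(biomass, order=2)   # removes a quadratic trend
print(d1, d2)
print(undifference(biomass[0], d1) == biomass)
```

The value of d in (p, d, q) is the number of differencing passes needed before the series looks stationary; forecasts made on the differenced series are mapped back with the inverse operation.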
The LSTM model 154, a type of recurrent neural network, addresses the challenge of vanishing gradients and is configured with four neurons in the input layer and one neuron in the output layer for prediction. Training employs Mean Absolute Error (MAE) as the loss function and the Adam optimizer. The model undergoes 50 epochs of training 144 with a batch size of 1 to effectively retain information over long sequences.
The XGBoost model 156, an ensemble model designed for regression and classification tasks, prioritizes computational efficiency and model flexibility. The hyperparameter optimization 174 uses a grid search strategy that focuses on parameters such as maximum tree depth, subsample, and learning rate. Training 144 is conducted with predefined parameters, and model performance is guided by Mean Absolute Percentage Error (MAPE). Each model is configured to suit the dataset's characteristics and objectives, aiming to accurately capture and predict the underlying patterns in microalgal growth rates based on the gathered and processed data.
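Both the LSTM and the gradient-boosted model need the series recast as supervised input/target pairs. The sketch below shows one common way to do that with lagged windows, together with the MAPE metric mentioned above. Treating the four input neurons as a window of four previous observations is an assumption made for illustration; the disclosure does not state how the inputs are formed, and the function names are hypothetical.

```python
from typing import List, Tuple


def make_windows(series: List[float],
                 n_lags: int = 4) -> Tuple[List[List[float]], List[float]]:
    """Turn a series into (previous n_lags values, next value) pairs."""
    features, targets = [], []
    for i in range(n_lags, len(series)):
        features.append(series[i - n_lags:i])
        targets.append(series[i])
    return features, targets


def mape(y_true: List[float], y_pred: List[float]) -> float:
    """Mean Absolute Percentage Error in percent; y_true must be non-zero."""
    total = sum(abs((t - p) / t) for t, p in zip(y_true, y_pred))
    return 100.0 * total / len(y_true)


# Hypothetical series: each row of X holds four lags, y is the value that follows.
X, y = make_windows([1, 2, 3, 4, 5, 6], n_lags=4)
print(X, y)
print(mape([100.0, 200.0], [110.0, 180.0]))
```

MAPE is undefined when a true value is zero, so it is usually applied to biomass concentrations rather than to data normalized into a range that includes zero.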
Following model training 144, each model is evaluated on the testing data 146 to assess its performance and generalizability. This phase includes rigorous model fitting 170 and validation 178 processes to accurately capture underlying data patterns. Hyperparameter optimization 174 is crucial and is conducted using grid search 176 to fine-tune parameters such as learning rate 176a, maximum depth 176b, and subsample 176c. This step is configured to enhance the models' predictive capability.
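A grid search of this kind can be sketched in plain Python as an exhaustive loop over every parameter combination. The helper `grid_search`, the candidate values, and the scoring function `toy_validation_error` are hypothetical stand-ins; in practice the scoring function would train a model and return its validation error.

```python
from itertools import product


def grid_search(score_fn, grid):
    """Return (best_params, best_score), minimising score_fn over the grid."""
    names = list(grid)
    best_params, best_score = None, float("inf")
    for combo in product(*(grid[name] for name in names)):
        params = dict(zip(names, combo))
        score = score_fn(params)
        if score < best_score:
            best_params, best_score = params, score
    return best_params, best_score


# Illustrative candidate values for the three tuned hyperparameters.
grid = {
    "learning_rate": [0.01, 0.1, 0.3],
    "max_depth": [3, 6, 10],
    "subsample": [0.5, 0.75, 1.0],
}


def toy_validation_error(params):
    # Hypothetical stand-in for "train the model, return validation error".
    return (
        abs(params["learning_rate"] - 0.1)
        + abs(params["max_depth"] - 6)
        + abs(params["subsample"] - 0.75)
    )


best_params, best_score = grid_search(toy_validation_error, grid)
```

The cost grows with the product of the list lengths (27 evaluations here), which is why grid search is usually limited to a few parameters.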
Model validation 178 entails assessing error values 180 and conducting residual analysis 182 to ensure the models' accuracy. Key metrics such as R² values, Root Mean Square Error (RMSE), and Mean Absolute Error (MAE) are utilized to evaluate performance objectively. Based on thorough residual analysis and performance metrics, the best-performing model (184) is identified.
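The three metrics named above follow standard definitions, sketched here with NumPy. The function name `regression_metrics` and the sample values are illustrative only.

```python
import numpy as np


def regression_metrics(y_true, y_pred):
    """Compute R², RMSE and MAE from true and predicted values."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    residuals = y_true - y_pred
    ss_res = np.sum(residuals ** 2)                  # residual sum of squares
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)   # total sum of squares
    return {
        "r2": float(1.0 - ss_res / ss_tot),
        "rmse": float(np.sqrt(np.mean(residuals ** 2))),
        "mae": float(np.mean(np.abs(residuals))),
    }


metrics = regression_metrics([1.0, 2.0, 3.0, 4.0], [1.5, 2.0, 2.5, 4.0])
```

Under this definition R² is at most 1, reached only by a perfect prediction, and it becomes negative when a model performs worse than predicting the mean.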
Figure 4 illustrates a graph of the growth of microbial biomass over time in accordance with an embodiment of the present disclosure. The mean values of microbial biomass at each time point are presented alongside the corresponding standard deviations to account for measurement variability. The primary data trend reveals the growth pattern of microbial biomass over the observed period. The system 100 is developed using Python version 3.10.8, executed on a computing system comprising an Intel® Core™ i7-10759H CPU operating at 2.60 GHz with 16 GB of RAM, running on a 64-bit Windows platform utilizing an x64-based processor.
The data processing module 104 involved initial timestamps recorded at 3-hour intervals, which underwent imputation and subsequent conversion to an hourly basis. This processed dataset served as input for all developed time series models. Biomass concentration, quantified in mg/L, was derived from optical density (OD) values converted into dry cell weight, crucial for accurately quantifying biomass concentration and predictive modeling.
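The conversion from 3-hour readings to an hourly series, followed by the OD-to-biomass conversion, might look like the sketch below. The function name `to_hourly_biomass`, the time-weighted linear interpolation, and the calibration factor of 500 mg/L per OD unit are assumptions for illustration; the disclosure does not state the interpolation method or the calibration used.

```python
import pandas as pd


def to_hourly_biomass(od: pd.Series, mg_per_l_per_od: float) -> pd.Series:
    """Upsample OD readings to hourly values and convert them to mg/L."""
    hourly_index = pd.date_range(
        od.index[0], od.index[-1], freq=pd.Timedelta(hours=1)
    )
    # Reindexing inserts NaN at the new hourly stamps; interpolation fills them.
    hourly_od = od.reindex(hourly_index).interpolate(method="time")
    # Assumed linear calibration between OD and dry cell weight.
    return hourly_od * mg_per_l_per_od


index = pd.date_range("2024-01-01", periods=3, freq=pd.Timedelta(hours=3))
od = pd.Series([0.1, 0.4, 0.7], index=index)
biomass = to_hourly_biomass(od, mg_per_l_per_od=500.0)
```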
Figure 5 illustrates a Quantile-Quantile (Q-Q) plot that compares the sample quantiles from the time series data with the theoretical quantiles of a normal distribution in accordance with an embodiment of the present disclosure. A Q-Q plot is an essential tool for evaluating how well a dataset follows a particular distribution. In this case, the objective is to determine whether the sample data conforms to a normal distribution, which is a common assumption in many kinetic models. The points on the plot represent the relationship between the sample quantiles 124 (observed data) and the theoretical quantiles 126 (expected data based on a normal distribution). When the sample data follows the theoretical distribution, the points align closely along a straight reference line, as observed in most of the dataset in the plot. The closer the points are to the line, the better the data fits the theoretical model.
In analysis, a significant portion of the dataset lies along the reference line, which suggests that the sample data adheres to the normal distribution for the most part. The dataset largely falls within the 95% confidence interval, confirming that the majority of the data points conform to the normal distribution assumption. However, approximately 4.1% of the data points deviate from the straight line towards the ends of the plot. These deviations, or outliers, indicate some degree of nonlinearity and variability in the data, which can be attributed to the fact that the data was collected from real-world sources, making it subject to variability and external factors.
The deviation is reasonable given the nature of the data collection and the non-ideal conditions often encountered in real-world scenarios. The nonlinearity observed does not significantly detract from the overall distributional assumption but should be considered when interpreting the results and forecasting future trends based on this dataset.
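The points of a Q-Q plot can be computed with the Python standard library alone, as sketched below. The helper `qq_points` and the plotting position (i − 0.5)/n are assumptions; other plotting-position conventions exist, and the sample here is synthetic rather than the biomass data.

```python
from statistics import NormalDist, mean, stdev


def qq_points(sample):
    """Pair each standardized sample quantile with its normal quantile."""
    ordered = sorted(sample)
    n = len(ordered)
    mu, sigma = mean(ordered), stdev(ordered)
    standard_normal = NormalDist()
    points = []
    for i, value in enumerate(ordered, start=1):
        p = (i - 0.5) / n                          # assumed plotting position
        theoretical = standard_normal.inv_cdf(p)   # expected under normality
        observed = (value - mu) / sigma            # standardized sample value
        points.append((theoretical, observed))
    return points


# Synthetic, normally distributed sample for illustration.
sample = NormalDist(mu=10.0, sigma=2.0).samples(200, seed=42)
points = qq_points(sample)
```

Plotting the second coordinate against the first gives the Q-Q plot; points far from the line y = x at either end correspond to the tail deviations discussed above.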
Figure 6 illustrates a graph of the trend, seasonal, and residual components of the time series data in accordance with an embodiment of the present disclosure. The first graph shows the biomass levels 122 over time, revealing the overall growth trend. The second graph depicts the trend component 128, isolating the long-term progression of biomass and indicating a general upward trajectory. The third graph highlights the seasonal component 130, capturing the cyclical patterns inherent in the data, characterized by periodic fluctuations. The fourth graph represents the residuals 142, showing the deviations or noise not explained by the trend or seasonal components, which fluctuate only slightly around a stable mean. These decomposition components collectively aid in understanding and analyzing the dynamics of the time series data for better forecasting and model accuracy.
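One common way to obtain such components is classical additive decomposition, sketched here with NumPy. The disclosure does not name the decomposition method, so the additive form, the function `decompose_additive`, and the 24-step period are assumptions, and the input is synthetic.

```python
import numpy as np


def decompose_additive(y, period):
    """Split a series into trend, seasonal and residual parts (additive)."""
    y = np.asarray(y, dtype=float)
    # Centred moving average; for an even period the two end weights are halved.
    if period % 2 == 0:
        weights = np.r_[0.5, np.ones(period - 1), 0.5] / period
    else:
        weights = np.ones(period) / period
    half = len(weights) // 2
    trend = np.full(len(y), np.nan)   # edges stay NaN: the window does not fit
    trend[half:len(y) - half] = np.convolve(y, weights, mode="valid")

    # Seasonal part: mean detrended value at each position within the cycle.
    detrended = y - trend
    cycle = np.array([np.nanmean(detrended[i::period]) for i in range(period)])
    cycle -= cycle.mean()
    seasonal = np.tile(cycle, len(y) // period + 1)[:len(y)]

    residual = y - trend - seasonal
    return trend, seasonal, residual


# Synthetic hourly series: linear growth plus a 24-step cycle.
t = np.arange(120, dtype=float)
y = 0.5 * t + np.sin(2 * np.pi * t / 24)
trend, seasonal, residual = decompose_additive(y, period=24)
```

For this noise-free input the residual is numerically zero wherever the trend is defined; on measured data it would carry the unexplained variation.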
Figure 7 illustrates a graph of optimization of the p, d, q parameters in the ARIMA model to minimize Akaike Information Criterion (AIC) and Bayesian Information Criterion (BIC) values in accordance with an embodiment of the present disclosure. One graph depicts the AIC values 136 for various parameter combinations 140 of the ARIMA model, with d=0 and d=1 136a, while the other graph illustrates the corresponding BIC values 138 for the same parameter sets. The x-axis represents combinations of p, d, and q values 140, and the y-axis displays the respective AIC 136 and BIC 138 scores. Lower values of AIC and BIC indicate a better trade-off between goodness of fit and model complexity. Based on this optimization, ARIMA (1, 1, 3) was identified as the optimal model due to its lowest AIC and BIC values and highest adjusted R² value. The residual diagnostics confirm that the residuals fall within the 95% confidence interval, thereby validating the model's appropriateness for forecasting applications.
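For reference, the two criteria have the following standard definitions, where k is the number of estimated model parameters, n is the number of observations, and L̂ is the maximized value of the likelihood function. These symbols are introduced here for illustration and do not appear in the disclosure.

```latex
\mathrm{AIC} = 2k - 2\ln\hat{L}, \qquad
\mathrm{BIC} = k\ln n - 2\ln\hat{L}
```

Both reward a higher likelihood and penalize additional parameters. BIC penalizes each parameter more heavily than AIC once ln n exceeds 2, that is, for n of 8 or more.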
Figure 8 illustrates a graph of the ARIMA model's performance in forecasting biomass levels in accordance with an embodiment of the present disclosure. In graph (a), the model's training data 144 and future predictions are displayed over time 134, with a 95% confidence interval (CI) and a modified confidence interval shown to account for variability. The biomass 122 forecast demonstrates a clear predictive trajectory, along with a widening confidence interval as time progresses. Graph (b) presents the frequency distribution of residuals 132 for the training and test data 144, 146, showing residual stability within acceptable bounds. Graph (c) displays a histogram comparing the train and test residuals, illustrating that the residuals remain centered near zero with minimal deviation, which supports the accuracy and reliability of the ARIMA model's predictions.
Figure 9 illustrates a graph of the forecasting of biomass yield in mg/L over time using a Long Short-Term Memory (LSTM) model in accordance with an embodiment of the present disclosure. The graph (a) shows the comparison between actual biomass values and predicted values based on the LSTM model, highlighting the model's accuracy in predicting future biomass growth. The residuals plot in graph (b) illustrates the distribution and variance of prediction errors for both training data 144 and test data 146, revealing that while the residuals for the training set exhibit randomness, the test residuals follow a discernible pattern, which is not ideal for residual analysis. Despite this, the spread of residuals remains relatively constant, indicating a normal distribution with no significant outliers. The graph (c) provides a histogram of the residuals 132, showing a symmetric distribution around zero, where most residuals are clustered near the center, confirming the absence of bias in the model's predictions. The mean residuals for the training and testing data were calculated as 0.03 and 0.2, respectively, further supporting the reliability of the LSTM model in accurately capturing the patterns of biomass growth. The architecture of the LSTM model, including its selection of input gates, output gates, and forget gates, successfully manages long-term dependencies and patterns inherent in the biomass time series data. The use of these components contributes to the model's predictive performance, as reflected in the analysis of residuals and their normal distribution.
Figure 10 illustrates a graph of the time series forecasting of biomass in mg/L over time using the Extreme Gradient Boosting (XGBoost) model in accordance with an embodiment of the present disclosure. In graph (a), the true biomass values are compared with the predicted values generated by the model, indicating the prediction accuracy over the forecast period. Graph (b) presents the residuals 132 for both the training data 144 and test data 146, showing the variance between actual and predicted values. While the residuals of the training data remain stable, the test data exhibits a greater deviation, highlighting the model's reduced accuracy over the testing phase. Graph (c) displays a histogram of residuals, which shows a nearly symmetric distribution, with most residuals clustered near zero. However, there is a noticeable increase in residual variance as the forecasting range expands.
The XGBoost model's performance in this study was enhanced through specific model configurations and the incorporation of lag features. Lag features were engineered by shifting the time series data by three steps to capture underlying trends and seasonality. Additionally, the rolling mean was used as an essential feature to smooth out short-term fluctuations and better represent the overall data trend. To optimize the model, three key hyperparameters (learning rate, max depth, and subsample) were tuned using a grid search method. The learning rate controlled the step size during each iteration, with typical values ranging from 0.01 to 0.3. Max depth, which determines the complexity of each decision tree, was adjusted between 3 and 10, while the subsample, which determines the fraction of observations used for training each tree, was set between 0.5 and 1.0. Despite these optimizations, the R² value of the XGBoost model was found to be 0.3 during testing, significantly lower than the training R² value, indicating a notable deviation in prediction accuracy, especially as the forecast range increased.
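Lag and rolling-mean features of this kind can be built with pandas as sketched below. The description can be read either as a single lag of three steps or as lags one to three; this sketch assumes the latter. The function name `make_lag_features`, the window of three, and the one-step shift applied before the rolling mean are likewise assumptions.

```python
import pandas as pd


def make_lag_features(series: pd.Series, n_lags: int = 3, window: int = 3):
    """Build a supervised-learning table of lagged and smoothed values."""
    frame = pd.DataFrame({"y": series})
    for lag in range(1, n_lags + 1):
        frame[f"lag_{lag}"] = frame["y"].shift(lag)
    # Rolling mean over past values only, so the target never appears in
    # its own feature.
    frame["rolling_mean"] = frame["y"].shift(1).rolling(window).mean()
    # The first rows lack a full history and are dropped.
    return frame.dropna()


features = make_lag_features(pd.Series([1.0, 2.0, 3.0, 4.0, 5.0, 6.0]))
```

Each remaining row holds the target `y` together with the values from one, two, and three steps earlier, which is the tabular form a tree-based regressor expects.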
Figure 11 illustrates a graph of the time series forecasting of biomass over time using the Prophet model in accordance with an embodiment of the present disclosure. Graph (a) compares the true biomass values with the forecasted values generated by the Prophet model, showing an increasing trend in predictions and the associated confidence interval. Graph (b) displays the residuals 132 for both the training data 144 and test data 146, highlighting the variance and distribution of prediction errors. The test data residuals show noticeable fluctuation, indicating some inaccuracy in the forecast. Graph (c) provides a histogram of residuals, showing a symmetric distribution around zero, with most residuals clustered near the center, indicating minimal bias but highlighting some variability in the predictions.
The Prophet model 158 used in this study was configured with customizable parameters to optimize forecasts, including a changepoint prior scale set to 0.05, which controls the model's flexibility in handling trend changes. The model was trained using a 90:10 train-test split, but the limited amount of data led to lower prediction accuracy. The model's performance was evaluated using the Mean Absolute Percentage Error (MAPE) metric, generated through the cross-validation metric function from the Prophet library. Although the model was flexible in capturing trends, the overall accuracy remained suboptimal due to the limited dataset, as shown in the residual analysis.
Figure 12 illustrates a graph of the comparison of true and predicted biomass yield for ARIMA, LSTM, Prophet, and XGBoost models in accordance with an embodiment of the present disclosure. The graph compares the predicted values 148 with the true values 150 of biomass yield (mg/L) for four different forecasting models: (a) Auto-Regressive Integrated Moving Average (ARIMA) model 152, (b) Long Short-Term Memory (LSTM) model 154, (c) Prophet model 158, and (d) XGBoost model 156. Each plot evaluates the performance of the models based on the R² value, Root Mean Square Error (RMSE), and Mean Absolute Error (MAE). Models that align more closely with the perfect prediction line demonstrate superior prediction accuracy.
The ARIMA model 152 shows a minimal R² value of 0.000256, indicating low predictive performance, with an MAE of 0.0239 and RMSE of 0.0430. The LSTM model 154 demonstrates an R² value of 0.75, showing a relatively strong fit, with an MAE of 0.0023 and RMSE of 0.008. The Prophet model 158 achieves an R² value of 3.50, with an MAE of 0.0090 and RMSE of 0.091, indicating reasonable predictive capacity. The XGBoost model 156 shows a moderate R² value of 0.3332, with an MAE of 0.0400 and RMSE of 0.009, suggesting a stable yet less accurate performance.
Figure 13 illustrates a graph of the comparison between experimental biomass data and predictions made by the ARIMA, LSTM, and XGBoost models, including a detailed inset of forecasted data, in accordance with an embodiment of the present disclosure. The graph compares the experimental data 150 with the predicted biomass yield 122 across the different models, namely Auto-Regressive Integrated Moving Average (ARIMA) 152, Long Short-Term Memory (LSTM) 154, and XGBoost 156, over time 134. The plot illustrates how each model predicts biomass growth over the experimental period, with inset 160 providing a focused view of the forecasted data in the latter portion of the time series. The figure demonstrates the performance of the models in capturing the overall biomass trend, highlighting variations in accuracy across models when compared to the actual experimental data. The forecasted data inset 160 shows that the models diverge in their predictive capabilities as the time series extends, which indicates how suitable each model is for long-term forecasting.
The foregoing description of the embodiments has been provided for purposes of illustration and is not intended to limit the scope of the present disclosure. Individual components of a particular embodiment are generally not limited to that particular embodiment, but are interchangeable. Such variations are not to be regarded as a departure from the present disclosure, and all such modifications are considered to be within the scope of the present disclosure.
TECHNICAL ADVANCEMENTS
The present disclosure described herein above has several technical advantages including, but not limited to, the realization of a system and method for forecasting microalgae cultivation in sustainable wastewater treatment, that:
• provides accurate forecasting of microalgae growth through temporal pattern recognition and LSTM models, enhancing wastewater treatment efficiency;
• utilizes real-time monitoring to optimize microalgae biomass levels, improving resource allocation in wastewater treatment systems;
• implements rigorous data preprocessing techniques to handle outliers and missing values, ensuring robust model performance;
• enhances environmental sustainability by integrating advanced forecasting models into algae cultivation systems for efficient resource management; and
• optimizes model selection based on R², RMSE, and MAE criteria to improve the accuracy of microalgae production forecasts.
In an operative configuration, the system 100 functions by first utilizing the data collection module 102 to gather input data by measuring the growth rate of microalgal cultures through Optical Density (OD) values obtained from a UV spectrophotometer. The data is collected at predefined intervals until the OD values reach a stationary phase. The data processing module 104 then processes the input data by applying pre-processing techniques to generate refined data. The training module 106 splits the processed data into training and testing sets, while the time series modeling module 108 trains at least four distinct time series machine learning models on the data. The optimization module 110 performs hyperparameter optimization on the predicted growth patterns and biomass concentrations. The validation module 112 validates the optimized data using the testing dataset, and finally, the evaluation module 114 performs residual analysis to identify the most suitable model for accurate time series forecasting of microalgae cultivation in sustainable wastewater treatment.
The embodiments herein and the various features and advantageous details thereof are explained with reference to the non-limiting embodiments in the following description. Descriptions of well-known components and processing techniques are omitted so as to not unnecessarily obscure the embodiments herein. The examples used herein are intended merely to facilitate an understanding of ways in which the embodiments herein may be practiced and to further enable those of skill in the art to practice the embodiments herein. Accordingly, the examples should not be construed as limiting the scope of the embodiments herein.
The foregoing description of the specific embodiments so fully reveals the general nature of the embodiments herein that others can, by applying current knowledge, readily modify and/or adapt for various applications such specific embodiments without departing from the generic concept, and, therefore, such adaptations and modifications should and are intended to be comprehended within the meaning and range of equivalents of the disclosed embodiments. It is to be understood that the phraseology or terminology employed herein is for the purpose of description and not of limitation. Therefore, while the embodiments herein have been described in terms of preferred embodiments, those skilled in the art will recognize that the embodiments herein can be practiced with modification within the spirit and scope of the embodiments as described herein.
The use of the expression "at least" or "at least one" suggests the use of one or more elements or ingredients or quantities, as the use may be in the embodiment of the disclosure to achieve one or more of the desired objects or results.
While considerable emphasis has been placed herein on the components and component parts of the preferred embodiments, it will be appreciated that many embodiments can be made and that many changes can be made in the preferred embodiments without departing from the principles of the disclosure. These and other changes in the preferred embodiment as well as other embodiments of the disclosure will be apparent to those skilled in the art from the disclosure herein, whereby it is to be distinctly understood that the foregoing descriptive matter is to be interpreted merely as illustrative of the disclosure and not as a limitation.
WE CLAIM:
1. A system for time series forecasting for microalgae cultivation in sustainable wastewater treatment, comprising:
• a data collection module (102) configured to collect an input data by measuring a growth rate of microalgal cultures, wherein said growth rate being determined based on Optical Density (OD) values obtained from an ultraviolet (UV) spectrophotometer at a predefined time interval with continuous timestamps until the OD values reach a stationary phase;
• a data processing module (104) configured to cooperate with said data collection module (102) to receive said input data of microalgal cultures and further configured to implement a set of pre-processing techniques on said input data of microalgal cultures to generate processed data;
• a training module (106) configured to cooperate with said data processing module (104) to receive the processed data and split the processed data into training data and testing data in a 90:10 ratio;
• a time series modeling module (108) configured to cooperate with said training module (106) to receive data for training at least four distinct time series machine learning models to predict growth patterns and biomass concentrations of microalgal cultures over time;
• an optimization module (110) configured to cooperate with said time series modeling module (108) to receive said predicted growth patterns and biomass concentrations of microalgal cultures from the at least four distinct time series machine learning models, and further configured to perform hyperparameter optimization of said predicted growth patterns and biomass concentrations of microalgal cultures to generate optimized predicted data for each of the at least four distinct time series machine learning models;
• a validation module (112) configured to cooperate with said optimization module (110) and said training module (106) to receive said testing data for validating the optimized predicted data for each of the at least four distinct time series machine learning models, so as to generate validated predicted data for each of the at least four distinct time series machine learning models; and
• an evaluation module (114) configured to cooperate with said validation module (112) to receive said validated predicted data, and further configured to perform residual analysis of the validated predicted data of the at least four distinct time series machine learning models for identifying one of the at least four distinct time series machine learning models for performing time series forecasting for microalgae cultivation in sustainable wastewater treatment.
2. The system (100) as claimed in claim 1, wherein said system (100) further comprises:
• a repository (116) configured to store a set of predefined commands, a set of preprocessing rules, the set of pre-processing techniques, the input data of microalgal cultures, the processed data, the testing data, the training data, the predicted growth patterns, the biomass concentrations, the optimized predicted data; and
• a processor (118) configured to fetch said set of predefined commands to execute and operate one or more modules of said system (100).
3. The system (100) as claimed in claim 1, wherein said set of pre-processing techniques includes interpolation and outlier removal techniques that handle missing values in the input data.
4. The system (100) as claimed in claim 1, wherein said system (100) further includes a normalization module (120) configured to normalize the pre-processed data of microalgal cultures to facilitate consistent analysis across at least four distinct time series machine learning models.
5. The system (100) as claimed in claim 1, wherein said at least four distinct time series machine learning models include the Autoregressive Integrated Moving Average (ARIMA) model, Prophet model, Long Short-Term Memory (LSTM) model, and eXtreme Gradient Boosting (XGBoost) model, to predict growth patterns of microalgal cultures based on the processed data.
6. The system (100) as claimed in claim 1, wherein said evaluation module (114) calculates R-squared (R2), Mean Absolute Error (MAE), and Root Mean Square Error (RMSE), to perform residual analysis of the validated predicted data of the at least four distinct time series machine learning models.
7. The system (100) as claimed in claims 5 and 6, wherein said evaluation module (114) identifies Long Short-Term Memory (LSTM) model for showing higher R-squared (R2) values and minimal errors.
8. The system (100) as claimed in claim 1, wherein said validation module (112) validates the forecast accuracy by comparing predicted biomass concentrations with experimental results from sustainable wastewater treatment systems.
9. A method (200) for time series forecasting for microalgae cultivation in sustainable wastewater treatment, wherein said method (200) comprises the following steps:
• collecting, by a data collection module (102), an input data by measuring a growth rate of microalgal cultures, wherein said growth rate being determined based on Optical Density (OD) values obtained from an ultraviolet (UV) spectrophotometer at a predefined time interval with continuous timestamps until the OD values reach a stationary phase;
• implementing, by a data processing module (104), a set of pre-processing techniques on said input data of microalgal cultures to generate processed data;
• receiving, by a training module (106), the processed data and splitting the processed data into a training data and a testing data in a 90:10 ratio;
• receiving, by a time series modeling module (108), data for training at least four distinct time series machine learning models to predict growth patterns and biomass concentrations of microalgal cultures over time;
• generating, by an optimization module (110), optimized predicted data for each of the at least four distinct time series machine learning models;
• validating, by a validation module (112), the optimized predicted data for each of the at least four distinct time series machine learning models, so as to generate validated predicted data for each of the at least four distinct time series machine learning models; and
• performing, by an evaluation module (114), residual analysis of the validated predicted data of at least four distinct time series machine learning models for identifying one of the at least four distinct time series machine learning models for performing time series forecasting for microalgae cultivation in sustainable wastewater treatment.

Dated this 18th Day of November, 2024

_______________________________
MOHAN RAJKUMAR DEWAN, IN/PA - 25
OF R. K. DEWAN & CO.
AUTHORIZED AGENT OF APPLICANT

TO,
THE CONTROLLER OF PATENTS
THE PATENT OFFICE, AT CHENNAI

Documents

Name | Date
202441089265-FORM-26 [19-11-2024(online)].pdf | 19/11/2024
202441089265-COMPLETE SPECIFICATION [18-11-2024(online)].pdf | 18/11/2024
202441089265-DECLARATION OF INVENTORSHIP (FORM 5) [18-11-2024(online)].pdf | 18/11/2024
202441089265-DRAWINGS [18-11-2024(online)].pdf | 18/11/2024
202441089265-EDUCATIONAL INSTITUTION(S) [18-11-2024(online)].pdf | 18/11/2024
202441089265-EVIDENCE FOR REGISTRATION UNDER SSI [18-11-2024(online)].pdf | 18/11/2024
202441089265-EVIDENCE FOR REGISTRATION UNDER SSI(FORM-28) [18-11-2024(online)].pdf | 18/11/2024
202441089265-FORM 1 [18-11-2024(online)].pdf | 18/11/2024
202441089265-FORM 18 [18-11-2024(online)].pdf | 18/11/2024
202441089265-FORM FOR SMALL ENTITY(FORM-28) [18-11-2024(online)].pdf | 18/11/2024
202441089265-FORM-9 [18-11-2024(online)].pdf | 18/11/2024
202441089265-PROOF OF RIGHT [18-11-2024(online)].pdf | 18/11/2024
202441089265-REQUEST FOR EARLY PUBLICATION(FORM-9) [18-11-2024(online)].pdf | 18/11/2024
202441089265-REQUEST FOR EXAMINATION (FORM-18) [18-11-2024(online)].pdf | 18/11/2024
