ENHANCED REAL-TIME ACTION RECOGNITION ON EDGE DEVICES USING HYBRID DEEP LEARNING ARCHITECTURES
ORDINARY APPLICATION
Published
Filed on 4 November 2024
Abstract
The rapid growth of video data across various domains necessitates the development of advanced techniques for efficient human action recognition (HAR). This research presents a novel hybrid framework that integrates state-of-the-art architectures, including Recurrent Neural Networks (RNN), Vision Transformers (ViT), Spatio-Temporal Networks (STN), Generative Adversarial Networks (GAN), FlowNet, Video Super-Resolution Networks (VSRnet), and Graph Neural Networks (GNN). The framework optimizes HAR by leveraging the unique strengths of each architecture. By effectively capturing temporal correlations across video frames and intricate spatial patterns within them, the proposed design enhances recognition performance. ViT excels in spatial feature extraction, while RNNs and STNs address the temporal dynamics. GANs and VSRnet improve video resolution and frame quality, FlowNet facilitates optical flow estimation, and GNNs enhance relational modeling between entities. Through extensive evaluations, the hybrid model demonstrates significantly higher accuracy in HAR compared to existing solutions. This work contributes valuable insights into video analysis, showcasing the practical benefits of hybrid architectures in overcoming the limitations of traditional models and paving the way for advancements in real-time HAR applications.
Patent Information
Field | Value |
---|---|
Application ID | 202441084027 |
Invention Field | COMPUTER SCIENCE |
Date of Application | 04/11/2024 |
Publication Number | 46/2024 |
Inventors
Name | Address | Country | Nationality |
---|---|---|---|
Dr. J. Arunnehru | DR. J. ARUNNEHRU C BLOCK BNO 1 100 FEET RD VADAPALANI CHENNAI TAMILNADU 600026 9629427700 arunnehruj@gmail.com | India | India |
Kamlesh J.V.K. | C BLOCK BNO 1 100 FEET RD VADAPALANI CHENNAI TAMILNADU 600026 | India | India |
Ayush | C BLOCK BNO 1 100 FEET RD VADAPALANI CHENNAI TAMILNADU 600026 | India | India |
Suhail Ahmed I | C BLOCK BNO 1 100 FEET RD VADAPALANI CHENNAI TAMILNADU 600026 | India | India |
Applicants
Name | Address | Country | Nationality |
---|---|---|---|
Dr. J. Arunnehru | DR. J. ARUNNEHRU C BLOCK BNO 1 100 FEET RD VADAPALANI CHENNAI TAMILNADU 600026 9629427700 arunnehruj@gmail.com | India | India |
Kamlesh J.V.K. | C BLOCK BNO 1 100 FEET RD VADAPALANI CHENNAI TAMILNADU 600026 | India | India |
Ayush | C BLOCK BNO 1 100 FEET RD VADAPALANI CHENNAI TAMILNADU 600026 | India | India |
Suhail Ahmed I | C BLOCK BNO 1 100 FEET RD VADAPALANI CHENNAI TAMILNADU 600026 | India | India |
Specification
1. Background Of The Present Innovation
1.1. Field of Innovation
1) The development relates to advancements in video processing and computer vision.
2) Specifically focuses on human action recognition (HAR).
3) Involves the development of advanced hybrid frameworks integrating various state-of-the-art
machine learning architectures.
1.2. Description of Related Arts
1) Human action recognition (HAR) has gained significant attention due to the increasing
availability of video data in diverse applications, such as:
i) Surveillance
ii) Sports analytics
iii) Human-computer interaction
2) Traditional HAR approaches often rely on:
i) Handcrafted features
ii) Shallow learning models
3) Convolutional Neural Networks (CNNs) have revolutionized spatial feature extraction in
images and videos, effectively capturing local patterns but lacking temporal context when
applied alone to HAR tasks.
4) Recurrent Neural Networks (RNNs) are tailored for modeling sequential data and have been
applied in Human Action Recognition (HAR) to capture temporal patterns. However, they
often face challenges when it comes to capturing long-range dependencies effectively.
5) Long Short-Term Memory (LSTM) networks, a specialized form of RNN, overcome the
vanishing-gradient issue, making them well-suited for handling long sequences in video data
(a minimal sketch of this approach follows this list).
6) Two-stream architectures process spatial and temporal information separately, often
combining RGB frames with optical flow, yet they can be computationally expensive.
7) 3D CNNs extend traditional CNNs by incorporating temporal dimensions, enabling direct
spatiotemporal feature extraction, but they require large amounts of labeled data for training.
8) Generative Adversarial Networks (GANs) have been utilized to generate high-quality video
frames and enhance resolution, yet their integration into HAR tasks remains limited.
9) Video Super-Resolution Networks (VSRnets) improve video quality by upscaling
low-resolution frames, contributing to better feature representation for HAR but lacking
direct application in recognition tasks.
10) FlowNet estimates optical flow between frames to capture motion information, providing
insights into dynamic changes but often requiring additional integration for HAR.
11) Graph Neural Networks (GNNs) model relational interactions among entities in video,
enhancing the understanding of complex behaviors, but they are not yet widely adopted in
HAR frameworks.
12) Attention mechanisms enhance models by focusing on the most relevant areas of video
frames, improving feature extraction. However, their application in HAR is still in the early
stages of development.
13) Temporal Convolutional Networks (TCNs) offer an alternative to RNNs for sequence
modeling, with benefits like better parallelization. However, their potential has not yet been
fully explored in the context of HAR.
14) Combining multiple models or techniques can improve HAR accuracy, but the complexity
and computational cost increase significantly.
15) Transfer learning enables the use of pre-trained models on new HAR datasets, enhancing
performance but often relying on the availability of substantial labeled data.
16) Self-supervised techniques have emerged as a way to leverage unlabeled data for training,
though their direct application in HAR is still under exploration.
17) Integrating multiple data modalities (e.g., audio and video) can improve recognition
accuracy, but such methods require complex data synchronization.
18) Efficient algorithms are essential for real-time HAR applications, yet many existing models
trade off accuracy for speed.
19) Various augmentation methods are used to increase the diversity of training data, helping
models generalize better, but they can introduce noise if not applied judiciously.
20) Standard evaluation metrics for HAR include accuracy, precision, recall, and F1-score;
however, the choice of metrics can significantly influence perceived model performance.
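
As a concrete companion to items 4 and 5 above, the sketch below shows a minimal LSTM classifier over per-frame feature vectors, the standard way RNN-family models capture temporal patterns for HAR. The feature dimension, hidden size, and class count are illustrative assumptions, not values from the specification.

```python
# Minimal sketch (not the specification's model): an LSTM classifier over
# per-frame feature vectors, illustrating the temporal-modeling approach
# of items 4-5. All dimensions are illustrative assumptions.
import torch
import torch.nn as nn

class LSTMActionClassifier(nn.Module):
    def __init__(self, feat_dim=512, hidden_dim=256, num_classes=5):
        super().__init__()
        # LSTM mitigates the vanishing-gradient issue of plain RNNs,
        # letting the model retain longer-range temporal dependencies.
        self.lstm = nn.LSTM(feat_dim, hidden_dim, batch_first=True)
        self.head = nn.Linear(hidden_dim, num_classes)

    def forward(self, x):            # x: (batch, frames, feat_dim)
        _, (h_n, _) = self.lstm(x)   # h_n: (1, batch, hidden_dim)
        return self.head(h_n[-1])    # logits: (batch, num_classes)

# Usage: 8 clips of 16 frames, each frame as a 512-d feature vector.
logits = LSTMActionClassifier()(torch.randn(8, 16, 512))
```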
2. Brief Description Of The Analysis
Figure 1 provides an overview of the proposed hybrid framework for video processing, which enhances
human action recognition (HAR) by integrating multiple advanced architectures. The process begins with
raw video input, which undergoes preprocessing before spatial feature extraction using Vision
Transformers (ViT) and temporal feature extraction through Recurrent Neural Networks (RNNs) and
Spatio-Temporal Networks (STNs). FlowNet is employed for precise motion estimation, while
Generative Adversarial Networks (GANs) and Video Super-Resolution Networks (VSRnet) improve
video quality by generating high-resolution frames. Additionally, Graph Neural Networks (GNNs) model
relational interactions, aiding in the understanding of complex behaviors. The framework culminates in
an action recognition module that analyzes these features to produce action predictions, optimizing
performance in video analysis tasks.
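
The data flow of Figure 1 can be made concrete with a structural sketch in which every stage is reduced to a stub module. This is an assumption-laden skeleton, not the patented implementation: `nn.Identity`, `nn.LazyLinear`, and a GRU stand in for the GAN/VSRnet enhancement, ViT, FlowNet, GNN, and RNN/STN stages, and fusion by concatenation is an illustrative choice.

```python
# Structural sketch of the Figure 1 pipeline; every stage is a stub so the
# end-to-end data flow is visible. Shapes and fusion are assumptions.
import torch
import torch.nn as nn

class HybridHARPipeline(nn.Module):
    def __init__(self, num_classes=5, d=128):
        super().__init__()
        self.enhance = nn.Identity()        # stands in for GAN/VSRnet frame enhancement
        self.spatial = nn.LazyLinear(d)     # stands in for ViT per-frame spatial features
        self.temporal = nn.GRU(d, d, batch_first=True)  # stands in for RNN/STN modeling
        self.motion = nn.LazyLinear(d)      # stands in for FlowNet motion features
        self.relational = nn.LazyLinear(d)  # stands in for GNN relational features
        self.classifier = nn.Linear(3 * d, num_classes)

    def forward(self, video):               # video: (B, T, C, H, W)
        frames = self.enhance(video).flatten(2)   # (B, T, C*H*W)
        spat = self.spatial(frames)               # (B, T, d)
        _, h = self.temporal(spat)                # temporal summary: (1, B, d)
        # Motion features from consecutive-frame differences (FlowNet stand-in).
        mot = self.motion((frames[:, 1:] - frames[:, :-1]).mean(1))  # (B, d)
        rel = self.relational(frames.mean(1))     # (B, d)
        fused = torch.cat([h[-1], mot, rel], dim=-1)
        return self.classifier(fused)             # (B, num_classes)

# Usage: 2 clips of 8 RGB frames at 32x32.
logits = HybridHARPipeline()(torch.randn(2, 8, 3, 32, 32))
```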
Figure 2 represents a pipeline for recognizing human activities from video input using machine
learning and computer vision techniques. The process begins with a video feed that is analyzed by a
"People Detector" module, which utilizes a SqueezeNet backbone and a Single Shot Detector (SSD) for detecting individuals within the frames. The detection is powered by TensorFiow and can be run on
either VPU or CPU hardware. Once people are detected, the bounding boxes around them are
processed using Bayesian filters, which run on the CPU to improve detection accuracy by smoothing
predictions. Next, the processed data is sent to a Hybrid Architecture for Human Activity Recognition
(HAR), implemented with TensorFlow, which operates on either CPU or GPU. The key features
extracted from the frames are then passed to a Dense Neural Network (DNN), which is responsible for
classifying the detected activities. Finally, the output of the system is the classification of human
activities, such as writing, running, archery, drawing, and planting, with each activity represented in
different colored blocks. This pipeline demonstrates how video data can be processed through deep
learning models to identify and classify human actions in real-time.
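
A plain-Python sketch of this control flow, with every stage stubbed out, may help. The real system uses a SqueezeNet+SSD detector and TensorFlow models on VPU/CPU/GPU, so the `detect_people`, `smooth`, and `classify` functions here are placeholders, and the constant-gain update merely approximates the Bayesian filtering step.

```python
# Stubbed sketch of the Figure 2 pipeline: detect -> smooth -> classify.
from dataclasses import dataclass
from typing import Iterable, List, Optional, Tuple

@dataclass
class Box:
    x: float
    y: float
    w: float
    h: float

ACTIVITIES = ["writing", "running", "archery", "drawing", "planting"]

def detect_people(frame) -> List[Box]:
    # Placeholder for the SqueezeNet/SSD people detector (VPU or CPU).
    return [Box(10.0, 20.0, 50.0, 100.0)]

def smooth(prev: Optional[Box], cur: Box, gain: float = 0.5) -> Box:
    # Placeholder for the Bayesian filter: a constant-gain blend of the
    # prior estimate and the new detection, akin to a fixed-gain Kalman step.
    if prev is None:
        return cur
    return Box(*(gain * c + (1 - gain) * p
                 for p, c in zip((prev.x, prev.y, prev.w, prev.h),
                                 (cur.x, cur.y, cur.w, cur.h))))

def classify(crop) -> str:
    # Placeholder for the hybrid HAR feature extractor + dense classifier.
    return ACTIVITIES[0]

def run(frames: Iterable) -> Iterable[Tuple[str, Box]]:
    prev = None  # single-track simplification; the real system handles many people
    for frame in frames:
        for det in detect_people(frame):
            prev = smooth(prev, det)
            yield classify(frame), prev

for label, box in run([object()] * 3):
    print(label, box)
```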
Figure 3 illustrates the prediction rates for Human Activity Recognition (HAR) across five activities:
Writing, Running, Archery, Drawing, and Planting. Each activity's prediction rate is compared against
a target threshold of 90%, represented by a red dashed line. The results show that all activities exceed
the target. Writing performs well, and Running achieves the highest rate. Archery, while still above the
target, has the lowest prediction rate. Drawing closely follows Running, and Planting also surpasses the
target. Overall, the model demonstrates strong accuracy, successfully classifying all activities with
rates above the desired threshold.
Figure 4 visually represents the performance metrics of the proposed hybrid framework for human
action recognition (HAR) throughout the training process. It features two key components: accuracy
and loss over epochs. The accuracy curve reflects the model's ability to predict actions correctly,
ideally showing a steady upward trend as training progresses. In contrast, the loss curve, typically
measured by a loss function like cross-entropy, indicates the model's error; a decreasing trend signifies
improved predictions. If the loss stagnates or increases, it may suggest overfitting or the need for
hyperparameter adjustments. The x-axis denotes training epochs, while the y-axis displays
accuracy and loss values, facilitating easy comparison of trends. Overall, the training graph acts as a
diagnostic tool, helping researchers evaluate the effectiveness of the hybrid framework in improving
HAR performance and guiding further model optimizations.
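
A minimal training loop of the kind that produces such curves is sketched below; the model, data, and hyperparameters are toy placeholders rather than the authors' setup, but the per-epoch logging of cross-entropy loss and accuracy is exactly what the two curves of Figure 4 plot.

```python
# Toy training loop recording the per-epoch loss and accuracy curves.
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(512, 5))   # placeholder HAR classification head
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()
x, y = torch.randn(256, 512), torch.randint(0, 5, (256,))  # toy data

history = {"loss": [], "acc": []}
for epoch in range(10):
    opt.zero_grad()
    logits = model(x)
    loss = loss_fn(logits, y)              # decreasing loss signals better fit
    loss.backward()
    opt.step()
    acc = (logits.argmax(1) == y).float().mean().item()
    history["loss"].append(loss.item())    # plotted as the loss curve
    history["acc"].append(acc)             # plotted as the accuracy curve
    print(f"epoch {epoch}: loss={loss.item():.3f} acc={acc:.3f}")
```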
Claims
1. A hybrid framework that integrates multiple state-of-the-art architectures (RNNs, ViT, STN,
GANs, FlowNet, VSRnet, and GNNs) to enhance human action recognition (HAR) capabilities
in video processing.
2. The proposed framework achieves significantly higher accuracy in HAR compared to
conventional models, effectively capturing complex spatial and temporal dynamics within video
data.
3. Utilization of Vision Transformers (ViT) for robust spatial feature extraction, allowing the
framework to identify intricate patterns in individual video frames.
4. Integration of RNNs and Spatio-Temporal Networks (STN) to effectively model temporal
dependencies in video sequences, improving the recognition of sequential actions.
5. Incorporation of Generative Adversarial Networks (GANs) and Video Super-Resolution
Networks (VSRnet) for generating high-quality video frames, thus enhancing the overall quality
of the input data for HAR.
6. Implementation of FlowNet for optical flow estimation, providing accurate motion detection
and contributing to improved action recognition performance.
7. Application of Graph Neural Networks (GNNs) to model relational interactions between
entities in video data, aiding in the recognition of complex object behaviors.
8. The framework employs advanced data preprocessing techniques, including noise reduction
and frame normalization, to optimize video input quality (an illustrative sketch of such
preprocessing follows the claims).
9. The hybrid architecture is scalable, allowing it to adapt to varying video lengths and
resolutions while maintaining performance.
10. The framework is designed for real-time human action recognition, making it suitable for
applications in surveillance, sports analytics, and human-computer interaction.
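
As referenced in claim 8, a minimal preprocessing sketch is given below. The particular operations chosen (temporal three-frame averaging for noise reduction, per-channel standardization for frame normalization) are assumptions for illustration, not the claimed method.

```python
# Illustrative preprocessing: simple noise reduction plus normalization.
import torch

def preprocess(video: torch.Tensor) -> torch.Tensor:
    """video: (T, C, H, W) float tensor with values in [0, 255]."""
    video = video / 255.0                         # scale to [0, 1]
    # Temporal noise reduction: average each frame with its two neighbors.
    padded = torch.cat([video[:1], video, video[-1:]], dim=0)
    denoised = (padded[:-2] + padded[1:-1] + padded[2:]) / 3.0
    # Frame normalization: per-channel zero mean, unit variance.
    mean = denoised.mean(dim=(0, 2, 3), keepdim=True)
    std = denoised.std(dim=(0, 2, 3), keepdim=True).clamp_min(1e-6)
    return (denoised - mean) / std

# Usage: 16 RGB frames at 64x64.
frames = preprocess(torch.rand(16, 3, 64, 64) * 255)
```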
Documents
Name | Date |
---|---|
202441084027-Form 1-041124.pdf | 07/11/2024 |
202441084027-Form 2(Title Page)-041124.pdf | 07/11/2024 |
202441084027-Form 3-041124.pdf | 07/11/2024 |
202441084027-Form 5-041124.pdf | 07/11/2024 |