ENHANCED REAL-TIME ACTION RECOGNITION ON EDGE DEVICES USING HYBRID DEEP LEARNING ARCHITECTURES
ORDINARY APPLICATION
Published
Filed on 4 November 2024
Abstract
The rapid growth of video data across various domains necessitates the development of advanced techniques for efficient human action recognition (HAR). This research presents a novel hybrid framework that integrates state-of-the-art architectures, including Recurrent Neural Networks (RNN), Vision Transformers (ViT), Spatio-Temporal Networks (STN), Generative Adversarial Networks (GAN), FlowNet, Video Super-Resolution Networks (VSRnet), and Graph Neural Networks (GNN). The framework optimizes HAR by leveraging the unique strengths of each architecture. By effectively capturing temporal correlations across video frames and intricate spatial patterns within them, the proposed design enhances recognition performance. ViT excels in spatial feature extraction, while RNNs and STNs address the temporal dynamics. GANs and VSRnet improve video resolution and frame quality, FlowNet facilitates optical flow estimation, and GNNs enhance relational modeling between entities. Through extensive evaluations, the hybrid model demonstrates significantly higher accuracy in HAR compared to existing solutions. This work contributes valuable insights into video analysis, showcasing the practical benefits of hybrid architectures in overcoming the limitations of traditional models and paving the way for advancements in real-time HAR applications.
Patent Information
Field | Value |
---|---|
Application ID | 202441084027 |
Invention Field | COMPUTER SCIENCE |
Date of Application | 04/11/2024 |
Publication Number | 46/2024 |
Inventors
Name | Address | Country | Nationality |
---|---|---|---|
Dr. J. Arunnehru | DR. J. ARUNNEHRU C BLOCK BNO 1 100 FEET RD VADAPALANI CHENNAI TAMILNADU 600026 9629427700 arunnehruj@gmail.com | India | India |
Kamlesh J.V.K. | C BLOCK BNO 1 100 FEET RD VADAPALANI CHENNAI TAMILNADU 600026 | India | India |
Ayush | C BLOCK BNO 1 100 FEET RD VADAPALANI CHENNAI TAMILNADU 600026 | India | India |
Suhail Ahmed I | C BLOCK BNO 1 100 FEET RD VADAPALANI CHENNAI TAMILNADU 600026 | India | India |
Applicants
Name | Address | Country | Nationality |
---|---|---|---|
Dr. J. Arunnehru | DR. J. ARUNNEHRU C BLOCK BNO 1 100 FEET RD VADAPALANI CHENNAI TAMILNADU 600026 9629427700 arunnehruj@gmail.com | India | India |
Kamlesh J.V.K. | C BLOCK BNO 1 100 FEET RD VADAPALANI CHENNAI TAMILNADU 600026 | India | India |
Ayush | C BLOCK BNO 1 100 FEET RD VADAPALANI CHENNAI TAMILNADU 600026 | India | India |
Suhail Ahmed I | C BLOCK BNO 1 100 FEET RD VADAPALANI CHENNAI TAMILNADU 600026 | India | India |
Specification
1. Background Of The Present Innovation
1.1. Field of Innovation
1) The development relates to advancements in video processing and computer vision.
2) Specifically focuses on human action recognition (HAR).
3) Involves the development of advanced hybrid frameworks integrating various state-of-the-art
machine learning architectures.
1.2. Description of Related Arts
1) Human action recognition (HAR) has gained significant attention due to the increasing
availability of video data in diverse applications, such as:
i) Surveillance
ii) Sports analytics
iii) Human-computer interaction
2) Traditional HAR approaches often rely on:
i) Handcrafted features
ii) Shallow learning models
3) Convolutional Neural Networks (CNNs) have revolutionized spatial feature extraction in
images and videos, effectively capturing local patterns but lacking temporal context when
applied alone to HAR tasks.
4) Recurrent Neural Networks (RNNs) are tailored for modeling sequential data and have been
applied in Human Action Recognition (HAR) to capture temporal patterns. However, they
often face challenges when it comes to capturing long-range dependencies effectively.
5) Long Short-Term Memory (LSTM) networks, a specialized form of RNN, overcome the
vanishing-gradient issue, making them well-suited for handling long sequences in video data
(a minimal sketch of this approach follows this list).
6) Two-stream architectures process spatial and temporal information separately, often
combining RGB frames with optical flow, yet they can be computationally expensive.
7) 3D CNNs extend traditional CNNs by incorporating temporal dimensions, enabling direct
spatiotemporal feature extraction, but they require large amounts of labeled data for training.
8) Generative Adversarial Networks (GANs) have been utilized to generate high-quality video
frames and enhance resolution, yet their integration into HAR tasks remains limited.
9) Video Super-Resolution Networks (VSRnets) improve video quality by upscaling
low-resolution frames, contributing to better feature representation for HAR but lacking
direct application in recognition tasks.
10) FlowNet estimates optical flow between frames to capture motion information, providing
insights into dynamic changes but often requiring additional integration for HAR.
11) Graph Neural Networks (GNNs) model relational interactions among entities in video,
enhancing the understanding of complex behaviors, but they are not yet widely adopted in
HAR frameworks.
12) Attention mechanisms enhance models by focusing on the most relevant areas of video
frames, improving feature extraction. However, their application in HAR is still in the early
stages of development.
13) Temporal Convolutional Networks (TCNs) offer an alternative to RNNs for sequence
modeling, with benefits like better parallelization. However, their potential has not yet been
fully explored in the context of HAR.
14) Combining multiple models or techniques can improve HAR accuracy, but the complexity
and computational cost increase significantly.
15) Transfer learning enables the use of pre-trained models on new HAR datasets, enhancing
performance but often relying on the availability of substantial labeled data.
16) Self-supervised techniques have emerged as a way to leverage unlabeled data for training,
though their direct application in HAR is still under exploration.
17) Integrating multiple data modalities (e.g., audio and video) can improve recognition
accuracy, but such methods require complex data synchronization.
18) Efficient algorithms are essential for real-time HAR applications, yet many existing models
trade off accuracy for speed.
19) Various augmentation methods are used to increase the diversity of training data, helping
models generalize better, but they can introduce noise if not applied judiciously.
20) Standard evaluation metrics for HAR include accuracy, precision, recall, and F1-score;
however, the choice of metrics can significantly influence perceived model performance.
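
As a concrete companion to items 4 and 5 above, the sketch below shows a minimal LSTM classifier over per-frame feature vectors, the standard way RNN-family models capture temporal patterns for HAR. The feature dimension, hidden size, and class count are illustrative assumptions, not values from the specification.

```python
# Minimal sketch (not the specification's model): an LSTM classifier over
# per-frame feature vectors, illustrating the temporal-modeling approach
# of items 4-5. All dimensions are illustrative assumptions.
import torch
import torch.nn as nn

class LSTMActionClassifier(nn.Module):
    def __init__(self, feat_dim=512, hidden_dim=256, num_classes=5):
        super().__init__()
        # LSTM mitigates the vanishing-gradient issue of plain RNNs,
        # letting the model retain longer-range temporal dependencies.
        self.lstm = nn.LSTM(feat_dim, hidden_dim, batch_first=True)
        self.head = nn.Linear(hidden_dim, num_classes)

    def forward(self, x):            # x: (batch, frames, feat_dim)
        _, (h_n, _) = self.lstm(x)   # h_n: (1, batch, hidden_dim)
        return self.head(h_n[-1])    # logits: (batch, num_classes)

# Usage: 8 clips of 16 frames, each frame as a 512-d feature vector.
logits = LSTMActionClassifier()(torch.randn(8, 16, 512))
```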
2. Brief Description Of The Analysis
Figure 1 provides an overview of the proposed hybrid framework for video processing, which enhances
human action recognition (HAR) by integrating multiple advanced architectures. The process begins with
raw video input, which undergoes preprocessing before spatial feature extraction using Vision
Transformers (ViT) and temporal feature extraction through Recurrent Neural Networks (RNNs) and
Spatio-Temporal Networks (STNs). FlowNet is employed for precise motion estimation, while
Generative Adversarial Networks (GANs) and Video Super-Resolution Networks (VSRnet) improve
video quality by generating high-resolution frames. Additionally, Graph Neural Networks (GNNs) model
relational interactions, aiding in the understanding of complex behaviors. The framework culminates in
an action recognition module that analyzes these features to produce action predictions, optimizing
performance in video analysis tasks.
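
The data flow of Figure 1 can be made concrete with a structural sketch in which every stage is reduced to a stub module. This is an assumption-laden skeleton, not the patented implementation: `nn.Identity`, `nn.LazyLinear`, and a GRU stand in for the GAN/VSRnet enhancement, ViT, FlowNet, GNN, and RNN/STN stages, and fusion by concatenation is an illustrative choice.

```python
# Structural sketch of the Figure 1 pipeline; every stage is a stub so the
# end-to-end data flow is visible. Shapes and fusion are assumptions.
import torch
import torch.nn as nn

class HybridHARPipeline(nn.Module):
    def __init__(self, num_classes=5, d=128):
        super().__init__()
        self.enhance = nn.Identity()        # stands in for GAN/VSRnet frame enhancement
        self.spatial = nn.LazyLinear(d)     # stands in for ViT per-frame spatial features
        self.temporal = nn.GRU(d, d, batch_first=True)  # stands in for RNN/STN modeling
        self.motion = nn.LazyLinear(d)      # stands in for FlowNet motion features
        self.relational = nn.LazyLinear(d)  # stands in for GNN relational features
        self.classifier = nn.Linear(3 * d, num_classes)

    def forward(self, video):               # video: (B, T, C, H, W)
        frames = self.enhance(video).flatten(2)   # (B, T, C*H*W)
        spat = self.spatial(frames)               # (B, T, d)
        _, h = self.temporal(spat)                # temporal summary: (1, B, d)
        # Motion features from consecutive-frame differences (FlowNet stand-in).
        mot = self.motion((frames[:, 1:] - frames[:, :-1]).mean(1))  # (B, d)
        rel = self.relational(frames.mean(1))     # (B, d)
        fused = torch.cat([h[-1], mot, rel], dim=-1)
        return self.classifier(fused)             # (B, num_classes)

# Usage: 2 clips of 8 RGB frames at 32x32.
logits = HybridHARPipeline()(torch.randn(2, 8, 3, 32, 32))
```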
Figure 2 represents a pipeline for recognizing human activities from video input using machine
learning and computer vision techniques. The process begins with a video feed that is analyzed by a
"People Detector" module, which utilizes a SqueezeNet backbone and a Single Shot Detector (SSD) for detecting individuals within the frames. The detection is powered by TensorFiow and can be run on
either VPU or CPU hardware. Once people are detected, the bounding boxes around them are
processed using Bayesian filters, which run on the CPU to improve detection accuracy by smoothing
predictions. Next, the processed data is sent to a Hybrid Architecture for Human Activity Recognition
(HAR), implemented with TensorFlow, which operates on either CPU or GPU. The key features
extracted from the frames are then passed to a Dense Neural Network (DNN), which is responsible for
classifying the detected activities. Finally, the output of the system is the classification of human
activities, such as writing, running, archery, drawing, and planting, with each activity represented in
different colored blocks. This pipeline demonstrates how video data can be processed through deep
learning models to identify and classify human actions in real-time.
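
A plain-Python sketch of this control flow, with every stage stubbed out, may help. The real system uses a SqueezeNet+SSD detector and TensorFlow models on VPU/CPU/GPU, so the `detect_people`, `smooth`, and `classify` functions here are placeholders, and the constant-gain update merely approximates the Bayesian filtering step.

```python
# Stubbed sketch of the Figure 2 pipeline: detect -> smooth -> classify.
from dataclasses import dataclass
from typing import Iterable, List, Optional, Tuple

@dataclass
class Box:
    x: float
    y: float
    w: float
    h: float

ACTIVITIES = ["writing", "running", "archery", "drawing", "planting"]

def detect_people(frame) -> List[Box]:
    # Placeholder for the SqueezeNet/SSD people detector (VPU or CPU).
    return [Box(10.0, 20.0, 50.0, 100.0)]

def smooth(prev: Optional[Box], cur: Box, gain: float = 0.5) -> Box:
    # Placeholder for the Bayesian filter: a constant-gain blend of the
    # prior estimate and the new detection, akin to a fixed-gain Kalman step.
    if prev is None:
        return cur
    return Box(*(gain * c + (1 - gain) * p
                 for p, c in zip((prev.x, prev.y, prev.w, prev.h),
                                 (cur.x, cur.y, cur.w, cur.h))))

def classify(crop) -> str:
    # Placeholder for the hybrid HAR feature extractor + dense classifier.
    return ACTIVITIES[0]

def run(frames: Iterable) -> Iterable[Tuple[str, Box]]:
    prev = None  # single-track simplification; the real system handles many people
    for frame in frames:
        for det in detect_people(frame):
            prev = smooth(prev, det)
            yield classify(frame), prev

for label, box in run([object()] * 3):
    print(label, box)
```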
Figure 3 illustrates the prediction rates for Human Activity Recognition (HAR) across five activities:
Writing, Running, Archery, Drawing, and Planting. Each activity's prediction rate is compared against
a target threshold of 90%, represented by a red dashed line. The results show that all activities exceed
the target. Writing performs well, and Running achieves the highest rate. Archery, while still above the
target, has the lowest prediction rate. Drawing closely follows Running, and Planting also surpasses the
target. Overall, the model demonstrates strong accuracy, successfully classifying all activities with
rates above the desired threshold.
Figure 4 visually represents the performance metrics of the proposed hybrid framework for human
action recognition (HAR) throughout the training process. It features two key components: accuracy
and loss over epochs. The accuracy curve reflects the model's ability to predict actions correctly,
ideally showing a steady upward trend as training progresses. In contrast, the loss curve, typically
measured by a loss function like cross-entropy, indicates the model's error; a decreasing trend signifies
improved predictions. If the loss stagnates or increases, it may suggest overfitting or the need for
hyperparameter adjustments. The x-axis denotes training epochs, while the y-axis displays
accuracy and loss values, facilitating easy comparison of trends. Overall, the training graph acts as a
diagnostic tool, helping researchers evaluate the effectiveness of the hybrid framework in improving
HAR performance and guiding further model optimizations.
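
A minimal training loop of the kind that produces such curves is sketched below; the model, data, and hyperparameters are toy placeholders rather than the authors' setup, but the per-epoch logging of cross-entropy loss and accuracy is exactly what the two curves of Figure 4 plot.

```python
# Toy training loop recording the per-epoch loss and accuracy curves.
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(512, 5))   # placeholder HAR classification head
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()
x, y = torch.randn(256, 512), torch.randint(0, 5, (256,))  # toy data

history = {"loss": [], "acc": []}
for epoch in range(10):
    opt.zero_grad()
    logits = model(x)
    loss = loss_fn(logits, y)              # decreasing loss signals better fit
    loss.backward()
    opt.step()
    acc = (logits.argmax(1) == y).float().mean().item()
    history["loss"].append(loss.item())    # plotted as the loss curve
    history["acc"].append(acc)             # plotted as the accuracy curve
    print(f"epoch {epoch}: loss={loss.item():.3f} acc={acc:.3f}")
```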
Claims
1. A hybrid framework that integrates multiple state-of-the-art architectures (RNNs, ViT, STN,
GANs, FlowNet, VSRnet, and GNNs) to enhance human action recognition (HAR) capabilities
in video processing.
2. The proposed framework achieves significantly higher accuracy in HAR compared to
conventional models, effectively capturing complex spatial and temporal dynamics within video
data.
3. Utilization of Vision Transformers (ViT) for robust spatial feature extraction, allowing the
framework to identify intricate patterns in individual video frames.
4. Integration of RNNs and Spatio-Temporal Networks (STN) to effectively model temporal
dependencies in video sequences, improving the recognition of sequential actions.
5. Incorporation of Generative Adversarial Networks (GANs) and Video Super-Resolution
Networks (VSRnet) for generating high-quality video frames, thus enhancing the overall quality
of the input data for HAR.
6. Implementation of FlowNet for optical flow estimation, providing accurate motion detection
and contributing to improved action recognition performance.
7. Application of Graph Neural Networks (GNNs) to model relational interactions between
entities in video data, aiding in the recognition of complex object behaviors.
8. The framework employs advanced data preprocessing techniques, including noise reduction
and frame normalization, to optimize video input quality (an illustrative sketch of such
preprocessing follows the claims).
9. The hybrid architecture is scalable, allowing it to adapt to varying video lengths and
resolutions while maintaining performance.
10. The framework is designed for real-time human action recognition, making it suitable for
applications in surveillance, sports analytics, and human-computer interaction.
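
As referenced in claim 8, a minimal preprocessing sketch is given below. The particular operations chosen (temporal three-frame averaging for noise reduction, per-channel standardization for frame normalization) are assumptions for illustration, not the claimed method.

```python
# Illustrative preprocessing: simple noise reduction plus normalization.
import torch

def preprocess(video: torch.Tensor) -> torch.Tensor:
    """video: (T, C, H, W) float tensor with values in [0, 255]."""
    video = video / 255.0                         # scale to [0, 1]
    # Temporal noise reduction: average each frame with its two neighbors.
    padded = torch.cat([video[:1], video, video[-1:]], dim=0)
    denoised = (padded[:-2] + padded[1:-1] + padded[2:]) / 3.0
    # Frame normalization: per-channel zero mean, unit variance.
    mean = denoised.mean(dim=(0, 2, 3), keepdim=True)
    std = denoised.std(dim=(0, 2, 3), keepdim=True).clamp_min(1e-6)
    return (denoised - mean) / std

# Usage: 16 RGB frames at 64x64.
frames = preprocess(torch.rand(16, 3, 64, 64) * 255)
```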
Documents
Name | Date |
---|---|
202441084027-Form 1-041124.pdf | 07/11/2024 |
202441084027-Form 2(Title Page)-041124.pdf | 07/11/2024 |
202441084027-Form 3-041124.pdf | 07/11/2024 |
202441084027-Form 5-041124.pdf | 07/11/2024 |