A System and Method for Gesture-Based Virtual Input Using Real-Time Hand Tracking and Recognition.


ORDINARY APPLICATION

Published

Filed on 14 November 2024

Abstract

This invention relates to a system (100) and method for gesture-based virtual input, enabling hands-free interaction with computing devices through real-time hand tracking and gesture recognition. The system includes: (1) a hand tracking module (110) that detects and tracks hand movements using a neural network model to identify key hand landmarks; (2) a gesture recognition module (120) that interprets specific gestures based on detected landmarks; and (3) an input simulation module (130) that generates virtual keyboard and mouse inputs from the recognized gestures. By utilizing OpenCV and MediaPipe, the system ensures responsiveness and accuracy across diverse conditions, including varying lighting and hand orientations. Key features include a palm detection model to locate hands in the frame and a hand landmark model trained on real and synthetic data, enhancing recognition across scenarios.

Patent Information

Application ID: 202441088037
Invention Field: COMPUTER SCIENCE
Date of Application: 14/11/2024
Publication Number: 47/2024

Inventors

Name: Dr. Kasarapu Ramani
Address: Professor, School of Computing, Mohan Babu University (Erstwhile Sree Vidyanikethan Engineering College), A. Rangampet, Tirupati-517102, INDIA
Country: India
Nationality: India

Name: Mr. Kuruba Dinesh Babu
Address: UG Scholar, Department of IT, School of Computing, Mohan Babu University (Erstwhile Sree Vidyanikethan Engineering College), A. Rangampet, Tirupati-517102, INDIA
Country: India
Nationality: India

Name: Mr. Bathala Lalith Kumar
Address: UG Scholar, Department of IT, School of Computing, Mohan Babu University (Erstwhile Sree Vidyanikethan Engineering College), A. Rangampet, Tirupati-517102, INDIA
Country: India
Nationality: India

Name: Mr. EragamReddy Badrinath Reddy
Address: UG Scholar, Department of IT, School of Computing, Mohan Babu University (Erstwhile Sree Vidyanikethan Engineering College), A. Rangampet, Tirupati-517102, INDIA
Country: India
Nationality: India

Applicants

Name: Mohan Babu University (Erstwhile Sree Vidyanikethan Engineering College)
Address: IPR Cell, Mohan Babu University (Erstwhile Sree Vidyanikethan Engineering College), Tirupati, Andhra Pradesh, India - 517102
Country: India
Nationality: India

Specification

Description: The invention is a comprehensive system and method for gesture-based virtual input using real-time hand tracking and recognition. It is designed to provide a hands-free, intuitive way to interact with computers by translating hand gestures into virtual keyboard and mouse actions. The system uses MediaPipe for hand tracking and OpenCV for video processing, aiming to replicate conventional computer interactions without the need for physical input devices.
Fig: 1 illustrates the schematic of the system for gesture-based virtual input using real-time hand tracking and recognition. The Hand Tracking Module is the foundation of the system, designed to detect and track hand movements in real time. It employs a trained neural network model capable of identifying multiple hand landmarks, such as knuckle positions and finger joints, which allow the system to analyze the hand's location, orientation, and movement in detail. By accurately tracking these landmarks, the hand tracking module establishes the basic information needed to interpret gestures effectively, providing a responsive base for further interaction with the computer.
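By way of illustration only, the following is a minimal sketch of how such a hand tracking module (110) could be realized with the OpenCV and MediaPipe libraries named in this specification; the camera index, confidence thresholds, and exit key are assumptions rather than prescribed values.

```python
# Minimal sketch of the hand tracking module (110): OpenCV captures webcam
# frames and MediaPipe Hands returns 21 landmarks per detected hand.
import cv2
import mediapipe as mp

mp_hands = mp.solutions.hands

def track_hands():
    cap = cv2.VideoCapture(0)  # default webcam (assumed)
    with mp_hands.Hands(max_num_hands=1,
                        min_detection_confidence=0.5,
                        min_tracking_confidence=0.5) as hands:
        while cap.isOpened():
            ok, frame = cap.read()
            if not ok:
                break
            # MediaPipe expects RGB input; OpenCV delivers BGR frames.
            results = hands.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
            if results.multi_hand_landmarks:
                # Each detected hand yields 21 landmarks with normalized x, y
                # and a depth value z relative to the wrist.
                landmarks = results.multi_hand_landmarks[0].landmark
                print(landmarks[mp_hands.HandLandmark.INDEX_FINGER_TIP])
            cv2.imshow("hand tracking", frame)
            if cv2.waitKey(1) & 0xFF == 27:  # Esc stops the loop (assumed)
                break
    cap.release()
    cv2.destroyAllWindows()

if __name__ == "__main__":
    track_hands()
```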
The Gesture Recognition Module is coupled with the hand tracking module and is responsible for recognizing specific hand gestures based on the data from the tracked hand landmarks. It interprets these gestures in the context of pre-defined commands, enabling the system to differentiate between various hand movements. For instance, a closed fist might be recognized as a "mouse click," while an open hand might simulate a "scroll" action. By interpreting gestures in this way, the gesture recognition module creates a fluid experience where users can control their device naturally and intuitively.
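As a simplified illustration, a gesture recognition step of this kind can be approximated by rules over the tracked landmarks. The sketch below assumes an upright hand facing the camera and the 21-landmark list produced by MediaPipe Hands; the thresholds and gesture labels are heuristics, not values taken from the specification.

```python
# Heuristic sketch of the gesture recognition module (120): count extended
# fingers from the 21-landmark list and map the result to a gesture label.
import mediapipe as mp

HandLandmark = mp.solutions.hands.HandLandmark

FINGERS = [
    (HandLandmark.INDEX_FINGER_TIP, HandLandmark.INDEX_FINGER_PIP),
    (HandLandmark.MIDDLE_FINGER_TIP, HandLandmark.MIDDLE_FINGER_PIP),
    (HandLandmark.RING_FINGER_TIP, HandLandmark.RING_FINGER_PIP),
    (HandLandmark.PINKY_TIP, HandLandmark.PINKY_PIP),
]

def classify_gesture(landmarks):
    """Return 'open_hand' or 'fist' from normalized hand landmarks."""
    # Image y grows downward, so for an upright hand an extended finger
    # has its tip above (smaller y than) its PIP joint.
    extended = sum(1 for tip, pip in FINGERS
                   if landmarks[tip].y < landmarks[pip].y)
    return "open_hand" if extended >= 3 else "fist"
```

In practice, such a function would be called with the landmark list obtained from the tracking step, for example `results.multi_hand_landmarks[0].landmark`.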
The Input Simulation Module generates virtual keyboard and mouse inputs based on the recognized gestures. For example, hand movements corresponding to common computer interactions, such as dragging, clicking, or typing, are mapped to virtual inputs that the device understands. This input simulation allows the user to perform typical computer functions (navigating, selecting, and interacting with on-screen elements) using only hand gestures. The real-time mapping of gestures to actions creates a seamless, hands-free interaction experience.
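A minimal sketch of such an input simulation step follows. The specification does not name a library for synthesizing operating-system inputs; PyAutoGUI is used here only as an assumed example, and the gesture labels are hypothetical.

```python
# Sketch of the input simulation module (130): map a recognized gesture
# label to a synthetic mouse or keyboard action.
import pyautogui

def simulate_input(gesture, cursor_xy=None):
    if gesture == "point" and cursor_xy is not None:
        pyautogui.moveTo(*cursor_xy)   # move the virtual mouse cursor
    elif gesture == "fist":
        pyautogui.click()              # treat a closed fist as a mouse click
    elif gesture == "open_hand":
        pyautogui.scroll(-100)         # treat an open hand as a scroll action
```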
The hand tracking module includes a specialized model for palm detection, which helps identify the initial hand position within an image frame. This model is designed to work in a variety of scenarios, detecting hands regardless of their orientation, scale, or degree of visibility. It can also identify partially visible hands, ensuring that the system functions robustly even in less-than-ideal conditions. This versatility makes it suitable for use across different environments.
Another critical aspect of the hand tracking module is the hand landmark model, which localizes a predefined set of 3D hand-knuckle coordinates. This model has been extensively trained on annotated real-world images and synthetic models to provide consistent accuracy in hand tracking, even under diverse conditions, including different lighting, angles, and hand positions. By recognizing 3D hand landmarks, this model allows for more precise tracking and, consequently, more reliable gesture interpretation.
The gesture recognition module uses a predefined set of gestures mapped to specific computer commands, such as mouse clicks, drag-and-drop actions, or keyboard inputs. These gestures allow users to perform complex actions on the device using natural hand movements. This predefined set enables the system to anticipate and recognize user intentions more accurately and facilitates an intuitive interaction experience.
To enhance the system's responsiveness, a video processing unit powered by OpenCV captures and processes hand movements. OpenCV, known for its computer vision capabilities, helps optimize gesture capture and interpretation. By reducing processing delays and enhancing accuracy in interpreting complex movements, OpenCV ensures that the system can handle dynamic gestures effectively without lag.
The input simulation module not only interprets gestures but also generates real-time virtual inputs, such as mouse movements, clicks, and keyboard entries. Each detected gesture is mapped to a specific action, allowing the user to navigate and interact with the interface without needing to touch any physical input devices. This real-time simulation of inputs enables a smooth and immersive experience where users can control the interface solely with hand gestures.
The system aims to transform human-computer interaction by creating an intuitive, immersive experience that adapts to modern interface demands. This system's foundation lies in a Palm Detection Model and a Hand Landmark Model, each optimized for mobile, real-time applications.
A. Palm Detection Model: This model operates similarly to the face detection model in MediaPipe Face Mesh, using a single-shot detector to identify initial hand locations within an image frame. The model is built to detect hands across a wide range of scales (up to 20x relative to the frame) and can identify partially occluded or self-occluded hands. Unlike facial features, hands lack high-contrast landmarks, making detection more challenging. To increase accuracy, the model incorporates additional context, such as the limb, body, or other associated features.
B. Hand Landmark Model: Following palm detection, the Hand Landmark Model uses regression, or direct coordinate prediction, to pinpoint 21 3D hand-knuckle coordinates within the identified hand area. This model, trained on 30,000 real-world images annotated with 3D coordinates, reliably identifies hand poses, even when hands are partially visible. For broader hand pose coverage, high-fidelity synthetic models are used, ensuring detailed understanding of hand geometry in diverse contexts.
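For reference, MediaPipe exposes these 21 hand-knuckle points in a fixed index order (the wrist first, then four joints per finger from base to tip); the short snippet below simply prints that mapping.

```python
# Print the fixed index order MediaPipe assigns to the 21 hand landmarks,
# e.g. 0 WRIST, 4 THUMB_TIP, 8 INDEX_FINGER_TIP, 20 PINKY_TIP.
import mediapipe as mp

for landmark in mp.solutions.hands.HandLandmark:
    print(landmark.value, landmark.name)
```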
This hand landmark identification draws from methods used in popular landmark-detection applications, like Google Maps, which identifies iconic landmarks for location-based computer vision tasks. Similar principles are applied here to recognize hands in augmented reality, enabling functions such as hand or face recognition, pose estimation, and even finger-specific operations. The model, implemented in Python using MediaPipe, allows for accurate gesture detection that maps to specific operations, creating a versatile platform for gesture-based interaction.
Fig: 2 illustrates a method for enabling gesture-based virtual input using real-time hand tracking and recognition. The method for enabling gesture-based virtual input leverages real-time hand tracking and recognition to create a touchless control system for computing devices. The method is structured in four key steps that combine advanced computer vision and machine learning techniques to deliver a responsive, gesture-driven interface.
The process begins by detecting a hand in a video stream using a trained palm detection model. This model is specifically optimized to locate hand regions within an image in real-time, allowing for accurate and efficient detection. By focusing on the palm region, which is relatively stable and distinguishable, the model achieves reliable hand detection across various conditions, including different hand sizes, orientations, and lighting scenarios. This initial detection step is essential for isolating the hand from other background elements, enabling a clean starting point for further landmark localization.
Once the hand region is detected, the method proceeds to localize specific hand landmarks within this region. A neural network-based hand landmark model is employed here, which identifies critical points, or landmarks, on the hand, such as finger joints and knuckles. By mapping out these landmarks in the detected hand region, the model creates a detailed skeletal representation of the hand in 3D space. This enables precise tracking of each part of the hand, making it possible to capture complex hand poses and movements even as the hand moves or rotates. This stage of localization is crucial for accurately interpreting hand gestures.
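Because such landmark models typically report coordinates normalized to the frame (as MediaPipe does), building a pixel-level skeletal representation usually requires a small conversion step, sketched below under that assumption.

```python
# Convert normalized landmark coordinates (x, y in [0, 1], z relative to the
# wrist) into pixel positions for a frame of the given size.
def to_pixel_coords(landmarks, frame_width, frame_height):
    return [(int(lm.x * frame_width), int(lm.y * frame_height), lm.z)
            for lm in landmarks]
```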
Following landmark localization, the method moves to recognizing specific hand gestures based on the spatial arrangement of these detected landmarks. The model is trained to understand various hand gestures by analyzing the relative positions and orientations of landmarks in real-time. For example, specific landmark configurations can represent gestures like an open hand, a closed fist, or a pointing finger. By identifying these spatial configurations, the system can distinguish between different gestures with high accuracy, allowing it to interpret the user's intended actions effectively.
Finally, the method generates virtual input commands that correspond to the recognized gestures. These commands can be used to control a computing device without requiring physical touch, creating an intuitive, touchless interface. For example, a pointing gesture could move a cursor, while a fist gesture could simulate a click or selection. This stage of virtual command generation transforms the recognized gestures into actionable inputs, enabling users to interact with digital environments seamlessly and naturally. Through this combination of detection, localization, recognition, and command generation, the method offers a sophisticated solution for gesture-based control in real-time applications.
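Purely as an illustration of the four steps working together, the sketch below detects a hand and localizes its landmarks with MediaPipe, applies a heuristic fist/point classification, and emits virtual inputs through PyAutoGUI; the library choice, gesture rules, and thresholds are assumptions, and debouncing and cursor smoothing are omitted.

```python
# End-to-end sketch of method (200): (a) detect the hand, (b) localize
# landmarks, (c) recognize a gesture, (d) generate a virtual input command.
import cv2
import mediapipe as mp
import pyautogui  # assumed input-synthesis library, not named in the specification

mp_hands = mp.solutions.hands
HL = mp_hands.HandLandmark
SCREEN_W, SCREEN_H = pyautogui.size()

def is_fist(lm):
    # A fist: the four non-thumb fingertips sit below their PIP joints.
    pairs = [(HL.INDEX_FINGER_TIP, HL.INDEX_FINGER_PIP),
             (HL.MIDDLE_FINGER_TIP, HL.MIDDLE_FINGER_PIP),
             (HL.RING_FINGER_TIP, HL.RING_FINGER_PIP),
             (HL.PINKY_TIP, HL.PINKY_PIP)]
    return all(lm[t].y > lm[p].y for t, p in pairs)

cap = cv2.VideoCapture(0)
with mp_hands.Hands(max_num_hands=1, min_detection_confidence=0.5) as hands:
    while cap.isOpened():
        ok, frame = cap.read()
        if not ok:
            break
        # Steps (a) and (b): palm detection and landmark localization.
        results = hands.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
        if results.multi_hand_landmarks:
            lm = results.multi_hand_landmarks[0].landmark
            if is_fist(lm):                       # step (c): closed fist
                pyautogui.click()                 # step (d): click (no debouncing)
            else:                                 # step (c): pointing hand
                tip = lm[HL.INDEX_FINGER_TIP]
                pyautogui.moveTo(int(tip.x * SCREEN_W),   # step (d): move cursor
                                 int(tip.y * SCREEN_H))
        cv2.imshow("gesture input", frame)
        if cv2.waitKey(1) & 0xFF == 27:           # Esc to quit
            break
cap.release()
cv2.destroyAllWindows()
```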
Fig: 2 illustrates the architecture diagram for the system design, describing the overall workflow. The proposed landmarking algorithm, which forms the basis of the Virtual Input Hub, is designed to enable real-time hand tracking and gesture recognition, providing a hands-free method for interacting with computers. The algorithm consists of two core components: the Palm Detection Model and the Hand Landmark Model. These two models work together to accurately track hand positions and predict 3D coordinates for key hand landmarks, which are then used to recognize gestures and translate them into meaningful actions on a computer.
The Palm Detection Model is the first step in the hand tracking process. Similar to the face detection model in MediaPipe Face Mesh, this model uses a single-shot detector optimized for real-time performance on mobile devices. The model addresses several challenges inherent in hand detection, including varying hand sizes, occlusions (when part of the hand is hidden), and self-occlusions (when parts of the hand obstruct each other). To improve accuracy, the model uses additional contextual information, such as the presence of limbs, body, or the person, and operates over a broad scale to detect hands at various distances from the camera. This initial detection of the hand sets the stage for more precise hand tracking by the subsequent model.
Once the palm is detected, the Hand Landmark Model takes over to precisely locate 21 hand-knuckle points in 3D space. These key points correspond to joints in the fingers, palm, and wrist. The model uses direct coordinate prediction to calculate the exact positions of these landmarks within the detected hand region. This approach is highly efficient and ensures that hand landmarks can be predicted with high accuracy in real time. The model is also robust to self-occlusions and partial visibility of the hand, which is a common issue in hand tracking systems. To achieve this, the Hand Landmark Model is trained on a comprehensive dataset that includes both real-world images (manually annotated) and synthetic hand models rendered across various backgrounds. This diverse training data helps the model generalize well across different hand poses and environments.
After the hand landmarks are detected, the system moves on to gesture recognition. The algorithm incorporates Google's Landmark Detection system, which is widely used in applications like Google Maps, to identify specific landmarks that correspond to hand gestures. The 21 hand landmarks are analyzed to identify particular hand gestures based on the relative positions of the key points. These gestures are then mapped to predefined actions, such as virtual mouse movements or keyboard inputs, enabling hands-free interaction with the computer.
The system's overall workflow begins with image acquisition, where an HD webcam captures real-time images that are then processed in MATLAB 2012. The images are resized, converted into the YCbCr color space, and then binarized using Otsu's method to isolate the hand region. The next step is pre-processing, which involves repairing defects, cleaning up the binary image, and ensuring that the detected hand is the only object in the image. Finally, finger detection is carried out, where the system identifies the centroid of the hand, eliminates the wrist and palm, and categorizes the fingers based on the distance between their centroids. These steps allow the system to accurately recognize and track the fingers in real time.
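Although this acquisition workflow is described in MATLAB terms, a rough OpenCV analog of the resize, YCbCr conversion, and Otsu binarization steps would look like the sketch below; the choice of the Cr channel and the morphological clean-up kernel are assumptions for illustration.

```python
# Rough OpenCV analog of the described acquisition pipeline: resize the frame,
# convert to YCbCr (YCrCb channel order in OpenCV), Otsu-binarize to isolate
# the hand, then repair defects in the binary mask.
import cv2
import numpy as np

def segment_hand(frame_bgr):
    frame = cv2.resize(frame_bgr, (320, 240))            # assumed working size
    ycrcb = cv2.cvtColor(frame, cv2.COLOR_BGR2YCrCb)
    cr = ycrcb[:, :, 1]                                   # Cr channel responds well to skin tones
    _, mask = cv2.threshold(cr, 0, 255,
                            cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    kernel = np.ones((5, 5), np.uint8)                    # assumed kernel size
    mask = cv2.morphologyEx(mask, cv2.MORPH_OPEN, kernel)   # remove speckle
    mask = cv2.morphologyEx(mask, cv2.MORPH_CLOSE, kernel)  # fill small holes
    return mask
```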
The architecture of the Virtual Input Hub is designed to be modular and adaptable. It integrates seamlessly with the MediaPipe library and the Python programming language, ensuring compatibility with a wide range of applications. The system's flexibility allows it to be expanded to support additional functionality, such as interactive presentations or accessibility solutions, and makes it a versatile tool for hands-free computer interaction. The integration of the palm detection and hand landmark models ensures that the system can accurately track hand positions and gestures, allowing users to control their computers through natural hand movements.

The performance of the system has been promising. The Palm Detection Model effectively handles hands of different sizes and orientations, even in the presence of occlusions. Once the hand is detected, the Hand Landmark Model accurately predicts the 3D coordinates for all 21 key points, even when parts of the hand are hidden. Gesture recognition works reliably, with the system able to interpret complex hand movements and map them to appropriate actions.
Claims:

We claim:
1. A system (100) and method for gesture-based virtual input using real-time hand tracking and recognition, the system comprising:
a) a hand tracking module (110) configured to detect and track hand movements in real-time by identifying a plurality of hand landmarks using a trained neural network model;
b) a gesture recognition module (120) coupled to the hand tracking module to interpret specific hand gestures based on detected hand landmarks, and
c) an input simulation module (130) configured to generate virtual keyboard and mouse inputs based on interpreted gestures, enabling hands-free control of a computing device.
2. The system of claim 1, wherein the hand tracking module includes a palm detection model to identify initial hand locations within an image frame, with the model designed to detect hands at various scales and orientations, including partially occluded hands.
3. The system of claim 1, wherein the hand tracking module employs a hand landmark model that localizes a set of predefined 3D hand-knuckle coordinates, wherein said model is trained on annotated real-world images and synthetic hand models to enhance accuracy under diverse conditions and viewpoints.
4. The system of claim 1, wherein the gesture recognition module uses a set of predefined hand gestures that correspond to specific computer commands, such as mouse clicks, drag actions, and keyboard inputs, allowing the user to perform computer operations by performing natural hand movements.
5. The system of claim 1, further comprising a video processing unit that utilizes OpenCV to enhance the capture and processing of hand movements and gestures, allowing for efficient interpretation of complex hand movements and maintaining system responsiveness.
6. The system of claim 1, wherein the input simulation module is configured to generate real-time mouse movements, clicks, and keyboard inputs by mapping detected hand gestures to corresponding computer actions, enabling the user to navigate and interact with the computer interface through gestures alone.
7. The system of claim 1, wherein the system utilizes MediaPipe to implement the hand tracking module, providing resilience to variations in lighting, hand orientation, and occlusion, thus ensuring robust gesture detection and recognition in diverse environmental conditions.
9. A method (200) for enabling gesture-based virtual input using real-time hand tracking and recognition, the method comprising:
a) detecting a hand in a video stream by applying a trained palm detection model to identify hand regions within an image;
b) localizing hand landmarks within the detected hand regions using a neural network-based hand landmark model;
c) recognizing specific hand gestures based on the spatial configuration of detected hand landmarks, and
d) generating virtual input commands corresponding to recognized gestures for controlling a computing device without physical touch.

Documents

All of the following documents were filed on 14/11/2024:

202441088037-COMPLETE SPECIFICATION [14-11-2024(online)].pdf
202441088037-DECLARATION OF INVENTORSHIP (FORM 5) [14-11-2024(online)].pdf
202441088037-DRAWINGS [14-11-2024(online)].pdf
202441088037-EVIDENCE FOR REGISTRATION UNDER SSI(FORM-28) [14-11-2024(online)].pdf
202441088037-FORM 1 [14-11-2024(online)].pdf
202441088037-FORM FOR SMALL ENTITY [14-11-2024(online)].pdf
202441088037-FORM FOR SMALL ENTITY(FORM-28) [14-11-2024(online)].pdf
202441088037-FORM-9 [14-11-2024(online)].pdf
202441088037-REQUEST FOR EARLY PUBLICATION(FORM-9) [14-11-2024(online)].pdf
