HAND GESTURE RECOGNITION AND SPEECH SYNTHESIS FOR THE IMPAIRED LEVERAGING DEEP NEURAL NETWORKS AND MOTION TRIGGER FILTERING
ORDINARY APPLICATION
Published
Filed on 20 November 2024
Abstract
This invention focuses on facilitating phone calls between a person with sensory disabilities (deaf-mutes) and a person without such disabilities. Speech-to-text technology converts the speech of the hearing person into text for the special person, while the text input from the special person is transformed into speech for the hearing individual (106). The other mode enables video functionality exclusively, where the camera (101) captures the sign language of the individual with sensory disabilities; the gestures are recognized using Google's MediaPipe (104) and converted into understandable text for the hearing person, who therefore does not need to view the video. The model is trained on different types of sign languages (102) to improve its efficiency. Thus, hand gesture recognition (105) and voice conversion are used to bridge the communication gap between deaf-mutes and others.
Patent Information
Field | Value |
---|---|
Application ID | 202441090030 |
Invention Field | ELECTRONICS |
Date of Application | 20/11/2024 |
Publication Number | 48/2024 |
Inventors
Name | Address | Country | Nationality |
---|---|---|---|
Dr.V.VIDHYA | Department of Artificial Intelligence and Data science, Easwari Engineering College, Bharathi Salai, Ramapuram, Chennai-600089. | India | India |
G.KRISHNA PRIYA | Department of Artificial Intelligence and Data science, Easwari Engineering College, Bharathi Salai, Ramapuram, Chennai-600089. | India | India |
D.KARTHIKA PRIYA | Department of Artificial Intelligence and Data science, Easwari Engineering College, Bharathi Salai, Ramapuram, Chennai-600089. | India | India |
HARINI. J | Department of Artificial Intelligence and Data science, Easwari Engineering College, Bharathi Salai, Ramapuram, Chennai-600089. | India | India |
KAVIN A S | Department of Artificial Intelligence and Data science, Easwari Engineering College, Bharathi Salai, Ramapuram, Chennai-600089. | India | India |
LAKSHETHA S | Department of Artificial Intelligence and Data science, Easwari Engineering College, Bharathi Salai, Ramapuram, Chennai-600089. | India | India |
Applicants
Name | Address | Country | Nationality |
---|---|---|---|
EASWARI ENGINEERING COLLEGE | 162, Bharathi Salai, Ramapuram, Chennai-600089. | India | India |
Specification
DESCRIPTION:
[0001] The title of the invention is Hand Gesture Recognition and Speech Synthesis for the Impaired: Leveraging Deep Neural Networks and Motion Trigger Filtering
PRIOR ART AND BACKGROUND:
[0002] CN113723327A: This patent describes a framework that uses hardware equipment such as hand gloves to coordinate the face and hand key points. Our invention instead extracts the features via a camera.
[0003] CN105976675A: This patent describes a framework with voice and image acquisition terminals that process and recognize gestures, lip movements, and facial expressions, converting them into text or speech to facilitate communication; it also incorporates glove-based gesture recognition and algorithms such as wavelet transforms and neural networks to enhance accuracy. Our invention instead performs hand gesture recognition by enabling video, with speech synthesis using deep neural networks and Google's MediaPipe integrated into smartphones.
[0004] CN111208907A: This patent discloses a sign language recognition system and method that combines electromyographic (EMG) signals and finger-joint deformation signals. Our invention instead combines a deep neural network and Google's MediaPipe to recognize sign language, with text-to-speech conversion and vice versa.
[0005] CN111768786A: This patent describes a framework in which the whole model is integrated on an intelligent terminal. In our invention, the whole application is integrated on the smartphone.
OBJECTIVE:
[0006] The primary objective is to develop a machine learning model, in particular a hand gesture recognition and voice conversion method for the deaf and mute, integrated into a smartphone.
SUMMARY:
[0007] The invention contains a dataset of 200x200-pixel images with 29 classes, including letters A-Z and special classes. For voice-to-text, we utilized a Kaggle dataset of 3,168 speech samples from various speakers. The data was preprocessed using acoustic analysis, examining frequencies from 0 to 280 Hz. These datasets support our gesture recognition and voice conversion models.
[0008] Data preprocessing is crucial for the gathered datasets. One key technique is RGB-to-grayscale conversion, using either the average method or the weighted method. Other important preprocessing steps are image resizing, noise reduction, and data normalization (min-max normalization, z-score normalization). The important hardware components are the CPU processing unit and the sensor unit.
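A minimal sketch of these preprocessing steps is given below, assuming OpenCV and NumPy; the function names, the Gaussian kernel size, and the 200x200 target size (taken from the dataset in [0007]) are illustrative choices, not values published in this specification.

```python
import cv2
import numpy as np

def to_grayscale(rgb: np.ndarray, weighted: bool = True) -> np.ndarray:
    """RGB to grayscale using the weighted (luminosity) or average method."""
    rgb = rgb.astype(np.float32)
    if weighted:
        # Weighted method: 0.299 R + 0.587 G + 0.114 B
        return rgb @ np.array([0.299, 0.587, 0.114], dtype=np.float32)
    # Average method: mean of the three channels
    return rgb.mean(axis=-1)

def preprocess(image: np.ndarray, size: int = 200) -> np.ndarray:
    gray = to_grayscale(image)
    gray = cv2.resize(gray, (size, size))      # uniform input size
    gray = cv2.GaussianBlur(gray, (5, 5), 0)   # noise reduction
    # Min-max normalization to [0, 1]; z-score would be (x - mean) / std
    return (gray - gray.min()) / (gray.max() - gray.min() + 1e-8)
```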
[0009] The sensor unit is crucial for gesture and voice recognition. Cameras capture gestures, while accelerometers and gyroscopes aid interpretation; for voice, sensors analyze speech and filter noise. The software components are: hand landmark detection, which locates 21 hand-knuckle coordinates using real and synthetic images; convolutional neural networks, which identify hand motions and aid speech recognition; and support vector machines, which classify data points by determining optimal decision boundaries and process the hand motions captured by web cameras. Finally, the recognize_google() method transcribes audio using Google's web speech API; it requires an AudioData object from the speech_recognition module. Voice conversion modifies speech features from the source to the target speaker, altering timbre and prosody elements such as duration and intonation. The invention uses camera-recorded hand motions and converts them to text, then to speech.
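The hand landmark detection step can be sketched as follows using Google's MediaPipe Hands solution, which returns 21 three-dimensional knuckle coordinates per detected hand; the camera loop uses OpenCV, and the classify_gesture() call is a hypothetical stub, since the trained gesture classifier is not published here.

```python
import cv2
import mediapipe as mp

hands = mp.solutions.hands.Hands(static_image_mode=False,
                                 max_num_hands=1,
                                 min_detection_confidence=0.5)

cap = cv2.VideoCapture(0)                        # camera input (101)
while cap.isOpened():
    ok, frame = cap.read()
    if not ok:
        break
    # MediaPipe expects RGB; OpenCV delivers BGR frames.
    results = hands.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
    if results.multi_hand_landmarks:
        for hand in results.multi_hand_landmarks:
            # 21 (x, y, z) hand-knuckle coordinates, normalized to the frame (103)
            keypoints = [(lm.x, lm.y, lm.z) for lm in hand.landmark]
            # classify_gesture(keypoints)        # hypothetical classifier (105)
cap.release()
```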
DETAILED TECHNICAL DESCRIPTION:
[0010] Hand Motion Detection Systems: Camera-based hand motion tracking technology uses sensors and cameras to follow and understand user movements in real time.
[0011] Network Architecture
1. Input Layer: Accepts grayscale images converted from RGB, enhancing contrast for better feature extraction.
2. Preprocessing: Resizes images to a uniform size, applies Gaussian filtering for noise reduction, and normalizes pixel values to stabilize training.
3. Feature Extraction: Utilizes convolutional layers to capture spatial hierarchies, pooling layers to downsample features, and activation functions for non-linearity.
4. Hand Motion Detection: Employs visual-based systems for direct analysis, using deformable templates for gesture tracking and key point detection for critical features.
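A minimal Keras sketch of the network architecture described above, assuming 200x200 grayscale inputs and the 29 gesture classes mentioned in [0007]; the filter counts and layer sizes are illustrative assumptions, not published values.

```python
from tensorflow.keras import layers, models

model = models.Sequential([
    layers.Input(shape=(200, 200, 1)),           # grayscale input layer
    layers.Conv2D(32, 3, activation="relu"),     # spatial feature extraction
    layers.MaxPooling2D(),                       # downsampling
    layers.Conv2D(64, 3, activation="relu"),
    layers.MaxPooling2D(),
    layers.Flatten(),
    layers.Dropout(0.5),                         # overfitting control
    layers.Dense(128, activation="relu"),        # non-linearity
    layers.Dense(29, activation="softmax"),      # one unit per gesture class
])
```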
[0012] Training Strategy
1. Implements mini-batch gradient descent and adaptive learning rates for efficient training, monitoring loss and accuracy.
2. Integrates dropout layers and weight decay to prevent overfitting, employing early stopping based on validation performance.
3. Conducts hyperparameter tuning and evaluates performance on validation sets to ensure robustness and efficiency for deployment.
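A minimal sketch of this training strategy, continuing the Keras model above and assuming NumPy arrays x_train/y_train and x_val/y_val; the batch size, learning rate, and patience are illustrative hyperparameters, not the patent's.

```python
from tensorflow.keras.callbacks import EarlyStopping
from tensorflow.keras.optimizers import Adam

model.compile(optimizer=Adam(learning_rate=1e-3),   # adaptive learning rate
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])                 # monitor loss and accuracy

model.fit(
    x_train, y_train,
    batch_size=32,                                  # mini-batch gradient descent
    epochs=50,
    validation_data=(x_val, y_val),                 # validation-set evaluation
    # weight decay could be added via the AdamW optimizer; dropout is in the model
    callbacks=[EarlyStopping(monitor="val_loss",    # early stopping on validation
                             patience=5,
                             restore_best_weights=True)],
)
```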
BRIEF DESCRIPTION OF THE DRAWING:
Fig 1: A flow diagram of hand gesture recognition and text-to-speech conversion
LIST OF REFERENCE NUMERALS
100 - Hand images
101 - Hand images captured using the camera by enabling the video functionality
102 - Training on Different types of sign languages
103 - 3D hand key points on the palm are marked (Hand landmark detection)
104 - Hand landmark detection is done using MediaPipe.
105 - Identify hand gestures based on hand key points.
106 - Simultaneous Text-to-Speech conversion for the recognised gesture.
107 - Output is generated at the user end, i.e. smartphones
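A minimal sketch of the speech side of this flow: recognize_google() from the speech_recognition module transcribes the hearing person's speech, and the text-to-speech step (106) is illustrated with pyttsx3, an assumed offline engine, since the specification does not name a particular TTS library.

```python
import speech_recognition as sr
import pyttsx3

recognizer = sr.Recognizer()
engine = pyttsx3.init()                     # assumed offline TTS engine

def speech_to_text() -> str:
    """Transcribe microphone audio via Google's web speech API."""
    with sr.Microphone() as source:
        audio = recognizer.listen(source)   # yields an AudioData object
    return recognizer.recognize_google(audio)

def text_to_speech(text: str) -> None:
    """Speak the text produced by the gesture recognizer (106)."""
    engine.say(text)
    engine.runAndWait()
```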
WE CLAIM:
1. A method for Hand Gesture Recognition and Speech Synthesis for the Impaired: Leveraging Deep Neural Networks and Motion Trigger Filtering, comprising:
a. Extracting the gestures that are captured through the camera
b. Training the model with hand gesture images
c. Matching and coordinating the training and real time images
d. Converting the output of the hand gesture recognition system simultaneously
into speech, i.e. from text to speech.
2. The system as claimed in claim 1, wherein the whole deep learning model is compressed into a mobile application integrated on the smartphone.
3. The system as claimed in claim 1, wherein the model is trained on different types of sign language such as British Sign Language and American Sign Language.
4. The system as claimed in claim 1, wherein the accuracy of hand gesture recognition is improved by hand landmark detection and motion trigger filtering.
Documents
Name | Date |
---|---|
202441090030-Form 1-201124.pdf | 25/11/2024 |
202441090030-Form 18-201124.pdf | 25/11/2024 |
202441090030-Form 2(Title Page)-201124.pdf | 25/11/2024 |
202441090030-Form 3-201124.pdf | 25/11/2024 |
202441090030-Form 5-201124.pdf | 25/11/2024 |
202441090030-Form 9-201124.pdf | 25/11/2024 |