MULTIMODAL AI-BASED SENTIMENT ANALYSIS
ORDINARY APPLICATION
Published
Filed on 12 November 2024
Abstract
The Multimodal AI-based Sentiment Analysis System utilizes advanced deep learning algorithms across text, audio, and visual modalities to provide a comprehensive interpretation of human emotions. In the text modality, the system employs a BERT-based model fine-tuned on sentiment analysis data to capture context-aware embeddings, supplemented by sentiment lexicons like VADER for validating and refining sentiment scores. An attention mechanism further enhances accuracy by weighing critical words or phrases that contribute more significantly to sentiment. For audio analysis, Mel-frequency cepstral coefficients (MFCCs) and prosodic features such as pitch and speaking rate are extracted to capture vocal tone characteristics. These features are fed into a CNN-RNN hybrid model, where CNN layers capture local spectral features and RNN layers (LSTM) capture temporal dependencies. A fine-tuned Wav2Vec model may also be utilized for speech emotion recognition, leveraging robust audio embeddings. In the visual modality, facial expressions and gestures are analyzed using CNNs trained on datasets like AffectNet, detecting facial action units (AUs) and micro-expressions indicative of emotions. Landmark detection and pose estimation techniques, such as OpenPose, are used to interpret emotional cues from gestures and postures, while Vision Transformers (ViTs) help capture spatial relationships and fine-grained facial features. To synthesize information across modalities, the system employs various fusion strategies. Early fusion combines raw features at an initial stage to learn joint representations, while late fusion aggregates modality-specific predictions through weighted averaging or majority voting. An attention-based fusion mechanism dynamically assigns weights to each modality, adjusting based on context to improve sentiment interpretation. This multimodal approach enables the system to overcome the limitations of single-channel analysis, resulting in a nuanced understanding of user emotions. With applications in customer feedback analysis, human-computer interaction, and personalized digital experiences, this system provides a robust framework for interpreting complex human emotions across diverse digital environments.
Patent Information
Field | Value |
---|---|
Application ID | 202441087034 |
Invention Field | COMPUTER SCIENCE |
Date of Application | 12/11/2024 |
Publication Number | 47/2024 |
Inventors
Name | Address | Country | Nationality |
---|---|---|---|
Dr. RAJKUMAR KALIMUTHU | ASSOCIATE PROFESSOR, DEPT. OF CYBER SECURITY, GURU NANAK INSTITUTE OF TECHNOLOGY, HYDERABAD, TELANGANA-501506. Ibrahimpatnam, Ranga Reddy District, Hyderabad Telangana India 501506 | India | India |
V Meena | Dr.M.G.R.Nagar,Hosur, Krishnagiri District,Tamil Nadu India 635109 | India | India |
P Veeranna | Ibrahimpatnam, Ranga Reddy District, Hyderabad Telangana India 501506 | India | India |
D Purushothaman | Maisammaguda, Dulapally Hyderabad Telangana India 500100 | India | India |
Aruna A | SNS Kalvi Nagar, Vazhiyampalayam, Saravanampatti Coimbatore Tamilnadu India 641035 | India | India |
Ramani P | 365, Thudiyalur Road Saravanampatti Coimbatore Tamil Nadu India 641035 | India | India |
B.Gunasundari | Aranvoyalkuppam Tiruvallur-Poonamallee high road Tiruvallur Tamil Nadu India 602025 | India | India |
Indraneel K | Ibrahimpatnam, Ranga Reddy District, Hyderabad Telangana India 501506 | India | India |
Applicants
Name | Address | Country | Nationality |
---|---|---|---|
Dr. RAJKUMAR KALIMUTHU | Dr. RAJKUMAR KALIMUTHU, ASSOCIATE PROFESSOR, DEPT. OF CYBER SECURITY, GURU NANAK INSTITUTE OF TECHNOLOGY, HYDERABAD, TELANGANA-501506. researchip08@gmail.com 7200016375 | India | India |
V Meena | Dr.M.G.R.Nagar,Hosur, Krishnagiri District,Tamil Nadu India 635109 | India | India |
P Veeranna | Ibrahimpatnam, Ranga Reddy District, Hyderabad Telangana India 501506 | India | India |
D Purushothaman | Maisammaguda, Dulapally Hyderabad Telangana India 500100 | India | India |
Aruna A | SNS Kalvi Nagar, Vazhiyampalayam, Saravanampatti Coimbatore Tamilnadu India 641035 | India | India |
Ramani P | 365, Thudiyalur Road Saravanampatti Coimbatore Tamil Nadu India 641035 | India | India |
B.Gunasundari | Aranvoyalkuppam Tiruvallur-Poonamallee high road Tiruvallur Tamil Nadu India 602025 | India | India |
Indraneel K | Ibrahimpatnam, Ranga Reddy District, Hyderabad Telangana India 501506 | India | India |
Specification
FIELD OF INVENTION
The present invention lies in the field of artificial intelligence (AI) and machine learning (ML),
specifically focusing on sentiment analysis through a multimodal approach. Unlike traditional
sentiment analysis methods, which typically rely on single data sources like text, this invention
integrates multiple data types to capture a fuller range of human emotional expressions. These
data sources, or "modalities," may include text, audio, visual cues, and even physiological
signals, allowing for a more accurate and nuanced interpretation of emotions. For instance,
natura!"language processing (NLP) techniques analyze textual content for sentiment-bearing
words and linguistic subtleties, while audio analysis captures emotional information from vocal
attributes such as tone, pitch, and rhythm. Visual analysis using computer vision interprets
facial expressions, body language, and gestures, providing non-verbal emotional indicators.
Physiological data, if available, can offer insights through metrics like heart rate or skin
conductance, which are particularly relevant in gauging stress or arousal. By synthesizing data
from these diverse channels, this system constructs a comprehensive representation of a user's
emotional state, surpassing single-mode approaches in accuracy and depth.
This multimodal AI-based sentiment analysis system has a broad range of applications across
different fields. In customer service, for example, it can enhance automated responses by
detecting customer emotions more accurately, enabling AI or human agents to respond with
empathy and relevance. In mental health monitoring, it can analyze multimodal indicators to
provide insights into an individual's emotional wellbeing, potentially offering timely support
for stress or anxiety management. In social media and marketing analytics, the system helps
capture user sentiment across platforms, valuable for brand reputation monitoring and audience
engagement strategies. Additionally, in human-computer interaction (HCI), this system creates
empathetic and responsive interfaces that can adapt in real-time based on the user's emotional
feedback. Overall, this invention represents a substantial advancement in AI-driven sentiment
analysis, offering a robust, multi-layered approach to understanding human emotions across
varied contexts and applications.
SUMMARY OF INVENTION
The Multimodal AI-based Sentiment Analysis System interprets human emotions by
integrating data from multiple modalities (text, audio, and visual sources), each adding
unique insights for a comprehensive emotional assessment. In text analysis, a fine-tuned BERT
model captures context-aware sentiment, further refined with sentiment lexicons like VADER,
while an attention mechanism prioritizes words critical to sentiment, enhancing accuracy. For
audio analysis, features such as pitch and speaking rate are extracted and processed with a
CNN-RNN hybrid model that captures vocal tone nuances, with an additional fine-tuned
Wav2Vec model providing robust audio embeddings. In visual analysis, CNNs trained on
emotion-specific datasets, along with Vision Transformers and tools like OpenPose, detect
emotional cues in facial expressions, gestures, and postures. To merge these insights, the system
employs fusion strategies: early fusion combines raw features for joint representation, late
fusion merges modality-specific predictions, and attention-based fusion dynamically adjusts
the weight of each modality based on context. This multimodal approach overcomes the
limitations of single-charmel sentiment analysis, enabling a nuanced understanding of emotions
suited to applications such as customer feedback analysis, human-computer interaction, and
personalized digital experiences.
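As a concrete illustration of the late-fusion strategy summarized above, the following minimal Python sketch averages per-modality class probabilities with fixed weights. The weight values and the three-class layout are assumptions made for illustration; the application does not specify them.

```python
# Minimal late-fusion sketch: weighted averaging of per-modality class probabilities.
# The modality weights below are hypothetical and not taken from the specification.
import numpy as np

def late_fusion(text_probs, audio_probs, visual_probs, weights=(0.5, 0.25, 0.25)):
    """Weighted average of modality-specific class probabilities -> fused distribution."""
    stacked = np.stack([text_probs, audio_probs, visual_probs])   # shape: (3, n_classes)
    fused = np.average(stacked, axis=0, weights=weights)
    return fused / fused.sum()                                    # renormalize for safety

# Example: three-class (negative / neutral / positive) outputs from each modality.
fused = late_fusion([0.7, 0.2, 0.1], [0.4, 0.4, 0.2], [0.6, 0.3, 0.1])
print(fused, fused.argmax())   # fused distribution and the winning class index
```

Majority voting, the other late-fusion option named in the summary, would instead count the per-modality argmax predictions and return the most frequent class.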
DETAILED DESCRIPTION OF INVENTION
The Multimodal AI-based Sentiment Analysis System is organized into specialized
components for analyzing sentiment from different data types (text, audio, and visual
inputs), as well as a fusion component to combine these insights. Each component utilizes deep
learning techniques to extract and interpret sentiment indicators, creating a unified system for
comprehensive emotion analysis.
1. Text Analysis Component
The Text Analysis Component is dedicated to extracting sentiment from textual content. It
employs a BERT-based model fine-tuned specifically for sentiment analysis, capturing context-aware
embeddings that reflect nuanced emotional expressions. To refine these embeddings, this
component uses sentiment lexicons like VADER, which validate and adjust sentiment scores
to improve accuracy. Additionally, an attention mechanism emphasizes sentiment-rich words
or phrases, allowing the system to capture subtle emotional cues, such as sarcasm or emphasis,
that might otherwise be overlooked.
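A minimal sketch of how the described combination of a transformer sentiment model and the VADER lexicon might look in Python is given below. The default pipeline model and the 70/30 blend of model and lexicon scores are assumptions for illustration, not details taken from the specification.

```python
# Illustrative sketch only: blends a pretrained transformer sentiment score with a
# VADER lexicon score, roughly as the Text Analysis Component describes.
from transformers import pipeline
from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer

bert_sentiment = pipeline("sentiment-analysis")   # context-aware transformer classifier
vader = SentimentIntensityAnalyzer()              # lexicon-based validation

def text_sentiment(text: str) -> float:
    """Return a blended sentiment score in [-1, 1]."""
    pred = bert_sentiment(text)[0]
    model_score = pred["score"] if pred["label"] == "POSITIVE" else -pred["score"]
    lexicon_score = vader.polarity_scores(text)["compound"]
    return 0.7 * model_score + 0.3 * lexicon_score   # hypothetical blend weights

print(text_sentiment("The support team was surprisingly helpful today."))
```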
2. Audio Analysis Component
The Audio Analysis Component processes vocal tone characteristics that convey emotional
information. It extracts Mel-frequency cepstral coefficients (MFCCs) and prosodic features
such as pitch and speaking rate, which provide insights into tone and mood. These audio
features are then processed by a CNN-RNN hybrid model, with CNN layers focusing on
spectral features and RNN (LSTM) layers capturing temporal changes in tone. The component
may also employ a fine-tuned Wav2Vec model to create robust audio embeddings, enhancing
the system's ability to detect nuances in speech-related emotions.
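The following sketch, assuming librosa for MFCC and pitch extraction and PyTorch for the CNN-LSTM stack, illustrates the kind of pipeline described; the sampling rate, layer sizes, and four-class output are illustrative choices rather than values from the application.

```python
# Hedged sketch of the audio branch: MFCC/pitch features feed a small CNN + LSTM model.
import librosa
import torch
import torch.nn as nn

def audio_features(path: str, n_mfcc: int = 13):
    """Extract MFCCs and a frame-level pitch track from an audio file."""
    y, sr = librosa.load(path, sr=16000)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)        # (n_mfcc, frames)
    pitch = librosa.yin(y, fmin=65, fmax=400, sr=sr)              # prosodic pitch contour
    return torch.tensor(mfcc, dtype=torch.float32).unsqueeze(0), pitch

class CnnRnnEmotion(nn.Module):
    """Conv layers capture local spectral patterns; an LSTM models temporal tone changes."""
    def __init__(self, n_mfcc=13, hidden=64, n_classes=4):
        super().__init__()
        self.cnn = nn.Sequential(nn.Conv1d(n_mfcc, 32, kernel_size=5, padding=2), nn.ReLU())
        self.rnn = nn.LSTM(32, hidden, batch_first=True)
        self.head = nn.Linear(hidden, n_classes)

    def forward(self, mfcc):                    # mfcc: (batch, n_mfcc, frames)
        x = self.cnn(mfcc).transpose(1, 2)      # -> (batch, frames, channels)
        _, (h, _) = self.rnn(x)
        return self.head(h[-1])                 # per-class emotion logits

model = CnnRnnEmotion()
demo = torch.randn(1, 13, 100)                  # stand-in for MFCCs of a short clip
print(model(demo).shape)                        # torch.Size([1, 4])
```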
3. Visual Analysis Component
The Visual Analysis Component interprets sentiment based on facial expressions, gestures, and
posture. Using CNNs trained on emotion-specific datasets like AffectNet, this component
detects facial action units (AUs) and micro-expressions that provide insights into subtle
emotions. Landmark detection and pose estimation techniques, such as OpenPose, are used to
capture additional cues from body gestures and posture, adding context to facial expressions
for a richer emotional interpretation. Vision Transformers (ViTs) are utilized to capture fine-grained
spatial relationships in facial features, enhancing the system's ability to identify
complex emotions accurately.
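A hedged sketch of the facial-expression branch is given below, using a ResNet-18 backbone adapted to an AffectNet-style label set. The backbone choice and the eight-class head are assumptions for illustration; a deployed system would fine-tune the network on the emotion dataset.

```python
# Illustrative sketch: a CNN backbone repurposed for facial-expression classification.
import torch
import torch.nn as nn
from torchvision import models, transforms

NUM_EXPRESSIONS = 8                               # assumed AffectNet-style label set

backbone = models.resnet18(weights=None)          # would be pretrained/fine-tuned in practice
backbone.fc = nn.Linear(backbone.fc.in_features, NUM_EXPRESSIONS)
backbone.eval()

preprocess = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
])

def classify_face(face_image):
    """face_image: a cropped PIL face image -> expression logits."""
    x = preprocess(face_image).unsqueeze(0)       # (1, 3, 224, 224)
    with torch.no_grad():
        return backbone(x)

print(backbone(torch.randn(1, 3, 224, 224)).shape)   # torch.Size([1, 8])
```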
4. Multimodal Fusion Component
The Multimodal Fusion Component combines insights from the Text, Audio, and Visual
Analysis Components. It employs several fusion techniques to merge information effectively.
Early fusion combines raw features from each data type at an initial stage, enabling joint
representation learning that incorporates text, audio, and visual elements. Late fusion
aggregates predictions from each component at a later stage, using methods like weighted
averaging or majority voting to synthesize modality-specific insights into a final sentiment
interpretation. An attention-based fusion mechanism dynamically adjusts the importance of
each data source based on context, allowing the system to prioritize the most relevant cues for
accurate sentiment interpretation.
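The attention-based fusion described above could be realized along the lines of the following PyTorch sketch, in which a learned scorer assigns softmax weights to fixed-size modality embeddings; the embedding dimension and class count are illustrative values, not drawn from the specification.

```python
# Sketch of attention-based fusion: context-dependent weights over modality embeddings.
import torch
import torch.nn as nn

class AttentionFusion(nn.Module):
    def __init__(self, dim=128, n_classes=3):
        super().__init__()
        self.score = nn.Linear(dim, 1)           # scores each modality embedding
        self.head = nn.Linear(dim, n_classes)

    def forward(self, modality_embs):            # (batch, n_modalities, dim)
        weights = torch.softmax(self.score(modality_embs), dim=1)   # (batch, n_mod, 1)
        fused = (weights * modality_embs).sum(dim=1)                # weighted sum
        return self.head(fused), weights.squeeze(-1)

fusion = AttentionFusion()
text_e, audio_e, visual_e = (torch.randn(1, 128) for _ in range(3))
logits, attn = fusion(torch.stack([text_e, audio_e, visual_e], dim=1))
print(attn)   # per-modality attention weights (sum to 1 across modalities)
```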
WE CLAIM
1. The Text Analysis Component that uses a fine-tuned BERT model and sentiment
lexicons like VADER, combined with an attention mechanism to improve sentiment
accuracy by highlighting key phrases.
2. The Audio Analysis Component that extracts features such as MFCCs, pitch, and
speaking rate, processed by a CNN-RNN hybrid model, with additional support from a
fine-tuned Wav2Vec model for speech emotion recognition.
3. The Visual Analysis Component that analyzes facial expressions, gestures, and
postures using CNNs trained on emotion datasets, and Vision Transformers for
capturing fine-grained facial features and spatial relationships.
4. A Multimodal Fusion Component that combines insights from text, audio, and visual
components using early fusion, late fusion, and an attention-based mechanism to
dynamically adjust modality weights for improved sentiment interpretation.
Documents
Name | Date |
---|---|
202441087034-Form 1-121124.pdf | 14/11/2024 |
202441087034-Form 2(Title Page)-121124.pdf | 14/11/2024 |
202441087034-Form 9-121124.pdf | 14/11/2024 |