REINFORCEMENT LEARNING FOR AUTONOMOUS ROBOT NAVIGATION
Application Type: Ordinary Application
Status: Published
Filed on 15 November 2024
Abstract
The invention relates to a method and system for autonomous robot navigation using reinforcement learning (RL) to enable robots to efficiently navigate dynamic and unstructured environments. By leveraging sensory data from cameras, LiDAR, and other sensors, the robot builds an environmental model and uses RL algorithms to determine optimal navigation strategies through trial-and-error interactions with the environment. The system employs techniques such as Q-learning, deep Q-networks, and actor-critic models to continuously refine the robot’s decision-making policy, balancing exploration and exploitation to adapt to changing conditions. This approach improves the robot’s ability to navigate complex environments autonomously, minimizing energy consumption, optimizing travel time, and ensuring safety.
Patent Information
| Field | Value |
|---|---|
| Application ID | 202441088589 |
| Invention Field | ELECTRONICS |
| Date of Application | 15/11/2024 |
| Publication Number | 47/2024 |
Inventors
| Name | Address | Country | Nationality |
|---|---|---|---|
| Vennapusa Surendra Reddy | Associate Professor, Audisankara College of Engineering & Technology (AUTONOMOUS), NH-16, By-Pass Road, Gudur, Tirupati Dist., Andhra Pradesh, India-524101 | India | India |
| R. Janardhan | Final Year B.Tech Student, Audisankara College of Engineering & Technology (AUTONOMOUS), NH-16, By-Pass Road, Gudur, Tirupati Dist., Andhra Pradesh, India-524101 | India | India |
| R.V. Sai Kumar | Final Year B.Tech Student, Audisankara College of Engineering & Technology (AUTONOMOUS), NH-16, By-Pass Road, Gudur, Tirupati Dist., Andhra Pradesh, India-524101 | India | India |
| Rudra Adithya Sai | Final Year B.Tech Student, Audisankara College of Engineering & Technology (AUTONOMOUS), NH-16, By-Pass Road, Gudur, Tirupati Dist., Andhra Pradesh, India-524101 | India | India |
| Rupineni Pradeep | Final Year B.Tech Student, Audisankara College of Engineering & Technology (AUTONOMOUS), NH-16, By-Pass Road, Gudur, Tirupati Dist., Andhra Pradesh, India-524101 | India | India |
| S. Bhargav Reddy | Final Year B.Tech Student, Audisankara College of Engineering & Technology (AUTONOMOUS), NH-16, By-Pass Road, Gudur, Tirupati Dist., Andhra Pradesh, India-524101 | India | India |
| S. Sushma | Final Year B.Tech Student, Audisankara College of Engineering & Technology (AUTONOMOUS), NH-16, By-Pass Road, Gudur, Tirupati Dist., Andhra Pradesh, India-524101 | India | India |
| S. Madhuri | Final Year B.Tech Student, Audisankara College of Engineering & Technology (AUTONOMOUS), NH-16, By-Pass Road, Gudur, Tirupati Dist., Andhra Pradesh, India-524101 | India | India |
| Shaik Akhib | Final Year B.Tech Student, Audisankara College of Engineering & Technology (AUTONOMOUS), NH-16, By-Pass Road, Gudur, Tirupati Dist., Andhra Pradesh, India-524101 | India | India |
| S. Darbar Ali | Final Year B.Tech Student, Audisankara College of Engineering & Technology (AUTONOMOUS), NH-16, By-Pass Road, Gudur, Tirupati Dist., Andhra Pradesh, India-524101 | India | India |
Applicants
| Name | Address | Country | Nationality |
|---|---|---|---|
| Audisankara College of Engineering & Technology | Audisankara College of Engineering & Technology, NH-16, By-Pass Road, Gudur, Tirupati Dist., Andhra Pradesh, India-524101 | India | India |
Specification
Description: In the following description, for the purposes of explanation, various specific details are set forth in order to provide a thorough understanding of embodiments of the present disclosure. It will be apparent, however, that embodiments of the present disclosure may be practiced without these specific details. Several features described hereafter can each be used independently of one another or with any combination of other features. An individual feature may not address all of the problems discussed above or might address only some of the problems discussed above. Some of the problems discussed above might not be fully addressed by any of the features described herein.
The ensuing description provides exemplary embodiments only and is not intended to limit the scope, applicability, or configuration of the disclosure. Rather, the ensuing description of the exemplary embodiments will provide those skilled in the art with an enabling description for implementing an exemplary embodiment. It should be understood that various changes may be made in the function and arrangement of elements without departing from the spirit and scope of the disclosure as set forth.
Specific details are given in the following description to provide a thorough understanding of the embodiments. However, it will be understood by one of ordinary skill in the art that the embodiments may be practiced without these specific details. For example, circuits, systems, networks, processes, and other components may be shown as components in block diagram form in order not to obscure the embodiments in unnecessary detail. In other instances, well-known circuits, processes, algorithms, structures, and techniques may be shown without unnecessary detail to avoid obscuring the embodiments.
Also, it is noted that individual embodiments may be described as a process that is depicted as a flowchart, a flow diagram, a data flow diagram, a structure diagram, or a block diagram. Although a flowchart may describe the operations as a sequential process, many of the operations can be performed in parallel or concurrently. In addition, the order of the operations may be re-arranged. A process is terminated when its operations are completed but could have additional steps not included in a figure. A process may correspond to a method, a function, a procedure, a subroutine, a subprogram, etc. When a process corresponds to a function, its termination can correspond to a return of the function to the calling function or the main function.
The word "exemplary" and/or "demonstrative" is used herein to mean serving as an example, instance, or illustration. For the avoidance of doubt, the subject matter disclosed herein is not limited by such examples. In addition, any aspect or design described herein as "exemplary" and/or "demonstrative" is not necessarily to be construed as preferred or advantageous over other aspects or designs, nor is it meant to preclude equivalent exemplary structures and techniques known to those of ordinary skill in the art. Furthermore, to the extent that the terms "includes," "has," "contains," and other similar words are used in either the detailed description or the claims, such terms are intended to be inclusive in a manner similar to the term "comprising" as an open transition word without precluding any additional or other elements.
Reference throughout this specification to "one embodiment" or "an embodiment" or "an instance" or "one instance" means that a particular feature, structure, or characteristic described in connection with the embodiment is included in at least one embodiment of the present disclosure. Thus, the appearances of the phrases "in one embodiment" or "in an embodiment" in various places throughout this specification are not necessarily all referring to the same embodiment. Furthermore, the particular features, structures, or characteristics may be combined in any suitable manner in one or more embodiments.
The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the disclosure. As used herein, the singular forms "a", "an", and "the" are intended to include the plural forms as well, unless the context indicates otherwise. It will be further understood that the terms "comprises" and/or "comprising," when used in this specification, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof. As used herein, the term "and/or" includes any and all combinations of one or more of the associated listed items.
The present invention provides a system and method for autonomous robot navigation using reinforcement learning (RL) to enable the robot to navigate efficiently in dynamic, unstructured, and real-time environments. The system utilizes sensory data collected from various sensors, such as cameras, LiDAR, ultrasonic sensors, or radar, to build a model of the robot's environment, which includes information about obstacles, terrain, and target locations. Using this sensory data, the robot can map its surroundings and determine the optimal path to its goal.
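By way of illustration (not part of the filed specification), the following minimal sketch shows one common way such an environmental model might be built: fusing range readings from a LiDAR-style sensor into a 2-D occupancy grid. The grid dimensions, cell resolution, and the `(angle, distance)` reading format are assumptions made for this example.

```python
import numpy as np

GRID_SIZE = 100   # 100 x 100 cells (assumed map size)
CELL_M = 0.1      # each cell covers 10 cm (assumed resolution)

def update_occupancy(grid, robot_xy, readings):
    """Mark the cell hit by each range reading as occupied.

    readings: iterable of (angle_rad, distance_m) pairs,
    e.g. one sweep from a LiDAR unit.
    """
    rx, ry = robot_xy
    for angle, dist in readings:
        ox = rx + dist * np.cos(angle)   # obstacle x-position in metres
        oy = ry + dist * np.sin(angle)   # obstacle y-position in metres
        i, j = int(ox / CELL_M), int(oy / CELL_M)
        if 0 <= i < GRID_SIZE and 0 <= j < GRID_SIZE:
            grid[i, j] = 1.0             # occupied
    return grid

grid = np.zeros((GRID_SIZE, GRID_SIZE))
grid = update_occupancy(grid, robot_xy=(5.0, 5.0),
                        readings=[(0.0, 1.2), (np.pi / 2, 0.8)])
```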
The key innovation of the invention lies in the application of reinforcement learning techniques to guide the robot's decision-making process. In contrast to traditional approaches that rely on predefined rules or paths, reinforcement learning enables the robot to learn its navigation strategy by interacting with the environment. Each action taken by the robot in a given state results in feedback, in the form of a reward or penalty, based on the action's success or failure. Over time, through iterative trial and error, the robot refines its behavior to maximize cumulative rewards and minimize penalties, thus improving its navigation capabilities.
The RL framework employed by the system utilizes both exploration and exploitation strategies. During the exploration phase, the robot tries new actions that it has not previously taken in order to discover new, potentially more effective paths. In the exploitation phase, the robot uses its learned experiences to follow previously discovered paths or actions that have provided the highest reward in the past. A balance between exploration and exploitation is maintained using advanced algorithms to ensure that the robot does not become stuck in suboptimal behavior while also learning new, potentially better strategies.
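One common concrete realization of this balance, shown here as an illustrative sketch rather than the patent's own rule, is epsilon-greedy action selection with a decaying exploration rate: with probability epsilon the robot tries a random action, and otherwise it takes the action with the highest learned value. The decay schedule and all constants below are assumptions.

```python
import random

def select_action(q_values, state, actions, epsilon):
    """Epsilon-greedy: explore with probability epsilon, otherwise exploit."""
    if random.random() < epsilon:
        return random.choice(actions)                                  # explore
    return max(actions, key=lambda a: q_values.get((state, a), 0.0))   # exploit

# Example decay schedule: explore heavily at first, then exploit more.
epsilon, eps_min, eps_decay = 1.0, 0.05, 0.995
for episode in range(1000):
    # ... run one navigation episode, choosing actions via select_action ...
    epsilon = max(eps_min, epsilon * eps_decay)
```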
The reinforcement learning model may include various techniques such as Q-learning, deep Q-networks (DQN), policy gradient methods, and actor-critic models. Q-learning assigns a value to each action taken in a given state, which is updated based on the reward received. Deep Q-networks extend Q-learning by using deep neural networks to approximate the Q-values for complex state spaces. Policy gradient methods directly optimize the robot's policy by adjusting the parameters of the action-selection model. Actor-critic models combine value-based and policy-based approaches to optimize both the decision-making and value estimation processes simultaneously.
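For concreteness, here is a minimal sketch of the standard tabular Q-learning update the paragraph describes, Q(s, a) <- Q(s, a) + alpha * (r + gamma * max_a' Q(s', a') - Q(s, a)); the learning rate and discount factor are assumed tuning values, and states and actions are treated as hashable tokens.

```python
from collections import defaultdict

ALPHA, GAMMA = 0.1, 0.99   # learning rate and discount factor (assumed values)
Q = defaultdict(float)     # maps (state, action) -> estimated value

def q_update(state, action, reward, next_state, actions):
    """One Q-learning step: move Q(s, a) toward r + gamma * max_a' Q(s', a')."""
    best_next = max(Q[(next_state, a)] for a in actions)
    td_target = reward + GAMMA * best_next
    Q[(state, action)] += ALPHA * (td_target - Q[(state, action)])
```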
The robot continuously updates its policy based on feedback from its environment. This allows the robot to adapt to changing conditions, such as the introduction of new obstacles or the need to reroute due to unforeseen circumstances. The learned policy is used to guide the robot's navigation decisions, ensuring efficient and safe movement while minimizing energy consumption and travel time. As the robot interacts with the environment, its model becomes more refined, and it becomes more capable of navigating new environments without requiring extensive retraining.
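To make this continuous refinement loop concrete, the following self-contained toy example (an assumption for illustration, not the patent's environment) shows an agent in a 10-cell corridor learning to reach the goal cell through repeated episodes of epsilon-greedy Q-learning.

```python
import random
from collections import defaultdict

# Toy corridor: states 0..9, goal at cell 9; actions step left (-1) or right (+1).
ACTIONS = [-1, +1]
ALPHA, GAMMA, EPSILON = 0.1, 0.95, 0.1   # assumed hyperparameters
Q = defaultdict(float)

for episode in range(500):
    state = 0
    while state != 9:
        if random.random() < EPSILON:
            action = random.choice(ACTIONS)                     # explore
        else:
            action = max(ACTIONS, key=lambda a: Q[(state, a)])  # exploit
        next_state = min(max(state + action, 0), 9)
        reward = 1.0 if next_state == 9 else -0.01              # small step cost
        best_next = max(Q[(next_state, a)] for a in ACTIONS)
        Q[(state, action)] += ALPHA * (reward + GAMMA * best_next
                                       - Q[(state, action)])
        state = next_state

# After training, the greedy policy steps right from every cell.
```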
Additionally, the system allows for dynamic adjustments to the robot's learning algorithm based on the complexity of the environment or the robot's operational constraints. For example, the system may increase the exploration rate in new environments or scale down computations when resources are limited, ensuring that the robot's performance is optimal under varying conditions.
In the first embodiment, the invention is applied to a mobile robot used in an indoor warehouse environment. The robot is equipped with an array of sensors, including ultrasonic sensors, cameras, and LiDAR, which allow it to perceive its surroundings. The robot's task is to navigate the warehouse floor while avoiding obstacles such as shelves, packages, and other moving robots. The robot employs a reinforcement learning model to determine the optimal path between various locations within the warehouse, such as starting points and delivery zones.
The reinforcement learning framework utilizes Q-learning, where the robot assigns a value to each action (such as moving forward, turning left, or turning right) based on the current state (e.g., position and proximity to obstacles). When the robot takes an action, it receives feedback in the form of a reward or penalty: a positive reward is given when the robot reaches its destination or successfully avoids an obstacle, and a penalty is assigned when it collides with an obstacle or takes an inefficient path. Over time, through repeated interactions with the environment, the robot refines its decision-making process to choose optimal routes and avoid inefficient maneuvers, ensuring faster and more energy-efficient navigation.
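The specification states only that reaching the destination earns a reward and that collisions and inefficient paths are penalized; the magnitudes below are assumptions chosen to illustrate one plausible reward design for the warehouse robot.

```python
def warehouse_reward(reached_goal, collided, step_cost=0.01):
    """Illustrative reward shaping for the warehouse navigation task."""
    if reached_goal:
        return 100.0      # large positive reward at the delivery zone
    if collided:
        return -50.0      # strong penalty for hitting a shelf, package, or robot
    return -step_cost     # small per-step cost discourages inefficient paths
```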
In the second embodiment, the invention is applied to an autonomous drone operating in an outdoor environment. The drone is equipped with cameras, GPS, and LiDAR sensors to map its surroundings, which include varying terrain types, moving obstacles such as other drones, and unpredictable weather conditions. The drone's task is to fly from one point to another, while dynamically adjusting its path to account for obstacles and environmental changes.
The drone employs a deep reinforcement learning (DRL) approach, specifically utilizing deep Q-networks (DQN), to learn its navigation strategy. The DQN model is trained using a combination of past experiences and exploration of new actions. The drone learns to adjust its altitude, speed, and direction based on real-time sensory feedback, such as detecting an incoming storm or adjusting to avoid another flying object. The system allows the drone to autonomously adjust its flight plan, ensuring efficient and safe navigation while minimizing flight time and energy consumption. The learned navigation strategy is continuously updated as the drone encounters new environments, allowing it to adapt to various outdoor settings without the need for manual intervention.
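As a sketch of the value network a DQN-based drone controller might use (the patent does not disclose an architecture), the following PyTorch model maps a flat state vector of processed sensor features to one Q-value per discrete action; the layer sizes, state dimension, and action set are all assumptions.

```python
import torch
import torch.nn as nn

STATE_DIM, N_ACTIONS = 32, 6   # assumed: 32 sensor features, 6 discrete actions
                               # (e.g. climb, descend, speed up, slow down,
                               #  turn left, turn right)

q_net = nn.Sequential(
    nn.Linear(STATE_DIM, 128),
    nn.ReLU(),
    nn.Linear(128, 128),
    nn.ReLU(),
    nn.Linear(128, N_ACTIONS),   # one Q-value estimate per action
)

def greedy_action(state_vec):
    """Pick the action with the highest predicted Q-value."""
    with torch.no_grad():
        q_values = q_net(torch.as_tensor(state_vec, dtype=torch.float32))
    return int(q_values.argmax().item())
```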
While considerable emphasis has been placed herein on the preferred embodiments, it will be appreciated that many embodiments can be made and that many changes can be made in the preferred embodiments without departing from the principles of the invention. These and other changes in the preferred embodiments of the invention will be apparent to those skilled in the art from the disclosure herein, whereby it is to be distinctly understood that the foregoing descriptive matter is to be interpreted merely as illustrative of the invention and not as a limitation.

Claims:

1. A method for autonomous robot navigation using reinforcement learning, comprising:
collecting real-time sensory data from the environment, including one or more of visual, auditory, LiDAR, or radar data;
building an environmental model based on the sensory data to identify obstacles, paths, and goals;
applying a reinforcement learning algorithm to determine an optimal action policy for navigating the environment, wherein the robot's actions are evaluated using a reward signal; and
continuously refining the robot's navigation strategy by adjusting the policy based on the received feedback.
2. The method of claim 1, wherein the reinforcement learning algorithm is selected from the group consisting of Q-learning, policy gradient methods, and actor-critic methods.
3. The method of claim 1, wherein the robot employs both exploration and exploitation techniques to optimize its navigation strategy.
4. The method of claim 1, wherein the sensory data is processed in real time to generate a dynamic representation of the environment.
Documents
| Name | Date |
|---|---|
| 202441088589-COMPLETE SPECIFICATION [15-11-2024(online)].pdf | 15/11/2024 |
| 202441088589-DECLARATION OF INVENTORSHIP (FORM 5) [15-11-2024(online)].pdf | 15/11/2024 |
| 202441088589-DRAWINGS [15-11-2024(online)].pdf | 15/11/2024 |
| 202441088589-FORM 1 [15-11-2024(online)].pdf | 15/11/2024 |
| 202441088589-FORM-9 [15-11-2024(online)].pdf | 15/11/2024 |
| 202441088589-REQUEST FOR EARLY PUBLICATION(FORM-9) [15-11-2024(online)].pdf | 15/11/2024 |