Automated Translation of Natural Language to Structured Query Language and the Development of IoT Applications of Relational Databases Utilizing Llama 2

ORDINARY APPLICATION

Published

Filed on 5 November 2024

Abstract

Relational databases store a significant share of the world's data, and they should be intelligent enough to make that data quickly accessible. However, accessing this data currently requires users to understand a query language such as Structured Query Language (SQL). Not every user is familiar with SQL queries or aware of the structure of the database, so such users are required to learn SQL. Non-expert users therefore need a system that lets them interact with relational databases in a natural language such as English. For this, a Database Management System (DBMS) must be able to understand natural language (NL). An intelligent interface is developed using Llama 2, a large language model (LLM), which translates natural language queries into SQL. The system begins by processing IoT device data into CSV files for model training. This involves loading data, fine-tuning the model for natural language interpretation, and evaluating its accuracy. After deployment, the system enables users to query relational databases in plain language, simplifying data retrieval and improving accessibility without requiring SQL knowledge.

Patent Information

Application ID: 202441084598
Invention Field: COMPUTER SCIENCE
Date of Application: 05/11/2024
Publication Number: 46/2024

Inventors

Name | Address | Country | Nationality
K. Vineela | BVRIT HYDERABAD College of Engineering for Women, Rajiv Gandhi Nagar, Bachupally, Hyderabad-500090, Telangana. | India | India
D. Nagasudha | B V Raju Institute of Technology, Narsapur-502313, Telangana, India. | India | India

Applicants

Name | Address | Country | Nationality
BVRIT HYDERABAD College of Engineering for Women | BVRIT HYDERABAD College of Engineering for Women, Rajiv Gandhi Nagar, Bachupally, Hyderabad-500090, Telangana. | India | India
B V Raju Institute of Technology | B V Raju Institute of Technology, Narsapur-502313, Telangana, India. | India | India
K. Vineela | BVRIT HYDERABAD College of Engineering for Women, Rajiv Gandhi Nagar, Bachupally, Hyderabad-500090, Telangana. | India | India
D. Nagasudha | B V Raju Institute of Technology, Narsapur-502313, Telangana, India. | India | India

Specification

Description

Background of the Invention

Relational databases are among the most popular means of managing vast repositories of information, but knowledge of Structured Query Language (SQL) is often necessary to retrieve the data stored in them. Users with no background in computer science or knowledge of SQL have difficulty communicating with the database because they are unaware of its structure or of the language used to form queries. This poses a problem for non-technical users who need an easy way to sort, access, or retrieve large volumes of data.

To counter this limitation, there is an increasing need for natural language interfaces to databases, a discipline that makes it possible for a user to access a database without learning SQL. The idea is to have an intelligent front-end that accepts natural language questions (for instance, in English) and returns the requested information from the relational database.

To deal with this, a large language model (LLM), namely Llama 2, is employed to map natural language queries to SQL queries. The process begins with collecting and processing data from Internet of Things (IoT) devices; this data is then used to fine-tune Llama 2 through training scripts, followed by evaluation and deployment. This invention proposes to incorporate Llama 2 into existing databases in order to make querying more intuitive, easier to use and, more importantly, more accessible to lay users of relational databases.

Summary of the Invention

The invention outlines a sophisticated front-end system that allows users to communicate with relational databases using plain language instead of SQL. The system uses Llama 2, a large language model (LLM) that has been fine-tuned through methods such as PEFT (LoRA) to convert natural language into SQL queries. First, IoT device data is collected and processed into CSV files for model training, which requires data processing with Python and the Pandas library. The data is then persisted in a local database such as SQLite or DuckDB. LlamaIndex can be used to manage SQL query execution. The user's natural language question is translated into a set of SQL queries, enabling the user to get the necessary data without writing SQL. This approach not only improves the functionality of database systems but also increases the number of people who are able to navigate them, or interact with data in general, making data usage more natural and easier.

Brief Description of the Drawing

Fig. 1: Schematic representation of the system. Multiple CSV files are collected from IoT devices, processed with Python and Pandas, stored in a SQL database, and queried via LlamaIndex with a fine-tuned Llama 2 model, enabling natural language to SQL translation for user queries.

Detailed Description of the Invention
The invention comprises a system that allows ordinary users to operate on relational databases using natural language, with no knowledge of SQL required. The key components and processes involved in the system are as follows:
Dataset Collection:
The process begins with the collection of data from Internet of Things (IoT) devices. This data is formatted and stored in CSV (Comma-Separated Values) files.
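A minimal sketch of this collection step, assuming a hypothetical CSV layout (`device_id`, `timestamp`, `temperature_c`) and a hypothetical `readings` table. The source names Python and Pandas for this stage; the sketch swaps in the standard library's `csv` and `sqlite3` modules so that it runs without extra dependencies.

```python
import csv
import io
import sqlite3

# Hypothetical sample of one IoT device export; the column names are assumptions.
SAMPLE_CSV = """device_id,timestamp,temperature_c
sensor-1,2024-11-05T10:00:00,21.5
sensor-1,2024-11-05T10:05:00,21.9
sensor-2,2024-11-05T10:00:00,19.2
"""


def load_readings(csv_file, conn):
    """Read CSV rows, insert them into a `readings` table, return the row count."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS readings ("
        "device_id TEXT, timestamp TEXT, temperature_c REAL)"
    )
    rows = [
        (r["device_id"], r["timestamp"], float(r["temperature_c"]))
        for r in csv.DictReader(csv_file)
    ]
    conn.executemany("INSERT INTO readings VALUES (?, ?, ?)", rows)
    conn.commit()
    return len(rows)


conn = sqlite3.connect(":memory:")
inserted = load_readings(io.StringIO(SAMPLE_CSV), conn)

# Once stored, the data can be queried with ordinary SQL.
avg_by_device = conn.execute(
    "SELECT device_id, AVG(temperature_c) FROM readings "
    "GROUP BY device_id ORDER BY device_id"
).fetchall()
```

In a real deployment the in-memory string would be replaced by the CSV files exported from the devices, and the connection would point at a database file rather than `:memory:`.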
Fine-Tuning Process:
Fine-tuning means adapting the Llama 2 model to a given dataset to enhance the model's ability to produce SQL queries. The process adjusts the model's parameters to increase its effectiveness in translating natural language inputs into SQL code.
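The Summary names PEFT (LoRA) as one fine-tuning method. Assuming LoRA is the method used, the adjustment can be written as a low-rank update to each adapted weight matrix. Here $W_0$ is the frozen pretrained weight, $A$ and $B$ are the only trained matrices, $r$ is the adapter rank, and $\alpha$ is a scaling hyperparameter; the specific values of $r$ and $\alpha$ are not given in the source.

```latex
h = W_0 x + \Delta W\, x = W_0 x + \frac{\alpha}{r}\, B A\, x,
\qquad W_0 \in \mathbb{R}^{d \times k},\;
B \in \mathbb{R}^{d \times r},\;
A \in \mathbb{R}^{r \times k},\;
r \ll \min(d, k)
```

Because only $A$ and $B$ are trained, the number of trainable parameters per adapted matrix drops from $d \cdot k$ to $r\,(d + k)$.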
Training Data Loading:
Modal is used to load and format the selected dataset for training. This step ensures that the data goes through the preprocessing needed for the fine-tuning of Llama 2.
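The formatting part of this step can be sketched as follows. The prompt template, the field names (`schema`, `question`, `sql`), and the JSON Lines output are all assumptions made for illustration; the source does not specify the format that is actually used.

```python
import json

# Hypothetical prompt template; the real template is not given in the source.
PROMPT_TEMPLATE = (
    "You are a text-to-SQL assistant.\n"
    "### Schema:\n{schema}\n"
    "### Question:\n{question}\n"
    "### SQL:\n"
)


def format_example(schema, question, sql):
    """Turn one (schema, question, SQL) triple into a prompt/completion record."""
    prompt = PROMPT_TEMPLATE.format(schema=schema.strip(), question=question.strip())
    return {"prompt": prompt, "completion": sql.strip()}


def to_jsonl(examples):
    """Serialize examples as JSON Lines, one training record per line."""
    return "\n".join(json.dumps(format_example(**ex)) for ex in examples)


# Hypothetical training example; the table definition is assumed.
examples = [
    {
        "schema": "CREATE TABLE CFL_Draft (CFL_Team TEXT, College TEXT)",
        "question": "How many CFL Teams are from York College?",
        "sql": "SELECT COUNT(CFL_Team) FROM CFL_Draft WHERE College = 'York';",
    }
]
jsonl = to_jsonl(examples)
```

Ending the prompt at the `### SQL:` marker means the model learns to continue with the query alone, which keeps its output easy to extract at inference time.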
Model Evaluation:
The fine-tuned Llama 2 model's efficacy is determined by comparing its SQL query outputs to those of a standard Llama 2 model. This evaluation validates the improvements achieved by fine-tuning and checks the correctness of the SQL statements the model produces.
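One way such a comparison could be scored is execution accuracy: run each predicted query and its reference query against a test database and count how often the results agree. The source does not name a metric, so this is an assumption, and the table contents and model outputs below are invented purely for illustration.

```python
import sqlite3


def run_query(conn, sql):
    """Return the result rows in a stable order, or None if the SQL fails."""
    try:
        # Sorting ignores row order, so ORDER BY differences are not penalized.
        return sorted(conn.execute(sql).fetchall(), key=repr)
    except sqlite3.Error:
        return None


def execution_accuracy(conn, predictions, references):
    """Fraction of predictions whose results match the reference query's results."""
    matches = 0
    for predicted, reference in zip(predictions, references):
        expected = run_query(conn, reference)
        if expected is not None and run_query(conn, predicted) == expected:
            matches += 1
    return matches / len(references)


# Hypothetical test database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE CFL_Draft (Pick INTEGER, CFL_Team TEXT, College TEXT)")
conn.executemany(
    "INSERT INTO CFL_Draft VALUES (?, ?, ?)",
    [(1, "Team A", "York"), (2, "Team B", "York"), (3, "Team C", "Laval")],
)

references = [
    "SELECT COUNT(CFL_Team) FROM CFL_Draft WHERE College = 'York'",
    "SELECT CFL_Team FROM CFL_Draft WHERE College = 'Laval'",
]
# Made-up outputs standing in for the base and fine-tuned models.
base_predictions = [
    "SELECT COUNT(CFL_Team) FROM CFL_Draft WHERE College = 'York'",
    "SELECT CFL_Team FROM Draft WHERE College = 'Laval'",  # wrong table name
]
tuned_predictions = [
    "SELECT COUNT(*) FROM CFL_Draft WHERE College = 'York'",
    "SELECT CFL_Team FROM CFL_Draft WHERE College = 'Laval'",
]

base_score = execution_accuracy(conn, base_predictions, references)
tuned_score = execution_accuracy(conn, tuned_predictions, references)
```

Comparing results rather than query text gives credit to a prediction such as `COUNT(*)` that differs textually from the reference but returns the same answer.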
Integration with LlamaIndex:
The fine-tuned model is then used in LlamaIndex for text-to-SQL inference. This integration lets the user apply the enhanced model to different databases and retrieve data through natural language queries efficiently and accurately.

Inference Process:
The system also tests the fine-tuned model on a test database by converting natural language queries into the corresponding SQL queries and executing them.
For example:
Input Text: 'How many CFL Teams are from York College?'
SQL Query: The following query counts the CFL teams whose draft entries list 'York' as the college:
SELECT COUNT(CFL_Team) FROM CFL_Draft WHERE College = 'York';
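The inference flow for this example can be sketched end to end. The `translate` function below is a hypothetical stand-in for the fine-tuned Llama 2 model (a fixed lookup, not a model call), and the rows in `CFL_Draft` are invented so that the query has something to count.

```python
import sqlite3

EXAMPLE_QUESTION = "How many CFL Teams are from York College?"
EXAMPLE_SQL = "SELECT COUNT(CFL_Team) FROM CFL_Draft WHERE College = 'York';"


def translate(question):
    """Hypothetical stand-in for the fine-tuned model: a fixed lookup table."""
    canned = {EXAMPLE_QUESTION: EXAMPLE_SQL}
    return canned[question]


def answer(conn, question):
    """Translate a question to SQL, execute it, and return both SQL and rows."""
    sql = translate(question)
    return sql, conn.execute(sql).fetchall()


# Hypothetical test database contents.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE CFL_Draft (Pick INTEGER, CFL_Team TEXT, College TEXT)")
conn.executemany(
    "INSERT INTO CFL_Draft VALUES (?, ?, ?)",
    [(1, "Team A", "York"), (2, "Team B", "York"), (3, "Team C", "Laval")],
)

sql, rows = answer(conn, EXAMPLE_QUESTION)
```

Returning the generated SQL alongside the rows lets the caller log or display the query, which helps when checking whether a surprising answer comes from the translation or from the data.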
Integration with DuckDB and SQLite:
Database Creation:
This invention combines DuckDB with SQLite through DuckDB's SQLite extension. The extension allows queries to be run directly against SQLite databases attached to DuckDB, providing more control over the data.
Installation and Loading:
Once the SQLite extension is installed and loaded, the ATTACH statement links DuckDB to a SQLite database file and gives direct access to it. Tables from the SQLite database can then be queried and modified in the same way as DuckDB tables.
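The install, load, and attach sequence can be sketched as a small helper. The file name `iot_readings.db`, the alias `sqlite_db`, and the `readings` table are assumptions. So that the sketch runs without DuckDB installed, it is exercised here with a `RecordingConnection` dry-run class (hypothetical, defined below) that only records the SQL it is given.

```python
class RecordingConnection:
    """Dry-run stand-in that records SQL statements instead of executing them."""

    def __init__(self):
        self.statements = []

    def execute(self, sql):
        self.statements.append(sql)
        return self


def attach_sqlite(con, sqlite_path, alias="sqlite_db"):
    """Install and load DuckDB's SQLite extension, then attach a SQLite file."""
    quoted_path = sqlite_path.replace("'", "''")  # escape quotes in the path
    con.execute("INSTALL sqlite")
    con.execute("LOAD sqlite")
    con.execute(f"ATTACH '{quoted_path}' AS {alias} (TYPE sqlite)")
    return alias


dry_run = RecordingConnection()
alias = attach_sqlite(dry_run, "iot_readings.db")
# After attaching, SQLite tables are addressed as <alias>.<table>.
dry_run.execute(f"SELECT COUNT(*) FROM {alias}.readings")
```

With the `duckdb` Python package installed, a connection returned by `duckdb.connect()` can be passed in place of `RecordingConnection`; the first `INSTALL` may need network access to download the extension.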
SQL Compatibility:
DuckDB supports a wide range of SQL, so users can manage all their SQLite data using standard SQL queries. This compatibility simplifies analysis and querying because data can be shared easily between the two systems.
Data Loading and Transformation:
The system has the ability to transfer data from SQLite tables into DuckDB and back again. This integration of the two database systems supports analysis and transformation tasks.
Data Type Considerations:
SQLite is weakly typed, so a single column can hold values of several data types, whereas DuckDB is strongly typed. The difference is handled by conversion rules that map SQLite data types onto DuckDB's type system.
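The typing difference is easy to observe on the SQLite side with the built-in `typeof()` function. In the sketch below, the `readings` table and its values are hypothetical, and the `CASE` expression is one illustrative conversion rule (numeric values become `REAL`, everything else becomes `NULL`), not the rule set DuckDB itself applies.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# The column is declared INTEGER, but SQLite still accepts other value types.
conn.execute("CREATE TABLE readings (value INTEGER)")
conn.executemany("INSERT INTO readings VALUES (?)", [(42,), ("n/a",), (3.5,)])

# typeof() reports the storage class of each individual value.
stored_types = [
    row[0]
    for row in conn.execute("SELECT typeof(value) FROM readings ORDER BY rowid")
]

# Illustrative conversion rule: keep numeric values as REAL, map the rest to NULL.
normalized = [
    row[0]
    for row in conn.execute(
        "SELECT CASE WHEN typeof(value) IN ('integer', 'real') "
        "THEN CAST(value AS REAL) END FROM readings ORDER BY rowid"
    )
]
```

A strongly typed engine needs a single type per column, so some rule of this kind has to decide what happens to values such as `'n/a'` before or during the transfer.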

Direct SQLite Database Interaction:
With DuckDB, SQLite databases can be queried and manipulated directly, avoiding the need to first transfer the data through an intermediate file format.
Writing to SQLite:
Users can create a new SQLite database, create tables in it, and insert data into those tables using standard SQL queries. This makes it possible to export analysis results from DuckDB back to SQLite, which enriches the data management functionality.
Claims

Independent Claims:
A system for querying relational databases using natural language, comprising:
Data Collection Module: Collects the data generated by IoT devices and stores it in CSV file format.
Training Module: Uses the CSV files to train a Large Language Model (LLM), which comprises:
1. Loading CSV files.
2. Modifying the LLM that is used to transform natural language queries into their SQL equivalents.
3. Input validation to determine the accuracy of the translation performed by the LLM.
Natural Language Interface: Enables users to type in questions in plain text.
Translation Engine: Uses the trained LLM to translate natural language questions into SQL queries.
Query Execution Module: Executes the SQL queries against a relational DBMS and receives one or more results.
Result Presentation Module: Presents the results in a form that users are able to understand.
The system of claim 1, wherein the large language model is specifically Llama 2.
The system of claim 1, wherein the result presentation module formats the results so that they are easy for a user to understand.
Dependent Claims:
The system of claim 1, which is capable of receiving feedback from the user to correct errors made in translating the query into the query language, the feedback being used to tune the LLM further.
The claimed system, wherein the training module comprises a process of continuous model improvement using new data from the IoT devices.
The system of claim 1, wherein the natural language understanding in the natural language interface is extended to languages other than English.

Documents

Name | Date
202441084598-COMPLETE SPECIFICATION [05-11-2024(online)].pdf | 05/11/2024
202441084598-DECLARATION OF INVENTORSHIP (FORM 5) [05-11-2024(online)].pdf | 05/11/2024
202441084598-DRAWINGS [05-11-2024(online)].pdf | 05/11/2024
202441084598-FIGURE OF ABSTRACT [05-11-2024(online)].pdf | 05/11/2024
202441084598-FORM 1 [05-11-2024(online)].pdf | 05/11/2024
202441084598-REQUEST FOR EARLY PUBLICATION(FORM-9) [05-11-2024(online)].pdf | 05/11/2024
