Skip to content

Interactive SQL generation via editable step-by-step explanation

Notifications You must be signed in to change notification settings

magic-YuanTian/STEPS

Repository files navigation

STEPS

image

Paper

Interactive Text-to-SQL Generation via Editable Step-by-Step Explanations

Dataset

  1. Download the original Spider dataset
  2. Generate text to clause dataset using SQL2NL/SQL2NL.py
  • instruction = 'trainingData'.
  • You can design your own explanation template within the method "parseSQL()".
  • python SQL2NL.py and you will get the dataset under "dataset/structured/spider/train_spider.json".
  • You could also download our raw text-to-clause dataset. Please put it in the same directory as the original Spider dataset and include all the databases (For more information, please refer to https://github.com/taoyds/spider)
  1. Paraphrase the text-to-clause dataset (optional)
  • You could paraphrase the dataset by Quillbot with our automated script based on PyAutoGUI.
  • Please check the script and all screenshots under here. These screenshots are used to position the cursor during the automation. Due to subtle resolution/theme/version differences, the screenshots may not be identified on your computer (even if a human can), you may need to take your screenshots on your computer and replace them manually.

Distribution of dataset used for this project: clause_distribution

Models

  1. Our text-to-clause model is based on SmBoP, and you can strictly follow their environment and settings.
  2. You can directly download and reuse our check point (HuggingFace) as well as configuration file.
  • Please replace the original configuration file with ours!

Rule-based NL Explanation Generation

SQL2NL_clean.py includes a simple example to run:

from SQL2NL_clean import sql2nl
SQL = "SELECT * FROM STUDENT"  # input your SQL here
explanation_data = sql2nl(SQL)  # Generate explanation data

The sql2nl method automatically outputs the step-by-step explanation to console:

Screenshot 2024-06-10 at 11 03 39 AM

The explanation_data includes the JSON format data:

[
    {'number': 'Start first query,', 'subquery': 'SELECT first_name , last_name FROM players GROUP BY birth_date HAVING COUNT ( * ) > 1 ORDER BY birth_date LIMIT 1', 'explanation': [
            {'subexpression': 'FROM players', 'explanation': 'In table players'
            },
            {'subexpression': 'GROUP BY birth_date', 'explanation': 'Group the records based on the birth date'
            },
            {'subexpression': 'HAVING COUNT ( * ) > 1', 'explanation': 'Keep the groups where the number of records is greater than 1'
            },
            {'subexpression': 'ORDER BY birth_date LIMIT 1', 'explanation': 'Sort the records in ascending order based on the birth date, and return the first record'
            },
            {'subexpression': 'SELECT first_name , last_name', 'explanation': 'Return the first name and the last name'
            }
        ], 'supplement': ''
    },
    {'number': 'Start second query,', 'subquery': 'SELECT first_name , last_name FROM players WHERE first_name = "TOM" GROUP BY birth_date HAVING COUNT ( * ) > 1 ORDER BY birth_date LIMIT 1', 'explanation': [
            {'subexpression': 'FROM players', 'explanation': 'In table players'
            },
            {'subexpression': 'WHERE first_name = "TOM"', 'explanation': 'Keep the records where the first name is "TOM"'
            },
            {'subexpression': 'GROUP BY birth_date', 'explanation': 'Group the records based on the birth date'
            },
            {'subexpression': 'HAVING COUNT ( * ) > 1', 'explanation': 'Keep the groups where the number of records is greater than 1'
            },
            {'subexpression': 'ORDER BY birth_date LIMIT 1', 'explanation': 'Sort the records in ascending order based on the birth date, and return the first record'
            },
            {'subexpression': 'SELECT first_name , last_name', 'explanation': 'Return the first name and the last name'
            }
        ], 'supplement': ''
    },
    {'number': 'Start third query,', 'subquery': 'SELECT first_name , last_name FROM players GROUP BY birth_date HAVING COUNT ( * ) > 1 ORDER BY birth_date LIMIT 1 INTERSECT SELECT first_name , last_name FROM players WHERE first_name = "TOM" GROUP BY birth_date HAVING COUNT ( * ) > 1 ORDER BY birth_date LIMIT 1', 'explanation': [
            {'subexpression': 'SELECT *', 'explanation': 'Keep the intersection of first query result and second query result.'
            }
        ], 'supplement': ''
    }
]

To help you understand the project logic, we also encapsulate most of the project folder (This does not include the interface). You can directly download it to check the configuration. Folder

This repository is currently being updated, and more details will be provided in the future. If you have any questions, please feel free to email [email protected]

Thanks!

About

Interactive SQL generation via editable step-by-step explanation

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages