Interactive Text-to-SQL Generation via Editable Step-by-Step Explanations
- Download the original Spider dataset
- Generate text to clause dataset using SQL2NL/SQL2NL.py
- instruction = 'trainingData'.
- You can design your own explanation template within the method "parseSQL()".
- python SQL2NL.py and you will get the dataset under "dataset/structured/spider/train_spider.json".
- You could also download our raw text-to-clause dataset. Please put it in the same directory as the original Spider dataset and include all the databases (For more information, please refer to https://github.com/taoyds/spider)
- Paraphrase the text-to-clause dataset (optional)
- You could paraphrase the dataset by Quillbot with our automated script based on PyAutoGUI.
- Please check the script and all screenshots under here. These screenshots are used to position the cursor during the automation. Due to subtle resolution/theme/version differences, the screenshots may not be identified on your computer (even if a human can), you may need to take your screenshots on your computer and replace them manually.
Distribution of dataset used for this project:
- Our text-to-clause model is based on SmBoP, and you can strictly follow their environment and settings.
- You can directly download and reuse our check point (HuggingFace) as well as configuration file.
- Please replace the original configuration file with ours!
SQL2NL_clean.py includes a simple example to run:
from SQL2NL_clean import sql2nl
SQL = "SELECT * FROM STUDENT" # input your SQL here
explanation_data = sql2nl(SQL) # Generate explanation data
The sql2nl
method automatically outputs the step-by-step explanation to console:
![Screenshot 2024-06-10 at 11 03 39 AM](https://private-user-images.githubusercontent.com/75125334/338295318-282079ad-1f48-474c-bec1-f9e991a68be2.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3MzkxNDQ1NDIsIm5iZiI6MTczOTE0NDI0MiwicGF0aCI6Ii83NTEyNTMzNC8zMzgyOTUzMTgtMjgyMDc5YWQtMWY0OC00NzRjLWJlYzEtZjllOTkxYTY4YmUyLnBuZz9YLUFtei1BbGdvcml0aG09QVdTNC1ITUFDLVNIQTI1NiZYLUFtei1DcmVkZW50aWFsPUFLSUFWQ09EWUxTQTUzUFFLNFpBJTJGMjAyNTAyMDklMkZ1cy1lYXN0LTElMkZzMyUyRmF3czRfcmVxdWVzdCZYLUFtei1EYXRlPTIwMjUwMjA5VDIzMzcyMlomWC1BbXotRXhwaXJlcz0zMDAmWC1BbXotU2lnbmF0dXJlPWRjYTE1ZWZmMTdkZTkxODJjY2FhNjQxNWQ5Y2YyYzVkOWUxNzQ0N2FlNjQ0ODE4YTY2NjVhMDIyMzdiODZjYTUmWC1BbXotU2lnbmVkSGVhZGVycz1ob3N0In0.ivBdQEDkbcY51sZhv924ddnKOJHZeltIgsYRb7SjMKQ)
The explanation_data
includes the JSON format data:
[
{'number': 'Start first query,', 'subquery': 'SELECT first_name , last_name FROM players GROUP BY birth_date HAVING COUNT ( * ) > 1 ORDER BY birth_date LIMIT 1', 'explanation': [
{'subexpression': 'FROM players', 'explanation': 'In table players'
},
{'subexpression': 'GROUP BY birth_date', 'explanation': 'Group the records based on the birth date'
},
{'subexpression': 'HAVING COUNT ( * ) > 1', 'explanation': 'Keep the groups where the number of records is greater than 1'
},
{'subexpression': 'ORDER BY birth_date LIMIT 1', 'explanation': 'Sort the records in ascending order based on the birth date, and return the first record'
},
{'subexpression': 'SELECT first_name , last_name', 'explanation': 'Return the first name and the last name'
}
], 'supplement': ''
},
{'number': 'Start second query,', 'subquery': 'SELECT first_name , last_name FROM players WHERE first_name = "TOM" GROUP BY birth_date HAVING COUNT ( * ) > 1 ORDER BY birth_date LIMIT 1', 'explanation': [
{'subexpression': 'FROM players', 'explanation': 'In table players'
},
{'subexpression': 'WHERE first_name = "TOM"', 'explanation': 'Keep the records where the first name is "TOM"'
},
{'subexpression': 'GROUP BY birth_date', 'explanation': 'Group the records based on the birth date'
},
{'subexpression': 'HAVING COUNT ( * ) > 1', 'explanation': 'Keep the groups where the number of records is greater than 1'
},
{'subexpression': 'ORDER BY birth_date LIMIT 1', 'explanation': 'Sort the records in ascending order based on the birth date, and return the first record'
},
{'subexpression': 'SELECT first_name , last_name', 'explanation': 'Return the first name and the last name'
}
], 'supplement': ''
},
{'number': 'Start third query,', 'subquery': 'SELECT first_name , last_name FROM players GROUP BY birth_date HAVING COUNT ( * ) > 1 ORDER BY birth_date LIMIT 1 INTERSECT SELECT first_name , last_name FROM players WHERE first_name = "TOM" GROUP BY birth_date HAVING COUNT ( * ) > 1 ORDER BY birth_date LIMIT 1', 'explanation': [
{'subexpression': 'SELECT *', 'explanation': 'Keep the intersection of first query result and second query result.'
}
], 'supplement': ''
}
]
To help you understand the project logic, we also encapsulate most of the project folder (This does not include the interface). You can directly download it to check the configuration. Folder
This repository is currently being updated, and more details will be provided in the future. If you have any questions, please feel free to email [email protected]
Thanks!