Natural Language Interface to DataBases
This project is managed with Maven. If you are not familiar with Maven, check out this wonderful tutorial (which, unfortunately, you have to pay for).
Right now it uses the dblp database on the local machine. To connect to it, make sure a database named "dblp" is available on localhost at port 5432 and is accessible to user "dblpuser" with password "dblpuser". Alternatively, modify the startConnection() method in class app.Controller to connect to your own database.
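If you need to adapt the connection, the sketch below shows what a JDBC connection with these defaults typically looks like. It assumes the database is PostgreSQL (which the default port 5432 suggests) and that the PostgreSQL JDBC driver is on the classpath; the class name DblpConnection is hypothetical and this is not the actual body of app.Controller.startConnection().

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;

// Hypothetical sketch, not the project's app.Controller.startConnection().
// Assumes PostgreSQL and that the PostgreSQL JDBC driver is on the classpath.
class DblpConnection {
    // Defaults taken from the README.
    static final String HOST = "localhost";
    static final int PORT = 5432;
    static final String DATABASE = "dblp";
    static final String USER = "dblpuser";
    static final String PASSWORD = "dblpuser";

    // Builds a PostgreSQL JDBC URL of the form jdbc:postgresql://host:port/database
    static String jdbcUrl(String host, int port, String database) {
        return "jdbc:postgresql://" + host + ":" + port + "/" + database;
    }

    // Opens a connection using the defaults above; the caller is responsible for closing it.
    static Connection startConnection() throws SQLException {
        return DriverManager.getConnection(jdbcUrl(HOST, PORT, DATABASE), USER, PASSWORD);
    }
}
```

A caller would usually open it in a try-with-resources block, e.g. `try (Connection c = DblpConnection.startConnection()) { ... }`, so the connection is closed even if a query throws.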
To start developing, import the project into Eclipse, but first make sure you have installed the following Eclipse plugins:
- m2eclipse (Maven integration for Eclipse)
- e(fx)clipse (JavaFX tooling for Eclipse)
To use WordNet inside the project (I'm using MIT JWI as the interface, which is already included in the Maven pom.xml):
- Create a folder "lib" in the project base directory.
- Download WordNet into the "lib" directory you just created.
- Extract the downloaded WordNet archive.
- Finally, make sure "${basedir}/lib/WordNet-3.0/dict/" exists. (Otherwise, you have to modify the path inside class model.WordNet.)
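To verify the WordNet setup before launching the application, a quick check of the expected directory layout can help. The sketch below is a hypothetical helper (not part of model.WordNet) that only looks for the `dict` directory and the noun `index`/`data` files that a WordNet 3.0 `dict` folder contains; JWI's `Dictionary` class is what actually opens that directory.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical setup check, not part of model.WordNet.
class WordNetSetupCheck {
    // Location described in the README: <basedir>/lib/WordNet-3.0/dict/
    static Path dictDir(Path baseDir) {
        return baseDir.resolve("lib").resolve("WordNet-3.0").resolve("dict");
    }

    // A WordNet 3.0 dict directory holds index.* and data.* files per part of speech;
    // checking the noun pair is a cheap sanity test, not a full validation.
    static boolean looksInstalled(Path baseDir) {
        Path dict = dictDir(baseDir);
        return Files.isDirectory(dict)
            && Files.isRegularFile(dict.resolve("index.noun"))
            && Files.isRegularFile(dict.resolve("data.noun"));
    }

    // Convenience: treat the current working directory as the project base directory.
    static boolean looksInstalledInWorkingDirectory() {
        return looksInstalled(Paths.get(System.getProperty("user.dir")));
    }
}
```

Running `looksInstalledInWorkingDirectory()` from the project root should return true once the extraction step above has been done.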
The entry point of the application is the main() method in class ui.UserView.
NOTE: Whatever functionality you write, be sure to provide test cases.
- [done] Download the Microsoft Academic Search Database and try connecting to it. I (Keping) couldn't find a way to download that database and use it in SQL, so I decided to use our dblp database from hw1 first.
- [done] Use Stanford NLP to parse a natural language sentence.
- [done] Based on the data structures in Stanford NLP, design the data structure for class ParseTree. For now, let's just make it work, without worrying about memory and time efficiency.
- [done] A basic implementation of SchemaGraph.
- [done] ParseTreeNodeMapper
- [in-progress] ParseTreeStructureAdjuster:
  - [tests needed] Remove meaningless nodes.
  - Merge logic nodes and quantifier nodes with their parents.
  - Reorder the nodes.
  - Insert implicit nodes.
- [in-progress] QueryTreeTranslator:
  - [done] A basic translator for a single table: "SELECT ... FROM ... WHERE ...;"
  - Add "AND" and "OR" logic for "WHERE".
  - Add functions without GROUP BY for "SELECT".
  - Add GROUP BY.
  - Add joins for multiple tables.
  - Add quantifiers: all, each, any.
  - ...
- ...
- UI design proceeds in parallel, driven by the requirements of the tasks above.
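The single-table translation step and the planned "AND"/"OR" step can be pictured as assembling a query string from parts already extracted from the query tree. The sketch below is hypothetical: SingleTableSql and build() are not the project's QueryTreeTranslator API, the extraction of attributes, table, and conditions is assumed to have happened already, and no quoting or escaping is done.

```java
import java.util.List;

// Hypothetical sketch, not the project's QueryTreeTranslator API.
// Assumes attributes, table and conditions were already extracted from the query tree.
class SingleTableSql {
    // Produces "SELECT a, b FROM t WHERE c1 AND c2;" (WHERE is omitted when there are no conditions).
    static String build(List<String> attributes, String table,
                        List<String> conditions, String connective) {
        if (attributes.isEmpty()) {
            // An SQL query has to select at least one attribute.
            throw new IllegalArgumentException("at least one attribute is required");
        }
        if (!connective.equals("AND") && !connective.equals("OR")) {
            throw new IllegalArgumentException("connective must be AND or OR");
        }
        StringBuilder sql = new StringBuilder("SELECT ");
        sql.append(String.join(", ", attributes));
        sql.append(" FROM ").append(table);
        if (!conditions.isEmpty()) {
            // Every condition is joined with the same connective; mixing AND/OR needs parentheses.
            sql.append(" WHERE ").append(String.join(" " + connective + " ", conditions));
        }
        return sql.append(";").toString();
    }
}
```

Using a single connective per WHERE clause keeps the sketch simple; queries that mix "AND" and "OR" would need the condition subtree to be translated recursively with explicit parentheses.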
Here are the grammar rules for syntactically valid parse trees:
- Q -> (SClause)(ComplexCondition)*
- SClause -> SELECT + GNP
- ComplexCondition -> ON + (leftSubtree*rightSubtree)
- leftSubtree -> GNP
- rightSubtree -> GNP | VN | MIN | MAX
- GNP -> (FN + GNP) | NP
- NP -> NN + (NN)*(condition)*
- condition -> VN | (ON + VN)
Note:
All terminal nodes are defined in the paper.
+ represents a parent-child relationship.
* represents a sibling relationship.
A query (Q) must have exactly one SClause and zero or more ComplexConditions.
A ComplexCondition must have one ON, with a leftSubtree and a rightSubtree.
An NP consists of one NN (since an SQL query has to select at least one attribute) whose children
are zero or more NNs followed by zero or more conditions. (All other selected attributes and conditions
are stacked here to form a wide "NP" tree.)
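The grammar above can be checked with one recursive method per rule. The sketch below uses a hypothetical GNode class (a type label plus ordered children), not the project's ParseTree; it assumes terminal nodes VN, MIN, and MAX (and NN children inside an NP) are leaves, reads "(NN)*(condition)*" literally as all NN children coming before all conditions, and does not check the root's own label.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical node type: a label such as "SELECT", "ON", "NN", "VN", "FN", "MIN", "MAX"
// plus ordered children. Not the project's ParseTree class.
class GNode {
    final String type;
    final List<GNode> children;

    GNode(String type, GNode... children) {
        this.type = type;
        this.children = new ArrayList<>(Arrays.asList(children));
    }
}

class GrammarChecker {
    // Q -> (SClause)(ComplexCondition)*   (the root's own label is not checked)
    static boolean isQ(GNode root) {
        List<GNode> c = root.children;
        if (c.isEmpty() || !isSClause(c.get(0))) return false;
        for (int i = 1; i < c.size(); i++) {
            if (!isComplexCondition(c.get(i))) return false;
        }
        return true;
    }

    // SClause -> SELECT + GNP
    static boolean isSClause(GNode n) {
        return n.type.equals("SELECT") && n.children.size() == 1 && isGNP(n.children.get(0));
    }

    // ComplexCondition -> ON + (leftSubtree * rightSubtree), with leftSubtree -> GNP
    static boolean isComplexCondition(GNode n) {
        return n.type.equals("ON") && n.children.size() == 2
            && isGNP(n.children.get(0))
            && isRightSubtree(n.children.get(1));
    }

    // rightSubtree -> GNP | VN | MIN | MAX
    static boolean isRightSubtree(GNode n) {
        return isGNP(n) || isLeaf(n, "VN") || isLeaf(n, "MIN") || isLeaf(n, "MAX");
    }

    // GNP -> (FN + GNP) | NP
    static boolean isGNP(GNode n) {
        if (n.type.equals("FN")) {
            return n.children.size() == 1 && isGNP(n.children.get(0));
        }
        return isNP(n);
    }

    // NP -> NN + (NN)*(condition)*
    static boolean isNP(GNode n) {
        if (!n.type.equals("NN")) return false;
        List<GNode> c = n.children;
        int i = 0;
        while (i < c.size() && isLeaf(c.get(i), "NN")) i++;   // leading NN children
        while (i < c.size() && isCondition(c.get(i))) i++;    // then conditions
        return i == c.size();                                 // nothing else allowed
    }

    // condition -> VN | (ON + VN)
    static boolean isCondition(GNode n) {
        if (isLeaf(n, "VN")) return true;
        return n.type.equals("ON") && n.children.size() == 1 && isLeaf(n.children.get(0), "VN");
    }

    static boolean isLeaf(GNode n, String type) {
        return n.type.equals(type) && n.children.isEmpty();
    }
}
```

For example, a tree ROOT → SELECT → NN → ON → VN (an attribute with a single comparison condition) is accepted, while a SELECT node with two direct children is rejected because SClause allows exactly one GNP.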