In this assignment you will:
-
demonstrate data modelling skills with a combination of SQL and python
- write SQL queries to inspect the model of a MySQL database
- write python code to connect to a MySQL database and handle the output of queries, including exception handling
A well implemented relational database should enforce all the constraints that were identified (even implicitly) during the design phase, such as the functional dependencies, the check constraints, and the referential integrity. In this assignment, we will write generic testing software that confirms whether an implementation of a database design is correct. You will write a series of tester functions in python that connect to a MySQL database and probe it to check whether a given constraint is being enforced correctly.
As an example, imagine that you have two relations: Employee(employee_id, employee_name, dept_id) and Department(dept_id, dept_name). Naturally, we might expect in this case that a SQL query like the following should only return one row per employee:
SELECT *
FROM `Employee`
NATURAL JOIN `Department`;
If this is not the case, we can be confident that Department.dept_id has not been set up as a primary key; if it were, it should not be possible for more than one Department tuple to match each Employee on dept_id. If you designed a function that could take in two relations likes this and the foreign key relationship between them, you could connect to a database and check that this behaviour is as expected.
Unfortunately, this example above only shows that a primary key does not exist; it is insufficient to conclude that the primary key has been created. You would need a better test design to be confident that you will score full marks on this assignment. As a strong hint, you may find that queries that modify the contents of tables combined with exception handling in your python code will be a more robust approach, though it certainly isn't the only one.
Specifically, you are given a python library, DataModelChecker.py
, to implement. The API has already been exposed but none of the functions contain bodies yet. You should implement these six functions so that they can be used to test the correctness of any MySQL implementation of a relational database design.
It is important to remember that this is an exercise in software testing and creative thinking. You will probably want to create your own sandbox MySQL database and try out different table definitions to improve test coverage.
You should submit DataModelChecker.py
without renaming it, after implementing the six functions that contain TODO comments in their bodies. You are welcome to add other classes and auxiliary functions to support your solution (and improve code quality), but these must all appear in the same DataModelChecker.py
file that you submit.
Your functions will be evaluated with a series of unit tests and your score on the assignment will be the number of unit tests that you pass. Each test will run your function with a boundary case input against a localhost MySQL database.
For example, a test of confirmSuperkey()
might pass as input a table for which no primary key was ever defined in the database and ask if a particular set of attributes is an enforced superkey or not, i.e., allows insertions that violate the supposed key constraint. If the function returns false, the test passes; otherwise it fails. Approximately twenty such test cases will be defined and tested.
Note that we will modify our testing dataset to avoid hard-coded solutions, such as by changing constraints between the pre-test and test database. You should not make assumptions that a hard-coded solution that passes pre-testing will also pass in the final evaluation. In particular, the actual result (true or false) for each test case will be quasi-randomly chosen in the final test database by adding or removing particular constraints. Moreover, if you create temporary files, you are responsible for monitoring their existence: there is no guarantee that we will run test cases in a deterministic order.
A subset of the test cases will be provided; however, these are only structurally correct. The underlying database and therefore also the correct answer will change with 50% probability between what is provided and the final grading. You will need to also identify some important boundary cases to test that have not been released in advance. The pre-released cases are available in tests.py
, some of which can be tested by yourself using build_test_database.sql
; the SQL script, in particular, shows the permissions that the test user will have, which you can use to gauge which approaches to the assignment are permitted. Marking will take place by running:
mysql -uroot < build_test_database.sql
python3 tests.py
You can safely make the following assumptions:
- All tables and attributes exist
- All attributes will be of INT type
- All primary keys will contain one AUTO_INCREMENT attribute, namely the first one
- Attributes that are not part of the function argument list be NULLABLE and will have default values
- All tables will be empty before your code is marked, though they will not be cleared between unit tests
- Each test case will use an independent set of tables that are not related to those in other test cases
- Tests will be run using
mysql Ver 8.0.33 for macos10.15 on x86_64 (Homebrew)
andPython 3.11.3
on Mac OS Catalina 10.15.7.
Pre-marking will occur at an arbitrary point in the morning on the following dates. You should submit prior to midnight the night before to be certain to receive a pre-grading update:
- Thursday, 27 July 2023
- Friday, 28 July 2023
- Saturday, 29 July 2023
- Sunday, 30 July 2023
- Monday, 31 July 2023
- Tuesday, 1 August 2023
The instructor reserves the right to award a score of 0 to functions that have not been implemented at all (e.g., simply return True or return False with no dependence on the input parameters).
For this assignment, we will use random data to test your code, as exemplified in build_test_database.sql
. It will be engineered to test particular boundary cases in the lessons. Functional dependencies may cross multiple tables, though these will share common attributes that facilitate NATURAL JOIN.
This assignment is equivalent to an in-person, solo, written exam. As such, it must be completed independently, even the development of general ideas or pseudocode. Submissions will be subjected to plagiarism detection software and evidence of collaboration will be reported as an Academic Integrity infringement. You are welcome to prepare for the assignment with peers in the class by working through the in-class problems together or studying the practice quizzes, which are designed to prepare you well for this assignment.
You are permitted to use sources that you find on the Internet, so long as the source is clearly dated with a last edit prior to 1-January-2023 and you provide a citation in your source code. For example, GitHub and StackOverflow content is permitted, so long as it is clearly dated prior to this year. The use of Generative AI tools must also be cited. (Clearly, it has a last edit of whenever you used it). If you do not include a citation as a comment in your source code, your work will be considered plagiarism.
Submissions will be accepted until the end date of the assignment listed in Brightspace, which provides a three day buffer to address most challenges that are likely to arise. Note that support for the assignment will not be available after the deadline, however. Submissions will not be accepted after the end date; if you have not submitted code by then, whether by choice or circumstances, the weight for this assignment will be shifted to the corresponding midterm exam.
I hope that this assignment is a fun way to learn and/or practice data modelling, the SQL query language, and connecting to MySQL databases from a high-level, application-layer code base. Good luck!
Thu 27 July 2023 12.28: Expand specs, especially permitted assumptions; modify pre-marking dates in accordance with deadline extension; add sample test cases and database build script.