{"name":"Postgresql-exercises","tagline":"A series of questions and answers on a single dataset ","body":"# PostgreSQL Exercises\r\n\r\nThis is a compilation of all the questions and answers on [Alisdair Owen's](https://github.com/AlisdairO) [PostgreSQL Exercises](https://pgexercises.com). Don't forget that actually solving these problems will make you go further than just skimming through this guide, so make sure to pay [PostgreSQL Exercises](https://pgexercises.com) a visit.\r\n\r\n\r\n## Table of Contents\r\n\r\n- [Getting Started](#getting-started)\r\n - [I want to use my own Postgres system](#i-want-to-use-my-own-postgres-system)\r\n - [Schema](#schema)\r\n- [Simple SQL Queries](#simple-sql-queries)\r\n - [Retrieve everything from a table](#retrieve-everything-from-a-table)\r\n - [Retrieve specific columns from a table](#retrieve-specific-columns-from-a-table)\r\n - [Control which rows are retrieved](#control-which-rows-are-retrieved)\r\n - [Control which rows are retrieved, Part 2](#control-which-rows-are-retrieved-part-2)\r\n - [Basic string searches](#basic-string-searches)\r\n - [Matching against multiple possible values](#matching-against-multiple-possible-values)\r\n - [Classify results into bucket](#classify-results-into-bucket)\r\n - [Working with dates](#working-with-dates)\r\n - [Removing duplicates, and ordering results](#removing-duplicates-and-ordering-results)\r\n - [Combining results from multiple queries](#combining-results-from-multiple-queries)\r\n - [Simple aggregation](#simple-aggregation)\r\n - [More aggregation](#more-aggregation)\r\n- [Joins and Subqueries](#joins-and-subqueries)\r\n - [Retrieve the start times of members' bookings](#retrieve-the-start-times-of-members-bookings)\r\n - [Work out the start times of bookings for tennis courts](#work-out-the-start-times-of-bookings-for-tennis-courts)\r\n - [Produce a list of all members who have recommended another 
member](#produce-a-list-of-all-members-who-have-recommended-another-member)\r\n - [Produce a list of all members, along with their recommender](#produce-a-list-of-all-members-along-with-their-recommender)\r\n - [Produce a list of all members who have used a tennis court](#produce-a-list-of-all-members-who-have-used-a-tennis-court)\r\n - [Produce a list of costly bookings](#produce-a-list-of-costly-bookings)\r\n - [Produce a list of all members, along with their recommender, using no joins](#produce-a-list-of-all-members-along-with-their-recommender-using-no-joins)\r\n - [Produce a list of costly bookings, using a subquery](#produce-a-list-of-costly-bookings-using-a-subquery)\r\n- [Modifying Data](#modifying-data)\r\n - [Insert some data into a table](#insert-some-data-into-a-table)\r\n - [Insert multiple rows of data into a table](#insert-multiple-rows-of-data-into-a-table)\r\n - [Insert calculated data into a table](#insert-calculated-data-into-a-table)\r\n - [Update some existing data](#update-some-existing-data)\r\n - [Update multiple rows and columns at the same time](#update-multiple-rows-and-columns-at-the-same-time)\r\n - [Update a row based on the contents of another row](#update-a-row-based-on-the-contents-of-another-row)\r\n - [Delete all bookings](#delete-all-bookings)\r\n - [Delete a member from the cd.members table](#delete-a-member-from-the-cdmembers-table)\r\n - [Delete based on a subquery](#delete-based-on-a-subquery)\r\n- [Aggregation](#aggregation)\r\n - [Count the number of facilities](#count-the-number-of-facilities)\r\n - [Count the number of expensive facilities](#count-the-number-of-expensive-facilities)\r\n - [Count the number of recommendations each member makes](#count-the-number-of-recommendations-each-member-makes)\r\n - [List the total slots booked per facility](#list-the-total-slots-booked-per-facility)\r\n - [List the total slots booked per facility in a given month](#list-the-total-slots-booked-per-facility-in-a-given-month)\r\n - 
[List the total slots booked per facility per month](#list-the-total-slots-booked-per-facility-per-month)\r\n - [Find the count of members who have made at least one booking](#find-the-count-of-members-who-have-made-at-least-one-booking)\r\n - [List facilities with more than 1000 slots booked](#list-facilities-with-more-than-1000-slots-booked)\r\n - [Find the total revenue of each facility](#find-the-total-revenue-of-each-facility)\r\n - [Find facilities with a total revenue less than 1000](#find-facilities-with-a-total-revenue-less-than-1000)\r\n - [Output the facility id that has the highest number of slots booked](#output-the-facility-id-that-has-the-highest-number-of-slots-booked)\r\n - [List the total slots booked per facility per month, Part 2](#list-the-total-slots-booked-per-facility-per-month-part-2)\r\n - [List the total hours booked per named facility](#list-the-total-hours-booked-per-named-facility)\r\n - [List each member's first booking after September 1st 2012](#list-each-members-first-booking-after-september-1st-2012)\r\n - [Produce a list of member names, with each row containing the total member count](#produce-a-list-of-member-names-with-each-row-containing-the-total-member-count)\r\n - [Produce a numbered list of members](#produce-a-numbered-list-of-members)\r\n - [Output the facility id that has the highest number of slots booked, again](#output-the-facility-id-that-has-the-highest-number-of-slots-booked-again)\r\n - [Rank members by (rounded) hours used](#rank-members-by-rounded-hours-used)\r\n - [Find the top three revenue generating facilities](#find-the-top-three-revenue-generating-facilities)\r\n - [Classify facilities by value](#classify-facilities-by-value)\r\n - [Calculate the payback time for each facility](#calculate-the-payback-time-for-each-facility)\r\n - [Calculate a rolling average of total revenue](#calculate-a-rolling-average-of-total-revenue)\r\n- [Working with Timestamps](#working-with-timestamps)\r\n - [Produce a timestamp 
for 1 a.m. on the 31st of August 2012](#produce-a-timestamp-for-1-am-on-the-31st-of-august-2012)\r\n - [Subtract timestamps from each other](#subtract-timestamps-from-each-other)\r\n - [Generate a list of all the dates in October 2012](#generate-a-list-of-all-the-dates-in-october-2012)\r\n - [Get the day of the month from a timestamp](#get-the-day-of-the-month-from-a-timestamp)\r\n - [Work out the number of seconds between timestamps](#work-out-the-number-of-seconds-between-timestamps)\r\n - [Work out the number of days in each month of 2012](#work-out-the-number-of-days-in-each-month-of-2012)\r\n - [Work out the number of days remaining in the month](#work-out-the-number-of-days-remaining-in-the-month)\r\n - [Work out the end time of bookings](#work-out-the-end-time-of-bookings)\r\n - [Return a count of bookings for each month](#return-a-count-of-bookings-for-each-month)\r\n - [Work out the utilisation percentage for each facility by month](#work-out-the-utilisation-percentage-for-each-facility-by-month)\r\n- [String Operations](#string-operations)\r\n - [Format the names of members](#format-the-names-of-members)\r\n - [Find facilities by a name prefix](#find-facilities-by-a-name-prefix)\r\n - [Perform a case-insensitive search](#perform-a-case-insensitive-search)\r\n - [Find telephone numbers with parentheses](#find-telephone-numbers-with-parentheses)\r\n - [Pad zip codes with leading zeroes](#pad-zip-codes-with-leading-zeroes)\r\n - [Count the number of members whose surname starts with each letter of the alphabet](#count-the-number-of-members-whose-surname-starts-with-each-letter-of-the-alphabet)\r\n - [Clean up telephone numbers](#clean-up-telephone-numbers)\r\n- [Recursive Queries](#recursive-queries)\r\n - [Find the upward recommendation chain for member ID 27](#find-the-upward-recommendation-chain-for-member-id-27)\r\n - [Find the downward recommendation chain for member ID 1](#find-the-downward-recommendation-chain-for-member-id-1)\r\n - [Produce a CTE 
that can return the upward recommendation chain for any member](#produce-a-cte-that-can-return-the-upward-recommendation-chain-for-any-member)\r\n\r\n***\r\n\r\n## Getting Started\r\n\r\nIt's pretty simple to get going with the exercises: all you have to do is [open the exercises](https://pgexercises.com/questions/basic/), take a look at the questions, and try to answer them!\r\n\r\nThe dataset for these exercises is for a newly created country club, with a set of members, facilities such as tennis courts, and booking history for those facilities. Amongst other things, the club wants to understand how they can use their information to analyse facility usage/demand. **Please note:** this dataset is designed purely for supporting an interesting array of exercises, and the database schema is flawed in several aspects - please don't take it as an example of good design. We'll start off with a look at the Members table:\r\n\r\n```sql\r\nCREATE TABLE cd.members\r\n(\r\n memid integer NOT NULL, \r\n surname character varying(200) NOT NULL, \r\n firstname character varying(200) NOT NULL, \r\n address character varying(300) NOT NULL, \r\n zipcode integer NOT NULL, \r\n telephone character varying(20) NOT NULL, \r\n recommendedby integer,\r\n joindate timestamp not null,\r\n CONSTRAINT members_pk PRIMARY KEY (memid),\r\n CONSTRAINT fk_members_recommendedby FOREIGN KEY (recommendedby)\r\n REFERENCES cd.members(memid) ON DELETE SET NULL\r\n);\r\n```\r\n\r\nEach member has an ID (not guaranteed to be sequential), basic address information, a reference to the member that recommended them (if any), and a timestamp for when they joined. 
The addresses in the dataset are entirely (and unrealistically) fabricated.\r\n\r\n```sql\r\nCREATE TABLE cd.facilities\r\n(\r\n facid integer NOT NULL, \r\n name character varying(100) NOT NULL, \r\n membercost numeric NOT NULL, \r\n guestcost numeric NOT NULL, \r\n initialoutlay numeric NOT NULL, \r\n monthlymaintenance numeric NOT NULL, \r\n CONSTRAINT facilities_pk PRIMARY KEY (facid)\r\n);\r\n```\r\n\r\nThe facilities table lists all the bookable facilities that the country club possesses. The club stores id/name information, the cost to book both members and guests, the initial cost to build the facility, and estimated monthly upkeep costs. They hope to use this information to track how financially worthwhile each facility is.\r\n\r\n```sql\r\nCREATE TABLE cd.bookings\r\n(\r\n bookid integer NOT NULL, \r\n facid integer NOT NULL, \r\n memid integer NOT NULL, \r\n starttime timestamp NOT NULL,\r\n slots integer NOT NULL,\r\n CONSTRAINT bookings_pk PRIMARY KEY (bookid),\r\n CONSTRAINT fk_bookings_facid FOREIGN KEY (facid) REFERENCES cd.facilities(facid),\r\n CONSTRAINT fk_bookings_memid FOREIGN KEY (memid) REFERENCES cd.members(memid)\r\n);\r\n```\r\n\r\nFinally, there's a table tracking bookings of facilities. This stores the facility id, the member who made the booking, the start of the booking, and how many half hour 'slots' the booking was made for. This idiosyncratic design will make certain queries more difficult, but should provide you with some interesting challenges - as well as prepare you for the horror of working with some real-world databases :-).\r\n\r\nOkay, that should be all the information you need. You can select a category of query to try from the menu above, or alternatively [start from the beginning](https://pgexercises.com/questions/basic/).\r\n\r\n\r\n#### I want to use my own Postgres system\r\n\r\nNo problem! Getting up and running isn't too hard. 
First, you'll need an install of PostgreSQL, which you can get from [here](http://www.postgresql.org/download/). Once you have it started, [download the SQL](https://pgexercises.com/dbfiles/clubdata.sql).\r\n\r\nFinally, run `psql -U <username> -f clubdata.sql -d postgres -x -q` to create the 'exercises' database, the Postgres 'pgexercises' user, the tables, and to load the data in. Note that you may find that the sort order of your results differs from that shown on the web site: that's probably because your Postgres is set up using a different locale to that used by PGExercises (which uses the C locale).\r\n\r\nWhen you're running queries, you may find psql a little clunky. If so, I recommend trying out pgAdmin or the Eclipse database development tools.\r\n\r\n\r\n#### Schema\r\n\r\n\r\n![](Images/schema-horizontal.png)\r\n\r\n***\r\n\r\n## Simple SQL Queries\r\n\r\nThis category deals with the basics of SQL. It covers select and where clauses, case expressions, unions, and a few other odds and ends. If you're already educated in SQL you will probably find these exercises fairly easy. If not, you should find them a good point to start learning for the more difficult categories ahead!\r\n\r\nIf you struggle with these questions, I strongly recommend [Learning SQL](http://shop.oreilly.com/product/9780596007270.do), by Alan Beaulieu, as a concise and well-written book on the subject. If you're interested in the fundamentals of database systems (as opposed to just how to use them), you should also investigate An Introduction to Database Systems by C.J. 
Date.\r\n\r\n\r\n### Retrieve everything from a table\r\n\r\nHow can you retrieve all the information from the cd.facilities table?\r\n\r\n\r\nExpected results:\r\n\r\n| facid | name | membercost | guestcost | initialoutlay | monthlymaintenance |\r\n| ----- | --------------- | ---------- | --------- | ------------- | ------------------ |\r\n| 0 | Tennis Court 1 | 5 | 25 | 10000 | 200 |\r\n| 1 | Tennis Court 2 | 5 | 25 | 8000 | 200 |\r\n| 2 | Badminton Court | 0 | 15.5 | 4000 | 50 |\r\n| 3 | Table Tennis | 0 | 5 | 320 | 10 |\r\n| 4 | Massage Room 1 | 35 | 80 | 4000 | 3000 |\r\n| 5 | Massage Room 2 | 35 | 80 | 4000 | 3000 |\r\n| 6 | Squash Court | 3.5 | 17.5 | 5000 | 80 |\r\n| 7 | Snooker Table | 0 | 5 | 450 | 15 |\r\n| 8 | Pool Table | 0 | 5 | 400 | 15 |\r\n\r\n\r\nAnswer:\r\n\r\n```sql\r\nselect * from cd.facilities; \r\n```\r\n\r\nThe `SELECT` statement is the basic starting block for queries that read information out of the database. A minimal select statement is generally comprised of `select [some set of columns] from [some table or group of tables]`. \r\n\r\nIn this case, we want all of the information from the facilities table. The from section is easy - we just need to specify the `cd.facilities` table. 'cd' is the table's schema - a term used for a logical grouping of related information in the database. \r\n\r\nNext, we need to specify that we want all the columns. Conveniently, there's a shorthand for 'all columns' - *. We can use this instead of laboriously specifying all the column names. \r\n\r\n### Retrieve specific columns from a table \r\n\r\nYou want to print out a list of all of the facilities and their cost to members. 
How would you retrieve a list of only facility names and costs?\r\n\r\n\r\nExpected results:\r\n\r\n| name | membercost |\r\n| --------------- | ---------- |\r\n| Tennis Court 1 | 5 |\r\n| Tennis Court 2 | 5 |\r\n| Badminton Court | 0 |\r\n| Table Tennis | 0 |\r\n| Massage Room 1 | 35 |\r\n| Massage Room 2 | 35 |\r\n| Squash Court | 3.5 |\r\n| Snooker Table | 0 |\r\n| Pool Table | 0 |\r\n\r\n\r\nAnswer:\r\n\r\n```sql\r\nselect name, membercost from cd.facilities; \r\n```\r\n\r\nFor this question, we need to specify the columns that we want. We can do that with a simple comma-delimited list of column names given to the `SELECT` statement. All the database does is look at the columns available in the `FROM` clause, and return the ones we asked for, as illustrated below:\r\n\r\n![](Images/select.png)\r\n\r\nGenerally speaking, for non-throwaway queries it's considered desirable to specify the names of the columns you want in your queries rather than using *. This is because your application might not be able to cope if more columns get added into the table.\r\n\r\n\r\n\r\n### Control which rows are retrieved\r\n\r\nHow can you produce a list of facilities that charge a fee to members?\r\n\r\n\r\nExpected results:\r\n\r\n| facid | name | membercost | guestcost | initialoutlay | monthlymaintenance |\r\n| ----- | -------------- | ---------- | --------- | ------------- | ------------------ |\r\n| 0 | Tennis Court 1 | 5 | 25 | 10000 | 200 |\r\n| 1 | Tennis Court 2 | 5 | 25 | 8000 | 200 |\r\n| 4 | Massage Room 1 | 35 | 80 | 4000 | 3000 |\r\n| 5 | Massage Room 2 | 35 | 80 | 4000 | 3000 |\r\n| 6 | Squash Court | 3.5 | 17.5 | 5000 | 80 |\r\n\r\n\r\nAnswer:\r\n\r\n```sql\r\nselect * from cd.facilities where membercost > 0; \r\n```\r\n\r\nThe `FROM` clause is used to build up a set of candidate rows to read results from. In our examples so far, this set of rows has simply been the contents of a table. 
In future we will explore joining, which allows us to create much more interesting candidates. \r\n\r\nOnce we've built up our set of candidate rows, the `WHERE` clause allows us to filter for the rows we're interested in - in this case, those with a membercost of more than zero. As you will see in later exercises, `WHERE` clauses can have multiple components combined with boolean logic - it's possible to, for instance, search for facilities with a cost greater than 0 and less than 10. The filtering action of the `WHERE` clause on the facilities table is illustrated below: \r\n\r\n![](Images/whereclause.png)\r\n\r\n\r\n\r\n### Control which rows are retrieved, Part 2\r\n\r\nHow can you produce a list of facilities that charge a fee to members, and that fee is less than 1/50th of the monthly maintenance cost? Return the facid, facility name, member cost, and monthly maintenance of the facilities in question.\r\n\r\n\r\nExpected results:\r\n\r\n| facid | name | membercost | monthlymaintenance |\r\n| ----- | -------------- | ---------- | ------------------ |\r\n| 4 | Massage Room 1 | 35 | 3000 |\r\n| 5 | Massage Room 2 | 35 | 3000 |\r\n\r\n\r\nAnswer:\r\n\r\n```sql\r\nselect facid, name, membercost, monthlymaintenance \r\n\tfrom cd.facilities \r\n\twhere \r\n\t\tmembercost > 0 and \r\n\t\t(membercost < monthlymaintenance/50.0); \r\n```\r\n\r\nThe `WHERE` clause allows us to filter for the rows we're interested in - in this case, those with a membercost of more than zero, and less than 1/50th of the monthly maintenance cost. As you can see, the massage rooms are very expensive to run thanks to staffing costs!\r\n\r\nWhen we want to test for two or more conditions, we use `AND` to combine them. We can, as you might expect, use `OR` to test whether either of a pair of conditions is true. \r\n\r\nYou might have noticed that this is our first query that combines a `WHERE` clause with selecting specific columns. 
You can see in the image below the effect of this: the intersection of the selected columns and the selected rows gives us the data to return. This may not seem too interesting now, but as we add in more complex operations like joins later, you'll see the simple elegance of this behaviour.\r\n\r\n![](Images/whereandselect.png)\r\n\r\n\r\n\r\n### Basic string searches\r\n\r\nHow can you produce a list of all facilities with the word 'Tennis' in their name?\r\n\r\n\r\nExpected results:\r\n\r\n| facid | name | membercost | guestcost | initialoutlay | monthlymaintenance |\r\n| ----- | -------------- | ---------- | --------- | ------------- | ------------------ |\r\n| 0 | Tennis Court 1 | 5 | 25 | 10000 | 200 |\r\n| 1 | Tennis Court 2 | 5 | 25 | 8000 | 200 |\r\n| 3 | Table Tennis | 0 | 5 | 320 | 10 |\r\n\r\n\r\n\r\nAnswer:\r\n\r\n```sql\r\nselect *\r\n\tfrom cd.facilities \r\n\twhere \r\n\t\tname like '%Tennis%'; \r\n```\r\n\r\nSQL's `LIKE` operator provides simple pattern matching on strings. It's pretty much universally implemented, and is nice and simple to use - it just takes a string with the % character matching any string, and _ matching any single character. In this case, we're looking for names containing the word 'Tennis', so putting a % on either side fits the bill.\r\n\r\nThere's other ways to accomplish this task: Postgres supports regular expressions with the ~ operator, for example. Use whatever makes you feel comfortable, but do be aware that the `LIKE` operator is much more portable between systems.\r\n\r\n\r\n\r\n### Matching against multiple possible values\r\n\r\nHow can you retrieve the details of facilities with ID 1 and 5? Try to do it without using the OR operator. 
\r\n\r\n\r\nExpected results:\r\n\r\n| facid | name | membercost | guestcost | initialoutlay | monthlymaintenance |\r\n| ----- | -------------- | ---------- | --------- | ------------- | ------------------ |\r\n| 1 | Tennis Court 2 | 5 | 25 | 8000 | 200 |\r\n| 5 | Massage Room 2 | 35 | 80 | 4000 | 3000 |\r\n\r\n\r\nAnswer:\r\n\r\n```sql\r\nselect *\r\n\tfrom cd.facilities \r\n\twhere \r\n\t\tfacid in (1,5); \r\n```\r\n\r\nThe obvious answer to this question is to use a `WHERE` clause that looks like `where facid = 1 or facid = 5`. An alternative that is easier with large numbers of possible matches is the `IN` operator. The `IN` operator takes a list of possible values, and matches them against (in this case) the facid. If one of the values matches, the where clause is true for that row, and the row is returned. \r\n\r\nThe `IN` operator is a good early demonstrator of the elegance of the relational model. The argument it takes is not just a list of values - it's actually a table with a single column. Since queries also return tables, if you create a query that returns a single column, you can feed those results into an `IN` operator. To give a toy example: \r\n\r\n```sql\r\nselect * \r\n\tfrom cd.facilities\r\n\twhere\r\n\t\tfacid in (\r\n\t\t\tselect facid from cd.facilities\r\n\t\t\t);\r\n```\r\n\r\n This example is functionally equivalent to just selecting all the facilities, but shows you how to feed the results of one query into another. The inner query is called a *subquery*. \r\n\r\n### Classify results into bucket\r\n\r\n\r\n\r\nHow can you produce a list of facilities, with each labelled as 'cheap' or 'expensive' depending on if their monthly maintenance cost is more than $100? Return the name and monthly maintenance of the facilities in question. 
\r\n\r\n\r\nExpected results:\r\n\r\n| name | cost |\r\n| --------------- | --------- |\r\n| Tennis Court 1 | expensive |\r\n| Tennis Court 2 | expensive |\r\n| Badminton Court | cheap |\r\n| Table Tennis | cheap |\r\n| Massage Room 1 | expensive |\r\n| Massage Room 2 | expensive |\r\n| Squash Court | cheap |\r\n| Snooker Table | cheap |\r\n| Pool Table | cheap |\r\n\r\n\r\nAnswer:\r\n\r\n```sql\r\nselect name, \r\n\tcase when (monthlymaintenance > 100) then\r\n\t\t'expensive'\r\n\telse\r\n\t\t'cheap'\r\n\tend as cost\r\n\tfrom cd.facilities; \r\n```\r\n\r\nThis exercise contains a few new concepts. The first is the fact that we're doing computation in the area of the query between `SELECT` and `FROM`. Previously we've only used this to select columns that we want to return, but you can put anything in here that will produce a single result per returned row - including subqueries. \r\n\r\nThe second new concept is the `CASE` statement itself. `CASE` is effectively like if/switch statements in other languages, with a form as shown in the query. To add a 'middling' option, we would simply insert another `when...then` section. \r\n\r\nFinally, there's the `AS` operator. This is simply used to label columns or expressions, to make them display more nicely or to make them easier to reference when used as part of a subquery. \r\n\r\n\r\n\r\n### Working with dates\r\n\r\nHow can you produce a list of members who joined after the start of September 2012? 
Return the memid, surname, firstname, and joindate of the members in question.\r\n\r\n\r\nExpected results:\r\n\r\n| memid | surname | firstname | joindate |\r\n| ----- | ----------------- | --------- | ------------------- |\r\n| 24 | Sarwin | Ramnaresh | 2012-09-01 08:44:42 |\r\n| 26 | Jones | Douglas | 2012-09-02 18:43:05 |\r\n| 27 | Rumney | Henrietta | 2012-09-05 08:42:35 |\r\n| 28 | Farrell | David | 2012-09-15 08:22:05 |\r\n| 29 | Worthington-Smyth | Henry | 2012-09-17 12:27:15 |\r\n| 30 | Purview | Millicent | 2012-09-18 19:04:01 |\r\n| 33 | Tupperware | Hyacinth | 2012-09-18 19:32:05 |\r\n| 35 | Hunt | John | 2012-09-19 11:32:45 |\r\n| 36 | Crumpet | Erica | 2012-09-22 08:36:38 |\r\n| 37 | Smith | Darren | 2012-09-26 18:08:45 |\r\n\r\n\r\nAnswer:\r\n\r\n```sql\r\nselect memid, surname, firstname, joindate \r\n\tfrom cd.members\r\n\twhere joindate >= '2012-09-01'; \r\n```\r\n\r\nThis is our first look at SQL timestamps. They're formatted in descending order of magnitude: `YYYY-MM-DD HH:MM:SS.nnnnnn`. We can compare them just like we might a Unix timestamp, although getting the differences between dates is a little more involved (and powerful!). In this case, we've just specified the date portion of the timestamp. This gets automatically cast by Postgres into the full timestamp `2012-09-01 00:00:00`. \r\n\r\n\r\n\r\n### Removing duplicates, and ordering results\r\n\r\nHow can you produce an ordered list of the first 10 surnames in the members table? The list must not contain duplicates. \r\n\r\n\r\nExpected results:\r\n\r\n| surname |\r\n| ------- |\r\n| Bader |\r\n| Baker |\r\n| Boothe |\r\n| Butters |\r\n| Coplin |\r\n| Crumpet |\r\n| Dare |\r\n| Farrell |\r\n| GUEST |\r\n| Genting |\r\n\r\n\r\nAnswer:\r\n\r\n```sql\r\nselect distinct surname \r\n\tfrom cd.members\r\norder by surname\r\nlimit 10; \r\n```\r\n\r\nThere are three new concepts here, but they're all pretty simple. 
\r\n\r\n- Specifying `DISTINCT` after `SELECT` removes duplicate rows from the result set. Note that this applies to *rows*: if row A has multiple columns, row B is only equal to it if the values in all columns are the same. As a general rule, don't use `DISTINCT` in a willy-nilly fashion - it's not free to remove duplicates from large query result sets, so do it as-needed. \r\n- Specifying `ORDER BY` (after the `FROM` and `WHERE` clauses, near the end of the query) allows results to be ordered by a column or set of columns (comma separated). \r\n- The `LIMIT` keyword allows you to limit the number of results retrieved. This is useful for getting results a page at a time, and can be combined with the `OFFSET` keyword to get following pages. This is the same approach used by MySQL and is very convenient - you may, unfortunately, find that this process is a little more complicated in other DBs. \r\n\r\n\r\n\r\n### Combining results from multiple queries\r\n\r\nYou, for some reason, want a combined list of all surnames and all facility names. Yes, this is a contrived example :-). Produce that list! \r\n\r\n\r\nExpected results:\r\n\r\n| surname |\r\n| ----------------- |\r\n| Tennis Court 2 |\r\n| Worthington-Smyth |\r\n| Badminton Court |\r\n| Pinker |\r\n| Dare |\r\n| Bader |\r\n| Mackenzie |\r\n| Crumpet |\r\n| Massage Room 1 |\r\n| Squash Court |\r\n\r\n\r\nAnswer:\r\n\r\n```sql\r\nselect surname \r\n\tfrom cd.members\r\nunion\r\nselect name\r\n\tfrom cd.facilities; \r\n```\r\n\r\nThe `UNION` operator does what you might expect: combines the results of two SQL queries into a single table. The caveat is that both results from the two queries must have the same number of columns and compatible data types. \r\n\r\n`UNION` removes duplicate rows, while `UNION ALL` does not. Use `UNION ALL` by default, unless you care about duplicate results. \r\n\r\n\r\n\r\n### Simple aggregation\r\n\r\nYou'd like to get the signup date of your last member. 
How can you retrieve this information? \r\n\r\n\r\nExpected results:\r\n\r\n| latest |\r\n| ------------------- |\r\n| 2012-09-26 18:08:45 |\r\n\r\n\r\nAnswer:\r\n\r\n```sql\r\nselect max(joindate) as latest\r\n\tfrom cd.members; \r\n```\r\n\r\nThis is our first foray into SQL's aggregate functions. They're used to extract information about whole groups of rows, and allow us to easily ask questions like:\r\n\r\n- What's the most expensive facility to maintain on a monthly basis? \r\n- Who has recommended the most new members? \r\n- How much time has each member spent at our facilities? \r\n\r\nThe MAX aggregate function here is very simple: it receives all the possible values for joindate, and outputs the one that's biggest. There's a lot more power to aggregate functions, which you will come across in future exercises.\r\n\r\n\r\n\r\n### More aggregation\r\n\r\nYou'd like to get the first and last name of the last member(s) who signed up - not just the date. How can you do that? \r\n\r\n\r\nExpected results:\r\n\r\n| firstname | surname | joindate |\r\n| --------- | ------- | ------------------- |\r\n| Darren | Smith | 2012-09-26 18:08:45 |\r\n\r\n\r\nAnswer:\r\n\r\n```sql\r\nselect firstname, surname, joindate\r\n\tfrom cd.members\r\n\twhere joindate = \r\n\t\t(select max(joindate) \r\n\t\t\tfrom cd.members); \r\n```\r\n\r\nIn the suggested approach above, you use a *subquery* to find out what the most recent joindate is. This subquery returns a *scalar* table - that is, a table with a single column and a single row. Since we have just a single value, we can substitute the subquery anywhere we might put a single constant value. In this case, we use it to complete the `WHERE` clause of a query to find a given member. \r\n\r\nYou might hope that you'd be able to do something like below:\r\n\r\n```sql\r\nselect firstname, surname, max(joindate)\r\n from cd.members\r\n```\r\n\r\nUnfortunately, this doesn't work. 
The `MAX` function doesn't restrict rows like the `WHERE` clause does - it simply takes in a bunch of values and returns the biggest one. The database is then left wondering how to pair up a long list of names with the single join date that's come out of the max function, and fails. Instead, you're left having to say 'find me the row(s) which have a join date that's the same as the maximum join date'.\r\n\r\nAs mentioned by the hint, there's other ways to get this job done - one example is below. In this approach, rather than explicitly finding out what the last joined date is, we simply order our members table in descending order of join date, and pick off the first one. Note that this approach does not cover the extremely unlikely eventuality of two people joining at the exact same time :-). \r\n\r\n```sql\r\nselect firstname, surname, joindate\r\n\tfrom cd.members\r\norder by joindate desc\r\nlimit 1;\r\n```\r\n\r\n***\r\n\r\n## Joins and Subqueries\r\n\r\nThis category deals primarily with a foundational concept in relational database systems: joining. Joining allows you to combine related information from multiple tables to answer a question. This isn't just beneficial for ease of querying: a lack of join capability encourages denormalisation of data, which increases the complexity of keeping your data internally consistent. \r\n\r\nThis topic covers inner, outer, and self joins, as well as spending a little time on subqueries (queries within queries). If you struggle with these questions, I strongly recommend [Learning SQL](http://shop.oreilly.com/product/9780596007270.do), by Alan Beaulieu, as a concise and well-written book on the subject.\r\n\r\n\r\n\r\n### Retrieve the start times of members' bookings\r\n\r\nHow can you produce a list of the start times for bookings by members named 'David Farrell'? 
\r\n\r\n\r\nExpected results:\r\n\r\n| starttime |\r\n| ------------------- |\r\n| 2012-09-18 09:00:00 |\r\n| 2012-09-18 17:30:00 |\r\n| 2012-09-18 13:30:00 |\r\n| 2012-09-18 20:00:00 |\r\n| 2012-09-19 09:30:00 |\r\n| 2012-09-19 15:00:00 |\r\n| 2012-09-19 12:00:00 |\r\n| 2012-09-20 15:30:00 |\r\n| 2012-09-20 11:30:00 |\r\n| 2012-09-20 14:00:00 |\r\n\r\n\r\nAnswer:\r\n\r\n```sql\r\nselect bks.starttime \r\n\tfrom \r\n\t\tcd.bookings bks\r\n\t\tinner join cd.members mems\r\n\t\t\ton mems.memid = bks.memid\r\n\twhere \r\n\t\tmems.firstname='David' \r\n\t\tand mems.surname='Farrell'; \r\n```\r\n\r\nThe most commonly used kind of join is the `INNER JOIN`. What this does is combine two tables based on a join expression - in this case, for each member id in the members table, we're looking for matching values in the bookings table. Where we find a match, a row combining the values for each table is returned. Note that we've given each table an *alias* (bks and mems). This is used for two reasons: firstly, it's convenient, and secondly we might join to the same table several times, requiring us to distinguish between columns from each different time the table was joined in.\r\n\r\nLet's ignore our select and where clauses for now, and focus on what the `FROM` statement produces. In all our previous examples, `FROM` has just been a simple table. What is it now? Another table! This time, it's produced as a composite of bookings and members. You can see a subset of the output of the join below:\r\n\r\n![](Images/joinbefore.gif)\r\n\r\nFor each member in the members table, the join has found all the matching member ids in the bookings table. For each match, it's then produced a row combining the row from the members table, and the row from the bookings table.\r\n\r\nObviously, this is too much information on its own, and any useful question will want to filter it down. 
In our query, we use the start of the `SELECT` clause to pick columns, and the `WHERE` clause to pick rows, as illustrated below:\r\n\r\n![](Images/join1.png)\r\n\r\nThat's all we need to find David's bookings! In general, I encourage you to remember that the output of the `FROM` clause is essentially one big table that you then filter information out of. This may sound inefficient - but don't worry, under the covers the DB will be behaving much more intelligently :-).\r\n\r\nOne final note: there's two different syntaxes for inner joins. I've shown you the one I prefer, that I find more consistent with other join types. You'll commonly see a different syntax, shown below:\r\n\r\n```sql\r\nselect bks.starttime\r\n from\r\n cd.bookings bks,\r\n cd.members mems\r\n where\r\n mems.firstname='David'\r\n and mems.surname='Farrell'\r\n and mems.memid = bks.memid;\r\n```\r\n\r\nThis is functionally exactly the same as the approved answer. If you feel more comfortable with this syntax, feel free to use it!\r\n\r\n\r\n\r\n### Work out the start times of bookings for tennis courts\r\n\r\nHow can you produce a list of the start times for bookings for tennis courts, for the date '2012-09-21'? Return a list of start time and facility name pairings, ordered by the time. 
\r\n\r\n\r\nExpected results:\r\n\r\n| start | name |\r\n| ------------------- | -------------- |\r\n| 2012-09-21 08:00:00 | Tennis Court 1 |\r\n| 2012-09-21 08:00:00 | Tennis Court 2 |\r\n| 2012-09-21 09:30:00 | Tennis Court 1 |\r\n| 2012-09-21 10:00:00 | Tennis Court 2 |\r\n| 2012-09-21 11:30:00 | Tennis Court 2 |\r\n| 2012-09-21 12:00:00 | Tennis Court 1 |\r\n| 2012-09-21 13:30:00 | Tennis Court 1 |\r\n| 2012-09-21 14:00:00 | Tennis Court 2 |\r\n| 2012-09-21 15:30:00 | Tennis Court 1 |\r\n| 2012-09-21 16:00:00 | Tennis Court 2 |\r\n| 2012-09-21 17:00:00 | Tennis Court 1 |\r\n| 2012-09-21 18:00:00 | Tennis Court 2 |\r\n\r\n\r\nAnswer:\r\n\r\n```sql\r\nselect bks.starttime as start, facs.name as name\r\n\tfrom \r\n\t\tcd.facilities facs\r\n\t\tinner join cd.bookings bks\r\n\t\t\ton facs.facid = bks.facid\r\n\twhere \r\n\t\tfacs.facid in (0,1) and\r\n\t\tbks.starttime >= '2012-09-21' and\r\n\t\tbks.starttime < '2012-09-22'\r\norder by bks.starttime; \r\n```\r\n\r\nThis is another `INNER JOIN` query, although it has a fair bit more complexity in it! The `FROM` part of the query is easy - we're simply joining facilities and bookings tables together on the facid. This produces a table where, for each row in bookings, we've attached detailed information about the facility being booked.\r\n\r\nOn to the `WHERE` component of the query. The checks on starttime are fairly self explanatory - we're making sure that all the bookings start between the specified dates. Since we're only interested in tennis courts, we're also using the `IN` operator to tell the database system to only give us back facility IDs 0 or 1 - the IDs of the courts. 
There are other ways to express this: we could have used `where facs.facid = 0 or facs.facid = 1`, or even `where facs.name like 'Tennis%'`.\r\n\r\nThe rest is pretty simple: we `SELECT` the columns we're interested in, and `ORDER BY` the start time.\r\n\r\n\r\n\r\n### Produce a list of all members who have recommended another member\r\n\r\nHow can you output a list of all members who have recommended another member? Ensure that there are no duplicates in the list, and that results are ordered by (surname, firstname).\r\n\r\n\r\nExpected results:\r\n\r\n| firstname | surname |\r\n| --------- | -------- |\r\n| Florence | Bader |\r\n| Timothy | Baker |\r\n| Gerald | Butters |\r\n| Jemima | Farrell |\r\n| Matthew | Genting |\r\n| David | Jones |\r\n| Janice | Joplette |\r\n| Millicent | Purview |\r\n| Tim | Rownam |\r\n| Darren | Smith |\r\n| Tracy | Smith |\r\n| Ponder | Stibbons |\r\n| Burton | Tracy |\r\n\r\n\r\nAnswer:\r\n\r\n```sql\r\nselect distinct recs.firstname as firstname, recs.surname as surname\r\n\tfrom \r\n\t\tcd.members mems\r\n\t\tinner join cd.members recs\r\n\t\t\ton recs.memid = mems.recommendedby\r\norder by surname, firstname; \r\n```\r\n\r\nHere's a concept that some people find confusing: you can join a table to itself! This is really useful if you have columns that reference data in the same table, like we do with recommendedby in cd.members.\r\n\r\nIf you're having trouble visualising this, remember that this works just the same as any other inner join. Our join takes each row in members that has a recommendedby value, and looks in members again for the row which has a matching member id. It then generates an output row combining the two member entries. This looks like the diagram below:\r\n\r\n![](Images/innerjoin.png)\r\n\r\nNote that while we might have two 'surname' columns in the output set, they can be distinguished by their table aliases. 
Once we've selected the columns that we want, we simply use `DISTINCT` to ensure that there are no duplicates. \r\n\r\n\r\n\r\n### Produce a list of all members, along with their recommender\r\n\r\nHow can you output a list of all members, including the individual who recommended them (if any)? Ensure that results are ordered by (surname, firstname). \r\n\r\n\r\nExpected results:\r\n\r\n| memfname | memsname | recfname | recsname |\r\n| --------- | ----------------- | --------- | -------- |\r\n| Florence | Bader | Ponder | Stibbons |\r\n| Anne | Baker | Ponder | Stibbons |\r\n| Timothy | Baker | Jemima | Farrell |\r\n| Tim | Boothe | Tim | Rownam |\r\n| Gerald | Butters | Darren | Smith |\r\n| Joan | Coplin | Timothy | Baker |\r\n| Erica | Crumpet | Tracy | Smith |\r\n| Nancy | Dare | Janice | Joplette |\r\n| David | Farrell | | |\r\n| Jemima | Farrell | | |\r\n| GUEST | GUEST | | |\r\n| Matthew | Genting | Gerald | Butters |\r\n| John | Hunt | Millicent | Purview |\r\n| David | Jones | Janice | Joplette |\r\n| Douglas | Jones | David | Jones |\r\n| Janice | Joplette | Darren | Smith |\r\n| Anna | Mackenzie | Darren | Smith |\r\n| Charles | Owen | Darren | Smith |\r\n| David | Pinker | Jemima | Farrell |\r\n| Millicent | Purview | Tracy | Smith |\r\n| Tim | Rownam | | |\r\n| Henrietta | Rumney | Matthew | Genting |\r\n| Ramnaresh | Sarwin | Florence | Bader |\r\n| Darren | Smith | | |\r\n| Darren | Smith | | |\r\n| Jack | Smith | Darren | Smith |\r\n| Tracy | Smith | | |\r\n| Ponder | Stibbons | Burton | Tracy |\r\n| Burton | Tracy | | |\r\n| Hyacinth | Tupperware | | |\r\n| Henry | Worthington-Smyth | Tracy | Smith |\r\n\r\n\r\nAnswer:\r\n\r\n```sql\r\nselect mems.firstname as memfname, mems.surname as memsname, recs.firstname as recfname, recs.surname as recsname\r\n\tfrom \r\n\t\tcd.members mems\r\n\t\tleft outer join cd.members recs\r\n\t\t\ton recs.memid = mems.recommendedby\r\norder by memsname, memfname; \r\n```\r\n\r\nLet's introduce another new concept: 
the `LEFT OUTER JOIN`. These are best explained by the way in which they differ from inner joins. Inner joins take a left and a right table, and look for matching rows based on a join condition (`ON`). When the condition is satisfied, a joined row is produced. A `LEFT OUTER JOIN` operates similarly, except that if a given row on the left hand table doesn't match anything, it still produces an output row. That output row consists of the left hand table row, and a bunch of `NULL`s in place of the right hand table row.\r\n\r\nThis is useful in situations like this question, where we want to produce output with optional data. We want the names of all members, and the name of their recommender *if that person exists*. You can't express that properly with an inner join.\r\n\r\nAs you may have guessed, there are other outer joins too. The `RIGHT OUTER JOIN` is much like the `LEFT OUTER JOIN`, except that the left hand side of the expression is the one that contains the optional data. The rarely-used `FULL OUTER JOIN` treats both sides of the expression as optional. \r\n\r\n\r\n\r\n### Produce a list of all members who have used a tennis court\r\n\r\nHow can you produce a list of all members who have used a tennis court? Include in your output the name of the court, and the name of the member formatted as a single column. 
Ensure no duplicate data, and order by the member name.\r\n\r\n\r\nExpected results:\r\n\r\n| member | facility |\r\n| ----------------- | -------------- |\r\n| Anne Baker | Tennis Court 2 |\r\n| Anne Baker | Tennis Court 1 |\r\n| Burton Tracy | Tennis Court 2 |\r\n| Burton Tracy | Tennis Court 1 |\r\n| Charles Owen | Tennis Court 2 |\r\n| Charles Owen | Tennis Court 1 |\r\n| Darren Smith | Tennis Court 2 |\r\n| David Farrell | Tennis Court 2 |\r\n| David Farrell | Tennis Court 1 |\r\n| David Jones | Tennis Court 1 |\r\n| David Jones | Tennis Court 2 |\r\n| David Pinker | Tennis Court 1 |\r\n| Douglas Jones | Tennis Court 1 |\r\n| Erica Crumpet | Tennis Court 1 |\r\n| Florence Bader | Tennis Court 1 |\r\n| Florence Bader | Tennis Court 2 |\r\n| GUEST GUEST | Tennis Court 2 |\r\n| GUEST GUEST | Tennis Court 1 |\r\n| Gerald Butters | Tennis Court 1 |\r\n| Gerald Butters | Tennis Court 2 |\r\n| Henrietta Rumney | Tennis Court 2 |\r\n| Jack Smith | Tennis Court 1 |\r\n| Jack Smith | Tennis Court 2 |\r\n| Janice Joplette | Tennis Court 1 |\r\n| Janice Joplette | Tennis Court 2 |\r\n| Jemima Farrell | Tennis Court 2 |\r\n| Jemima Farrell | Tennis Court 1 |\r\n| Joan Coplin | Tennis Court 1 |\r\n| John Hunt | Tennis Court 1 |\r\n| John Hunt | Tennis Court 2 |\r\n| Matthew Genting | Tennis Court 1 |\r\n| Millicent Purview | Tennis Court 2 |\r\n| Nancy Dare | Tennis Court 2 |\r\n| Nancy Dare | Tennis Court 1 |\r\n| Ponder Stibbons | Tennis Court 2 |\r\n| Ponder Stibbons | Tennis Court 1 |\r\n| Ramnaresh Sarwin | Tennis Court 2 |\r\n| Ramnaresh Sarwin | Tennis Court 1 |\r\n| Tim Boothe | Tennis Court 1 |\r\n| Tim Boothe | Tennis Court 2 |\r\n| Tim Rownam | Tennis Court 1 |\r\n| Tim Rownam | Tennis Court 2 |\r\n| Timothy Baker | Tennis Court 2 |\r\n| Timothy Baker | Tennis Court 1 |\r\n| Tracy Smith | Tennis Court 2 |\r\n| Tracy Smith | Tennis Court 1 |\r\n\r\n\r\nAnswer:\r\n\r\n```sql\r\nselect distinct mems.firstname || ' ' || mems.surname as member, facs.name as 
facility\r\n\tfrom \r\n\t\tcd.members mems\r\n\t\tinner join cd.bookings bks\r\n\t\t\ton mems.memid = bks.memid\r\n\t\tinner join cd.facilities facs\r\n\t\t\ton bks.facid = facs.facid\r\n\twhere\r\n\t\tbks.facid in (0,1)\r\norder by member; \r\n```\r\n\r\nThis exercise is largely a more complex application of what you've learned in prior questions. It's also the first time we've used more than one join, which may be a little confusing for some. When reading join expressions, remember that a join is effectively a function that takes two tables, one labelled the left table, and the other the right. This is easy to visualise with just one join in the query, but a little more confusing with two.\r\n\r\nOur second `INNER JOIN` in this query has a right hand side of cd.facilities. That's easy enough to grasp. The left hand side, however, is the table returned by joining cd.members to cd.bookings. It's important to emphasise this: the relational model is all about tables. The output of any join is another table. The output of a query is a table. Single-column lists are tables. Once you grasp that, you've grasped the fundamental beauty of the model.\r\n\r\nAs a final note, we do introduce one new thing here: the `||` operator is used to concatenate strings.\r\n\r\n\r\n\r\n### Produce a list of costly bookings\r\n\r\nHow can you produce a list of bookings on the day of 2012-09-14 which will cost the member (or guest) more than $30? Remember that guests have different costs to members (the listed costs are per half-hour 'slot'), and the guest user is always ID 0. Include in your output the name of the facility, the name of the member formatted as a single column, and the cost. Order by descending cost, and do not use any subqueries. 
\r\n\r\n\r\nExpected results:\r\n\r\n| member | facility | cost |\r\n| --------------- | -------------- | ---- |\r\n| GUEST GUEST | Massage Room 2 | 320 |\r\n| GUEST GUEST | Massage Room 1 | 160 |\r\n| GUEST GUEST | Massage Room 1 | 160 |\r\n| GUEST GUEST | Massage Room 1 | 160 |\r\n| GUEST GUEST | Tennis Court 2 | 150 |\r\n| Jemima Farrell | Massage Room 1 | 140 |\r\n| GUEST GUEST | Tennis Court 1 | 75 |\r\n| GUEST GUEST | Tennis Court 2 | 75 |\r\n| GUEST GUEST | Tennis Court 1 | 75 |\r\n| Matthew Genting | Massage Room 1 | 70 |\r\n| Florence Bader | Massage Room 2 | 70 |\r\n| GUEST GUEST | Squash Court | 70.0 |\r\n| Jemima Farrell | Massage Room 1 | 70 |\r\n| Ponder Stibbons | Massage Room 1 | 70 |\r\n| Burton Tracy | Massage Room 1 | 70 |\r\n| Jack Smith | Massage Room 1 | 70 |\r\n| GUEST GUEST | Squash Court | 35.0 |\r\n| GUEST GUEST | Squash Court | 35.0 |\r\n\r\n\r\nAnswer:\r\n\r\n```sql\r\nselect mems.firstname || ' ' || mems.surname as member, \r\n\tfacs.name as facility, \r\n\tcase \r\n\t\twhen mems.memid = 0 then\r\n\t\t\tbks.slots*facs.guestcost\r\n\t\telse\r\n\t\t\tbks.slots*facs.membercost\r\n\tend as cost\r\n from\r\n cd.members mems \r\n inner join cd.bookings bks\r\n on mems.memid = bks.memid\r\n inner join cd.facilities facs\r\n on bks.facid = facs.facid\r\n where\r\n\t\tbks.starttime >= '2012-09-14' and \r\n\t\tbks.starttime < '2012-09-15' and (\r\n\t\t\t(mems.memid = 0 and bks.slots*facs.guestcost > 30) or\r\n\t\t\t(mems.memid != 0 and bks.slots*facs.membercost > 30)\r\n\t\t)\r\norder by cost desc; \r\n```\r\n\r\nThis is a bit of a complicated one! While it's more complex logic than we've used previously, there's not an awful lot to remark upon. The `WHERE` clause restricts our output to sufficiently costly rows on 2012-09-14, remembering to distinguish between guests and others. 
We then use a `CASE` statement in the column selections to output the correct cost for the member or guest.\r\n\r\n\r\n\r\n### Produce a list of all members, along with their recommender, using no joins\r\n\r\nHow can you output a list of all members, including the individual who recommended them (if any), without using any joins? Ensure that there are no duplicates in the list, and that each firstname + surname pairing is formatted as a column and ordered. \r\n\r\n\r\nExpected results:\r\n\r\n| member | recommender |\r\n| ----------------------- | ----------------- |\r\n| Anna Mackenzie | Darren Smith |\r\n| Anne Baker | Ponder Stibbons |\r\n| Burton Tracy | |\r\n| Charles Owen | Darren Smith |\r\n| Darren Smith | |\r\n| David Farrell | |\r\n| David Jones | Janice Joplette |\r\n| David Pinker | Jemima Farrell |\r\n| Douglas Jones | David Jones |\r\n| Erica Crumpet | Tracy Smith |\r\n| Florence Bader | Ponder Stibbons |\r\n| GUEST GUEST | |\r\n| Gerald Butters | Darren Smith |\r\n| Henrietta Rumney | Matthew Genting |\r\n| Henry Worthington-Smyth | Tracy Smith |\r\n| Hyacinth Tupperware | |\r\n| Jack Smith | Darren Smith |\r\n| Janice Joplette | Darren Smith |\r\n| Jemima Farrell | |\r\n| Joan Coplin | Timothy Baker |\r\n| John Hunt | Millicent Purview |\r\n| Matthew Genting | Gerald Butters |\r\n| Millicent Purview | Tracy Smith |\r\n| Nancy Dare | Janice Joplette |\r\n| Ponder Stibbons | Burton Tracy |\r\n| Ramnaresh Sarwin | Florence Bader |\r\n| Tim Boothe | Tim Rownam |\r\n| Tim Rownam | |\r\n| Timothy Baker | Jemima Farrell |\r\n| Tracy Smith | |\r\n\r\n\r\nAnswer:\r\n\r\n```sql\r\nselect distinct mems.firstname || ' ' || mems.surname as member,\r\n\t(select recs.firstname || ' ' || recs.surname as recommender \r\n\t\tfrom cd.members recs \r\n\t\twhere recs.memid = mems.recommendedby\r\n\t)\r\n\tfrom \r\n\t\tcd.members mems\r\norder by member; \r\n```\r\n\r\nThis exercise marks the introduction of subqueries. 
Subqueries are, as the name implies, queries within a query. They're commonly used with aggregates, to answer questions like 'get me all the details of the member who has spent the most hours on Tennis Court 1'.\r\n\r\nIn this case, we're simply using the subquery to emulate an outer join. For every value of member, the subquery is run once to find the name of the individual who recommended them (if any). A subquery that uses information from the outer query in this way (and thus has to be run for each row in the result set) is known as a *correlated subquery*.\r\n\r\n\r\n\r\n### Produce a list of costly bookings, using a subquery\r\n\r\nThe [Produce a list of costly bookings](https://pgexercises.com/questions/joins/threejoin2.html) exercise contained some messy logic: we had to calculate the booking cost in both the WHERE clause and the CASE statement. Try to simplify this calculation using subqueries. For reference, the question was:\r\n\r\n\r\n\r\n*How can you produce a list of bookings on the day of 2012-09-14 which will cost the member (or guest) more than $30? Remember that guests have different costs to members (the listed costs are per half-hour 'slot'), and the guest user is always ID 0. Include in your output the name of the facility, the name of the member formatted as a single column, and the cost. 
Order by descending cost.* \r\n\r\n\r\nExpected results:\r\n\r\n| member | facility | cost |\r\n| --------------- | -------------- | ---- |\r\n| GUEST GUEST | Massage Room 2 | 320 |\r\n| GUEST GUEST | Massage Room 1 | 160 |\r\n| GUEST GUEST | Massage Room 1 | 160 |\r\n| GUEST GUEST | Massage Room 1 | 160 |\r\n| GUEST GUEST | Tennis Court 2 | 150 |\r\n| Jemima Farrell | Massage Room 1 | 140 |\r\n| GUEST GUEST | Tennis Court 1 | 75 |\r\n| GUEST GUEST | Tennis Court 2 | 75 |\r\n| GUEST GUEST | Tennis Court 1 | 75 |\r\n| Matthew Genting | Massage Room 1 | 70 |\r\n| Florence Bader | Massage Room 2 | 70 |\r\n| GUEST GUEST | Squash Court | 70.0 |\r\n| Jemima Farrell | Massage Room 1 | 70 |\r\n| Ponder Stibbons | Massage Room 1 | 70 |\r\n| Burton Tracy | Massage Room 1 | 70 |\r\n| Jack Smith | Massage Room 1 | 70 |\r\n| GUEST GUEST | Squash Court | 35.0 |\r\n| GUEST GUEST | Squash Court | 35.0 |\r\n\r\n\r\nAnswer:\r\n\r\n```sql\r\nselect member, facility, cost from (\r\n\tselect \r\n\t\tmems.firstname || ' ' || mems.surname as member,\r\n\t\tfacs.name as facility,\r\n\t\tcase\r\n\t\t\twhen mems.memid = 0 then\r\n\t\t\t\tbks.slots*facs.guestcost\r\n\t\t\telse\r\n\t\t\t\tbks.slots*facs.membercost\r\n\t\tend as cost\r\n\t\tfrom\r\n\t\t\tcd.members mems\r\n\t\t\tinner join cd.bookings bks\r\n\t\t\t\ton mems.memid = bks.memid\r\n\t\t\tinner join cd.facilities facs\r\n\t\t\t\ton bks.facid = facs.facid\r\n\t\twhere\r\n\t\t\tbks.starttime >= '2012-09-14' and\r\n\t\t\tbks.starttime < '2012-09-15'\r\n\t) as bookings\r\n\twhere cost > 30\r\norder by cost desc; \r\n```\r\n\r\nThis answer provides a mild simplification to the previous iteration: in the no-subquery version, we had to calculate the member or guest's cost in both the `WHERE` clause and the `CASE` statement. In our new version, we produce an inline query that calculates the total booking cost for us, allowing the outer query to simply select the bookings it's looking for. 
For reference, you may also see subqueries in the `FROM` clause referred to as *inline views*.\r\n\r\n***\r\n\r\n## Modifying Data\r\n\r\nQuerying data is all well and good, but at some point you're probably going to want to put data into your database! This section deals with inserting, updating, and deleting information. Operations that alter your data like this are collectively known as Data Manipulation Language, or DML.\r\n\r\nIn previous sections, we returned to you the results of the query you've performed. Since modifications like the ones we're making in this section don't return any query results, we instead show you the updated content of the table you're supposed to be working on. You can compare this with the table shown in 'Expected Results' to see how you've done.\r\n\r\nIf you struggle with these questions, I strongly recommend [Learning SQL](http://shop.oreilly.com/product/9780596007270.do), by Alan Beaulieu.\r\n\r\n\r\n\r\n### Insert some data into a table\r\n\r\nThe club is adding a new facility - a spa. We need to add it into the facilities table. 
Use the following values:\r\n\r\n- facid: 9, Name: 'Spa', membercost: 20, guestcost: 30, initialoutlay: 100000, monthlymaintenance: 800.\r\n\r\n\r\nExpected results:\r\n\r\n| facid | name | membercost | guestcost | initialoutlay | monthlymaintenance |\r\n| ----- | --------------- | ---------- | --------- | ------------- | ------------------ |\r\n| 0 | Tennis Court 1 | 5 | 25 | 10000 | 200 |\r\n| 1 | Tennis Court 2 | 5 | 25 | 8000 | 200 |\r\n| 2 | Badminton Court | 0 | 15.5 | 4000 | 50 |\r\n| 3 | Table Tennis | 0 | 5 | 320 | 10 |\r\n| 4 | Massage Room 1 | 35 | 80 | 4000 | 3000 |\r\n| 5 | Massage Room 2 | 35 | 80 | 4000 | 3000 |\r\n| 6 | Squash Court | 3.5 | 17.5 | 5000 | 80 |\r\n| 7 | Snooker Table | 0 | 5 | 450 | 15 |\r\n| 8 | Pool Table | 0 | 5 | 400 | 15 |\r\n| 9 | Spa | 20 | 30 | 100000 | 800 |\r\n\r\n\r\nAnswer:\r\n\r\n```sql\r\ninsert into cd.facilities\r\n (facid, name, membercost, guestcost, initialoutlay, monthlymaintenance)\r\n values (9, 'Spa', 20, 30, 100000, 800); \r\n```\r\n\r\n`INSERT INTO ... VALUES` is the simplest way to insert data into a table. There's not a whole lot to discuss here: `VALUES` is used to construct a row of data, which the `INSERT` statement inserts into the table. It's as simple as that.\r\n\r\nYou can see that there are two sections in parentheses. The first is part of the `INSERT` statement, and specifies the columns that we're providing data for. The second is part of `VALUES`, and specifies the actual data we want to insert into each column.\r\n\r\nIf we're inserting data into every column of the table, as in this example, explicitly specifying the column names is optional. 
As long as you fill in data for all columns of the table, in the order they were defined when you created the table, you can do something like the following:\r\n\r\n```sql\r\ninsert into cd.facilities values (9, 'Spa', 20, 30, 100000, 800);\r\n```\r\n\r\nGenerally speaking, for SQL that's going to be reused I tend to prefer being explicit and specifying the column names.\r\n\r\n\r\n\r\n### Insert multiple rows of data into a table\r\n\r\nIn the previous exercise, you learned how to add a facility. Now you're going to add multiple facilities in one command. Use the following values:\r\n\r\n- facid: 9, Name: 'Spa', membercost: 20, guestcost: 30, initialoutlay: 100000, monthlymaintenance: 800. \r\n- facid: 10, Name: 'Squash Court 2', membercost: 3.5, guestcost: 17.5, initialoutlay: 5000, monthlymaintenance: 80.\r\n\r\n\r\nExpected results:\r\n\r\n| facid | name | membercost | guestcost | initialoutlay | monthlymaintenance |\r\n| ----- | --------------- | ---------- | --------- | ------------- | ------------------ |\r\n| 0 | Tennis Court 1 | 5 | 25 | 10000 | 200 |\r\n| 1 | Tennis Court 2 | 5 | 25 | 8000 | 200 |\r\n| 2 | Badminton Court | 0 | 15.5 | 4000 | 50 |\r\n| 3 | Table Tennis | 0 | 5 | 320 | 10 |\r\n| 4 | Massage Room 1 | 35 | 80 | 4000 | 3000 |\r\n| 5 | Massage Room 2 | 35 | 80 | 4000 | 3000 |\r\n| 6 | Squash Court | 3.5 | 17.5 | 5000 | 80 |\r\n| 7 | Snooker Table | 0 | 5 | 450 | 15 |\r\n| 8 | Pool Table | 0 | 5 | 400 | 15 |\r\n| 9 | Spa | 20 | 30 | 100000 | 800 |\r\n| 10 | Squash Court 2 | 3.5 | 17.5 | 5000 | 80 |\r\n\r\n\r\nAnswer:\r\n\r\n```sql\r\ninsert into cd.facilities\r\n (facid, name, membercost, guestcost, initialoutlay, monthlymaintenance)\r\n values\r\n (9, 'Spa', 20, 30, 100000, 800),\r\n (10, 'Squash Court 2', 3.5, 17.5, 5000, 80); \r\n```\r\n\r\n`VALUES` can be used to generate more than one row to insert into a table, as seen in this example. 
Hopefully it's clear what's going on here: the output of `VALUES` is a table, and that table is copied into cd.facilities, the table specified in the `INSERT` command.\r\n\r\nWhile you'll most commonly see `VALUES` when inserting data, Postgres allows you to use `VALUES` wherever you might use a `SELECT`. This makes sense: the output of both commands is a table, it's just that `VALUES` is a bit more ergonomic when working with constant data.\r\n\r\nSimilarly, it's possible to use `SELECT` wherever you see a `VALUES`. This means that you can `INSERT` the results of a `SELECT`. For example:\r\n\r\n```sql\r\ninsert into cd.facilities\r\n (facid, name, membercost, guestcost, initialoutlay, monthlymaintenance)\r\n SELECT 9, 'Spa', 20, 30, 100000, 800\r\n UNION ALL\r\n SELECT 10, 'Squash Court 2', 3.5, 17.5, 5000, 80;\r\n```\r\n\r\nIn later exercises you'll see us using `INSERT ... SELECT` to generate data to insert based on the information already in the database.\r\n\r\n\r\n\r\n### Insert calculated data into a table\r\n\r\nLet's try adding the spa to the facilities table again. This time, though, we want to automatically generate the value for the next facid, rather than specifying it as a constant. 
Use the following values for everything else:\r\n\r\n- Name: 'Spa', membercost: 20, guestcost: 30, initialoutlay: 100000, monthlymaintenance: 800.\r\n\r\n\r\nExpected results:\r\n\r\n| facid | name | membercost | guestcost | initialoutlay | monthlymaintenance |\r\n| ----- | --------------- | ---------- | --------- | ------------- | ------------------ |\r\n| 0 | Tennis Court 1 | 5 | 25 | 10000 | 200 |\r\n| 1 | Tennis Court 2 | 5 | 25 | 8000 | 200 |\r\n| 2 | Badminton Court | 0 | 15.5 | 4000 | 50 |\r\n| 3 | Table Tennis | 0 | 5 | 320 | 10 |\r\n| 4 | Massage Room 1 | 35 | 80 | 4000 | 3000 |\r\n| 5 | Massage Room 2 | 35 | 80 | 4000 | 3000 |\r\n| 6 | Squash Court | 3.5 | 17.5 | 5000 | 80 |\r\n| 7 | Snooker Table | 0 | 5 | 450 | 15 |\r\n| 8 | Pool Table | 0 | 5 | 400 | 15 |\r\n| 9 | Spa | 20 | 30 | 100000 | 800 |\r\n\r\n\r\nAnswer:\r\n\r\n```sql\r\ninsert into cd.facilities\r\n (facid, name, membercost, guestcost, initialoutlay, monthlymaintenance)\r\n select (select max(facid) from cd.facilities)+1, 'Spa', 20, 30, 100000, 800; \r\n```\r\n\r\nIn the previous exercises we used `VALUES` to insert constant data into the facilities table. Here, though, we have a new requirement: a dynamically generated ID. This gives us a real quality of life improvement, as we don't have to manually work out what the current largest ID is: the SQL command does it for us.\r\n\r\nSince the `VALUES` clause is only used to supply constant data, we need to replace it with a query instead. The `SELECT` statement is fairly simple: there's an inner subquery that works out the next facid based on the largest current id, and the rest is just constant data. The output of the statement is a row that we insert into the facilities table.\r\n\r\nWhile this works fine in our simple example, it's not how you would generally implement an incrementing ID in the real world. Postgres provides `SERIAL` types that are auto-filled with the next ID when you insert a row. 
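\r\n\r\nAs a rough sketch of what that looks like (the `cd.facilities_demo` table below is hypothetical and not part of the exercise schema), a `SERIAL` column can simply be left out of the `INSERT`, and Postgres fills it in from a backing sequence:\r\n\r\n```sql\r\n-- Hypothetical table for illustration only; not part of the exercise schema.\r\ncreate table cd.facilities_demo (\r\n    facid serial primary key,\r\n    name varchar(100) not null\r\n);\r\n\r\n-- facid is omitted, so Postgres assigns the next value from the sequence.\r\ninsert into cd.facilities_demo (name) values ('Spa');\r\n\r\n-- returning reports the generated id without needing a second query.\r\ninsert into cd.facilities_demo (name) values ('Squash Court 2')\r\n    returning facid;\r\n```\r\n\r\n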
As well as saving us effort, these types are also safer: unlike the answer given in this exercise, there's no need to worry about concurrent operations generating the same ID.\r\n\r\n\r\n\r\n### Update some existing data\r\n\r\nWe made a mistake when entering the data for the second tennis court. The initial outlay was 10000 rather than 8000: you need to alter the data to fix the error. \r\n\r\n\r\nExpected results:\r\n\r\n| facid | name | membercost | guestcost | initialoutlay | monthlymaintenance |\r\n| ----- | --------------- | ---------- | --------- | ------------- | ------------------ |\r\n| 0 | Tennis Court 1 | 5 | 25 | 10000 | 200 |\r\n| 1 | Tennis Court 2 | 5 | 25 | 10000 | 200 |\r\n| 2 | Badminton Court | 0 | 15.5 | 4000 | 50 |\r\n| 3 | Table Tennis | 0 | 5 | 320 | 10 |\r\n| 4 | Massage Room 1 | 35 | 80 | 4000 | 3000 |\r\n| 5 | Massage Room 2 | 35 | 80 | 4000 | 3000 |\r\n| 6 | Squash Court | 3.5 | 17.5 | 5000 | 80 |\r\n| 7 | Snooker Table | 0 | 5 | 450 | 15 |\r\n| 8 | Pool Table | 0 | 5 | 400 | 15 |\r\n\r\n\r\nAnswer:\r\n\r\n```sql\r\nupdate cd.facilities\r\n set initialoutlay = 10000\r\n where facid = 1; \r\n```\r\n\r\nThe `UPDATE` statement is used to alter existing data. If you're familiar with `SELECT` queries, it's pretty easy to read: the `WHERE` clause works in exactly the same fashion, allowing us to filter the set of rows we want to work with. These rows are then modified according to the specifications of the `SET` clause: in this case, setting the initial outlay.\r\n\r\nThe `WHERE` clause is extremely important. It's easy to get it wrong or even omit it, with disastrous results. Consider the following command:\r\n\r\n```sql\r\nupdate cd.facilities\r\n set initialoutlay = 10000;\r\n```\r\n\r\nThere's no `WHERE` clause to filter for the rows we're interested in. The result of this is that the update runs on every row in the table! 
This is rarely what we want to happen.\r\n\r\n\r\n\r\n### Update multiple rows and columns at the same time\r\n\r\nWe want to increase the price of the tennis courts for both members and guests. Update the costs to be 6 for members, and 30 for guests.\r\n\r\n\r\nExpected results:\r\n\r\n| facid | name | membercost | guestcost | initialoutlay | monthlymaintenance |\r\n| ----- | --------------- | ---------- | --------- | ------------- | ------------------ |\r\n| 0 | Tennis Court 1 | 6 | 30 | 10000 | 200 |\r\n| 1 | Tennis Court 2 | 6 | 30 | 8000 | 200 |\r\n| 2 | Badminton Court | 0 | 15.5 | 4000 | 50 |\r\n| 3 | Table Tennis | 0 | 5 | 320 | 10 |\r\n| 4 | Massage Room 1 | 35 | 80 | 4000 | 3000 |\r\n| 5 | Massage Room 2 | 35 | 80 | 4000 | 3000 |\r\n| 6 | Squash Court | 3.5 | 17.5 | 5000 | 80 |\r\n| 7 | Snooker Table | 0 | 5 | 450 | 15 |\r\n| 8 | Pool Table | 0 | 5 | 400 | 15 |\r\n\r\n\r\nAnswer:\r\n\r\n```sql\r\nupdate cd.facilities\r\n set\r\n membercost = 6,\r\n guestcost = 30\r\n where facid in (0,1); \r\n```\r\n\r\nThe `SET` clause accepts a comma-separated list of column assignments, so several columns can be updated in one statement. \r\n\r\n\r\n\r\n### Update a row based on the contents of another row\r\n\r\nWe want to alter the price of the second tennis court so that it costs 10% more than the first one. Try to do this without using constant values for the prices, so that we can reuse the statement if we want to. 
\r\n\r\n\r\nExpected results:\r\n\r\n| facid | name | membercost | guestcost | initialoutlay | monthlymaintenance |\r\n| ----- | --------------- | ---------- | --------- | ------------- | ------------------ |\r\n| 0 | Tennis Court 1 | 5 | 25 | 10000 | 200 |\r\n| 1 | Tennis Court 2 | 5.5 | 27.5 | 8000 | 200 |\r\n| 2 | Badminton Court | 0 | 15.5 | 4000 | 50 |\r\n| 3 | Table Tennis | 0 | 5 | 320 | 10 |\r\n| 4 | Massage Room 1 | 35 | 80 | 4000 | 3000 |\r\n| 5 | Massage Room 2 | 35 | 80 | 4000 | 3000 |\r\n| 6 | Squash Court | 3.5 | 17.5 | 5000 | 80 |\r\n| 7 | Snooker Table | 0 | 5 | 450 | 15 |\r\n| 8 | Pool Table | 0 | 5 | 400 | 15 |\r\n\r\n\r\nAnswer:\r\n\r\n```sql\r\nupdate cd.facilities facs\r\n set\r\n membercost = (select membercost * 1.1 from cd.facilities where facid = 0),\r\n guestcost = (select guestcost * 1.1 from cd.facilities where facid = 0)\r\n where facs.facid = 1; \r\n```\r\n\r\nUpdating columns based on calculated data is not too intrinsically difficult: we can do so pretty easily using subqueries. You can see this approach in our selected answer. \r\n\r\nAs the number of columns we want to update increases, standard SQL can start to get pretty awkward: you don't want to be specifying a separate subquery for each of 15 different column updates. Postgres provides a nonstandard extension to SQL called `UPDATE...FROM` that addresses this: it allows you to supply a `FROM` clause to generate values for use in the `SET` clause. Example below: \r\n\r\n```sql\r\nupdate cd.facilities facs\r\n set\r\n membercost = facs2.membercost * 1.1,\r\n guestcost = facs2.guestcost * 1.1\r\n from (select * from cd.facilities where facid = 0) facs2\r\n where facs.facid = 1;\r\n```\r\n\r\n\r\n\r\n### Delete all bookings\r\n\r\nAs part of a clearout of our database, we want to delete all bookings from the cd.bookings table. 
How can we accomplish this?\r\n\r\n\r\nExpected results:\r\n\r\n| bookid | facid | memid | starttime | slots |\r\n| ------ | ----- | ----- | --------- | ----- |\r\n| | | | | |\r\n\r\n\r\nAnswer:\r\n\r\n```sql\r\ndelete from cd.bookings; \r\n```\r\n\r\nThe `DELETE` statement does what it says on the tin: deletes rows from the table. Here, we show the command in its simplest form, with no qualifiers. In this case, it deletes everything from the table. Obviously, you should be careful with your deletes and make sure they're always limited - we'll see how to do that in the next exercise. \r\n\r\nAn alternative to unqualified `DELETE`s is the following: \r\n\r\n```sql\r\ntruncate cd.bookings;\r\n```\r\n\r\n`TRUNCATE` also deletes everything in the table, but does so using a quicker underlying mechanism. It's not [perfectly safe in all circumstances](https://www.postgresql.org/docs/9.6/static/mvcc-caveats.html), though, so use judiciously. When in doubt, use `DELETE`. \r\n\r\n\r\n\r\n### Delete a member from the cd.members table\r\n\r\nWe want to remove member 37, who has never made a booking, from our database. How can we achieve that? 
\r\n\r\n\r\nExpected results:\r\n\r\n| memid | surname | firstname | address | zipcode | telephone | recommendedby | joindate |\r\n| ----- | ----------------- | --------- | --------------------------------------- | ------- | -------------- | ------------- | ------------------- |\r\n| 0 | GUEST | GUEST | GUEST | 0 | (000) 000-0000 | | 2012-07-01 00:00:00 |\r\n| 1 | Smith | Darren | 8 Bloomsbury Close, Boston | 4321 | 555-555-5555 | | 2012-07-02 12:02:05 |\r\n| 2 | Smith | Tracy | 8 Bloomsbury Close, New York | 4321 | 555-555-5555 | | 2012-07-02 12:08:23 |\r\n| 3 | Rownam | Tim | 23 Highway Way, Boston | 23423 | (844) 693-0723 | | 2012-07-03 09:32:15 |\r\n| 4 | Joplette | Janice | 20 Crossing Road, New York | 234 | (833) 942-4710 | 1 | 2012-07-03 10:25:05 |\r\n| 5 | Butters | Gerald | 1065 Huntingdon Avenue, Boston | 56754 | (844) 078-4130 | 1 | 2012-07-09 10:44:09 |\r\n| 6 | Tracy | Burton | 3 Tunisia Drive, Boston | 45678 | (822) 354-9973 | | 2012-07-15 08:52:55 |\r\n| 7 | Dare | Nancy | 6 Hunting Lodge Way, Boston | 10383 | (833) 776-4001 | 4 | 2012-07-25 08:59:12 |\r\n| 8 | Boothe | Tim | 3 Bloomsbury Close, Reading, 00234 | 234 | (811) 433-2547 | 3 | 2012-07-25 16:02:35 |\r\n| 9 | Stibbons | Ponder | 5 Dragons Way, Winchester | 87630 | (833) 160-3900 | 6 | 2012-07-25 17:09:05 |\r\n| 10 | Owen | Charles | 52 Cheshire Grove, Winchester, 28563 | 28563 | (855) 542-5251 | 1 | 2012-08-03 19:42:37 |\r\n| 11 | Jones | David | 976 Gnats Close, Reading | 33862 | (844) 536-8036 | 4 | 2012-08-06 16:32:55 |\r\n| 12 | Baker | Anne | 55 Powdery Street, Boston | 80743 | 844-076-5141 | 9 | 2012-08-10 14:23:22 |\r\n| 13 | Farrell | Jemima | 103 Firth Avenue, North Reading | 57392 | (855) 016-0163 | | 2012-08-10 14:28:01 |\r\n| 14 | Smith | Jack | 252 Binkington Way, Boston | 69302 | (822) 163-3254 | 1 | 2012-08-10 16:22:05 |\r\n| 15 | Bader | Florence | 264 Ursula Drive, Westford | 84923 | (833) 499-3527 | 9 | 2012-08-10 17:52:03 |\r\n| 16 | Baker | Timothy | 329 James Street, 
Reading | 58393 | 833-941-0824 | 13 | 2012-08-15 10:34:25 |\r\n| 17 | Pinker | David | 5 Impreza Road, Boston | 65332 | 811 409-6734 | 13 | 2012-08-16 11:32:47 |\r\n| 20 | Genting | Matthew | 4 Nunnington Place, Wingfield, Boston | 52365 | (811) 972-1377 | 5 | 2012-08-19 14:55:55 |\r\n| 21 | Mackenzie | Anna | 64 Perkington Lane, Reading | 64577 | (822) 661-2898 | 1 | 2012-08-26 09:32:05 |\r\n| 22 | Coplin | Joan | 85 Bard Street, Bloomington, Boston | 43533 | (822) 499-2232 | 16 | 2012-08-29 08:32:41 |\r\n| 24 | Sarwin | Ramnaresh | 12 Bullington Lane, Boston | 65464 | (822) 413-1470 | 15 | 2012-09-01 08:44:42 |\r\n| 26 | Jones | Douglas | 976 Gnats Close, Reading | 11986 | 844 536-8036 | 11 | 2012-09-02 18:43:05 |\r\n| 27 | Rumney | Henrietta | 3 Burkington Plaza, Boston | 78533 | (822) 989-8876 | 20 | 2012-09-05 08:42:35 |\r\n| 28 | Farrell | David | 437 Granite Farm Road, Westford | 43532 | (855) 755-9876 | | 2012-09-15 08:22:05 |\r\n| 29 | Worthington-Smyth | Henry | 55 Jagbi Way, North Reading | 97676 | (855) 894-3758 | 2 | 2012-09-17 12:27:15 |\r\n| 30 | Purview | Millicent | 641 Drudgery Close, Burnington, Boston | 34232 | (855) 941-9786 | 2 | 2012-09-18 19:04:01 |\r\n| 33 | Tupperware | Hyacinth | 33 Cheerful Plaza, Drake Road, Westford | 68666 | (822) 665-5327 | | 2012-09-18 19:32:05 |\r\n| 35 | Hunt | John | 5 Bullington Lane, Boston | 54333 | (899) 720-6978 | 30 | 2012-09-19 11:32:45 |\r\n| 36 | Crumpet | Erica | Crimson Road, North Reading | 75655 | (811) 732-4816 | 2 | 2012-09-22 08:36:38 |\r\n\r\n\r\nAnswer:\r\n\r\n```sql\r\ndelete from cd.members where memid = 37; \r\n```\r\n\r\nThis exercise is a small increment on our previous one. Instead of deleting all bookings, this time we want to be a bit more targeted, and delete a single member that has never made a booking. To do this, we simply have to add a `WHERE` clause to our command, specifying the member we want to delete. You can see the parallels with `SELECT` and `UPDATE` statements here. 
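\r\n\r\nBefore running a targeted delete, it can help to confirm exactly which rows it will touch. The following is a minimal sketch rather than part of the original answer: it assumes you are working in a session where you can open a transaction, and it uses the Postgres `RETURNING` clause to echo the deleted row before rolling the change back. \r\n\r\n```sql\r\n-- Sketch: try the delete inside a transaction and inspect what was removed.\r\nbegin;\r\n\r\n-- RETURNING echoes each deleted row; no rows back means nothing matched.\r\ndelete from cd.members\r\n\twhere memid = 37\r\n\treturning memid, surname, firstname;\r\n\r\n-- Undo the change; use commit instead once the output looks right.\r\nrollback;\r\n```\r\n\r\nIf the statement returns more rows than you expected, the `WHERE` clause is too broad, and the `ROLLBACK` leaves the table untouched. 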
\r\n\r\nThere's one interesting wrinkle here. Try the command again, substituting member id 0. This member has made many bookings, and you'll find that the delete fails with an error about a foreign key constraint violation. This is an important concept in relational databases, so let's explore a little further. \r\n\r\nForeign keys are a mechanism for defining relationships between columns of different tables. In our case we use them to specify that the memid column of the bookings table is related to the memid column of the members table. The relationship (or 'constraint') specifies that for a given booking, the member specified in the booking **must** exist in the members table. It's useful to have this guarantee enforced by the database: it means that code using the database can rely on the presence of the member. It's hard (even impossible) to enforce this at higher levels: concurrent operations can interfere and leave your database in a broken state. \r\n\r\nPostgreSQL supports several kinds of constraints that allow you to enforce structure upon your data. For more information, check out the PostgreSQL documentation on [constraints](https://www.postgresql.org/docs/9.6/static/ddl-constraints.html), which includes a section on foreign keys. \r\n\r\n\r\n\r\n### Delete based on a subquery\r\n\r\nIn the previous exercise, we deleted a specific member who had never made a booking. How can we make that more general, to delete all members who have never made a booking? 
\r\n\r\n\r\nExpected results:\r\n\r\n| memid | surname | firstname | address | zipcode | telephone | recommendedby | joindate |\r\n| ----- | ----------------- | --------- | --------------------------------------- | ------- | -------------- | ------------- | ------------------- |\r\n| 0 | GUEST | GUEST | GUEST | 0 | (000) 000-0000 | | 2012-07-01 00:00:00 |\r\n| 1 | Smith | Darren | 8 Bloomsbury Close, Boston | 4321 | 555-555-5555 | | 2012-07-02 12:02:05 |\r\n| 2 | Smith | Tracy | 8 Bloomsbury Close, New York | 4321 | 555-555-5555 | | 2012-07-02 12:08:23 |\r\n| 3 | Rownam | Tim | 23 Highway Way, Boston | 23423 | (844) 693-0723 | | 2012-07-03 09:32:15 |\r\n| 4 | Joplette | Janice | 20 Crossing Road, New York | 234 | (833) 942-4710 | 1 | 2012-07-03 10:25:05 |\r\n| 5 | Butters | Gerald | 1065 Huntingdon Avenue, Boston | 56754 | (844) 078-4130 | 1 | 2012-07-09 10:44:09 |\r\n| 6 | Tracy | Burton | 3 Tunisia Drive, Boston | 45678 | (822) 354-9973 | | 2012-07-15 08:52:55 |\r\n| 7 | Dare | Nancy | 6 Hunting Lodge Way, Boston | 10383 | (833) 776-4001 | 4 | 2012-07-25 08:59:12 |\r\n| 8 | Boothe | Tim | 3 Bloomsbury Close, Reading, 00234 | 234 | (811) 433-2547 | 3 | 2012-07-25 16:02:35 |\r\n| 9 | Stibbons | Ponder | 5 Dragons Way, Winchester | 87630 | (833) 160-3900 | 6 | 2012-07-25 17:09:05 |\r\n| 10 | Owen | Charles | 52 Cheshire Grove, Winchester, 28563 | 28563 | (855) 542-5251 | 1 | 2012-08-03 19:42:37 |\r\n| 11 | Jones | David | 976 Gnats Close, Reading | 33862 | (844) 536-8036 | 4 | 2012-08-06 16:32:55 |\r\n| 12 | Baker | Anne | 55 Powdery Street, Boston | 80743 | 844-076-5141 | 9 | 2012-08-10 14:23:22 |\r\n| 13 | Farrell | Jemima | 103 Firth Avenue, North Reading | 57392 | (855) 016-0163 | | 2012-08-10 14:28:01 |\r\n| 14 | Smith | Jack | 252 Binkington Way, Boston | 69302 | (822) 163-3254 | 1 | 2012-08-10 16:22:05 |\r\n| 15 | Bader | Florence | 264 Ursula Drive, Westford | 84923 | (833) 499-3527 | 9 | 2012-08-10 17:52:03 |\r\n| 16 | Baker | Timothy | 329 James Street, 
Reading | 58393 | 833-941-0824 | 13 | 2012-08-15 10:34:25 |\r\n| 17 | Pinker | David | 5 Impreza Road, Boston | 65332 | 811 409-6734 | 13 | 2012-08-16 11:32:47 |\r\n| 20 | Genting | Matthew | 4 Nunnington Place, Wingfield, Boston | 52365 | (811) 972-1377 | 5 | 2012-08-19 14:55:55 |\r\n| 21 | Mackenzie | Anna | 64 Perkington Lane, Reading | 64577 | (822) 661-2898 | 1 | 2012-08-26 09:32:05 |\r\n| 22 | Coplin | Joan | 85 Bard Street, Bloomington, Boston | 43533 | (822) 499-2232 | 16 | 2012-08-29 08:32:41 |\r\n| 24 | Sarwin | Ramnaresh | 12 Bullington Lane, Boston | 65464 | (822) 413-1470 | 15 | 2012-09-01 08:44:42 |\r\n| 26 | Jones | Douglas | 976 Gnats Close, Reading | 11986 | 844 536-8036 | 11 | 2012-09-02 18:43:05 |\r\n| 27 | Rumney | Henrietta | 3 Burkington Plaza, Boston | 78533 | (822) 989-8876 | 20 | 2012-09-05 08:42:35 |\r\n| 28 | Farrell | David | 437 Granite Farm Road, Westford | 43532 | (855) 755-9876 | | 2012-09-15 08:22:05 |\r\n| 29 | Worthington-Smyth | Henry | 55 Jagbi Way, North Reading | 97676 | (855) 894-3758 | 2 | 2012-09-17 12:27:15 |\r\n| 30 | Purview | Millicent | 641 Drudgery Close, Burnington, Boston | 34232 | (855) 941-9786 | 2 | 2012-09-18 19:04:01 |\r\n| 33 | Tupperware | Hyacinth | 33 Cheerful Plaza, Drake Road, Westford | 68666 | (822) 665-5327 | | 2012-09-18 19:32:05 |\r\n| 35 | Hunt | John | 5 Bullington Lane, Boston | 54333 | (899) 720-6978 | 30 | 2012-09-19 11:32:45 |\r\n| 36 | Crumpet | Erica | Crimson Road, North Reading | 75655 | (811) 732-4816 | 2 | 2012-09-22 08:36:38 |\r\n\r\n\r\n\r\nAnswer:\r\n\r\n```sql\r\ndelete from cd.members where memid not in (select memid from cd.bookings); \r\n```\r\n\r\nWe can use subqueries to determine whether a row should be deleted or not. There's a couple of standard ways to do this. In our featured answer, the subquery produces a list of all the different member ids in the cd.bookings table. If a row in the table isn't in the list generated by the subquery, it gets deleted. 
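\r\n\r\nOne caveat with `NOT IN` is worth a quick illustration: if the subquery returns any null, the `NOT IN` test can never evaluate to true, so no rows would be deleted. The sketch below is a defensive variant, not the exercise's answer. It assumes memid in cd.bookings might contain nulls (if the column is declared `NOT NULL` in your schema, the extra filter is simply harmless), and it previews the affected members with a `SELECT` before anything is deleted. \r\n\r\n```sql\r\n-- Sketch: preview the members that have no bookings.\r\n-- The IS NOT NULL filter guards against nulls in the subquery result,\r\n-- which would otherwise make NOT IN match nothing.\r\nselect memid, surname, firstname\r\n\tfrom cd.members\r\n\twhere memid not in (\r\n\t\tselect memid from cd.bookings where memid is not null\r\n\t)\r\norder by memid;\r\n```\r\n\r\nOnce the preview lists the members you expect, the same `WHERE` clause can be moved onto the `DELETE`. 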
\r\n\r\nAn alternative is to use a *correlated subquery*. Where our previous example runs a large subquery once, the correlated approach instead specifies a smaller subquery to run against every row. \r\n\r\n```sql\r\ndelete from cd.members mems where not exists (select 1 from cd.bookings where memid = mems.memid);\r\n```\r\n\r\nThe two forms can have different performance characteristics. Under the hood, your database engine is free to transform your query to execute it in a correlated or uncorrelated fashion, though, so things can be a little hard to predict. \r\n\r\n***\r\n\r\n## Aggregation\r\n\r\nAggregation is one of those capabilities that really make you appreciate the power of relational database systems. It allows you to move beyond merely persisting your data, into the realm of asking truly interesting questions that can be used to inform decision making. This category covers aggregation at length, making use of standard grouping as well as more recent window functions.\r\n\r\nIf you struggle with these questions, I strongly recommend [Learning SQL](http://shop.oreilly.com/product/9780596007270.do), by Alan Beaulieu and [SQL Cookbook](http://shop.oreilly.com/product/9780596009762.do) by Anthony Molinaro. In fact, get the latter anyway - it'll take you beyond anything you find on this site, and on multiple different database systems to boot.\r\n\r\n\r\n\r\n### Count the number of facilities\r\n\r\nFor our first foray into aggregates, we're going to stick to something simple. We want to know how many facilities exist - simply produce a total count. \r\n\r\n\r\nExpected results:\r\n\r\n| count |\r\n| ----- |\r\n| 9 |\r\n\r\n\r\nAnswer:\r\n\r\n```sql\r\nselect count(*) from cd.facilities; \r\n```\r\n\r\nAggregation starts out pretty simply! The SQL above selects everything from our facilities table, and then counts the number of rows in the result set. 
The count function has a variety of uses: \r\n\r\n- `COUNT(*)` simply returns the number of rows. \r\n- `COUNT(address)` counts the number of non-null addresses in the result set. \r\n- Finally, `COUNT(DISTINCT address)` counts the number of *different* addresses in the result set. \r\n\r\n\r\n\r\nThe basic idea of an aggregate function is that it takes in a column of data, performs some function upon it, and outputs a *scalar* (single) value. There are a bunch more aggregation functions, including `MAX`, `MIN`, `SUM`, and `AVG`. These all do pretty much what you'd expect from their names :-).\r\n\r\nOne aspect of aggregate functions that people often find confusing is in queries like the below:\r\n\r\n```sql\r\nselect facid, count(*) from cd.facilities\r\n```\r\n\r\nTry it out, and you'll find that it doesn't work. This is because `count(*)` wants to collapse the facilities table into a single value - unfortunately, it can't do that, because there are many different facids in cd.facilities - Postgres doesn't know which facid to pair the count with.\r\n\r\nInstead, if you wanted a query that returns all the facids along with a count on each row, you can break the aggregation out into a subquery as below:\r\n\r\n```sql\r\nselect facid, \r\n\t(select count(*) from cd.facilities)\r\n\tfrom cd.facilities\r\n```\r\n\r\nWhen we have a subquery that returns a scalar value like this, Postgres knows to simply repeat the value for every row in cd.facilities.\r\n\r\n\r\n\r\n### Count the number of expensive facilities\r\n\r\nProduce a count of the number of facilities that have a cost to guests of 10 or more. \r\n\r\n\r\nExpected results:\r\n\r\n| count |\r\n| ----- |\r\n| 6 |\r\n\r\n\r\nAnswer:\r\n\r\n```sql\r\nselect count(*) from cd.facilities where guestcost >= 10; \r\n```\r\n\r\nThis one is only a simple modification to the previous question: we need to weed out the inexpensive facilities. This is easy to do using a `WHERE` clause. 
Our aggregation can now only see the expensive facilities. \r\n\r\n\r\n\r\n### Count the number of recommendations each member makes\r\n\r\nProduce a count of the number of recommendations each member has made. Order by member ID. \r\n\r\n\r\nExpected results:\r\n\r\n| recommendedby | count |\r\n| ------------- | ----- |\r\n| 1 | 5 |\r\n| 2 | 3 |\r\n| 3 | 1 |\r\n| 4 | 2 |\r\n| 5 | 1 |\r\n| 6 | 1 |\r\n| 9 | 2 |\r\n| 11 | 1 |\r\n| 13 | 2 |\r\n| 15 | 1 |\r\n| 16 | 1 |\r\n| 20 | 1 |\r\n| 30 | 1 |\r\n\r\n\r\nAnswer:\r\n\r\n```sql\r\nselect recommendedby, count(*) \r\n\tfrom cd.members\r\n\twhere recommendedby is not null\r\n\tgroup by recommendedby\r\norder by recommendedby; \r\n```\r\n\r\nPreviously, we've seen that aggregation functions are applied to a column of values, and convert them into an aggregated scalar value. This is useful, but we often find that we don't want just a single aggregated result: for example, instead of knowing the total amount of money the club has made this month, I might want to know how much money each different facility has made, or which times of day were most lucrative.\r\n\r\nIn order to support this kind of behaviour, SQL has the `GROUP BY` construct. What this does is batch the data together into groups, and run the aggregation function separately for each group. When you specify a `GROUP BY`, the database produces an aggregated value for each distinct value in the supplied columns. In this case, we're saying 'for each distinct value of recommendedby, get me the number of times that value appears'.\r\n\r\n\r\n\r\n### List the total slots booked per facility\r\n\r\nProduce a list of the total number of slots booked per facility. For now, just produce an output table consisting of facility id and slots, sorted by facility id. 
\r\n\r\n\r\nExpected results:\r\n\r\n| facid | Total Slots |\r\n| ----- | ----------- |\r\n| 0 | 1320 |\r\n| 1 | 1278 |\r\n| 2 | 1209 |\r\n| 3 | 830 |\r\n| 4 | 1404 |\r\n| 5 | 228 |\r\n| 6 | 1104 |\r\n| 7 | 908 |\r\n| 8 | 911 |\r\n\r\n\r\nAnswer:\r\n\r\n```sql\r\nselect facid, sum(slots) as \"Total Slots\"\r\n\tfrom cd.bookings\r\n\tgroup by facid\r\norder by facid; \r\n```\r\n\r\nOther than the fact that we've introduced the `SUM` aggregate function, there's not a great deal to say about this exercise. For each distinct facility id, the `SUM` function adds together everything in the slots column. \r\n\r\n\r\n\r\n### List the total slots booked per facility in a given month\r\n\r\nProduce a list of the total number of slots booked per facility in the month of September 2012. Produce an output table consisting of facility id and slots, sorted by the number of slots.\r\n\r\n\r\nExpected results:\r\n\r\n| facid | Total Slots |\r\n| ----- | ----------- |\r\n| 5 | 122 |\r\n| 3 | 422 |\r\n| 7 | 426 |\r\n| 8 | 471 |\r\n| 6 | 540 |\r\n| 2 | 570 |\r\n| 1 | 588 |\r\n| 0 | 591 |\r\n| 4 | 648 |\r\n\r\n\r\nAnswer:\r\n\r\n```sql\r\nselect facid, sum(slots) as \"Total Slots\"\r\n\tfrom cd.bookings\r\n\twhere\r\n\t\tstarttime >= '2012-09-01'\r\n\t\tand starttime < '2012-10-01'\r\n\tgroup by facid\r\norder by sum(slots); \r\n```\r\n\r\nThis is only a minor alteration of our previous example. Remember that aggregation happens after the `WHERE` clause is evaluated: we thus use the `WHERE` to restrict the data we aggregate over, and our aggregation only sees data from a single month.\r\n\r\n\r\n\r\n### List the total slots booked per facility per month\r\n\r\nProduce a list of the total number of slots booked per facility per month in the year of 2012. Produce an output table consisting of facility id and slots, sorted by the id and month. 
\r\n\r\n\r\nExpected results:\r\n\r\n| facid | month | Total Slots |\r\n| ----- | ----- | ----------- |\r\n| 0 | 7 | 270 |\r\n| 0 | 8 | 459 |\r\n| 0 | 9 | 591 |\r\n| 1 | 7 | 207 |\r\n| 1 | 8 | 483 |\r\n| 1 | 9 | 588 |\r\n| 2 | 7 | 180 |\r\n| 2 | 8 | 459 |\r\n| 2 | 9 | 570 |\r\n| 3 | 7 | 104 |\r\n| 3 | 8 | 304 |\r\n| 3 | 9 | 422 |\r\n| 4 | 7 | 264 |\r\n| 4 | 8 | 492 |\r\n| 4 | 9 | 648 |\r\n| 5 | 7 | 24 |\r\n| 5 | 8 | 82 |\r\n| 5 | 9 | 122 |\r\n| 6 | 7 | 164 |\r\n| 6 | 8 | 400 |\r\n| 6 | 9 | 540 |\r\n| 7 | 7 | 156 |\r\n| 7 | 8 | 326 |\r\n| 7 | 9 | 426 |\r\n| 8 | 7 | 117 |\r\n| 8 | 8 | 322 |\r\n| 8 | 9 | 471 |\r\n\r\n\r\nAnswer:\r\n\r\n```sql\r\nselect facid, extract(month from starttime) as month, sum(slots) as \"Total Slots\"\r\n\tfrom cd.bookings\r\n\twhere\r\n\t\tstarttime >= '2012-01-01'\r\n\t\tand starttime < '2013-01-01'\r\n\tgroup by facid, month\r\norder by facid, month; \r\n```\r\n\r\nThe main piece of new functionality in this question is the `EXTRACT` function. `EXTRACT` allows you to get individual components of a timestamp, like day, month, year, etc. We group by the output of this function to provide per-month values. An alternative, if we needed to distinguish between the same month in different years, is to make use of the `DATE_TRUNC` function, which truncates a date to a given granularity.\r\n\r\nIt's also worth noting that this is the first time we've truly made use of the ability to group by more than one column.\r\n\r\n\r\n\r\n### Find the count of members who have made at least one booking\r\n\r\nFind the total number of members who have made at least one booking. \r\n\r\n\r\nExpected results:\r\n\r\n| count |\r\n| ----- |\r\n| 30 |\r\n\r\n\r\nAnswer:\r\n\r\n```sql\r\nselect count(distinct memid) from cd.bookings \r\n```\r\n\r\nYour first instinct may be to go for a subquery here. 
Something like the below:\r\n\r\n```sql\r\nselect count(*) from \r\n\t(select distinct memid from cd.bookings) as mems\r\n```\r\n\r\nThis does work perfectly well, but we can simplify a touch with the help of a little extra knowledge in the form of `COUNT(DISTINCT ...)`. This does what you might expect, counting the distinct values in the passed column.\r\n\r\n\r\n\r\n### List facilities with more than 1000 slots booked\r\n\r\nProduce a list of facilities with more than 1000 slots booked. Produce an output table consisting of facility id and total slots, sorted by facility id. \r\n\r\n\r\nExpected results:\r\n\r\n| facid | Total Slots |\r\n| ----- | ----------- |\r\n| 0 | 1320 |\r\n| 1 | 1278 |\r\n| 2 | 1209 |\r\n| 4 | 1404 |\r\n| 6 | 1104 |\r\n\r\n\r\nAnswer:\r\n\r\n```sql\r\nselect facid, sum(slots) as \"Total Slots\"\r\n        from cd.bookings\r\n        group by facid\r\n        having sum(slots) > 1000\r\n        order by facid \r\n```\r\n\r\nIt turns out that there's actually an SQL keyword designed to help with the filtering of output from aggregate functions. This keyword is `HAVING`.\r\n\r\nThe behaviour of `HAVING` is easily confused with that of `WHERE`. The best way to think about it is that in the context of a query with an aggregate function, `WHERE` is used to filter what data gets input into the aggregate function, while `HAVING` is used to filter the data once it is output from the function. Try experimenting to explore this difference!\r\n\r\n\r\n\r\n### Find the total revenue of each facility\r\n\r\nProduce a list of facilities along with their total revenue. The output table should consist of facility name and revenue, sorted by revenue. Remember that there's a different cost for guests and members! 
\r\n\r\n\r\nExpected results:\r\n\r\n| name | revenue |\r\n| --------------- | ------- |\r\n| Table Tennis | 180 |\r\n| Snooker Table | 240 |\r\n| Pool Table | 270 |\r\n| Badminton Court | 1906.5 |\r\n| Squash Court | 13468.0 |\r\n| Tennis Court 1 | 13860 |\r\n| Tennis Court 2 | 14310 |\r\n| Massage Room 2 | 15810 |\r\n| Massage Room 1 | 72540 |\r\n\r\n\r\nAnswer:\r\n\r\n```sql\r\nselect facs.name, sum(slots * case\r\n\t\t\twhen memid = 0 then facs.guestcost\r\n\t\t\telse facs.membercost\r\n\t\tend) as revenue\r\n\tfrom cd.bookings bks\r\n\tinner join cd.facilities facs\r\n\t\ton bks.facid = facs.facid\r\n\tgroup by facs.name\r\norder by revenue; \r\n```\r\n\r\n The only real complexity in this query is that guests (member ID 0) have a different cost to everyone else. We use a case statement to produce the cost for each session, and then sum each of those sessions, grouped by facility. \r\n\r\n\r\n\r\n### Find facilities with a total revenue less than 1000\r\n\r\nProduce a list of facilities with a total revenue less than 1000. Produce an output table consisting of facility name and revenue, sorted by revenue. Remember that there's a different cost for guests and members! 
\r\n\r\n\r\nExpected results:\r\n\r\n| name | revenue |\r\n| ------------- | ------- |\r\n| Table Tennis | 180 |\r\n| Snooker Table | 240 |\r\n| Pool Table | 270 |\r\n\r\n\r\nAnswer:\r\n\r\n```sql\r\nselect name, revenue from (\r\n\tselect facs.name, sum(case \r\n\t\t\t\twhen memid = 0 then slots * facs.guestcost\r\n\t\t\t\telse slots * membercost\r\n\t\t\tend) as revenue\r\n\t\tfrom cd.bookings bks\r\n\t\tinner join cd.facilities facs\r\n\t\t\ton bks.facid = facs.facid\r\n\t\tgroup by facs.name\r\n\t) as agg where revenue < 1000\r\norder by revenue; \r\n```\r\n\r\nYou may well have tried to use the `HAVING` keyword we introduced in an earlier exercise, producing something like below:\r\n\r\n```sql\r\nselect facs.name, sum(case \r\n\t\twhen memid = 0 then slots * facs.guestcost\r\n\t\telse slots * membercost\r\n\tend) as revenue\r\n\tfrom cd.bookings bks\r\n\tinner join cd.facilities facs\r\n\t\ton bks.facid = facs.facid\r\n\tgroup by facs.name\r\n\thaving revenue < 1000\r\norder by revenue;\r\n```\r\n\r\nUnfortunately, this doesn't work! You'll get an error along the lines of `ERROR: column \"revenue\" does not exist`. Postgres, unlike some other RDBMSs such as MySQL, doesn't let you refer to an output column alias like revenue in the `HAVING` clause: the alias is only defined once the select list is evaluated, which happens after `HAVING`. This means that for this query to work, you'd have to produce something like below:\r\n\r\n```sql\r\nselect facs.name, sum(case \r\n\t\twhen memid = 0 then slots * facs.guestcost\r\n\t\telse slots * membercost\r\n\tend) as revenue\r\n\tfrom cd.bookings bks\r\n\tinner join cd.facilities facs\r\n\t\ton bks.facid = facs.facid\r\n\tgroup by facs.name\r\n\thaving sum(case \r\n\t\twhen memid = 0 then slots * facs.guestcost\r\n\t\telse slots * membercost\r\n\tend) < 1000\r\norder by revenue;\r\n```\r\n\r\nHaving to repeat significant calculation code like this is messy, so our anointed solution instead just wraps the main query body as a subquery, and selects from it using a `WHERE` clause. 
In general, I recommend using `HAVING` for simple queries, as it increases clarity. Otherwise, this subquery approach is often easier to use.\r\n\r\n\r\n\r\n### Output the facility id that has the highest number of slots booked\r\n\r\nOutput the facility id that has the highest number of slots booked. For bonus points, try a version without a LIMIT clause. This version will probably look messy! \r\n\r\n\r\nExpected results:\r\n\r\n| facid | Total Slots |\r\n| ----- | ----------- |\r\n| 4 | 1404 |\r\n\r\n\r\nAnswer:\r\n\r\n```sql\r\nselect facid, sum(slots) as \"Total Slots\"\r\n\tfrom cd.bookings\r\n\tgroup by facid\r\norder by sum(slots) desc\r\nLIMIT 1; \r\n```\r\n\r\nLet's start off with what's arguably the simplest way to do this: produce a list of facility IDs and the total number of slots used, order by the total number of slots used, and pick only the top result.\r\n\r\nIt's worth realising, though, that this method has a significant weakness. In the event of a tie, we will still only get one result! To get all the relevant results, we might try using the `MAX` aggregate function, something like below:\r\n\r\n```sql\r\nselect facid, max(totalslots) from (\r\n\tselect facid, sum(slots) as totalslots \r\n\t\tfrom cd.bookings \r\n\t\tgroup by facid\r\n\t) as sub group by facid\r\n```\r\n\r\nThe intent of this query is to get the highest totalslots value and its associated facid(s). Unfortunately, this just won't work! In the event of multiple facids having the same number of slots booked, it would be ambiguous which facid should be paired up with the single (or *scalar*) value coming out of the `MAX` function. 
This means that Postgres will tell you that facid ought to be in a `GROUP BY` section, which won't produce the results we're looking for.\r\n\r\nLet's take a first stab at a working query:\r\n\r\n```sql\r\nselect facid, sum(slots) as totalslots\r\n\tfrom cd.bookings\r\n\tgroup by facid\r\n\thaving sum(slots) = (select max(sum2.totalslots) from\r\n\t\t(select sum(slots) as totalslots\r\n\t\tfrom cd.bookings\r\n\t\tgroup by facid\r\n\t\t) as sum2);\r\n```\r\n\r\nThe query produces a list of facility IDs and number of slots used, and then uses a `HAVING` clause that works out the maximum totalslots value. We're essentially saying: 'produce a list of facids and their number of slots booked, and filter out all the ones that don't have a number of slots booked equal to the maximum.'\r\n\r\nUseful as `HAVING` is, however, our query is pretty ugly. To improve on that, let's introduce another new concept: [Common Table Expressions](http://www.postgresql.org/docs/current/static/queries-with.html) (CTEs). CTEs can be thought of as allowing you to define a database view inline in your query. They're really helpful in situations like this, where you're having to repeat yourself a lot. \r\n\r\nCTEs are declared in the form `WITH CTEName as (SQL-Expression)`. You can see our query redefined to use a CTE below:\r\n\r\n```sql\r\nwith sum as (select facid, sum(slots) as totalslots\r\n\tfrom cd.bookings\r\n\tgroup by facid\r\n)\r\nselect facid, totalslots \r\n\tfrom sum\r\n\twhere totalslots = (select max(totalslots) from sum);\r\n```\r\n\r\nYou can see that we've factored out our repeated selections from cd.bookings into a single CTE, and made the query a lot simpler to read in the process!\r\n\r\nBUT WAIT. There's more. It's also possible to complete this problem using Window Functions. We'll leave these until later, but even better solutions to problems like these are available.\r\n\r\nThat's a lot of information for a single exercise. 
Don't worry too much if you don't get it all right now - we'll reuse these concepts in later exercises.\r\n\r\n\r\n\r\n### List the total slots booked per facility per month, Part 2\r\n\r\nProduce a list of the total number of slots booked per facility per month in the year of 2012. In this version, include output rows containing totals for all months per facility, and a total for all months for all facilities. The output table should consist of facility id, month and slots, sorted by the id and month. When calculating the aggregated values for all months and all facids, return null values in the month and facid columns. \r\n\r\n\r\nExpected results:\r\n\r\n| facid | month | slots |\r\n| ----- | ----- | ----- |\r\n| 0 | 7 | 270 |\r\n| 0 | 8 | 459 |\r\n| 0 | 9 | 591 |\r\n| 0 | | 1320 |\r\n| 1 | 7 | 207 |\r\n| 1 | 8 | 483 |\r\n| 1 | 9 | 588 |\r\n| 1 | | 1278 |\r\n| 2 | 7 | 180 |\r\n| 2 | 8 | 459 |\r\n| 2 | 9 | 570 |\r\n| 2 | | 1209 |\r\n| 3 | 7 | 104 |\r\n| 3 | 8 | 304 |\r\n| 3 | 9 | 422 |\r\n| 3 | | 830 |\r\n| 4 | 7 | 264 |\r\n| 4 | 8 | 492 |\r\n| 4 | 9 | 648 |\r\n| 4 | | 1404 |\r\n| 5 | 7 | 24 |\r\n| 5 | 8 | 82 |\r\n| 5 | 9 | 122 |\r\n| 5 | | 228 |\r\n| 6 | 7 | 164 |\r\n| 6 | 8 | 400 |\r\n| 6 | 9 | 540 |\r\n| 6 | | 1104 |\r\n| 7 | 7 | 156 |\r\n| 7 | 8 | 326 |\r\n| 7 | 9 | 426 |\r\n| 7 | | 908 |\r\n| 8 | 7 | 117 |\r\n| 8 | 8 | 322 |\r\n| 8 | 9 | 471 |\r\n| 8 | | 910 |\r\n| | | 9191 |\r\n\r\n\r\nAnswer:\r\n\r\n```sql\r\nselect facid, extract(month from starttime) as month, sum(slots) as slots\r\n\tfrom cd.bookings\r\n\twhere\r\n\t\tstarttime >= '2012-01-01'\r\n\t\tand starttime < '2013-01-01'\r\n\tgroup by rollup(facid, month)\r\norder by facid, month; \r\n```\r\n\r\nWhen we are doing data analysis, we sometimes want to perform multiple levels of aggregation to allow ourselves to 'zoom' in and out to different depths. In this case, we might be looking at each facility's overall usage, but then want to dive in to see how they've performed on a per-month basis. 
Using the SQL we know so far, it's quite cumbersome to produce a single query that does what we want - we effectively have to resort to concatenating multiple queries using `UNION ALL`:\r\n\r\n```sql\r\nselect facid, extract(month from starttime) as month, sum(slots) as slots\r\n from cd.bookings\r\n where\r\n starttime >= '2012-01-01'\r\n and starttime < '2013-01-01'\r\n group by facid, month\r\nunion all\r\nselect facid, null, sum(slots) as slots\r\n from cd.bookings\r\n where\r\n starttime >= '2012-01-01'\r\n and starttime < '2013-01-01'\r\n group by facid\r\nunion all\r\nselect null, null, sum(slots) as slots\r\n from cd.bookings\r\n where\r\n starttime >= '2012-01-01'\r\n and starttime < '2013-01-01'\r\norder by facid, month;\r\n```\r\n\r\nAs you can see, each subquery performs a different level of aggregation, and we just combine the results. We can clean this up a lot by factoring out commonalities using a CTE:\r\n\r\n```sql\r\nwith bookings as (\r\n\tselect facid, extract(month from starttime) as month, slots\r\n\tfrom cd.bookings\r\n\twhere\r\n\t\tstarttime >= '2012-01-01'\r\n\t\tand starttime < '2013-01-01'\r\n)\r\nselect facid, month, sum(slots) from bookings group by facid, month\r\nunion all\r\nselect facid, null, sum(slots) from bookings group by facid\r\nunion all\r\nselect null, null, sum(slots) from bookings\r\norder by facid, month;\r\n```\r\n\r\nThis version is not excessively hard on the eyes, but it becomes cumbersome as the number of aggregation columns increases. Fortunately, PostgreSQL 9.5 introduced support for the `ROLLUP` operator, which we've used to simplify our accepted answer.\r\n\r\n`ROLLUP` produces a hierarchy of aggregations in the order passed into it: for example, `ROLLUP(facid, month)` outputs aggregations on (facid, month), (facid), and (). If we wanted an aggregation of all facilities for a month (instead of all months for a facility) we'd have to reverse the order, using `ROLLUP(month, facid)`. 
Alternatively, if we instead want all possible combinations of the columns we pass in, we can use `CUBE` rather than `ROLLUP`. This will produce (facid, month), (month), (facid), and ().\r\n\r\n`ROLLUP` and `CUBE` are special cases of `GROUPING SETS`. `GROUPING SETS` allow you to specify the exact aggregation combinations you want: you could, for example, ask for just (facid, month) and (facid), skipping the top-level aggregation.\r\n\r\n\r\n\r\n### List the total hours booked per named facility\r\n\r\nProduce a list of the total number of *hours* booked per facility, remembering that a slot lasts half an hour. The output table should consist of the facility id, name, and hours booked, sorted by facility id. Try formatting the hours to two decimal places. \r\n\r\n\r\nExpected results:\r\n\r\n| facid | name | Total Hours |\r\n| ----- | --------------- | ----------- |\r\n| 0 | Tennis Court 1 | 660.00 |\r\n| 1 | Tennis Court 2 | 639.00 |\r\n| 2 | Badminton Court | 604.50 |\r\n| 3 | Table Tennis | 415.00 |\r\n| 4 | Massage Room 1 | 702.00 |\r\n| 5 | Massage Room 2 | 114.00 |\r\n| 6 | Squash Court | 552.00 |\r\n| 7 | Snooker Table | 454.00 |\r\n| 8 | Pool Table | 455.50 |\r\n\r\n\r\nAnswer:\r\n\r\n```sql\r\nselect facs.facid, facs.name,\r\n\ttrim(to_char(sum(bks.slots)/2.0, '9999999999999999D99')) as \"Total Hours\"\r\n\r\n\tfrom cd.bookings bks\r\n\tinner join cd.facilities facs\r\n\t\ton facs.facid = bks.facid\r\n\tgroup by facs.facid, facs.name\r\norder by facs.facid; \r\n```\r\n\r\nThere are a few little pieces of interest in this question. Firstly, you can see that our aggregation works just fine when we join to another table on a 1:1 basis. Also note that we group by both `facs.facid` and `facs.name`. This might seem odd: after all, since `facid` is the primary key of the facilities table, each `facid` has exactly one name, and grouping by both fields is the same as grouping by facid alone. 
In fact, you'll find that if you remove `facs.name` from the `GROUP BY` clause, the query works just fine: Postgres works out that this 1:1 mapping exists, and doesn't insist that we group by both columns.\r\n\r\nUnfortunately, depending on which database system we use, validation might not be so smart, and may not realise that the mapping is strictly 1:1. That being the case, if there were multiple `names` for each `facid` and we hadn't grouped by `name`, the DBMS would have to choose between multiple (equally valid) choices for the `name`. Since this is invalid, the database system will insist that we group by both fields. In general, I recommend grouping by all columns you don't have an aggregate function on: this will ensure better cross-platform compatibility.\r\n\r\nNext up is the division. Those of you familiar with MySQL may be aware that integer divisions are automatically cast to floats. Postgres is a little more traditional in this respect, and expects you to tell it if you want a floating point division. You can do that easily in this case by dividing by 2.0 rather than 2.\r\n\r\nFinally, let's take a look at formatting. The `TO_CHAR` function converts values to character strings. It takes a formatting string, which we specify as (up to) lots of numbers before the decimal place, decimal place, and two numbers after the decimal place. The output of this function can be prepended with a space, which is why we include the outer `TRIM` function.\r\n\r\n\r\n\r\n### List each member's first booking after September 1st 2012\r\n\r\nProduce a list of each member name, id, and their first booking after September 1st 2012. Order by member ID. 
\r\n\r\n\r\nExpected results:\r\n\r\n| surname | firstname | memid | starttime |\r\n| ----------------- | --------- | ----- | ------------------- |\r\n| GUEST | GUEST | 0 | 2012-09-01 08:00:00 |\r\n| Smith | Darren | 1 | 2012-09-01 09:00:00 |\r\n| Smith | Tracy | 2 | 2012-09-01 11:30:00 |\r\n| Rownam | Tim | 3 | 2012-09-01 16:00:00 |\r\n| Joplette | Janice | 4 | 2012-09-01 15:00:00 |\r\n| Butters | Gerald | 5 | 2012-09-02 12:30:00 |\r\n| Tracy | Burton | 6 | 2012-09-01 15:00:00 |\r\n| Dare | Nancy | 7 | 2012-09-01 12:30:00 |\r\n| Boothe | Tim | 8 | 2012-09-01 08:30:00 |\r\n| Stibbons | Ponder | 9 | 2012-09-01 11:00:00 |\r\n| Owen | Charles | 10 | 2012-09-01 11:00:00 |\r\n| Jones | David | 11 | 2012-09-01 09:30:00 |\r\n| Baker | Anne | 12 | 2012-09-01 14:30:00 |\r\n| Farrell | Jemima | 13 | 2012-09-01 09:30:00 |\r\n| Smith | Jack | 14 | 2012-09-01 11:00:00 |\r\n| Bader | Florence | 15 | 2012-09-01 10:30:00 |\r\n| Baker | Timothy | 16 | 2012-09-01 15:00:00 |\r\n| Pinker | David | 17 | 2012-09-01 08:30:00 |\r\n| Genting | Matthew | 20 | 2012-09-01 18:00:00 |\r\n| Mackenzie | Anna | 21 | 2012-09-01 08:30:00 |\r\n| Coplin | Joan | 22 | 2012-09-02 11:30:00 |\r\n| Sarwin | Ramnaresh | 24 | 2012-09-04 11:00:00 |\r\n| Jones | Douglas | 26 | 2012-09-08 13:00:00 |\r\n| Rumney | Henrietta | 27 | 2012-09-16 13:30:00 |\r\n| Farrell | David | 28 | 2012-09-18 09:00:00 |\r\n| Worthington-Smyth | Henry | 29 | 2012-09-19 09:30:00 |\r\n| Purview | Millicent | 30 | 2012-09-19 11:30:00 |\r\n| Tupperware | Hyacinth | 33 | 2012-09-20 08:00:00 |\r\n| Hunt | John | 35 | 2012-09-23 14:00:00 |\r\n| Crumpet | Erica | 36 | 2012-09-27 11:30:00 |\r\n\r\n\r\nAnswer:\r\n\r\n```sql\r\nselect mems.surname, mems.firstname, mems.memid, min(bks.starttime) as starttime\r\n\tfrom cd.bookings bks\r\n\tinner join cd.members mems on\r\n\t\tmems.memid = bks.memid\r\n\twhere starttime >= '2012-09-01'\r\n\tgroup by mems.surname, mems.firstname, mems.memid\r\norder by mems.memid; \r\n```\r\n\r\nThis answer 
demonstrates the use of aggregate functions on dates. `MIN` works exactly as you'd expect, pulling out the earliest date in the result set. To make this work, we need to ensure that the result set only contains dates from September onwards. We do this using the `WHERE` clause.\r\n\r\nYou might typically use a query like this to find a customer's next booking. You can do this by replacing the date '2012-09-01' with the function `now()`.\r\n\r\n\r\n\r\n### Produce a list of member names, with each row containing the total member count \r\n\r\nProduce a list of member names, with each row containing the total member count. Order by join date. \r\n\r\n\r\nExpected results:\r\n\r\n| count | firstname | surname |\r\n| ----- | --------- | ----------------- |\r\n| 31 | GUEST | GUEST |\r\n| 31 | Darren | Smith |\r\n| 31 | Tracy | Smith |\r\n| 31 | Tim | Rownam |\r\n| 31 | Janice | Joplette |\r\n| 31 | Gerald | Butters |\r\n| 31 | Burton | Tracy |\r\n| 31 | Nancy | Dare |\r\n| 31 | Tim | Boothe |\r\n| 31 | Ponder | Stibbons |\r\n| 31 | Charles | Owen |\r\n| 31 | David | Jones |\r\n| 31 | Anne | Baker |\r\n| 31 | Jemima | Farrell |\r\n| 31 | Jack | Smith |\r\n| 31 | Florence | Bader |\r\n| 31 | Timothy | Baker |\r\n| 31 | David | Pinker |\r\n| 31 | Matthew | Genting |\r\n| 31 | Anna | Mackenzie |\r\n| 31 | Joan | Coplin |\r\n| 31 | Ramnaresh | Sarwin |\r\n| 31 | Douglas | Jones |\r\n| 31 | Henrietta | Rumney |\r\n| 31 | David | Farrell |\r\n| 31 | Henry | Worthington-Smyth |\r\n| 31 | Millicent | Purview |\r\n| 31 | Hyacinth | Tupperware |\r\n| 31 | John | Hunt |\r\n| 31 | Erica | Crumpet |\r\n| 31 | Darren | Smith |\r\n\r\n\r\nAnswer:\r\n\r\n```sql\r\nselect count(*) over(), firstname, surname\r\n\tfrom cd.members\r\norder by joindate \r\n```\r\n\r\nUsing the knowledge we've built up so far, the most obvious answer to this is below. 
We use a subquery because otherwise SQL will require us to group by firstname and surname, producing a different result to what we're looking for.\r\n\r\n```sql\r\nselect (select count(*) from cd.members) as count, firstname, surname\r\n\tfrom cd.members\r\norder by joindate\r\n```\r\n\r\nThere's nothing at all wrong with this answer, but we've chosen a different approach to introduce a new concept called window functions. Window functions provide enormously powerful capabilities, in a form often more convenient than the standard aggregation functions. While this exercise is only a toy, we'll be working on more complicated examples in the near future.\r\n\r\nWindow functions operate on the result set of your (sub-)query, after the `WHERE` clause and all standard aggregation. They operate on a *window* of data. By default this is unrestricted: the entire result set, but it can be restricted to provide more useful results. For example, suppose instead of wanting the count of all members, we want the count of all members who joined in the same month as that member:\r\n\r\n```sql\r\nselect count(*) over(partition by date_trunc('month',joindate)),\r\n\tfirstname, surname\r\n\tfrom cd.members\r\norder by joindate\r\n```\r\n\r\nIn this example, we partition the data by month. For each row the window function operates over, the window is any rows that have a joindate in the same month. The window function thus produces a count of the number of members who joined in that month.\r\n\r\nYou can go further. Imagine if, instead of the total number of members who joined that month, you want to know what number joinee they were that month. You can do this by adding in an `ORDER BY` to the window function:\r\n\r\n```sql\r\nselect count(*) over(partition by date_trunc('month',joindate) order by joindate),\r\n\tfirstname, surname\r\n\tfrom cd.members\r\norder by joindate\r\n```\r\n\r\nThe `ORDER BY` changes the window again. 
Instead of the window for each row being the entire partition, the window goes from the start of the partition to the current row, and not beyond. Thus, for the first member who joins in a given month, the count is 1. For the second, the count is 2, and so on.\r\n\r\nOne final thing that's worth mentioning about window functions: you can have multiple unrelated ones in the same query. Try out the query below for an example - you'll see the numbers for the members going in opposite directions! This flexibility can lead to more concise, readable, and maintainable queries.\r\n\r\n```sql\r\nselect count(*) over(partition by date_trunc('month',joindate) order by joindate asc), \r\n\tcount(*) over(partition by date_trunc('month',joindate) order by joindate desc), \r\n\tfirstname, surname\r\n\tfrom cd.members\r\norder by joindate\r\n```\r\n\r\nWindow functions are extraordinarily powerful, and they will change the way you write and think about SQL. Make good use of them!\r\n\r\n\r\n\r\n### Produce a numbered list of members\r\n\r\nProduce a monotonically increasing numbered list of members, ordered by their date of joining. Remember that member IDs are not guaranteed to be sequential. 
\r\n\r\n\r\nExpected results:\r\n\r\n| row_number | firstname | surname |\r\n| ---------- | --------- | ----------------- |\r\n| 1 | GUEST | GUEST |\r\n| 2 | Darren | Smith |\r\n| 3 | Tracy | Smith |\r\n| 4 | Tim | Rownam |\r\n| 5 | Janice | Joplette |\r\n| 6 | Gerald | Butters |\r\n| 7 | Burton | Tracy |\r\n| 8 | Nancy | Dare |\r\n| 9 | Tim | Boothe |\r\n| 10 | Ponder | Stibbons |\r\n| 11 | Charles | Owen |\r\n| 12 | David | Jones |\r\n| 13 | Anne | Baker |\r\n| 14 | Jemima | Farrell |\r\n| 15 | Jack | Smith |\r\n| 16 | Florence | Bader |\r\n| 17 | Timothy | Baker |\r\n| 18 | David | Pinker |\r\n| 19 | Matthew | Genting |\r\n| 20 | Anna | Mackenzie |\r\n| 21 | Joan | Coplin |\r\n| 22 | Ramnaresh | Sarwin |\r\n| 23 | Douglas | Jones |\r\n| 24 | Henrietta | Rumney |\r\n| 25 | David | Farrell |\r\n| 26 | Henry | Worthington-Smyth |\r\n| 27 | Millicent | Purview |\r\n| 28 | Hyacinth | Tupperware |\r\n| 29 | John | Hunt |\r\n| 30 | Erica | Crumpet |\r\n| 31 | Darren | Smith |\r\n\r\n\r\nAnswer:\r\n\r\n```sql\r\nselect row_number() over(order by joindate), firstname, surname\r\n\tfrom cd.members\r\norder by joindate \r\n```\r\n\r\nThis exercise is a simple bit of window function practise! You could just as easily use `count(*) over(order by joindate)` here (the two only differ when several members share the same join date), so don't worry if you used that instead.\r\n\r\nIn this query, we don't define a partition, meaning that the partition is the entire dataset. Since we define an order for the window function, for any given row the window is: start of the dataset -> current row.\r\n\r\n\r\n\r\n### Output the facility id that has the highest number of slots booked, again \r\n\r\nOutput the facility id that has the highest number of slots booked. Ensure that in the event of a tie, all tying results get output. 
\r\n\r\n\r\nExpected results:\r\n\r\n| facid | total |\r\n| ----- | ----- |\r\n| 4 | 1404 |\r\n\r\n\r\nAnswer:\r\n\r\n```sql\r\nselect facid, total from (\r\n\tselect facid, sum(slots) total, rank() over (order by sum(slots) desc) rank\r\n \tfrom cd.bookings\r\n\t\tgroup by facid\r\n\t) as ranked\r\n\twhere rank = 1 \r\n```\r\n\r\nYou may recall that this is a problem we've already solved in an earlier exercise. We came up with an answer something like below, which we then cut down using CTEs:\r\n\r\n```sql\r\nselect facid, sum(slots) as totalslots\r\n\tfrom cd.bookings\r\n\tgroup by facid\r\n\thaving sum(slots) = (select max(sum2.totalslots) from\r\n\t\t(select sum(slots) as totalslots\r\n\t\tfrom cd.bookings\r\n\t\tgroup by facid\r\n\t\t) as sum2);\r\n```\r\n\r\nOnce we've cleaned it up, this solution is perfectly adequate. Explaining how the query works makes it seem a little odd, though - 'find the number of slots booked by the best facility. Calculate the total slots booked for each facility, and return only the rows where the slots booked are the same as for the best'. Wouldn't it be nicer to be able to say 'calculate the number of slots booked for each facility, rank them, and pick out any at rank 1'?\r\n\r\nFortunately, window functions allow us to do this - although it's fair to say that doing so is not trivial to the untrained eye. The first key piece of information is the existence of the `RANK` function. This ranks values based on the `ORDER BY` that is passed to it. If there's a tie for (say) second place, the next gets ranked at position 4. So, what we need to do is get the number of slots for each facility, rank them, and pick off the ones at the top rank. 
A first pass at this might look something like the below:\r\n\r\n```sql\r\nselect facid, total from (\r\n\tselect facid, total, rank() over (order by total desc) rank from (\r\n\t\tselect facid, sum(slots) total\r\n\t\t\tfrom cd.bookings\r\n\t\t\tgroup by facid\r\n\t\t) as sumslots\r\n\t) as ranked\r\nwhere rank = 1\r\n```\r\n\r\nThe inner query calculates the total slots booked, the middle one ranks them, and the outer one creams off the top ranked. We can actually tidy this up a little: recall that window functions get applied pretty late in the `SELECT` statement, after aggregation. That being the case, we can move the aggregation into the `ORDER BY` part of the window function, as shown in the approved answer.\r\n\r\nWhile the window function approach isn't massively simpler in terms of lines of code, it arguably makes more semantic sense.\r\n\r\n\r\n\r\n### Rank members by (rounded) hours used\r\n\r\nProduce a list of members, along with the number of hours they've booked in facilities, rounded to the nearest ten hours. Rank them by this rounded figure, producing output of first name, surname, rounded hours, rank. Sort by rank, surname, and first name. 
\r\n\r\n\r\nExpected results:\r\n\r\n| firstname | surname | hours | rank |\r\n| --------- | ----------------- | ----- | ---- |\r\n| GUEST | GUEST | 1200 | 1 |\r\n| Darren | Smith | 340 | 2 |\r\n| Tim | Rownam | 330 | 3 |\r\n| Tim | Boothe | 220 | 4 |\r\n| Tracy | Smith | 220 | 4 |\r\n| Gerald | Butters | 210 | 6 |\r\n| Burton | Tracy | 180 | 7 |\r\n| Charles | Owen | 170 | 8 |\r\n| Janice | Joplette | 160 | 9 |\r\n| Anne | Baker | 150 | 10 |\r\n| Timothy | Baker | 150 | 10 |\r\n| David | Jones | 150 | 10 |\r\n| Nancy | Dare | 130 | 13 |\r\n| Florence | Bader | 120 | 14 |\r\n| Anna | Mackenzie | 120 | 14 |\r\n| Ponder | Stibbons | 120 | 14 |\r\n| Jack | Smith | 110 | 17 |\r\n| Jemima | Farrell | 90 | 18 |\r\n| David | Pinker | 80 | 19 |\r\n| Ramnaresh | Sarwin | 80 | 19 |\r\n| Matthew | Genting | 70 | 21 |\r\n| Joan | Coplin | 50 | 22 |\r\n| David | Farrell | 30 | 23 |\r\n| Henry | Worthington-Smyth | 30 | 23 |\r\n| John | Hunt | 20 | 25 |\r\n| Douglas | Jones | 20 | 25 |\r\n| Millicent | Purview | 20 | 25 |\r\n| Henrietta | Rumney | 20 | 25 |\r\n| Erica | Crumpet | 10 | 29 |\r\n| Hyacinth | Tupperware | 10 | 29 |\r\n\r\n\r\nAnswer:\r\n\r\n```sql\r\nselect firstname, surname,\r\n\t((sum(bks.slots)+10)/20)*10 as hours,\r\n\trank() over (order by ((sum(bks.slots)+10)/20)*10 desc) as rank\r\n\r\n\tfrom cd.bookings bks\r\n\tinner join cd.members mems\r\n\t\ton bks.memid = mems.memid\r\n\tgroup by mems.memid\r\norder by rank, surname, firstname; \r\n```\r\n\r\nThis answer isn't a great stretch over our previous exercise, although it does illustrate the function of `RANK` better. You can see that some of the clubgoers have an equal rounded number of hours booked in, and their rank is the same. If position 2 is shared between two members, the next one along gets position 4. There's a different function, `DENSE_RANK`, that would assign that member position 3 instead.\r\n\r\nIt's worth noting the technique we use to do rounding here. 
Adding 5, dividing by 10, and multiplying by 10 has the effect (thanks to integer arithmetic cutting off fractions) of rounding a number to the nearest 10. In our case, because slots are half an hour, we need to add 10, divide by 20, and multiply by 10. One could certainly make the argument that we should do the slots -> hours conversion independently of the rounding, which would increase clarity.\r\n\r\nTalking of clarity, this rounding malarky is starting to introduce a noticeable amount of code repetition. At this point it's a judgement call, but you may wish to factor it out using a subquery as below:\r\n\r\n```sql\r\nselect firstname, surname, hours, rank() over (order by hours desc) from\r\n\t(select firstname, surname,\r\n\t\t((sum(bks.slots)+10)/20)*10 as hours\r\n\r\n\t\tfrom cd.bookings bks\r\n\t\tinner join cd.members mems\r\n\t\t\ton bks.memid = mems.memid\r\n\t\tgroup by mems.memid\r\n\t) as subq\r\norder by rank, surname, firstname;\r\n```\r\n\r\n\r\n\r\n### Find the top three revenue generating facilities\r\n\r\nProduce a list of the top three revenue generating facilities (including ties). Output facility name and rank, sorted by rank and facility name. \r\n\r\n\r\nExpected results:\r\n\r\n| name | rank |\r\n| -------------- | ---- |\r\n| Massage Room 1 | 1 |\r\n| Massage Room 2 | 2 |\r\n| Tennis Court 2 | 3 |\r\n\r\n\r\nAnswer:\r\n\r\n```sql\r\nselect name, rank from (\r\n\tselect facs.name as name, rank() over (order by sum(case\r\n\t\t\t\twhen memid = 0 then slots * facs.guestcost\r\n\t\t\t\telse slots * membercost\r\n\t\t\tend) desc) as rank\r\n\t\tfrom cd.bookings bks\r\n\t\tinner join cd.facilities facs\r\n\t\t\ton bks.facid = facs.facid\r\n\t\tgroup by facs.name\r\n\t) as subq\r\n\twhere rank <= 3\r\norder by rank; \r\n```\r\n\r\nThis question doesn't introduce any new concepts, and is just intended to give you the opportunity to practise what you already know. 
We use the `CASE` statement to calculate the revenue for each slot, and aggregate that on a per-facility basis using `SUM`. We then use the `RANK` window function to produce a ranking, wrap it all up in a subquery, and extract everything with a rank less than or equal to 3.\r\n\r\n\r\n\r\n### Classify facilities by value\r\n\r\nClassify facilities into equally sized groups of high, average, and low based on their revenue. Order by classification and facility name. \r\n\r\n\r\nExpected results:\r\n\r\n| name | revenue |\r\n| --------------- | ------- |\r\n| Massage Room 1 | high |\r\n| Massage Room 2 | high |\r\n| Tennis Court 2 | high |\r\n| Badminton Court | average |\r\n| Squash Court | average |\r\n| Tennis Court 1 | average |\r\n| Pool Table | low |\r\n| Snooker Table | low |\r\n| Table Tennis | low |\r\n\r\n\r\nAnswer:\r\n\r\n```sql\r\nselect name, case when class=1 then 'high'\r\n\t\twhen class=2 then 'average'\r\n\t\telse 'low'\r\n\t\tend revenue\r\n\tfrom (\r\n\t\tselect facs.name as name, ntile(3) over (order by sum(case\r\n\t\t\t\twhen memid = 0 then slots * facs.guestcost\r\n\t\t\t\telse slots * membercost\r\n\t\t\tend) desc) as class\r\n\t\tfrom cd.bookings bks\r\n\t\tinner join cd.facilities facs\r\n\t\t\ton bks.facid = facs.facid\r\n\t\tgroup by facs.name\r\n\t) as subq\r\norder by class, name; \r\n```\r\n\r\nThis exercise should mostly use familiar concepts, although we do introduce the `NTILE` window function. `NTILE` groups values into a passed-in number of groups, as evenly as possible. It outputs a number from 1->number of groups. We then use a `CASE` statement to turn that number into a label!\r\n\r\n\r\n\r\n### Calculate the payback time for each facility\r\n\r\nBased on the 3 complete months of data so far, calculate the amount of time each facility will take to repay its cost of ownership. Remember to take into account ongoing monthly maintenance. Output facility name and payback time in months, order by facility name. 
Don't worry about differences in month lengths, we're only looking for a rough value here! \r\n\r\n\r\nExpected results:\r\n\r\n| name | months |\r\n| --------------- | ---------------------- |\r\n| Badminton Court | 6.8317677198975235 |\r\n| Massage Room 1 | 0.18885741265344664778 |\r\n| Massage Room 2 | 1.7621145374449339 |\r\n| Pool Table | 5.3333333333333333 |\r\n| Snooker Table | 6.9230769230769231 |\r\n| Squash Court | 1.1339582703356516 |\r\n| Table Tennis | 6.4000000000000000 |\r\n| Tennis Court 1 | 2.2624434389140271 |\r\n| Tennis Court 2 | 1.7505470459518600 |\r\n\r\n\r\nAnswer:\r\n\r\n```sql\r\nselect \tfacs.name as name,\r\n\tfacs.initialoutlay/((sum(case\r\n\t\t\twhen memid = 0 then slots * facs.guestcost\r\n\t\t\telse slots * membercost\r\n\t\tend)/3) - facs.monthlymaintenance) as months\r\n\tfrom cd.bookings bks\r\n\tinner join cd.facilities facs\r\n\t\ton bks.facid = facs.facid\r\n\tgroup by facs.facid\r\norder by name; \r\n```\r\n\r\nIn contrast to all our recent exercises, there's no need to use window functions to solve this problem: it's just a bit of maths involving monthly revenue, initial outlay, and monthly maintenance. Again, for production code you might want to clarify what's going on a little here using a subquery (although since we've hard-coded the number of months, putting this into production is unlikely!). 
A tidied-up version might look like:\r\n\r\n```sql\r\nselect \tname, \r\n\tinitialoutlay / (monthlyrevenue - monthlymaintenance) as repaytime \r\n\tfrom \r\n\t\t(select facs.name as name, \r\n\t\t\tfacs.initialoutlay as initialoutlay,\r\n\t\t\tfacs.monthlymaintenance as monthlymaintenance,\r\n\t\t\tsum(case\r\n\t\t\t\twhen memid = 0 then slots * facs.guestcost\r\n\t\t\t\telse slots * membercost\r\n\t\t\tend)/3 as monthlyrevenue\r\n\t\tfrom cd.bookings bks\r\n\t\tinner join cd.facilities facs\r\n\t\t\ton bks.facid = facs.facid\r\n\t\tgroup by facs.facid\r\n\t) as subq\r\norder by name;\r\n```\r\n\r\nBut, I hear you ask, what would an automatic version of this look like? One that didn't need to have a hard-coded number of months in it? That's a little more complicated, and involves some date arithmetic. I've factored that out into a CTE to make it a little more clear.\r\n\r\n```sql\r\nwith monthdata as (\r\n\tselect \tmincompletemonth,\r\n\t\tmaxcompletemonth,\r\n\t\t(extract(year from maxcompletemonth)*12) +\r\n\t\t\textract(month from maxcompletemonth) -\r\n\t\t\t(extract(year from mincompletemonth)*12) -\r\n\t\t\textract(month from mincompletemonth) as nummonths \r\n\tfrom (\r\n\t\tselect \tdate_trunc('month', \r\n\t\t\t\t(select max(starttime) from cd.bookings)) as maxcompletemonth,\r\n\t\t\tdate_trunc('month', \r\n\t\t\t\t(select min(starttime) from cd.bookings)) as mincompletemonth\r\n\t) as subq\r\n)\r\nselect \tname, \r\n\tinitialoutlay / (monthlyrevenue - monthlymaintenance) as repaytime \r\n\t\r\n\tfrom\r\n\t\t(select facs.name as name,\r\n\t\t\tfacs.initialoutlay as initialoutlay,\r\n\t\t\tfacs.monthlymaintenance as monthlymaintenance,\r\n\t\t\tsum(case\r\n\t\t\t\twhen memid = 0 then slots * facs.guestcost\r\n\t\t\t\telse slots * membercost\r\n\t\t\tend)/(select nummonths from monthdata) as monthlyrevenue\r\n\t\t\t\r\n\t\t\tfrom cd.bookings bks\r\n\t\t\tinner join cd.facilities facs\r\n\t\t\t\ton bks.facid = facs.facid\r\n\t\t\twhere bks.starttime < 
(select maxcompletemonth from monthdata)\r\n\t\t\tgroup by facs.facid\r\n\t\t) as subq\r\norder by name;\r\n```\r\n\r\nThis code restricts the data that goes in to complete months. It does this by selecting the maximum date, rounding down to the month, and stripping out all dates larger than that. Even this code is not completely-complete. It doesn't handle the case of a facility making a loss. Fixing that is not too hard, and is left as (another) exercise for the reader!\r\n\r\n\r\n\r\n### Calculate a rolling average of total revenue\r\n\r\nFor each day in August 2012, calculate a rolling average of total revenue over the previous 15 days. Output should contain date and revenue columns, sorted by the date. Remember to account for the possibility of a day having zero revenue. This one's a bit tough, so don't be afraid to check out the hint! \r\n\r\n\r\nExpected results:\r\n\r\n| date | revenue |\r\n| ---------- | --------------------- |\r\n| 2012-08-01 | 1126.8333333333333333 |\r\n| 2012-08-02 | 1153.0000000000000000 |\r\n| 2012-08-03 | 1162.9000000000000000 |\r\n| 2012-08-04 | 1177.3666666666666667 |\r\n| 2012-08-05 | 1160.9333333333333333 |\r\n| 2012-08-06 | 1185.4000000000000000 |\r\n| 2012-08-07 | 1182.8666666666666667 |\r\n| 2012-08-08 | 1172.6000000000000000 |\r\n| 2012-08-09 | 1152.4666666666666667 |\r\n| 2012-08-10 | 1175.0333333333333333 |\r\n| 2012-08-11 | 1176.6333333333333333 |\r\n| 2012-08-12 | 1195.6666666666666667 |\r\n| 2012-08-13 | 1218.0000000000000000 |\r\n| 2012-08-14 | 1247.4666666666666667 |\r\n| 2012-08-15 | 1274.1000000000000000 |\r\n| 2012-08-16 | 1281.2333333333333333 |\r\n| 2012-08-17 | 1324.4666666666666667 |\r\n| 2012-08-18 | 1373.7333333333333333 |\r\n| 2012-08-19 | 1406.0666666666666667 |\r\n| 2012-08-20 | 1427.0666666666666667 |\r\n| 2012-08-21 | 1450.3333333333333333 |\r\n| 2012-08-22 | 1539.7000000000000000 |\r\n| 2012-08-23 | 1567.3000000000000000 |\r\n| 2012-08-24 | 1592.3333333333333333 |\r\n| 2012-08-25 | 1615.0333333333333333 
|\r\n| 2012-08-26 | 1631.2000000000000000 |\r\n| 2012-08-27 | 1659.4333333333333333 |\r\n| 2012-08-28 | 1687.0000000000000000 |\r\n| 2012-08-29 | 1684.6333333333333333 |\r\n| 2012-08-30 | 1657.9333333333333333 |\r\n| 2012-08-31 | 1703.4000000000000000 |\r\n\r\n\r\nAnswer:\r\n\r\n```sql\r\nselect \tdategen.date,\r\n\t(\r\n\t\t-- correlated subquery that, for each day fed into it,\r\n\t\t-- finds the average revenue for the last 15 days\r\n\t\tselect sum(case\r\n\t\t\twhen memid = 0 then slots * facs.guestcost\r\n\t\t\telse slots * membercost\r\n\t\tend) as rev\r\n\r\n\t\tfrom cd.bookings bks\r\n\t\tinner join cd.facilities facs\r\n\t\t\ton bks.facid = facs.facid\r\n\t\twhere bks.starttime > dategen.date - interval '14 days'\r\n\t\t\tand bks.starttime < dategen.date + interval '1 day'\r\n\t)/15 as revenue\r\n\tfrom\r\n\t(\r\n\t\t-- generates a list of days in august\r\n\t\tselect \tcast(generate_series(timestamp '2012-08-01',\r\n\t\t\t'2012-08-31','1 day') as date) as date\r\n\t) as dategen\r\norder by dategen.date; \r\n```\r\n\r\nThere's at least two equally good solutions to this question. I've put the simplest to write as the answer, but there's also a more flexible solution that uses window functions.\r\n\r\nLet's look at the selected answer first. When I read SQL queries, I tend to read the `SELECT` part of the statement last - the `FROM` and `WHERE` parts tend to be more interesting. So, what do we have in our `FROM`? A call to the `GENERATE_SERIES` function. This does pretty much what it says on the tin - generates a series of values. You can specify a start value, a stop value, and an increment. It works for integer types and dates - although, as you can see, we need to be explicit about what types are going into and out of the function. Try removing the casts, and seeing the result!\r\n\r\nSo, we've generated a timestamp for each day in August. Now, for each day, we need to generate our average. We can do this using a *correlated subquery*. 
If you remember, a correlated subquery is a subquery that uses values from the outer query. This means that it gets executed once for each result row in the outer query. This is in contrast to an uncorrelated subquery, which only has to be executed once.\r\n\r\nIf we look at our correlated subquery, we can see that it's correlated on the dategen.date field. It produces a sum of revenue for this day and the 14 days prior to it, and then divides that sum by 15. This produces the output we're looking for!\r\n\r\nI mentioned that there's a window function-based solution for this problem as well - you can see it below. The approach we use for this is generating a list of revenue for each day, and then using window function aggregation over that list. The nice thing about this method is that once you have the per-day revenue, you can produce a wide range of results quite easily - you might, for example, want rolling averages for the previous month, 15 days, and 5 days. This is easy to do using this method, and rather harder using conventional aggregation.\r\n\r\n```sql\r\nselect date, avgrev from (\r\n\t-- AVG over this row and the 14 rows before it.\r\n\tselect \tdategen.date as date,\r\n\t\tavg(revdata.rev) over(order by dategen.date rows 14 preceding) as avgrev\r\n\tfrom\r\n\t\t-- generate a list of days. This ensures that a row gets generated\r\n\t\t-- even if the day has 0 revenue. 
Note that we generate days before\r\n\t\t-- the start of August - this is because our window function needs\r\n\t\t-- to know the revenue for those days for its calculations.\r\n\t\t(select\r\n\t\t\tcast(generate_series(timestamp '2012-07-10', '2012-08-31','1 day') as date) as date\r\n\t\t) as dategen\r\n\t\tleft outer join\r\n\t\t\t-- left join to a table of per-day revenue\r\n\t\t\t(select cast(bks.starttime as date) as date,\r\n\t\t\t\tsum(case\r\n\t\t\t\t\twhen memid = 0 then slots * facs.guestcost\r\n\t\t\t\t\telse slots * membercost\r\n\t\t\t\tend) as rev\r\n\r\n\t\t\t\tfrom cd.bookings bks\r\n\t\t\t\tinner join cd.facilities facs\r\n\t\t\t\t\ton bks.facid = facs.facid\r\n\t\t\t\tgroup by cast(bks.starttime as date)\r\n\t\t\t) as revdata\r\n\t\t\ton dategen.date = revdata.date\r\n\t) as subq\r\n\twhere date >= '2012-08-01'\r\norder by date;\r\n```\r\n\r\nYou'll note that we've been wanting to work out daily revenue quite frequently. Rather than inserting that calculation into all our queries, which is rather messy (and will cause us a big headache if we ever change our schema), we probably want to store that information somewhere. Your first thought might be to calculate information and store it somewhere for later use. This is a common tactic for large data warehouses, but it can cause us some problems - if we ever go back and edit our data, we need to remember to recalculate. For non-enormous-scale data like we're looking at here, we can just create a view instead. A view is essentially a stored query that looks exactly like a table. Under the covers, the DBMS just substitutes in the relevant portion of the view definition when you select data from it. 
They're very easy to create, as you can see below:\r\n\r\n```sql\r\ncreate or replace view cd.dailyrevenue as\r\n\tselect \tcast(bks.starttime as date) as date,\r\n\t\tsum(case\r\n\t\t\twhen memid = 0 then slots * facs.guestcost\r\n\t\t\telse slots * membercost\r\n\t\tend) as rev\r\n\r\n\t\tfrom cd.bookings bks\r\n\t\tinner join cd.facilities facs\r\n\t\t\ton bks.facid = facs.facid\r\n\t\tgroup by cast(bks.starttime as date);\r\n```\r\n\r\nYou can see that this makes our query an awful lot simpler!\r\n\r\n```sql\r\nselect date, avgrev from (\r\n\tselect dategen.date as date,\r\n\t\tavg(revdata.rev) over(order by dategen.date rows 14 preceding) as avgrev\r\n\tfrom\t\t\r\n\t\t(select\r\n\t\t\tcast(generate_series(timestamp '2012-07-10', '2012-08-31','1 day') as date) as date\r\n\t\t) as dategen\r\n\t\tleft outer join\r\n\t\t\tcd.dailyrevenue as revdata on dategen.date = revdata.date\r\n\t\t) as subq\r\n\twhere date >= '2012-08-01'\r\norder by date;\r\n```\r\n\r\nAs well as storing frequently-used query fragments, views can be used for a variety of purposes, including restricting access to certain columns of a table.\r\n\r\n***\r\n\r\n## Working with Timestamps \r\n\r\nDates/Times in SQL are a complex topic, deserving of a category of their own. They're also fantastically powerful, making it easier to work with variable-length concepts like 'months' than many programming languages.\r\n\r\nBefore getting started on this category, it's probably worth taking a look over the PostgreSQL [docs page](http://www.postgresql.org/docs/current/static/functions-datetime.html) on date/time functions. You might also want to complete the aggregate functions category, since we'll use some of those capabilities in this section.\r\n\r\n\r\n\r\n### Produce a timestamp for 1 a.m. on the 31st of August 2012\r\n\r\nProduce a timestamp for 1 a.m. 
on the 31st of August 2012.\r\n\r\n\r\nExpected results:\r\n\r\n| timestamp |\r\n| ------------------- |\r\n| 2012-08-31 01:00:00 |\r\n\r\n\r\nAnswer:\r\n\r\n```sql\r\nselect timestamp '2012-08-31 01:00:00'; \r\n```\r\n\r\nHere's a pretty easy question to start off with! SQL has a bunch of different date and time types, which you can peruse at your leisure over at the excellent [Postgres documentation](http://www.postgresql.org/docs/current/static/datatype-datetime.html). These basically allow you to store dates, times, or timestamps (date+time).\r\n\r\nThe approved answer is the best way to create a timestamp under normal circumstances. You can also use casts to change a correctly formatted string into a timestamp, for example:\r\n\r\n```sql\r\nselect '2012-08-31 01:00:00'::timestamp;\r\nselect cast('2012-08-31 01:00:00' as timestamp);\r\n```\r\n\r\nThe former approach is a Postgres extension, while the latter is SQL-standard. You'll note that in many of our earlier questions, we've used bare strings without specifying a data type. This works because when Postgres is working with a value coming out of a timestamp column of a table (say), it knows to cast our strings to timestamps.\r\n\r\nTimestamps can be stored with or without time zone information. We've chosen not to here, but if you like you could format the timestamp like \"2012-08-31 01:00:00 +00:00\", assuming UTC. Note that timestamp with time zone is a different type to timestamp - when you're declaring it, you should use `TIMESTAMP WITH TIME ZONE '2012-08-31 01:00:00 +00:00'`.\r\n\r\nFinally, have a bit of a play around with some of the different date/time serialisations described in the Postgres docs. 
You'll find that Postgres is extremely flexible with the formats it accepts, although my recommendation to you would be to use the standard serialisation we've used here - you'll find it unambiguous and easy to port to other DBs.\r\n\r\n\r\n\r\n### Subtract timestamps from each other\r\n\r\nFind the result of subtracting the timestamp '2012-07-30 01:00:00' from the timestamp '2012-08-31 01:00:00' \r\n\r\n\r\nExpected results:\r\n\r\n| interval |\r\n| -------- |\r\n| 32 days |\r\n\r\n\r\nAnswer:\r\n\r\n```sql\r\nselect timestamp '2012-08-31 01:00:00' - timestamp '2012-07-30 01:00:00' as interval;\r\n```\r\n\r\nSubtracting timestamps produces an `INTERVAL` data type. `INTERVAL`s are a special data type for representing the difference between two `TIMESTAMP` types. When subtracting timestamps, Postgres will typically give an interval in terms of days, hours, minutes, seconds, without venturing into months. This generally makes life easier, since months are of variable lengths.\r\n\r\nOne of the useful things about intervals, though, is the fact that they *can* encode months. Let's imagine that I want to schedule something to occur in exactly one month's time, regardless of the length of my month. To do this, I could use `[timestamp] + interval '1 month'`.\r\n\r\nIntervals stand in contrast to SQL's treatment of `DATE` types. Dates don't use intervals - instead, subtracting two dates will return an integer representing the number of days between the two dates. You can also add integer values to dates. This is sometimes more convenient, depending on how much intelligence you require in the handling of your dates! \r\n\r\n\r\n\r\n### Generate a list of all the dates in October 2012\r\n\r\nProduce a list of all the dates in October 2012. They can be output as a timestamp (with time set to midnight) or a date. 
\r\n\r\n\r\nExpected results:\r\n\r\n| ts |\r\n| ------------------- |\r\n| 2012-10-01 00:00:00 |\r\n| 2012-10-02 00:00:00 |\r\n| 2012-10-03 00:00:00 |\r\n| 2012-10-04 00:00:00 |\r\n| 2012-10-05 00:00:00 |\r\n| 2012-10-06 00:00:00 |\r\n| 2012-10-07 00:00:00 |\r\n| 2012-10-08 00:00:00 |\r\n| 2012-10-09 00:00:00 |\r\n| 2012-10-10 00:00:00 |\r\n| 2012-10-11 00:00:00 |\r\n| 2012-10-12 00:00:00 |\r\n| 2012-10-13 00:00:00 |\r\n| 2012-10-14 00:00:00 |\r\n| 2012-10-15 00:00:00 |\r\n| 2012-10-16 00:00:00 |\r\n| 2012-10-17 00:00:00 |\r\n| 2012-10-18 00:00:00 |\r\n| 2012-10-19 00:00:00 |\r\n| 2012-10-20 00:00:00 |\r\n| 2012-10-21 00:00:00 |\r\n| 2012-10-22 00:00:00 |\r\n| 2012-10-23 00:00:00 |\r\n| 2012-10-24 00:00:00 |\r\n| 2012-10-25 00:00:00 |\r\n| 2012-10-26 00:00:00 |\r\n| 2012-10-27 00:00:00 |\r\n| 2012-10-28 00:00:00 |\r\n| 2012-10-29 00:00:00 |\r\n| 2012-10-30 00:00:00 |\r\n| 2012-10-31 00:00:00 |\r\n\r\n\r\nAnswer:\r\n\r\n```sql\r\nselect generate_series(timestamp '2012-10-01', timestamp '2012-10-31', interval '1 day') as ts; \r\n```\r\n\r\nOne of the best features of Postgres over other DBs is a simple function called `GENERATE_SERIES`. This function allows you to generate a list of dates or numbers, specifying a start, an end, and an increment value. It's extremely useful for situations where you want to output, say, sales per day over the course of a month. A typical way to do that on a table containing a list of sales might be to use a `SUM` aggregation, grouping by the date and product type. Unfortunately, this approach has a flaw: if there are no sales for a given day, it won't show up! To make it work properly, you need to left join from a sequential list of timestamps to the aggregated data to fill in the blank spaces.\r\n\r\nOn other database systems, it's not uncommon to keep a 'calendar table' full of dates, with which you can perform these joins. Alternatively, on some systems you can write an analogue to generate_series using recursive CTEs. 
Fortunately for us, Postgres makes our lives a lot easier!\r\n\r\n\r\n\r\n### Get the day of the month from a timestamp\r\n\r\nGet the day of the month from the timestamp '2012-08-31' as an integer. \r\n\r\n\r\nExpected results:\r\n\r\n| date_part |\r\n| --------- |\r\n| 31 |\r\n\r\n\r\nAnswer:\r\n\r\n```sql\r\nselect extract(day from timestamp '2012-08-31'); \r\n```\r\n\r\nThe `EXTRACT` function is used for getting sections of a timestamp or interval. You can get the value of any field in the timestamp as an integer.\r\n\r\n\r\n\r\n### Work out the number of seconds between timestamps\r\n\r\nWork out the number of seconds between the timestamps '2012-08-31 01:00:00' and '2012-09-02 00:00:00' \r\n\r\n\r\nExpected results:\r\n\r\n| date_part |\r\n| --------- |\r\n| 169200 |\r\n\r\n\r\nAnswer:\r\n\r\n```sql\r\nselect extract(epoch from (timestamp '2012-09-02 00:00:00' - '2012-08-31 01:00:00')); \r\n```\r\n\r\nThe above answer is a Postgres-specific trick. Extracting the epoch converts an interval or timestamp into a number of seconds, or the number of seconds since epoch (January 1st, 1970) respectively. If you want the number of minutes, hours, etc you can just divide the number of seconds appropriately.\r\n\r\nIf you want to write more portable code, you will unfortunately find that you cannot use `extract epoch`. Instead you will need to use something like:\r\n\r\n```sql\r\nselect \textract(day from ts.int)*60*60*24 +\r\n\textract(hour from ts.int)*60*60 + \r\n\textract(minute from ts.int)*60 +\r\n\textract(second from ts.int)\r\n\tfrom\r\n\t\t(select timestamp '2012-09-02 00:00:00' - '2012-08-31 01:00:00' as int) ts\r\n```\r\n\r\nThis is, as you can observe, rather awful. If you're planning to write cross platform SQL, I would consider having a library of common user defined functions for each DBMS, allowing you to normalise any common requirements like this. 
This keeps your main codebase a lot cleaner.\r\n\r\n\r\n\r\n### Work out the number of days in each month of 2012\r\n\r\nFor each month of the year in 2012, output the number of days in that month. Format the output as an integer column containing the month of the year, and a second column containing an interval data type. \r\n\r\n\r\nExpected results:\r\n\r\n| month | length |\r\n| ----- | ------- |\r\n| 1 | 31 days |\r\n| 2 | 29 days |\r\n| 3 | 31 days |\r\n| 4 | 30 days |\r\n| 5 | 31 days |\r\n| 6 | 30 days |\r\n| 7 | 31 days |\r\n| 8 | 31 days |\r\n| 9 | 30 days |\r\n| 10 | 31 days |\r\n| 11 | 30 days |\r\n| 12 | 31 days |\r\n\r\n\r\nAnswer:\r\n\r\n```sql\r\nselect \textract(month from cal.month) as month,\r\n\t(cal.month + interval '1 month') - cal.month as length\r\n\tfrom\r\n\t(\r\n\t\tselect generate_series(timestamp '2012-01-01', timestamp '2012-12-01', interval '1 month') as month\r\n\t) cal\r\norder by month; \r\n```\r\n\r\nThis answer shows several of the concepts we've learned. We use the `GENERATE_SERIES` function to produce a year's worth of timestamps, incrementing a month at a time. We then use the `EXTRACT` function to get the month number. Finally, we subtract each timestamp from that same timestamp plus one month, which leaves the length of the month.\r\n\r\nIt's worth noting that subtracting two timestamps will always produce an interval in terms of days (or portions of a day). You won't just get an answer in terms of months or years, because the length of those time periods is variable.\r\n\r\n\r\n\r\n### Work out the number of days remaining in the month\r\n\r\nFor any given timestamp, work out the number of days remaining in the month. The current day should count as a whole day, regardless of the time. Use '2012-02-11 01:00:00' as an example timestamp for the purposes of making the answer. Format the output as a single interval value. 
\r\n\r\n\r\nExpected results:\r\n\r\n| remaining |\r\n| --------- |\r\n| 19 days |\r\n\r\n\r\nAnswer:\r\n\r\n```sql\r\nselect (date_trunc('month',ts.testts) + interval '1 month') \r\n\t\t- date_trunc('day', ts.testts) as remaining\r\n\tfrom (select timestamp '2012-02-11 01:00:00' as testts) ts \r\n```\r\n\r\nThe star of this particular show is the `DATE_TRUNC` function. It does pretty much what you'd expect - truncates a date to a given minute, hour, day, month, and so on. The way we've solved this problem is to truncate our timestamp to find the month we're in, add a month to that, and subtract our timestamp. To ensure partial days get treated as whole days, the timestamp we subtract is truncated to the start of its day.\r\n\r\nNote the way we've put the timestamp into a subquery. This isn't required, but it does mean you can give the timestamp a name, rather than having to list the literal repeatedly.\r\n\r\n\r\n\r\n### Work out the end time of bookings\r\n\r\nReturn a list of the start and end time of the last 10 bookings (ordered by the time at which they end, followed by the time at which they start) in the system. 
\r\n\r\n\r\nExpected results:\r\n\r\n| starttime | endtime |\r\n| ------------------- | ------------------- |\r\n| 2013-01-01 15:30:00 | 2013-01-01 16:00:00 |\r\n| 2012-09-30 19:30:00 | 2012-09-30 20:30:00 |\r\n| 2012-09-30 19:00:00 | 2012-09-30 20:30:00 |\r\n| 2012-09-30 19:30:00 | 2012-09-30 20:00:00 |\r\n| 2012-09-30 19:00:00 | 2012-09-30 20:00:00 |\r\n| 2012-09-30 19:00:00 | 2012-09-30 20:00:00 |\r\n| 2012-09-30 18:30:00 | 2012-09-30 20:00:00 |\r\n| 2012-09-30 18:30:00 | 2012-09-30 20:00:00 |\r\n| 2012-09-30 19:00:00 | 2012-09-30 19:30:00 |\r\n| 2012-09-30 18:30:00 | 2012-09-30 19:30:00 |\r\n\r\n\r\nAnswer:\r\n\r\n```sql\r\nselect starttime, starttime + slots*(interval '30 minutes') endtime\r\n\tfrom cd.bookings\r\n\torder by endtime desc, starttime desc\r\n\tlimit 10 \r\n```\r\n\r\nThis question simply returns the start time for a booking, and a calculated end time which is equal to `start time + (30 minutes * slots)`. Note that it's perfectly okay to multiply intervals.\r\n\r\nThe other thing you'll notice is the use of order by and limit to get the last ten bookings. All this does is order the bookings by the (descending) time at which they end, and pick off the top ten.\r\n\r\n\r\n\r\n### Return a count of bookings for each month\r\n\r\nReturn a count of bookings for each month, sorted by month \r\n\r\n\r\nExpected results:\r\n\r\n| month | count |\r\n| ------------------- | ----- |\r\n| 2012-07-01 00:00:00 | 658 |\r\n| 2012-08-01 00:00:00 | 1472 |\r\n| 2012-09-01 00:00:00 | 1913 |\r\n| 2013-01-01 00:00:00 | 1 |\r\n\r\n\r\nAnswer:\r\n\r\n```sql\r\nselect date_trunc('month', starttime) as month, count(*)\r\n\tfrom cd.bookings\r\n\tgroup by month\r\n\torder by month \r\n```\r\n\r\nThis one is a fairly simple reuse of concepts we've seen before. 
We simply count the number of bookings, and aggregate by the booking's start time, truncated to the month.\r\n\r\n\r\n\r\n### Work out the utilisation percentage for each facility by month\r\n\r\nWork out the utilisation percentage for each facility by month, sorted by name and month, rounded to 1 decimal place. Opening time is 8am, closing time is 8.30pm. You can treat every month as a full month, regardless of if there were some dates the club was not open. \r\n\r\n\r\nExpected results:\r\n\r\n| name | month | utilisation |\r\n| --------------- | ------------------- | ----------- |\r\n| Badminton Court | 2012-07-01 00:00:00 | 23.2 |\r\n| Badminton Court | 2012-08-01 00:00:00 | 59.2 |\r\n| Badminton Court | 2012-09-01 00:00:00 | 76.0 |\r\n| Massage Room 1 | 2012-07-01 00:00:00 | 34.1 |\r\n| Massage Room 1 | 2012-08-01 00:00:00 | 63.5 |\r\n| Massage Room 1 | 2012-09-01 00:00:00 | 86.4 |\r\n| Massage Room 2 | 2012-07-01 00:00:00 | 3.1 |\r\n| Massage Room 2 | 2012-08-01 00:00:00 | 10.6 |\r\n| Massage Room 2 | 2012-09-01 00:00:00 | 16.3 |\r\n| Pool Table | 2012-07-01 00:00:00 | 15.1 |\r\n| Pool Table | 2012-08-01 00:00:00 | 41.5 |\r\n| Pool Table | 2012-09-01 00:00:00 | 62.8 |\r\n| Pool Table | 2013-01-01 00:00:00 | 0.1 |\r\n| Snooker Table | 2012-07-01 00:00:00 | 20.1 |\r\n| Snooker Table | 2012-08-01 00:00:00 | 42.1 |\r\n| Snooker Table | 2012-09-01 00:00:00 | 56.8 |\r\n| Squash Court | 2012-07-01 00:00:00 | 21.2 |\r\n| Squash Court | 2012-08-01 00:00:00 | 51.6 |\r\n| Squash Court | 2012-09-01 00:00:00 | 72.0 |\r\n| Table Tennis | 2012-07-01 00:00:00 | 13.4 |\r\n| Table Tennis | 2012-08-01 00:00:00 | 39.2 |\r\n| Table Tennis | 2012-09-01 00:00:00 | 56.3 |\r\n| Tennis Court 1 | 2012-07-01 00:00:00 | 34.8 |\r\n| Tennis Court 1 | 2012-08-01 00:00:00 | 59.2 |\r\n| Tennis Court 1 | 2012-09-01 00:00:00 | 78.8 |\r\n| Tennis Court 2 | 2012-07-01 00:00:00 | 26.7 |\r\n| Tennis Court 2 | 2012-08-01 00:00:00 | 62.3 |\r\n| Tennis Court 2 | 2012-09-01 00:00:00 | 78.4 
|\r\n\r\n\r\nAnswer:\r\n\r\n```sql\r\nselect name, month, \r\n\tround((100*slots)/\r\n\t\tcast(\r\n\t\t\t25*(cast((month + interval '1 month') as date)\r\n\t\t\t- cast (month as date)) as numeric),1) as utilisation\r\n\tfrom (\r\n\t\tselect facs.name as name, date_trunc('month', starttime) as month, sum(slots) as slots\r\n\t\t\tfrom cd.bookings bks\r\n\t\t\tinner join cd.facilities facs\r\n\t\t\t\ton bks.facid = facs.facid\r\n\t\t\tgroup by facs.facid, month\r\n\t) as inn\r\norder by name, month \r\n```\r\n\r\nThe meat of this query (the inner subquery) is really quite simple: an aggregation to work out the total number of slots used per facility per month. If you've covered the rest of this section and the category on aggregates, you likely didn't find this bit too challenging.\r\n\r\nThis query does, unfortunately, have some other complexity in it: working out the number of days in each month. We can calculate the number of days between two months by subtracting two timestamps with a month between them. This, unfortunately, gives us back an interval datatype, which we can't use to do mathematics. In this case we've worked around that limitation by converting our timestamps into *dates* before subtracting. Subtracting date types gives us an integer number of days.\r\n\r\nAn alternative to this workaround is to convert the interval into an *epoch* value: that is, a number of seconds. To do this use `EXTRACT(EPOCH FROM month)/(24*60*60)`. This is arguably a much nicer way to do things, but is much less portable to other database systems.\r\n\r\n***\r\n\r\n## String Operations\r\n\r\nString operations in most RDBMSs are, arguably, needlessly painful. Fortunately, Postgres is better than most in this regard, providing strong regular expression support. This section covers basic string manipulation, use of the LIKE operator, and use of regular expressions. I also make an effort to show you some alternative approaches that work reliably in most RDBMSs. 
Be sure to check out Postgres' string function [docs page](http://www.postgresql.org/docs/current/static/functions-matching.html) if you're not confident about these exercises.\r\n\r\nAnthony Molinaro's [SQL Cookbook](http://shop.oreilly.com/product/9780596009762.do) provides some excellent documentation of (difficult) cross-DBMS compliant SQL string manipulation. I'd strongly recommend his book, particularly as it's published by O'Reilly, whose ethical policy of DRM-free ebook distribution deserves rich rewards.\r\n\r\n### Format the names of members\r\n\r\nOutput the names of all members, formatted as 'Surname, Firstname' \r\n\r\n\r\nExpected results:\r\n\r\n| name |\r\n| ------------------------ |\r\n| GUEST, GUEST |\r\n| Smith, Darren |\r\n| Smith, Tracy |\r\n| Rownam, Tim |\r\n| Joplette, Janice |\r\n| Butters, Gerald |\r\n| Tracy, Burton |\r\n| Dare, Nancy |\r\n| Boothe, Tim |\r\n| Stibbons, Ponder |\r\n| Owen, Charles |\r\n| Jones, David |\r\n| Baker, Anne |\r\n| Farrell, Jemima |\r\n| Smith, Jack |\r\n| Bader, Florence |\r\n| Baker, Timothy |\r\n| Pinker, David |\r\n| Genting, Matthew |\r\n| Mackenzie, Anna |\r\n| Coplin, Joan |\r\n| Sarwin, Ramnaresh |\r\n| Jones, Douglas |\r\n| Rumney, Henrietta |\r\n| Farrell, David |\r\n| Worthington-Smyth, Henry |\r\n| Purview, Millicent |\r\n| Tupperware, Hyacinth |\r\n| Hunt, John |\r\n| Crumpet, Erica |\r\n| Smith, Darren |\r\n\r\n\r\nAnswer:\r\n\r\n```sql\r\nselect surname || ', ' || firstname as name from cd.members \r\n```\r\n\r\nBuilding strings in SQL is similar to other languages, with the exception of the concatenation operator: ||. Some systems (like SQL Server) use +, but || is the SQL standard. \r\n\r\n\r\n\r\n### Find facilities by a name prefix\r\n\r\nFind all facilities whose name begins with 'Tennis'. Retrieve all columns. 
\r\n\r\n\r\nExpected results:\r\n\r\n| facid | name | membercost | guestcost | initialoutlay | monthlymaintenance |\r\n| ----- | -------------- | ---------- | --------- | ------------- | ------------------ |\r\n| 0 | Tennis Court 1 | 5 | 25 | 10000 | 200 |\r\n| 1 | Tennis Court 2 | 5 | 25 | 8000 | 200 |\r\n\r\n\r\nAnswer:\r\n\r\n```sql\r\nselect * from cd.facilities where name like 'Tennis%'; \r\n```\r\n\r\nThe SQL `LIKE` operator is a highly standard way of searching for a string using basic matching. The % character matches any string, while _ matches any single character.\r\n\r\nOne point that's worth considering when you use `LIKE` is how it uses indexes. If you're using the 'C' [locale](http://www.postgresql.org/docs/current/static/locale.html), any `LIKE` string with a fixed beginning (as in our example here) can use an index. If you're using any other locale, `LIKE` will not use any index by default. See [here](http://www.postgresql.org/docs/current/static/indexes-opclass.html) for details on how to change that.\r\n\r\n\r\n\r\n### Perform a case-insensitive search\r\n\r\nPerform a case-insensitive search to find all facilities whose name begins with 'tennis'. Retrieve all columns. \r\n\r\n\r\nExpected results:\r\n\r\n| facid | name | membercost | guestcost | initialoutlay | monthlymaintenance |\r\n| ----- | -------------- | ---------- | --------- | ------------- | ------------------ |\r\n| 0 | Tennis Court 1 | 5 | 25 | 10000 | 200 |\r\n| 1 | Tennis Court 2 | 5 | 25 | 8000 | 200 |\r\n\r\n\r\nAnswer:\r\n\r\n```sql\r\nselect * from cd.facilities where upper(name) like 'TENNIS%'; \r\n```\r\n\r\nThere's no direct operator for case-insensitive comparison in standard SQL. Fortunately, we can take a page from many other languages' books, and simply force all values into upper case when we do our comparison. 
This renders case irrelevant, and gives us our result.\r\n\r\nAlternatively, Postgres does provide the `ILIKE` operator, which performs case insensitive searches. This isn't standard SQL, but it's arguably more clear.\r\n\r\nYou should realise that running a function like `UPPER` over a column value prevents Postgres from making use of any indexes on the column (the same is true for `ILIKE`). Fortunately, Postgres has got your back: rather than simply creating indexes over columns, you can also create indexes over [expressions](http://www.postgresql.org/docs/current/static/indexes-expressional.html). If you created an index over `UPPER(name)`, this query could use it quite happily.\r\n\r\n\r\n\r\n### Find telephone numbers with parentheses\r\n\r\nYou've noticed that the club's member table has telephone numbers with very inconsistent formatting. You'd like to find all the telephone numbers that contain parentheses, returning the member ID and telephone number sorted by member ID. \r\n\r\n\r\nExpected results:\r\n\r\n| memid | telephone |\r\n| ----- | -------------- |\r\n| 0 | (000) 000-0000 |\r\n| 3 | (844) 693-0723 |\r\n| 4 | (833) 942-4710 |\r\n| 5 | (844) 078-4130 |\r\n| 6 | (822) 354-9973 |\r\n| 7 | (833) 776-4001 |\r\n| 8 | (811) 433-2547 |\r\n| 9 | (833) 160-3900 |\r\n| 10 | (855) 542-5251 |\r\n| 11 | (844) 536-8036 |\r\n| 13 | (855) 016-0163 |\r\n| 14 | (822) 163-3254 |\r\n| 15 | (833) 499-3527 |\r\n| 20 | (811) 972-1377 |\r\n| 21 | (822) 661-2898 |\r\n| 22 | (822) 499-2232 |\r\n| 24 | (822) 413-1470 |\r\n| 27 | (822) 989-8876 |\r\n| 28 | (855) 755-9876 |\r\n| 29 | (855) 894-3758 |\r\n| 30 | (855) 941-9786 |\r\n| 33 | (822) 665-5327 |\r\n| 35 | (899) 720-6978 |\r\n| 36 | (811) 732-4816 |\r\n| 37 | (822) 577-3541 |\r\n\r\n\r\nAnswer:\r\n\r\n```sql\r\nselect memid, telephone from cd.members where telephone ~ '[()]'; \r\n```\r\n\r\n We've chosen to answer this using regular expressions, although Postgres does provide other string functions like `POSITION` that 
would do the job at least as well. Postgres implements POSIX regular expression matching via the ~ operator. If you've used regular expressions before, the functionality of the operator will be very familiar to you.\r\n\r\n\r\n\r\nAs an alternative, you can use the SQL standard `SIMILAR TO` operator. The regular expressions for this have similarities to the POSIX standard, but a lot of differences as well. Some of the most notable differences are:\r\n\r\n- As in the `LIKE` operator, `SIMILAR TO` uses the '_' character to mean 'any character', and the '%' character to mean 'any string'. \r\n- A `SIMILAR TO` expression must match the whole string, not just a substring as in POSIX regular expressions. This means that you'll typically end up bracketing an expression in '%' characters. \r\n- The '.' character does not mean 'any character' in `SIMILAR TO` regexes: it's just a plain character. \r\n\r\nThe `SIMILAR TO` equivalent of the given answer is shown below:\r\n\r\n```sql\r\nselect memid, telephone from cd.members where telephone similar to '%[()]%';\r\n```\r\n\r\nFinally, it's worth noting that regular expressions usually don't use indexes. Generally you don't want your regex to be responsible for doing heavy lifting in your query, because it will be slow. If you need fuzzy matching that works fast, consider working out if your needs can be met by [full text search](http://www.postgresql.org/docs/current/static/textsearch.html).\r\n\r\n\r\n\r\n### Pad zip codes with leading zeroes\r\n\r\nThe zip codes in our example dataset have had leading zeroes removed from them by virtue of being stored as a numeric type. Retrieve all zip codes from the members table, padding any zip codes less than 5 characters long with leading zeroes. Order by the new zip code. 
\r\n\r\n\r\nExpected results:\r\n\r\n| zip |\r\n| ----- |\r\n| 00000 |\r\n| 00234 |\r\n| 00234 |\r\n| 04321 |\r\n| 04321 |\r\n| 10383 |\r\n| 11986 |\r\n| 23423 |\r\n| 28563 |\r\n| 33862 |\r\n| 34232 |\r\n| 43532 |\r\n| 43533 |\r\n| 45678 |\r\n| 52365 |\r\n| 54333 |\r\n| 56754 |\r\n| 57392 |\r\n| 58393 |\r\n| 64577 |\r\n| 65332 |\r\n| 65464 |\r\n| 66796 |\r\n| 68666 |\r\n| 69302 |\r\n| 75655 |\r\n| 78533 |\r\n| 80743 |\r\n| 84923 |\r\n| 87630 |\r\n| 97676 |\r\n\r\n\r\nAnswer:\r\n\r\n```sql\r\nselect lpad(cast(zipcode as char(5)),5,'0') zip from cd.members order by zip \r\n```\r\n\r\nPostgres' `LPAD` function is the star of this particular show. It does basically what you'd expect: it allows us to produce a padded string. We need to remember to cast the zipcode to a string for it to be accepted by the `LPAD` function.\r\n\r\nWhen inheriting an old database, it's not that unusual to find wonky decisions having been made over data types. You may wish to fix mistakes like these, but have a lot of code that would break if you changed datatypes. In that case, one option (depending on performance requirements) is to create a [view](http://www.postgresql.org/docs/current/static/sql-createview.html) over your table which presents the data in a fixed-up manner, and gradually migrate.\r\n\r\n\r\n\r\n### Count the number of members whose surname starts with each letter of the alphabet \r\n\r\nYou'd like to produce a count of how many members you have whose surname starts with each letter of the alphabet. Sort by the letter, and don't worry about printing out a letter if the count is 0. 
\r\n\r\n\r\nExpected results:\r\n\r\n| letter | count |\r\n| ------ | ----- |\r\n| B | 5 |\r\n| C | 2 |\r\n| D | 1 |\r\n| F | 2 |\r\n| G | 2 |\r\n| H | 1 |\r\n| J | 3 |\r\n| M | 1 |\r\n| O | 1 |\r\n| P | 2 |\r\n| R | 2 |\r\n| S | 6 |\r\n| T | 2 |\r\n| W | 1 |\r\n\r\n\r\nAnswer:\r\n\r\n```sql\r\nselect substr (mems.surname,1,1) as letter, count(*) as count \r\n from cd.members mems\r\n group by letter\r\n order by letter \r\n```\r\n\r\nThis exercise is fairly straightforward. You simply need to retrieve the first letter of the member's surname, and do some basic aggregation to achieve a count. We use the `SUBSTR` function here, but there's a variety of other ways you can achieve the same thing. The `LEFT` function, for example, returns you the first n characters from the left of the string. Alternatively, you could use the `SUBSTRING` function, which allows you to use regular expressions to extract a portion of the string.\r\n\r\nOne point worth noting: as you can see, string functions in SQL are based on 1-indexing, not the 0-indexing that you're probably used to. This will likely trip you up once or twice before you get used to it :-)\r\n\r\n\r\n\r\n### Clean up telephone numbers\r\n\r\nThe telephone numbers in the database are very inconsistently formatted. You'd like to print a list of member ids and numbers that have had '-','(',')', and ' ' characters removed. Order by member id. 
\r\n\r\n\r\nExpected results:\r\n\r\n| memid | telephone |\r\n| ----- | ---------- |\r\n| 0 | 0000000000 |\r\n| 1 | 5555555555 |\r\n| 2 | 5555555555 |\r\n| 3 | 8446930723 |\r\n| 4 | 8339424710 |\r\n| 5 | 8440784130 |\r\n| 6 | 8223549973 |\r\n| 7 | 8337764001 |\r\n| 8 | 8114332547 |\r\n| 9 | 8331603900 |\r\n| 10 | 8555425251 |\r\n| 11 | 8445368036 |\r\n| 12 | 8440765141 |\r\n| 13 | 8550160163 |\r\n| 14 | 8221633254 |\r\n| 15 | 8334993527 |\r\n| 16 | 8339410824 |\r\n| 17 | 8114096734 |\r\n| 20 | 8119721377 |\r\n| 21 | 8226612898 |\r\n| 22 | 8224992232 |\r\n| 24 | 8224131470 |\r\n| 26 | 8445368036 |\r\n| 27 | 8229898876 |\r\n| 28 | 8557559876 |\r\n| 29 | 8558943758 |\r\n| 30 | 8559419786 |\r\n| 33 | 8226655327 |\r\n| 35 | 8997206978 |\r\n| 36 | 8117324816 |\r\n| 37 | 8225773541 |\r\n\r\n\r\nAnswer:\r\n\r\n```sql\r\nselect memid, translate(telephone, '-() ', '') as telephone\r\n from cd.members\r\n order by memid; \r\n```\r\n\r\nThe most direct solution is probably the `TRANSLATE` function, which can be used to replace characters in a string. You pass it three strings: the value you want altered, the characters to replace, and the characters you want them replaced with. In our case, we want all the characters deleted, so our third parameter is an empty string.\r\n\r\nAs is often the way with strings, we can also use regular expressions to solve our problem. The `REGEXP_REPLACE` function provides what we're looking for: we simply pass a regex that matches all non-digit characters, and replace them with nothing, as shown below. The 'g' flag tells the function to replace as many instances of the pattern as it can find. This solution is perhaps more robust, as it cleans out more bad formatting.\r\n\r\n```sql\r\nselect memid, regexp_replace(telephone, '[^0-9]', '', 'g') as telephone\r\n from cd.members\r\n order by memid;\r\n```\r\n\r\nMaking automated use of free-formatted text data can be a chore. 
Ideally you want to avoid having to constantly write code to clean up the data before using it, so you should consider having your database enforce correct formatting for you. You can do this using a [CHECK](http://www.postgresql.org/docs/current/static/ddl-constraints.html) constraint on your column, which allows you to reject any poorly-formatted entry. It's tempting to perform this kind of validation in the application layer, and this is certainly a valid approach. As a general rule, if your database is getting used by multiple applications, favour pushing more of your checks down into the database to ensure consistent behaviour between the apps.\r\n\r\nOccasionally, adding a constraint isn't feasible. You may, for example, have two different legacy applications asserting differently formatted information. If you're unable to alter the applications, you have a couple of options to consider. Firstly, you can define a [trigger](http://www.postgresql.org/docs/current/static/sql-createtrigger.html) on your table. This allows you to intercept data before (or after) it gets asserted to your table, and normalise it into a single format. Alternatively, you could build a [view](http://www.postgresql.org/docs/current/static/sql-createview.html) over your table that cleans up information on the fly, as it's read out. Newer applications can read from the view and benefit from more reliably formatted information.\r\n\r\n***\r\n\r\n## Recursive Queries\r\n\r\nCommon Table Expressions allow us to, effectively, create our own temporary tables for the duration of a query - they're largely a convenience to help us make more readable SQL. Using the [WITH RECURSIVE](http://www.postgresql.org/docs/current/static/queries-with.html) modifier, however, it's possible for us to create recursive queries. 
This is enormously advantageous for working with tree and graph-structured data - imagine retrieving all of the relations of a graph node to a given depth, for example.\r\n\r\nThis category shows you some basic recursive queries that are possible using our dataset.\r\n\r\n\r\n\r\n### Find the upward recommendation chain for member ID 27\r\n\r\nFind the upward recommendation chain for member ID 27: that is, the member who recommended them, and the member who recommended that member, and so on. Return member ID, first name, and surname. Order by descending member id. \r\n\r\n\r\nExpected results:\r\n\r\n| recommender | firstname | surname |\r\n| ----------- | --------- | ------- |\r\n| 20 | Matthew | Genting |\r\n| 5 | Gerald | Butters |\r\n| 1 | Darren | Smith |\r\n\r\n\r\nAnswer:\r\n\r\n\r\n```sql\r\nwith recursive recommenders(recommender) as (\r\n\tselect recommendedby from cd.members where memid = 27\r\n\tunion all\r\n\tselect mems.recommendedby\r\n\t\tfrom recommenders recs\r\n\t\tinner join cd.members mems\r\n\t\t\ton mems.memid = recs.recommender\r\n)\r\nselect recs.recommender, mems.firstname, mems.surname\r\n\tfrom recommenders recs\r\n\tinner join cd.members mems\r\n\t\ton recs.recommender = mems.memid\r\norder by memid desc \r\n```\r\n\r\n`WITH RECURSIVE` is a fantastically useful piece of functionality that many developers are unaware of. It allows you to perform queries over hierarchies of data, which is very difficult by other means in SQL. Such scenarios often leave developers resorting to multiple round trips to the database system.\r\n\r\nYou've seen `WITH` before. The Common Table Expressions (CTEs) defined by WITH give you the ability to produce inline views over your data. This is normally just a syntactic convenience, but the `RECURSIVE` modifier adds the ability to join against results already produced to produce even more. 
A recursive `WITH` takes the basic form of:\r\n\r\n```sql\r\nWITH RECURSIVE NAME(columns) as (\r\n\t<initial statement>\r\n\tUNION ALL \r\n\t<recursive statement>\r\n)\r\n```\r\n\r\nThe initial statement populates the initial data, and then the recursive statement runs repeatedly to produce more. Each step of the recursion can access the CTE, but it sees within it only the data produced by the previous iteration. It repeats until an iteration produces no additional data. \r\n\r\nThe simplest example of a recursive `WITH` might look something like this:\r\n\r\n```sql\r\nwith recursive increment(num) as (\r\n\tselect 1\r\n\tunion all\r\n\tselect increment.num + 1 from increment where increment.num < 5\r\n)\r\nselect * from increment;\r\n```\r\n\r\nThe initial statement produces '1'. The first iteration of the recursive statement sees this as the content of `increment`, and produces '2'. The next iteration sees the content of `increment` as '2', and so on. Execution terminates when the recursive statement produces no additional data.\r\n\r\nWith the basics out of the way, it's fairly easy to explain our answer here. The initial statement gets the ID of the person who recommended the member we're interested in. The recursive statement takes the results of the initial statement, and finds the ID of the person who recommended them. This value gets forwarded on to the next iteration, and so on.\r\n\r\nNow that we've constructed the recommenders CTE, all our main `SELECT` statement has to do is get the member IDs from recommenders, and join to the members table to find out their names.\r\n\r\n\r\n\r\n### Find the downward recommendation chain for member ID 1\r\n\r\nFind the downward recommendation chain for member ID 1: that is, the members they recommended, the members those members recommended, and so on. Return member ID and name, and order by ascending member id. 
\r\n\r\n\r\nExpected results:\r\n\r\n| memid | firstname | surname |\r\n| ----- | --------- | --------- |\r\n| 4 | Janice | Joplette |\r\n| 5 | Gerald | Butters |\r\n| 7 | Nancy | Dare |\r\n| 10 | Charles | Owen |\r\n| 11 | David | Jones |\r\n| 14 | Jack | Smith |\r\n| 20 | Matthew | Genting |\r\n| 21 | Anna | Mackenzie |\r\n| 26 | Douglas | Jones |\r\n| 27 | Henrietta | Rumney |\r\n\r\n\r\nAnswer:\r\n\r\n\r\n```sql\r\nwith recursive recommendeds(memid) as (\r\n\tselect memid from cd.members where recommendedby = 1\r\n\tunion all\r\n\tselect mems.memid\r\n\t\tfrom recommendeds recs\r\n\t\tinner join cd.members mems\r\n\t\t\ton mems.recommendedby = recs.memid\r\n)\r\nselect recs.memid, mems.firstname, mems.surname\r\n\tfrom recommendeds recs\r\n\tinner join cd.members mems\r\n\t\ton recs.memid = mems.memid\r\norder by memid \r\n```\r\n\r\nThis is a pretty minor variation on the previous question. The essential difference is that we're now heading in the opposite direction. One interesting point to note is that unlike the previous example, this CTE produces multiple rows per iteration, by virtue of the fact that we're heading down the recommendation tree (following all branches) rather than up it.\r\n\r\n\r\n\r\n### Produce a CTE that can return the upward recommendation chain for any member\r\n\r\nProduce a CTE that can return the upward recommendation chain for any member. You should be able to select recommender from recommenders where member=x. Demonstrate it by getting the chains for members 12 and 22. Results table should have member and recommender, ordered by member ascending, recommender descending. 
\r\n\r\n\r\nExpected results:\r\n\r\n| member | recommender | firstname | surname |\r\n| ------ | ----------- | --------- | -------- |\r\n| 12 | 9 | Ponder | Stibbons |\r\n| 12 | 6 | Burton | Tracy |\r\n| 22 | 16 | Timothy | Baker |\r\n| 22 | 13 | Jemima | Farrell |\r\n\r\n\r\nAnswer:\r\n\r\n\r\n```sql\r\nwith recursive recommenders(recommender, member) as (\r\n\tselect recommendedby, memid\r\n\t\tfrom cd.members\r\n\tunion all\r\n\tselect mems.recommendedby, recs.member\r\n\t\tfrom recommenders recs\r\n\t\tinner join cd.members mems\r\n\t\t\ton mems.memid = recs.recommender\r\n)\r\nselect recs.member member, recs.recommender, mems.firstname, mems.surname\r\n\tfrom recommenders recs\r\n\tinner join cd.members mems\t\t\r\n\t\ton recs.recommender = mems.memid\r\n\twhere recs.member = 22 or recs.member = 12\r\norder by recs.member asc, recs.recommender desc \r\n```\r\n\r\nThis question requires us to produce a CTE that can calculate the upward recommendation chain for any user. Most of the complexity of working out the answer is in realising that we now need our CTE to produce two columns: one to contain the member we're asking about, and another to contain the members in their recommendation tree. Essentially what we're doing is producing a table that flattens out the recommendation hierarchy.\r\n\r\nSince we're looking to produce the chain for every user, our initial statement needs to select data for each user: their ID and who recommended them. Subsequently, we want to pass the member field through each iteration without changing it, while getting the next recommender. You can see that the recursive part of our statement hasn't really changed, except to pass through the 'member' field.\r\n","note":"Don't delete this file! It's used internally to help with page regeneration."}