Commit

add solution to command line homework
justmarkham committed Aug 25, 2015
1 parent 15ac4e5 commit 54d6b76
Showing 2 changed files with 27 additions and 6 deletions.
3 changes: 2 additions & 1 deletion README.md
@@ -103,7 +103,7 @@ Tuesday | Thursday

### Class 3: Data Reading and Cleaning
* Git and GitHub assorted tips ([slides](slides/02_git_github.pdf))
* Review command line homework (solution)
* Review command line homework ([solution](homework/02_command_line_chipotle.md))
* Python:
* Spyder interface
* Looping exercise
@@ -156,6 +156,7 @@ Tuesday | Thursday
* If you want to go really deep into Pandas (and NumPy), read the book [Python for Data Analysis](http://shop.oreilly.com/product/0636920023784.do), written by the creator of Pandas.
* This notebook demonstrates the different types of [joins in Pandas](http://nbviewer.ipython.org/github/justmarkham/DAT8/blob/master/notebooks/05_pandas_merge.ipynb), for when you need to figure out how to merge two DataFrames.
* This is a nice, short tutorial on [pivot tables](https://beta.oreilly.com/learning/pivot-tables) in Pandas.
* For working with geospatial data in Python, [GeoPandas](http://geopandas.org/index.html) looks promising. This [tutorial](http://michelleful.github.io/code-blog/2015/04/24/sgmap/) uses GeoPandas (and scikit-learn) to build a "linguistic street map" of Singapore.
**Visualization Resources:**
* Watch [Look at Your Data](https://www.youtube.com/watch?v=coNDCIMH8bk) (18 minutes) for an excellent example of why visualization is useful for understanding your data.
30 changes: 25 additions & 5 deletions homework/02_command_line_chipotle.md
@@ -1,6 +1,12 @@
## Class 2 Homework: Command Line Chipotle

**Command Line Tasks:**
#### Submitting Your Homework

* Create a Markdown file that includes your answers **and** the code you used to arrive at those answers.
* Add this Markdown file to a GitHub repo that you'll use for all of your coursework.
* Submit a link to your repo using the homework submission form.

#### Command Line Tasks

1. Look at the head and the tail of **chipotle.tsv** in the **data** subdirectory of this repo. Think for a minute about how the data is structured. What do you think each column means? What do you think each row means? Tell me! (If you're unsure, look at more of the file contents.)
2. How many orders do there appear to be?
@@ -11,8 +17,22 @@
7. Count the approximate number of occurrences of the word "dictionary" (regardless of case) across all files in the DAT8 repo.
8. **Optional:** Use the command line to discover something "interesting" about the Chipotle data. Try using the commands from the "advanced" section!

**Submitting Your Homework:**
#### Solution

* Create a Markdown file that includes your answers **and** the code you used to arrive at those answers.
* Add this Markdown file to a GitHub repo that you'll use for all of your coursework.
* Submit a link to your repo using the homework submission form.
1. **order_id** is the unique identifier for each order. **quantity** is the number of a particular item purchased. **item_name** is the primary name for the item being purchased. **choice_description** is a list of modifiers for that item. **price** is the price for that entire line (taking **quantity** into account). A given order consists of one or more rows, depending upon the number of unique items being purchased in that order.
* `head chipotle.tsv`
* `tail chipotle.tsv`
2. There are 1834 orders (since 1834 is the highest **order_id** number); see the first sketch after this list for one way to check this from the command line.
3. The file has 4623 lines.
* `wc -l chipotle.tsv`
4. Chicken burritos are more popular than steak burritos.
* Compare `grep -i 'chicken burrito' chipotle.tsv | wc -l` with `grep -i 'steak burrito' chipotle.tsv | wc -l`
* Alternatively, use grep's `-c` option to skip the piping step: `grep -ic 'chicken burrito' chipotle.tsv`
5. Black beans are more popular than pinto beans (on chicken burritos).
* Compare `grep -i 'chicken burrito' chipotle.tsv | grep -i 'black beans' | wc -l` with `grep -i 'chicken burrito' chipotle.tsv | grep -i 'pinto beans' | wc -l`
* Alternatively, use grep's `-c` option and a more complex regular expression pattern to skip the piping steps: `grep -ic 'chicken burrito.*black beans' chipotle.tsv`
6. At the moment, the CSV and TSV files in the DAT8 repo are **airlines.csv**, **chipotle.tsv**, and **sms.tsv**, all of which are in the **data** subdirectory.
* Change your working directory to DAT8, and then use `find . -name '*.?sv'` (quoting the pattern so the shell passes it to `find` instead of expanding it first)
7. At the moment, there are 13 lines in DAT8 files that contain the word 'dictionary', which is a good approximation of the number of occurrences; see the second sketch after this list for counting the matches themselves.
* Change your working directory to DAT8, and then use `grep -ir 'dictionary' . | wc -l`
* Alternatively, use grep's `-c` option, though with `-r` it reports one count per file rather than a single total: `grep -irc 'dictionary' .`
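
The following is a minimal sketch of one way to confirm the highest **order_id** mentioned in answer 2. It assumes your working directory is the **data** subdirectory and that **order_id** is the first tab-separated column of **chipotle.tsv**:

```bash
# Peek at the last rows; the file is ordered by order_id,
# so the final rows show the highest value (1834)
tail -n 5 chipotle.tsv

# Or compute it directly: take the first tab-separated field,
# sort numerically, and keep the largest value
cut -f1 chipotle.tsv | sort -n | tail -n 1
```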
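
The approximation in answer 7 comes from the fact that `grep` normally counts matching lines, not matches: a line containing "dictionary" twice is only counted once. A sketch of how you could count the individual matches instead, using grep's `-o` option (run from the DAT8 directory):

```bash
# -o prints each match on its own line, so counting lines counts occurrences
grep -iro 'dictionary' . | wc -l
```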
