diff --git a/bin/copy b/bin/copy deleted file mode 120000 index 6244af80..00000000 --- a/bin/copy +++ /dev/null @@ -1 +0,0 @@ -../copy \ No newline at end of file diff --git a/data/ucl b/data/ucl new file mode 120000 index 00000000..1edb287a --- /dev/null +++ b/data/ucl @@ -0,0 +1 @@ +/Users/plewis/geog0111/work \ No newline at end of file diff --git a/docs/001_Notebook_use 2.md b/docs/001_Notebook_use 2.md new file mode 100644 index 00000000..18520de4 --- /dev/null +++ b/docs/001_Notebook_use 2.md @@ -0,0 +1,220 @@ +# Use of Jupyter notebooks + +This is a jupyter notebook designed to let you get used to notebooks, and to test your python and notebooks installation. + +You can find much information on using notebooks on the [web](https://jupyter.org/), so you might start by exploring some of that. + +You should do the tasks in this notebook before the first class. We need to assume at that class that you have the basic familiarity with notebooks you will gain here. + +## Introduction + +This is your first Jupyter notebook of the class. Jupyter notebooks will form the primary teaching and learning tool in this course. The format of the notebooks will be similar for all sessions. + +### Course material + +You will find full, up to date information on this course GEOG0111 on the [UCL course moodle page](https://moodle.ucl.ac.uk/course/view.php?id=21495). + +### Course load + +This course is intended to be 25 % of your course load for the term. You will find fuller information on the [GEOG0111 course moodle page](https://moodle.ucl.ac.uk/course/view.php?id=21495), but that percentage should give you some idea of the amount of effort we are expecting you to put in (on average) per week. + +### Learning + +You will be expected to learn from what we present in these notebooks and by following up material referenced and wider resources. + +Learning is mostly blocked into two-week chunks, with a test you need to submit at the end of the block. 
You will receive feedback from your test submission to help you learn from what you have done well and not so well. It is important that you submit materials for these tests. It is not about 'getting the right answer', but giving us the opportunity to regularly gauge your progress. + +There will be two pieces of work that you submit for formal assessment on the course: one half way through, and one at the end. Again, we will provide you with feedback on these. + +You will be covering a number of notebooks per week in a learning chunk, and you will need to keep up. If you find you are having problems, or there are reasons you cannot work, you must let us know so we can help you. + +We will provide more information on learning in this course and the resources you have access to on the course moodle page. + +### Prerequisites + +There are no prerequisites for this notebook. + +Note that you can 'run' the code in a code block using the 'run' widget (above) or by pressing the `SHIFT` and `RETURN` keys at the same time. + +### Timing + +The session should take around 20 minutes, though you will spend longer on follow-up material and exercises. + + +## Some resources + +There is a useful [cheatsheet](https://www.anaconda.com/wp-content/uploads/2019/03/11-2018-JupyterLab-Notebook-Cheatsheet.pdf) on using Jupyter, and [another, on markdown syntax](https://guides.github.com/pdfs/markdown-cheatsheet-online.pdf) for you to use. + + +## Anaconda and Jupyter + +We will be using software from the [anaconda distribution of python](https://anaconda.org/anaconda/python). This should already be installed for you if you are viewing this, but we can run some quick tests. 
Running the cell below (`>| Run`) should give the following version numbers, or higher: + + jupyter core : 4.6.1 + jupyter-notebook : 6.0.3 + ipython : 7.12.0 + ipykernel : 5.1.4 + jupyter client : 5.3.4 + jupyter lab : 1.2.6 + nbconvert : 5.6.1 + ipywidgets : 7.5.1 + nbformat : 5.0.4 + traitlets : 4.3.3 + conda 4.8.2 + Python 3.7.6 + +If that is not the case, then make a copy of what it does produce, and contact the course organisers through [moodle](https://moodle.ucl.ac.uk/course/view.php?id=21495). + + + + +```bash +%%bash +# print the installed versions of jupyter, conda and python +jupyter --version +conda -V +python -V +``` + + jupyter core : 4.6.3 + jupyter-notebook : 6.1.3 + qtconsole : 4.7.6 + ipython : 7.18.1 + ipykernel : 5.3.4 + jupyter client : 6.1.7 + jupyter lab : 2.2.6 + nbconvert : 5.6.1 + ipywidgets : 7.5.1 + nbformat : 5.0.7 + traitlets : 4.3.3 + conda 4.8.4 + Python 3.7.8 + + +The code cell above that we ran uses a [`unix` (`bash`) shell](https://en.wikipedia.org/wiki/Bash_(Unix_shell)), indicated by the [cell magic](https://ipython.readthedocs.io/en/stable/interactive/magics.html) `%%bash` on the first line. This is a mechanism that lets us run unix commands in a Python notebook. + + + +## How we will be using notebooks + +We will be using Jupyter notebooks to present course notes and view and run exercises. + +### Saving your work + +The first thing we want you to do when you open a new notebook is to **make a copy of it in the [`work`](work) folder**. + +**You should make a copy of each new notebook when you start.** If you don't, then any work you do may not be there the next time you run the notebook. When you come back to run a notebook again, in a new session, you should run the saved notebook (unless you want to start afresh). + + + +### Cells + +The notebook is made up of a series of [cells](https://jupyter-notebook.readthedocs.io/en/stable/notebook.html#:~:text=A%20cell%20is%20a%20multiline,markdown%20cells%2C%20and%20raw%20cells). 
Some cells, such as the one this is written in, are 'text' cells, where we format the text in a language called [markdown](https://jupyter-notebook.readthedocs.io/en/stable/examples/Notebook/Working%20With%20Markdown%20Cells.html). + +Take a few minutes to explore the notebook menu, and note how to do things like: + +* save the notebook +* save the notebook with a checkpoint: useful for exercises, as you can go back to previous versions! +* make a copy of the notebook and rename it +* download the notebook as a pdf +* restart the kernel (the 'engine' running this notebook) +* restart the kernel and clear output + +### Exercise: add a cell + +We can add new cells to this document via the `Insert -> Insert Cell Below` menu in the menu bar at the top of this document. + +Notice that you can double click on a cell to edit its contents. + +Add a cell now, below, and use the `Cell -> Cell Type` menu to make this cell type `markdown`. Add some text in there ... lyrics from your favourite song, whatever you like ... + + +### Exercise: add some cell formatting + + +Add another cell now, below, and use the `Cell -> Cell Type` menu to make this cell type `markdown`. + +Read up on the [features of markdown](https://github.com/adam-p/markdown-here/wiki/Markdown-Cheatsheet), and this time, include one or more of the following in your cell: + +* a heading +* a sub-heading +* an equation +* links to a web page +* a table +* an image +* some html + +## Coding + +Next, let's try a code cell below and do our first python coding. You should notice that it indicates this in the cell type box on the menu. + +We will use the method `print()` to print out a `string` (a sequence of characters) called `first_string`. + +We *execute* ('run') the code in the cell, either with the `>| Run` button above, or by pressing the `SHIFT RETURN` keys at the same time. 
+ + +```python +# comment with a hash + +# set a string variable +first_string = "hello world" + +# print this +print(first_string) +``` + + hello world + + +The type of cell we use is `Code` (rather than `Markdown` above). + +Remember that we *execute* ('run') the code in the cell, either with the `>| Run` button above, or by pressing the `SHIFT RETURN` keys at the same time. + +Try that out, running the code cell above. + + + +### Exercise + +Now: + +* create a code cell below +* create a string called `second_string` with the text `hello again` +* call the `print()` method with this as an [argument](https://en.wikipedia.org/wiki/Parameter_(computer_programming)) +* run the cell + +### Exercise + +* create a code cell below +* print the values of `first_string` and `second_string` that we created above. +* what does that tell you about information we create in one cell and try to use in another? + +### Exercise + +* create a code cell below +* try to print a variable `third_string` (that you haven't yet created) +* run the cell +* what does this tell you about trying to print variables we haven't created? + +### Exercise + +* create a code cell below +* *now* create a string called `third_string` with the text `hello once more` +* run the cell, then the **cell above** +* what does that tell you about information we create in one cell and try to use in another above? + +# Summary + +This notebook has introduced you to using jupyter notebooks. + +To make sure you understand it all, it is worthwhile restarting the kernel with cleared output (`Kernel -> Restart & Clear Output`) and running it all again. Once you are happy with that, you might try `Kernel -> Restart & Run All`. + +We have explored the notebook menu, and seen how to run cells, create new cells, and change the cell type to something appropriate (`Markdown` or `Code` here). + +We have seen how to set a string variable and print the value of that variable. 
+ +We have noticed that variables are persistent between cells, so if we define a variable in one cell, we can use it *later on*. We have seen that if we try to access a variable before we have defined it, Python will throw a `NameError`, telling us this. Having seen this type of error once, and understanding why it occurred, should prepare us for the next time we see a similar one. + +We have seen one of the *dangers* of a notebook: it allows you to go back and forth running cells. This can lead to confusion, as the next time you run the same notebook in cell order, you may not get the same result! It is one of the most common mistakes for a beginner to make, so be aware of this, and try to always run the cells in the same order. **You can test for this type of error by restarting the kernel, clearing the output, and running all cells.** + +We have written our very first `python` code! diff --git a/docs/001_Notebook_use_answers 2.md b/docs/001_Notebook_use_answers 2.md new file mode 100644 index 00000000..bf5256e6 --- /dev/null +++ b/docs/001_Notebook_use_answers 2.md @@ -0,0 +1,202 @@ +# Use of Jupyter notebooks : Answers to exercises + +### Exercise: add a cell + +We can add new cells to this document via the `Insert -> Insert Cell Below` menu in the menu bar at the top of this document. + +Notice that you can double click on a cell to edit its contents. + +Add a cell now, below, and use the `Cell -> Cell Type` menu to make this cell type `markdown`. Add some text in there ... lyrics from your favourite song, whatever you like ... + + +# ANSWER +This is a markdown cell ... + +Hello world is traditionally the first coding you do. + +### Exercise: add some cell formatting + + +Add another cell now, below, and use the `Cell -> Cell Type` menu to make this cell type `markdown`. 
+ +Read up on the [features of markdown](https://github.com/adam-p/markdown-here/wiki/Markdown-Cheatsheet), and this time, include one or more of the following in your cell: + +* a heading +* a sub-heading +* an equation +* links to a web page +* a table +* an image +* some html + +# ANSWER + +# Main Heading + +## equation + +Equations + +\begin{equation*} +\left( \sum_{k=1}^n a_k b_k \right)^2 \leq \left( \sum_{k=1}^n a_k^2 \right) \left( \sum_{k=1}^n b_k^2 \right) +\end{equation*} + +### link + +[click me and I will pop up a google search window](https://www.google.com) + +### table + +| a | b | c | d | e | +|:-:|:-:|:-:|:-:|-| +| 🙈 | 💥 | 🦧 | 🐇 | 🐪 | +| f | g | h | i | j | +| 🙈 | 💥 | 🦧 | 🐇 | 🐪 | + +### image + +![ucl logo](images/ucl_logo.png) + + +### html + + + +

HTML

+ +

Hey, I'm a paragraph!

+ + + + + +### Exercise + +Now: + +* create a code cell below +* create a string called `second_string` with the text `hello again` +* call the `print()` method with this as an [argument](https://en.wikipedia.org/wiki/Parameter_(computer_programming)) +* run the cell + + +```python +# ANSWER +# create a code cell below +# create a string called `second_string` with the text `hello again` + +second_string = 'hello again' + +# call the `print()` method with this as an argument +print(second_string) + +# run the cell +``` + + hello again + + +### Exercise + +* create a code cell below +* print the values of `first_string` and `second_string` that we created above. +* what does that tell you about information we create in one cell and try to use in another? + + +```python +# ANSWER +# create a code cell below +# print the values of first_string and second_string that we created above. + +print(first_string) +print(second_string) + +# or + +print(first_string, second_string) + +# what does that tell you about information we create in one cell and try to use in another? +# +# It tells us that the information is persistent, i.e. once we have created the +# variables, we can use them in running any cells later. +``` + + hello world + hello again + hello world hello again + + +### Exercise + +* create a code cell below +* try to print a variable `third_string` (that you haven't yet created) +* run the cell +* what does this tell you about trying to print variables we haven't created? + + +```python +# ANSWER +# create a code cell below + +# try to print a variable `third_string` that you haven't yet created +print(third_string) + +# The first time we run it +# it comes up with an error +# NameError: name 'third_string' is not defined +# telling us that we have tried to access a variable name that +# we have not yet defined. Be aware of this type of error. + +# what does that tell you about trying to print variables we haven't created? 
+# +# if we try to use a variable before we create it, the code will fail and throw +# an error. This is useful information: learn to read the errors and understand what +# it is telling you. +``` + + + --------------------------------------------------------------------------- + + NameError Traceback (most recent call last) + + in + 3 + 4 # try to print a variable `third_string` that you haven't yet created + ----> 5 print(third_string) + 6 + 7 # The first time we run it + + + NameError: name 'third_string' is not defined + + +### Exercise + +* create a code cell below +* *now* create a string called `third_string` with the text `hello once more` +* run the cell, then the **cell above** +* what does that tell you about information we create in one cell and try to use in another above? + + +```python +# ANSWER +# create a code cell below +# now create a string called third_string with the text hello once more +third_string = 'hello once more' + +# run the cell above +# same as ... +print(third_string) + +# what does that tell you about information we create in one cell and +# try to use in another above? +# +# we can run cells in any order. Once we had created third_string, the +# previous exercise print(third_string) executed as we expected. +# The *Danger* is that the next time we run this notebook in cell order +# the cell above will fail again. Learn from the mistakes we make. +# Remember what this type of error can mean. +``` + + hello once more + diff --git a/docs/002_Unix 2.md b/docs/002_Unix 2.md new file mode 100644 index 00000000..92524bc7 --- /dev/null +++ b/docs/002_Unix 2.md @@ -0,0 +1,374 @@ +# 002 Some basic UNIX + + +## Introduction + + +### Purpose + +Although this course is about coding in Python, it can be of great value to you to learn at least some basic concepts and commands of the operating system. To that end, in this session we will learn some basic `unix` commands. 
You will be able to use these on almost any modern operating system: the `unix` shell is a core part of [`linux`](https://www.linux.org/) and [macOS](https://en.wikipedia.org/wiki/MacOS) and is directly available to you even in [Windows 10](https://docs.microsoft.com/en-us/windows/wsl/about). If you use these notes through the [`JupyterLab`](https://jupyterlab.readthedocs.io/en/stable/) interface, even from a mobile device, you will have access to a [`unix` shell](https://jupyterlab.readthedocs.io/en/stable/user/terminal.html?highlight=bash) to run commands. + +There are many online tutorials on unix. A good place to find backup material, and some further features beyond what we will cover today, is [software-carpentry.org](https://v4.software-carpentry.org/shell/index.html). + + +### Prerequisites + +You will need some understanding of the following: + + +* [001 Using Notebooks](001_Notebook_use.md) + + +Remember that you can 'run' the code in a code block using the 'run' widget (above) or by pressing the `SHIFT` and `RETURN` keys at the same time. + +### Timing + +The session should take around 15 minutes. + +## Running unix commands + +The code cells in this notebook take Python commands by default, but we can run `unix` commands either by prefixing a single command with `!`: + + +```python +!pwd +``` + + /Users/plewis/Documents/GitHub/geog0111/notebooks + + +or by using the [cell magic](https://ipython.readthedocs.io/en/stable/interactive/magics.html) `%%bash`: + + +```bash +%%bash + +# a comment is anything after the # symbol +pwd +``` + + /Users/plewis/Documents/GitHub/geog0111/notebooks + + +If you are using these notes through the [`JupyterLab`](https://jupyterlab.readthedocs.io/en/stable/) interface you have access to a [terminal](https://jupyterlab.readthedocs.io/en/stable/user/terminal.html?highlight=bash) to run unix commands. + +This original directory is now given by the shell variable `${here}`. 
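The `!` and `%%bash` shortcuts belong to IPython (the engine behind the notebook) rather than to Python itself, so they are not available in a plain Python script. As a minimal sketch, assuming a unix-like system where `pwd` and `echo` are available, the standard-library `subprocess` module can run the same commands. The helper name `run_unix` is ours, for illustration only.

```python
import subprocess


def run_unix(command):
    """Run a unix command, given as a list of words, and return its text output."""
    # check=True raises an error if the command fails,
    # capture_output=True collects what the command prints
    result = subprocess.run(command, capture_output=True, text=True, check=True)
    return result.stdout.strip()


# the same command we ran above with !pwd
print(run_unix(["pwd"]))

# commands with arguments are given as separate words
print(run_unix(["echo", "hello", "world"]))
```

Passing the command as a list of words, rather than one long string, avoids having to worry about how a shell would split or quote the arguments.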
+ +## Navigating the file system + +### `~` `.` `..` + +You will be used to the idea of navigating the filesystem from any previous computing you have ever done. You may have done this by clicking your way to a certain 'location' using `File explorer` (in Windows 10) or `Finder` (in MacOS), but you will have some familiarity with the tree-like nature of a filesystem: you go up or down in the system to find your way to the files and directories you want. + +When we do this by typing commands in the `unix` shell, the concepts are exactly the same, but we have some new symbols to learn to help us navigate: + + - ~ (tilde/twiddle) + - . (dot) + - .. (dot-dot) + + +The tilde symbol `~` is a shorthand to refer to your home directory (this would generally be `C:\Users\username` on windows, `/Users/username` on MacOS or `/home/username` in linux, where `username` is your username). On windows, the file separator is `\` (backslash), but on [posix](https://en.wikipedia.org/wiki/POSIX#:~:text=The%20Portable%20Operating%20System%20Interface,maintaining%20compatibility%20between%20operating%20systems.) (`unix`-like) systems it is '/', forward slash. This can cause some issues when changing operating system. You should try to use the posix '/' whenever you can as this is more portable. + +The symbol `.` means the current directory, and `..` refers to one level up in the directory tree. + +### `cd` `pwd` + +The command `cd filepath` is used in the shell to change from one directory to another. Typically, when you start a shell, you will be in your home directory. We can explicitly 'go to' (i.e. `cd` to) your home with `cd ~`. We use the command `pwd` to print the current working directory. + +So the following sequence: + + +```bash +%%bash +cd ~ +pwd +``` + + /Users/plewis + + +changes directory to our home, and prints the directory name. + + +```bash +%%bash +pwd +cd .. 
+pwd +``` + + + --------------------------------------------------------------------------- + + CalledProcessError Traceback (most recent call last) + + in + ----> 1 get_ipython().run_cell_magic('bash', '', 'pwd\ncd ..\npwd\n') + + + ~/anaconda3/envs/geog0111/lib/python3.7/site-packages/IPython/core/interactiveshell.py in run_cell_magic(self, magic_name, line, cell) + + + ~/anaconda3/envs/geog0111/lib/python3.7/site-packages/IPython/core/magics/script.py in named_script_magic(line, cell) + + + in shebang(self, line, cell) + + + ~/anaconda3/envs/geog0111/lib/python3.7/site-packages/IPython/core/magic.py in (f, *a, **k) + + + ~/anaconda3/envs/geog0111/lib/python3.7/site-packages/IPython/core/magics/script.py in shebang(self, line, cell) + + + CalledProcessError: Command 'b'pwd\ncd ..\npwd\n'' died with . + + +### `ls` `ls -lh` `*` + +The command `ls` lists the files specified. For example: + + + +```bash +%%bash +cd ~/geog0111 +ls R* +``` + + README.md + + +Here, `R*` uses the wildcard `*`, so `R*` means `R` followed by zero or more characters. + +If we specify the option `-lh` then it provides a long listing (`-l`) with file size in 'human-readable' format (`-h`): + + +```bash +%%bash +cd ~/geog0111 +ls -lh README.md +``` + + -rw-r--r-- 1 plewis staff 321B 6 Sep 22:34 README.md + + +Here, the file size is `321B` (321 Bytes), and the file is owned by the user `plewis`. The field `-rw-r--r--` provides information on file permissions. Ignoring the first `-`, it is in 3 sets of 3 bits: + + rw- r-- r-- + +which refer to permissions for the user, group, and everyone, respectively. The permission fields are `rwx`, meaning permissions of read, write, and execute, respectively. Execute here means that we can run the file as a script. In the example above, no execute permission is set (it is not a script file), the user has read and write permission, and group and everyone have only read permissions. So, only the user can write to this file, but everyone can read it. 
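The same permission field can be read from Python. The sketch below, assuming a unix-like system, creates a temporary file to stand in for `README.md`, gives it read and write permission for the user and read-only permission for group and everyone, and then prints the permission field in the style of `ls -l` using the standard-library `os` and `stat` modules. The helper name `permission_string` is ours, for illustration only.

```python
import os
import stat
import tempfile


def permission_string(filename):
    """Return the ls -l style permission field for filename."""
    mode = os.stat(filename).st_mode
    return stat.filemode(mode)


# make a temporary file to stand in for README.md
with tempfile.NamedTemporaryFile(delete=False) as f:
    filename = f.name

# user: read and write; group: read; everyone: read
os.chmod(filename, stat.S_IRUSR | stat.S_IWUSR | stat.S_IRGRP | stat.S_IROTH)

perms = permission_string(filename)
print(perms)

# tidy up the temporary file
os.remove(filename)
```

The first character of the returned field describes the file type (`-` for an ordinary file), followed by the three sets of `rwx` flags described above.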
+ +These fields `rwx` can be viewed as 3 bits which we can interpret as a [base-8 (octal) number](https://en.wikipedia.org/wiki/Octal) (i.e. between 0 and 7) where: + + --- -> 0 + --x -> 1 + -w- -> 2 + -wx -> 3 + r-- -> 4 + r-x -> 5 + rw- -> 6 + rwx -> 7 + +Following that, we interpret the field `rw-r--r--` from above as `644`. The most common file permissions you will likely see or need are: + + 644 -> rw-r--r-- + 755 -> rwxr-xr-x + + + +### `chmod` + +We can change file permissions with the command `chmod`. For example: + + +```bash +%%bash +cd ~/geog0111 +ls -lh README.md +chmod 755 README.md +ls -lh README.md +chmod 644 README.md +``` + + -rw-r--r-- 1 plewis staff 321B 6 Sep 22:34 README.md + -rwxr-xr-x 1 plewis staff 321B 6 Sep 22:34 README.md + + +First the permissions of the file are 644 as we saw above, then we use `chmod 755` to change to 755, then back again to 644. Most commonly, we will use this later on to apply execute permission to a file: + + chmod 755 filename + + +### absolute and relative pathnames + +A posix pathname that **starts with** the file separator '/' is called an **absolute** pathname: it is addressed from the root of the file system (`/`). An example of an absolute filename is `/home/jovyan/geog0111/README.md`. If the filename starts with `~`, it is in effect an absolute pathname. For example: + + + +```bash +%%bash +ls -l ~/geog0111/README.md +``` + + -rw-r--r-- 1 plewis staff 321 6 Sep 22:34 /Users/plewis/geog0111/README.md + + + +A *relative pathname* is one that does not start with `/` or `~`. It is specified relative to where we are in the filesystem in the current shell. For example: + + +```bash +%%bash +cd ~ +ls -l geog0111/README.md +``` + + -rw-r--r-- 1 plewis staff 321 6 Sep 22:34 geog0111/README.md + + +Recall that we use `..` to specify 'up one level'. 
Then: + + +```bash +%%bash +# cd to absolute path ~/geog0111/images +cd ~/geog0111/images +pwd + +# now relative cd up one and down to bin +cd ../bin +pwd + +# now relative cd up one level +cd .. +pwd +``` + + /Users/plewis/geog0111/images + /Users/plewis/geog0111/bin + /Users/plewis/geog0111 + + +### Create and delete a file `touch` `cat` `rm` + +If a file doesn't exist, we can create a zero-sized file using `touch`: + + +```bash +%%bash +touch work/newfile.dat work/newerfile.dat +ls -l work/n* +``` + + -rw-r--r-- 1 plewis staff 0 7 Sep 15:11 work/newerfile.dat + -rw-r--r-- 1 plewis staff 0 7 Sep 15:11 work/newfile.dat + + +If it already exists, `touch` just updates the access and modification times. + +We can use the command `cat` to create a text file, for example: + + +```bash +%%bash + +# text between the next line and the +# End Of File (EOF) marker will be saved +# to the file work/newererfile.dat +# the symbols << and > involve +# redirection +cat << EOF > work/newererfile.dat + +# this will go into the file +hello world - this is some text in a file + +EOF + +# ls -l to see what we have: 73 Bytes here +ls -lh work/n* +``` + + -rw-r--r-- 1 plewis staff 73B 7 Sep 15:11 work/newererfile.dat + -rw-r--r-- 1 plewis staff 0B 7 Sep 15:11 work/newerfile.dat + -rw-r--r-- 1 plewis staff 0B 7 Sep 15:11 work/newfile.dat + + +We can also use `cat` to see what is in a file: + + +```bash +%%bash +cat work/newererfile.dat +``` + + + # this will go into the file + hello world - this is some text in a file + + + +We can use the command `rm` to delete a file: + + +```bash +%%bash +rm work/newfile.dat +ls -lh work/n* + +# now tidy up +rm work/n* +``` + + -rw-r--r-- 1 plewis staff 73B 7 Sep 15:11 work/newererfile.dat + -rw-r--r-- 1 plewis staff 0B 7 Sep 15:11 work/newerfile.dat + + +### Creating from JupyterLab + +If you are using this notebook in JupyterLab, go to the launcher tab and you should see various tools that you can launch: + +![JupyterLab tools](images/jl.png) + +Among these you will 
see 'text file'. Launch a text file, write your Python code into the file, and save it (`File -> Save As`) to the Python file name you want in your `work` directory (e.g. `work/test.py`). + +Alternatively, use the menu item `File -> New -> Text File` to open a new text file. + +### Exercise + +* Create a file `work/newfile.dat` using touch and check the new file size. +* Create a file `work/newfile.dat` using cat and check the new file size. +* Use the menu item `File -> Open` to edit the file you have created and print the new file size +* Use `cat` to show the new file content +* delete the file + +## Unix + +### Exercise + +Using the `unix` commands and ideas from above: + +* show a listing of the files in the relative directory `geog0111` that start with the letter `f` +* interpret the file permissions and sizes of the files in there + +## Summary + +In this section, we have learned the following `unix` commands and symbols: + +| cmd | meaning | example use | +|---|---|---| +| `~` | twiddle / tilde - home | `cd ~/geog0111` | +| `.` | dot - current directory | `cd .` | +| `*` | wildcard | `ls R*` | +| `cd` | change directory | `cd ~/geog0111` | +| `pwd` | print working directory | `pwd` | +| `ls` | list | `ls README.md` | +| `ls -l` | long list | `ls -l README.md` | +| `ls -lh` | human-readable long list | `ls -lh README.md` | +| `chmod` | change mode (permissions) | `chmod 644 README.md` | +| `touch` | create a zero-size file if it doesn't exist, else update its access and modification times | `touch README.md` | +| `rm` | remove | `rm work/n*` | +| `755` | `rwxr-xr-x` | `chmod 755 bin/*` | +| `644` | `rw-r--r--` | `chmod 644 README.md` | + +We have seen that we can use the cell magic `%%bash` or `!` to use `unix` commands in Python code cells in a notebook. This is a very basic introduction to unix, but it will allow you to make better use of the operating system and these notebooks. 
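Several of the commands in the table have close counterparts in Python's standard library. As a rough sketch, assuming a unix-like system, the following uses `pathlib` inside a temporary directory (so that nothing in `work` is changed) to mirror `touch`, `chmod`, `ls` with a wildcard, and `rm`. The file name `newfile.dat` follows the examples above; the temporary directory and the variable names are our own additions.

```python
import stat
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmpdir:
    newfile = Path(tmpdir) / "newfile.dat"

    # touch newfile.dat : create a zero-sized file
    newfile.touch()
    size_after_touch = newfile.stat().st_size

    # chmod 755 newfile.dat
    newfile.chmod(0o755)
    mode_755 = stat.filemode(newfile.stat().st_mode)

    # chmod 644 newfile.dat
    newfile.chmod(0o644)
    mode_644 = stat.filemode(newfile.stat().st_mode)

    # ls n* : list the files whose names start with n
    listing = sorted(p.name for p in Path(tmpdir).glob("n*"))

    # rm newfile.dat
    newfile.unlink()
    exists_after_rm = newfile.exists()

print(size_after_touch, mode_755, mode_644, listing, exists_after_rm)
```

The octal numbers are written with the prefix `0o` in Python, so `0o755` and `0o644` correspond to the `755` and `644` used with `chmod` in the shell.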
diff --git a/docs/002_Unix_answers 2.md b/docs/002_Unix_answers 2.md new file mode 100644 index 00000000..b128dbde --- /dev/null +++ b/docs/002_Unix_answers 2.md @@ -0,0 +1,96 @@ +# 002 Some basic UNIX : Answers to exercises + +### Exercise + +* Create a file `work/newfile.dat` using touch and check the new file size. +* Create a file `work/newfile.dat` using cat and check the new file size. +* Use the menu item `File -> Open` to edit the file you have created and print the new file size +* Use `cat` to show the new file content +* delete the file + + +```bash +%%bash +# ANSWER +# Create a file `work/newfile.dat` using touch and check the new file size. +touch work/newfile.dat +ls -l work/newfile.dat +``` + + -rw-r--r-- 1 plewis staff 73 7 Sep 15:08 work/newfile.dat + + + +```bash +%%bash +# ANSWER +# Create a file `work/newfile.dat` using cat and check the new file size. +cat << EOF > work/newfile.dat + +# this will go into the file +hello world - this is some text in a file + +EOF +ls -l work/newfile.dat +``` + + -rw-r--r-- 1 plewis staff 73 7 Sep 15:08 work/newfile.dat + + + +```bash +%%bash +# ANSWER +# Use the menu item File -> Open to edit the +# file you have created and print the new file size +# --> do interactively <-- +ls -l work/newfile.dat + +# Use cat to show the new file content +cat work/newfile.dat +``` + + -rw-r--r-- 1 plewis staff 73 7 Sep 15:08 work/newfile.dat + + # this will go into the file + hello world - this is some text in a file + + + + +```bash +%%bash +# ANSWER +# delete the file +rm work/newfile.dat +``` + +## Unix + +### Exercise + +Using the `unix` commands and ideas from above: + +* show a listing of the files in the relative directory `geog0111` that start with the letter `f` +* interpret the file permissions and sizes of the files in there + + +```bash +%%bash +# ANSWER +# show a listing of the files in the relative +# directory geog0111 that start with the letter f +# so +# geog0111/f* +ls -lh geog0111/f* + +# interpret the file 
permissions and sizes of the files in there +# the file sizes are 2.2KB, 4.3KB and 1.9KB respectively +# the file permissions are all 644, so, read and write for the user, +# and only read for others +``` + + -rw-r--r-- 1 plewis staff 2.2K 6 Sep 22:34 geog0111/fire_practical_model.py + -rw-r--r-- 1 plewis staff 4.3K 6 Sep 22:34 geog0111/fire_practical_satellite.py + -rw-r--r-- 1 plewis staff 1.9K 6 Sep 22:34 geog0111/fire_practical_telecon.py + diff --git a/docs/003_Local_Install 2.md b/docs/003_Local_Install 2.md new file mode 100644 index 00000000..66c138bc --- /dev/null +++ b/docs/003_Local_Install 2.md @@ -0,0 +1,87 @@ +# 003 Full native installation of software and notes + + +## Introduction + +### Purpose of these notes + +These notes describe how to set up a full installation of the software for this course on your computer. You will need access to the internet, both to download software, and to a more limited extent during the course, to download data. We can call this installation a 'full native' installation, as we will be installing all of the software you need to run the course directly. Other options for running include: + +* remote running on UCL notebooks +* local installation via Docker + +These notes cover installations on: + +* Windows 10 +* Mac OS X +* Linux + +### Sufficient free disk space + +You will need to have sufficient free disk space on your computer if you want to run this course locally. We suggest that you should have at least **128 GB** of free space to run this comfortably. If you do not, you will need to free up space on your computer. + + +## Windows 10 + +### Windows 10 Student Edition 1909 or higher + +To do a full installation of these notes on your Windows 10 computer, you need to be able to install software that doesn't come from the Microsoft Store. To do this, you should first check what version of Windows you are running. + +At the prompt at the bottom left of your screen (`Type here to search`), type `About your PC`. 
This will bring up the settings window, in the `About` section. + +Scroll down to the section `Windows specifications` and note the `Edition` field. If this says `Home edition`, then you will need to upgrade your version of Windows 10. It may instead say `Pro` or `Enterprise`. These are probably ok, but if it says anything else we recommend you use `Windows 10 Education` edition, version `1909` or later (e.g. `2004`). Even if you use `Pro` or `Enterprise`, you should update your system to at least version `1909`. + +To update, navigate in a browser to the [UCL software database](http://swdb.ucl.ac.uk/?filter=windows%2010), entering your UCL login and password as prompted. Look under the `Downloads` tab, for `Windows 10 1909` or `Windows 10 2004`. You will most likely want to use 64-bit, rather than 32-bit versions. Follow the instructions to upgrade to the Student Edition. This will probably only involve installing a new software key, but you should consider backing up any important files from your system before doing this. + +### Required Software + +You will need to install the following software: + +* [Anaconda Python](https://docs.anaconda.com/anaconda/install/windows/) +* [GitHub Desktop](https://desktop.github.com/) + +Follow the links above to install these before going any further. If you hit a problem, follow any troubleshooting information provided, or uninstall, then try to install again. Follow recommendations about any options. Set the local GitHub directory to be `Documents\GitHub`. **Do not proceed without having these properly installed.** + +### Install the notes + +Launch GitHub Desktop. If you have a [github account](https://github.com/join?source=login), use `File -> Options` to sign in to your account. + +To install these notes, use the menu `File -> Clone repository`. Enter `UCL-EO/geog0111` as the github repository to clone, and select `Clone`. 
+ +### Install the environment + +Type `Anaconda Powershell Prompt` in the `Type here to search` box at the lower left of your screen, and open the app. You should *not* need to do this as Administrator, so *do not* select that option if you see it. + +This will bring up a shell terminal. In the terminal, assuming you put the `geog0111` repository in `.\Documents\GitHub\geog0111`, type: + + cd .\Documents\GitHub\geog0111 + +and hit `<return>` to run the command. This will change the directory (folder) you are in, in the terminal, to where we have installed the course notes. + +![cd](images/windows-cd.png) + +Next, we run a script to install all of the libraries we need: + + bash -i .\bin\set_up_system.sh + +and hit `<return>` to run the command. + +![bash](images/windows-setup.png) + +This may take some time to run (tens of minutes) as it needs to examine your current setup and download libraries. You should see activity in the window, such as: + +![running](images/windows-setup-running.png) + +and + +![running 2](images/windows-setup-running-2.png) + +but eventually it should + + + + + +```python + +``` diff --git a/docs/005_Help 2.md b/docs/005_Help 2.md new file mode 100644 index 00000000..27d1fa70 --- /dev/null +++ b/docs/005_Help 2.md @@ -0,0 +1,508 @@ +# 005 Help + +## Introduction + +### Purpose + +In this notebook, we will learn how to get useful information on python commands using `help()` and associated methods. + +We will use `help(list)` as an example to learn about the class `list` from the help material. You are not expected to learn everything about `lists` here, as we will return to it later in the course. But you should find this useful in learning how to learn. + +We will learn how completion can help us understand our options. + +We will learn how to access some on-line resources. 
+ +### Prerequisites + +You will need some understanding of the following: + +* [001 Using Notebooks](001_Notebook_use.md) + + +We will use some technical vocabulary that you should familiarize yourself: + +* [functions and method](https://www.tutorialspoint.com/difference-between-method-and-function-in-python) +* [list](https://www.w3schools.com/python/python_lists.asp) +* [class](https://docs.python.org/3/tutorial/classes.html) +* [in place](https://www.geeksforgeeks.org/inplace-vs-standard-operators-python/) +* [Completion](#Completion) + +### Timing + +The session should take around 30 minutes. + +## Help Method + +### help() + +You can get help on an object using the `help()` method. This will return a full manual page of the class documentation. You need to gain some experience in reading these and understanding some of the terminology. + + + +```python +#the method help() +help(list) +``` + + Help on class list in module builtins: + + class list(object) + | list(iterable=(), /) + | + | Built-in mutable sequence. + | + | If no argument is given, the constructor creates a new empty list. + | The argument must be an iterable if specified. + | + | Methods defined here: + | + | __add__(self, value, /) + | Return self+value. + | + | __contains__(self, key, /) + | Return key in self. + | + | __delitem__(self, key, /) + | Delete self[key]. + | + | __eq__(self, value, /) + | Return self==value. + | + | __ge__(self, value, /) + | Return self>=value. + | + | __getattribute__(self, name, /) + | Return getattr(self, name). + | + | __getitem__(...) + | x.__getitem__(y) <==> x[y] + | + | __gt__(self, value, /) + | Return self>value. + | + | __iadd__(self, value, /) + | Implement self+=value. + | + | __imul__(self, value, /) + | Implement self*=value. + | + | __init__(self, /, *args, **kwargs) + | Initialize self. See help(type(self)) for accurate signature. + | + | __iter__(self, /) + | Implement iter(self). + | + | __le__(self, value, /) + | Return self<=value. 
+ | + | __len__(self, /) + | Return len(self). + | + | __lt__(self, value, /) + | Return self list of strings + + If called without an argument, return the names in the current scope. + Else, return an alphabetized list of names comprising (some of) the attributes + of the given object, and of attributes reachable from it. + If the object supplies a method named __dir__, it will be used; otherwise + the default dir() logic is used and returns: + for a module object: the module's attributes. + for a class object: its attributes, and recursively the attributes + of its bases. + for any other object: its attributes, its class's attributes, and + recursively the attributes of its class's base classes. + + + +If we use `dir()` with no arguments, we will see a list of the names of variables defined in this notebook: + + +```python +print(dir()) +``` + + ['In', 'Out', '_', '__', '___', '__builtin__', '__builtins__', '__doc__', '__loader__', '__name__', '__package__', '__spec__', '_dh', '_i', '_i1', '_i10', '_i2', '_i3', '_i4', '_i5', '_i6', '_i7', '_i8', '_i9', '_ih', '_ii', '_iii', '_oh', 'alist', 'blist', 'clist', 'exit', 'get_ipython', 'info', 'quit'] + + +You might notice that we see the variable names `alist` and `blist` in here, as we created them in an earlier cell. + +### `locals()` + +The function `locals()` is similar to `dir()` used as above, but it contains the *values* that the variables are set to. So, we recall that there is a variable with the name `alist` from above. + + + +```python +help(locals) +``` + + Help on built-in function locals in module builtins: + + locals() + Return a dictionary containing the current scope's local variables. + + NOTE: Whether or not updates to this dictionary will affect name lookups in + the local scope and vice-versa is *implementation dependent* and not + covered by any backwards compatibility guarantees. 
+ + +We could of course just print this as above via: + + +```python +print(alist) +``` + + [-1, 0, 2, 1, 7, -3, 5, 4] + + +but sometimes, we want to access the variable through its name, given as the string `'alist'`. In this case, we can type: + + +```python +print(locals()['alist']) +``` + + [-1, 0, 2, 1, 7, -3, 5, 4] + + +Use of `dir()` and `locals()` in this way can be very useful if you want to see what variables have been set to. + +#### Exercise + +* Create a code cell below and assign the variable `my_var` the value of `10` (hint: `my_var = 10`) +* Run `dir()` again and confirm that `my_var` now appears in the list + + +#### Exercise + +* Print the value of `my_var` using `print(my_var)` +* Print the value of `my_var` using `print(locals()['my_var'])` +* confirm that they give the same answer + +## Completion + + +### `dir(list)` + +If we type `dir(list)` you will see that it gives the list of methods we can use for the class `list`. You should notice that this is the same as the list of methods we saw when we used `help(list)`: + + +```python +print(dir(list)) +``` + + ['__add__', '__class__', '__contains__', '__delattr__', '__delitem__', '__dir__', '__doc__', '__eq__', '__format__', '__ge__', '__getattribute__', '__getitem__', '__gt__', '__hash__', '__iadd__', '__imul__', '__init__', '__init_subclass__', '__iter__', '__le__', '__len__', '__lt__', '__mul__', '__ne__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__reversed__', '__rmul__', '__setattr__', '__setitem__', '__sizeof__', '__str__', '__subclasshook__', 'append', 'clear', 'copy', 'count', 'extend', 'index', 'insert', 'pop', 'remove', 'reverse', 'sort'] + + +### Completion + +It is also useful to be able to see a list of the methods available in a class or, e.g., to find out which variables that are already declared start with `f`. + +Modern editors, [IDEs](https://en.wikipedia.org/wiki/Integrated_development_environment) and related tools often have features for completion of filenames, variables etc. 
This can help you minimise typing errors (especially for variables), and also help you to keep track of filenames, functions etc. + +The Jupyter notebooks we are using have this sort of completion feature. Exactly how it works depends on the browser and the server used, but it will usually involve using the `<tab>` key (either once or twice), and/or spacebar and/or hovering over the text. + +The information this gives is similar to what you get using `dir` as above. + +Try this out now, broadly following the guidelines below. Note down what works for you and get used to using it. + +If you are using a jupyter hub server (or similar) + + place the cursor after the `.` below + hit the `<tab>` key in this cell, rather than running it + Don't run this cell + + +```python +# Dont run this cell +list. +``` + + + File "", line 2 + list. + ^ + SyntaxError: invalid syntax + + + +Really, this is just using the fact that the `<tab>` key performs variable name completion. + +This means that if you e.g. have variables called `the_long_one` and `the_long_two` set: + + +```python +the_long_one = 1 +the_long_two = 2 +``` + +The next time you want to refer to one of these variables in code, you need only type as many letters as are needed to distinguish it from other variable names, then hit `<tab>` to complete the name as far as possible. + +#### Exercise + +* in the cell below, place the cursor after the letter `t` and hit `<tab>`. It should show you a list of things that begin with `t`. +* Use this to write the line of code `the_long_one = 1000` +* in the cell below, place the cursor after the letters `th` and hit `<tab>`. It should show you a list of things that begin with `th`. In this case it should just give you the options of `the_long_one` or `the_long_two`. +* If you hit `<tab>` again, the variable name will be completed as far as it can, here, up to `the_long_`. Use this to write the line of code `the_long_two = 2000` + + +```python +# do exercise here ... put the cursor after the t or th and +# use for completion. 
Dont run this cell +t +th +``` + + + --------------------------------------------------------------------------- + + NameError Traceback (most recent call last) + + in + 1 # do exercise here ... put the cursor after the t or th and + 2 # use for completion. Dont run this cell + ----> 3 t + 4 th + + + NameError: name 't' is not defined + + +### Learning new things + +Let's use that knowledge to learn something new: + +* Use online material from [https://www.w3schools.com](https://www.w3schools.com/python) or elsewhere to learn the basics of `for` loops. + +#### Exercise + +* Find help for the class `range` to understand how to use this to generate a sequence of integers from 10 to 1 in steps of -1 +* Use what you have learned to write a `for` loop below that counts backwards from 10 to 0 + +## Summary + +In this session, we have learned some different ways to access help on the operation and options for python commands. These include: `help()`, use of `?`, completion, and using online help. Practically, you may use one or more of these methods to find out how something works, or get some examples. + +You might notice that there are many online forums you can post to for advice on coding, and we mention [https://stackoverflow.com](https://stackoverflow.com) as an example. Remember that not all posts are equally useful: pay attention to comments from other users on any answer, as well as post votes. Do not look on `stackoverflow` until you have exhausted simpler help methods. You should *not* generally be posting on these in this course. You will find answers to all that you need in these notes or on existing online pages. You most certainly must *not* post on forums asking questions about any exercises you need to complete or work you need to submit. 
The course administrators may monitor this. + diff --git a/docs/005_Help_answers 2.md b/docs/005_Help_answers 2.md new file mode 100644 index 00000000..cc421f78 --- /dev/null +++ b/docs/005_Help_answers 2.md @@ -0,0 +1,223 @@ +# 005 Help : Answers to exercises + +#### Exercise + +* Read through the help information for list, above. +* In a new cell, create lists called `alist` and `blist`: + + alist = ['one','three','two'] + blist = ['four','six','zero'] + +* print the lists with: + + print(alist,blist) + +Using the help information, work out how to: + +* extend `alist` with `blist` to create `['one','three','two','four','six','zero']`. N.B. This is not quite the same as our use of `append()` above. +* sort the new `alist` into **alphabetical order** and print the results + + + +```python +# ANSWERS +alist = ['one', 'three', 'two'] +blist = ['four', 'six', 'zero'] +print(alist, blist) + +# join: +# extend(self, iterable, /) +# Extend list by appending elements from the iterable. +alist.extend(blist) +print("extended", alist) + +# sort +# sort(self, /, *, key=None, reverse=False) +# Stable sort *IN PLACE*. +alist.sort() +print("sorted", alist) +``` + + ['one', 'three', 'two'] ['four', 'six', 'zero'] + extended ['one', 'three', 'two', 'four', 'six', 'zero'] + sorted ['four', 'one', 'six', 'three', 'two', 'zero'] + + +#### Exercise + +* Create a code cell below +* Create a variable called `alist` containing a list of numbers (not in order) +* Print `alist` +* Create a variable called `blist` containing another list of numbers (not in order) +* Print `blist` +* Join the list `blist` to `alist` using the in-place method `.extend()` +* Print `alist` +* reverse `alist` +* print alist +* What advantages and disadvantages do you think in-place methods have? 
+ + +```python +# ANSWERS +# Create a variable called `alist` containing a list of numbers (not in order) +alist = [4,5,-3,7] +# Print `alist` +print(alist) + +# Create a variable called `blist` containing another list of numbers (not in order) +blist = [1,2,0,-1] +# Print `blist` +print(blist) + +# Join the list `blist` to `alist` using the in-place method `.extend()` +alist.extend(blist) +# Print `alist` +print(alist) + +# reverse `alist` +alist.reverse() +# print alist +print(alist) + +info=''' +What advantages and disadvantages do you think in-place methods have? + + Using in-place methods like sort and reverse is memory efficient: + we do not need to create new variables with the result of the sorting + or reversing etc. + The downside is that you need to be careful when using them: + Don't make the mistake of using the returned value, as this + will be None: the method changes the list itself and returns nothing. + As an example, we might think that + + alist.sort().reverse() + + would work, but it won't, because alist.sort() returns None, + so the second operation we attempt is None.reverse(), which + is meaningless and fails with an AttributeError. Rather, we must do: + + alist.sort() + alist.reverse() + + as separate operations + ''' +print(info) +``` + + [4, 5, -3, 7] + [1, 2, 0, -1] + [4, 5, -3, 7, 1, 2, 0, -1] + [-1, 0, 2, 1, 7, -3, 5, 4] + + What advantages and disadvantages do you think in-place methods have? + + Using in-place methods like sort and reverse is memory efficient: + we do not need to create new variables with the result of the sorting + or reversing etc. + The downside is that you need to be careful when using them: + Don't make the mistake of using the returned value, as this + will be None: the method changes the list itself and returns nothing. + As an example, we might think that + + alist.sort().reverse() + + would work, but it won't, because alist.sort() returns None, + so the second operation we attempt is None.reverse(), which + is meaningless and fails with an AttributeError. 
Rather, we must do: + + alist.sort() + alist.reverse() + + as separate operations + + + +#### Exercise + +* Create a code cell below and assign the variable `my_var` the value of `10` (hint: `my_var = 10`) +* Run `dir()` again and confirm that `my_var` now appears in the list + + + +```python +# ANSWER +# Create a code cell below and assign the variable my_var the value of 10 (hint: my_var = 10) +my_var = 10 + +# Run dir() again +print(dir()) + +# confirm that my_var now appears in the list +msg = ''' +I can see my_var in the list printed out +''' +print(msg) +``` + + ['In', 'Out', '_', '__', '___', '__builtin__', '__builtins__', '__doc__', '__loader__', '__name__', '__package__', '__spec__', '_dh', '_i', '_i1', '_i10', '_i11', '_i12', '_i13', '_i14', '_i2', '_i3', '_i4', '_i5', '_i6', '_i7', '_i8', '_i9', '_ih', '_ii', '_iii', '_oh', 'alist', 'blist', 'clist', 'exit', 'get_ipython', 'info', 'my_var', 'quit'] + + I can see my_var in the list printed out + + + +#### Exercise + +* Print the value of `my_var` using `print(my_var)` +* Print the value of `my_var` using `print(locals()['my_var'])` +* confirm that they give the same answer + + +```python +# ANSWER +# Print the value of my_var using print(my_var) +print(my_var) + +# Print the value of my_var using print(locals()['my_var']) +print(locals()['my_var']) + +# confirm that they give the same answer +msg = ''' +I can see they are the same +''' +print(msg) +``` + + 10 + 10 + + I can see they are the same + + + +### Learning new things + +Let's use that knowledge to learn something new: + +* Use online material from [https://www.w3schools.com](https://www.w3schools.com/python) or elsewhere to learn the basics of `for` loops. 
+ +#### Exercise + +* Find help for the class `range` to understand how to use this to generate a sequence of integers from 10 to 1 in steps of -1 +* Use what you have learned to write a `for` loop below that counts backwards from 10 to 0 + + +```python +# ANSWER +# Use what you have learned to write a +# `for` loop below that counts backwards from 10 to 0 + +# the stop value is not included in the sequence, +# so we use -1 as the stop value to make the last value 0 +for i in range(10,-1,-1): + print(i) +``` + + 10 + 9 + 8 + 7 + 6 + 5 + 4 + 3 + 2 + 1 + 0 + diff --git a/docs/010_Python_Introduction 2.md b/docs/010_Python_Introduction 2.md new file mode 100644 index 00000000..ab21c7c4 --- /dev/null +++ b/docs/010_Python_Introduction 2.md @@ -0,0 +1,229 @@ +# 010 Introduction to Python + +## Introduction + +[Python](http://www.python.org/) is a high level programming language that is freely available, relatively easy to learn and portable across different computing systems. In Python, you can rapidly develop solutions for the sorts of problems you might need to solve in your MSc courses and in the world beyond. Code written in Python is also easy to maintain, is (or should be) self-documented, and can easily be linked to code written in other languages. + +Relevant features include: + +- it is automatically compiled and executed +- code is portable provided you have the appropriate Python modules. +- for compute intensive tasks, you can easily make calls to methods written in (faster) lower-level languages such as C or FORTRAN +- there is an active user and development community, which means that new capabilities appear over time and there are many existing extensions and enhancements easily available to you. + +For further background on Python, look over the material on [Advanced Scientific Programming in Python](https://python.g-node.org/wiki/schedule) and/or the [software-carpentry.org](http://software-carpentry.org/v3/py01.html) and [python.org](http://www.python.org/) web sites. 
+ + +### Purpose + +In this section we will learn some of the fundamental concepts in Python concerning variables, as well as writing comments and the use of the function `print()` and newline and tab characters. + +### Prerequisites + +You will need some understanding of the following: + +* [001 Using Notebooks](001_Notebook_use.md) +* [005 Getting help](005_Help.md) + +Remember that you can 'run' the code in a code block using the 'run' widget (above) or hitting the keys ('typing') `<shift>` and `<return>` at the same time. + +### Timing + +The session should take around 15 minutes. + +## Some basics + +### Comments + +Comments are statements ignored by the language interpreter. + +Any text after a `#` in a *code block* is a comment. + + +#### Exercise +* Try running the code block below +* Explain what happened ('what the computer did') + +### `print()` + + + +```python +help(print) +``` + + Help on built-in function print in module builtins: + + print(...) + print(value, ..., sep=' ', end='\n', file=sys.stdout, flush=False) + + Prints the values to a stream, or to sys.stdout by default. + Optional keyword arguments: + file: a file-like object (stream); defaults to the current sys.stdout. + sep: string inserted between values, default a space. + end: string appended after the last value, default a newline. + flush: whether to forcibly flush the stream. + + + +To print some value (by default, to the terminal you are using, known as the standard output `stdout`), use the `print(...)` function. + + + + +```python +# For example, to print the string 'hello world': +print('hello world') + +# to print the two strings 'hello' and 'world', +# separated by the default space: +print('hello', 'world') +``` + + hello world + hello world + + +#### Exercise + +* Insert a new cell below here +* Print out the string `Today I am learning Python`. 
+ +### newline and tab + +We can gain more control over our printing by understanding some special characters we use in print formatting: + + newline \n + tab \t + +When we specify these characters in a print statement, they have the effect of starting text on the following line, and aligning text to the next tab location, respectively. These are concepts you will be familiar with from word processing, although you may not have thought about them explicitly. + +Any time we place these characters in a string that we print out, they will affect the formatting of our printed statement: + + +```python +# For example, to print the string 'hello world' +# with a simple space +print('hello world') + +# with a newline in the middle +print('hello\nworld') + +# with a tab in the middle +print('hello\tworld') +``` + + hello world + hello + world + hello world + + +#### Exercise + +* Insert a new cell below here +* print a string `"all the world's a stage and all the men and women merely players"` +* print this same string, but with each word on a new line +* print this same string with two columns of words, for as many lines as needed + +## Variables and Values + +### Variables and values + +The idea of **variables** is fundamental to any programming. + +You can think of this as the *name* of *something*, so it is a way of allowing us to refer to some object in the language. A related idea we will find useful is to think of the variable name as a **key**. What the variable *is* set to is called its **value**. + +Putting these ideas together, we can think of the variable name and its value as a `key: value` pair: + + key: value + +We can say that the `value` is **assigned to** the `key`. + +**Remember: the `key` is the name of the variable, the `value` is what is stored in the variable.** + +So let's start with a variable we will call (*declare to be*) `my_store`. 
+ +We will give a *value* of the string `'one'` to this variable: + + +```python +# assign the value 'one' to the variable (key) my_store +my_store = 'one' + +# Print the value of my_store +print(my_store) +``` + + one + + +#### Exercise + +* Insert a new cell below here +* set a variable called `message` to contain the string `hello world` +* print the value of the variable `message` + +### Symbol names and conventions + +Symbol names, such as those used for variables, in Python can contain the usual character set `a-z`, `A-Z`, `0-9`, as well as `_`. + +Symbol names cannot start with a number. + +Normal variables start with a letter. Those starting and ending with a double underscore `__` have [special meaning](https://docs.python.org/3.8/reference/lexical_analysis.html#reserved-classes-of-identifiers) and should only be used in those special contexts (e.g. `__doc__`, or `__main__`). + +The convention is that variables start with a lower case character and [classes](https://docs.python.org/3/tutorial/classes.html) start with capitals. e.g.: + + my_var = 10 + + class ClassName: + + . + . + . + + +### Invalid names + +The following are *not* valid in names and will result in an error: + + * characters that Python interprets as operators or punctuation (`SyntaxError: invalid syntax`), including the comma `,` and: + + +, -, *, **, =, $, ! etc. + + * some extended characters (`SyntaxError: invalid character in identifier`) such as emojis + + 😀 etc. + +Note that Python 3 does accept many non-ASCII letters in names (e.g. `我在这里`), but these are best avoided for portability. + +[Reserved keywords](https://docs.python.org/3.8/reference/lexical_analysis.html#keywords) cannot be used as variable names: + + False class finally is return + None continue for lambda try + True def from nonlocal while + and del global not with + as elif if or yield + assert else import pass + break except in raise + +as these have particular meanings in Python syntax. + +All of these can obviously be used as `values` in strings, just not as `key` names. 
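
The distinction between a reserved word used as a *value* and as a *name* can be checked directly. The following is a minimal sketch using the standard-library `keyword` module and the string method `isidentifier()`; the names `my_word` and `2fast` are illustrative choices that do not come from these notes.

```python
import keyword

# A reserved word is fine as a string *value* ...
my_word = 'class'
print(my_word)

# ... but the keyword module confirms that it is reserved,
# so it could not be used as a variable *name* (key)
print('class is a keyword:', keyword.iskeyword('class'))
print('my_word is a keyword:', keyword.iskeyword('my_word'))

# str.isidentifier() checks the character rules only:
# it does not check whether the name is a reserved keyword
print('my_var is a valid name:', 'my_var'.isidentifier())
print('2fast is a valid name:', '2fast'.isidentifier())
```

A name is safe to use as a variable only if it passes both checks: it must be a valid identifier and must not be a keyword.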
+ +See [https://docs.python.org/3.4/reference/lexical_analysis.html](https://docs.python.org/3.4/reference/lexical_analysis.html) for further details. + +#### Exercise + +* Make a code cell below +* declare the variable `dash='\n----------'` and print it +* declare your own variables to contain the following values, trying to use a range of allowed names + + 1, 2, 'one', 'hello world', '1\n2\n3\t 4 5 6\n😀, 我在这里' + +* print the variables to see if they contain what you expect, followed in each instance by `dash` (to space the answers out) + +## Summary + +In this section, you have had an introduction to the Python programming language, running in a [`jupyter notebook`](http://jupyter.org) environment. + +You have seen how to write comments in code, how to form `print` statements, `\n` and `\t` and basic concepts of variables and values. + +We have outlined some rules and conventions for symbol names. diff --git a/docs/010_Python_Introduction_answers 2.md b/docs/010_Python_Introduction_answers 2.md new file mode 100644 index 00000000..b167ed38 --- /dev/null +++ b/docs/010_Python_Introduction_answers 2.md @@ -0,0 +1,152 @@ +# 010 Introduction to Python : Answers to exercises + + +#### Exercise +* Try running the code block below +* Explain what happened ('what the computer did') + +**ANSWER** + +Nothing 'apparently' happened, but really, the code block was interpreted as a set of Python commands and executed. As there is only a comment, there was no output. + +#### Exercise + +* Insert a new cell below here +* Print out the string `Today I am learning Python`. 
+ + +```python +# ANSWER +print('Today I am learning Python') +``` + + Today I am learning Python + + +#### Exercise + +* Insert a new cell below here +* print a string `"all the world's a stage and all the men and women merely players"` +* print this same string, but with each word on a new line +* print this same string with two columns of words, for as many lines as needed + + +```python +#ANSWER + +# print a string "all the world's a stage" +print("all the world's a stage and all the men and women merely players") +``` + + all the world's a stage and all the men and women merely players + + + +```python +#ANSWER + +# print this same string, but with each word on a new line +print("all\nthe\nworld's\na\nstage\nand\nall\nthe\nmen\nand\nwomen\nmerely\nplayers") +``` + + all + the + world's + a + stage + and + all + the + men + and + women + merely + players + + + +```python +# ANSWER +# print this same string with two columns of words, for as many lines as needed +# This needs alternating newline and tab +print("all\tthe\nworld's\ta\nstage\tand\nall\tthe\nmen\tand\nwomen\tmerely\nplayers") +``` + + all the + world's a + stage and + all the + men and + women merely + players + + +#### Exercise + +* Insert a new cell below here +* set a variable called `message` to contain the string `hello world` +* print the value of the variable `message` + + +```python +# ANSWER + +message = 'hello world' +print(message) +``` + + hello world + + +#### Exercise + +* Make a code cell below +* declare the variable `dash='\n----------'` and print it +* declare your own variables to contain the following values, trying to use a range of allowed names + + 1, 2, 'one', 'hello world', '1\n2\n3\t 4 5 6\n😀, 我在这里' + +* print the variables to see if they contain what you expect, followed in each instance by `dash` (to space the answers out) + + +```python +# ANSWER + +# Make a code cell below +# declare the variable dash='\n----------' and print it +dash='\n----------' +print(dash) + +# 
declare your own variables to contain the following values, trying to use a range of allowed names +# 1, 2, 'one', 'hello world', '1\n2\n3\t 4 5 6\n😀, 我在这里' + +avar = 1 +bvar = 2 +one = 'one' +Hello = 'hello world' +string_thing = '1\n2\n3\t 4 5 6\n😀, 我在这里' + +#print the variables to see if they contain what you expect, followed in each instance by dash (to space the answers out) +print(avar,dash) +print(bvar,dash) +print(one,dash) +print(Hello,dash) +print(string_thing,dash) +``` + + + ---------- + 1 + ---------- + 2 + ---------- + one + ---------- + hello world + ---------- + 1 + 2 + 3 4 5 6 + 😀, 我在这里 + ---------- + diff --git a/docs/011_Python_data_types 2.md b/docs/011_Python_data_types 2.md new file mode 100644 index 00000000..2b19e5d3 --- /dev/null +++ b/docs/011_Python_data_types 2.md @@ -0,0 +1,445 @@ +# 011 Python data types + +## Introduction + + +### Purpose + +In this section we will learn some of the fundamental data types in Python (`int`, `float`, `str`, `bool`), how to convert between data types, and use of the `type()` function. + + +### Prerequisites + +You will need some understanding of the following: + +* [001 Using Notebooks](001_Notebook_use.md) +* [005 Getting help](005_Help.md) +* [010 Variables, comments and print()](010_Python_Introduction.md) + +Remember that you can 'run' the code in a code block using the 'run' widget (above) or hitting the keys ('typing') and at the same time. + +### Timing + +The session should take around 30 minutes. + +## Data types + +### Data types: `str` + +Recall how we can print out a message by first storing the text in a variable: + + +```python +# set a variable called message to contain the string hello world +message = 'hello world' + +# print the value of the variable message +print(message) +``` + + hello world + + + +Above, we set the variable to be a string `str` type, because we wanted to use it to represent a string. 
+ +In a string, each character is represented by an [ASCII](http://www.asciitable.com) code. + +So the [string](https://en.wikibooks.org/wiki/Python_Programming/Text) `one` is built up of `o` + `n` + `e`, represented by the ASCII codes `111`, `110` and `101` respectively. + + +#### Exercise + +* If the ASCII code for `e` is `101` and the code for `n` is `110`, what is the code for `a`? + +### `len()` + +We can find the length of the string using the function `len()`, for example: + + +```python +# set a variable called message to contain the string hello world +message = 'hello world' + +# print the value of the variable message +print(message,len(message)) +``` + + hello world 11 + + +#### Exercise + +* in a code cell below, create a variable called `name` and set it to your name +* print the string name, and its length +* comment on why the length is the value you find + +### `type()` + +In a computing language, the *sort of thing* the variable can be set to is called its **data type**. In Python, we can access this with the function `type()`: + + + + +```python +# assign the value 'one' to the variable (key) my_store +my_store = 'one' + +# Print the value of my_store +print('this has type', type(my_store)) +``` + + this has type <class 'str'> + + +#### Exercise + +* insert a new cell below here +* set a variable called message to contain the string `hello world` +* print the value and data type of the variable message + +### Data types: `float` + +Another fundamental data type is `float`, used to store decimal numbers such as `120.23`. + +Not surprisingly, we can use floating point numbers (and other number representations) to do arithmetic. We can use `print()` similarly to above to print a floating point value. + +Sometimes, such as for very large or very small floating point values, we use [scientific notation](https://en.wikibooks.org/wiki/A-level_Computing/AQA/Paper_2/Fundamentals_of_data_representation/Floating_point_numbers), e.g. 
to represent Planck's constant: + +$$h = 6.62607015 \times 10^{-34}\ J \cdot s$$ + +we would not want to have to write out more than thirty zeros after the decimal point. Instead, we use the **mantissa** $6.62607015$ and **exponent** $-34$ directly: + + h = 6.62607015e-34 + + +You will sometimes see float numbers represented in this way. It is of additional interest because it is related more closely to how floating point numbers are [stored on a computer](https://users.cs.fiu.edu/~downeyt/cop2400/float.htm#:~:text=Eight%20digits%20are%20used%20to,means%20negative%2C%200%20means%20positive.). + +As an example of floating point arithmetic, let us consider the energy associated with a photon of a given wavelength $\lambda$ (nm) using the [Planck-Einstein equation](https://web.archive.org/web/20160712123152/http://pveducation.org/pvcdrom/2-properties-sunlight/energy-photon): + +$$ E = \frac{hc}{\lambda}$$ + +with: + +* $h$ as above, $=6.62607015 \times 10^{-34}\ J \cdot s$ +* $c$ the speed of light $= 2.99792458 \times 10^8\ m/s$ +* $E$ the photon energy (in $J$) + +Given light with a wavelength of 1024 nanometers ($nm$), calculate the energy in $J$. + +First, since + +$$1\ nm = 1 \times 10^{-9}\ m$$ + +we calculate the wavelength in $m$: + + l_m = l_nm * 1e-9 + +Then, implement the Planck-Einstein equation: + + E = h * c / l_m + + + +```python +# values of c and h +# in scientific notation +c = 2.99792458e8 # m/s +h = 6.62607015e-34 # J s + +# wavelength in nm +l_nm = 1024.0 # nm + +# wavelength in m +l_m = l_nm * 1e-9 # m + +# Planck-Einstein in J +E_J = h * c / l_m # J + +print('Photon of wavelength', l_nm, 'nm') +print('has an energy of', E_J, 'J') +``` + + Photon of wavelength 1024.0 nm + has an energy of 1.9398885323720004e-19 J + + + We can compare the value of energy we get in $J$ with that using a [web calculator](http://www.calctool.org/CALC/other/converters/e_of_photon) and confirm the value of `1.93989e-19` for Near Infrared light (`1024` nm). 
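
When comparing against a rounded value like this, it can help to print the result with fewer digits. The following is a minimal sketch, assuming the same values of `h`, `c` and wavelength as above; the variable name `E_text` and the format specification `.5e` (scientific notation with 5 digits after the decimal point) are illustrative choices, not part of the original notes.

```python
# values as used above
c = 2.99792458e8       # speed of light, m/s
h = 6.62607015e-34     # Planck's constant, J s
l_m = 1024.0 * 1e-9    # wavelength of 1024 nm, in m

# Planck-Einstein equation, energy in J
E_J = h * c / l_m

# format in scientific notation: mantissa with
# 5 digits after the decimal point, then the exponent
E_text = f'{E_J:.5e}'
print('energy to 6 significant figures:', E_text, 'J')
```

The formatting only changes how the number is displayed: the value stored in `E_J` keeps its full precision.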
+ +#### Exercise + +Since the energy level expressed in $J$ is quite small, we might more conveniently express it in units of eV. Given that: + +$$ + 1\ Electron\ volt\ (eV) = 1.602176565 \times 10^{-19} J +$$ + +* Insert a new cell below here +* calculate the energy associated with a blue photon at 450 nm, in eV +* confirm your answer using a [web calculator](http://www.calctool.org/CALC/other/converters/e_of_photon) + +### Data types: `int` + +Another fundamental data type is `int`, used to store integer (whole) numbers (in base 10). We often use them for counting and similar tasks. + + +```python +i = 0 +print(i,'this has type', type(i)) + +# increment i by 1 +# same as i = i + 1 +i += 1 +print('increment i:',i) +i += 1 +print('increment i:',i) +``` + + 0 this has type <class 'int'> + increment i: 1 + increment i: 2 + + +Not surprisingly, we can also use integers to do all sorts of arithmetic. Because of potential rounding issues though, we have to pay a little attention to whether we want the result of division to remain an integer or become a floating point number. + +We can use `print()` similarly to above to print an integer value. + + +```python +# set the variables x, m and c to integer +# values +x = 10 +m = 20 +c = 6 + +# calculate y from the formula +y = m * x + c + +# print the value of y +print('y =', y) +``` + + y = 206 + + +We have seen examples of addition `+` and multiplication `*`. We use `x ** y` to represent `x` to the power of `y`. For division, we use `//` to enforce integer division ([floor division](https://python-reference.readthedocs.io/en/latest/docs/operators/floor_division.html)). + + + +#### Exercise + +* insert a new cell below here +* using integer arithmetic, print the result of: + - 2 to the power of 8 + - 1024 divided by 2 +* set a variable called `x` to the result of 7 (floor) divided by 3. 
+ - print the value of `x`, and confirm its data type is `int` + +### Data types: `bool` + +The last fundamental data type we will deal with here is the Boolean or 'logical' type `bool`. Here, a variable can represent the value of `True` (equivalent to `1`) or `False` (equivalent to `0`). + +Booleans have a great many uses when we apply logic in code. + + +```python +# examples of bool type +is_set = True +is_ready = False +``` + +#### Exercise + +* Insert a new cell below here +* Set a variable called `is_class_today` to the value `True` +* print the variable name, its value, and its data type + +### Logical Operators: `not`, `and`, `or` + +Logical operators combine boolean variables. Recall from above: + + +```python +print(type(True), type(False)) +``` + + <class 'bool'> <class 'bool'> + + +The three main logical operators you will use are: + + not, and, or + +The impact of the `not` operator should be straightforward to understand, though we can first write it in a 'truth table': + + + +| A | not A | +|:---:|:---:| +| T | F | +| F | T | + + +```python +print('not True is',not True) +print('not False is',not False) +``` + + not True is False + not False is True + + +#### Exercise + +* Insert a new cell below here +* write a statement to set a variable `x` to `True` and print the value of `x` and `not x` +* what does `not not x` give? Make sure you understand why + + +The operators `and` and `or` should also be quite straightforward to understand: they have the same meaning as in normal English. Note that `or` is 'inclusive' (so, read `A or B` as 'either A or B or both of them'). + + +```python +print('True and True is', True and True) +print('True and False is', True and False) +print('False and True is', False and True) +print('False and False is', False and False) +``` + + True and True is True + True and False is False + False and True is False + False and False is False + + +So, `A and B` is `True`, if and only if both `A` is `True` and `B` is `True`. 
Otherwise, it is `False`. + +We can represent this in a 'truth table': + + +| A | B | A and B | +|:---:|:---:|:---:| +| T | T | T | +| T | F | F | +| F | T | F | +| F | F | F | + + + +#### Exercise + +* draw a truth table *on some paper*, label the columns `A`, `B` and `A and B` and fill in the columns `A` and `B` as above +* without looking at the example above, write the value of `A and B` in the third column. +* draw another truth table *on some paper*, label the columns `A`, `B` and `A or B` and fill in the columns `A` and `B` as above +* write the value of `A or B` in the third column. + +If you are unsure, test the response using code, below: + +We can apply these principles to more complex compound statements. In building a truth table, we must state all of the possible permutations for the variables. + +For two variables (`A` and `B`) we had: + + +| A | B | +|:---:|:---:| +| T | T | +| T | F | +| F | T | +| F | F | + + + +Notice the pattern of alternating `T` and `F` in the columns. + +For three variables, the equivalent table is: + +| A | B | C | +|:---:|:---:|:---:| +| T | T | T | +| T | T | F | +| T | F | T | +| T | F | F | +| F | T | T | +| F | T | F | +| F | F | T | +| F | F | F | + + +Again, notice the alternating patterns in the columns so that we cover all permutations. + + + + +#### Exercise + +* Copy the 3 variable truth table from above onto paper +* fill out a column with `A and B` +* fill out a column with `((A and B) or C)` +* Try some other compound statements + +If you are unsure, or to check your answers, test the response using code, below. 
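One way to check a compound statement against every row of the three-variable table is to let Python generate the rows. This is only a sketch: it uses a `for` loop and `itertools.product` from the standard library, neither of which has been covered in these notes yet, and the compound statement `A and (B or C)` is just an example of "some other compound statement" that you can swap for your own.

```python
from itertools import product

# all permutations of True/False for three variables,
# in the same order as the three-variable table (T before F)
rows = list(product([True, False], repeat=3))

# evaluate an example compound statement for each row
results = []
print('A', 'B', 'C', 'A and (B or C)')
for A, B, C in rows:
    result = A and (B or C)
    results.append(result)
    print(A, B, C, result)
```

There are 2 choices for each of 3 variables, so the loop visits 2 * 2 * 2 = 8 rows, matching the table you copied onto paper.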
+ +## Conversion between data types + +You can explicitly convert between data types **where this makes sense** using: + + int() + float() + str() + bool() + + +```python +start_number = 1 +print("starting with",start_number) + +int_number = int(start_number) +print('int_number',int_number,type(int_number)) + +# now convert to float +float_number = float(start_number) +print('float_number',float_number,type(float_number)) + +# now convert to str +str_number = str(start_number) +print('str_number',str_number,type(str_number)) + +# now convert to bool +bool_number = bool(start_number) +print('bool_number',bool_number,type(bool_number)) + +``` + + starting with 1 + int_number 1 <class 'int'> + float_number 1.0 <class 'float'> + str_number 1 <class 'str'> + bool_number True <class 'bool'> + + +#### Exercise + +* insert a new cell below here +* copy the code in the cell above, set `start_number` to `0`, and run + * What are the boolean representations of `0` and `1`? +* What would happen if you set `start_number` to the string `'zero'`, and why? + +## Summary + +In this section, we have been introduced to the core data types in Python: + + int + float + str + bool + +and how to convert between them, where this is feasible: + + int() + float() + str() + bool() + +We have also learned the `type()` function to return the data type, and `len()` to find the length of a string. + +We should also know how to do logical combinations of boolean variables, and visualise this with truth tables. diff --git a/docs/011_Python_data_types_answers 2.md b/docs/011_Python_data_types_answers 2.md new file mode 100644 index 00000000..edb2fd25 --- /dev/null +++ b/docs/011_Python_data_types_answers 2.md @@ -0,0 +1,359 @@ +# 011 Python data types : Answers to exercises + +#### Exercise + +* If the ASCII code for `e` is `101` and the code for `n` is `110`, what is the code for `a`? 
+ +#### Exercise + +* in a code cell below, create a variable called `name` and set it to your name +* print the string name, and its length +* comment on why the length is the value you find + + +```python +# ANSWER + +# in a code cell below, create a variable called `name` and set it to your name +name = 'Charlie Walker' + +# print the string name, and its length +print(name,len(name)) + +# comment on why the length is the value you find +msg = ''' + The string 'Charlie' has 7 characters + the string 'Walker' has 6 characters + plus we have a space in the middle + so, the length we would expect is 7 + 6 + 1 = 14 + + This is what is returned by len(name) +''' +print(msg) +``` + + Charlie Walker 14 + + The string 'Charlie' has 7 characters + the string 'Walker' has 6 characters + plus we have a space in the middle + so, the length we would expect is 7 + 6 + 1 = 14 + + This is what is returned by len(name) + + + +#### Answer + +We could find this by examining ASCII code [tables](http://www.asciitable.com) and see that `a` (lower case a) has the code 97. + +Alternatively, we could search for help on this topic, and find that the Python function `ord()` converts a single character to its ASCII code: + + +```python +# ANSWER +print("the ASCII code for 'a' is",ord('a')) +``` + + the ASCII code for 'a' is 97 + + +Alternatively, we might notice that `n` is the 14th letter of the alphabet, and `e` the 5th, so the code seems to be `96 + N` where `N` is the order the letter appears in the alphabet (so `a`, the 1st letter, has the code `97`). We can confirm this with the 15th letter `o` which we see from above has the code `111`. 
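The relationship between a lower case letter's position in the alphabet and its code can be checked with a short sketch. Here we count `a` as position 1, so the offset is 96 (counting from 0 instead, it would be 97). The names `alphabet`, `letter` and `position` are for illustration only, and the `for` loop and the string method `index()` are used ahead of where the notes introduce them.

```python
# the lower case alphabet, so that we can look up positions
alphabet = 'abcdefghijklmnopqrstuvwxyz'

# check the letters used in the string 'one', plus 'a'
for letter in ['a', 'e', 'n', 'o']:
    # index() counts from 0, so add 1 to count 'a' as the 1st letter
    position = alphabet.index(letter) + 1
    print(letter, position, ord(letter), 96 + position)

# chr() goes the other way, from code to character
first = chr(97)
print(first)
```

The last two numbers printed on each row should agree, which is the pattern described in the answer.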
+ +#### Exercise + +* insert a new cell below here +* set a variable called message to contain the string `hello world` +* print the value and data type of the variable message + + +```python +# ANSWER + +# set a variable called message to contain the string hello world +message = 'hello world' + +# print the value and data type of the variable message +print(message,type(message)) +``` + + hello world + + +#### Exercise + +Since the energy level expressed in $J$ is quite small, we might more conveniently express it in units of eV. Given that: + +$$ + 1\ Electron\ volt\ (eV) = 1.602176565 \times 10^{-19} J +$$ + +* Insert a new cell below here +* calculate the energy associated with a blue photon at 450 nm, in eV +* confirm your answer using a [web calculator](http://www.calctool.org/CALC/other/converters/e_of_photon) + + +```python +# Answer +# Copy mostly from above: + +# values of c and h +c = 2.99792458e8 +h = 6.62607015e-34 + +print(h * c) +# wavelength in nm: BLUE +# calculate the energy associated with a blue photon at 450 nm, in eV +l_nm = 450.0 + +# wavelength in m +l_m = l_nm * 1e-9 + +# Planck-Einstein in J +E_J = h * c / l_m + +# conversion formula given above +E_eV = E_J / 1.602176565e-19 +print('Photon of wavelength', l_nm, 'nm') +print('has an energy of', E_eV, 'eV') +# which compares with 2.75520 eV given in the web calculator +``` + + 1.9864458571489286e-25 + Photon of wavelength 450.0 nm + has an energy of 2.7552045282834468 eV + + +#### Exercise + +* insert a new cell below here +* using integer arithmetic, print the result of: + - 2 to the power of 8 + - 1024 divided by 2 +* set a variable called `x` to the result of 7 (floor) divided by 3. 
+ - print the value of `x`, and confirm its data type is `int` + + +```python +# +# answer +# using integer arithmetic, print the result of: + +# 2 to the power of 8 +print(2**8) + +# 1024 divided by 2 integer division (floor) +print(1024 // 2) + +# set a variable called x to the result of 7 divided by 3 +x = 7 // 3 +print('Integer: 7 divided by 3 is', x, type(x)) + +# We contrast this with the use of / +# which results in a variable of type float +x = 7 / 3 +print('7 divided by 3 is', x, type(x)) +``` + + 256 + 512 + Integer: 7 divided by 3 is 2 + 7 divided by 3 is 2.3333333333333335 + + +#### Exercise + +* Insert a new cell below here +* Set a variable called `is_class_today` to the value `True` +* print the variable name, its value, and its data type + + +```python +# ANSWER +# Insert a new cell below here +# Set a variable called `is_class_today` to the value `True` + +is_class_today = True + +# print the variable name, its value, and its data type + +print('is_class_today',is_class_today,type(is_class_today)) +``` + + is_class_today True + + +#### Exercise + +* Insert a new cell below here +* write a statement to set a variable `x` to `True` and print the value of `x` and `not x` +* what does `not not x` give? Make sure you understand why + + + +```python +# ANSWER + +# write a statement to set a variable x to True +x = True + +# and print the value of x and not x +print('x is',x) +print('not x is',not x) + +# what does not not x give? +print('not not x is',not not x) +msg = ''' +answer +------ +not not x is the same as just x + +A double negative cancels out, in effect +''' +print(msg) +``` + + x is True + not x is False + not not x is True + + answer + ------ + not not x is the same as just x + + A double negative cancels out, in effect + + + + +```python +# ANSWER +# do the testing here e.g. 
+print (True or False) +``` + + True + + +| blank | A and B | A or B | +|:---:|:---:|:---:| +| ![blank](images/tt1.png) | ![blank](images/ttand.png) | ![blank](images/ttor.png) | + + + + +```python +# ANSWER +# do the testing here e.g. +print ((True and False) or True) +``` + + True + + +| blank | A and B | ((A and B) or C) | +|:---:|:---:|:---:| +| ![blank](images/tt2.png) | ![blank](images/tt2and.png) | ![blank](images/tt2or.png) | + + + + +```python +# ANSWER + +# write a statement to set a variable x to True a +x = True + +# print the value of x +print('x is',x) + +# print the value of not x +print('not x is',not x) + +# what does not not x give? +print('not not x is',not not x) +# not not cancels out (double negative) +``` + + x is True + not x is False + not not x is True + + + +```python +# ANSWER + +# Set a variable called `is_class_today` to the value `True` +is_class_today = True + +# print the variable name, its value, and its data type +print('is_class_today',is_class_today,type(is_class_today)) +``` + + is_class_today True + + +#### Exercise + +* insert a new cell below here +* copy the code in the cell above, set `start_number` to `0`, and run + * What are the boolean representations of `0` and `1`? +* What would happen if you set `start_number` to the string `'zero'`, and why? + + +```python +# ANSWER + +# copy the code in the cell above, and set start_number to 0 +start_number = 0 + +print("starting with", start_number) +int_number = int(start_number) +print('int_number', int_number, type(int_number)) +# now convert to float +float_number = float(int_number) +print('float_number', float_number, type(float_number)) +# now convert to str +str_number = str(int_number) +print('str_number', str_number, type(str_number)) +# now convert to bool +bool_number = bool(int_number) +print('bool_number', bool_number, type(bool_number)) + +# What is the boolean representation of 0? 
+msg = ''' + +What would happen if you set start_number to the string 'zero', and why? + +Answer +------ + 1 -> True + 0 -> False + + If we set start_number to 'zero' then int('zero') will fail + because it cannot convert a word representation of this sort to an integer + only a character representation such as '0' or '1' +''' +print(msg) +``` + + starting with 0 + int_number 0 <class 'int'> + float_number 0.0 <class 'float'> + str_number 0 <class 'str'> + bool_number False <class 'bool'> + + + What would happen if you set start_number to the string 'zero', and why? + + Answer + ------ + 1 -> True + 0 -> False + + If we set start_number to 'zero' then int('zero') will fail + because it cannot convert a word representation of this sort to an integer + only a character representation such as '0' or '1' + + diff --git a/docs/012_Python_strings 2.md b/docs/012_Python_strings 2.md new file mode 100644 index 00000000..10563de8 --- /dev/null +++ b/docs/012_Python_strings 2.md @@ -0,0 +1,376 @@ +# 012 String formatting + +## Introduction + +### Purpose + +In this section we will learn about strings in more depth: features and formatting. + +### Prerequisites + +You will need some understanding of the following: + + +* [001 Using Notebooks](001_Notebook_use.md) +* [005 Getting help](005_Help.md) +* [010 Variables, comments and print()](010_Python_Introduction.md) +* [011 Data types](011_Python_data_types.md) In particular, you should understand strings and know how to find the length of a string. + +### Timing + +The session should take around 30 minutes. + + +## String features + +### Quotes and escapes + +We have seen strings before, and noted that they are collections of characters (`a`, `b`, `1`, ...). Strings and characters are input by surrounding the relevant text in either double (`"`) or single (`'`) quotes. 
You can use this feature to print out a string with quotes, for example: + + +```python +print ("'a string in single quotes'") +print ('"a string in double quotes"') +``` + + 'a string in single quotes' + "a string in double quotes" + + +Some elements of the string may be special codes for print formatting, such as newline `\n` or tab `\t`. If we insert these in the string, they will add a newline or a tab respectively. Both of these might *look like* multiple characters, but rather are interpreted instead as a single character. + +What if we needed to print out `\n` as part of the string, e.g. print the string: + + "beware of \n and \t" + +we will find that they are (as we probably suspected) interpreted. Using single or double quotes will make no difference: + + +```python +print("beware of \n and \t") +print('beware of \n and \t') +``` + + beware of + and + beware of + and + + +What we need to do is to present the `print()` with two characters `\` and `n`, instead of the single character `\n`. The problem now is that `\` has special meaning in a string: it *escapes* the following character, i.e. it makes the interpreter ignore the meaning of the following character. If we tried to generate a string: + + "\" + + the code would fail, because `\"` means *don't* interpret `"` in its usual sense (i.e. as a quote) and we would have an unclosed string. + + The trick then, is to use `\` to escape the meaning of `\`. So, if we want to print `\`, we set the string as `\\`: + + +```python +print("\\") +``` + + \ + + +#### Exercise + +* insert a new cell below here +* Use what we have learned above to print the phrase `"beware of \n and \t"`, including quotes. + +Another time we use the `\` as an escape character is in trying to make long strings in our code more readable. We can do this by putting an escape `\` **just before** we hit the return key (newline!) on the keyboard, and so spread what would be a command or variable over a single long line over multiple lines. 
+ +For example: + + +```python +# from https://www.usgs.gov/faqs/what-remote-sensing-and-what-it-used? +string = \ +"Remote sensing is the process of detecting and \ +monitoring the physical characteristics of an \ +area by measuring its reflected and emitted \ +radiation at a distance (typically from \ +satellite or aircraft)." + +print(string) +``` + + Remote sensing is the process of detecting and monitoring the physical characteristics of an area by measuring its reflected and emitted radiation at a distance (typically from satellite or aircraft). + + +Here, when we type `string = ` on the first line, the Python interpreter expects a string to be specified next. By using instead `\` *just before we hit the return*, we are essentially escaping that newline, and the rest of the command (the string definition here) can take place on the following line. We repeat this idea to spread the string over multiple lines. + +This can be really useful. + +In the special case of a string that we want to define over multiple lines though, Python has a special format using triple quotes (single or double): + + ''' + multiple + line + string + ''' + +that means we don't need to escape each end of line within the text. + + +```python +# from https://www.usgs.gov/faqs/what-remote-sensing-and-what-it-used? +string = ''' +Remote sensing is the process of detecting and +monitoring the physical characteristics of an +area by measuring its reflected and emitted +radiation at a distance (typically from +satellite or aircraft). +''' + +print(string) +``` + + + Remote sensing is the process of detecting and + monitoring the physical characteristics of an + area by measuring its reflected and emitted + radiation at a distance (typically from + satellite or aircraft). + + + +Notice how this is different to the case when we escaped the newline characters within the string. In fact, at the end of each line of text, this string contains `\n` newline characters (we just don't see them). 
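If you want to see those hidden newline characters, one option is the built-in function `repr()`, which shows a string with its escape codes written out. The sketch below uses a short made-up string called `text`; the string method `count()` is used a little ahead of where string methods are introduced.

```python
# a short triple-quoted string spread over several lines
text = '''
first line
second line
'''

# print() interprets the newlines, so we do not see them as codes
print(text)

# repr() shows the escape codes explicitly
print(repr(text))

# count the newline characters in the string
n_newlines = text.count('\n')
print('number of newlines:', n_newlines)
```

Note that there is a newline straight after the opening `'''` as well as one at the end of each line of text, which is why the printed string starts with a blank line.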
+ +#### Exercise + +* Insert a new cell below here +* Write Python code that prints a string containing the following text, spaced over four lines as intended. There should be no space at the start of the line. + + The Owl and the Pussy-cat went to sea + In a beautiful pea-green boat, + They took some honey, and plenty of money, + Wrapped up in a five-pound note. + +* Write Python code that prints a string containing the above text, all on a single line. + +## String arithmetic + +We can use some of the arithmetic operators with strings. In particular `*` and `+`. + +### `*` + +In the context of a string, the operator `*` is used to repeat the string. For example: + + +```python +# multiplication example +dash = '-' +dash10 = dash * 10 +print(dash,len(dash)) +print(dash10,len(dash10)) +``` + + - 1 + ---------- 10 + + +#### Exercise + +* In a new cell below, generate a string called `base` and set this to the string `Hello` +* print base and its length +* set a new variable `mult` to `base * 10` +* print `mult` and its length +* comment on why the lengths are the values reported + +### `+` + +The plus operator `+` adds two strings together, in the sense of [concatenating](https://en.wikipedia.org/wiki/Concatenation) the strings. For example: + + +```python +# hello world +astring = 'hello' +bstring = 'world' + +cstring = astring + bstring +print('I joined',astring,'to',bstring,'with + and got',cstring) +``` + + I joined hello to world with + and got helloworld + + +#### Exercise + +You may have noticed that when we use `+` to join `hello + world` above, there is no space between the words. This is because we have not told the computer to put any such space in. 
+ +* Copy the code from the hello world example above +* create a new string called `gap` containing whitespace: `gap = ' '` +* using `gap`, edit the code so that `cstring` has a gap between the words + +## String formatting + +### `str.format()` + +We know that we can join strings together with `+` or, from a list, with `str.join()`. + +Whilst we have seen that you can print a string with some variables in it, e.g.: + + +```python +float_val = 10.6 +guess_value = 13.4 +print("The number you are thinking of is",float_val,'but I guessed',guess_value) +``` + + The number you are thinking of is 10.6 but I guessed 13.4 + + +strings of that nature can soon become unwieldy. We could have converted each item to a string, and then joined the strings: + + +```python +float_val = 10.6 +guess_value = 13.4 +# using + & inserting the correct white spaces +string = "The number you are thinking of is " + \ + str(float_val) + \ + ' but I guessed ' + \ + str(guess_value) +print(string) +``` + + The number you are thinking of is 10.6 but I guessed 13.4 + + +but neither of these is very readable, or indeed very reusable. + +A neater way to form a string with variable inserts is to use the `format()` method: + + str.format(...) + | S.format(*args, **kwargs) -> str + | + | Return a formatted version of S, using substitutions from args and kwargs. + | The substitutions are identified by braces ('{' and '}'). + | + +Using this approach, we would set up a template: + + string_template = \ + "The number you are thinking of is {think} but I guessed {guess}" + +with the template variables `think` and `guess` defined between braces `{}`. These are placeholders for the values of `float_val` and `guess_value`. + +To insert values into this template, we use the string method `format()`. If the template variables are named (as in `{think}` and `{guess}` here), then we use **keyword arguments** with `format()`. 
For example: + + +```python +string_template = \ + "The number you are thinking of is {think} but I guessed {guess}" + +float_val = 10.6 +guess_value = 13.4 + +print(string_template.format(think=float_val,\ + guess=guess_value)) +``` + + The number you are thinking of is 10.6 but I guessed 13.4 + + +This has the advantage that the template is easily reusable, and that we have been explicit about the variables we insert. + +A further refinement on this is to specify some formatting statement for the variables we use. For example, we might want the value `float_val` or `guess_value` to be specified to two decimal places. To do this, we can provide a formatting statement to associate with the variable name in the template. This is done using a `:` qualifier, followed by a description of the format. For example, the format statement of two figures after the decimal point for a `float` is `:.2f`. + + +```python +string_template = \ + "The number you are thinking of is {think:.2f} but I guessed {guess:.2f}" + +float_val = 10.6 +guess_value = 13.4 + +print(string_template.format(think=float_val,\ + guess=guess_value)) +``` + + The number you are thinking of is 10.60 but I guessed 13.40 + + +Other useful format statement examples include: + +* `: >10.2f` : pad with spaces, use `>` for right align, `10` is the total length of the string, `.2` gives a length of 2 after the decimal point, and `f` is the code for float +* `:0>8d` : pad with `0`, use `>` for right align, `8` is the length of the string, and `d` is the code for integer +* `:_>10s` : pad with `_`, use `>` for right align, `10` is the length of the string, and `s` is the code for string +* `:_<10s` : pad with `_`, use `<` for left align, `10` is the length of the string, and `s` is the code for string + +The syntax might seem a little awkward at first, but it is [very powerful](https://docs.python.org/3/library/string.html#formatstrings). So long as you are aware that it exists and know of some examples, you should be able to pick it up. 
+ + +```python +print(" : >.2f -> {x: >.2f}".format(x=10.3)) +print(" :0>8d -> {x:0>8d}".format(x=10)) +print(" :_>10s -> {x:_>10s}".format(x='hello')) +print(" :_<10s -> {x:_<10s}".format(x='hello')) +``` + + : >.2f -> 10.30 + :0>8d -> 00000010 + :_>10s -> _____hello + :_<10s -> hello_____ + + +Suppose we want to write a set of results to a different file for each job, where the identifier of the job is an integer number between 0 and 99999999. + +We want the file names associated with these results to appear in a logical order. + +We could just use file names `0.dat`, `1.dat`, `2.dat`, `3.dat` ... `10.dat`, `11.dat` etc. but we would find that when we list the files, they will be in the order `0.dat`, `1.dat`, `10.dat`, `11.dat`, ... `2.dat`, `3.dat`, because this is the natural ['lexicographic' order](https://www.tutorialspoint.com/Sort-the-words-in-lexicographical-order-in-Python#:~:text=Sorting%20words%20in%20lexicographical%20order,(not%20the%20data%20structure).) generally used by computers. We could define some awkward way of getting around this, but it would be much simpler if we simply padded the filenames with `0` characters, so that they appeared as `00000000.dat`, `00000001.dat`, `00000002.dat`, `00000003.dat` ... `00000010.dat`, `00000011.dat`. Then, lexicographically, they will appear in the order we intended. + + + +#### Exercise + +* set a variable `index` to be an integer between `0` and `99999999`. +* use this to generate a zero-padded filename of the form `00000010.dat` +* print out the filename + + + +### `f-string` + +An alternative way of formatting a string that can be useful is the use of the `f-string`. In this, we place an `f` character at the start of the string. It is a sort of short-hand for what we do with the format statement, where the variables given in the braces are directly inserted. + +Note that we can use the same sort of formatting statements for the f-string as when using `.format()` above. 
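As a brief sketch of that last point, the format statements go after a `:` inside the braces of an f-string, just as they do in a `.format()` template. The variables `value` and `label` here are made up for illustration.

```python
value = 3.14159
label = 'pi'

# two figures after the decimal point, as with .format()
short = f'{label} is roughly {value:.2f}'
print(short)

# right align in a field of length 10, padded with _
padded = f'{label:_>10s}'
print(padded)
```

The only difference from the template approach is that the variable names in the braces must already exist when the f-string is created.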
+ + +```python +string = f'The number you are thinking of is {float_val} but I guessed {guess_value}' +print(string) +``` + + The number you are thinking of is 10.6 but I guessed 13.4 + + +#### Exercise + +* Insert a new cell below here +* create a template string of the form: + + "what have the {people} ever done for us?" + +* assign the word `Romans` to the variable `people` and print the formatted template: hint: use the `str.format()` method to insert this into the template. +* repeat this using an f-string directly. + +Actually, there are a lot more [useful things](https://realpython.com/python-f-strings/#simple-syntax) we can do with an `f` string, but we will leave it here at this point. + +## Summary + +In this section, we have introduced some more detail on strings, especially string formatting. You should have gained an understanding of the use of quotes and escape codes, as well as of formatting strings with `str.format()` and `f-string`. diff --git a/docs/012_Python_strings_answers 2.md b/docs/012_Python_strings_answers 2.md new file mode 100644 index 00000000..ec8cf396 --- /dev/null +++ b/docs/012_Python_strings_answers 2.md @@ -0,0 +1,271 @@ +# 012 String formatting : Answers to exercises + +#### Exercise + +* insert a new cell below here +* Use what we have learned above to print the phrase `"beware of \n and \t"`, including quotes. + + +```python +# ANSWER +# Use what we have learned above to print the phrase +# "beware of \n and \t", including quotes. 
+ +# try this first +string = "beware of \n and \t" +print('wrong:\t\t',string) + +# now escape the \ characters +string = "beware of \\n and \\t" +print('good:\t\t',string,'\t\tbut no quotes') + +# now escape the \ characters +# and add quotes +string = '"beware of \\n and \\t"' +print('great:\t\t',string) + +# now escape the \ characters +# and add quotes by escaping +string = "\"beware of \\n and \\t\"" +print('great:\t\t',string) +``` + + wrong: beware of + and + good: beware of \n and \t but no quotes + great: "beware of \n and \t" + great: "beware of \n and \t" + + +#### Exercise + +* Insert a new cell below here +* Write Python code that prints a string containing the following text, spaced over four lines as intended. There should be no space at the start of the line. + + The Owl and the Pussy-cat went to sea + In a beautiful pea-green boat, + They took some honey, and plenty of money, + Wrapped up in a five-pound note. + +* Write Python code that prints a string containing the above text, all on a single line. + + +```python +# ANSWER + +# Write Python code that prints a string containing +# the following text, spaced over four lines as intended. + +lear = ''' +The Owl and the Pussy-cat went to sea +In a beautiful pea-green boat, +They took some honey, and plenty of money, +Wrapped up in a five-pound note. + ''' +print(lear) +``` + + + The Owl and the Pussy-cat went to sea + In a beautiful pea-green boat, + They took some honey, and plenty of money, + Wrapped up in a five-pound note. + + + + +```python +# ANSWER + +# Write Python code that prints a string +# containing the above text, all on a single line. + + +# we can still space it out clearly, but +# now escape the new lines +lear = \ +"The Owl and the Pussy-cat went to sea \ +In a beautiful pea-green boat, \ +They took some honey, and plenty of money, \ +Wrapped up in a five-pound note." 
+print(lear) +``` + + The Owl and the Pussy-cat went to sea In a beautiful pea-green boat, They took some honey, and plenty of money, Wrapped up in a five-pound note. + + +#### Exercise + +* In a new cell below, generate a string called `base` and set this to the string `Hello` +* print base and its length +* set a new variable `mult` to `base * 10` +* print `mult` and its length +* comment on why the lengths are the values reported + + +```python +# ANSWER +# In a new cell below, generate a string called base and set this to the string Hello +base = 'Hello' + +#print base and its length +print(base,len(base)) + +#set a new variable mult to base * 10 +mult = base * 10 + +#print mult and its length +print (mult,len(mult)) + +#comment on why the lengths are the values reported +msg = ''' +The string 'Hello' has 5 characters. We set the variable base +to be this, so the length of the string base is 5. + +We set mult to be base * 10. For a string, * repeats the string, +so we end up with a string the same as base but repeated 10 times. +Since the length of base was 5, the length of mult will be 5 * 10 = 50 +''' +print(msg) +``` + + Hello 5 + HelloHelloHelloHelloHelloHelloHelloHelloHelloHello 50 + + The string 'Hello' has 5 characters. We set the variable base + to be this, so the length of the string base is 5. + + We set mult to be base * 10. For a string, * repeats the string, + so we end up with a string the same as base but repeated 10 times. + Since the length of base was 5, the length of mult will be 5 * 10 = 50 + + + +#### Exercise + +You may have noticed that when we use `+` to join `hello + world` above, there is no space between the words. This is because we have not told the computer to put any such space in. 
+ +* Copy the code from the hello world example above +* create a new string called `gap` containing whitespace: `gap = ' '` +* using `gap`, edit the code so that `cstring` has a gap between the words + + +```python +# ANSWER + +# Copy the code from the hello world example above +astring = 'hello' +bstring = 'world' + +# create a new string called `gap` containing whitespace: `gap = ' '` +gap = ' ' + +# using `gap`, edit the code so that `cstring` has a gap between the words +cstring = astring + gap + bstring +print('I joined',astring,'to',gap,'to',bstring,'with + and got',cstring) +``` + + I joined hello to to world with + and got hello world + + + +```python +# ANSWER + +# in a new cell below, generate a string called base and set this to the string Hello +base = 'Hello' + +# print base and its length +print(base,len(base)) + +# set a new variable mult to base * 10 +mult = base * 10 + +# print mult and its length +print(mult,len(mult)) + +# comment on why the lengths are the values reported +msg = ''' + comment on why the lengths are the values reported + + The string called base had length 5 + The new string called mult was a repeat of the string + base, 10 times, using the multiplication operator * + + As we would expect, the new string had length 10 * 5 = 50 +''' +print(msg) +``` + + Hello 5 + HelloHelloHelloHelloHelloHelloHelloHelloHelloHello 50 + + comment on why the lengths are the values reported + + The string called base had length 5 + The new string called mult was a repeat of the string + base, 10 times, using the multiplication operator * + + As we would expect, the new string had length 10 * 5 = 50 + + + +#### Exercise + +* set a variable `index` to be an integer between `0` and `99999999`. +* use this to generate a zero-padded filename of the form `00000010.dat` +* print out the filename + + + + +```python +# ANSWER + +# set a variable index to be an integer between 0 and 99999999. 
+index = 1265 + +# use this to generate a zero-padded filename of the form 00000010.dat +# Note that we use 0>8d here as we want the part of the string with +# the number to be right-aligned, padded with zeros, and of length 8 +filename = "{:0>8d}.dat".format(index) + +# print out the filename +print(filename) +``` + + 00001265.dat + + +#### Exercise + +* Insert a new cell below here +* create a template string of the form: + + "what have the {people} ever done for us?" + +* assign the word `Romans` to the variable `people` and print the formatted template: hint: use the `str.format()` method to insert this into the template. +* repeat this using an f-string directly. + + +```python +# ANSWER +# Insert a new cell below here +# create a template string of the form: + +template="what have the {people} ever done for us?" + +# assign the word Romans to the variable people and print +# the formatted template: hint: use the str.format() +# method to insert this into the template. + +print(template.format(people='Romans')) + +# repeat this using an f-string directly. +people='Romans' +print(f"what have the {people} ever done for us?") +``` + + what have the Romans ever done for us? + what have the Romans ever done for us? + diff --git a/docs/013_Python_string_methods 2.md b/docs/013_Python_string_methods 2.md new file mode 100644 index 00000000..fdbe3c73 --- /dev/null +++ b/docs/013_Python_string_methods 2.md @@ -0,0 +1,802 @@ +# 013 String methods + +## Introduction + +### Purpose + +In this section we will learn more about strings, in particular, string methods. + + +### Prerequisites + +You will need some understanding of the following: + + +* [001 Using Notebooks](001_Notebook_use.md) +* [005 Getting help](005_Help.md) +* [010 Variables, comments and print()](010_Python_Introduction.md) +* [011 Data types](011_Python_data_types.md) In particular, you should understand strings. +* [012 String formatting](012_Python_strings.md) + +### Timing + +The session should take around 20 minutes. 
+ + + +## Strings + +### `help(str)` + +We can get a list of the string methods and associated information on how to use them from `help(str)`. We will go through some of these in this notebook, but you should be aware of the wider set of methods available. You don't need to go through all of these now, but notice how to get this information. + + +```python +help(str) +``` + + Help on class str in module builtins: + + class str(object) + | str(object='') -> str + | str(bytes_or_buffer[, encoding[, errors]]) -> str + | + | Create a new string object from the given object. If encoding or + | errors is specified, then the object must expose a data buffer + | that will be decoded using the given encoding and error handler. + | Otherwise, returns the result of object.__str__() (if defined) + | or repr(object). + | encoding defaults to sys.getdefaultencoding(). + | errors defaults to 'strict'. + | + | Methods defined here: + | + | __add__(self, value, /) + | Return self+value. + | + | __contains__(self, key, /) + | Return key in self. + | + | __eq__(self, value, /) + | Return self==value. + | + | __format__(self, format_spec, /) + | Return a formatted version of the string as described by format_spec. + | + | __ge__(self, value, /) + | Return self>=value. + | + | __getattribute__(self, name, /) + | Return getattr(self, name). + | + | __getitem__(self, key, /) + | Return self[key]. + | + | __getnewargs__(...) + | + | __gt__(self, value, /) + | Return self>value. + | + | __hash__(self, /) + | Return hash(self). + | + | __iter__(self, /) + | Implement iter(self). + | + | __le__(self, value, /) + | Return self<=value. + | + | __len__(self, /) + | Return len(self). + | + | __lt__(self, value, /) + | Return self int + | + | Return the number of non-overlapping occurrences of substring sub in + | string S[start:end]. Optional arguments start and end are + | interpreted as in slice notation. 
+ | + | encode(self, /, encoding='utf-8', errors='strict') + | Encode the string using the codec registered for encoding. + | + | encoding + | The encoding in which to encode the string. + | errors + | The error handling scheme to use for encoding errors. + | The default is 'strict' meaning that encoding errors raise a + | UnicodeEncodeError. Other possible values are 'ignore', 'replace' and + | 'xmlcharrefreplace' as well as any other name registered with + | codecs.register_error that can handle UnicodeEncodeErrors. + | + | endswith(...) + | S.endswith(suffix[, start[, end]]) -> bool + | + | Return True if S ends with the specified suffix, False otherwise. + | With optional start, test S beginning at that position. + | With optional end, stop comparing S at that position. + | suffix can also be a tuple of strings to try. + | + | expandtabs(self, /, tabsize=8) + | Return a copy where all tab characters are expanded using spaces. + | + | If tabsize is not given, a tab size of 8 characters is assumed. + | + | find(...) + | S.find(sub[, start[, end]]) -> int + | + | Return the lowest index in S where substring sub is found, + | such that sub is contained within S[start:end]. Optional + | arguments start and end are interpreted as in slice notation. + | + | Return -1 on failure. + | + | format(...) + | S.format(*args, **kwargs) -> str + | + | Return a formatted version of S, using substitutions from args and kwargs. + | The substitutions are identified by braces ('{' and '}'). + | + | format_map(...) + | S.format_map(mapping) -> str + | + | Return a formatted version of S, using substitutions from mapping. + | The substitutions are identified by braces ('{' and '}'). + | + | index(...) + | S.index(sub[, start[, end]]) -> int + | + | Return the lowest index in S where substring sub is found, + | such that sub is contained within S[start:end]. Optional + | arguments start and end are interpreted as in slice notation. 
+ | + | Raises ValueError when the substring is not found. + | + | isalnum(self, /) + | Return True if the string is an alpha-numeric string, False otherwise. + | + | A string is alpha-numeric if all characters in the string are alpha-numeric and + | there is at least one character in the string. + | + | isalpha(self, /) + | Return True if the string is an alphabetic string, False otherwise. + | + | A string is alphabetic if all characters in the string are alphabetic and there + | is at least one character in the string. + | + | isascii(self, /) + | Return True if all characters in the string are ASCII, False otherwise. + | + | ASCII characters have code points in the range U+0000-U+007F. + | Empty string is ASCII too. + | + | isdecimal(self, /) + | Return True if the string is a decimal string, False otherwise. + | + | A string is a decimal string if all characters in the string are decimal and + | there is at least one character in the string. + | + | isdigit(self, /) + | Return True if the string is a digit string, False otherwise. + | + | A string is a digit string if all characters in the string are digits and there + | is at least one character in the string. + | + | isidentifier(self, /) + | Return True if the string is a valid Python identifier, False otherwise. + | + | Use keyword.iskeyword() to test for reserved identifiers such as "def" and + | "class". + | + | islower(self, /) + | Return True if the string is a lowercase string, False otherwise. + | + | A string is lowercase if all cased characters in the string are lowercase and + | there is at least one cased character in the string. + | + | isnumeric(self, /) + | Return True if the string is a numeric string, False otherwise. + | + | A string is numeric if all characters in the string are numeric and there is at + | least one character in the string. + | + | isprintable(self, /) + | Return True if the string is printable, False otherwise. 
+ | + | A string is printable if all of its characters are considered printable in + | repr() or if it is empty. + | + | isspace(self, /) + | Return True if the string is a whitespace string, False otherwise. + | + | A string is whitespace if all characters in the string are whitespace and there + | is at least one character in the string. + | + | istitle(self, /) + | Return True if the string is a title-cased string, False otherwise. + | + | In a title-cased string, upper- and title-case characters may only + | follow uncased characters and lowercase characters only cased ones. + | + | isupper(self, /) + | Return True if the string is an uppercase string, False otherwise. + | + | A string is uppercase if all cased characters in the string are uppercase and + | there is at least one cased character in the string. + | + | join(self, iterable, /) + | Concatenate any number of strings. + | + | The string whose method is called is inserted in between each given string. + | The result is returned as a new string. + | + | Example: '.'.join(['ab', 'pq', 'rs']) -> 'ab.pq.rs' + | + | ljust(self, width, fillchar=' ', /) + | Return a left-justified string of length width. + | + | Padding is done using the specified fill character (default is a space). + | + | lower(self, /) + | Return a copy of the string converted to lowercase. + | + | lstrip(self, chars=None, /) + | Return a copy of the string with leading whitespace removed. + | + | If chars is given and not None, remove characters in chars instead. + | + | partition(self, sep, /) + | Partition the string into three parts using the given separator. + | + | This will search for the separator in the string. If the separator is found, + | returns a 3-tuple containing the part before the separator, the separator + | itself, and the part after it. + | + | If the separator is not found, returns a 3-tuple containing the original string + | and two empty strings. 
+ | + | replace(self, old, new, count=-1, /) + | Return a copy with all occurrences of substring old replaced by new. + | + | count + | Maximum number of occurrences to replace. + | -1 (the default value) means replace all occurrences. + | + | If the optional argument count is given, only the first count occurrences are + | replaced. + | + | rfind(...) + | S.rfind(sub[, start[, end]]) -> int + | + | Return the highest index in S where substring sub is found, + | such that sub is contained within S[start:end]. Optional + | arguments start and end are interpreted as in slice notation. + | + | Return -1 on failure. + | + | rindex(...) + | S.rindex(sub[, start[, end]]) -> int + | + | Return the highest index in S where substring sub is found, + | such that sub is contained within S[start:end]. Optional + | arguments start and end are interpreted as in slice notation. + | + | Raises ValueError when the substring is not found. + | + | rjust(self, width, fillchar=' ', /) + | Return a right-justified string of length width. + | + | Padding is done using the specified fill character (default is a space). + | + | rpartition(self, sep, /) + | Partition the string into three parts using the given separator. + | + | This will search for the separator in the string, starting at the end. If + | the separator is found, returns a 3-tuple containing the part before the + | separator, the separator itself, and the part after it. + | + | If the separator is not found, returns a 3-tuple containing two empty strings + | and the original string. + | + | rsplit(self, /, sep=None, maxsplit=-1) + | Return a list of the words in the string, using sep as the delimiter string. + | + | sep + | The delimiter according which to split the string. + | None (the default value) means split according to any whitespace, + | and discard empty strings from the result. + | maxsplit + | Maximum number of splits to do. + | -1 (the default value) means no limit. 
+ | + | Splits are done starting at the end of the string and working to the front. + | + | rstrip(self, chars=None, /) + | Return a copy of the string with trailing whitespace removed. + | + | If chars is given and not None, remove characters in chars instead. + | + | split(self, /, sep=None, maxsplit=-1) + | Return a list of the words in the string, using sep as the delimiter string. + | + | sep + | The delimiter according which to split the string. + | None (the default value) means split according to any whitespace, + | and discard empty strings from the result. + | maxsplit + | Maximum number of splits to do. + | -1 (the default value) means no limit. + | + | splitlines(self, /, keepends=False) + | Return a list of the lines in the string, breaking at line boundaries. + | + | Line breaks are not included in the resulting list unless keepends is given and + | true. + | + | startswith(...) + | S.startswith(prefix[, start[, end]]) -> bool + | + | Return True if S starts with the specified prefix, False otherwise. + | With optional start, test S beginning at that position. + | With optional end, stop comparing S at that position. + | prefix can also be a tuple of strings to try. + | + | strip(self, chars=None, /) + | Return a copy of the string with leading and trailing whitespace removed. + | + | If chars is given and not None, remove characters in chars instead. + | + | swapcase(self, /) + | Convert uppercase characters to lowercase and lowercase characters to uppercase. + | + | title(self, /) + | Return a version of the string where each word is titlecased. + | + | More specifically, words start with uppercased characters and all remaining + | cased characters have lower case. + | + | translate(self, table, /) + | Replace each character in the string using the given translation table. + | + | table + | Translation table, which must be a mapping of Unicode ordinals to + | Unicode ordinals, strings, or None. 
+ | + | The table must implement lookup/indexing via __getitem__, for instance a + | dictionary or list. If this operation raises LookupError, the character is + | left untouched. Characters mapped to None are deleted. + | + | upper(self, /) + | Return a copy of the string converted to uppercase. + | + | zfill(self, width, /) + | Pad a numeric string with zeros on the left, to fill a field of the given width. + | + | The string is never truncated. + | + | ---------------------------------------------------------------------- + | Static methods defined here: + | + | __new__(*args, **kwargs) from builtins.type + | Create and return a new object. See help(type) for accurate signature. + | + | maketrans(x, y=None, z=None, /) + | Return a translation table usable for str.translate(). + | + | If there is only one argument, it must be a dictionary mapping Unicode + | ordinals (integers) or characters to Unicode ordinals, strings or None. + | Character keys will be then converted to ordinals. + | If there are two arguments, they must be strings of equal length, and + | in the resulting dictionary, each character in x will be mapped to the + | character at the same position in y. If there is a third argument, it + | must be a string, whose characters will be mapped to None in the result. + + + +## Object methods + +### Concatenate strings: `+` and `len()` + +We can do a number of things with strings which are very useful. These methods are defined on generic objects by Python, but we can use them with strings as an example. + +For one, we can concatenate strings using the `+` symbol: + + +```python +string1 = 'hello' +string2 = 'world' +spacer = ' ' + +# concatenate these +result = string1 + spacer + string2 +print(result) +``` + + hello world + + +Another method we will find useful with strings is the `len()` function. + + +```python +help(len) +``` + + Help on built-in function len in module builtins: + + len(obj, /) + Return the number of items in a container. 
+ + + +When the object is a string, the 'number of items' refers to the number of characters, so for a string `t`, `len(t)` returns the length of the string. + + +```python +# generate a string called t +# and see how long it is +# use f-strings for convenience +t = 'hello' +print (f'the length of {t} is {len(t)}') + +# generate a string called s +# and see how long it is +s = "Hello" + "there" + "everyone" +print (f'the length of {s} is {len(s)}') +``` + + the length of hello is 5 + the length of Hellothereeveryone is 18 + + +#### Exercise + +* insert a new cell below here +* what might a zero-length string look like? Try to generate one, and check its length. +* the `Hello there everyone` example above has no spaces between the words. Copy the code and modify it to have spaces. +* confirm that you get the expected increase in length. + +## String methods + +### `replace()` and `strip()` + + +```python +help(str.replace) +``` + + Help on method_descriptor: + + replace(self, old, new, count=-1, /) + Return a copy with all occurrences of substring old replaced by new. + + count + Maximum number of occurrences to replace. + -1 (the default value) means replace all occurrences. + + If the optional argument count is given, only the first count occurrences are + replaced. + + + +The string method `replace()` returns a copy of the string in which each occurrence of the substring `old` is replaced by `new`. The original string is not changed. + +In the example below, we replace the sub-string `"happy"` with a new string containing the emoji "😀": + + +```python +original_string = "I'm a very happy string" +print('original:\t',original_string) + +new_string = original_string.replace("happy", "😀") +print ('new:\t\t',new_string) +``` + + original: I'm a very happy string + new: I'm a very 😀 string + + + +```python +help(str.strip) +``` + + Help on method_descriptor: + + strip(self, chars=None, /) + Return a copy of the string with leading and trailing whitespace removed. + + If chars is given and not None, remove characters in chars instead. 
+ + + +`strip()` is very useful in string formatting and general tidying up. + +Suppose we had the string: + + ":::😀:😀:😀::::::" + +but what we wanted was: + + "😀:😀:😀" + +i.e. we want to strip the `:` characters from the right and left ends of the string. We can't easily use `replace()` without affecting the `:` characters we want to keep. We can achieve this with the `strip()` method though. + + +```python +old_string = ":::😀:😀:😀::::::" +print(old_string) + +new_string = old_string.strip(':') +print(new_string) +``` + + :::😀:😀:😀:::::: + 😀:😀:😀 + + +#### Exercise + +* Insert a new cell below here +* Take the multi-line string: + +`''' +----Remote sensing is the process of detecting and +monitoring the physical characteristics of an +area by measuring its reflected and emitted +radiation at a distance (typically from +satellite or aircraft).---- +'''` + + and use it to generate a single line string, without the `-` characters at either end. + + +### `split()` and `join()` + + +```python +help(str.split) +``` + + Help on method_descriptor: + + split(self, /, sep=None, maxsplit=-1) + Return a list of the words in the string, using sep as the delimiter string. + + sep + The delimiter according which to split the string. + None (the default value) means split according to any whitespace, + and discard empty strings from the result. + maxsplit + Maximum number of splits to do. + -1 (the default value) means no limit. + + + +`split()` and `join()` are a pair of really useful string methods. The former is used to split a string into a list of sub-strings. For example: + + +```python +string = \ +" Remote sensing is the process of detecting and \ +monitoring the physical characteristics of an \ +area by measuring its reflected and emitted \ +radiation at a distance (typically from \ +satellite or aircraft). 
" + +string_list = string.split() + +print(string_list) +``` + + ['Remote', 'sensing', 'is', 'the', 'process', 'of', 'detecting', 'and', 'monitoring', 'the', 'physical', 'characteristics', 'of', 'an', 'area', 'by', 'measuring', 'its', 'reflected', 'and', 'emitted', 'radiation', 'at', 'a', 'distance', '(typically', 'from', 'satellite', 'or', 'aircraft).'] + + +We see that the string is 'parsed' into a list of separate sub-strings, which in this case represent words in the sentence. The default delimiter used to split the string is `' '`, whitespace (space or tab), though we could specify others if we needed. + +Any whitespece to the left or right of the string has no impact here, so we do not need to explicitly `strip()` the string. + +If we want to generate a string from a set of sub-strings, we use the `join()` method. + + +```python +help(str.join) +``` + + Help on method_descriptor: + + join(self, iterable, /) + Concatenate any number of strings. + + The string whose method is called is inserted in between each given string. + The result is returned as a new string. + + Example: '.'.join(['ab', 'pq', 'rs']) -> 'ab.pq.rs' + + + + For this, we declare the string delimiter we wish to use. For example, to reconstruct the sentence from the string list with whitespace delimitation: + + +```python +string_list = ['Remote', 'sensing', 'is', 'the', 'process', + 'of', 'detecting', 'and', 'monitoring', 'the', + 'physical', 'characteristics', 'of', 'an', 'area', + 'by', 'measuring', 'its', 'reflected', 'and', 'emitted', + 'radiation', 'at', 'a', 'distance', '(typically', 'from', + 'satellite', 'or', 'aircraft).'] + +string = ' '.join(string_list) +print(string) +``` + + Remote sensing is the process of detecting and monitoring the physical characteristics of an area by measuring its reflected and emitted radiation at a distance (typically from satellite or aircraft). 
+ + +#### Exercise + +* Insert a new cell below here +* Take the string + + The Owl and the Pussy-cat went to sea + In a beautiful pea-green boat, + They took some honey, and plenty of money, + Wrapped up in a five-pound note. + + and split it into a list of sub-strings. +* Then re-construct the string, separating each word by a colon character `':'` +* Print out the list of sub-strings and the re-constructed string + +### `slice` + +A string can be thought of as an ordered 'array' of characters. + +So, for example the string `hello` can be thought of as a construct containing `h` then `e`, `l`, `l`, and `o`. + +We can index a string, so that e.g. `'hello'[0]` is `h`, `'hello'[1]` is `e` etc. Notice that index `0` is used for the first item. + +We have seen above the idea of the 'length' of a string. In this example, the length of the string `hello` is 5. The final item in this case would be `'hello'[4]`, because we count indices from 0. + + +```python +string = 'hello' + +# length +slen = len(string) +print('length of',string,'is',slen) + +# select these indices +i = 0 +print('character',i,'of',string,'is',string[i]) + +i = 3 +print('character',i,'of',string,'is',string[i]) + +i = 4 +print('character',i,'of',string,'is',string[i]) + +``` + + length of hello is 5 + character 0 of hello is h + character 3 of hello is l + character 4 of hello is o + + +#### Exercise + +* Insert a new cell below here +* copy the code above, and see what happens if you set `i` to be the value of length of the string. +* why does it respond in this way? + + + +We can use the idea of a 'slice' to access particular elements within the string. 
+ +For a slice, we can specify: + +* start index (0 is the first) +* stop index (not including this) +* skip (do every 'skip' character) + +When specifying this as array access, this is given as, e.g.: + +`array[start:stop:skip]` + +* The default start is 0 +* The default stop is the length of the array +* The default skip is 1 + +We can use negative numbers for `start` and `stop`: in that case, they are counted from the end of the string (`-1` is the last character). + +We can specify a slice with the default values by leaving the terms out: + +`array[::2]` + +would give values in the array `array` from 0 to the end, in steps of 2. + +We can do the same by using `None` to indicate the default: + +`array[None:None:2]` + + +This idea is fundamental to array processing in Python. We will see later that the same mechanism applies to all ordered groups. + + + +```python +s = "Hello World" +print (s,len(s)) + +start = None +stop = 11 +skip = 2 +print (s[start:stop:skip]) + +# use -ve numbers to specify from the end +# use None to take the default value +start = -3 +stop = None +skip = 1 +print (s[start:stop:skip]) +``` + + Hello World 11 + HloWrd + rld + + +#### Exercise + +The example above allows us to access individual characters of the array. + +* Insert a new cell below here +* based on the example above, print the string starting from the default start value, up to the default stop value, in steps of `2`. This should be `HloWrd`. +* write code to print out the 4$^{th}$ letter (character) of the string `s`. This should be `l`. + + +## Summary + +In this section, we have introduced some more detail on strings, especially string methods. There are many more methods you can use; we have tried to cover the main ones here, and there are many [resources](https://www.w3schools.com/python/python_strings.asp#:~:text=Strings%20are%20Arrays,access%20elements%20of%20the%20string.) you can use to follow up. 
+ +You should know how to make a single line or multi-line string. You should know how to use `replace`, `strip`, `split` and `join` on a string, as well as how to index a string and take a `slice` of it. You should recognise `None` as a way of requesting the default value in a slice. You should know how to find information on how to use other string methods. diff --git a/docs/013_Python_string_methods_answers 2.md b/docs/013_Python_string_methods_answers 2.md new file mode 100644 index 00000000..24ee6f11 --- /dev/null +++ b/docs/013_Python_string_methods_answers 2.md @@ -0,0 +1,290 @@ +# 013 String methods : Answers to exercises + + +```python +# ANSWER + +# let's set up a variable called string to make this clearer +# and do this piece by piece +string = 'beware of \n and \t' +print("wrong:", string) + +# escape the \ +string = 'beware of \\n and \\t' +print("good:\t\t", string, '\tbut no quotes') + +# escape the \ and add quotes inside the string +string = '"beware of \\n and \\t"' +print("great:\t\t", string) + +# or ... escape the quotes as well! +string = "\"beware of \\n and \\t\"" +print("great again:\t", string) +``` + + wrong: beware of + and + good: beware of \n and \t but no quotes + great: "beware of \n and \t" + great again: "beware of \n and \t" + + +#### Exercise + +* insert a new cell below here +* what might a zero-length string look like? Try to generate one, and check its length. +* the `Hello there everyone` example above has no spaces between the words. Copy the code and modify it to have spaces. +* confirm that you get the expected increase in length. + + +```python +# ANSWER + +# insert a new cell below here +# what might a zero-length string look like? Try to generate one, and check its length. + +t = '' +print (f'the length of {t} is {len(t)}') + +# the `Hello there everyone` example above has no spaces between the words. Copy the code and modify it to have spaces. 
+ +space = ' ' +s = "Hello" + space + "there" + space + "everyone" +print (f'the length of {s} is {len(s)}') + +# confirm that you get the expected increase in length. +msg = ''' +The old string had length 18 +now, with two spaces, this has length 20 as expected +''' +print(msg) +``` + + the length of is 0 + the length of Hello there everyone is 20 + + The old string had length 18 + now, with two spaces, this has length 20 as expected + + + +#### Exercise + +* Insert a new cell below here +* Take the multi-line string: + +`''' +----Remote sensing is the process of detecting and +monitoring the physical characteristics of an +area by measuring its reflected and emitted +radiation at a distance (typically from +satellite or aircraft).---- +'''` + + and use it to generate a single line string, without the `-` characters at either end. + + + +```python +# ANSWER + +old_string = ''' +----Remote sensing is the process of detecting and +monitoring the physical characteristics of an +area by measuring its reflected and emitted +radiation at a distance (typically from +satellite or aircraft).---- +''' +print(old_string) + +# replace newline with empty string! +# and strip the result after +new_string = old_string.replace('\n','').strip('-') +print(new_string) +``` + + + ----Remote sensing is the process of detecting and + monitoring the physical characteristics of an + area by measuring its reflected and emitted + radiation at a distance (typically from + satellite or aircraft).---- + + Remote sensing is the process of detecting and monitoring the physical characteristics of an area by measuring its reflected and emitted radiation at a distance (typically from satellite or aircraft). + + +#### Exercise + +* Insert a new cell below here +* Take the string + + The Owl and the Pussy-cat went to sea + In a beautiful pea-green boat, + They took some honey, and plenty of money, + Wrapped up in a five-pound note. + + and split it into a list of sub-strings. 
+* Then re-construct the string, separating each word by a colon character `':'` +* Print out the list of sub-strings and the re-constructed string + + +```python +# Answer + +# Take the string +string = ''' +The Owl and the Pussy-cat went to sea +In a beautiful pea-green boat, +They took some honey, and plenty of money, +Wrapped up in a five-pound note. +''' + +# and split it into a list of sub-strings. +list_string = string.split() +# print this out +print(list_string) + +# Then re-construct the string, separating each word by a colon character ':' +recon_string = ':'.join(list_string) +# print this out +print(recon_string) +``` + + ['The', 'Owl', 'and', 'the', 'Pussy-cat', 'went', 'to', 'sea', 'In', 'a', 'beautiful', 'pea-green', 'boat,', 'They', 'took', 'some', 'honey,', 'and', 'plenty', 'of', 'money,', 'Wrapped', 'up', 'in', 'a', 'five-pound', 'note.'] + The:Owl:and:the:Pussy-cat:went:to:sea:In:a:beautiful:pea-green:boat,:They:took:some:honey,:and:plenty:of:money,:Wrapped:up:in:a:five-pound:note. + + + +```python +# Answer +# the Hello there everyone example above has no spaces between the words. +# copy the code and modify it to have spaces. + +# generate a string called s +# and see how long it is + +# lets have a spacer variable +spacer = ' ' +quote = '"' +# add the spaces in +s = "Hello" + spacer + "there" + spacer + "everyone" +print ('the length of',quote+s+quote,'is',len(s)) + +# confirm that you get the expected increase in length. +# It is now 20 rather than 18 above +``` + + the length of "Hello there everyone" is 20 + + +#### Exercise + +* Insert a new cell below here +* copy the code above, and see what happens if you set `i` to be the value of length of the string. +* why does it respond in this way? + + +```python +# ANSWER + +# copy the code +string = 'hello' + +# length +slen = len(string) +print('length of', string, 'is', slen) + +# copy the code above, and see what happens if you set `i` to be the value of length of the string. 
+i = slen +print('character', i, 'of', string, 'is', string[i]) +``` + + length of hello is 5 + + + + --------------------------------------------------------------------------- + + IndexError Traceback (most recent call last) + + in + 10 # copy the code above, and see what happens if you set `i` to be the value of length of the string. + 11 i = slen + ---> 12 print('character', i, 'of', string, 'is', string[i]) + + + IndexError: string index out of range + + + +```python +# ANSWER + +# Why does it respond in this way? +msg = ''' + This fails with: + + IndexError: string index out of range + + because string[5] does not exist + as the length of string is 5: we can + only index from 0 to 4 +''' +print(msg) + +``` + + + This fails with: + + IndexError: string index out of range + + because string[5] does not exist + as the length of string is 5: we can + only index from 0 to 4 + + + +#### Exercise + +The example above allows us to access individual characters of the string. + +* Insert a new cell below here +* based on the example above, print the string starting from the default start value, up to the default stop value, in steps of `2`. This should be `HloWrd`. +* write code to print out the 4$^{th}$ letter (character) of the string `s`. This should be `l`. + + + +```python +# ANSWER + +s = "Hello World" +print (s,len(s)) + +# based on the example above, print the string starting +# from the default start value, up to the default stop value, in steps of `2`. + +# default start -> None +start = None +# default stop -> None +stop = None +skip = 2 +print (s[start:stop:skip]) +``` + + Hello World 11 + HloWrd + + + +```python +# ANSWER + +s = "Hello World" +# write code to print out the 4th letter (character) of the string s. +# index 3 is the 4th character !!! 
+print(s[3]) +``` + + l + diff --git a/docs/014_Python_groups 2.md b/docs/014_Python_groups 2.md new file mode 100644 index 00000000..d1e62eeb --- /dev/null +++ b/docs/014_Python_groups 2.md @@ -0,0 +1,391 @@ +# 014 Groups + +## Introduction + +### Purpose + +In this section we will learn how to use groups of objects in Python. + + +### Prerequisites + +You will need some understanding of the following: + + +* [001 Using Notebooks](001_Notebook_use.md) +* [005 Getting help](005_Help.md) +* [010 Variables, comments and print()](010_Python_Introduction.md) +* [011 Data types](011_Python_data_types.md) In particular, you should understand strings. +* [012 String formatting](012_Python_strings.md) +* [013_Python_string_methods](013_Python_string_methods.md) + +In the exercises below, make use of f-strings when building statements to print. + +### Timing + +The session should take around 30 minutes. + + + +## Groups of things +Very often, we will want to group items together. There are several main mechanisms for doing this in Python, known as: + +* string, e.g. `hello` +* tuple, e.g. `(1, 2, 3)` +* list, e.g. `[1, 2, 3]` + +A slightly different form of group is a dictionary: + +* dict, e.g. `{1:'one', 2:'two', 3:'three'}` + +You will notice that each of the grouping structures `tuple`, `list` and `dict` uses a different form of bracket and that quotes are used to bracket a string. + +We have seen that a string is an ordered collection in the material above, so will deal with the others here. + +We noted the concept of length (`len()`), and that elements of the ordered collection could be accessed via an index or slice. All of these same ideas apply to the first set of groups (string, tuple, list, numpy array) as they are all ordered collections. + +A dictionary is not (by default) ordered, however, so indices have no role. Instead, we use 'keys'. + +### `tuple` +A tuple is a group of items separated by commas. In the case of a tuple, the brackets are optional. 
+You can have a group of different types in a tuple (e.g. `int`, `float`, `str`, `bool`) + +#### Using a `tuple` + + +```python +# for spacing +dash = '\n----------' + +# load into the tuple + +t = (1, 2, 'three', False) + +# unload from the tuple +# notice we must have the same number of items +a,b,c,d = t + +print(t,dash) +print(a,b,c,d,dash) + +print('the type of t is',type(t)) +``` + + (1, 2, 'three', False) + ---------- + 1 2 three False + ---------- + the type of t is <class 'tuple'> + + +If there is only one element in a tuple, you must put a comma `,` at the end, otherwise it is not interpreted as a tuple: + + + + +```python +t = (1) +print (t,type(t)) +t = (1,) +print (t,type(t)) +``` + + 1 <class 'int'> + (1,) <class 'tuple'> + + +You can have an empty tuple though: + + + + +```python +t = () +print (t,type(t)) +``` + + () <class 'tuple'> + + +#### Exercise + +* create a tuple called `t` that contains the integers `1` to `5` inclusive +* print out the value of `t` +* use the tuple to set variables `a1`,`a2`,`a3`,`a4`,`a5` +* print `a1`,`a2`,`a3`,`a4`,`a5` + + +### `list` +A `list` is similar to a `tuple`. One main difference is that you can **change individual elements in a list but not in a tuple**. +To convert between a list and tuple, use the 'casting' methods `list()` and `tuple()`: + + +```python + +# a tuple +t0 = (1,2,3) + +# cast to a list +l = list(t0) + +# cast to a tuple +t = tuple(l) + +print('type of {} is {}'.format(t,type(t))) +print('type of {} is {}'.format(l,type(l))) +``` + + type of (1, 2, 3) is <class 'tuple'> + type of [1, 2, 3] is <class 'list'> + + +You can concatenate (join) lists or tuples with the `+` operator: + + + + +```python +l0 = [1,2,3] +l1 = [4,5,6] + +l = l0 + l1 +print ('joint list:',l) +``` + + joint list: [1, 2, 3, 4, 5, 6] + + +A common method associated with lists or tuples is: +* `index()` + + +```python +l0 = [2,8,4,32,16] + +# print the index of the item integer 4 +# in the tuple / list + +item_number = 4 + +# Note the dot . 
here +# as index is a method of the class list +ind = l0.index(item_number) + +# notice that this is different +# as len() is not a list method, but +# does operate on lists/tuples +# Note: do not use len as a variable name! +llen = len(l0) + +print(f'the index of {item_number} in {l0} is {ind}') +print(f'the length of the {type(l0)} {l0} is {llen}') +``` + + the index of 4 in [2, 8, 4, 32, 16] is 2 + the length of the <class 'list'> [2, 8, 4, 32, 16] is 5 + + +#### Exercise + +* copy the code to a new code block below, and test that this works with tuples, as well as lists + +#### Exercise + +* set a list called `l0` with `l0 = [2,8,4,32,16]` +* find the index of the integer 16 in the tuple/list +* what is the index of the first item? +* what is the length of the tuple/list? +* what is the index of the last item? + +A list has a much richer set of methods than a tuple. This is because we can add or remove list items (but not tuple items). + +* `insert(i,j)` : insert `j` before item `i` in the list +* `append(j)` : append `j` to the end of the list +* `sort()` : sort the list + +Recall from [005_Help](005_Help.md#Exercise) that `sort()` is an `in-place` operation, and remember the consequences of that. Notice that `insert()` and `append()` are also `in-place` operations. + +This list of methods suggests that tuples and lists are 'ordered' (i.e. they maintain the order they are loaded in) so that individual elements may be accessed through an 'index'. The index values start at 0 as we saw above. The index of the last element in a list/tuple is the length of the group, minus 1. This can also be referred to as index `-1`. + + +```python +l0 = [2,8,4,32,16] + +# insert 64 at the beginning (before item 0) +# Note that this inserts 'in place' +# i.e. 
the list is changed by calling this +l0.insert(0,64) + +# insert 128 *before* the last item (item -1) +l0.insert(-1,128) + +# append 256 on the end +l0.append(256) + +# copy the list and sort the copy +# Note the use of the copy() method here +# to create a copy because the in-place method +# will change l0 +l1 = l0.copy() + +# Note that this sorts 'in place' +# i.e. the list is changed by calling this +l1.sort() + +print(f'the list {l0} once sorted is {l1}') +``` + + the list [64, 2, 8, 4, 32, 128, 16, 256] once sorted is [2, 4, 8, 16, 32, 64, 128, 256] + + +#### Exercise + +* set a list called `l0` with `l0 = [2,8,4,32,16]` +* find the index of `16` in this list +* use this to insert the number `128` between the entries for `32` and `16` +* take a copy of `l0`, call it `l0_test` and insert the string `'hello world'` at index `-2` +* what positive index number could we have used in place of `-2` here? +* why? + +### multiple dimensional lists + +Lists and tuples are not limited to a single dimension. Sometimes we will want to define multi-dimensional lists, e.g.: + + +```python +x = [[1,2,3],[4,5,6,7],[9,[4,2]],8] +print(x) +print(x[1]) +print(x[1][2]) +``` + + [[1, 2, 3], [4, 5, 6, 7], [9, [4, 2]], 8] + [4, 5, 6, 7] + 6 + + +but the structure and parsing of this is now quite complicated. + +Using multiple dimensions is sometimes necessary, but can cause complications. Often we can find a simpler alternative. + +One time we may want them is in associating one list with another, for example: + + +```python +values = [1,2,3,4] +keys = ['one','two','three','four'] + +combined = [values,keys] +print(combined) +``` + + [[1, 2, 3, 4], ['one', 'two', 'three', 'four']] + + +This is a *regular* list, because the sub-lists all have the same length. + +### `dict` + + + +The collections we have used so far have all been ordered. This means that we can refer to a particular element in the group by an index, e.g. `array[10]`. 
+ +A dictionary is not (by default) ordered. Instead of indices, we use 'keys' to refer to elements: each element has a key associated with it. It can be very useful for data organisation (e.g. databases) to have a key to refer to, rather than e.g. some arbitrary column number in a gridded dataset. + +A dictionary is defined as a group in braces (curly brackets). For each element, we specify the key and then the value, separated by `:`. + + +```python +a = {'one': 1, 'two': 2, 'three': 3} + +# we then refer to the keys and values in the dict as: + +print (f'a:\n\t {a}') +print (f'a.keys():\n\t {a.keys()}') # the keys +print (f'a.values():\n\t {a.values()}') # returns the values +print (f'a.items():\n\t {a.items()}') # returns the (key, value) pairs as tuples +``` + + a: + {'one': 1, 'two': 2, 'three': 3} + a.keys(): + dict_keys(['one', 'two', 'three']) + a.values(): + dict_values([1, 2, 3]) + a.items(): + dict_items([('one', 1), ('two', 2), ('three', 3)]) + + +We refer to specific items using the key e.g.: + + +```python +print(a['one']) +``` + + 1 + + +You can add to a dictionary using the in-place method `update()`: + + +```python +help(dict.update) +``` + + Help on method_descriptor: + + update(...) + D.update([E, ]**F) -> None. Update D from dict/iterable E and F. + If E is present and has a .keys() method, then does: for k in E: D[k] = E[k] + If E is present and lacks a .keys() method, then does: for k, v in E: D[k] = v + In either case, this is followed by: for k in F: D[k] = F[k] + + + + +```python +a.update({'four':4,'five':5}) +print(a) + +# or for a single value +a['six'] = 6 +print(a) +``` + + {'one': 1, 'two': 2, 'three': 3, 'four': 4, 'five': 5} + {'one': 1, 'two': 2, 'three': 3, 'four': 4, 'five': 5, 'six': 6} + + +Quite often, you find that you have the keys you want to use in a dictionary as a list or array, and the values in another list. + +In such a case, we can use the function `zip(keys,values)` to load into the dictionary. 
For example: + + +```python +values = [1,2,3,4] +keys = ['one','two','three','four'] + +a = dict(zip(keys,values)) + +print(a) +``` + + {'one': 1, 'two': 2, 'three': 3, 'four': 4} + + +#### Exercise + +* create a list called `months` with the names of the months of the year +* create a list called `ndays` with the number of days in each month (for this year) +* confirm that the two lists have the same length (12) +* Use these two lists to make a dictionary called `days_in_month` with the key as month name and value as the number of days in that month. +* print out the dictionary and confirm it is as expected +* set a variable `m` to be the name of a month +* using `m` and your dictionary, print out the number of days in month `m` + +## Summary + +In this section, we have extended the types of data we might come across to include groups. We dealt with ordered groups of various types (`tuple`, `list`). We saw dictionaries as collections with which we refer to individual items with a key. We saw how we can use `zip()` to help load a dataset from arrays into a dictionary. + +You should know how to build, access and modify strings, lists, tuples and dictionaries. You should be very familiar with formatted print statements by now. 
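
As a brief recap of the summary above, here is a minimal sketch that builds, accesses and modifies a list, a tuple and a dictionary in one place. The names `site_names`, `site_heights` and `height_of`, and the values in them, are hypothetical and chosen only for illustration; they do not come from the course material.

```python
# Recap sketch: all names and values here are made up for illustration
site_names = ['north', 'south', 'east']   # a list: can be changed in place
site_heights = (120, 85, 60)              # a tuple: cannot be changed in place

# both are ordered, so len() and indexing apply
print(f'lengths: {len(site_names)} and {len(site_heights)}')
print(f'first site: {site_names[0]}, last site: {site_names[-1]}')

# add an item to the list in place ...
site_names.append('west')
# ... but for the tuple we must build a new, longer tuple
site_heights = site_heights + (42,)

# zip() pairs the items up and dict() loads the pairs
height_of = dict(zip(site_names, site_heights))
print(f'height_of: {height_of}')

# access, and modify, an item by its key
height_of['north'] = 125
print(f"north is now {height_of['north']}")
```

Note that the tuple is not modified here: `site_heights + (42,)` creates a new tuple, which is then given the old name.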
diff --git a/docs/014_Python_groups_answers 2.md b/docs/014_Python_groups_answers 2.md new file mode 100644 index 00000000..4f2d55de --- /dev/null +++ b/docs/014_Python_groups_answers 2.md @@ -0,0 +1,200 @@ +# 014 Groups : Answers to exercises + +#### Exercise + +* create a tuple called `t` that contains the integers `1` to `5` inclusive +* print out the value of `t` +* use the tuple to set variables `a1`,`a2`,`a3`,`a4`,`a5` +* print `a1`,`a2`,`a3`,`a4`,`a5` + + +```python +# ANSWER +# create a tuple called t that contains the integers 1 to 5 inclusive +t = (1,2,3,4,5) + +# print out the value of t +print(t) + +# use the tuple to set variables a1,a2,a3,a4,a5 +a1,a2,a3,a4,a5 = t +print(a1,a2,a3,a4,a5) +``` + + (1, 2, 3, 4, 5) + 1 2 3 4 5 + + +#### Exercise + +* copy the code to a new code block below, and test that this works with tuples, as well as lists + + +```python +# ANSWER + +# copy the code to the block below, and test that this works with tuples, as well as lists + +# use tuple +l0 = (2,8,4,32,16) + +# print the index of the item integer 4 +# in the tuple / list + +item_number = 4 + +# Note the dot . here +# as index is a method of the class tuple (as well as list) +ind = l0.index(item_number) + +# notice that this is different +# as len() is not a list method, but +# does operate on lists/tuples +# Note: do not use len as a variable name! +llen = len(l0) + +print(f'the index of {item_number} in {l0} is {ind}') +print(f'the length of the {type(l0)} {l0} is {llen}') +``` + + the index of 4 in (2, 8, 4, 32, 16) is 2 + the length of the <class 'tuple'> (2, 8, 4, 32, 16) is 5 + + +#### Exercise + +* set a list called `l0` with `l0 = [2,8,4,32,16]` +* find the index of the integer 16 in the tuple/list +* what is the index of the first item? +* what is the length of the tuple/list? +* what is the index of the last item? 
+ + +```python +# ANSWER + +# set a list called l0 with l0 = [2,8,4,32,16] +l0 = [2,8,4,32,16] + +# find the index of the integer 16 in the tuple/list +value = 16 +print(f'index of {value} in {l0} is {l0.index(value)}') + +# what is the index of the first item? +value = l0[0] +print(f'index of {value} in {l0} is {l0.index(value)}') + +# what is the length of the tuple/list? +print(f'length of {l0} is {len(l0)}') + +# what is the index of the last item? +last_item = len(l0) - 1 +value = l0[last_item] +print(f'index of {value} in {l0} is {l0.index(value)}') + +# or simply use -1, remembering that we can use negative indices +value = l0[-1] +print(f'index of {value} in {l0} is {l0.index(value)}') +``` + + index of 16 in [2, 8, 4, 32, 16] is 4 + index of 2 in [2, 8, 4, 32, 16] is 0 + length of [2, 8, 4, 32, 16] is 5 + index of 16 in [2, 8, 4, 32, 16] is 4 + index of 16 in [2, 8, 4, 32, 16] is 4 + + +#### Exercise + +* set a list called `l0` with `l0 = [2,8,4,32,16]` +* find the index of `16` in this list +* use this to insert the number `128` between the entries for `32` and `16` +* take a copy of `l0`, call it `l0_test` and insert the string `'hello world'` at index `-2` +* what positive index number could we have used in place of `-2` here? +* why? + + +```python +# ANSWER + +# set a list called `l0` with `l0 = [2,8,4,32,16]` +l0 = [2,8,4,32,16] + + +# find the index of `16` in this list +index_16 = l0.index(16) + +# insert the number `128` between the entries for `32` and `16` +l0.insert(index_16,128) +print(l0) + +# take a copy of `l0`, call it `l0_test` +# and insert the string `'hello world'` at index `-2` +l0_test = l0.copy() +l0_test.insert(-2,'hello world') +print(l0_test) + +# what positive index number could we have used in place of `-2` here +# the answer is 4 +l0_test = l0.copy() +l0_test.insert(4,'hello world') +print(l0_test) + +# why? 
+msg = ''' +since the length of l0 is 6 (when we copy it) +then -2 corresponds to the +ve index 6-2 = 4 +''' +print(msg) +``` + + [2, 8, 4, 32, 128, 16] + [2, 8, 4, 32, 'hello world', 128, 16] + [2, 8, 4, 32, 'hello world', 128, 16] + + since the length of l0 is 6 (when we copy it) + then -2 corresponds to the +ve index 6-2 = 4 + + + +#### Exercise + +* create a list called `months` with the names of the months of the year +* create a list called `ndays` with the number of days in each month (for this year) +* confirm that the two lists have the same length (12) +* Use these two lists to make a dictionary called `days_in_month` with the key as month name and value as the number of days in that month. +* print out the dictionary and confirm it is as expected +* set a variable `m` to be the name of a month +* using `m` and your dictionary, print out the number of days in month `m` + + +```python +# ANSWER +# create a list called `months` with the names of the months of the year +months = ["January","February","March","April","May",\ + "June","July","August","September","October",\ + "November","December"] +# create a list called `ndays` with the number of days in each month (for this year) +ndays = [31,29,31,30,31,30,31,31,30,31,30,31] + +# confirm that the two lists have the same length (12) +print(f'length of months: {len(months)}') +print(f'length of ndays: {len(ndays)}') + +# Use these two lists to make a dictionary called `days_in_month` +# with the key as month name and value as the number of days in that month. 
+days_in_month = dict(zip(months,ndays)) + +# print out the dictionary and confirm it is as expected +print(days_in_month) + +# set a variable `m` to be the name of a month +m = 'January' +print(f'The number of days in {m} is {days_in_month[m]}') +``` + + length of months: 12 + length of ndays: 12 + {'January': 31, 'February': 29, 'March': 31, 'April': 30, 'May': 31, 'June': 30, 'July': 31, 'August': 31, 'September': 30, 'October': 31, 'November': 30, 'December': 31} + The number of days in January is 31 + diff --git a/docs/015_Python_control 2.md b/docs/015_Python_control 2.md new file mode 100644 index 00000000..db0fa5a5 --- /dev/null +++ b/docs/015_Python_control 2.md @@ -0,0 +1,194 @@ +# 015 Control in Python: `if` + +## Introduction + +### Purpose + +In this section we will learn how to add conditional control to our codes. We will cover the conditional statement: `if`. + +### Prerequisites + +You will need some understanding of the following: + + +* [001 Using Notebooks](001_Notebook_use.md) +* [005 Getting help](005_Help.md) +* [010 Variables, comments and print()](010_Python_Introduction.md) +* [011 Data types](011_Python_data_types.md) +* [012 String formatting](012_Python_strings.md) +* [013_Python_string_methods](013_Python_string_methods.md) +* [014_Python_groups](014_Python_groups.md) + +### Timing + +The session should take around 30 minutes. + +## Comparison Operators and `if` + +### Comparison Operators + +A comparison operator 'compares' two terms (e.g. the contents of variables) and returns a boolean data type (`True` or `False`). + +For example, to see if the value of some variable `a` has 'the same value as' ('equivalent to') the value of some variable `b`, we use the equivalence operator (`==`). 
To test for non equivalence, we use the not equivalent operator `!=` (read the `!` as 'not'): + + + +```python +a = 100 +b = 10 +# +# These are *not* the same, so we expect +# a == b : False + +# Note the use of \n and \t in here +# from 010 for formatting +print (f'a is {a} and b is {b}') +print (f'a is equivalent to b? {a == b}') +``` + + a is 100 and b is 10 + a is equivalent to b? False + + +#### Exercise + +* insert a new cell below here. Use f-strings in forming strings. +* copy the code above +* add a `print` statement to your code that tests for non equivalence of `a` and `b` +* repeat this in a new cell, but now change the values (or type) of the variables `a` and `b` to `float` or `bool` + + +A fuller set of comparison operators allows greater or less than tests: + +|symbol| meaning| +|:---:|:---:| +| == | is equivalent to | +| != | is not equivalent to | +| > | greater than | +|>= | greater than or equal to| +|< | less than| +|<= | less than or equal to | + +so that, for example: + + +```python +# Comparison examples + +# is one plus one equal to two? +print (f'1 + 1 == 2 : {1 + 1 == 2}') + +# is one less than or equal to 0.999? +print (f'1 <= 0.999 : {1 <= 0.999}') + +# is one plus one not equal to two? +print (f'1 + 1 != 2 : {1 + 1 != 2}') + +# is 100 less than 2? +print (f'100 < 2 : {100 < 2}') + +``` + + 1 + 1 == 2 : True + 1 <= 0.999 : False + 1 + 1 != 2 : False + 100 < 2 : False + + +#### Exercise + +* insert a new cell below here +* create variables `a` and `b` and set them to types and values of your choice +* create a variable called `gt_test` and set it to the result of `a > b` +* print a statement of what you have used, and the value of `gt_test` +* explain why you get the result you do + +### Conditional test: `if ... elif ... else ...` + +A common use of comparisons is for program control, using an `if` statement of the form: + + if condition1: + # do this 1 + doit1() + ... + elif condition2: + # do this 2 + doit2() + ... 
+ else: + # do this 3 + doit3() + ... + + +Implicit in these statements is that the conditions return `True` to pass the tests, i.e. we could more fully write: + + if condition1 == True: + # do this 1 + ... + +This form of conditional statement allows us to run blocks of code *only under a particular condition* (or set of conditions). + +In Python, the statement(s) we run on condition are *indented*. + +The indent can be one or more spaces or a `<tab>` character, the choice is up to the programmer. However, it **must be consistent**. It is generally best to use spaces rather than tab characters, as it is all too easy to mistake one for the other. + +Pay attention to indentation in conditional statements. Getting it wrong is one of the more common errors new Python coders make. + + +```python +test = 3 +print('test result is',test) + +# initialise retval +retval = None + +# conduct some tests, and set the +# variable retval to True if we pass +# any test + +if test >= 1: + retval = True + print('passed test 1: "if test >= 1"') +elif test == 0: + retval = True + print('passed test 2: "if test == 0"') +else: + retval = False + print('failed both tests') + +print('retval is',retval) +``` + + test result is 3 + passed test 1: "if test >= 1" + retval is True + + +#### Exercise + +* insert a new cell below here +* set a variable `doy` to represent the day of year and initialise it to some integer between 1 and 365 inclusive +* set a variable `month` to be `'January'` +* set a variable `year` to be `'2020'` +* Write a series of conditional statements that set the variable `month` to the correct month for the value of `doy` +* Print the month for the given doy +* Test that you get the right result for several `doy` values + +You should assume that `doy` value `1` represents January 1st. + +You might find a [DOY calendar](https://www.esrl.noaa.gov/gmd/grad/neubrew/Calendar.jsp) helpful here. 
+ +![DOY calendar](images/doycal.png) + +We will see later that this is not the best way to do calculations of this sort. First, it is all too easy to make mistakes in typing in both the `doy` boundaries and the month names. Second, it is not very flexible: for instance, consider how you would need to change it for leap or non-leap years. Third, it is not at all [pythonic](https://stackoverflow.com/questions/25011078/what-does-pythonic-mean#:~:text=Pythonic%20means%20code%20that%20doesn,is%20intended%20to%20be%20used.), i.e. doesn't make use of the features of Python that could make it clear and concise. + +That said, it is an easily-understandable exercise to try out using conditional statements. + +## Summary + +We should now know how to use `if` statements in Python to control program flow. We can make choices as to what happens in the code, depending on whether or not one or more tests are passed. This is a common feature of all coding languages, but it is important here that you get used to doing this in Python. + +We know that the code blocks controlled by `if` statements are indented in Python, and we know to be careful in our use of this. + +There are additional notes in [docs.python.org](https://docs.python.org/3/tutorial/controlflow.html#the-range-function) you can follow up to deepen your understanding of these topics. diff --git a/docs/015_Python_control_answers 2.md b/docs/015_Python_control_answers 2.md new file mode 100644 index 00000000..46ecba3c --- /dev/null +++ b/docs/015_Python_control_answers 2.md @@ -0,0 +1,206 @@ +# 015 Control in Python: `if` : Answers to exercises + +#### Exercise + +* insert a new cell below here. Use f-strings in forming strings. 
+* copy the code above +* add a `print` statement to your code that tests for non equivalence of `a` and `b` +* repeat this in a new cell, but now change the values (or type) of the variables `a` and `b` to `float` or `bool` + + + +```python +# ANSWER + +# copy the code above +a = 100 +b = 10 +# +# These are *not* the same, so we expect +# a == b : False + +# Note the use of \n and \t in here +# from 010 for formatting +print (f'a is {a} and b is {b}') +print (f'a is equivalent to b? {a == b}') + +# add a print statement to your code that tests +# for non equivalence of a and b +print (f'a is not equivalent to b? {a != b}') +``` + + a is 100 and b is 10 + a is equivalent to b? False + a is not equivalent to b? True + + + +```python +# ANSWER + +# repeat this in a new cell, but now change the values +# (or type) of the variables `a` and `b` to `float` or `bool` +# FLOAT + +# copy the code above +a = 100.0 +b = 10.0 +# +# These are *not* the same, so we expect +# a == b : False + +# Note the use of \n and \t in here +# from 010 for formatting +print (f'a is {a} and b is {b}') +print (f'a is equivalent to b? {a == b}') + +# add a print statement to your code that tests +# for non equivalence of a and b +print (f'a is not equivalent to b? {a != b}') +``` + + a is 100.0 and b is 10.0 + a is equivalent to b? False + a is not equivalent to b? True + + + +```python +# ANSWER + +# repeat this in a new cell, but now change the values +# (or type) of the variables `a` and `b` to `float` or `bool` +# BOOL + +# copy the code above +a = True +b = True +# +# These *are* the same, so we expect +# a == b : True + +# Note the use of \n and \t in here +# from 010 for formatting +print (f'a is {a} and b is {b}') +print (f'a is equivalent to b? {a == b}') + +# add a print statement to your code that tests +# for non equivalence of a and b +print (f'a is not equivalent to b? {a != b}') +``` + + a is True and b is True + a is equivalent to b? True + a is not equivalent to b? 
False + + +#### Exercise + +* insert a new cell below here +* create variables `a` and `b` and set them to types and values of your choice +* create a variable called `gt_test` and set it to the result of `a > b` +* print a statement of what you have used, and the value of `gt_test` +* explain why you get the result you do + + +```python +# ANSWER + +# create variables a and b and set them to values of your choice +# here, we choose int values 2 and 4 respectively +a = 2 +b = 4 + +# create a variable called `gt_test` and set it to the result of `a > b` +gt_test = a > b + +# print the statement you have used, and the value of `gt_test` +print(f'a > b test for a = {a} and b = {b} : {gt_test}') + +# explain why you get the result you do +msg = ''' + explain why you get the result you do + + gt_test is False here, because the statement that a > b + is not True since a is 2 and b is 4 +''' +print(msg) +``` + + a > b test for a = 2 and b = 4 : False + + explain why you get the result you do + + gt_test is False here, because the statement that a > b + is not True since a is 2 and b is 4 + + + +#### Exercise + +* insert a new cell below here +* set a variable `doy` to represent the day of year and initialise it to some integer between 1 and 365 inclusive +* set a variable `month` to be `'January'` +* set a variable `year` to be `'2020'` +* Write a series of conditional statements that set the variable `month` to the correct month for the value of `doy` +* Print the month for the given doy +* Test that you get the right result for several `doy` values + +You should assume that `doy` value `1` represents January 1st. + +You might find a [DOY calendar](https://www.esrl.noaa.gov/gmd/grad/neubrew/Calendar.jsp) helpful here. 
+ +![DOY calendar](images/doycal.png) + + +```python +# ANSWER +# set a variable doy to represent the day of year and +# initialise it to some integer between 1 and 365 inclusive +doy = 230 + +# set a variable month to be 'January' +month = 'January' + +# set a variable year to be '2020' +year = '2020' + +# Write a series of conditional statements that set the +# variable month to the correct month for the value of doy +if ( doy < 1 ) or (doy > 366): + # good to catch errors + # doy is an int, so convert it with str() before joining + month = 'out of bounds error: doy='+str(doy) +elif ( doy <= 31 ): + month = 'January' +elif ( doy <= 60 ): + month = 'February' +elif ( doy <= 91 ): + month = 'March' +elif ( doy <= 121 ): + month = 'April' +elif ( doy <= 152 ): + month = 'May' +elif ( doy <= 182 ): + month = 'June' +elif ( doy <= 213 ): + month = 'July' +elif ( doy <= 244 ): + month = 'August' +elif ( doy <= 274 ): + month = 'September' +elif ( doy <= 305 ): + month = 'October' +elif ( doy <= 335 ): + month = 'November' +else: + # it must be December ! + month = 'December' + +# Print the month for the given doy +print(f'for doy {doy} year {year} the month is {month}') + +# Test that you get the right result for several doy values +``` + + for doy 230 year 2020 the month is August + diff --git a/docs/016_Python_for 2.md b/docs/016_Python_for 2.md new file mode 100644 index 00000000..b89b4041 --- /dev/null +++ b/docs/016_Python_for 2.md @@ -0,0 +1,330 @@ +# 016 More control in Python: `for` + +## Introduction + +### Purpose + +In this section we will learn how to add more control to our code by using loops. We will mainly be using the `for` statement for this. We will also learn about the use of `assert` statements to check our code is operating as intended. 
+ +### Prerequisites + +You will need some understanding of the following: + + +* [001 Using Notebooks](001_Notebook_use.md) +* [005 Getting help](005_Help.md) +* [010 Variables, comments and print()](010_Python_Introduction.md) +* [011 Data types](011_Python_data_types.md) +* [012 String formatting](012_Python_strings.md) +* [013_Python_string_methods](013_Python_string_methods.md) +* [014_Python_groups](014_Python_groups.md) +* [015_Python_control](015_Python_control.md) + +### Timing + +The session should take around 40 minutes. + +## Looping with `for` + +### `for ... in ...` + +Very commonly, we need to iterate or 'loop' over some set of items. + +The basic structure for doing this (in Python, and many other languages) is `for item in group:`, where `item` is the name of some variable and `group` is a set of values. + +The loop is run so that `item` takes on the first value in `group`, then the second, etc. Notice in the code below that the expressions inside the loop use indentation to indicate the loop. As when we discussed indentation in `if` statements, be careful to align your statements or the code will fail. + + +```python +''' +for loop +''' + +group = [4,3,2,1] + +for item in group: + # print item in loop + print(item) + +print ('blast off!') +``` + + 4 + 3 + 2 + 1 + blast off! + + +The `group` in this example is the list of integer numbers `[4,3,2,1]`. A `list` is a group of comma-separated items contained in square brackets `[]` as we have seen before. + +In Python, the statement(s) we run whilst looping (here `print(item)`) are *indented*. + +The indent can be one or more spaces, the choice is up to the programmer. You can use `<tab>` but should probably avoid it. Whatever you use, it **must be consistent**. We suggest you use 4 spaces. 
+ +It is important to note the difference between the code above and: + + +```python +''' +for loop +''' + +group = [4,3,2,1] + +for item in group: + # print item in loop + print(item) + print ('blast off!') +``` + + 4 + blast off! + 3 + blast off! + 2 + blast off! + 1 + blast off! + + +In the second case, we have the `print ('blast off!')` statement inside the loop as it is indented. So it is executed each time we are in the loop. In the first case, it is outside the loop and is only run once the loop is completed. + +#### Exercise + +* generate a list of strings called `group` with the names of (some of) the items in your pocket or bag (or make some up!) +* set up a `for` loop with `group`, setting the variable `item` +* within the loop, print each value of item in turn +* at the end of the loop, print `I'm done` + +Quite often, we want to keep track of the 'index' of the item in the loop (the 'item number'). + +One way to do this would be to use a variable (called `count` here). + +Before we enter the loop, we initialise the `count` to zero. Then, within the loop, we would need to increment `count` by one each time (i.e. add `1` to `count`): + + +```python +''' +for loop with enumeration +''' + +group = ['cat', 'fish', '🦄', 'house'] + +# Before we enter the loop, we initialise the `count` to zero. +count = 0 + +for item in group: + # print the count value and item + print(f'count: {count} : {item}') + # increment count by 1 + count += 1 + +``` + + count: 0 : cat + count: 1 : fish + count: 2 : 🦄 + count: 3 : house + + +#### Exercise + +* copy the code above +* check to see if the value of `count` at the end of the loop is the same as the length of the list. +* Why should this be so? + +## `range()` + +If we want to use an index to count explicitly, we can use the `range()` function. The arguments of this are `(stop)`, `(start,stop)` or `(start,stop,step)`.
If not specified, the default value of `start` is `0`, and of `step`, `1`, so `range(10)` is equivalent to `range(0,10,1)`. + +The function returns a `range` object, which is similar to a `list` but produces its values one at a time, on demand. We generally use it in a `for` loop or similar structure. It produces integers starting at `start`, up to (but not including) `stop`, in steps of `step`. + +For example: + + + +```python +# (0,6,1) -> 0 to 6 (but not 6) in steps of 1 +for i in range(6): + print(i) +``` + + 0 + 1 + 2 + 3 + 4 + 5 + + + +```python +# (2,10,2) -> 2 to 10 (but not 10) in steps of 2 +for i in range(2,10,2): + print(i) +``` + + 2 + 4 + 6 + 8 + + +#### Exercise + +* use `range()` to print numbers counting down from 10 to 1 (**inclusive**) +* include comments to explain your answer + +## `enumerate()` + +Since counting in loops is a common task, we can use the built-in function [`enumerate()`](https://docs.python.org/3/library/functions.html#enumerate) to achieve the same thing as above. + +The syntax to achieve the same as the code above is then: + + +```python +''' +for loop with enumerate() +''' +group = ['hat','dog','keys'] + +for count,item in enumerate(group): + # print counter in loop + print(f'item {count} is {item}') + +``` + + item 0 is hat + item 1 is dog + item 2 is keys + + +#### Exercise + +* copy the code above +* as in the previous exercise, check to see if the value of `count` at the end of the loop is the same as the length of the list. +* Explain why you get the result you do + +## looping over dictionaries, and `assert` + +Let's set up a dictionary with the names of the months as keys, and the number of days in each month as the value. + +We will introduce a new term `assert` to test that the lengths of the lists are equal before we proceed. This takes the form: + + assert statement + +If statement is `True`, the assertion passes (the code flow continues).
If it is `False`, an `AssertionError` is raised and the code execution will stop at that point. It is very useful for error checking. + + +```python +''' +Using the months exercise from 014_Python_groups + +first construct the dictionary we want: days_in_month +''' +months = ["January","February","March","April","May",\ + "June","July","August","September","October",\ + "November","December"] +# create a list called `ndays` with the number of days in each month (for this year) +ndays = [31,29,31,30,31,30,31,31,30,31,30,31] + +# now use assert to test if the lengths are equal: +# we will do that by making the statement: +# len(months) == len(ndays) +# which can be True or False + +assert len(months) == len(ndays) + +# Use these two lists to make a dictionary called `days_in_month` +# with the key as month name and value as the number of days in that month. +days_in_month = dict(zip(months,ndays)) + +# what are the keys +print(f'the keys are: {days_in_month.keys()}') +``` + + the keys are: dict_keys(['January', 'February', 'March', 'April', 'May', 'June', 'July', 'August', 'September', 'October', 'November', 'December']) + + +The `for ... in ...` structure we saw above applies to any group of items (or more formally, any [iterable item](https://www.w3schools.com/python/python_iterators.asp)). How could we apply this to looping over dictionaries for instance?
+ +One straightforward way would be to simply loop over the dictionary keys: + + +```python +''' +loop over the keys and print key and value +''' +for k in days_in_month.keys(): + d = days_in_month[k] + print(f'Month {k} has {d} days') +``` + + Month January has 31 days + Month February has 29 days + Month March has 31 days + Month April has 30 days + Month May has 31 days + Month June has 30 days + Month July has 31 days + Month August has 31 days + Month September has 30 days + Month October has 31 days + Month November has 30 days + Month December has 31 days + + +This works fine, but we can simplify the structure by looping over the iterable object `items()` instead of `keys()`. + + +```python +print(list(days_in_month.items())) +``` + + [('January', 31), ('February', 29), ('March', 31), ('April', 30), ('May', 31), ('June', 30), ('July', 31), ('August', 31), ('September', 30), ('October', 31), ('November', 30), ('December', 31)] + + +`items()` returns a set of tuples containing (`key`, `value`). So we can directly loop over that to have the much simpler code: + + +```python +''' +use items +''' +for k,d in days_in_month.items(): + print(f'Month {k} has {d} days') +``` + + Month January has 31 days + Month February has 29 days + Month March has 31 days + Month April has 30 days + Month May has 31 days + Month June has 30 days + Month July has 31 days + Month August has 31 days + Month September has 30 days + Month October has 31 days + Month November has 30 days + Month December has 31 days + + +#### Exercise + +* set up a list of numbers (years) from 2008 to 2019 **inclusive**, +* set up a list of corresponding Chinese zodiac names as the items (look [online](https://www.chinahighlights.com/travelguide/chinese-zodiac/#:~:text=In%20order%2C%20the%2012%20Chinese,a%20year%20of%20the%20Rat.) for this information).
+* check that the lists have the same length +* form a dictionary from the two lists, using `dict(zip())` as in the examples above +* use `.items()` as above to loop over each year, and print the year name and the zodiac name with an f-string of the form: `f'{y} is the year of the {z}'`, assuming `y` is the key and `z` the item. +* Describe what you are doing at each step + +## Summary + +We should now know how to use `for` loops in Python to control program flow. We can repeat a block of code for each item in a group, such as a list or a dictionary. This is a common feature of all coding languages, but it is important here that you get used to doing this in Python. + +We know that the statements inside a `for` loop use indentation in Python, and we know to be careful in our use of this. We have learnt about `enumerate()` and `range()`. + +We have also seen the use of `assert` to do some checking that our code is correct. + +There are additional notes in [docs.python.org](https://docs.python.org/3/tutorial/controlflow.html#the-range-function) you can follow up to deepen your understanding of these topics. You can get more practice with `assert` at [w3schools](https://www.w3schools.com/python/ref_keyword_assert.asp). diff --git a/docs/016_Python_for_answers 2.md b/docs/016_Python_for_answers 2.md new file mode 100644 index 00000000..90d4a0e3 --- /dev/null +++ b/docs/016_Python_for_answers 2.md @@ -0,0 +1,264 @@ +# 016 More control in Python: `for` : Answers to exercises + +#### Exercise + +* generate a list of strings called `group` with the names of (some of) the items in your pocket or bag (or make some up!) +* set up a `for` loop with `group`, setting the variable `item` +* within the loop, print each value of item in turn +* at the end of the loop, print `I'm done` + + +```python +''' +# ANSWER + +for loop +''' +# generate a list of strings called `group` with the names +# of (some of) the items in your pocket or bag (or make some up!)
+group = ['keys','hat','💄','🌴'] + +# set up a `for` loop with `group`, +# setting the variable `item` +for item in group: + # within the loop, print + # each value of item in turn + print(item) + +# at the end of the loop, print done +# note the quote types +print("I'm done") +``` + + keys + hat + 💄 + 🌴 + I'm done + + +#### Exercise + +* copy the code above +* check to see if the value of `count` at the end of the loop is the same as the length of the list. +* Why should this be so? + + +```python +''' +# ANSWER + +for loop with enumeration +''' + +# copy the code above +group = ['cat', 'fish', '🦄', 'house'] + +# Before we enter the loop, we initialise the `count` to zero. +count = 0 + +for item in group: + # print the count value and item + print(f'count: {count} : {item}') + # increment count by 1 + count += 1 + +# check to see if the value of `count` at the end +# of the loop is the same as the length of the list. +print('-'*10) +print(f'count is now {count}') +print(f'the length of the list group is {len(group)}') + +msg = ''' + Why should this be so? + + There are 4 items in the list group. + We initially set count to be 0, then add 1 to it + after we print each item in the for loop. So, after the + first item, it is 1, then 2 etc. + + At the end of all 4 items, count will then be 4, the length + of the list we looped over +''' +print(msg) +``` + + count: 0 : cat + count: 1 : fish + count: 2 : 🦄 + count: 3 : house + ---------- + count is now 4 + the length of the list group is 4 + + Why should this be so? + + There are 4 items in the list group. + We initially set count to be 0, then add 1 to it + after we print each item in the for loop. So, after the + first item, it is 1, then 2 etc. 
+ + At the end of all 4 items, count will then be 4, the length + of the list we looped over + + + +#### Exercise + +* use `range()` to print numbers counting down from 10 to 1 (**inclusive**) +* include comments to explain your answer + + +```python +# ANSWER +# use range() to print numbers counting down from 10 to 1 (inclusive) +for i in range(10,0,-1): + print(i) +# include comments to explain your answer +msg = ''' +from the instructions, it is clear that start is 10 +end should be 0, since the count is only up to (but not including) +this value. + +To count down, we use a step of -1 +''' +print(msg) +``` + + 10 + 9 + 8 + 7 + 6 + 5 + 4 + 3 + 2 + 1 + + from the instructions, it is clear that start is 10 + end should be 0, since the count is only up to (but not including) + this value. + + To count down, we use a step of -1 + + + +#### Exercise + +* copy the code above +* as in the previous exercise, check to see if the value of `count` at the end of the loop is the same as the length of the list. +* Explain why you get the result you do + + +```python +''' +# ANSWER + +for loop with enumerate() +''' + +# copy the code above +group = ['hat','dog','keys'] + + +for count,item in enumerate(group): + # print counter in loop + print(f'item {count} is {item}') + +# as in the previous exercise, +# check to see if the value of `count` +# at the end of the loop is the same as the length of the list. +print('-'*10) +print(f'count is now {count}') +print(f'the length of the list group is {len(group)}') + +msg = ''' + Explain why you get the result you do + + There are 3 items in the list group. + When we use enumerate to loop over the list, + count is set to the index of the current item (starting at 0) each time we enter the loop. + In the previous example, it was incremented after + the print statement.
+ + So now, at the end of all 3 items, count will only be 2, the length + of the list we looped over, minus 1 +''' +print(msg) +``` + + item 0 is hat + item 1 is dog + item 2 is keys + ---------- + count is now 2 + the length of the list group is 3 + + Explain why you get the result you do + + There are 3 items in the list group. + When we use enumerate to loop over the list, + count is set to the index of the current item (starting at 0) each time we enter the loop. + In the previous example, it was incremented after + the print statement. + + So now, at the end of all 3 items, count will only be 2, the length + of the list we looped over, minus 1 + + + +#### Exercise + +* set up a list of numbers (years) from 2008 to 2019 **inclusive**, +* set up a list of corresponding Chinese zodiac names as the items (look [online](https://www.chinahighlights.com/travelguide/chinese-zodiac/#:~:text=In%20order%2C%20the%2012%20Chinese,a%20year%20of%20the%20Rat.) for this information). +* check that the lists have the same length +* form a dictionary from the two lists, using `dict(zip())` as in the examples above +* use `.items()` as above to loop over each year, and print the year name and the zodiac name with an f-string of the form: `f'{y} is the year of the {z}'`, assuming `y` is the key and `z` the item. +* Describe what you are doing at each step + + +```python +# ANSWER + +# Set up list of numbers (years) from 2008 to 2019 inclusive, +years = [2008,2009,2010,2011,2012,2013,2014,2015,2016,2017,2018,2019] +# set up a list of corresponding chinese zodiac names as the items +# (look online for this information).
+zodiac = ['rat', 'ox', 'tiger', 'rabbit', \ + 'dragon', 'snake', 'horse', 'goat',\ + 'monkey','rooster','dog','pig'] + +# check that the lists have the same length +assert len(years) == len(zodiac) + +# form a dictionary from the two lists, using dict(zip()) as in the examples above +# we want years as the key and zodiac as the items, so we use zip(years,zodiac) +# then convert (cast) it to a dictionary called zodiacYear +zodiacYear = dict(zip(years,zodiac)) + +# use .items() as above to loop over each year, and +# print the year name and the zodiac name +# with an f-string of the form: `f'{y} is the year of the {z}'` +# assuming y is the key and z the item. + +# do the loop so that y is the key and z the item +for y,z in zodiacYear.items(): + print(f'{y} is the year of the {z}') + +# it prints the results fine +``` + + 2008 is the year of the rat + 2009 is the year of the ox + 2010 is the year of the tiger + 2011 is the year of the rabbit + 2012 is the year of the dragon + 2013 is the year of the snake + 2014 is the year of the horse + 2015 is the year of the goat + 2016 is the year of the monkey + 2017 is the year of the rooster + 2018 is the year of the dog + 2019 is the year of the pig + diff --git a/docs/017_Functions 2.md b/docs/017_Functions 2.md new file mode 100644 index 00000000..68cc5c03 --- /dev/null +++ b/docs/017_Functions 2.md @@ -0,0 +1,668 @@ +# Functions in Python + +## Introduction + +### Purpose + +In this session, we will learn about Functions in Python. In essence, a function allows us to write better, more compact and re-usable code. This is a concept we will use a lot in later sessions, so make sure you fully familiarise yourself with the material. 
+ + +### Prerequisites + +You will need some understanding of the following: + +* [001 Using Notebooks](001_Notebook_use.md) +* [005 Getting help](005_Help.md) +* [010 Variables, comments and print()](010_Python_Introduction.md) +* [011 Data types](011_Python_data_types.md) +* [012 String formatting](012_Python_strings.md) +* [013_Python_string_methods](013_Python_string_methods.md) +* [014_Python_groups](014_Python_groups.md) +* [015_Python_control](015_Python_control.md) +* [016_Python_for](016_Python_for.md) + +In particular, you will need to recall how to use: + + - [`assert`](016_Python_for.md#looping-over-dictionaries,-and-assert) + - [`str.join()`](013_Python_string_methods.md#split()-and-join()) + - [`zip`](014_Python_groups.md#dict) + +### Timing + +The session should take around 30 minutes to go through the first time. The exercises will take longer than that. + + +## Introduction to Functions + +A [function](https://docs.python.org/3/glossary.html#term-function) is a block of code statements that we can use to carry out a specific purpose. + + +The simplest form of function has no inputs or outputs, but simply performs some task when we call it: + +![function](images/no_in_out.png) + + +An example of a simple function in Python is: + + +```python +def hello_world(): + ''' + Purpose: + print the string 'hello world' + ''' + print('hello world') +``` + +This is designed to print the string `hello world` when we call it. + +Notice the formatting here: the function is declared: + + def hello_world(): + +and the *contents* of the function are indented in (by 4 spaces here). 
+ +We use the function in Python code as: + + +```python +hello_world() +``` + + hello world + + +and access the function document string by: + + +```python +help(hello_world) +``` + + Help on function hello_world in module __main__: + + hello_world() + Purpose: + print the string 'hello world' + + + +### Exercise + +* in a new code cell below, write a function called `my_name` that prints your name +* demonstrate that your code works (i.e. run it in a code cell) +* show the doc string using `help()` + +**Advice**: make sure it has an appropriate document string, based on the example in the notes, and also check that you have the indentation correct for the code in the function. Notice the colon `:` at the end of the `def` statement. + +## Function specification + + +More generally, we could think of the function as **a sort of filter**: it takes some **inputs** (specified in the arguments), makes some calculation based on these, i.e. that is a *function* of these inputs, and returns an **output**. + +![function](images/in_out.png) + + + +In this sense: + + * It will generally have one or more [arguments](https://docs.python.org/3/glossary.html#argument): `(arg1, arg2, ...)` that form the **inputs**. + * It will often return some value (or set of values) as the **output**: `retval` + * It will have a name: `my_function` + +![function io](images/im_funct.png) + +### Anatomy of a function + +The format of a function in Python is: + + def my_function(arg1,arg2,...): + ''' + Document string + ''' + + # comments + + retval = ... + + # return + return retval + +The keyword `def` defines a function, followed by the function name, a list (actually, a [`tuple`](https://docs.python.org/3/library/stdtypes.html?highlight=tuple#tuple)) of arguments, then a colon `:`. + +The contents of the function are indented to a consistent level of spaces. + +The function will typically have a document string, generally a multi-line string defined within triple quotes.
We use this to document information about the function, such as its author, purpose, and inputs and outputs. + +Within the function, we can refer to the arguments (`arg1` and `arg2` here, though they will generally have more meaningful names), make some calculation based on these, and generally, return some value (`retval` here). + +### Code design + +This idea of a *filter* can be useful when thinking how to design a function. We can see that we need to define: + + * purpose + * inputs + * output + +Let's suppose we need to design a function that will take a first name and last name, and combine them into your full name (assuming for now that you have two names). + +The *purpose* of our function could be stated as: + + purpose: + + generate a name string from list of strings + +The inputs could be: + + inputs: + - name_list : list of names + +And the output: + + return: + - the full name + +Without knowing any real coding then, we could develop the template for this function, along with an initial document string. + +We do need to give the function a name, so let's use `full_name` here. + +We have started with the idea of some purpose for our code, then defined what the expected inputs and outputs would be. We can call coding at that level of generalisation [pseudocode](https://en.wikipedia.org/wiki/Pseudocode). We could have written our task in a form of pseudocode such as: + + algorithm full_name is + input: List of strings in variable name_list + output: string in variable retval + + purpose: generate a name string from list of strings + + # CODE BLOCK to achieve aim (NOT DONE) + # test by passing input to output + retval = name_list + + where we have left the `CODE BLOCK` blank at the moment, and replaced it by simply sending the function input to the output so we can test the code structure.
It can be of value when designing codes to first develop some pseudocode such as above, but in reality such statements are very closely related to what we would write in high-level codes like Python: + + +```python +def full_name(name_list): + ''' + + purpose: + generate a name string from list of strings + + inputs: + - name_list : list of names + + return: + - the full name + ''' + # CODE BLOCK to achieve aim (NOT DONE) + # test by passing input to output + retval = name_list + + # return + return retval +``` + +That's a good start, and it allows us to develop a function that we can run and test. + +To test, we can set a list of example strings. We then *call* the function `full_name()` with this argument, and set the value returned in the variable `full`. + + +```python +names = ['Fred','Bloggs'] + +full = full_name(names) +print(full) +``` + + ['Fred', 'Bloggs'] + + +From our test, we can see that the function doesn't yet achieve what we wanted: it simply returns the input list, rather than the full name. + +To proceed, we need to know how to make a combined string. It can be useful to test our understanding of the code we will need to achieve the aim of the function. We do not need to do that inside the function, but can instead try to think of some examples we could use to test the ideas. + +One way to achieve the aim of the function would be to use the string [`join`](https://docs.python.org/3/library/stdtypes.html#str.join) operation that we came across in the [Python string methods](013_Python_string_methods.md#split()-and-join()) notes. + +This works by placing a key string between string items in a list.
For example, if we want to separate strings by `:`, we would use: + + ':'.join(names) + + +```python +':'.join(names) +``` + + + + + 'Fred:Bloggs' + + + +In our function, we want to use a single 'whitespace' value, so `' '` as the key: + + +```python +' '.join(names) +``` + + + + + 'Fred Bloggs' + + + +Now we are sure of the coding concept to achieve what we want in the filter, we can write the function: + + +```python +def full_name(name_list): + ''' + + purpose: + generate a name string from list of strings + + inputs: + - name_list : list of names + + return: + - the full name + ''' + # join the names in name_list together + retval = ' '.join(name_list) + + # return + return retval +``` + +we try to make the docstring useful and test what it shows: + + +```python +help(full_name) +``` + + Help on function full_name in module __main__: + + full_name(name_list) + purpose: + generate a name string from list of strings + + inputs: + - name_list : list of names + + return: + - the full name + + + +then run our code: + + +```python +full = full_name(['Fred','Bloggs']) +print(full) +``` + + Fred Bloggs + + +## Test + +It is a good idea if we can write a test for our function. This should cover some typical case or cases, and check that we get the correct output for a particular input. We can use the [assert](https://www.w3schools.com/python/ref_keyword_assert.asp) statement that we have seen in the [Python for](016_Python_for.md#looping-over-dictionaries,-and-assert) notes: + + assert True + +For example: + + +```python +assert full_name(['Fred','Bloggs']) == "Fred Bloggs" +print('test passed') +``` + + test passed + + +Remember that if this assertion fails, we get an `AssertionError` (you can try that out by putting something incorrect in the assertion above and re-running the cell). If the error is raised, our code will stop running and report the error.
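
As a minimal sketch of the failing case described above, the cell below repeats the `full_name()` function from these notes and deliberately asserts something incorrect. The `try` ... `except` wrapper, the variable `failed` and the message text are not part of the notes so far; they are assumptions used here only so that the cell keeps running and can report what happened.

```python
def full_name(name_list):
    '''
    purpose:
      generate a name string from list of strings
    '''
    # join the names in name_list together
    return ' '.join(name_list)

# a correct assertion passes silently
assert full_name(['Fred', 'Bloggs']) == 'Fred Bloggs'

# a deliberately incorrect assertion raises AssertionError
# (try/except is used only so this sketch keeps running)
failed = False
try:
    assert full_name(['Fred', 'Bloggs']) == 'Fred:Bloggs', 'names not joined as expected'
except AssertionError as err:
    failed = True
    print(f'AssertionError raised: {err}')
```

Without the `try` ... `except` wrapper, the same incorrect assertion would halt the cell at that line, which is the behaviour we rely on when using `assert` as a test.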
+ +We will learn more about code testing later, but for the moment, we suggest that you use one or more `assert` statements that try out different inputs-output matches with your function. + +### Exercise + +We assume for this exercise that you know how to create a dictionary from two lists of the same length. This was covered in the [Python_Groups](014_Python_groups.md#dict) notes. + +In this exercise, we suggest that you follow the design approach we took above: + +- Think first what you will use as inputs and outputs to the function, and come up with some examples of inputs and outputs +- Then consider the Python code you would need to go from the inputs to the outputs + * Develop and test the core code to achieve the function purpose in a notebook cell with an example input + * Consider what you might use as a test for your code +- Develop skeleton + * Write a skeleton function defining the purpose, inputs and outputs. In the skeleton code, you can just pass the inputs straight to the outputs. + * Confirm that that works before going further. + * Confirm that your document string is useful. + * Write a test +- Implement the core code in the function + * Confirm that that works + * Confirm that your document string is useful. + * Write a test +- Consider any flaws in your code and how you might improve it + +**Your task for the exercise is:** + +* design a function to convert two lists of the same length into a dictionary +* the design must include relevant comments, document strings and tests + +## More on arguments + +Python functions can take [two types of arguments](https://book.pythontips.com/en/latest/args_and_kwargs.html): + +* positional arguments +* keyword arguments + +### Positional arguments + +The arguments we have used above are positional arguments, in that their definition in the function depends on the order they are specified in. 
For example: + + +```python +def hello(s1,s2): + ''' + Purpose: + print out positional arguments + + Inputs: + s1 : first argument + s2 : second argument + ''' + print(f'argument 0 is {s1}') + print(f'argument 1 is {s2}') + +hello('hello','world') +``` + + argument 0 is hello + argument 1 is world + + +Sometimes in Python documentation, you will see the arguments specified simply as: + + example(*args, **kwargs) + +This is the most general way of specifying function arguments. The first item in this case `*args` are the positional arguments. Although we generally specify them explicitly as above, we can also use + + *args + +to specify them, where `args` is a list-like object. In this form, the example above becomes: + + + + +```python +def hello(*args): + ''' + print out positional arguments + + Inputs: + *args : list of positional arguments + ''' + # loop over the list + for i,s in enumerate(args): + print(f'argument {i} is {s}') + +hello('hello','world','again') +``` + + argument 0 is hello + argument 1 is world + argument 2 is again + + + +```python +# or using *args where args is a list +l = ['hello','world','again','as','list'] +hello(*l) +``` + + argument 0 is hello + argument 1 is world + argument 2 is again + argument 3 is as + argument 4 is list + + +In this example, we have not specified how many positional arguments there are, but obviously we need to attach some meaning to each of them in the order supplied. Sometimes this is useful in code, where we just want to loop over a list of arguments, but you should mostly be wary about using it unless you really need to. + +A good example of the use of `*args` is the `print()` statement.
It will print out however many positional arguments we specify: + + +```python +print('hello','world','again') + +l = ['hello','world','again','as','list'] +# print the list, specifying l as a single positional argument +print(l) +# print the list passing each list item as a positional argument +print(*l) +``` + + hello world again + ['hello', 'world', 'again', 'as', 'list'] + hello world again as list + + +### Keyword arguments + +The second type of argument we mentioned above was keyword arguments. These are typically used to modify the behaviour of a function and are of the form: + + verbose=True + sep=' ' + + +We can see examples of these with the `print` function: + + +```python +help(print) +``` + + Help on built-in function print in module builtins: + + print(...) + print(value, ..., sep=' ', end='\n', file=sys.stdout, flush=False) + + Prints the values to a stream, or to sys.stdout by default. + Optional keyword arguments: + file: a file-like object (stream); defaults to the current sys.stdout. + sep: string inserted between values, default a space. + end: string appended after the last value, default a newline. + flush: whether to forcibly flush the stream. + + + +where a set of optional keyword arguments are specified. All keyword arguments are specified with a default value (`sep=' ', end='\n', file=sys.stdout, flush=False` above). If we do not specify a keyword when we call the function, this is the value that that variable will take within the function. + +Note that keywords must be specified **after** positional arguments. The keywords can be in any order (they are not positional). Keywords can only be given once.
+ +But, if we want, we can override the defaults by setting the keyword when we call the function: + + +```python +l = ['hello','world','again','as','list'] +# print the list passing each list item as a positional argument +# default with sep as ' ' +print(*l) +# with sep as 'X' +print(*l,sep='X') +# with sep as ':' +print(*l,sep=':') +``` + + hello world again as list + helloXworldXagainXasXlist + hello:world:again:as:list + + +This is a very useful feature for functions: we can set default behaviour, but the user can modify this when they call the function. + +For example, let's add a `verbose` keyword to our `hello()` function. The behaviour we want is that if the verbose flag is set, we print lots of information to the user. In this case: + + print(f'argument {i}:',end=' ') + +which will print the index `i`. We have used the `kwarg` `end=' '` for `print()` so that if this is called, it does not print a newline, but a space instead. + + +```python +def hello(*args,verbose=False): + ''' + print out positional arguments + + Inputs: + *args : list of positional arguments + + Optional keyword arguments: + verbose : print the index + ''' + # loop over the list + for i,s in enumerate(args): + # if the verbose flag is set + # then print detailed information + if verbose: + print(f'argument {i}:',end=' ') + print(f'{s}') + +``` + + +```python +dash='='*5 + +# call without verbose +print(f'{dash} verbose=False {dash}') +hello('hello','world','again') + +# call with verbose +print(f'{dash} verbose=True {dash}') +hello('hello','world','again',verbose=True) +``` + + ===== verbose=False ===== + hello + world + again + ===== verbose=True ===== + argument 0: hello + argument 1: world + argument 2: again + + +### Exercise + +* Starting from the function `list2dict(keys,values)` that you developed above, add keyword arguments to the code to achieve the following: + - if check=True : perform checks on the input data + - if verbose=True : print out information on what is going
on in the function + - set all default keywords to False +* Make sure you perform tests as above, and that you update document strings + + +```python +# ANSWER +# - Confirm that your document string is useful. +help(list2dict) + +``` + + Help on function list2dict in module __main__: + + list2dict(keys, values, check=False, verbose=False) + Purpose: + generate a dictionary from lists of keys and values + + Note: + the length of the lists must be the same + + Inputs: + - keys : list of values for the keys + - values : list of values to associate with the keys + + Output: + - retval : dictionary with keys and values derived + from the input lists + + Optional keyword arguments: + verbose : print detailed information, default False + check : perform internal tests + + + + +```python +#answer +# tests for the KWARGS + +dash='='*5 +# - Write tests that also show the kwargs +# no kwargs +print(f'{dash} no kwargs {dash}') +assert list2dict(['January','February'],[31,29]) == {'January': 31, 'February': 29} +print('test passed') + +print(f'{dash} verbose=True {dash}') +assert list2dict(['January','February'],[31,29],\ + verbose=True) \ + == {'January': 31, 'February': 29} + +print(f'{dash} check=True {dash}') +assert list2dict(['January','February'],[31,29],\ + check=True) \ + == {'January': 31, 'February': 29} + +print(f'{dash} check=True and verbose=True {dash}') +assert list2dict(['January','February'],[31,29],\ + check=True,verbose=True) \ + == {'January': 31, 'February': 29} +print('tests passed') +``` + + ===== no kwargs ===== + test passed + ===== verbose=True ===== + --> zipping dictionary for lists of length 2 + ===== check=True ===== + ===== check=True and verbose=True ===== + --> perfoming sanity check on array lengths + --> zipping dictionary for lists of length 2 + tests passed + + +As a final point of `kwargs`, you might still be wondering why this was specified as: + + example(*args,**kwargs) + +above. 
We have seen what the `*args` part means: if `args` is a list, then each item in the list is passed as a positional argument. The same idea applies to `**kwargs` but instead of a list, `kwargs` refers to a dictionary. If you think about the information you need to pass for keyword arguments (a name and a value for each), you will see why this is the case. + +By using `**kwargs`, where `kwargs` is a dictionary, the key-value pairs in the dictionary are passed as `key=value`. For example: + + +```python +args = ['hello','world','again','as','list'] +# set up dictionary for kwargs +# with X as sep and a string at the end of the line +kwargs = {'sep' : 'X', 'end' : '<- end of the line\n'} + +print(*args,**kwargs) +``` + + helloXworldXagainXasXlist<- end of the line + + +The use of `**kwargs` can be useful sometimes, as you can more easily keep track of keywords for some particular configuration of running a code. For that reason, and because you will see it sometimes in documentation, you should be aware of it. Most likely you won't be using it a lot in your early code development though. + +## Summary + +In this section, we have learned about writing a function. We have seen that functions will generally have zero or more input positional arguments and zero or more keyword arguments. They will typically return some value. We have also seen how we can define a `doc string` to give the user information on how to use the function, and also how we can use `assert` to build tests for our codes. We have been through some design considerations, and seen that it is best to plan your functions by thinking about the purpose, the inputs and the outputs. Then, for the core code, you just need to develop a skeleton code and docstring structure, test that, and insert your core code. You should think about modifications using keyword arguments that you might want to include, but these will often come in a second pass of development. 
+ +When we write Python codes from now on, we will often make use of functions. diff --git a/docs/017_Functions_answers 2.md b/docs/017_Functions_answers 2.md new file mode 100644 index 00000000..3d8b4541 --- /dev/null +++ b/docs/017_Functions_answers 2.md @@ -0,0 +1,498 @@ +# Functions in Python : Answers to exercises + +### Exercise + +* in a new code cell below, write a function called `my_name` that prints your name +* demonstrate that your code works (i.e. run it in a code cell) +* show the doc string using `help()` + +**Advice**: make sure it has an appropriate document string, based on the example in the notes, and also check that you have the indentation correct for the code in the function. Notice the colon `:` at the end of the `def` statement. + + +```python +# ANSWER 1 + +# in a new code cell below, write a function called my_name that prints your name +def my_name(): + ''' + Purpose: + print my name + ''' + print('Lewis') +``` + + +```python +# ANSWER 2 +# demonstrate that your code works (i.e. run it in a code cell) +my_name() +``` + + Lewis + + + +```python +# ANSWER 3 +# show the doc string using help() +help(my_name) +``` + + Help on function my_name in module __main__: + + my_name() + Purpose: + print my name + + + +### Exercise + +We assume for this exercise that you know how to create a dictionary from two lists of the same length. This was covered in the [Python_Groups](014_Python_groups.md#dict) notes. 
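
As a brief reminder, and as a sketch rather than the answer to the exercise, `zip()` pairs up items from two lists and `dict()` builds a dictionary from the pairs. The variable names and values below are illustrative only. Note that `zip()` stops at the end of the shorter list without complaint, which is one reason to think about checking the list lengths:

```python
month_names = ['January', 'February', 'March']
month_days = [31, 29]   # deliberately one item short

# pair the items and build the dictionary
paired = dict(zip(month_names, month_days))
print(paired)

# the unmatched key 'March' has been dropped without any warning
print(f'lengths match? {len(month_names) == len(month_days)}')
```

If a silently shortened dictionary would be a problem for your application, a length test on the inputs is a natural thing to include in the function.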
+ +In this exercise, we suggest that you follow the design approach we took above: + +- Think first what you will use as inputs and outputs to the function, and come up with some examples of inputs and outputs +- Then consider the Python code you would need to go from the inputs to the outputs + * Develop and test the core code to achieve the function purpose in a notebook cell with an example input + * Consider what you might use as a test for your code +- Develop skeleton + * Write a skeleton function defining the purpose, inputs and outputs. In the skeleton code, you can just pass the inputs straight to the outputs. + * Confirm that that works before going further. + * Confirm that your document string is useful. + * Write a test +- Implement the core code in the function + * Confirm that that works + * Confirm that your document string is useful. + * Write a test +- Consider any flaws in your code and how you might improve it + +**Your task for the exercise is:** + +* design a function to convert two lists of the same length into a dictionary +* the design must include relevant comments, document strings and tests + + +```python +# Answer 1 + +# design a function to convert two lists of +# the same length into a dictionary + +# Think first what you will use as inputs and +# outputs to the function, and come up with some +# examples of inputs and outputs + +msg = ''' + +The output is a dictionary, with keys and values + +example output from a previous exercise, with +month names as keys and days in month as values. +Note that we dont need a long example, just one +complex enough to test + +retval = { + 'January':31, + 'February':29 +} + + +The associated input lists for this would be +something like: + +month_names = ['January','February'] +month_days = [31,29] + +''' +print(msg) +``` + + + + The output is a dictionary, with keys and values + + example output from a previous exercise, with + month names as keys and days in month as values. 
+ Note that we dont need a long example, just one + complex enough to test + + retval = { + 'January':31, + 'February':29 + } + + + The associated input lists for this would be + something like: + + month_names = ['January','February'] + month_days = [31,29] + + + + + +```python +# Answer 2 + +# design a function to convert two lists of +# the same length into a dictionary + +# Then consider the Python code you would need to +# go from the inputs to the outputs +# - Develop and test the core code to achieve +# the function purpose in a notebook cell with an example input + +# set up example inputs from. ideas above +month_names = ['January','February'] +month_days = [31,29] + +# recalling the code about how to create a dictionary from +# two lists: +retval = dict(zip(month_names,month_days)) + +# print this +print(retval) + +# Consider what you might use as a test for your code +# this will be an assert statement of the form +# assert retval == {'January': 31, 'February': 29} +assert retval == {'January': 31, 'February': 29} +print('test passed') +``` + + {'January': 31, 'February': 29} + test passed + + + +```python +# Answer 3 +# Develop skeleton +# - Write a skeleton function defining the purpose, inputs and outputs. +# In the skeleton code, you can just pass the inputs straight to the outputs. + +# we call our function list2dict +# and use relevanmt names for the two lists +def list2dict(keys,values): + ''' + Purpose: + generate a dictionary from lists of keys and values + + Note: + the length of the lists must be the same + + Inputs: + - keys : list of values for the keys + - values : list of values to associate with the keys + + Output: + - retval : dictionary with keys and values derived + from the input lists + ''' + # for skeleton ,just set output = input + # but use a tuple to make it explicit that a single + # item is returned + return (keys,values) + +# - Confirm that that works before going further. 
+ +# we need some test keys and values to do this +month_names = ['January','February'] +month_days = [31,29] + +# run and check that something is returned +print(f'function returns: {list2dict(month_names,month_days)}') + +# - Confirm that your document string is useful. +help(list2dict) + +# - Write a test +# with the input and output the same for now!! +assert list2dict(['January','February'],[31,29]) == (['January', 'February'], [31, 29]) +print('test passed') +``` + + function returns: (['January', 'February'], [31, 29]) + Help on function list2dict in module __main__: + + list2dict(keys, values) + Purpose: + generate a dictionary from lists of keys and values + + Note: + the length of the lists must be the same + + Inputs: + - keys : list of values for the keys + - values : list of values to associate with the keys + + Output: + - retval : dictionary with keys and values derived + from the input lists + + test passed + + + +```python +# Answer 4 +# Implement the core code in the function + +# we call our function list2dict +# and use relevanmt names for the two lists +def list2dict(keys,values): + ''' + Purpose: + generate a dictionary from lists of keys and values + + Note: + the length of the lists must be the same + + Inputs: + - keys : list of values for the keys + - values : list of values to associate with the keys + + Output: + - retval : dictionary with keys and values derived + from the input lists + ''' + # for the dictionary from the lists + # and set variable retval to the response + retval = dict(zip(keys,values)) + + # remember to output!! + return retval + +# Confirm that that works + +# we need some test keys and values to do this +month_names = ['January','February'] +month_days = [31,29] + +# run and check that something is returned +print(f'function returns: {list2dict(month_names,month_days)}') + +# - Confirm that your document string is useful. 
+help(list2dict) + +# - Write a test +assert list2dict(['January','February'],[31,29]) == {'January': 31, 'February': 29} +print('test passed') +``` + + function returns: {'January': 31, 'February': 29} + Help on function list2dict in module __main__: + + list2dict(keys, values) + Purpose: + generate a dictionary from lists of keys and values + + Note: + the length of the lists must be the same + + Inputs: + - keys : list of values for the keys + - values : list of values to associate with the keys + + Output: + - retval : dictionary with keys and values derived + from the input lists + + test passed + + + +```python +# Answer 5 +# Consider any flaws in your code and how you might improve it + +msg = ''' +# Consider any flaws in your code and how you might improve it + +Since we require the inputs to be lists, we could ensure this +by using + + keys = list(keys) + values = list(values) + +or we might examine the data types + +An obvious improvement would be to test that the input +lists are of the same length. + +This is done with: + +assert len(keys) == len(values) +''' +print(msg) + +# we call our function list2dict +# and use relevanmt names for the two lists +def list2dict(keys,values): + ''' + Purpose: + generate a dictionary from lists of keys and values + + Note: + the length of the lists must be the same + + Inputs: + - keys : list of values for the keys + - values : list of values to associate with the keys + + Output: + - retval : dictionary with keys and values derived + from the input lists + ''' + # Since we require the inputs to be lists, + # we ensure this + keys = list(keys) + values = list(values) + + # test the inpouts are the same length + assert len(keys) == len(values) + + # for the dictionary from the lists + # and set variable retval to the response + retval = dict(zip(keys,values)) + + # remember to output!! 
+ return retval + +# Confirm that that works + +# we need some test keys and values to do this +month_names = ['January','February'] +month_days = [31,29] + +# run and check that something is returned +print(f'function returns: {list2dict(month_names,month_days)}') + +# - Confirm that your document string is useful. +help(list2dict) + +# - Write a test +assert list2dict(['January','February'],[31,29]) == {'January': 31, 'February': 29} +print('test passed') +``` + + + # Consider any flaws in your code and how you might improve it + + Since we require the inputs to be lists, we could ensure this + by using + + keys = list(keys) + values = list(values) + + or we might examine the data types + + An obvious improvement would be to test that the input + lists are of the same length. + + This is done with: + + assert len(keys) == len(values) + + function returns: {'January': 31, 'February': 29} + Help on function list2dict in module __main__: + + list2dict(keys, values) + Purpose: + generate a dictionary from lists of keys and values + + Note: + the length of the lists must be the same + + Inputs: + - keys : list of values for the keys + - values : list of values to associate with the keys + + Output: + - retval : dictionary with keys and values derived + from the input lists + + test passed + + +### Exercise + +* Starting from the function `list2dict(keys,values)` that you developed above, add keyword arguments to the code to achieve the following: + - if check=True : perform checks on the input data + - if verbose=True : print out information on what is going on in the function + - set all default keywords to False +* Make sure you perform tests as above, and that you update document strings + + +```python +# ANSWER +# Starting from the function list2dict(keys,values) +# that you developed above, add keyword arguments +# to the code to achieve the following: +# if check=True : perform checks on the input data +# if verbose=True : print out information on what is going 
on in the function +# set all default keywords to False + +# we call our function list2dict +# and use relevanmt names for the two lists + +# KWARGS +# if check=True : perform checks on the input data +# if verbose=True : print out information on what is going on in the function +# set all default keywords to False +def list2dict(keys,values,check=False,verbose=False): + ''' + Purpose: + generate a dictionary from lists of keys and values + + Note: + the length of the lists must be the same + + Inputs: + - keys : list of values for the keys + - values : list of values to associate with the keys + + Output: + - retval : dictionary with keys and values derived + from the input lists + + Optional keyword arguments: + verbose : print detailed information, default False + check : perform internal tests + ''' + # NB updated doc string ^ + + # test the inputs are the same length + # if check flag set True + if check: + # verbose comments + if verbose: + print('--> perfoming sanity check on array lengths') + assert len(keys) == len(values) + + # for the dictionary from the lists + # and set variable retval to the response + + # verbose comments + if verbose: + print(f'--> zipping dictionary for lists of length {len(keys)}') + retval = dict(zip(keys,values)) + + # remember to output!! + return retval + +# Confirm that that works + +# we need some test keys and values to do this +month_names = ['January','February'] +month_days = [31,29] + +# run and check that something is returned +print(f'function returns: {list2dict(month_names,month_days)}') + +``` + + function returns: {'January': 31, 'February': 29} + diff --git a/docs/018_Python_files 2.md b/docs/018_Python_files 2.md new file mode 100644 index 00000000..fa7f2446 --- /dev/null +++ b/docs/018_Python_files 2.md @@ -0,0 +1,315 @@ +# 018 Files and other Resources + + +## Introduction + + + +### Purpose + +In this session, we will learn about files and similar resources. 
We will introduce the standard Python library [`pathlib`](https://docs.python.org/3/library/pathlib.html) which is how we deal with file paths, as well as the useful package [`urlpath`](https://github.com/chrono-meter/urlpath) that allows a similar object-oriented approach with files and other objects on the web. We will also cover opening and closing files, and some simple read- and write-operations. + + + +### Prerequisites + +You will need some understanding of the following: + + +* [001 Using Notebooks](001_Notebook_use.md) +* [005 Getting help](005_Help.md) + +Remember that you can 'run' the code in a code block using the 'run' widget (above) or hitting the keys ('typing') and at the same time. + +### Timing + +The session should take around XX hours. + +## Resource location + + +We store information on a computer in files, or file-like resources. We will use the term 'file' below to mean either of these concepts, other than specific issues relating to particular types of file/resource. + +To get information from files, we need to be able to specify some **address** for the file/resource location, along with some way of interacting with the file. These concepts are captured in the idea of a [URI](https://en.wikipedia.org/wiki/Uniform_Resource_Identifier) (Uniform Resource Identifier). 
You will most likely have come across the related idea of a [Uniform Resource Locator (URL)](https://en.wikipedia.org/wiki/URL), which is a URL such as [https://www.geog.ucl.ac.uk/people/academic-staff/philip-lewis](https://www.geog.ucl.ac.uk/people/academic-staff/philip-lewis) +that gives: + +* the location of the resource: `people/academic-staff/philip-lewis` +* the access and interpretation protocol: [`https`](https://en.wikipedia.org/wiki/HTTPS) (secure [`http`](https://en.wikipedia.org/wiki/Hypertext_Transfer_Protocol)) +* the network domain name: [`www.geog.ucl.ac.uk`](https://www.geog.ucl.ac.uk) + +When we visit this URL using an appropriate tool such as a browser, the tool can access and interpret the information in the resource: in this case, interpret the [html code](https://www.w3schools.com/html) in the file pointed to by the URL. + +Similarly, we will be used to the idea of accessing `files` on the computer. These may be in the local file system, or on some network or cloud storage that might be accessible from the local file system. An example of such a file would be some Python code file such as +[`geog0111/helloWorld.py`](http://localhost:8888/edit/notebooks/geog0111/helloWorld.py). + +We will use this idea to make a dictionary of our ENSO dataset, using the items in the header for the keys. In this way, we obtain a more elegant representation of the dataset, and can refer to items by names (keys) instead of column numbers. 
+ + +```python +import requests +import numpy as np +import io +import matplotlib.pyplot as plt + +# access dataset as above +url = "http://www.esrl.noaa.gov/psd/enso/mei.old/table.html" +txt = requests.get(url).text + +# copy the useful data +start_head = txt.find('YEAR') +start_data = txt.find('1950\t') +stop_data = txt.find('2018\t') + +header = txt[start_head:start_data].split() +data = np.loadtxt(io.StringIO(txt[start_data:stop_data]),unpack=True) + +# use zip to load into a dictionary +data_dict = dict(zip(header, data)) + +key = 'MAYJUN' +# plot data +plt.figure(0,figsize=(12,7)) +plt.title('ENSO data from {0}'.format(url)) +plt.plot(data_dict['YEAR'],data_dict[key],label=key) +plt.xlabel('year') +plt.ylabel('ENSO') +plt.legend(loc='best') +``` + +**Exercise 1.3.5** + +* copy the code above, and modify so that datasets for months `['MAYJUN','JUNJUL','JULAUG']` are plotted on the graph + +Hint: use a for loop + + +```python +# do exercise here +# ANSWER + +import requests +import numpy as np +import io +import matplotlib.pyplot as plt + +# access dataset as above +url = "http://www.esrl.noaa.gov/psd/enso/mei.old/table.html" +txt = requests.get(url).text + +# copy the useful data +start_head = txt.find('YEAR') +start_data = txt.find('1950\t') +stop_data = txt.find('2018\t') + +header = txt[start_head:start_data].split() +data = np.loadtxt(io.StringIO(txt[start_data:stop_data]),unpack=True) + +# use zip to load into a dictionary +data_dict = dict(zip(header, data)) + + +''' +Do the loop here +''' +for i,key in enumerate(['MAYJUN','JUNJUL','JULAUG']): + # plot data + ''' + Use enumeration i as figure number + ''' + plt.figure(i,figsize=(12,7)) + plt.title('ENSO data from {0}'.format(url)) + plt.plot(data_dict['YEAR'],data_dict[key],label=key) + plt.xlabel('year') + plt.ylabel('ENSO') + plt.legend(loc='best') +``` + +We can also usefully use a dictionary with a printing format statement. In that case, we refer directly to the key in the format string. This can make printing statements much easier to read. 
We don't directly pass the dictionary to the `format` statement, but rather `**dict`, where `**dict` means "treat the key-value pairs in the dictionary as additional named arguments to this function call". + +So, in the example: + + +```python +import requests +import numpy as np +import io + +# access dataset as above +url = "http://www.esrl.noaa.gov/psd/enso/mei/table.html" +txt = requests.get(url).text + +# copy the useful data +start_head = txt.find('YEAR') +start_data = txt.find('1950\t') +stop_data = txt.find('2018\t') + +header = txt[start_head:start_data].split() +data = np.loadtxt(io.StringIO(txt[start_data:stop_data]),unpack=True) + +# use zip to load into a dictionary +data_dict = dict(zip(header, data)) +print(data_dict.keys()) + +# print the data for MAYJUN +print('data for MAYJUN: {MAYJUN}'.format(**data_dict)) +``` + +The line `print('data for MAYJUN: {MAYJUN}'.format(**data_dict))` is equivalent to writing: + + print('data for MAYJUN: {MAYJUN}'.format(YEAR=data_dict['YEAR'],DECJAN=data_dict['DECJAN'], ...)) + +In this way, we use the keys in the dictionary as keywords to pass to a method. + +Another useful example of such a use of a dictionary is in saving a numpy dataset to file. + +If the data are numpy arrays in a dictionary as above, we can store the dataset using: + + + + +```python +import requests +import numpy as np +import io + +# access dataset as above +url = "http://www.esrl.noaa.gov/psd/enso/mei/table.html" +txt = requests.get(url).text + +# copy the useful data +start_head = txt.find('YEAR') +start_data = txt.find('1950\t') +stop_data = txt.find('2018\t') + +header = txt[start_head:start_data].split() +data = np.loadtxt(io.StringIO(txt[start_data:stop_data]),unpack=True) + +# use zip to load into a dictionary +data_dict = dict(zip(header, data)) + +filename = 'enso_mei.npz' + +# save the dataset +np.savez_compressed(filename,**data_dict) +``` + +What we load from the file is a dictionary-like object ``. 
+ +If needed, we can cast this to a dictionary with `dict()`, but it is generally more efficient to keep the original type. + + +```python +# load the dataset + +filename = 'enso_mei.npz' + +loaded_data = np.load(filename) + +print(type(loaded_data)) + +# test they are the same using np.array_equal +for k in loaded_data.keys(): + print('\t',k,np.array_equal(data_dict[k], loaded_data[k])) +``` + +**Exercise 1.3.6** + +* Using what you have learned above, access the Met Office data file [`https://www.metoffice.gov.uk/hadobs/hadukp/data/monthly/HadSEEP_monthly_qc.txt`](https://www.metoffice.gov.uk/hadobs/hadukp/data/monthly/HadSEEP_monthly_qc.txt) and create a 'data package' in a numpy`.npz` file that has keys of `YEAR` and each month in the year, with associated datasets of Monthly Southeast England precipitation (mm). +* confirm that tha data in your `npz` file is the same as in your original dictionary +* produce a plot of October rainfall using these data for the years 1900 onwards + + +```python +# do exercise here +# ANSWER + +''' +Exploration of dataset shows: + + +Monthly Southeast England precipitation (mm). Daily automated values used after 1996. +Wigley & Jones (J.Climatol.,1987), Gregory et al. (Int.J.Clim.,1991) +Jones & Conway (Int.J.Climatol.,1997), Alexander & Jones (ASL,2001). Values may change after QC. 
+YEAR JAN FEB MAR APR MAY JUN JUL AUG SEP OCT NOV DEC ANN + 1873 87.1 50.4 52.9 19.9 41.1 63.6 53.2 56.4 62.0 86.0 59.4 15.7 647.7 + 1874 46.8 44.9 15.8 48.4 24.1 49.9 28.3 43.6 79.4 96.1 63.9 52.3 593.5 + +so we have 3 lines of header +then the column titles +then the data + +we can define these as before using + +txt.find('YEAR') +start_data = txt.find('1873') +stop_data = None + + +Other than the filenames then, the code +is identical +''' + +import requests +import numpy as np +import io + +# access dataset as above +url = "https://www.metoffice.gov.uk/hadobs/hadukp/data/monthly/HadSEEP_monthly_qc.txt" +txt = requests.get(url).text + +# copy the useful data +start_head = txt.find('YEAR') +start_data = txt.find('1873') +stop_data = None + +header = txt[start_head:start_data].split() +data = np.loadtxt(io.StringIO(txt[start_data:stop_data]),unpack=True) + +# use zip to load into a dictionary +data_dict = dict(zip(header, data)) + +filename = 'HadSEEP_monthly_qc.npz' + +# save the dataset +np.savez_compressed(filename,**data_dict) +``` + + +```python +# ANSWER + +loaded_data = np.load(filename) + +print(type(loaded_data)) + +# test they are the same using np.array_equal +for k in loaded_data.keys(): + print('\t',k,np.array_equal(data_dict[k], loaded_data[k])) +``` + + +```python +# ANSWER + +''' +October rainfall, 1900+ +''' + +year = loaded_data['YEAR'] + +# mask where years match +mask = year >= 1900 + +oct = loaded_data['OCT'] + +# set invalid data points to nan +oct[oct<0] = np.nan + +plt.plot(year[mask],oct[mask]) +``` + +### 1.3.5 Summary + +In this section, we have extended the types of data we might come across to include groups . We dealt with ordered groups of various types (`tuple`, `list`), and introduced the numpy package for numpy arrays (`np.array`). We saw dictionaries as collections with which we refer to individual items with a key. 
+ +We learned in the previous section how to pull apart a dataset presented as a string using loops and various string methods, and to construct a useful dataset 'by hand' in a list or similar structure. It is useful, when learning to program, to know how to do this. + +Here, we saw that packages such as numpy provide higher level routines that make reading data easier, and we would generally use these in practice. We saw how we can use `zip()` to help load a dataset from arrays into a dictionary, and also the value of using a dictionary representation when saving numpy files. diff --git a/docs/019_Running_Python 2.md b/docs/019_Running_Python 2.md new file mode 100644 index 00000000..523d71f1 --- /dev/null +++ b/docs/019_Running_Python 2.md @@ -0,0 +1,584 @@ +# 019 Python codes + +## Introduction + + +Whilst writing codes in Jupyter notebooks is useful and powerful, you will find over the course of GEOG0111 that you will need to develop your own codes as Python scripts. These are more portable than codes embedded in notebooks, and can also be used to develop your own code libraries. + +In this session, you will learn how to generate and use Python scripts from a notebook. In particular, you will learn the typical format of such scripts, and generate some simple scripts around some basic functions. + +At the end of the session, there is a practical exercise for you to do and submit. It does not form part of your formal assessment, but it is important that you do the work and submit it, so we know if you are on track or need further help. If you do not submit the work, we cannot help you in this way. 
+ + +### Prerequisites + +You will need some understanding of the following: + +* [001 Using Notebooks](001_Notebook_use.md) +* [005 Getting help](005_Help.md) +* [010 Variables, comments and print()](010_Python_Introduction.md) +* [011 Data types](011_Python_data_types.md) +* [012 String formatting](012_Python_strings.md) +* [013_Python_string_methods](013_Python_string_methods.md) +* [014_Python_groups](014_Python_groups.md) +* [015_Python_control](015_Python_control.md) +* [016_Python_for](016_Python_for.md) +* [017_Functions](017_Functions.md) + +In particular, you will need to recall how to use: + + - [`unix` commands](002_Unix.md): `ls -l; chmod` + - [`functions`](017_Functions.md) + + +## Running Python Code in a file + +We will generally use these Jupyter notebooks for running your Python codes, but you should also learn how to develop and run Python codes outside of the notebook. We provide a quick example of that here. + +### Running a Python script + +Python codes are written in files with the suffix `.py`, e.g. [`geog0111/helloWorld.py`](geog0111/helloWorld.py). + +If the file is defined as a Python script, then we can run it from a shell in the same way we would run any other `unix` command: + + +```python +!geog0111/helloWorld.py +``` + + hello world + + +A Python script is a Python code file that contains: + + #!/usr/bin/env python + +as the first line in the file, and is executable. We can see if a file is executable by getting a 'long' listing using the `unix` command `ls -l`: + + +```python +!ls -l geog0111/helloWorld.py +``` + + -rwxr-xr-x 1 plewis staff 514 6 Sep 22:34 geog0111/helloWorld.py + + +The first field gives us the file permissions: + + -rwxr-xr-x + +with the `x` bit telling us that it is executable.
The permissions for this file are set to `755` in octal (`111` `101` `101` in binary) which we can set with the `unix` command: + + +```bash +%%bash +chmod 755 geog0111/helloWorld.py +ls -l geog0111/helloWorld.py +``` + + -rwxr-xr-x 1 plewis staff 514 6 Sep 22:34 geog0111/helloWorld.py + + +### Running Python code from Python + +We can also run the Python code contained in [`geog0111/helloWorld.py`](geog0111/helloWorld.py) from a `notebook` cell using `%run` (or from an `ipython` prompt using just `run`): + + +```python +%run geog0111/helloWorld.py +``` + + hello world + + +Another thing we can do is to `import` the code from the Python file into Python: + + +```python +import sys +from pathlib import Path +sys.path.insert(0,Path().cwd().joinpath('geog0111').as_posix()) + + +from helloWorld import helloWorld +helloWorld() +``` + + hello world + + + +```python +# import the module helloWorld from the library geog0111 +from geog0111 import helloWorld +helloWorld.helloWorld() +``` + +This import statement imports the module `helloWorld` from the library `geog0111` (w + +### Form of a Python script + +We will go into more details on this in later classes, but the main format of a Python file should be along the lines of the following example: + + +```python +#!/usr/bin/env python +# -*- coding: utf-8 -*- + +''' +helloWorld + +Purpose: + + function print the string 'hello world' + +''' + +__author__ = "P Lewis" +__copyright__ = "Copyright 2020 P Lewis" +__license__ = "GPLv3" +__email__ = "p.lewis@ucl.ac.uk" + +def helloWorld(): + ''' + function to print the string 'hello world' + + ''' + print('hello world') + + +# example calling the function +def main(): + helloWorld() + +if __name__ == "__main__": + # execute only if run as a script + main() + +``` + +This same code is contained in the file [geog0111/helloWorld.py](geog0111/helloWorld.py). Although the amount of comments and code in there might seem a little overkill just to print out `hello world`, it is good for you to get into the habit of writing your Python codes in this way. 
+ +To briefly explain the blocks of code: + + #!/usr/bin/env python + # -*- coding: utf-8 -*- + +This first line of the file allows the shell to know that this file should be interpreted with Python. It is good practice for you to include this as the first line of your Python codes so they can be run as scripts. The second line here is optional, but provides information on the python file encoding. + + + ''' + helloWorld + + Purpose: + + function print the string 'hello world' + + ''' + __author__ = "P Lewis" + __copyright__ = "Copyright 2020 P Lewis" + __license__ = "GPLv3" + __email__ = "p.lewis@ucl.ac.uk" + +We then have a document string (contained within pairs of triple quotes) describing the purpose and some features of the code in this file. We provide some further information as variables written in the form `__var__` as relevant metadata for the file. + + def helloWorld(): + ''' + function to print the string 'hello world' + ''' + print('hello world') + +Next, we define a *function* containing the code we want. In this case, the main body of the code is simply: + + print('hello world') + +as in the examples above. The syntax `def helloWorld():` defines a function called `helloWorld` with no arguments (there is nothing defined here in the brackets). The code for the function is indented (4 spaces). + +After defining the function, we give a document string to associate with it and give users help information: + + +```python +help(helloWorld) +``` + +In the last part of the file, we provide some utilities that allow the file to be run as a script. + +First, we define a function called `main()` that provides an example of calling the function we have created: + + # example calling the function + def main(): + helloWorld() + + +This should be accompanied by some comments or a doc-string explaining what we are doing in the code.
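
As a minimal sketch of that advice, the cell below gives `main()` its own doc-string. The wording of the doc-strings and comments here is only illustrative (it is not taken from `geog0111/helloWorld.py`); the point is that whatever we write in the doc-string is stored in `main.__doc__`, which is the text that `help(main)` displays.

```python
def helloWorld():
    '''
    function to print the string 'hello world'
    '''
    print('hello world')


def main():
    '''
    example call to helloWorld(), used when
    the file is run as a script
    '''
    # call the function we want to demonstrate
    helloWorld()


# the doc-string is stored in the __doc__ attribute,
# which is the text that help(main) would show
print(main.__doc__)
main()
```

A short doc-string like this costs little to write, and means anyone importing the file can find out what `main()` is for without reading the code.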
+ +The final part of the file contains the code that allows this file to be run as a script: + + if __name__ == "__main__": + # execute only if run as a script + main() + +This is pretty standard syntax, so we will not go into details now, other than to note that this is the mechanism by which running this file as a script makes the call to `main()`, which then calls our example code running `helloWorld()`. + +The final thing we need to do is to make our script executable. This involves setting some permissions on the file with the function `Path().chmod()`. The code to do this, for the file [geog0111/helloWorld.py](geog0111/helloWorld.py) from Python is: + + +```python +from pathlib import Path +Path('geog0111/helloWorld.py').chmod(int('755', base=8)) +``` + +Sometimes, we might use the `unix` (`bash`) shell equivalent command to do this: + + +```bash +%%bash +chmod 755 geog0111/helloWorld.py +``` + +Now we can run the Python code from a code cell: + + +```python +%run geog0111/helloWorld.py +``` + +Note that using `%run` in the code cell (or just `run` in other Python codes) allows us to run a python command from a Python file in the cell, rather than the usual Python codes. + +An alternative is to make the whole cell into one to run bash commands with `%%bash`: + + +```bash +%%bash +geog0111/helloWorld.py +``` + + +## Editing a file + +To do the task below, you will need to invoke a text editor of some sort to create the Python file. There are several ways you can do this, depending on how you are using these notebooks. + +### From JupyterLab + +If you are using this notebook in JupyterLab, go to the launcher tab and you should see various tools that you can launch: + +![JupyterLab tools](images/jl.png) + +Among these you will see 'text file'. Launch a text file, write your Python code into the file, and save it (`File -> Save As`) to the Python file name you want in your `work directory` (e.g. `work/test.py`). 
+ +Alternatively, use the menu item `File -> New -> Text File` to open a new text file. + +To change the permissions on the file from JupyterLab, open a shell (e.g. using +`File -> New -> Terminal`) and in there, type (assuming your file is called `work/test.py`): + + chmod 755 work/test.py + + +![JupyterLab terminal](images/term.png) + +### Jupyter notebook + + +[geog0111/helloWorld.py](http://localhost:8888/edit/notebooks/geog0111/helloWorld.py) + + +### Use a text editor + +To do the task below, you will need to invoke a text editor of some sort to create the Python file. Ideally, you should learn to do this outside of the notebook: invoke a text editor on your computer, put your Python code into the file, and save it to the desired location. + +### Create in bash + +We can simply create a file in a bash script by following the example below: + + +```bash +%%bash + +# code between the next line and the +# End Of File (EOF) marker will be saved in +# to the file work/myHelloWorld.py +cat << EOF > work/myHelloWorld.py +#!/usr/bin/env python +# -*- coding: utf-8 -*- + +''' +helloWorld + +Purpose: + + function print the string 'hello world' + +''' +__author__ = "P Lewis" +__copyright__ = "Copyright 2020 P Lewis" +__license__ = "GPLv3" +__email__ = "p.lewis@ucl.ac.uk" + +def helloWorld(): + ''' + function to print the string 'hello world' + ''' + print('hello world') + + +# example calling the function +def main(): + helloWorld() + +if __name__ == "__main__": + # execute only if run as a script + main() +EOF + +# Chmod 755 to make the file executable +chmod 755 work/myHelloWorld.py +``` + +This should have created a file called `work/myHelloWorld.py` and made it executable (using bash to call chmod, rather than doing it in Python as above). + +Now run it as above: + + +```python +%run work/myHelloWorld.py +``` + +You should now be able to load this Python file directly in Jupyter, using `File -> Open` then click through to the Python file you generated. 
+ +Once you save the file, run the Python code again (`%run work/myHelloWorld.py`). + +#### Exercise + +* Create a Python file in your [`work`](work) folder based on the example above and call it `work/myFirstCode.py` +* Modify the code to achieve the following: + - make a function called `myFirstCode` that prints out a greeting message + - update the document strings in the file as appropriate + +## Adding command line arguments + +We have achieved something today, in creating a Python code in a file, making it executable, and running it. However, it is a little limited in scope: it just prints out the message that we **hard wired** into it. + +It is better to design codes to have some more flexibility than this. Suppose we want to be able to run our code `work/myFirstCode.py` so that we can pass a name through to the script and print out, e.g.: + + hello from Fred + +or + + hello from Hermione + +How could we achieve that? + +The answer, as we have seen before, is to pass an argument to our function. So, instead of having: + + def helloWorld(): + ''' + function to print the string 'hello world' + ''' + print('hello from me') + +we pass an argument that we might call `name`, and conveniently pass this through to the `print()` statement using an `f-string`. + + + def helloWorld(name): + ''' + function to print 'hello from {name}' + ''' + print(f'hello from {name}') + +Let's test that before going further: + + +```python +def hello(name): + ''' + function to print 'hello from {name}' + ''' + print(f'hello from {name}') + +# call the function with some argument +hello('Fred') +hello('Hermione') +``` + +We have seen that that code operates correctly -- it is very good practice to test any small incremental developments in your code in this way. + +Next, let's think about how to embed that modified function in our code in our file. We will call the code `work/hello.py` and copy most of it from the previous example.
We need to make sure we update any document strings, and also take care that we are calling the correct function (it is `hello()` now, not `helloWorld()` as previously). + +Lastly, when we make the call to `hello()` from the `main()` function, we need to make sure we pass it a string that it can print out. For example: + + def main(): + hello('Fred') + +Otherwise, if we leave `main()` as: + + def main(): + hello() + +the code would not run, and would give a `TypeError`: + + TypeError: hello() missing 1 required positional argument: 'name' + + Again, this is an incremental change, so let's test the new code before going any further: + + +```bash +%%bash + +# code between the next line and the +# End Of File (EOF) marker will be saved in +# to the file work/hello.py +cat << EOF > work/hello.py +#!/usr/bin/env python +# -*- coding: utf-8 -*- + +''' +hello + +Purpose: + + function to print 'hello from {name}' + +''' +__author__ = "P Lewis" +__copyright__ = "Copyright 2020 P Lewis" +__license__ = "GPLv3" +__email__ = "p.lewis@ucl.ac.uk" + +def hello(name): + ''' + function to print 'hello from {name}' + ''' + print(f'hello from {name}') + +# example calling the function +def main(): + hello('Fred') + +if __name__ == "__main__": + # execute only if run as a script + main() +EOF + +# Chmod 755 to make the file executable +chmod 755 work/hello.py +``` + + +```python +# run code +%run work/hello.py +``` + +We now have a more flexible function `hello(name)`, but when we run the script `work/hello.py` we will always be running the same code in `main()`, so we can only ever get a hello from Fred. + +When we write a Python script, we often will want to make that more flexible too. We can do this by passing a command-line argument through to the script. What we want to happen is that when we run: + + %run work/hello.py Hermione + +we get the response: + + hello from Hermione + +That isn't too much of a step from where we are.
We simply need to pass the command line argument through to the script. In a Python script, we do this using the list + + sys.argv + +Let's just see how that works: + + +```python +import sys + +print(sys.argv) +``` + +Running from the notebook, we get to see the full command that is run when we launch this notebook. Since `sys` is a module, we first import it: + + import sys + +Then we can access the list of arguments `sys.argv`. The first item in the list, `sys.argv[0]`, is the name of the program being run (`ipykernel_launcher.py`). The other items in the list are the command line arguments for running the notebook. + +Let's see how this applies to running our own code. We generate a short test script: + + +```bash +%%bash + +cat << EOF > work/test.py +#!/usr/bin/env python +import sys + +print(sys.argv) +EOF + +# Chmod 755 to make the file executable +chmod 755 work/test.py +``` + + +```python +%run work/test.py Hermione +``` + +We see that running the script: + + work/test.py Hermione + +means that `sys.argv` inside the script contains the list `['work/test.py', 'Hermione']`. + +### Exercise: Submitted Practical + +Although we provide access to answers for this exercise, we want you to submit the codes you generate via Moodle, so that we can provide feedback. You should avoid looking at the answers before you submit your work. This submitted work does not count towards your course assessment: it is purely to allow us to provide some rapid feedback to you on how you are doing. You will need to put together a few elements from the notes so far to do all parts of this practical, but you should all be capable of doing it well. Pay attention to writing tidy code, with useful, clear comments and document strings.
+ +* Create a Python code in a file called `work/greet.py` that does the following: + - define a function `greet(name)` that prints out a greeting from the name in the argument `name` + - define a function `main()` that passes a string from the script command line to a function `greet(name)` + - calls `main()` if the file is run as a Python script + - show a test of the script working + - has plentiful commenting and document strings + + - As a test, when you run the script: + + %run work/greet.py Fred + + you would expect to get a response of the form: + + greetings from Fred + + and if you run: + %run work/greet.py Hermione + + then + greetings from Hermione + +* To go further with this exercise, you might test to see that the length of `sys.argv` is as long as you expect it to be, so you can tell the user when they forget to include the name +* To go even further with this exercise, you might attempt to make the script function so that if you run it as: + + %run work/greet.py Fred Hermione + + it responds: + + greetings from Fred + greetings from Hermione + +## Summary + +In this session, we have learned the usual form of a Python script, including the `#!/usr/bin/env python` on the first line of the file to call the Python interpreter. We have seen how to include document strings telling any user how to use the codes we develop, as well as providing lots of comments with `#` lines to describe what we are doing in the code, and why. We have seen that a Python script will typically finish with something along the lines of: + + def main(args): + # call or demonstrate our function + hello(args) + + if __name__ == "__main__": + # execute only if run as a script + main(sys.argv) + +the `main()` function serving to provide a call to the codes we have developed, as well as any interface to the command line. We have seen that `sys.argv` contains command line parameters (and the program name) which we can use to modify the behaviour of the script.
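
To make the role of `sys.argv` concrete, here is a minimal sketch of one way a `main(args)` function might use such a list. The choice to greet every name given, and to return the number of names used, is our own illustration rather than the only way to structure a script. We pass a hand-written list in the style of `sys.argv` so the cell can be run directly in a notebook.

```python
def hello(name):
    '''
    function to print 'hello from {name}'
    '''
    print(f'hello from {name}')


def main(args):
    '''
    args is a list in the style of sys.argv:
    args[0] is the program name and args[1:]
    are the command line arguments
    '''
    # skip the program name, keep the arguments
    names = args[1:]
    # illustrative choice: greet every name given
    for name in names:
        hello(name)
    # return how many names were used, so it is easy to check
    return len(names)


# simulate running:  work/hello.py Fred Hermione
main(['work/hello.py', 'Fred', 'Hermione'])
```

In a script file, the same function would be called with the real argument list, as `main(sys.argv)` inside the `if __name__ == "__main__":` block (after `import sys`).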
+ +We have also seen how we can use + + cat << EOF > file.py + EOF + +in a bash script to generate and document our Python files, though we would also typically edit the file through some text editor. + +We have seen how to run a Python script from a notebook, using `%run` or via a bash shell with `%%bash`. We have seen that to 'run' a Python script, we need to change the file permissions using `chmod`, either in `bash` or using `Path().chmod()`. We have used the octal code `755` to change the file permissions for execution. diff --git a/docs/019_Running_Python_answers 2.md b/docs/019_Running_Python_answers 2.md new file mode 100644 index 00000000..4ffbd3a8 --- /dev/null +++ b/docs/019_Running_Python_answers 2.md @@ -0,0 +1,386 @@ +# 019 Python codes : Answers to exercises + +#### Exercise + +* Create a Python file in your [`work`](work) folder based on the example above and call it `work/myFirstCode.py` +* Modify the code to achieve the following: + - make a function called `myFirstCode` that prints out a greeting message + - update the document strings in the file as appropriate + + +```bash +%%bash +# ANSWER : create the Python file with the code we want + +# Instructions: +# +# Create a Python file in your work +# folder based on the example in geog0111/helloWorld.py +# and call it `work/myFirstCode.py` +# +# Modify the code to achieve the following: +# make a function called myFirstCode that prints out a greeting message +# update the document strings in the file as appropriate + +# code between the next line and the +# End Of File (EOF) marker will be saved in +# to the file work/myFirstCode.py +cat << EOF > work/myFirstCode.py +#!/usr/bin/env python +# -*- coding: utf-8 -*- + +''' +myFirstCode + +Purpose: + + function to print a greeting + +''' +__author__ = "P Lewis" +__copyright__ = "Copyright 2020 P Lewis" +__license__ = "GPLv3" +__email__ = "p.lewis@ucl.ac.uk" + +def myFirstCode(): + ''' + function to print a greeting + ''' + print('hello from me') + +
+# example calling the function +def main(): + myFirstCode() + +if __name__ == "__main__": + # execute only if run as a script + main() +EOF + +# Chmod 755 to make the file executable +chmod 755 work/myFirstCode.py +``` + + +```python +# ANSWER +%run work/myFirstCode.py +``` + +### Exercise: Submitted Practical + +Although we provide access to answers for this exercise, we want you to submit the codes you generate via Moodle, so that we can provide feedback. You should avoid looking at the answers before you submit your work. This submitted work does not count towards your course assessment: it is purely to allow us to provide some rapid feedback to you on how you are doing. You will need to put together a few elements from the notes so far to do all parts of this practical, but you should all be capable of doing it well. Pay attention to writing tidy code, with useful, clear comments and document strings. + +* Create a Python code in a file called `work/greet.py` that does the following: + - define a function `greet(name)` that prints out a greeting from the name in the argument `name` + - define a function `main()` that passes a string from the script command line to a function `greet(name)` + - calls `main()` if the file is run as a Python script + - show a test of the script working + - has plentiful commenting and document strings + + - As a test, when you run the script: + + %run work/greet.py Fred + + you would expect to get a response of the form: + + greetings from Fred + + and if you run: + %run work/greet.py Hermione + + then + greetings from Hermione + +* To go further with this exercise, you might test to see that the length of `sys.argv` is as long as you expect it to be, so you can tell the user when they forget to include the name +* To go even further with this exercise, you might attempt to make the script function so that if you run it as: + + %run work/greet.py Fred Hermione + + it responds: + + greetings from Fred + greetings from Hermione + +
+```bash +%%bash +# ANSWER 1 +# + +# code between the next line and the +# End Of File (EOF) marker will be saved in +# to the file work/greet.py +cat << EOF > work/greet.py +#!/usr/bin/env python +# -*- coding: utf-8 -*- + +import sys + +''' +greet + +Purpose: + + script to print hello from name + +''' +__author__ = "P Lewis" +__copyright__ = "Copyright 2020 P Lewis" +__license__ = "GPLv3" +__email__ = "p.lewis@ucl.ac.uk" + +''' +Instructions: +Create a Python code in a file called work/greet.py that does the following: +define a function greet(name) that prints out a greeting from the name + in the argument name +define a function main() that passes a string from the script command + line to a function greet(name) +calls main() if the file is run as a Python script +has plentiful commenting and document strings +''' + +# define a function greet(name) that prints +# out a greeting from the name in the argument name +def greet(name): + ''' + function to print "greetings from {name}" + ''' + print(f'greetings from {name}') + + +# define a function main() that passes a string +# from the script command line to a function greet(name) + +# call name with a string +# name that we pass to greet(name) +def main(name): + greet(name) + +# calls main() if the file is run as a Python script +if __name__ == "__main__": + # execute only if run as a script + # we pass the first command line argument argv[1] + # remembering that argv[0[ is the program name + main(sys.argv[1]) +EOF + +# Chmod 755 to make the file executable +chmod 755 work/greet.py +``` + + +```python +msg = ''' +As a test, when you run the script: + + %run work/greet.py Fred +you would expect to get a response of the form: + + greetings from Fred +and if you run: + + %run work/greet.py Hermione +then + + greetings from Hermione + +''' +print(msg) + +print('work/greet.py Fred ->') +%run work/greet.py Fred +print('work/greet.py Hermione ->') +%run work/greet.py Hermione +``` + + +```bash +%%bash +# ANSWER 2 +# +# 
To go further with this exercise, you might test to see that the length of sys.argv is as long as you expect it to be, +# so you can tell the user when they forget to include the name + +# code between the next line and the +# End Of File (EOF) marker will be saved in +# to the file work/greet.py +cat << EOF > work/greet.py +#!/usr/bin/env python +# -*- coding: utf-8 -*- + +# import required package(s) +import sys + +''' +greet + +Purpose: + + script to print hello from name + +''' +__author__ = "P Lewis" +__copyright__ = "Copyright 2020 P Lewis" +__license__ = "GPLv3" +__email__ = "p.lewis@ucl.ac.uk" + + + +''' +Instructions: + +Create a Python code in a file called work/greet.py that does the following: +define a function greet(name) that prints out a greeting from the name + in the argument name +define a function main() that passes a string from the script command + line to a function greet(name) +calls main() if the file is run as a Python script +has plentiful commenting and document strings +''' + +# define a function greet(name) that prints +# out a greeting from the name in the argument name +def greet(name): + ''' + function to print "greetings from {name}" + ''' + print(f'greetings from {name}') + + +# define a function main() that passes a string +# from the script command line to a function greet(name) + +# call name with a string +# name that we pass to greet(name) +def main(name): + greet(name) + +# calls main() if the file is run as a Python script +if __name__ == "__main__": + # execute only if run as a script + # we pass the first command line argument argv[1] + # remembering that argv[0[ is the program name + + # TEST for string length in here: + # To go further with this exercise, you might test to see that the length of sys.argv is as long as you expect it to be, + # so you can tell the user when they forget to include the name + if len(sys.argv) > 1: + main(sys.argv[1]) + else: + print(f'{sys.argv[0]}: error - no command line name given') +EOF 
+ +# Chmod 755 to make the file executable +chmod 755 work/greet.py +``` + + +```python +# test +print('work/greet.py ->') +%run work/greet.py +print('work/greet.py Hermione ->') +%run work/greet.py Hermione +``` + + +```bash +%%bash +# ANSWER 3 +# +# To go even further with this exercise, you might attempt to +# make the script function so that if you run it as: +# %run work/greet.py Fred Hermione +# it responds: +# greetings from Fred +# greetings from Hermione +# in many ways, this is easier than answer 2 +# as we just use a loop + +# code between the next line and the +# End Of File (EOF) marker will be saved in +# to the file work/greet.py +cat << EOF > work/greet.py +#!/usr/bin/env python +# -*- coding: utf-8 -*- + +# import required package(s) +import sys + +''' +greet + +Purpose: + + script to print hello from name + +Author: P. Lewis +Email: p.lewis@ucl.ac.uk +Date: 28 Aug 2020 + +Instructions: +Create a Python code in a file called work/greet.py that does the following: +define a function greet(name) that prints out a greeting from the name + in the argument name +define a function main() that passes a string from the script command + line to a function greet(name) +calls main() if the file is run as a Python script +has plentiful commenting and document strings +''' + +# define a function greet(name) that prints +# out a greeting from the name in the argument name +def greet(name): + ''' + function to print "greetings from {name}" + ''' + print(f'greetings from {name}') + + +# define a function main() that passes a string +# from the script command line to a function greet(name) + +# call name with a string +# name that we pass to greet(name) +def main(name): + greet(name) + +# calls main() if the file is run as a Python script +if __name__ == "__main__": + # execute only if run as a script + # we pass the first command line argument argv[1] + # remembering that argv[0[ is the program name + + # To go even further with this exercise, you might attempt to 
+ # make the script function so that if you run it as: + # %run work/greet.py Fred Hermione + # it responds: + # greetings from Fred + # greetings from Hermione + # in many ways, this is easier than answer 2 + # as we just use a loop + for n in sys.argv[1:]: + # and call main() with n + main(n) +EOF + +# Chmod 755 to make the file executable +chmod 755 work/greet.py +``` + + +```python +# test +# separate the responses to see them easier +dash = '\n'+'='*10 + +print(f'{dash}\nwork/greet.py ->{dash}') +%run work/greet.py +print(f'{dash}\nwork/greet.py Hermione ->{dash}') +%run work/greet.py Hermione +print(f'{dash}\nwork/greet.py Hermione Fred ->{dash}') +%run work/greet.py Hermione Fred +``` diff --git a/docs/020_NASA_MODIS_Earthdata 2.md b/docs/020_NASA_MODIS_Earthdata 2.md new file mode 100644 index 00000000..b1963855 --- /dev/null +++ b/docs/020_NASA_MODIS_Earthdata 2.md @@ -0,0 +1,576 @@ +# NASA MODIS Earthdata + + +## Introduction + +### Purpose + +In this notebook, we will use high-level codes from `geog0111` to familiarise ourselves with downloading and interpreting NASA MODIS datasets from [`NASA EarthData`](https://urs.earthdata.nasa.gov). We will also be visualising these data in this notebook. + +We will be **introducing NASA MODIS land products**, and viewing the MODIS LAI product as an example. This notebook should serve as an introduction to accessing similar products from Earthdata. + +The aim of the codes here is not to provide an exhaustive interface to MODIS data products, although the same scripts should be useable for most, if not all similar products. Rather, it is to use these high-level codes to easily access and visualise the data to understand their properties. + +Neither is it to develop or use an [API](https://en.wikipedia.org/wiki/Application_programming_interface) to access the data. 
If all you want is to get hold of some data product for some defined location and time, then you might use an API such as [Appeears](https://lpdaacsvc.cr.usgs.gov/appeears/). + +Students who take the [GEOG0111 course](https://github.com/UCL-EO/geog0111) will develop codes along similar lines to this later in the term, so for them, these notes also illustrate some of the things they will be able to do when they have finished this course. For them, we will *look under the bonnet* of such codes, and learn how to develop them. Others can use these codes as they stand to access MODIS data via Earthdata. + +### Prerequisites + +Before you can use the material in this notebook, you will need to register as a user at the [`NASA EarthData`](https://urs.earthdata.nasa.gov/users/new). + +Once you have done that, make sure you have your `username` and `password` ready for use below. + +There is no assumption that you know any python code at this point: the use of code should be high enough level that you can easily understand what is going on, and use the constructs shown to modify the codes to your purpose. + +For completeness, we list the python and other codes below. + +We do assume that you have basic familiarity with using [Jupyter notebooks](001_Notebook_use.md). + +You should run through the [Credentials](#Credentials) section below before proceeding further with these notes. + +### Credentials + +We will store your credentials for [`NASA EarthData`](https://urs.earthdata.nasa.gov/users/new) to allow easier data downloading. + +**N.B. using `cylog().login()` is only intended to work with access to NASA Earthdata and to prevent you having to expose your username and password in these notes**. + + +In the `geog0111` library, we have a Python class called `cylog`, written to allow easier persistent interface to NASA download servers. + +First, we import `cylog` from the `geog0111` library.
+ +Run the cell below: + + +```python +try: + from geog0111.cylog import cylog + from geog0111.nasa_requests import test +except: + raise SystemExit("Error loading the required uclgeog library") +``` + +If this gave an error, there is a problem importing the `geog0111` library and you should get help on this in a support class. + +### Earthdata login + +Run the cell below, and enter your `username` and `password` if prompted. + + +```python +cy = cylog(init=True) + +# check this has worked +print('has this worked?',test()) +``` + + + --------------------------------------------------------------------------- + + StdinNotImplementedError Traceback (most recent call last) + + in + ----> 1 cy = cylog(init=True) + 2 + 3 # check this has worked + 4 print('has this worked?',test()) + + + ~/Documents/GitHub/geog0111/notebooks/geog0111/cylog.py in __init__(self, init, destination_folder) + 60 return + 61 else: + ---> 62 self._setup(destination_folder=destination_folder) + 63 + 64 def _setup(self,destination_folder='.cylog'): + + + ~/Documents/GitHub/geog0111/notebooks/geog0111/cylog.py in _setup(self, destination_folder) + 63 + 64 def _setup(self,destination_folder='.cylog'): + ---> 65 username = input("Enter your username: ") + 66 password = getpass() + 67 key = Fernet.generate_key() + + + ~/anaconda3/envs/geog0111/lib/python3.7/site-packages/ipykernel/kernelbase.py in raw_input(self, prompt) + 856 if not self._allow_stdin: + 857 raise StdinNotImplementedError( + --> 858 "raw_input was called, but this frontend does not support input requests." + 859 ) + 860 return self._input_request(str(prompt), + + + StdinNotImplementedError: raw_input was called, but this frontend does not support input requests. 
+ + + +```python +cy.login() +``` + + + --------------------------------------------------------------------------- + + NameError Traceback (most recent call last) + + in + ----> 1 cy.login() + + + NameError: name 'cy' is not defined + + +If you want to force the code to let you re-enter your credentials (e.g. you got it wrong before, or have changed them, or the test fails), then change the call to: + + cy = cylog(init=True) + +and re-run. + +`cylog` stores your username and password in a file that only you can read. We can use this as a convenient way to pull some NASA MODIS data. + +### Code used + + + +In the code below, we use the following python constructs: + +* [`import` modules](https://www.w3schools.com/python/python_modules.asp) +* [Error trapping: `try ... except`](https://www.w3schools.com/python/python_try_except.asp#:~:text=The%20try%20block%20lets%20you,the%20try%2D%20and%20except%20blocks.) +* [`assert`](https://www.w3schools.com/python/ref_keyword_assert.asp) +* [`dictionary`](https://www.w3schools.com/python/python_dictionaries.asp) +* [`print()`](https://www.w3schools.com/python/ref_func_print.asp) +* [string `format()`](https://www.w3schools.com/python/ref_string_format.asp) +* [variables](https://www.w3schools.com/python/python_variables.asp) +* [keyword arguments](https://www.w3schools.com/python/gloss_python_function_keyword_arguments.asp) +* [np.logical_or](https://numpy.org/doc/stable/reference/generated/numpy.logical_or.html) + +Their meaning should be quite obvious from their context, but we provide links here to material at [https://www.w3schools.com/](https://www.w3schools.com/) should you wish to understand them further. + +## MODIS LAI product + +To introduce geospatial processing, we will use a dataset from the MODIS LAI product over the UK. + +The data product [MOD15](https://modis.gsfc.nasa.gov/data/dataprod/mod15.php) LAI/FPAR has been generated from NASA MODIS sensors Terra and Aqua data since 2002.
We are now in dataset collection 6 (the data version to use). + + LAI is defined as the one-sided green leaf area per unit ground area in broadleaf canopies and as half the total needle surface area per unit ground area in coniferous canopies. FPAR is the fraction of photosynthetically active radiation (400-700 nm) absorbed by green vegetation. Both variables are used for calculating surface photosynthesis, evapotranspiration, and net primary production, which in turn are used to calculate terrestrial energy, carbon, water cycle processes, and biogeochemistry of vegetation. Algorithm refinements have improved quality of retrievals and consistency with field measurements over all biomes, with a focus on woody vegetation. + +We use such data to map and understand the dynamics of terrestrial vegetation / carbon, for example, for climate studies. + +The raster data are arranged in tiles, indexed by row and column, to cover the globe: + + +![MODIS tiles](https://www.researchgate.net/profile/J_Townshend/publication/220473201/figure/fig5/AS:277546596880390@1443183673583/The-global-MODIS-Sinusoidal-tile-grid.png) + + +### Exercise + +The pattern of the tile names is `hXXvYY` where `XX` is the horizontal coordinate and `YY` the vertical. + + +* use the map above to work out the names of the two tiles that we will need to access data over the UK +* set the variable `tiles` to contain these two names in a list + +For example, for the two tiles covering Madagascar, we would set: + + tiles = ['h22v10','h22v11'] + + +### Accessing NASA MODIS URLs + +**Warning: The NASA data servers tend to be down for maintenance on Wednesday morning EST** + +Although you can access MODIS datasets through the [NASA Earthdata](https://urs.earthdata.nasa.gov/home) interface, there are many occasions that we would want to just automatically pull datasets.
As we note above, we could use some existing API for this, such as [Appeears](https://lpdaacsvc.cr.usgs.gov/appeears/), but our aim here is ultimately to develop code that does this at a lower level. + +Automation has many roles, and is particularly useful when you want a time series of data that might involve many files. For example, for analysing LAI or other variables over space and time, we will want to write code that pulls the time series of data. + +If you visit the site [https://e4ftl01.cr.usgs.gov/MOTA/MCD15A3H.006](https://e4ftl01.cr.usgs.gov/MOTA/MCD15A3H.006), you will see 'date' style links (e.g. `2018.09.30`) through to sub-directories. + +In these, e.g. [https://e4ftl01.cr.usgs.gov/MOTA/MCD15A3H.006/2018.09.30/](https://e4ftl01.cr.usgs.gov/MOTA/MCD15A3H.006/2018.09.30/) you will find URLs of a set of files. + +The files pointed to by the URLs are the MODIS MOD15 4-day composite 500 m LAI/FPAR product [MCD15A3H](https://lpdaac.usgs.gov/dataset_discovery/modis/modis_products_table/mcd15a3h_v006). + +There are links to several datasets on the page, including 'quicklook files' that are jpeg format images of the datasets, e.g.: + +![MCD15A3H.A2018273.h17v03](https://e4ftl01.cr.usgs.gov/MOTA/MCD15A3H.006/2018.09.30/BROWSE.MCD15A3H.A2018273.h17v03.006.2018278143630.1.jpg) + +as well as `xml` files and `hdf` datasets. + + + +### Data Products + +If we look at the data server we have specified [https://e4ftl01.cr.usgs.gov](https://e4ftl01.cr.usgs.gov), we will see that a number of sub-directories exist.
Each of these 'server directories' points to a different data stream: + + [DIR] ASTT/ 2019-08-05 07:54 - + [DIR] COMMUNITY/ 2020-06-02 08:45 - + [DIR] ECOSTRESS/ 2020-04-09 10:30 - + [DIR] GEDI/ 2020-02-10 09:58 - + [DIR] MEASURES/ 2020-03-17 10:55 - + [DIR] MOLA/ 2020-06-01 09:20 - + [DIR] MOLT/ 2020-04-14 08:06 - + [DIR] MOTA/ 2019-12-27 06:49 - + [DIR] VIIRS/ 2020-06-23 10:26 - + +For example, we might notice [VIIRS](https://e4ftl01.cr.usgs.gov/VIIRS) which takes us to the [VIIRS data products](https://viirsland.gsfc.nasa.gov), or [GEDI](https://e4ftl01.cr.usgs.gov/GEDI) [spaceborne lidar](https://gedi.umd.edu/) data. Each of these data streams has its own properties that we need to appreciate before using it. + +### MOTA + +The URL we have used above, [https://e4ftl01.cr.usgs.gov/MOTA/MCD15A3H.006/2018.09.30/](https://e4ftl01.cr.usgs.gov/MOTA/MCD15A3H.006/2018.09.30/) starts with a call to the server directory `MOTA`, so we can think of `https://e4ftl01.cr.usgs.gov/MOTA` as the base level URL. + +MOTA refers to combined MODIS Terra and Aqua datasets. Similarly, MOLA and MOLT refer to datasets generated from single MODIS sensors of Aqua and Terra, respectively. + +The rest of the directory information `MCD15A3H.006/2018.09.30` tells us: + +* the product name `MCD15A3H` +* the product version `006` +* the date of the dataset `2018.09.30` + +There are several ways we could specify the date information. The most 'human readable' is probably `YYYY.MM.DD` as given here. + +### MODIS filename format + +If we visit the link to a particular date for this dataset [https://e4ftl01.cr.usgs.gov/MOTA/MCD15A3H.006/2018.09.30/](https://e4ftl01.cr.usgs.gov/MOTA/MCD15A3H.006/2018.09.30/), we see some files that have the suffix `hdf`.
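
The directory layout described above (base URL, then product and version, then date) can be sketched in a few lines of Python. This is an illustration only: the helper name `modis_dir_url` is hypothetical and is not part of `uclgeog`. It simply assembles a directory URL from a year and day of year using the standard library.

```python
from datetime import datetime, timedelta

def modis_dir_url(product, year, doy, version='006',
                  base_url='https://e4ftl01.cr.usgs.gov/MOTA'):
    """Build the server directory URL for a product on a given year and day of year.

    Hypothetical helper for illustration; not part of uclgeog.
    """
    # convert (year, doy) to a calendar date: doy 1 is 1 January
    date = datetime(year, 1, 1) + timedelta(days=doy - 1)
    # directory names on the server use the form YYYY.MM.DD
    return f'{base_url}/{product}.{version}/{date:%Y.%m.%d}/'

# day of year 273 in 2018 is 30 September, as in the URL used above
url = modis_dir_url('MCD15A3H', 2018, 273)
print(url)
```

Keeping the base URL and version as keyword arguments with defaults means the same call could be pointed at a different server directory without changing the function.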
+ +The `hdf` filenames are of the form: + + MCD15A3H.A2018273.h35v10.006.2018278143650.hdf + +where: + +* the first field (`MCD15A3H`) gives the product code +* the second (`A2018273`) gives the observation date: day of year `273`, `2018` here +* the third (`h35v10`) gives the 'MODIS tile' code for the data location +* the remaining fields specify the product version number (`006`) and a code representing the processing date. + +If we look at the [product specification page](https://lpdaac.usgs.gov/products/mcd15a3hv006/) we see that the data product has multiple data layers. In the case of MCD15A3H, these are: + +| SDS Name | Description | Units | Data Type | Fill Value | No Data Value | Valid Range | Scale Factor | +|:-:|:-:|:-:|:-:|:-:|:-:|:-:|-| +| Fpar_500m | Fraction of Photosynthetically Active Radiation | Percent | 8-bit unsigned integer | 249 to 255 | N/A | 0 to 100 | 0.01 | +| Lai_500m | Leaf Area Index | m²/m² | 8-bit unsigned integer | 249 to 255 | N/A | 0 to 100 | 0.1 | +| FparLai_QC | Quality for FPAR and LAI | Class Flag | 8-bit unsigned integer | 255 | N/A | 0 to 254 | N/A | +| FparExtra_QC | Extra detail Quality for FPAR and LAI | Class Flag | 8-bit unsigned integer | 255 | N/A | 0 to 254 | N/A | +| FparStdDev_500m | Standard deviation of FPAR | Percent | 8-bit unsigned integer | 248 to 255 | N/A | 0 to 100 | 0.01 | +| LaiStdDev_500m | Standard deviation of LAI | m²/m² | 8-bit unsigned integer | 248 to 255 | N/A | 0 to 100 | 0.1 | + + +## Getting and visualising the data + +### Grid + +One thing we might need sometimes is to specify the `grid` used by the data product. Mostly, this is just the same as the product name (this is the default in our codes by just setting `grid` to the same as the product name). + +For the product `MCD15A3H` that we use here though, the grid is `MOD_Grid_MCD15A3H`, so we need to specify this. This issue is something to look out for when you specify a MODIS product you haven't used before.
This is not specified in the product user guides or specifications, but you will *mostly* find it in the associated [file specifications document](https://ladsweb.modaps.eosdis.nasa.gov/filespec/MODIS/6/MCD15A3H). When you use a new product then, don't forget to check the appropriate [file specifications](https://ladsweb.modaps.eosdis.nasa.gov/filespec/MODIS/6) to find the grid object used! + +If you can't find it, just try to use the default (set `grid` to `None`). + +If that fails to return anything useful, the easiest thing to do is to examine the SDS datasets in the file itself. + +For example, let's try using the default grid: + + +```python +from uclgeog.process_timeseries import mosaic, visualise +# libraries we need + +####################### +# specify what we want +# in a dictionary +####################### +# UK tiles +# specify day of year (DOY) and year + +params = { + 'tiles' : ['h17v03', 'h17v04', 'h18v03', 'h18v04'], + 'doy' : 1, + 'year' : 2020, + 'product' : 'MCD15A3H', + 'layer' : 'Lai_500m', + 'grid' : None, + 'base_url': 'https://e4ftl01.cr.usgs.gov/MOTA' +} + +# check to see if it worked +# and trap errors +try: + data = mosaic(params) + assert data is not None +except AssertionError: + print("\nThis hasn't worked") +else: + print("\nThis worked") +``` + + + --------------------------------------------------------------------------- + + ModuleNotFoundError Traceback (most recent call last) + + in + ----> 1 from uclgeog.process_timeseries import mosaic, visualise + 2 # libraries we need + 3 + 4 ####################### + 5 # specify what we want + + + ModuleNotFoundError: No module named 'uclgeog' + + +The code exits with the message: + + failed to warp ['HDF4_EOS:EOS_GRID:"data/MCD15A3H.A2020001.h17v03.006.2020006031702.hdf":MCD15A3H:Lai_500m', 'HDF4_EOS:EOS_GRID:"data/MCD15A3H.A2020001.h17v04.006.2020006031910.hdf":MCD15A3H:Lai_500m', 'HDF4_EOS:EOS_GRID:"data/MCD15A3H.A2020001.h18v03.006.2020006033540.hdf":MCD15A3H:Lai_500m', 
'HDF4_EOS:EOS_GRID:"data/MCD15A3H.A2020001.h18v04.006.2020006032422.hdf":MCD15A3H:Lai_500m'] 2020, 1, ['h17v03', 'h17v04', 'h18v03', 'h18v04'], data/ + +This is telling us that it has tried to access a dataset + + HDF4_EOS:EOS_GRID:"data/MCD15A3H.A2020001.h17v03.006.2020006031702.hdf":MCD15A3H:Lai_500m + +and this is where it has failed. + +We could use python calls to check what this should be, but we mostly find it easier to use a system tool, `gdalinfo` in this case. [`gdal`](https://gdal.org/) is software for geospatial processing that can deal with a wide range of formats. We will make a lot of use of it later on. + +For now, we can run the system command below to see what the SDS `Lai_500m` looks like in one of the files it has downloaded (we get the filename from the list reported above). + + +```python +!gdalinfo data/MCD15A3H.A2020001.h17v03.006.2020006031702.hdf | grep Lai_500m +``` + + ERROR 4: data/MCD15A3H.A2020001.h17v03.006.2020006031702.hdf: No such file or directory + gdalinfo failed - unable to open 'data/MCD15A3H.A2020001.h17v03.006.2020006031702.hdf'. + + +From this, we see that the dataset specification is really + + HDF4_EOS:EOS_GRID:"data/MCD15A3H.A2020001.h17v03.006.2020006031702.hdf":MOD_Grid_MCD15A3H:Lai_500m + +and not what we previously assumed: + + HDF4_EOS:EOS_GRID:"data/MCD15A3H.A2020001.h17v03.006.2020006031702.hdf":MCD15A3H:Lai_500m + +We have most of the specification correct, but have used `MCD15A3H:Lai_500m` instead of `MOD_Grid_MCD15A3H:Lai_500m`.
Let's fix this now: + + +```python +from uclgeog.process_timeseries import mosaic, visualise +# libraries we need + +####################### +# specify what we want +# in a dictionary +####################### +# UK tiles +# specify day of year (DOY) and year + +params = { + 'tiles' : ['h17v03', 'h17v04', 'h18v03', 'h18v04'], + 'doy' : 1, + 'year' : 2020, + 'product' : 'MCD15A3H', + 'layer' : 'Lai_500m', + 'grid' : 'MOD_Grid_MCD15A3H', + 'base_url': 'https://e4ftl01.cr.usgs.gov/MOTA' + +} + +# check to see if it worked +# and trap errors +try: + data = mosaic(params) + assert data is not None +except AssertionError: + print("\nThis hasn't worked") +else: + print("\nThis worked") +``` + + + --------------------------------------------------------------------------- + + ModuleNotFoundError Traceback (most recent call last) + + in + ----> 1 from uclgeog.process_timeseries import mosaic, visualise + 2 # libraries we need + 3 + 4 ####################### + 5 # specify what we want + + + ModuleNotFoundError: No module named 'uclgeog' + + +### Download + +So, other than some terms (e.g. version number) we can take as defaults, when we want to access a MODIS product as tile data, we need to specify: + +* product code +* SDS Name (scientific dataset name) +* tile(s) +* day of year (DOY) +* year + +Now we have some appreciation of the MODIS dataset description requirements, we can use the method `mosaic_and_clip()` in `uclgeog` to download some example datasets: + + # UK tiles + tiles = ['h17v03', 'h17v04', 'h18v03', 'h18v04'] + # specify day of year (DOY) and year + doy,year = 1,2020 + # product + product = 'MCD15A3H' + # SDS + layer = "Lai_500m" + # grid + grid = 'MOD_Grid_MCD15A3H' + +One useful thing we have implemented in `mosaic_and_clip()` is to mosaic data from different tiles together into one contiguous dataset. So, although we will have data specified over four tiles, we will mosaic it together into a single array. 
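
To see why four tiles combine into one contiguous array, it can help to lay the tile names out on the MODIS grid. The sketch below is an illustration only: `tile_grid` is a hypothetical helper, not the `uclgeog` implementation, and it only arranges tile names by their `h` and `v` indices. It does not read, warp, or join any data.

```python
import re

def tile_grid(tiles):
    """Arrange tile names such as 'h17v03' into rows (by v) and columns (by h).

    Hypothetical helper for illustration; not part of uclgeog.
    """
    parsed = {}
    for name in tiles:
        match = re.fullmatch(r'h(\d{2})v(\d{2})', name)
        if match is None:
            raise ValueError(f'unexpected tile name: {name}')
        h, v = int(match.group(1)), int(match.group(2))
        parsed[(v, h)] = name
    # v increases from north to south on the MODIS grid,
    # so sorting by v puts the northern row first
    rows = sorted({v for v, h in parsed})
    cols = sorted({h for v, h in parsed})
    # None marks a grid position with no tile in the list
    return [[parsed.get((v, h)) for h in cols] for v in rows]

tiles = ['h17v03', 'h17v04', 'h18v03', 'h18v04']
for row in tile_grid(tiles):
    print(row)
```

For the UK tiles this gives a 2 by 2 layout, which is the arrangement a mosaic of the four tile arrays would follow.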
+ + +```python +from uclgeog.process_timeseries import mosaic_and_clip, visualise +import numpy as np +# libraries we need + +####################### +# specify what we want +# in a dictionary +####################### +# UK tiles +# specify day of year (DOY) and year + +params = { + 'tiles' : ['h17v03', 'h17v04', 'h18v03', 'h18v04'], + 'doy' : 1, + 'year' : 2020, + 'product' : 'MCD15A3H', + 'layer' : 'Lai_500m', + 'grid' : 'MOD_Grid_MCD15A3H', + 'verbose' : True, + 'base_url': 'https://e4ftl01.cr.usgs.gov/MOTA' +} + +####################### +# download and interpret +# and mask non-valid numbers by setting to NaN +# see data table above +####################### +try: + data = mosaic(params) + assert data is not None +except AssertionError: + print("\nThis hasn't worked") +else: + data = data.astype(float) + data[data>248] = np.nan + ####################### + # print some feedback + ####################### + print(f'the variable lai contains a dataset of dimension {data.shape}') + print('for product {product} SDS {layer}'.format(**params)) + print('for day {doy} of year {year} for tiles {tiles}'.format(**params)) + +``` + + + --------------------------------------------------------------------------- + + ModuleNotFoundError Traceback (most recent call last) + + in + ----> 1 from uclgeog.process_timeseries import mosaic_and_clip, visualise + 2 import numpy as np + 3 # libraries we need + 4 + 5 ####################### + + + ModuleNotFoundError: No module named 'uclgeog' + + +### Visualise + +We have now generated a dataset, stored in a variable `lai`. We are likely to want to perform some analysis on this, but we might also like to visualise the dataset. + +We can do this using a python package [matplotlib](https://matplotlib.org) that we will gain more experience with later. + +For now, we will simply implement a typical image visualisation, with a dataset title, and scale bar. We will use a method `visualise()` from our `uclgeog` library to do this. 
+ + +```python +# call visualise +title = 'product {product} SDS {layer}\n'.format(**params) + \ + 'for day {doy} of year {year} for tiles {tiles}'.format(**params) +# set the max value to 3.0 to be able to see whats going on +plot=visualise(data,title=title,vmax=3.0) +``` + + + --------------------------------------------------------------------------- + + NameError Traceback (most recent call last) + + in + 1 # call visualise + ----> 2 title = 'product {product} SDS {layer}\n'.format(**params) + \ + 3 'for day {doy} of year {year} for tiles {tiles}'.format(**params) + 4 # set the max value to 3.0 to be able to see whats going on + 5 plot=visualise(data,title=title,vmax=3.0) + + + NameError: name 'params' is not defined + + +# Exercises + +### Exercise: change the year and DOY + +Using the lines of code above, download and visualise the LAI dataset for a different DOY and year. Remember that it is a 4-day synthesis, so there are only datasets on doy 1,5,9, ... + +Put comments in your code using `#` to start a comment, to describe what you are doing. + +You might want to set `verbose` to `True` to get some feedback on what is going on. + +### Exercise: change the location + +Using the lines of code above, download and visualise the LAI dataset for a different location. + +You will need to specify the tile or tiles that you wish to use. + +As before, put comments in your code using `#` to start a comment, to describe what you are doing. + +You might want to set `verbose` to `True` to get some feedback on what is going on. + +### Exercise: change the SDS + +Using the lines of code above, download and visualise the LAI dataset for a different location. + +Now, instead of using the data layer `Lai_500m`, visualise another data layer in the LAI dataset. See the table above of [the product specification](https://lpdaac.usgs.gov/products/mcd15a3hv006/) for details. 
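
When you switch layer, the valid range and scale factor change too. As a sketch of how the product table above might be used, the snippet below copies the valid range and scale factor for three of the layers into a dictionary and converts a single raw stored value. The names `LAYERS` and `to_physical` are hypothetical and are not part of `uclgeog`; the numbers are taken from the table. Whether `mosaic()` already applies the scale factor is not stated here, so treat the scaling step as an assumption.

```python
import math

# valid range and scale factor per layer, copied from the product table;
# a scale of None means the layer holds class flags, not a scaled quantity
LAYERS = {
    'Fpar_500m':  {'valid': (0, 100), 'scale': 0.01},
    'Lai_500m':   {'valid': (0, 100), 'scale': 0.1},
    'FparLai_QC': {'valid': (0, 254), 'scale': None},
}

def to_physical(raw, layer):
    """Return NaN for values outside the valid range, else the (scaled) value.

    Hypothetical helper for illustration; not part of uclgeog.
    """
    low, high = LAYERS[layer]['valid']
    scale = LAYERS[layer]['scale']
    if not low <= raw <= high:
        return math.nan
    return raw if scale is None else raw * scale

print(to_physical(25, 'Lai_500m'))     # a valid LAI value, scaled by 0.1
print(to_physical(250, 'Lai_500m'))    # in the fill range, so NaN
print(to_physical(254, 'FparLai_QC'))  # class flag, returned unscaled
```

Keeping the per-layer numbers in one dictionary means a change of `layer` does not require hunting for hard-wired thresholds elsewhere in the code.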
+ +### Exercise: change the product to another on MOTA + +Using the lines of code above, download and visualise a different MODIS product. + +You can see the option codes on the server we have been using by [looking in the directory https://e4ftl01.cr.usgs.gov/MOTA](https://e4ftl01.cr.usgs.gov/MOTA). + +You can get the meanings of the codes by simply googling them, or you can look them up on the [MODIS data product page](https://modis.gsfc.nasa.gov/data/dataprod/). + +### Exercise: Snow + +The MODIS snow products are on a different server to the one we used above, [`https://n5eil01u.ecs.nsidc.org/MOST`](https://n5eil01u.ecs.nsidc.org/MOST) for MODIS Terra data and [`https://n5eil01u.ecs.nsidc.org/MOSA`](https://n5eil01u.ecs.nsidc.org/MOSA) for MODIS Aqua. Product information is available on the [product website](https://nsidc.org/data/myd10a1). Note that there is no combined Terra and Aqua product. + +Use the codes above to explore, download, and plot a snow dataset from the `MOD10A1` product. + +### Exercise: Land Cover + +The MODIS land cover product is `MCD12Q1`. + +Use the codes above to explore, download, and plot a land cover dataset from the `MCD12Q1` product. + +# Summary + +In these notes, we have introduced the characteristics of MODIS data products, and learned how to specify, access, and display them for a few servers. You will have accessed a number of products under a number of conditions in the exercises, but you are encouraged to explore this further. + +The main item to do with using data products of this sort, that we haven't covered yet, is the interpretation of Quality Assurance (QA) data. This information is often packed into bits, and can be a little tricky at first. However, as above, once you have some familiarity with a few cases, you will be able to apply this more widely. + +You should spend some time going through the various links to explore the different datasets, and try out the exercises above for various products.
The familiarity you gain from this will help when it comes to building our own codes later on. diff --git a/docs/020_NASA_MODIS_Earthdata_answers 2.md b/docs/020_NASA_MODIS_Earthdata_answers 2.md new file mode 100644 index 00000000..2ab2c4d0 --- /dev/null +++ b/docs/020_NASA_MODIS_Earthdata_answers 2.md @@ -0,0 +1,605 @@ +# NASA MODIS Earthdata : Answers to exercises + +### Exercise + +The pattern on the tile names is `hXXvYY` where `XX` is the horizontal coordinate and `YY` the vertical. + + +* use the map above to work out the names of the two tiles that we will need to access data over the UK +* set the variable `tiles` to contain these two names in a list + +For example, for the two tiles covering Madagascar, we would set: + + tiles = ['h22v10','h22v11'] + + +```python +# tiles for the UK + +tiles = ['h17v03', 'h17v04', 'h18v03', 'h18v04'] +``` + +### Exercise: change the year and DOY + +Using the lines of code above, download and visualise the LAI dataset for a different DOY and year. Remember that it is a 4-day synthesis, so there are only datasets on doy 1,5,9, ... + +Put comments in your code using `#` to start a comment, to describe what you are doing. + +You might want to set `verbose` to `True` to get some feedback on what is going on. 
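
Because the product is a 4-day synthesis, the valid DOY values can be generated rather than guessed. A minimal sketch using only the standard library, assuming the sequence restarts at day 1 each year (the helper name `doy_to_date` is hypothetical and not part of `uclgeog`):

```python
from datetime import date, timedelta

# valid days of year for a 4-day product: 1, 5, 9, ...
valid_doys = list(range(1, 367, 4))

def doy_to_date(year, doy):
    """Convert a year and day of year to a calendar date (doy 1 is 1 January)."""
    return date(year, 1, 1) + timedelta(days=doy - 1)

# first few valid DOYs, and the calendar date for one of them
print(valid_doys[:5])
print(doy_to_date(2020, 1 + 4 * 30))
```

Writing a DOY as `1 + 4 * n` for an integer `n` is a simple way to make sure the value you pass as `doy` is one for which a dataset exists.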
+ + +```python +from uclgeog.process_timeseries import mosaic_and_clip, visualise +import numpy as np + +####################### +# location: madagascar +####################### +params = { + 'tiles' : ['h17v03', 'h17v04', 'h18v03', 'h18v04'], + 'doy' : 5, + 'year' : 2010, + 'product' : 'MCD15A3H', + 'layer' : 'Lai_500m', + 'grid' : 'MOD_Grid_MCD15A3H', + 'verbose' : True, + 'base_url': 'https://e4ftl01.cr.usgs.gov/MOTA' +} + +try: + data = mosaic(params) + assert data is not None +except AssertionError: + print("\nThis hasn't worked") +else: + data = data.astype(float) + data[data>248] = np.nan + ####################### + # call visualise + ####################### + title = 'product {product} SDS {layer}\n'.format(**params) + \ + 'for day {doy} of year {year} for tiles {tiles}'.format(**params) + plot=visualise(data,title=title,vmax=3.0) +``` + + + --------------------------------------------------------------------------- + + ModuleNotFoundError Traceback (most recent call last) + + in + ----> 1 from uclgeog.process_timeseries import mosaic_and_clip, visualise + 2 import numpy as np + 3 + 4 ####################### + 5 # location: madagascar + + + ModuleNotFoundError: No module named 'uclgeog' + + + +```python +from uclgeog.process_timeseries import mosaic_and_clip, visualise + +####################### +# doy = 1 + 4 * 20 here +####################### +params = { + 'tiles' : ['h17v03', 'h17v04', 'h18v03', 'h18v04'], + 'doy' : 1+4*30, + 'year' : 2020, + 'product': 'MCD15A3H', + 'layer' : 'Lai_500m', + 'grid' : 'MOD_Grid_MCD15A3H', + 'verbose': True +} + +####################### +# download and interpret +####################### +# check to see if it worked +# and trap errors +try: + lai = mosaic(params) + assert lai is not None +except: + print("\nThis hasn't worked") +else: + lai = lai.astype(float) + lai[lai>248] = np.nan + ####################### + # call visualise + ####################### + title = 'product {product} SDS {layer}\n'.format(**params) + \ + 'for 
day {doy} of year {year} for tiles {tiles}'.format(**params) + plot=visualise(lai,title=title,vmax=3.0) +``` + + + --------------------------------------------------------------------------- + + ModuleNotFoundError Traceback (most recent call last) + + in + ----> 1 from uclgeog.process_timeseries import mosaic_and_clip, visualise + 2 + 3 ####################### + 4 # doy = 1 + 4 * 20 here + 5 ####################### + + + ModuleNotFoundError: No module named 'uclgeog' + + +### Exercise: change the location + +Using the lines of code above, download and visualise the LAI dataset for a different location. + +You will need to specify the tile or tiles that you wish to use. + +As before, put comments in your code using `#` to start a comment, to describe what you are doing. + +You might want to set `verbose` to `True` to get some feedback on what is going on. + + +```python +from uclgeog.process_timeseries import mosaic_and_clip, visualise +import numpy as np + +####################### +# location: madagascar +####################### +params = { + 'tiles' : ['h22v10','h22v11'], + 'doy' : 5, + 'year' : 2010, + 'product' : 'MCD15A3H', + 'layer' : 'Lai_500m', + 'grid' : 'MOD_Grid_MCD15A3H', + 'verbose' : True, + 'base_url': 'https://e4ftl01.cr.usgs.gov/MOTA' +} +try: + data = mosaic(params) + assert data is not None +except AssertionError: + print("\nThis hasn't worked") +else: + data = data.astype(float) + data[data>248] = np.nan + ####################### + # call visualise + ####################### + title = 'product {product} SDS {layer}\n'.format(**params) + \ + 'for day {doy} of year {year} for tiles {tiles}'.format(**params) + plot=visualise(data,title=title,vmax=3.0) +``` + + + --------------------------------------------------------------------------- + + ModuleNotFoundError Traceback (most recent call last) + + in + ----> 1 from uclgeog.process_timeseries import mosaic_and_clip, visualise + 2 import numpy as np + 3 + 4 ####################### + 5 # location: 
madagascar + + + ModuleNotFoundError: No module named 'uclgeog' + + +### Exercise: change the SDS + +Using the lines of code above, download and visualise the LAI dataset for a different location. + +Now, instead of using the data layer `Lai_500m`, visualise another data layer in the LAI dataset. See the table above of [the product specification](https://lpdaac.usgs.gov/products/mcd15a3hv006/) for details. + + +```python +from uclgeog.process_timeseries import mosaic_and_clip, visualise +import numpy as np + +####################### +# location: madagascar +####################### + +params = { + 'tiles' : ['h22v10','h22v11'], + 'doy' : 5, + 'year' : 2010, + 'product' : 'MCD15A3H', + 'layer' : 'FparLai_QC', + 'grid' : 'MOD_Grid_MCD15A3H', + 'verbose' : True, + 'base_url': 'https://e4ftl01.cr.usgs.gov/MOTA' +} + +####################### +# download and interpret +# note the valid range is different +# see the product table above +####################### + +try: + data = mosaic(params) + assert data is not None +except AssertionError: + print("\nThis hasn't worked") +else: + data = data.astype(float) + data[data>254] = np.nan + ####################### + # call visualise + ####################### + title = 'product {product} SDS {layer}\n'.format(**params) + \ + 'for day {doy} of year {year} for tiles {tiles}'.format(**params) + plot=visualise(data,title=title) +``` + + + --------------------------------------------------------------------------- + + ModuleNotFoundError Traceback (most recent call last) + + in + ----> 1 from uclgeog.process_timeseries import mosaic_and_clip, visualise + 2 import numpy as np + 3 + 4 ####################### + 5 # location: madagascar + + + ModuleNotFoundError: No module named 'uclgeog' + + +### Exercise: change the product to another on MOTA + +Using the lines of code above, download and visualise a different MODIS product. 
+ +You can see the option codes on the server we have been using by [looking in the directory https://e4ftl01.cr.usgs.gov/MOTA](https://e4ftl01.cr.usgs.gov/MOTA). + +You can get the meanings of the codes by simply googling them, or you can look them up on the [MODIS data product page](https://modis.gsfc.nasa.gov/data/dataprod/). + + +```python +from uclgeog.process_timeseries import mosaic_and_clip, visualise +import numpy as np + +####################### +# location: madagascar +# product MCD64A1 Burned Area +# see product page on +# https://lpdaac.usgs.gov/products/mcd64a1v006/ +# we see one of the SDS layers is 'Burn Date' +# and that 1 to 366 are valid +# +# get the grid from +# https://ladsweb.modaps.eosdis.nasa.gov/filespec/MODIS/6/MCD64A1 +####################### +params = { + 'tiles' : ['h22v10'], + 'doy' : 1, + 'year' : 2020, + 'product' : 'MCD64A1', + 'layer' : 'Burn Date', + 'grid' : 'MOD_Grid_Monthly_500m_DB_BA', + 'verbose' : True, + 'base_url': 'https://e4ftl01.cr.usgs.gov/MOTA' +} + +####################### +# download and interpret +# note the valid range is different +# see the product table above +# Use a different variable name: +# it's not lai any more! 
+####################### +try: + data = mosaic(params) + assert data is not None +except: + print("\nThis hasn't worked") +else: + data = data.astype(float) + data[np.logical_or(data>366,data<1)] = np.nan + ####################### + # call visualise + ####################### + title = 'product {product} SDS {layer}\n'.format(**params) + \ + 'for day {doy} of year {year} for tiles {tiles}'.format(**params) + plot=visualise(data,title=title) +``` + + + --------------------------------------------------------------------------- + + ModuleNotFoundError Traceback (most recent call last) + + in + ----> 1 from uclgeog.process_timeseries import mosaic_and_clip, visualise + 2 import numpy as np + 3 + 4 ####################### + 5 # location: madagascar + + + ModuleNotFoundError: No module named 'uclgeog' + + +### Exercise: Snow + +The MODIS snow products are on a different server to the one we used above, [`https://n5eil01u.ecs.nsidc.org/MOST`](https://n5eil01u.ecs.nsidc.org/MOST) for MODIS Terra data and [`https://n5eil01u.ecs.nsidc.org/MOSA`](https://n5eil01u.ecs.nsidc.org/MOSA) for MODIS Aqua. Product information is available on the [product website](https://nsidc.org/data/myd10a1). Note that there is no combined Terra and Aqua product. + +Use the codes above to explore, download, and plot a snow dataset from the `MOD10A1` product.
+ + +```python +from uclgeog.process_timeseries import mosaic_and_clip, visualise +import numpy as np + +####################### +# location: E Europe +# product +# see product page on +# https://nsidc.org/data/MYD10A1/versions/6 +# 0-100 is valid +# NDSI_Snow_Cover +# grid is MOD_Grid_Snow_500m +####################### +params = { + 'tiles' : ['h19v03'], + 'doy' : 1, + 'year' : 2010, + 'product' : 'MOD10A1', + 'layer' : 'NDSI_Snow_Cover', + 'grid' : 'MOD_Grid_Snow_500m', + 'verbose' : True, + 'base_url': 'https://n5eil01u.ecs.nsidc.org/MOST' +} + +####################### +# download and interpret +# note the valid range is different +# see the product table above +# Use a different variable name: +# its not lai any more! +####################### +try: + data = mosaic(params) + assert data is not None +except: + print("\nThis hasn't worked") +else: + data = data.astype(float) + data[np.logical_or(data>100,data<1)] = np.nan + ####################### + # call visualise + ####################### + title = 'product {product} SDS {layer}\n'.format(**params) + \ + 'for day {doy} of year {year} for tiles {tiles}'.format(**params) + plot=visualise(data,title=title) +``` + + + --------------------------------------------------------------------------- + + ModuleNotFoundError Traceback (most recent call last) + + in + ----> 1 from uclgeog.process_timeseries import mosaic_and_clip, visualise + 2 import numpy as np + 3 + 4 ####################### + 5 # location: E Europe + + + ModuleNotFoundError: No module named 'uclgeog' + + + +```python +# check for grid info ... +!gdalinfo data/MOD10A1.A2010001.h19v03.006.2016083014706.hdf | grep NDSI_Snow_Cover +``` + + ERROR 4: data/MOD10A1.A2010001.h19v03.006.2016083014706.hdf: No such file or directory + gdalinfo failed - unable to open 'data/MOD10A1.A2010001.h19v03.006.2016083014706.hdf'. + + +### Exercise: Land Cover + +The MODIS land cover product is `MCD12Q1`. 
+ +Use the codes above to explore, download, and plot a land cover dataset from the `MCD12Q1` product. + + +```python +from uclgeog.process_timeseries import mosaic_and_clip, visualise +import numpy as np + +####################### +# location: madagascar +# product MCD64A1 Burned Area +# see product page on +# https://lpdaac.usgs.gov/products/mcd12q1v006/ +# we see one of the SDS layers is 'LC_Type1' +# and that 1 to 17 are valid +# +# get the grid from +# https://ladsweb.modaps.eosdis.nasa.gov/filespec/MODIS/6/MCD64A1 +####################### +params = { + 'tiles' : ['h22v10'], + 'doy' : 1, + 'year' : 2018, + 'product' : 'MCD12Q1', + 'layer' : 'LC_Type1', + 'grid' : 'MCD12Q1', + 'verbose' : True, + 'base_url': 'https://e4ftl01.cr.usgs.gov/MOTA' +} + +####################### +# download and interpret +# note the valid range is different +# see the product table above +# Use a different variable name: +# its not lai any more! +####################### +try: + data = mosaic(params) + assert data is not None +except: + print("\nThis hasn't worked") +else: + data = data.astype(float) + data[np.logical_or(data>17,data<1)] = np.nan + ####################### + # call visualise + ####################### + title = 'product {product} SDS {layer}\n'.format(**params) + \ + 'for day {doy} of year {year} for tiles {tiles}'.format(**params) + plot=visualise(data,title=title) +``` + + + --------------------------------------------------------------------------- + + ModuleNotFoundError Traceback (most recent call last) + + in + ----> 1 from uclgeog.process_timeseries import mosaic_and_clip, visualise + 2 import numpy as np + 3 + 4 ####################### + 5 # location: madagascar + + + ModuleNotFoundError: No module named 'uclgeog' + + + +```python +from uclgeog.process_timeseries import mosaic_and_clip, visualise + +####################### +# location: madagascar +# product MCD12C1 yearly Land cover +# see product page on +# https://lpdaac.usgs.gov/products/mcd12q1v006/ +# we see 
one of the SDS layers is 'Majority_Land_Cover_Type_1' +# and that 255 is invalid +# +# get the grid from +# https://ladsweb.modaps.eosdis.nasa.gov/filespec/MODIS/6/MCD12Q1 +# +# Note that date for dataset is 2001.01.01 +# from https://e4ftl01.cr.usgs.gov/MOTA/MCD12Q1.006/ +# year 2019 & 2020 not there yet!! +####################### +params = { + 'tiles' : ['h22v10'], + 'doy' : 1, + 'year' : 2018, + 'product': 'MCD12Q1', + 'layer' : 'LC_Type1', + 'grid' : None, + 'verbose': True +} + +####################### +# download and interpret +# note the valid range is different +# see the product table above +# Use a different variable name: +# its not lai any more! +####################### +try: + data = mosaic(params) + assert data is not None +except: + print("\nThis hasn't worked") +else: + data = data.astype(float) + data[data>254] = np.nan + ####################### + # call visualise + ####################### + title = 'product {product} SDS {layer}\n'.format(**params) + \ + 'for day {doy} of year {year} for tiles {tiles}'.format(**params) + plot=visualise(data,title=title) +``` + + + --------------------------------------------------------------------------- + + ModuleNotFoundError Traceback (most recent call last) + + in + ----> 1 from uclgeog.process_timeseries import mosaic_and_clip, visualise + 2 + 3 ####################### + 4 # location: madagascar + 5 # product MCD12C1 yearly Land cover + + + ModuleNotFoundError: No module named 'uclgeog' + + + +```python +from uclgeog.process_timeseries import mosaic_and_clip, visualise + +############# +# FparLai_QC +############# + +####################### +# single tile here +# for SDS FparLai_QC +# note that valid values different here +####################### +params = { + 'tiles' : ['h18v03'], + 'doy' : 1+4*30, + 'year' : 2020, + 'product': 'MCD15A3H', + 'layer' : 'FparLai_QC', + 'verbose': True +} + +####################### +# download and interpret +####################### +lai = 
mosaic_and_clip(**params).astype(float) +lai[lai>254] = np.nan +####################### +# call visualise +# Don't set vmax now +# as we want to see the +# full range of values +####################### +title = 'product {product} SDS {layer}\n'.format(**params) + \ + 'for day {doy} of year {year} for tiles {tiles}'.format(**params) +plot=visualise(lai,title=title) +``` + + + --------------------------------------------------------------------------- + + ModuleNotFoundError Traceback (most recent call last) + + in + ----> 1 from uclgeog.process_timeseries import mosaic_and_clip, visualise + 2 + 3 ############# + 4 # FparLai_QC + 5 ############# + + + ModuleNotFoundError: No module named 'uclgeog' + diff --git a/docs/021_GoogleEarthEngine 2.md b/docs/021_GoogleEarthEngine 2.md new file mode 100644 index 00000000..5a33aa03 --- /dev/null +++ b/docs/021_GoogleEarthEngine 2.md @@ -0,0 +1,324 @@ +# 201 Google Earth Engine + +## Introduction + + +### Purpose + +In this notebook, we introduce the Python interface to [Google Earth Engine](https://earthengine.google.com) (GEE) using the [`geemap`](https://github.com/giswqs/geemap) package. + +We use GEE to explore some Earth Observation datasets and their characteristics and learn about interpreting quality control data. + + +We do not intend this to be a complete course on using GEE, and we do not want you spending all of your time developing in GEE at the moment: this is a course in scientific programming using Python and so needs to be more general. This, then, is a one-session introduction to some important datasets, the core ideas of processing in GEE, and some of the core methods you might use. When you want to develop your own codes, you will find the [`geemap` documentation](https://github.com/giswqs/geemap) and [examples](https://github.com/giswqs/earthengine-py-notebooks) of great use. 
+ + + + +### Prerequisites + +You will need some understanding of Python basics from part 1 of this course (notebooks with the code XXX ) + +You will also need to make sure you have a [Google account](https://support.google.com/accounts/answer/27441?hl=en) to be able to use GEE, and will need to know your username and password. In addition, you will need to sign up for a GEE account. You need to request this by filling out the form at [signup.earthengine.google.com](https://signup.earthengine.google.com/). **You will need to do this before we start the class** as you will need to wait for approval from Google. + +### Timing + +The session should take around 30 minutes to initially explore, though you could spend a lifetime looking at all of the datasets! + +## Earth Engine + +### What is GEE? + +Google Earth Engine (GEE) is a facility for access to vast quantities of Earth Observation (EO) data, as well as many other geospatial datasets. It is a hugely valuable resource for scientists, as well as making a significant contribution to the democratisation of EO: anyone can sign up for a Google account and use these resources to both access and process data. With a little coding experience, anyone can develop their own products or analyses. + +You will find an increasingly large range of projects and [examples](https://earthengine.google.com/case_studies/) available. For more information on GEE, see the [GEE FAQ](https://earthengine.google.com/faq/). You can develop and deploy your own apps either using GEE on Google Cloud ([example](https://plewis.users.earthengine.app/view/nceo-united-kingdom)), or hosted on other resources such as [heroku](https://github.com/giswqs/earthengine-apps) ([example](https://geemap-demo.herokuapp.com)). + +There are some complexities and limitations to GEE that you should understand as well. 
In particular, whilst you can do some truly amazing things using GEE resources, they are provided free to you, and so there are limits to the amount of processing you can do at any one time, as well as quite strict limits on the amount of GEE local storage made available to you. You can certainly do great science within GEE, but to be a good coder, you will need wider exposure to accessing datasets than just through GEE. + +### Interfaces + +#### Code editor + +The main interface to GEE is through the [web-based code editor](https://code.earthengine.google.com/). There is a good set of [documentation on this](https://developers.google.com/earth-engine/guides/getstarted) that you can browse through at a later date. In the code editor, you can run and develop [JavaScript](https://www.javascript.com/) codes, access saved datasets and documentation, and gain some basic experience of using GEE. Although we have not taught you JavaScript, you will notice that it is a high-level language with many similarities to Python. The GEE [guide for Python installation](https://developers.google.com/earth-engine/guides/python_install) provides some succinct advice and examples of common syntax differences between JavaScript and Python. In addition, there are resources available to allow you to [translate GEE JavaScript codes into Python](https://github.com/giswqs/geemap/blob/master/examples/notebooks/15_convert_js_to_py.md). + +As a follow-up to this class, we suggest that you look in the `Scripts` tab of the [code editor](https://code.earthengine.google.com/), and try out one or more of the examples under the `Examples` list, for example, `Examples -> ImageCollection -> Landsat Simple Composite`. To use this, you need do no more than load the code by clicking on it, then click on the `Run` button. 
This example is a good one to start with: if you pan out in the viewer window you will see that GEE can process this 30 m resolution dataset *anywhere in the world* for you, in near real-time. It is showing a composite of all of the Landsat images over 6 months in 2015, between the dates `2015-1-1` and `2015-7-1`. To do this requires only around 4 lines of code. This is an amazing feat. + +#### QGIS + +Users of the popular [`QGIS`](https://qgis.org/en/site/) tool will be interested to know that GEE is available as a plugin. One version, which uses the Python package [`ee`](https://anaconda.org/conda-forge/earthengine-api), is the [Earth Engine plugin for QGIS](https://gee-community.github.io/qgis-earthengine-plugin/). + +#### ee and geemap + +We will also be using the [`ee`](https://anaconda.org/conda-forge/earthengine-api) Python package to access GEE, but with the [`geemap`](https://github.com/giswqs/geemap) package providing the mapping front-end. [`geemap`](https://github.com/giswqs/geemap) has very good documentation and an excellent range of [examples](https://github.com/giswqs/earthengine-py-notebooks). This should make it much easier for you to access GEE. + + + + +### GEE datasets + +A fundamental part of GEE is the vast quantity of data that it gives access to. The core datasets are described in the [GEE data catalog](https://developers.google.com/earth-engine/datasets). The GEE code that you write has straightforward access to any or all of these datasets and, importantly, is able to process them using GEE resources. + +You do not need to download the datasets, and do not need to know in great detail what goes on internally in the engine to use GEE. But, as we will see, you still need to think carefully about any interpretation of the data. 
+ +You should spend some time after the class exploring the GEE datasets in the [data catalog](https://developers.google.com/earth-engine/datasets), but for this session, we will concentrate on the following quantities: + +* Surface Reflectance +* Leaf Area Index + + +#### Leaf Area Index + +The data product [MOD15](https://modis.gsfc.nasa.gov/data/dataprod/mod15.php) LAI/FPAR has been generated from NASA MODIS sensors Terra and Aqua data since 2002. We are now in dataset collection 6 (the data version to use). + + LAI is defined as the one-sided green leaf area per unit ground area in broadleaf canopies and as half the total needle surface area per unit ground area in coniferous canopies. FPAR is the fraction of photosynthetically active radiation (400-700 nm) absorbed by green vegetation. Both variables are used for calculating surface photosynthesis, evapotranspiration, and net primary production, which in turn are used to calculate terrestrial energy, carbon, water cycle processes, and biogeochemistry of vegetation. Algorithm refinements have improved quality of retrievals and consistency with field measurements over all biomes, with a focus on woody vegetation. 
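The LAI values in the MODIS product are stored as scaled integers, so a raw value has to be converted before it can be read as leaf area per unit ground area. The block below is a minimal sketch of that conversion, not part of the course code. It assumes a scale factor of 0.1 and a valid raw range of 0 to 100, which you should check against the MCD15A3H product page; the function name `scale_lai` and the input values are made up for illustration.

```python
import numpy as np

# Assumed product conventions: check these against the MCD15A3H product page
LAI_SCALE = 0.1       # raw integer -> m2 leaf area per m2 ground
LAI_VALID_MAX = 100   # raw values above this are treated as fill/flag codes

def scale_lai(raw):
    """Convert raw integer LAI values to physical units, masking fill values."""
    raw = np.asarray(raw, dtype=float)
    lai = raw * LAI_SCALE
    # anything outside the assumed valid range becomes not-a-number
    lai[raw > LAI_VALID_MAX] = np.nan
    return lai

# made-up raw values, for illustration only
example = scale_lai([0, 15, 60, 255])
print(example)
```

Under these assumptions, a raw value of 60 corresponds to an LAI of 6, which is why a visualisation range of 0 to 60 in raw units covers most vegetated surfaces.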
+ +https://developers.google.com/earth-engine/datasets/tags/lai + +https://developers.google.com/earth-engine/datasets/tags/crop + + +```python +import ee +import geemap +``` + + +```python +Map = geemap.Map() +Map + +``` + + + Map(center=[40, -100], controls=(WidgetControl(options=['position'], widget=HBox(children=(ToggleButton(value=… + + + +```python +point = ee.Geometry.Point([-87.7719, 41.8799]) + +image = ee.ImageCollection('MODIS/006/MCD15A3H') \ + .filterBounds(point) \ + .filterDate('2019-01-01', '2019-12-31') \ + .max() \ + .select('Lai') + +vis_params = { + 'min': 0, + 'max': 60, + 'bands': ['Lai'] +} + +Map.centerObject(point, 8) +Map.addLayer(image, vis_params, "MODIS LAI") +``` + + +```python +props = geemap.image_props(image) +props.getInfo() +``` + + + --------------------------------------------------------------------------- + + HttpError Traceback (most recent call last) + + ~/anaconda3/envs/geog0111/lib/python3.7/site-packages/ee/data.py in _execute_cloud_call(call, num_retries) + 344 try: + --> 345 return call.execute(num_retries=num_retries) + 346 except googleapiclient.errors.HttpError as e: + + + ~/anaconda3/envs/geog0111/lib/python3.7/site-packages/googleapiclient/_helpers.py in positional_wrapper(*args, **kwargs) + 133 logger.warning(message) + --> 134 return wrapped(*args, **kwargs) + 135 + + + ~/anaconda3/envs/geog0111/lib/python3.7/site-packages/googleapiclient/http.py in execute(self, http, num_retries) + 906 if resp.status >= 300: + --> 907 raise HttpError(resp, content, uri=self.uri) + 908 return self.postproc(resp, content) + + + HttpError: + + + During handling of the above exception, another exception occurred: + + + EEException Traceback (most recent call last) + + in + 1 props = geemap.image_props(image) + ----> 2 props.getInfo() + + + ~/anaconda3/envs/geog0111/lib/python3.7/site-packages/ee/computedobject.py in getInfo(self) + 93 The object can evaluate to anything. 
+ 94 """ + ---> 95 return data.computeValue(self) + 96 + 97 def encode(self, encoder): + + + ~/anaconda3/envs/geog0111/lib/python3.7/site-packages/ee/data.py in computeValue(obj) + 708 body={'expression': serializer.encode(obj, for_cloud_api=True)}, + 709 project=_get_projects_path(), + --> 710 prettyPrint=False))['result'] + 711 return send_('/value', { + 712 'json': obj.serialize(for_cloud_api=False), + + + ~/anaconda3/envs/geog0111/lib/python3.7/site-packages/ee/data.py in _execute_cloud_call(call, num_retries) + 345 return call.execute(num_retries=num_retries) + 346 except googleapiclient.errors.HttpError as e: + --> 347 raise _translate_cloud_exception(e) + 348 + 349 + + + EEException: Date: Parameter 'value' is required. + + + +```python +props.get('IMAGE_DATE').getInfo() + +``` + + + --------------------------------------------------------------------------- + + HttpError Traceback (most recent call last) + + ~/anaconda3/envs/geog0111/lib/python3.7/site-packages/ee/data.py in _execute_cloud_call(call, num_retries) + 344 try: + --> 345 return call.execute(num_retries=num_retries) + 346 except googleapiclient.errors.HttpError as e: + + + ~/anaconda3/envs/geog0111/lib/python3.7/site-packages/googleapiclient/_helpers.py in positional_wrapper(*args, **kwargs) + 133 logger.warning(message) + --> 134 return wrapped(*args, **kwargs) + 135 + + + ~/anaconda3/envs/geog0111/lib/python3.7/site-packages/googleapiclient/http.py in execute(self, http, num_retries) + 906 if resp.status >= 300: + --> 907 raise HttpError(resp, content, uri=self.uri) + 908 return self.postproc(resp, content) + + + HttpError: + + + During handling of the above exception, another exception occurred: + + + EEException Traceback (most recent call last) + + in + ----> 1 props.get('IMAGE_DATE').getInfo() + + + ~/anaconda3/envs/geog0111/lib/python3.7/site-packages/ee/computedobject.py in getInfo(self) + 93 The object can evaluate to anything. 
+ 94 """ + ---> 95 return data.computeValue(self) + 96 + 97 def encode(self, encoder): + + + ~/anaconda3/envs/geog0111/lib/python3.7/site-packages/ee/data.py in computeValue(obj) + 708 body={'expression': serializer.encode(obj, for_cloud_api=True)}, + 709 project=_get_projects_path(), + --> 710 prettyPrint=False))['result'] + 711 return send_('/value', { + 712 'json': obj.serialize(for_cloud_api=False), + + + ~/anaconda3/envs/geog0111/lib/python3.7/site-packages/ee/data.py in _execute_cloud_call(call, num_retries) + 345 return call.execute(num_retries=num_retries) + 346 except googleapiclient.errors.HttpError as e: + --> 347 raise _translate_cloud_exception(e) + 348 + 349 + + + EEException: Date: Parameter 'value' is required. + + + +```python +props.get('CLOUD_COVER').getInfo() + +``` + + + --------------------------------------------------------------------------- + + HttpError Traceback (most recent call last) + + ~/anaconda3/envs/geog0111/lib/python3.7/site-packages/ee/data.py in _execute_cloud_call(call, num_retries) + 344 try: + --> 345 return call.execute(num_retries=num_retries) + 346 except googleapiclient.errors.HttpError as e: + + + ~/anaconda3/envs/geog0111/lib/python3.7/site-packages/googleapiclient/_helpers.py in positional_wrapper(*args, **kwargs) + 133 logger.warning(message) + --> 134 return wrapped(*args, **kwargs) + 135 + + + ~/anaconda3/envs/geog0111/lib/python3.7/site-packages/googleapiclient/http.py in execute(self, http, num_retries) + 906 if resp.status >= 300: + --> 907 raise HttpError(resp, content, uri=self.uri) + 908 return self.postproc(resp, content) + + + HttpError: + + + During handling of the above exception, another exception occurred: + + + EEException Traceback (most recent call last) + + in + ----> 1 props.get('CLOUD_COVER').getInfo() + + + ~/anaconda3/envs/geog0111/lib/python3.7/site-packages/ee/computedobject.py in getInfo(self) + 93 The object can evaluate to anything. 
+ 94 """ + ---> 95 return data.computeValue(self) + 96 + 97 def encode(self, encoder): + + + ~/anaconda3/envs/geog0111/lib/python3.7/site-packages/ee/data.py in computeValue(obj) + 708 body={'expression': serializer.encode(obj, for_cloud_api=True)}, + 709 project=_get_projects_path(), + --> 710 prettyPrint=False))['result'] + 711 return send_('/value', { + 712 'json': obj.serialize(for_cloud_api=False), + + + ~/anaconda3/envs/geog0111/lib/python3.7/site-packages/ee/data.py in _execute_cloud_call(call, num_retries) + 345 return call.execute(num_retries=num_retries) + 346 except googleapiclient.errors.HttpError as e: + --> 347 raise _translate_cloud_exception(e) + 348 + 349 + + + EEException: Date: Parameter 'value' is required. + + + +```python + +``` diff --git a/docs/031_Plotting 2.md b/docs/031_Plotting 2.md new file mode 100644 index 00000000..73dee27f --- /dev/null +++ b/docs/031_Plotting 2.md @@ -0,0 +1,710 @@ +The iconic cover shows the oscillation signal coming from the Pulsar PSR B1919+21 https://en.wikipedia.org/wiki/PSR_B1919%2B21 + +derived from https://github.com/igorol/unknown_pleasures_plot and https://matplotlib.org/3.1.1/gallery/animation/unchained.html + + +```python +import pandas as pd +import numpy as np +import matplotlib.pyplot as plt +from matplotlib import animation, rc + +from IPython.display import HTML + +``` + + +```python +url='https://raw.githubusercontent.com/igorol/unknown_pleasures_plot/master/pulsar.csv' +df=pd.read_csv(url,header=None) +``` + + +```python +df +``` + + + + +
| | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | ... | 290 | 291 | 292 | 293 | 294 | 295 | 296 | 297 | 298 | 299 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | -0.81 | -0.91 | -1.09 | -1.00 | -0.59 | -0.82 | -0.43 | -0.68 | -0.71 | -0.27 | ... | -0.08 | 0.19 | -0.19 | -0.18 | -0.20 | -0.26 | -0.52 | -0.44 | -0.58 | -0.54 |
| 1 | -0.61 | -0.40 | -0.42 | -0.38 | -0.55 | -0.51 | -0.71 | -0.79 | -0.52 | -0.40 | ... | -0.34 | -0.58 | -0.26 | -0.64 | -1.05 | -0.83 | -0.80 | -0.47 | -0.13 | -0.12 |
| 2 | -1.43 | -1.15 | -1.25 | -1.13 | -0.76 | -0.25 | 0.40 | 0.26 | 0.30 | 0.36 | ... | -0.29 | 0.16 | 0.83 | 0.99 | 1.28 | 0.11 | -0.77 | -0.88 | -0.45 | -1.01 |
| 3 | -1.09 | -0.85 | -0.72 | -0.74 | -0.26 | -0.04 | -0.19 | 0.18 | 0.03 | 0.19 | ... | 0.48 | 0.52 | -0.14 | -1.13 | -1.07 | -1.03 | -0.78 | -0.40 | 0.18 | 0.27 |
| 4 | -1.13 | -0.98 | -0.93 | -0.90 | -1.14 | -1.00 | -0.90 | -1.18 | -1.30 | -1.07 | ... | -0.27 | -0.47 | -0.49 | -0.23 | -0.75 | -0.29 | -0.54 | -0.65 | -0.64 | -0.94 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 75 | 0.62 | 0.64 | 0.59 | 0.30 | 0.01 | 0.05 | -0.63 | 0.07 | 0.36 | 0.78 | ... | 0.20 | 0.22 | 0.23 | 0.27 | -0.10 | -0.21 | -0.09 | -0.24 | -0.17 | -0.62 |
| 76 | 0.32 | 0.31 | 0.28 | 0.42 | -0.24 | -0.48 | -0.73 | -0.64 | 0.04 | 0.02 | ... | -0.44 | -0.53 | -0.50 | -0.49 | -0.63 | -0.56 | -0.50 | -0.38 | -0.58 | -0.43 |
| 77 | -0.09 | -0.14 | -0.24 | -0.24 | -0.66 | 0.00 | 0.29 | 0.29 | 0.60 | 0.86 | ... | 0.08 | -0.88 | -1.17 | -0.36 | -0.31 | -0.12 | 0.29 | -0.02 | 0.21 | 0.44 |
| 78 | 0.11 | 0.05 | 0.05 | -0.05 | -0.03 | -0.29 | -0.08 | -0.54 | -0.01 | 0.01 | ... | -0.73 | -0.54 | -0.53 | -0.92 | -0.68 | -0.87 | -1.31 | -1.02 | -1.10 | -1.62 |
| 79 | 0.12 | -0.12 | -0.12 | -0.45 | -0.24 | -0.48 | -0.57 | -0.19 | -0.07 | -0.59 | ... | 0.12 | 0.03 | -0.28 | 0.02 | -0.01 | 0.13 | 0.09 | -0.01 | -0.03 | -0.23 |

80 rows × 300 columns
+ + + + +```python +vertical_margin = 20 +horizontal_margin = 100 +x_size = 28 +y_size = 25 +linewidth = 4 +plot_name='images/new_order.png' + + +plt.style.use('dark_background') +fig, ax = plt.subplots(figsize=(x_size,y_size),frameon=False) + +data = np.array(df) + +n_lines = data.shape[0] +x = range(data.shape[1]) + +ax.set_yticks([]) +ax.set_xticks([]) +ax.set_xlim(min(x)-horizontal_margin, max(x)+horizontal_margin) +ax.set_ylim(-vertical_margin, df.shape[0] + vertical_margin) + +def init(): + lines = [] + fills = [] + for i in range(n_lines): + line = data[i]/3 + (n_lines - i) + pltline = ax.plot(x, line, lw=linewidth, c='white', alpha=1, zorder=i/n_lines) + pltfill = ax.fill_between(x, -5,line, facecolor='black', zorder=i/n_lines) + lines.append(pltline) + fills.append(pltfill) + return (lines,fills) + +xx=init() +``` + + +![png](031_Plotting_files/031_Plotting_4_0.png) + + + +```python +data.shape +``` + + + + + (80, 300) + + + + +```python +import numpy as np +import matplotlib.pyplot as plt +from matplotlib.animation import FuncAnimation +import mpl_toolkits.axes_grid1 +import matplotlib.widgets +vertical_margin = 2.0 +horizontal_margin = 10.0 +x_size = 2.8 +y_size = 2.5 +linewidth = .4 +fig, ax = plt.subplots(figsize=(x_size,y_size),frameon=False) + + +data = np.array(df) + +n_lines = data.shape[0] +x = range(data.shape[1]) + +ax.set_yticks([]) +ax.set_xticks([]) +ax.set_xlim(min(x)-horizontal_margin, max(x)+horizontal_margin) +ax.set_ylim(-vertical_margin, df.shape[0] + vertical_margin) + +def init(): + lines = [] + fills = [] + for i in range(n_lines): + line = data[i]/3 + (n_lines - i) + pltline = ax.plot(x, line, lw=linewidth, c='white', alpha=1, zorder=i/n_lines) + pltfill = ax.fill_between(x, -5,line, facecolor='black', zorder=i/n_lines) + lines.append(pltline) + fills.append(pltfill) + return (lines,fills) + +xx=init() +#plt.savefig(plot_name) +def update(i): + # Shift all data to the right with wrap around + old = data[-1,-1] + data[-1,1:] = 
data[-1,:-1] + # Fill-in new values + data[-1,0] = old + # ax.plot() returns a list, so lines[j][0] is the Line2D to update + for j in range(n_lines): + lines[j][0].set_ydata(data[j]/3 + (n_lines - j)) + # blit=True needs an iterable of the artists that have changed + return [line[0] for line in lines] + +lines,fills = init() +# call the animator. blit=True means only re-draw the parts that have changed. +anim = FuncAnimation(fig, update, interval=10,frames=10,blit=True) +``` + + +![png](031_Plotting_files/031_Plotting_6_0.png) + + + +```python +HTML(anim.to_html5_video()) +``` + + + +```python +import numpy as np +import matplotlib.pyplot as plt +from matplotlib.animation import FuncAnimation + +fig, ax = plt.subplots() +xdata, ydata = [], [] +ln, = plt.plot([], [], 'ro') + +def init(): + ax.set_xlim(0, 2*np.pi) + ax.set_ylim(-1, 1) + return ln, + +def update(frame): + xdata.append(frame) + ydata.append(np.sin(frame)) + ln.set_data(xdata, ydata) + return ln, + +ani = FuncAnimation(fig, update, frames=np.linspace(0, 2*np.pi, 128), + init_func=init, blit=True) +plt.show() +``` + + +![png](031_Plotting_files/031_Plotting_8_0.png) + + + +```python +import numpy as np +import matplotlib.pyplot as plt +import matplotlib.animation as animation +from matplotlib.widgets import Slider + +TWOPI = 2*np.pi + +fig, ax = plt.subplots() + +t = np.arange(0.0, TWOPI, 0.001) +initial_amp = .5 +s = initial_amp*np.sin(t) +l, = plt.plot(t, s, lw=2) + +ax = plt.axis([0,TWOPI,-1,1]) + +axamp = plt.axes([0.25, .03, 0.50, 0.02]) +# Slider +samp = Slider(axamp, 'Amp', 0, 1, valinit=initial_amp) + +def update(val): + # amp is the current value of the slider + amp = samp.val + # update curve + l.set_ydata(amp*np.sin(t)) + # redraw canvas while idle + fig.canvas.draw_idle() + +# call update function on slider value change +samp.on_changed(update) + +plt.show() +``` + + +![png](031_Plotting_files/031_Plotting_9_0.png) + + + +```python + +``` diff --git a/docs/040_GDAL 2.md b/docs/040_GDAL 2.md new file mode 100644 index 00000000..f7d673c2 --- /dev/null +++ b/docs/040_GDAL 2.md @@ -0,0 +1,160 @@ +# 
3 Geospatial processing with `gdal` + +

+ + +[GDAL](https://gdal.org) is the workhorse of geospatial processing. Basically, GDAL offers a common library to access a vast number of formats (if you want to see how vast, [check this](https://gdal.org/formats_list.html)). In addition to letting you open and convert obscure formats to something more useful, a lot of functionality in terms of processing raster data is available (for example, working with projections, combining datasets, accessing remote datasets, etc.). + +For vector data, the counterpart to GDAL is OGR (which is now a part of the GDAL library anyway), which also supports [many vector formats](https://gdal.org/ogr_formats.html). The combination of both libraries is a very powerful tool to work with geospatial data, not only from Python, but from [many other popular computer languages](https://trac.osgeo.org/gdal/#GDALOGRInOtherLanguages). + +In this session, we will introduce the `gdal` geospatial module which can read a wide range of raster scientific data formats. We will also introduce the related `ogr` vector package. + +In particular, we will learn how to: + +* access and download NASA geophysical datasets (specifically, the MODIS LAI/FPAR product) +* apply a vector mask to the dataset +* apply quality control flags to the data +* stack datasets into a 3D numpy dataset for further analysis, including interpolation of missing values +* visualise the data +* store the stacked dataset + +**These are all tasks that you will be required to do for the formal assessment of this course. You will however be using a different NASA dataset.** + + +## 3.1 MODIS LAI product +To introduce geospatial processing, we will use a dataset from the MODIS LAI product over the UK. + +You should note that the dataset you need to use for your assessed practical is a MODIS dataset with similar characteristics to the one in this example. 
+ +The data product [MOD15](https://modis.gsfc.nasa.gov/data/dataprod/mod15.php) LAI/FPAR has been generated from NASA MODIS sensors Terra and Aqua data since 2002. We are now in dataset collection 6 (the data version to use). + + LAI is defined as the one-sided green leaf area per unit ground area in broadleaf canopies and as half the total needle surface area per unit ground area in coniferous canopies. FPAR is the fraction of photosynthetically active radiation (400-700 nm) absorbed by green vegetation. Both variables are used for calculating surface photosynthesis, evapotranspiration, and net primary production, which in turn are used to calculate terrestrial energy, carbon, water cycle processes, and biogeochemistry of vegetation. Algorithm refinements have improved quality of retrievals and consistency with field measurements over all biomes, with a focus on woody vegetation. + +We use such data to map and understand the dynamics of terrestrial vegetation / carbon, for example, for climate studies. + +The raster data are arranged in tiles, indexed by row and column, to cover the globe: + + +![MODIS tiles](https://www.researchgate.net/profile/J_Townshend/publication/220473201/figure/fig5/AS:277546596880390@1443183673583/The-global-MODIS-Sinusoidal-tile-grid.png) + + + +**Exercise 3.1.1** + +The pattern of the tile names is `hXXvYY` where `XX` is the horizontal coordinate and `YY` the vertical. + + +* use the map above to work out the names of the two tiles that we will need to access data over the UK +* set the variable `tiles` to contain these two names in a list + +For example, for the two tiles covering Madagascar, we would set: + + tiles = ['h22v10','h22v11'] + + +```python +# do exercise here +``` + +### 3.1.1 NASA Earthdata access + +#### 3.1.1.1 Register at NASA Earthdata + +Before you attempt to do this section, you will need to register at [NASA Earthdata](https://urs.earthdata.nasa.gov/home). 
+ + +We have set up these notes so that you don't have to put your username and password in plain text. Instead, you need to enter your username and password when prompted by `cylog`. The password is stored in an encrypted file, although it can be accessed as plain text within your Python session. + +**N.B. using `cylog().login()` is only intended to work with access to NASA Earthdata and to prevent you having to expose your username and password in these notes**. + +`cylog().login()` returns the tuple `(username,password)` in plain text. + + +```python +import geog0111.nasa_requests as nasa_requests +from geog0111.cylog import cylog +%matplotlib inline + +url = 'https://e4ftl01.cr.usgs.gov/MOTA/MCD15A3H.006/2018.09.30/' + +# grab the HTML information +try: + html = nasa_requests.get(url).text + # test a few lines of the html + if html[:20] == '= 2020400: + print('gdal ok',version) +else: + print('gdal problem',version,'2.2.4+ expected') +``` + + gdal ok 3000400 + + +If there is a problem and you are on the geography system, we should be able to fix it for you. + +If you are not on the geography system, try running: + + conda env update -f environment.yml + +before going any further. If an update occurs, shutdown and restart your notebooks. 
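The kind of version test reported above (`gdal ok 3000400`) can be sketched as follows. This is a minimal illustration under stated assumptions rather than the course's own cell: the helper name `gdal_version_ok` is hypothetical, and the threshold `2020400` is taken from the message `2.2.4+ expected`. GDAL encodes version X.Y.Z as the integer X*1000000 + Y*10000 + Z*100, so 2.2.4 is 2020400 and 3.0.4 is 3000400.

```python
def gdal_version_ok(version, minimum=2020400):
    """Return True if a GDAL version number (string or int) is at least `minimum`.

    Hypothetical helper for illustration: GDAL encodes version X.Y.Z
    as X*1000000 + Y*10000 + Z*100, so 2.2.4 is 2020400.
    """
    return int(version) >= minimum

try:
    from osgeo import gdal
    # VersionInfo() returns the version number as a string, e.g. '3000400'
    version = gdal.VersionInfo()
except ImportError:
    version = None

if version is None:
    print('gdal not installed')
elif gdal_version_ok(version):
    print('gdal ok', version)
else:
    print('gdal problem', version, '2.2.4+ expected')
```

Keeping the comparison in a small function means it can be checked with known version numbers even on a machine where `gdal` is not installed.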
+ +## 3.2 Automatic downloading of NASA MODIS products + +In [this section](041_MODIS_download.md), you will learn how to: + +* scan the directories (on the Earthdata server) where the MODIS data are stored +* get the dataset filename for a given tile, date and product +* get the URL associated with the dataset +* use the URL to pull the dataset over to store in the local file system + +## 3.3 GDAL masking + +In [this section](042_GDAL_masking.md) you will learn how to: + +* load locally stored files into gdal +* select a particular dataset +* form a virtual 'stitched' dataset from multiple files +* apply a mask to the data from a vector boundary +* crop the dataset + +## 3.4 GDAL stacking and interpolating + +In [this section](043_GDAL_stacking_and_interpolating.md) you will learn how to: + +* generate a numpy time series of spatial data +* interpolate/smooth the dataset + +## 3.X Summary + +In this session, we have learned to use some of the geospatial tools in GDAL from Python. A good set of [working notes on how to use GDAL](http://jgomezdans.github.io/gdal_notes/) has been developed that you will find useful for further reading. + +We have also very briefly introduced dealing with vector datasets in `ogr`, but this was mainly through the use of a pre-defined function that will take an ESRI shapefile (vector dataset), warp this to the projection of a raster dataset, and produce a mask for a given layer in the vector file. + +If there is time in the class, we will develop some exercises to examine the datasets we have generated and/or to explore some different datasets or different locations. + diff --git a/docs/041_MODIS_download 2.md b/docs/041_MODIS_download 2.md new file mode 100644 index 00000000..001b487b --- /dev/null +++ b/docs/041_MODIS_download 2.md @@ -0,0 +1,814 @@ +# 3.2 Accessing MODIS Data products + +


    + + +## 3.2.1 Introduction + +In this section, you will learn how to: + +* scan the directories (on the Earthdata server) where the MODIS data are stored +* get the dataset filename for a given tile, date and product +* get the URL associated with the dataset +* use the URL to pull the dataset over to store in the local file system + +You should already know: + +* basic use of Python (sections 1 and 2) +* the MODIS product grid system +* the two tiles needed to cover the UK + + tiles = ['h17v03', 'h18v03'] + +* what LAI is and the code for the MODIS LAI/FPAR product [MOD15](https://modis.gsfc.nasa.gov/data/dataprod/mod15.php) +* your username and password for [NASA Earthdata](https://urs.earthdata.nasa.gov/home), or have previously entered this with [`cylog`](geog0111/cylog.py). + +Let's first just test your NASA login: + + +```python +import geog0111.nasa_requests as nasa_requests +from geog0111.cylog import cylog + +url = 'https://e4ftl01.cr.usgs.gov/MOTA/MCD15A3H.006/2018.09.30/' + +# grab the HTML information +try: + html = nasa_requests.get(url).text + # test a few lines of the html + if html[:20] == ' + + + Index of /MOTA/MCD15A3H.006/2018.09.30 + + + + +
    + +
    +    ********************************************************************************
    +    
    +                             U.S. GOVERNMENT COMPUTER
    +    
    +    This is the NASA Land Processes Distributed Active Archive Center (LP DAAC) 
    +    Distribution Server hosted at the USGS Earth Resources Observation and Science 
    +    (EROS) Center.  The purpose of this server is to provide NASA data products to 
    +    the public.  The directory listing is exposed intentionally for user navigation 
    +    to the NASA data products.  Large data downloads (not jpeg browse) requires 
    +    user authentication through the NASA Earthdata Login. To obtain a NASA Earthdata 
    +    Logi
    +    
    +     ------------------------------ etc ------------------------------
    +    
    +    [   ] MCD15A3H.A2018273.h35v08.006.2018278143649.hdf.xml      2018-10-05 09:42  7.6K  
    +    [   ] MCD15A3H.A2018273.h35v09.006.2018278143649.hdf          2018-10-05 09:42  207K  
    +    [   ] MCD15A3H.A2018273.h35v09.006.2018278143649.hdf.xml      2018-10-05 09:42  7.6K  
    +    [   ] MCD15A3H.A2018273.h35v10.006.2018278143650.hdf          2018-10-05 09:42  298K  
    +    [   ] MCD15A3H.A2018273.h35v10.006.2018278143650.hdf.xml      2018-10-05 09:42  7.6K  
    +    
    + + + + +In HTML the code text such as: + + <a href="MCD15A3H.A2018273.h35v10.006.2018278143650.hdf">MCD15A3H.A2018273.h35v10.006.2018278143650.hdf</a> + + +specifies an HTML link, that will appear as + + MCD15A3H.A2018273.h35v10.006.2018278143650.hdf 2018-10-05 09:42 7.6K + +and links to the URL specified in the `href` field: `MCD15A3H.A2018273.h35v10.006.2018278143650.hdf`. + +We could interpret this information by searching for strings etc., but the package `BeautifulSoup` can help us a lot in this. + + + + + +```python +import geog0111.nasa_requests as nasa_requests +from geog0111.get_url import get_url +from bs4 import BeautifulSoup + +doy,year = 273,2018 +url = get_url(doy,year).url +html = nasa_requests.get(url).text + +# use BeautifulSoup +# to get all urls referenced with +# <a> html code +soup = BeautifulSoup(html,'lxml') +links = [mylink.attrs['href'] for mylink in soup.find_all('a')] +``` + +**Exercise E3.2.2** + +* copy the code in the block above and print out some of the information in the list `links` (e.g. the last 20 entries) +* using an implicit loop, make a list called `hdf_filenames` of only those filenames (links) that have `hdf` as their filename extension. + +**Hint 1**: first you might select an example item from the `links` list: + + item = links[-1] + print('item is',item) + +and print: + + item[-3:] + +but maybe better (why would this be?) is: + + item.split('.')[-1] + +**Hint 2**: An implicit loop (a list comprehension) is a construct of the form: + + [item for item in links] + +In an implicit for loop, we can actually add a conditional statement if we like, e.g. try: + + hdf_filenames = [item for item in links if item[-5] == '4'] + +This will include `item` in the new list only if the condition `item[-5] == '4'` is met. That's a bit of a pointless test, but illustrates the pattern required. Try this now with the condition you want to use to select `hdf` files. 
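To see the conditional implicit loop from Hint 2 in action without contacting the server, you can try it on a short hand-written list. The sketch below uses a few filenames copied from the directory listing shown earlier and a different condition from the one the exercise asks for (it selects the `xml` metadata files); the variable names `example_links` and `xml_filenames` are made up for illustration.

```python
# a few entries of the kind found in `links`,
# copied from the directory listing shown earlier
example_links = [
    'MCD15A3H.A2018273.h35v09.006.2018278143649.hdf',
    'MCD15A3H.A2018273.h35v09.006.2018278143649.hdf.xml',
    'MCD15A3H.A2018273.h35v10.006.2018278143650.hdf',
    'MCD15A3H.A2018273.h35v10.006.2018278143650.hdf.xml',
]

# keep only the entries whose last '.'-separated field is 'xml'
xml_filenames = [item for item in example_links
                 if item.split('.')[-1] == 'xml']

for name in xml_filenames:
    print(name)
```

The same pattern, with a different string in the test, is all that is needed to complete the exercise.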
+ + +```python +# do exercise here +``` + +## 3.2.3 MODIS filename format + +The `hdf` filenames are of the form: + + MCD15A3H.A2018273.h35v10.006.2018278143650.hdf + +where: + +* the first field (`MCD15A3H`) gives the product code +* the second (`A2018273`) gives the observation date: day of year `273`, `2018` here +* the third (`h35v10`) gives the 'MODIS tile' code for the data location +* the remaining fields specify the product version number (`006`) and a code representing the processing date. + +If we want a particular dataset, we would assume then that we know the information to construct the first four fields. + +We then have the task remaining of finding an address of the pattern: + + MCD15A3H.A2018273.h17v03.006.*.hdf + +where `*` represents a wildcard (unknown element of the URL/filename). + +Putting together the code from above to get a list of the `hdf` files: + + +```python +#from geog0111.nasa_requests import nasa_requests +from bs4 import BeautifulSoup +from geog0111.get_url import get_url +import geog0111.nasa_requests as nasa_requests + +doy,year = 273,2018 +url = get_url(doy,year).url +html = nasa_requests.get(url).text +soup = BeautifulSoup(html,'lxml') +links = [mylink.attrs['href'] for mylink in soup.find_all('a')] + +# get all files that end 'hdf' as in example above +hdf_filenames = [item for item in links if item.split('.')[-1] == 'hdf'] +``` + +We now want to specify a particular tile or tiles to access. + +In this case, we want to look at the field `item.split('.')[-4]` and check to see if it is in the list `tiles`. + +**Exercise E3.2.3** + +* copy the code above and print out the first 10 values in the list `hdf_filenames`. Can you recognise where the tile information is in the string? + +Now, let's check what we get when we look at `item.split('.')[-4]`. 
+ +* set a variable called `tiles` containing the names of the UK tiles (as in Exercise 3.1.1) +* write a loop `for item in links:` to loop over each item in the list `links` +* inside this loop set the condition `if item.split('.')[-1] == 'hdf':` to select only `hdf` files, as above +* inside this conditional statement, print out `item.split('.')[-4]` to see if it looks like the tile names +* having confirmed that you are getting the right information, add another conditional statement to see if `item.split('.')[-4] in tiles`, and then print only those filenames that pass both of your tests +* see if you can combine the two tests (the two `if` statements) into a single one + +**Hint 1**: if you print all of the tilenames, this will go on for quite some time. Instead it may be better to use `print(item.split('.')[-4],end=' ')`, which will put a space, rather than a newline between each item printed. + +**Hint 2**: recall what the logical statement `(A and B)` gives when thinking about the combined `if` statement + + +```python +# do exercise here +``` + +You should end up with something like: + + +```python +import geog0111.nasa_requests as nasa_requests +from bs4 import BeautifulSoup +from geog0111.get_url import get_url + +doy,year = 273,2018 +tiles = ['h17v03', 'h18v03'] + +url = get_url(doy,year).url +html = nasa_requests.get(url).text +soup = BeautifulSoup(html,'lxml') +links = [mylink.attrs['href'] for mylink in soup.find_all('a')] + +tile_filenames = [item for item in links \ + if (item.split('.')[-1] == 'hdf') and \ + (item.split('.')[-4] in tiles)] +``` + +**Exercise E3.2.4** + +* print out the first 10 items in `tile_filenames` and check the result is as you expect. +* write a function called `modis_tiles()` that takes as input `doy`, `year` and `tiles` and returns a list of the modis tile **urls**. + +**Hint** + +1. 
Don't forget to put in a mechanism to allow you to change the default `base_url`, `product` and `version` (as you did for the function `get_url()`) + +2. In some circumstances, you can get repeats of filenames in the list. One way to get around this is to convert the list to a `numpy` array, and use [`np.unique()`](https://docs.scipy.org/doc/numpy-1.15.0/reference/generated/numpy.unique.html) to remove duplicates. + + import numpy as np + tile_filenames = np.unique(tile_filenames) + + +```python +# do exercise here +``` + +You should end up with something like: + + +```python +from geog0111.modis_tiles import modis_tiles + +doy,year = 273,2018 +tiles = ['h17v03', 'h18v03'] + +tile_urls = modis_tiles(doy,year,tiles) +``` + +**Exercise E3.2.5** + +* print out the first 10 items in `tile_urls` and check the result is as you expect. + + +```python +# do exercise here +``` + +## 3.2.4 Saving binary data to a file + +We suppose that we want to save the dataset to a local file on the system. + +To do that, we need to know how to save a binary dataset to a file. To do this well, we should also consider factors such as whether we want to save a file we already have. + +Before we go any further we should check: + +* that the directory exists (if not, create it) +* that the file doesn't already exist (else, don't bother) + +We can conveniently use methods in [`pathlib.Path`](https://docs.python.org/3/library/pathlib.html) for this. 
+ +So, import `Path`: + + from pathlib import Path + +We suppose we might want to put a file (variable `filename`) into the directory `destination_folder`: + +To test if a directory exists and create if not: + + dest_path = Path(destination_folder) + if not dest_path.exists(): + dest_path.mkdir() + +To make a compound name of `dest_path` and `filename`: + + output_fname = dest_path.joinpath(filename) + +To test if a file exists: + + if not output_fname.exists(): + print(f"{output_fname} doesn't exist yet ...") + + + +**Exercise E3.2.6** + +* set a variable `destination_folder` to `data` and write code to create this folder ('directory') if it doesn't already exist. +* set a variable `filename` to `test.bin` and write code to check to see if this file is in the folder `destination_folder`. If not, print a message to say so. + + +```python +# do exercise here +``` + +We now try to read the binary file `data/test_image.bin`. + +This involves opening a binary file for reading: + + fp = open(input_fname, 'rb') + +Then reading the data: + + data = fp.read() + +Then close `fp` + + fp.close() + + +```python +input_fname = 'data/test_image.bin' +fp = open(input_fname, 'rb') +data = fp.read() +fp.close() +print(f'data read is {len(data)} bytes') +``` + + data read is 9136806 bytes + + +And now, write the data as `data/test.bin`. 
+ +This involves opening a binary file for writing: + + fp = open(output_fname, 'wb') + +Then writing the data: + + d = fp.write(data) + +and closing as before: + + fp.close() + + +```python +output_fname = 'data/test.bin' +fp = open(output_fname, 'wb') +d = fp.write(data) +fp.close() +print(f'data written is {d} bytes') +``` + + data written is 9136806 bytes + + +We can avoid the need for the `close` by using the construct: + + with open(output_fname, 'wb') as fp: + d = fp.write(data) + + +```python +d = 0 +with open(output_fname, 'wb') as fp: + d = fp.write(data) +print(f'data written is {d} bytes') +``` + + data written is 9136806 bytes + + +**Exercise E3.2.7** + +With the ideas above, write some code to: + +* check to see if the output directory `data` exists +* if not, create it +* check to see if the input file `data/test_image.bin` exists +* if so, read it in to `data` +* check to see if the output file `data/test.bin` exists +* if not (and if you read data), save `data` to this file +* once you are happy with the code operation, write a function: `save_data(data,filename,destination_folder)` that takes the binary dataset `data` and writes it to the file `filename` in directory `destination_folder`. It should return the number of bytes written, and should check to see if files / directories exist and act accordingly. +* add a keyword option to `save_data()` that will overwrite the filename, even if it already exists. + + +```python +# do exercise here +``` + +You should now know how to save a binary data file. + +## 3.2.4 Downloading the data file + +The following code uses the `nasa_requests` library to pull some binary data from a URL. + +The response is tested (`r.ok`), and if it is ok, then we split the url to derive the filename, and print this out. 
+ +The binary dataset is available as `r.content`, which we store to the variable `data` here: + + +```python +import geog0111.nasa_requests as nasa_requests +from geog0111.modis_tiles import modis_tiles +from pathlib import Path + +doy,year = 273,2018 +tiles = ['h17v03', 'h18v03'] +destination_folder = 'data' + +tile_urls = modis_tiles(doy,year,tiles) + +# loop over urls +for url in tile_urls: + r = nasa_requests.get(url) + + # check response + if r.ok: + # get the filename from the url + filename = url.split('/')[-1] + # get the binary data + data = r.content + + print(filename) + else: + print (f'response from {url} not good') +``` + + response from https://e4ftl01.cr.usgs.gov/MOTA/MCD15A3H.006/2018.09.30/MCD15A3H.A2018273.h17v03.006.2018278143630.hdf not good + response from https://e4ftl01.cr.usgs.gov/MOTA/MCD15A3H.006/2018.09.30/MCD15A3H.A2018273.h18v03.006.2018278143633.hdf not good + + +**Exercise E3.2.8** + +* use the code above to write a function `get_modis_files()` that takes as input `doy`, `year` and `tiles`, has a default `destination_folder` of `data`, that downloads the appropriate datasets (if they don't already exist). It should have similar defaults to `modis_tiles()`. It should return a list of the output filenames. 
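The 'only if it doesn't already exist' part of this exercise can be tried out without any network access. The sketch below is one possible shape for that logic, built only from the `pathlib` and `open(..., 'wb')` calls introduced above. The name `save_if_missing` and its `overwrite` keyword are hypothetical choices made for illustration; this is not the course's own `save_data()` implementation.

```python
from pathlib import Path

def save_if_missing(data, filename, destination_folder='data', overwrite=False):
    '''
    Write the bytes in `data` to destination_folder/filename.

    Returns the number of bytes written, or 0 if the file
    already exists and `overwrite` is False.
    (Hypothetical helper, for illustration only.)
    '''
    dest_path = Path(destination_folder)
    # create the directory (and any missing parents) if needed
    dest_path.mkdir(parents=True, exist_ok=True)

    output_fname = dest_path.joinpath(filename)
    # don't bother if we already have the file
    if output_fname.exists() and not overwrite:
        return 0

    with open(output_fname, 'wb') as fp:
        return fp.write(data)
```

Inside a download loop, a check of this kind could be made before calling `nasa_requests.get(url)`, so that files already on disk are not fetched again.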
+ + +```python +# do exercise here +``` + +You should end up with something like: + + +```python +import geog0111.nasa_requests as nasa_requests +from geog0111.modis_tiles import modis_tiles +from geog0111.save_data import save_data + +doy,year = 273,2018 +tiles = ['h17v03', 'h18v03'] +destination_folder = 'data' + +tile_urls = modis_tiles(doy,year,tiles) + +# loop over urls +for url in tile_urls: + r = nasa_requests.get(url) + + # check response + if r.ok: + # get the filename from the url + filename = url.split('/')[-1] + # save the binary data + d = save_data(r.content,filename,destination_folder) + print(filename,d) + else: + print (f'response from {url} not good') +``` + + response from https://e4ftl01.cr.usgs.gov/MOTA/MCD15A3H.006/2018.09.30/MCD15A3H.A2018273.h17v03.006.2018278143630.hdf not good + response from https://e4ftl01.cr.usgs.gov/MOTA/MCD15A3H.006/2018.09.30/MCD15A3H.A2018273.h18v03.006.2018278143633.hdf not good + + +## 3.2.5 Visualisation + +We will learn more fully how to visualise these datasets later; for now, we just show that they exist. + +You might want to look at the [FIPS](https://en.wikipedia.org/wiki/List_of_FIPS_country_codes) country codes for selecting boundary data. 
+ + +```python +import requests +import shutil +''' +Get the world borders shapefile that we will need +''' +tm_borders_url = "http://thematicmapping.org/downloads/TM_WORLD_BORDERS-0.3.zip" + +r = requests.get(tm_borders_url) +with open("data/TM_WORLD_BORDERS-0.3.zip", 'wb') as fp: + fp.write (r.content) + +shutil.unpack_archive("data/TM_WORLD_BORDERS-0.3.zip", + extract_dir="data/") +``` + + + --------------------------------------------------------------------------- + + ReadError Traceback (most recent call last) + + in + 11 + 12 shutil.unpack_archive("data/TM_WORLD_BORDERS-0.3.zip", + ---> 13 extract_dir="data/") + + + ~/anaconda3/envs/geog0111/lib/python3.7/shutil.py in unpack_archive(filename, extract_dir, format) + 1000 func = _UNPACK_FORMATS[format][1] + 1001 kwargs = dict(_UNPACK_FORMATS[format][2]) + -> 1002 func(filename, extract_dir, **kwargs) + 1003 + 1004 + + + ~/anaconda3/envs/geog0111/lib/python3.7/shutil.py in _unpack_zipfile(filename, extract_dir) + 897 + 898 if not zipfile.is_zipfile(filename): + --> 899 raise ReadError("%s is not a zip file" % filename) + 900 + 901 zip = zipfile.ZipFile(filename) + + + ReadError: data/TM_WORLD_BORDERS-0.3.zip is not a zip file + + + +```python +from geog0111.get_modis_files import get_modis_files +import gdal +import matplotlib.pylab as plt +import numpy as np + +def mosaic_and_mask_data(gdal_fnames, vector_file, vector_where): + stitch_vrt = gdal.BuildVRT("", gdal_fnames) + g = gdal.Warp("", stitch_vrt, + format = 'MEM', dstNodata=200, + cutlineDSName = vector_file, + cutlineWhere = vector_where) + return g + +doy,year = 273,2018 +tiles = ['h17v03', 'h18v03'] +destination_folder = 'data' + +filenames = get_modis_files(doy,year,tiles,base_url='https://e4ftl01.cr.usgs.gov/MOTA',\ + version=6,\ + product='MCD15A3H') + +# this part is to access a particular dataset in the file +gdal_fnames = [f'HDF4_EOS:EOS_GRID:"{file_name:s}":MOD_Grid_MCD15A3H:Lai_500m' + for file_name in filenames] + + +g = 
mosaic_and_mask_data(gdal_fnames, "data/TM_WORLD_BORDERS-0.3.shp", + "FIPS='UK'") + +lai = np.array(g.ReadAsArray()).astype(float) * 0.1 # for LAI scaling +# valid data mask +mask = np.nonzero(lai < 20) +min_y = mask[0].min() +max_y = mask[0].max() + 1 + +min_x = mask[1].min() +max_x = mask[1].max() + 1 + +lai = lai[min_y:max_y, + min_x:max_x] + +fig = plt.figure(figsize=(12,12)) +im = plt.imshow(lai, interpolation="nearest", vmin=0, vmax=6, + cmap=plt.cm.inferno_r) +plt.title('LAI'+' '+str(tiles)+' '+str((doy,year))) +plt.colorbar() +``` + + + + + + + + + +![png](041_MODIS_download_files/041_MODIS_download_50_1.png) + + + +```python +from geog0111.get_modis_files import get_modis_files +import gdal +import matplotlib.pylab as plt +import numpy as np + +def mosaic_and_mask_data(gdal_fnames, vector_file, vector_where): + stitch_vrt = gdal.BuildVRT("", gdal_fnames) + g = gdal.Warp("", stitch_vrt, + format = 'MEM', dstNodata=200, + cutlineDSName = vector_file, + cutlineWhere = vector_where) + return g + +doy,year = 273,2018 +tiles = ['h17v03', 'h18v03'] +destination_folder = 'data' + +filenames = get_modis_files(doy,year,tiles,base_url='https://e4ftl01.cr.usgs.gov/MOTA',\ + version=6,\ + product='MCD15A3H') + +# this part is to access a particular dataset in the file +gdal_fnames = [f'HDF4_EOS:EOS_GRID:"{file_name:s}":MOD_Grid_MCD15A3H:Lai_500m' + for file_name in filenames] + +g = mosaic_and_mask_data(gdal_fnames, "data/TM_WORLD_BORDERS-0.3.shp", + "FIPS='NL'") + +lai = np.array(g.ReadAsArray()).astype(float) * 0.1 # for LAI scaling +# valid data mask +mask = np.nonzero(lai < 20) +min_y = mask[0].min() +max_y = mask[0].max() + 1 + +min_x = mask[1].min() +max_x = mask[1].max() + 1 + +lai = lai[min_y:max_y, + min_x:max_x] + +fig = plt.figure(figsize=(12,12)) +im = plt.imshow(lai, interpolation="nearest", vmin=0, vmax=6, + cmap=plt.cm.inferno_r) +plt.title('LAI'+' '+str(tiles)+' '+str((doy,year))) +plt.colorbar() +``` + + + + + + + + + 
+![png](041_MODIS_download_files/041_MODIS_download_51_1.png) + + +**Exercise 3.2.7 Homework** + + + +* Have a look at the information for [`MOD10A1` product](http://www.icess.ucsb.edu/modis/SnowUsrGuide/usrguide_1dtil.html), which is the 500 m MODIS daily snow cover product. +* Use what you have learned here to download the `MOD10A1` product over the UK + +**Hint**: +* The data are on a different server `https://n5eil01u.ecs.nsidc.org/MOST` +* the template for the snow cover dataset is `f'HDF4_EOS:EOS_GRID:"{file_name:s}":MOD_Grid_Snow_500m:NDSI_Snow_Cover'` +* today-10 may not be the best example doy: choose something in winter +* valid snow cover values are 0 to 100 (use this to set `vmin=0, vmax=100` when plotting) + +**N.B. You will be required to download this dataset for your assessed practical, so it is a good idea to sort code for this now** + + +```python +# do exercise here +``` + +## 3.2.6 Summary + +In this session, we have learned how to download MODIS datasets from NASA Earthdata. + +We have developed and tested functions that group together the commands we want, ultimately arriving at the function `get_modis_files(doy,year,tiles,**kwargs)`. + +We have seen (if you've done the homework) that such code is reusable and can directly be used for your assessed practical. 
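As a recap of the filename convention from section 3.2.3, the fields of a MODIS filename can be pulled apart with `split('.')`, and the day-of-year date converted with `datetime`. This is a minimal sketch: the function name `parse_modis_filename` and the dictionary keys are hypothetical, chosen only for illustration, and the processing-date field is simply returned as a string.

```python
from datetime import datetime

def parse_modis_filename(filename):
    '''
    Split a MODIS filename of the form
      MCD15A3H.A2018273.h17v03.006.2018278143630.hdf
    into its fields. (Hypothetical helper, for illustration only.)
    '''
    fields = filename.split('.')
    product, adate, tile, version, processed = fields[:5]

    # adate is 'A' followed by year and day of year, e.g. A2018273
    date = datetime.strptime(adate[1:], '%Y%j')

    return {
        'product': product,
        'year': date.year,
        'doy': int(adate[5:]),
        'date': date,
        'tile': tile,
        'version': version,
        'processed': processed,
        'extension': fields[-1],
    }

info = parse_modis_filename('MCD15A3H.A2018273.h17v03.006.2018278143630.hdf')
# the date formatted as in the server directory names
print(info['tile'], info['date'].strftime('%Y.%m.%d'))
```

The formatted date corresponds to the `2018.09.30` directory that appears in the URLs printed earlier in this session.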
diff --git a/docs/043_GDAL_stacking_and_interpolating_files/043_GDAL_stacking_and_interpolating_10_2.png b/docs/043_GDAL_stacking_and_interpolating_files/043_GDAL_stacking_and_interpolating_10_2.png deleted file mode 100644 index 7e737ee3..00000000 Binary files a/docs/043_GDAL_stacking_and_interpolating_files/043_GDAL_stacking_and_interpolating_10_2.png and /dev/null differ diff --git a/docs/043_GDAL_stacking_and_interpolating_files/043_GDAL_stacking_and_interpolating_36_1.png b/docs/043_GDAL_stacking_and_interpolating_files/043_GDAL_stacking_and_interpolating_36_1.png deleted file mode 100644 index 051a6bf7..00000000 Binary files a/docs/043_GDAL_stacking_and_interpolating_files/043_GDAL_stacking_and_interpolating_36_1.png and /dev/null differ diff --git a/docs/043_GDAL_stacking_and_interpolating_files/043_GDAL_stacking_and_interpolating_63_1.png b/docs/043_GDAL_stacking_and_interpolating_files/043_GDAL_stacking_and_interpolating_63_1.png deleted file mode 100644 index 022854a3..00000000 Binary files a/docs/043_GDAL_stacking_and_interpolating_files/043_GDAL_stacking_and_interpolating_63_1.png and /dev/null differ diff --git a/docs/044_GDAL_Reconciling_projections 2.md b/docs/044_GDAL_Reconciling_projections 2.md new file mode 100644 index 00000000..89735918 --- /dev/null +++ b/docs/044_GDAL_Reconciling_projections 2.md @@ -0,0 +1,1100 @@ +# 3.6 Reconciling projections + +


    +
    + +## 3.6.1 Introduction + +This section of notes is optional to the course, and the tutor may decide *not* to go through this in class. + +That said, the information and examples contained here can be very useful for accessing and processing certain types of geospatial data. + +In particular, we deal with obtaining climate data records from [ECMWF](http://apps.ecmwf.int/datasets/data/era40-daily/levtype=sfc) that we will later use for model fitting. These data come in a [netcdf](https://confluence.ecmwf.int/display/CKB/What+are+NetCDF+files+and+how+can+I+read+them) format (commonly used for climate data) with a grid in latitude/longitude. To 'overlay' these data with another dataset (e.g. the MODIS LAI product that we have been using) in a different (equal area) projection, we use the `gdal` function + + gdal.ReprojectImage(src, dst, src_proj, dst_proj, interp) + +where: + + src : a source dataset that we want to process + dst : a blank destination dataset that we set up with the + required (output) data type, shape, and geotransform and projection + src_proj : the source dataset projection wkt + dst_proj : the destination projection wkt + interp : the required interpolation method, e.g. gdalconst.GRA_Bilinear + +where wkt stands for [well known text](https://en.wikipedia.org/wiki/Well-known_text) and is a projection format string. + +Other codes we use are ones we have developed earlier. + +In these notes, we will learn: + + * how to access an ECMWF daily climate dataset (from ERA interim) + * how to reproject the dataset to match another spatial dataset (MODIS LAI) + +We will then save some datasets that we will use later in the notes. For this reason, it's possible to skip this section, and return to it later. + +## 3.6.1.1 Projections + +For various reasons, different geospatial datasets will come in different [projections](http://desktop.arcgis.com/en/arcmap/10.3/guide-books/map-projections/what-are-map-projections.htm). 
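To get a feel for what moving between a latitude/longitude grid and an equal area grid involves, here is a minimal sketch of the forward sinusoidal mapping on a sphere, x = R λ cos φ and y = R φ, with the central meridian at 0. The sphere radius is the MODIS value quoted later in these notes. The function name `lonlat_to_sinusoidal` is hypothetical, and this is not what `gdal` does internally: for real datasets you should let `gdal` handle the projection using the wkt descriptions.

```python
import math

# radius (m) of the sphere assumed by the MODIS sinusoidal grid
R_MODIS = 6371007.181

def lonlat_to_sinusoidal(lon_deg, lat_deg, radius=R_MODIS):
    '''
    Forward sinusoidal projection on a sphere, central meridian 0.
    Returns (x, y) in metres. (Illustrative sketch only.)
    '''
    lon = math.radians(lon_deg)
    lat = math.radians(lat_deg)
    # east-west distances shrink with cos(latitude) ...
    x = radius * lon * math.cos(lat)
    # ... while north-south distances are unchanged
    y = radius * lat
    return x, y

# 10 degrees of longitude at the equator and at 60 degrees North
x_eq, _ = lonlat_to_sinusoidal(10.0, 0.0)
x_60, _ = lonlat_to_sinusoidal(10.0, 60.0)
print(x_eq, x_60)
```

The same 10 degrees of longitude spans about half the distance at 60 degrees North, which is why a regular latitude/longitude grid does not have a consistent pixel size on the ground.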
+ +Considering, for example, satellite-derived data from Low Earth Orbit ([LEO](https://en.wikipedia.org/wiki/Low_Earth_orbit)), the satellite sensor will typically obtain image data in a swath as it passes over the Earth surface. Projected onto the Earth surface, this appears as a strip of data: + +![https://earthobservatory.nasa.gov/Features/LDCMLongSwath](images/long_swath_map_720.png) + +but in the satellite data recording system, the data are stored as a regular array. We call such satellite data 'swath' (or 'swath-like') data (in the satellite imager coordinate system) and we may obtain data products in anything up to [Level 2](https://earthdata.nasa.gov/earth-science-data-systems-program/policies/data-information-policy/data-levels) in such a form. + +These data are often difficult for data scientists to deal with. They generally prefer to have a dataset mapped to a uniform space-time grid, even though this may involve some re-sampling, which can sometimes result in loss of information. The convenience of a uniform space-time grid means that you can, for example, look at dynamic features (information over time). + +The properties of the 'uniform space-time grid' will depend on [user requirements](http://desktop.arcgis.com/en/arcmap/10.3/tools/coverage-toolbox/choosing-a-map-projection.htm). For some, it is important to have an [equal area projection](https://www.giss.nasa.gov/tools/gprojector/help/projections/), one where the 'pixel size' is consistent throughout the dataset. + +![https://www.giss.nasa.gov/tools/gprojector/help/projections/CylindricalEqualArea.png](images/CylindricalEqualArea.png) + +even if this is not convenient for viewing some areas of the Earth (map projections are very political!). + +Or other factors may be more important, such as user familiarity with a simple latitude/longitude grid typically used by climate scientists. 
+ +![https://www.giss.nasa.gov/tools/gprojector/help/projections/CylindricalStereographic.png](images/CylindricalStereographic.png) + +For others, a conformal projection (preserving angles, at the cost of distance distortion) may be vital. + +![https://www.giss.nasa.gov/tools/gprojector/help/projections/AdamsHemisphereInASquare.png](images/AdamsHemisphereInASquare.png) + +We have seen that MODIS data products, for example, come described in an equal area sinusoidal grid: + +![https://www.giss.nasa.gov/tools/gprojector/help/projections/Sinusoidal.png](images/Sinusoidal.png). + +but the data for high latitudes and longitudes appear very distorted. + + + + +We must accept, then, that dealing with geospatial data involves some understanding of projections, as well as knowing, practically, how to convert datasets between different projections. + +**Earth shape** + +One factor that can make life even more complicated than using just different projections is the use of different assumptions about the Earth shape (e.g. sphere, spheroid, radius variations). Often, the particular assumptions used by a group of users are just a result of history: it is what has 'traditionally' been used for that purpose. It can be seen as too bothersome or expensive to change this. + +Since we can convert between different projections though, we can also deal with different Earth shape assumptions. We just have to be very clear about what was assumed. If at all possible, the geospatial datasets themselves should contain a full description of the projection and Earth shape assumed, but this is not always the case. + +The datasets we will mostly be dealing with are in the following projections: + +* MODIS Sinusoidal ([tested](https://github.com/SciTools/cartopy/blob/master/lib/cartopy/tests/crs/test_sinusoidal.py)), which assumes a custom spherical Earth of radius 6371007.181 m. 
In `cartopy` this is given as [Sinusoidal.MODIS](https://github.com/SciTools/cartopy/blob/master/lib/cartopy/crs.py): + + # MODIS data products use a Sinusoidal projection of a spherical Earth + # http://modis-land.gsfc.nasa.gov/GCTP.html + Sinusoidal.MODIS = Sinusoidal(globe=Globe(ellipse=None, + semimajor_axis=6371007.181, + semiminor_axis=6371007.181)) + + In the MODIS data hdf products, the projection information is stored directly. Extracted as a wkt, this is: + + [[PROJCS["unnamed", + GEOGCS["Unknown datum based upon the custom spheroid", + DATUM["Not_specified_based_on_custom_spheroid", + SPHEROID["Custom spheroid",6371007.181,0]], + PRIMEM["Greenwich",0], + UNIT["degree",0.0174532925199433]], + PROJECTION["Sinusoidal"], + PARAMETER["longitude_of_center",0], + PARAMETER["false_easting",0], + PARAMETER["false_northing",0], + UNIT["metre",1,AUTHORITY["EPSG","9001"]]] + + According to [SR-ORG](http://spatialreference.org/ref/sr-org/6965/), the MODIS projection uses a spherical projection ellipsoid but a WGS84 datum ellipsoid. This is not quite the same as the definition in the wkt above. + + It is also defined by SR-ORG with the EPSG code [6974](http://spatialreference.org/ref/sr-org/6974/) for software that can use `semi_major` and `semi_minor` projection definitions. + + Some software may use the simpler [6965](http://spatialreference.org/ref/sr-org/6965/) definition (or the older [6842](http://spatialreference.org/ref/sr-org/6842/)). 
+ + The MODIS projection 6974 is given as: + + PROJCS["MODIS Sinusoidal", + GEOGCS["WGS 84", + DATUM["WGS_1984", + SPHEROID["WGS 84",6378137,298.257223563, + AUTHORITY["EPSG","7030"]], + AUTHORITY["EPSG","6326"]], + PRIMEM["Greenwich",0, + AUTHORITY["EPSG","8901"]], + UNIT["degree",0.01745329251994328, + AUTHORITY["EPSG","9122"]], + AUTHORITY["EPSG","4326"]], + PROJECTION["Sinusoidal"], + PARAMETER["false_easting",0.0], + PARAMETER["false_northing",0.0], + PARAMETER["central_meridian",0.0], + PARAMETER["semi_major",6371007.181], + PARAMETER["semi_minor",6371007.181], + UNIT["m",1.0], + AUTHORITY["SR-ORG","6974"]] + + None of these codes are defined in `gdal` (see files in $GDAL_DATA/*.wkt for details), so to use them, we have to take the file from [SR-ORG](http://spatialreference.org/ref/sr-org/6974/ogcwkt/). + + For the datasets we are using, it makes no real difference whether the projection information from the file is used instead of MODIS projection 6974, so we will use that from the file. For other areas and especially for any higher spatial resolution datasets, it is worth investigating which is more appropriate. + +* ECMWF netcdf format (derived from GRIB) [ERA Interim](https://www.ecmwf.int/en/forecasts/datasets/archive-datasets/reanalysis-datasets/era-interim) climate datasets (1979-Present). These are geographic coordinates (latitude/longitude) in a custom spheroid with a radius 6371200 m. + +This information can be obtained from any example of a GRIB file, as we shall see below. As a wkt, this is: + + ['GEOGCS["Coordinate System imported from GRIB file", + DATUM["unknown",SPHEROID["Sphere",6371200,0]], + PRIMEM["Greenwich",0],UNIT["degree",0.0174532925199433]]'] + +* A more common spheroid to use is [WGS84](https://confluence.qps.nl/qinsy/en/world-geodetic-system-1984-wgs84-29855173.html), although even in that case there are multiple 'realisations' available (used mainly by the DoD). 
Users should generally implement that given in EPSG code [4326](http://spatialreference.org/ref/epsg/4326/) used by the GPS system, for example. + + [GEOGCS["WGS 84", + DATUM["WGS_1984", + SPHEROID["WGS 84",6378137,298.257223563, + AUTHORITY["EPSG","7030"]], + AUTHORITY["EPSG","6326"]], + PRIMEM["Greenwich",0, + AUTHORITY["EPSG","8901"]], + UNIT["degree",0.01745329251994328, + AUTHORITY["EPSG","9122"]], + AUTHORITY["EPSG","4326"]]] + +## 3.6.1.2 Changing Projections + +We can conveniently use the Python [`cartopy`](https://scitools.org.uk/cartopy/docs/v0.16/) package to explore projections. + +We download an image taken from the satellite sensor ([SEVIRI](https://www.esa.int/Our_Activities/Observing_the_Earth/Meteosat/SEVIRI)): + +![http://www.esa.int/spaceinimages/Images/2005/12/Artist_s_view_of_SEVIRI_in_orbit](images/Artist_s_view_of_SEVIRI_in_orbit_node_full_image_2.png) + +The sensor builds up images of the Earth disc from geostationary orbit, actioned by the platform spin. + +![http://www.esa.int/spaceinimages/Images/2015/08/MSG-4_Europe_s_latest_weather_satellite_delivers_first_image](images/MSG-4_Europe_s_latest_weather_satellite_delivers_first_image_node_full_image_2.png) + +In the code below, we plot the dataset in the 'earth disk' (Orthographic) projection, then re-map it to the equal area Sinusoidal projection. + + +```python +try: + from urllib2 import urlopen +except ImportError: + from urllib.request import urlopen +from io import BytesIO +%matplotlib inline + +import cartopy.crs as ccrs +import matplotlib.pyplot as plt +are_you_sure = False + +''' +===================================================== +Don't run this cell in class as it will take too long! 
+ + Use it for homework + set are_you_sure = True +===================================================== +''' + +''' +from https://scitools.org.uk/cartopy/docs/v0.16/\ + gallery/geostationary.html#sphx-glr-gallery-geostationary-py +''' +def geos_image(): + """ + Return a specific SEVIRI image by retrieving it from a github gist URL. + + Returns + ------- + img : numpy array + The pixels of the image in a numpy array. + img_proj : cartopy CRS + The rectangular coordinate system of the image. + img_extent : tuple of floats + The extent of the image ``(x0, y0, x1, y1)`` referenced in + the ``img_proj`` coordinate system. + origin : str + The origin of the image to be passed through to matplotlib's imshow. + + """ + url = ('https://gist.github.com/pelson/5871263/raw/' + 'EIDA50_201211061300_clip2.png') + img_handle = BytesIO(urlopen(url).read()) + img = plt.imread(img_handle) + img_proj = ccrs.Geostationary(satellite_height=35786000) + img_extent = [-5500000, 5500000, -5500000, 5500000] + return img, img_proj, img_extent, 'upper' + +if are_you_sure: + print('Retrieving image...') + img, crs, extent, origin = geos_image() + + fig = plt.figure(figsize=(8,8)) + ax = fig.add_subplot(1, 1, 1,projection=\ + ccrs.Orthographic(central_longitude=0.0, central_latitude=0.0)) + ax.coastlines() + ax.set_global() + ax.imshow(img, transform=crs, extent=extent, origin=origin, cmap='gray') + + fig = plt.figure(figsize=(8,8)) + ax = fig.add_subplot(1, 1, 1, projection=\ + ccrs.Sinusoidal(central_longitude=0.0, \ + false_easting=0.0, false_northing=0.0)) + ax.coastlines() + ax.set_global() + print('Projecting and plotting image (this may take a while)...') + ax.imshow(img, transform=crs, extent=extent, origin=origin, cmap='gray') +``` + + Retrieving image... + Projecting and plotting image (this may take a while)... 
+ + +![png](044_GDAL_Reconciling_projections_files/044_GDAL_Reconciling_projections_6_1.png) + + + +![png](044_GDAL_Reconciling_projections_files/044_GDAL_Reconciling_projections_6_2.png) + + +The full list of [`cartopy` projections](https://scitools.org.uk/cartopy/docs/v0.16/crs/projections.html) is quite extensive. + +**Exercise 3.6.1** Extra Homework + +* Explore some different types of projection using `cartopy` and make a note of their features. +* Read up (follow the links in the text above) on projections. + + +```python +#do exercise here +``` + +## 3.6.2 Requirements + +We will need to: + +* make sure we have the MODIS LAI dataset locally +* read it in for a given country. +* register with ecmwf, install ecmwfapi +* get the temperature dataset from ECMWF for 2016 and 2017 for Europe +* get the country borders shapefile + +**Set up the conditions** + + + +```python +# required general imports +import matplotlib.pyplot as plt +%matplotlib inline +import numpy as np +import sys +import os +from pathlib import Path +import gdal +from datetime import datetime, timedelta +import cartopy.crs as ccrs +``` + + +```python +''' +Set the country code and year to be used here +''' +country_code = 'UK' +year = 2017 +shpfile = "data/TM_WORLD_BORDERS-0.3.shp" +``` + +### 3.6.2.1 Run the pre-requisite scripts + +**Make sure you register with ECMWF** +* register with ECMWF and install the API + + Follow the [ECMWF instructions](https://confluence.ecmwf.int/display/WEBAPI/Access+ECMWF+Public+Datasets) + +**Sort data prerequisites** +* Run the codes in the [prerequisites section](044_GDAL_Reconciling_projections.md) + + OR + +* Run the prerequisites script: + + +```python +# install ecmwf api -- do this once only +ECMWF = 'https://software.ecmwf.int/wiki/download/attachments/56664858/ecmwf-api-client-python.tgz' +try: + from ecmwfapi import ECMWFDataServer +except: + try: + !pip install $ECMWF + except: + # on Unix/Linux + !pip install --user $ECMWF + +``` + + 
+```python +# just make sure the pre-requisites are run +%run geog0111/Chapter3_6A_prerequisites.py $country_code $year +``` + + ['geog0111/Chapter3_6A_prerequisites.py', 'UK', '2017'] 2017 UK + Looking for match to sample 2017-01-01 00:00:00 + Looking for match to sample 2017-02-10 00:00:00 + Looking for match to sample 2017-03-22 00:00:00 + Looking for match to sample 2017-05-01 00:00:00 + Looking for match to sample 2017-06-10 00:00:00 + Looking for match to sample 2017-07-20 00:00:00 + Looking for match to sample 2017-08-29 00:00:00 + Looking for match to sample 2017-10-08 00:00:00 + Looking for match to sample 2017-11-17 00:00:00 + Looking for match to sample 2017-12-27 00:00:00 + 18.45137418797783 + 0.19862203366558234 + (2624, 1396, 92) (2624, 1396, 92) + interpolating ... + (2624, 1396, 92) + saving ... + europe_data_2016_2017.nc exists + GEOGCS["Coordinate System imported from GRIB file",DATUM["unknown",SPHEROID["Sphere",6371200,0]],PRIMEM["Greenwich",0],UNIT["degree",0.0174532925199433]] + Refreshing nc file europe_data_2016_2017.nc + data/europe_data_2016.nc + data/europe_data_2017.nc + + + +```python +# read in the LAI data for given country code +tiles = [] +for h in [17, 18]: + for v in [3, 4]: + tiles.append(f"h{h:02d}v{v:02d}") + +fname = f'lai_data_{year}_{country_code}.npz' +ofile = Path('data')/fname +try: + # read data from npz file + lai = np.load(ofile) + print(lai['lai'].shape) +except: + print(f"{ofile} doesn't exist: sort the pre-requisites") + +``` + + (2624, 1396, 92) + + + +```python +import numpy as np +# a quick look at some stats to see if there are data there +# and they are sensible +lai = np.load(ofile) +print(np.array(lai['lai'][1000,700]),\ + np.array(lai['weights'][1000,700])) +# does it have the interpolated value? +if 'interpolated_lai' in list(lai.keys()): + print(np.array(lai['interpolated_lai'][1000,700])) +``` + + [1.1 1.1 1.2 0.9 0.9 1.2 1. 0.8 1.4 0.3 0. 0.5 0.3 0.9 0.6 1. 0.7 0.5 + 0.7 0.6 1.3 0.7 0.1 1. 1. 
0.6 1.1 0.5 1.1 1. 1.1 1.1 1.1 0. 0.6 1.4 + 1.3 1.6 1.7 1.7 1.1 0.3 1.3 1.7 1.5 1.2 0.5 0.6 1.6 3.1 0.3 2. 1.6 0.5 + 2.6 2.5 0.4 0.4 0.4 0.4 0.4 1.1 1.7 0.5 1.5 1.4 0.1 1.4 1. 1. 0.3 1.1 + 0.2 1.1 0.1 0.7 2.6 1.7 2. 1.4 1.4 0.3 0.4 0.7 1.1 1.1 0.8 0.8 0.9 1.2 + 1.2 1.2] [0.38196601 0.38196601 0.38196601 0.38196601 0.38196601 0.38196601 + 0.38196601 0.38196601 0.38196601 0.38196601 0.38196601 1. + 1. 1. 1. 1. 1. 1. + 1. 1. 1. 1. 1. 1. + 1. 1. 1. 1. 1. 1. + 1. 1. 1. 0.23606798 1. 1. + 1. 1. 1. 1. 0.23606798 0.23606798 + 1. 1. 1. 1. 0.23606798 1. + 1. 1. 0.23606798 1. 1. 0.23606798 + 1. 1. 0.23606798 0.23606798 0.23606798 0.23606798 + 1. 1. 1. 1. 1. 1. + 1. 1. 1. 1. 1. 1. + 0.23606798 1. 1. 0.38196601 0.38196601 0.38196601 + 0.38196601 0.38196601 0.38196601 0.38196601 0.38196601 0.38196601 + 0.38196601 0.38196601 0.38196601 0.38196601 0.38196601 0.38196601 + 0.38196601 0.38196601] + [1.08683704 1.07935938 1.06092468 1.03161555 0.99133678 0.93493706 + 0.86317607 0.78136836 0.70352293 0.64487202 0.61266704 0.60678279 + 0.61883651 0.63991235 0.66383767 0.68530728 0.7020839 0.71506691 + 0.72433462 0.73247513 0.74002342 0.7487489 0.76081558 0.77753903 + 0.79985155 0.82674875 0.8580399 0.89089931 0.92313233 0.95320821 + 0.98065863 1.00873477 1.04290792 1.08866497 1.14943392 1.22080857 + 1.2929335 1.35676154 1.4021202 1.42632177 1.43105239 1.41880135 + 1.40112126 1.38702625 1.38535581 1.40496862 1.45132144 1.52190587 + 1.6097794 1.70184005 1.78498301 1.84800873 1.88579434 1.89137779 + 1.85216521 1.7573269 1.60194669 1.4068558 1.2226864 1.09378123 + 1.03387865 1.01734516 1.02167784 1.02682062 1.02221966 1.00590369 + 0.98109385 0.95002526 0.91607184 0.8832294 0.85542066 0.83730438 + 0.83548611 0.85858254 0.90888448 0.98602178 1.07794995 1.15678679 + 1.19786983 1.18422965 1.12426637 1.04138037 0.9671476 0.91852027 + 0.90214004 0.90955621 0.93302657 0.96662729 1.00304032 1.03742083 + 1.06445991 1.08219511] + + +## 3.6.3 Reconcile the datasets + +In this section, we will use 
`gdal` to transform two datasets into the same coordinate system. + +To do this, we identify one dataset with the projection and geographic extent that we want for our data (a MODIS sub-dataset here, the 'exemplar'). + +We then download a climate dataset in a latitude/longitude grid ([netcdf](https://www.unidata.ucar.edu/software/netcdf/) format) and transform this to be consistent with the MODIS dataset. + + +### 3.6.3.1 load an exemplar dataset + +Since we want to match up datasets, we need to produce an example of the dataset we want to match up to. + +We save the exemplar as a GeoTiff format file here. + + +```python +from osgeo import gdal, gdalconst,osr +import numpy as np +from geog0111.process_timeseries import mosaic_and_clip + +# set to True if you want to override +# the MODIS projection (see above) +use_6974 = False + +''' +https://stackoverflow.com/questions/10454316/ +how-to-project-and-resample-a-grid-to-match-another-grid-with-gdal-python +''' + +# first get an exemplar LAI file, clipped to +# the required limits. 
We will use this to match +# the t2 dataset to +match_filename = mosaic_and_clip(tiles,1,year,ofolder='tmp',\ + country_code=country_code,shpfile=shpfile,frmat='GTiff') + +print(match_filename) + +''' +Now get the projection, geotransform and dataset +size that we want to match to +''' +match_ds = gdal.Open(match_filename, gdalconst.GA_ReadOnly) +match_proj = match_ds.GetProjection() +match_geotrans = match_ds.GetGeoTransform() +wide = match_ds.RasterXSize +high = match_ds.RasterYSize + +print('\nProjection from file:') +print(match_proj,'\n') + +''' +set Projection 6974 from SR-OR +by setting use_6974 = True +''' +if use_6974: + print('\nProjection 6974 from SR-ORG:') + modis_wkt = 'data/modis_6974.wkt' + match_proj = open(modis_wkt,'r').readline() + match_ds.SetProjection(match_proj) + print(match_proj,'\n') + +''' +Visualise +''' +plt.figure(figsize=(10,10)) +plt.title(f'Exemplar LAI dataset for {country_code}') +plt.imshow(match_ds.ReadAsArray()) +plt.colorbar(shrink=0.75) +# close the file -- we dont need it any more +del match_ds +``` + + tmp/Lai_500m_2017_001_UK.tif + + Projection from file: + PROJCS["unnamed",GEOGCS["Unknown datum based upon the custom spheroid",DATUM["Not_specified_based_on_custom_spheroid",SPHEROID["Custom spheroid",6371007.181,0]],PRIMEM["Greenwich",0],UNIT["degree",0.0174532925199433]],PROJECTION["Sinusoidal"],PARAMETER["longitude_of_center",0],PARAMETER["false_easting",0],PARAMETER["false_northing",0],UNIT["metre",1,AUTHORITY["EPSG","9001"]]] + + + + +![png](044_GDAL_Reconciling_projections_files/044_GDAL_Reconciling_projections_19_1.png) + + +### 3.6.3.2 get information from source file + + +Now, we pull the information we need from the source file (the netcdf format t2 dataset). + +We need to know: + +* the data type +* the number of bands (time samples in this case) +* the geotransform of the dataset (the fact that it's 0.25 degree resolution over Europe) + +and access these from the source dataset. 
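As a rough illustration of what the geotransform encodes, the sketch below maps pixel indices to map coordinates in plain Python. It assumes the usual GDAL ordering of the six-element tuple (x origin, pixel width, row rotation, y origin, column rotation, pixel height). The numbers are those reported for the 0.25 degree temperature grid in this section, and the helper name `pixel_to_coord` is our own, not part of `gdal`.

```python
def pixel_to_coord(geotrans, col, row):
    """Map coordinates of the upper-left corner of pixel (col, row)."""
    x0, dx, rot_x, y0, rot_y, dy = geotrans
    x = x0 + col * dx + row * rot_x
    y = y0 + col * rot_y + row * dy
    return x, y

# geotransform and grid size of the 0.25 degree temperature dataset
geotrans = (-20.125, 0.25, 0.0, 75.125, 0.0, -0.25)
nx, ny = 321, 261

# upper-left corner of the first pixel and lower-right corner of the grid
ul = pixel_to_coord(geotrans, 0, 0)
lr = pixel_to_coord(geotrans, nx, ny)
print('upper left :', ul)
print('lower right:', lr)
```

The negative pixel height means that rows count downwards from the northern edge, which is why the y coordinate decreases as the row index increases.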
+ + + + +```python +from osgeo import gdal, gdalconst,osr +import numpy as np + +# set up conditions +src_filename = f'data/europe_data_{year}.nc' +''' +access information from source +''' +# name the t2m sub-dataset explicitly +src_dataname = 'NETCDF:"'+src_filename+'":t2m' +src = gdal.Open(src_dataname, gdalconst.GA_ReadOnly) + +''' +Get geotrans, data type and number of bands +from source dataset +''' +band1 = src.GetRasterBand(1) +src_proj = src.GetProjection() +src_geotrans = src.GetGeoTransform() +nbands = src.RasterCount +src_format = band1.DataType +nx = band1.XSize +ny = band1.YSize + +print('Information found') +print('GeoTransform: ',src_geotrans) +print('Projection: ',src_proj) +print('number of bands:',nbands) +print('format: ',src_format) +print('nx,ny: ',nx,ny) + +# read data +t2m = band1.ReadAsArray() +plt.figure(figsize=(10,10)) +ax = plt.subplot ( 1, 1, 1) +ax.set_title(f'T2 ECMWF dataset for {country_code}: band 1') + +im = plt.imshow(t2m) +_ = plt.colorbar(im,shrink=0.6) +``` + + Information found + GeoTransform: (-20.125, 0.25, 0.0, 75.125, 0.0, -0.25) + Projection: GEOGCS["Coordinate System imported from GRIB file",DATUM["unknown",SPHEROID["Sphere",6371200,0]],PRIMEM["Greenwich",0],UNIT["degree",0.0174532925199433]] + number of bands: 365 + format: 6 + nx,ny: 321 261 + + + +![png](044_GDAL_Reconciling_projections_files/044_GDAL_Reconciling_projections_21_1.png) + + +### 3.6.3.4 reprojection + +Now, set up a blank gdal dataset (in memory) with the size, data type, projection etc. that we want, then reproject the temperature dataset into this. + +The processing may take some time if the LAI dataset is large (e.g. France). + +The result will be of the same size, projection etc. as the cropped LAI dataset. 
+ + + + +```python +dst_filename = src_filename.replace('.nc',f'_{country_code}.tif') +force = False + + +if (not Path(dst_filename).exists()) or force: + + dst = gdal.GetDriverByName('MEM').Create('', wide, high, nbands, src_format) + + dst.SetGeoTransform( match_geotrans ) + dst.SetProjection( match_proj) + + print('Information found') + print('wide: ',wide) + print('high: ',high) + print('geotrans: ',match_geotrans) + print('projection:',match_proj) + + # Do the work: reproject the dataset + # This will take a few minutes, depending on dataset size + _ = gdal.ReprojectImage(src, dst, src_proj, match_proj, gdalconst.GRA_Bilinear) + +``` + + +```python +xOrigin = match_geotrans[0] +yOrigin = match_geotrans[3] +pixelWidth = match_geotrans[1] +pixelHeight = match_geotrans[5] + +extent = (xOrigin,xOrigin+pixelWidth*wide,\ + yOrigin+pixelHeight*(high),yOrigin+pixelHeight) + +print(extent) + +if (not Path(dst_filename).exists()) or force: + + + + ''' + Visualise: takes some time to plot + due to reprojections + ''' + t2m = dst.GetRasterBand(1).ReadAsArray() + match_ds = gdal.Open(match_filename, gdalconst.GA_ReadOnly).ReadAsArray() + + # visualise + plt.figure(figsize=(15,10)) + ax = plt.subplot ( 1, 2, 1 ,projection=ccrs.Sinusoidal.MODIS) + ax.coastlines('10m') + ax.set_title(f'T2m ECMWF dataset for {country_code}: band 1') + im = ax.imshow(t2m[::-1],extent=extent) + plt.colorbar(im,shrink=0.75) + + + ax = plt.subplot ( 1, 2, 2 ,projection=ccrs.Sinusoidal.MODIS) + ax.coastlines('10m') + ax.set_title(f'MODIS LAI {country_code}') + im = plt.imshow(match_ds,extent=extent) + _ = plt.colorbar(im,shrink=0.75) +``` + + (-528121.3116353625, 118541.0173548233, 5549929.5167459585, 6765137.823590214) + + +### 3.6.3.5 crop + +Finally, we crop the temperature dataset using `gdal.Warp()` and save it to a (GeoTiff) file: + + +```python + # Output / destination +dst_filename = src_filename.replace('.nc',f'_{country_code}.tif') +force = False + + +if (not Path(dst_filename).exists()) 
or force: + ''' + Only run this if file doesnt exist + ''' + frmat = 'GTiff' + g = gdal.Warp(dst_filename, + dst, + format=frmat, + dstNodata=-300, + cutlineDSName=shpfile, + cutlineWhere=f"FIPS='{country_code:s}'", + cropToCutline=True) + del dst # Flush + del g +``` + + +```python +# visualise +print(dst_filename) +t2m = gdal.Open(dst_filename, gdalconst.GA_ReadOnly) +t2m = t2m.GetRasterBand(1).ReadAsArray() +t2m[t2m==-300] = np.nan +match_ds = gdal.Open(match_filename, gdalconst.GA_ReadOnly).ReadAsArray() + +# visualise +plt.figure(figsize=(15,10)) +ax = plt.subplot ( 1, 2, 1 ,projection=ccrs.Sinusoidal.MODIS) +ax.coastlines('10m') +ax.set_title(f'T2m ECMWF dataset for {country_code}: band 1') +im = ax.imshow(t2m[::-1],extent=extent) +plt.colorbar(im,shrink=0.75) + +ax = plt.subplot ( 1, 2, 2 ,projection=ccrs.Sinusoidal.MODIS) +ax.coastlines('10m') +ax.set_title(f'MODIS Exemplar LAI {country_code}') +im = plt.imshow(match_ds,extent=extent) +_ = plt.colorbar(im,shrink=0.75) +``` + + data/europe_data_2017_UK.tif + + + +![png](044_GDAL_Reconciling_projections_files/044_GDAL_Reconciling_projections_27_1.png) + + +Now let's look at the time information in the metadata: + + +```python +meta = gdal.Open(src_filename).GetMetadata() + +print(meta['time#units']) +``` + + hours since 1900-01-01 00:00:00.0 + + +The time information is in hours since `1900-01-01 00:00:00.0`. 
This is not such a convenient unit for plotting, so we can use `datetime` to fix that: + + + +```python +timer = meta['NETCDF_DIM_time_VALUES'] +print(timer[:100]) +``` + + {1025628,1025652,1025676,1025700,1025724,1025748,1025772,1025796,1025820,1025844,1025868,1025892,102 + + + +```python +# split the string into integers +timer = [int(i) for i in meta['NETCDF_DIM_time_VALUES'][1:-1].split(',')] + +print (timer[:20]) +``` + + [1025628, 1025652, 1025676, 1025700, 1025724, 1025748, 1025772, 1025796, 1025820, 1025844, 1025868, 1025892, 1025916, 1025940, 1025964, 1025988, 1026012, 1026036, 1026060, 1026084] + + + +```python +# split the string into integers +# convert to days +timer = [float(i)/24. for i in meta['NETCDF_DIM_time_VALUES'][1:-1].split(',')] + +print (timer[:20]) +``` + + [42734.5, 42735.5, 42736.5, 42737.5, 42738.5, 42739.5, 42740.5, 42741.5, 42742.5, 42743.5, 42744.5, 42745.5, 42746.5, 42747.5, 42748.5, 42749.5, 42750.5, 42751.5, 42752.5, 42753.5] + + + +```python +from datetime import datetime,timedelta + +# add base date +# split the string into integers +# convert to days +timer = [(datetime(1900,1,1) + timedelta(days=float(i)/24.)) \ + for i in meta['NETCDF_DIM_time_VALUES'][1:-1].split(',')] + +print (timer[:20]) +``` + + [datetime.datetime(2017, 1, 1, 12, 0), datetime.datetime(2017, 1, 2, 12, 0), datetime.datetime(2017, 1, 3, 12, 0), datetime.datetime(2017, 1, 4, 12, 0), datetime.datetime(2017, 1, 5, 12, 0), datetime.datetime(2017, 1, 6, 12, 0), datetime.datetime(2017, 1, 7, 12, 0), datetime.datetime(2017, 1, 8, 12, 0), datetime.datetime(2017, 1, 9, 12, 0), datetime.datetime(2017, 1, 10, 12, 0), datetime.datetime(2017, 1, 11, 12, 0), datetime.datetime(2017, 1, 12, 12, 0), datetime.datetime(2017, 1, 13, 12, 0), datetime.datetime(2017, 1, 14, 12, 0), datetime.datetime(2017, 1, 15, 12, 0), datetime.datetime(2017, 1, 16, 12, 0), datetime.datetime(2017, 1, 17, 12, 0), datetime.datetime(2017, 1, 18, 12, 0), datetime.datetime(2017, 1, 19, 12, 0), 
datetime.datetime(2017, 1, 20, 12, 0)] + + +### 3.6.3.6 Putting this together + +We can now put this code together to make a function `match_netcdf_to_data()`: + + +```python +from osgeo import gdal, gdalconst,osr +import numpy as np +from pathlib import Path +from geog0111.process_timeseries import mosaic_and_clip +from datetime import datetime, timedelta + +def match_netcdf_to_data(src_filename,match_filename,dst_filename,year,\ + country_code=None,shpfile=None,force=False,\ + nodata=-300,frmat='GTiff',verbose=False): + + ''' + see : + https://stackoverflow.com/questions/10454316/ + how-to-project-and-resample-a-grid-to-match-another-grid-with-gdal-python + ''' + + ''' + Get the projection, geotransform and dataset + size that we want to match to + ''' + if verbose: print(f'getting info from match file {match_filename}') + match_ds = gdal.Open(match_filename, gdalconst.GA_ReadOnly) + + match_proj = match_ds.GetProjection() + match_geotrans = match_ds.GetGeoTransform() + wide = match_ds.RasterXSize + high = match_ds.RasterYSize + # close the file -- we dont need it any more + del match_ds + + ''' + access information from source + ''' + if verbose: print(f'getting info from source netcdf file {src_filename}') + try: + src_dataname = 'NETCDF:"'+src_filename+'":t2m' + src = gdal.Open(src_dataname, gdalconst.GA_ReadOnly) + except: + if verbose: print('failed') + return(None) + + # get meta data + meta = gdal.Open(src_filename, gdalconst.GA_ReadOnly).GetMetadata() + + extent = [match_geotrans[0],match_geotrans[0]+match_geotrans[1]*wide,\ + match_geotrans[3]+match_geotrans[5]*high,match_geotrans[3]] + # get time info + timer = np.array([(datetime(1900,1,1) + timedelta(days=float(i)/24.)) \ + for i in meta['NETCDF_DIM_time_VALUES'][1:-1].split(',')]) + + if (not Path(dst_filename).exists()) or force: + + ''' + Get geotrans, proj, data type and number of bands + from source dataset + ''' + band1 = src.GetRasterBand(1) + src_geotrans = src.GetGeoTransform() + src_proj = src.GetProjection() + + nbands = 
src.RasterCount + src_format = band1.DataType + + dst = gdal.GetDriverByName('MEM').Create(\ + '', wide, high, \ + nbands, src_format) + dst.SetGeoTransform( match_geotrans ) + dst.SetProjection( match_proj) + + if verbose: print(f'reprojecting ...') + # Output / destination + _ = gdal.ReprojectImage(src, dst, \ + src_proj, \ + match_proj,\ + gdalconst.GRA_Bilinear ) + if verbose: print(f'cropping to {country_code:s} ...') + done = gdal.Warp(dst_filename, + dst, + format=frmat, + dstNodata=nodata, + cutlineDSName=shpfile, + cutlineWhere=f"FIPS='{country_code:s}'", + cropToCutline=True) + del dst + + return(timer,dst_filename,extent) +``` + + +```python +from osgeo import gdal, gdalconst,osr +import numpy as np +from geog0111.process_timeseries import mosaic_and_clip +from datetime import datetime,timedelta +from geog0111.match_netcdf_to_data import match_netcdf_to_data +from geog0111.geog_data import procure_dataset +from pathlib import Path + +# set conditions + +country_code = 'UK' +year = 2017 +shpfile = "data/TM_WORLD_BORDERS-0.3.shp" +src_filename = f'data/europe_data_{year}.nc' +dst_filename = f'data/europe_data_{year}_{country_code}.tif' +t2_filename = f'data/europe_data_{year}_{country_code}.npz' +# read in the LAI data for given country code +tiles = [] +for h in [17, 18]: + for v in [3, 4]: + tiles.append(f"h{h:02d}v{v:02d}") + + +#read LAI +fname = f'lai_data_{year}_{country_code}.npz' +ofile = Path('data')/fname +lai = np.load(ofile) + +if not Path(t2_filename).exists(): + print(f'calculating dataset match in {t2_filename}') + # first get an exemplar LAI file, clipped to + # the required limits. 
We will use this to match + # the t2 dataset to + match_filename = mosaic_and_clip(tiles,1,year,\ + country_code=country_code,\ + shpfile=shpfile,frmat='GTiff') + ''' + Match the datasets using the function + we have developed + ''' + meta = gdal.Open(src_filename, gdalconst.GA_ReadOnly).GetMetadata() + + timer,dst_filename,extent = match_netcdf_to_data(\ + src_filename,match_filename,\ + dst_filename,year,\ + country_code=country_code,\ + shpfile=shpfile,\ + nodata=-300,frmat='GTiff',\ + verbose=True) + + # read and interpret the t2 data and flip + temp2 = gdal.Open(dst_filename).ReadAsArray()[:,::-1] + temp2[temp2==-300] = np.nan + temp2 -= 273.15 + # save these + print(f'saving data to {t2_filename}') + np.savez_compressed(t2_filename,timer=timer,temp2=temp2,extent=extent) + +else: + print(f'dataset in {t2_filename} exists') + +print('done') +t2data = np.load(t2_filename) +timer,temp2,extent = t2data['timer'],t2data['temp2'],t2data['extent'] +``` + + calculating dataset match in data/europe_data_2017_UK.npz + getting info from match file data/Lai_500m_2017_001_UK.tif + getting info from source netcdf file data/europe_data_2017.nc + saving data to data/europe_data_2017_UK.npz + done + + + +```python +# visualise the interpolated dataset +import matplotlib.pylab as plt +import cartopy.crs as ccrs +%matplotlib inline + +plt.figure(figsize=(12,12)) +ax = plt.subplot ( 2, 2, 1 ,projection=ccrs.Sinusoidal.MODIS) +ax.coastlines('10m') +ax.set_title(f'T2m ECMWF dataset for {country_code}: {str(timer[0])}') +im = ax.imshow(temp2[0],extent=extent) +plt.colorbar(im,shrink=0.75) + +ax = plt.subplot ( 2, 2, 2 ,projection=ccrs.Sinusoidal.MODIS) +ax.coastlines('10m') +ax.set_title(f'MODIS LAI {country_code}: {str(timer[0])}') +im = plt.imshow(interpolated_lai[:,:,0],vmax=6,extent=extent) +_ = plt.colorbar(im,shrink=0.75) + +plt.subplot ( 2, 2, 3 ) +plt.title(f'mean T2m for {country_code}') +plt.plot(timer,np.nanmean(temp2,axis=(1,2))) +plt.ylabel('temperature 2m / C') 
+plt.subplot ( 2, 2, 4 ) +plt.title(f'mean LAI for {country_code}') +mean = np.nanmean(interpolated_lai,axis=(0,1)) +plt.plot(timer[::4],mean) +``` + + + + + [] + + + + +![png](044_GDAL_Reconciling_projections_files/044_GDAL_Reconciling_projections_38_1.png) + + +## 3.6.6 Summary + +In this section, we have learned about projections, and have reconciled two datasets that were originally in different projections. They were also defined on spheroids with different Earth radius assumptions. + +These issues are typical when dealing with geospatial data. + +This part of the notes is not compulsory, as the code and ideas are quite complicated for people just beginning to learn coding. We have included it here to allow students to revisit this later. It is also included because we want to develop some interesting datasets for modelling, so we need to deal with reconciling datasets from different providers in different projections. + +In this section, we have developed the following datasets: + + +```python +from geog0111.geog_data import procure_dataset +import numpy as np +from pathlib import Path + +year = 2017 +country_code = 'UK' +''' +LAI data +''' +# read in the LAI data for given country code +lai_filename = f'data/lai_data_{year}_{country_code}.npz' +# get the dataset in case its not here +procure_dataset(Path(lai_filename).name,verbose=False) + +lai = np.load(lai_filename) +print(lai_filename,list(lai.keys())) + +''' +T 2m data +''' +t2_filename = f'data/europe_data_{year}_{country_code}.npz' +# get the dataset in case its not here +procure_dataset(Path(t2_filename).name,verbose=False) +t2data = np.load(t2_filename) +print(t2_filename,list(t2data.keys())) +``` + + data/lai_data_2017_UK.npz ['dates', 'lai', 'weights', 'interpolated_lai'] + data/europe_data_2017_UK.npz ['timer', 'temp2', 'extent'] + + + +```python +import numpy as np +# a quick look at some stats to see if there are data there +# and they are sensible +lai = np.load(ofile) 
+print(np.array(lai['lai'][1000,700]),\ + np.array(lai['weights'][1000,700])) +# does it have the interpolated value? +if 'interpolated_lai' in list(lai.keys()): + print(np.array(lai['interpolated_lai'][1000,700])) +``` + + [1.1 1.1 1.2 0.9 0.9 1.2 1. 0.8 1.4 0.3 0. 0.5 0.3 0.9 0.6 1. 0.7 0.5 + 0.7 0.6 1.3 0.7 0.1 1. 1. 0.6 1.1 0.5 1.1 1. 1.1 1.1 1.1 0. 0.6 1.4 + 1.3 1.6 1.7 1.7 1.1 0.3 1.3 1.7 1.5 1.2 0.5 0.6 1.6 3.1 0.3 2. 1.6 0.5 + 2.6 2.5 0.4 0.4 0.4 0.4 0.4 1.1 1.7 0.5 1.5 1.4 0.1 1.4 1. 1. 0.3 1.1 + 0.2 1.1 0.1 0.7 2.6 1.7 2. 1.4 1.4 0.3 0.4 0.7 1.1 1.1 0.8 0.8 0.9 1.2 + 1.2 1.2] [0.38196601 0.38196601 0.38196601 0.38196601 0.38196601 0.38196601 + 0.38196601 0.38196601 0.38196601 0.38196601 0.38196601 1. + 1. 1. 1. 1. 1. 1. + 1. 1. 1. 1. 1. 1. + 1. 1. 1. 1. 1. 1. + 1. 1. 1. 0.23606798 1. 1. + 1. 1. 1. 1. 0.23606798 0.23606798 + 1. 1. 1. 1. 0.23606798 1. + 1. 1. 0.23606798 1. 1. 0.23606798 + 1. 1. 0.23606798 0.23606798 0.23606798 0.23606798 + 1. 1. 1. 1. 1. 1. + 1. 1. 1. 1. 1. 1. + 0.23606798 1. 1. 
0.38196601 0.38196601 0.38196601 + 0.38196601 0.38196601 0.38196601 0.38196601 0.38196601 0.38196601 + 0.38196601 0.38196601 0.38196601 0.38196601 0.38196601 0.38196601 + 0.38196601 0.38196601] + [1.08683704 1.07935938 1.06092468 1.03161555 0.99133678 0.93493706 + 0.86317607 0.78136836 0.70352293 0.64487202 0.61266704 0.60678279 + 0.61883651 0.63991235 0.66383767 0.68530728 0.7020839 0.71506691 + 0.72433462 0.73247513 0.74002342 0.7487489 0.76081558 0.77753903 + 0.79985155 0.82674875 0.8580399 0.89089931 0.92313233 0.95320821 + 0.98065863 1.00873477 1.04290792 1.08866497 1.14943392 1.22080857 + 1.2929335 1.35676154 1.4021202 1.42632177 1.43105239 1.41880135 + 1.40112126 1.38702625 1.38535581 1.40496862 1.45132144 1.52190587 + 1.6097794 1.70184005 1.78498301 1.84800873 1.88579434 1.89137779 + 1.85216521 1.7573269 1.60194669 1.4068558 1.2226864 1.09378123 + 1.03387865 1.01734516 1.02167784 1.02682062 1.02221966 1.00590369 + 0.98109385 0.95002526 0.91607184 0.8832294 0.85542066 0.83730438 + 0.83548611 0.85858254 0.90888448 0.98602178 1.07794995 1.15678679 + 1.19786983 1.18422965 1.12426637 1.04138037 0.9671476 0.91852027 + 0.90214004 0.90955621 0.93302657 0.96662729 1.00304032 1.03742083 + 1.06445991 1.08219511] + + +**Exercise 3.6.2** Extra Homework + +Go carefully through these notes and make notes of the processes we have to go through to reconcile datasets such as these. + +Learn what issues to look out for when coming across a new dataset, and how to use Python code to deal with it. Try to stick to one geospatial package as far as possible (`gdal` here) as you can make problems for yourself by mixing them. + + +```python +# do exercise here +``` diff --git a/docs/050_Linear_models 2.md b/docs/050_Linear_models 2.md new file mode 100644 index 00000000..c2ce1e69 --- /dev/null +++ b/docs/050_Linear_models 2.md @@ -0,0 +1,831 @@ +# Table of Contents +

    + +# Fitting to the Mauna Loa $CO_2$ record + +## Introduction + +$CO_2$ concentration in the atmosphere has been steadily increasing. The flask measurements collected in Mauna Loa provide a fairly long time series that allows us to see the temporal evolution of this trace gas. + + +
    +Why is CO2 important? +
    + +The time series looks like this + +![Mauna Loa CO2 record](https://www.pmel.noaa.gov/co2/files/co2_data_mlo_med.jpg) + +
    +Knowing that Mauna Loa is in Hawaii (Northern Hemisphere), can you broadly explain what's going on? +
    + +In today's session, we shall consider how to "model" the $CO_2$ record. We'll develop some simple models of $CO_2$ as a function of time, and will try to "fit" them to the data. + +
    +Can you think why fitting some models based on time to the Mauna Loa record might not be particularly insightful? +
+ +We first have an import cell that deals with importing all the usual modules that we'll require: numpy, matplotlib as well as pandas to read the data file. If you put all your imports at the top and run them first, they should be available for all other cells... + + +```python +from pathlib import Path # Checks for files and so on +import numpy as np # Numpy for arrays and so on +import pandas as pd +import sys +import matplotlib.pyplot as plt # Matplotlib for plotting +# Ensure the plots are shown in the notebook +%matplotlib inline + +# Instead of using requests, we might as well use Python's built-in +# URL downloader, which also handles FTP +from urllib.request import urlretrieve +``` + +## Obtaining the data + +### Downloading the data + +The data are available online from [NOAA](https://www.esrl.noaa.gov/gmd/ccgg/trends/data.html). We want the monthly average dataset, which can be found there. If the data are not yet available on your system, the next Python cell will download them from [`ftp://aftp.cmdl.noaa.gov/products/trends/co2/co2_mm_mlo.txt`](ftp://aftp.cmdl.noaa.gov/products/trends/co2/co2_mm_mlo.txt). Because the URL is an FTP one, we will use `urlretrieve` from Python's built-in `urllib.request` module rather than `requests`, which does not handle FTP. 
We will save it to a file with the same name locally: + + +```python +# The remote URL for the data file is address: +address = 'ftp://aftp.cmdl.noaa.gov/products/trends/co2/' +# We'll create a folder if it doesn't exist in the data folder for +# the data +dest_path = Path("data/Mauna_Loa/").mkdir(parents=True, exist_ok=True) +fname = Path("data/Mauna_Loa/co2_mm_mlo.txt") +if not fname.exists(): + # Data file not present, let's download it + print("Downloading remote file") + urlretrieve(f"{address:s}/{fname.name:s}", fname.as_posix()) + print(f"Remote file downloaded to {fname.name:s}") +else: + print(f"{fname.name:s} already present, no need to download again") +``` + + co2_mm_mlo.txt already present, no need to download again + + +### Exploring the data + +We can have a peek at the text file. We note that most of the first few lines are "comments" (lines start by `#`), which describe useful *metadata*. We note that we have several columns of data: + +1. The year +2. The month +3. The decimal date +4. The monthly mean CO2 mole fraction determined from daily averages + +We will mostly be bothered about columns three and four. + + +We can peek at the data (first 73 lines) using the UNIX shell [`head`](http://www.linfo.org/head.html) command (this will not work on Windows, but will probably work on OSX): + + +```python +!head -n 73 data/Mauna_Loa/co2_mm_mlo.txt +``` + + # -------------------------------------------------------------------- + # USE OF NOAA ESRL DATA + # + # These data are made freely available to the public and the + # scientific community in the belief that their wide dissemination + # will lead to greater understanding and new scientific insights. + # The availability of these data does not constitute publication + # of the data. NOAA relies on the ethics and integrity of the user to + # ensure that GML receives fair credit for their work. 
If the data + # are obtained for potential use in a publication or presentation, + # GML should be informed at the outset of the nature of this work. + # If the GML data are essential to the work, or if an important + # result or conclusion depends on the GML data, co-authorship + # may be appropriate. This should be discussed at an early stage in + # the work. Manuscripts using the GML data should be sent to GML + # for review before they are submitted for publication so we can + # ensure that the quality and limitations of the data are accurately + # represented. + # + # Contact: Pieter Tans (303 497 6678; pieter.tans@noaa.gov) + # + # File Creation: Wed Aug 5 09:08:30 2020 + # + # RECIPROCITY + # + # Use of these data implies an agreement to reciprocate. + # Laboratories making similar measurements agree to make their + # own data available to the general public and to the scientific + # community in an equally complete and easily accessible form. + # Modelers are encouraged to make available to the community, + # upon request, their own tools used in the interpretation + # of the GML data, namely well documented model code, transport + # fields, and additional information necessary for other + # scientists to repeat the work and to run modified versions. + # Model availability includes collaborative support for new + # users of the models. + # -------------------------------------------------------------------- + # + # + # See www.esrl.noaa.gov/gmd/ccgg/trends/ for additional details. + # + # Data from March 1958 through April 1974 have been obtained by C. David Keeling + # of the Scripps Institution of Oceanography (SIO) and were obtained from the + # Scripps website (scrippsco2.ucsd.edu). + # + # The "average" column contains the monthly mean CO2 mole fraction determined + # from daily averages. 
The mole fraction of CO2, expressed as parts per million + # (ppm) is the number of molecules of CO2 in every one million molecules of dried + # air (water vapor removed). If there are missing days concentrated either early + # or late in the month, the monthly mean is corrected to the middle of the month + # using the average seasonal cycle. Missing months are denoted by -99.99. + # The "interpolated" column includes average values from the preceding column + # and interpolated values where data are missing. Interpolated values are + # computed in two steps. First, we compute for each month the average seasonal + # cycle in a 7-year window around each monthly value. In this way the seasonal + # cycle is allowed to change slowly over time. We then determine the "trend" + # value for each month by removing the seasonal cycle; this result is shown in + # the "trend" column. Trend values are linearly interpolated for missing months. + # The interpolated monthly mean is then the sum of the average seasonal cycle + # value and the trend value for the missing month. + # + # NOTE: In general, the data presented for the last year are subject to change, + # depending on recalibration of the reference gas mixtures used, and other quality + # control procedures. Occasionally, earlier years may also be changed for the same + # reasons. Usually these changes are minor. + # + # CO2 expressed as a mole fraction in dry air, micromol/mol, abbreviated as ppm + # + # (-99.99 missing data; -1 no data for #daily means in month) + # + # decimal average interpolated trend #days + # date (season corr) + 1958 3 1958.208 315.71 315.71 314.62 -1 + + +## Loading the data into Python + +This is quite straightforward using `pd.read_csv` from pandas ([`np.loadtxt`](https://scipython.com/book/chapter-6-numpy/examples/using-numpys-loadtxt-method/) from numpy is an alternative)... + +We will also "mask" missing data by checking for the value -99.99 (and -1 for the number of days)... 
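A minimal sketch of the masking idea, assuming a small made-up array (the numbers are illustrative, not real Mauna Loa values): the sentinel value is replaced by `NaN`, so that `nan`-aware statistics and plotting skip the missing month. Passing `na_values` to pandas when reading the file has the same effect.

```python
import numpy as np

# made-up monthly means; -99.99 flags a missing month
average = np.array([369.0, -99.99, 370.2, 371.0])

# replace the sentinel by NaN so it is ignored later on
masked = np.where(average == -99.99, np.nan, average)

n_missing = int(np.isnan(masked).sum())
mean_valid = np.nanmean(masked)
print(f"{n_missing} missing value(s), mean of the rest: {mean_valid:.3f}")
```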
+ + +```python +hdr = [ + "year", "month", "decimal_date", "average", "interpolated", "trend", "days" +] +co2 = pd.read_csv( + fname, + comment='#', + delim_whitespace=True, + names=hdr, + na_values=[-99.99, -1]) + +plt.figure(figsize=(12, 7)) +plt.plot(co2.decimal_date, co2.interpolated, '-', lw=2, label="Interpolated") +plt.plot(co2.decimal_date, co2.average, '-', lw=1, label="Average") +plt.plot(co2.decimal_date, co2.trend, '-', lw=1, label="Trend") +plt.xlabel("Time") +plt.ylabel("CO2 conc.") +plt.legend(loc="best") +``` + + + + + + + + + +![png](050_Linear_models_files/050_Linear_models_8_1.png) + + +So this is quite similar to what we had above. There's an average line, an interpolated line, as well as some smoothed trend line. We're interested in the interpolated line. + +## A model for $CO_2$ concentration + +### A linear trend model + +We might be curious about a simple model for $CO_2$ concentration. Perhaps the simplest model is a linear trend, which we can write as the concentration at some time step $i$, $W_i$ being just a linear scaling of the time $t_i$: + +$$ +W_i = m \cdot t_i + c. +$$ + +We can define a Python function for this very easily: + + +```python +def linear_model(p, t): + m, c = p + return m * t + c +``` + +We can now try to plot some model trajectories and the data by supplying parameters for the slope ($m$) and intercept ($c$). Let's start by assuming that the slope can be approximated by the difference between minimum and maximum concentrations divided by the number of timesteps: + +$$ +m \approx \frac{411-310}{728} +$$ + +$c$ is the minimum value, so $c\approx 310$. 
+ + +```python +n_times = co2.interpolated.shape[0] +max_co2 = co2.interpolated.max() +min_co2 = co2.interpolated.min() +print(f"There are {n_times:d} steps in the data") +print(f"Maximum CO2 concentration {max_co2:f}") +print(f"Minimum CO2 concentration {min_co2:f}") +x = np.arange(n_times) +fig, axs = plt.subplots(nrows=2, ncols=1, figsize=(12, 7)) +axs[0].plot(x, co2.interpolated, '-', label="Measured") +m = (403. - 305.) / 716 +c = 305. +axs[0].plot(x, linear_model([m, c], x), '--', label="Modelled") +axs[0].legend(loc="best") +axs[1].plot(x, linear_model([m, c], x) - co2.interpolated, 'o', mfc="none") +axs[1].axhline(y=0, lw=2, c="0.8") +S = np.sum((linear_model([m, c], x) - co2.interpolated)**2) +print("Sum of squared residuals: {}".format(S)) +``` + + There are 749 steps in the data + Maximum CO2 concentration 417.070000 + Minimum CO2 concentration 312.660000 + Sum of squared residuals: 15956.69223949002 + + + +![png](050_Linear_models_files/050_Linear_models_12_1.png) + + +So, not really a great fit... The overall shape is a bit off, and the model isn't really fitting the annual seasonality in the curve. The residuals plot tells us that the residuals aren't really noise around zero: they show a very clear trend, suggesting that **the model is too simple to fit the data**. + +### A quadratic model + +Maybe we need a higher order model, like a quadratic model: + +$$ +W_i = a_0 \cdot t_i^2 + a_1 \cdot t_i + a_2. +$$ + +In this case, it is a bit harder to eyeball what good starting parameters for $\left[a_0, a_1, a_2\right]$ would be. A strategy for this would be to consider what a good fit would look like, and then use this to define a metric of good fit. A good fit would basically overlap the measurements, being indistinguishable from them. The *residual* is the difference between the measurement and the model. 
In this case, it can be positive or negative (whether the model over- or undershoots the observations), but by squaring the residual we get rid of the sign. Then we can add up all the squared residuals, and the best fit will be the one that has the lowest sum of squares. This is in essence the [method of least squares](https://en.wikipedia.org/wiki/Least_squares). Let's see how this works *intuitively*: we'll loop over the parameters and plot the different predicted concentrations... First we need our model function... + + +```python +def quadratic_model(p, t): + a0, a1, a2 = p + return a0 * t**2 + a1 * t + a2 +``` + +We can get a feeling for what the parameters might be just by reading off some points from the graph by eye, and then solving the system manually: + +$$ +\begin{aligned} +403 &= a_0\cdot (728)^2 + a_1\cdot (728) + a_2\\ +340 &= a_0\cdot (300)^2 + a_1\cdot (300) + a_2\\ +315 &= a_0\cdot (0)^2 + a_1\cdot (0) + a_2\\ +\end{aligned} +$$ + +From this, we can get some rough estimates, which in this case are + +$$ +\begin{aligned} +a_0 &= 9.5\cdot 10^{-5}\\ +a_1 &= 5.48\cdot 10^{-2}\\ +a_2 &= 315.\\ +\end{aligned} +$$ + +We can basically just run the model around these numbers and plot the different model predictions with a loop over $a_0$ and another one over $a_1$ (assuming $a_2$ is well defined) + + +```python +plt.figure(figsize=(12, 4)) + +a2 = 315. +for a0 in np.linspace(1e-5, 20e-5, 10): + for a1 in np.linspace(1e-2, 10e-2, 10): + plt.plot(x, quadratic_model([a0, a1, a2], x), '-', lw=0.5, c="0.8") + +plt.plot(x, co2.interpolated, '-', label="Measured") +``` + + + + + [] + + + + +![png](050_Linear_models_files/050_Linear_models_16_1.png) + + +This is quite complicated: we can see that there might be a good line of fit, but we can't clearly see which parameters provide it! We can store the goodness of fit metric (sum of squared residuals) in a 2D array and then plot it as an image. It should be more obvious where the minimum lies... 
+ + +```python +# Define a 2D array for the sum of squares (sos) +sos = np.zeros((10, 20)) +# the time axis redefined again, in case it got confused with something else +x = np.arange(n_times) + +# first loop is over a0, 20 steps between 1e-5 and 20e-5 +for ii, a0 in enumerate(np.linspace(1e-5, 20e-5, 20)): + # 2nd loop is over a1, 10 steps between 1e-2 and 10e-2 + for jj, a1 in enumerate(np.linspace(1e-2, 10e-2, 10)): + # for the current values of a0 and a1, calculate the residual + residual = quadratic_model([a0, a1, a2], x) - co2.interpolated + sq_residual = residual * residual + sum_of_residuals = sq_residual.sum() + # Store the sum_of_residuals into our array + sos[jj, ii] = np.sum( + (quadratic_model([a0, a1, a2], x) - co2.interpolated)**2) + +# Plotting! +plt.figure(figsize=(15, 5)) +# Set up the x and y axis for the plot +yy = np.linspace(1e-5, 20e-5, 20) +xx = np.linspace(1e-5, 10e-2, 10) +# Do a contour plot. The logspace bit basically defines the location +# of 20 contour lines +c = plt.contourf(yy, xx, sos, np.logspace(3, 5, 20), cmap=plt.cm.magma_r) +# Colorbar +plt.colorbar() +# Now, just plot the rough guess of a0 and a1 into this plot +# We want to plot an empty circle with a green edge +plt.plot(9.51242659e-05, 5.47960536e-02, 'o', mfc="None", mec="g") +``` + + + + + [] + + + + +![png](050_Linear_models_files/050_Linear_models_18_1.png) + + +So that's pretty interesting, we get a very clear "valley", with a minimum pretty close to where our first rough guess is... The shape is quite interesting: if we start at the first guess point, and move along the $x-$ or $y-$ axes, we quickly go into areas of large error. However, if we move along the diagonal line, we will be in the "trough" of the cost function, provided that when you move "up" (positive $a_0$), you also move "left" (negative $a_1$), or if you move "down" (negative $a_0$), you also move "right" (positive $a_1$). 
Basically, the cost function does not change if you can get the two parameters to co-operate and compensate the effect of each other. + + +Let's find out where the actual minimum from our brute-force approach is. We can do this quickly by creating a mask where all the elements are `False` except where the minimum value of `sos` is located. We can then use this mask to multiply our `x` and `y` axes and just select the unique values that are larger than 0. + + +```python +print(f"Best SoS: {sos.min():g}") +sos_mask = sos == sos.min() +u1 = np.unique(yy[None, :] * sos_mask) +yy_opt = u1[u1 > 0] +u2 = np.unique(xx[:, None] * sos_mask) +xx_opt = u2[u2 > 0] +``` + + Best SoS: 4374.68 + + +The Sum of Squares of the first example was around 15000, so we've improved our modelling by adding an extra (quadratic term). This is usually the case: you can improve your goodness of fit by adding extra terms, but usually at the cost of *specialising* your model too much to the training data. This will usually result in poor predictive abilities for the model outside the training region. Which isn't cool. + + +We can plot now the cost function, as well as our first rough guess and the final guess: + + +```python +# Plotting! +plt.figure(figsize=(15, 5)) +# Set up the x and y axis for the plot +yy = np.linspace(1e-5, 20e-5, 20) +xx = np.linspace(1e-5, 10e-2, 10) +# Do a contour plot. The logspace bit basically defines the location +# of 20 contour lines +c = plt.contourf(yy, xx, sos, np.logspace(3, 5, 20), cmap=plt.cm.magma_r) +# Colorbar +plt.colorbar() +# Now, just plot the rough guess of a0 and a1 into this plot +# We want to plot an empty circle with a green edge +plt.plot( + 9.51242659e-05, + 5.47960536e-02, + 'o', + mfc="None", + mec="g", + label="Rough guess") +plt.plot(yy_opt, xx_opt, 'o', mfc="None", mec="r", label="Brute force guess") +plt.legend(loc="best") +``` + + + + + + + + + +![png](050_Linear_models_files/050_Linear_models_22_1.png) + + +That's not *too bad*! 
But although we found a minimum, we haven't shown how well our model really fits the observations! Let's plot the prediction (with the "optimised parameters" as well as the roughly guessed ones): + + +```python +fig, axs = plt.subplots(nrows=2, ncols=1, figsize=(15, 4)) + +a2 = 315. +axs[0].plot( + x, + quadratic_model([9.51242659e-05, 5.47960536e-02, a2], x), + '-', + label="Rough guess") +axs[0].plot( + x, quadratic_model([yy_opt, xx_opt, a2], x), '-', label="Brute force") + +axs[0].plot(x, co2.interpolated, '-', label="Measured") +axs[0].legend(loc="best") + +axs[1].plot( + x, + co2.interpolated - quadratic_model([yy_opt, xx_opt, a2], x), + 's-', + lw=0.8, + mfc="none", + mec="0.9") +axs[1].axhline(0, color="0.7") +``` + + + + + + + + + +![png](050_Linear_models_files/050_Linear_models_24_1.png) + + +So solving by brute force with a quadratic appears to have worked better than fitting with our linear model. The residuals now mostly lie in the -5 to 5 units range, whereas the linear model had residuals ranging between about -12 and 12. It is also clear that we're missing out on the seasonality, and some rates of growth (particularly at the end) seem to be underemphasised. + +## Solving the problem using linear algebra + +So we can see that our brute force search has given us a better fit than eyeballing it, which is what one might expect. It should be possible to solve this analytically. Let's write this as a matrix problem: + +$$ +\begin{aligned} +\mathbf{A}\cdot\vec{x}&=\vec{y}\\ +\mathbf{A}&=\begin{bmatrix} +t_1^2 & t_1 & 1 \\ +t_2^2 & t_2 & 1 \\ +t_3^2 & t_3 & 1 \\ +\vdots & \vdots & \vdots \\ +t_N^{2} & t_N & 1 \\\end{bmatrix}\\ +\vec{x} &=\begin{bmatrix}a_0\\a_1\\a_2 \end{bmatrix}\\ +\vec{y} &=\begin{bmatrix}W_1\\W_2\\W_3\\ \vdots \\W_N \end{bmatrix}\\ +\end{aligned} +$$ + + +
    +Spend some time satisfying yourself that you understand how the previous matrices and vectors work together. +
    + +So, we see that this is really an overdetermined linear problem, where we've got more observations ($N$) than parameters (3). We can solve this by calculating the pseudo inverse: + +$$ +\vec{x} = \left[\mathbf{A}^{\top}\mathbf{A} \right]^{-1}\mathbf{A}^{\top}\vec{y}, +$$ +where $^{\top}$ is the **transpose**, and $^{-1}$ is the inverse matrix. We can solve this problem easily in Python, which can deal with linear algebra nicely. The [`np.linalg.lstsq`](https://docs.scipy.org/doc/numpy-1.13.0/reference/generated/numpy.linalg.lstsq.html) function solves the problem directly, or you can work it out by calculating the inverse matrix yourself. The latter approach is usually less numerically stable, so we won't be looking into it. + +### Solution using `np.linalg.lstsq` + +In this case, we need to define the matrix $\mathbf{A}$. The observations vector $\vec{y}$ is already defined. What is needed is to weed out the invalid measurements in both $\mathbf{A}$ and $\vec{y}$. We then use [`np.linalg.lstsq`](https://docs.scipy.org/doc/numpy-1.13.0/reference/generated/numpy.linalg.lstsq.html) to solve the linear overdetermined system. This returns a number of things: + +1. The solution vector. +2. The sum of squared residuals. +3. The rank of the matrix $\mathbf{A}$. +4. The singular values of $\mathbf{A}$. + +We're really only interested in the first two (the other two outputs are important, but this is not your methods course!). 
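As a minimal sketch of what these four return values look like, the following uses a small synthetic quadratic series. The time axis, coefficients and noise level are made up for illustration; this is not the Mauna Loa data. It also compares the result with the pseudo-inverse computed by `np.linalg.pinv`, which should agree to within rounding.

```python
import numpy as np

# Hypothetical synthetic data: 10 years of monthly samples, time in years
rng = np.random.default_rng(42)
t = np.arange(120) / 12.0
true_p = np.array([0.013, 0.8, 315.0])  # made-up a0, a1, a2
y = true_p[0] * t**2 + true_p[1] * t + true_p[2] + rng.normal(0.0, 0.1, t.size)

# Design matrix: one row per observation, columns t^2, t and 1
A = np.column_stack([t**2, t, np.ones_like(t)])

# lstsq returns: solution, sum of squared residuals, rank, singular values
x_lstsq, sum_sq, rank, sing_vals = np.linalg.lstsq(A, y, rcond=None)

# Same solution via the Moore-Penrose pseudo-inverse
x_pinv = np.linalg.pinv(A) @ y

print("lstsq solution: ", x_lstsq)
print("pinv solution:  ", x_pinv)
print("sum of squares: ", sum_sq[0])
print("rank:           ", rank)
print("singular values:", sing_vals)
```

Here `A` is built with observations as rows, so it can be passed to `lstsq` without a transpose. Passing `rcond=None` selects the newer default cut-off for small singular values.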
+ + +```python +# We create the A matrix +x = np.arange(n_times) +A = np.array([x**2, x, np.ones_like(x)]) +# Now put the observations into y +y = co2.interpolated + +# Call lstsq +xopt, sum_of_residuals, r, evals = np.linalg.lstsq(A.T, y) +rough_guess = [9.51242659e-05, 5.47960536e-02, 315] +brute_force = [yy_opt, xx_opt, 315] +print("Parameter Matrix Brute force Rough guess") +for par in range(3): + print("a{}: {:08.5e}\t {:08.5e}\t {:08.5e}".format( + par, xopt[par], float(brute_force[par]), rough_guess[par])) +print("Sum of residuals: {:g}".format(float(sum_of_residuals))) +``` + + Parameter Matrix Brute force Rough guess + a0: 9.04677e-05 8.00000e-05 9.51243e-05 + a1: 6.38326e-02 6.66700e-02 5.47961e-02 + a2: 3.14601e+02 3.15000e+02 3.15000e+02 + Sum of residuals: 3710.77 + + + /Users/plewis/anaconda3/envs/geog0111/lib/python3.7/site-packages/ipykernel_launcher.py:8: FutureWarning: `rcond` parameter will change to the default of machine precision times ``max(M, N)`` where M and N are the input matrix dimensions. + To use the future default and silence this warning we advise to pass `rcond=None`, to keep using the old, explicitly pass `rcond=-1`. + + + +The parameters we got from the linear solver are very similar to those from the brute force method. If we had used a finer grid in the brute force search, we'd get even closer, but at the price of increasing the number of model evaluations. We can also see that using the analytic least squares solution results in the actual minimum of the cost function, not a value close to it. + +In the linear algebra case, the procedure is very simple, and provided the matrix $\mathbf{A}^{\top}\mathbf{A}$ is invertible, one is mostly guaranteed a good solution. + +As usual, let's plot the model, data and residuals and see what we can spot... + + +```python +fig, axs = plt.subplots(nrows=2, ncols=1, figsize=(15, 4)) +x = np.arange(n_times) +a2 = 315. 
+ +axs[0].plot(x, quadratic_model(rough_guess, x), '-', label="Rough guess") +axs[0].plot( + x, quadratic_model([yy_opt, xx_opt, a2], x), '-', label="Brute force") +axs[0].plot(x, quadratic_model(xopt, x), '-', label="Linear least squares") +axs[0].plot(x, co2.interpolated, '-', lw=0.6, label="Measured") +axs[0].legend(loc="best") + +axs[1].plot( + x, + co2.interpolated - quadratic_model(xopt, x), + 's-', + lw=0.8, + mfc="none", + mec="0.9") +axs[1].axhline(0, color="0.7") +``` + + + + + + + + + +![png](050_Linear_models_files/050_Linear_models_29_1.png) + + +So we can see that the optimal value is quite similar to the other two solutions, but results in a better fit (a sum of squared residuals of about 3711 versus 4375 for the brute force search). We can see that with this method we can solve for all three parameters, even though our first guess of 315 for $a_2$ was pretty close to the true solution. + +
    +Try to use the linear least squares method to fit the first order linear model that we fitted "by eye" at the start of the notebook. +
    + +## A model with seasonality + +While the quadratic model appears to go through the centre of the Mauna Loa curve, it clearly misses an important feature: the seasonality of the $CO_2$ concentration. We can't really emulate that behaviour with a simple quadratic function, but need a different model, one that deals with the seasonality. We can think that the seasonality is an additive cosine term, so that our model for $CO_2$ concentration is now + +$$ +W_i = a_0\cdot t_i^2 + a_1\cdot t_i + a_2 + a_3\cdot \cos\left(2\pi\frac{t_i}{T} \right), +$$ +where $T$ is the period of the seasonality, in this case, annual so $T=12$. + +Although the model looks quite ugly, we see that we can write it like a sum (or a *linear combination*) of some functions ($t^2,\,t,$, the cosine term) weighted by the model parameters $a_0, \cdots, a_3$. So this is a linear model like the ones we've seen before and with which you should be familiar. + +In this case, the $\mathbf{A}$ matrix is now given by + +$$ +\mathbf{A}=\begin{bmatrix} +t_1^2 & t_1 & 1 & \cos \left( 2\pi\frac{t_1}{T}\right)\\ +t_2^2 & t_2 & 1 & \cos \left( 2\pi\frac{t_2}{T}\right)\\ +t_3^2 & t_3 & 1 & \cos \left( 2\pi\frac{t_3}{T}\right)\\ +\vdots & \vdots & \vdots & \vdots \\ +t_N^2 & t_N & 1 & \cos \left( 2\pi\frac{t_N}{T}\right)\\ +\end{bmatrix}. +$$ + +We can still solve the problem by making use of `lstsq`. Let's see how that works! + + +```python +def quadratic_with_season(p, t, period=12.): + a0, a1, a2, a3 = p + return a0 * t * t + a1 * t + a2 + a3 * np.cos(2 * np.pi * (t / period)) + + +period = 12. 
+# We create the A matrix +x = np.arange(n_times) +A = np.array([x * x, x, np.ones_like(x), + np.cos(2 * np.pi * (x / period))]) +# Now put the observations into y +y = co2.interpolated + +# Call lstsq +xopt, sum_of_residuals, r, evals = np.linalg.lstsq(A.T, y) +for par in range(4): + print("a{}: {:08.5e}".format(par, xopt[par])) +print(f"Sum of squares: {float(sum_of_residuals):g}") +``` + + a0: 9.01601e-05 + a1: 6.40418e-02 + a2: 3.14574e+02 + a3: 2.29951e+00 + Sum of squares: 1731.99 + + + /Users/plewis/anaconda3/envs/geog0111/lib/python3.7/site-packages/ipykernel_launcher.py:15: FutureWarning: `rcond` parameter will change to the default of machine precision times ``max(M, N)`` where M and N are the input matrix dimensions. + To use the future default and silence this warning we advise to pass `rcond=None`, to keep using the old, explicitly pass `rcond=-1`. + from ipykernel import kernelapp as app + + +Let's do some plots of the function fitting and residuals, and compare to previous results... + + +
    +Producing these sorts of plots should be second nature to you by now. So do them! +
    + + + +```python +fig, axs = plt.subplots(nrows=2, ncols=1, figsize=(15, 4)) +x = np.arange(n_times) +a2 = 315. + +axs[0].plot(x, quadratic_with_season(xopt, x), '-', label="Linear least squares") +axs[0].plot(x, co2.interpolated, '-', lw=0.6, label="Measured") +axs[0].legend(loc="best") + +axs[1].plot( + x, + co2.interpolated - quadratic_with_season(xopt, x), + 's-', + lw=0.8, + mfc="none", + mec="0.9") +axs[1].axhline(0, color="0.7") +``` + + + + + + + + + +![png](050_Linear_models_files/050_Linear_models_34_1.png) + + +So that's pretty good: by adding a simple cosine term, we can now start to model the annual seasonality in the measurements, and the sum of squared residuals is now further shrunk to around 1730. This is good, but in some ways unsurprising: you're now solving for 4 parameters, rather than 3 or 2 (for the simple linear case), so you have more degrees of freedom, and you expect to be able to fit your data better. + +## A phase shift + +Looking at the residuals, we might decide that there's some mileage in shifting the cosine term a bit to get a better fit. We could do this by adding a phase shift so that the cosine term would look like + +$$ +\cos\left[ \frac{2\pi}{T}(t+\phi)\right] +$$ + +However, it'd be hard to guess $\phi$ (we've effectively assumed it was 0 above!). So we'd need to use some non-linear solving approach. However, we might exploit the following trigonometrical identity: + +$$ +A\cos(\theta) + B\sin(\theta)=C\sin(\theta + \phi). +$$ + +
    +Can you prove the above identity? +
    + + +This means that we can just add (drumroll...) yet another term to our model (a sine term), and the ratio of the cosine and sine terms will result in a phase shift. As we're adding another term, we expect a better result, but in this case, we hope that the aim of adding this extra term is to have **uncorrelated residuals around 0**. + +
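The following is a quick numerical check (not a proof) of how a cosine and a sine coefficient combine into a single amplitude and phase. It assumes the identity above with $C=\sqrt{A^2+B^2}$ and $\phi=\operatorname{atan2}(A, B)$. The coefficient values are made up for illustration, and `amplitude_phase` is a hypothetical helper name, not something defined elsewhere in these notes.

```python
import numpy as np


def amplitude_phase(a_cos, a_sin):
    """Return (C, phi) such that
    a_cos*cos(theta) + a_sin*sin(theta) == C*sin(theta + phi)."""
    amplitude = np.hypot(a_cos, a_sin)
    phase = np.arctan2(a_cos, a_sin)
    return amplitude, phase


# Made-up coefficients, for illustration only
a_cos, a_sin = 2.0, 1.5
period = 12.0  # months, as in the seasonal model

C, phi = amplitude_phase(a_cos, a_sin)

# Compare the two-term and one-term forms over four annual cycles
t = np.arange(0, 48)
theta = 2 * np.pi * t / period
two_terms = a_cos * np.cos(theta) + a_sin * np.sin(theta)
one_term = C * np.sin(theta + phi)

print(f"amplitude C = {C:g}")
print(f"phase phi = {phi:g} rad = {phi * period / (2 * np.pi):g} months")
print("identity holds numerically:", np.allclose(two_terms, one_term))
```

The point of the sketch is that the phase never has to be guessed: it falls out of the two linear coefficients once they have been fitted.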
    +You should be able to do this yourself, including model fitting and plotting. +
    + + + +```python +def quadratic_with_season_shift(p, t, period=12.): + a0, a1, a2, a3, a4 = p + return a0 * t * t + a1 * t + a2 + \ + a3 * np.cos(2 * np.pi * (t / period)) + \ + a4 * np.sin(2 * np.pi * (t / period)) + + + +period = 12. +# We create the A matrix +x = np.arange(n_times) +A = np.array([x * x, x, np.ones_like(x), + np.cos(2 * np.pi * (x / period)), + np.sin(2 * np.pi * (x / period))]) +# Now put the observations into y +y = co2.interpolated + +# Call lstsq +xopt, sum_of_residuals, r, evals = np.linalg.lstsq(A.T, y) +for par in range(5): + print("a{}: {:08.5e}".format(par, xopt[par])) +print(f"Sum of squares: {float(sum_of_residuals):g}") +``` + + a0: 8.97749e-05 + a1: 6.43387e-02 + a2: 3.14528e+02 + a3: 2.29780e+00 + a4: 1.66387e+00 + Sum of squares: 694.742 + + + /Users/plewis/anaconda3/envs/geog0111/lib/python3.7/site-packages/ipykernel_launcher.py:19: FutureWarning: `rcond` parameter will change to the default of machine precision times ``max(M, N)`` where M and N are the input matrix dimensions. + To use the future default and silence this warning we advise to pass `rcond=None`, to keep using the old, explicitly pass `rcond=-1`. + + +The sum of squared residuals is now around 695, again an improvement on the previous solution. We can see the fit and residuals I got (yours should be similar) here + +![quadratic with seasonal shift](images/quadratic_season_shift.png) + +We're now within the +/- 2 units band, which is a reasonable estimate. + +## Prediction + +A model isn't very good if you don't challenge it to predict phenomena outside the training range. We could just extend the $x$-axis further left or right and see what the model predicts, but we'd be **extrapolating**. + +
    +Given the simplicity of the model, and what you know about $CO_2$ dynamics over the past ~100 years, would you trust these extrapolations? +
    + +We can sort of mimic this behaviour by fitting the model only to a subset of years, and then testing it on the rest of the available time series. For example, fit for the first 20 years, and then forecast the remaining years. Or fit the last 20 years and forecast the previous years back to the 1950s... If the model is successful in its predictions, then we can say that the model is probably OK, but if the quality of the predictions is poor, then we need to start thinking about **discarding** the model, and looking for alternatives! + + + +
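The fit-then-forecast idea can be sketched as follows, on a synthetic series rather than the Mauna Loa record. All coefficients, the noise level and the split point are assumptions made for illustration, and `design_matrix` is a hypothetical helper name.

```python
import numpy as np


def design_matrix(t, period=12.0):
    """Columns: t^2, t, 1 and an annual cosine (observations as rows)."""
    return np.column_stack(
        [t**2, t, np.ones_like(t), np.cos(2 * np.pi * t / period)])


# Hypothetical synthetic series: 20 years of monthly values
rng = np.random.default_rng(0)
t = np.arange(240, dtype=float)
y = (2e-4 * t**2 + 0.05 * t + 315.0
     + 2.0 * np.cos(2 * np.pi * t / 12.0)
     + rng.normal(0.0, 0.3, t.size))

# Fit on the first 10 years only
n_train = 120
p, *_ = np.linalg.lstsq(design_matrix(t[:n_train]), y[:n_train], rcond=None)

# Predict the whole series, then score each part separately
prediction = design_matrix(t) @ p
residual = y - prediction
rmse_train = np.sqrt(np.mean(residual[:n_train]**2))
rmse_test = np.sqrt(np.mean(residual[n_train:]**2))
print(f"RMSE on training period: {rmse_train:.3f}")
print(f"RMSE on held-out period: {rmse_test:.3f}")
```

Comparing the two error figures is one simple way to judge the forecast: a held-out error much larger than the training error suggests the model does not generalise beyond the period it was fitted to.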
    +Fit the model to the first 30 years of data, and then use it to predict the complete time series. Your result should look something like below. Can you explain what's going on there? +
    + +![Extrapolation plot](images/extrapolation.png) + +## Uncertainty + +We have not said anything about how the model predictions are *uncertain*: we only used a limited dataset, with measurements errors associated to it. Even within the training period, the residuals are not 0, so we can expect that the model has some bits of reality missing from it (it *is* a model, after all!). Uncertainty would allow us to quantify how good or bad the predictions from the model are, but so far, we have ignored it... + +
    +In the 30 year training experiment, can you sketch what you think the uncertainty should look like? +
    + + + +```python +def quadratic_with_season_shift(p, t, period=12.): + a0, a1, a2, a3, a4 = p + return a0 * t * t + a1 * t + a2 + \ + a3 * np.cos(2 * np.pi * (t / period)) + \ + a4 * np.sin(2 * np.pi * (t / period)) + + + +period = 12. +# We create the A matrix +x = np.arange(n_times)[:12*30] +A = np.array([x * x, x, np.ones_like(x), + np.cos(2 * np.pi * (x / period)), + np.sin(2 * np.pi * (x / period))]) +# Now put the observations into y +y = co2.interpolated[:12*30] + +# Call lstsq +xopt, sum_of_residuals, r, evals = np.linalg.lstsq(A.T, y) +for par in range(5): + print("a{}: {:08.5e}".format(par, xopt[par])) +print(f"Sum of squares: {float(sum_of_residuals):g}") +``` + + a0: 1.37498e-04 + a1: 4.93748e-02 + a2: 3.15199e+02 + a3: 2.10181e+00 + a4: 1.70328e+00 + Sum of squares: 166.035 + + + /Users/plewis/anaconda3/envs/geog0111/lib/python3.7/site-packages/ipykernel_launcher.py:19: FutureWarning: `rcond` parameter will change to the default of machine precision times ``max(M, N)`` where M and N are the input matrix dimensions. + To use the future default and silence this warning we advise to pass `rcond=None`, to keep using the old, explicitly pass `rcond=-1`. + + + +```python + +``` diff --git a/docs/051_Modelling_and_optimisation 2.md b/docs/051_Modelling_and_optimisation 2.md new file mode 100644 index 00000000..c1039176 --- /dev/null +++ b/docs/051_Modelling_and_optimisation 2.md @@ -0,0 +1,287 @@ +# 5 Modelling and optimisation + +

    Table of Contents

    + + +## 5.1 Introduction + +In this section, we will build some models to describe environmental processes. We will then use observational data to calibrate and test these models. + + +## 5.2 Get datasets + +We first get the datasets we will need. + +These are the MODIS LAI and land cover data and associated ECMWF temperature data. Datasets are available in `npz` files that we have previously generated. + + +```python +# required general imports +import matplotlib.pyplot as plt +import cartopy.crs as ccrs +%matplotlib inline +import numpy as np +import sys +import os +from pathlib import Path +import gdal +from datetime import datetime, timedelta +from geog0111.geog_data import procure_dataset +``` + + +```python +# conditions +year = 2016 +country_code = 'UK' +``` + + +```python +''' +Load the prepared LAI data +''' +# read in the LAI data for given country code +lai_filename = f'data/lai_data_{year}_{country_code}.npz' +# get the dataset in case it's not here +procure_dataset(Path(lai_filename).name,verbose=False) + +lai_data = np.load(lai_filename) +print(lai_filename,list(lai_data.keys())) + +# unload for use +dates, lai, weights, interpolated_lai = lai_data['dates'],lai_data['lai'],\ + lai_data['weights'],lai_data['interpolated_lai'] +lai[weights==0.] = np.nan + +print(lai.shape) +``` + + data/lai_data_2016_UK.npz ['dates', 'lai', 'weights', 'interpolated_lai'] + (2624, 1396, 92) + + +Recall that land cover is interpreted as: + + +| Name | Value | Description | +|------|-------|-------------| +|Water Bodies|0|At least 60% of area is covered by permanent water bodies.| +|Grasslands|1|Dominated by herbaceous annuals (<2m) including cereal croplands.| +|Shrublands|2|Shrub (1-2m) cover >10%.| +|Broadleaf Croplands|3|Dominated by herbaceous annuals (<2m) that are cultivated with broadleaf crops.| +|Savannas|4|Between 10-60% tree cover (>2m).| +|Evergreen Broadleaf Forests|5|Dominated by evergreen broadleaf and palmate trees (>2m). 
Tree cover >60%.| +|Deciduous Broadleaf Forests|6|Dominated by deciduous broadleaf trees (>2m). Tree cover >60%.| +|Evergreen Needleleaf Forests|7|Dominated by evergreen conifer trees (>2m). Tree cover >60%.| +|Deciduous Needleleaf Forests|8|Dominated by deciduous needleleaf (larch) trees (>2m). Tree cover >60%.| +|Non-Vegetated Lands|9|At least 60% of area is non-vegetated barren (sand, rock, soil) or permanent snow and ice with less than 10% vegetation.| +|Urban and Built-up Lands|10|At least 30% impervious surface area including building materials, asphalt, and vehicles.| +|Unclassified|255|Has not received a map label because of missing inputs.| + + +```python +''' +Load the prepared landcover data +''' +# read in the LAI data for given country code +lc_filename = f'data/landcover_{year}_{country_code}.npz' +# get the dataset in case its not here +procure_dataset(Path(lc_filename).name,verbose=False) + +lc_data = np.load(lc_filename) +print(lc_filename,list(lc_data.keys())) + +# unload for use +LC_Type3, lc_data = lc_data['LC_Type3'],lc_data['lc_data'] + +from geog0111.plot_landcover import plot_land_cover +print(plot_land_cover(lc_data,year,country_code)) +print(lc_data.shape) +``` + + data/landcover_2016_UK.npz ['LC_Type3', 'lc_data'] + ['Water Bodies' 'Grasslands' 'Shrublands' 'Broadleaf Croplands' 'Savannas' + 'Evergreen Broadleaf Forests' 'Deciduous Broadleaf Forests' + 'Evergreen Needleleaf Forests' 'Deciduous Needleleaf Forests' + 'Non-Vegetated Lands' 'Urban and Built-up Lands'] + (2624, 1396) + + + +![png](051_Modelling_and_optimisation_files/051_Modelling_and_optimisation_7_1.png) + + + +```python + +''' +Load the prepared T 2m data +''' +t2_filename = f'data/europe_data_{year}_{country_code}.npz' +# get the dataset in case its not here +procure_dataset(Path(t2_filename).name,verbose=False) +t2data = np.load(t2_filename) +print(t2_filename,list(t2data.keys())) + +timer, temp2, extent = t2data['timer'], t2data['temp2'], t2data['extent'] 
+print(temp2.shape) +``` + + data/europe_data_2016_UK.npz ['timer', 'temp2', 'extent'] + (366, 2624, 1396) + + +Now let's plot the datasets: + + +```python +# visualise the interpolated dataset +import matplotlib.pylab as plt +import cartopy.crs as ccrs +%matplotlib inline + +plt.figure(figsize=(12,12)) +ax = plt.subplot ( 3, 2, 1 ,projection=ccrs.Sinusoidal.MODIS) +ax.coastlines('10m') +ax.set_title(f'T2m ECMWF dataset for {country_code}: {str(t2data["timer"][0])}') +im = ax.imshow(temp2[0],extent=extent) +plt.colorbar(im,shrink=0.75) + +ax = plt.subplot ( 3, 2, 3 ,projection=ccrs.Sinusoidal.MODIS) +ax.coastlines('10m') +ax.set_title(f'MODIS interpolated LAI {country_code}: {str(t2data["timer"][0])}') +im = plt.imshow(interpolated_lai[:,:,0],vmax=6,extent=extent) +plt.colorbar(im,shrink=0.75) + +ax = plt.subplot ( 3, 2, 5 ,projection=ccrs.Sinusoidal.MODIS) +ax.coastlines('10m') +ax.set_title(f'MODIS LAI {country_code}: {str(t2data["timer"][0])}') +im = plt.imshow(lai[:,:,0],vmax=6,extent=extent) +plt.colorbar(im,shrink=0.75) + + + +plt.subplot ( 3, 2, 2 ) +plt.title(f'mean T2m for {country_code}') +plt.plot(timer,np.nanmean(temp2,axis=(1,2))) +plt.ylabel('temperature 2m / C') +plt.subplot ( 3,2, 4 ) +plt.title(f'mean interpolated LAI for {country_code}') +mean = np.nanmean(interpolated_lai,axis=(0,1)) +plt.plot(timer[::4],mean) +plt.subplot ( 3,2, 6 ) +plt.title(f'mean LAI for {country_code}') +mean = np.nanmean(lai,axis=(0,1)) +plt.plot(timer[::4],mean) + + +``` + + /Users/plewis/anaconda/envs/geog0111/lib/python3.6/site-packages/ipykernel_launcher.py:37: RuntimeWarning: Mean of empty slice + + + + + + [] + + + + +![png](051_Modelling_and_optimisation_files/051_Modelling_and_optimisation_10_2.png) + + +## 5.3 Interpretation of the data + +We can see that the raw LAI temporal profile (bottom right plot) can be very noisy, even when averaged spatially. 
+ +The 'true' temporal profile is probably much better represented in the 'interpolated LAI' dataset, although this may be over-smoothed. + +From the interpolated dataset, we see that the LAI trajectory 'takes off' in the Spring (March/April), and 'falls' in the Autumn (October/November), which is the pattern we would expect of Western European vegetation. There is some evidence of multiple 'peaks' in the higher LAI values, which is suggestive of the signal being a compound of the behaviour of multiple vegetation types. + +The periods of rapid change in LAI correspond to when the mean (2m) temperature is around 10 C. + +Now let's look at a particular land cover type: grasslands. + + +```python +lc = 1 + +# need 2 versions of this as datasets +# have time stacked differently +flc_data1 = lc_data[...,np.newaxis] +flc_data2 = lc_data[np.newaxis,...] + +for d in [flc_data1,flc_data2]: + mask = d==lc + d[mask] = 1 + d[~mask] = 0 +``` + + +```python +''' +filter datasets by land cover +''' +interpolated_lai_ = interpolated_lai*flc_data1 +interpolated_lai_[interpolated_lai_==0] = np.nan +lai_ = lai*flc_data1 +lai_[lai_==0] = np.nan +temp2_ = temp2*flc_data2 +temp2_[temp2_==0] = np.nan + +plt.figure(figsize=(12,12)) +ax = plt.subplot ( 3, 2, 1 ,projection=ccrs.Sinusoidal.MODIS) +ax.coastlines('10m') +ax.set_title(f'T2m ECMWF dataset for {country_code}: {str(t2data["timer"][0])}') +im = ax.imshow((temp2_)[0],extent=extent) +plt.colorbar(im,shrink=0.75) + +ax = plt.subplot ( 3, 2, 3 ,projection=ccrs.Sinusoidal.MODIS) +ax.coastlines('10m') +ax.set_title(f'MODIS interpolated LAI {country_code}: {str(t2data["timer"][0])}') +im = plt.imshow(interpolated_lai_[:,:,0],vmax=6,extent=extent) +plt.colorbar(im,shrink=0.75) + +ax = plt.subplot ( 3, 2, 5 ,projection=ccrs.Sinusoidal.MODIS) +ax.coastlines('10m') +ax.set_title(f'MODIS LAI {country_code}: {str(t2data["timer"][0])}') +im = plt.imshow((lai_)[:,:,0],vmax=6,extent=extent) +plt.colorbar(im,shrink=0.75) + +plt.subplot ( 3, 2, 2 ) 
+plt.title(f'mean T2m for {country_code}') +plt.plot(timer,np.nanmean(temp2_,axis=(1,2))) +plt.ylabel('temperature 2m / C') +plt.subplot ( 3,2, 4 ) +plt.title(f'mean interpolated LAI for {country_code}') +mean = np.nanmean(interpolated_lai_,axis=(0,1)) +plt.plot(timer[::4],mean) +plt.subplot ( 3,2, 6 ) +plt.title(f'mean LAI for {country_code}') +mean = np.nanmean(lai_,axis=(0,1)) +plt.plot(timer[::4],mean) + + + +``` + + /Users/plewis/anaconda/envs/geog0111/lib/python3.6/site-packages/ipykernel_launcher.py:39: RuntimeWarning: Mean of empty slice + + + + + + [] + + + + +![png](051_Modelling_and_optimisation_files/051_Modelling_and_optimisation_13_2.png) + + + +```python + +``` diff --git a/docs/bin 2.md b/docs/bin 2.md new file mode 100644 index 00000000..8f9f66b0 --- /dev/null +++ b/docs/bin 2.md @@ -0,0 +1,134 @@ + +## Docker + +The Docker file is in [`Docker/Dockerfile`](copy/Dockerfile). This is a minimal setup for running notebooks for this course, being based on [`jgomezdans/uclgeog`](https://hub.docker.com/r/jgomezdans/uclgeog) which is derived from [`geog_docker`](https://github.com/jgomezdans/geog_docker). + +This docker is stored on dockerhub as [`ucleo/geog0111`](https://hub.docker.com/r/ucleo/geog0111) and derived from this repo [`UCL-EO/geog0111`](https://github.com/UCL-EO/geog0111). + +It is automatically run from [travis](https://travis-ci.com/github/UCL-EO/geog0111) on a new load, so you shouldn't need to generate the docker manually. + +The docker cleans out the environment for `uclgeog`, as we want one called `geog0111`, clones [this repository](https://github.com/UCL-EO/geog0111) and runs [`bin/setup.sh`](bin/setup.sh) to install the `geog0111` environment. It activates the environment and runs [`bin/postBuild`](bin/postBuild). + +### Clean up docker files + +The script [`bin/docker-killall`](bin/docker-killall) cleans up any cached dockers, kills running dockers, and cleans it all out.
Use this only to make a clean slate for your docker in the repo. +Otherwise, use more subtle `docker` commands. It is intended only for developers, but is run as the user. + +### Build docker + +The script [`bin/docker-build`](bin/docker-build) will build and upload the docker. This is again intended just for developers, and requires a login to dockerhub. + +### Run docker + +The script [`bin/docker-run`](bin/docker-run) can be used to run the docker and launch a notebook. It will attempt to map the `work` directory to the current directory or `${HOME}/OneDrive*`. It will try to re-use an existing docker image. + +Normal use then would be something like: + + cd /Users/plewis/work + ~/geog0111/bin/docker-run + + +The first time you use this, it will show something like: + + --> running bin/docker-run from /Users/plewis/work + --> mount /Users/plewis/work + --> bin/docker-run: no existing docker image found + --> bin/docker-run: running docker with /Users/plewis/work as /home/jovyan/notebooks/work + +So anything we save into `/home/jovyan/notebooks/work` in the notebook will go into `/Users/plewis/work`. + +The notebook will start with a message such as: + + Serving notebooks from local directory: /home/jovyan/geog0111/notebooks + 1 active kernel + The Jupyter Notebook is running at: + http://8fbac34af2bc:8888/?token=42897db02a8f1168ae2d7fb37aa2acdfcba51d47560a96dc + or http://127.0.0.1:8888/?token=42897db02a8f1168ae2d7fb37aa2acdfcba51d47560a96dc + +and you can access the notebooks from this address in a browser. + +To end the session, type `^C` (`CONTROL + c`) in the terminal you ran the command from.
+ +If you now type: + + docker ps -l + +You should see an existing image e.g.: + + docker ps -l + CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES + 8fbac34af2bc ucleo/geog0111 "tini -g -- start-no…" 28 minutes ago Exited (0) 2 minutes ago jolly_maxwell + +If you really want to delete that (start again with your notebooks), you can with: + + docker container rm jolly_maxwell + +but it should normally be fine to re-use. + +The next time you run this command, it should recognise an existing docker image and respond with something like: + + --> running bin/docker-run from /Users/plewis/work + --> mount /Users/plewis/work + --> bin/docker-run: using docker image jolly_maxwell + +## Scripts + +### `bin/setup.sh` + +[`bin/setup.sh`](bin/setup.sh) is the core setup script. It is run for example from [`Docker/Dockerfile`](copy/Dockerfile) +but may also be run on the repository. It should usually be run by the user and should work from any operating system. + + setup.sh [-r | --remove] [-f|--force] | [-n|--no_force] + +The main purpose of the script is to run the `conda` setup to make the conda environment `geog0111` from [Docker/environment.yml](copy/environment.yml). +It will detect if Windows is being used (so run `conda.bat`) and test to see if the environment `geog0111` already exists. If it does, it can be removed (`--remove`) or a forced install done (`--force`). Otherwise, it will try to update the environment from [Docker/environment.yml](copy/environment.yml). + +It generates a file [`~/.dockenvrc`](copy/dockenvrc) to be run on shell startup to activate the environment.
+ +After running this script, you should manually activate the environment: + + conda activate geog0111 + +### `bin/postBuild` + +[`bin/postBuild`](bin/postBuild) is run after [`bin/setup.sh`](bin/setup.sh) and does jobs such as setting up the jupyter notebook extensions, installing the `geog0111` package locally (using [`setup.py`](copy/setup.py)) and ensuring shell initialisation is properly done for subsequent sessions. It should be run by the user, and would normally be run after any new run of [`bin/setup.sh`](bin/setup.sh). + +### `bin/link-set.sh` + +[`bin/link-set.sh`](bin/link-set.sh) is the directory linking script. +It should usually be run by the user. + +Users may work in any of several directories, so we need to put in symbolic links +from common directories (`data` `$repo` `images`) in each of these to ensure +correct operation. This script does that: It goes into each of `notebooks` +`notebooks/work` `docs` `notebooks_lab` `notebooks_lab/work` `docs/work` and puts a +symbolic link to `data` `$repo` `images` in `..`. Since relative paths are used, +this is portable. + +In addition, it puts a link in from `~/$repo` to `$repo` for convenience (unless this +already exists). It also puts a link in from `$UCLDATA` to `data/ucl` (default `${HOME}/geog0111/work`) +so that a system-wide data directory can be put in, and referred to as `data/ucl` from +scripts. + +This is called by [`bin/setup.sh`](bin/setup.sh), but may be run independently to fix any broken links. + +### `bin/notebook-run.sh` + +This script runs all notebooks in [`notebooks`](notebooks) using `jupyter nbconvert`. The files are saved as `*.nbconvert.md`. The running is tolerant to errors. If the file is no different to the original, the original is kept. Otherwise you are prompted to choose whether to replace the original notebook with the one that has been executed.
You are provided with information on file sizes, which should help with this decision: you might not want to save a notebook that is (much) smaller than the original. Backups are stored in `backup.$$` which you need to manually clear. + +You might run this as a precursor to [`bin/notebook-mkdocs.sh`](bin/notebook-mkdocs.sh). + +### `bin/notebook-mkdocs.sh` + +This script [`bin/notebook-mkdocs.sh`](bin/notebook-mkdocs.sh) filters notebooks in [`notebooks`](notebooks) into [`notebooks_lab`](notebooks_lab) using [`geog0111/edit_notebook.py`](geog0111/edit_notebook.py). This filters out notebook extensions and other features, and makes the notebooks suitable for running in [JupyterLab](https://jupyterlab.readthedocs.io/en/stable/), rather than just `jupyter notebook`. It strips out exercises defined in `exercise2` cells, into new notebooks with the pattern `_answers.md`. + +The script takes the [`notebooks_lab`](notebooks_lab) notebooks, and converts to markdown in [`docs`](docs) and prepares the environment for the document generator `mkdocs` using [`geog0111/mkdocs_prep.py`](geog0111/mkdocs_prep.py). +It runs `mkdocs build` locally. The documents can be viewed with: + + mkdocs serve + +and/or uploaded to the document server (by the developer) using: + + mkdocs gh-deploy --force + + diff --git a/docs/index 2.md b/docs/index 2.md new file mode 100644 index 00000000..bb2eb53e --- /dev/null +++ b/docs/index 2.md @@ -0,0 +1,81 @@ + +![UCL](images/ucl_logo.png) + +# GEOG0111 + +UCL Geography MSc notes. + +| | | | | | +|---|---|---|---|---| +|Author: [Prof. P.
Lewis](mailto:p.lewis@ucl.ac.uk)|version 1.0.1|||| + +# GEOG0111 Scientific Computing + +[![Documentation Status](https://readthedocs.org/projects/geog0111-scientific-computing/badge/?version=latest)](https://geog0111-scientific-computing.readthedocs.io/en/latest/?badge=latest) + [![Binder](https://mybinder.org/badge_logo.svg)](https://mybinder.org/v2/gh/UCL-EO/geog0111/master) + [![Travis-CI](https://travis-ci.com/UCL-EO/geog0111.svg?branch=master)](https://travis-ci.com/github/UCL-EO/geog0111) + + +## Course information + +### Course Convenor and Contributing Staff + +[Prof P. Lewis](http://www.geog.ucl.ac.uk/~plewis) + +| | | | +|---|---|---| +|[Dr Qingling Wu](http://www.geog.ucl.ac.uk/about-the-department/people/research-staff/qingling-wu/)| [Dr. Jose Gomez-Dans](http://www.geog.ucl.ac.uk/about-the-department/people/research-staff/jose-gomez-dans/)|[Feng Yin](https://www.geog.ucl.ac.uk/people/research-students/feng-yin)| +### Purpose of this course + +This course, GEOG0111 Scientific Computing, is a term 1 MSc module worth 15 credits (25% of the term 1 credits) that aims to: + +* impart an understanding of scientific computing +* give students a grounding in the basic principles of algorithm development and program construction +* to introduce principles of computer-based image analysis and model development + +It is open to students from a number of MSc courses run by the Department of Geography UCL, but the material should be of wider value to others wishing to make use of scientific computing. 
+ +The module will cover: + +* Computing in Python +* Computing for image analysis +* Computing for environmental modelling +* Data visualisation for scientific applications + +### Learning Outcomes + +At the end of the module, students should: + +* have an understanding of the Python programming language and experience of its use +* have an understanding of algorithm development and be able to use widely used scientific computing software to manipulate datasets and accomplish analytical tasks +* have an understanding of the technical issues specific to image-based analysis, model implementation and scientific visualisation + + +## Timetable + +The course takes place over 10 weeks in term 1, usually in the Geography Department Unix Computing Lab (PB110) in the [Northwest wing](http://www.ucl.ac.uk/estates/roombooking/building-location/?id=003), UCL. + +Due to COVID-19 restrictions, it is being run online in the 2020-21 session. + +Classes take place from the second week of term to the final week of term, other than Reading week. See UCL [term dates](http://www.ucl.ac.uk/staff/term-dates) for further information. + +The timetable is available on the UCL Academic Calendar. Live class sessions will take place in groups on Monday and Wednesday mornings. + +### Assessment + +Assessment is through two pieces of coursework, submitted in both paper form and electronically via Moodle. + +See the [Moodle page](https://moodle.ucl.ac.uk/course/view.php?id=2796) for more details. + +### Useful links + +[Course Moodle page](https://moodle.ucl.ac.uk/course/view.php?id=2796) + +### Using the course notes + +We will generally use `jupyter` notebooks for running interactive Python programs. + + + +# Notes + diff --git a/docs/index 2.rst b/docs/index 2.rst new file mode 100644 index 00000000..286f7739 --- /dev/null +++ b/docs/index 2.rst @@ -0,0 +1,76 @@ +.. GEOG0111 UCL MSc Scientific Computing documentation master file, created by + sphinx-quickstart on Tue Sep 8 14:35:26 2020.
+ You can adapt this file completely to your liking, but it should at least + contain the root directive. + +GEOG0111 Scientific Computing +============================= + +.. toctree:: + :maxdepth: 2 + :caption: Contents: + + + 001_Notebook_use.md + 002_Unix.md + 003_Local_Install.md + 005_Help.md + + 010_Python_Introduction.md + 011_Python_data_types.md + 012_Python_strings.md + 013_Python_string_methods.md + 014_Python_groups.md + 015_Python_control.md + 016_Python_for.md + 017_Functions.md + 018_Python_files.md + 019_Running_Python.md + + 020_NASA_MODIS_Earthdata.md + 021_GoogleEarthEngine.md + + 031_Numpy_matplotlib.md + 031_Plotting.md + + 040_GDAL.md + 041_MODIS_download.md + 042_GDAL_masking.md + 043_GDAL_stacking_and_interpolating.md + 044_GDAL_Reconciling_projections.md + + 050_Linear_models.md + 051_Modelling_and_optimisation.md + &id001 + 001_Notebook_use_answers.md + 002_Unix_answers.md + 005_Help_answers.md + 010_Python_Introduction_answers.md + 011_Python_data_types_answers.md + 012_Python_strings_answers.md + 013_Python_string_methods_answers.md + 014_Python_groups_answers.md + 015_Python_control_answers.md + 016_Python_for_answers.md + 017_Functions_answers.md + 019_Running_Python_answers.md + 020_NASA_MODIS_Earthdata_answers.md + *id001 + *id001 + *id001 + *id001 + *id001 + *id001 + *id001 + *id001 + *id001 + *id001 + *id001 + *id001 + +Indices and tables +================== + +* :ref:`genindex` +* :ref:`modindex` +* :ref:`search` diff --git a/docs/index.md b/docs/index.md index bb2eb53e..be769c98 100644 --- a/docs/index.md +++ b/docs/index.md @@ -25,6 +25,8 @@ UCL Geography MSc notes. | | | | |---|---|---| |[Dr Qingling Wu](http://www.geog.ucl.ac.uk/about-the-department/people/research-staff/qingling-wu/)| [Dr. 
Jose Gomez-Dans](http://www.geog.ucl.ac.uk/about-the-department/people/research-staff/jose-gomez-dans/)|[Feng Yin](https://www.geog.ucl.ac.uk/people/research-students/feng-yin)| + + ### Purpose of this course This course, GEOG0111 Scientific Computing, is a term 1 MSc module worth 15 credits (25% of the term 1 credits) that aims to: diff --git a/docs/index.rst b/docs/index.rst index 286f7739..2003a77a 100644 --- a/docs/index.rst +++ b/docs/index.rst @@ -68,6 +68,8 @@ GEOG0111 Scientific Computing *id001 *id001 + bin.md + Indices and tables ================== diff --git a/geog0111/mkdocs_prep.py b/geog0111/mkdocs_prep.py index bea588ed..f4766dce 100755 --- a/geog0111/mkdocs_prep.py +++ b/geog0111/mkdocs_prep.py @@ -70,8 +70,6 @@ {info['long_description']} -# Notes - '''