speed up python etc
madsmpedersen committed May 19, 2014
1 parent 479724f commit 4fe2777
Showing 16 changed files with 2,266 additions and 0 deletions.
Binary file added lesson 7/Speed up Python.pptx
Binary file not shown.
343 changes: 343 additions & 0 deletions lesson 7/cython/Cython.ipynb
@@ -0,0 +1,343 @@
{
"metadata": {
"name": ""
},
"nbformat": 3,
"nbformat_minor": 0,
"worksheets": [
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#Cython\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"##Introduction\n",
"Cython is an optimizing static compiler for Python that gives you the combined power of Python and C.\n",
"\n",
"Normally the use of C-code sacrifices two main reasons for using Python, namely the high-level language and cross-platform code. \n",
"In this context, however, cython translates normal Python code into C-code which is compiled on the target machine and imported as a normal library.\n",
"\n",
"The process requires the cython package, a setup script and a compiler.\n",
"\n",
"- The cython package is accessible from the homepage http://cython.org/, from the Python Package Index, winpython, anaconda etc.\n",
"- The setup script can be generated automatically by the 'cython_import.py' script (found in the folder of this notebook)\n",
"- The compiler (from http://docs.cython.org/src/quickstart/install.html):\n",
" - Linux: The GNU C Compiler (gcc) is usually present, or easily available through the package system. On Ubuntu or Debian, for instance, the command sudo apt-get install build-essential will fetch everything you need.\n",
" - Mac OS X: To retrieve gcc, one option is to install Apple\u2019s XCode, which can be retrieved from the Mac OS X\u2019s install DVDs or from http://developer.apple.com.\n",
" - Windows: A popular option is to use the open source MinGW (a Windows distribution of gcc). See the appendix for instructions for setting up MinGW manually. EPD and Python(x,y) bundle MinGW, but some of the configuration steps in the appendix might still be necessary. Another option is to use Microsoft\u2019s Visual C. One must then use the same version which the installed Python was compiled with.\n",
"\n",
"\n",
"###A prime function\n",
"Run the following function by pressing `Shift+Enter` in the cell.\n",
"The `prime`-function returns the number of primes in the range [2..n] and the `print_time`-decorator prints the execution time (see Notebook on decorators)"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"\n",
"def primes(n):\n",
" count = 0\n",
" for i in xrange(2,n):\n",
" for j in xrange(2,i):\n",
" if i%j==0: \n",
" break\n",
" else:\n",
" count+=1\n",
" return count\n",
"%timeit primes(10000)"
],
"language": "python",
"metadata": {},
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"###Compile function\n",
"Here is a copy of the script in `cy_primes.py`:\n",
"\n",
"------------------------------------------\n",
"    def primes(n):\n",
"        count = 0\n",
"        for i in xrange(2,n):\n",
"            for j in xrange(2,i):\n",
"                if i%j==0: \n",
"                    break\n",
"            else:\n",
"                count+=1\n",
"        return count\n",
"----------------------------------------- \n",
"Yes, it is exactly similar to the function above, but by running the next script it will be compiled to a C-library `cy_primes.pyd`(Windows)/`cy_primes.so`(Unix), which is imported.\n",
"\n",
"Finally the compiled version of `primes` is in executed from `primes_cython` in order to see the execution time\n"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"from cython_import import cython_import\n",
"\n",
"# Translate, compile and import cy_primes.py\n",
"cython_import('cy_primes')\n",
"import cy_primes #import after compilation\n",
"\n",
"\n",
"# Test compiled function\n",
"% timeit cy_primes.primes(10000)\n"
],
"language": "python",
"metadata": {},
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Yeah!, writing C-code is rather easy!!!\n",
"\n",
"The line\n",
"\n",
" `cython_import('cy_primes')\u00b4 \n",
"\n",
"does the magic. It:\n",
"\n",
"- Generates a `cy_primes.pyx` file from the code from `cy_primes.py`\n",
"- Generates a setup.py file containing a build script\n",
"- Executes the setup script which makes cython compile `cy_primes.pyx` into `cy_primes.pyd`/`cy_primes.so`\n",
"- Deletes all temporary files\n",
"\n",
"But the speedup is not impressing... yet!"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"##Static typing\n",
"One of the major differences between C and Python is that:\n",
"\n",
"- Python is dynamic type\n",
" - There is no need for variable declaration (Python determines the type based on the contents)\n",
" - The type of a variable can be changed during execution\n",
"- C is static type\n",
" - All variables must be declared to specify the type\n",
" - Variables cannot change type during execution\n",
"\n",
"Dynamic typing makes it easy to code, but slow to execute, so in order to speed up the compiled function, variables must be declared.\n",
"\n",
"Variable declation can be done in two ways: Pure and `cdef`\n",
"\n",
"###Pure\n",
"Pure provides an easy and pythonic way to apply static typing. The documentation can be found at: http://docs.cython.org/src/tutorial/pure.html\n",
"\n",
"One advantage of using pure is that types are declared via python decorators. This means that if the module for some reason cannot be compiled, then it can be used as a normal python module.\n",
"\n",
"In `cy_primes_pure.py` four lines are appended above the function\n",
"<pre>\n",
"\n",
" import cython\n",
" @cython.ccall\n",
" @cython.locals(n=cython.int, count=cython.int, i=cython.int, j=cython.int)\n",
" @cython.returns(cython.int)\n",
" def primes(n):\n",
" ...\n",
"</pre>\n",
"\n",
"- `@cython.ccal` declares the function as a fast C-function with a Python wrapper(i.e. it is fast and callable from Python)<br>\n",
"Tip: If the Python wrapper is not required, `@cython.cfunc`, can be used (The function will be a little faster, but cannot be invoked from Python modules)\n",
"- `@cython.locals(...` declares the type of local variables and input arguments<br>\n",
" - Signed integers: char, short, int, long, longlong\n",
" - Unsigned integers: uchar, ushort, uint, ulong, ulonglong\n",
" - floats: float, double, longdouble\n",
" - Boolean: bint\n",
" - Python types: list, tuple, dict\n",
" - Pointers: p_int, pp_int,... (see http://docs.cython.org/src/tutorial/pure.html)\n",
"- `@cython.returns(...` declares the type of the returned value\n",
"\n",
"\n",
"Now, the speedup should be significant\n"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"from cython_import import cython_import\n",
"\n",
"# Translate, compile and import cy_primes_pure.py\n",
"cython_import(\"cy_primes_pure\")\n",
"import cy_primes_pure #import after compilation\n",
"\n",
"\n",
"# Test compiled function\n",
"%timeit cy_primes_pure.primes(10000)\n"
],
"language": "python",
"metadata": {},
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"###Exercise 1\n",
"Optimize the integrate functions, `integrate` and `f`, below using Pure (Note that scipy contains a whole library for integration). It is possible to speed up the execution speed about 100 times (You may need to increase `N` as the execution time is printed in ms)\n",
"\n",
"In the bottom of the script, the `test_func([func1, func2,...],(arg1,arg2,...))`-call, performs a test, in which a cython compiled version of the two functions are compare to the python version.\n",
"\n",
"_Note: The Kernel must be restarted before every launch. Otherwise the compiled libray cannot be replaced._"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"import cython \n",
"\n",
"# write your pure declarations here\n",
"def f(x):\n",
" return x**2-x\n",
"\n",
"# write your pure declarations here\n",
"def integrate(a, b, N): \n",
" s = 0\n",
" dx = (b-a)/N\n",
" for i in xrange(N):\n",
" s += f(a+i*dx)\n",
" return s * dx\n",
"\n",
"# Compile, import and compare compiled version to python version\n",
"from test_func import test_func\n",
"a,b,N = 0,10000,1000000 #testing arguments\n",
"test_func([integrate, f],(a,b,N))"
],
"language": "python",
"metadata": {},
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"###cdef\n",
"\n",
"In fact, cython translates Pyrex code and not Python code into C-code (as described in the beginning of this notebook). Pyrex, with the file extension `.pyx`, is an extension of Python that adds function and variable type declarations.\n",
"\n",
"In Pyrex types are declared using the `cdef` syntax: C-functions with a Python wrapper are declared using `cpdef` while C-functions without a Python wrapper and local variables are declared via `cdef`, e.g:\n",
"\n",
" cpdef int primes(int n):\n",
" cdef int count, i, j\n",
" ...\n",
"\n",
"\n",
"Unfortunately, Python cannot not import Pyrex code, as it is not legal Python syntax, but a work around has been implemented in `cython_import`:\n",
"\n",
"When `cython_import` generates the .pyx file from the python code, it searches all lines for \"#c\" and replaces everything before \"#\" with the text after the \"#\". <br>\n",
"This means that `cython_import` generates the Pyrex code above from the following code legal python code.\n",
"\n",
" def primes(n): #cpdef int primes(int n):\n",
" #cdef int count, i, j\n",
" ...\n",
"\n",
"\n",
"Take a look at `cy_primes_cdef.py` where this declaration is implemented and run the following script. The execution time should be similar to the pure-approach"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"from cython_import import cython_import\n",
"\n",
"# Translate, compile and import cy_primes_cdef.py\n",
"cython_import(\"cy_primes_cdef\")\n",
"import cy_primes_cdef #import after compilation\n",
"\n",
"# Test compiled function\n",
"%timeit cy_primes_cdef.primes(10000)\n",
"\n"
],
"language": "python",
"metadata": {},
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The `cdef` approach has an advantage over the pure-approach, as it supports declaration of numpy arrays, e.g:\n",
"\n",
" #cimport numpy as np\n",
" def foo(A): #cpdef foo(np.ndarray[np.int_t,ndim=2] A):\n",
" ...\n",
"\n",
"- The first parameter in the square brackets should be a numpy type followed by `\"_t\"`. The suffix indicates that it is a compile time type.\n",
"- The second parameter specifies the number of dimensions of the array. (default is 1).\n",
"\n",
"\n",
"\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"###Exercise 2\n",
"Speed up the convolution function below using `cdef` declarations. In this case it is possible to speed up the function more than 500 times (You may need to increase `N` as `print_time` prints in ms)<br>\n",
"Again you must restart the Kernel before every launch."
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"import cython\n",
"import numpy as np\n",
"\n",
"def convolve(A,f): \n",
" f_size = f.shape[0]\n",
" f_half_size = f_size/2\n",
" B = np.zeros_like(A, dtype=np.double)\n",
" for i in xrange(f_half_size,A.shape[0]-f_half_size):\n",
" s = 0.\n",
" for j in xrange(i-f_half_size,i+f_half_size+1):\n",
" s += A[j]\n",
" B[i] = s/f_size\n",
"\n",
" return B\n",
"\n",
"# Compile, import and compare compiled version to python version\n",
"from test_func import test_func\n",
"N = 1000000\n",
"A = np.random.randint(0,100,N) # 1D Numpy array of N random integers in range [0,100]\n",
"f = np.array([1,1,1]) # 1D Numpy array of integers\n",
"test_func([convolve],(A,f))"
],
"language": "python",
"metadata": {},
"outputs": []
},
{
"cell_type": "code",
"collapsed": false,
"input": [],
"language": "python",
"metadata": {},
"outputs": []
}
],
"metadata": {}
}
]
}
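The `#c` rewriting rule that the notebook attributes to `cython_import` can be illustrated with a short sketch. This is not the actual `cython_import` implementation, which is not shown in this commit. The helper name `py_to_pyx` is hypothetical, and keeping each line's leading indentation is an assumption made so that the generated code remains validly indented.

```python
def py_to_pyx(source):
    """Hypothetical helper: turn '#c...' annotations into Cython declarations.

    For every line containing '#c', the code before the '#' is replaced by
    the text after the '#'. Leading indentation is preserved (an assumption).
    """
    out = []
    for line in source.splitlines():
        marker = line.find("#c")
        if marker == -1:
            # No annotation: keep the line unchanged
            out.append(line)
            continue
        indent = line[:len(line) - len(line.lstrip())]
        out.append(indent + line[marker + 1:].rstrip())
    return "\n".join(out)


python_source = (
    "def primes(n): #cpdef int primes(int n):\n"
    "    #cdef int count, i, j\n"
    "    count = 0"
)
print(py_to_pyx(python_source))
```

Note that such a simple rule would also rewrite an ordinary comment that happens to start with `#c`, so in this sketch regular comments need a space after the `#`.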