Commit
added a few asides
Mark Pilgrim committed Sep 28, 2009
1 parent 3e0cb2a commit 727c149
Showing 5 changed files with 46 additions and 3 deletions.
8 changes: 7 additions & 1 deletion advanced-iterators.html
@@ -38,7 +38,7 @@ <h2 id=divingin>Diving In</h2>

<p>Puzzles like this are called <i>cryptarithms</i> or <i>alphametics</i>. The letters spell out actual words, but if you replace each letter with a digit from <code>0&ndash;9</code>, it also &#8220;spells&#8221; an arithmetic equation. The trick is to figure out which letter maps to each digit. All the occurrences of each letter must map to the same digit, no digit can be repeated, and no &#8220;word&#8221; can start with the digit 0.

<p>The most well-known alphametic puzzle is <code>SEND + MORE = MONEY</code>.
<aside>The most well-known alphametic puzzle is <code>SEND + MORE = MONEY</code>.</aside>

<p>In this chapter, we&#8217;ll dive into an incredible Python program originally written by Raymond Hettinger. This program solves alphametic puzzles <em>in just 14 lines of code</em>.

@@ -107,6 +107,8 @@ <h2 id=re-findall>Finding all occurrences of a pattern</h2>
<samp class=p>>>> </samp><kbd class=pp>re.findall(' s.*? s', "The sixth sick sheikh's sixth sheep's sick.")</kbd>
<samp class=pp>[' sixth s', " sheikh's s", " sheep's s"]</samp></pre>

<aside>This is the <a href=http://en.wikipedia.org/wiki/Tongue-twister>hardest tongue twister</a> in the English language.</aside>

<p>Surprised? The regular expression looks for a space, an <code>s</code>, and then the shortest possible series of any character (<code>.*?</code>), then a space, then another <code>s</code>. Well, looking at that input string, I see five matches:

<ol>
@@ -258,6 +260,8 @@ <h2 id=permutations>Calculating Permutations&hellip; The Lazy Way!</h2>
<li>That&#8217;s it! Those are all the permutations of <code>[1, 2, 3]</code> taken 2 at a time. Pairs like <code>(1, 1)</code> and <code>(2, 2)</code> never show up, because they contain repeats so they aren&#8217;t valid permutations. When there are no more permutations, the iterator raises a <code>StopIteration</code> exception.
</ol>
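
<p>A minimal sketch of the behavior described in the steps above, assuming only the standard <code>itertools</code> module. The variable names are illustrative and not part of the original example.

```python
import itertools

# permutations() returns a lazy iterator; nothing is computed until asked.
perms = itertools.permutations([1, 2, 3], 2)
first = next(perms)     # the first pair
rest = list(perms)      # consume whatever is left

# Pairs with a repeated element, such as (1, 1), never appear.
all_pairs = [first] + rest
has_repeats = any(a == b for a, b in all_pairs)

# Once the iterator is exhausted, next() raises StopIteration.
try:
    next(perms)
    exhausted = False
except StopIteration:
    exhausted = True
```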

<aside>The <code>itertools</code> module has all kinds of fun stuff.</aside>

<p>The <code>permutations()</code> function doesn&#8217;t have to take a list. It can take any sequence&nbsp;&mdash;&nbsp;even a string.

<pre class=screen>
@@ -438,6 +442,8 @@ <h2 id=string-translate>A New Kind Of String Manipulation</h2>
<li>The <code>translate()</code> method on a string takes a translation table and runs the string through it. That is, it replaces all occurrences of the keys of the translation table with the corresponding values. In this case, &#8220;translating&#8221; <code>MARK</code> to <code>MORK</code>.
</ol>
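
<p>As a rough sketch of the same idea: a translation table is just a dictionary that maps code points to replacements. The names <code>table</code> and <code>table2</code> are illustrative.

```python
# A translation table maps code points (integers) to their replacements.
table = {ord('A'): ord('O')}
result = 'MARK'.translate(table)

# str.maketrans() builds the same kind of table from two equal-length strings.
table2 = str.maketrans('A', 'O')
result2 = 'MARK'.translate(table2)
```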

<aside>Now you&#8217;re getting to the <em>really</em> fun part.</aside>

<p>What does this have to do with solving alphametic puzzles? As it turns out, everything.

<pre class=screen>
6 changes: 6 additions & 0 deletions comprehensions.html
@@ -34,6 +34,8 @@ <h3 id=getcwd>The Current Working Directory</h3>
<li>Explain the result
</ol>

<aside>There is always a current working directory.</aside>

<p>If you don&#8217;t know about the current working directory, step 1 will probably fail with an <code>ImportError</code>. Why? Because Python will look for the example module in <a href=your-first-python-program.html#importsearchpath>the import search path</a>, but it won&#8217;t find it because the <code>examples</code> folder isn&#8217;t one of the directories in the search path. To get past this, you can do one of two things:

<ol>
@@ -108,6 +110,8 @@ <h3 id=glob>Listing Directories</h3>

<p>The <code>glob</code> module is another tool in the Python standard library. It&#8217;s an easy way to get the contents of a directory programmatically, and it uses the sort of wildcards that you may already be familiar with from working on the command line.

<aside>The <code>glob</code> module uses shell-like wildcards.</aside>

<pre class=screen>
<samp class=p>>>> </samp><kbd class=pp>os.chdir('/Users/pilgrim/diveintopython3/')</kbd>
<samp class=p>>>> </samp><kbd class=pp>import glob</kbd>
@@ -189,6 +193,8 @@ <h3 id=abspath>Constructing Absolute Pathnames</h3>

<h2 id=listcomprehension>List Comprehensions</h2>

<aside>You can use any Python expression in a list comprehension.</aside>

<p>A <dfn>list comprehension</dfn> provides a compact way of mapping a list into another list by applying a function to each of the elements of the list.

<pre class=screen>
3 changes: 3 additions & 0 deletions dip3.css
@@ -345,6 +345,9 @@ aside.ots {
aside a {
color: #fff !important;
}
aside code {
font-size: inherit;
}

/* previous/next navigation links */

14 changes: 13 additions & 1 deletion files.html
@@ -55,6 +55,8 @@ <h3 id=encoding>Character Encoding Rears Its Ugly Head</h3>
UnicodeDecodeError: 'charmap' codec can't decode byte 0x8f in position 28: character maps to &lt;undefined></samp>
<samp class=p>>>> </samp></pre>

<aside>The default encoding is platform-dependent.</aside>

<p>What just happened? You didn&#8217;t specify a character encoding, so Python is forced to use the default encoding. What&#8217;s the default encoding? If you look closely at the traceback, you can see that it&#8217;s dying in <code>cp1252.py</code>, meaning that Python is using CP-1252 as the default encoding here. (CP-1252 is a common encoding on computers running Microsoft Windows.) The CP-1252 character set doesn&#8217;t support the characters that are in this file, so the read fails with an ugly <code>UnicodeDecodeError</code>.

<p>But wait, it&#8217;s worse than that! The default encoding is <em>platform-dependent</em>, so this code <em>might</em> work on your computer (if your default encoding is <abbr>UTF-8</abbr>), but then it will fail when you distribute it to someone else (whose default encoding is different, like CP-1252).
@@ -101,6 +103,8 @@ <h3 id=read>Reading Data From A Text File</h3>
<li>Perhaps somewhat surprisingly, reading the file again does not raise an exception. Python does not consider reading past end-of-file to be an error; it simply returns an empty string.
</ol>
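
<p>A small sketch of reading past end-of-file, assuming a throwaway file created with <code>tempfile</code>. The path and the sample text are made up for illustration.

```python
import os
import tempfile

# Create a scratch file so the example does not depend on any existing file.
path = os.path.join(tempfile.mkdtemp(), 'example.txt')
text = 'Dive Into Python \u6df1\u5165 Python'
with open(path, mode='w', encoding='utf-8') as f:
    f.write(text)

# Passing encoding explicitly avoids relying on the platform default.
with open(path, encoding='utf-8') as f:
    first_read = f.read()    # the whole file
    second_read = f.read()   # already at end-of-file: empty string, no exception
```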

<aside>Always specify an <code>encoding</code> parameter when you open a file.</aside>

<p>What if you want to re-read a file?

<pre class=screen>
@@ -202,6 +206,8 @@ <h3 id=close>Closing Files</h3>

<h3 id=with>Closing Files Automatically</h3>

<aside><code>try..finally</code> is good. <code>with</code> is better.</aside>

<p>Stream objects have an explicit <code>close()</code> method, but what happens if your code has a bug and crashes before you call <code>close()</code>? That file could theoretically stay open for much longer than necessary. While you&#8217;re debugging on your local computer, that&#8217;s not a big deal. On a production server, maybe it is.

<p>Python 2 had a solution for this: the <code>try..finally</code> block. That still works in Python 3, and you may see it in other people&#8217;s code or in older code that was <a href=case-study-porting-chardet-to-python-3.html>ported to Python 3</a>. But Python 2.5 introduced a cleaner solution, which is now the preferred solution in Python 3: the <code>with</code> statement.
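
<p>To compare the two approaches side by side, here is a hedged sketch that uses a scratch file created with <code>tempfile</code>. The file name and variable names are illustrative.

```python
import os
import tempfile

# Scratch file for the demonstration.
path = os.path.join(tempfile.mkdtemp(), 'example.txt')
with open(path, mode='w', encoding='utf-8') as f:
    f.write('hello')

# The older idiom: close the file in a finally block.
f1 = open(path, encoding='utf-8')
try:
    data1 = f1.read()
finally:
    f1.close()

# The with statement closes the file when the block exits, even on error.
with open(path, encoding='utf-8') as f2:
    data2 = f2.read()
```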
@@ -275,6 +281,8 @@ <h3 id=for>Reading Data One Line At A Time</h3>

<h2 id=writing>Writing to Text Files</h2>

<aside>Just open a file and start writing.</aside>

<p>You can write to files in much the same way that you read from them. First you open a file and get a stream object, then you use methods on the stream object to write data to the file, then you close the file.

<p>To open a file for writing, use the <code>open()</code> function and specify the write mode. There are two file modes for writing:
@@ -361,7 +369,9 @@ <h2 id=binary>Binary Files</h2>

<p class=a>&#x2042;

<h2 id=file-like-objects>Streams Objects From Non-File Sources</h2>
<h2 id=file-like-objects>Stream Objects From Non-File Sources</h2>

<aside>To read from a fake file, just call <code>read()</code>.</aside>

<p>Imagine you&#8217;re writing a library, and one of your library functions is going to read some data from a file. The function could simply take a filename as a string, go open the file for reading, read it, and close it before exiting. But you shouldn&#8217;t do that. Instead, your <abbr>API</abbr> should take <em>an arbitrary stream object</em>.
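
<p>One way such an <abbr>API</abbr> might look, as a sketch: <code>count_lines()</code> is a hypothetical library function, and <code>io.StringIO</code> stands in for a real file.

```python
import io

def count_lines(stream):
    """Hypothetical library function: works with any object that has read()."""
    return len(stream.read().splitlines())

# An in-memory "fake file" works just as well as a real one.
fake_file = io.StringIO('one\ntwo\nthree\n')
n = count_lines(fake_file)
```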

@@ -433,6 +443,8 @@ <h3 id=gzip>Handling Compressed Files</h3>

<h2 id=stdio>Standard Input, Output, and Error</h2>

<aside><code>sys.stdin</code>, <code>sys.stdout</code>, <code>sys.stderr</code>.</aside>

<p>Command-line gurus are already familiar with the concept of standard input, standard output, and standard error. This section is for the rest of you.

<p>Standard output and standard error (commonly abbreviated <code>stdout</code> and <code>stderr</code>) are pipes that are built into every <abbr>UNIX</abbr>-like system, including Mac OS X and Linux. When you call the <code>print()</code> function, the thing you&#8217;re printing is sent to the <code>stdout</code> pipe. When your program crashes and prints out a traceback, it goes to the <code>stderr</code> pipe. By default, both of these pipes are just connected to the terminal window where you are working; when your program prints something, you see the output in your terminal window, and when a program crashes, you see the traceback in your terminal window too. In the graphical Python Shell, the <code>stdout</code> and <code>stderr</code> pipes default to your &#8220;Interactive Window&#8221;.
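
<p>To see that the two pipes really are separate, here is a sketch that captures each one in memory. It assumes the standard library&#8217;s <code>contextlib.redirect_stdout</code> and <code>contextlib.redirect_stderr</code> helpers, which this chapter does not use.

```python
import contextlib
import io
import sys

out, err = io.StringIO(), io.StringIO()

# Temporarily point sys.stdout and sys.stderr at in-memory streams.
with contextlib.redirect_stdout(out), contextlib.redirect_stderr(err):
    print('normal output')                   # goes to stdout
    print('error output', file=sys.stderr)   # goes to stderr

captured_out = out.getvalue()
captured_err = err.getvalue()
```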
18 changes: 17 additions & 1 deletion http-web-services.html
@@ -52,6 +52,8 @@ <h3 id=caching>Caching</h3>

<p>The most important thing to understand about any type of web service is that network access is incredibly expensive. I don&#8217;t mean &#8220;dollars and cents&#8221; expensive (although bandwidth ain&#8217;t free). I mean that it takes an extraordinarily long time to open a connection, send a request, and retrieve a response from a remote server. Even on the fastest broadband connection, <i>latency</i> (the time it takes to send a request and start retrieving data in a response) can still be higher than you anticipated. A router misbehaves, a packet is dropped, an intermediate proxy is under attack&nbsp;&mdash;&nbsp;there&#8217;s <a href=http://isc.sans.org/>never a dull moment</a> on the public internet, and there may be nothing you can do about it.

<aside><code>Cache-Control</code> means &#8220;don't bug me until next week.&#8221;</aside>

<p><abbr>HTTP</abbr> is designed with caching in mind. There is an entire class of devices (called &#8220;caching proxies&#8221;) whose only job is to sit between you and the rest of the world and minimize network access. Your company or <abbr>ISP</abbr> almost certainly maintains caching proxies, even if you&#8217;re unaware of them. They work because caching is built into the <abbr>HTTP</abbr> protocol.

<p>Here&#8217;s a concrete example of how caching works. You visit <a href=http://diveintomark.org/><code>diveintomark.org</code></a> in your browser. That page includes a background image, <a href=http://wearehugh.com/m.jpg><code>wearehugh.com/m.jpg</code></a>. When your browser downloads that image, the server includes the following <abbr>HTTP</abbr> headers:
@@ -82,6 +84,8 @@ <h3 id=last-modified>Last-Modified Checking</h3>

<p>Some data never changes, while other data changes all the time. In between, there is a vast field of data that <em>might</em> have changed, but hasn&#8217;t. CNN.com&#8217;s feed is updated every few minutes, but my weblog&#8217;s feed may not change for days or weeks at a time. In the latter case, I don&#8217;t want to tell clients to cache my feed for weeks at a time, because then when I do actually post something, people may not read it for weeks (because they&#8217;re respecting my cache headers which said &#8220;don&#8217;t bother checking this feed for weeks&#8221;). On the other hand, I don&#8217;t want clients downloading my entire feed once an hour if it hasn&#8217;t changed!

<aside><code>Last-Modified</code> means &#8220;same shit, different day.&#8221;</aside>

<p><abbr>HTTP</abbr> has a solution to this, too. When you request data for the first time, the server can send back a <code>Last-Modified</code> header. This is exactly what it sounds like: the date that the data was changed. That background image referenced from <code>diveintomark.org</code> included a <code>Last-Modified</code> header.

<pre class=nd><code>HTTP/1.1 200 OK
@@ -130,7 +134,9 @@ <h3 id=etags>ETags</h3>
Content-Type: image/jpeg
</code></pre>

The second time you request the same data, you include the ETag hash in an <code>If-None-Match</code> header of your request. If the data hasn&#8217;t changed, the server will send you back a <code>304</code> status code. As with the last-modified date checking, the server sends back <em>only</em> the <code>304</code> status code; it doesn&#8217;t send you the same data a second time. By including the ETag hash in your second request, you&#8217;re telling the server that there&#8217;s no need to re-send the same data if it still matches this hash, since <a href=#caching>you still have the data from the last time</a>.
<aside><code>ETag</code> means &#8220;there&#8217;s nothing new under the sun.&#8221;</aside>

<p>The second time you request the same data, you include the ETag hash in an <code>If-None-Match</code> header of your request. If the data hasn&#8217;t changed, the server will send you back a <code>304</code> status code. As with the last-modified date checking, the server sends back <em>only</em> the <code>304</code> status code; it doesn&#8217;t send you the same data a second time. By including the ETag hash in your second request, you&#8217;re telling the server that there&#8217;s no need to re-send the same data if it still matches this hash, since <a href=#caching>you still have the data from the last time</a>.
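
<p>The server&#8217;s side of that exchange can be sketched roughly as follows. <code>respond()</code> is a hypothetical function, and the sketch assumes a single exact ETag value; real servers also handle lists of ETags and weak validators.

```python
def respond(current_etag, if_none_match, body):
    """Hypothetical server-side check, simplified to one exact ETag."""
    if if_none_match is not None and if_none_match == current_etag:
        return 304, b''    # not modified: status only, no body
    return 200, body       # first request, or the data has changed

first = respond('"abc123"', None, b'image bytes')            # no ETag sent yet
second = respond('"abc123"', '"abc123"', b'image bytes')     # ETag still matches
third = respond('"def456"', '"abc123"', b'new image bytes')  # data has changed
```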

<p>Again with the <kbd>curl</kbd>:

@@ -161,6 +167,8 @@ <h3 id=redirects>Redirects</h3>

<p><a href=http://www.w3.org/Provider/Style/URI>Cool <abbr>URI</abbr>s don&#8217;t change</a>, but many <abbr>URI</abbr>s are seriously uncool. Web sites get reorganized, pages move to new addresses. Even web services can reorganize. A syndicated feed at <code>http://example.com/index.xml</code> might be moved to <code>http://example.com/xml/atom.xml</code>. Or an entire domain might move, as an organization expands and reorganizes; <code>http://www.example.com/index.xml</code> becomes <code>http://server-farm-1.example.com/index.xml</code>.

<aside><code>Location</code> means &#8220;look over there!&#8221;</aside>

<p>Every time you request any kind of resource from an <abbr>HTTP</abbr> server, the server includes a status code in its response. Status code <code>200</code> means &#8220;everything&#8217;s normal, here&#8217;s the page you asked for&#8221;. Status code <code>404</code> means &#8220;page not found&#8221;. (You&#8217;ve probably seen 404 errors while browsing the web.) Status codes in the 300&#8217;s indicate some form of redirection.

<p><abbr>HTTP</abbr> has several different ways of signifying that a resource has moved. The two most common techniques are status codes <code>302</code> and <code>301</code>. Status code <code>302</code> is a <i>temporary redirect</i>; it means &#8220;oops, that got moved over here temporarily&#8221; (and then gives the temporary address in a <code>Location</code> header). Status code <code>301</code> is a <i>permanent redirect</i>; it means &#8220;oops, that got moved permanently&#8221; (and then gives the new address in a <code>Location</code> header). If you get a <code>302</code> status code and a new address, the <abbr>HTTP</abbr> specification says you should use the new address to get what you asked for, but the next time you want to access the same resource, you should retry the old address. But if you get a <code>301</code> status code and a new address, you&#8217;re supposed to use the new address from then on.
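
<p>The bookkeeping a client has to do can be sketched like this. <code>next_url()</code> is a hypothetical helper, not part of any library; it only decides which address to fetch now and which to remember for next time.

```python
def next_url(requested_url, status, location):
    """Return (url_to_fetch_now, url_to_use_next_time). Hypothetical helper."""
    if status == 301:
        # Permanent redirect: use the new address now and from then on.
        return location, location
    if status == 302:
        # Temporary redirect: use the new address now, retry the old one later.
        return location, requested_url
    # Anything else: no redirection to follow.
    return requested_url, requested_url

old = 'http://example.com/index.xml'
new = 'http://example.com/xml/atom.xml'
permanent = next_url(old, 301, new)
temporary = next_url(old, 302, new)
```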
@@ -224,6 +232,8 @@ <h2 id=whats-on-the-wire>What&#8217;s On The Wire?</h2>
<li>The fourth line specifies the name of the library that is making the request. By default, this is <code>Python-urllib</code> plus a version number. Both <code>urllib.request</code> and <code>httplib2</code> support changing the user agent, simply by adding a <code>User-Agent</code> header to the request (which will override the default value).
</ol>

<aside>We&#8217;re downloading 3070 bytes when we could have just downloaded 941.</aside>

<p>Now let&#8217;s look at what the server sent back in its response.

<pre class=screen>
@@ -393,6 +403,8 @@ <h3 id=why-bytes>A Short Digression To Explain Why <code>httplib2</code> Returns

<p>If you know what sort of resource you&#8217;re expecting (an <abbr>XML</abbr> document in this case), perhaps you could &#8220;just&#8221; pass the returned <code>bytes</code> object to the <a href=xml.html#xml-parse><code>xml.etree.ElementTree.parse()</code> function</a>. That&#8217;ll work as long as the <abbr>XML</abbr> document includes information on its own character encoding (as this one does), but that&#8217;s an optional feature and not all <abbr>XML</abbr> documents do that. If an <abbr>XML</abbr> document doesn&#8217;t include encoding information, the client is supposed to look at the enclosing transport&nbsp;&mdash;&nbsp;<i>i.e.</i> the <code>Content-Type</code> <abbr>HTTP</abbr> header, which can include a <code>charset</code> parameter.

<p class=ss><a style=border:0 href=http://www.cafepress.com/feedparser><img src=http://feedparser.org/img/feedparser.jpg alt="[I support RFC 3023 t-shirt]" width=150 height=150></a>

<p>But it&#8217;s worse than that. Now character encoding information can be in two places: within the <abbr>XML</abbr> document itself, and within the <code>Content-Type</code> <abbr>HTTP</abbr> header. If the information is in <em>both</em> places, which one wins? According to <a href=http://www.ietf.org/rfc/rfc3023.txt>RFC 3023</a> (I swear I am not making this up), if the media type given in the <code>Content-Type</code> <abbr>HTTP</abbr> header is <code>application/xml</code>, <code>application/xml-dtd</code>, <code>application/xml-external-parsed-entity</code>, or any one of the subtypes of <code>application/xml</code> such as <code>application/atom+xml</code> or <code>application/rss+xml</code> or even <code>application/rdf+xml</code>, then the encoding is

<ol>
@@ -456,6 +468,8 @@ <h3 id=httplib2-caching>How <code>httplib2</code> Handles Caching</h3>
<li>Here&#8217;s the rub: this &#8220;response&#8221; was generated from <code>httplib2</code>&#8217;s local cache. That directory name you passed in when you created the <code>httplib2.Http</code> object&nbsp;&mdash;&nbsp;that directory holds <code>httplib2</code>&#8217;s cache of all the operations it&#8217;s ever performed.
</ol>

<aside>What&#8217;s on the wire? Absolutely nothing.</aside>

<blockquote class=note>
<p><span class=u>&#x261E;</span>If you want to turn on <code>httplib2</code> debugging, you need to set a module-level constant (<code>httplib2.debuglevel</code>), then create a new <code>httplib2.Http</code> object. If you want to turn off debugging, you need to change the same module-level constant, then create a new <code>httplib2.Http</code> object.
</blockquote>
@@ -576,6 +590,8 @@ <h3 id=httplib2-etags>How <code>httplib2</code> Handles <code>Last-Modified</cod

<h3 id=httplib2-compression>How <code>httplib2</code> Handles Compression</h3>

<aside>&#8220;We have both kinds of music, country AND western.&#8221;</aside>

<p><abbr>HTTP</abbr> supports <a href=#compression>two types of compression</a>. <code>httplib2</code> supports both of them.

<pre class=screen>
