`Beautiful Soup, so rich and green,
Waiting in a hot tureen!
Who for such dainties would not stoop?
Soup of the evening, beautiful Soup!
Soup of the evening, beautiful Soup!
Beau--ootiful Soo--oop!
Beau--ootiful Soo--oop!
Soo--oop of the e--e--evening,
Beautiful, beautiful Soup!'
--Lewis Carroll
Many websites have geodata that's embedded in HTML, with no published
API to retrieve the original underlying data. Beautiful Soup is a
Python library for quick, simple extraction of data from HTML
pages. By the end of this tutorial, you will know how to write a
Python script that uses Beautiful Soup to extract geographic data from
web pages that you didn't write and don't control.
We'll dip our toes into several areas:
- Just enough Python to install the packages you need and create
the script.
- The structure of an HTML document.
- How to go spelunking in an HTML document to find the geographic
information you're looking for, and build a simple script to extract it.
- Some options for geocoding the information, and getting output in a
usable format.
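
As a rough illustration of the areas listed above, here is a minimal sketch of what such a script might look like. It assumes Beautiful Soup is installed (pip install beautifulsoup4) and works on a made-up HTML fragment; the table id "sites", the column order, and the function names extract_features and to_geojson are all hypothetical, chosen only for this example. A real page will need its own selectors, found by inspecting its HTML. The output format here is GeoJSON, one of several usable options, and the geocoding step is left out because the sample rows already carry coordinates.

```python
import json

from bs4 import BeautifulSoup

# Hypothetical page fragment; real sites will use different tags, ids, and layouts.
SAMPLE_HTML = """
<html><body>
<table id="sites">
  <tr><th>Name</th><th>Latitude</th><th>Longitude</th></tr>
  <tr><td>Lighthouse A</td><td>47.6062</td><td>-122.3321</td></tr>
  <tr><td>Lighthouse B</td><td>48.1170</td><td>-122.7604</td></tr>
  <tr><td>No coordinates</td><td></td><td></td></tr>
</table>
</body></html>
"""


def extract_features(html):
    """Return a list of GeoJSON Point features found in the sample table."""
    # "html.parser" is Python's built-in parser, so no extra parser package is needed.
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table", id="sites")
    features = []
    if table is None:
        return features
    for row in table.find_all("tr"):
        cells = [td.get_text(strip=True) for td in row.find_all("td")]
        if len(cells) != 3:
            continue  # the header row uses <th>, so it has no <td> cells
        name, lat_text, lon_text = cells
        try:
            lat, lon = float(lat_text), float(lon_text)
        except ValueError:
            continue  # skip rows whose coordinates are missing or malformed
        features.append({
            "type": "Feature",
            # GeoJSON orders coordinates as [longitude, latitude].
            "geometry": {"type": "Point", "coordinates": [lon, lat]},
            "properties": {"name": name},
        })
    return features


def to_geojson(features):
    """Wrap the features in a FeatureCollection and serialize it as JSON text."""
    return json.dumps({"type": "FeatureCollection", "features": features}, indent=2)


if __name__ == "__main__":
    print(to_geojson(extract_features(SAMPLE_HTML)))
```

To try the same approach on a live page, the sample string could be replaced with HTML fetched from the site, with the selectors adjusted to match that page's structure.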
If you know of a website containing geodata that you'd like to extract and
use, bring the URL and we can look at how to attack it.
--------
Hal Mueller's software projects have used geodata for space-based
radar analysis, animal habitat usage simulation, tree migration
simulation, celestial navigation, historic site mapping, and ship
tracking. He is currently an independent developer, creating 3D
applications for the Apple Vision Pro headset.