Example of a naive way of Parsing

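import urllib

s = urllib.urlopen("https://github.com/siddhant3s/sendsms").read()
# Jump to the div that holds the repository description
l = s.find("""<div id="repository_description" rel="repository_description_edit">""")
s = s[l:]
# Jump to the first <p> inside that div
l = s.find("<p>")
s = s[l:]
s = s[3:]  # removing <p>
# Keep only the text before the next <span
r = s.find("<span")
s = s[:r]
print s
# prints: A python script to send sms non-interactively via fullonsms.com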
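
The same find-and-slice idea can be sketched as a small helper. This is only an illustration under stated assumptions: it is written for Python 3, the `between` function and the `SAMPLE` string are hypothetical (the sample just mimics the shape of the page above so the code runs without a network), and it checks for the `-1` that `str.find` returns when a marker is missing, which the snippet above does not.

```python
def between(text, start_marker, end_marker):
    """Return the text between two markers, or None if either is missing."""
    start = text.find(start_marker)
    if start == -1:
        return None
    start += len(start_marker)  # skip past the opening marker itself
    end = text.find(end_marker, start)
    if end == -1:
        return None
    return text[start:end]


# Hypothetical sample that mimics the shape of the page scraped above.
SAMPLE = (
    '<div id="repository_description" rel="repository_description_edit">'
    "<p>A python script to send sms<span>edit</span></p></div>"
)

# Narrow to the description div first, then to the paragraph inside it.
div_body = SAMPLE[SAMPLE.find('<div id="repository_description"'):]
description = between(div_body, "<p>", "<span")
print(description)
```

Even with the check, this approach breaks as soon as the markup changes slightly, which is why it is called naive here.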

Results of University Website

http://uptu.ac.in/results/EVEN_SEMESTER_10_11/bte4_10_11.asp?rollno=0909110103
soup.find(text="First Year").next.next.string

BeautifulSoup

Basic Traversal And Finding

soup.html
soup.body
soup.p.parent
soup('p')
soup.find('p')
soup.findAll('p')
soup.findAll('div')
len(soup.findAll('div'))
soup.findAll('div', id="wrapper")

soup.findAll('div', onclick="window.location.reload()") #festember
soup.findAll('div', onclick="window.location.reload()")[0].string
soup.findAll('div', onclick="window.location.reload()")[0]['class']

soup.findAll('div', id="wrapper")[0]['id']
soup.findAll('div', id="mainWrap")[0].header

Bad HTML

from BeautifulSoup import BeautifulSoup
html = "<html><p>Para 1<p>Para 2<blockquote>Quote 1<blockquote>Quote 2"
soup = BeautifulSoup(html)
print soup.prettify()

Unicode

Parsing

.parent
.contents

for x in soup.body:
    print x

Searching

findAll

regex
attrs
list of tags to find ['table', 'p']
a function
Keyword as argument to findAll
CSS class shortcut
calling findAll equals calling tag
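
The argument kinds listed above can be sketched in one place. This is a minimal illustration, assuming Python 3 with the `bs4` package (BeautifulSoup 4) installed, where `find_all` is the current spelling of `findAll`; the `HTML` sample and the variable names are made up for the example.

```python
import re

from bs4 import BeautifulSoup  # assumption: BeautifulSoup 4 is installed

# Made-up sample document for illustration.
HTML = """
<html><body>
<div id="wrapper" class="box"><p class="intro">Hello</p><b>bold</b></div>
<table><tr><td>cell</td></tr></table>
<p>Plain</p>
</body></html>
"""

soup = BeautifulSoup(HTML, "html.parser")

# regex: match tag names against a compiled pattern (here: names starting with "b")
by_regex = soup.find_all(re.compile("^b"))

# attrs: a dict of attribute names and required values
by_attrs = soup.find_all("div", attrs={"id": "wrapper"})

# list of tags to find: any tag whose name is in the list
by_list = soup.find_all(["table", "p"])

# a function: called with each tag, keeps the tag when it returns True
by_func = soup.find_all(lambda tag: tag.has_attr("class") and tag.name != "div")

# keyword as argument: attribute filter passed as a keyword
by_keyword = soup.find_all(id="wrapper")

# CSS class shortcut: a plain string in the second position is a class name
by_class = soup.find_all("p", "intro")

# calling the soup (or a tag) is the same as calling find_all on it
by_call = soup("p")

print([tag.name for tag in by_list])
```

All of these return a list of matching tags in document order, so the results can be indexed or passed to `len` as in the traversal examples earlier.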
