I am not getting right contents from the url #23

kevinbsc · 2015-08-06T04:44:33Z

Great tool!

url = "http://news.bbc.co.uk/2/hi/health/2284783.stm"

I am not getting the right contents from the URL. The URL above is used as an example in the NLTK book:
http://www.nltk.org/book_1ed/ch03.html

rodricios · 2015-08-06T18:31:46Z

Hi @kevinbschae, are you using v2 of eatiht? If so, please go back to using v1. v2 is arguably less accurate than v1 (an impression backed by testing, which I plan to address properly when time permits).

Here's what I get when I use v1:

from eatiht import eatiht

url = "http://news.bbc.co.uk/2/hi/health/2284783.stm"

print(eatiht.extract(url))

Output:

The last natural blondes will die out within 200 years, scientists believe. A study by experts in Germany suggests people with blonde hair are an endangered species and will become extinct by 2202. Researchers predict the last truly natural blonde will be born in Finland - the country with the highest proportion of blondes. The frequency of blondes may drop but they won't disappear ...
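Extractors usually return the wrong text because they score the wrong region of the page. Below is a minimal Python 3, stdlib-only sketch of one such heuristic: pick the container element whose text holds the most sentence endings. It is an assumption made for illustration and not eatiht's actual implementation. `TextNodeCollector`, `extract_main_text`, the sentence regex and the sample `page` are all hypothetical.

```python
import re
from collections import Counter
from html.parser import HTMLParser

# Hypothetical sentence detector: ".", "!" or "?" followed by whitespace or end of text.
SENTENCE_END = re.compile(r"[.!?](?=\s|$)")
VOID_TAGS = {"br", "hr", "img", "input", "link", "meta"}
SKIP_TAGS = {"script", "style"}


class TextNodeCollector(HTMLParser):
    """Collects (container_id, text) pairs, where container_id identifies the
    element one level above the text node's direct parent (e.g. the <div> around a <p>)."""

    def __init__(self):
        super().__init__()
        self.stack = []   # currently open elements as (tag, unique_id)
        self.next_id = 0
        self.nodes = []   # (container_id, text)

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return  # void elements never get a closing tag, so don't push them
        self.stack.append((tag, self.next_id))
        self.next_id += 1

    def handle_endtag(self, tag):
        # Pop up to and including the matching open tag; ignore stray end tags.
        if any(t == tag for t, _ in self.stack):
            while self.stack:
                if self.stack.pop()[0] == tag:
                    break

    def handle_data(self, data):
        text = data.strip()
        if not text or len(self.stack) < 2 or self.stack[-1][0] in SKIP_TAGS:
            return
        self.nodes.append((self.stack[-2][1], text))


def extract_main_text(html):
    collector = TextNodeCollector()
    collector.feed(html)
    collector.close()
    # Score each container by the number of sentence endings in its text nodes.
    scores = Counter()
    for container, text in collector.nodes:
        scores[container] += len(SENTENCE_END.findall(text))
    if not scores:
        return ""
    best, _ = scores.most_common(1)[0]
    return " ".join(text for container, text in collector.nodes if container == best)


page = """
<html><body>
  <div id="nav"><ul><li>Home</li><li>Health</li><li>Sport</li></ul></div>
  <div id="story">
    <p>The last natural blondes will die out within 200 years, scientists believe.</p>
    <p>A study by experts in Germany suggests blondes are an endangered species. They will become extinct by 2202.</p>
  </div>
  <div id="footer"><p>Provide feedback.</p></div>
</body></html>
"""

print(extract_main_text(page))
```

On this toy page the story `<div>` scores 3 sentence endings, the footer 1 and the navigation list 0, so only the story paragraphs are returned. On real pages like the BBC one, sidebars and related-story lists compete with the article, which is where heuristics like this can go wrong.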
