I am not getting right contents from the url #23

Open
kevinbsc opened this issue Aug 6, 2015 · 1 comment

Comments

kevinbsc commented Aug 6, 2015

Great tool!

url = "http://news.bbc.co.uk/2/hi/health/2284783.stm"

I am not getting the right content from this URL. The same URL is used as an example in the NLTK book:
http://www.nltk.org/book_1ed/ch03.html

@rodricios
Owner

Hi @kevinbschae, are you using v2 of eatiht? If so, please go back to using v1. v2 is arguably less accurate than v1 (I plan to properly address that empirically supported sentiment when time permits).

Here's what I get when I use v1:

from eatiht import eatiht

url = "http://news.bbc.co.uk/2/hi/health/2284783.stm"

print(eatiht.extract(url))

Output:

The last natural blondes will die out within 200 years, scientists believe. A study by experts in Germany suggests people with blonde hair are an endangered species and will become extinct by 2202. Researchers predict the last truly natural blonde will be born in Finland - the country with the highest proportion of blondes. The frequency of blondes may drop but they won't disappear ...
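
To tell whether a given version is returning the right content, one option is to check the extracted text for a phrase known to be in the article, such as the opening words of the output above. The sketch below is only an illustration: `contains_expected` and `check_extractor` are hypothetical helper names, not part of eatiht, and the extractor is passed in as a plain callable so the check itself makes no assumptions about the library.

```python
def contains_expected(extracted, expected_phrase):
    """Return True if expected_phrase occurs in extracted.

    Case and runs of whitespace are ignored, since extracted text
    often differs from the page in line breaks and spacing.
    """
    def normalize(s):
        return " ".join(s.split()).lower()

    return normalize(expected_phrase) in normalize(extracted)


def check_extractor(extract, url, expected_phrase):
    """Call extract(url) and report whether the phrase was found.

    `extract` is any callable that takes a URL and returns text
    (hypothetical parameter; pass in the extractor you want to test).
    """
    return contains_expected(extract(url), expected_phrase)
```

With v1 imported as in the snippet above, `check_extractor(eatiht.extract, url, "The last natural blondes will die out")` would be expected to return `True` if the extraction matches the output shown here.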
