gpip/aiounfurl

aiounfurl

With this library you can extract metadata from web pages and build site previews. It draws on four sources of information:

  1. oEmbed
  2. Open Graph
  3. Twitter Cards
  4. HTML meta tags
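To illustrate how the three HTML-based sources can be combined into one preview, here is a minimal sketch that uses only the standard library's html.parser (aiounfurl itself depends on beautifulsoup4 and html5lib). The names MetaTagCollector and build_preview, and the precedence order (Open Graph, then Twitter Cards, then plain meta tags, then <title>), are assumptions for illustration and not aiounfurl's actual logic. oEmbed is left out because it needs an HTTP request to a provider endpoint.

```python
from html.parser import HTMLParser


class MetaTagCollector(HTMLParser):
    """Hypothetical helper: collect <title> and <meta> tags from an HTML page."""

    def __init__(self):
        super().__init__()
        self.og = {}         # Open Graph: <meta property="og:...">
        self.twitter = {}    # Twitter Cards: <meta name="twitter:...">
        self.meta = {}       # plain HTML meta tags, e.g. <meta name="description">
        self.title = None
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta":
            # Open Graph uses "property", Twitter Cards and plain tags usually "name"
            key = attrs.get("property") or attrs.get("name")
            content = attrs.get("content")
            if not key or content is None:
                return
            key = key.lower()
            if key.startswith("og:"):
                self.og[key[3:]] = content
            elif key.startswith("twitter:"):
                self.twitter[key[8:]] = content
            else:
                self.meta[key] = content

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title and self.title is None:
            self.title = data.strip()


def build_preview(html):
    """Merge the sources; the precedence order here is an assumption."""
    parser = MetaTagCollector()
    parser.feed(html)
    parser.close()

    def pick(field):
        return parser.og.get(field) or parser.twitter.get(field) or parser.meta.get(field)

    return {
        "title": pick("title") or parser.title,
        "description": pick("description"),
        "image": pick("image"),
    }


if __name__ == "__main__":
    sample = """<html><head>
    <title>Fallback title</title>
    <meta property="og:title" content="OG title">
    <meta name="twitter:description" content="Card description">
    <meta name="description" content="Meta description">
    </head><body></body></html>"""
    print(build_preview(sample))
```

With this sample, the Open Graph title wins over <title>, and the Twitter Cards description wins over the plain meta description because no og:description is present.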

Requirements

  • Python 3.5
  • aiohttp
  • beautifulsoup4
  • html5lib

Installation

pip install aiounfurl

Usage example

To extract all available data for a list of links:

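import asyncio
import aiohttp
from pprint import pprint
from aiounfurl.views import get_preview_data, fetch_all


async def get_links_data(links, loop):
    # Fetch every link concurrently, sharing one HTTP session
    async with aiohttp.ClientSession() as session:
        tasks = [fetch_all(session, link, loop) for link in links]
        # return_exceptions=True keeps one failing link from hiding the others' results;
        # failures show up in results as exception objects
        results = await asyncio.gather(*tasks, return_exceptions=True)
    return [{'link': link, 'data': data} for link, data in zip(links, results)]


links = [
    'https://habrahabr.ru/post/314606/',
    'https://www.youtube.com/watch?v=9EftQMnuhvU',
    'https://medium.freecodecamp.com/million-requests-per-second-with-python-95c137af319'
]
# Create the loop explicitly: calling asyncio.get_event_loop() with no running loop
# is deprecated in recent Python versions
loop = asyncio.new_event_loop()
asyncio.set_event_loop(loop)
try:
    result = loop.run_until_complete(get_links_data(links, loop))
finally:
    loop.close()
pprint(result)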
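Because the example passes return_exceptions=True to asyncio.gather, a link that fails to load appears in the result as an exception object instead of a data dict. A minimal sketch of separating successes from failures follows; fake_fetch is a hypothetical stand-in for aiounfurl's fetch_all so the snippet runs without network access, and asyncio.run requires Python 3.7 or later.

```python
import asyncio


async def fake_fetch(link):
    # Hypothetical stand-in for a real fetch; fails for any link containing "bad"
    await asyncio.sleep(0)
    if "bad" in link:
        raise ValueError("could not fetch " + link)
    return {"title": "Preview of " + link}


async def get_links_data(links):
    tasks = [fake_fetch(link) for link in links]
    # With return_exceptions=True, errors are returned in place of results
    # instead of the first one being raised out of gather
    results = await asyncio.gather(*tasks, return_exceptions=True)
    ok, failed = [], []
    for link, data in zip(links, results):
        if isinstance(data, BaseException):
            failed.append({"link": link, "error": str(data)})
        else:
            ok.append({"link": link, "data": data})
    return ok, failed


if __name__ == "__main__":
    ok, failed = asyncio.run(get_links_data(["https://example.com/", "https://bad.example/"]))
    print(ok)
    print(failed)
```

Keeping failures alongside their links makes it easy to log or retry only the pages that could not be previewed.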

Server example

The full example is available in the example directory of the repository.

Install the packages required to run the example:

pip install -r example/requirements.txt

Run python srv.py runserver, then open http://127.0.0.1:8080/ in your browser.

Running the example in Docker

I published a Docker image with the example on Docker Hub (http://hub.docker.com/), so the sample can run as a standalone service.

To run it in the background:

docker run --name aiounfurl -p 8080:8080 -d tigorc/aiounfurl

Then open the example at http://127.0.0.1:8080/.

To use a custom list of oEmbed providers, first create a JSON file listing the providers (here /path_to_file/providers.json), then mount it into the container and set OEMBED_PROVIDERS_FILE to its path inside the container:

docker run --name aiounfurl -p 8080:8080 -e "OEMBED_PROVIDERS_FILE=/srv/app/providers.json" -v /path_to_file/providers.json:/srv/app/providers.json -d tigorc/aiounfurl
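The README does not spell out the layout of providers.json. A minimal sketch, assuming it follows the public oEmbed provider list format (a JSON array of providers, each with endpoints that carry URL schemes and an endpoint url); check aiounfurl's source before relying on this. The helper names write_providers, scheme_to_regex and find_oembed_url are hypothetical, not part of aiounfurl. The sketch also shows how a provider entry maps a page URL to an oEmbed request.

```python
import json
import re
from urllib.parse import urlencode

# Assumption: same layout as the public oEmbed provider list; one illustrative entry
PROVIDERS = [
    {
        "provider_name": "YouTube",
        "provider_url": "https://www.youtube.com/",
        "endpoints": [
            {
                "schemes": ["https://*.youtube.com/watch*", "https://youtu.be/*"],
                "url": "https://www.youtube.com/oembed",
            }
        ],
    }
]


def write_providers(path):
    """Hypothetical helper: write the provider list to a JSON file."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(PROVIDERS, f, indent=2)


def scheme_to_regex(scheme):
    # oEmbed URL schemes use "*" as a wildcard; everything else is matched literally
    return re.compile("^" + re.escape(scheme).replace(r"\*", ".*") + "$")


def find_oembed_url(providers, page_url):
    """Return the oEmbed request URL for page_url, or None if no scheme matches."""
    for provider in providers:
        for endpoint in provider.get("endpoints", []):
            for scheme in endpoint.get("schemes", []):
                if scheme_to_regex(scheme).match(page_url):
                    query = urlencode({"url": page_url, "format": "json"})
                    return endpoint["url"] + "?" + query
    return None
```

Calling write_providers('/path_to_file/providers.json') would produce a file that can be mounted as in the command above.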

Tests

Install the tox package and run:

tox

About

Making site previews
