This repository contains tools for processing data from Google Scholar. The tools are written in Python and use the Beautiful Soup library for most of the HTML parsing. Install Beautiful Soup before running them if you do not already have it.
This script scrapes a given author's publication page and returns a link to each publication listed on the first page of their profile.
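As a rough illustration of the kind of link extraction described above, here is a minimal sketch using Beautiful Soup. It is not the repository's actual code: the sample HTML, the `publication` class name, the `extract_publications` function, and the `https://example.org/` base URL are all assumptions made for the example, and the real Google Scholar markup is different.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup

# Hypothetical markup standing in for an author's page; the real
# Google Scholar HTML is different and changes over time.
SAMPLE_HTML = """
<html><body>
  <h1>Jonathan Hurlock</h1>
  <table>
    <tr><td><a class="publication" href="/citations?view=paper1">Searching Twitter: Separating the Tweet from the Chaff.</a></td></tr>
    <tr><td><a class="publication" href="/citations?view=paper2">Keyword clouds: having very little effect on sensemaking in web search engines</a></td></tr>
    <tr><td><a href="/help">Help</a></td></tr>
  </table>
</body></html>
"""


def extract_publications(html, base_url):
    """Return (title, absolute_url) pairs for each publication link."""
    soup = BeautifulSoup(html, "html.parser")
    publications = []
    # The "publication" class is an assumption made for this sketch.
    for anchor in soup.find_all("a", class_="publication"):
        title = anchor.get_text(strip=True)
        # Relative hrefs are resolved against the page's base URL.
        url = urljoin(base_url, anchor["href"])
        publications.append((title, url))
    return publications


if __name__ == "__main__":
    for title, url in extract_publications(SAMPLE_HTML, "https://example.org/"):
        print(f"{title} ==> {url}")
```

Selecting anchors by a class keeps unrelated links (such as the "Help" link in the sample) out of the result; a real scraper would need whatever selector matches the live page.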
To scrape an author's page, enter the following on the command line:
$ python "some scholar url"
Replace some scholar url with the Google Scholar link for an individual author. Keep the quotation marks: Scholar URLs usually contain characters such as ? and & that the shell would otherwise interpret.
$ python ""
Publications for Jonathan Hurlock:
Searching Twitter: Separating the Tweet from the Chaff. ==>
Keyword clouds: having very little effect on sensemaking in web search engines ==>
This script scrapes a given publication page. It also tries to retrieve the MIME type of any linked document.
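One common way to find the MIME type of a linked document is to send an HTTP HEAD request and read the `Content-Type` header of the response. The sketch below shows that approach with the standard library only. It is an assumption about how such a lookup could work, not a description of this script's implementation; the function names `parse_mime_type` and `fetch_mime_type` are hypothetical.

```python
import urllib.request


def parse_mime_type(content_type_header):
    """Strip parameters such as charset from a Content-Type header value."""
    if not content_type_header:
        return None
    return content_type_header.split(";", 1)[0].strip().lower()


def fetch_mime_type(url, timeout=10):
    """Send a HEAD request and return the MIME type the server reports."""
    request = urllib.request.Request(url, method="HEAD")
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return parse_mime_type(response.headers.get("Content-Type"))


if __name__ == "__main__":
    # No network access is needed for these header-parsing examples.
    print(parse_mime_type("application/pdf"))
    print(parse_mime_type("text/html; charset=UTF-8"))
```

A HEAD request avoids downloading the whole document, although some servers refuse HEAD or report an inaccurate type, so a real tool may need a fallback.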
To scrape a publication page, enter the following on the command line:
$ python "some publication url"
Replace some publication url with the publication's Google Scholar link. Keep the quotation marks around the URL.
$ python ""
URL Scraped:
Title: Searching Twitter: Separating the Tweet from the Chaff.
Paper URL:
Paper File Type: application/pdf
Authors: ['Jonathan Hurlock', 'Max L Wilson']
Desription: Abstract Within the millions of digital communications posted in online social networks, thereis undoubtedly some valuable and useful information. Although a large portion of socialmedia content is considered to be babble, research shows that people share useful links,provide recommendations to friends, answer questions, and solve problems. In this paper,we report on a qualitative investigation into the different factors that make tweets 'useful'and'not useful'for a set of common search tasks. The investigation found 16 features that help...