Profile crawling, specific reaction data crawling, adding error handling #56

Open
wants to merge 3 commits into
base: master

@laols574 laols574 commented Feb 4, 2020

ADDITIONALLY:
In comments.py, there is a section of code that begins with "if back". This part checks whether the crawler needs to iterate upwards to get the rest of the comments in the replies. However, for some comments the first link visited from the main comment page points into the middle of the reply thread, because FB displays a middle comment on the main page due to its popularity. One example:

https://mbasic.facebook.com/comment/replies/?ctoken=10162169751605725_10162170377070725&p=129&count=168&pc=1&ft_ent_identifier=10162169751605725&gfid=AQBjT1xFFeGcZxyW&refid=52&__tn__=R

To avoid missing the replies that come after such an entry, you should change:

back = response.xpath('//div[contains(@id,"comment_replies_more_1")]/a/@href').extract()

to

back = response.xpath('//div[contains(@id,"comment_replies_more_2")]/a/@href').extract()

so that the algorithm iterates forwards as well. Afterwards, you have to merge the two separately generated CSV files (one from each run). This ended up being the easiest solution for me, but it is definitely possible to do this within a single program.
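
The merge step could look roughly like the sketch below. It is not part of this PR: the function name and the file names are hypothetical, and it assumes that both CSV files share the same header row and that an exact duplicate row is the same comment scraped twice.

```python
import csv


def merge_comment_csvs(backward_path, forward_path, merged_path):
    """Concatenate two CSV exports into one file, skipping exact duplicate rows.

    Assumption: both files start with the same header row.
    Returns the number of data rows written.
    """
    seen = set()
    header = None
    with open(merged_path, "w", newline="", encoding="utf-8") as out:
        writer = csv.writer(out)
        for path in (backward_path, forward_path):
            with open(path, newline="", encoding="utf-8") as src:
                reader = csv.reader(src)
                file_header = next(reader, None)
                if file_header is None:
                    continue  # empty file, nothing to merge
                if header is None:
                    # First non-empty file defines the header of the output.
                    header = file_header
                    writer.writerow(header)
                elif file_header != header:
                    raise ValueError(f"header mismatch in {path}")
                for row in reader:
                    key = tuple(row)
                    if key not in seen:
                        seen.add(key)
                        writer.writerow(row)
    return len(seen)
```

For example, `merge_comment_csvs("comments_back.csv", "comments_forward.csv", "comments_all.csv")` would combine the output of the "comment_replies_more_1" run with the output of the "comment_replies_more_2" run; the file names here are placeholders.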

Added additional items to gather more information about the user's profile and the specific reactions.
See comments for details on the changes: I added random time pauses and crawled specific reaction and profile data.
Updated error handling.
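
As a side note on the random time pauses: assuming the crawler is a Scrapy project (which the response.xpath(...).extract() calls suggest), a similar effect can be sketched with Scrapy's built-in download delay settings instead of pauses in spider code. This is not how the PR implements its pauses, and the delay value below is only an illustration.

```python
# settings.py fragment (illustrative values, not taken from this PR)

# Base delay in seconds between consecutive requests to the same site.
DOWNLOAD_DELAY = 3

# With this enabled, Scrapy waits a random interval between
# 0.5 * DOWNLOAD_DELAY and 1.5 * DOWNLOAD_DELAY instead of a fixed delay.
RANDOMIZE_DOWNLOAD_DELAY = True
```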