Profile crawling, specific reaction data crawling, adding error handling #56

laols574 · 2020-02-04T22:33:20Z

ADDITIONALLY:
In comments.py, there is a section of code that begins with "if back". This part checks whether it needs to iterate upwards to get the rest of the comments in the replies. However, for some comments the first link visited from the main comment page points into the middle of the reply thread, because FB displays a middle comment on the main page due to its popularity. One example is

https://mbasic.facebook.com/comment/replies/?ctoken=10162169751605725_10162170377070725&p=129&count=168&pc=1&ft_ent_identifier=10162169751605725&gfid=AQBjT1xFFeGcZxyW&refid=52&__tn__=R

Iterating upwards alone misses the replies that come after such a comment. To avoid missing these entries, you should change:

back = response.xpath('//div[contains(@id,"comment_replies_more_1")]/a/@href').extract()

to

back = response.xpath('//div[contains(@id,"comment_replies_more_2")]/a/@href').extract()

in order to get the algorithm to iterate forwards as well. Afterwards, you have to merge the two separately generated CSV files. This ended up being the easiest solution for me, but it could certainly be done within a single program.
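The merge step described above could be sketched roughly as follows. This is a minimal sketch using only the standard library, not code from this PR: the function name merge_csv_files and the key_fields parameter are hypothetical, and it assumes both CSV files have a header row. Rows that appear in both files (for example the comment where the two crawls overlap) are written only once.

```python
import csv


def merge_csv_files(paths, out_path, key_fields=None):
    """Merge CSV files that have a header row, dropping duplicate rows.

    key_fields: column names that identify a row (hypothetical parameter);
    None means the whole row is compared. Returns the number of rows written.
    """
    seen = set()
    fieldnames = []
    rows = []
    for path in paths:
        with open(path, newline="", encoding="utf-8") as f:
            reader = csv.DictReader(f)
            # Collect the union of column names, keeping first-seen order.
            for name in reader.fieldnames or []:
                if name not in fieldnames:
                    fieldnames.append(name)
            for row in reader:
                cols = key_fields or reader.fieldnames
                key = tuple((c, row.get(c, "")) for c in cols)
                if key in seen:
                    continue  # already written from an earlier file
                seen.add(key)
                rows.append(row)
    with open(out_path, "w", newline="", encoding="utf-8") as f:
        # restval fills columns that a given input file did not have.
        writer = csv.DictWriter(f, fieldnames=fieldnames, restval="")
        writer.writeheader()
        writer.writerows(rows)
    return len(rows)
```

For example, with the placeholder file names comments_backward.csv and comments_forward.csv, calling merge_csv_files(["comments_backward.csv", "comments_forward.csv"], "comments_merged.csv") would write the combined rows to comments_merged.csv.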

Added additional items to gather more information about the profile of the user and the specific reactions.

See comments for details on the changes; I added random time pauses and crawled specific reaction and profile data.

Updated error handling.
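The random time pauses mentioned above could look roughly like the sketch below. This is an illustration only, not the code from comments.py: the helper name random_pause and its default bounds are assumptions, and the sleep parameter exists only so the helper can be exercised without actually waiting.

```python
import random
import time


def random_pause(min_seconds=1.0, max_seconds=3.0, sleep=time.sleep):
    """Sleep for a random duration in [min_seconds, max_seconds].

    Hypothetical helper; returns the delay that was used.
    """
    if min_seconds < 0 or max_seconds < min_seconds:
        raise ValueError("expected 0 <= min_seconds <= max_seconds")
    delay = random.uniform(min_seconds, max_seconds)
    sleep(delay)
    return delay
```

If the crawler is Scrapy-based, as the response.xpath calls suggest, a blocking time.sleep inside a callback stalls the whole crawl while it waits; Scrapy's DOWNLOAD_DELAY and RANDOMIZE_DOWNLOAD_DELAY settings are an alternative way to get randomized delays between requests.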

laols574 added 3 commits February 4, 2020 14:45

Update items.py

9f0149f

added additional items to gather more information about the profile of the user and the specific reactions

Update comments.py

25cf067

see comments for details on changes, but I add random time pauses, crawled specific reaction and profile data

Update fbcrawl.py

d7ba4bc

updated error handling
