wiki_cralwler

Introduction

This folder contains a crawler for the infobox (biography vcard) on the Wikipedia pages of famous scientists. Its main goal is to collect each person's infobox fields, such as 'Doctoral advisor', 'Doctoral students', 'Born', 'Died', 'Nationality', 'Awards', 'Fields' and so on.
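As an illustration of what collecting infobox fields can look like, here is a minimal sketch that uses only the Python 3 standard library. It is not the code from juli_nm.py. The class name `InfoboxParser` and the hand-written `SAMPLE` fragment are assumptions made for this example, and a real Wikipedia page has a more complicated infobox than this.

```python
from html.parser import HTMLParser


class InfoboxParser(HTMLParser):
    """Hypothetical helper: collect label -> value pairs from infobox rows."""

    def __init__(self):
        super().__init__()
        self.depth = 0     # 1 while inside the infobox table, >1 in nested tables
        self.cell = None   # "th" or "td" while inside a top-level cell
        self.label = []    # text pieces of the current row's <th>
        self.value = []    # text pieces of the current row's <td>
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if tag == "table" and (self.depth or "infobox" in classes):
            self.depth += 1
        elif self.depth == 1 and tag == "tr":
            self.label, self.value = [], []
        elif self.depth == 1 and tag in ("th", "td"):
            self.cell = tag

    def handle_endtag(self, tag):
        if tag == "table" and self.depth:
            self.depth -= 1
        elif self.depth == 1 and tag in ("th", "td"):
            self.cell = None
        elif self.depth == 1 and tag == "tr":
            # Collapse runs of whitespace before storing the row.
            label = " ".join("".join(self.label).split())
            value = " ".join("".join(self.value).split())
            if label and value:
                self.fields[label] = value

    def handle_data(self, data):
        if self.cell == "th":
            self.label.append(data)
        elif self.cell == "td":
            self.value.append(data)


# Hand-written fragment standing in for a downloaded page (an assumption).
SAMPLE = """
<table class="infobox biography vcard">
  <tr><th>Born</th><td>7 November 1867</td></tr>
  <tr><th>Doctoral advisor</th><td><a href="/wiki/Gabriel_Lippmann">Gabriel Lippmann</a></td></tr>
  <tr><th>Fields</th><td>Physics, <a>chemistry</a></td></tr>
</table>
"""

parser = InfoboxParser()
parser.feed(SAMPLE)
print(parser.fields)
```

Rows that lack either a label or a value are skipped, so header rows containing only a name or an image do not end up in the result.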

More importantly, the crawler does not stop at Marie_Curie: it also covers her doctoral advisor and doctoral students, then her doctoral advisor's doctoral advisor and her doctoral students' doctoral students, and so on. Ideally, every scholar in the whole academic circle will be crawled. Really amazing!!!
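The advisor-and-student expansion described above can be sketched as a breadth-first traversal with a visited set, so that each person is processed once even though advisors and students link back to each other. This is only a sketch of one way to do it, not necessarily how the script works. The function `crawl`, its `get_links` and `max_pages` parameters, and the `TOY_LINKS` data (including the placeholder names `Student_A`, `Advisor_X`, etc.) are all hypothetical.

```python
from collections import deque


def crawl(start, get_links, max_pages=100):
    """Breadth-first walk over advisor/student links, visiting each page once."""
    seen = {start}
    queue = deque([start])
    order = []
    while queue and len(order) < max_pages:
        page = queue.popleft()
        order.append(page)
        for nxt in get_links(page):
            if nxt not in seen:   # skip people already queued or visited
                seen.add(nxt)
                queue.append(nxt)
    return order


# Hypothetical toy data standing in for links read from each infobox.
TOY_LINKS = {
    "Marie_Curie": ["Gabriel_Lippmann", "Student_A", "Student_B"],
    "Gabriel_Lippmann": ["Marie_Curie", "Advisor_X"],
    "Student_A": ["Marie_Curie", "Student_C"],
}

order = crawl("Marie_Curie", lambda page: TOY_LINKS.get(page, []))
print(order)
```

In a real run, `get_links` would download a page and return the names found under 'Doctoral advisor' and 'Doctoral students'. The `max_pages` cap keeps the crawl bounded, since the academic network can grow very large.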

The crawler script is flexible: you can revise it yourself if you want to collect more information about a scientist, or to crawl another person's infobox biography.

Work environment

Windows + Python3 + PyCharm

Python2 also works, as do Linux and Mac; you just need to change the code a little.

Notice

If you want to run the script, put the two files (juli_nm.py and uid.txt) in the same directory; otherwise errors will occur.

The script is rough and contains many 'if' statements. I would really appreciate it if you could revise the script and make it more modular, which would make it clearer and reduce the amount of code.

Acknowledgement

I sincerely thank my junior fellow apprentice Hepeichao. With his help, I successfully completed the script.