Because the TikTok API changes constantly and I ran into issues with several of the TikTok APIs and Python modules currently available, I built this scraper to help with data acquisition from the platform.
You can either download the ZIP file from GitHub or simply clone the repository to your environment.
This script requires a TikTok account. Once you have created one, create a .env
file in the main folder of the project and add your credentials to it.
The folder structure will look like this:
.
├── README.md
├── .env
├── modules
│ ├── general_classes.py
│ └── tiktok_elements.json
├── requirements.txt
├── results
│ ├── json
│ ├── posts_list_#hashtag.csv
│ └── user_list_#hashtag.csv
└── tiktok_scraper.py
The .env file holds the credentials in this form:
username = your_tiktok_username
password = your_tiktok_password
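A minimal sketch of how credentials in this `key = value` form could be read, using only the standard library. The changelog mentions dotenv support, so the script itself presumably relies on that package; the helper name `read_env_file` and its parsing rules are assumptions for illustration, not the project's code.

```python
from pathlib import Path


def read_env_file(path=".env"):
    """Parse simple `key = value` lines into a dict (hypothetical helper)."""
    values = {}
    for raw_line in Path(path).read_text(encoding="utf-8").splitlines():
        line = raw_line.strip()
        # Skip blank lines, comments, and lines without a key/value separator.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Trim whitespace and optional surrounding quotes.
        values[key.strip()] = value.strip().strip("'\"")
    return values


if __name__ == "__main__":
    env_path = Path(".env")
    if env_path.exists():
        credentials = read_env_file(env_path)
        # Print only the key names so the password never reaches the terminal.
        print(sorted(credentials))
```

Keeping the credentials in a separate file means they stay out of the source code; adding `.env` to `.gitignore` also keeps them out of the repository.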
- Install the requirements by running
pip3 install -r requirements.txt
- After the requirements are installed, run
python3 tiktok_scraper.py
- If everything goes well, a Chrome browser window will open on TikTok's login page
- You will have to log in to TikTok manually; this script does not capture or store any of the user's login information
- After you log in, type the keyword you want to search for on TikTok
- Once you type the keyword and press Enter, the script will scrape the front-page results of the search query and continue working on its own. You can follow what the script is doing in the browser window
For Mac users, Chromedriver is run from within the script, so no additional installation is needed. For Linux and Windows users, Chromedriver must be installed separately. Reference: https://www.makeuseof.com/how-to-install-selenium-webdriver-on-any-computer-with-python/
Additional reference for Windows users: http://www.learningaboutelectronics.com/Articles/How-to-install-chromedriver-Python-windows.php
On Linux, make sure to update the Chromedriver path on this line of code:
driver = webdriver.Chrome(executable_path="/usr/bin/chromedriver", options=options)
On Windows, make sure to update the Chromedriver path on this line of code:
driver = webdriver.Chrome(executable_path="PATH_TO_CHROMEDRIVER.exe", options=options)
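The Linux and Windows lines differ only in the path, so the choice could also be made at run time. The sketch below is an assumption about how that might look, not the script's actual code: `chromedriver_path` and `build_driver` are hypothetical names, the Windows entry is the same placeholder used above, and returning no path on macOS is assumed. It passes the path through a `Service` object, because newer Selenium 4 releases no longer accept the `executable_path` argument on `webdriver.Chrome`.

```python
import platform

# Paths taken from this README; adjust them to your own installation.
CHROMEDRIVER_PATHS = {
    "Linux": "/usr/bin/chromedriver",
    "Windows": "PATH_TO_CHROMEDRIVER.exe",
}


def chromedriver_path(system=None):
    """Return the configured Chromedriver path for an OS name, or None."""
    return CHROMEDRIVER_PATHS.get(system or platform.system())


def build_driver():
    """Create a Chrome driver (requires selenium and Chrome to be installed)."""
    # Imported lazily so the path helper above works without selenium.
    from selenium import webdriver
    from selenium.webdriver.chrome.service import Service

    options = webdriver.ChromeOptions()
    path = chromedriver_path()
    # With no explicit path, Selenium falls back to its default driver lookup.
    service = Service(executable_path=path) if path else Service()
    return webdriver.Chrome(service=service, options=options)
```

If your installed Selenium version still accepts `executable_path` directly, the original one-line form keeps working and this helper is optional.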
As a result, the script will save a JSON file with the following structure:
{
  "7079374480363048198":{
    "postURL":"https://www.tiktok.com/@planetmatters/video/7079374480363048198",
    "postcontent":"Let's clean up the ocean! 💙😊 #plasticpollution #climatechange #oceancleanup #sealover #foryou #fypシ ",
    "commentsCount":"46.8K comments",
    "comments":{
      "@planetmatters":{
        "Userlink":"https://www.tiktok.com/@planetmatters",
        "UserName":"Planet Matters 🌍 ",
        "UserFollowing":"1",
        "UserFollowers":"2.9M",
        "UserLikes":"29.3M",
        "ReplyContent":"Thanks for the video @treasurehawaii 🎥💙",
        "Replylikes":"3",
        "replies":0
      }
    }
  }
}
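Counts in this output are stored as display strings such as "46.8K comments" or "2.9M". Below is a minimal sketch for turning them into integers and flattening the comments into rows, assuming that is what you need for analysis. `parse_count` and `comment_rows` are hypothetical helpers, not part of the scraper, and the K/M/B suffix handling is an assumption based on the sample above.

```python
import re
from decimal import Decimal

_MULTIPLIERS = {"": 1, "K": 1_000, "M": 1_000_000, "B": 1_000_000_000}
# A number, optionally followed by a K/M/B suffix, e.g. "46.8K comments".
_COUNT_RE = re.compile(r"^\s*(\d+(?:\.\d+)?)\s*([KMB]?)\b", re.IGNORECASE)


def parse_count(text):
    """Convert a display count to an int, or return None if it cannot be parsed."""
    match = _COUNT_RE.match(str(text))
    if match is None:
        return None
    number, suffix = match.groups()
    # Decimal avoids float rounding surprises such as 46.8 * 1000.
    return int(Decimal(number) * _MULTIPLIERS[suffix.upper()])


def comment_rows(results):
    """Flatten the nested result dict into one row per comment."""
    rows = []
    for post_id, post in results.items():
        for handle, comment in post.get("comments", {}).items():
            rows.append({
                "postId": post_id,
                "user": handle,
                "followers": parse_count(comment.get("UserFollowers")),
                "likes": parse_count(comment.get("UserLikes")),
                "replyLikes": parse_count(comment.get("Replylikes")),
            })
    return rows
```

To use it, load the saved file with `json.load` and pass the resulting dictionary to `comment_rows`. Note that abbreviated values are rounded by TikTok's display, so "2.9M" becomes 2900000 rather than the exact follower count.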
- Optimize script
- Introduce concurrent futures and parallel processing
Done
- Added Pandas and overall CSV export support
- Added dotenv support