Twitter Data Analysis Tool

中文 | English

📚 Overview

This project provides a comprehensive pipeline for analyzing Twitter (X) data, including crawling tweets, retrieving comments and user information based on keywords and date ranges, and further performing keyword extraction, sentiment analysis, and statistical descriptions.

🛠️ Features

1. Twitter Data Crawling

Crawl tweets, user information, and comments based on specified keywords and date ranges.
Extract structured data, including:
- Tweet metadata (content, likes, retweets, replies, and timestamps).
- User information (followers, location, etc.).
- Comments related to the tweets.

2. Data Cleaning and Preprocessing

Convert JSON to Excel:
- Merge and process raw JSON files containing tweets, user information, and comments.
- Convert data into structured Excel files (.xlsx) with three sheets:
  - Main Data: Detailed tweet information.
  - Comments: Associated comment data.
  - Users: User profile data.
Preprocessing Methods:
- Remove duplicate and irrelevant records.
- Clean special characters (e.g., removing @ mentions).
- Standardize date formats and handle missing values.

3. Sentiment Analysis

Classify tweets and comments into predefined sentiment categories using advanced models.

Classification System Prompt:

You are an advanced language model trained to analyze the sentiment of a given text. Your task is to classify the text into one of the following categories:

1. Positive
2. Negative
3. Objective
4. Sarcastic
5. Neutral
6. Disappointment
7. Disgust
8. Contempt
9. Hate

For the input text, determine the most appropriate category based on its emotional tone, intent, and context. Return only the category name as the output, with no additional explanation or commentary.

Supported sentiment categories include:
- Positive, Negative, Neutral, Sarcastic, Hate, Disappointment, Contempt, Disgust.
Visualize sentiment trends over time.

4. Language and Cultural Background Analysis

Analyze language distribution in tweets and comments.
Generate:
- Language distribution pie charts.
- Language profiles to showcase cultural trends.

5. Keyword and Topic Analysis

Extract keywords and topics using:
- TF-IDF: Identify frequently discussed topics.
- TextRank: Highlight key phrases using graph-based ranking methods.
Create word clouds and bar charts for intuitive visualization of keyword information.

6. Statistical and Engagement Analysis

Measure tweet engagement:
- Analyze distributions of likes, retweets, and comments.
- Use bar charts to display total and average engagement.
Explore sentiment distributions and engagement trends over time.

7. Network and Co-occurrence Analysis

Build networks:
- Keyword Co-occurrence: Highlight relationships between frequently mentioned terms.
- Tweet Propagation: Analyze retweet or comment relationships in tweets.
Export networks in .gexf format for visualization in tools like Gephi.

8. Automated Report Generation

Summarize analysis results into a .docx report, including:
- Data insights.
- Sentiment and language trends.
- Keyword highlights and engagement statistics.

📋 Quick Start

1. Clone the Repository

git clone https://github.com/AquariniqueMu/TwitterCrawl4Analysis.git
cd TwitterCrawl4Analysis

2. Install Dependencies

Python version: >=3.8 (Recommended: Python 3.12.5)
Install required libraries:
```
pip install -r requirements.txt
```

3. Running the Pipeline

This project uses Jupyter Notebook files (.ipynb) for all steps of the analysis. Follow the steps below:

(1) Data Crawling

Open twitter_crawl.ipynb.
Configure the keywords, date range, and Twitter account information. Execute each cell to crawl tweets, comments, and user information.

(2) Data Preprocessing

Open json_process.ipynb.
Execute all cells to convert raw JSON files into structured Excel files (.xlsx).

(3) Data Analysis

For specific analysis tasks:
- Sentiment Analysis: Open and run analysis_emotion.ipynb.
- Keyword and Trend Analysis: Open and run analysis_keyword_and_trend.ipynb.
- Language Distribution Analysis: Open and run analysis_language.ipynb.
- User Analysis: Open and run analysis_user.ipynb.

📊 Outputs

Processed Data

Excel Files:
- Three sheets: Main Data, Comments, and Users.

Visualizations

Charts:
- Sentiment distribution bar charts.
- Keyword trends and word clouds.
- Language distribution pie charts.
- Engagement and sentiment trends over time.

🛠️ Key Tools and Libraries

Data Crawling: Selenium, requests, webdriver-manager
Data Processing: Pandas, JSON, OpenPyXL
Sentiment Analysis: Transformers, OpenAI API
Keyword Extraction: SpaCy, TF-IDF, TextRank
Statistical Analysis: Matplotlib, Seaborn, WordCloud
Network Analysis: NetworkX, Gephi

💡 Customization

Keywords: Modify keywords directly in the notebook files to crawl specific topics.
Sentiment Analysis: Adjust the sentiment analysis prompt in analysis_emotion.ipynb for tailored analysis.
Visualizations: Modify chart styles and layouts in the analysis notebooks.

📜 License

This project is open-sourced under the MIT License. See the LICENSE file for details.

🙏 Acknowledgments

Thanks to hanxinkong and CX330Blake for their Twitter crawling tools, as well as Stopwords ISO, which formed the foundation of this project.

Name		Name	Last commit message	Last commit date
Latest commit History 18 Commits
codes		codes
LICENSE		LICENSE
README.md		README.md
README_zh.md		README_zh.md
requirements.txt		requirements.txt

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Twitter Data Analysis Tool

📚 Overview

🛠️ Features

1. Twitter Data Crawling

2. Data Cleaning and Preprocessing

3. Sentiment Analysis

4. Language and Cultural Background Analysis

5. Keyword and Topic Analysis

6. Statistical and Engagement Analysis

7. Network and Co-occurrence Analysis

8. Automated Report Generation

📋 Quick Start

1. Clone the Repository

2. Install Dependencies

3. Running the Pipeline

(1) Data Crawling

(2) Data Preprocessing

(3) Data Analysis

📊 Outputs

Processed Data

Visualizations

🛠️ Key Tools and Libraries

💡 Customization

📜 License

🙏 Acknowledgments

About

Releases

Packages

Languages

License

AquariniqueMu/TwitterCrawl4Analysis

Folders and files

Latest commit

History

Repository files navigation

Twitter Data Analysis Tool

📚 Overview

🛠️ Features

1. Twitter Data Crawling

2. Data Cleaning and Preprocessing

3. Sentiment Analysis

4. Language and Cultural Background Analysis

5. Keyword and Topic Analysis

6. Statistical and Engagement Analysis

7. Network and Co-occurrence Analysis

8. Automated Report Generation

📋 Quick Start

1. Clone the Repository

2. Install Dependencies

3. Running the Pipeline

(1) Data Crawling

(2) Data Preprocessing

(3) Data Analysis

📊 Outputs

Processed Data

Visualizations

🛠️ Key Tools and Libraries

💡 Customization

📜 License

🙏 Acknowledgments

About

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages