ADD: add Douyin and Kuaishou crawlers
lishilong committed May 24, 2024
0 parents commit 20519e9
Showing 51 changed files with 2,133 additions and 0 deletions.
5 changes: 5 additions & 0 deletions .gitignore
@@ -0,0 +1,5 @@
.venv/
__pycache__/
.log/
*.db
*.DS_Store
16 changes: 16 additions & 0 deletions Dockerfile
@@ -0,0 +1,16 @@
FROM python:3

WORKDIR /app

COPY . .

RUN curl -fsSL https://deb.nodesource.com/setup_20.x | bash -
RUN apt-get install -y nodejs

RUN pip3 install -r requirements.txt

ENV THREADS=4

EXPOSE 8080

CMD gunicorn -c config/gunicorn.conf.py -w $THREADS -b :8080 main:app
33 changes: 33 additions & 0 deletions LICENSE
@@ -0,0 +1,33 @@
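Non-Commercial Use License 1.0

Copyright (c) [2024] [[email protected]]

Whereas:

1. The copyright holder owns and controls the copyright in this software and its associated documentation files (hereinafter the "Software");
2. The user wishes to use the Software;
3. The copyright holder is willing to license the user to use the Software under the conditions set out in this License;

Now, therefore, the parties, in compliance with applicable laws and regulations, agree to the following terms:

Scope of License:

1. The copyright holder hereby grants, free of charge, to any natural or legal person who accepts this License (hereinafter the "User") a non-exclusive, non-transferable right to use, copy, modify, and merge the Software for non-commercial purposes, subject to the following conditions.

Conditions:

1. The User must include the above copyright notice and this License notice in all reasonably prominent places in the Software and its copies.
2. The Software may not be used for any commercial purpose, including but not limited to sale, profit-making, or commercial competition.
3. Without the written consent of the copyright holder, the Software may not be put to any commercial use.

Disclaimer:

1. The Software is provided "as is", without warranty of any kind, express or implied, including but not limited to warranties of merchantability, fitness for a particular purpose, and non-infringement.
2. In no event shall the copyright holder be liable for any direct, indirect, incidental, special, exemplary, or consequential damages (including but not limited to procurement of substitute goods or services; loss of use, data, or profits; or business interruption) arising from the use of the Software or in any way connected with the Software, however caused and whether in contract, strict liability, or tort (including negligence or otherwise), even if advised of the possibility of such damages.

Governing Law:

1. The interpretation and enforcement of this License shall follow local laws and regulations.
2. Any dispute arising out of or relating to this License shall be resolved by the parties through friendly negotiation; if negotiation fails, either party may submit the dispute for litigation to the people's court at the copyright holder's location.

This License constitutes the entire agreement between the parties concerning the Software and supersedes and merges all prior discussions, communications, and agreements, whether oral or written.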
16 changes: 16 additions & 0 deletions Makefile
@@ -0,0 +1,16 @@
.PHONY: venv
venv:
	python3 -m venv .venv

.PHONY: install
install:
	. .venv/bin/activate; pip3 install -r requirements.txt

.PHONY: clean
clean:
	rm -rf .venv

port ?= 8080
thread ?= 4
run: venv install
	. .venv/bin/activate; .venv/bin/gunicorn -c config/gunicorn.conf.py -w $(thread) -b :$(port) main:app
5 changes: 5 additions & 0 deletions config/config.yaml
@@ -0,0 +1,5 @@
logger:
  level: INFO
  backupcount: 144
  format: "[%(asctime)s][%(name)s][%(levelname)s]: %(message)s"
  path: .log/crawler.log
8 changes: 8 additions & 0 deletions config/gunicorn.conf.py
@@ -0,0 +1,8 @@
import os

if not os.path.exists('.log'):
    os.makedirs('.log')

accesslog = '.log/access.log'
errorlog = '.log/error.log'
loglevel = 'info'
Empty file added data/douyin/.gitkeep
Empty file.
15 changes: 15 additions & 0 deletions data/driver.py
@@ -0,0 +1,15 @@
import sqlite3
from sqlalchemy.pool import QueuePool

class SqliteStore:
    def __init__(self, db_path, pool_size=5):
        self.db_path = db_path
        self._connection_pool = QueuePool(self._connect, max_overflow=0, pool_size=pool_size)

    def _connect(self):
        conn = sqlite3.connect(self.db_path)
        conn.row_factory = sqlite3.Row  # rows can be accessed by column name, dict-style
        return conn

    def _get_connection(self):
        return self._connection_pool.connect()
Empty file added data/kuaishou/.gitkeep
Empty file.
174 changes: 174 additions & 0 deletions docs/api/douyin/douyin.md
@@ -0,0 +1,174 @@
# API Documentation

## Douyin

### Add account

- **URL**

`/douyin/add_account`

- **Method**

`POST`

- **URL Params**

None

- **Data Params**

**Required:**

`"id"=[string]` - Account id. This can be the account nickname, id, or similar; it is mainly used to tell the managed cookies apart.

`"cookie"=[string]` - Douyin cookie

- **Success Response**

- **Code:** 200
- **Content:** `{ "code" : 0, "data" : null, "msg" : "OK" }`
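
A minimal sketch of how a client might construct this call. Two things here are assumptions rather than documented facts: the base URL `http://localhost:8080` (taken from the port exposed in the Dockerfile) and the JSON request body (the documentation names the `id` and `cookie` fields but not their encoding). The helper name `build_add_account_request` is hypothetical. The function only builds the request, so it can be inspected without a running server.

```python
import json
from urllib.request import Request

# Assumption: the service is reachable locally on the port exposed by the Dockerfile.
BASE_URL = "http://localhost:8080"


def build_add_account_request(account_id: str, cookie: str, base_url: str = BASE_URL) -> Request:
    """Build (but do not send) a POST request for /douyin/add_account."""
    # Assumption: the endpoint accepts a JSON body carrying the two required fields.
    body = json.dumps({"id": account_id, "cookie": cookie}).encode("utf-8")
    return Request(
        url=f"{base_url}/douyin/add_account",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Passing the returned object to `urllib.request.urlopen` would send it; if the server expects form-encoded data instead, only the body and the `Content-Type` header need to change.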

### Get account list

- **URL**

`/douyin/account_list`

- **Method**

`GET`

- **URL Params**

None

- **Data Params**

None

- **Success Response**

- **Code:** 200
- **Content:** `{ "code" : 0, "data" : {}, "msg" : "OK" }`
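
Every success response in this document shares the same envelope with `code`, `data`, and `msg`. The sketch below shows one way a client could unwrap it. It assumes that any `code` other than 0 signals a failure, which the documentation does not state; `ApiError` and `unwrap` are hypothetical names.

```python
import json
from typing import Any


class ApiError(Exception):
    """Raised when the envelope reports a non-zero code (hypothetical helper)."""


def unwrap(raw: str) -> Any:
    """Parse a response body and return its `data` field."""
    payload = json.loads(raw)
    # Assumption: only code 0 means success; the docs show no error responses.
    if payload.get("code") != 0:
        raise ApiError(f"code={payload.get('code')} msg={payload.get('msg')}")
    return payload.get("data")
```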

### Get video details

- **URL**

`/douyin/detail`

- **Method**

`GET`

- **URL Params**

**Required:**

`"id"=[string]` - Douyin video id

- **Data Params**

None

- **Success Response**

- **Code:** 200
- **Content:** `{ "code" : 0, "data" : {}, "msg" : "OK" }`

### Get video comments

- **URL**

`/douyin/comments`

- **Method**

`GET`

- **URL Params**

**Required:**

`"id"=[string]` - Douyin video id

**Optional:**

`"offset"=[string]` - Pagination offset for comments, default 0

`"limit"=[string]` - Number of comments to return, default 20

- **Data Params**

None

- **Success Response**

- **Code:** 200
- **Content:** `{ "code" : 0, "data" : {}, "msg" : "OK" }`
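
The `offset` and `limit` parameters suggest paging through comments in fixed-size steps. The sketch below generates the URLs for the first few pages. It assumes that `offset` counts items already skipped (so it advances by `limit` per page) and that the service runs at `http://localhost:8080`; neither is confirmed by the documentation, and `comment_page_urls` is a hypothetical helper.

```python
from typing import List
from urllib.parse import urlencode

# Assumption: the service is reachable locally on the port exposed by the Dockerfile.
BASE_URL = "http://localhost:8080"


def comment_page_urls(video_id: str, pages: int, limit: int = 20,
                      base_url: str = BASE_URL) -> List[str]:
    """Return the /douyin/comments URLs for the first `pages` pages."""
    urls = []
    for page in range(pages):
        # Assumption: offset is an item count, so page N starts at N * limit.
        query = urlencode({"id": video_id, "offset": page * limit, "limit": limit})
        urls.append(f"{base_url}/douyin/comments?{query}")
    return urls
```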

### Get comment replies

- **URL**

`/douyin/replys`

- **Method**

`GET`

- **URL Params**

**Required:**

`"video_id"=[string]` - Douyin video id

`"comment_id"=[string]` - Douyin comment id

**Optional:**

`"offset"=[string]` - Pagination offset for replies, default 0

`"limit"=[string]` - Number of replies to return, default 20

- **Data Params**

None

- **Success Response**

- **Code:** 200
- **Content:** `{ "code" : 0, "data" : {}, "msg" : "OK" }`

### Search videos by keyword

- **URL**

`/douyin/search`

- **Method**

`GET`

- **URL Params**

**Required:**

`"keyword"=[string]` - Search keyword

**Optional:**

`"offset"=[string]` - Pagination offset for search results, default 0

`"limit"=[string]` - Number of search results to return, default 10

- **Data Params**

None

- **Success Response**

- **Code:** 200
- **Content:** `{ "code" : 0, "data" : {}, "msg" : "OK" }`
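
Search keywords are often non-ASCII, so they need percent-encoding before they go into the query string. The sketch below builds a search URL with `urllib.parse.urlencode`, which encodes the keyword as UTF-8. The base URL is an assumption and `search_url` is a hypothetical helper; the defaults mirror the documented ones.

```python
from urllib.parse import urlencode

# Assumption: the service is reachable locally on the port exposed by the Dockerfile.
BASE_URL = "http://localhost:8080"


def search_url(keyword: str, offset: int = 0, limit: int = 10,
               base_url: str = BASE_URL) -> str:
    """Return the /douyin/search URL with the keyword safely encoded."""
    # urlencode percent-encodes non-ASCII text as UTF-8 and turns spaces into '+'.
    query = urlencode({"keyword": keyword, "offset": offset, "limit": limit})
    return f"{base_url}/douyin/search?{query}"
```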
