init: create project
kallydev committed Nov 28, 2020
0 parents commit a970ad7
Showing 83 changed files with 24,196 additions and 0 deletions.
12 changes: 12 additions & 0 deletions .gitignore
@@ -0,0 +1,12 @@
.idea
.vscode

/website/node_modules
/website/.eslintcache
/website/build
/website/.DS_Store
/website/.env.local
/website/.env.development.local
/website/.env.test.local
/website/.env.production.local

22 changes: 22 additions & 0 deletions LICENSE
@@ -0,0 +1,22 @@
MIT License

Copyright (c) 2020 KallyDev

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.

166 changes: 166 additions & 0 deletions README.md
@@ -0,0 +1,166 @@
# Privacy

![GitHub last commit](https://img.shields.io/github/last-commit/kallydev/privacy?style=flat-square)
![GitHub commit activity](https://img.shields.io/github/commit-activity/m/kallydev/privacy?style=flat-square)
![GitHub license](https://img.shields.io/github/license/kallydev/privacy?style=flat-square)

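A personal data leak lookup website for the 40GB+ dataset that has been circulating recently.

## Screenshot

![screenshot](screenshot/screenshot.png)

A [demo site](https://privacy.kallydev.com/) is available for preview (the latest version has not been deployed there yet).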

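## Usage

### Import data

The data comes from the 40GB+ archive that has been circulating recently. Multi-table lookups across QQ / JD / SF are currently supported.

1. Create the SQLite database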

```bash
sqlite3 database.db
```

Run each of the following SQL statements to create the QQ / JD / SF Express tables.

```sql
CREATE TABLE IF NOT EXISTS qq
(
id BIGINT,
qq_number BIGINT,
phone_number INT
);
```

```sql
CREATE TABLE IF NOT EXISTS jd
(
id BIGINT,
name TEXT,
nickname TEXT,
password TEXT,
email TEXT,
id_number TEXT,
phone_number INT
);
```

```sql
CREATE TABLE IF NOT EXISTS sf
(
id BIGINT,
name TEXT,
phone_number INT,
address TEXT
);
```

2. Import the QQ data

Put the `6.9更新总库.txt` file in the `database` directory, then run `qq.py`.

3. Import the JD data

Put the `www_jd_com_12g.txt` file in the `database` directory, then run `jd.py`.

- Create the indexes

```bash
sqlite3 database.db
```

```sql
CREATE INDEX index_qq ON qq (qq_number, phone_number);
CREATE INDEX index_jd ON jd (email, id_number, phone_number);
```

4. Import the SF Express data

Not written yet. PRs are welcome, or wait until I write it tomorrow.

### Build

1. Install Yarn

```bash
npm install -g yarn
```

2. Install Golang

```bash
sudo apt install -y snapd
sudo snap install go --classic
```

3. Download the source code

```bash
git clone https://github.com/kallydev/privacy
```

4. Build the frontend

```bash
cd privacy/website
yarn install
yarn build
```

5. Build the backend

```bash
cd ../server
go build -o app main/main.go
```

### Run

Edit the `config.yaml` configuration file, then run the backend directly.

```bash
./app --config config.yaml
```
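`config.yaml` in this commit sets a `mask: true` option, but the backend code that applies it is not among the files shown here. As a hypothetical illustration only (the function name and which characters stay visible are assumptions, not the project's behavior), masking can be as simple as hiding the middle of a value before it is returned:

```python
def mask_middle(value: str, keep_start: int, keep_end: int, mask_char: str = "*") -> str:
    """Replace the middle of value with mask_char, keeping the first keep_start
    and the last keep_end characters visible (hypothetical helper)."""
    if keep_start < 0 or keep_end < 0:
        raise ValueError("keep_start and keep_end must be non-negative")
    hidden = len(value) - keep_start - keep_end
    if hidden <= 0:
        # Too short to show both ends without revealing everything: mask it all.
        return mask_char * len(value)
    # Slice with len(value) - keep_end so keep_end == 0 keeps nothing at the end.
    return value[:keep_start] + mask_char * hidden + value[len(value) - keep_end:]
```

Masking every value that is too short to keep both ends visible errs on the side of hiding data rather than returning it unchanged.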

## TODO

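- [ ] Build a Docker image
- [ ] Shard tables by modulo
- [ ] Linked lookups between Weibo accounts and phone numbers
- [ ] Refactor all import scripts and write import scripts for Weibo and SF Express
- [ ] Automatically load supported tables
- [ ] Support the new features above on the demo site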

## Q&A

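### 1. Why are the code and docs so sloppy?

I casually named a deadline, then found out the schedule was a "tiny" bit tight, so I just let loose. I will refactor things gradually, **and PRs are welcome**.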

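### 2. How do I solve problems with deployment or usage?

1. Open an issue in this repo; I will help when I have spare time.
2. Append the error message to `https://stackoverflow.com/search?q=` and open the link in a browser.
3. I don't like replying to private messages, so questions sent via Telegram or similar channels will probably go unanswered.
4. There are many roads to Rome; keep at it yourself, young one.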
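Step 2 works best when the error message is URL-encoded, since raw spaces, colons and quotes can break the query string. A small sketch using the standard `urllib.parse.quote_plus`; the `search_url` name is hypothetical:

```python
from urllib.parse import quote_plus

SEARCH_URL = "https://stackoverflow.com/search?q="


def search_url(error_message: str) -> str:
    """Build a Stack Overflow search link for an error message (hypothetical helper)."""
    # quote_plus turns spaces into '+' and percent-encodes ':' and other reserved characters.
    return SEARCH_URL + quote_plus(error_message.strip())
```

The resulting link can be opened with the standard `webbrowser.open` function or pasted into a browser.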

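### 3. Why does the demo site only support QQ-to-phone-number lookups?

The demo server does not have enough disk space, and moving these large files around is a real hassle, so this is on hold for now.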

### 4. Why do the import scripts report invalid data?

The source data is extremely messy, with many misaligned rows. The scripts skip rows that fail to parse.
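The skip-on-failure behavior can be sketched as a standalone function that splits each line on a separator and counts lines whose field count is wrong. This is a hypothetical sketch, not the project's code; in particular, requiring an exact field count is an assumption and is stricter than `jd.py`, which only rejects lines with fewer than six `---`-separated fields.

```python
from typing import Iterable, List, Optional, Tuple


def parse_line(line: str, separator: str, field_count: int) -> Optional[Tuple[str, ...]]:
    """Split one line; return its fields, or None if the count does not match."""
    fields = line.strip().split(separator)
    if len(fields) != field_count:
        return None  # misaligned row: treat it as unparseable
    return tuple(fields)


def parse_lines(lines: Iterable[str], separator: str,
                field_count: int) -> Tuple[List[Tuple[str, ...]], int]:
    """Return (parsed rows, number of skipped rows)."""
    rows: List[Tuple[str, ...]] = []
    skipped = 0
    for line in lines:
        fields = parse_line(line, separator, field_count)
        if fields is None:
            skipped += 1
        else:
            rows.append(fields)
    return rows, skipped
```

Returning the skipped count alongside the rows lets an importer log how much data was dropped, as the scripts' "rows are invalid" messages do.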

### 5. Why is no database file provided?

Distributing this data is illegal, so this project does not provide it.

## License

Copyright (c) KallyDev. All rights reserved.

Licensed under the [MIT License](LICENSE).
16 changes: 16 additions & 0 deletions config.yaml
@@ -0,0 +1,16 @@
database:
  path: ../database/database.db
  tables:
    qq: true
    jd: true
    sf: false # Not supported
    wb: false # Not supported

http:
  host: 0.0.0.0
  port: 80
  # tls:
  #   cert_path: server.crt
  #   key_path: server.key

mask: true
Binary file added screenshot/screenshot.png
25 changes: 25 additions & 0 deletions scripts/database/create_database.sql
@@ -0,0 +1,25 @@
CREATE TABLE IF NOT EXISTS qq
(
id BIGINT,
qq_number BIGINT,
phone_number INT
);

CREATE TABLE IF NOT EXISTS jd
(
id BIGINT,
name TEXT,
nickname TEXT,
password TEXT,
email TEXT,
id_number TEXT,
phone_number INT
);

CREATE TABLE IF NOT EXISTS sf
(
id BIGINT,
name TEXT,
phone_number INT,
address TEXT
);
3 changes: 3 additions & 0 deletions scripts/database/create_index.sql
@@ -0,0 +1,3 @@
CREATE INDEX index_qq ON qq (qq_number, phone_number);
CREATE INDEX index_jd ON jd (email, id_number, phone_number);
CREATE INDEX index_sf ON sf (phone_number);
111 changes: 111 additions & 0 deletions scripts/jd.py
@@ -0,0 +1,111 @@
import logging
import sqlite3
import threading
import time


class Converter(object):
    def __init__(self, database_path, file_path):
        logging.basicConfig(
            level=logging.DEBUG,
            format="%(asctime)s %(levelname)s %(message)s",
            datefmt='%Y-%m-%d %H:%M:%S',
        )
        self.logger = logging.getLogger()
        self.database_connection = None
        self.database_path = database_path
        self.file_path = file_path
        self.file_rows = 0
        self.handle_total = 0
        self.handle_invalid = 0
        self.handle_queue = 0
        self.cancel_print_insertion_speed = None

    def connect_database(self):
        self.database_connection = sqlite3.connect(self.database_path)

    def close_database(self):
        self.database_connection.close()

    def insert(self, id, name, nickname, password, email, id_number, phone_number):
        # Rows are counted once in start(); this method only inserts and tracks the batch size
        cursor = self.database_connection.cursor()
        try:
            cursor.execute(
                "INSERT INTO jd VALUES (?, ?, ?, ?, ?, ?, ?);",
                (id, name, nickname, password, email, id_number, phone_number)
            )
        except sqlite3.IntegrityError:
            self.handle_invalid += 1
        finally:
            self.handle_queue += 1

    def start_insertion_speed(self):
        event = threading.Event()

        def print_insertion_speed():
            handle_total = self.handle_total
            # Log throughput once per second until event.set() is called
            while not event.wait(1):
                speed = self.handle_total - handle_total
                if speed == 0:
                    continue
                self.logger.info("{}/s, {}/{} progress, {} rows are invalid, {} seconds left".format(
                    speed,
                    self.handle_total,
                    self.file_rows,
                    self.handle_invalid,
                    (self.file_rows - self.handle_total) / speed,
                ))
                handle_total = self.handle_total

        # Daemon thread, so an exception in start() cannot leave the process hanging
        threading.Thread(target=print_insertion_speed, daemon=True).start()
        return event.set

    def start(self):
        # Get the number of file rows (used for progress and the time estimate)
        self.logger.info("start scanning file lines")
        start_time = time.time()
        with open(self.file_path) as file:
            self.file_rows = 0
            for _ in file:
                self.file_rows += 1
        end_time = time.time()
        self.logger.info("scan completed, there are a total of {} lines, and it took {} seconds".format(
            self.file_rows,
            end_time - start_time,
        ))
        # Insert JD records
        self.connect_database()
        self.cancel_print_insertion_speed = self.start_insertion_speed()
        with open(self.file_path) as file:
            for line in file:
                self.handle_total += 1
                dataset = line.strip().split("---")
                if len(dataset) < 6:
                    # Misaligned row: count it as invalid and skip the insert
                    self.handle_invalid += 1
                    continue
                name, nickname, password, email, id_number, phone_number = dataset[:6]
                self.insert(self.handle_total, name, nickname, password, email, id_number, phone_number)
                # Commit in batches of 400,000 rows instead of once per row
                if self.handle_queue >= 400000:
                    self.database_connection.commit()
                    self.handle_queue = 0
        self.database_connection.commit()
        self.cancel_print_insertion_speed()
        self.close_database()
        self.logger.info("completed, processed {} rows, {} rows of invalid data".format(
            self.handle_total,
            self.handle_invalid,
        ))


if __name__ == '__main__':
    converter = Converter("database/database.db", "www_jd_com_12g.txt")
    converter.start()