Commit

Rank: Supervise and Unsupervise
wzzzd committed Jan 17, 2023
1 parent cc17e10 commit d07822a
Showing 7 changed files with 54,185 additions and 10 deletions.
2 changes: 1 addition & 1 deletion FAQ.py
@@ -8,7 +8,7 @@
from Module.PreRank import PreRank
from Module.RankUnsupervise import RankUnsupervise
from Module.RankSupervise import RankSupervise
-from utils.Logger import init_logger
+from Utils.Logger import init_logger


class FAQ(object):
2 changes: 1 addition & 1 deletion Module/PreRank.py
@@ -6,7 +6,7 @@
from Module.Word2Vec.Word2Vec import W2V
# from Module.Word2Vec.train import W2V
from Module.LM.LMEmbedding import LMEmbedding
-from utils.Logger import init_logger
+from Utils.Logger import init_logger


class PreRank(object):
2 changes: 1 addition & 1 deletion Module/RankSupervise.py
@@ -6,7 +6,7 @@
from Module.Model.Bert import Bert
from Module.Model.Distilbert import Distilbert
from Module.LM.LMEmbedding import LMEmbedding
-from utils.Logger import init_logger
+from Utils.Logger import init_logger


class RankSupervise(object):
2 changes: 1 addition & 1 deletion Module/RankUnsupervise.py
@@ -3,7 +3,7 @@
import torch
from transformers import AutoTokenizer, AutoModel
from Module.LM.LMEmbedding import LMEmbedding
-from utils.Logger import init_logger
+from Utils.Logger import init_logger


class RankUnsupervise(object):
16 changes: 13 additions & 3 deletions README.md
@@ -74,7 +74,7 @@ label: 1

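1. Insurance-industry corpus
- From the project [baoxianzhidao_filter](https://github.com/SophonPlus/ChineseNlpCorpus/blob/master/datasets/baoxianzhidao/intro.ipynb)
-- Filter for correctly answered records with `is_best=1`, take their `title` and `reply` fields, and process them into two datasets
+- Filter for correctly answered records with `is_best=1`, take their `title` and `reply` fields, and process them into two datasets, located in the directory `data/insurance_zhidao_test/`
- `corpus.txt`: the corpus, containing two fields, `question` and `answer`.
  - `question`: corresponds to the `title` field of the original file.
  - `answer`: corresponds to the `reply` field of the original file.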
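
The filtering step described above can be sketched as follows. This is a minimal illustration, not the project's preprocessing script: it assumes the raw data is a CSV with `title`, `reply`, and `is_best` columns, and the tab-separated output layout and the helper names `build_corpus` and `to_tsv` are hypothetical.

```python
import csv
import io


def build_corpus(csv_text):
    """Keep rows where is_best == 1 and map title/reply to question/answer."""
    reader = csv.DictReader(io.StringIO(csv_text))
    corpus = []
    for row in reader:
        # Skip answers that were not marked as the best answer.
        if row["is_best"].strip() != "1":
            continue
        corpus.append({
            "question": row["title"].strip(),
            "answer": row["reply"].strip(),
        })
    return corpus


def to_tsv(corpus):
    """Serialize the corpus as tab-separated text (assumed layout)."""
    lines = ["question\tanswer"]
    lines += [f"{item['question']}\t{item['answer']}" for item in corpus]
    return "\n".join(lines)


# Tiny made-up sample standing in for the raw file.
sample = (
    "title,reply,is_best\n"
    "What is term life insurance?,It covers a fixed period.,1\n"
    "What is term life insurance?,No idea.,0\n"
)
corpus = build_corpus(sample)
print(to_tsv(corpus))
```

Only the row with `is_best` equal to 1 survives, so each question in the resulting corpus is paired with an answer that was marked as correct.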
@@ -184,14 +184,24 @@ $ python insert_data_to_es.py
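To use the model parameters trained on this dataset directly, download the model file [rank-bert](https://pan.baidu.com/s/1B51WcVrjxRRRPVcqg4-dwg) (password: tal1) and place all downloaded files (the files, not the folder) in the directory `file/supervise/bert/`.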


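-### 4. FAQ question answering
+### 4. Train the unsupervised semantic representation model SimCSE (optional)
+If the field `use_supervise=False` is set in the configuration file `config.py`, the rank stage uses an unsupervised method. When the field `unsup_rank_name=simcse-bert`, a pretrained model trained with SimCSE is used to extract sentence semantics.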
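
One way to picture how these two fields could select the rank-stage implementation is sketched below. The `RankConfig` dataclass, the `select_ranker` helper, and the lookup table are hypothetical; only the field names, the class names `RankSupervise` and `RankUnsupervise`, and the two model directories come from this commit.

```python
from dataclasses import dataclass


@dataclass
class RankConfig:
    # Field names follow the README; the default values are assumptions.
    use_supervise: bool = False
    unsup_rank_name: str = "simcse-bert"


# Hypothetical lookup table; only the simcse-bert entry is taken from the README.
UNSUP_MODEL_DIRS = {"simcse-bert": "file/unsupervise/simcse_bert/"}
SUP_MODEL_DIR = "file/supervise/bert/"


def select_ranker(cfg):
    """Return (ranker class name, model directory) for the rank stage."""
    if cfg.use_supervise:
        return "RankSupervise", SUP_MODEL_DIR
    if cfg.unsup_rank_name not in UNSUP_MODEL_DIRS:
        raise ValueError(f"unknown unsup_rank_name: {cfg.unsup_rank_name!r}")
    return "RankUnsupervise", UNSUP_MODEL_DIRS[cfg.unsup_rank_name]


print(select_ranker(RankConfig()))
print(select_ranker(RankConfig(use_supervise=True)))
```

Raising on an unknown `unsup_rank_name` is a design choice of this sketch, so that a typo in the configuration fails early instead of silently falling back to another model.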

+For details of the unsupervised SimCSE model and its training, see the paper's source code: [SimCSE](https://github.com/princeton-nlp/SimCSE)

+This project provides a preprocessed unsupervised training dataset for the insurance industry; see `data/insurance_zhidao_unsup/corpus.txt`

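+To use the model parameters trained on this dataset directly, download the model file [simcse-unsup-bert](), password: . Then place all downloaded files (the files, not the folder) in the directory `file/unsupervise/simcse_bert/`.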
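
Once sentences are encoded, unsupervised ranking typically reduces to scoring each candidate by the cosine similarity between its embedding and the query embedding. The sketch below shows only that scoring step. Its `toy_embed` function is a character-bigram placeholder chosen so the example runs without a model download; it is not SimCSE, and in the project the vectors would come from the trained encoder instead. All function names here are hypothetical.

```python
import math
from collections import Counter


def toy_embed(text):
    """Placeholder encoder: sparse character-bigram counts (NOT SimCSE)."""
    return Counter(text[i:i + 2] for i in range(len(text) - 1))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank(query, candidates, embed=toy_embed):
    """Return (score, candidate) pairs sorted by descending similarity."""
    q = embed(query)
    scored = [(cosine(q, embed(c)), c) for c in candidates]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


candidates = [
    "how to cancel an insurance policy",
    "what does car insurance cover",
    "how to file a claim",
]
ranked = rank("how do I cancel my policy", candidates)
for score, text in ranked:
    print(f"{score:.3f}  {text}")
```

Passing the encoder in through the `embed` parameter keeps the scoring logic independent of the model, so a real sentence encoder can replace the placeholder without touching `rank`.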


+### 5. FAQ question answering
Test the FAQ directly:
```
$ python FAQ.py
```


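-### 5. Deploy the FAQ question-answering service
+### 6. Deploy the FAQ question-answering service
The FAQ can be deployed as a web service.

**Step 1: Start the FAQ question-answering web service.**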
6 changes: 3 additions & 3 deletions Service.py
@@ -18,9 +18,9 @@
from tornado.options import define, options

from FAQ import FAQ
-from utils.Logger import init_logger
-from utils.DateOption import get_date
-from utils.FileOp import File
+from Utils.Logger import init_logger
+from Utils.DateOption import get_date
+from Utils.FileOp import File
from Config import Config


54,165 changes: 54,165 additions & 0 deletions data/insurance_zhidao_unsup/corpus.txt

Large diffs are not rendered by default.