Merge pull request charent#11 from charent/dev
Dev
charent authored Jan 11, 2024
2 parents f67783b + a779b74 commit b7f4e0d
Showing 2 changed files with 7 additions and 7 deletions.
README.en.md: 8 changes (4 additions, 4 deletions)
@@ -32,7 +32,7 @@ ChatLM-mini-Chinese is a small Chinese chat model with only 0.2B (added shared w
🟢**Latest Update**

<summary> <b>2024-01-07</b> </summary>
-- Add document deduplication based on MinHash during the data cleaning process (in this project, the samples of the datasets are what is actually deduplicated). This prevents the model from regurgitating training data at inference time after it has seen many repeated samples. <br/>
+- Add document deduplication based on MinHash during the data cleaning process (in this project, it is actually the rows of the datasets that are deduplicated). This prevents the model from regurgitating training data at inference time after it has seen many repeated samples. <br/>
- Add the `DropDatasetDuplicate` class to implement deduplication of documents from large data sets. <br/>
</details>

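The changelog entry above mentions MinHash-based deduplication. As a rough illustration only (this is not the project's `DropDatasetDuplicate` class, whose code is not part of this diff), the sketch below builds MinHash signatures from character 3-gram shingles, uses salted BLAKE2b hashes as stand-ins for random permutations, and drops any row whose estimated Jaccard similarity to an already-kept row reaches a threshold. The helper names, the shingle size `k=3`, `num_perm=64`, and `threshold=0.8` are all assumptions chosen for the example.

```python
import hashlib
import re


def shingles(text: str, k: int = 3) -> set[str]:
    """Character k-grams of `text` with whitespace removed (k=3 is an assumption)."""
    text = re.sub(r"\s+", "", text)
    if len(text) <= k:
        return {text}  # short texts become a single shingle, so the set is never empty
    return {text[i:i + k] for i in range(len(text) - k + 1)}


def minhash_signature(items: set[str], num_perm: int = 64) -> list[int]:
    """One minimum hash value per salted hash function; each salt simulates a permutation."""
    signature = []
    for seed in range(num_perm):
        salt = seed.to_bytes(16, "little")  # BLAKE2b accepts salts of up to 16 bytes
        signature.append(min(
            int.from_bytes(
                hashlib.blake2b(item.encode("utf-8"), digest_size=8, salt=salt).digest(),
                "little",
            )
            for item in items
        ))
    return signature


def estimate_jaccard(sig_a: list[int], sig_b: list[int]) -> float:
    """The fraction of matching positions estimates the Jaccard similarity of the shingle sets."""
    return sum(a == b for a, b in zip(sig_a, sig_b)) / len(sig_a)


def drop_near_duplicates(docs: list[str], threshold: float = 0.8, num_perm: int = 64) -> list[str]:
    """Hypothetical helper: keep the first row of each group of near-duplicate rows."""
    kept: list[str] = []
    kept_sigs: list[list[int]] = []
    for doc in docs:
        sig = minhash_signature(shingles(doc), num_perm)
        # Keep the row only if it is not too similar to any row kept so far.
        if all(estimate_jaccard(sig, other) < threshold for other in kept_sigs):
            kept.append(doc)
            kept_sigs.append(sig)
    return kept


if __name__ == "__main__":
    rows = [
        "今天天气很好，我们去公园散步吧",
        "今天天气很好，我们去公园散步吧",
        "The quick brown fox jumps over the lazy dog",
    ]
    print(drop_near_duplicates(rows))
```

With these three rows, the exact duplicate gets an identical signature and is dropped, so the first and third rows are kept. Comparing each new signature against every kept one is quadratic; for large datasets, MinHash is usually paired with locality-sensitive hashing (banding signatures into buckets) so that only candidate pairs are compared, for example with the `datasketch` library's `MinHashLSH`.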
@@ -334,15 +334,15 @@ python dpo_train.py
```

## 3.7 Inference
-Make sure the following files are in the `model_save` directory:
+Make sure the following files are in the `model_save` directory. All of them can be found in the `Hugging Face Hub` repository [ChatLM-Chinese-0.2B](https://huggingface.co/charent/ChatLM-mini-Chinese):
```bash
ChatLM-mini-Chinese
├─model_save
-| ├─chat_model.py
-| ├─chat_model_config.py
| ├─config.json
+| ├─configuration_chat_model.py
| ├─generation_config.json
| ├─model.safetensors
+| ├─modeling_chat_model.py
| ├─special_tokens_map.json
| ├─tokenizer.json
| └─tokenizer_config.json
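Before running inference, the presence of these files can be checked with a small script. This is a hypothetical helper (`missing_model_files` is not part of the repository), and it assumes the post-commit file list: judging from the hunk's line counts (15 lines on each side, with 3 additions and 3 deletions), `chat_model.py` and `chat_model_config.py` appear to have been replaced by `configuration_chat_model.py` and `modeling_chat_model.py`, so only the latter pair is checked.

```python
from pathlib import Path

# Assumed post-commit contents of `model_save`, taken from the README tree above.
REQUIRED_FILES = [
    "config.json",
    "configuration_chat_model.py",
    "generation_config.json",
    "model.safetensors",
    "modeling_chat_model.py",
    "special_tokens_map.json",
    "tokenizer.json",
    "tokenizer_config.json",
]


def missing_model_files(model_dir: str = "model_save") -> list[str]:
    """Return the required file names that are not regular files inside `model_dir`."""
    root = Path(model_dir)
    return [name for name in REQUIRED_FILES if not (root / name).is_file()]


if __name__ == "__main__":
    missing = missing_model_files()
    if missing:
        print("missing from model_save:", ", ".join(missing))
    else:
        print("model_save looks complete")
```

If anything is missing, one option is to fetch the whole repository with `huggingface_hub.snapshot_download(repo_id="charent/ChatLM-mini-Chinese", local_dir="model_save")`.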
README.md: 6 changes (3 additions, 3 deletions)
@@ -344,15 +344,15 @@ python dpo_train.py
```

## 3.7 Inference
-Make sure the following files are in the `model_save` directory:
+Make sure the following files are in the `model_save` directory. All of them can be found in the `Hugging Face Hub` repository [ChatLM-Chinese-0.2B](https://huggingface.co/charent/ChatLM-mini-Chinese):
```bash
ChatLM-mini-Chinese
├─model_save
-| ├─chat_model.py
-| ├─chat_model_config.py
| ├─config.json
+| ├─configuration_chat_model.py
| ├─generation_config.json
| ├─model.safetensors
+| ├─modeling_chat_model.py
| ├─special_tokens_map.json
| ├─tokenizer.json
| └─tokenizer_config.json
