Dev0.4.0 (fastnlp#149)
* 1. CRF now supports BMESO-style tags; 2. add comments to vocabulary
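BMESO tagging (Begin/Middle/End/Single/Outside) limits which tag may follow which. For example, `B-PER` may only be followed by `M-PER` or `E-PER`. The sketch below computes the allowed transitions. It assumes tags are strings such as `B-PER` plus a bare `O`. The name `allowed_transitions_bmeso` is hypothetical and not the fastNLP API, and start/end constraints are left out for brevity:

```python
from itertools import product


def allowed_transitions_bmeso(labels):
    """Return the set of (from_tag, to_tag) pairs valid under BMESO (sketch)."""
    tags = ["O"] + [f"{prefix}-{label}" for label in labels for prefix in "BMES"]

    def split(tag):
        # "O" carries no entity label; other tags look like "B-PER".
        return ("O", None) if tag == "O" else tuple(tag.split("-", 1))

    allowed = set()
    for src, dst in product(tags, tags):
        (src_prefix, src_label), (dst_prefix, dst_label) = split(src), split(dst)
        if src_prefix in ("B", "M"):
            # Inside an entity: continue (M) or close (E) the same label.
            ok = dst_prefix in ("M", "E") and dst_label == src_label
        else:
            # After E, S or O: start a new span (B/S) or stay outside (O).
            ok = dst_prefix in ("B", "S", "O")
        if ok:
            allowed.add((src, dst))
    return allowed
```

A CRF would then typically mask the transition scores of every pair that is missing from this set.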

* Add an error check to BucketSampler

* 1. Fix a bug in ClipGradientCallback; remove the print in LRSchedulerCallback (printing should later go through a passed-in pbar); 2. add comments to MLP

* update MLP module

* Add comments to metric; fix a bug in Trainer's save process

* Update README.md

fix tutorial link

* Add ENAS (Efficient Neural Architecture Search)

* add ignore_type in DataSet.add_field

* * AutoPadder will not pad when dtype is None
* add ignore_type in DataSet.apply
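The AutoPadder rule above comes down to this: pad only when the element type is known, and otherwise return the field unchanged. The sketch below illustrates that rule and is not fastNLP's `AutoPadder`. The name `auto_pad` and the `pad_val` default are assumptions:

```python
def auto_pad(contents, dtype, pad_val=0):
    """Pad a list of sequences to equal length unless dtype is unknown (sketch)."""
    if dtype is None:
        # Element type could not be determined (e.g. type inference was
        # skipped): hand the contents back untouched instead of padding.
        return contents
    max_len = max((len(seq) for seq in contents), default=0)
    # Right-pad every sequence with pad_val up to the longest length.
    return [list(seq) + [pad_val] * (max_len - len(seq)) for seq in contents]
```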

* Fix a potential padder bug in FieldArray

* Fix a typo in CRF and code that could cause numerical instability

* Fix a possible bug in CRF

* change two default init arguments of Trainer into None

* Changes to Callbacks:
* Add several read-only attributes to Callback
* Set these attributes through the manager
* Optimize code to reduce the load on @Transfer

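* * Move ENAS-related code into the automl directory
* Fix a bug in fast_param_mapping
* Trainer now creates the save directory automatically
* Printing a Vocabulary now shows its contents

* * Add an iteration method to Vocabulary

* Fix a bug where CRF produced negative values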

* add SQuAD metric

* add sigmoid activation function in MLP

* - add star transformer model
- add ConllLoader, for all kinds of conll-format files
- add JsonLoader, for json-format files
- add SSTLoader, for SST-2 & SST-5
- change Callback interface
- fix batch multi-process when killed
- add README to list models and their performance

* - fix test

* - fix callback & tests

* - update README

* Fix some bugs; adjust callback

* Prepare the 0.4.0 release

* update readme

* support parallel loss

* Prevent multi-GPU setups from breaking the loss computation

* update advance_tutorial jupyter notebook

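* 1. Add new loading functions load_with_vocab() and load_without_vocab() to embedding_loader. Compared with the previous functions, the main changes are that (1) embed_dim no longer needs to be passed in, and (2) the loader automatically detects whether the file is in word2vec or GloVe format.
2. Add from_dataset() and index_dataset() to Vocabulary, so that indexing a dataset no longer takes several lines of code.
3. Add a cache_result() decorator to utils for caching a function's return value.
4. Add an update_every attribute to callback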
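Item 3 above describes a decorator that caches a function's return value. The sketch below shows the idea and assumes a pickle file on disk serves as the cache. The name `cache_results_sketch` and its `path` parameter are hypothetical and do not reproduce fastNLP's actual signature:

```python
import functools
import os
import pickle


def cache_results_sketch(path):
    """Cache the decorated function's return value in a pickle file at `path`."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            if os.path.exists(path):
                # A previous run already stored the result: reuse it.
                with open(path, "rb") as f:
                    return pickle.load(f)
            result = func(*args, **kwargs)
            with open(path, "wb") as f:
                pickle.dump(result, f)
            return result
        return wrapper
    return decorator
```

In this sketch the cache key is only the file path. Calling the function with different arguments therefore still returns the stored result, and you have to delete the file to force a recomputation.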

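* 1. DataSet.apply() reports the index of the failing instance when it raises an error
2. Vocabulary.from_dataset() and index_dataset() report the vocab order when they raise an error
3. EmbedLoader skips lines with irregular data when reading embeddings.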
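Item 3 above (skipping irregular lines) and the automatic word2vec/GloVe detection from the previous commit can be combined into one reader. In the word2vec text format, the first line is a `<vocab_size> <dim>` header; GloVe files start directly with vectors. The sketch below assumes plain-text input. `read_embeddings` is a hypothetical helper, not EmbedLoader's API:

```python
def read_embeddings(lines):
    """Parse word2vec- or GloVe-style text into ({word: [floats]}, dim)."""
    lines = [line.rstrip("\n") for line in lines]
    if not lines:
        return {}, None
    head = lines[0].split()
    dim = None
    if len(head) == 2 and all(tok.isdigit() for tok in head):
        # word2vec text format: first line is "<vocab_size> <dim>".
        dim = int(head[1])
        lines = lines[1:]
    vectors = {}
    for line in lines:
        parts = line.split()
        if len(parts) < 2:
            continue  # empty or truncated line
        word, values = parts[0], parts[1:]
        if dim is None:
            dim = len(values)  # GloVe: infer the dimension from the first row
        if len(values) != dim:
            continue  # irregular line: skip it instead of failing
        try:
            vectors[word] = [float(v) for v in values]
        except ValueError:
            continue  # non-numeric value: skip the line
    return vectors, dim
```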

* update attention

* doc tools

* fix some doc errors

* Switch comments to Chinese; add a Viterbi decoding method
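Viterbi decoding finds the highest-scoring tag sequence from per-position emission scores and tag-to-tag transition scores. The sketch below is a minimal pure-Python version of the algorithm. It is not the fastNLP CRF code, which presumably works on batched tensors. Scores are assumed to be in log space, so they are added:

```python
def viterbi_decode(emissions, transitions):
    """Return (best_path, best_score) for a single sequence.

    emissions[t][j]:   score of tag j at position t
    transitions[i][j]: score of moving from tag i to tag j
    """
    n_tags = len(transitions)
    score = list(emissions[0])  # best score of any path ending in each tag
    backpointers = []
    for emit in emissions[1:]:
        new_score, ptrs = [], []
        for j in range(n_tags):
            # Best previous tag i for arriving at tag j at this position.
            best_i = max(range(n_tags), key=lambda i: score[i] + transitions[i][j])
            new_score.append(score[best_i] + transitions[best_i][j] + emit[j])
            ptrs.append(best_i)
        score = new_score
        backpointers.append(ptrs)
    # Follow the back-pointers from the best final tag.
    last = max(range(n_tags), key=lambda j: score[j])
    path = [last]
    for ptrs in reversed(backpointers):
        path.append(ptrs[path[-1]])
    path.reverse()
    return path, score[last]
```

Each step costs O(n_tags²), so decoding a sequence of length T takes O(T · n_tags²) time.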

* Sample version

* - add pad sequence for lstm
- add csv, conll, json filereader
- update dataloader
- remove useless dataloader
- fix trainer loss print
- fix tests

* - fix test_tutorial

* Add comments

* Test documentation

* Local WIP snapshot

* Local WIP snapshot

* Reorder the documentation

* - add document

* Local WIP snapshot

* update pooling

* update bert

* update documents in MLP

* update documents in snli

* combine self attention module to attention.py

* update documents on losses.py

* Update DataSet documentation

* update documents on metrics

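* 1. Remove print statements from LSTM; 2. rename use_cuda to device in Trainer and Tester; 3. extend the Trainer documentation

* Add comments to Trainer

* Improve documentation for trainer, callback, etc.; rename some code so that it is hidden from the documentation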

* update char level encoder

* update documents on embedding.py

* - update doc

* Add comments and modify some code

* - update doc
- add get_embeddings

* Modify documentation configuration options

* Change embedding to be initialized via init_embed

* 1. Add multi-GPU support to Trainer and Tester;

* - add test
- fix jsonloader

* Remove the tutorial written in comments

* Add get_field_names to dataset

* Fix bug

* - add Const
- fix bugs

* Revise some comments

* - add model runner for easier test models
- add model tests

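* Modify the docs configuration and structure

* Revise a large part of the core documentation. TODO:
1. Improve the trainer and tester documentation
2. Look into docstring examples and tests

* Comments in core are mostly reviewed

* Revise comments in io

* Switch everything to relative imports

* Switch everything to relative imports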

* small change

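* 1. Remove api/automl from the installation files
2. metric had a seq_len bug
3. sampler had a naming error, now fixed

* Fix bug: compatibility with the CPU-only build of PyTorch
TODO: similar bugs may exist elsewhere

* Revise references in the documentation

* Replace tqdm.autonotebook with tqdm.auto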

* - fix batch & vocab

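* Upload documentation files *.rst

* Upload documentation files and several TODOs

* Discuss and consolidate several modules

* Tests for core and some minor changes

* Remove some redundant documentation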

* update init files

* update const files

* update const files

* Add CNN tests

* fix a little bug

* - update attention
- fix tests

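* Improve tests

* Complete the quick-start tutorial

* Update the docs for renaming sequence_modeling to sequence_labeling

* Re-run apidoc to clean up leftovers from the rename

* Fix documentation formatting

* Unify the seq_len_to_mask implementations from different places into core.utils.seq_len_to_mask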
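`seq_len_to_mask` turns a batch of sequence lengths into a padding mask. The function in core.utils presumably works on NumPy arrays or PyTorch tensors. The pure-Python sketch below only illustrates the semantics. Its name and its `max_len` parameter are assumptions:

```python
def seq_len_to_mask_sketch(seq_lens, max_len=None):
    """mask[b][t] is True iff position t lies inside sequence b."""
    if max_len is None:
        # Default to the longest sequence in the batch.
        max_len = max(seq_lens, default=0)
    return [[t < length for t in range(max_len)] for length in seq_lens]
```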

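* Add a one-line hint

* Show dataset_loader in the documentation

* Note that Dataset.read_csv will be replaced by CSVLoader

* Complete the documentation linking Callback and Trainer

* Partially update index

* Remove redundant prints

* Remove the word-segmentation metric because it may cause errors

* Revise Chinese names in the documentation

* Complete the detailed introduction docs

* ipynb files for the tutorial

* Revise some introductory docs

* Revise the landing-page introductions of models and modules

* Add the titlesonly setting

* Change the titles shown in module docs

* Revise the opening introductions of core and io

* Revise the opening introductions of modules and models

* Use .. todo:: to hide TODO comments that might otherwise be pulled into the docs

* Revise some comments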

* delete an old metric in test

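* Update tutorial tests

* Move features not yet ready for release into the legacy folder

* Remove tests that cannot run

* Update callback tests

* Remove outdated tutorials and test files

* Change cache_results parameters

* Update io tests; remove some outdated tests

* Fix bug

* Fix failing tests in test_utils.py

* Fix a compatibility issue with padsequence in PyTorch 1.1; change Trainer's pbar

* 1. Fix a bug in metric; 2. add metric tests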

* add model summary

* Add aliases

* Remove nested layers from encoder

* Change the import order in core and what __all__ exposes

* Change the import order in models and what __all__ exposes

* Rename files

* Update __all__ and imports in modules

* fix var runn

* Add a clear method to vocab

* Minor PEP 8 tweaks

* Update the cache_results example

* 1. Warn when indices in callback may be None; 2. DataSet supports indexing with a List
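Item 2 means a DataSet can be indexed by an int, a slice, or a list of ints. The toy sketch below shows that kind of dispatch. `TinyDataSet` is hypothetical, and storing fields as a dict of columns is an assumption about the layout, not fastNLP's implementation:

```python
class TinyDataSet:
    """Column-oriented toy dataset: {field_name: [values, ...]}."""

    def __init__(self, fields):
        self.fields = {name: list(values) for name, values in fields.items()}

    def __len__(self):
        return len(next(iter(self.fields.values()), []))

    def __getitem__(self, idx):
        if isinstance(idx, int):
            # A single instance as a {field: value} dict.
            return {name: col[idx] for name, col in self.fields.items()}
        if isinstance(idx, slice):
            return TinyDataSet({name: col[idx] for name, col in self.fields.items()})
        if isinstance(idx, list):
            # Gather arbitrary rows, in the given order, into a new dataset.
            return TinyDataSet({name: [col[i] for i in idx]
                                for name, col in self.fields.items()})
        raise TypeError(f"unsupported index type: {type(idx).__name__}")
```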

* Fix a typo

* Update README.md

* update documents on bert

* update documents on encoder/bert

* Add a fitlog callback for recording experiments with fitlog

* typo

* - update dataset_loader

* Add a link to the fitlog documentation.

* Add DataSet Loader documentation

* - add star-transformer reproduction
WillQvQ authored May 22, 2019
1 parent 863a99f commit 881ce01
Showing 206 changed files with 13,065 additions and 41,275 deletions.
7 changes: 7 additions & 0 deletions MANIFEST.in
@@ -0,0 +1,7 @@
include requirements.txt
include LICENSE
include README.md
prune test/
prune reproduction/
prune fastNLP/api
prune fastNLP/automl
101 changes: 61 additions & 40 deletions README.md
@@ -6,87 +6,108 @@
![Hex.pm](https://img.shields.io/hexpm/l/plug.svg)
[![Documentation Status](https://readthedocs.org/projects/fastnlp/badge/?version=latest)](http://fastnlp.readthedocs.io/?badge=latest)

FastNLP is a modular Natural Language Processing system based on PyTorch, built for fast development of NLP models.
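fastNLP is a lightweight NLP toolkit. You can use it to quickly complete a named entity recognition (NER), Chinese word segmentation, or text classification task, or use it to build many complex network models for research. It has the following features: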

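- A unified tabular data container that keeps data preprocessing simple and clear; built-in DataSet Loaders for many datasets save you from writing preprocessing code.
- Various convenient NLP tools, such as loading pretrained embeddings and caching intermediate data;
- Detailed documentation in Chinese for reference;
- Many advanced modules, such as Variational LSTM, Transformer, and CRF;
- Packaged models such as CNNText and Biaffine, ready to use directly;
- A convenient, extensible trainer with many built-in callbacks for experiment logging, exception handling, and more.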


## Installation Guide

fastNLP depends on the following packages:

+ numpy
+ torch>=0.4.0
+ tqdm
+ nltk

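How you install torch may depend on your operating system and CUDA version; see the official PyTorch website.
Once the dependencies are installed, you can install fastNLP by running the following command: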

```shell
pip install fastNLP
```


## Built-in Components

Most neural networks for NLP tasks can be viewed as a composition of three kinds of modules: encoders, aggregators, and decoders.


![](./docs/source/figures/text_classification.png)

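fastNLP's modules package provides many built-in components of all three kinds to help users quickly build the networks they need. The functions and common components of each kind are listed below: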

A deep learning NLP model is the composition of three types of modules:
<table>
<tr>
<td><b> module type </b></td>
<td><b> functionality </b></td>
<td><b> example </b></td>
<td><b> Type </b></td>
<td><b> Function </b></td>
<td><b> Example </b></td>
</tr>
<tr>
<td> encoder </td>
<td> encode the input into some abstract representation </td>
<td> encodes the input into vectors with representational power </td>
<td> embedding, RNN, CNN, transformer </td>
</tr>
<tr>
<td> aggregator </td>
<td> aggregate and reduce information </td>
<td> aggregates information from multiple vectors </td>
<td> self-attention, max-pooling </td>
</tr>
<tr>
<td> decoder </td>
<td> decode the representation into the output </td>
<td> decodes representation vectors into the required output form </td>
<td> MLP, CRF </td>
</tr>
</table>
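The encoder, aggregator and decoder split can be made concrete with a framework-free sketch. The three functions below are toy stand-ins for the embedding, max-pooling and MLP components listed in the table. They are illustrative only and are not fastNLP APIs:

```python
def encode(tokens, embeddings):
    """Encoder: map each token to a vector (here a plain embedding lookup)."""
    return [embeddings[tok] for tok in tokens]


def aggregate(vectors):
    """Aggregator: reduce a variable-length sequence to one vector (max-pooling)."""
    return [max(column) for column in zip(*vectors)]


def decode(vector, weights, labels):
    """Decoder: one linear layer followed by an argmax over the labels."""
    scores = [sum(w * x for w, x in zip(row, vector)) for row in weights]
    return labels[scores.index(max(scores))]
```

For text classification, `decode(aggregate(encode(tokens, emb)), W, labels)` chains the three stages. In a real model each stage would be a trainable PyTorch module.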

For example:

![](docs/source/figures/text_classification.png)

## Requirements

- Python>=3.6
- numpy>=1.14.2
- torch>=0.4.0
- tensorboardX
- tqdm>=4.28.1

## Complete Models
fastNLP implements many complete models for different NLP tasks, all of which have been trained and tested.

## Resources
You can find more information in the following two places:
- [Introduction](reproduction/)
- [Source code](fastNLP/models/)

- [Tutorials](https://github.com/fastnlp/fastNLP/tree/master/tutorials)
- [Documentation](https://fastnlp.readthedocs.io/en/latest/)
- [Source Code](https://github.com/fastnlp/fastNLP)


## Installation
Run the following commands to install fastNLP package.
```shell
pip install fastNLP
```
## Project Structure

![](./docs/source/figures/workflow.png)

## Project Structure
The overall fastNLP workflow is shown in the figure above, and the project structure is as follows:

<table>
<tr>
<td><b> fastNLP </b></td>
<td> an open-source NLP library </td>
</tr>
<tr>
<td><b> fastNLP.api </b></td>
<td> APIs for end-to-end prediction </td>
<td> an open-source natural language processing library </td>
</tr>
<tr>
<td><b> fastNLP.core </b></td>
<td> data representation & train/test procedure </td>
<td> implements the core functionality, including data-processing components, the trainer, the tester, etc. </td>
</tr>
<tr>
<td><b> fastNLP.models </b></td>
<td> a collection of NLP models </td>
<td> implements several complete neural network models </td>
</tr>
<tr>
<td><b> fastNLP.modules </b></td>
<td> a collection of PyTorch sub-models/components/wheels </td>
<td> implements many components for building neural network models </td>
</tr>
<tr>
<td><b> fastNLP.io </b></td>
<td> readers & savers </td>
<td> implements I/O, including data loading and model saving/loading </td>
</tr>
</table>

## Resources

- [Tutorials](https://github.com/fastnlp/fastNLP/tree/master/tutorials)
- [Documentation](https://fastnlp.readthedocs.io/en/latest/)
- [Source code](https://github.com/fastnlp/fastNLP)



*In memory of @FengZiYjun. May his soul rest in peace. We will miss you very very much!*
7 changes: 7 additions & 0 deletions docs/Makefile
@@ -3,6 +3,7 @@

# You can set these variables from the command line.
SPHINXOPTS =
SPHINXAPIDOC = sphinx-apidoc
SPHINXBUILD = sphinx-build
SPHINXPROJ = fastNLP
SOURCEDIR = source
@@ -12,6 +13,12 @@ BUILDDIR = build
help:
	@$(SPHINXBUILD) -M help "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O)

apidoc:
	$(SPHINXAPIDOC) -efM -o source ../$(SPHINXPROJ)

server:
	cd build/html && python -m http.server

.PHONY: help Makefile

# Catch-all target: route all unknown targets to Sphinx using the new
45 changes: 30 additions & 15 deletions docs/source/conf.py
@@ -14,6 +14,7 @@
#
import os
import sys

sys.path.insert(0, os.path.abspath('../../'))

# -- Project information -----------------------------------------------------
@@ -23,10 +24,9 @@
author = 'xpqiu'

# The short X.Y version
version = '0.2'
version = '0.4'
# The full version, including alpha/beta/rc tags
release = '0.2'

release = '0.4'

# -- General configuration ---------------------------------------------------

@@ -42,9 +42,15 @@
    'sphinx.ext.viewcode',
    'sphinx.ext.autosummary',
    'sphinx.ext.mathjax',

    'sphinx.ext.todo'
]

autodoc_default_options = {
    'member-order': 'bysource',
    'special-members': '__init__',
    'undoc-members': True,
}

# Add any paths that contain templates here, relative to this directory.
templates_path = ['_templates']

@@ -62,17 +68,16 @@
#
# This is also used if you do content translation via gettext catalogs.
# Usually you set "language" from the command line for these cases.
language = None
language = "zh_CN"

# List of patterns, relative to source directory, that match files and
# directories to ignore when looking for source files.
# This pattern also affects html_static_path and html_extra_path .
exclude_patterns = []
exclude_patterns = ['modules.rst']

# The name of the Pygments (syntax highlighting) style to use.
pygments_style = 'sphinx'


# -- Options for HTML output -------------------------------------------------

# The theme to use for HTML and HTML Help pages. See the documentation for
@@ -84,7 +89,10 @@
# further. For a list of options available for each theme, see the
# documentation.
#
# html_theme_options = {}
html_theme_options = {
    'collapse_navigation': False,
    'titles_only': True
}

# Add any paths that contain custom static files (such as style sheets) here,
# relative to this directory. They are copied after the builtin static files,
@@ -107,22 +115,21 @@
# Output file base name for HTML help builder.
htmlhelp_basename = 'fastNLPdoc'


# -- Options for LaTeX output ------------------------------------------------

latex_elements = {
# The paper size ('letterpaper' or 'a4paper').
#
# 'papersize': 'letterpaper',

# The font size ('10pt', '11pt' or '12pt').
#
# 'pointsize': '10pt',

# Additional stuff for the LaTeX preamble.
#
# 'preamble': '',

# Latex figure (float) alignment
#
# 'figure_align': 'htbp',
@@ -136,7 +143,6 @@
'xpqiu', 'manual'),
]


# -- Options for manual page output ------------------------------------------

# One entry per manual page. List of tuples
@@ -146,7 +152,6 @@
[author], 1)
]


# -- Options for Texinfo output ----------------------------------------------

# Grouping the document tree into Texinfo files. List of tuples
@@ -159,4 +164,14 @@
]


# -- Extension configuration -------------------------------------------------
# -- Extension configuration -------------------------------------------------
def maybe_skip_member(app, what, name, obj, skip, options):
    if name.startswith("_"):
        return True
    if obj.__doc__ is None:
        return True
    return False


def setup(app):
    app.connect('autodoc-skip-member', maybe_skip_member)
36 changes: 0 additions & 36 deletions docs/source/fastNLP.api.rst

This file was deleted.

7 changes: 7 additions & 0 deletions docs/source/fastNLP.core.batch.rst
@@ -0,0 +1,7 @@
fastNLP.core.batch
==================

.. automodule:: fastNLP.core.batch
   :members:
   :undoc-members:
   :show-inheritance:
7 changes: 7 additions & 0 deletions docs/source/fastNLP.core.callback.rst
@@ -0,0 +1,7 @@
fastNLP.core.callback
=====================

.. automodule:: fastNLP.core.callback
   :members:
   :undoc-members:
   :show-inheritance:
7 changes: 7 additions & 0 deletions docs/source/fastNLP.core.const.rst
@@ -0,0 +1,7 @@
fastNLP.core.const
==================

.. automodule:: fastNLP.core.const
   :members:
   :undoc-members:
   :show-inheritance:
7 changes: 7 additions & 0 deletions docs/source/fastNLP.core.dataset.rst
@@ -0,0 +1,7 @@
fastNLP.core.dataset
====================

.. automodule:: fastNLP.core.dataset
   :members:
   :undoc-members:
   :show-inheritance:
7 changes: 7 additions & 0 deletions docs/source/fastNLP.core.field.rst
@@ -0,0 +1,7 @@
fastNLP.core.field
==================

.. automodule:: fastNLP.core.field
   :members:
   :undoc-members:
   :show-inheritance:
7 changes: 7 additions & 0 deletions docs/source/fastNLP.core.instance.rst
@@ -0,0 +1,7 @@
fastNLP.core.instance
=====================

.. automodule:: fastNLP.core.instance
   :members:
   :undoc-members:
   :show-inheritance: