refine demo dataprovider and some tiny fix
ISSUE=4597359 

git-svn-id: https://svn.baidu.com/idl/trunk/paddle@1432 1ad973e4-5ce8-4261-8a94-b56d1f490c56
luotao02 committed Aug 30, 2016
1 parent 13f4602 commit 3e87021
Showing 10 changed files with 65 additions and 49 deletions.
14 changes: 7 additions & 7 deletions demo/semantic_role_labeling/dataprovider.py
@@ -22,13 +22,13 @@ def hook(settings, word_dict, label_dict, **kwargs):
     settings.label_dict = label_dict
     #all inputs are integral and sequential type
     settings.slots = [
-        integer_value(len(word_dict), seq_type=SequenceType.SEQUENCE),
-        integer_value(len(word_dict), seq_type=SequenceType.SEQUENCE),
-        integer_value(len(word_dict), seq_type=SequenceType.SEQUENCE),
-        integer_value(len(word_dict), seq_type=SequenceType.SEQUENCE),
-        integer_value(len(word_dict), seq_type=SequenceType.SEQUENCE),
-        integer_value(2, seq_type=SequenceType.SEQUENCE),
-        integer_value(len(label_dict), seq_type=SequenceType.SEQUENCE)]
+        integer_value_sequence(len(word_dict)),
+        integer_value_sequence(len(word_dict)),
+        integer_value_sequence(len(word_dict)),
+        integer_value_sequence(len(word_dict)),
+        integer_value_sequence(len(word_dict)),
+        integer_value_sequence(2),
+        integer_value_sequence(len(label_dict))]


 @provider(init_hook=hook)
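The change in this file replaces the long form `integer_value(dim, seq_type=SequenceType.SEQUENCE)` with `integer_value_sequence(dim)`. The sketch below assumes the new helper is simply shorthand for the long form, which is what the one-for-one substitution implies. `SequenceType`, `InputType` and both factory functions here are hypothetical stand-ins written for illustration, not PaddlePaddle's own definitions; the final check only confirms that, under that assumption, the old and new slot lists describe the same seven inputs.

```python
from dataclasses import dataclass
from enum import Enum


class SequenceType(Enum):
    # Hypothetical stand-in for PyDataProvider2's SequenceType.
    NO_SEQUENCE = 0
    SEQUENCE = 1


@dataclass(frozen=True)
class InputType:
    # Hypothetical slot description: value range [0, dim) and sequence kind.
    dim: int
    seq_type: SequenceType


def integer_value(dim, seq_type=SequenceType.NO_SEQUENCE):
    # Integer slot whose values lie in [0, dim).
    return InputType(dim, seq_type)


def integer_value_sequence(dim):
    # Assumed shorthand: a sequence of integers, each in [0, dim).
    return integer_value(dim, seq_type=SequenceType.SEQUENCE)


# Toy dictionaries, made up for the example.
word_dict = {"the": 0, "cat": 1, "sat": 2}
label_dict = {"O": 0, "B-A0": 1}

old_slots = [integer_value(len(word_dict), seq_type=SequenceType.SEQUENCE)] * 5 + [
    integer_value(2, seq_type=SequenceType.SEQUENCE),
    integer_value(len(label_dict), seq_type=SequenceType.SEQUENCE),
]
new_slots = [integer_value_sequence(len(word_dict))] * 5 + [
    integer_value_sequence(2),
    integer_value_sequence(len(label_dict)),
]
print(old_slots == new_slots)  # True
```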
2 changes: 1 addition & 1 deletion demo/sentiment/dataprovider.py
@@ -17,7 +17,7 @@
 def hook(settings, dictionary, **kwargs):
     settings.word_dict = dictionary
     settings.input_types = [
-        integer_value(len(settings.word_dict), seq_type=SequenceType.SEQUENCE),
+        integer_value_sequence(len(settings.word_dict)),
         integer_value(2)]
     settings.logger.info('dict len : %d' % (len(settings.word_dict)))

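With the input types declared in `hook()` above, each sample is a pair: a sequence of word IDs, each in `[0, len(word_dict))`, and a single label in `[0, 2)`. The helper below is a hypothetical sketch of building one such pair from a tokenized sentence; `to_sample` is an illustrative name, and dropping out-of-vocabulary words is an assumption made for the example, not necessarily what the demo's own provider does.

```python
def to_sample(words, label, word_dict):
    """Build one sample matching the two input types declared in hook()."""
    # Slot 0, integer_value_sequence(len(word_dict)): a list of word IDs.
    # Assumption: words missing from the dictionary are simply dropped.
    ids = [word_dict[w] for w in words if w in word_dict]
    # Slot 1, integer_value(2): a single label, 0 or 1.
    if label not in (0, 1):
        raise ValueError("label must be 0 or 1 for integer_value(2)")
    return ids, label


word_dict = {"good": 0, "movie": 1, "bad": 2}  # toy dictionary
print(to_sample(["a", "good", "movie"], 1, word_dict))  # ([0, 1], 1)
```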
17 changes: 5 additions & 12 deletions demo/seqToseq/dataprovider.py
@@ -30,22 +30,15 @@ def hook(settings, src_dict, trg_dict, file_list, **kwargs):
     if settings.job_mode:
         settings.trg_dict = trg_dict
         settings.slots = [
-            integer_value(
-                len(settings.src_dict),
-                seq_type=SequenceType.SEQUENCE), integer_value(
-                    len(settings.trg_dict),
-                    seq_type=SequenceType.SEQUENCE), integer_value(
-                        len(settings.trg_dict),
-                        seq_type=SequenceType.SEQUENCE)
+            integer_value_sequence(len(settings.src_dict)),
+            integer_value_sequence(len(settings.trg_dict)),
+            integer_value_sequence(len(settings.trg_dict))
         ]
         settings.logger.info("trg dict len : %d" % (len(settings.trg_dict)))
     else:
         settings.slots = [
-            integer_value(
-                len(settings.src_dict),
-                seq_type=SequenceType.SEQUENCE), integer_value(
-                    len(open(file_list[0], "r").readlines()),
-                    seq_type=SequenceType.SEQUENCE)
+            integer_value_sequence(len(settings.src_dict)),
+            integer_value_sequence(len(open(file_list[0], "r").readlines()))
         ]


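In generation mode (`settings.job_mode` false), the second slot's dimension is the number of lines in `file_list[0]`. The diff keeps `len(open(file_list[0], "r").readlines())`, which leaves closing the file to the garbage collector. A possible alternative, offered as a suggestion and not what this commit does, counts the same lines inside a context manager; `count_lines` and the file name in the usage lines are hypothetical.

```python
import os
import tempfile


def count_lines(path):
    # Same result as len(open(path, "r").readlines()), but the file is closed
    # as soon as counting finishes and no list of lines is built.
    with open(path, "r") as f:
        return sum(1 for _ in f)


# Usage: write a small throwaway file and count its lines.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "gen_list.txt")  # hypothetical file name
    with open(path, "w") as f:
        f.write("first\nsecond\nthird\n")
    print(count_lines(path))  # 3
```

Inside `hook()` the slot would then read `integer_value_sequence(count_lines(file_list[0]))`.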
Binary file modified doc/demo/quick_start/NetRNN_en.png
17 changes: 9 additions & 8 deletions doc/demo/quick_start/index_en.md
@@ -225,7 +225,7 @@ Performance summary: You can refer to the training and testing scripts later. In
 <br>

 ### Word Embedding Model
-In order to use the word embedding model, you need to change the data provider a little bit to make the input words as a sequence of word IDs. The revised data provider is listed below. You only need to change initializer() for the type of the first input. It is changed from sparse_binary_vector to sequence of integers. process() remains the same. This data provider can also be used for later sequence models.
+In order to use the word embedding model, you need to change the data provider a little bit to make the input words as a sequence of word IDs. The revised data provider `dataprovider_emb.py` is listed below. You only need to change initializer() for the type of the first input. It is changed from sparse_binary_vector to sequence of integers. process() remains the same. This data provider can also be used for later sequence models.

 ```python
 def initializer(settings, dictionary, **kwargs):
@@ -260,7 +260,7 @@ avg = pooling_layer(input=emb, pooling_type=AvgPooling())

 The other parts of the model are the same as logistic regression network.

-The performance is summarized in the following table:
+The performance is summarized in the following table:

 <html>
 <center>
@@ -400,7 +400,7 @@ If you want to install the remote training platform, which enables distributed t
 You can use the trained model to perform prediction on the dataset with no labels. You can also evaluate the model on dataset with labels to obtain its test accuracy.
 <center> ![](./PipelineTest_en.png) </center>

-The test script (test.sh) is listed below. PaddlePaddle can evaluate a model on the data with labels specified in `test.list`.
+The test script is listed below. PaddlePaddle can evaluate a model on the data with labels specified in `test.list`.

 ```bash
 paddle train \
@@ -497,11 +497,12 @@ The scripts of data downloading, network configurations, and training scripts are
 ## Appendix
 ### Command Line Argument

-* --config:network architecture path.
-* --save_dir:model save directory.
-* --log_period:the logging period per batch.
-* --num_passes:number of training passes. One pass means the training would go over the whole training dataset once.* --config_args:Other configuration arguments.
-* --init_model_path:The path of the initial model parameter.
+* \--config:network architecture path.
+* \--save_dir:model save directory.
+* \--log_period:the logging period per batch.
+* \--num_passes:number of training passes. One pass means the training would go over the whole training dataset once.
+* \--config_args:Other configuration arguments.
+* \--init_model_path:The path of the initial model parameter.

 By default, the trainer will save model every pass. You can also specify `saving_period_by_batches` to set the frequency of batch saving. You can use `show_parameter_stats_period` to print the statistics of the parameters, which are very useful for tuning parameters. Other command line arguments can be found in <a href = "../../ui/index.html#command-line-argument">command line argument documentation</a>.

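The word embedding model in this quick-start document feeds the integer sequence from `dataprovider_emb.py` into `emb` (presumably an embedding layer) and then into `pooling_layer(input=emb, pooling_type=AvgPooling())`, which reduces a variable-length sentence to one fixed-size vector. The plain-Python sketch below illustrates that computation on a made-up three-word vocabulary; it is not PaddlePaddle's implementation, and `embed` and `avg_pool` are hypothetical names.

```python
def embed(ids, table):
    # Look up one embedding row per word ID.
    return [table[i] for i in ids]


def avg_pool(vectors):
    # Element-wise mean over the sequence: one fixed-size vector per sentence.
    if not vectors:
        raise ValueError("cannot average an empty sequence")
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


table = [[1.0, 0.0], [0.0, 1.0], [3.0, 3.0]]  # toy vocab of 3 words, width 2
print(avg_pool(embed([0, 2], table)))  # [2.0, 1.5]
```

Because the mean is taken per dimension, the output has the embedding width whatever the sentence length.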
17 changes: 8 additions & 9 deletions doc/demo/semantic_role_labeling/index.md
@@ -71,15 +71,14 @@ def hook(settings, word_dict, label_dict, **kwargs):
     settings.word_dict = word_dict
     settings.label_dict = label_dict
     #all inputs are integral and sequential type
-    settings.slots = [
-        integer_value(len(word_dict), seq_type=SequenceType.SEQUENCE),
-        integer_value(len(word_dict), seq_type=SequenceType.SEQUENCE),
-        integer_value(len(word_dict), seq_type=SequenceType.SEQUENCE),
-        integer_value(len(word_dict), seq_type=SequenceType.SEQUENCE),
-        integer_value(len(word_dict), seq_type=SequenceType.SEQUENCE),
-        integer_value(2, seq_type=SequenceType.SEQUENCE),
-        integer_value(len(label_dict), seq_type=SequenceType.SEQUENCE)]```
+    settings.slots = [
+        integer_value_sequence(len(word_dict)),
+        integer_value_sequence(len(word_dict)),
+        integer_value_sequence(len(word_dict)),
+        integer_value_sequence(len(word_dict)),
+        integer_value_sequence(len(word_dict)),
+        integer_value_sequence(2),
+        integer_value_sequence(len(label_dict))]
 ```
 The corresponding data iterator is as follows:
 ```
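The seven slots declared in `hook()` are all integer sequences: five over the word dictionary, one with values in `[0, 2)`, and one over the label dictionary. One way to catch a data iterator that yields values outside these ranges is a check like the hypothetical `check_sample` below; it assumes a sample is a list of seven integer lists in slot order, and the dictionaries and sample contents are made up for illustration.

```python
def check_sample(sample, dims):
    """Check that each slot is a list of integers in [0, dim)."""
    if len(sample) != len(dims):
        raise ValueError("expected %d slots, got %d" % (len(dims), len(sample)))
    for k, (seq, dim) in enumerate(zip(sample, dims)):
        if not isinstance(seq, list):
            raise TypeError("slot %d must be a sequence (list)" % k)
        for v in seq:
            if not (isinstance(v, int) and 0 <= v < dim):
                raise ValueError("slot %d: value %r outside [0, %d)" % (k, v, dim))
    return True


# Toy dictionaries; the slot dimensions mirror the hook() declaration.
word_dict = {"he": 0, "ate": 1, "rice": 2}
label_dict = {"B-A0": 0, "B-V": 1, "B-A1": 2}
dims = [len(word_dict)] * 5 + [2, len(label_dict)]

sample = [[0, 1, 2]] * 5 + [[0, 1, 0], [0, 1, 2]]
print(check_sample(sample, dims))  # True
```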
4 changes: 4 additions & 0 deletions doc/layer.md
@@ -0,0 +1,4 @@
+# Layer Documents
+
+* [Layer Source Code Document](source/gserver/layers/index.rst)
+* [Layer Python API Document](ui/api/trainer_config_helpers/layers_index.rst)
15 changes: 14 additions & 1 deletion doc/source/gserver/layers/layer.rst
@@ -510,11 +510,24 @@ NCELayer
 .. doxygenclass:: paddle::NCELayer
   :members:

+Validation Layers
+-----------------
+
 ValidationLayer
----------------
+```````````````
 .. doxygenclass:: paddle::ValidationLayer
   :members:

+AucValidation
+`````````````
+.. doxygenclass:: paddle::AucValidation
+  :members:
+
+PnpairValidation
+````````````````
+.. doxygenclass:: paddle::PnpairValidation
+  :members:
+
 Check Layers
 ============

7 changes: 7 additions & 0 deletions doc/ui/api/trainer_config_helpers/activations_index.rst
@@ -0,0 +1,7 @@
+Activations
+===========
+
+.. toctree::
+  :maxdepth: 3
+
+  activations.rst
21 changes: 10 additions & 11 deletions doc_cn/demo/quick_start/index.md
@@ -207,17 +207,16 @@ classification_cost(input=output, label=label)

 ### Word Vector Model

-The embeding model requires a slight change to the data provider script, namely `dataprovider_emb.py`; the word vector model,
-convolutional model, and sequence model all use this script
-- The text input type is defined as the integer type integer_value
-- Set seq_type of the text input to SequenceType.SEQUENCE
+The embedding model requires a slight change to the data provider script, namely `dataprovider_emb.py`; the word vector model,
+convolutional model, and sequence model all use this script. The text input type is defined as the integer sequence type integer_value_sequence.

 ```
 def initializer(settings, dictionary, **kwargs):
     settings.word_dict = dictionary
     settings.input_types = [
         # Define the type of the first input as sequence of integer.
-        integer_value(len(dictionary), seq_type=SequenceType.SEQUENCE),
+        # The values of the integers range from 0 to len(dictionary)-1
+        integer_value_sequence(len(dictionary)),
         # Define the second input for label id
         integer_value(2)]
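@@ -479,12 +478,12 @@ else:
 ## Appendix
 ### Command Line Argument

-* --config: network configuration
-* --save_dir: model storage path
-* --log_period: number of batches between log prints
-* --num_passes: number of training passes; one pass means going over all training samples once
-* --config_args: arguments specified on the command line are passed into the network configuration.
-* --init_model_path: path of the initial model; can be used to specify the initial model for testing or training.
+* \--config: network configuration
+* \--save_dir: model storage path
+* \--log_period: number of batches between log prints
+* \--num_passes: number of training passes; one pass means going over all training samples once
+* \--config_args: arguments specified on the command line are passed into the network configuration.
+* \--init_model_path: path of the initial model; can be used to specify the initial model for testing or training.

 By default the model is saved once per pass; you can also use saving_period_by_batches to save the model every given number of batches.
 You can use show_parameter_stats_period to print parameter information, etc.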
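The arguments listed in this appendix can be combined into a single `paddle train` invocation. The hypothetical helper below assembles one as an argument list. It assumes the `--name=value` form and uses only the flags named in the list; the config file name and output directory in the usage line are placeholders.

```python
def paddle_train_args(config, save_dir, num_passes, log_period=None,
                      config_args=None, init_model_path=None):
    """Build an argv list for `paddle train` from the documented flags."""
    # Assumption: each flag is passed as --name=value.
    args = [
        "paddle", "train",
        "--config=%s" % config,
        "--save_dir=%s" % save_dir,
        "--num_passes=%d" % num_passes,
    ]
    # Optional flags are appended only when a value is given.
    if log_period is not None:
        args.append("--log_period=%d" % log_period)
    if config_args is not None:
        args.append("--config_args=%s" % config_args)
    if init_model_path is not None:
        args.append("--init_model_path=%s" % init_model_path)
    return args


args = paddle_train_args("trainer_config.py", "./output", 30, log_period=100)
print(" ".join(args))
# paddle train --config=trainer_config.py --save_dir=./output --num_passes=30 --log_period=100
```

On a machine where the `paddle` command is installed, the list could be passed to `subprocess.run(args, check=True)`.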