
Configuration of NeuralClassifier uses JSON; an illustrative snippet follows each option list below.

Common

  • task_info
    • label_type: Candidates: "single_label", "multi_label".
    • hierarchical: Boolean. Whether the task is hierarchical classification.
    • hierar_taxonomy: Path to a text file that describes the taxonomy.
    • hierar_penalty: Float. Penalty weight used for hierarchical classification.
  • device: Candidates: "cuda", "cpu".
  • model_name: Candidates: "FastText", "TextCNN", "TextRNN", "TextRCNN", "DRNN", "VDCNN", "DPCNN", "AttentiveConvNet", "Transformer".
  • checkpoint_dir: Checkpoint directory.
  • model_dir: Model directory.
  • data
    • train_json_files: Training input data.
    • validate_json_files: Validation input data.
    • test_json_files: Test input data.
    • generate_dict_using_json_files: Generate the dict using the training data.
    • generate_dict_using_all_json_files: Generate the dict using the training, validation, and test data.
    • generate_dict_using_pretrained_embedding: Generate the dict from a pre-trained embedding.
    • dict_dir: Dict directory.
    • num_worker: Number of processes for loading data.
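
For orientation, a minimal sketch of these common options; every path and value below is an illustrative placeholder, not a recommended default:

```json
{
  "task_info": {
    "label_type": "multi_label",
    "hierarchical": true,
    "hierar_taxonomy": "data/taxonomy.txt",
    "hierar_penalty": 0.000001
  },
  "device": "cuda",
  "model_name": "TextCNN",
  "checkpoint_dir": "checkpoint_dir",
  "model_dir": "trained_model_dir",
  "data": {
    "train_json_files": ["data/train.json"],
    "validate_json_files": ["data/validate.json"],
    "test_json_files": ["data/test.json"],
    "generate_dict_using_json_files": true,
    "generate_dict_using_all_json_files": true,
    "generate_dict_using_pretrained_embedding": false,
    "dict_dir": "dict_dir",
    "num_worker": 4
  }
}
```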

Feature

  • feature_names: Candidates: "token", "char".
  • min_token_count
  • min_char_count
  • token_ngram: N-gram size; for example, 2 means bigram.
  • min_token_ngram_count
  • min_keyword_count
  • min_topic_count
  • max_token_dict_size
  • max_char_dict_size
  • max_token_ngram_dict_size
  • max_keyword_dict_size
  • max_topic_dict_size
  • max_token_len
  • max_char_len
  • max_char_len_per_token
  • token_pretrained_file: Pre-trained token embedding file.
  • keyword_pretrained_file: Pre-trained keyword embedding file.
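
A sketch of the feature options, assuming they nest under a "feature" key; only a subset of keys is shown and all values are placeholders:

```json
"feature": {
  "feature_names": ["token"],
  "min_token_count": 2,
  "min_char_count": 2,
  "token_ngram": 0,
  "max_token_dict_size": 1000000,
  "max_token_len": 256,
  "max_char_len": 1024,
  "token_pretrained_file": ""
}
```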

Train

  • batch_size
  • eval_train_data: Whether to evaluate on the training data during training.
  • start_epoch: Epoch number to start from.
  • num_epochs: number of epochs.
  • num_epochs_static_embedding: Number of epochs during which the input embedding is not updated.
  • decay_steps: Decay the learning rate every decay_steps steps.
  • decay_rate: Rate of decay for the learning rate.
  • clip_gradients: Clip gradients whose absolute value exceeds this threshold.
  • l2_lambda: L2 regularization lambda.
  • loss_type: Candidates: "SoftmaxCrossEntropy", "SoftmaxFocalCrossEntropy", "SigmoidFocalCrossEntropy", "BCEWithLogitsLoss".
  • sampler: If the loss type is NCE, a sampler is needed. Candidates: "fixed", "log", "learned", "uniform".
  • num_sampled: Number of negative labels to sample when the loss type is NCE.
  • hidden_layer_dropout: Dropout rate of the hidden layer.
  • visible_device_list: GPU list to use.
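
An illustrative train block; the nesting under a "train" key is an assumption and the values are placeholders, not tuned settings:

```json
"train": {
  "batch_size": 64,
  "eval_train_data": false,
  "start_epoch": 1,
  "num_epochs": 5,
  "num_epochs_static_embedding": 0,
  "decay_steps": 1000,
  "decay_rate": 1.0,
  "clip_gradients": 100.0,
  "l2_lambda": 0.0,
  "loss_type": "BCEWithLogitsLoss",
  "sampler": "fixed",
  "num_sampled": 5,
  "hidden_layer_dropout": 0.5,
  "visible_device_list": "0"
}
```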

Embedding

  • type: Candidates: "embedding", "region_embedding".
  • dimension: dimension of embedding.
  • region_embedding_type: Config for region embedding. Candidates: "word_context", "context_word".
  • region_size: Region size for region embedding; must be an odd number.
  • initializer: Candidates: "uniform", "normal", "xavier_uniform", "xavier_normal", "kaiming_uniform", "kaiming_normal", "orthogonal".
  • fan_mode: Candidates: "FAN_IN", "FAN_OUT".
  • uniform_bound: If the initializer is "uniform", this param is used as the bound, i.e. values are drawn from [-uniform_bound, uniform_bound].
  • random_stddev: If the initializer is "normal", this param is used as the standard deviation.
  • dropout: dropout of embedding layer.
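
A sketch of the embedding block with placeholder values (the "embedding" key nesting is an assumption):

```json
"embedding": {
  "type": "embedding",
  "dimension": 64,
  "region_embedding_type": "context_word",
  "region_size": 5,
  "initializer": "uniform",
  "fan_mode": "FAN_IN",
  "uniform_bound": 0.25,
  "random_stddev": 0.01,
  "dropout": 0.0
}
```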

Optimizer

  • optimizer_type: Candidates: "Adam", "Adadelta".
  • learning_rate: learning rate.
  • adadelta_decay_rate: useful when optimizer_type is Adadelta.
  • adadelta_epsilon: useful when optimizer_type is Adadelta.
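
An illustrative optimizer block; the learning rate shown is a placeholder, not a recommendation:

```json
"optimizer": {
  "optimizer_type": "Adam",
  "learning_rate": 0.008,
  "adadelta_decay_rate": 0.95,
  "adadelta_epsilon": 1e-8
}
```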

Eval

  • text_file
  • threshold: Float threshold for truncating predicted probabilities.
  • dir: Output directory of evaluation results.
  • batch_size: Batch size for evaluation.
  • is_flat: Boolean. Flat evaluation if true, hierarchical evaluation otherwise.
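
A sketch of the eval block; the path and threshold are placeholders:

```json
"eval": {
  "text_file": "data/test.json",
  "threshold": 0.5,
  "dir": "eval_dir",
  "batch_size": 1024,
  "is_flat": true
}
```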

Log

  • logger_file: log file path.
  • log_level: Candidates: "debug", "info", "warn", "error".
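
An illustrative log block (file name is a placeholder):

```json
"log": {
  "logger_file": "log_file",
  "log_level": "warn"
}
```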

Encoder

TextCNN

  • kernel_sizes: Kernel sizes of the convolution filters.
  • num_kernels: Number of kernels.
  • top_k_max_pooling: k of top-k max pooling.
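
Encoder settings are keyed by the model name, an assumption consistent with the model_name candidates above. An illustrative TextCNN block:

```json
"TextCNN": {
  "kernel_sizes": [2, 3, 4],
  "num_kernels": 100,
  "top_k_max_pooling": 1
}
```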

TextRNN

  • hidden_dimension: dimension of hidden layer.
  • rnn_type: Candidates: "RNN", "LSTM", "GRU".
  • num_layers: number of layers.
  • doc_embedding_type: Candidates: "AVG", "Attention", "LastHidden".
  • attention_dimension: dimension of self-attention.
  • bidirectional: Boolean. Whether to use a bidirectional RNN.
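
An illustrative TextRNN block (placeholder values):

```json
"TextRNN": {
  "hidden_dimension": 64,
  "rnn_type": "GRU",
  "num_layers": 1,
  "doc_embedding_type": "Attention",
  "attention_dimension": 16,
  "bidirectional": true
}
```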

RCNN

See TextCNN and TextRNN; TextRCNN uses the settings of both.

DRNN

  • hidden_dimension: dimension of hidden layer.
  • window_size: window size.
  • rnn_type: Candidates: "RNN", "LSTM", "GRU".
  • bidirectional: Boolean.
  • cell_hidden_dropout: Dropout rate of the cell hidden state.
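
A sketch of the DRNN block with placeholder values:

```json
"DRNN": {
  "hidden_dimension": 64,
  "window_size": 3,
  "rnn_type": "GRU",
  "bidirectional": true,
  "cell_hidden_dropout": 0.1
}
```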

VDCNN

  • vdcnn_depth: depth of VDCNN.
  • top_k_max_pooling: k of top-k max pooling.
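
An illustrative VDCNN block (values are placeholders):

```json
"VDCNN": {
  "vdcnn_depth": 9,
  "top_k_max_pooling": 8
}
```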

DPCNN

  • kernel_size: kernel size.
  • pooling_stride: stride of pooling.
  • num_kernels: number of kernels.
  • blocks: number of blocks for DPCNN.
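
A sketch of the DPCNN block (placeholder values):

```json
"DPCNN": {
  "kernel_size": 3,
  "pooling_stride": 2,
  "num_kernels": 16,
  "blocks": 2
}
```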

AttentiveConvNet

  • attention_type: Candidates: "dot", "bilinear", "additive_projection".
  • margin_size: Attentive width; must be an odd number.
  • type: Candidates: "light", "advanced".
  • hidden_size: Size of the hidden layer.
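
An illustrative AttentiveConvNet block (placeholder values):

```json
"AttentiveConvNet": {
  "attention_type": "bilinear",
  "margin_size": 3,
  "type": "advanced",
  "hidden_size": 64
}
```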

Transformer

  • d_inner: Dimension of the inner (feed-forward) layer.
  • d_k: Dimension of the key.
  • d_v: Dimension of the value.
  • n_head: number of heads.
  • n_layers: number of layers.
  • dropout
  • use_star: Whether to use Star-Transformer; see the Star-Transformer paper.
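
A sketch of the Transformer block with placeholder values:

```json
"Transformer": {
  "d_inner": 128,
  "d_k": 32,
  "d_v": 32,
  "n_head": 4,
  "n_layers": 1,
  "dropout": 0.1,
  "use_star": true
}
```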