Configuration

1. Optional Parameter List

The following options can be listed with --help:

| Flag | Supported scripts | Use | Default | Note |
| --- | --- | --- | --- | --- |
| -c | ALL | Specify the configuration file to use | None | See the configuration file parameters below for usage |
| -o | ALL | Set configuration options | None | Options set with -o take priority over the configuration file selected with -c, e.g. -o Global.use_gpu=false |
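To make the priority rule concrete, here is a minimal Python sketch of how a dotted -o override such as Global.use_gpu=false could be merged into a configuration already loaded from the -c file. The helpers apply_overrides and parse_value are hypothetical and the type-parsing rules are assumptions. This is not PaddleOCR's own code.

```python
# Hypothetical sketch: merge "-o Section.key=value" overrides into a config
# already loaded from the -c YAML file. Not PaddleOCR's implementation; the
# value-parsing rules below are assumptions made for illustration.
import ast
from typing import Any, Dict, List


def parse_value(text: str) -> Any:
    """Best-effort conversion of an override string into a Python value."""
    lowered = text.lower()
    if lowered in ("true", "false"):
        return lowered == "true"
    if lowered == "none":
        return None
    try:
        # Numbers, lists such as [1000, 2000], and quoted strings
        return ast.literal_eval(text)
    except (ValueError, SyntaxError):
        return text  # anything else (e.g. a path) stays a plain string


def apply_overrides(config: Dict[str, Any], overrides: List[str]) -> Dict[str, Any]:
    """Write each dotted key into the nested config, replacing the -c value."""
    for item in overrides:
        dotted_key, raw_value = item.split("=", 1)
        *parents, leaf = dotted_key.split(".")
        node = config
        for key in parents:
            node = node.setdefault(key, {})  # create missing sections on the way
        node[leaf] = parse_value(raw_value)
    return config


config = {"Global": {"use_gpu": True, "epoch_num": 500}}  # as if loaded via -c
apply_overrides(config, ["Global.use_gpu=false", "Global.eval_batch_step=[1000, 2000]"])
print(config["Global"])
# {'use_gpu': False, 'epoch_num': 500, 'eval_batch_step': [1000, 2000]}
```

Because the overrides are applied after the file is loaded, any key set with -o replaces the value from -c. Keys that are not in the file are created.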

2. Introduction to Configuration File Parameters

Taking rec_chinese_lite_train_v2.0.yml as an example:

Global

| Parameter | Use | Default | Note |
| --- | --- | --- | --- |
| use_gpu | Whether to use the GPU | true | \ |
| epoch_num | Maximum number of training epochs | 500 | \ |
| log_smooth_window | Length of the log queue; the median of the values in the queue is printed each time | 20 | \ |
| print_batch_step | Interval (in iterations) between log prints | 10 | \ |
| save_model_dir | Model save path | output/{algorithm_name} | \ |
| save_epoch_step | Interval (in epochs) between model saves | 3 | \ |
| eval_batch_step | Interval between model evaluations | 2000 or [1000, 2000] | 2000: evaluate every 2000 iterations; [1000, 2000]: evaluate every 2000 iterations after iteration 1000 |
| cal_metric_during_train | Whether to compute the metric during training; the metric is computed on the current batch | true | \ |
| load_static_weights | Whether the pre-trained model was saved in static graph mode (currently only required by the detection algorithms) | true | \ |
| pretrained_model | Path of the pre-trained model | ./pretrain_models/CRNN/best_accuracy | \ |
| checkpoints | Path of the model parameters to load | None | Used to resume training after an interruption |
| use_visualdl | Whether to enable VisualDL for visual log display | False | Tutorial |
| infer_img | Path of the inference image or folder | ./infer_img | \ |
| character_dict_path | Dictionary path | ./ppocr/utils/ppocr_keys_v1.txt | If character_dict_path is None, the model can only recognize digits and lowercase letters |
| max_text_length | Maximum text length | 25 | \ |
| use_space_char | Whether to recognize spaces | True | \ |
| label_list | Angles supported by the direction classifier | ['0','180'] | Only valid for the angle classifier model |
| save_res_path | Save path for the test results of the model | ./output/det_db/predicts_db.txt | Only valid for the text detection model |
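The following Python sketch illustrates two of the Global options above under stated assumptions. It shows how an int or [start, interval] value of eval_batch_step might decide when to evaluate. It also shows how a queue of log_smooth_window values yields the median that gets logged. The names should_evaluate and SmoothedLog are hypothetical. The exact boundary handling is also an assumption: the first evaluation happens strictly after the start iteration.

```python
# Hypothetical sketch of two Global options; these helpers are illustrative
# and are not PaddleOCR functions.
from collections import deque
from statistics import median
from typing import List, Union


def should_evaluate(global_step: int, eval_batch_step: Union[int, List[int]]) -> bool:
    """eval_batch_step as an int N: evaluate every N iterations.
    As [start, N]: evaluate every N iterations once training is past `start`
    (assumed: counted from `start`, with the first evaluation at start + N)."""
    if isinstance(eval_batch_step, int):
        start, interval = 0, eval_batch_step
    else:
        start, interval = eval_batch_step
    return global_step > start and (global_step - start) % interval == 0


class SmoothedLog:
    """Holds the last `window` values (log_smooth_window) and reports their median."""

    def __init__(self, window: int = 20):
        self.values = deque(maxlen=window)  # oldest value is dropped automatically

    def update(self, value: float) -> float:
        self.values.append(value)
        return median(self.values)


print([s for s in range(1, 7001) if should_evaluate(s, 2000)])          # [2000, 4000, 6000]
print([s for s in range(1, 7001) if should_evaluate(s, [1000, 2000])])  # [3000, 5000, 7000]
```

Using the median instead of the mean keeps a single unusually large loss value from dominating the printed log.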

Optimizer (ppocr/optimizer)

| Parameter | Use | Default | Note |
| --- | --- | --- | --- |
| name | Optimizer class name | Adam | Currently supports Momentum, Adam, RMSProp; see ppocr/optimizer/optimizer.py |
| beta1 | Exponential decay rate for the 1st moment estimates | 0.9 | \ |
| beta2 | Exponential decay rate for the 2nd moment estimates | 0.999 | \ |
| clip_norm | Maximum norm value for gradient clipping | - | \ |
| lr | Learning rate decay method | - | \ |
| lr.name | Learning rate decay class name | Cosine | Currently supports Linear, Cosine, Step, Piecewise; see ppocr/optimizer/learning_rate.py |
| lr.learning_rate | Base learning rate | 0.001 | \ |
| regularizer | Network regularization method | - | \ |
| regularizer.name | Regularizer class name | L2 | Currently supports L1, L2; see ppocr/optimizer/regularizer.py |
| regularizer.factor | Regularization coefficient | 0.00004 | \ |
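As an illustration of what lr.name: Cosine, lr.learning_rate and regularizer.factor control, the sketch below implements standard cosine annealing from the base learning rate, plus the usual L2 weight-decay gradient term. It is a generic textbook version written for this page. It does not show whether PaddleOCR steps the schedule per iteration or per epoch, or whether warmup is added. The 500-step horizon is an assumption.

```python
# Generic sketch (not PaddleOCR's code): cosine learning-rate annealing from the
# base learning_rate, and an L2 weight-decay term added to a gradient.
import math
from typing import List


def cosine_lr(base_lr: float, step: int, total_steps: int, min_lr: float = 0.0) -> float:
    """Standard cosine annealing: base_lr at step 0, min_lr at total_steps and after."""
    progress = min(step, total_steps) / total_steps
    return min_lr + 0.5 * (base_lr - min_lr) * (1.0 + math.cos(math.pi * progress))


def l2_decayed_grad(grad: List[float], weights: List[float], factor: float) -> List[float]:
    """Gradient of the penalty 0.5 * factor * sum(w^2) is factor * w, added per weight."""
    return [g + factor * w for g, w in zip(grad, weights)]


total = 500  # assumption: one decay step per epoch with epoch_num = 500
for step in (0, 250, 500):
    print(step, round(cosine_lr(0.001, step, total), 6))
# 0 0.001
# 250 0.0005
# 500 0.0
```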

Architecture (ppocr/modeling)

In PaddleOCR, the network is divided into four stages: Transform, Backbone, Neck and Head.

| Parameter | Use | Default | Note |
| --- | --- | --- | --- |
| model_type | Network type | rec | Currently supports rec, det, cls |
| algorithm | Model name | CRNN | See algorithm_overview for the supported list |
| Transform | Transformation method | - | Currently only supported by recognition algorithms; see ppocr/modeling/transforms for details |
| Transform.name | Transformation class name | TPS | Currently supports TPS |
| Transform.num_fiducial | Number of TPS control points | 20 | Ten on the top edge and ten on the bottom |
| Transform.loc_lr | Learning rate of the localization network | 0.1 | \ |
| Transform.model_name | Size of the localization network | small | Currently supports small, large |
| Backbone | Network backbone | - | See ppocr/modeling/backbones |
| Backbone.name | Backbone class name | ResNet | Currently supports MobileNetV3, ResNet |
| Backbone.layers | Number of ResNet layers | 34 | Currently supports 18, 34, 50, 101, 152, 200 |
| Backbone.model_name | MobileNetV3 network size | small | Currently supports small, large |
| Neck | Network neck | - | See ppocr/modeling/necks |
| Neck.name | Neck class name | SequenceEncoder | Currently supports SequenceEncoder, DBFPN |
| Neck.encoder_type | Encoder type of SequenceEncoder | rnn | Currently supports reshape, fc, rnn |
| Neck.hidden_size | Number of hidden units in the RNN | 48 | \ |
| Neck.out_channels | Number of DBFPN output channels | 256 | \ |
| Head | Network head | - | See ppocr/modeling/heads |
| Head.name | Head class name | CTCHead | Currently supports CTCHead, DBHead, ClsHead |
| Head.fc_decay | Regularization coefficient of CTCHead | 0.0004 | \ |
| Head.k | Binarization coefficient of DBHead | 50 | \ |
| Head.class_dim | Number of output classes of ClsHead | 2 | \ |
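The DBHead coefficient k is the steepness of the differentiable binarization described in the DB (Differentiable Binarization) paper: B = 1 / (1 + exp(-k(P - T))). Here P is the probability map and T is the threshold map. Below is a scalar Python sketch of that formula. The function name is hypothetical.

```python
# Scalar sketch of DB's differentiable binarization; k corresponds to Head.k.
import math


def differentiable_binarize(prob: float, thresh: float, k: float = 50.0) -> float:
    """Smooth step: close to 1 when prob > thresh, close to 0 when prob < thresh."""
    return 1.0 / (1.0 + math.exp(-k * (prob - thresh)))


for p in (0.2, 0.5, 0.8):
    print(p, round(differentiable_binarize(p, thresh=0.5), 4))
# 0.2 0.0
# 0.5 0.5
# 0.8 1.0
```

A larger k makes the curve closer to a hard threshold while keeping it differentiable, so the threshold map can be learned during training.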

Loss (ppocr/losses)

| Parameter | Use | Default | Note |
| --- | --- | --- | --- |
| name | Loss class name | CTCLoss | Currently supports CTCLoss, DBLoss, ClsLoss |
| balance_loss | Whether to balance the numbers of positive and negative samples in DBLoss (using OHEM) | True | \ |
| ohem_ratio | Negative-to-positive sample ratio of OHEM in DBLoss | 3 | \ |
| main_loss_type | Loss used for shrink_map in DBLoss | DiceLoss | Currently supports DiceLoss, BCELoss |
| alpha | Coefficient of shrink_map_loss in DBLoss | 5 | \ |
| beta | Coefficient of threshold_map_loss in DBLoss | 10 | \ |
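To show how the DBLoss options interact, here is a simplified pure-Python sketch on flattened maps. It contains three pieces:

- a balanced BCE that keeps every positive pixel plus only the hardest negatives, up to ohem_ratio times the positive count;
- a Dice loss;
- a weighted total.

It assumes the total is alpha × shrink-map loss + beta × threshold-map loss + binary-map loss, with the binary term at weight 1. The function names are hypothetical, and PaddleOCR's actual implementation may differ in detail.

```python
# Simplified, hypothetical sketch of the DBLoss options on flattened maps.
# Not PaddleOCR's implementation; names and the binary-term weight are assumptions.
import math
from typing import List


def balanced_bce(pred: List[float], gt: List[int], ohem_ratio: float = 3.0, eps: float = 1e-6) -> float:
    """BCE with OHEM (balance_loss=True): keep all positives plus the hardest
    negatives, at most ohem_ratio times as many as there are positives."""
    losses = [-(g * math.log(p + eps) + (1 - g) * math.log(1 - p + eps)) for p, g in zip(pred, gt)]
    pos = [loss for loss, g in zip(losses, gt) if g == 1]
    neg = sorted((loss for loss, g in zip(losses, gt) if g == 0), reverse=True)
    neg = neg[: int(min(len(neg), len(pos) * ohem_ratio))]  # hardest negatives first
    return (sum(pos) + sum(neg)) / (len(pos) + len(neg) + eps)


def dice_loss(pred: List[float], gt: List[int], eps: float = 1e-6) -> float:
    """Dice loss (main_loss_type=DiceLoss): 1 - 2*|P and G| / (|P| + |G|) on soft predictions."""
    intersection = sum(p * g for p, g in zip(pred, gt))
    union = sum(pred) + sum(gt) + eps
    return 1.0 - 2.0 * intersection / union


def db_total_loss(shrink_loss: float, threshold_loss: float, binary_loss: float,
                  alpha: float = 5.0, beta: float = 10.0) -> float:
    """alpha weights the shrink-map loss, beta the threshold-map loss (binary weight assumed 1)."""
    return alpha * shrink_loss + beta * threshold_loss + binary_loss


pred = [0.9, 0.1, 0.7, 0.2]
gt = [1, 0, 0, 0]
shrink = balanced_bce(pred, gt)  # with ohem_ratio=3 all three negatives are kept here
print(round(db_total_loss(shrink, threshold_loss=0.05, binary_loss=dice_loss(pred, gt)), 4))
```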

PostProcess (ppocr/postprocess)

| Parameter | Use | Default | Note |
| --- | --- | --- | --- |
| name | Post-processing class name | CTCLabelDecode | Currently supports CTCLabelDecode, AttnLabelDecode, DBPostProcess, ClsPostProcess |
| thresh | Threshold for binarizing the segmentation map in DBPostProcess | 0.3 | \ |
| box_thresh | Threshold for filtering output boxes in DBPostProcess; boxes scoring below it are not output | 0.7 | \ |
| max_candidates | Maximum number of text boxes output by DBPostProcess | 1000 | \ |
| unclip_ratio | Unclip (expansion) ratio of the text boxes in DBPostProcess | 2.0 | \ |
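The sketch below illustrates the three DBPostProcess thresholds on plain lists. The unclip distance follows the DB paper, D = A × r / L, where A is the polygon area, L is its perimeter and r is the unclip_ratio. The actual polygon offsetting, typically done with a clipping library, is omitted. The function names are hypothetical. Box scores are passed in directly rather than computed from the probability map.

```python
# Hypothetical sketch of the DBPostProcess thresholds; not PaddleOCR's code.
import math
from typing import List, Tuple

Point = Tuple[float, float]


def binarize(prob_map: List[List[float]], thresh: float = 0.3) -> List[List[int]]:
    """thresh: pixels whose probability exceeds it are marked as text (1)."""
    return [[1 if p > thresh else 0 for p in row] for row in prob_map]


def keep_box(box_score: float, box_thresh: float = 0.7) -> bool:
    """box_thresh: candidate boxes scoring below it are discarded."""
    return box_score >= box_thresh


def unclip_distance(poly: List[Point], unclip_ratio: float = 2.0) -> float:
    """Outward offset D = area * unclip_ratio / perimeter for a closed polygon."""
    n = len(poly)
    # Shoelace formula for the polygon area
    area = abs(sum(poly[i][0] * poly[(i + 1) % n][1] - poly[(i + 1) % n][0] * poly[i][1]
                   for i in range(n))) / 2.0
    perimeter = sum(math.dist(poly[i], poly[(i + 1) % n]) for i in range(n))
    return area * unclip_ratio / perimeter


# A 100 x 20 shrunk region: area 2000, perimeter 240, so D = 2000 * 2.0 / 240
print(round(unclip_distance([(0, 0), (100, 0), (100, 20), (0, 20)]), 2))  # 16.67
```

A larger unclip_ratio pushes the detected box further out from the shrunk text kernel, which yields looser boxes.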

Metric (ppocr/metrics)

| Parameter | Use | Default | Note |
| --- | --- | --- | --- |
| name | Metric class name | RecMetric | Currently supports DetMetric, RecMetric, ClsMetric |
| main_indicator | Main indicator, used to select the best model | acc | hmean for detection; acc for recognition and classification |
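For reference, the two main indicators can be sketched as follows. hmean is the harmonic mean of detection precision and recall, and acc is the fraction of exactly matching predictions. The helpers are hypothetical and apply no text normalization, which PaddleOCR's metric classes may do.

```python
# Sketch of the two main_indicator values; helper names are illustrative.
from typing import List


def hmean(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (the F-score used for detection)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def accuracy(preds: List[str], labels: List[str]) -> float:
    """Exact-match accuracy for recognition/classification; no normalization applied."""
    correct = sum(p == t for p, t in zip(preds, labels))
    return correct / max(len(labels), 1)


print(hmean(0.8, 0.6))
print(accuracy(["hello", "w0rld"], ["hello", "world"]))  # 0.5
```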

Dataset (ppocr/data)

| Parameter | Use | Default | Note |
| --- | --- | --- | --- |
| dataset | Dataset that returns one sample per iteration | - | - |
| dataset.name | Dataset class name | SimpleDataSet | Currently supports SimpleDataSet, LMDBDataSet |
| dataset.data_dir | Image folder path | ./train_data | \ |
| dataset.label_file_list | Ground-truth label file path(s) | ["./train_data/train_list.txt"] | Not required when the dataset is LMDBDataSet |
| dataset.ratio_list | Sampling ratio of each label file | [1.0] | If label_file_list contains two train lists and ratio_list is [0.4, 0.6], 40% is sampled from train_list1 and 60% from train_list2 to form the combined dataset |
| dataset.transforms | List of transforms applied to images and labels | [DecodeImage, CTCLabelEncode, RecResizeImg, KeepKeys] | See ppocr/data/imaug |
| loader | Dataloader settings | - | \ |
| loader.shuffle | Whether to shuffle the dataset at every epoch | True | \ |
| loader.batch_size_per_card | Per-card batch size during training | 256 | \ |
| loader.drop_last | Whether to drop the last incomplete mini-batch when the number of samples is not divisible by batch_size | True | \ |
| loader.num_workers | Number of subprocesses used to load data; if 0, no subprocess is started and data is loaded in the main process | 8 | \ |
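Here is a minimal sketch of how ratio_list and drop_last affect the data actually used in training. It assumes each ratio is the fraction of that label file's lines that is sampled at random. The helpers and the label lines are hypothetical.

```python
# Hypothetical sketch: sample each label file by its ratio_list entry, then count batches.
# Assumption: each ratio is the fraction of *that file's* lines that is kept.
import math
import random
from typing import List


def sample_label_lines(files: List[List[str]], ratio_list: List[float], seed: int = 0) -> List[str]:
    rng = random.Random(seed)
    merged: List[str] = []
    for lines, ratio in zip(files, ratio_list):
        k = round(len(lines) * ratio)      # number of lines kept from this file
        merged.extend(rng.sample(lines, k))  # sampled without replacement
    return merged


def num_batches(num_samples: int, batch_size: int, drop_last: bool) -> int:
    if drop_last:
        return num_samples // batch_size     # incomplete last batch is discarded
    return math.ceil(num_samples / batch_size)


train_list1 = [f"a_{i}.jpg\tlabel" for i in range(1000)]
train_list2 = [f"b_{i}.jpg\tlabel" for i in range(1000)]
data = sample_label_lines([train_list1, train_list2], [0.4, 0.6])
print(len(data))  # 1000 (400 + 600)
print(num_batches(len(data), 256, drop_last=True), num_batches(len(data), 256, drop_last=False))  # 3 4
```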

3. Multilingual Config File Generation

PaddleOCR currently supports recognition of 80 languages besides Chinese. A multi-language configuration file template, rec_multi_language_lite_train.yml, is provided under configs/rec/multi_language.

There are two ways to create the required configuration file:

  1. Generate automatically with a script

The script generate_multi_language_configs.py can generate configuration files for multi-language models.

  • Taking Italian as an example, suppose your data is organized in the following format:

    |-train_data
        |- it_train.txt # train_set label
        |- it_val.txt # val_set label
        |- data
            |- word_001.jpg
            |- word_002.jpg
            |- word_003.jpg
            | ...
    

    You can use the default parameters to generate a configuration file:

    # The code needs to be run in the specified directory
    cd PaddleOCR/configs/rec/multi_language/
    # Set the configuration file of the language to be generated through the -l or --language parameter.
    # This command will write the default parameters into the configuration file
    python3 generate_multi_language_configs.py -l it
  • If your data is placed in another location, or you want to use your own dictionary, you can generate the configuration file by specifying the relevant parameters:

    # -l or --language is required
    # --train: path of the training label file
    # --val: path of the validation label file
    # --data_dir: root directory of the training data
    # --dict: path of the dictionary
    # -o: override default parameters, e.g. whether to use the GPU
    cd PaddleOCR/configs/rec/multi_language/
    python3 generate_multi_language_configs.py -l it \
        --train {path/of/train_label.txt} \
        --val {path/of/val_label.txt} \
        --data_dir {train_data/path} \
        --dict {path/of/dict} \
        -o Global.use_gpu=False
    # ...
    

Since Italian is written in the Latin alphabet, running the command produces rec_latin_lite_train.yml.

  2. Manually modify the configuration file

    You can also manually modify the following fields in the template:

    Global:
      use_gpu: True
      epoch_num: 500
      ...
      character_dict_path: {path/of/dict} # path of dict

    Train:
      dataset:
        name: SimpleDataSet
        data_dir: train_data/ # root directory of training data
        label_file_list: ["./train_data/train_list.txt"] # train label path
      ...

    Eval:
      dataset:
        name: SimpleDataSet
        data_dir: train_data/ # root directory of val data
        label_file_list: ["./train_data/val_list.txt"] # val label path
      ...
    
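Before training a new language, it can help to check that every character in the labels appears in the dictionary. The Python sketch below is a hypothetical pre-training check. It assumes a dictionary file with one character per line and label lines of the form image_path<TAB>text. Neither format is specified in this section, so verify both against your own data. The example labels are made up.

```python
# Hypothetical pre-training check (not part of PaddleOCR): list label characters
# that the dictionary does not contain. File formats are assumptions.
from typing import Iterable, Set


def load_dict(dict_lines: Iterable[str]) -> Set[str]:
    """Assumes one character per line; blank lines are ignored."""
    return {line.rstrip("\n") for line in dict_lines if line.rstrip("\n")}


def missing_chars(label_lines: Iterable[str], charset: Set[str], use_space_char: bool = True) -> Set[str]:
    """Assumes each label line is '<image path>\\t<text>'."""
    allowed = set(charset)
    if use_space_char:
        allowed.add(" ")  # mirrors the Global.use_space_char option
    missing: Set[str] = set()
    for line in label_lines:
        _, text = line.rstrip("\n").split("\t", 1)
        missing.update(ch for ch in text if ch not in allowed)
    return missing


it_dict = load_dict("abcdefghijklmnopqrstuvwxyz")  # iterating a str yields single characters
labels = ["data/word_001.jpg\tcitta", "data/word_002.jpg\tperché"]
print(missing_chars(labels, it_dict))  # {'é'}
```

Any character reported here cannot be predicted by the trained model, so it should either be added to the dictionary given by character_dict_path or removed from the labels.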

Currently, the multi-language algorithms supported by PaddleOCR are:

| Configuration file | Algorithm name | backbone | trans | seq | pred | language |
| --- | --- | --- | --- | --- | --- | --- |
| rec_chinese_cht_lite_train.yml | CRNN | Mobilenet_v3 small 0.5 | None | BiLSTM | ctc | Traditional Chinese |
| rec_en_lite_train.yml | CRNN | Mobilenet_v3 small 0.5 | None | BiLSTM | ctc | English (case-sensitive) |
| rec_french_lite_train.yml | CRNN | Mobilenet_v3 small 0.5 | None | BiLSTM | ctc | French |
| rec_ger_lite_train.yml | CRNN | Mobilenet_v3 small 0.5 | None | BiLSTM | ctc | German |
| rec_japan_lite_train.yml | CRNN | Mobilenet_v3 small 0.5 | None | BiLSTM | ctc | Japanese |
| rec_korean_lite_train.yml | CRNN | Mobilenet_v3 small 0.5 | None | BiLSTM | ctc | Korean |
| rec_latin_lite_train.yml | CRNN | Mobilenet_v3 small 0.5 | None | BiLSTM | ctc | Latin |
| rec_arabic_lite_train.yml | CRNN | Mobilenet_v3 small 0.5 | None | BiLSTM | ctc | Arabic |
| rec_cyrillic_lite_train.yml | CRNN | Mobilenet_v3 small 0.5 | None | BiLSTM | ctc | Cyrillic |
| rec_devanagari_lite_train.yml | CRNN | Mobilenet_v3 small 0.5 | None | BiLSTM | ctc | Devanagari |

For more supported languages, please refer to: Multi-language model

Multi-language models are trained in the same way as the Chinese model, using a training set of 1 million synthetic samples. A small set of fonts and test data can be downloaded in either of the following two ways.