AutoNLP

Qingbo (清博): 3rd place solution for the AutoNLP challenge at WAIC (World Artificial Intelligence Conference)

WAIC AutoNLP 3rd Place Solution (txta)

Data cleaning and feature selection

1. Clean Chinese and English texts separately
2. Handle class imbalance in the data
3. Use automated feature filtering
4. Handle long and short texts automatically
   We tried HashingVectorizer to reduce the dimensionality of long texts, and to handle the sparse features of short texts in dense form
5. Use character-level TF-IDF for feature selection on Chinese text and word-level features for English
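A minimal, standard-library sketch of the language-dependent feature extraction described above: character-level tokens for Chinese, word-level tokens for English, hashed into a fixed number of buckets and weighted with TF-IDF. It assumes each text already carries a language tag (`"zh"` or `"en"`); `tokenize`, `hashed_tfidf`, the CRC32 bucket hash and the default `n_features` are hypothetical choices for illustration, not the solution's code. The README names HashingVectorizer, presumably scikit-learn's; `TfidfVectorizer(analyzer="char")` and `HashingVectorizer(n_features=...)` would be the library route.

```python
import math
import re
import zlib
from collections import Counter


def tokenize(text, lang):
    """Character tokens for Chinese ("zh"), lowercase word tokens otherwise.

    Assumption: the language tag is known before feature extraction.
    """
    if lang == "zh":
        # Character level: every non-whitespace character is a token.
        return [ch for ch in text if not ch.isspace()]
    # Word level: runs of ASCII letters, digits and apostrophes.
    return re.findall(r"[a-z0-9']+", text.lower())


def hashed_tfidf(docs, lang, n_features=2 ** 10):
    """Map each document to a sparse, L2-normalised TF-IDF vector.

    Tokens are hashed into n_features buckets (the hashing trick), so the
    dimensionality stays bounded however long the texts are.
    Returns one {bucket_index: weight} dict per document.
    """
    # Term counts per document, keyed by hashed bucket index.
    counts = []
    for doc in docs:
        counts.append(Counter(zlib.crc32(tok.encode("utf-8")) % n_features
                              for tok in tokenize(doc, lang)))
    # Document frequency: each bucket counted once per document.
    df = Counter()
    for c in counts:
        df.update(c.keys())
    n_docs = len(docs)
    vectors = []
    for c in counts:
        # Smoothed IDF: log((1 + n) / (1 + df)) + 1.
        vec = {j: tf * (math.log((1 + n_docs) / (1 + df[j])) + 1.0)
               for j, tf in c.items()}
        norm = math.sqrt(sum(v * v for v in vec.values()))
        if norm > 0:
            vec = {j: v / norm for j, v in vec.items()}
        vectors.append(vec)
    return vectors


zh_vectors = hashed_tfidf(["今天天气很好", "天气预报"], "zh", n_features=64)
en_vectors = hashed_tfidf(["The weather is nice today", "Weather forecast"], "en")
```

CRC32 is used instead of Python's built-in `hash()` because string hashing is randomized per process, which would make bucket assignments differ between runs.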

Training on subsets with multi-layer sampling

1. Stratified sampling based on an incremental model
2. Oversampling of the drawn samples
3. Control of the number of training samples per class
4. Oversampling of classes with very little data
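The per-class sampling steps above might look roughly like the following stdlib sketch. `stratified_sample`, `max_per_class` and `min_per_class` are hypothetical names; capping large classes by down-sampling is one assumed way of controlling per-class proportions, and the incremental-model part of step 1 is not modelled here.

```python
import random
from collections import defaultdict


def stratified_sample(labels, max_per_class, min_per_class, seed=0):
    """Return shuffled training indices drawn class by class (hypothetical helper).

    - Classes larger than max_per_class are down-sampled without replacement,
      which caps each class's share of the training set.
    - Classes smaller than min_per_class are oversampled with replacement
      until they reach min_per_class.
    """
    if min_per_class > max_per_class:
        raise ValueError("min_per_class must not exceed max_per_class")
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    sample = []
    for y in sorted(by_class):
        idx = by_class[y]
        if len(idx) > max_per_class:
            chosen = rng.sample(idx, max_per_class)
        else:
            chosen = list(idx)
        if len(chosen) < min_per_class:
            # Oversample: repeat randomly chosen examples of this class.
            chosen += rng.choices(idx, k=min_per_class - len(chosen))
        sample.extend(chosen)
    rng.shuffle(sample)
    return sample


labels = [0] * 100 + [1] * 20 + [2] * 3
train_idx = stratified_sample(labels, max_per_class=30, min_per_class=10, seed=1)
```

In an incremental training loop, one could call it with a different `seed` each round to draw a fresh subset; this is an assumption about how the rounds are organized, not something the README states.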

Linear SVM + probability calibration

Automatic adjustment for class imbalance
Automatic search for the model's number of iterations
Automatic search for hyperparameters
Use a cross-validation generator: for each split, fit the model parameters on the training fold and calibrate the probabilities on the held-out fold
Then average the predicted probabilities across folds
Since these averaged probabilities do not always sum to one, a post-processing step normalizes them.
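Two pieces of the pipeline above can be sketched without any model code: a "balanced" class-weight formula, n_samples / (n_classes * class_count), which is the rule scikit-learn uses for `class_weight="balanced"`, and the fold-averaging plus normalization step. This is a stdlib sketch under stated assumptions: whether the solution used exactly this weighting is not stated, the function names are hypothetical, and the SVM training and per-fold calibration are left out (scikit-learn's `CalibratedClassifierCV` wrapping a `LinearSVC` is one library that follows a similar per-fold pattern).

```python
def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = {}
    for y in labels:
        counts[y] = counts.get(y, 0) + 1
    n, k = len(labels), len(counts)
    return {y: n / (k * c) for y, c in counts.items()}


def average_and_normalize(fold_probas):
    """Average per-fold probability rows, then rescale each row to sum to 1.

    fold_probas[f][i][c] is the calibrated probability that sample i belongs
    to class c, as predicted by the model fitted on split f.
    """
    n_folds = len(fold_probas)
    n_samples = len(fold_probas[0])
    n_classes = len(fold_probas[0][0])
    result = []
    for i in range(n_samples):
        # Mean over folds for each class.
        row = [sum(fold_probas[f][i][c] for f in range(n_folds)) / n_folds
               for c in range(n_classes)]
        total = sum(row)
        if total > 0:
            row = [p / total for p in row]
        else:
            # Degenerate case: fall back to a uniform distribution.
            row = [1.0 / n_classes] * n_classes
        result.append(row)
    return result


weights = balanced_class_weights([0, 0, 0, 1])
# Two folds, one sample, three classes; each fold's row need not sum to 1.
probs = average_and_normalize([[[0.6, 0.2, 0.1]], [[0.4, 0.4, 0.1]]])
```

Rare classes receive proportionally larger weights, so a weighted loss counts each class roughly equally overall.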
