During parallel training, metrics of training set sometimes will be bad #886

Closed
qrqpjxq opened this issue Sep 3, 2017 · 45 comments

@qrqpjxq
Contributor

qrqpjxq commented Sep 3, 2017

During parallel training, the training-set metrics sometimes get worse on every worker at the same iteration.

I'm training on 5 workers. The training-set metrics on each worker:
worker0:
[LightGBM] [Info] Trained a tree with leaves=63 and max_depth=19
[LightGBM] [Info] Iteration:1, training auc : 0.624454
[LightGBM] [Info] Iteration:1, training binary_logloss : 0.680496
[LightGBM] [Info] 1.889427 seconds elapsed, finished iteration 1
[LightGBM] [Info] Trained a tree with leaves=63 and max_depth=16
[LightGBM] [Info] Iteration:2, training auc : 0.618313
[LightGBM] [Info] Iteration:2, training binary_logloss : 0.67641
[LightGBM] [Info] 3.768745 seconds elapsed, finished iteration 2
[LightGBM] [Info] Trained a tree with leaves=63 and max_depth=14
[LightGBM] [Info] Iteration:3, training auc : 0.603125
[LightGBM] [Info] Iteration:3, training binary_logloss : 0.676286
[LightGBM] [Info] 5.559710 seconds elapsed, finished iteration 3
[LightGBM] [Info] Trained a tree with leaves=63 and max_depth=17
[LightGBM] [Info] Iteration:4, training auc : 0.56756
[LightGBM] [Info] Iteration:4, training binary_logloss : 0.70803
[LightGBM] [Info] 7.490879 seconds elapsed, finished iteration 4
[LightGBM] [Info] Trained a tree with leaves=63 and max_depth=15
[LightGBM] [Info] Iteration:5, training auc : 0.588966
[LightGBM] [Info] Iteration:5, training binary_logloss : 0.697803
[LightGBM] [Info] 9.537317 seconds elapsed, finished iteration 5
[LightGBM] [Info] Trained a tree with leaves=63 and max_depth=17
[LightGBM] [Info] Iteration:6, training auc : 0.606358
[LightGBM] [Info] Iteration:6, training binary_logloss : 0.689927
[LightGBM] [Info] 11.492579 seconds elapsed, finished iteration 6

worker1:
[LightGBM] [Info] Trained a tree with leaves=63 and max_depth=19
[LightGBM] [Info] Iteration:1, training auc : 0.610784
[LightGBM] [Info] Iteration:1, training binary_logloss : 0.684069
[LightGBM] [Info] 1.917517 seconds elapsed, finished iteration 1
[LightGBM] [Info] Trained a tree with leaves=63 and max_depth=16
[LightGBM] [Info] Iteration:2, training auc : 0.625668
[LightGBM] [Info] Iteration:2, training binary_logloss : 0.677478
[LightGBM] [Info] 3.796817 seconds elapsed, finished iteration 2
[LightGBM] [Info] Trained a tree with leaves=63 and max_depth=14
[LightGBM] [Info] Iteration:3, training auc : 0.60758
[LightGBM] [Info] Iteration:3, training binary_logloss : 0.670345
[LightGBM] [Info] 5.561043 seconds elapsed, finished iteration 3
[LightGBM] [Info] Trained a tree with leaves=63 and max_depth=17
[LightGBM] [Info] Iteration:4, training auc : 0.594692
[LightGBM] [Info] Iteration:4, training binary_logloss : 0.682121
[LightGBM] [Info] 7.489096 seconds elapsed, finished iteration 4
[LightGBM] [Info] Trained a tree with leaves=63 and max_depth=15
[LightGBM] [Info] Iteration:5, training auc : 0.611087
[LightGBM] [Info] Iteration:5, training binary_logloss : 0.675639
[LightGBM] [Info] 9.505840 seconds elapsed, finished iteration 5
[LightGBM] [Info] Trained a tree with leaves=63 and max_depth=17
[LightGBM] [Info] Iteration:6, training auc : 0.622268
[LightGBM] [Info] Iteration:6, training binary_logloss : 0.670829
[LightGBM] [Info] 11.492364 seconds elapsed, finished iteration 6

worker2:
[LightGBM] [Info] Trained a tree with leaves=63 and max_depth=19
[LightGBM] [Info] Iteration:1, training auc : 0.605063
[LightGBM] [Info] Iteration:1, training binary_logloss : 0.692491
[LightGBM] [Info] 1.946944 seconds elapsed, finished iteration 1
[LightGBM] [Info] Trained a tree with leaves=63 and max_depth=16
[LightGBM] [Info] Iteration:2, training auc : 0.607614
[LightGBM] [Info] Iteration:2, training binary_logloss : 0.689697
[LightGBM] [Info] 3.822381 seconds elapsed, finished iteration 2
[LightGBM] [Info] Trained a tree with leaves=63 and max_depth=14
[LightGBM] [Info] Iteration:3, training auc : 0.592087
[LightGBM] [Info] Iteration:3, training binary_logloss : 0.688123
[LightGBM] [Info] 5.589290 seconds elapsed, finished iteration 3
[LightGBM] [Info] Trained a tree with leaves=63 and max_depth=17
[LightGBM] [Info] Iteration:4, training auc : 0.55509
[LightGBM] [Info] Iteration:4, training binary_logloss : 0.69431
[LightGBM] [Info] 7.516317 seconds elapsed, finished iteration 4
[LightGBM] [Info] Trained a tree with leaves=63 and max_depth=15
[LightGBM] [Info] Iteration:5, training auc : 0.58136
[LightGBM] [Info] Iteration:5, training binary_logloss : 0.685505
[LightGBM] [Info] 9.563150 seconds elapsed, finished iteration 5
[LightGBM] [Info] Trained a tree with leaves=63 and max_depth=17
[LightGBM] [Info] Iteration:6, training auc : 0.591031
[LightGBM] [Info] Iteration:6, training binary_logloss : 0.680882
[LightGBM] [Info] 11.554436 seconds elapsed, finished iteration 6

worker3:
[LightGBM] [Info] Trained a tree with leaves=63 and max_depth=19
[LightGBM] [Info] Iteration:1, training auc : 0.594895
[LightGBM] [Info] Iteration:1, training binary_logloss : 0.697414
[LightGBM] [Info] 1.913700 seconds elapsed, finished iteration 1
[LightGBM] [Info] Trained a tree with leaves=63 and max_depth=16
[LightGBM] [Info] Iteration:2, training auc : 0.607586
[LightGBM] [Info] Iteration:2, training binary_logloss : 0.690211
[LightGBM] [Info] 3.792895 seconds elapsed, finished iteration 2
[LightGBM] [Info] Trained a tree with leaves=63 and max_depth=14
[LightGBM] [Info] Iteration:3, training auc : 0.611669
[LightGBM] [Info] Iteration:3, training binary_logloss : 0.678102
[LightGBM] [Info] 5.528864 seconds elapsed, finished iteration 3
[LightGBM] [Info] Trained a tree with leaves=63 and max_depth=17
[LightGBM] [Info] Iteration:4, training auc : 0.585117
[LightGBM] [Info] Iteration:4, training binary_logloss : 0.683948
[LightGBM] [Info] 7.485415 seconds elapsed, finished iteration 4
[LightGBM] [Info] Trained a tree with leaves=63 and max_depth=15
[LightGBM] [Info] Iteration:5, training auc : 0.601168
[LightGBM] [Info] Iteration:5, training binary_logloss : 0.677983
[LightGBM] [Info] 9.530298 seconds elapsed, finished iteration 5
[LightGBM] [Info] Trained a tree with leaves=63 and max_depth=17
[LightGBM] [Info] Iteration:6, training auc : 0.615217
[LightGBM] [Info] Iteration:6, training binary_logloss : 0.672196
[LightGBM] [Info] 11.488655 seconds elapsed, finished iteration 6

worker4:
[LightGBM] [Info] Trained a tree with leaves=63 and max_depth=19
[LightGBM] [Info] Iteration:1, training auc : 0.639623
[LightGBM] [Info] Iteration:1, training binary_logloss : 0.680161
[LightGBM] [Info] 1.913545 seconds elapsed, finished iteration 1
[LightGBM] [Info] Trained a tree with leaves=63 and max_depth=16
[LightGBM] [Info] Iteration:2, training auc : 0.651382
[LightGBM] [Info] Iteration:2, training binary_logloss : 0.674585
[LightGBM] [Info] 3.759978 seconds elapsed, finished iteration 2
[LightGBM] [Info] Trained a tree with leaves=63 and max_depth=14
[LightGBM] [Info] Iteration:3, training auc : 0.630517
[LightGBM] [Info] Iteration:3, training binary_logloss : 0.66882
[LightGBM] [Info] 5.556613 seconds elapsed, finished iteration 3
[LightGBM] [Info] Trained a tree with leaves=63 and max_depth=17
[LightGBM] [Info] Iteration:4, training auc : 0.571659
[LightGBM] [Info] Iteration:4, training binary_logloss : 0.703941
[LightGBM] [Info] 7.452347 seconds elapsed, finished iteration 4
[LightGBM] [Info] Trained a tree with leaves=63 and max_depth=15
[LightGBM] [Info] Iteration:5, training auc : 0.58287
[LightGBM] [Info] Iteration:5, training binary_logloss : 0.69834
[LightGBM] [Info] 9.527238 seconds elapsed, finished iteration 5
[LightGBM] [Info] Trained a tree with leaves=63 and max_depth=17
[LightGBM] [Info] Iteration:6, training auc : 0.600549
[LightGBM] [Info] Iteration:6, training binary_logloss : 0.693129
[LightGBM] [Info] 11.483836 seconds elapsed, finished iteration 6

train.conf:
task = train
boosting_type = gbdt
objective = binary
metric = binary_logloss,auc
metric_freq = 1
is_training_metric = true
max_bin = 255
data = trainset
num_trees = 50
learning_rate = 0.1
num_leaves = 63
tree_learner = data
is_pre_partition = false
categorical_column = 0,4
bagging_freq = 5
min_data_in_leaf = 20
min_sum_hessian_in_leaf = 5.0
is_enable_sparse = true
use_two_round_loading = false
is_save_binary_file = false
num_machines = 5

trainset:
1 8.0 -0.635 0.226 0.327 2.0 0.754 -0.249 -1.092
1 8.0 0.329 0.359 1.498 2.0 1.096 -0.558 -1.588
1 8.0 1.471 -1.636 0.454 5.0 1.105 1.282 1.382
0 9.0 -0.877 0.936 1.992 6.0 1.786 -1.647 -0.942
1 8.0 0.321 1.522 0.883 2.0 0.681 -1.07 -0.922
These are the first 5 rows of trainset.
The full training set in this test has 7000 rows.

@guolinke
Collaborator

guolinke commented Sep 3, 2017

@qrqpjxq
In parallel learning, the training metrics are computed on each worker's local partition, not on the full training data.
So they may look poor, since the amount of data per partition is very small in your case.

You can add the parameter valid=trainset to get the metrics on the full training data.
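
To illustrate why a metric computed on a local partition can differ from the same metric on the full data, here is a minimal Python sketch. It does not use LightGBM. The labels, the predicted probabilities, and the even/odd row split are made up for illustration and do not come from the data in this issue.

```python
import math

def binary_logloss(labels, probs, eps=1e-15):
    """Mean negative log-likelihood of binary labels under predicted probabilities."""
    total = 0.0
    for y, p in zip(labels, probs):
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(labels)

# Hypothetical labels and predictions from one shared model (illustration only).
labels = [1, 1, 1, 0, 1, 0, 0, 1]
probs = [0.9, 0.6, 0.4, 0.2, 0.7, 0.8, 0.3, 0.5]

# Split the rows across two hypothetical workers (even / odd row index).
labels0, probs0 = labels[0::2], probs[0::2]
labels1, probs1 = labels[1::2], probs[1::2]

loss0 = binary_logloss(labels0, probs0)
loss1 = binary_logloss(labels1, probs1)
loss_full = binary_logloss(labels, probs)

print(f"worker0 partition: {loss0:.6f}")
print(f"worker1 partition: {loss1:.6f}")
print(f"full data:         {loss_full:.6f}")
```

Both partitions are scored with the same predictions, yet each partition reports its own value. Only the size-weighted average of the per-partition losses matches the full-data loss.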

@qrqpjxq
Contributor Author

qrqpjxq commented Sep 3, 2017

Thanks, but this is what I get after setting valid=trainset and num_machines=2:
worker0:
[LightGBM] [Info] Number of positive: 1849, number of negative: 1672
[LightGBM] [Info] Total Bins 1542
[LightGBM] [Info] Number of data: 3521, number of used features: 8
[LightGBM] [Info] Finished initializing training
[LightGBM] [Info] Started training...
[LightGBM] [Info] Trained a tree with leaves=63 and max_depth=27
[LightGBM] [Info] Iteration:1, valid_1 auc : 0.605109
[LightGBM] [Info] Iteration:1, valid_1 binary_logloss : 0.687757
[LightGBM] [Info] 2.217076 seconds elapsed, finished iteration 1
[LightGBM] [Info] Trained a tree with leaves=63 and max_depth=19
[LightGBM] [Info] Iteration:2, valid_1 auc : 0.614288
[LightGBM] [Info] Iteration:2, valid_1 binary_logloss : 0.67951
[LightGBM] [Info] 4.499987 seconds elapsed, finished iteration 2
[LightGBM] [Info] Trained a tree with leaves=63 and max_depth=15
[LightGBM] [Info] Iteration:3, valid_1 auc : 0.568639
[LightGBM] [Info] Iteration:3, valid_1 binary_logloss : 0.748094
[LightGBM] [Info] 7.203631 seconds elapsed, finished iteration 3
[LightGBM] [Info] Trained a tree with leaves=63 and max_depth=18
[LightGBM] [Info] Iteration:4, valid_1 auc : 0.573558
[LightGBM] [Info] Iteration:4, valid_1 binary_logloss : 0.713961
[LightGBM] [Info] 9.606860 seconds elapsed, finished iteration 4
[LightGBM] [Info] Trained a tree with leaves=63 and max_depth=18
[LightGBM] [Info] Iteration:5, valid_1 auc : 0.574269
[LightGBM] [Info] Iteration:5, valid_1 binary_logloss : 0.70842
[LightGBM] [Info] 11.894101 seconds elapsed, finished iteration 5
[LightGBM] [Info] Trained a tree with leaves=63 and max_depth=12
[LightGBM] [Info] Iteration:6, valid_1 auc : 0.598179
[LightGBM] [Info] Iteration:6, valid_1 binary_logloss : 0.69411
[LightGBM] [Info] 14.400654 seconds elapsed, finished iteration 6

worker1:
[LightGBM] [Info] Number of positive: 1867, number of negative: 1612
[LightGBM] [Info] Total Bins 1542
[LightGBM] [Info] Number of data: 3479, number of used features: 8
[LightGBM] [Info] Finished initializing training
[LightGBM] [Info] Started training...
[LightGBM] [Info] Trained a tree with leaves=63 and max_depth=27
[LightGBM] [Info] Iteration:1, valid_1 auc : 0.623738
[LightGBM] [Info] Iteration:1, valid_1 binary_logloss : 0.680396
[LightGBM] [Info] 2.211793 seconds elapsed, finished iteration 1
[LightGBM] [Info] Trained a tree with leaves=63 and max_depth=19
[LightGBM] [Info] Iteration:2, valid_1 auc : 0.627245
[LightGBM] [Info] Iteration:2, valid_1 binary_logloss : 0.675169
[LightGBM] [Info] 4.503630 seconds elapsed, finished iteration 2
[LightGBM] [Info] Trained a tree with leaves=63 and max_depth=15
[LightGBM] [Info] Iteration:3, valid_1 auc : 0.589485
[LightGBM] [Info] Iteration:3, valid_1 binary_logloss : 0.7208
[LightGBM] [Info] 7.191382 seconds elapsed, finished iteration 3
[LightGBM] [Info] Trained a tree with leaves=63 and max_depth=18
[LightGBM] [Info] Iteration:4, valid_1 auc : 0.57794
[LightGBM] [Info] Iteration:4, valid_1 binary_logloss : 0.713243
[LightGBM] [Info] 9.578662 seconds elapsed, finished iteration 4
[LightGBM] [Info] Trained a tree with leaves=63 and max_depth=18
[LightGBM] [Info] Iteration:5, valid_1 auc : 0.587163
[LightGBM] [Info] Iteration:5, valid_1 binary_logloss : 0.703439
[LightGBM] [Info] 11.869153 seconds elapsed, finished iteration 5
[LightGBM] [Info] Trained a tree with leaves=63 and max_depth=12
[LightGBM] [Info] Iteration:6, valid_1 auc : 0.611375
[LightGBM] [Info] Iteration:6, valid_1 binary_logloss : 0.688842
[LightGBM] [Info] 14.408432 seconds elapsed, finished iteration 6

In each iteration, the AUC and logloss on worker0 and worker1 are not the same.
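
A comparison like this can be automated instead of read off by eye. The sketch below is a hypothetical helper, not part of LightGBM: it parses metric lines in the log format shown above and lists the (iteration, dataset, metric) entries whose values differ between two workers. The function names and the tolerance are assumptions. The sample lines are copied from iteration 1 of the two logs above.

```python
import re

# Matches e.g. "Iteration:1, valid_1 auc : 0.605109"
LINE_RE = re.compile(r"Iteration:(\d+), (\S+) (\S+) : (\S+)")

def parse_metrics(log_text):
    """Return {(iteration, dataset, metric): value} parsed from log text."""
    metrics = {}
    for line in log_text.splitlines():
        m = LINE_RE.search(line)
        if m:
            iteration, dataset, metric, value = m.groups()
            metrics[(int(iteration), dataset, metric)] = float(value)
    return metrics

def diff_metrics(a, b, tol=1e-6):
    """Keys present in both logs whose values differ by more than tol."""
    return sorted(k for k in a.keys() & b.keys() if abs(a[k] - b[k]) > tol)

worker0_log = """\
[LightGBM] [Info] Iteration:1, valid_1 auc : 0.605109
[LightGBM] [Info] Iteration:1, valid_1 binary_logloss : 0.687757
"""
worker1_log = """\
[LightGBM] [Info] Iteration:1, valid_1 auc : 0.623738
[LightGBM] [Info] Iteration:1, valid_1 binary_logloss : 0.680396
"""

mismatches = diff_metrics(parse_metrics(worker0_log), parse_metrics(worker1_log))
for key in mismatches:
    print("mismatch:", key)
```

If every worker evaluates the same model on the same validation file, the list should come back empty.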

@guolinke
Collaborator

guolinke commented Sep 3, 2017

@qrqpjxq
Can you try it without setting categorical_column = 0,4?

@guolinke
Collaborator

guolinke commented Sep 3, 2017

@qrqpjxq
I just tested it using the example data: https://github.com/Microsoft/LightGBM/tree/master/examples/parallel_learning

The result:
worker 1:

[LightGBM] [Info] Finished loading parameters
[LightGBM] [Info] Trying to bind port 12400...
[LightGBM] [Info] Binding port 12400 succeeded
[LightGBM] [Info] Listening...
[LightGBM] [Warning] Connecting to rank 1 failed, waiting for 10000 milliseconds
[LightGBM] [Info] Connected to rank 1
[LightGBM] [Info] Local rank: 0, total number of machines: 2
[LightGBM] [Info] Finished initializing network
[LightGBM] [Info] Finished loading data in 0.070659 seconds
[LightGBM] [Info] Number of positive: 1867, number of negative: 1612
[LightGBM] [Info] Total Bins 6143
[LightGBM] [Info] Number of data: 3479, number of used features: 28
[LightGBM] [Info] Finished initializing training
[LightGBM] [Info] Started training...
[LightGBM] [Info] Trained a tree with leaves=63 and max_depth=10
[LightGBM] [Info] Iteration:1, training binary_logloss : 0.669083
[LightGBM] [Info] Iteration:1, training auc : 0.779758
[LightGBM] [Info] Iteration:1, valid_1 binary_logloss : 0.672876
[LightGBM] [Info] Iteration:1, valid_1 auc : 0.740543
[LightGBM] [Info] 0.019646 seconds elapsed, finished iteration 1
[LightGBM] [Info] Trained a tree with leaves=63 and max_depth=10
[LightGBM] [Info] Iteration:2, training binary_logloss : 0.650871
[LightGBM] [Info] Iteration:2, training auc : 0.800644
[LightGBM] [Info] Iteration:2, valid_1 binary_logloss : 0.659035
[LightGBM] [Info] Iteration:2, valid_1 auc : 0.745928
[LightGBM] [Info] 0.039247 seconds elapsed, finished iteration 2
[LightGBM] [Info] Trained a tree with leaves=63 and max_depth=11
[LightGBM] [Info] Iteration:3, training binary_logloss : 0.634422
[LightGBM] [Info] Iteration:3, training auc : 0.821839
[LightGBM] [Info] Iteration:3, valid_1 binary_logloss : 0.646742
[LightGBM] [Info] Iteration:3, valid_1 auc : 0.760965
[LightGBM] [Info] 0.063029 seconds elapsed, finished iteration 3
[LightGBM] [Info] Trained a tree with leaves=63 and max_depth=10
[LightGBM] [Info] Iteration:4, training binary_logloss : 0.619946
[LightGBM] [Info] Iteration:4, training auc : 0.828923
[LightGBM] [Info] Iteration:4, valid_1 binary_logloss : 0.631734
[LightGBM] [Info] Iteration:4, valid_1 auc : 0.787297
[LightGBM] [Info] 0.082977 seconds elapsed, finished iteration 4
[LightGBM] [Info] Trained a tree with leaves=63 and max_depth=11
[LightGBM] [Info] Iteration:5, training binary_logloss : 0.604387
[LightGBM] [Info] Iteration:5, training auc : 0.83749
[LightGBM] [Info] Iteration:5, valid_1 binary_logloss : 0.61896
[LightGBM] [Info] Iteration:5, valid_1 auc : 0.797464
[LightGBM] [Info] 0.101462 seconds elapsed, finished iteration 5
[LightGBM] [Info] Trained a tree with leaves=63 and max_depth=10
[LightGBM] [Info] Iteration:6, training binary_logloss : 0.590637
[LightGBM] [Info] Iteration:6, training auc : 0.847719
[LightGBM] [Info] Iteration:6, valid_1 binary_logloss : 0.609161
[LightGBM] [Info] Iteration:6, valid_1 auc : 0.798915

worker 2:

[LightGBM] [Info] Finished loading parameters
[LightGBM] [Info] Trying to bind port 12401...
[LightGBM] [Info] Binding port 12401 succeeded
[LightGBM] [Info] Listening...
[LightGBM] [Info] Connected to rank 0
[LightGBM] [Info] Local rank: 1, total number of machines: 2
[LightGBM] [Info] Finished initializing network
[LightGBM] [Info] Finished loading data in 0.071100 seconds
[LightGBM] [Info] Number of positive: 1849, number of negative: 1672
[LightGBM] [Info] Total Bins 6143
[LightGBM] [Info] Number of data: 3521, number of used features: 28
[LightGBM] [Info] Finished initializing training
[LightGBM] [Info] Started training...
[LightGBM] [Info] Trained a tree with leaves=63 and max_depth=10
[LightGBM] [Info] Iteration:1, training binary_logloss : 0.668681
[LightGBM] [Info] Iteration:1, training auc : 0.787495
[LightGBM] [Info] Iteration:1, valid_1 binary_logloss : 0.672876
[LightGBM] [Info] Iteration:1, valid_1 auc : 0.740543
[LightGBM] [Info] 0.018932 seconds elapsed, finished iteration 1
[LightGBM] [Info] Trained a tree with leaves=63 and max_depth=10
[LightGBM] [Info] Iteration:2, training binary_logloss : 0.650747
[LightGBM] [Info] Iteration:2, training auc : 0.804816
[LightGBM] [Info] Iteration:2, valid_1 binary_logloss : 0.659035
[LightGBM] [Info] Iteration:2, valid_1 auc : 0.745928
[LightGBM] [Info] 0.037590 seconds elapsed, finished iteration 2
[LightGBM] [Info] Trained a tree with leaves=63 and max_depth=11
[LightGBM] [Info] Iteration:3, training binary_logloss : 0.635438
[LightGBM] [Info] Iteration:3, training auc : 0.821809
[LightGBM] [Info] Iteration:3, valid_1 binary_logloss : 0.646742
[LightGBM] [Info] Iteration:3, valid_1 auc : 0.760965
[LightGBM] [Info] 0.061934 seconds elapsed, finished iteration 3
[LightGBM] [Info] Trained a tree with leaves=63 and max_depth=10
[LightGBM] [Info] Iteration:4, training binary_logloss : 0.621799
[LightGBM] [Info] Iteration:4, training auc : 0.826775
[LightGBM] [Info] Iteration:4, valid_1 binary_logloss : 0.631734
[LightGBM] [Info] Iteration:4, valid_1 auc : 0.787297
[LightGBM] [Info] 0.081349 seconds elapsed, finished iteration 4
[LightGBM] [Info] Trained a tree with leaves=63 and max_depth=11
[LightGBM] [Info] Iteration:5, training binary_logloss : 0.606385
[LightGBM] [Info] Iteration:5, training auc : 0.836405
[LightGBM] [Info] Iteration:5, valid_1 binary_logloss : 0.61896
[LightGBM] [Info] Iteration:5, valid_1 auc : 0.797464
[LightGBM] [Info] 0.100113 seconds elapsed, finished iteration 5
[LightGBM] [Info] Trained a tree with leaves=63 and max_depth=10
[LightGBM] [Info] Iteration:6, training binary_logloss : 0.593097
[LightGBM] [Info] Iteration:6, training auc : 0.845252
[LightGBM] [Info] Iteration:6, valid_1 binary_logloss : 0.609161
[LightGBM] [Info] Iteration:6, valid_1 auc : 0.798915

@qrqpjxq
Contributor Author

qrqpjxq commented Sep 3, 2017

I tried again. If I delete categorical_column = 0,4, nothing goes wrong: the logloss and AUC are the same on every worker.
But with categorical_column = 0,4 set, the results are wrong, and the metrics differ between workers.

@guolinke
Collaborator

guolinke commented Sep 3, 2017

@qrqpjxq

  1. Can you provide your test data so that I can debug with it?
  2. Did these workers generate the same model files?
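
One way to check the second point is to hash each worker's model file and compare the digests. This is a minimal sketch under assumptions: the file names are placeholders written to a temporary directory for the demo, and the real paths are wherever each worker saved its model. A byte-level comparison is strict, so files that embed any worker-specific detail would be reported as different even if the trees match.

```python
import hashlib
import os
import tempfile

def file_sha256(path):
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()

def models_identical(paths):
    """True if every file in paths has the same content hash."""
    return len({file_sha256(p) for p in paths}) == 1

# Demo with stand-in files; the names and contents are hypothetical.
with tempfile.TemporaryDirectory() as tmp:
    contents = {
        "model_worker0.txt": b"tree A\n",
        "model_worker1.txt": b"tree A\n",
        "model_worker2.txt": b"tree B\n",
    }
    paths = {}
    for name, data in contents.items():
        paths[name] = os.path.join(tmp, name)
        with open(paths[name], "wb") as f:
            f.write(data)

    same = models_identical([paths["model_worker0.txt"], paths["model_worker1.txt"]])
    all_same = models_identical(list(paths.values()))

print("worker0 vs worker1 identical:", same)
print("all three identical:", all_same)
```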

@qrqpjxq
Contributor Author

qrqpjxq commented Sep 4, 2017

OK, I've sent the trainset to your email.

@guolinke
Collaborator

@qrqpjxq Any updates?

@qrqpjxq
Contributor Author

qrqpjxq commented Sep 10, 2017

What do you mean?
I don't know how to give you my trainset.

@guolinke
Collaborator

@qrqpjxq You can email it to me.

@guolinke
Collaborator

@qrqpjxq
can you try the latest code?
I think it should be fixed.

guolinke added a commit that referenced this issue Sep 10, 2017
@qrqpjxq
Contributor Author

qrqpjxq commented Sep 10, 2017

Is there something wrong with &cat_threshold.data() at line 77 of split_info.hpp?
I compiled on macOS and got an error: LightGBM/src/treelearner/split_info.hpp:77:45: error: lvalue required as unary '&' operand std::memcpy(buffer, &cat_threshold.data(), sizeof(uint32_t) * num_cat_threshold);

@guolinke
Collaborator

@qrqpjxq Can you delete the LightGBM folder and then re-clone?

@qrqpjxq
Contributor Author

qrqpjxq commented Sep 10, 2017

It's OK now and works very well. Thank you~

@qrqpjxq
Contributor Author

qrqpjxq commented Sep 11, 2017

Hi, there's another problem.
With workers = 8, it works well:
2017-09-11 16:52:33,826 INFO Iteration:1, valid_1 auc : 0.635991
2017-09-11 16:52:33,869 INFO Iteration:1, valid_1 binary_logloss : 0.548136
2017-09-11 16:52:34,318 INFO Iteration:2, valid_1 auc : 0.644662
2017-09-11 16:52:34,361 INFO Iteration:2, valid_1 binary_logloss : 0.477304
2017-09-11 16:52:35,348 INFO Iteration:3, valid_1 auc : 0.646065
2017-09-11 16:52:35,391 INFO Iteration:3, valid_1 binary_logloss : 0.388317
2017-09-11 16:52:35,873 INFO Iteration:4, valid_1 auc : 0.641365
2017-09-11 16:52:35,916 INFO Iteration:4, valid_1 binary_logloss : 0.365449
2017-09-11 16:52:36,254 INFO Iteration:5, valid_1 auc : 0.643555
2017-09-11 16:52:36,297 INFO Iteration:5, valid_1 binary_logloss : 0.313496
2017-09-11 16:52:36,564 INFO Iteration:6, valid_1 auc : 0.646382
2017-09-11 16:52:36,607 INFO Iteration:6, valid_1 binary_logloss : 0.281064
2017-09-11 16:52:36,890 INFO Iteration:7, valid_1 auc : 0.650021
2017-09-11 16:52:36,933 INFO Iteration:7, valid_1 binary_logloss : 0.258292
2017-09-11 16:52:37,273 INFO Iteration:8, valid_1 auc : 0.657701
2017-09-11 16:52:37,316 INFO Iteration:8, valid_1 binary_logloss : 0.24119
2017-09-11 16:52:37,635 INFO Iteration:9, valid_1 auc : 0.664256
2017-09-11 16:52:37,678 INFO Iteration:9, valid_1 binary_logloss : 0.228436
2017-09-11 16:52:37,906 INFO Iteration:10, valid_1 auc : 0.67106
2017-09-11 16:52:37,949 INFO Iteration:10, valid_1 binary_logloss : 0.2188
2017-09-11 16:52:38,431 INFO Iteration:11, valid_1 auc : 0.677438
2017-09-11 16:52:38,474 INFO Iteration:11, valid_1 binary_logloss : 0.211596
2017-09-11 16:52:38,918 INFO Iteration:12, valid_1 auc : 0.68454
2017-09-11 16:52:38,961 INFO Iteration:12, valid_1 binary_logloss : 0.206148
2017-09-11 16:52:39,467 INFO Iteration:13, valid_1 auc : 0.690673
2017-09-11 16:52:39,510 INFO Iteration:13, valid_1 binary_logloss : 0.201961
2017-09-11 16:52:39,950 INFO Iteration:14, valid_1 auc : 0.698263
2017-09-11 16:52:39,993 INFO Iteration:14, valid_1 binary_logloss : 0.198667
2017-09-11 16:52:40,286 INFO Iteration:15, valid_1 auc : 0.704705
2017-09-11 16:52:40,329 INFO Iteration:15, valid_1 binary_logloss : 0.196059
2017-09-11 16:52:40,691 INFO Iteration:16, valid_1 auc : 0.70947
2017-09-11 16:52:40,734 INFO Iteration:16, valid_1 binary_logloss : 0.1939
2017-09-11 16:52:41,017 INFO Iteration:17, valid_1 auc : 0.716933
2017-09-11 16:52:41,060 INFO Iteration:17, valid_1 binary_logloss : 0.192113
2017-09-11 16:52:41,286 INFO Iteration:18, valid_1 auc : 0.723636
2017-09-11 16:52:41,329 INFO Iteration:18, valid_1 binary_logloss : 0.190566
2017-09-11 16:52:41,669 INFO Iteration:19, valid_1 auc : 0.729448
2017-09-11 16:52:41,712 INFO Iteration:19, valid_1 binary_logloss : 0.18928
2017-09-11 16:52:42,163 INFO Iteration:20, valid_1 auc : 0.735661
2017-09-11 16:52:42,206 INFO Iteration:20, valid_1 binary_logloss : 0.188081
2017-09-11 16:52:42,450 INFO Iteration:21, valid_1 auc : 0.739919
2017-09-11 16:52:42,493 INFO Iteration:21, valid_1 binary_logloss : 0.186924
2017-09-11 16:52:42,984 INFO Iteration:22, valid_1 auc : 0.74577
2017-09-11 16:52:43,027 INFO Iteration:22, valid_1 binary_logloss : 0.185777
2017-09-11 16:52:43,473 INFO Iteration:23, valid_1 auc : 0.751523
2017-09-11 16:52:43,517 INFO Iteration:23, valid_1 binary_logloss : 0.184678
2017-09-11 16:52:44,020 INFO Iteration:24, valid_1 auc : 0.758177
2017-09-11 16:52:44,063 INFO Iteration:24, valid_1 binary_logloss : 0.183675
2017-09-11 16:52:44,550 INFO Iteration:25, valid_1 auc : 0.764066
2017-09-11 16:52:44,593 INFO Iteration:25, valid_1 binary_logloss : 0.182664
2017-09-11 16:52:44,983 INFO Iteration:26, valid_1 auc : 0.769386
2017-09-11 16:52:45,026 INFO Iteration:26, valid_1 binary_logloss : 0.181722
2017-09-11 16:52:45,269 INFO Iteration:27, valid_1 auc : 0.775255
2017-09-11 16:52:45,312 INFO Iteration:27, valid_1 binary_logloss : 0.180775
2017-09-11 16:52:45,985 INFO Iteration:28, valid_1 auc : 0.77953
2017-09-11 16:52:46,028 INFO Iteration:28, valid_1 binary_logloss : 0.179947
2017-09-11 16:52:46,582 INFO Iteration:29, valid_1 auc : 0.785543
2017-09-11 16:52:46,625 INFO Iteration:29, valid_1 binary_logloss : 0.179
2017-09-11 16:52:47,098 INFO Iteration:30, valid_1 auc : 0.79044
2017-09-11 16:52:47,141 INFO Iteration:30, valid_1 binary_logloss : 0.178112

But when I set workers = 16, the metrics on the validation data are bad:

2017-09-11 16:53:34,441 INFO Iteration:1, valid_1 auc : 0.635971
2017-09-11 16:53:34,484 INFO Iteration:1, valid_1 binary_logloss : 0.548134
2017-09-11 16:53:37,413 INFO Iteration:2, valid_1 auc : 0.630677
2017-09-11 16:53:37,456 INFO Iteration:2, valid_1 binary_logloss : 0.467982
2017-09-11 16:53:39,878 INFO Iteration:3, valid_1 auc : 0.62309
2017-09-11 16:53:39,921 INFO Iteration:3, valid_1 binary_logloss : 0.386897
2017-09-11 16:53:42,971 INFO Iteration:4, valid_1 auc : 0.591336
2017-09-11 16:53:43,014 INFO Iteration:4, valid_1 binary_logloss : 5.01054
2017-09-11 16:53:45,967 INFO Iteration:5, valid_1 auc : 0.606469
2017-09-11 16:53:46,010 INFO Iteration:5, valid_1 binary_logloss : 0.545685
2017-09-11 16:53:48,796 INFO Iteration:6, valid_1 auc : 0.576905
2017-09-11 16:53:48,839 INFO Iteration:6, valid_1 binary_logloss : 0.638196
2017-09-11 16:53:52,814 INFO Iteration:7, valid_1 auc : 0.589182
2017-09-11 16:53:52,858 INFO Iteration:7, valid_1 binary_logloss : 0.328422
2017-09-11 16:53:55,895 INFO Iteration:8, valid_1 auc : 0.594834
2017-09-11 16:53:55,938 INFO Iteration:8, valid_1 binary_logloss : 0.308189
2017-09-11 16:54:00,400 INFO Iteration:9, valid_1 auc : 0.599538
2017-09-11 16:54:00,443 INFO Iteration:9, valid_1 binary_logloss : 0.29409
2017-09-11 16:54:03,884 INFO Iteration:10, valid_1 auc : 0.602566
2017-09-11 16:54:03,927 INFO Iteration:10, valid_1 binary_logloss : 0.277499
2017-09-11 16:54:08,668 INFO Iteration:11, valid_1 auc : 0.606994
2017-09-11 16:54:08,711 INFO Iteration:11, valid_1 binary_logloss : 0.262994
2017-09-11 16:54:15,099 INFO Iteration:12, valid_1 auc : 0.614123
2017-09-11 16:54:15,143 INFO Iteration:12, valid_1 binary_logloss : 0.253562
2017-09-11 16:54:21,170 INFO Iteration:13, valid_1 auc : 0.615581
2017-09-11 16:54:21,213 INFO Iteration:13, valid_1 binary_logloss : 0.333199
2017-09-11 16:54:27,872 INFO Iteration:14, valid_1 auc : 0.606632
2017-09-11 16:54:27,915 INFO Iteration:14, valid_1 binary_logloss : 0.301879
2017-09-11 16:54:34,353 INFO Iteration:15, valid_1 auc : 0.611201
2017-09-11 16:54:34,396 INFO Iteration:15, valid_1 binary_logloss : 0.297701
2017-09-11 16:54:41,036 INFO Iteration:16, valid_1 auc : 0.616083
2017-09-11 16:54:41,079 INFO Iteration:16, valid_1 binary_logloss : 0.289976
2017-09-11 16:54:47,659 INFO Iteration:17, valid_1 auc : 0.622897
2017-09-11 16:54:47,702 INFO Iteration:17, valid_1 binary_logloss : 0.284319
2017-09-11 16:54:54,210 INFO Iteration:18, valid_1 auc : 0.628703
2017-09-11 16:54:54,253 INFO Iteration:18, valid_1 binary_logloss : 0.28036
2017-09-11 16:55:01,261 INFO Iteration:19, valid_1 auc : 0.633971
2017-09-11 16:55:01,265 INFO Iteration:19, valid_1 binary_logloss : 0.277179
2017-09-11 16:55:08,085 INFO Iteration:20, valid_1 auc : 0.638923
2017-09-11 16:55:08,128 INFO Iteration:20, valid_1 binary_logloss : 0.27337
2017-09-11 16:55:15,109 INFO Iteration:21, valid_1 auc : 0.642588
2017-09-11 16:55:15,152 INFO Iteration:21, valid_1 binary_logloss : 0.269251
2017-09-11 16:55:21,764 INFO Iteration:22, valid_1 auc : 0.646308
2017-09-11 16:55:21,807 INFO Iteration:22, valid_1 binary_logloss : 0.265861
2017-09-11 16:55:28,886 INFO Iteration:23, valid_1 auc : 0.651363
2017-09-11 16:55:28,929 INFO Iteration:23, valid_1 binary_logloss : 0.260687
2017-09-11 16:55:35,428 INFO Iteration:24, valid_1 auc : 0.65551
2017-09-11 16:55:35,471 INFO Iteration:24, valid_1 binary_logloss : 0.254044
2017-09-11 16:55:42,286 INFO Iteration:25, valid_1 auc : 0.659807
2017-09-11 16:55:42,329 INFO Iteration:25, valid_1 binary_logloss : 0.249976
2017-09-11 16:55:49,024 INFO Iteration:26, valid_1 auc : 0.664042
2017-09-11 16:55:49,067 INFO Iteration:26, valid_1 binary_logloss : 0.245899
2017-09-11 16:55:55,275 INFO Iteration:27, valid_1 auc : 0.668859
2017-09-11 16:55:55,318 INFO Iteration:27, valid_1 binary_logloss : 0.244396
2017-09-11 16:56:02,045 INFO Iteration:28, valid_1 auc : 0.674391
2017-09-11 16:56:02,088 INFO Iteration:28, valid_1 binary_logloss : 0.242955
2017-09-11 16:56:08,085 INFO Iteration:29, valid_1 auc : 0.678099
2017-09-11 16:56:08,128 INFO Iteration:29, valid_1 binary_logloss : 0.241665
2017-09-11 16:56:13,253 INFO Iteration:30, valid_1 auc : 0.682126
2017-09-11 16:56:13,296 INFO Iteration:30, valid_1 binary_logloss : 0.240443

Can more workers make the results worse?
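
To pinpoint where a run goes wrong, the iterations where the validation logloss rises can be pulled out of the log. The sketch below is a hypothetical helper, not a LightGBM tool. The sample lines are copied from iterations 3 to 7 of the 16-worker log above.

```python
import re

# Matches e.g. "Iteration:4, valid_1 binary_logloss : 5.01054"
LOGLOSS_RE = re.compile(r"Iteration:(\d+), valid_1 binary_logloss : (\S+)")

def logloss_increases(log_text):
    """Return (iteration, previous, current) wherever logloss rose vs. the prior iteration."""
    series = [(int(i), float(v)) for i, v in LOGLOSS_RE.findall(log_text)]
    return [
        (it, prev, cur)
        for (_, prev), (it, cur) in zip(series, series[1:])
        if cur > prev
    ]

log_16_workers = """\
2017-09-11 16:53:39,921 INFO Iteration:3, valid_1 binary_logloss : 0.386897
2017-09-11 16:53:43,014 INFO Iteration:4, valid_1 binary_logloss : 5.01054
2017-09-11 16:53:46,010 INFO Iteration:5, valid_1 binary_logloss : 0.545685
2017-09-11 16:53:48,839 INFO Iteration:6, valid_1 binary_logloss : 0.638196
2017-09-11 16:53:52,858 INFO Iteration:7, valid_1 binary_logloss : 0.328422
"""

jumps = logloss_increases(log_16_workers)
for iteration, prev, cur in jumps:
    print(f"iteration {iteration}: logloss rose from {prev} to {cur}")
```

On these five lines it flags iterations 4 and 6.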

@guolinke
Collaborator

@qrqpjxq How many rows are in your training data?
If it is very small, that is possible.

@qrqpjxq
Contributor Author

qrqpjxq commented Sep 11, 2017

Yes, the dataset is a little small, only 200,000 rows.
But I also tried training on a large dataset:
each worker holds about 10 million rows and the validation set has 200,000 rows, but the metrics are wrong in iterations 33-50.
(The file format is libsvm.)

[LightGBM] [Info] Finished loading parameters
[LightGBM] [Info] Connected to rank 1
[LightGBM] [Info] Connected to rank 3
[LightGBM] [Info] Connected to rank 4
[LightGBM] [Info] Connected to rank 6
[LightGBM] [Info] Connected to rank 7
[LightGBM] [Info] Connected to rank 9
[LightGBM] [Info] Connected to rank 13
[LightGBM] [Info] Connected to rank 21
[LightGBM] [Info] Connected to rank 29
[LightGBM] [Info] Local rank: 5, total number of machines: 32
[LightGBM] [Info] Finished initializing network
[LightGBM] [Info] Finished loading data in 53.855597 seconds
[LightGBM] [Info] Number of positive: 550253, number of negative: 9891976
[LightGBM] [Info] Total Bins 6460
[LightGBM] [Info] Number of data: 10442229, number of used features: 455
[LightGBM] [Info] Finished initializing training
[LightGBM] [Info] Started training...
[LightGBM] [Info] Iteration:1, training auc : 0.641065
[LightGBM] [Info] Iteration:1, training binary_logloss : 0.581707
[LightGBM] [Info] Iteration:1, valid_1 auc : 0.581827
[LightGBM] [Info] Iteration:1, valid_1 binary_logloss : 0.582968
[LightGBM] [Info] 15.374525 seconds elapsed, finished iteration 1
[LightGBM] [Info] Iteration:2, training auc : 0.642554
[LightGBM] [Info] Iteration:2, training binary_logloss : 0.499464
[LightGBM] [Info] Iteration:2, valid_1 auc : 0.583901
[LightGBM] [Info] Iteration:2, valid_1 binary_logloss : 0.501547
[LightGBM] [Info] 30.345764 seconds elapsed, finished iteration 2
[LightGBM] [Info] Iteration:3, training auc : 0.643544
[LightGBM] [Info] Iteration:3, training binary_logloss : 0.436843
[LightGBM] [Info] Iteration:3, valid_1 auc : 0.587749
[LightGBM] [Info] Iteration:3, valid_1 binary_logloss : 0.439421
[LightGBM] [Info] 45.430786 seconds elapsed, finished iteration 3
[LightGBM] [Info] Iteration:4, training auc : 0.643894
[LightGBM] [Info] Iteration:4, training binary_logloss : 0.3882
[LightGBM] [Info] Iteration:4, valid_1 auc : 0.587691
[LightGBM] [Info] Iteration:4, valid_1 binary_logloss : 0.391284
[LightGBM] [Info] 60.590688 seconds elapsed, finished iteration 4
[LightGBM] [Info] Iteration:5, training auc : 0.64427
[LightGBM] [Info] Iteration:5, training binary_logloss : 0.349914
[LightGBM] [Info] Iteration:5, valid_1 auc : 0.587839
[LightGBM] [Info] Iteration:5, valid_1 binary_logloss : 0.353483
[LightGBM] [Info] 76.583244 seconds elapsed, finished iteration 5
[LightGBM] [Info] Iteration:6, training auc : 0.644595
[LightGBM] [Info] Iteration:6, training binary_logloss : 0.319512
[LightGBM] [Info] Iteration:6, valid_1 auc : 0.58916
[LightGBM] [Info] Iteration:6, valid_1 binary_logloss : 0.323278
[LightGBM] [Info] 91.915405 seconds elapsed, finished iteration 6
[LightGBM] [Info] Iteration:7, training auc : 0.644796
[LightGBM] [Info] Iteration:7, training binary_logloss : 0.295233
[LightGBM] [Info] Iteration:7, valid_1 auc : 0.588669
[LightGBM] [Info] Iteration:7, valid_1 binary_logloss : 0.299315
[LightGBM] [Info] 107.149692 seconds elapsed, finished iteration 7
[LightGBM] [Info] Iteration:8, training auc : 0.645032
[LightGBM] [Info] Iteration:8, training binary_logloss : 0.275776
[LightGBM] [Info] Iteration:8, valid_1 auc : 0.589283
[LightGBM] [Info] Iteration:8, valid_1 binary_logloss : 0.280058
[LightGBM] [Info] 122.182387 seconds elapsed, finished iteration 8
[LightGBM] [Info] Iteration:9, training auc : 0.645298
[LightGBM] [Info] Iteration:9, training binary_logloss : 0.260158
[LightGBM] [Info] Iteration:9, valid_1 auc : 0.589989
[LightGBM] [Info] Iteration:9, valid_1 binary_logloss : 0.264565
[LightGBM] [Info] 137.793945 seconds elapsed, finished iteration 9
[LightGBM] [Info] Iteration:10, training auc : 0.645473
[LightGBM] [Info] Iteration:10, training binary_logloss : 0.247618
[LightGBM] [Info] Iteration:10, valid_1 auc : 0.590066
[LightGBM] [Info] Iteration:10, valid_1 binary_logloss : 0.252222
[LightGBM] [Info] 154.700460 seconds elapsed, finished iteration 10
[LightGBM] [Info] Iteration:11, training auc : 0.645698
[LightGBM] [Info] Iteration:11, training binary_logloss : 0.237554
[LightGBM] [Info] Iteration:11, valid_1 auc : 0.590248
[LightGBM] [Info] Iteration:11, valid_1 binary_logloss : 0.242299
[LightGBM] [Info] 170.088301 seconds elapsed, finished iteration 11
[LightGBM] [Info] Iteration:12, training auc : 0.645927
[LightGBM] [Info] Iteration:12, training binary_logloss : 0.22949
[LightGBM] [Info] Iteration:12, valid_1 auc : 0.5901
[LightGBM] [Info] Iteration:12, valid_1 binary_logloss : 0.234534
[LightGBM] [Info] 185.718420 seconds elapsed, finished iteration 12
[LightGBM] [Info] Iteration:13, training auc : 0.646206
[LightGBM] [Info] Iteration:13, training binary_logloss : 0.223042
[LightGBM] [Info] Iteration:13, valid_1 auc : 0.5909
[LightGBM] [Info] Iteration:13, valid_1 binary_logloss : 0.228126
[LightGBM] [Info] 201.576137 seconds elapsed, finished iteration 13
[LightGBM] [Info] Iteration:14, training auc : 0.646489
[LightGBM] [Info] Iteration:14, training binary_logloss : 0.217903
[LightGBM] [Info] Iteration:14, valid_1 auc : 0.591523
[LightGBM] [Info] Iteration:14, valid_1 binary_logloss : 0.223029
[LightGBM] [Info] 218.651286 seconds elapsed, finished iteration 14
[LightGBM] [Info] Iteration:15, training auc : 0.64679
[LightGBM] [Info] Iteration:15, training binary_logloss : 0.213818
[LightGBM] [Info] Iteration:15, valid_1 auc : 0.592214
[LightGBM] [Info] Iteration:15, valid_1 binary_logloss : 0.219022
[LightGBM] [Info] 234.278238 seconds elapsed, finished iteration 15
[LightGBM] [Info] Iteration:16, training auc : 0.647073
[LightGBM] [Info] Iteration:16, training binary_logloss : 0.210586
[LightGBM] [Info] Iteration:16, valid_1 auc : 0.594091
[LightGBM] [Info] Iteration:16, valid_1 binary_logloss : 0.215693
[LightGBM] [Info] 250.275721 seconds elapsed, finished iteration 16
[LightGBM] [Info] Iteration:17, training auc : 0.647373
[LightGBM] [Info] Iteration:17, training binary_logloss : 0.208029
[LightGBM] [Info] Iteration:17, valid_1 auc : 0.594479
[LightGBM] [Info] Iteration:17, valid_1 binary_logloss : 0.213248
[LightGBM] [Info] 266.050109 seconds elapsed, finished iteration 17
[LightGBM] [Info] Iteration:18, training auc : 0.647672
[LightGBM] [Info] Iteration:18, training binary_logloss : 0.206054
[LightGBM] [Info] Iteration:18, valid_1 auc : 0.594637
[LightGBM] [Info] Iteration:18, valid_1 binary_logloss : 0.211334
[LightGBM] [Info] 282.279991 seconds elapsed, finished iteration 18
[LightGBM] [Info] Iteration:19, training auc : 0.648004
[LightGBM] [Info] Iteration:19, training binary_logloss : 0.205272
[LightGBM] [Info] Iteration:19, valid_1 auc : 0.595967
[LightGBM] [Info] Iteration:19, valid_1 binary_logloss : 0.209705
[LightGBM] [Info] 295.766308 seconds elapsed, finished iteration 19
[LightGBM] [Info] Iteration:20, training auc : 0.648287
[LightGBM] [Info] Iteration:20, training binary_logloss : 0.203377
[LightGBM] [Info] Iteration:20, valid_1 auc : 0.595833
[LightGBM] [Info] Iteration:20, valid_1 binary_logloss : 0.208599
[LightGBM] [Info] 311.474028 seconds elapsed, finished iteration 20
[LightGBM] [Info] Iteration:21, training auc : 0.648595
[LightGBM] [Info] Iteration:21, training binary_logloss : 0.202409
[LightGBM] [Info] Iteration:21, valid_1 auc : 0.596718
[LightGBM] [Info] Iteration:21, valid_1 binary_logloss : 0.207645
[LightGBM] [Info] 326.061589 seconds elapsed, finished iteration 21
[LightGBM] [Info] Iteration:22, training auc : 0.648966
[LightGBM] [Info] Iteration:22, training binary_logloss : 0.201654
[LightGBM] [Info] Iteration:22, valid_1 auc : 0.59799
[LightGBM] [Info] Iteration:22, valid_1 binary_logloss : 0.206848
[LightGBM] [Info] 341.015276 seconds elapsed, finished iteration 22
[LightGBM] [Info] Iteration:23, training auc : 0.649363
[LightGBM] [Info] Iteration:23, training binary_logloss : 0.201064
[LightGBM] [Info] Iteration:23, valid_1 auc : 0.598262
[LightGBM] [Info] Iteration:23, valid_1 binary_logloss : 0.206305
[LightGBM] [Info] 356.307679 seconds elapsed, finished iteration 23
[LightGBM] [Info] Iteration:24, training auc : 0.649711
[LightGBM] [Info] Iteration:24, training binary_logloss : 0.200605
[LightGBM] [Info] Iteration:24, valid_1 auc : 0.598424
[LightGBM] [Info] Iteration:24, valid_1 binary_logloss : 0.205907
[LightGBM] [Info] 371.797513 seconds elapsed, finished iteration 24
[LightGBM] [Info] Iteration:25, training auc : 0.650143
[LightGBM] [Info] Iteration:25, training binary_logloss : 0.20024
[LightGBM] [Info] Iteration:25, valid_1 auc : 0.598826
[LightGBM] [Info] Iteration:25, valid_1 binary_logloss : 0.205603
[LightGBM] [Info] 387.275883 seconds elapsed, finished iteration 25
[LightGBM] [Info] Iteration:26, training auc : 0.65054
[LightGBM] [Info] Iteration:26, training binary_logloss : 0.199956
[LightGBM] [Info] Iteration:26, valid_1 auc : 0.598897
[LightGBM] [Info] Iteration:26, valid_1 binary_logloss : 0.205413
[LightGBM] [Info] 402.682136 seconds elapsed, finished iteration 26
[LightGBM] [Info] Iteration:27, training auc : 0.65092
[LightGBM] [Info] Iteration:27, training binary_logloss : 0.199728
[LightGBM] [Info] Iteration:27, valid_1 auc : 0.599305
[LightGBM] [Info] Iteration:27, valid_1 binary_logloss : 0.205208
[LightGBM] [Info] 417.912536 seconds elapsed, finished iteration 27
[LightGBM] [Info] Iteration:28, training auc : 0.651323
[LightGBM] [Info] Iteration:28, training binary_logloss : 0.199547
[LightGBM] [Info] Iteration:28, valid_1 auc : 0.598938
[LightGBM] [Info] Iteration:28, valid_1 binary_logloss : 0.205169
[LightGBM] [Info] 432.780268 seconds elapsed, finished iteration 28
[LightGBM] [Info] Iteration:29, training auc : 0.6517
[LightGBM] [Info] Iteration:29, training binary_logloss : 0.199398
[LightGBM] [Info] Iteration:29, valid_1 auc : 0.598994
[LightGBM] [Info] Iteration:29, valid_1 binary_logloss : 0.205084
[LightGBM] [Info] 447.839873 seconds elapsed, finished iteration 29
[LightGBM] [Info] Iteration:30, training auc : 0.652009
[LightGBM] [Info] Iteration:30, training binary_logloss : 0.199282
[LightGBM] [Info] Iteration:30, valid_1 auc : 0.600093
[LightGBM] [Info] Iteration:30, valid_1 binary_logloss : 0.20494
[LightGBM] [Info] 462.424145 seconds elapsed, finished iteration 30
[LightGBM] [Info] Iteration:31, training auc : 0.652349
[LightGBM] [Info] Iteration:31, training binary_logloss : 0.199181
[LightGBM] [Info] Iteration:31, valid_1 auc : 0.60052
[LightGBM] [Info] Iteration:31, valid_1 binary_logloss : 0.204862
[LightGBM] [Info] 477.515094 seconds elapsed, finished iteration 31
[LightGBM] [Info] Iteration:32, training auc : 0.652752
[LightGBM] [Info] Iteration:32, training binary_logloss : 0.199093
[LightGBM] [Info] Iteration:32, valid_1 auc : 0.600189
[LightGBM] [Info] Iteration:32, valid_1 binary_logloss : 0.204839
[LightGBM] [Info] 491.764340 seconds elapsed, finished iteration 32
[LightGBM] [Info] Iteration:33, training auc : 0.652302
[LightGBM] [Info] Iteration:33, training binary_logloss : 0.20261
[LightGBM] [Info] Iteration:33, valid_1 auc : 0.598556
[LightGBM] [Info] Iteration:33, valid_1 binary_logloss : 0.311379
[LightGBM] [Info] 505.675917 seconds elapsed, finished iteration 33
[LightGBM] [Info] Iteration:34, training auc : 0.652913
[LightGBM] [Info] Iteration:34, training binary_logloss : 0.248508
[LightGBM] [Info] Iteration:34, valid_1 auc : 0.560975
[LightGBM] [Info] Iteration:34, valid_1 binary_logloss : 4.72078
[LightGBM] [Info] 520.202887 seconds elapsed, finished iteration 34
[LightGBM] [Info] Iteration:35, training auc : 0.652856
[LightGBM] [Info] Iteration:35, training binary_logloss : 0.205662
[LightGBM] [Info] Iteration:35, valid_1 auc : 0.587028
[LightGBM] [Info] Iteration:35, valid_1 binary_logloss : 0.919253
[LightGBM] [Info] 533.905366 seconds elapsed, finished iteration 35
[LightGBM] [Info] Iteration:36, training auc : 0.653241
[LightGBM] [Info] Iteration:36, training binary_logloss : 0.215984
[LightGBM] [Info] Iteration:36, valid_1 auc : 0.569643
[LightGBM] [Info] Iteration:36, valid_1 binary_logloss : 2.07491
[LightGBM] [Info] 547.665035 seconds elapsed, finished iteration 36
[LightGBM] [Info] Iteration:37, training auc : 0.653253
[LightGBM] [Info] Iteration:37, training binary_logloss : 0.204316
[LightGBM] [Info] Iteration:37, valid_1 auc : 0.58249
[LightGBM] [Info] Iteration:37, valid_1 binary_logloss : 0.973264
[LightGBM] [Info] 560.940371 seconds elapsed, finished iteration 37
[LightGBM] [Info] Iteration:38, training auc : 0.653542
[LightGBM] [Info] Iteration:38, training binary_logloss : 0.206349
[LightGBM] [Info] Iteration:38, valid_1 auc : 0.582613
[LightGBM] [Info] Iteration:38, valid_1 binary_logloss : 0.964895
[LightGBM] [Info] 573.998684 seconds elapsed, finished iteration 38
[LightGBM] [Info] Iteration:39, training auc : 0.65369
[LightGBM] [Info] Iteration:39, training binary_logloss : 0.203484
[LightGBM] [Info] Iteration:39, valid_1 auc : 0.578489
[LightGBM] [Info] Iteration:39, valid_1 binary_logloss : 0.891467
[LightGBM] [Info] 583.675609 seconds elapsed, finished iteration 39
[LightGBM] [Info] Iteration:40, training auc : 0.653783
[LightGBM] [Info] Iteration:40, training binary_logloss : 0.203007
[LightGBM] [Info] Iteration:40, valid_1 auc : 0.577776
[LightGBM] [Info] Iteration:40, valid_1 binary_logloss : 0.876479
[LightGBM] [Info] 595.351301 seconds elapsed, finished iteration 40
[LightGBM] [Info] Iteration:41, training auc : 0.653958
[LightGBM] [Info] Iteration:41, training binary_logloss : 0.202624
[LightGBM] [Info] Iteration:41, valid_1 auc : 0.578026
[LightGBM] [Info] Iteration:41, valid_1 binary_logloss : 0.86866
[LightGBM] [Info] 608.005046 seconds elapsed, finished iteration 41
[LightGBM] [Info] Iteration:42, training auc : 0.654072
[LightGBM] [Info] Iteration:42, training binary_logloss : 0.202587
[LightGBM] [Info] Iteration:42, valid_1 auc : 0.578649
[LightGBM] [Info] Iteration:42, valid_1 binary_logloss : 0.879572
[LightGBM] [Info] 620.043002 seconds elapsed, finished iteration 42
[LightGBM] [Info] Iteration:43, training auc : 0.654169
[LightGBM] [Info] Iteration:43, training binary_logloss : 0.20235
[LightGBM] [Info] Iteration:43, valid_1 auc : 0.578855
[LightGBM] [Info] Iteration:43, valid_1 binary_logloss : 0.865287
[LightGBM] [Info] 632.896208 seconds elapsed, finished iteration 43
[LightGBM] [Info] Iteration:44, training auc : 0.654303
[LightGBM] [Info] Iteration:44, training binary_logloss : 0.202263
[LightGBM] [Info] Iteration:44, valid_1 auc : 0.579001
[LightGBM] [Info] Iteration:44, valid_1 binary_logloss : 0.862844
[LightGBM] [Info] 644.372573 seconds elapsed, finished iteration 44
[LightGBM] [Info] Iteration:45, training auc : 0.654428
[LightGBM] [Info] Iteration:45, training binary_logloss : 0.202172
[LightGBM] [Info] Iteration:45, valid_1 auc : 0.579047
[LightGBM] [Info] Iteration:45, valid_1 binary_logloss : 0.865739
[LightGBM] [Info] 656.802910 seconds elapsed, finished iteration 45
[LightGBM] [Info] Iteration:46, training auc : 0.654555
[LightGBM] [Info] Iteration:46, training binary_logloss : 0.202127
[LightGBM] [Info] Iteration:46, valid_1 auc : 0.578316
[LightGBM] [Info] Iteration:46, valid_1 binary_logloss : 0.865127
[LightGBM] [Info] 668.618409 seconds elapsed, finished iteration 46
[LightGBM] [Info] Iteration:47, training auc : 0.654674
[LightGBM] [Info] Iteration:47, training binary_logloss : 0.202018
[LightGBM] [Info] Iteration:47, valid_1 auc : 0.57816
[LightGBM] [Info] Iteration:47, valid_1 binary_logloss : 0.860281
[LightGBM] [Info] 681.581704 seconds elapsed, finished iteration 47
[LightGBM] [Info] Iteration:48, training auc : 0.654772
[LightGBM] [Info] Iteration:48, training binary_logloss : 0.201938
[LightGBM] [Info] Iteration:48, valid_1 auc : 0.578685
[LightGBM] [Info] Iteration:48, valid_1 binary_logloss : 0.858109
[LightGBM] [Info] 693.678399 seconds elapsed, finished iteration 48
[LightGBM] [Info] Iteration:49, training auc : 0.654926
[LightGBM] [Info] Iteration:49, training binary_logloss : 0.201894
[LightGBM] [Info] Iteration:49, valid_1 auc : 0.578726
[LightGBM] [Info] Iteration:49, valid_1 binary_logloss : 0.855664
[LightGBM] [Info] 707.993747 seconds elapsed, finished iteration 49
[LightGBM] [Info] Iteration:50, training auc : 0.655056
[LightGBM] [Info] Iteration:50, training binary_logloss : 0.201929
[LightGBM] [Info] Iteration:50, valid_1 auc : 0.578263
[LightGBM] [Info] Iteration:50, valid_1 binary_logloss : 0.848492
[LightGBM] [Info] 720.386716 seconds elapsed, finished iteration 50
[LightGBM] [Info] Finished training
[LightGBM] [Info] Finished linking network in 451.248990 seconds

My main config file is:
max_bin = 255
num_trees = 50
learning_rate = 0.15
num_leaves = 512
min_data_in_leaf = 20
min_sum_hessian_in_leaf = 5.0

In another run I did not convert the file to LibSVM format, so the input was CSV. The result is below.
The metrics go wrong at iterations 8, 13, 15, and so on.
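
To locate the bad iterations without reading the whole log by eye, a small script can flag every iteration where the training logloss rises compared to the previous one. This is only a sketch: the function name `find_logloss_regressions` and the `tolerance` argument are made up for illustration, and the regular expression assumes the exact `Iteration:N, <dataset> <metric> : <value>` line format shown in the logs in this thread.

```python
import re

# Matches metric lines such as:
# [LightGBM] [Info] Iteration:8, training binary_logloss : 0.433302
METRIC_LINE = re.compile(r"Iteration:(\d+), (\S+) (\S+) : (\S+)")


def find_logloss_regressions(log_text, dataset="training",
                             metric="binary_logloss", tolerance=0.0):
    """Return (iteration, previous, current) for each iteration whose
    metric is higher than the previous iteration's value by more than
    `tolerance`. Hypothetical helper, not part of LightGBM."""
    regressions = []
    previous = None
    for line in log_text.splitlines():
        match = METRIC_LINE.search(line)
        if not match:
            continue
        iteration, dataset_name, metric_name, value = match.groups()
        if dataset_name != dataset or metric_name != metric:
            continue
        current = float(value)
        if previous is not None and current > previous + tolerance:
            regressions.append((int(iteration), previous, current))
        previous = current
    return regressions


if __name__ == "__main__":
    # Three lines copied from the CSV run in this thread.
    sample = "\n".join([
        "[LightGBM] [Info] Iteration:7, training binary_logloss : 0.296099",
        "[LightGBM] [Info] Iteration:8, training binary_logloss : 0.433302",
        "[LightGBM] [Info] Iteration:9, training binary_logloss : 0.275742",
    ])
    for iteration, before, after in find_logloss_regressions(sample):
        print(f"iteration {iteration}: logloss rose from {before} to {after}")
```

Running the same function on the log of each worker makes it easy to check whether all workers go wrong at the same iteration. Passing `dataset="valid_1"` checks the validation metric instead.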

[LightGBM] [Info] Finished loading parameters
[LightGBM] [Info] Connected to rank 0
[LightGBM] [Info] Connected to rank 4
[LightGBM] [Info] Connected to rank 12
[LightGBM] [Info] Connected to rank 20
[LightGBM] [Info] Connected to rank 24
[LightGBM] [Info] Connected to rank 26
[LightGBM] [Info] Connected to rank 27
[LightGBM] [Info] Connected to rank 29
[LightGBM] [Info] Connected to rank 30
[LightGBM] [Info] Local rank: 28, total number of machines: 32
[LightGBM] [Info] Finished initializing network
[LightGBM] [Info] Finished loading data in 56.416021 seconds
[LightGBM] [Info] Number of positive: 548609, number of negative: 9904634
[LightGBM] [Info] Total Bins 6394
[LightGBM] [Info] Number of data: 10453243, number of used features: 30
[LightGBM] [Info] Finished initializing training
[LightGBM] [Info] Started training...
[LightGBM] [Info] Iteration:1, training auc : 0.636755
[LightGBM] [Info] Iteration:1, training binary_logloss : 0.581706
[LightGBM] [Info] Iteration:1, valid_1 auc : 0.637058
[LightGBM] [Info] Iteration:1, valid_1 binary_logloss : 0.581708
[LightGBM] [Info] 15.228212 seconds elapsed, finished iteration 1
[LightGBM] [Info] Iteration:2, training auc : 0.638412
[LightGBM] [Info] Iteration:2, training binary_logloss : 0.499452
[LightGBM] [Info] Iteration:2, valid_1 auc : 0.639413
[LightGBM] [Info] Iteration:2, valid_1 binary_logloss : 0.499468
[LightGBM] [Info] 26.888573 seconds elapsed, finished iteration 2
[LightGBM] [Info] Iteration:3, training auc : 0.639437
[LightGBM] [Info] Iteration:3, training binary_logloss : 0.436821
[LightGBM] [Info] Iteration:3, valid_1 auc : 0.639966
[LightGBM] [Info] Iteration:3, valid_1 binary_logloss : 0.436856
[LightGBM] [Info] 36.703565 seconds elapsed, finished iteration 3
[LightGBM] [Info] Iteration:4, training auc : 0.639716
[LightGBM] [Info] Iteration:4, training binary_logloss : 0.388159
[LightGBM] [Info] Iteration:4, valid_1 auc : 0.640177
[LightGBM] [Info] Iteration:4, valid_1 binary_logloss : 0.388217
[LightGBM] [Info] 53.074178 seconds elapsed, finished iteration 4
[LightGBM] [Info] Iteration:5, training auc : 0.640117
[LightGBM] [Info] Iteration:5, training binary_logloss : 0.349864
[LightGBM] [Info] Iteration:5, valid_1 auc : 0.640376
[LightGBM] [Info] Iteration:5, valid_1 binary_logloss : 0.349948
[LightGBM] [Info] 69.816746 seconds elapsed, finished iteration 5
[LightGBM] [Info] Iteration:6, training auc : 0.640414
[LightGBM] [Info] Iteration:6, training binary_logloss : 0.319823
[LightGBM] [Info] Iteration:6, valid_1 auc : 0.640679
[LightGBM] [Info] Iteration:6, valid_1 binary_logloss : 0.3198
[LightGBM] [Info] 85.731153 seconds elapsed, finished iteration 6
[LightGBM] [Info] Iteration:7, training auc : 0.640424
[LightGBM] [Info] Iteration:7, training binary_logloss : 0.296099
[LightGBM] [Info] Iteration:7, valid_1 auc : 0.640832
[LightGBM] [Info] Iteration:7, valid_1 binary_logloss : 0.296021
[LightGBM] [Info] 102.651455 seconds elapsed, finished iteration 7
[LightGBM] [Info] Iteration:8, training auc : 0.639805
[LightGBM] [Info] Iteration:8, training binary_logloss : 0.433302
[LightGBM] [Info] Iteration:8, valid_1 auc : 0.640649
[LightGBM] [Info] Iteration:8, valid_1 binary_logloss : 0.422675
[LightGBM] [Info] 118.712100 seconds elapsed, finished iteration 8
[LightGBM] [Info] Iteration:9, training auc : 0.640012
[LightGBM] [Info] Iteration:9, training binary_logloss : 0.275742
[LightGBM] [Info] Iteration:9, valid_1 auc : 0.640599
[LightGBM] [Info] Iteration:9, valid_1 binary_logloss : 0.276252
[LightGBM] [Info] 134.902375 seconds elapsed, finished iteration 9
[LightGBM] [Info] Iteration:10, training auc : 0.638982
[LightGBM] [Info] Iteration:10, training binary_logloss : 0.265793
[LightGBM] [Info] Iteration:10, valid_1 auc : 0.63964
[LightGBM] [Info] Iteration:10, valid_1 binary_logloss : 0.270109
[LightGBM] [Info] 149.630657 seconds elapsed, finished iteration 10
[LightGBM] [Info] Iteration:11, training auc : 0.637903
[LightGBM] [Info] Iteration:11, training binary_logloss : 0.280508
[LightGBM] [Info] Iteration:11, valid_1 auc : 0.638326
[LightGBM] [Info] Iteration:11, valid_1 binary_logloss : 0.284099
[LightGBM] [Info] 163.304901 seconds elapsed, finished iteration 11
[LightGBM] [Info] Iteration:12, training auc : 0.636909
[LightGBM] [Info] Iteration:12, training binary_logloss : 0.256094
[LightGBM] [Info] Iteration:12, valid_1 auc : 0.63685
[LightGBM] [Info] Iteration:12, valid_1 binary_logloss : 0.267214
[LightGBM] [Info] 177.223730 seconds elapsed, finished iteration 12
[LightGBM] [Info] Iteration:13, training auc : 0.633092
[LightGBM] [Info] Iteration:13, training binary_logloss : 0.545628
[LightGBM] [Info] Iteration:13, valid_1 auc : 0.632591
[LightGBM] [Info] Iteration:13, valid_1 binary_logloss : 0.555647
[LightGBM] [Info] 189.270357 seconds elapsed, finished iteration 13
[LightGBM] [Info] Iteration:14, training auc : 0.635494
[LightGBM] [Info] Iteration:14, training binary_logloss : 0.257544
[LightGBM] [Info] Iteration:14, valid_1 auc : 0.635549
[LightGBM] [Info] Iteration:14, valid_1 binary_logloss : 0.268337
[LightGBM] [Info] 197.877817 seconds elapsed, finished iteration 14
[LightGBM] [Info] Iteration:15, training auc : 0.57439
[LightGBM] [Info] Iteration:15, training binary_logloss : 4.87429
[LightGBM] [Info] Iteration:15, valid_1 auc : 0.573456
[LightGBM] [Info] Iteration:15, valid_1 binary_logloss : 4.87641
[LightGBM] [Info] 209.251408 seconds elapsed, finished iteration 15
[LightGBM] [Info] Iteration:16, training auc : 0.607715
[LightGBM] [Info] Iteration:16, training binary_logloss : 0.61587
[LightGBM] [Info] Iteration:16, valid_1 auc : 0.606947
[LightGBM] [Info] Iteration:16, valid_1 binary_logloss : 0.629086
[LightGBM] [Info] 223.567104 seconds elapsed, finished iteration 16
[LightGBM] [Info] Iteration:17, training auc : 0.614102
[LightGBM] [Info] Iteration:17, training binary_logloss : 0.423602
[LightGBM] [Info] Iteration:17, valid_1 auc : 0.612788
[LightGBM] [Info] Iteration:17, valid_1 binary_logloss : 0.448656
[LightGBM] [Info] 243.713571 seconds elapsed, finished iteration 17
[LightGBM] [Info] Iteration:18, training auc : 0.617443
[LightGBM] [Info] Iteration:18, training binary_logloss : 0.404427
[LightGBM] [Info] Iteration:18, valid_1 auc : 0.615891
[LightGBM] [Info] Iteration:18, valid_1 binary_logloss : 0.414532
[LightGBM] [Info] 258.406210 seconds elapsed, finished iteration 18
[LightGBM] [Info] Iteration:19, training auc : 0.610042
[LightGBM] [Info] Iteration:19, training binary_logloss : 0.764775
[LightGBM] [Info] Iteration:19, valid_1 auc : 0.608001
[LightGBM] [Info] Iteration:19, valid_1 binary_logloss : 0.796847
[LightGBM] [Info] 272.679645 seconds elapsed, finished iteration 19
[LightGBM] [Info] Iteration:20, training auc : 0.618313
[LightGBM] [Info] Iteration:20, training binary_logloss : 0.322801
[LightGBM] [Info] Iteration:20, valid_1 auc : 0.61735
[LightGBM] [Info] Iteration:20, valid_1 binary_logloss : 0.34846
[LightGBM] [Info] 287.713249 seconds elapsed, finished iteration 20
[LightGBM] [Info] Iteration:21, training auc : 0.616046
[LightGBM] [Info] Iteration:21, training binary_logloss : 0.330975
[LightGBM] [Info] Iteration:21, valid_1 auc : 0.615206
[LightGBM] [Info] Iteration:21, valid_1 binary_logloss : 0.345699
[LightGBM] [Info] 302.044922 seconds elapsed, finished iteration 21
[LightGBM] [Info] Iteration:22, training auc : 0.604945
[LightGBM] [Info] Iteration:22, training binary_logloss : 2.66244
[LightGBM] [Info] Iteration:22, valid_1 auc : 0.606893
[LightGBM] [Info] Iteration:22, valid_1 binary_logloss : 2.58755
[LightGBM] [Info] 315.380659 seconds elapsed, finished iteration 22
[LightGBM] [Info] Iteration:23, training auc : 0.616981
[LightGBM] [Info] Iteration:23, training binary_logloss : 0.336004
[LightGBM] [Info] Iteration:23, valid_1 auc : 0.61637
[LightGBM] [Info] Iteration:23, valid_1 binary_logloss : 0.360879
[LightGBM] [Info] 328.926082 seconds elapsed, finished iteration 23
[LightGBM] [Info] Iteration:24, training auc : 0.613611
[LightGBM] [Info] Iteration:24, training binary_logloss : 0.409137
[LightGBM] [Info] Iteration:24, valid_1 auc : 0.612209
[LightGBM] [Info] Iteration:24, valid_1 binary_logloss : 0.438421
[LightGBM] [Info] 342.807660 seconds elapsed, finished iteration 24
[LightGBM] [Info] Iteration:25, training auc : 0.612375
[LightGBM] [Info] Iteration:25, training binary_logloss : 0.342386
[LightGBM] [Info] Iteration:25, valid_1 auc : 0.611598
[LightGBM] [Info] Iteration:25, valid_1 binary_logloss : 0.370553
[LightGBM] [Info] 356.196761 seconds elapsed, finished iteration 25
[LightGBM] [Info] Iteration:26, training auc : 0.609432
[LightGBM] [Info] Iteration:26, training binary_logloss : 0.536469
[LightGBM] [Info] Iteration:26, valid_1 auc : 0.609151
[LightGBM] [Info] Iteration:26, valid_1 binary_logloss : 0.544729
[LightGBM] [Info] 366.099180 seconds elapsed, finished iteration 26
[LightGBM] [Info] Iteration:27, training auc : 0.612453
[LightGBM] [Info] Iteration:27, training binary_logloss : 0.455057
[LightGBM] [Info] Iteration:27, valid_1 auc : 0.61223
[LightGBM] [Info] Iteration:27, valid_1 binary_logloss : 0.476184
[LightGBM] [Info] 373.665936 seconds elapsed, finished iteration 27
[LightGBM] [Info] Iteration:28, training auc : 0.614532
[LightGBM] [Info] Iteration:28, training binary_logloss : 0.352838
[LightGBM] [Info] Iteration:28, valid_1 auc : 0.614253
[LightGBM] [Info] Iteration:28, valid_1 binary_logloss : 0.386615
[LightGBM] [Info] 387.530127 seconds elapsed, finished iteration 28
[LightGBM] [Info] Iteration:29, training auc : 0.615234
[LightGBM] [Info] Iteration:29, training binary_logloss : 0.368714
[LightGBM] [Info] Iteration:29, valid_1 auc : 0.614201
[LightGBM] [Info] Iteration:29, valid_1 binary_logloss : 0.388137
[LightGBM] [Info] 401.653027 seconds elapsed, finished iteration 29
[LightGBM] [Info] Iteration:30, training auc : 0.613193
[LightGBM] [Info] Iteration:30, training binary_logloss : 0.464328
[LightGBM] [Info] Iteration:30, valid_1 auc : 0.611622
[LightGBM] [Info] Iteration:30, valid_1 binary_logloss : 0.48618
[LightGBM] [Info] 416.400655 seconds elapsed, finished iteration 30
[LightGBM] [Info] Iteration:31, training auc : 0.616344
[LightGBM] [Info] Iteration:31, training binary_logloss : 0.436104
[LightGBM] [Info] Iteration:31, valid_1 auc : 0.615547
[LightGBM] [Info] Iteration:31, valid_1 binary_logloss : 0.456417
[LightGBM] [Info] 430.805055 seconds elapsed, finished iteration 31
[LightGBM] [Info] Iteration:32, training auc : 0.616284
[LightGBM] [Info] Iteration:32, training binary_logloss : 0.376057
[LightGBM] [Info] Iteration:32, valid_1 auc : 0.61496
[LightGBM] [Info] Iteration:32, valid_1 binary_logloss : 0.405443
[LightGBM] [Info] 445.789655 seconds elapsed, finished iteration 32
[LightGBM] [Info] Iteration:33, training auc : 0.616671
[LightGBM] [Info] Iteration:33, training binary_logloss : 0.379022
[LightGBM] [Info] Iteration:33, valid_1 auc : 0.615095
[LightGBM] [Info] Iteration:33, valid_1 binary_logloss : 0.415707
[LightGBM] [Info] 460.147654 seconds elapsed, finished iteration 33
[LightGBM] [Info] Iteration:34, training auc : 0.616807
[LightGBM] [Info] Iteration:34, training binary_logloss : 0.431626
[LightGBM] [Info] Iteration:34, valid_1 auc : 0.615233
[LightGBM] [Info] Iteration:34, valid_1 binary_logloss : 0.459229
[LightGBM] [Info] 474.243885 seconds elapsed, finished iteration 34
[LightGBM] [Info] Iteration:35, training auc : 0.618088
[LightGBM] [Info] Iteration:35, training binary_logloss : 0.369744
[LightGBM] [Info] Iteration:35, valid_1 auc : 0.617057
[LightGBM] [Info] Iteration:35, valid_1 binary_logloss : 0.405938
[LightGBM] [Info] 488.095396 seconds elapsed, finished iteration 35
[LightGBM] [Info] Iteration:36, training auc : 0.614569
[LightGBM] [Info] Iteration:36, training binary_logloss : 0.610493
[LightGBM] [Info] Iteration:36, valid_1 auc : 0.61274
[LightGBM] [Info] Iteration:36, valid_1 binary_logloss : 0.645807
[LightGBM] [Info] 501.402627 seconds elapsed, finished iteration 36
[LightGBM] [Info] Iteration:37, training auc : 0.61735
[LightGBM] [Info] Iteration:37, training binary_logloss : 0.391286
[LightGBM] [Info] Iteration:37, valid_1 auc : 0.615989
[LightGBM] [Info] Iteration:37, valid_1 binary_logloss : 0.427038
[LightGBM] [Info] 514.286187 seconds elapsed, finished iteration 37
[LightGBM] [Info] Iteration:38, training auc : 0.617333
[LightGBM] [Info] Iteration:38, training binary_logloss : 0.409932
[LightGBM] [Info] Iteration:38, valid_1 auc : 0.616493
[LightGBM] [Info] Iteration:38, valid_1 binary_logloss : 0.444093
[LightGBM] [Info] 527.169151 seconds elapsed, finished iteration 38
[LightGBM] [Info] Iteration:39, training auc : 0.617365
[LightGBM] [Info] Iteration:39, training binary_logloss : 0.446886
[LightGBM] [Info] Iteration:39, valid_1 auc : 0.615744
[LightGBM] [Info] Iteration:39, valid_1 binary_logloss : 0.494887
[LightGBM] [Info] 538.976373 seconds elapsed, finished iteration 39
[LightGBM] [Info] Iteration:40, training auc : 0.618668
[LightGBM] [Info] Iteration:40, training binary_logloss : 0.391227
[LightGBM] [Info] Iteration:40, valid_1 auc : 0.617549
[LightGBM] [Info] Iteration:40, valid_1 binary_logloss : 0.436368
[LightGBM] [Info] 547.428825 seconds elapsed, finished iteration 40
[LightGBM] [Info] Iteration:41, training auc : 0.618278
[LightGBM] [Info] Iteration:41, training binary_logloss : 0.438021
[LightGBM] [Info] Iteration:41, valid_1 auc : 0.61703
[LightGBM] [Info] Iteration:41, valid_1 binary_logloss : 0.480443
[LightGBM] [Info] 555.123679 seconds elapsed, finished iteration 41
[LightGBM] [Info] Iteration:42, training auc : 0.618589
[LightGBM] [Info] Iteration:42, training binary_logloss : 0.406052
[LightGBM] [Info] Iteration:42, valid_1 auc : 0.617575
[LightGBM] [Info] Iteration:42, valid_1 binary_logloss : 0.450016
[LightGBM] [Info] 568.877330 seconds elapsed, finished iteration 42
[LightGBM] [Info] Iteration:43, training auc : 0.61807
[LightGBM] [Info] Iteration:43, training binary_logloss : 0.448002
[LightGBM] [Info] Iteration:43, valid_1 auc : 0.617246
[LightGBM] [Info] Iteration:43, valid_1 binary_logloss : 0.483107
[LightGBM] [Info] 582.287893 seconds elapsed, finished iteration 43
[LightGBM] [Info] Iteration:44, training auc : 0.61871
[LightGBM] [Info] Iteration:44, training binary_logloss : 0.410265
[LightGBM] [Info] Iteration:44, valid_1 auc : 0.617669
[LightGBM] [Info] Iteration:44, valid_1 binary_logloss : 0.454359
[LightGBM] [Info] 596.273716 seconds elapsed, finished iteration 44
[LightGBM] [Info] Iteration:45, training auc : 0.617858
[LightGBM] [Info] Iteration:45, training binary_logloss : 0.453907
[LightGBM] [Info] Iteration:45, valid_1 auc : 0.61685
[LightGBM] [Info] Iteration:45, valid_1 binary_logloss : 0.489429
[LightGBM] [Info] 609.287105 seconds elapsed, finished iteration 45
[LightGBM] [Info] Iteration:46, training auc : 0.618452
[LightGBM] [Info] Iteration:46, training binary_logloss : 0.442424
[LightGBM] [Info] Iteration:46, valid_1 auc : 0.617579
[LightGBM] [Info] Iteration:46, valid_1 binary_logloss : 0.487869
[LightGBM] [Info] 623.293668 seconds elapsed, finished iteration 46
[LightGBM] [Info] Iteration:47, training auc : 0.618644
[LightGBM] [Info] Iteration:47, training binary_logloss : 0.44811
[LightGBM] [Info] Iteration:47, valid_1 auc : 0.617072
[LightGBM] [Info] Iteration:47, valid_1 binary_logloss : 0.498942
[LightGBM] [Info] 636.976167 seconds elapsed, finished iteration 47
[LightGBM] [Info] Iteration:48, training auc : 0.619219
[LightGBM] [Info] Iteration:48, training binary_logloss : 0.406706
[LightGBM] [Info] Iteration:48, valid_1 auc : 0.617647
[LightGBM] [Info] Iteration:48, valid_1 binary_logloss : 0.459565
[LightGBM] [Info] 650.636330 seconds elapsed, finished iteration 48
[LightGBM] [Info] Iteration:49, training auc : 0.619208
[LightGBM] [Info] Iteration:49, training binary_logloss : 0.407468
[LightGBM] [Info] Iteration:49, valid_1 auc : 0.618007
[LightGBM] [Info] Iteration:49, valid_1 binary_logloss : 0.455215
[LightGBM] [Info] 663.581277 seconds elapsed, finished iteration 49
[LightGBM] [Info] Iteration:50, training auc : 0.618639
[LightGBM] [Info] Iteration:50, training binary_logloss : 0.436215
[LightGBM] [Info] Iteration:50, valid_1 auc : 0.617638
[LightGBM] [Info] Iteration:50, valid_1 binary_logloss : 0.476904
[LightGBM] [Info] 674.808627 seconds elapsed, finished iteration 50
[LightGBM] [Info] Finished training
[LightGBM] [Info] Finished linking network in 366.842133 seconds

@qrqpjxq
Contributor Author

qrqpjxq commented Sep 11, 2017

In the last run, I set categorical_column = 0,16,17,18,20,24.

@guolinke
Collaborator

@qrqpjxq It seems there is a bug when #data is large.
Can you try:

  1. Running without setting categorical_column?
  2. Using 2 workers to reproduce this?

@guolinke
Collaborator

@qrqpjxq
It is also possibly caused by a network communication error.
To rule this out, can you try running multiple processes on a single machine to simulate multiple workers?
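
One way to set up the single-machine test suggested above is to generate one config per local process, all pointing at `127.0.0.1` with different ports. The sketch below is an assumption-laden illustration: `build_local_cluster`, the base port `12400`, and the file name `mlist.txt` are arbitrary choices, and the parameter names `tree_learner`, `num_machines`, `local_listen_port`, and `machine_list_file` are used as I understand them from the LightGBM parallel learning guide, so please check them against the version you run. The training parameters are copied from the config posted earlier in this thread, with `categorical_column` left out.

```python
def build_local_cluster(num_workers, base_port=12400, base_params=None):
    """Return (machine_list_text, [config_text for each worker]).

    Hypothetical helper: every worker listens on 127.0.0.1 with its own
    port, so no traffic leaves the machine.
    """
    params = dict(base_params or {})
    ports = [base_port + rank for rank in range(num_workers)]
    # One "ip port" pair per line, as the machine list file expects.
    machine_list = "".join(f"127.0.0.1 {port}\n" for port in ports)
    configs = []
    for port in ports:
        worker_params = dict(params)
        worker_params.update({
            "tree_learner": "data",            # data-parallel training
            "num_machines": num_workers,
            "local_listen_port": port,         # must differ per process
            "machine_list_file": "mlist.txt",  # assumed file name
        })
        configs.append(
            "".join(f"{key} = {value}\n" for key, value in worker_params.items())
        )
    return machine_list, configs


if __name__ == "__main__":
    # Training parameters from the config posted in this issue.
    base = {
        "max_bin": 255,
        "num_trees": 50,
        "learning_rate": 0.15,
        "num_leaves": 512,
        "min_data_in_leaf": 20,
        "min_sum_hessian_in_leaf": 5.0,
    }
    machine_list, configs = build_local_cluster(2, base_params=base)
    with open("mlist.txt", "w") as handle:
        handle.write(machine_list)
    for rank, text in enumerate(configs):
        with open(f"train_worker{rank}.conf", "w") as handle:
            handle.write(text)
```

Each generated config still needs the task, objective, and data paths from the real setup added before a process is started with it. If the bad iterations still appear with two local processes, the network is unlikely to be the cause.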

@qrqpjxq
Contributor Author

qrqpjxq commented Sep 12, 2017

I tried training with 1 worker (about 10M rows), 4 workers (about 12M rows), and 8 workers (about 6M rows).
Something seems to go wrong with 8 workers: with 1 and 4 workers it runs fine, but with 8 workers the metrics go wrong at iteration 26.

The 1-worker results are similar to the 4-worker results.
4 workers:
[LightGBM] [Info] Finished loading parameters
[LightGBM] [Info] Connected to rank 1
[LightGBM] [Info] Connected to rank 2
[LightGBM] [Info] Connected to rank 3
[LightGBM] [Info] Local rank: 0, total number of machines: 4
[LightGBM] [Info] Finished initializing network
[LightGBM] [Info] Finished loading data in 63.777020 seconds
[LightGBM] [Info] Number of positive: 653166, number of negative: 11735113
[LightGBM] [Info] Total Bins 6391
[LightGBM] [Info] Number of data: 12388279, number of used features: 30
[LightGBM] [Info] Finished initializing training
[LightGBM] [Info] Started training...
[LightGBM] [Info] Iteration:1, training auc : 0.637709
[LightGBM] [Info] Iteration:1, training binary_logloss : 0.487958
[LightGBM] [Info] Iteration:1, valid_1 auc : 0.637639
[LightGBM] [Info] Iteration:1, valid_1 binary_logloss : 0.487958
[LightGBM] [Info] 7.542203 seconds elapsed, finished iteration 1
[LightGBM] [Info] Iteration:2, training auc : 0.640096
[LightGBM] [Info] Iteration:2, training binary_logloss : 0.37709
[LightGBM] [Info] Iteration:2, valid_1 auc : 0.640415
[LightGBM] [Info] Iteration:2, valid_1 binary_logloss : 0.377072
[LightGBM] [Info] 14.484031 seconds elapsed, finished iteration 2
[LightGBM] [Info] Iteration:3, training auc : 0.641061
[LightGBM] [Info] Iteration:3, training binary_logloss : 0.31054
[LightGBM] [Info] Iteration:3, valid_1 auc : 0.641501
[LightGBM] [Info] Iteration:3, valid_1 binary_logloss : 0.310502
[LightGBM] [Info] 22.188953 seconds elapsed, finished iteration 3
[LightGBM] [Info] Iteration:4, training auc : 0.641873
[LightGBM] [Info] Iteration:4, training binary_logloss : 0.268946
[LightGBM] [Info] Iteration:4, valid_1 auc : 0.642173
[LightGBM] [Info] Iteration:4, valid_1 binary_logloss : 0.268878
[LightGBM] [Info] 29.352242 seconds elapsed, finished iteration 4
[LightGBM] [Info] Iteration:5, training auc : 0.642319
[LightGBM] [Info] Iteration:5, training binary_logloss : 0.242592
[LightGBM] [Info] Iteration:5, valid_1 auc : 0.643142
[LightGBM] [Info] Iteration:5, valid_1 binary_logloss : 0.242491
[LightGBM] [Info] 36.749060 seconds elapsed, finished iteration 5
[LightGBM] [Info] Iteration:6, training auc : 0.642886
[LightGBM] [Info] Iteration:6, training binary_logloss : 0.225896
[LightGBM] [Info] Iteration:6, valid_1 auc : 0.643812
[LightGBM] [Info] Iteration:6, valid_1 binary_logloss : 0.225762
[LightGBM] [Info] 44.237083 seconds elapsed, finished iteration 6
[LightGBM] [Info] Iteration:7, training auc : 0.643469
[LightGBM] [Info] Iteration:7, training binary_logloss : 0.21542
[LightGBM] [Info] Iteration:7, valid_1 auc : 0.644446
[LightGBM] [Info] Iteration:7, valid_1 binary_logloss : 0.215253
[LightGBM] [Info] 51.534078 seconds elapsed, finished iteration 7
[LightGBM] [Info] Iteration:8, training auc : 0.644161
[LightGBM] [Info] Iteration:8, training binary_logloss : 0.208931
[LightGBM] [Info] Iteration:8, valid_1 auc : 0.645214
[LightGBM] [Info] Iteration:8, valid_1 binary_logloss : 0.208742
[LightGBM] [Info] 59.132044 seconds elapsed, finished iteration 8
[LightGBM] [Info] Iteration:9, training auc : 0.644996
[LightGBM] [Info] Iteration:9, training binary_logloss : 0.204975
[LightGBM] [Info] Iteration:9, valid_1 auc : 0.645788
[LightGBM] [Info] Iteration:9, valid_1 binary_logloss : 0.20477
[LightGBM] [Info] 66.565206 seconds elapsed, finished iteration 9
[LightGBM] [Info] Iteration:10, training auc : 0.645763
[LightGBM] [Info] Iteration:10, training binary_logloss : 0.202602
[LightGBM] [Info] Iteration:10, valid_1 auc : 0.646974
[LightGBM] [Info] Iteration:10, valid_1 binary_logloss : 0.202364
[LightGBM] [Info] 74.383079 seconds elapsed, finished iteration 10
[LightGBM] [Info] Iteration:11, training auc : 0.646626
[LightGBM] [Info] Iteration:11, training binary_logloss : 0.201184
[LightGBM] [Info] Iteration:11, valid_1 auc : 0.648053
[LightGBM] [Info] Iteration:11, valid_1 binary_logloss : 0.200906
[LightGBM] [Info] 81.658939 seconds elapsed, finished iteration 11
[LightGBM] [Info] Iteration:12, training auc : 0.647585
[LightGBM] [Info] Iteration:12, training binary_logloss : 0.200322
[LightGBM] [Info] Iteration:12, valid_1 auc : 0.649096
[LightGBM] [Info] Iteration:12, valid_1 binary_logloss : 0.200027
[LightGBM] [Info] 89.207093 seconds elapsed, finished iteration 12
[LightGBM] [Info] Iteration:13, training auc : 0.64844
[LightGBM] [Info] Iteration:13, training binary_logloss : 0.199804
[LightGBM] [Info] Iteration:13, valid_1 auc : 0.650173
[LightGBM] [Info] Iteration:13, valid_1 binary_logloss : 0.199482
[LightGBM] [Info] 96.425061 seconds elapsed, finished iteration 13
[LightGBM] [Info] Iteration:14, training auc : 0.649337
[LightGBM] [Info] Iteration:14, training binary_logloss : 0.199474
[LightGBM] [Info] Iteration:14, valid_1 auc : 0.651276
[LightGBM] [Info] Iteration:14, valid_1 binary_logloss : 0.199122
[LightGBM] [Info] 103.462266 seconds elapsed, finished iteration 14
[LightGBM] [Info] Iteration:15, training auc : 0.650275
[LightGBM] [Info] Iteration:15, training binary_logloss : 0.199247
[LightGBM] [Info] Iteration:15, valid_1 auc : 0.652378
[LightGBM] [Info] Iteration:15, valid_1 binary_logloss : 0.198859
[LightGBM] [Info] 110.711991 seconds elapsed, finished iteration 15
[LightGBM] [Info] Iteration:16, training auc : 0.651156
[LightGBM] [Info] Iteration:16, training binary_logloss : 0.199076
[LightGBM] [Info] Iteration:16, valid_1 auc : 0.653231
[LightGBM] [Info] Iteration:16, valid_1 binary_logloss : 0.198687
[LightGBM] [Info] 117.742145 seconds elapsed, finished iteration 16
[LightGBM] [Info] Iteration:17, training auc : 0.652072
[LightGBM] [Info] Iteration:17, training binary_logloss : 0.198938
[LightGBM] [Info] Iteration:17, valid_1 auc : 0.654364
[LightGBM] [Info] Iteration:17, valid_1 binary_logloss : 0.198526
[LightGBM] [Info] 124.818825 seconds elapsed, finished iteration 17
[LightGBM] [Info] Iteration:18, training auc : 0.652928
[LightGBM] [Info] Iteration:18, training binary_logloss : 0.198825
[LightGBM] [Info] Iteration:18, valid_1 auc : 0.655289
[LightGBM] [Info] Iteration:18, valid_1 binary_logloss : 0.198412
[LightGBM] [Info] 132.056159 seconds elapsed, finished iteration 18
[LightGBM] [Info] Iteration:19, training auc : 0.653652
[LightGBM] [Info] Iteration:19, training binary_logloss : 0.198726
[LightGBM] [Info] Iteration:19, valid_1 auc : 0.656
[LightGBM] [Info] Iteration:19, valid_1 binary_logloss : 0.19831
[LightGBM] [Info] 138.921986 seconds elapsed, finished iteration 19
[LightGBM] [Info] Iteration:20, training auc : 0.654477
[LightGBM] [Info] Iteration:20, training binary_logloss : 0.198631
[LightGBM] [Info] Iteration:20, valid_1 auc : 0.656894
[LightGBM] [Info] Iteration:20, valid_1 binary_logloss : 0.198204
[LightGBM] [Info] 145.823133 seconds elapsed, finished iteration 20
[LightGBM] [Info] Iteration:21, training auc : 0.655195
[LightGBM] [Info] Iteration:21, training binary_logloss : 0.198548
[LightGBM] [Info] Iteration:21, valid_1 auc : 0.657484
[LightGBM] [Info] Iteration:21, valid_1 binary_logloss : 0.198133
[LightGBM] [Info] 152.871142 seconds elapsed, finished iteration 21
[LightGBM] [Info] Iteration:22, training auc : 0.655765
[LightGBM] [Info] Iteration:22, training binary_logloss : 0.19848
[LightGBM] [Info] Iteration:22, valid_1 auc : 0.658224
[LightGBM] [Info] Iteration:22, valid_1 binary_logloss : 0.198047
[LightGBM] [Info] 159.426117 seconds elapsed, finished iteration 22
[LightGBM] [Info] Iteration:23, training auc : 0.656371
[LightGBM] [Info] Iteration:23, training binary_logloss : 0.198411
[LightGBM] [Info] Iteration:23, valid_1 auc : 0.658967
[LightGBM] [Info] Iteration:23, valid_1 binary_logloss : 0.197969
[LightGBM] [Info] 165.781950 seconds elapsed, finished iteration 23
[LightGBM] [Info] Iteration:24, training auc : 0.656907
[LightGBM] [Info] Iteration:24, training binary_logloss : 0.198348
[LightGBM] [Info] Iteration:24, valid_1 auc : 0.65954
[LightGBM] [Info] Iteration:24, valid_1 binary_logloss : 0.197901
[LightGBM] [Info] 172.252315 seconds elapsed, finished iteration 24
[LightGBM] [Info] Iteration:25, training auc : 0.657435
[LightGBM] [Info] Iteration:25, training binary_logloss : 0.198287
[LightGBM] [Info] Iteration:25, valid_1 auc : 0.660006
[LightGBM] [Info] Iteration:25, valid_1 binary_logloss : 0.19784
[LightGBM] [Info] 178.896138 seconds elapsed, finished iteration 25
[LightGBM] [Info] Iteration:26, training auc : 0.657913
[LightGBM] [Info] Iteration:26, training binary_logloss : 0.198232
[LightGBM] [Info] Iteration:26, valid_1 auc : 0.660642
[LightGBM] [Info] Iteration:26, valid_1 binary_logloss : 0.197766
[LightGBM] [Info] 185.422038 seconds elapsed, finished iteration 26
[LightGBM] [Info] Iteration:27, training auc : 0.658421
[LightGBM] [Info] Iteration:27, training binary_logloss : 0.198169
[LightGBM] [Info] Iteration:27, valid_1 auc : 0.661136
[LightGBM] [Info] Iteration:27, valid_1 binary_logloss : 0.197708
[LightGBM] [Info] 191.830128 seconds elapsed, finished iteration 27
[LightGBM] [Info] Iteration:28, training auc : 0.658885
[LightGBM] [Info] Iteration:28, training binary_logloss : 0.198117
[LightGBM] [Info] Iteration:28, valid_1 auc : 0.661632
[LightGBM] [Info] Iteration:28, valid_1 binary_logloss : 0.197656
[LightGBM] [Info] 198.036167 seconds elapsed, finished iteration 28
[LightGBM] [Info] Iteration:29, training auc : 0.659342
[LightGBM] [Info] Iteration:29, training binary_logloss : 0.198061
[LightGBM] [Info] Iteration:29, valid_1 auc : 0.662101
[LightGBM] [Info] Iteration:29, valid_1 binary_logloss : 0.197599
[LightGBM] [Info] 204.196012 seconds elapsed, finished iteration 29
[LightGBM] [Info] Iteration:30, training auc : 0.65971
[LightGBM] [Info] Iteration:30, training binary_logloss : 0.198017
[LightGBM] [Info] Iteration:30, valid_1 auc : 0.66239
[LightGBM] [Info] Iteration:30, valid_1 binary_logloss : 0.197553
[LightGBM] [Info] 210.099234 seconds elapsed, finished iteration 30
[LightGBM] [Info] Iteration:31, training auc : 0.660125
[LightGBM] [Info] Iteration:31, training binary_logloss : 0.197967
[LightGBM] [Info] Iteration:31, valid_1 auc : 0.662804
[LightGBM] [Info] Iteration:31, valid_1 binary_logloss : 0.197504
[LightGBM] [Info] 216.525299 seconds elapsed, finished iteration 31
[LightGBM] [Info] Iteration:32, training auc : 0.66053
[LightGBM] [Info] Iteration:32, training binary_logloss : 0.197921
[LightGBM] [Info] Iteration:32, valid_1 auc : 0.663326
[LightGBM] [Info] Iteration:32, valid_1 binary_logloss : 0.197447
[LightGBM] [Info] 223.041215 seconds elapsed, finished iteration 32
[LightGBM] [Info] Iteration:33, training auc : 0.660911
[LightGBM] [Info] Iteration:33, training binary_logloss : 0.197877
[LightGBM] [Info] Iteration:33, valid_1 auc : 0.663656
[LightGBM] [Info] Iteration:33, valid_1 binary_logloss : 0.197403
[LightGBM] [Info] 228.927140 seconds elapsed, finished iteration 33
[LightGBM] [Info] Iteration:34, training auc : 0.661282
[LightGBM] [Info] Iteration:34, training binary_logloss : 0.197833
[LightGBM] [Info] Iteration:34, valid_1 auc : 0.664037
[LightGBM] [Info] Iteration:34, valid_1 binary_logloss : 0.197356
[LightGBM] [Info] 235.177382 seconds elapsed, finished iteration 34
[LightGBM] [Info] Iteration:35, training auc : 0.661687
[LightGBM] [Info] Iteration:35, training binary_logloss : 0.197788
[LightGBM] [Info] Iteration:35, valid_1 auc : 0.664296
[LightGBM] [Info] Iteration:35, valid_1 binary_logloss : 0.197317
[LightGBM] [Info] 241.507157 seconds elapsed, finished iteration 35
[LightGBM] [Info] Iteration:36, training auc : 0.661996
[LightGBM] [Info] Iteration:36, training binary_logloss : 0.19775
[LightGBM] [Info] Iteration:36, valid_1 auc : 0.664596
[LightGBM] [Info] Iteration:36, valid_1 binary_logloss : 0.197283
[LightGBM] [Info] 246.718472 seconds elapsed, finished iteration 36
[LightGBM] [Info] Iteration:37, training auc : 0.662362
[LightGBM] [Info] Iteration:37, training binary_logloss : 0.197711
[LightGBM] [Info] Iteration:37, valid_1 auc : 0.664916
[LightGBM] [Info] Iteration:37, valid_1 binary_logloss : 0.197246
[LightGBM] [Info] 252.056184 seconds elapsed, finished iteration 37
[LightGBM] [Info] Iteration:38, training auc : 0.662654
[LightGBM] [Info] Iteration:38, training binary_logloss : 0.197675
[LightGBM] [Info] Iteration:38, valid_1 auc : 0.665167
[LightGBM] [Info] Iteration:38, valid_1 binary_logloss : 0.197212
[LightGBM] [Info] 256.976274 seconds elapsed, finished iteration 38
[LightGBM] [Info] Iteration:39, training auc : 0.662967
[LightGBM] [Info] Iteration:39, training binary_logloss : 0.197636
[LightGBM] [Info] Iteration:39, valid_1 auc : 0.665452
[LightGBM] [Info] Iteration:39, valid_1 binary_logloss : 0.197181
[LightGBM] [Info] 262.168304 seconds elapsed, finished iteration 39
[LightGBM] [Info] Iteration:40, training auc : 0.663276
[LightGBM] [Info] Iteration:40, training binary_logloss : 0.1976
[LightGBM] [Info] Iteration:40, valid_1 auc : 0.665911
[LightGBM] [Info] Iteration:40, valid_1 binary_logloss : 0.197126
[LightGBM] [Info] 266.822548 seconds elapsed, finished iteration 40
[LightGBM] [Info] Iteration:41, training auc : 0.663496
[LightGBM] [Info] Iteration:41, training binary_logloss : 0.197567
[LightGBM] [Info] Iteration:41, valid_1 auc : 0.666007
[LightGBM] [Info] Iteration:41, valid_1 binary_logloss : 0.197112
[LightGBM] [Info] 270.786438 seconds elapsed, finished iteration 41
[LightGBM] [Info] Iteration:42, training auc : 0.663714
[LightGBM] [Info] Iteration:42, training binary_logloss : 0.197539
[LightGBM] [Info] Iteration:42, valid_1 auc : 0.666241
[LightGBM] [Info] Iteration:42, valid_1 binary_logloss : 0.197075
[LightGBM] [Info] 274.413344 seconds elapsed, finished iteration 42
[LightGBM] [Info] Iteration:43, training auc : 0.66397
[LightGBM] [Info] Iteration:43, training binary_logloss : 0.197506
[LightGBM] [Info] Iteration:43, valid_1 auc : 0.666532
[LightGBM] [Info] Iteration:43, valid_1 binary_logloss : 0.197037
[LightGBM] [Info] 279.339110 seconds elapsed, finished iteration 43
[LightGBM] [Info] Iteration:44, training auc : 0.664231
[LightGBM] [Info] Iteration:44, training binary_logloss : 0.19747
[LightGBM] [Info] Iteration:44, valid_1 auc : 0.666733
[LightGBM] [Info] Iteration:44, valid_1 binary_logloss : 0.197009
[LightGBM] [Info] 284.062093 seconds elapsed, finished iteration 44
[LightGBM] [Info] Iteration:45, training auc : 0.664457
[LightGBM] [Info] Iteration:45, training binary_logloss : 0.197445
[LightGBM] [Info] Iteration:45, valid_1 auc : 0.666942
[LightGBM] [Info] Iteration:45, valid_1 binary_logloss : 0.196985
[LightGBM] [Info] 287.968214 seconds elapsed, finished iteration 45
[LightGBM] [Info] Iteration:46, training auc : 0.664657
[LightGBM] [Info] Iteration:46, training binary_logloss : 0.197418
[LightGBM] [Info] Iteration:46, valid_1 auc : 0.667022
[LightGBM] [Info] Iteration:46, valid_1 binary_logloss : 0.19697
[LightGBM] [Info] 291.291058 seconds elapsed, finished iteration 46
[LightGBM] [Info] Iteration:47, training auc : 0.664963
[LightGBM] [Info] Iteration:47, training binary_logloss : 0.197382
[LightGBM] [Info] Iteration:47, valid_1 auc : 0.667284
[LightGBM] [Info] Iteration:47, valid_1 binary_logloss : 0.196937
[LightGBM] [Info] 296.481977 seconds elapsed, finished iteration 47
[LightGBM] [Info] Iteration:48, training auc : 0.665155
[LightGBM] [Info] Iteration:48, training binary_logloss : 0.197351
[LightGBM] [Info] Iteration:48, valid_1 auc : 0.667473
[LightGBM] [Info] Iteration:48, valid_1 binary_logloss : 0.196903
[LightGBM] [Info] 300.243954 seconds elapsed, finished iteration 48
[LightGBM] [Info] Iteration:49, training auc : 0.665357
[LightGBM] [Info] Iteration:49, training binary_logloss : 0.197322
[LightGBM] [Info] Iteration:49, valid_1 auc : 0.66755
[LightGBM] [Info] Iteration:49, valid_1 binary_logloss : 0.196881
[LightGBM] [Info] 304.574909 seconds elapsed, finished iteration 49
[LightGBM] [Info] Iteration:50, training auc : 0.665628
[LightGBM] [Info] Iteration:50, training binary_logloss : 0.197292
[LightGBM] [Info] Iteration:50, valid_1 auc : 0.667881
[LightGBM] [Info] Iteration:50, valid_1 binary_logloss : 0.196845
[LightGBM] [Info] 308.108141 seconds elapsed, finished iteration 50
[LightGBM] [Info] Finished training
[LightGBM] [Info] Finished linking network in 44.177320 seconds

8 workers:
[LightGBM] [Info] Finished loading parameters
[LightGBM] [Info] Connected to rank 1
[LightGBM] [Info] Connected to rank 3
[LightGBM] [Info] Connected to rank 4
[LightGBM] [Info] Connected to rank 6
[LightGBM] [Info] Connected to rank 7
[LightGBM] [Info] Local rank: 5, total number of machines: 8
[LightGBM] [Info] Finished initializing network
[LightGBM] [Info] Finished loading data in 139.742603 seconds
[LightGBM] [Info] Number of positive: 326296, number of negative: 5867774
[LightGBM] [Info] Total Bins 6391
[LightGBM] [Info] Number of data: 6194070, number of used features: 30
[LightGBM] [Info] Finished initializing training
[LightGBM] [Info] Started training...
[LightGBM] [Info] Iteration:1, training auc : 0.637814
[LightGBM] [Info] Iteration:1, training binary_logloss : 0.487936
[LightGBM] [Info] Iteration:1, valid_1 auc : 0.638043
[LightGBM] [Info] Iteration:1, valid_1 binary_logloss : 0.487951
[LightGBM] [Info] 5.222547 seconds elapsed, finished iteration 1
[LightGBM] [Info] Iteration:2, training auc : 0.640404
[LightGBM] [Info] Iteration:2, training binary_logloss : 0.37734
[LightGBM] [Info] Iteration:2, valid_1 auc : 0.640931
[LightGBM] [Info] Iteration:2, valid_1 binary_logloss : 0.377335
[LightGBM] [Info] 9.816722 seconds elapsed, finished iteration 2
[LightGBM] [Info] Iteration:3, training auc : 0.640877
[LightGBM] [Info] Iteration:3, training binary_logloss : 0.319353
[LightGBM] [Info] Iteration:3, valid_1 auc : 0.641066
[LightGBM] [Info] Iteration:3, valid_1 binary_logloss : 0.319781
[LightGBM] [Info] 14.253281 seconds elapsed, finished iteration 3
[LightGBM] [Info] Iteration:4, training auc : 0.640929
[LightGBM] [Info] Iteration:4, training binary_logloss : 0.277839
[LightGBM] [Info] Iteration:4, valid_1 auc : 0.641751
[LightGBM] [Info] Iteration:4, valid_1 binary_logloss : 0.27813
[LightGBM] [Info] 18.385274 seconds elapsed, finished iteration 4
[LightGBM] [Info] Iteration:5, training auc : 0.641213
[LightGBM] [Info] Iteration:5, training binary_logloss : 0.244325
[LightGBM] [Info] Iteration:5, valid_1 auc : 0.642043
[LightGBM] [Info] Iteration:5, valid_1 binary_logloss : 0.245107
[LightGBM] [Info] 22.906152 seconds elapsed, finished iteration 5
[LightGBM] [Info] Iteration:6, training auc : 0.641893
[LightGBM] [Info] Iteration:6, training binary_logloss : 0.227612
[LightGBM] [Info] Iteration:6, valid_1 auc : 0.64272
[LightGBM] [Info] Iteration:6, valid_1 binary_logloss : 0.22849
[LightGBM] [Info] 27.500254 seconds elapsed, finished iteration 6
[LightGBM] [Info] Iteration:7, training auc : 0.642884
[LightGBM] [Info] Iteration:7, training binary_logloss : 0.218942
[LightGBM] [Info] Iteration:7, valid_1 auc : 0.642934
[LightGBM] [Info] Iteration:7, valid_1 binary_logloss : 0.220305
[LightGBM] [Info] 32.102610 seconds elapsed, finished iteration 7
[LightGBM] [Info] Iteration:8, training auc : 0.643339
[LightGBM] [Info] Iteration:8, training binary_logloss : 0.210304
[LightGBM] [Info] Iteration:8, valid_1 auc : 0.643773
[LightGBM] [Info] Iteration:8, valid_1 binary_logloss : 0.21077
[LightGBM] [Info] 36.596755 seconds elapsed, finished iteration 8
[LightGBM] [Info] Iteration:9, training auc : 0.644028
[LightGBM] [Info] Iteration:9, training binary_logloss : 0.206342
[LightGBM] [Info] Iteration:9, valid_1 auc : 0.644737
[LightGBM] [Info] Iteration:9, valid_1 binary_logloss : 0.206765
[LightGBM] [Info] 41.378988 seconds elapsed, finished iteration 9
[LightGBM] [Info] Iteration:10, training auc : 0.645155
[LightGBM] [Info] Iteration:10, training binary_logloss : 0.203832
[LightGBM] [Info] Iteration:10, valid_1 auc : 0.645624
[LightGBM] [Info] Iteration:10, valid_1 binary_logloss : 0.204575
[LightGBM] [Info] 46.289345 seconds elapsed, finished iteration 10
[LightGBM] [Info] Iteration:11, training auc : 0.646097
[LightGBM] [Info] Iteration:11, training binary_logloss : 0.202373
[LightGBM] [Info] Iteration:11, valid_1 auc : 0.646695
[LightGBM] [Info] Iteration:11, valid_1 binary_logloss : 0.203064
[LightGBM] [Info] 50.889507 seconds elapsed, finished iteration 11
[LightGBM] [Info] Iteration:12, training auc : 0.647009
[LightGBM] [Info] Iteration:12, training binary_logloss : 0.201508
[LightGBM] [Info] Iteration:12, valid_1 auc : 0.647716
[LightGBM] [Info] Iteration:12, valid_1 binary_logloss : 0.202175
[LightGBM] [Info] 55.186270 seconds elapsed, finished iteration 12
[LightGBM] [Info] Iteration:13, training auc : 0.648046
[LightGBM] [Info] Iteration:13, training binary_logloss : 0.200966
[LightGBM] [Info] Iteration:13, valid_1 auc : 0.648762
[LightGBM] [Info] Iteration:13, valid_1 binary_logloss : 0.201626
[LightGBM] [Info] 59.347359 seconds elapsed, finished iteration 13
[LightGBM] [Info] Iteration:14, training auc : 0.648796
[LightGBM] [Info] Iteration:14, training binary_logloss : 0.200611
[LightGBM] [Info] Iteration:14, valid_1 auc : 0.649502
[LightGBM] [Info] Iteration:14, valid_1 binary_logloss : 0.201241
[LightGBM] [Info] 64.091962 seconds elapsed, finished iteration 14
[LightGBM] [Info] Iteration:15, training auc : 0.649779
[LightGBM] [Info] Iteration:15, training binary_logloss : 0.200373
[LightGBM] [Info] Iteration:15, valid_1 auc : 0.650401
[LightGBM] [Info] Iteration:15, valid_1 binary_logloss : 0.20099
[LightGBM] [Info] 68.417642 seconds elapsed, finished iteration 15
[LightGBM] [Info] Iteration:16, training auc : 0.650619
[LightGBM] [Info] Iteration:16, training binary_logloss : 0.200196
[LightGBM] [Info] Iteration:16, valid_1 auc : 0.651215
[LightGBM] [Info] Iteration:16, valid_1 binary_logloss : 0.200854
[LightGBM] [Info] 72.854004 seconds elapsed, finished iteration 16
[LightGBM] [Info] Iteration:17, training auc : 0.6515
[LightGBM] [Info] Iteration:17, training binary_logloss : 0.200056
[LightGBM] [Info] Iteration:17, valid_1 auc : 0.651973
[LightGBM] [Info] Iteration:17, valid_1 binary_logloss : 0.200724
[LightGBM] [Info] 77.308802 seconds elapsed, finished iteration 17
[LightGBM] [Info] Iteration:18, training auc : 0.652066
[LightGBM] [Info] Iteration:18, training binary_logloss : 0.19997
[LightGBM] [Info] Iteration:18, valid_1 auc : 0.652581
[LightGBM] [Info] Iteration:18, valid_1 binary_logloss : 0.200636
[LightGBM] [Info] 81.515303 seconds elapsed, finished iteration 18
[LightGBM] [Info] Iteration:19, training auc : 0.652796
[LightGBM] [Info] Iteration:19, training binary_logloss : 0.199873
[LightGBM] [Info] Iteration:19, valid_1 auc : 0.653224
[LightGBM] [Info] Iteration:19, valid_1 binary_logloss : 0.200547
[LightGBM] [Info] 85.568824 seconds elapsed, finished iteration 19
[LightGBM] [Info] Iteration:20, training auc : 0.653484
[LightGBM] [Info] Iteration:20, training binary_logloss : 0.199782
[LightGBM] [Info] Iteration:20, valid_1 auc : 0.653999
[LightGBM] [Info] Iteration:20, valid_1 binary_logloss : 0.200452
[LightGBM] [Info] 89.883252 seconds elapsed, finished iteration 20
[LightGBM] [Info] Iteration:21, training auc : 0.654056
[LightGBM] [Info] Iteration:21, training binary_logloss : 0.199708
[LightGBM] [Info] Iteration:21, valid_1 auc : 0.654673
[LightGBM] [Info] Iteration:21, valid_1 binary_logloss : 0.200359
[LightGBM] [Info] 94.405062 seconds elapsed, finished iteration 21
[LightGBM] [Info] Iteration:22, training auc : 0.654834
[LightGBM] [Info] Iteration:22, training binary_logloss : 0.199623
[LightGBM] [Info] Iteration:22, valid_1 auc : 0.655466
[LightGBM] [Info] Iteration:22, valid_1 binary_logloss : 0.200254
[LightGBM] [Info] 98.907075 seconds elapsed, finished iteration 22
[LightGBM] [Info] Iteration:23, training auc : 0.655323
[LightGBM] [Info] Iteration:23, training binary_logloss : 0.199503
[LightGBM] [Info] Iteration:23, valid_1 auc : 0.655873
[LightGBM] [Info] Iteration:23, valid_1 binary_logloss : 0.200041
[LightGBM] [Info] 103.276109 seconds elapsed, finished iteration 23
[LightGBM] [Info] Iteration:24, training auc : 0.655883
[LightGBM] [Info] Iteration:24, training binary_logloss : 0.199436
[LightGBM] [Info] Iteration:24, valid_1 auc : 0.656388
[LightGBM] [Info] Iteration:24, valid_1 binary_logloss : 0.19997
[LightGBM] [Info] 107.623083 seconds elapsed, finished iteration 24
[LightGBM] [Info] Iteration:25, training auc : 0.65646
[LightGBM] [Info] Iteration:25, training binary_logloss : 0.199371
[LightGBM] [Info] Iteration:25, valid_1 auc : 0.656947
[LightGBM] [Info] Iteration:25, valid_1 binary_logloss : 0.199903
[LightGBM] [Info] 111.891504 seconds elapsed, finished iteration 25
[LightGBM] [Info] Iteration:26, training auc : 0.655644
[LightGBM] [Info] Iteration:26, training binary_logloss : 0.211059
[LightGBM] [Info] Iteration:26, valid_1 auc : 0.655652
[LightGBM] [Info] Iteration:26, valid_1 binary_logloss : 0.210819
[LightGBM] [Info] 115.917622 seconds elapsed, finished iteration 26
[LightGBM] [Info] Iteration:27, training auc : 0.65564
[LightGBM] [Info] Iteration:27, training binary_logloss : 0.203386
[LightGBM] [Info] Iteration:27, valid_1 auc : 0.65627
[LightGBM] [Info] Iteration:27, valid_1 binary_logloss : 0.202972
[LightGBM] [Info] 119.258887 seconds elapsed, finished iteration 27
[LightGBM] [Info] Iteration:28, training auc : 0.656138
[LightGBM] [Info] Iteration:28, training binary_logloss : 0.208581
[LightGBM] [Info] Iteration:28, valid_1 auc : 0.656718
[LightGBM] [Info] Iteration:28, valid_1 binary_logloss : 0.20745
[LightGBM] [Info] 122.489078 seconds elapsed, finished iteration 28
[LightGBM] [Info] Iteration:29, training auc : 0.656389
[LightGBM] [Info] Iteration:29, training binary_logloss : 0.205585
[LightGBM] [Info] Iteration:29, valid_1 auc : 0.656846
[LightGBM] [Info] Iteration:29, valid_1 binary_logloss : 0.204557
[LightGBM] [Info] 126.502695 seconds elapsed, finished iteration 29
[LightGBM] [Info] Iteration:30, training auc : 0.656729
[LightGBM] [Info] Iteration:30, training binary_logloss : 0.203855
[LightGBM] [Info] Iteration:30, valid_1 auc : 0.657434
[LightGBM] [Info] Iteration:30, valid_1 binary_logloss : 0.204028
[LightGBM] [Info] 130.168212 seconds elapsed, finished iteration 30
[LightGBM] [Info] Iteration:31, training auc : 0.657126
[LightGBM] [Info] Iteration:31, training binary_logloss : 0.206254
[LightGBM] [Info] Iteration:31, valid_1 auc : 0.657852
[LightGBM] [Info] Iteration:31, valid_1 binary_logloss : 0.205357
[LightGBM] [Info] 133.843480 seconds elapsed, finished iteration 31
[LightGBM] [Info] Iteration:32, training auc : 0.657359
[LightGBM] [Info] Iteration:32, training binary_logloss : 0.203057
[LightGBM] [Info] Iteration:32, valid_1 auc : 0.657976
[LightGBM] [Info] Iteration:32, valid_1 binary_logloss : 0.202652
[LightGBM] [Info] 137.142360 seconds elapsed, finished iteration 32
[LightGBM] [Info] Iteration:33, training auc : 0.657682
[LightGBM] [Info] Iteration:33, training binary_logloss : 0.202913
[LightGBM] [Info] Iteration:33, valid_1 auc : 0.658214
[LightGBM] [Info] Iteration:33, valid_1 binary_logloss : 0.202595
[LightGBM] [Info] 140.450623 seconds elapsed, finished iteration 33
[LightGBM] [Info] Iteration:34, training auc : 0.657869
[LightGBM] [Info] Iteration:34, training binary_logloss : 0.202783
[LightGBM] [Info] Iteration:34, valid_1 auc : 0.658319
[LightGBM] [Info] Iteration:34, valid_1 binary_logloss : 0.202541
[LightGBM] [Info] 143.567410 seconds elapsed, finished iteration 34
[LightGBM] [Info] Iteration:35, training auc : 0.658003
[LightGBM] [Info] Iteration:35, training binary_logloss : 0.202509
[LightGBM] [Info] Iteration:35, valid_1 auc : 0.658491
[LightGBM] [Info] Iteration:35, valid_1 binary_logloss : 0.202353
[LightGBM] [Info] 146.688199 seconds elapsed, finished iteration 35
[LightGBM] [Info] Iteration:36, training auc : 0.658236
[LightGBM] [Info] Iteration:36, training binary_logloss : 0.202288
[LightGBM] [Info] Iteration:36, valid_1 auc : 0.658622
[LightGBM] [Info] Iteration:36, valid_1 binary_logloss : 0.202013
[LightGBM] [Info] 150.268418 seconds elapsed, finished iteration 36
[LightGBM] [Info] Iteration:37, training auc : 0.658328
[LightGBM] [Info] Iteration:37, training binary_logloss : 0.202096
[LightGBM] [Info] Iteration:37, valid_1 auc : 0.658759
[LightGBM] [Info] Iteration:37, valid_1 binary_logloss : 0.201923
[LightGBM] [Info] 153.343812 seconds elapsed, finished iteration 37
[LightGBM] [Info] Iteration:38, training auc : 0.658397
[LightGBM] [Info] Iteration:38, training binary_logloss : 0.201963
[LightGBM] [Info] Iteration:38, valid_1 auc : 0.658836
[LightGBM] [Info] Iteration:38, valid_1 binary_logloss : 0.201855
[LightGBM] [Info] 155.848591 seconds elapsed, finished iteration 38
[LightGBM] [Info] Iteration:39, training auc : 0.658526
[LightGBM] [Info] Iteration:39, training binary_logloss : 0.201843
[LightGBM] [Info] Iteration:39, valid_1 auc : 0.659126
[LightGBM] [Info] Iteration:39, valid_1 binary_logloss : 0.201817
[LightGBM] [Info] 159.100035 seconds elapsed, finished iteration 39
[LightGBM] [Info] Iteration:40, training auc : 0.658715
[LightGBM] [Info] Iteration:40, training binary_logloss : 0.201747
[LightGBM] [Info] Iteration:40, valid_1 auc : 0.659235
[LightGBM] [Info] Iteration:40, valid_1 binary_logloss : 0.201749
[LightGBM] [Info] 162.325770 seconds elapsed, finished iteration 40
[LightGBM] [Info] Iteration:41, training auc : 0.658898
[LightGBM] [Info] Iteration:41, training binary_logloss : 0.201633
[LightGBM] [Info] Iteration:41, valid_1 auc : 0.659316
[LightGBM] [Info] Iteration:41, valid_1 binary_logloss : 0.20167
[LightGBM] [Info] 165.604355 seconds elapsed, finished iteration 41
[LightGBM] [Info] Iteration:42, training auc : 0.659082
[LightGBM] [Info] Iteration:42, training binary_logloss : 0.201487
[LightGBM] [Info] Iteration:42, valid_1 auc : 0.659425
[LightGBM] [Info] Iteration:42, valid_1 binary_logloss : 0.201525
[LightGBM] [Info] 168.909186 seconds elapsed, finished iteration 42
[LightGBM] [Info] Iteration:43, training auc : 0.659285
[LightGBM] [Info] Iteration:43, training binary_logloss : 0.201393
[LightGBM] [Info] Iteration:43, valid_1 auc : 0.659629
[LightGBM] [Info] Iteration:43, valid_1 binary_logloss : 0.201541
[LightGBM] [Info] 171.610938 seconds elapsed, finished iteration 43
[LightGBM] [Info] Iteration:44, training auc : 0.659416
[LightGBM] [Info] Iteration:44, training binary_logloss : 0.201296
[LightGBM] [Info] Iteration:44, valid_1 auc : 0.659663
[LightGBM] [Info] Iteration:44, valid_1 binary_logloss : 0.201461
[LightGBM] [Info] 174.170988 seconds elapsed, finished iteration 44
[LightGBM] [Info] Iteration:45, training auc : 0.659511
[LightGBM] [Info] Iteration:45, training binary_logloss : 0.201197
[LightGBM] [Info] Iteration:45, valid_1 auc : 0.659711
[LightGBM] [Info] Iteration:45, valid_1 binary_logloss : 0.201391
[LightGBM] [Info] 176.940148 seconds elapsed, finished iteration 45
[LightGBM] [Info] Iteration:46, training auc : 0.659688
[LightGBM] [Info] Iteration:46, training binary_logloss : 0.201081
[LightGBM] [Info] Iteration:46, valid_1 auc : 0.659771
[LightGBM] [Info] Iteration:46, valid_1 binary_logloss : 0.201367
[LightGBM] [Info] 179.391021 seconds elapsed, finished iteration 46
[LightGBM] [Info] Iteration:47, training auc : 0.659875
[LightGBM] [Info] Iteration:47, training binary_logloss : 0.200985
[LightGBM] [Info] Iteration:47, valid_1 auc : 0.659821
[LightGBM] [Info] Iteration:47, valid_1 binary_logloss : 0.201327
[LightGBM] [Info] 182.486511 seconds elapsed, finished iteration 47
[LightGBM] [Info] Iteration:48, training auc : 0.659966
[LightGBM] [Info] Iteration:48, training binary_logloss : 0.200907
[LightGBM] [Info] Iteration:48, valid_1 auc : 0.659923
[LightGBM] [Info] Iteration:48, valid_1 binary_logloss : 0.201257
[LightGBM] [Info] 185.248831 seconds elapsed, finished iteration 48
[LightGBM] [Info] Iteration:49, training auc : 0.660093
[LightGBM] [Info] Iteration:49, training binary_logloss : 0.200808
[LightGBM] [Info] Iteration:49, valid_1 auc : 0.660023
[LightGBM] [Info] Iteration:49, valid_1 binary_logloss : 0.201081
[LightGBM] [Info] 187.584213 seconds elapsed, finished iteration 49
[LightGBM] [Info] Iteration:50, training auc : 0.660168
[LightGBM] [Info] Iteration:50, training binary_logloss : 0.20071
[LightGBM] [Info] Iteration:50, valid_1 auc : 0.66015
[LightGBM] [Info] Iteration:50, valid_1 binary_logloss : 0.200979
[LightGBM] [Info] 189.871226 seconds elapsed, finished iteration 50
[LightGBM] [Info] Finished training
[LightGBM] [Info] Finished linking network in 77.873939 seconds

@guolinke
Collaborator

@qrqpjxq
Do you use "tree_learner=data"?

@qrqpjxq
Contributor Author

qrqpjxq commented Sep 12, 2017

Yes, my config looks like this:

task = train
tree_learner = data
is_pre_partition = false

is_training_metric = true
boosting_type = gbdt
objective = binary
metric = binary_logloss,auc
metric_freq = 1

max_bin = 255
num_trees = 50
learning_rate = 0.3
num_leaves = 512
num_threads = 8

bagging_freq = 5
min_data_in_leaf = 20
min_sum_hessian_in_leaf = 5.0
is_enable_sparse = true
use_two_round_loading = false
is_save_binary_file = false
output_model = lgbm_test.model
categorical_column = 0,16,17,18,20,24

@guolinke
Collaborator

@qrqpjxq Is your data pre-partitioned? And did you set is_pre_partition = false?

@qrqpjxq
Contributor Author

qrqpjxq commented Sep 12, 2017

Yes, I changed data_loader.cpp a little so that it reads the file from HDFS, and the file is not pre-partitioned. But I think this should not affect training?

@qrqpjxq
Contributor Author

qrqpjxq commented Sep 12, 2017

I think the problem is here:

//data_parallel_tree_learner.cpp
before SyncUpGlobalBestSplit:
smaller_best_split:
feature: 20; gain: 3.65923; left_count: 982; right_count: 25676; threshold: 0
larger_best_split:
feature: 20; gain: 17.5227; left_count: 2801; right_count: 79161; threshold: 0

after SyncUpGlobalBestSplit:
smaller_best_split:
feature: 15; gain: 5.6889; left_count: 7841; right_count: 18817; threshold: 72
larger_best_split:
feature: 20; gain: 17.5227; left_count: 2801; right_count: 79161; threshold: 0

then chose best_leaf: 26
feature: 14; gain: 31.6964; left_count: 115; right_count: 100; threshold: 210

before SyncUpGlobalBestSplit:
smaller_best_split:
feature: 1; gain: 1.8929; left_count: 47; right_count: 53; threshold: 101 // split info computed for the leaf with right_count: 100
larger_best_split:
feature: 1; gain: -inf; left_count: 0; right_count: 0; threshold: 0 // is the split info for the leaf with left_count: 115 wrong?

after SyncUpGlobalBestSplit:
smaller_best_split:
feature: 14; gain: 13.7931; left_count: 49; right_count: 51; threshold: 229
larger_best_split:
feature: 0; gain: -inf; left_count: 0; right_count: 0; threshold: 0

then chose best_leaf: 196
feature: 7; gain: 31.6911; left_count: 177238; right_count: 200740; threshold: 167

@guolinke
Collaborator

@qrqpjxq
When gain = -inf, the contents of the SplitInfo have no meaning.
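
To make that concrete for the debug prints: a minimal sketch of a guard that skips the field dump when no split was found. `SplitSummary`, `HasValidSplit` and `Describe` are hypothetical names used only for illustration; they mirror the fields printed in this thread and are not LightGBM's actual `SplitInfo`.

```cpp
#include <limits>
#include <sstream>
#include <string>

// Hypothetical stand-in for the fields printed in this thread (not LightGBM's SplitInfo).
struct SplitSummary {
  int feature = -1;
  double gain = -std::numeric_limits<double>::infinity();  // "no split found" marker
  int left_count = 0;
  int right_count = 0;
  unsigned int threshold = 0;
};

// A split is only meaningful when a candidate was found, i.e. gain is above -inf.
bool HasValidSplit(const SplitSummary& s) {
  return s.gain > -std::numeric_limits<double>::infinity();
}

// Formats the split like the log lines above, or a placeholder when there is no split.
std::string Describe(const SplitSummary& s) {
  if (!HasValidSplit(s)) return "no valid split (gain = -inf)";
  std::ostringstream out;
  out << "feature: " << s.feature << "; gain: " << s.gain
      << "; left_count: " << s.left_count
      << "; right_count: " << s.right_count
      << "; threshold: " << s.threshold;
  return out.str();
}
```

With a guard like this, the `larger_best_split` lines with `gain: -inf` would print the placeholder instead of leftover field values, so they are not mistaken for a real split.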

@qrqpjxq
Contributor Author

qrqpjxq commented Sep 12, 2017

I found that in iteration 1 the gain of the first split is 6659, but in iteration 18 the gain of the first split is 136497, and the gain of the third split is 1.04708e+06.
I think something is wrong, but I can't find where.

Iteration:1
[LightGBM] [Info] Started training...
before SyncUpGlobalBestSplit:
smaller_best_split:
feature: 22; gain: 5305.8; left_count: 8395392; right_count: 1624913; threshold: 62
larger_best_split:
feature: -1; gain: -inf; left_count: 0; right_count: 0; threshold: 0
after SyncUpGlobalBestSplit:
smaller_best_split:
feature: 6; gain: 6659.39; left_count: 7031550; right_count: 2988755; threshold: 172
larger_best_split:
feature: -1; gain: -inf; left_count: 0; right_count: 0; threshold: 0
then chose best_leaf: 0
feature: 6; gain: 6659.39; left_count: 7031550; right_count: 2988755; threshold: 172

Iteration:18:
[LightGBM] [Info] Iteration:18, training auc : 0.655093
[LightGBM] [Info] Iteration:18, training binary_logloss : 0.262019
[LightGBM] [Info] 7.862754 seconds elapsed, finished iteration 18
before SyncUpGlobalBestSplit:
smaller_best_split:
feature: 9; gain: 139.447; left_count: 8087616; right_count: 1932689; threshold: 205
larger_best_split:
feature: -1; gain: -inf; left_count: 0; right_count: 0; threshold: 0
after SyncUpGlobalBestSplit:
smaller_best_split:
feature: 18; gain: 136497; left_count: 121535; right_count: 9898770; threshold: 0
larger_best_split:
feature: -1; gain: -inf; left_count: 0; right_count: 0; threshold: 0
then chose best_leaf: 0
feature: 18; gain: 136497; left_count: 121535; right_count: 9898770; threshold: 0

before SyncUpGlobalBestSplit:
smaller_best_split:
feature: 9; gain: 68734.3; left_count: 95878; right_count: 25657; threshold: 169
larger_best_split:
feature: 20; gain: 13.6802; left_count: 3671299; right_count: 6227471; threshold: 0
after SyncUpGlobalBestSplit:
smaller_best_split:
feature: 5; gain: 350982; left_count: 19791; right_count: 101744; threshold: 46
larger_best_split:
feature: 24; gain: 17.5659; left_count: 12744; right_count: 9886026; threshold: 0
then chose best_leaf: 0
feature: 5; gain: 350982; left_count: 19791; right_count: 101744; threshold: 46

before SyncUpGlobalBestSplit:
smaller_best_split:
feature: 1; gain: 109058; left_count: 11042; right_count: 8749; threshold: 197
larger_best_split:
feature: 9; gain: 63389.3; left_count: 79160; right_count: 22584; threshold: 169
after SyncUpGlobalBestSplit:
smaller_best_split:
feature: 4; gain: 1.04708e+06; left_count: 13609; right_count: 6182; threshold: 227
larger_best_split:
feature: 9; gain: 63389.3; left_count: 79160; right_count: 22584; threshold: 169
then chose best_leaf: 0
feature: 4; gain: 1.04708e+06; left_count: 13609; right_count: 6182; threshold: 227 ----> the gain becomes huge

@guolinke
Collaborator

@qrqpjxq
Thanks. Can you print the sum_gradients and sum_hessians as well?

@qrqpjxq
Contributor Author

qrqpjxq commented Sep 12, 2017

OK, in iteration 18, third split:

sum_gradient: 12404.6; sum_hessian: 333.905
smaller_split feature: 25; gain: 4619.12; left_count: 19330; right_count: 461; threshold: 169
sum_gradient: 9191.71; sum_hessian: 3058.82
larger_split feature: 25; gain: 995.896; left_count: 85741; right_count: 16003; threshold: 2
sum_gradient: 12404.6; sum_hessian: 333.905
smaller_split feature: 14; gain: 84714.2; left_count: 9558; right_count: 10233; threshold: 198
sum_gradient: 9191.71; sum_hessian: 3058.82
larger_split feature: 14; gain: 4002.16; left_count: 49736; right_count: 52008; threshold: 137
sum_gradient: 12404.6; sum_hessian: 333.905
smaller_split feature: 3; gain: 724493; left_count: 9635; right_count: 10156; threshold: 197
sum_gradient: 9191.71; sum_hessian: 3058.82
larger_split feature: 3; gain: 2424.21; left_count: 83220; right_count: 18524; threshold: 209
before SyncUpGlobalBestSplit:
smaller_best_split:
feature: 3; gain: 724493; left_count: 9635; right_count: 10156; threshold: 197
larger_best_split:
feature: 14; gain: 4002.16; left_count: 49736; right_count: 52008; threshold: 137

after SyncUpGlobalBestSplit:
smaller_best_split:
feature: 4; gain: 1.04708e+06; left_count: 13609; right_count: 6182; threshold: 227
larger_best_split:
feature: 9; gain: 63389.3; left_count: 79160; right_count: 22584; threshold: 169

then chose best_leaf: 0
feature: 4; gain: 1.04708e+06; left_count: 13609; right_count: 6182; threshold: 227

@qrqpjxq
Contributor Author

qrqpjxq commented Sep 12, 2017

This is probably the worker that produced that split:

sum_gradient: 12404.6; sum_hessian: 333.905
smaller_split feature: 26; gain: 1865.2; left_count: 19349; right_count: 442; threshold: 154
sum_gradient: 9191.71; sum_hessian: 3058.82
larger_split feature: 26; gain: 670.372; left_count: 92400; right_count: 9344; threshold: 0
sum_gradient: 12404.6; sum_hessian: 333.905
smaller_split feature: 15; gain: 141334; left_count: 13510; right_count: 6281; threshold: 198
sum_gradient: 9191.71; sum_hessian: 3058.82
larger_split feature: 15; gain: 3910.44; left_count: 87878; right_count: 13866; threshold: 224
sum_gradient: 12404.6; sum_hessian: 333.905
smaller_split feature: 4; gain: 1.04708e+06; left_count: 13609; right_count: 6182; threshold: 227
sum_gradient: 9191.71; sum_hessian: 3058.82
larger_split feature: 4; gain: 2772.11; left_count: 83340; right_count: 18404; threshold: 213
before SyncUpGlobalBestSplit:
smaller_best_split:
feature: 4; gain: 1.04708e+06; left_count: 13609; right_count: 6182; threshold: 227
larger_best_split:
feature: 15; gain: 3910.44; left_count: 87878; right_count: 13866; threshold: 224

after SyncUpGlobalBestSplit:
smaller_best_split:
feature: 4; gain: 1.04708e+06; left_count: 13609; right_count: 6182; threshold: 227
larger_best_split:
feature: 9; gain: 63389.3; left_count: 79160; right_count: 22584; threshold: 169

then chose best_leaf: 0
feature: 4; gain: 1.04708e+06; left_count: 13609; right_count: 6182; threshold: 227

@guolinke
Collaborator

@qrqpjxq
As I recall, sum_gradient and sum_hessian also have left and right versions; they are in the SplitInfo.

@qrqpjxq
Contributor Author

qrqpjxq commented Sep 12, 2017

The messages look like this:
before FindBestThreshold small leaf sum_gradients: 12404.6; sum_hessians: 333.905
after FindBestThreshold small leaf left_sum_gradients: 12070.4; right_sum_gradients: 334.238; left_sum_hessian: 327.763; right_sum_hessian: 334.238; gain: 1865.2; left_count: 19349; right_count: 442; threshold: 154
before FindBestThreshold large leaf sum_gradients: 9191.71; sum_hessians: 3058.82
after FindBestThreshold large leaf left_sum_gradients: 8581.29; right_sum_gradients: 610.419; left_sum_hessian: 2702.88; right_sum_hessian: 610.419; gain: 670.372; left_count: 92400; right_count: 9344; threshold: 0
before FindBestThreshold small leaf sum_gradients: 12404.6; sum_hessians: 333.905
after FindBestThreshold small leaf left_sum_gradients: 7397.98; right_sum_gradients: 5006.63; left_sum_hessian: 271.308; right_sum_hessian: 5006.63; gain: 141334; left_count: 13510; right_count: 6281; threshold: 198
before FindBestThreshold large leaf sum_gradients: 9191.71; sum_hessians: 3058.82
after FindBestThreshold large leaf left_sum_gradients: 6726.02; right_sum_gradients: 2465.69; left_sum_hessian: 2635.66; right_sum_hessian: 2465.69; gain: 3910.44; left_count: 87878; right_count: 13866; threshold: 224
before FindBestThreshold small leaf sum_gradients: 12404.6; sum_hessians: 333.905
after FindBestThreshold small leaf left_sum_gradients: 6843.58; right_sum_gradients: 5561.03; left_sum_hessian: 311.122; right_sum_hessian: 5561.03; gain: 1.04708e+06; left_count: 13609; right_count: 6182; threshold: 227
before FindBestThreshold large leaf sum_gradients: 9191.71; sum_hessians: 3058.82
after FindBestThreshold large leaf left_sum_gradients: 6292.17; right_sum_gradients: 2899.54; left_sum_hessian: 2474.78; right_sum_hessian: 2899.54; gain: 2772.11; left_count: 83340; right_count: 18404; threshold: 213
before SyncUpGlobalBestSplit:
smaller_best_split:
feature: 4; gain: 1.04708e+06; left_count: 13609; right_count: 6182; threshold: 227
larger_best_split:
feature: 15; gain: 3910.44; left_count: 87878; right_count: 13866; threshold: 224

after SyncUpGlobalBestSplit:
smaller_best_split:
feature: 4; gain: 1.04708e+06; left_count: 13609; right_count: 6182; threshold: 227
larger_best_split:
feature: 9; gain: 63389.3; left_count: 79160; right_count: 22584; threshold: 169

then chose best_leaf: 0
feature: 4; gain: 1.04708e+06; left_count: 13609; right_count: 6182; threshold: 227

before FindBestThreshold small leaf sum_gradients: 5561.03; sum_hessians: 22.7829
after FindBestThreshold small leaf left_sum_gradients: 0; right_sum_gradients: 0; left_sum_hessian: 0; right_sum_hessian: 0; gain: -inf; left_count: 0; right_count: 0; threshold: 0
before FindBestThreshold large leaf sum_gradients: 6843.58; sum_hessians: 311.122
after FindBestThreshold large leaf left_sum_gradients: 6370.01; right_sum_gradients: 473.566; left_sum_hessian: 294.947; right_sum_hessian: 473.566; gain: 904.36; left_count: 12936; right_count: 673; threshold: 84
before FindBestThreshold small leaf sum_gradients: 5561.03; sum_hessians: 22.7829
after FindBestThreshold small leaf left_sum_gradients: 3194.15; right_sum_gradients: 2366.87; left_sum_hessian: 17.3475; right_sum_hessian: 2366.87; gain: 261427; left_count: 3608; right_count: 2574; threshold: 210
before FindBestThreshold large leaf sum_gradients: 6843.58; sum_hessians: 311.122
after FindBestThreshold large leaf left_sum_gradients: 2238.36; right_sum_gradients: 4605.22; left_sum_hessian: 180.376; right_sum_hessian: 4605.22; gain: 39450.2; left_count: 5986; right_count: 7623; threshold: 116
before FindBestThreshold small leaf sum_gradients: 5561.03; sum_hessians: 22.7829
after FindBestThreshold small leaf left_sum_gradients: 3393.58; right_sum_gradients: 2167.44; left_sum_hessian: 10.5342; right_sum_hessian: 2167.44; gain: 119392; left_count: 3782; right_count: 2400; threshold: 240
before FindBestThreshold large leaf sum_gradients: 6843.58; sum_hessians: 311.122
after FindBestThreshold large leaf left_sum_gradients: 3721.33; right_sum_gradients: 3122.25; left_sum_hessian: 273.144; right_sum_hessian: 3122.25; gain: 156853; left_count: 9723; right_count: 3886; threshold: 197
before SyncUpGlobalBestSplit:
smaller_best_split:
feature: 15; gain: 261427; left_count: 3608; right_count: 2574; threshold: 210
larger_best_split:
feature: 4; gain: 156853; left_count: 9723; right_count: 3886; threshold: 197

after SyncUpGlobalBestSplit:
smaller_best_split:
feature: 5; gain: 758213; left_count: 5190; right_count: 992; threshold: 40
larger_best_split:
feature: 3; gain: 175626; left_count: 6352; right_count: 7257; threshold: 162

then chose best_leaf: 3
feature: 5; gain: 758213; left_count: 5190; right_count: 992; threshold: 40

@qrqpjxq
Contributor Author

qrqpjxq commented Sep 12, 2017

log

@qrqpjxq
Contributor Author

qrqpjxq commented Sep 12, 2017

Is it that left_sum_hessian + right_sum_hessian != sum_hessian?

@guolinke
Collaborator

@qrqpjxq Can you provide your print code? I want to check whether the sum_hessian is the local one or the global one.

@qrqpjxq
Contributor Author

qrqpjxq commented Sep 13, 2017

serial_tree_learner.cpp:
// Get a leaf with max split gain
int best_leaf = static_cast<int>(ArrayArgs<SplitInfo>::ArgMax(best_split_per_leaf_));
std::cout << "then chose best_leaf: " << best_leaf << std::endl;
// Get split information for best leaf
const SplitInfo& best_leaf_SplitInfo = best_split_per_leaf_[best_leaf];
std::cout << "feature: " << best_leaf_SplitInfo.feature << "; gain: " << best_leaf_SplitInfo.gain << "; left_count: "
    << best_leaf_SplitInfo.left_count << "; right_count: " << best_leaf_SplitInfo.right_count << "; threshold: "
    << best_leaf_SplitInfo.threshold << std::endl << std::endl << std::endl;

//data_parallel_tree_learner.cpp
SplitInfo smaller_split;
// find best threshold for smaller child
std::cout << "before FindBestThreshold small leaf sum_gradients: " << this->smaller_leaf_splits_->sum_gradients()
    << "; sum_hessians: " << this->smaller_leaf_splits_->sum_hessians() << std::endl;

this->smaller_leaf_histogram_array_[feature_index].FindBestThreshold(
    this->smaller_leaf_splits_->sum_gradients(),
    this->smaller_leaf_splits_->sum_hessians(),
    GetGlobalDataCountInLeaf(this->smaller_leaf_splits_->LeafIndex()),
    &smaller_split);

// note: the value streamed after the "right_sum_hessian" label below is right_sum_gradient
std::cout << "after FindBestThreshold small leaf left_sum_gradients: " << smaller_split.left_sum_gradient
    << "; right_sum_gradients: " << smaller_split.right_sum_gradient << "; left_sum_hessian: " << smaller_split.left_sum_hessian
    << "; right_sum_hessian: " << smaller_split.right_sum_gradient << "; gain: " << smaller_split.gain << "; left_count: " << smaller_split.left_count
    << "; right_count: " << smaller_split.right_count << "; threshold: " << smaller_split.threshold << std::endl;

smaller_split.feature = real_feature_index;
if (smaller_split > smaller_bests_per_thread[tid]) {
    smaller_bests_per_thread[tid] = smaller_split;
}
// only root leaf
if (this->larger_leaf_splits_ == nullptr || this->larger_leaf_splits_->LeafIndex() < 0) continue;

// construct histograms for large leaf, we init larger leaf as the parent, so we can just subtract the smaller leaf's histograms
this->larger_leaf_histogram_array_[feature_index].Subtract(
    this->smaller_leaf_histogram_array_[feature_index]);
SplitInfo larger_split;
// find best threshold for larger child
std::cout << "before FindBestThreshold large leaf sum_gradients: " << this->larger_leaf_splits_->sum_gradients()
    << "; sum_hessians: " << this->larger_leaf_splits_->sum_hessians() << std::endl;

this->larger_leaf_histogram_array_[feature_index].FindBestThreshold(
    this->larger_leaf_splits_->sum_gradients(),
    this->larger_leaf_splits_->sum_hessians(),
    GetGlobalDataCountInLeaf(this->larger_leaf_splits_->LeafIndex()),
    &larger_split);

// note: same as above, the "right_sum_hessian" label is fed right_sum_gradient
std::cout << "after FindBestThreshold large leaf left_sum_gradients: " << larger_split.left_sum_gradient
    << "; right_sum_gradients: " << larger_split.right_sum_gradient << "; left_sum_hessian: " << larger_split.left_sum_hessian
    << "; right_sum_hessian: " << larger_split.right_sum_gradient << "; gain: " << larger_split.gain << "; left_count: " << larger_split.left_count
    << "; right_count: " << larger_split.right_count << "; threshold: " << larger_split.threshold << std::endl;

std::cout << "before SyncUpGlobalBestSplit: " << std::endl;
std::cout << "smaller_best_split: " << std::endl;
std::cout << "feature: " << smaller_best_split.feature << "; gain: " << smaller_best_split.gain << "; left_count: " << smaller_best_split.left_count
    << "; right_count: " << smaller_best_split.right_count << "; threshold: " << smaller_best_split.threshold << std::endl;
std::cout << "larger_best_split: " << std::endl;
std::cout << "feature: " << larger_best_split.feature << "; gain: " << larger_best_split.gain << "; left_count: " << larger_best_split.left_count
    << "; right_count: " << larger_best_split.right_count << "; threshold: " << larger_best_split.threshold << std::endl << std::endl;
// sync global best info
SyncUpGlobalBestSplit(input_buffer_.data(), input_buffer_.data(), &smaller_best_split, &larger_best_split, this->tree_config_->max_cat_threshold);

// set best split
this->best_split_per_leaf_[this->smaller_leaf_splits_->LeafIndex()] = smaller_best_split;
if (this->larger_leaf_splits_->LeafIndex() >= 0) {
    this->best_split_per_leaf_[this->larger_leaf_splits_->LeafIndex()] = larger_best_split;
}
std::cout << "after SyncUpGlobalBestSplit: " << std::endl;
std::cout << "smaller_best_split: " << std::endl;
std::cout << "feature: " << smaller_best_split.feature << "; gain: " << smaller_best_split.gain << "; left_count: " << smaller_best_split.left_count
    << "; right_count: " << smaller_best_split.right_count << "; threshold: " << smaller_best_split.threshold << std::endl;
std::cout << "larger_best_split: " << std::endl;
std::cout << "feature: " << larger_best_split.feature << "; gain: " << larger_best_split.gain << "; left_count: " << larger_best_split.left_count
    << "; right_count: " << larger_best_split.right_count << "; threshold: " << larger_best_split.threshold << std::endl << std::endl;
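
One detail in the "after FindBestThreshold" prints above: the text "; right_sum_hessian: " is followed by `right_sum_gradient`, which would explain why the right_sum_hessian column in the logs always equals right_sum_gradients. A minimal sketch of the same print with each label paired with its own field follows. `SplitSums` and `FormatSums` are hypothetical names for illustration, and the `right_sum_hessian` field is assumed to exist on the split info by symmetry with `left_sum_hessian`.

```cpp
#include <sstream>
#include <string>

// Hypothetical stand-in carrying only the sums printed above (not LightGBM's SplitInfo).
struct SplitSums {
  double left_sum_gradient = 0.0;
  double right_sum_gradient = 0.0;
  double left_sum_hessian = 0.0;
  double right_sum_hessian = 0.0;  // assumed field name
};

// Builds the "after FindBestThreshold" line with every label fed by its matching field.
std::string FormatSums(const char* leaf_name, const SplitSums& s) {
  std::ostringstream out;
  out << "after FindBestThreshold " << leaf_name
      << " leaf left_sum_gradients: " << s.left_sum_gradient
      << "; right_sum_gradients: " << s.right_sum_gradient
      << "; left_sum_hessian: " << s.left_sum_hessian
      << "; right_sum_hessian: " << s.right_sum_hessian;  // hessian, not gradient
  return out.str();
}
```

Printing the real right hessian this way makes it possible to compare left_sum_hessian + right_sum_hessian against the leaf's sum_hessians directly.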

@qrqpjxq
Contributor Author

qrqpjxq commented Sep 13, 2017

In the split below, the parent node has sum_gradients = 490.476,
but left_sum_gradients = -450.066 and right_sum_gradients = 940.542.
This may cause the gradients to grow bigger and bigger.

before FindBestThreshold small leaf sum_gradients: 490.476; sum_hessians: 494193
after FindBestThreshold small leaf left_sum_gradients: 1772.61; right_sum_gradients: -1282.13; left_sum_hessian: 320931; right_sum_hessian: -1282.13; gain: 18.7916; left_count: 6741502; right_count: 3278803; threshold: 0
before FindBestThreshold small leaf sum_gradients: 490.476; sum_hessians: 494193
after FindBestThreshold small leaf left_sum_gradients: 1034.33; right_sum_gradients: -543.858; left_sum_hessian: 476592; right_sum_hessian: -543.858; gain: 18.5635; left_count: 9739346; right_count: 280959; threshold: 196
before FindBestThreshold small leaf sum_gradients: 490.476; sum_hessians: 494193
after FindBestThreshold small leaf left_sum_gradients: 269.499; right_sum_gradients: 220.977; left_sum_hessian: 491960; right_sum_hessian: 220.977; gain: 21.5377; left_count: 9975829; right_count: 44476; threshold: 252
before FindBestThreshold small leaf sum_gradients: 490.476; sum_hessians: 494193
after FindBestThreshold small leaf left_sum_gradients: 1374.29; right_sum_gradients: -883.817; left_sum_hessian: 432311; right_sum_hessian: -883.817; gain: 16.5051; left_count: 9086547; right_count: 933758; threshold: 203
before FindBestThreshold small leaf sum_gradients: 490.476; sum_hessians: 494193
after FindBestThreshold small leaf left_sum_gradients: -1018.83; right_sum_gradients: 1509.31; left_sum_hessian: 138991; right_sum_hessian: 1509.31; gain: 13.3948; left_count: 2605864; right_count: 7414441; threshold: 0
before FindBestThreshold small leaf sum_gradients: 490.476; sum_hessians: 494193
after FindBestThreshold small leaf left_sum_gradients: 914.766; right_sum_gradients: -424.289; left_sum_hessian: 28823.4; right_sum_hessian: -424.289; gain: 28.9319; left_count: 587457; right_count: 9432848; threshold: 0
before FindBestThreshold small leaf sum_gradients: 490.476; sum_hessians: 494193
after FindBestThreshold small leaf left_sum_gradients: -450.066; right_sum_gradients: 940.542; left_sum_hessian: 6587.44; right_sum_hessian: 940.542; gain: 32.0767; left_count: 121535; right_count: 9898770; threshold: 0
before SyncUpGlobalBestSplit:
smaller_best_split:
feature: 18; gain: 32.0767; left_count: 121535; right_count: 9898770; threshold: 0
larger_best_split:
feature: -1; gain: -inf; left_count: 0; right_count: 0; threshold: 0

after SyncUpGlobalBestSplit:
smaller_best_split:
feature: 18; gain: 32.0767; left_count: 121535; right_count: 9898770; threshold: 0
larger_best_split:
feature: -1; gain: -inf; left_count: 0; right_count: 0; threshold: 0

then chose best_leaf: 0
feature: 18; gain: 32.0767; left_count: 121535; right_count: 9898770; threshold: 0

@guolinke
Collaborator

@qrqpjxq It is normal for the sum_gradient to be bigger.
But the sum_hessian looks abnormal; it should equal left + right.
Is it always like this, or is it only wrong sometimes?
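
For the gradients, the logged values are consistent with that rule: for example -450.066 + 940.542 = 490.476, the parent's sum_gradients, even though each child sum is larger in magnitude than the parent. A small sketch of the check, with a made-up helper name (`SumsAddUp`) and a tolerance chosen only to absorb the rounding in the printed values:

```cpp
#include <cmath>

// True when left + right reproduces the parent sum within a tolerance.
// The default tolerance is an assumption sized for 6-significant-digit log output.
bool SumsAddUp(double parent_sum, double left_sum, double right_sum,
               double tolerance = 0.05) {
  return std::fabs(parent_sum - (left_sum + right_sum)) <= tolerance;
}

// Checks every (left, right) gradient pair logged for the root leaf above.
bool LoggedGradientsAddUp() {
  const double parent = 490.476;
  const double pairs[7][2] = {
      {1772.61, -1282.13}, {1034.33, -543.858}, {269.499, 220.977},
      {1374.29, -883.817}, {-1018.83, 1509.31}, {914.766, -424.289},
      {-450.066, 940.542}};
  for (const auto& p : pairs) {
    if (!SumsAddUp(parent, p[0], p[1])) return false;
  }
  return true;
}
```

The same check can be applied to the hessians, but only with the real right-hand hessian: the right_sum_hessian column printed in the logs carries the same value as right_sum_gradients on every line, so it cannot be used for this comparison as is.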

@qrqpjxq
Contributor Author

qrqpjxq commented Sep 15, 2017

I tried both parallel learning and single-machine training; in both, sum_hessian != left + right.

In serial_tree_learner.cpp my print code is:
std::cout << "before FindBestThreshold small leaf sum_gradients: " << this->smaller_leaf_splits_->sum_gradients()
    << "; sum_hessians: " << this->smaller_leaf_splits_->sum_hessians() << "; sum_count: " << this->smaller_leaf_splits_->num_data_in_leaf() << std::endl;
smaller_leaf_histogram_array_[feature_index].FindBestThreshold(
    smaller_leaf_splits_->sum_gradients(),
    smaller_leaf_splits_->sum_hessians(),
    smaller_leaf_splits_->num_data_in_leaf(),
    &smaller_split);
// note: the value streamed after the "right_sum_hessian" label below is right_sum_gradient
std::cout << "after FindBestThreshold small leaf left_sum_gradients: " << smaller_split.left_sum_gradient
    << "; right_sum_gradients: " << smaller_split.right_sum_gradient << "; left_sum_hessian: " << smaller_split.left_sum_hessian
    << "; right_sum_hessian: " << smaller_split.right_sum_gradient << "; gain: " << smaller_split.gain << "; left_count: " << smaller_split.left_count
    << "; right_count: " << smaller_split.right_count << "; threshold: " << smaller_split.threshold << std::endl;
smaller_split.feature = real_fidx;
if (smaller_split > smaller_best[tid]) {
    smaller_best[tid] = smaller_split;
}

@qrqpjxq
Contributor Author

qrqpjxq commented Sep 15, 2017

I also tried the example in examples/binary_classification.
The output is:

[LightGBM] [Info] Finished loading parameters
[LightGBM] [Info] Finished loading data in 0.012198 seconds
[LightGBM] [Info] Number of positive: 3716, number of negative: 3284
[LightGBM] [Info] Total Bins 1542
[LightGBM] [Info] Number of data: 7000, number of used features: 8
[LightGBM] [Info] Finished initializing training
[LightGBM] [Info] Started training...
before FindBestThreshold small leaf sum_gradients: -216; sum_hessians: 1750; sum_count: 7000
after FindBestThreshold small leaf left_sum_gradients: -222; right_sum_gradients: 6; left_sum_hessian: 1744; right_sum_hessian: 6;

before FindBestThreshold small leaf sum_gradients: -216; sum_hessians: 1750; sum_count: 7000
after FindBestThreshold small leaf left_sum_gradients: -204.5; right_sum_gradients: -11.5; left_sum_hessian: 1721.75; right_sum_hessian: -11.5;

before FindBestThreshold small leaf sum_gradients: -216; sum_hessians: 1750; sum_count: 7000
after FindBestThreshold small leaf left_sum_gradients: 46; right_sum_gradients: -262; left_sum_hessian: 696.5; right_sum_hessian: -262;

before FindBestThreshold small leaf sum_gradients: -216; sum_hessians: 1750; sum_count: 7000
after FindBestThreshold small leaf left_sum_gradients: -37; right_sum_gradients: -179; left_sum_hessian: 511; right_sum_hessian: -179;

before FindBestThreshold small leaf sum_gradients: -216; sum_hessians: 1750; sum_count: 7000
after FindBestThreshold small leaf left_sum_gradients: 6; right_sum_gradients: -222; left_sum_hessian: 7; right_sum_hessian: -222;

before FindBestThreshold small leaf sum_gradients: -216; sum_hessians: 1750; sum_count: 7000
after FindBestThreshold small leaf left_sum_gradients: -297.5; right_sum_gradients: 81.5; left_sum_hessian: 1589.25; right_sum_hessian: 81.5;

before FindBestThreshold small leaf sum_gradients: -216; sum_hessians: 1750; sum_count: 7000
after FindBestThreshold small leaf left_sum_gradients: -126.5; right_sum_gradients: -89.5; left_sum_hessian: 685.25; right_sum_hessian: -89.5;

For the first feature, sum = left + right, but starting from the second feature they are not the same.

@qrqpjxq
Contributor Author

qrqpjxq commented Sep 15, 2017

Then I tried examples/parallel_learning with data = binary.train and workers = 2.
There too, sum_h != left_h + right_h:

[LightGBM] [Info] Finished loading parameters
[LightGBM] [Warning] Set TCP_NODELAY failed.
[LightGBM] [Info] Trying to bind port 12400...
[LightGBM] [Info] Binding port 12400 succeeded
[LightGBM] [Warning] Set TCP_NODELAY failed.
[LightGBM] [Info] Listening...
[LightGBM] [Warning] Set TCP_NODELAY failed.
[LightGBM] [Info] Connected to rank 1
[LightGBM] [Info] Local rank: 0, total number of machines: 2
[LightGBM] [Info] Finished initializing network
[LightGBM] [Info] Finished loading data in 0.044321 seconds
[LightGBM] [Info] Number of positive: 1867, number of negative: 1612
[LightGBM] [Info] Total Bins 6143
[LightGBM] [Info] Number of data: 3479, number of used features: 28
[LightGBM] [Info] Finished initializing training
[LightGBM] [Info] Started training...
before FindBestThreshold small leaf sum_gradients: -216; sum_hessians: 1750; sum_count: 3479
after FindBestThreshold small leaf left_sum_gradients: -129; right_sum_gradients: -87; left_sum_hessian: 347.5; right_sum_hessian: -87;

before FindBestThreshold small leaf sum_gradients: -216; sum_hessians: 1750; sum_count: 3479
after FindBestThreshold small leaf left_sum_gradients: -221.5; right_sum_gradients: 5.5; left_sum_hessian: 1739.25; right_sum_hessian: 5.5;

before FindBestThreshold small leaf sum_gradients: -216; sum_hessians: 1750; sum_count: 3479
after FindBestThreshold small leaf left_sum_gradients: -168.5; right_sum_gradients: -47.5; left_sum_hessian: 994.25; right_sum_hessian: -47.5;

before FindBestThreshold small leaf sum_gradients: -216; sum_hessians: 1750; sum_count: 3479
after FindBestThreshold small leaf left_sum_gradients: 2.5; right_sum_gradients: -218.5; left_sum_hessian: 1220.75; right_sum_hessian: -218.5;

@guolinke guolinke reopened this Sep 15, 2017
@guolinke
Collaborator

@qrqpjxq
I have been very busy recently. It would be great if you could fix this and open a PR.

@qrqpjxq
Contributor Author

qrqpjxq commented Sep 16, 2017

OK, I will try. Could you add me on WeChat (qjxtqrm)? Then I can ask you promptly if any questions come up.

guolinke added a commit that referenced this issue Oct 9, 2017
@lock lock bot locked as resolved and limited conversation to collaborators Mar 17, 2020