Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

It seems that larger_leaf_histogram_array_ points to the smaller? #6747

Open
pengxiao-song opened this issue Dec 11, 2024 · 3 comments
Open

Comments

@pengxiao-song
Copy link

pengxiao-song commented Dec 11, 2024

Description

Thanks for nice work!! Recently I read the source code. I don’t quite understand how the code of "Reproducible example" works. It seems that larger_leaf_histogram_array_ points to the node with a smaller number of samples. Is that correct? Why not write it like this instead?

} else if (num_data_in_left_child < num_data_in_right_child) {
    if (histogram_pool_.Get(right_leaf, &larger_leaf_histogram_array_)) { parent_leaf_histogram_array_ = larger_leaf_histogram_array_; }
    histogram_pool_.Get(left_leaf, &smaller_leaf_histogram_array_);

Reproducible example

} else if (num_data_in_left_child < num_data_in_right_child) {
  // put parent(left) leaf's histograms into larger leaf's histograms
  if (histogram_pool_.Get(left_leaf, &larger_leaf_histogram_array_)) { parent_leaf_histogram_array_ = larger_leaf_histogram_array_; }
  histogram_pool_.Move(left_leaf, right_leaf);
  histogram_pool_.Get(left_leaf, &smaller_leaf_histogram_array_);

I am a little confused. Thanks for your response.

@pengxiao-song pengxiao-song changed the title larger_leaf_histogram_array_ points to the larger? It seems that larger_leaf_histogram_array_ points to the smaller? Dec 11, 2024
@jameslamb
Copy link
Collaborator

Thanks for using LightGBM.

how the code of "Reproducible example" works

What is this a reference to? Where did this code come from? Can you please provide some links?

@pengxiao-song
Copy link
Author

I apologize for the lack of clarity in my previous issue. Let me provide more context.

The piece of code in question comes from:

} else if (num_data_in_left_child < num_data_in_right_child) {

I believe the related code is performing operations to update larger_leaf_histogram_array_ and smaller_leaf_histogram_array_.

Specifically, can the code be written this way?

} else if (num_data_in_left_child < num_data_in_right_child) {
    if (histogram_pool_.Get(right_leaf, &larger_leaf_histogram_array_)) { parent_leaf_histogram_array_ = larger_leaf_histogram_array_; }
    histogram_pool_.Get(left_leaf, &smaller_leaf_histogram_array_);

like

if (histogram_pool_.Get(left_leaf, &larger_leaf_histogram_array_)) { parent_leaf_histogram_array_ = larger_leaf_histogram_array_; }

Could you please help clarify the reasoning behind the current structure? Thank you very much!

@shiyu1994
Copy link
Collaborator

Hi @pengxiao-song, thanks for using LightGBM.

We should first clarify why we need larger_leaf_histogram_array_. It is used in the subtraction operation here to get the histogram of the larger leaf.

} else {
larger_leaf_histogram_array_[feature_index].Subtract<false>(
smaller_leaf_histogram_array_[feature_index]);
}

And we should notify that the index of left_leaf inherits the index of parent_leaf, after the parent_leaf is split. Thus,

if (histogram_pool_.Get(left_leaf, &larger_leaf_histogram_array_)) { parent_leaf_histogram_array_ = larger_leaf_histogram_array_; }

is doing this:

Get the larger leaf histogram, which is always from histogram_pool_.Get(left_leaf, &larger_leaf_histogram_array_). The larger histogram will always be the histogram of the parent leaf (if with enough memory space to store these parent histograms) since it is obtained by subtracted the small histogram from the parent histogram. So, we just make the larger histogram points to the parent histogram, and after the subtraction the content in the parent histogram becomes the larger histogram.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants