[python-package] Difference between versions 3.2.1 and 4.5.0 in paralelism #6672

guilhermeparreira · 2024-10-10T17:14:02Z

My Experiment

I am working with 100 time series. My PC has 24 processors. I want to fit one LightGBM model per time series, running each fit on its own processor via Parallel (from joblib import Parallel).

When I tried this with LightGBM==4.5.0, I had concurrency problems: all 24 processors were at 100% CPU usage, yet processing was so slow that it seemed nothing was being computed. It happened with device_type="gpu", device_type="cpu", and device_type="cuda". The problem went away when I passed n_jobs=2 to Parallel (so only 2 time series were being fitted at a time across the 24 processors). LightGBM itself was always set to n_jobs=1.

However, when I switched to LightGBM==3.2.1 (pip install lightgbm==3.2.1), I could fit 24 time series simultaneously without concurrency problems: my code ran fine using 100% of all processors.

My hyperparameters

"LGBMRegressor": {
    "learning_rate": np.linspace(0.001, 0.2, num=8),
    "n_estimators": np.linspace(100, 300, num=2, dtype=int),
    "max_depth": [4, 6],
    "colsample_bytree": [0.7, 0.8, 0.9],
    "subsample": [0.7, 0.8, 0.9],
    "random_state": [42],
    "force_col_wise": [True],
    "verbose": [-1],  # < 0: Fatal, = 0: Error (Warning), = 1: Info, > 1: Debug
    "objective": ["regression_l1"],  # https://lightgbm.readthedocs.io/en/latest/Parameters.html
},
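
To give a sense of the workload this search space implies, here is a small sketch that counts its combinations. It assumes the dictionary is searched as an exhaustive grid, which the post does not state; the name param_space is hypothetical.

```python
import itertools
import math

import numpy as np

# Search space copied from the post; `param_space` is a hypothetical name.
param_space = {
    "learning_rate": np.linspace(0.001, 0.2, num=8),
    "n_estimators": np.linspace(100, 300, num=2, dtype=int),
    "max_depth": [4, 6],
    "colsample_bytree": [0.7, 0.8, 0.9],
    "subsample": [0.7, 0.8, 0.9],
    "random_state": [42],
    "force_col_wise": [True],
    "verbose": [-1],
    "objective": ["regression_l1"],
}

# Number of distinct settings if every combination is tried (an assumption).
n_combinations = math.prod(len(values) for values in param_space.values())
print(n_combinations)  # 288

# First combination of the grid, as keyword arguments for one model.
first = dict(zip(param_space, next(itertools.product(*param_space.values()))))
print(first)
```

Under that assumption, each time series would trigger up to 288 model fits, so the number of fits running at once matters more than the size of any single fit.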

Commands to install LightGBM

pip install lightgbm==4.5.0
pip install lightgbm==3.2.1

Specific Questions

Is there a big change in how LightGBM was programmed between 3.2.1 and 4.5.0?
Does LightGBM 4.5.0 run some internal code in parallel that 3.2.1 does not?

jameslamb · 2024-10-10T17:33:14Z

Thanks for using LightGBM.

Yes, there have been significant changes between LightGBM 3.2.1 (April 2021) and 4.5.0 (July 2024).

If you can provide a minimal, reproducible example we'd be happy to help. That's a small amount of fully self-contained code that replicates the issue. Here's a good example to start from: #6620 (comment)

Having that would answer questions I have like:

what is Parallel?
what do you mean by "run each time series"?

github-actions · 2024-11-10T04:03:53Z

This issue has been automatically closed because it has been awaiting a response for too long. When you have time to work with the maintainers to resolve this issue, please post a new comment and it will be re-opened. If the issue has been locked for editing by the time you return to it, please open a new issue and reference this one. Thank you for taking the time to improve LightGBM!

jameslamb changed the title ~~Difference between versions 3.2.1 and 4.5.0 in paralelism~~ [python-package] Difference between versions 3.2.1 and 4.5.0 in paralelism Oct 10, 2024

jameslamb added question awaiting response labels Oct 10, 2024

github-actions bot closed this as completed Nov 10, 2024
