
Nf test tune #166

Merged
merged 25 commits into main from nf-test_tune on Jul 25, 2024
Conversation

alessiovignoli
Contributor

Changes made:

  1. Setting the seed. The seed can be None, but if given it must be an integer specified in the tune config. That seed initializes the python, numpy and torch seeds in the TuneWrapper class init. Numbers are then drawn randomly (always the same ones when the seed is set) and those values become the seeds for the trials inside tune. So each tune trial/experiment has its own seed, set in a reproducible manner because it depends on the overall user-given seed (see the sketch after this list).
  2. The seed also determines the weight initialization for each trial's model (1 model per trial). Models are always initialized to the same parameters if a seed is given.
  3. Added a debug mode to explore weight initialization, seeds and the raw output predictions on the validation set for the best model. These files are written only if debug_mode is activated; they are placed in the debug dir under the TuneRun results subdir and are used for understanding reproducibility across identical runs of tune.
  4. A handle_tune nf-test has been created, and dnatofloat on CPU is set as a proxy for reproducibility. More work is needed here in the future.
  5. Tune configs can now specify a run_params key that handles RunConfig auxiliary information, like the stop criteria needed by the FIFOScheduler now present in the dnatofloat CPU tune config.

TODO for the future: decide which flavor of the nf-test tune should be run as a GitHub action to test reproducibility.
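A minimal sketch of the per-trial seeding scheme described in point 1 above; the class and attribute names here (num_trials, trial_seeds) are illustrative assumptions, not the actual TuneWrapper implementation:

```python
import random
import numpy as np
import torch

class TuneWrapperSeedSketch:
    """Illustrative only: seeds python, numpy and torch from one user-given seed,
    then derives a reproducible per-trial seed for every tune trial."""

    def __init__(self, seed: int | None = None, num_trials: int = 10):
        if seed is not None:
            # The user-given seed from the tune config initializes all three RNGs.
            random.seed(seed)
            np.random.seed(seed)
            torch.manual_seed(seed)
        # Draw one integer per trial; with a fixed global seed these draws are
        # always the same, so each trial gets its own deterministic seed.
        self.trial_seeds = [random.randint(0, 2**32 - 1) for _ in range(num_trials)]
```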

…e, weights are now set in a reproducible and deterministic manner
@alessiovignoli alessiovignoli linked an issue Jul 24, 2024 that may be closed by this pull request
elif user_tune_config["tune"]["scheduler"]["name"] == "FIFOScheduler":
user_tune_config["tune"]["run_params"]["stop"]["training_iteration"] = 1

# TODO: future scheduler-specific info will go here as well; maybe find a cleaner way.
Contributor

Most likely this needs to become a stimulus class in the same way experiment is
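As a rough sketch of that idea (the names below are illustrative and do not exist in the repo), the scheduler-specific branches could be replaced by a small registry mapping scheduler names to the run_params they require:

```python
# Hypothetical registry: scheduler name -> run_params it requires.
SCHEDULER_RUN_PARAMS = {
    "FIFOScheduler": {"stop": {"training_iteration": 1}},
    # future schedulers and their specific settings would go here
}

def apply_scheduler_run_params(user_tune_config: dict) -> dict:
    """Merge scheduler-specific run_params into the user tune config."""
    name = user_tune_config["tune"]["scheduler"]["name"]
    defaults = SCHEDULER_RUN_PARAMS.get(name, {})
    user_tune_config["tune"].setdefault("run_params", {}).update(defaults)
    return user_tune_config
```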

results.save_best_model(output)
results.save_best_config(best_config_path)
results.save_best_metrics_dataframe(best_metrics_path)
results.save_best_optimizer(best_optimizer_path)

# debug section. predict the validation data using the best model.
Contributor

Why does the debug behavior load the best model and test it on the validation data? Shouldn't this be reserved for the analysis module? How is this helping us debug tuning?

@@ -179,12 +210,34 @@ def setup(self, config: dict, training: object, validation: object) -> None:
self.training = DataLoader(training, batch_size=self.batch_size, shuffle=True) # TODO need to check the reproducibility of this shuffling
self.validation = DataLoader(validation, batch_size=self.batch_size, shuffle=True)

# debug section, first create a dedicated directory for each worker inside Ray_results/<tune_model_run_specific_dir> location
Contributor

In the future, I believe this (saving the seed and the initial model) should be done regardless of whether debug is on; it would be a "robustness mode" toggled "on" by default!

Contributor

Or a "reproducibility" mode, as I believe the formula "model + initial state + seed + training code + training data" is our "deep learning" container

@mathysgrapotte mathysgrapotte merged commit 7e57cbe into main Jul 25, 2024
4 checks passed
@mathysgrapotte mathysgrapotte deleted the nf-test_tune branch July 25, 2024 10:48

Successfully merging this pull request may close these issues.

[tests] Add global pipeline tests (nf-tests)