Monitor observables every `N` epochs #573

elcorto · 2024-08-22T13:34:43Z

When using during_training_metric, the respective quantity is calculated in every epoch, which may be costly if during_training_metric="total_energy".

When using shuffled snapshots, adding the required calculation_output_file as in

data_handler.add_snapshot(
    "Be_snapshot_shuffled1.in.npy",
    data_path,
    "Be_snapshot_shuffled1.out.npy",
    data_path,
    "va",
    calculation_output_file=os.path.join( data_path, "Be_snapshot1.out"),
)

may not be valid since the reference data in Be_snapshot1.out doesn't match the validation data. I'm not sure what data is read from this file, so this may or may not be a problem, but in any case one must provide some file here, else we see Exception: Could not guess type of additional calculation data provided to MALA..

In addition, #571 and #572 make it hard to use the feature in production at the moment.

So, is there a way to do something like examples/basic/ex02_test_network.py every N epochs only, where one defines non-shuffled test snapshots plus reference data calculation_output_file="/path/to/qe.out". This would be independent of the validation data (one could call it second validation data set) and save compute as well.

The text was updated successfully, but these errors were encountered:

RandomDefaultUser · 2024-10-08T15:36:53Z

Hi @elcorto, thanks for raising this issue!
You are indeed right that using some output with shuffled snapshots is dubious at best. To be more precise: band energy and total energy as an optimization metric do NOT work with shuffled snapshots at all. There is currently no way to safeguard this, albeit OpenPMD could maybe help in this respect in the future.

I generally agree that having a mechanic that would track a targeted metric only after N steps sounds useful. Just to double check with @nerkulec , since you overhauled the reporting, I assume this would not clash with the current state of things, right?

nerkulec · 2024-10-08T15:41:52Z

@RandomDefaultUser Sure, I can integrate this, I already have this on my branch

RandomDefaultUser · 2024-10-09T16:26:05Z

Great, thank you!

nerkulec · 2024-10-10T14:03:28Z

I implemented it in #584 in a way that LDOS error is evaluated every epoch, while other metrics only every parameters.running.validate_every_n_epochs epochs.
I usually train on shuffled data, and use unshuffled data for validation. That eliminates the need for two validation sets. @elcorto does that also work for you?

elcorto · 2024-10-16T17:48:29Z

Thanks @nerkulec for this addition. I took the liberty to re-open this issue such that we can discuss this (which should be quickly resolved).

So to make sure I understand: There are two new parameters:

parameters.running.validation_metrics (added in Unified error calculation #560 I think)
parameters.running.validate_every_n_epochs

I was under the impression that all of those need to be added, in the running case, as a property to common.parameters.ParametersRunning, with a doc string such that they show up in the sphinx docs?

So as I understand #584:

validate_every_n_epochs will make the val loss be calculated every validate_every_n_epochs epochs instead of every (validate_every_n_epochs=1)
It assumes that the val snapshot is not shuffled, for certain parameters.running.validation_metrics apart from "ldos", such as "total_energy"? If yes, then which metrics rely on an unshuffled snapshot? I guess this comes down to what is read from calculation_output_file? This was the first point that I was trying to address above, sorry for not being more clear.

Given this new feature, what is the difference between validation_metrics and during_training_metric, other than the evaluation frequency defined by validate_every_n_epochs?

I think what I had in mind was this workflow, based on using shuffled snapshots by default:

shuffle one or more snapshots (or randomly sub-sample one, or or ...), this creates the whole dataset (descriptors, ldos)
do a train/val split (not a train/val/test split as usual in DL since the "test" set is a separate snapshot for which e.g. total energy is known); the fact that the train and val sets are supplied as "snapshots" is an implementation detail at this point
monitor train and val loss (usual stuff, detect overfitting, etc)
new: every N epochs, calculate observables which need a full snapshot as reference (total energy, say), this is what I meant by "second validation data set", so every N epochs we get the total energy error to some reference
do final eval on test snapshot (or in test runs, skip this and treat the N-epochs snapshot as independent test set)

So, for certain validation_metrics other than "ldos", the proposed workflow in #584 is to use a another full snapshot as a stand-in for a standard i.i.d. sampled val set from the same distribution as the train set, i.e. detect overfitting the train set by looking at the val loss computed on a (potentially) different population (what the MALA workflow was before using shuffled snapshots and what is still reflected in the API). This is kind of OK since we also have to do the final test eval on a different full snapshot where we need to assume that the descriptors come from the same distribution as the train and val ones -- this is just an artifact of the data we are dealing with. I still think it would be cleaner to have a standard train process (val loss on val set from a split of the same dataset) with a separate option to kick off a metric/observable calculation every N epochs which may require a full unshuffled snapshot. Since the val loss is calculated as often as the train loss, this would also speed up the val loss calculation given, say, a 80/20 train/val split.

RandomDefaultUser · 2024-10-18T09:16:34Z

I am wondering if maybe this is a discussion we should have during a meeting (or potentially the design workshop?)
I feel both the current method as well as what you, @elcorto, outline in your comment is reasonable and this is more of a "what do we want to do?" question rather than a "how do we implement it?", which may be resolved quicker in person.

The current implementation only allows for either shuffled validation snapshots and no observables or unshuffled validation snapshots and observables, just as you have mentioned.

I am wondering though what the intended use should be. I personally always use shuffled validation snapshots and no observables, but as I understand both you and @nerkulec use unshuffled validation snapshots and observables (or would like to at least incorporate that into the process). In that case, it may make sense to modify the entire interface, and subsume such a change to larger modifications of the data management/training subroutine?

What do you think?

elcorto · 2024-10-18T17:31:34Z

I agree that this is best discussed F2F. I'd volunteer to document the current state just as you summarized above, afterwards I think we can close this issue. To do this, there is for me still the question what the difference between the new validation_metrics and during_training_metric is (apart from the eval frequency via validate_every_n_epochs), as both can be used to calculate "total_energy", for instance, if I understood correctly. @nerkulec feel free to approach me outside GitHub as well for this :)

nerkulec · 2024-10-19T08:23:35Z

@elcorto The difference between validation_metrics and during_training_metric is that validation_metrics are used for logging to monitor the training process (that's why it's possible to have multiple of them) and during_training_metric is used specifically for things like learning rate scheduling and early stopping.

elcorto · 2024-11-22T16:31:51Z

This issue is not a blocker for the 1.3.0 release, but I would propose to keep it open until we have documented the workflow enabled by the new features from #584. Of course, if you consider those settings experimental or not to be set by users, please feel free to close this issue, since some form of "every N epochs" is implemented, albeit not (I think) the one I outlined above.

I offered to document things, but ATM I'm still not clear on when to use which setting (validation_metrics, validate_every_n_epochs, during_training_metric), for which setting one needs a full non-shuffled snapshot, etc. Quoting from above, these things could be part of a worked example which uses shuffled snapshots for training.

The current implementation only allows for either shuffled validation snapshots and no observables or unshuffled validation snapshots and observables, just as you have mentioned.

The difference between validation_metrics and during_training_metric is that validation_metrics are used for logging to monitor the training process (that's why it's possible to have multiple of them) and during_training_metric is used specifically for things like learning rate scheduling and early stopping.

In terms of docs, currently

only during_training_metric is part of the API docs
only validation_metrics and validate_every_n_epochs are used in examples/advanced/ex03_tensor_board.py

nerkulec · 2024-11-22T16:38:18Z

Regarding docstrings and documentation, I believe that's done in #609. Please tell me if something is missing.
Regarding your question: you need unshuffled snapshots if energy metrics are computed (both for validation_metrics and during_training_metric).

elcorto · 2024-11-22T17:31:20Z

Yes, sorry, I just saw #609. With the help of that, one needs to go ahead and use the settings in production in order to suggest doc improvements, so let's close this one then.

elcorto added the question Further information is requested label Aug 22, 2024

RandomDefaultUser added this to the v1.3.0 - Into the multi-GPU-verse milestone Oct 7, 2024

nerkulec mentioned this issue Oct 10, 2024

Validation every N steps #584

Merged

RandomDefaultUser closed this as completed Oct 14, 2024

elcorto reopened this Oct 16, 2024

RandomDefaultUser removed this from the v1.3.0 - Into the multi-GPU-niverse milestone Nov 21, 2024

elcorto mentioned this issue Nov 22, 2024

Fix missing parameter docstrings #609

Merged

elcorto closed this as completed Nov 22, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Monitor observables every `N` epochs #573

Monitor observables every `N` epochs #573

elcorto commented Aug 22, 2024

RandomDefaultUser commented Oct 8, 2024 •

edited

Loading

nerkulec commented Oct 8, 2024

RandomDefaultUser commented Oct 9, 2024

nerkulec commented Oct 10, 2024

elcorto commented Oct 16, 2024 •

edited

Loading

RandomDefaultUser commented Oct 18, 2024

elcorto commented Oct 18, 2024

nerkulec commented Oct 19, 2024

elcorto commented Nov 22, 2024

nerkulec commented Nov 22, 2024

elcorto commented Nov 22, 2024 •

edited

Loading

Monitor observables every N epochs #573

Monitor observables every N epochs #573

Comments

elcorto commented Aug 22, 2024

RandomDefaultUser commented Oct 8, 2024 • edited Loading

nerkulec commented Oct 8, 2024

RandomDefaultUser commented Oct 9, 2024

nerkulec commented Oct 10, 2024

elcorto commented Oct 16, 2024 • edited Loading

RandomDefaultUser commented Oct 18, 2024

elcorto commented Oct 18, 2024

nerkulec commented Oct 19, 2024

elcorto commented Nov 22, 2024

nerkulec commented Nov 22, 2024

elcorto commented Nov 22, 2024 • edited Loading

Monitor observables every `N` epochs #573

Monitor observables every `N` epochs #573

RandomDefaultUser commented Oct 8, 2024 •

edited

Loading

elcorto commented Oct 16, 2024 •

edited

Loading

elcorto commented Nov 22, 2024 •

edited

Loading