Added Gaussian Mixture Models as a toy distribution #28

thelostscout · 2024-05-24T11:52:47Z

I thought it might be useful to have access to Gaussian mixture model toy distributions.

LarsKue · 2024-05-24T12:39:23Z

Thank you for the addition! Can you briefly explain in the docstring what this distribution/dataset looks like (e.g. in 2D), and how it differs from the Hypersphere dataset?

thelostscout · 2024-05-24T17:51:20Z

I don't think it's directly comparable to the hyperspheres dataset, from what I understand. In this case, the user controls the placement of all Gaussian blobs as well as their weights and standard deviations.
Do you think it would be sensible to simplify the creation of the distributions by reducing it to high-level arguments, like the number of mixture components?
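
For readers following the thread, here is a rough sketch of the kind of user-controlled mixture described above. It is not the code from this PR: the function name `sample_gmm`, its arguments, and the restriction to isotropic components are assumptions made for illustration, and it uses only the Python standard library rather than tensors.

```python
import random


def sample_gmm(means, stds, weights, n, rng=None):
    """Draw n points from an isotropic Gaussian mixture (illustrative only).

    means:   list of component centers, each a list of floats
    stds:    one standard deviation per component
    weights: one non-negative mixture weight per component
    """
    if rng is None:
        rng = random.Random()
    # Choose a component index for every sample according to the weights.
    components = rng.choices(range(len(means)), weights=weights, k=n)
    # Add independent Gaussian noise to each coordinate of the chosen center.
    return [[rng.gauss(m, stds[c]) for m in means[c]] for c in components]


# Two blobs in 2D, with the second one receiving 70% of the samples.
means = [[-2.0, 0.0], [2.0, 0.0]]
samples = sample_gmm(means, stds=[0.5, 0.5], weights=[0.3, 0.7], n=1000,
                     rng=random.Random(0))
```

Every mean, weight, and standard deviation is chosen explicitly by the caller here, which is the difference from a dataset that generates them from a few high-level hyperparameters.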

LarsKue · 2024-06-03T08:34:43Z

Yes, controlling the datasets via high-level hyperparameters in a similar fashion to how we construct models is the core philosophy of this library. Would you like to add this?

thelostscout · 2024-06-03T12:18:11Z

I think random generation will move it much closer to the hyperspheres dataset, depending on how the generation of the means and standard deviations is implemented. It might make the addition obsolete.

LarsKue · 2024-06-03T12:26:07Z

In that case, let's stick to the hyperspheres dataset. You are welcome to add more generation modes to the hyperspheres dataset, though. For instance, we could replace it with something like

class MixtureDataset:
    def __init__(self, mode="spheres"):
        match mode:
            case "cubes": ...
            case "spheres": ...
            case "random": ...

where each mode changes the behaviour of the generation of the mean and std.

thelostscout · 2024-06-03T13:39:32Z

My issue with random creation is the lack of reproducibility. Say I wanted to learn a distribution with two Gaussians and compare the loss values for different network types. In that case, the loss will usually differ depending on the overlap and position of the means.

LarsKue · 2024-06-03T15:38:03Z

For this, you can either copy the dataset directly or use a seed (see lightning.seed_everything) before sampling.

thelostscout · 2024-06-05T14:33:09Z

Hmm, I was thinking of a method that uses its own RNG, so that the dataset generation can be seeded without affecting the rest of the process. But maybe that's not necessary.
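
A minimal sketch of the private-RNG idea, assuming a hypothetical `SeededBlobs` class that is not part of the library. It uses the standard library's `random.Random` for brevity. A tensor-based version would presumably pass a dedicated `torch.Generator` to the sampling calls instead.

```python
import random


class SeededBlobs:
    """Hypothetical dataset whose component means come from a private RNG."""

    def __init__(self, n_components, dim, seed=None):
        # A private generator: seeding it leaves the global random state alone.
        self._rng = random.Random(seed)
        self.means = [
            [self._rng.uniform(-1.0, 1.0) for _ in range(dim)]
            for _ in range(n_components)
        ]


# Record what the global RNG would produce next after seeding it.
random.seed(123)
before = random.random()

# Re-seed, build two datasets, then draw from the global RNG again.
random.seed(123)
a = SeededBlobs(n_components=2, dim=2, seed=0)
b = SeededBlobs(n_components=2, dim=2, seed=0)
after = random.random()
```

In this sketch the two datasets share the same means because they share a seed, and the global draw is the same with or without the dataset construction in between, since the datasets never consume global random state.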

thelostscout · 2024-06-05T14:52:39Z

> In that case, let's stick to the hyperspheres dataset. You are welcome to add more generation modes to the hyperspheres dataset, though. For instance, we could replace it with something like
> class MixtureDataset:
>     def __init__(self, mode="spheres"):
>         match mode:
>             case "cubes": ...
>             case "spheres": ...
>             case "random": ...
> where each mode changes the behaviour of the generation of the mean and std.

Yes, I can implement a hypercube version. I think I would place the blobs on the corners (and hence have an upper bound on the number of centers).
Would you consider renaming the dataset to something like make_blobs or gmm? When I looked at the different datasets, I assumed that hyperspheres would do something similar to hypershells, rather than create a Gaussian mixture model following a certain distribution rule.

LarsKue · 2024-06-05T14:56:24Z

> Would you consider a name change of the dataset to something like make_blobs or gmm?

I would welcome a name change consistent with the implementation of changes. However, let's stick to ML jargon, using Dataset in place of Model where possible.

thelostscout · 2024-06-05T16:22:15Z

Ok, the trivial way would be to name it GaussianMixtureDataset. Or maybe MultiGaussianDataset? Or DistributedBlobsDataset?

thelostscout · 2024-06-06T09:37:01Z

OK, see the new PR.

Added Gaussian Mixture Models as a toy distribution

2e1b6ff

Improved docstring

a8cd033

thelostscout closed this Jun 6, 2024
