This is an HPC machine-learning challenge/experiment: parallel and distributed vectorization/embedding of 1 petabyte of raw data from storage into a vector database (VectorDB).
Allowed resources:
- On-prem: Ubuntu with 10x NVIDIA A100 GPUs
- GCP: a2-ultragpu-8g / g2-standard-96 / nvidia-tesla-v100 (https://cloud.google.com/compute/docs/gpus#a100-gpus)
- AWS: EKS / EC2 P4, G4, and G4ad instances (https://aws.amazon.com/blogs/aws/now-available-ec2-instances-g4-with-nvidia-t4-tensor-core-gpus/)
- Kubernetes (autoscaled pods and nodes)
- Ray (see the distributed-embedding sketch after this list)
- Slurm (https://github.com/SchedMD/slurm)
- Python
- TensorFlow
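Ray is the most direct route to fanning embedding batches out across the GPUs. Below is a minimal sketch, assuming a local or pre-started Ray cluster with GPUs visible to it; the toy Keras `Embedding` model and the dummy batches are placeholders for illustration, not part of the challenge materials.

```python
import ray

ray.init()  # connects to an existing cluster if RAY_ADDRESS is set, else starts one locally

@ray.remote(num_gpus=1)  # pin each actor to one GPU (e.g., one A100)
class EmbeddingWorker:
    def __init__(self):
        # Hypothetical model: swap in the real TensorFlow embedding model here.
        import tensorflow as tf
        self.model = tf.keras.Sequential([tf.keras.layers.Embedding(10_000, 128)])

    def embed_batch(self, token_batch):
        # Returns one 128-d vector per input sequence (mean-pooled over tokens).
        import numpy as np
        vectors = self.model(np.array(token_batch)).numpy()
        return vectors.mean(axis=1)

# One worker per GPU; 10 matches the on-prem A100 box.
workers = [EmbeddingWorker.remote() for _ in range(10)]

# Round-robin dummy batches (32 sequences of 16 token ids each) across workers.
batches = [[[i % 10_000] * 16] * 32 for i in range(100)]
futures = [workers[i % len(workers)].embed_batch.remote(b)
           for i, b in enumerate(batches)]
results = ray.get(futures)
```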
Slurm Helm chart: https://github.com/stackhpc/slurm-k8s-cluster/tree/main/slurm-cluster-chart
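If the Slurm path is chosen (via the Helm chart above or bare metal), each task can shard the input by rank using the environment variables Slurm sets for every task. A minimal sketch, launched with e.g. `srun python worker.py`; the `./storage` file layout is an assumption for illustration.

```python
import os
from pathlib import Path

# Slurm injects these into every task launched by srun/sbatch.
rank = int(os.environ.get("SLURM_PROCID", 0))   # this task's index
world = int(os.environ.get("SLURM_NTASKS", 1))  # total tasks in the job

# Assumed layout: raw files live under ./storage; each task takes
# every world-th file starting at its own rank, so shards are disjoint.
files = sorted(Path("./storage").glob("**/*"))
my_files = [f for f in files[rank::world] if f.is_file()]

for f in my_files:
    print(f"task {rank}/{world} would embed {f}")
```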
Use Python decorators, multithreading, and asyncio; a minimal sketch combining the three follows.
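The three requirements compose naturally: a decorator for instrumentation, a thread pool for blocking file/object-store I/O, and asyncio to drive the concurrency. In this sketch, `read_chunk` is a hypothetical stand-in for the real reader in ./storage/read.py.

```python
import asyncio
import functools
import time
from concurrent.futures import ThreadPoolExecutor

def timed(fn):
    """Decorator: log the wall-clock time of each call."""
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            return fn(*args, **kwargs)
        finally:
            print(f"{fn.__name__} took {time.perf_counter() - start:.3f}s")
    return wrapper

@timed
def read_chunk(chunk_id: int) -> bytes:
    # Hypothetical blocking read; replace with the ./storage/read.py logic.
    time.sleep(0.1)
    return b"x" * 1024

async def main():
    loop = asyncio.get_running_loop()
    with ThreadPoolExecutor(max_workers=8) as pool:
        # Run the blocking reads in worker threads, awaited concurrently.
        chunks = await asyncio.gather(
            *(loop.run_in_executor(pool, read_chunk, i) for i in range(32))
        )
    print(f"read {sum(len(c) for c in chunks)} bytes")

asyncio.run(main())
```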
Pipeline stages (an end-to-end sketch follows this list):
- Read storage (./storage/read.py)
- Train
- Store
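To make the three stages concrete, here is a minimal end-to-end sketch. The reader stands in for ./storage/read.py, the `embed` step fakes a model with a hash-seeded random vector, and the vector database is stubbed as an in-memory dict; all three are assumptions for illustration.

```python
import numpy as np

def read_records(path="./storage"):
    # Stand-in for ./storage/read.py: yield (record_id, raw_bytes) pairs.
    for i in range(10):
        yield f"rec-{i}", f"raw payload {i}".encode("utf-8")

def embed(raw: bytes, dim: int = 128) -> np.ndarray:
    # Placeholder "train/embed" step: a pseudo-random vector seeded from the
    # payload hash, standing in for a real TensorFlow embedding model.
    rng = np.random.default_rng(abs(hash(raw)) % (2**32))
    return rng.standard_normal(dim).astype(np.float32)

vector_db = {}  # stub for a real vector database client

for record_id, raw in read_records():
    vector_db[record_id] = embed(raw)  # store: upsert id -> vector

print(f"stored {len(vector_db)} vectors of dim {embed(b'probe').shape[0]}")
```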
A draw.io diagram base is available for your use.
Background on integrating Slurm with Kubernetes: https://medium.com/@55_learning/integrate-slurm-with-kubernetes-2637d9250fdd