Insights: skypilot-org/skypilot
Overview
6 Pull requests merged by 5 people
-
Not mutate azure dep list at runtime
#4457 merged
Dec 11, 2024 -
[k8s] Fix show-gpus when running with incluster auth
#4452 merged
Dec 11, 2024 -
use lazy import for runpod
#4451 merged
Dec 9, 2024 -
[Feature] support spot pod on RunPod
#4447 merged
Dec 9, 2024 -
smoke tests support storage mount only
#4446 merged
Dec 9, 2024 -
[robustness] cover some potential resource leakage cases
#4443 merged
Dec 8, 2024
7 Pull requests opened by 5 people
-
Continue storage deletion when some fail
#4454 opened
Dec 10, 2024 -
[Core] Avoid high concurrency issue with control master
#4455 opened
Dec 10, 2024 -
add 1, 2, 4 size H100's to GCP
#4456 opened
Dec 10, 2024 -
detach the managed job controller from job submission
#4458 opened
Dec 11, 2024 -
[core] skip provider.availability_zone in the cluster config hash
#4463 opened
Dec 11, 2024 -
[Example] PyTorch distributed training with minGPT
#4464 opened
Dec 12, 2024 -
[k8s] Add validation for pod_config #4206
#4466 opened
Dec 12, 2024
21 Issues closed by 4 people
-
[Core] Provision an A10 GPU on Azure takes 20 minutes
#3718 closed
Dec 12, 2024 -
[UI] Ads on the SkyPilot documentation page
#4210 closed
Dec 12, 2024 -
cannot run `sky jobs logs -n <job_name>` on SUCCEEDED job
#4235 closed
Dec 12, 2024 -
[UX] Unnecessary logs from ray
#4300 closed
Dec 12, 2024 -
[Fluidstack] sky launch can leak instances when instance creation times out
#4392 closed
Dec 12, 2024 -
Unable to use NodePort in EKS
#3805 closed
Dec 12, 2024 -
[K8s] Cannot `sky show-gpus` for service account
#4152 closed
Dec 11, 2024 -
[k8s] Investigate and document `podPidsLimit` kubelet arg
#3412 closed
Dec 10, 2024 -
[Spot] Auto-translated bucket leakage if the spot job is not submitted correctly
#1225 closed
Dec 10, 2024 -
Race between status update and instance creation can cause resource leak
#4431 closed
Dec 9, 2024 -
[Core] Ray job refused to submit jobs in PENDING status
#4260 closed
Dec 9, 2024 -
[Jobs/Core] Leakage of `sky jobs cancel`
#4410 closed
Dec 9, 2024 -
runpod 4090 spot not available
#4265 closed
Dec 9, 2024 -
Spot instance support for runpod.
#3927 closed
Dec 9, 2024 -
`sky check` from one Kubernetes cluster to another failing
#3904 closed
Dec 9, 2024 -
[UX] Shortcut `k8s` for `kubernetes`
#4089 closed
Dec 9, 2024 -
[k8s] Parallelize pod initialization steps
#4229 closed
Dec 9, 2024 -
[k8s] Skip SSH setup for faster provisioning
#4225 closed
Dec 9, 2024 -
[k8s] multinode torch distributed nccl timeout
#3788 closed
Dec 8, 2024 -
RunPod H100 pricing / catalog needs refresh
#3794 closed
Dec 6, 2024
10 Issues opened by 7 people
-
[docs] Make YAML keys referenceable
#4462 opened
Dec 11, 2024 -
[k8s] Fail to ssh into the head node on k8s
#4461 opened
Dec 11, 2024 -
[Core] Launching on a just launched existing cluster with `--fast` does not skip the provision
#4460 opened
Dec 11, 2024 -
[UX] `gh repo clone` fail to work after `gh auth login` on cluster
#4459 opened
Dec 11, 2024 -
[Dev] Automatically source the sky environment for dev mode
#4453 opened
Dec 10, 2024 -
[UX] Additional message from OCI even though not enabled
#4450 opened
Dec 9, 2024 -
Latest skypilot image does not support azure accelerated networking and nccl
#4448 opened
Dec 8, 2024 -
[SERVE][AUTOSCALERS] Replica scaling sampling period and stability.
#4444 opened
Dec 5, 2024 -
[SERVE] Allow adjustment of scaling policies without redeployment
#4442 opened
Dec 5, 2024
43 Unresolved conversations
Sometimes conversations happen on old items that aren’t yet closed. Here is a list of all the Issues and Pull Requests with unresolved conversations.
-
Support buildkite CICD and restructure smoke tests
#4396 commented on
Dec 12, 2024 • 21 new comments -
Preliminary Vast AI support
#4365 commented on
Dec 12, 2024 • 7 new comments -
Add Envoy as an alternative Sky Serve load balancer implementation
#4256 commented on
Dec 10, 2024 • 4 new comments -
[Release] Release 0.7.1
#4438 commented on
Dec 11, 2024 • 2 new comments -
[DigitalOcean] droplet integration
#3832 commented on
Dec 10, 2024 • 1 new comment -
[k8s] Run GPU labeller automatically on new nodes added to cluster
#3432 commented on
Dec 6, 2024 • 0 new comments -
[k8s] Add validation for `pod_config`
#4206 commented on
Dec 11, 2024 • 0 new comments -
[K8s] Error in k8s secret fetching breaks the provision failover loop
#4148 commented on
Dec 11, 2024 • 0 new comments -
[k8s] Jobs controller on stale context needs better error messages
#4268 commented on
Dec 11, 2024 • 0 new comments -
[k8s] Leaked kubectl port-forward processes
#4343 commented on
Dec 11, 2024 • 0 new comments -
Azure image-id from marketplace with :latest fails
#4435 commented on
Dec 12, 2024 • 0 new comments -
[DeepSpeed Example] Fail on AWS T4 due to package import issue
#4434 commented on
Dec 12, 2024 • 0 new comments -
[UX] Automatically source the skypilot runtime when ssh to the cluster and SKYPILOT_DEV=1
#4372 commented on
Dec 12, 2024 • 0 new comments -
[Core/UX] Improve the display of returncode for multi-node
#4232 commented on
Dec 12, 2024 • 0 new comments -
[UI] Empty Accelerator should raise an issue
#4153 commented on
Dec 12, 2024 • 0 new comments -
[UX] An annoying message in the provision log
#4102 commented on
Dec 12, 2024 • 0 new comments -
[Docs] Add docs for installing SkyPilot with pipx
#3490 commented on
Dec 12, 2024 • 0 new comments -
`sky launch` takes ~5s to print out optimizer table, which is slow
#3159 commented on
Dec 12, 2024 • 0 new comments -
[Core] Allow more PENDING jobs to be scheduled concurrently (1.4x faster)
#4311 commented on
Dec 6, 2024 • 0 new comments -
Mount cached mode
#4369 commented on
Dec 11, 2024 • 0 new comments -
[docs] Change urls to docs.skypilot.co, add 404 page
#4413 commented on
Dec 6, 2024 • 0 new comments -
[Serve] Add and adopt least load policy as default policy.
#4439 commented on
Dec 10, 2024 • 0 new comments -
[Serve][k8s] K8s replica ports not detected
#3798 commented on
Dec 6, 2024 • 0 new comments -
[Bug][UX] Meaning of `DEVICE_MEM` for multi-GPU instance type is not aligned in `sky show-gpus`
#3434 commented on
Dec 8, 2024 • 0 new comments -
[k8s] Ambiguity when GPU labels overlap with an existing accelerator
#3562 commented on
Dec 9, 2024 • 0 new comments -
decorated functions are not properly typechecked
#4353 commented on
Dec 9, 2024 • 0 new comments -
Run with UV package manager
#4428 commented on
Dec 9, 2024 • 0 new comments -
[AWS] Bucket on eu-south-1 fail to copy/mount
#3405 commented on
Dec 10, 2024 • 0 new comments -
[Storage] `sky storage delete -a` aborted when deletion of one storage failed
#4050 commented on
Dec 10, 2024 • 0 new comments -
[K8s] Add a dedicated doc page for multiple kubernetes
#4000 commented on
Dec 11, 2024 • 0 new comments -
[k8s] Fix /dev/fuse access on Kubernetes
#4108 commented on
Dec 11, 2024 • 0 new comments -
[k8s] Support non-debian custom images
#4110 commented on
Dec 11, 2024 • 0 new comments -
[k8s] Change GPU base image to CUDA `devel` instead of `runtime`
#4122 commented on
Dec 11, 2024 • 0 new comments -
[k8s] Remote identity support when multiple contexts are configured
#4131 commented on
Dec 11, 2024 • 0 new comments -
[k8s] Requesting `--cpus 1.5` and starting a user Ray program crashes
#4190 commented on
Dec 11, 2024 • 0 new comments -
[k8s][gke][dws] autodown not toggled if file sync fails
#4170 commented on
Dec 11, 2024 • 0 new comments -
[k8s] Prevent mounting of /dev/shm in pods
#4233 commented on
Dec 11, 2024 • 0 new comments -
[k8s] Default image cannot install conda package in base env
#4374 commented on
Dec 11, 2024 • 0 new comments -
sky jobs launch on Kubernetes seems not working now
#4346 commented on
Dec 11, 2024 • 0 new comments -
[k8s] Support exec based auth kubeconfigs on controllers
#4379 commented on
Dec 11, 2024 • 0 new comments -
[k8s] L40 GPUs get detected as L4s
#4404 commented on
Dec 11, 2024 • 0 new comments -
[Storage] Refactor S3Store/R2Store to an abstract S3CompatibleStore class
#2687 commented on
Dec 11, 2024 • 0 new comments -
[k8s] Support multiple Kubernetes clusters
#2937 commented on
Dec 11, 2024 • 0 new comments