Insights: skypilot-org/skypilot
November 12, 2024 – December 12, 2024
Overview
70 Pull requests merged by 16 people
-
Not mutate azure dep list at runtime
#4457 merged
Dec 11, 2024 -
[k8s] Fix show-gpus when running with incluster auth
#4452 merged
Dec 11, 2024 -
use lazy import for runpod
#4451 merged
Dec 9, 2024 -
[Feature] support spot pod on RunPod
#4447 merged
Dec 9, 2024 -
smoke tests support storage mount only
#4446 merged
Dec 9, 2024 -
[robustness] cover some potential resource leakage cases
#4443 merged
Dec 8, 2024 -
[k8s] Add resource limits only if they exist
#4440 merged
Dec 5, 2024 -
make --fast robust against credential or wheel updates
#4289 merged
Dec 4, 2024 -
[Minor] README updates.
#4436 merged
Dec 4, 2024 -
[perf] use uv for venv creation and pip install
#4414 merged
Dec 3, 2024 -
[Jobs] Allow logs for finished jobs and add `sky jobs logs --refresh` for restarting jobs controller
#4380 merged
Dec 3, 2024 -
[Core] Execute setup when `--detach-setup` and no `run` section
#4430 merged
Dec 3, 2024 -
avoid catching ValueError during failover
#4432 merged
Dec 3, 2024 -
[k8s] Fix `show-gpus` availability map when nvidia drivers are not installed
#4429 merged
Dec 3, 2024 -
update readme for test kubernetes example
#4426 merged
Dec 3, 2024 -
[Tests] Move tests to uv to speed up the dependency installation by >10x
#4424 merged
Dec 3, 2024 -
[k8s] Fix resources.image_id backward compatibility
#4425 merged
Nov 28, 2024 -
fix the pylint hook for pre-commit
#4422 merged
Nov 27, 2024 -
Add a pre commit config to help format before pushing
#4258 merged
Nov 27, 2024 -
[k8s] Update comparison page image
#4415 merged
Nov 27, 2024 -
[k8s] Fix in-cluster auth namespace fetching
#4420 merged
Nov 27, 2024 -
Event based smoke tests -- managed jobs
#4386 merged
Nov 26, 2024 -
[UX] Remove K80 and M60 from common GPU list
#4382 merged
Nov 26, 2024 -
Fix OD instance on Azure
#4411 merged
Nov 25, 2024 -
[AWS] Get rid of credential files if `remote_identity: SERVICE_ACCOUNT` specified
#4395 merged
Nov 25, 2024 -
[UX] Allow disabling ports in CLI
#4378 merged
Nov 25, 2024 -
Fix Spot instance on Azure
#4408 merged
Nov 25, 2024 -
[k8s] Support in-cluster and kubeconfig auth simultaneously
#4188 merged
Nov 25, 2024 -
[Storage] Call `sync_file_mounts` when either rsync or storage file_mounts are specified
#4317 merged
Nov 24, 2024 -
[k8s] Nimbus backward compatibility
#4400 merged
Nov 23, 2024 -
[k8s] Skip listing all pods to speed up optimizer
#4398 merged
Nov 23, 2024 -
remove `uv` from runtime setup due to azure installation issue
#4401 merged
Nov 23, 2024 -
[Core] Skip worker ray start for multinode
#4390 merged
Nov 22, 2024 -
[k8s] Improve multi-node provisioning time (nimbus)
#4393 merged
Nov 22, 2024 -
[docs] Specify compartment for OCI resources.
#4384 merged
Nov 22, 2024 -
use uv for pip install and for venv creation
#4394 merged
Nov 21, 2024 -
[k8s] Move setup and ray start to pod args to make them async
#4389 merged
Nov 21, 2024 -
[Examples] Specify version for vllm because vllm v0.6.4.post1 has an issue
#4391 merged
Nov 21, 2024 -
[OCI] set zone in the ProvisionRecord
#4383 merged
Nov 20, 2024 -
[Jobs] Disable deduplication for logs
#4388 merged
Nov 20, 2024 -
[UX] user-friendly message shown if Kubernetes is not enabled.
#4336 merged
Nov 20, 2024 -
Support event-based smoke tests instead of sleep-time-based ones to reduce flaky tests and speed up testing
#4284 merged
Nov 20, 2024 -
[k8s] Handle apt update log not existing
#4381 merged
Nov 18, 2024 -
[ux] display human-readable name for controller
#4376 merged
Nov 18, 2024 -
[FluidStack] Fix provisioning and add new gpu types
#4359 merged
Nov 18, 2024 -
Add Lambda's GH200 instance type
#4377 merged
Nov 16, 2024 -
[Core] Add `NO_UPLOAD` for `remote_identity`
#4307 merged
Nov 16, 2024 -
[k8s] fix managed job issue on k8s
#4357 merged
Nov 16, 2024 -
[DAG] Integrate Data Storage Buckets for Data-Bearing Edges in Optimization
#4320 merged
Nov 16, 2024 -
[ux] cache cluster status of autostop or spot clusters for 2s
#4332 merged
Nov 15, 2024 -
[perf] optimizations for sky jobs launch
#4341 merged
Nov 15, 2024 -
[Docs] Fix ask ai location
#4370 merged
Nov 15, 2024 -
[Docs] Fix some issues with Managed Jobs example.
#4361 merged
Nov 15, 2024 -
[Core] Replace ray job submit for 3x/8.5x faster job scheduling for cluster/managed jobs
#4318 merged
Nov 15, 2024 -
[OCI] Enable SkyServe for OCI
#4338 merged
Nov 15, 2024 -
[fast] if cluster is INIT, force refresh before deciding to provision
#4328 merged
Nov 15, 2024 -
[Core] NoCloudAccessError check is escaped from storage sync
#4366 merged
Nov 15, 2024 -
[Serve] Update log pattern in `_follow_replica_logs` for new UX 3.0
#4333 merged
Nov 14, 2024 -
[Jobs] Remove assertion for one single controller resources.
#4358 merged
Nov 14, 2024 -
[DAG] Run global optimization on controller for task placement
#4364 merged
Nov 14, 2024 -
Update `--env-file` to sky doc
#4345 merged
Nov 14, 2024 -
[Docs] resize image and move path up a level.
#4354 merged
Nov 14, 2024 -
[Docs] Update k8s docs
#4352 merged
Nov 14, 2024 -
[smoke] if --generic-cloud is set, force enable that cloud
#4335 merged
Nov 14, 2024 -
Added user agent string for catalog downloading request
#4347 merged
Nov 13, 2024 -
[Catalog] fix GCP catalog missing SKUs
#4322 merged
Nov 13, 2024 -
[Docs] Add a concept page.
#4342 merged
Nov 13, 2024 -
[k8s] On-demand single-host TPU support on GKE
#3947 merged
Nov 13, 2024 -
Refactor: Consolidate log streaming logic into centralized `log_utils.follow_logs()`
#4323 merged
Nov 13, 2024 -
improve tracing reporting and coverage
#4331 merged
Nov 12, 2024
24 Pull requests opened by 13 people
-
[k8s] support to use custom gpu resource name if it's not nvidia.com/gpu
#4337 opened
Nov 12, 2024 -
[Serve] Enable multiple ports in SkyServe replicas
#4356 opened
Nov 14, 2024 -
[WIP][Serve] Enable launching multiple external LB on controller.
#4362 opened
Nov 14, 2024 -
Preliminary Vast AI support
#4365 opened
Nov 15, 2024 -
Mount cached mode
#4369 opened
Nov 15, 2024 -
show logs for storage mount
#4387 opened
Nov 20, 2024 -
Support buildkite CICD and restructure smoke tests
#4396 opened
Nov 22, 2024 -
update image generation to use uv
#4399 opened
Nov 23, 2024 -
[Jobs] Move task retry logic to correct branch in `stream_logs_by_id`
#4407 opened
Nov 24, 2024 -
[docs] Change urls to docs.skypilot.co, add 404 page
#4413 opened
Nov 25, 2024 -
[azure] support for existing subnet
#4417 opened
Nov 26, 2024 -
Azure managed identity
#4418 opened
Nov 26, 2024 -
[Docs] Refactor pod_config docs
#4427 opened
Nov 28, 2024 -
[k8s] Minor: fix dictionary merging logic
#4437 opened
Dec 4, 2024 -
[Release] Release 0.7.1
#4438 opened
Dec 4, 2024 -
[Serve] Add and adopt least load policy as default policy.
#4439 opened
Dec 4, 2024 -
[Jobs] Restart dashboard when refreshing the controller
#4441 opened
Dec 4, 2024 -
Continue storage deletion when some fail
#4454 opened
Dec 10, 2024 -
[Core] Avoid high concurrency issue with control master
#4455 opened
Dec 10, 2024 -
add 1, 2, 4 size H100's to GCP
#4456 opened
Dec 10, 2024 -
detach the managed job controller from job submission
#4458 opened
Dec 11, 2024 -
[core] skip provider.availability_zone in the cluster config hash
#4463 opened
Dec 11, 2024 -
[Example] PyTorch distributed training with minGPT
#4464 opened
Dec 12, 2024 -
[k8s] Add validation for pod_config #4206
#4466 opened
Dec 12, 2024
67 Issues closed by 9 people
-
[Core] Provision an A10 GPU on Azure takes 20 minutes
#3718 closed
Dec 12, 2024 -
[UI] Ads on the SkyPilot documentation page
#4210 closed
Dec 12, 2024 -
cannot run `sky jobs logs -n <job_name>` on SUCCEEDED job
#4235 closed
Dec 12, 2024 -
[UX] Unnecessary logs from ray
#4300 closed
Dec 12, 2024 -
[Fluidstack] sky launch can leak instances when instance creation times out
#4392 closed
Dec 12, 2024 -
Unable to use NodePort in EKS
#3805 closed
Dec 12, 2024 -
[K8s] Cannot `sky show-gpus` for service account
#4152 closed
Dec 11, 2024 -
[k8s] Investigate and document `podPidsLimit` kubelet arg
#3412 closed
Dec 10, 2024 -
[Spot] Auto-translated bucket leakage if the spot job is not submitted correctly
#1225 closed
Dec 10, 2024 -
Race between status update and instance creation can cause resource leak
#4431 closed
Dec 9, 2024 -
[Core] Ray job refused to submit jobs in PENDING status
#4260 closed
Dec 9, 2024 -
[Jobs/Core] Leakage of `sky jobs cancel`
#4410 closed
Dec 9, 2024 -
runpod 4090 spot not available
#4265 closed
Dec 9, 2024 -
Spot instance support for runpod.
#3927 closed
Dec 9, 2024 -
`sky check` from one Kubernetes cluster to another failing
#3904 closed
Dec 9, 2024 -
[UX] Shortcut `k8s` for `kubernetes`
#4089 closed
Dec 9, 2024 -
[k8s] Parallelize pod initialization steps
#4229 closed
Dec 9, 2024 -
[k8s] Skip SSH setup for faster provisioning
#4225 closed
Dec 9, 2024 -
[k8s] multinode torch distributed nccl timeout
#3788 closed
Dec 8, 2024 -
RunPod H100 pricing / catalog needs refresh
#3794 closed
Dec 6, 2024 -
[Core] `setup` will not run if you don't have a `run` section and `--detach-setup` is specified
#4419 closed
Dec 3, 2024 -
[k8s] Failure to detect Kind when running serve controller with Kind
#3782 closed
Dec 2, 2024 -
[Serve] handling sky cli version update for running services.
#3768 closed
Nov 29, 2024 -
[k8s] Kubernetes-native deployment for SkyPilot
#3278 closed
Nov 29, 2024 -
[Catalog] AWS K80 image removed, causing any K80 launch to fail
#3273 closed
Nov 29, 2024 -
[Serve] Ollama CLI does not work over SkyServe
#3766 closed
Nov 28, 2024 -
[Tests] Add unit tests for `show-gpus` behavior
#3539 closed
Nov 28, 2024 -
Skypilot 0.7.0 tries to start non-spot instances when requesting spot instances on Azure
#4406 closed
Nov 25, 2024 -
[Storage] set_storage_mounts not working in python API
#4315 closed
Nov 24, 2024 -
[k8s] Support configuring annotations when launching
#3757 closed
Nov 24, 2024 -
[k8s] Force deletion of misbehaving pods
#3755 closed
Nov 24, 2024 -
[k8s] Nimbus backward compatibility issues
#4397 closed
Nov 23, 2024 -
[GCP/Spot] Skypilot GCP user credentials expire on controller with SSO
#2738 closed
Nov 23, 2024 -
Does sky still support local cluster mode?
#4385 closed
Nov 20, 2024 -
Support TPU with docker
#2217 closed
Nov 19, 2024 -
SkyPilot - facing AuthorizationError while deploying on Azure
#3745 closed
Nov 19, 2024 -
[Serve] Allow scaling of replicas based on response time
#3686 closed
Nov 17, 2024 -
[AWS] Use AWS_PROFILE if set locally
#2737 closed
Nov 17, 2024 -
[Catalog] Lambda catalog fetcher fails due to GH200
#4375 closed
Nov 16, 2024 -
[k8s] Move socat/netcat dependency checks to `sky check` for k8s
#3252 closed
Nov 16, 2024 -
[k8s] Running docker on SkyPilot k8s cluster
#3062 closed
Nov 16, 2024 -
[Core] NoCloudAccessError check is escaped from storage sync
#4367 closed
Nov 16, 2024 -
Implement `with_data` API for Edge-Based Data Flow in Task DAGs
#4254 closed
Nov 15, 2024 -
requirements.txt cleanup
#4371 closed
Nov 15, 2024 -
[Core] Speed up job scheduling speed on unmanaged jobs
#4295 closed
Nov 15, 2024 -
[Jobs] Speed up the time for managed jobs to be scheduled
#4294 closed
Nov 15, 2024 -
[Core] Cancel 1000 jobs can take 5-10 mins
#4293 closed
Nov 15, 2024 -
[Kubernetes] Not user-friendly message shown if Kubernetes is not enabled.
#4324 closed
Nov 15, 2024 -
[UX] Verify ssh proxy command before launching
#3286 closed
Nov 15, 2024 -
[Storage] Refactor _validate method from subclasses of AbstractStore class
#3723 closed
Nov 14, 2024 -
[100-jobs/Spot] More efficient batch spot jobs submission
#3190 closed
Nov 14, 2024 -
Feature Request OVH public cloud
#2974 closed
Nov 14, 2024 -
[AWS] Not robust identity checking
#4350 closed
Nov 13, 2024 -
[Fluidstack] Start firewall on fluidstack clusters
#3268 closed
Nov 13, 2024 -
[UX] `sky logs` should use the last running job instead of the last job
#3264 closed
Nov 13, 2024 -
[Spot] Autoscaling of spot controller based on the load
#3189 closed
Nov 13, 2024 -
[Core] Avoid calling ray stop for cluster created with new provisioner when terminating
#3183 closed
Nov 13, 2024 -
[k8s] Fail to install package in the base conda env on k8s default image
#3161 closed
Nov 13, 2024 -
[API] Better way to get IP for clusters and endpoints for service
#3053 closed
Nov 13, 2024 -
AWS: Support for EC2 Launch Templates
#2700 closed
Nov 13, 2024 -
[GPU] Add support for AMD GPUs
#2648 closed
Nov 13, 2024 -
[spot dashboard/UI] Allow showing live logs of running jobs
#2108 closed
Nov 13, 2024 -
[catalog] Automatically update Azure catalog
#1628 closed
Nov 13, 2024 -
Investigate making unmanaged spot instances (auto-)stoppable & resumable
#1448 closed
Nov 13, 2024
38 Issues opened by 13 people
-
[docs] Make YAML keys referenceable
#4462 opened
Dec 11, 2024 -
[k8s] Fail to ssh into the head node on k8s
#4461 opened
Dec 11, 2024 -
[Core] Launching on a just launched existing cluster with `--fast` does not skip the provision
#4460 opened
Dec 11, 2024 -
[UX] `gh repo clone` fail to work after `gh auth login` on cluster
#4459 opened
Dec 11, 2024 -
[Dev] Automatically source the sky environment for dev mode
#4453 opened
Dec 10, 2024 -
[UX] Additional message from OCI even though not enabled
#4450 opened
Dec 9, 2024 -
Latest skypilot image does not support azure accelerated networking and nccl
#4448 opened
Dec 8, 2024 -
[SERVE][AUTOSCALERS] Replica scaling sampling period and stability.
#4444 opened
Dec 5, 2024 -
[SERVE] Allow adjustment of scaling policies without redeployment
#4442 opened
Dec 5, 2024 -
Azure image-id from marketplace with :latest fails
#4435 opened
Dec 3, 2024 -
[DeepSpeed Example] Fail on AWS T4 due to package import issue
#4434 opened
Dec 3, 2024 -
[Jobs/Serve] Warning for credentials that requires reauth
#4433 opened
Dec 3, 2024 -
Run with UV package manager
#4428 opened
Dec 2, 2024 -
[Tests] Add tests for Azure spot instance and file_mounts with only storage mounts
#4423 opened
Nov 27, 2024 -
[Core] Failure in pytorch distributed training code failed to get a job into FAILED state
#4421 opened
Nov 27, 2024 -
[Storage] Support disable exclude .gitignore
#4416 opened
Nov 26, 2024 -
[Core] Autostop/autodown fails on AWS
#4412 opened
Nov 25, 2024 -
[Core] More robust authentication on remote cluster
#4409 opened
Nov 25, 2024 -
Pylint comments appear in the API documentation (e.g., `# pylint: disable=line-too-long`)
#4405 opened
Nov 24, 2024 -
[k8s] L40 GPUs get detected as L4s
#4404 opened
Nov 24, 2024 -
[Storage] Investigate GeeseFS for s3 mounting
#4403 opened
Nov 23, 2024 -
[k8s] Support exec based auth kubeconfigs on controllers
#4379 opened
Nov 17, 2024 -
[k8s] Default image cannot install conda package in base env
#4374 opened
Nov 15, 2024 -
[Core] Expired credentials causes unexpected failure of `sky launch`
#4373 opened
Nov 15, 2024 -
[UX] Automatically source the skypilot runtime when ssh to the cluster and SKYPILOT_DEV=1
#4372 opened
Nov 15, 2024 -
[Serve] Feature request: support num_nodes for the Controller
#4368 opened
Nov 15, 2024 -
[Core] Environment variables should be parsed at task execution, not `sky.Task` instantiation
#4363 opened
Nov 14, 2024 -
GCS file mount sync hangs if GCP credentials are expired
#4360 opened
Nov 14, 2024 -
[Core] Unblock user program for SIGINT
#4355 opened
Nov 14, 2024 -
decorated functions are not properly typechecked
#4353 opened
Nov 14, 2024 -
remove empty file mount from yaml config
#4351 opened
Nov 13, 2024 -
[Serve] Failure-count based unrecoverable failure detection
#4349 opened
Nov 13, 2024 -
[Serve] Fall back to latest ready version when detects unrecoverable failure
#4348 opened
Nov 13, 2024 -
sky jobs launch on Kubernetes seems not working now
#4346 opened
Nov 13, 2024 -
Doesn't use right GCP config path on Windows
#4344 opened
Nov 13, 2024 -
[k8s] Leaked kubectl port-forward processes
#4343 opened
Nov 13, 2024 -
[timeline] disable trace collection if SKYPILOT_TIMELINE_FILE_PATH is not set
#4340 opened
Nov 12, 2024
57 Unresolved conversations
Sometimes conversations happen on old items that aren’t yet closed. Here is a list of all the Issues and Pull Requests with unresolved conversations.
-
[Jobs] Allowing to specify intermediate bucket for file upload
#4257 commented on
Dec 2, 2024 • 46 new comments -
Add Envoy as an alternative Sky Serve load balancer implementation
#4256 commented on
Dec 10, 2024 • 40 new comments -
Replace `len()` Zero Checks with Pythonic Empty Sequence Checks
#4298 commented on
Nov 15, 2024 • 13 new comments -
[DigitalOcean] droplet integration
#3832 commented on
Dec 10, 2024 • 8 new comments -
Add hourly price and instance type to env SKYPILOT_CLUSTER_INFO
#4326 commented on
Nov 12, 2024 • 1 new comment -
[Core] Support Tailscale VPN
#4025 commented on
Nov 30, 2024 • 1 new comment -
[UX] An annoying message in the provision log
#4102 commented on
Dec 12, 2024 • 0 new comments -
[UI] Empty Accelerator should raise an issue
#4153 commented on
Dec 12, 2024 • 0 new comments -
[Core/UX] Improve the display of returncode for multi-node
#4232 commented on
Dec 12, 2024 • 0 new comments -
[k8s] Jobs controller on stale context needs better error messages
#4268 commented on
Dec 11, 2024 • 0 new comments -
[K8s] Error in k8s secret fetching breaks the provision failover loop
#4148 commented on
Dec 11, 2024 • 0 new comments -
[k8s] Add validation for `pod_config`
#4206 commented on
Dec 11, 2024 • 0 new comments -
[k8s] Support multiple Kubernetes clusters
#2937 commented on
Dec 11, 2024 • 0 new comments -
[Storage] Refactor S3Store/R2Store to an abstract S3CompatibleStore class
#2687 commented on
Dec 11, 2024 • 0 new comments -
[k8s] Prevent mounting of /dev/shm in pods
#4233 commented on
Dec 11, 2024 • 0 new comments -
[k8s][gke][dws] autodown not toggled if file sync fails
#4170 commented on
Dec 11, 2024 • 0 new comments -
[k8s] Requesting `--cpus 1.5` and starting a user Ray program crashes
#4190 commented on
Dec 11, 2024 • 0 new comments -
[cudo] Provisioning failing with Invalid value for `count_vm_available`
#3829 commented on
Nov 13, 2024 • 0 new comments -
[Docs] Add docs for installing SkyPilot with pipx
#3490 commented on
Dec 12, 2024 • 0 new comments -
`sky launch` takes ~5s to print out optimizer table, which is slow
#3159 commented on
Dec 12, 2024 • 0 new comments -
feat: add flux provisioner
#3777 commented on
Nov 22, 2024 • 0 new comments -
Fix issue 3744: Regions are not respected for buckets created with sky launch
#3789 commented on
Dec 2, 2024 • 0 new comments -
[Cudo] private networks and API/fetch fix
#3841 commented on
Nov 12, 2024 • 0 new comments -
[UX] Add infeasibility reasons to the exception message
#3986 commented on
Nov 29, 2024 • 0 new comments -
[Spot/Serve] Optimize the translation of filemounts
#4016 commented on
Nov 14, 2024 • 0 new comments -
[Jobs] Limit number of concurrent jobs & launches.
#4248 commented on
Nov 15, 2024 • 0 new comments -
[Core] Allow more PENDING jobs to be scheduled concurrently (1.4x faster)
#4311 commented on
Dec 6, 2024 • 0 new comments -
[WIP] Advanced DAG Workflow.
#4319 commented on
Nov 16, 2024 • 0 new comments -
[Jobs] Fast jobs cancellation for PENDING managed jobs
#4321 commented on
Nov 13, 2024 • 0 new comments -
[feature] better handling of failed rollouts
#4312 commented on
Nov 13, 2024 • 0 new comments -
`sky storage ls` should take bucket name as an arg, same as `status`
#2284 commented on
Nov 14, 2024 • 0 new comments -
[Storage][k8s] Support mounting data sources on Kubernetes cluster
#2497 commented on
Nov 14, 2024 • 0 new comments -
[Storage] Precheck the existence of file mounts.
#2505 commented on
Nov 14, 2024 • 0 new comments -
[core][k8s][perf] `service_catalog.list_accelerators` is called on every `resource.copy()`
#2725 commented on
Nov 14, 2024 • 0 new comments -
[Feature] Autostop definition in YAML?
#3953 commented on
Nov 15, 2024 • 0 new comments -
Bug: `stream_logs_by_id` incorrectly handles task retry logic
#4250 commented on
Nov 15, 2024 • 0 new comments -
[UX] Launch on existing cluster should be very fast
#4157 commented on
Nov 15, 2024 • 0 new comments -
[Core] Support image id when using docker as runtime environment
#3341 commented on
Nov 18, 2024 • 0 new comments -
Using smaller images when docker is used
#2218 commented on
Nov 19, 2024 • 0 new comments -
[clouds] Support Flux Framework as a backend
#3751 commented on
Nov 24, 2024 • 0 new comments -
[Serve] Expose multiple ports while using sky serve
#3727 commented on
Nov 26, 2024 • 0 new comments -
Cannot choose Custom V-net and custom images in Azure
#2910 commented on
Nov 26, 2024 • 0 new comments -
Custom images: challenges & problems to solve
#2673 commented on
Dec 2, 2024 • 0 new comments -
[core] Make schemas more visible
#3428 commented on
Dec 3, 2024 • 0 new comments -
[Optimizer] Check the unsupported features before the optimization
#2233 commented on
Dec 4, 2024 • 0 new comments -
[SkyServe] Custom Setup Command on Controller
#2994 commented on
Dec 4, 2024 • 0 new comments -
[k8s] Run GPU labeller automatically on new nodes added to cluster
#3432 commented on
Dec 6, 2024 • 0 new comments -
[Serve][k8s] K8s replica ports not detected
#3798 commented on
Dec 6, 2024 • 0 new comments -
[Bug][UX] Meaning of `DEVICE_MEM` for multi-GPU instance type is not aligned in `sky show-gpus`
#3434 commented on
Dec 8, 2024 • 0 new comments -
[k8s] Ambiguity when GPU labels overlap with an existing accelerator
#3562 commented on
Dec 9, 2024 • 0 new comments -
[AWS] Bucket on eu-south-1 fail to copy/mount
#3405 commented on
Dec 10, 2024 • 0 new comments -
[Storage] `sky storage delete -a` aborted when deletion of one storage failed
#4050 commented on
Dec 10, 2024 • 0 new comments -
[K8s] Add a dedicated doc page for multiple kubernetes
#4000 commented on
Dec 11, 2024 • 0 new comments -
[k8s] Fix /dev/fuse access on Kubernetes
#4108 commented on
Dec 11, 2024 • 0 new comments -
[k8s] Support non-debian custom images
#4110 commented on
Dec 11, 2024 • 0 new comments -
[k8s] Change GPU base image to CUDA `devel` instead of `runtime`
#4122 commented on
Dec 11, 2024 • 0 new comments -
[k8s] Remote identity support when multiple contexts are configured
#4131 commented on
Dec 11, 2024 • 0 new comments