Distributed TensorFlow

TensorFlow is an open source software library for numerical computation using data flow graphs. It supports distributed computing, so training can run in a 'data parallel' or 'model parallel' fashion across multiple servers. This lets data scientists scale distributed training out to hundreds of GPUs.

Prerequisites

Kubernetes cluster v1.12+

Chart Details

This chart creates a TensorFlow cluster and distributes a computation graph across it.
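
As an illustration of what a TensorFlow cluster looks like: the distributed runtime describes a cluster as a mapping from job names (conventionally `ps` and `worker`) to lists of `host:port` task addresses, which is the shape accepted by `tf.train.ClusterSpec`. The sketch below builds such a mapping in plain Python. The helper name `build_cluster_spec` and the host-name pattern (`mnist-ps-0` and so on) are assumptions made for illustration only; the names this chart actually generates are defined in its templates. The ports are taken from the example values file.

```python
def build_cluster_spec(release, ps_count, worker_count,
                       ps_port=8080, worker_port=9090):
    """Return a job-name -> task-address mapping for a TensorFlow cluster.

    Host names follow a hypothetical '<release>-<job>-<index>' pattern.
    """
    return {
        "ps": [f"{release}-ps-{i}:{ps_port}" for i in range(ps_count)],
        "worker": [f"{release}-worker-{i}:{worker_port}"
                   for i in range(worker_count)],
    }


# Two parameter servers and two workers, as in the example values file.
spec = build_cluster_spec("mnist", ps_count=2, worker_count=2)
for job, tasks in spec.items():
    print(job, tasks)
```

Each parameter server and worker Pod would receive the same mapping plus its own job name and task index, which is how a task locates its peers.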

Installing the Chart

To install the chart with the release name mnist:

$ helm install mnist incubator/distributed-tensorflow

To install with custom values via file:

$ helm install --values values.yaml mnist incubator/distributed-tensorflow

Below is an example custom values file, values.yaml, with GPU support enabled for the workers.

worker:
  number: 2
  podManagementPolicy: Parallel
  image:
    repository: dysproz/distributed-tf
    tag: 1.6.0-gpu
  port: 9090
  gpuCount: 1

ps:
  number: 2
  podManagementPolicy: Parallel
  image:
    repository: dysproz/distributed-tf
    tag: 1.6.0
    pullPolicy: IfNotPresent
  port: 8080

# optimize for training
hyperparams:
  batchsize: 20
  learningrate: 0.001
  trainsteps: 10000

Note: details of the Docker image are available on Docker Hub.

Uninstalling the Chart

To uninstall/delete the mnist release:
```
$ helm delete mnist
```

The command removes all the Kubernetes components associated with the chart and deletes the release.

Configuration

The following table lists the configurable parameters of the Distributed TensorFlow chart and their default values.

| Parameter | Description | Default |
|---|---|---|
| `worker.image.repository` | TensorFlow worker server image repository | `dysproz/distributed-tf` |
| `worker.image.tag` | TensorFlow worker server image tag | `gpu` |
| `worker.image.pullPolicy` | Image pull policy for the worker | `IfNotPresent` |
| `worker.gpuCount` | Number of GPUs allocated to each worker Pod | `0` |
| `worker.env` | Key-value environment variables for the worker | None |
| `ps.image.repository` | TensorFlow parameter server image repository | `dysproz/distributed-tf` |
| `ps.image.tag` | TensorFlow parameter server image tag | `1.7.0-gpu` |
| `ps.image.pullPolicy` | Image pull policy for the parameter server | `IfNotPresent` |
| `ps.env` | Key-value environment variables for the parameter server | None |
| `volumes` | List of volumes defined in the cluster (standard Kubernetes format) | host path to /tmp/mnist |
| `volumeMounts` | Volumes mounted into the Pods (standard Kubernetes format) | host path volume mounted into /tmp/mnist-log |
| `hyperparams.batchsize` | Batch size | `20` |
| `hyperparams.learningrate` | Learning rate | `0.001` |
| `hyperparams.trainsteps` | Number of training steps | `0` (continuous run) |
| `hyperparams.datadir` | Data directory | None |
| `hyperparams.logdir` | Logging directory | None |
| `hyperparams.hiddenunits` | Hidden units in the neural network | None |
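
The dotted parameter names above correspond to nested keys in values.yaml, and the same dotted form is what Helm's `--set` flag accepts on the command line. The following is a minimal sketch of that mapping, not part of the chart: it flattens a nested override dictionary into `key=value` strings and assembles an install command. The helper name `flatten` is hypothetical, and the sketch does not escape commas or other special characters in values.

```python
def flatten(values, prefix=""):
    """Flatten nested dicts into dotted 'a.b.c=value' strings."""
    pairs = []
    for key, value in values.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            # Recurse into nested sections such as 'worker' or 'hyperparams'.
            pairs.extend(flatten(value, path))
        else:
            pairs.append(f"{path}={value}")
    return pairs


# Overrides expressed the way they would be nested in values.yaml.
overrides = {
    "worker": {"gpuCount": 1},
    "hyperparams": {"batchsize": 20, "trainsteps": 10000},
}

set_args = flatten(overrides)
command = ["helm", "install", "mnist", "incubator/distributed-tensorflow"]
for pair in set_args:
    command += ["--set", pair]

print(" ".join(command))
```

For more than a handful of overrides, a values file passed with `--values` is usually easier to review and keep under version control than a long list of `--set` flags.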
