
Generate configuration from template

Step 1. Write the quick-start file

There is an example file in the repository.

An example YAML file is shown below. Change the machine IP addresses and the SSH information to match your environment.

# quick-start.yaml

# (Required) Please fill in the IP addresses of the servers on which you would like to deploy OpenPAI
machines:
  - 192.168.1.11
  - 192.168.1.12
  - 192.168.1.13

# (Required) Log-in info of all machines. System administrator should guarantee
# that the username/password pair or username/key-filename is valid and has sudo privilege.
ssh-username: pai
ssh-password: pai-password

# (Optional, default=None) The key file that the ssh client uses; it has higher priority than the password.
#ssh-keyfile-path: <keyfile-path>

# (Optional, default=22) Port number of ssh service on each machine.
#ssh-port: 22

# (Optional, default=DNS of the first machine) Cluster DNS.
#dns: <ip-of-dns>

# (Optional, default=10.254.0.0/16) IP range used by Kubernetes. Note that
# this IP range should NOT conflict with the current network.
#service-cluster-ip-range: <ip-range-for-k8s>
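
Before generating the configuration, it can help to sanity-check the quick-start values. The following is a minimal sketch, not part of paictl: it assumes the YAML file has already been loaded into a Python dict by a YAML parser of your choice, and the function name `check_quick_start` is hypothetical. It checks the required fields described above and flags any machine IP that falls inside the Kubernetes service IP range.

```python
import ipaddress


def check_quick_start(cfg):
    """Return a list of problems found in a quick-start mapping (empty if none).

    `cfg` is assumed to be the dict obtained by parsing quick-start.yaml.
    """
    problems = []

    machines = cfg.get("machines") or []
    if not machines:
        problems.append("machines: at least one IP address is required")

    hosts = []
    for entry in machines:
        try:
            hosts.append(ipaddress.ip_address(str(entry)))
        except ValueError:
            problems.append(f"machines: {entry!r} is not a valid IP address")

    if not cfg.get("ssh-username"):
        problems.append("ssh-username: required")
    if not (cfg.get("ssh-password") or cfg.get("ssh-keyfile-path")):
        problems.append("ssh-password or ssh-keyfile-path: one of them is required")

    # Fall back to the documented default range when the key is absent.
    ip_range = cfg.get("service-cluster-ip-range", "10.254.0.0/16")
    try:
        network = ipaddress.ip_network(ip_range)
    except ValueError:
        problems.append(f"service-cluster-ip-range: {ip_range!r} is not a valid range")
    else:
        for host in hosts:
            if host in network:
                problems.append(f"machines: {host} conflicts with {network}")

    return problems


example = {
    "machines": ["192.168.1.11", "192.168.1.12", "192.168.1.13"],
    "ssh-username": "pai",
    "ssh-password": "pai-password",
}
print(check_quick_start(example))  # prints []
```

An empty list means the sketch found nothing to complain about; it does not verify that the SSH credentials actually work or that the user has sudo privilege.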

Step 2. Generate OpenPAI configuration files

(1) Generate configuration files
# Run this command from the pai directory inside the dev-box.
cd /pai
python paictl.py config generate -i /pai/deployment/quick-start/quick-start.yaml -o ~/pai-config -f
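
If you drive this step from a script, the same command can be assembled as an argument list. This is only a sketch: the helper name `generate_command` is hypothetical, and the only flags it uses are the `-i`, `-o`, and `-f` flags that appear in the command above. Because no shell is involved when an argument list is used, `~` is expanded explicitly.

```python
import os


def generate_command(quick_start_path, output_dir, force=False):
    """Build the argument list for `paictl.py config generate`."""
    command = [
        "python", "paictl.py", "config", "generate",
        "-i", quick_start_path,
        # A shell would expand "~"; without one we expand it ourselves.
        "-o", os.path.expanduser(output_dir),
    ]
    if force:
        command.append("-f")
    return command


command = generate_command(
    "/pai/deployment/quick-start/quick-start.yaml", "~/pai-config", force=True
)
print(" ".join(command))
```

The resulting list can be passed to `subprocess.run(command, cwd="/pai", check=True)` so that the command runs under the pai directory, as the step above requires.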
(2) Update the docker tag to the release version
vi ~/pai-config/services-configuration.yaml

For example, if you deploy from the v0.x.y branch, change docker-tag to v0.x.y:

docker-tag: v0.x.y
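
The same change can be made without opening an editor. The sketch below is an illustration under assumptions: the helper `set_docker_tag` is hypothetical, and it assumes the file contains an uncommented line of the form `docker-tag: <value>` at some indentation. It operates on the file text with a regular expression rather than a YAML parser, so the rest of the file is left as it is; anything after the key on that line, including a trailing comment, is replaced.

```python
import re

# Matches an uncommented "docker-tag:" key at any indentation, on a single line.
DOCKER_TAG_LINE = re.compile(r"^([ \t]*docker-tag:[ \t]*).*$", flags=re.MULTILINE)


def set_docker_tag(text, tag):
    """Return `text` with the value of every docker-tag key replaced by `tag`."""
    new_text, count = DOCKER_TAG_LINE.subn(lambda m: m.group(1) + tag, text)
    if count == 0:
        raise ValueError("no docker-tag key found")
    return new_text


# Hypothetical one-line fragment; the real file contains many other keys.
print(set_docker_tag("docker-tag: latest\n", "v0.x.y"), end="")  # prints docker-tag: v0.x.y
```

To apply it, read `~/pai-config/services-configuration.yaml`, pass its contents through the function, and write the result back.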
(3) Change the GPU count and type

Quick start generates every node with 1 GPU of type generic, which may not suit your cluster. For example, if you have two types of machines, one with 4 Tesla K80 GPU cards and the other with 2 Tesla P100 cards, modify your ~/pai-config/layout.yaml as follows:

machine-sku:
  k80-node:
    mem: 40G
    gpu:
      type: Tesla K80
      count: 4
    cpu:
      vcore: 24
    os: ubuntu16.04
  p100-node:
    mem: 20G
    gpu:
      type: Tesla P100
      count: 2
    cpu:
      vcore: 24
    os: ubuntu16.04

machine-list:
  - hostname: xxx
    hostip: yyy
    machine-type: k80-node
  - hostname: xxx
    hostip: yyy
    machine-type: p100-node
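
After editing the layout, a quick consistency check can catch a machine-type that does not match any entry in machine-sku. The sketch below is an assumption-laden illustration, not part of paictl: it expects the layout already parsed into a dict, the function name `gpu_totals` is hypothetical, and the hostnames and IPs in the example are placeholders.

```python
def gpu_totals(layout):
    """Sum GPU counts per GPU type over machine-list.

    Raises KeyError when a machine refers to an undefined machine-sku.
    """
    skus = layout.get("machine-sku", {})
    totals = {}
    for machine in layout.get("machine-list", []):
        sku_name = machine["machine-type"]
        if sku_name not in skus:
            raise KeyError(f"{machine['hostname']}: unknown machine-type {sku_name!r}")
        gpu = skus[sku_name].get("gpu")
        if gpu:
            totals[gpu["type"]] = totals.get(gpu["type"], 0) + gpu["count"]
    return totals


layout = {
    "machine-sku": {
        "k80-node": {"mem": "40G", "gpu": {"type": "Tesla K80", "count": 4},
                     "cpu": {"vcore": 24}, "os": "ubuntu16.04"},
        "p100-node": {"mem": "20G", "gpu": {"type": "Tesla P100", "count": 2},
                      "cpu": {"vcore": 24}, "os": "ubuntu16.04"},
    },
    "machine-list": [
        # Placeholder hostnames and IPs, for illustration only.
        {"hostname": "worker-1", "hostip": "192.168.1.12", "machine-type": "k80-node"},
        {"hostname": "worker-2", "hostip": "192.168.1.13", "machine-type": "p100-node"},
    ],
}
print(gpu_totals(layout))  # prints {'Tesla K80': 4, 'Tesla P100': 2}
```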
(4) Default values in the generated configuration

The paictl tool sets the following default values in the four configuration files:

<table>
<tr>
<th>
  Configuration Property
</th>
<th>
  Default value
</th>
</tr>
<tr>
<td>
  master node
</td>
<td>
  The first machine in the machine list will be configured as the master node.
</td>
</tr>
<tr>
<td>
  SSH port
</td>
<td>
  If not explicitly specified, the SSH port is set to <code>22</code>.
</td>
</tr>
<tr>
<td>
  cluster DNS
</td>
<td>
  If not explicitly specified, the cluster DNS is set to the value of the <code>nameserver</code> field in the <code>/etc/resolv.conf</code> file of the master node.
</td>
</tr>
<tr>
<td>
  IP range used by Kubernetes
</td>
<td>
  If not explicitly specified, the IP range used by Kubernetes is set to <code>10.254.0.0/16</code>.
</td>
</tr>
<tr>
<td>
  docker registry
</td>
<td>
  The docker registry is set to <code>docker.io</code>, and the docker namespace is set to <code>openpai</code>. In other words, all PAI service images will be pulled from <code>docker.io/openpai</code> (see <a href="https://hub.docker.com/r/openpai/">this link</a> on DockerHub for the details of all images).
</td>
</tr>
<tr>
<td>
  Cluster id
</td>
<td>
  Cluster id is set to <code>pai-example</code>.
</td>
</tr>
<tr>
<td>
  REST server's admin user
</td>
<td>
  REST server's admin user is set to <code>admin</code>, and its password is set to <code>admin-password</code>.
</td>
</tr>
<tr>
<td>
  VC
</td>
<td>
  There is only one VC in the system, <code>default</code>, which has 100% of the resource capacity.
</td>
</tr>
</table>
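
The defaults listed here can be read as a simple fill-in rule over the quick-start values. The sketch below illustrates that rule only; it is not how paictl is implemented. The function names and the `master` key are hypothetical, and the `/etc/resolv.conf` contents of the master node are assumed to be passed in as text.

```python
def first_nameserver(resolv_conf_text):
    """Return the first nameserver entry in resolv.conf text, or None."""
    for line in resolv_conf_text.splitlines():
        fields = line.split()
        if len(fields) >= 2 and fields[0] == "nameserver":
            return fields[1]
    return None


def with_defaults(quick_start, master_resolv_conf=""):
    """Return a copy of the quick-start mapping with documented defaults filled in."""
    cfg = dict(quick_start)
    cfg.setdefault("ssh-port", 22)
    cfg.setdefault("service-cluster-ip-range", "10.254.0.0/16")
    if "dns" not in cfg:
        cfg["dns"] = first_nameserver(master_resolv_conf)
    # Hypothetical key: the first machine in the list becomes the master node.
    cfg["master"] = cfg["machines"][0]
    return cfg


filled = with_defaults(
    {"machines": ["192.168.1.11", "192.168.1.12"], "ssh-username": "pai"},
    master_resolv_conf="# example contents\nnameserver 192.168.1.1\n",
)
print(filled["master"], filled["ssh-port"], filled["dns"])  # prints 192.168.1.11 22 192.168.1.1
```

Values that the quick-start file sets explicitly are left untouched; only missing keys receive a default.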

Optional Step 3. Customize the OpenPAI configuration

This method is for advanced users.

The description of each field in these configuration files can be found in A Guide For Cluster Configuration.

If you want to customize the configuration, see the table below.