- Step 1. Write Quick Start Configuration
- Step 2. Generate OpenPAI configuration files
- Optional Step 3. Customize the OpenPAI configuration
There is an example file in the link.
An example YAML file is shown below. Note that you should change the IP addresses of the machines and the SSH information accordingly.
# quick-start.yaml
# (Required) Fill in the IP addresses of the servers on which you would like to deploy OpenPAI.
machines:
- 192.168.1.11
- 192.168.1.12
- 192.168.1.13
# (Required) Log-in info of all machines. System administrator should guarantee
# that the username/password pair or username/key-filename is valid and has sudo privilege.
ssh-username: pai
ssh-password: pai-password
# (Optional, default=None) The key file that the ssh client uses. It has higher priority than the password.
#ssh-keyfile-path: <keyfile-path>
# (Optional, default=22) Port number of ssh service on each machine.
#ssh-port: 22
# (Optional, default=DNS of the first machine) Cluster DNS.
#dns: <ip-of-dns>
# (Optional, default=10.254.0.0/16) IP range used by Kubernetes. Note that
# this IP range should NOT conflict with the current network.
#service-cluster-ip-range: <ip-range-for-k8s>
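The comments above ask that the Kubernetes service range not conflict with the current network. As a rough pre-flight check, the sketch below uses Python's standard `ipaddress` module to list any machine IPs that fall inside that range. The function name `find_conflicts` is an illustrative assumption and is not part of paictl; the check also covers only the listed machines, not the rest of your network.

```python
import ipaddress


def find_conflicts(machines, service_range="10.254.0.0/16"):
    """Return the machine IPs that fall inside the Kubernetes service range."""
    network = ipaddress.ip_network(service_range)
    # ip_address() raises ValueError on a malformed address, which surfaces typos early.
    return [ip for ip in machines if ipaddress.ip_address(ip) in network]


# Machine list taken from the quick-start.yaml example above.
machines = ["192.168.1.11", "192.168.1.12", "192.168.1.13"]
print(find_conflicts(machines))  # → []
```

An empty list means none of the listed machines sits inside the service range. If the list is not empty, set `service-cluster-ip-range` to a different range.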
cd /pai
# The command below should be executed from the pai directory in the dev-box.
python paictl.py config generate -i /pai/deployment/quick-start/quick-start.yaml -o ~/pai-config -f
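After the generate command finishes, it can be useful to confirm that the output directory contains the files this guide refers to. The following is a minimal sketch, assuming the output directory `~/pai-config` used in the command above. The helper `missing_config_files` is hypothetical, and the file list contains only the names mentioned in this guide; the generator may write additional files.

```python
from pathlib import Path

# File names mentioned in this guide; the generator may write additional files.
EXPECTED_FILES = (
    "layout.yaml",
    "kubernetes-configuration.yaml",
    "services-configuration.yaml",
)


def missing_config_files(config_dir, expected=EXPECTED_FILES):
    """Return the expected file names that are absent from config_dir."""
    root = Path(config_dir).expanduser()
    return [name for name in expected if not (root / name).is_file()]


if __name__ == "__main__":
    missing = missing_config_files("~/pai-config")
    if missing:
        print("Missing:", ", ".join(missing))
    else:
        print("All expected configuration files are present.")
```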
vi ~/pai-config/services-configuration.yaml
For example, if you deploy from the v0.x.y branch, change docker-tag to v0.x.y.
docker-tag: v0.x.y
Quick start generates every node with 1 GPU of type generic, which may not suit your situation. For example, if you have two types of machines, one with 4 Tesla K80 GPU cards and the other with 2 Tesla P100 cards, you should modify your ~/pai-config/layout.yaml as follows:
machine-sku:
  k80-node:
    mem: 40G
    gpu:
      type: Tesla K80
      count: 4
    cpu:
      vcore: 24
    os: ubuntu16.04
  p100-node:
    mem: 20G
    gpu:
      type: Tesla P100
      count: 2
    cpu:
      vcore: 24
    os: ubuntu16.04
machine-list:
  - hostname: xxx
    hostip: yyy
    machine-type: k80-node
  - hostname: xxx
    hostip: yyy
    machine-type: p100-node
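When editing layout.yaml by hand, it is easy to reference a machine type that has no matching entry under `machine-sku`. The sketch below mirrors the layout above as a plain Python dictionary and checks that every `machine-type` is defined, then sums the GPUs per type. The helper names, hostnames, and IP addresses are placeholders chosen for illustration; loading the real file would also need a YAML parser, which is left out here.

```python
# The layout above, written as a Python dict (hostnames and IPs are placeholders).
layout = {
    "machine-sku": {
        "k80-node": {"mem": "40G", "gpu": {"type": "Tesla K80", "count": 4},
                     "cpu": {"vcore": 24}, "os": "ubuntu16.04"},
        "p100-node": {"mem": "20G", "gpu": {"type": "Tesla P100", "count": 2},
                      "cpu": {"vcore": 24}, "os": "ubuntu16.04"},
    },
    "machine-list": [
        {"hostname": "node-a", "hostip": "192.168.1.11", "machine-type": "k80-node"},
        {"hostname": "node-b", "hostip": "192.168.1.12", "machine-type": "p100-node"},
    ],
}


def undefined_machine_types(layout):
    """Return machine types used in machine-list but missing from machine-sku."""
    skus = layout.get("machine-sku", {})
    used = {m["machine-type"] for m in layout.get("machine-list", [])}
    return sorted(used - set(skus))


def gpu_totals(layout):
    """Sum GPU counts per GPU type across all machines in machine-list."""
    totals = {}
    for machine in layout["machine-list"]:
        gpu = layout["machine-sku"][machine["machine-type"]]["gpu"]
        totals[gpu["type"]] = totals.get(gpu["type"], 0) + gpu["count"]
    return totals


print(undefined_machine_types(layout))  # → []
print(gpu_totals(layout))               # → {'Tesla K80': 4, 'Tesla P100': 2}
```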
The paictl tool sets the following default values in the 4 configuration files:
Configuration Property | Default value
---|---
Master node | The first machine in the machine list will be configured as the master node.
SSH port | If not explicitly specified, the SSH port is set to <code>22</code>.
Cluster DNS | If not explicitly specified, the cluster DNS is set to the value of the <code>nameserver</code> field in the <code>/etc/resolv.conf</code> file of the master node.
IP range used by Kubernetes | If not explicitly specified, the IP range used by Kubernetes is set to <code>10.254.0.0/16</code>.
Docker registry | The docker registry is set to <code>docker.io</code>, and the docker namespace is set to <code>openpai</code>. In other words, all PAI service images will be pulled from <code>docker.io/openpai</code> (see <a href="https://hub.docker.com/r/openpai/">this link</a> on DockerHub for the details of all images).
Cluster id | Cluster id is set to <code>pai-example</code>.
REST server's admin user | REST server's admin user is set to <code>admin</code>, and its password is set to <code>admin-password</code>.
VC | There is only one VC in the system, <code>default</code>, which has 100% of the resource capacity.
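To make the defaulting rules above concrete, the sketch below applies the documented SSH port and Kubernetes IP range defaults to a quick-start dictionary and picks the first machine as the master node. This is not paictl's actual implementation: the function `with_defaults` and the `master` key are assumptions made for illustration, and the DNS default is not modeled because it is read from the master node itself.

```python
# Defaults documented above for optional quick-start settings.
DEFAULTS = {
    "ssh-port": 22,
    "service-cluster-ip-range": "10.254.0.0/16",
}


def with_defaults(quick_start):
    """Return a copy of quick_start with documented defaults filled in."""
    if not quick_start.get("machines"):
        raise ValueError("at least one machine is required")
    # Explicit settings win over defaults because they are unpacked last.
    resolved = {**DEFAULTS, **quick_start}
    # The first machine in the list becomes the master node ("master" is a hypothetical key).
    resolved["master"] = quick_start["machines"][0]
    return resolved


config = with_defaults({
    "machines": ["192.168.1.11", "192.168.1.12", "192.168.1.13"],
    "ssh-username": "pai",
    "ssh-password": "pai-password",
})
print(config["master"])    # → 192.168.1.11
print(config["ssh-port"])  # → 22
```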
This method is for advanced users.
The description of each field in these configuration files can be found in A Guide For Cluster Configuration.
If you want to customize the configuration, please see the topics below.
- Configure OpenPAI from scenarios
  - placement
  - scheduling
  - account
  - port / data folder etc.
  - component version
  - HA
- Cluster related configuration: configuration of layout.yaml
- Kubernetes role related configuration: it will be deprecated
- Kubernetes related configuration: configuration of kubernetes-configuration.yaml
- Service related configuration: configuration of services-configuration.yaml
- Configure OpenPAI services [Note: This part is for advanced users who want to customize each OpenPAI service]