forked from CHTC/chtc-website-source
-
Notifications
You must be signed in to change notification settings - Fork 0
/
roadmap.shtml
262 lines (202 loc) · 11.4 KB
/
roadmap.shtml
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
<!--#set var="title" value="Roadmap to Running Jobs"-->
<!--#include virtual="/includes/template-1-openhead.shtml" -->
<!--#include virtual="/includes/template-2-opensidebar.shtml" -->
<!--#include virtual="/includes/template-3-sidebar.shtml" -->
<!--#include virtual="/includes/template-4-mainbody.shtml" -->
<p>This guide assumes that you have gotten an account at CHTC
on one of our compute systems. If you do *not* have an account at CHTC (or if you
aren't sure), fill out the <a href="{{'/form' | relative_url }}">account request
form</a> or email CHTC's research computing facilitators at
<a href="">[email protected]</a>. </p>
<h1>Overview</h1>
<p>Once you have an account at CHTC, there are 4 common steps to getting started
with running work. The very first time you use CHTC resources, you should plan to go
through all of the following steps; as you continue to use CHTC and do new kinds
of work, you'll need to go through steps 3 - 4 with each new kind of job submission. </p>
<b>1. Log In and Move Files</b><br>
<p>As a pre-requisite to using any CHTC resources, you will need to be able to:
<ul>
<li>log in to a remote server</li>
<li>transfer files from a remote server to another
computer that you can access</li>
<li>have basic command line skills for navigating
directories (folders) and manipulating files.</li>
</ul>
</p>
<b>2. Practice Job Submission</b><br>
<p>Familiarize yourself with the practice of job submission by running
one of our sample jobs. </p>
<b>3. Prepare One Job</b><br>
<p>The next step to running work in CHTC is to get one (small) job ready to
run by:
<ul>
<li>downloading or installing your software</li>
<li>copying any necessary data or
scripts to your home directory on the submit/head nodes</li>
<li>creating a job
submission file for that job.</li>
</ul></p>
<b>4. Test and Scale</b><br>
<p>Once you've assembled most of the pieces you need for your job, the next
step is to test what you've created as a single or small job, and then to scale upwards to
submit a "full scale" job or batch of jobs. </p>
<b>5. Use Additional Features (optional)</b><br>
<p>There can be additional tools to improve your CHTC use, including building
up a workflow, accessing resources outside CHTC, and more. </p>
<p>For more details specific to the system you will be using, see the expanded guides below: </p>
<ul>
<li><a href="#htc">Roadmap For Running Jobs on the HTC system</a></li>
<li><a href="#hpc">Roadmap For Running Jobs on the HPC cluster</a></li>
</ul>
<hr>
<a name="htc"></a>
<h1>Roadmap For Running Jobs on the HTC System</h1>
<h2>1. Accessing the HTC system</h2>
<ul>
<li class="spaced"><a href="{{'/connecting' | relative_url }}">Connecting to CHTC</a>:
This guide has instructions on how
to log in, move files to and from the submit server, and use other command
line skills.</li>
</ul>
<h2>2. Practice HTCondor Job Submission</h2>
<ul>
<li class="spaced"><a href="{{'/hello-world' | relative_url }}">Hello World</a>: Run a simple HTCondor job.</li>
<li class="spaced"><a href="{{'/condor_q' | relative_url }}">condor_q</a>: Learn about some of the different
options for <code>condor_q</code></li>
</ul>
<h2>3. Preparing a Single Job</h2>
<b>A. Choose a Job Size</b></br>
<p>The ideal CHTC job is between 10 minutes and 1 day in length. While jobs
can run shorter and longer (up to 3 days), you'll generally maximize throughput
(fastest time to completion of all jobs together)
by lowering job length. If you can control how long your job is by breaking
a larger task into pieces, or batching together many very short tasks, try
to aim for jobs that are a few hours. </p>
<p>Whether or not you can control job size, you should think through what
one job is going to do, and what the input and output will be.</p>
<b>B. Prepare Your Software</b></br>
<p>Some programs are released as Linux binaries, which can be downloaded and
run as is. </p>
<p>If your program has to be compiled from source code, follow the instructions
in our <a href="{{'/inter-submit' | relative_url }}">Build Job guide</a> to start an interactive
session for compiling. </p>
<p>For specific scripting
languages or programs or using Docker containers, we have the following guides:</p>
<ul>
<li class="spaced"><a href="{{'/python-jobs' | relative_url }}">Running Python Jobs</a>: Run Python code or scripts in CHTC</li>
<li class="spaced"><a href="{{'/r-jobs' | relative_url }}">Running R Jobs</a>: Run R code or scripts in CHTC </li>
<li class="spaced"><a href="{{'/matlab-jobs' | relative_url }}">Running Matlab Jobs</a>: Run Matlab code or scripts in CHTC </li>
<li class="spaced"><a href="{{'/docker-jobs' | relative_url }}">Using Docker to Run Jobs</a>: Use Docker containers to run software in CHTC</li>
<li class="spaced"><a href="{{'/mpi-jobs' | relative_url }}">Running Jobs that use MPI</a>: Information about the HTC system's MPI capabilities</li>
</ul>
<b>C. Prepare Your Data</b></br>
<p>Look at our <a href="{{'/file-availability' | relative_url }}">File Availability page</a> to determine
what method is best to handle your data. If you need access to
Squid or Gluster, email the CHTC facilitators.</p>
<ul>
<li class="spaced"><a href="{{'/file-availability' | relative_url }}">Using Default File Transfer</a>: HTCondor's default method of handling files</li>
<li class="spaced"><a href="{{'/file-avail-squid' | relative_url }}">Using Squid</a>: Web server for repeated, medium-sized downloads</li>
<li class="spaced"><a href="{{'/file-avail-gluster' | relative_url }}">Using Gluster</a>: File share for large files</li>
</ul>
<h2>4. Testing and Scaling Up</h2>
<ul>
<li class="spaced"><a href="{{'/testing' | relative_url }}">Testing Job Submissions and Scaling Up</a>: Test
and responsibly scale up to many jobs</li>
</ul>
<p><b>Why test?</b> Spending the time and effort to run test jobs will pay off in more
effective high-throughput computing in the following ways: </p>
<ul>
<li> <b>Better matching:</b> Pinpointing the required amount of memory/disk space
will allow your jobs to match to as many computers as possible.
Requesting excessive amounts of memory and disk space will
limit the number of slots where your jobs can run, as
well as using unnecessary space that could be available for other users.
That said...</li>
<li> <b>Fewer holds or evictions:</b> If you don't request *enough* memory or
disk and your job exceeds its request, it can go on hold. Jobs that have
not been tested and run over 72 hours are liable to be evicted without
finishing. </li>
<li> <b>Fewer wasted compute hours:</b> the evictions and holds described
above will be wasted compute hours, decreasing your priority in the
high-throughput system while not returning any results. </li>
<li> <b>Making good choices:</b> knowing how big and how long your jobs are,
and the size of input/output files will show you how to most
effectively use CHTC resources. Jobs under 2 hrs or so?
Allow your jobs to flock and glide to the UW Grid and Open Science Grid.
Input files of more than 5 GB? You should probably be using the CHTC
large file staging area. Longer jobs? Include a line in your submit
file restricting your jobs to the CHTC servers that guarantee 72 hours. </li>
<li><b>Being a good citizen</b>:CHTC's high-throughput system has
hundreds to thousands of users,
meaning that poor computing practices by one user can impact many other users.
Users who submit jobs that don't finish or are evicted because of incorrect
memory requests are using hours that could have been used by other people.
In the worst case, untested code can cause other jobs running on an execute
server to be evicted, directly harming someone else's research process.
The best practices listed in these guides exist for a reason. Testing your
code and job submissions to make sure they abide by CHTC recommendations
will not only benefit your own throughput but make sure that everyone else
is also getting a fair share of the resource. </li>
</ul>
See our <a href="{{'/troubleshooting' | relative_url }}">troubleshooting</a> guide for ongoing support.
<h2>5. Use Additional Features</h2>
<ul>
<li><a href="{{'/outside-chtc' | relative_url }}">Accessing Extra Resources by Running Jobs Outside CHTC</a></li>
</ul>
<!--<h2>B. Create a Workflow (optional)</h2>
DAG guide -->
<hr>
<a name="hpc"></a>
<h1>Roadmap For Running Jobs on the HPC Cluster</h1>
<h2>1. Accessing the HPC cluster</h2>
<ul>
<li class="spaced"><a href="{{'/connecting' | relative_url }}">Connecting to CHTC</a>:
This guide has instructions on how
to log in, move files to and from the head nodes, and use other command
line skills.</li>
</ul>
<h2>2. Practice SLURM Job Submission</h2>
<p>At the moment, we do not have a guide with a sample job that runs "out-of-the-box".
However, the sample submit file in <a href="/HPCuseguide.shtml#batch-job">section 3B</a> of the
<a href="{{'/HPCuseguide' | relative_url }}">HPC Use Guide</a> can be modified to run a short, simple command like
<code>sleep</code> for those who want to practice using SLURM commands.</p>
<h2>3. Preparing a Single Job</h2>
<b>A. Prepare Your Software</b></br>
<ul>
<li class="spaced"><a href="{{'/file-availability' | relative_url }}">Using the HPC cluster's default MPI modules</a></li>
</ul>
<p>Whenever possible, installation and basic testing should be
performed in an interactive job, as described in <a href="/HPCuseguide.shtml#interactve-job">section 3A</a> of the
<a href="{{'/HPCuseguide' | relative_url }}">HPC Use Guide</a> </p>
<b>B. Create a Submit File</b></br>
<p>Create a SLURM submit file that requests the resources necessary to your
job and runs your commands. Make sure that you review the table of
partitions and associated nodes (found in <a href="{{ '/HPCuseguide.shtml#job-scheduling' | relative_url }}">section D</a> of the
<a href="{{'/HPCuseguide' | relative_url }}">HPC Use Guide</a>) to make sure that you know which
partition you want to use and that your resource request matches what is available
in that partition.</p>
<h2>4. Testing and Scaling Up</h2>
<p>To test work in the HPC cluster, first run a small scale version of
your work on 1-2 nodes. If this completes successfully, slowly scale upward by
adding nodes and/or increasing the complexity of your problem. We recommend doing
this gradually so that it becomes clear at what point extra nodes/cores will not
help your program run faster.
<p><b>Why test?</b> Spending the time and effort to run test jobs will pay off in more
effective high performance computing in the following ways: </p>
<ul>
<li><b>Making the most of your time</b>: Jobs can run faster on multiple
nodes...but they may also take longer to start. Testing your jobs can help
you weigh the trade-off between wait-time and execution-time for your particular
software and computation.</li>
<li><b>Being a good citizen</b>:CHTC's cluster has hundreds of users,
meaning that poor computing practices by one user can impact many other people.
Not saving output before a job exits, requesting nodes that
aren't used by your program (or don't provide any speed up), or running code
that consistently fails or crashes
all waste resources that could have been used by others.
Testing your
code and job submissions to make sure they abide by CHTC recommendations
will not only benefit your own computing but make sure that everyone else
is also getting a fair share of the resource. </li>
</ul>
<!--#include virtual="/includes/template-5-finish.shtml" -->