<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
<html>
<head>
<meta http-equiv="Content-Type" content="text/html;charset=utf-8">
<link rel="stylesheet" type="text/css" href="manual.css">
<title>Makeflow User's Manual</title>
</head>
<body>
<div id="manual">
<h1>Makeflow User's Manual</h1>
<p style="text-align: right;"><b>Last edited: November 2013</b></p>
<p>Makeflow is Copyright (C) 2009 The University of Notre Dame.<br>
All rights reserved.<br>
This software is distributed under the GNU General Public License.<br>
See the file COPYING for details.</p>
<h2 id="overview">Overview<a class="sectionlink" href="#overview" title="Link to this section.">⇗</a></h2>
<p>
Makeflow is a <b>workflow engine</b> for distributed computing.
It accepts a specification of a large amount of work to be
performed, and runs it on remote machines in parallel where possible.
In addition, Makeflow is fault-tolerant, so you can use it to coordinate
very large tasks that may run for days or weeks in the face of failures.
Makeflow is designed to be similar to <b>Make</b>, so if you can write
a Makefile, then you can write a Makeflow.
<p>
You can run a Makeflow on your local machine to test it out.
If you have a multi-core machine, then you can run multiple tasks simultaneously.
If you have a Condor pool or a Sun Grid Engine batch system, then you can send
your jobs there to run. If you don't already have a batch system, Makeflow comes with a
system called Work Queue that will let you distribute the load across any collection
of machines, large or small.
<p>
Makeflow is part of the <a href=http://www.cse.nd.edu/~ccl/software>Cooperative Computing Tools</a>. You can download the CCTools from <a href=http://www.cse.nd.edu/~ccl/software/download>this web page</a>, follow the <a href=install.html>installation instructions</a>, and you are ready to go.
<h2 id="language">The Makeflow Language<a class="sectionlink" href="#language" title="Link to this section.">⇗</a></h2>
The Makeflow language is very similar to Make.
A Makeflow script consists of a set of rules.
Each rule specifies a set of <i>target files</i> to create,
a set of <i>source files</i> needed to create them,
and a <i>command</i> that generates the target files from the source files.
<p>
Makeflow attempts to generate all of the target files in a script.
It examines all of the rules and determines which rules must run before
others. Where possible, it runs commands in parallel to reduce the
execution time.
<p>
Here is a Makeflow that uses the <tt>convert</tt> utility to make an animation.
It downloads an image from the web, creates four variations
of the image, and then combines them back together into an animation.
The first and the last task are marked as LOCAL to force them to
run on the controlling machine.
<code>CURL=/usr/bin/curl
CONVERT=/usr/bin/convert
URL=http://www.cse.nd.edu/~ccl/images/capitol.jpg

capitol.montage.gif: capitol.jpg capitol.90.jpg capitol.180.jpg capitol.270.jpg capitol.360.jpg
	LOCAL $CONVERT -delay 10 -loop 0 capitol.jpg capitol.90.jpg capitol.180.jpg capitol.270.jpg capitol.360.jpg capitol.270.jpg capitol.180.jpg capitol.90.jpg capitol.montage.gif

capitol.90.jpg: capitol.jpg $CONVERT
	$CONVERT -swirl 90 capitol.jpg capitol.90.jpg

capitol.180.jpg: capitol.jpg $CONVERT
	$CONVERT -swirl 180 capitol.jpg capitol.180.jpg

capitol.270.jpg: capitol.jpg $CONVERT
	$CONVERT -swirl 270 capitol.jpg capitol.270.jpg

capitol.360.jpg: capitol.jpg $CONVERT
	$CONVERT -swirl 360 capitol.jpg capitol.360.jpg

capitol.jpg: $CURL
	LOCAL $CURL -o capitol.jpg $URL
</code>
Note that Makeflow differs from Make in a few important ways;
see <a href=#details>The Fine Details</a> below for all of the details.
<h2 id="running">Running Makeflow<a class="sectionlink" href="#running" title="Link to this section.">⇗</a></h2>
To try out the example above, copy and paste it into a file named <tt>example.makeflow</tt>.
To run it on your local machine:
<code>% makeflow example.makeflow</code>
Note that if you run it a second time, nothing will happen, because all of the files are built:
<code>% makeflow example.makeflow
makeflow: nothing left to do
</code>
Use the <tt>-c</tt> option to clean everything up before trying it again:
<code>% makeflow -c example.makeflow</code>
If you have access to a batch system running <a href=http://www.sun.com/software/sge>SGE</a>,
then you can direct Makeflow to run your jobs there:
<code>% makeflow -T sge example.makeflow</code>
Or, if you have a <a href=http://www.cs.wisc.edu/condor>Condor Pool</a>,
then you can direct Makeflow to run your jobs there:
<code>% makeflow -T condor example.makeflow</code>
To submit Makeflow as a Condor job that submits more Condor jobs:
<code>% condor_submit_makeflow example.makeflow</code>
You will notice that a workflow can run very slowly if you submit
each batch job to SGE or Condor, because it typically
takes 30 seconds or so to start each batch job running. To get
around this limitation, we provide the Work Queue system.
This allows Makeflow to function as a master process that
quickly dispatches work to remote worker processes.
<p>
To begin, let's assume that you are logged into a machine
named <tt>barney.nd.edu</tt>. Start your Makeflow like this:
<code>% makeflow -T wq example.makeflow</code>
Then, submit 10 worker processes to Condor like this:
<code>% condor_submit_workers barney.nd.edu 9123 10
Submitting job(s)..........
Logging submit event(s)..........
10 job(s) submitted to cluster 298.
</code>
Or, submit 10 worker processes to SGE like this:
<code>% sge_submit_workers barney.nd.edu 9123 10</code>
Or, submit 10 worker processes to Torque like this:
<code>% torque_submit_workers barney.nd.edu 9123 10</code>
Or, you can start workers manually on any other machine you can log into:
<code>% work_queue_worker barney.nd.edu 9123</code>
Once the workers begin running, Makeflow will dispatch multiple
tasks to each one very quickly. If a worker should fail, Makeflow
will retry the work elsewhere, so it is safe to submit many
workers to an unreliable system.
<p>
When the Makeflow completes, your workers will still be available,
so you can either run another Makeflow with the same workers,
remove them from the batch system, or wait for them to expire.
If you do nothing for 15 minutes, they will automatically exit.
<p>
Note that <tt>condor_submit_workers</tt>, <tt>sge_submit_workers</tt>,
and <tt>torque_submit_workers</tt>
are simple shell scripts, so you can edit them directly if you would
like to change batch options or other details.
<h2 id="resources">Resources Categories<a class="sectionlink" href="#resources" title="Link to this section.">⇗</a></h2>
Makeflow can automatically communicate the cores, memory,
and disk space each job requires to the underlying batch system (currently this only
works with Work Queue and Condor). Jobs are grouped into <em>job categories</em>,
and jobs in the same category have the same cores, memory, and disk
requirements.
<br>
<br>
Job categories and resources are specified with variables. Jobs are assigned
to the category named in the value of the variable CATEGORY. Likewise, the
values of the variables CORES, MEMORY (in MB), and DISK (in MB) describe the
resource requirements for the category specified in CATEGORY.
<br>
<br>
Jobs without an explicit category are assigned to <tt>default</tt>.
Jobs in the default category get their resource requirements from the values of
the environment variables CORES, MEMORY, and DISK.
<br>
<br>
Consider the following example:
<code>MEMORY=100

one: src
	cmd

CATEGORY="simulation"
DISK=500

CATEGORY="preprocessing"
MEMORY=200
DISK=200

two: src
	cmd

three: src
	cmd

CATEGORY="simulation"
DISK=700

four: src
	cmd

five: src
	@CATEGORY="preprocessing"
	cmd

six: src
	cmd

CATEGORY="analysis"
CORES=1
MEMORY=400
DISK=400

seven: src
	cmd
</code>
<p>Now suppose the workflow is invoked with the following environment; jobs in the <tt>default</tt> category then request 4 cores:</p>
<code>export CORES=4
makeflow ...
</code>
<h6> Job categories (each job is named after its output): </h6>
<dl>
<dt> preprocessing: </dt>
<dd> two, three, five </dd>
<dt> simulation: </dt>
<dd> four, six </dd>
<dt> analysis: </dt>
<dd> seven </dd>
<dt> default: </dt>
<dd> one </dd>
</dl>
<h6> Resources specified: </h6>
<table>
<tr><th>Category</th><th>Cores</th><th>Memory (MB)</th><th>Disk (MB)</th></tr>
<tr><td> preprocessing </td> <td> - </td> <td> 200 </td> <td> 200 </td> </tr>
<tr><td> simulation </td> <td> - </td> <td> - </td> <td> 700 </td> </tr>
<tr><td> analysis </td> <td> 1 </td> <td> 400 </td> <td> 400 </td> </tr>
<tr><td> default </td> <td> 4 </td> <td> 100 </td> <td> - </td> </tr>
</table>
<p> Note that in this example, for job <tt>five</tt>, CATEGORY temporarily
has the value "preprocessing", and then reverts to "simulation"
(so that <tt>four</tt> and <tt>six</tt> belong to the same category). This is
done using rule lexical scoping, explained in the next section.
<h2 id="details">The Fine Details<a class="sectionlink" href="#details" title="Link to this section.">⇗</a></h2>
<p>The Makeflow language is very similar to Make, but it does have a few important differences that you should be aware of.</p>
<h3 id="details.dependencies">Get the Dependencies Right<a class="sectionlink" href="#details.dependencies" title="Link to this section.">⇗</a></h3>
<p>You must be careful to accurately specify <b>all of the files that a rule
requires and creates</b>, including any custom executables. This is because
Makeflow requires all of this information to construct the environment for a
remote job. For example, suppose that you have written a simulation program
called <tt>mysim.exe</tt> that reads <tt>calib.data</tt> and then produces an
output file. The following rule won't work, because it doesn't inform Makeflow
which files are needed to execute the simulation:</p>
<code># This is an incorrect rule.

output.txt:
	./mysim.exe -c calib.data -o output.txt
</code>
<p>However, the following is correct, because the rule states all of the files
needed to run the simulation. Makeflow will use this information to construct
a batch job that consists of <tt>mysim.exe</tt> and <tt>calib.data</tt> and
uses it to produce <tt>output.txt</tt>:</p>
<code># This is a correct rule.

output.txt: mysim.exe calib.data
	./mysim.exe -c calib.data -o output.txt
</code>
<p>When a regular file is specified as an input file, it means the command
relies on the contents of that file. When a directory is specified as an input
file, however, it could mean one of two things: either the command depends on
the contents of the directory, or the command merely relies on the existence
of the directory (for example, because it will add more files to the directory
later, regardless of what is already in it). <b>Makeflow assumes
that an input directory indicates that the command relies on the directory's
existence</b>.</p>
<h3 id="details.phony">No Phony Rules<a class="sectionlink" href="#details.phony" title="Link to this section.">⇗</a></h3>
<p>For a similar reason, you cannot have "phony" rules that don't actually
create the specified files. For example, it is common practice to define a
<tt>clean</tt> rule in Make that deletes all derived files. This doesn't make
sense in Makeflow, because such a rule does not actually create a file named
<tt>clean</tt>. Instead use the <tt>-c</tt> option as shown above.</p>
<h3 id="details.plain">Just Plain Rules<a class="sectionlink" href="#details.plain" title="Link to this section.">⇗</a></h3>
<p>Makeflow does not support all of the syntax that you find in various
versions of Make. Each rule must have exactly one command to execute. If you
have multiple commands, simply join them together with semicolons. Makeflow
allows you to define and use variables, but it does not support pattern rules,
wildcards, or special variables like <tt>$&lt;</tt> or <tt>$@</tt>. You simply
have to write out the rules longhand, or write a script in your favorite
language to generate a large Makeflow.</p>
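<p>For example, a short shell script along these lines can emit one rule per input file (a minimal sketch; the <tt>process</tt> program and the input file names are hypothetical):</p>
<code>#!/bin/sh
# Write a Makeflow with one rule per input file.
for i in 1 2 3 4 5
do
    printf 'output.%s: input.%s process\n' "$i" "$i"
    printf '\t./process input.%s > output.%s\n\n' "$i" "$i"
done > generated.makeflow
</code>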
<h3 id="details.local">Local Job Execution<a class="sectionlink" href="#details.local" title="Link to this section.">⇗</a></h3>
<p>Certain jobs don't make much sense to distribute. For example, if you have
a very fast running job that consumes a large amount of data, then it should
simply run on the same machine as Makeflow. To force this, simply add the word
<tt>LOCAL</tt> to the beginning of the command line in the rule.</p>
<h3 id="details.scope">Rule Lexical Scope<a class="sectionlink" href="#details.scope" title="Link to this section.">⇗</a></h3>
<p>Variables in Makeflow have global scope, that is, once defined, their value
can be accessed from any rule. Sometimes it is useful to define a variable
locally inside a rule, without affecting the global value. In Makeflow, this
can be achieved by defining the variables after the rule's requirements, but
before the rule's command, and prepending the name of the variable with
<tt>@</tt>, as follows:</p>
<code>SOME_VARIABLE=original_value

#rule 1
target_1: source_1
	command_1

#rule 2
target_2: source_2
	@SOME_VARIABLE=local_value_for_2
	command_2

#rule 3
target_3: source_3
	command_3
</code>
In this example, SOME_VARIABLE has the value 'original_value' for rules 1 and
3, and the value 'local_value_for_2' for rule 2.
<h3 id="details.refinement">Batch Job Refinement<a class="sectionlink" href="#details.refinement" title="Link to this section.">⇗</a></h3>
<p>When executing jobs, Makeflow simply uses the default settings in your batch
system. If you need to pass additional options, use the <tt>BATCH_OPTIONS</tt>
variable or the <tt>-B</tt> option to Makeflow.</p>
<p>When using Condor, this string will be added to each submit file. For
example, if you want to add <tt>Requirements</tt> and <tt>Rank</tt> lines to
your Condor submit files, add this to your Makeflow:</p>
<code>BATCH_OPTIONS = Requirements = (Memory>1024)</code>
<p>
When using SGE, the string will be added to the qsub options. For example, to specify that jobs should be submitted to the <tt>devel</tt> queue:
<code>BATCH_OPTIONS = -q devel</code>
<h3 id="details.renaming">Remote File Renaming<a class="sectionlink" href="#details.renaming" title="Link to this section.">⇗</a></h3>
<p>With the Work Queue and Condor batch systems, Makeflow has a feature called
remote file renaming. For example:</p>
<code>local_name->remote_name</code>
<p>indicates that the file <tt>local_name</tt> is called <tt>remote_name</tt>
in the remote system. Consider the following example:</p>
<code>b.out: a.in myprog
	LOCAL myprog a.in > b.out

c.out->out: a.in->in1 b.out myprog->prog
	prog in1 b.out > out
</code>
<p>The first rule runs locally, using the executable <tt>myprog</tt> and the
local file <tt>a.in</tt> to locally create <tt>b.out</tt>. The second rule
runs remotely, but the remote system expects <tt>a.in</tt> to be named
<tt>in1</tt>, <tt>c.out</tt> to be named <tt>out</tt>, and so on. Note that we
did not need to rename the file <tt>b.out</tt>. Without remote file renaming,
we would have to create either a symbolic link or a copy of the files with the
expected names.</p>
<h3 id="details.displaying">Displaying a Makeflow<a class="sectionlink" href="#details.displaying" title="Link to this section.">⇗</a></h3>
<p>When run with the <tt>-D</tt> option, Makeflow will emit a diagram of the
Makeflow in the <a href=http://www.graphviz.org>Graphviz DOT</a> format. If you
have <tt>dot</tt> installed, then you can generate an image of your workload
like this:</p>
<code>% makeflow -D example.makeflow | dot -T gif > example.gif</code>
<h3 id="details.number">Total number of tasks in a Makeflow<a class="sectionlink" href="#details.number" title="Link to this section.">⇗</a></h3>
<p>The following command parses Makeflow's '-D' option output and returns the
total number of tasks in the given Makeflow script:</p>
<code>% makeflow -D example.makeflow | grep '^N[0-9]\+ \[label=' | wc -l</code>
<h2 id="workqueue">Running Makeflow with Work Queue<a class="sectionlink" href="#workqueue" title="Link to this section.">⇗</a></h2>
<p>With the '-T wq' option, Makeflow runs as a master process that dispatches
tasks to remote worker processes using the <a href="workqueue.html">Work
Queue</a> framework.</p>
<h3 id="workqueue.port">Selecting a Port<a class="sectionlink" href="#workqueue.port" title="Link to this section.">⇗</a></h3>
<p>Makeflow listens on a port to which the remote workers connect. The
default port number is 9123. Sometimes, however, that port may not be
available on your system. You can change the port via the <tt>-p
&lt;port number&gt;</tt> option. For example, to have the master listen
on port 9567, run the following command:</p>
<code>% makeflow -T wq -p 9567 example.makeflow</code>
<h3 id="workqueue.projectnames">Project Names<a class="sectionlink" href="#workqueue.projectnames" title="Link to this section.">⇗</a></h3>
<p>A simpler way to match workers to masters is to use project name
matching. You can give the master a project name with the -N option.</p>
<code> % makeflow -T wq -a -N MyProj example.makeflow </code>
<p>The -N option gives the master the project name 'MyProj'. The -a option
enables catalog mode on the master. Only in catalog mode does a master
advertise its information, such as its project name, running status,
hostname, and port number, to a catalog server. A worker can then retrieve
this information from the same catalog server. The above command uses the
default catalog server at Notre Dame, which runs 24/7. We will describe how to
set up your own catalog server later.</p>
<p>To start a worker that automatically finds MyProj's master via the default
Notre Dame catalog server:</p>
<code>% work_queue_worker -a -N MyProj </code>
<p>The <tt>-a</tt> option enables the catalog mode on the worker, which tells the worker
to contact a catalog server to find out a project's (specified by -N option)
hostname and port.</p>
<p>You can also give multiple -N options to a worker. The worker will find out
from the catalog server which of the specified projects are running and
randomly select one to work for. When that project is done, the worker
repeats the process. Thus, the worker can move on to a different master
without being stopped and given the new master's hostname and port. An example of
specifying multiple projects:</p>
<code>% work_queue_worker -a -N proj1 -N proj2 -N proj3</code>
<h3 id="workqueue.password">Setting a Password<a class="sectionlink" href="#workqueue.password" title="Link to this section.">⇗</a></h3>
<p>We recommend that any workflow that uses a project name also set a password.
To do this, select any passphrase and write it to a file called
<tt>mypwfile</tt>. Then, run Makeflow and each worker with the
<tt>--password</tt> option to indicate the password file:</p>
<code>% makeflow <b>--password</b> mypwfile ...
% work_queue_worker <b>--password</b> mypwfile ...
</code>
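<p>For example, to create the password file (the passphrase shown here is just a placeholder):</p>
<code>% echo "some-secret-phrase" > mypwfile</code>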
<h3 id="workqueue.catalog">Catalog Server<a class="sectionlink" href="#workqueue.catalog" title="Link to this section.">⇗</a></h3>
<p>Now let's look at how to set up your own catalog server. Say you want to
run your catalog server on a machine named <tt>catalog.somewhere.edu</tt>.
By default the catalog server listens on port 9097; you can change this
via the '-p' option.</p>
<code>% catalog_server</code>
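<p>Or, to listen on a different port (9197 here is an arbitrary choice):</p>
<code>% catalog_server -p 9197</code>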
<p>Now you have a catalog server listening at catalog.somewhere.edu:9097. To
make your masters and workers contact this catalog server, simply add the
<tt>-C hostname:port</tt> option to both the master and the workers:</p>
<code>% makeflow -T wq -C catalog.somewhere.edu:9097 -N MyProj example.makeflow
% work_queue_worker -C catalog.somewhere.edu:9097 -a -N MyProj
</code>
<h3 id="workqueue.drivers">Supported Makeflow Drivers<a class="sectionlink" href="#workqueue.drivers" title="Link to this section.">⇗</a></h3>
<p>The full list of supported Makeflow drivers includes:</p>
<ul>
<li>Local - local execution on a single multicore system</li>
<li>Condor - execution on a campus grid via the local <a href="http://www.cs.wisc.edu/condor">Condor Pool</a></li>
<li>SGE - execution on a high-performance cluster using the <a href="http://www.oracle.com/us/products/tools/oracle-grid-engine-075549.html">Oracle Grid Engine</a> (formerly the Sun Grid Engine).</li>
<li>Moab - execution on a high-performance cluster using the <a href="http://www.adaptivecomputing.com/resources/docs/mwm/6-0/index.php">Moab Workload Manager</a></li>
<li>Cluster - execution on a high-performance cluster using a user-defined cluster manager</li>
<li>Work Queue - execution on a lightweight, scalable master-worker <a href="http://nd.edu/~ccl/software/workqueue/">framework</a> managed directly by Makeflow</li>
<li>Hadoop - execution on a cluster managed by the <a href="http://hadoop.apache.org/">Hadoop</a> framework, using data stored in <a href="http://hadoop.apache.org/hdfs">HDFS</a>.</li>
</ul>
<h4 id="workqueue.drivers.clusters">User-defined Clusters<a class="sectionlink" href="#workqueue.drivers.clusters" title="Link to this section.">⇗</a></h4>
<p>For clusters that are not directly supported by Makeflow we strongly suggest
using the <a href="http://nd.edu/~ccl/software/workqueue">Work Queue</a> system
and submitting workers via the cluster's normal submission mechanism.</p>
<p>For clusters using managers similar to SGE or Moab that are configured to
preclude the use of Work Queue, we provide the "Cluster" custom driver. To use the
"Cluster" driver, the Makeflow must be run on a parallel filesystem available to
the entire cluster, and the following environment variables must be set.</p>
<ul>
<li>BATCH_QUEUE_CLUSTER_NAME: The name of the cluster, used in debugging messages and as the name for the wrapper script.</li>
<li>BATCH_QUEUE_CLUSTER_SUBMIT_COMMAND: The submit command for the cluster (such as qsub or msub)</li>
<li>BATCH_QUEUE_CLUSTER_SUBMIT_OPTIONS: The command-line arguments that must be passed to the submit command. This string should end with the argument used to set the name of the task (usually -N).</li>
<li>BATCH_QUEUE_CLUSTER_REMOVE_COMMAND: The delete command for the cluster (such as qdel or mdel)</li>
</ul>
<p>These will be used to construct a task submission for each Makeflow rule, of the form:</p>
<code>% $SUBMIT_COMMAND $SUBMIT_OPTIONS "&lt;rule name&gt;" $CLUSTER_NAME.wrapper "&lt;rule commandline&gt;"</code>
<p>The wrapper script is a shell script that reads the command to be run as an argument and handles bookkeeping operations necessary for Makeflow.</p>
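<p>As a concrete illustration, the settings for an SGE-like cluster might look as follows (a sketch; substitute your site's actual queue, commands, and options):</p>
<code>export BATCH_QUEUE_CLUSTER_NAME=mycluster
export BATCH_QUEUE_CLUSTER_SUBMIT_COMMAND=qsub
export BATCH_QUEUE_CLUSTER_SUBMIT_OPTIONS="-q devel -N"
export BATCH_QUEUE_CLUSTER_REMOVE_COMMAND=qdel
</code>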
<h2 id="garbage">Makeflow Garbage Collection<a class="sectionlink" href="#garbage" title="Link to this section.">⇗</a></h2>
<p>As the workflow execution progresses, Makeflow can automatically delete
intermediate files that are no longer needed. In this context, an intermediate
file is an input of some rule that is the target of another rule. By
default, garbage collection deletes neither the original input files nor the
<b>final</b> target files.</p>
<p>Which files are deleted can be tailored from the default by appending files
to the Makeflow variables <tt>GC_PRESERVE_LIST</tt> and
<tt>GC_COLLECT_LIST</tt>. Files added to <tt>GC_PRESERVE_LIST</tt> are never
deleted; use it to mark intermediate files that should be kept.
Similarly, <tt>GC_COLLECT_LIST</tt> marks final target files that may be
deleted. Makeflow is conservative, in the sense that <tt>GC_PRESERVE_LIST</tt>
takes precedence over <tt>GC_COLLECT_LIST</tt>, and original input files are
never deleted, even if they are listed in <tt>GC_COLLECT_LIST</tt>.</p>
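<p>For example (a sketch with hypothetical file names):</p>
<code># Never delete this intermediate file:
GC_PRESERVE_LIST=stage1.dat
# Allow this final target to be collected:
GC_COLLECT_LIST=debug.log
</code>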
<p>Makeflow offers two modes for garbage collection: reference count, and on
demand. In the reference count mode, intermediate files are deleted as soon
as no rule lists them as an input. The on-demand mode is similar to reference
count, except that files are deleted only when the space available on the
local file system falls below a given threshold.</p>
<p>To activate reference count garbage collection:</p>
<code>makeflow -gref_count</code>
<p>To activate on-demand garbage collection, with a threshold of 500MB:</p>
<code>makeflow -gon_demand -G500000000</code>
<h2 id="logformat">Makeflow Log File Format<a class="sectionlink" href="#logformat" title="Link to this section.">⇗</a></h2>
<p>After you have executed the <tt>example.makeflow</tt> Makeflow script, you
should see a log file named <tt>example.makeflow.makeflowlog</tt> in the
directory where you ran the <tt>makeflow</tt> command. The Makeflow log file
records all the tasks that Makeflow tries to run and how each task's status
changed during previous executions.</p>
<p>In Makeflow, each task is called a node; you can think of it as a node in
the graph structure of the target workflow. Each node has a unique id, a
positive integer, referred to as the node id. The
Makeflow log file starts with a list of node (task) descriptions, where each
task is specified in the following format:</p>
<code># NODE node_id original_command
# SYMBOL node_id symbol
# PARENTS node_id parent_node_id_1 parent_node_id_2 ... parent_node_id_n
# SOURCES node_id source_file_1 source_file_2 ... source_file_n
# TARGETS node_id target_file_1 target_file_2 ... target_file_n
# COMMAND node_id translated_command
</code>
<p>These annotation lines all start with <tt>#</tt> and are treated as
comments by Makeflow. When Makeflow resumes the execution of a Makeflow
script from an existing Makeflow log file, it skips these
commented lines. Commenting out the annotations also allows third-party
applications, such as gnuplot, to skip them when plotting data.</p>
<p>The body sections of a Makeflow log file come after the node specification
section. The following is a snippet from
<tt>example.makeflow.makeflowlog</tt>:</p>
<code># STARTED timestamp
1347281321284638 5 1 9206 5 1 0 0 0 6
1347281321348488 5 2 9206 5 0 1 0 0 6
1347281321348760 4 1 9207 4 1 1 0 0 6
1347281321348958 3 1 9208 3 2 1 0 0 6
1347281321629802 4 2 9207 3 1 2 0 0 6
1347281321630005 2 1 9211 2 2 2 0 0 6
1347281321635236 3 2 9208 2 1 3 0 0 6
1347281321635463 1 1 9212 1 2 3 0 0 6
1347281321742870 2 2 9211 1 1 4 0 0 6
1347281321752857 1 2 9212 1 0 5 0 0 6
1347281321753064 0 1 9215 0 1 5 0 0 6
1347281325731146 0 2 9215 0 0 6 0 0 6
# COMPLETED timestamp
</code>
<p>If a Makeflow execution completes without errors, a body section of the
log consists of a series of node (task) status change lines (the 10-column
lines) surrounded by a pair of "# STARTED" and "# COMPLETED" lines which record
the start and end unix times of that execution. A node status change line has
the following form:</p>
<code>timestamp node_id new_state job_id nodes_waiting nodes_running nodes_complete nodes_failed nodes_aborted node_id_counter</code>
<p>Here is the column specification:</p>
<ul>
<li><b>timestamp</b> - the unix time (in microseconds) when this line is written to the log file. </li>
<li><b>node_id</b> - the id of this node (task). </li>
<li><b>new_state</b> - an integer representing the new state this node (whose id is in the node_id column) has just entered. The value ranges from 0 to 4, denoting the following states:</li>
<ol start="0">
<li> waiting </li>
<li> running </li>
<li> complete </li>
<li> failed </li>
<li> aborted </li>
</ol>
<li><b>job_id</b> - the job id of this node in the underlying execution
system (local or batch system). If the Makeflow is executed locally, the
job id is the process id of the process that executes this node. If
the underlying execution system is a batch system, such as Condor or SGE,
the job id is the id assigned by the batch system when the task
was submitted for execution.</li>
<li><b>nodes_waiting</b> - the number of nodes waiting to be executed.</li>
<li><b>nodes_running</b> - the number of nodes currently being executed.</li>
<li><b>nodes_complete</b> - the number of nodes that have completed.</li>
<li><b>nodes_failed</b> - the number of nodes that have failed.</li>
<li><b>nodes_aborted</b> - the number of nodes that have been aborted.</li>
<li><b>node_id_counter</b> - the total number of nodes in this Makeflow.</li>
</ul>
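<p>Because the status change lines are plain 10-column records, they are easy
to process with standard tools. For example, this sketch prints the timestamped
state transitions of node 5 (pick any node id):</p>
<code>% awk '!/^#/ && NF == 10 && $2 == 5 { print $1, $3 }' example.makeflow.makeflowlog</code>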
<p>You might also see the following two lines in the Makeflow log file if the
Makeflow is aborted or fails during an execution. The unix time at which such
an event occurred is recorded:</p>
<code># ABORTED timestamp
# FAILED timestamp
</code>
<p>When file garbage collection is enabled, the log file records information at
each collection cycle. Collection information is included in lines starting
with the <tt># GC</tt> prefix:</p>
<code># GC timestamp collected time_spent dag_gc_collected</code>
<p>Each garbage collection line records the garbage collection statistics
during a garbage collection cycle:</p>
<ul>
<li><b>timestamp</b> - the unix time (in microseconds) when this line is written to the log file. </li>
<li><b>collected</b> - the number of files collected in this garbage collection cycle.</li>
<li><b>time_spent</b> - the length of time this cycle took.</li>
<li><b>dag_gc_collected</b> - the total number of files collected so far since the start of this Makeflow execution.</li>
</ul>
<h2 id="linking">Linking Workflow Dependencies<a class="sectionlink" href="#linking" title="Link to this section.">⇗</a></h2>
<p><tt>Makeflow</tt> provides a tool to collect all of the dependencies for a
given workflow into one directory. By collecting all of the input files and
programs contained in a workflow it is possible to run the workflow on other
machines.</p>
<p>Currently, <tt>Makeflow</tt> copies all of the files specified as
dependencies by the rules in the makeflow file, including scripts and data
files. Files that are not collected include dynamically linked libraries,
executables not listed as dependencies (<tt>python</tt>, <tt>perl</tt>), and
configuration files (<tt>mail.rc</tt>).</p>
<p>To avoid naming conflicts, files which would otherwise have an identical
path are renamed when copied into the bundle:</p>
<ul>
<li>All file paths are relative to the top level directory.</li>
<li>The makeflow file is rewritten with the new file locations and placed in the top level directory.</li>
<li>Files originally specified with an absolute path are placed into the top level directory.</li>
<li>Files that would otherwise have the same path have a unique number appended.</li>
</ul>
<p>Example usage:</p>
<code>% makeflow -b some_output_directory example.makeflow</code>
<h2 id="moreinfo">For More Information<a class="sectionlink" href="#moreinfo" title="Link to this section.">⇗</a></h2>
<p>For the latest information about Makeflow, please visit our <a href=http://www.cse.nd.edu/~ccl/software/makeflow>web site</a> and subscribe to our <a href=http://www.cse.nd.edu/~ccl/software>mailing list</a>.</p>
</div>
</body>
</html>