Commit ad2ea83

Added batch processor documentation
1 parent 3ffec83 commit ad2ea83

2 files changed, +285 -0 lines changed

index.rst

+1
@@ -75,6 +75,7 @@ Running Measurements with pScheduler
    pscheduler_client_tasks
    pscheduler_client_schedule
    pscheduler_server_running
+   pscheduler_batch
    config_pscheduler_limits
    pscheduler_ref_tests_tools
    pscheduler_ref_archivers

pscheduler_batch.rst

+284
@@ -0,0 +1,284 @@
****************
Batch Processing
****************

**NOTE:** In release 4.3.0, this is a beta feature.

pScheduler can run a series of tasks, called a *batch*, from the
command line and return the results for further processing by other
programs.

Each task is referred to as a *job*.

A job can be run once or in multiple *iterations*, and the iterations
can be run serially or in parallel.


.. _pscheduler_batch_invocation:

Invocation
----------

The batch processor is started using a ``pscheduler`` command::

    pscheduler batch [ OPTIONS ] [ INPUT-FILE ]


**NOTE:** The batch processor has a safety mechanism to prevent
accidental use in production until it is no longer a beta feature.
To invoke it in this version, set the environment variable ``BETA``
to any value, e.g.::

    BETA=1 pscheduler batch [ OPTIONS ] [ INPUT-FILE ]

The ``OPTIONS`` control the batch processor's behavior and can be listed
using the ``--help`` option.

The ``INPUT-FILE`` is a path to the input.  If it is omitted or is
``-``, input will be taken from the standard input.

The final result (see :ref:`pscheduler_batch_output`) is sent to the
standard output.

Error and diagnostic output will be sent to standard error.
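
The invocation described above can also be driven from another
program.  The following is a minimal sketch, assuming ``pscheduler`` is
on the ``PATH`` and that a failed run exits with a non-zero status (an
assumption, not something stated here).  The helper names
``batch_command``, ``batch_environment`` and ``run_batch`` are
hypothetical and not part of pScheduler.

```python
import json
import os
import subprocess


def batch_command(input_file="-"):
    """Build the argument list; "-" means read the batch from standard input."""
    return ["pscheduler", "batch", input_file]


def batch_environment(base=None):
    """Copy the environment and set BETA, which the beta safety check requires."""
    env = dict(os.environ if base is None else base)
    env["BETA"] = "1"
    return env


def run_batch(batch):
    """Send a batch (a dict) on standard input and parse the JSON result.

    Standard error is left attached to the caller so diagnostics stay visible.
    Assumption: a non-zero exit status indicates failure (check=True).
    """
    completed = subprocess.run(
        batch_command(),
        input=json.dumps(batch),
        stdout=subprocess.PIPE,
        env=batch_environment(),
        text=True,
        check=True,
    )
    return json.loads(completed.stdout)


command = batch_command()
environment = batch_environment({})
```

Calling ``run_batch({"jobs": [...]})`` would then return the output
document as a Python dictionary.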



.. _pscheduler_batch_input:

Input
-----

Input to the batch processor is JSON in the form of a single object
containing two pairs, ``global`` and ``jobs``.


.. _pscheduler_batch_input_global:

The ``global`` Pair
^^^^^^^^^^^^^^^^^^^

The ``global`` pair is an optional JSON object containing data and
transforms provided or applied to all jobs.  It contains the following
pairs, all optional:

``data`` (Any JSON) - Data made available to all jq transforms as a
variable named ``$global``.

``transform-pre`` (pScheduler jq Transform) - A transform applied to
the ``task`` object in each job before anything else is done.

``transform-post`` (pScheduler jq Transform) - A transform applied to
the ``task`` object in each job after ``transform-pre`` and the job's
``task-transform`` have been applied.
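
As an illustration of how the top-level object fits together, the
sketch below assembles an input document in Python.  The helper
``make_batch`` is hypothetical, the ``script`` form of a transform is
taken from the examples elsewhere in this document, and the jq
expression using ``$global.dest`` is an assumed usage of the ``$global``
variable rather than a tested one.

```python
import json


def make_batch(jobs, global_data=None, transform_pre=None, transform_post=None):
    """Assemble a batch input object; the "global" pair is added only if needed."""
    batch = {"jobs": list(jobs)}
    global_pair = {}
    if global_data is not None:
        global_pair["data"] = global_data
    if transform_pre is not None:
        global_pair["transform-pre"] = {"script": list(transform_pre)}
    if transform_post is not None:
        global_pair["transform-post"] = {"script": list(transform_post)}
    if global_pair:
        batch["global"] = global_pair
    return batch


# One job whose destination is filled in from the shared global data.
job = {
    "label": "rtt",
    "task": {"test": {"type": "rtt", "spec": {"schema": 1}}},
}

batch = make_batch(
    [job],
    global_data={"dest": "www.perfsonar.net"},
    transform_pre=[".test.spec.dest = $global.dest"],
)

# This text is what would be given to the batch processor as input.
batch_json = json.dumps(batch, indent=2)
```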



.. _pscheduler_batch_input_jobs:

The ``jobs`` Pair
^^^^^^^^^^^^^^^^^

The ``jobs`` pair is an array of objects, each containing a single job::

    "jobs": [
        { ... Job 1 ... },
        { ... Job 2 ... },
        { ... Job n ... }
    ]

A job is described in a JSON object containing the following pairs:

``label`` (String) - A label for the job, used for reference in
debugging output.

``enabled`` (Boolean) - Determines whether or not the job is run.
Defaults to ``true``.

``iterations`` (Number) - The number of times to run the specified task.

``parallel`` (Boolean) - Whether or not the job's iterations should be
run in parallel.  Defaults to ``false``.  Setting it to ``true``
implies ``sync-start`` (see below) unless ``sync-start`` is explicitly
set to ``false``.

``setup-time`` (String) - ISO8601 duration indicating the amount of
time expected for pScheduler to set up a single run.  The default of
``PT15S`` should be more than sufficient in most cases.  This is
ignored if not doing a synchronized start (see ``sync-start``, below).

``backoff`` (String) - ISO8601 duration indicating how long each
iteration run in parallel waits before being submitted to pScheduler.
The first will have no backoff, the second will have the indicated
backoff, the third will have twice that, etc.  This value is ignored
if ``parallel`` is ``false``.

``sync-start`` (Boolean) - If running in parallel, set the start time of
all iterations to be the same.  The time is based on the number of
times the task is run, ``backoff`` and ``setup-time``.  This value is
ignored if ``parallel`` is ``false``.  Note that tasks subject to
restrictions on being run at the same time will not necessarily start
in sync (or at all if no ``slip`` is allowed as part of the task's
``schedule`` section).

``task`` (Object) - A pScheduler task specification as would be
produced using the ``task`` command's ``--export`` switch.  Note that
if the specification contains a ``schedule``, those parameters will be
ignored.

``task-transform`` (pScheduler jq Transform) - A jq transform that
operates on the ``task``'s value for each iteration to make
iteration-specific changes.  The ``$iteration`` variable is provided
to indicate which iteration (starting with ``0``) is being
transformed.  The script should operate on the input in place.
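
The ``backoff`` rule described above is simple arithmetic: iteration
*n* (counting from zero) waits *n* times the backoff.  The sketch below
only illustrates that rule; ``backoff_delays`` is a hypothetical helper,
and it takes a ``timedelta`` rather than parsing an ISO8601 string.

```python
from datetime import timedelta


def backoff_delays(backoff, iterations):
    """Return how long each parallel iteration waits before submission.

    Iteration 0 has no delay, iteration 1 waits one backoff,
    iteration 2 waits two, and so on.
    """
    return [backoff * index for index in range(iterations)]


# A backoff of PT1M with three iterations.
delays = backoff_delays(timedelta(minutes=1), 3)
delay_seconds = [int(delay.total_seconds()) for delay in delays]
print(delay_seconds)  # [0, 60, 120]
```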



For example, this job will run five sequential ``rtt`` tests to
``www.perfsonar.net`` with 5, 10, 15, 20 and 25 pings sent.  The
``task-transform`` adds a ``count`` to the test specification that is
calculated based on the iteration::

    {
        "label": "rtt",
        "iterations": 5,
        "parallel": false,
        "task": {
            "test": {
                "type": "rtt",
                "spec": {
                    "schema": 1,
                    "dest": "www.perfsonar.net"
                }
            }
        },
        "task-transform": {
            "script": [
                ".test.spec.count = ($iteration + 1) * 5"
            ]
        }
    }
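
To make the effect of the transform above concrete, the following
sketch reproduces the same calculation in plain Python.  It is a
stand-in for the jq script, not how the batch processor applies
transforms, and ``apply_count_transform`` is a hypothetical name.

```python
import copy


def apply_count_transform(task, iteration):
    """Python equivalent of `.test.spec.count = ($iteration + 1) * 5`."""
    transformed = copy.deepcopy(task)  # leave the job's original task untouched
    transformed["test"]["spec"]["count"] = (iteration + 1) * 5
    return transformed


task = {
    "test": {
        "type": "rtt",
        "spec": {"schema": 1, "dest": "www.perfsonar.net"},
    }
}

# Iterations are numbered from 0, so five iterations use 0 through 4.
counts = [
    apply_count_transform(task, iteration)["test"]["spec"]["count"]
    for iteration in range(5)
]
print(counts)  # [5, 10, 15, 20, 25]
```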



.. _pscheduler_batch_output:

Output
------

Once all jobs have been completed, the batch processor will output a
copy of the input with the addition of a ``results`` pair in each job
containing information about what tasks were run and the results they
produced.

The ``results`` pair is an array of JSON objects, with one element per
iteration.  Each object contains the following pairs:

``task`` (pScheduler Task Specification) - The task that was submitted
to pScheduler and run.

``runs`` (Array of pScheduler Results) - An array of the results
produced by the task.  In most cases, there will be a single element,
but for tasks that return multiple results (e.g., ``latencybg``),
there will be more than one.  Each result is a JSON object containing
pairs named ``application/json``, ``text/plain`` and ``text/html`` for
each of the formats in which pScheduler can produce a result.
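
A program consuming this output typically needs to walk the jobs,
their per-iteration results and the runs within each.  The sketch below
shows one way to do that under the structure described above.  The
helper ``json_results`` is hypothetical, the sample data is invented for
illustration, and treating the position in ``results`` as the iteration
number is an assumption.

```python
def json_results(batch_output):
    """Yield (label, iteration, result) for every run in a batch output.

    Missing pairs are tolerated so that jobs without results are skipped.
    """
    for job in batch_output.get("jobs", []):
        for iteration, result in enumerate(job.get("results", [])):
            for run in result.get("runs", []):
                yield job.get("label"), iteration, run.get("application/json")


# Invented sample shaped like the output described above.
sample_output = {
    "jobs": [
        {
            "label": "rtt",
            "results": [
                {"task": {}, "runs": [{"application/json": {"succeeded": True}}]},
                {"task": {}, "runs": [{"application/json": {"succeeded": False}}]},
            ],
        },
        {"label": "not-run", "enabled": False},
    ]
}

collected = list(json_results(sample_output))
```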



.. _pscheduler_batch_python:

Invocation from Python
----------------------

The batch processor can be invoked from Python on any system where
pScheduler's Python library is installed.  (On CentOS, this would be
the ``python-pscheduler`` package.)

For example::

    #!/usr/bin/env python3

    import pscheduler.batchprocessor
    import sys


    batch = { ... }

    def debug(message):
        """
        Callback function for the batch processor to emit a line of
        debug.
        """
        print(message, file=sys.stderr)

    processor = pscheduler.batchprocessor.BatchProcessor(batch)

    # Leave out the debug argument for no debugging.
    # This can be invoked multiple times to run the same batch repeatedly.
    result = processor(debug=debug)



.. _pscheduler_batch_tips:

Tips and Tricks
---------------

Running Different Tasks as Part of the Same Job
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Different tests can be run in parallel by using the ``task-transform``
to alter the contents of the ``test`` pair for each iteration.

* Put an array of the tests to be run in the task's ``reference``
  pair (the example below uses ``reference.tests``).  The length of
  the array should be the same as the specified ``iterations``.

* Leave the task's ``test`` section as an empty object (``{}``).

* Add a ``task-transform`` that replaces the test with an element
  from the array (e.g., ``.test = .reference.tests[$iteration]``).


This example runs a three-minute-long streaming latency test with a
throughput test to the same host during the second minute.  The
``backoff`` value makes the throughput test sleep for one minute before
it is scheduled and started so there's latency data produced
beforehand and afterward::

    {
        "label": "different-in-parallel",
        "iterations": 2,
        "parallel": true,
        "backoff": "PT1M",
        "task": {
            "reference": {
                "tests": [
                    {
                        "type": "latencybg",
                        "spec": {
                            "dest": "ps.example.net",
                            "duration": "PT3M"
                        }
                    },
                    {
                        "type": "throughput",
                        "spec": {
                            "dest": "ps.example.net",
                            "duration": "PT1M"
                        }
                    }
                ]

            },
            "#": "This is intentionally empty:",
            "test": { }
        },
        "task-transform": {
            "script": [
                "# Replace the test section of the task with one of the",
                "# tests in the reference block based on the iteration.",
                ".test = .reference.tests[$iteration]"
            ]
        }
    }
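
Jobs of this shape can be generated rather than written by hand.  The
following sketch builds one from a list of tests so that ``iterations``
always matches the length of the array.  The function
``different_tests_job`` is a hypothetical helper, not part of
pScheduler.

```python
def different_tests_job(label, tests, backoff=None):
    """Build a parallel job that runs a different test in each iteration."""
    tests = list(tests)
    job = {
        "label": label,
        # One iteration per test, so every array element is used exactly once.
        "iterations": len(tests),
        "parallel": True,
        "task": {
            "reference": {"tests": tests},
            # Intentionally empty; the transform fills it in per iteration.
            "test": {},
        },
        "task-transform": {
            "script": [".test = .reference.tests[$iteration]"],
        },
    }
    if backoff is not None:
        job["backoff"] = backoff
    return job


job = different_tests_job(
    "different-in-parallel",
    [
        {"type": "latencybg", "spec": {"dest": "ps.example.net", "duration": "PT3M"}},
        {"type": "throughput", "spec": {"dest": "ps.example.net", "duration": "PT1M"}},
    ],
    backoff="PT1M",
)
```

The resulting dictionary can be placed in the ``jobs`` array of a batch
and serialized with ``json.dumps``.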
