-
Notifications
You must be signed in to change notification settings - Fork 32
/
Copy pathmultiple-jobs.shtml
407 lines (319 loc) · 13.3 KB
/
multiple-jobs.shtml
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
---
layout: default
title: Submitting Multiple Jobs Using HTCondor
---
<b>To best understand the below information, users should already have an understanding of:</b>
<ol>
<li>Using the command line to: navigate within directories, create/copy/move/delete files and directories, and run their intended programs (aka "executables").</li>
<!-- coming soon <li>The CHTC’s Roadmap for High Throughput Computing</li> -->
<li><a href="{{ '/helloworld' | relative_url}}">The CHTC's Intro to Running HTCondor Jobs</a></li>
<li>The importance of indicating the below submit file lines, which are left
out of the below examples for simplicity:
<pre class="sub">
universe = vanilla # or other
request_memory = <i>< MB ></i>
request_disk = <i>< KB ></i>
request_cpus = <i>< positive integer ></i>
should_transfer_files = YES
when_to_transfer_output = ON_EXIT # or other
</pre>
</li>
</ol>
<!-- Submitting Many Jobs with a Single Submit File -->
<h1>Overview</h1>
One of HTCondor's features is the ability to submit many similar jobs using
one submit file. This is accomplished by using:
<blockquote>
<ul>
<li>a "queue" keyword, that indicates how many jobs to run</li>
<li><i>variables</i> in the submit file, taking the place of information
that will change with each job, such as input files, input directories, or
code arguments. Variables in an HTCondor submit file look like this:
<code>$(variable_name)</code></li>
</ul>
</blockquote>
There are different ways to "queue" multiple jobs and indicate the difference
between them using variables.
<ol>
<li><b><a href="#process">Using Integer-Numbered Input Files</a></b> - assumes that each job needs a different (numbered) input file</li>
<li><b><a href="#foreach">Using Any Unique Input File Names</a></b> - assumes that each job needs a different input file</li>
<li><b><a href="#discussion">Variations Beyond Input Files</a></b>
<ol type="A">
<li><b><a href="#args">Varying arguments</a></b> - assumes that each job needs a different set of input arguments</li>
<li><b><a href="#multiple-args">Supplying multiple variables per job</a></b> - assumes that multiple items (input files, arguments, directories) are changing for each job</li>
</ol>
</li>
<li><b><a href="#initialdir">Organizing Jobs in Individual Directories</a></b> - another
option that can be helpful in organizing job submissions.</li>
</ol>
<p>This guide merely scratches the surface of what is possible when submitting
many jobs from a submit file. If you are interested in a particular approach
that isn't listed here, <a href="mailto:[email protected]">contact a Research
Computing Facilitator</a> and we can advise on the best implementation. </p>
<a name="process"></a>
<h1>1. Submitting Multiple Jobs Using Integer-Numbered Input Files</h1>
<blockquote>
<p>This example is most similar to our
<a href="{{ '/helloworld' | relative_url}}">Intro to Running HTCondor
Jobs</a>. </p>
</blockquote>
<p>This method addresses an example where you have multiple, numbered
input files and want to run a job for each one. On the submit server,
this might look like this: </p>
<pre class="term">
[alice@submit]$ ls
0.data 1.data 2.data my_exec reference.data
</pre>
<p>Suppose we want to run "my_exec" three times, once for each ".data" file, and
where every job needs the "reference.data" file.
We will use HTCondor's built in <code>$(Process)</code> variable
to create multiple numbered jobs, each using a different input file. In a
submit file, this will look like this: </p>
<pre class="sub">
# Submit File
executable = <i>my_exec</i>
transfer_input_files = <i>$(Process)</i>.data, <i>reference</i>.data
output = <i>$(Process)</i>.out
queue 3
</pre>
<p>Here the <code>queue 3</code> line will create 3 jobs. Each job will be assigned
a unique "Process" number, starting at 0 and ending at 2. For each job, the
appropriate number will replace <code>$(Process)</code> in the submit file above. This
means that each job will get a unique input file. </p>
<p>More generally, for "queue <i>N</i>" in the submit file,
the $(Process) variable will start with
"0" and end with <i>N</i>-1, such that you can use $(Process) to indicate
unique filenames and filepaths. </p>
<p>Once the job(s) finishes running, the directory on the submit server should
look like this: </p>
<pre class="term">
[alice@submit]$ ls
0.data 0.out 1.data 1.out 2.data 2.out my_exec reference.data
</pre>
<br>
<blockquote>
<h3>Starting at One</h3>
<p>What if your files are numbered 1 - 100 instead of 0 - 99? Instead of renaming your
files, you can do the following. In your submit file, use: </p>
<pre class="sub">plusone = $(Process) + 1
NewProcess = $INT(plusone, %d)</pre>
<p>Then use <code>$(NewProcess)</code> anywhere in the submit file that you would have
normally used <code>$(Process)</code>. </p>
<p>Note that there is nothing special about the names <code>plusone</code> and
<code>NewProcess</code> - you can use any names you want as variables. </p>
</blockquote>
<a name="foreach"></a>
<h1>2. Submitting Multiple Jobs Using Unique Input Files</h1>
<p><a href="#process">The previous examples</a> are great for numbered
files but it's not always convenient to use this
naming scheme. It is possible to write a single submit file that uses
different non-numbered files or directories by using a different
<code>queue</code> statement. </p>
<p>Suppose we have a program called <code>compare_states</code> and we want to run it
on data from three different states, Iowa, Missouri and Wisconsin. In our
submit directory, that looks like this: </p>
<pre class="term">
[alice@submit]$ ls
compare_states iowa.data missouri.data united-states.data wisconsin.data
</pre>
<p>Here, we want to create a job for each state, and use the <code>united-states.data</code> file for
all three jobs.</p>
<p>First, we will make a list of each data file that we want
to use in an individual job: </p>
<b><code>state_list.txt</code></b>
<pre class="file">
iowa.data
missouri.data
wisconsin.data
</pre>
<p>Then, in the submit file, we will use that list as part of the <code>queue</code>
statement. We will choose a name for the items in the list; in this case, we've
chosen the word "state". </p>
<pre class="sub">
# Submit File
executable = compare_states
transfer_input_files = $(state),united-states.data
out = $(state).out
queue state from state_list.txt
</pre>
<p>That final "queue...from" statement will create a job for every file in our
list. It will assign each file to the "state" variable in turn. In the rest
of the submit file, <code>$(state)</code> will be replaced by the values in
the list, one value for each job. Here we've used the values from the list to
transfer the appropriate input file and name the log file. When the
job completes, the home directory would look like this: </p>
<pre class="term">
[alice@submit]$ ls
compare_states iowa.data.out missouri.data.out wisconsin.data
iowa.data missouri.data united-states.data wisconsin.data.out
</pre>
<br>
<blockquote>
<b>Tips for creating a list</b> <br>
<p>If you want a quick way to create a list of
files like "state_list.txt", use the shell command <code>ls</code> and the
greater-than symbol <code>></code> to list your files and then save
them. For example:
<pre class="term">[alice@submit]$ ls *.data > my_files.txt</pre>
Will list all files that end with the ".data" extension and save them to a
file called "my_files.txt".</p>
</blockquote>
<blockquote>
<b>Naming Variables</b> <br>
<p>You can use any names you like in the "queue ... from" statement:
<pre class="sub">
queue year from list.txt
queue state from list.txt
queue x from list.txt
</pre>
We recommend using a descriptive name so that it is easier to read your submit file.
</p>
</blockquote>
<a name="list"></a>
<h2>Other Ways of Listing Files</h2>
<p>There are two other ways to provide a list, where you would like each item
in the list to create its own job. In the examples below, we are assuming the
same program and files as above. </p>
<a name="matching"></a>
<b>a. Create jobs "matching" a file pattern</b><br>
<p>This method of submission creates a job for each file or directory that
matches the pattern provided in your submit file. </p>
<pre class="sub">
# Submit File
executable = compare_states
transfer_input_files = $(state),united-states.data
out = $(state).out
queue state matching *.data
</pre>
<p>In this case, we have generated a list by using the pattern
<code>*.data</code>; a job will be created for each file that ends
with the "data" suffix. </p>
<a name="in"></a>
<b>b. Create jobs from a list written into the submit file</b><br>
<p>In this final example, the list of desired items is written directly into the submit
file using the syntax "queue ... in ( ... )"</p>
<pre class="sub">
# Submit File
executable = compare_states
transfer_input_files = $(state),united-states.data
out = $(state).out
queue state in (iowa.data
missouri.data
wisconsin.data)
</pre>
<a name="discussion"></a>
<h1>3. Customizing Multiple Job Submissions</h1>
<p>If your code doesn't require unique input files to run multiple jobs, keep
reading to find other ways to provide many jobs with a different value: either
an argument or combination of values. </p>
<a name="args"></a>
<h2>A. Varying arguments</h2>
<p>Suppose we want to analyze a particular state over several years, and this
option is set using a command line argument: </p>
<pre class="term">
$ compare_states -y 2015 -i wisconsin.data -o wisconsin_2015.out
</pre>
<p>For this example, the year will be changing, not the state, so we will
create a list of the years to analyze.</p>
<b><code>year_list.txt</code></b>
<pre class="file">
2013
2014
2015
</pre>
<p>In the submit file, we use the same "queue ... from" syntax as before,
but with a different variable name ("year") and fill in the arguments
line with a variable.
</p>
<pre class="sub">
executable = compare_states
arguments = -y $(year) -i wisconsin.data -o wisconsin_$(year).out
transfer_input_files = wisconsin.data, united-states.data
queue year from year_list.txt
</pre>
<a name="multiple-args"></a>
<h2>B. Using Multiple Arguments</h2>
<p>The method of job submission shown above in a <a href="#foreach">queue ... from</a>
statement can also be used to submit jobs using multiple values. Suppose that
we want to run two jobs for each state file - one for 2013 and one for 2015, using
the same syntax as <a href="#args">before</a>: </p>
<pre class="term">
$ compare_states -y 2015 -i wisconsin.data -o wisconsin_2015.out
</pre>
<p>We will
create a list of each combination that we want to run:
</p>
<b><code>states_list.txt</code></b>
<pre class="file">
2013, wisconsin
2015, wisconsin
2013, missouri
2015, missouri
2013, iowa
2015, iowa
</pre>
<p>Then, in the submit file, we use the same "queue ... from" syntax as before,
but we include two variables (one for each value in the line) that we can use
in the submit file.</p>
<pre class="sub">
executable = compare_states
arguments = -y $(year) -i $(state).data -o $(state)_$(year).out
transfer_input_files = $(state).data, united-states.data
queue year,state from states_list.txt
</pre>
<p>This setup will create six jobs, one for each line in the
<code>states_list.txt</code> file. </p>
<a name="initialdir"></a>
<h1>4. Dividing Jobs Into Multiple Directories</h1>
<p>One way to organize jobs is to assign each job to its own directory,
instead of putting files in the same directory with unique names. To
continue our "compare_states" example, suppose there's a directory for
each state you want to analyze, and each of those directories has its
own input file, like so: </p>
<pre class="term">
[alice@submit]$ ls -F
compare_states iowa/ missouri/ united-states.data wisconsin/
[alice@submit]$ ls -F wisconsin/
input.data
[alice@submit]$ ls -F missouri/
input.data
[alice@submit]$ ls -F iowa/
input.data
</pre>
<p>We will use two features to submit jobs. First, like we have done
for the previous examples, create a list of the items that are changing
for each job. In this case, it will be each directory. </p>
<b><code>states_list.txt</code></b>
<pre class="file">
iowa
missouri
wisconsin
</pre>
<p>There will be a "queue ... from" statement to create a job for each
directory in the list. However, we will add another item to the submit file
called "initialdir" that will submit each job out of its own directory: </p>
<pre class="sub">
# Submit File
executable = compare_states
initialdir = $(state)
transfer_input_files = input.data, ../united-states.data
out = job.out
queue state from states_list.txt
</pre>
<p>Here, HTCondor will create a job for each state in the list and use that
state's directory as the "initialdir" - the directory where the job will actually
be submitted. Therefore, in <code>transfer_input_files</code>, we can use
<code>input.data</code> without using the directory name in the path, and to use
the <code>united-states.data</code> file, we need to indicate it is in the directory
above the state directory. The output file will also go into the state directory. </p>
<p>After the job runs, the job directories would look like this: </p>
<pre class="term">
[alice@submit]$ ls -F
compare_states iowa/ missouri/ united-states.data wisconsin/
[alice@submit]$ ls -F wisconsin/
input.data job.out
[alice@submit]$ ls -F missouri/
input.data job.out
[alice@submit]$ ls -F iowa/
input.data job.out
</pre>