---
layout: default
title: Submitting High Memory Jobs on CHTC's HTC System
---
<b>The examples and information in this guide are useful ONLY if:</b><br>
<ul>
<li>you already have an account on a CHTC-administered submit server</li>
<li>your jobs will use more than ~120 GB of memory</li>
</ul>
<b>To best understand this information, users should already be familiar with:</b><br>
<ol>
<li>Using the command-line to: navigate directories, create/edit/copy/move/delete
files and directories, and run intended programs (aka "executables").</li>
<li><a href="{{'/helloworld' | relative_url }}">
CHTC's Intro to Running HTCondor Jobs</a></li>
<li>CHTC's guides for handling large data (<a href="{{'/file-avail-largedata' | relative_url }}">
Guide here</a>) and software installation.</li>
</ol>
<h2>Overview</h2>
<p>A high-memory job is one that
requires a significantly larger amount of memory (also known as RAM) than a typical
high throughput job, usually more than
200 GB and up to 1-4 TB.
In the following guide, we cover resources and recommendations for
running high-memory work in CHTC. <b>However, please email us if you believe you
will need to run "high-memory" work for the first time, or if you are planning new
"high-memory" work that is different from what you've run before.</b> We'll happily provide
personalized tips and considerations for getting your work done most efficiently.</p>
<ol>
<li><a href="#resources">High Memory Resources in CHTC</a></li>
<li><a href="#get-started">Getting Started</a></li>
<li><a href="#running">Running High Memory Jobs</a></li>
</ol>
<a name="resoures"></a>
<h1>1. High Memory Resources in CHTC</h1>
<p>Our high memory machines have the following specs: </p>
<table class="gtable">
<tr>
<th>Machine name</th>
<th>Amount of memory</th>
<th>Number of CPUs</th>
<th>Local disk space on machine</th>
</tr>
<tr>
<td><code>mem1.chtc.wisc.edu</code></td>
<td>1 TB</td>
<td>80</td>
<td>1 TB</td>
</tr>
<tr>
<td><code>mem2.chtc.wisc.edu</code> </td>
<td>2 TB </td>
<td>80 </td>
<td>1 TB </td>
</tr>
<tr>
<td><code>mem3.chtc.wisc.edu</code> </td>
<td>4 TB </td>
<td>80 </td>
<td>6 TB </td>
</tr>
<tr>
<td><code>wid-003.chtc.wisc.edu</code> </td>
<td>512 GB </td>
<td>16 </td>
<td>2.5 TB </td>
</tr>
</table>
<a name="get-started"></a>
<h1>2. Getting Started</h1>
<a name="id"></a>
<h2>A. Identifying High Memory Jobs</h2>
<p>Jobs that request over 200 GB of memory in their
<a href="#submit">submit file</a> can run on our dedicated high memory machines.
However, if your job doesn't need quite that much memory, it's good to
request less, as doing so will allow your job(s) to run on more servers,
since CHTC has hundreds of servers with up to 100 GB of memory and dozens of
servers with up to 250 GB of memory.</p>
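<p>For example, a job that needs a bit more than the high-memory threshold might
include a line like the following in its <a href="#submit">submit file</a>
(the value here is purely illustrative; use an estimate based on your own testing):</p>
<pre class="sub"># Illustrative only: request what your testing shows the job actually needs
request_memory = 250GB</pre>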
<a name="testing"></a>
<h2>B. Testing</h2>
<p>Before running a full-size high-memory job, make sure to
use a small subset of data in a test job.
Not only will this give you a chance to try out the submit file
syntax and make sure your job runs, but it can help you estimate
how much memory and/or disk you will need for a job using your full data. </p>
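<p>One quick way to see what a completed test job actually used: when a job ends,
HTCondor appends a table of requested and used resources (CPUs, disk, and memory) to the
job's log file. Assuming your submit file defines a log (named <code>run_myprogram.log</code>
here, as in the sample submit file below), you can pull that table out with <code>grep</code>:</p>
<pre class="term">[alice@submit]$ grep -A 4 "Partitionable Resources" run_myprogram.log</pre>
<p>Compare the "Usage" column for Memory (MB) and Disk (KB) against your requests to
right-size the full-scale job.</p>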
<p>You can also use interactive jobs to test commands that will end up in your
"executable" script. To run an interactive job, prepare
your submit file as usual. Note that for an interactive
job, you should use a smaller memory request (and possibly lower CPU
and disk as well) than for the final job
(so that the interactive job starts) and plan to simply test commands,
not run the entire program. To submit an interactive job,
use the <code>-i</code> flag with <code>condor_submit</code>:
<pre class="term">[alice@submit]$ condor_submit -i submit.file</pre>
After waiting for the interactive job to start, this
should open a bash session on an execute machine, which will allow
you to test your commands interactively. Once your testing is done,
make the appropriate changes to your executable, adjust your resource
requests, and submit the job normally.</p>
<a name="consult"></a>
<h2>C. Consult with Facilitators</h2>
<p>If you are unsure how to run high-memory jobs on CHTC, or
if you're not sure if everything in this guide applies to you, get in touch
with a research computing facilitator by emailing [email protected].</p>
<a name="running"></a>
<h1>3. Running High Memory Jobs</h1>
<a name="submit"></a>
<h2>A. Submit File</h2>
<p>The submit file shown in our <a href="{{'/helloworld' | relative_url }}">Hello World example</a>
is a good starting point for building your high memory job submit file. The following
are places where it's important to customize:</p>
<ul>
<li class="spaced"><b><code>request_memory</code></b>: It is crucial
to make this request as accurate as you can by <a href="#testing">testing</a> at a small scale if
possible (see above). Online documentation/help pages and your colleagues' experience are
other good sources of information about required memory. <br></li>
<li class="spaced"><b>Long running jobs</b>: If your high memory job is likely to run
longer than our 3-day time limit, please email us for options on how to run for longer.
In the past, high memory jobs automatically received an extra time allowance, but this is
no longer the case.</li>
<li class="spaced"><b><code>request_cpus</code></b>: Sometimes, programs that use a large
amount of memory can also take advantage of multiple CPUs. If this
is the case for your program, you can request multiple CPUs.
However, <b>it is always easier to start jobs that request
fewer cores rather than more</b>. We recommend: <br><br>
<table class="gtable">
<tr>
<th>Requesting ___ of memory?</th> <th>Request fewer than ___ CPUs</th>
</tr>
<tr>
<td>up to 100 GB</td> <td> 4</td>
</tr>
<tr>
<td>100-500 GB</td> <td>8</td>
</tr>
<tr>
<td>500GB-1TB</td> <td>16</td>
</tr>
<tr>
<td>1-1.5TB</td> <td>20</td>
</tr>
<tr>
<td>1.5-2TB</td> <td>20</td>
</tr>
<tr>
<td>2TB or greater</td> <td>32</td>
</tr>
</table>
If you think a higher CPU request would significantly improve your job's
performance, contact a facilitator. <br><br>
</li>
<li class="spaced"><b><code>request_disk</code></b>: Request the maximum amount of data
your job will ever have within the job working directory
on the execute node, including all output and input (which will take up space before
some of it is removed from the job working directory at the end of the job). <br><br> </li>
<li class="spaced"><b>Other requirements</b>: if your job uses files from
<a href="{{'/file-avail-largedata' | relative_url }}">our large data space</a>, or
<a href="{{'/docker-jobs' | relative_url }}">Docker for software</a>,
add the necessary requirements for these resources to your submit file. <br><br></li>
</ul>
<p>Altogether, a sample submit file may look something like this: </p>
<pre class="sub">### Example submit file for a single staging-dependent job
universe = vanilla
# Files for the below lines will all be somewhere within /home/username,
# and not within /staging/username
log = run_myprogram.log
executable = run_Trinity.sh
output = $(Cluster).out
error = $(Cluster).err
transfer_input_files = trinityrnaseq-2.0.6-installed.tar.gz
should_transfer_files = YES
# Require execute servers that have large data staging
Requirements = (Target.HasCHTCStaging == true)
# Memory, disk and CPU requests
request_memory = 200GB
request_disk = 100GB
request_cpus = 4
# Submit 1 job
queue 1
### END</pre>
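<p>If you're not sure what to put in <code>request_disk</code>, a reasonable starting
point is to measure your input files and add your expected output. For instance, using
the file names from the examples in this guide:</p>
<pre class="term">[alice@submit]$ du -sh trinityrnaseq-2.0.6-installed.tar.gz /staging/username/reads.tar.gz</pre>
<p>Keep in mind that compressed files expand in the job working directory, so pad this
estimate generously, then refine it using the usage reported in your test job's log.</p>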
<a name="software"></a>
<h2>B. Software</h2>
<p>As with any other job, the best option for high memory work is to create a portable
installation of your software. We have guides for
<a href="{{'/howto_overview' | relative_url }}">scripting languages</a> and
<a href="{{'/docker-jobs' | relative_url }}">using Docker</a>,
and can otherwise provide individual support for program installation
<a href="{{'/get-help' | relative_url }}">during office hours or over email</a>. </p>
<a name="executable"></a>
<h2>C. "Executable" script</h2>
<p>As described in many of our guides (for <a href="{{'/howto_overview' | relative_url }}">software</a> or for
using <a href="{{'/file-avail-largedata' | relative_url }}">large data</a>), you will need to write
a script that will run your software commands for you and that will serve
as the submit file "executable". Things to note are: </p>
<ul>
<li>If using files from our large data staging space, follow the recommendations in our
<a href="{{'/file-avail-largedata' | relative_url }}">guide</a>. </li>
<li>If using multiple cores, make sure the number of
"threads" or "processes" specified in your command matches the number of CPUs requested in your
<a href="#submit">submit file</a>.</li>
</ul>
<p>
Altogether, a sample script may look something like this
(perhaps called <code>run_Trinity.sh</code>): </p>
<pre class="file">#!/bin/bash
# Copy input data from /staging to the present directory of the job
# and un-tar/un-zip them.
cp /staging/username/reads.tar.gz ./
tar -xzvf reads.tar.gz
rm reads.tar.gz
# Set up the software installation in the job working directory, and
# add it to the job's PATH
tar -xzvf trinityrnaseq-2.0.6-installed.tar.gz
rm trinityrnaseq-2.0.6-installed.tar.gz
export PATH=$(pwd)/trinityrnaseq-2.0.6:$PATH
# Run software command, referencing input files in the working directory and
# redirecting "stdout" to a file. Backslashes are line continuation.
Trinity --seqType fq --left reads_1.fq \
--right reads_2.fq --CPU 4 --max_memory \
20G > trinity_stdout.txt
# Trinity will write output to the working directory by default,
# so when the job finishes, it needs to be moved back to /staging
tar -czvf trinity_out_dir.tar.gz trinity_out_dir
cp trinity_out_dir.tar.gz trinity_stdout.txt /staging/username/
rm reads_*.fq trinity_out_dir.tar.gz trinity_stdout.txt
### END</pre>
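<p>With the submit file and executable script in place, submit the job as usual
(using the same submit file name as in the interactive example above):</p>
<pre class="term">[alice@submit]$ condor_submit submit.file</pre>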