---
layout: default
title: Running R Jobs on CHTC
---
<p><b>To best understand the information below, users
should already be familiar with:</b>
</p>
<ul>
<li>Using the command line to: navigate within directories,
create/copy/move/delete files and directories, and run their
intended programs (aka "executables").</li>
<li><a href="{{ '/helloworld' | relative_url }}">The CHTC's Intro
to Running HTCondor Jobs</a></li>
</ul>
<h1>Overview</h1>
<p>CHTC provides several copies of R that can be used to run R
code in jobs. See our list of supported versions here: <a href="#supported">CHTC Supported R</a></p>
<p>This guide details the steps needed to:
<ol>
<li><a href="#build">Create a portable copy of your R packages</a></li>
<li><a href="#script">Write a script that uses R and your packages</a></li>
<li><a href="#submit">Submit jobs</a></li>
</ol>
</p>
<p>If you want to build your own copy of base R, see this archived page:
<ul>
<li><a href="{{ 'archived/r-jobs' | relative_url }}">Building an R installation</a></li>
<!--<li>Using miniconda to install R</li>-->
</ul>
</p>
<a name="supported"></a>
<h1>Supported R Installations</h1>
<br>
<table class="gtable">
<tr>
<th>R version</th><th>Name of R installation file</th>
</tr>
<tr> <td>R 3.2.5</td><td>R325.tar.gz</td></tr>
<tr> <td>R 3.3.3</td><td>R333.tar.gz</td></tr>
<tr> <td>R 3.4.4</td><td>R344.tar.gz</td></tr>
<tr> <td>R 3.5.1</td><td>R351.tar.gz</td></tr>
<tr> <td>R 3.6.1</td><td>R361.tar.gz</td></tr>
</table>
<a name="build"></a>
<h1>1. Adding R Packages</h1>
<p>If your code uses specific R packages (like <code>dplyr</code>,
<code>rjags</code>, etc.), follow the directions
below to download and prepare the packages you need for job submission. <b>If your
job does not require any extra R packages, skip to parts 2 and 3</b>. </p>
<p>You are going to start an interactive job that runs on the HTC build servers
and that downloads a copy of R. You will then install your packages to a
folder and zip those files to return to the submit server.</p>
<blockquote>These instructions are primarily about adding packages to a fresh
install of R; if you want to add packages to a pre-existing package folder, there
will be notes below in boxes like this one.
</blockquote>
<a name="version"></a>
<h2>A. Submit an Interactive Job</h2>
<p>Create the following special submit file on the submit server, calling it
something like <code>build.sub</code>. </p>
<pre class="sub">
# R build file
universe = vanilla
log = interactive.log
# Choose a version of R from the table above
transfer_input_files = http://proxy.chtc.wisc.edu/SQUID/chtc/R<i>###</i>.tar.gz
+IsBuildJob = true
requirements = (OpSysMajorVer =?= 7)
request_cpus = 1
request_memory = 2GB
request_disk = 2GB
queue
</pre>
<p>The only thing you should need to change in the above file is the name of
the <code>R###.tar.gz</code> file - in the "transfer_input_files" line. We
have five versions of R available to build from; see the table above.</p>
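<p>As a minimal sketch, the file name for a given R version can be derived by
dropping the dots from the version number, which is the pattern the table above
follows. The helper names <code>r_tarball_name</code> and <code>r_transfer_line</code>
are hypothetical and not part of any CHTC tooling; check the result against the table
before using it.</p>

```shell
#!/bin/bash
# Hypothetical helper: turn an R version such as "3.6.1" into the
# tarball name listed in the table above (R361.tar.gz).
r_tarball_name() {
    local version="$1"
    # ${version//./} removes every "." from the version string.
    echo "R${version//./}.tar.gz"
}

# Hypothetical helper: print the transfer_input_files line for that version.
r_transfer_line() {
    echo "transfer_input_files = http://proxy.chtc.wisc.edu/SQUID/chtc/$(r_tarball_name "$1")"
}

r_transfer_line 3.6.1
```

<p>For version 3.6.1 this prints the <code>transfer_input_files</code> line ending in
<code>R361.tar.gz</code>, which can be pasted into <code>build.sub</code>.</p>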
<blockquote>If you want to add packages to a pre-existing package directory, add
the <code>tar.gz</code> file with the packages to the <code>transfer_input_files</code> line:
<pre class="sub">transfer_input_files = http://proxy.chtc.wisc.edu/SQUID/chtc/R<i>###</i>.tar.gz, <i>packages.tar.gz</i></pre>
</blockquote>
<p>Once this submit file is created, you will start the interactive job by
running the following command: </p>
<pre class="term">
[alice@submit]$ condor_submit -i <i>build.sub</i>
</pre>
<p>It may take a few minutes for the build job to start.</p>
<h2>B. Install the Packages</h2>
<b>1. Set up R</b><br>
<p>Once the interactive build job starts,
you should see the R installation that you specified inside the working
directory:
<pre class="term">
[alice@build]$ ls -l
-rw-r--r-- 1 alice alice 78M Mar 26 12:24 R<i>###</i>.tar.gz
drwx------ 2 alice alice 4.0K Mar 26 12:24 tmp
drwx------ 3 alice alice 4.0K Mar 26 12:24 var
</pre>
</p>
<p>We'll now unzip the copy of R and set the <code>PATH</code>
variable to reference that version of R:
<pre class="term">
[alice@build]$ tar -xzf R<i>###</i>.tar.gz
[alice@build]$ export PATH=$PWD/R/bin:$PATH
[alice@build]$ export RHOME=$PWD/R
</pre>
To make sure that your setup worked, try running:
<pre class="term">
[alice@build]$ R --version
</pre>
The output should match the version number that you want to be using!
</p>
<blockquote>
If you brought along your own package directory, un-tar it here and skip
the directory creation step below.
</blockquote>
<b>2. Install packages</b><br>
<p> First, create a directory to put your packages into:
<pre class="term">
[alice@build]$ mkdir <i>packages</i>
</pre>
<p> Then, tell R to use that directory for the packages you're going to install:
<pre class="term">
[alice@build]$ export R_LIBS=$PWD/<i>packages</i>
</pre>
You can choose what name to use for this directory -- if you have different
sets of packages that you use for different jobs, you could use a more
descriptive name than "packages".
</p>
<p>Then start the R console:
<pre class="term">[alice@build]$ R</pre>
</p>
<p>
In the R console, install your packages using <code>install.packages</code>. Because
<code>R_LIBS</code> is set, they will be installed into the directory you created above.
<pre class="term">
> install.packages("<i>package_name</i>")
</pre>
</p>
<p>The first time you install a package, you will be prompted to choose a "CRAN mirror";
this is the server that R downloads packages from. Choose any <code>http</code> (not <code>https</code>!) option.</p>
<p>If you need a Bioconductor package you will first need to install the Bioconductor
installation manager, then use Bioconductor to install your package:
<pre class="term">
> if (!requireNamespace("BiocManager", quietly = TRUE))
install.packages("BiocManager")
> BiocManager::install("<i>package_name</i>")
</pre>
</p>
<p>After you've installed all your packages, we recommend loading each package to confirm that it
installed successfully:
<pre class="term">
> library(<i>package_name</i>)
</pre>
Repeat this step as needed to load all packages installed during your interactive session.
</p>
<p>Then exit the R console:
<pre class="term">
> quit()</pre>
</p>
<h2>C. Finish Up</h2>
<b>1. Create a <code>tar.gz</code> file of your packages</b><br>
<p>Right now, if we exit the interactive job, nothing will be transferred back
because we haven't created any new <b>files</b> in the working directory, just
<b>sub-directories</b>. In order to transfer back our installation, we will
need to compress it into a tarball file - not only will HTCondor then transfer
back the file, it is generally easier to transfer
a single, compressed tarball file than an uncompressed set of directories.</p>
<p>Run the following command to create your own tarball of your packages:
<pre class="term">[alice@build]$ tar -czf <i>packages</i>.tar.gz <i>packages/</i></pre>
Again, you can use a different name for the <code>tar.gz</code> file, if you want.</p>
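<p>Before exiting, it can be worth checking that the tarball really contains the
packages you installed. The sketch below assumes the tarball was created with the
command above, so that every entry starts with the package directory name. The
function name <code>list_packed_packages</code> is hypothetical.</p>

```shell
#!/bin/bash
# Hypothetical helper: list the package names stored in a package tarball,
# assuming it was created as "tar -czf packages.tar.gz packages/".
list_packed_packages() {
    local tarball="$1"
    # Entries look like "packages/dplyr/DESCRIPTION"; keep the second
    # path component (the package name) and drop duplicates.
    tar -tzf "$tarball" | awk -F/ 'NF >= 2 && $2 != "" { print $2 }' | sort -u
}
```

<p>Running <code>list_packed_packages <i>packages</i>.tar.gz</code> prints one package
name per line; if a package you installed is missing, rebuild the tarball before
leaving the interactive job.</p>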
<b>2. Finish the interactive build job</b><br>
<p>We now have our packages bundled and ready for CHTC!
You can now exit the interactive job and the
tar.gz file with your R packages will return to the submit server with you. </p>
<pre class="term">[alice@build]$ exit </pre>
<a name="script"></a>
<h1>2. Creating a Script</h1>
<p>In order to use CHTC's copy of R and the packages you have prepared
in an HTCondor job, we will need to write a script that unpacks both R
and the packages and then runs our R
code. We will use this script as the <code>executable</code> of our HTCondor
submit file. </p>
<p>A sample script appears below. After the first line, the lines starting
with hash marks are comments. You should replace "my_script.R" with the name of
the script you would like to run. The commands that follow assume the script is saved as <code>run_R.sh</code>.
<pre class="file">
#!/bin/bash
# untar your R installation. Make sure you are using the right version!
tar -xzf R<i>###</i>.tar.gz
# (optional) if you have a set of packages (created in Part 1), untar them also
tar -xzf <i>packages</i>.tar.gz
# make sure the script will use your R installation,
# and the working directory as its home location
export PATH=$PWD/R/bin:$PATH
export RHOME=$PWD/R
export R_LIBS=$PWD/<i>packages</i>
# run your script
Rscript <i>my_script.R</i>
</pre>
</p>
<p>If you have additional commands you would like to be run within the job, you
can add them to this base script. Once your script does what you would like, give
it executable permissions by running:
<pre class="term">
[alice@submit]$ chmod +x run_R.sh
</pre>
<blockquote>
<h2>Arguments in R</h2>
<p>To pass arguments to an R script within a job, you'll need to use the following
syntax in your main executable script, in place of the generic command above:
<pre class="file">
Rscript <i>my_script.R</i> $1 $2
</pre>
Here, <code>$1</code> and <code>$2</code> are the first and second
arguments passed to the bash script from the submit file (see below),
which are then sent on to the R script. For more (or fewer) arguments,
add (or remove) numbered variables such as <code>$3</code>.</p>
<p>In addition, your R script will need to be able to accept arguments from the
command line. There is sample code for doing this on
<a href="https://www.r-bloggers.com/passing-arguments-to-an-r-script-from-command-lines/">this
r-bloggers.com page</a> and about a quarter of the way into this
<a href="https://swcarpentry.github.io/r-novice-inflammation/05-cmdline/">Software
Carpentry lesson</a> (look for <code>print-args-trailing.R</code>).
</p>
</blockquote>
</p>
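<p>If the number of arguments varies between jobs, one alternative to listing
<code>$1 $2</code> by hand is to forward everything with <code>"$@"</code>. The sketch
below is one way to do that, not the only one. The function name
<code>run_r_script</code> and the <code>RSCRIPT_CMD</code> variable are hypothetical;
<code>RSCRIPT_CMD</code> exists only so the wrapper can be tried on a machine without R.</p>

```shell
#!/bin/bash
# Sketch: forward every argument given to the wrapper on to the R script.
# RSCRIPT_CMD is a hypothetical override for trying this without R installed;
# when it is unset, the real Rscript command is used.
run_r_script() {
    local script="$1"
    shift
    # "$@" expands to the remaining arguments, each kept as a separate word,
    # so values containing spaces are passed through intact.
    "${RSCRIPT_CMD:-Rscript}" "$script" "$@"
}
```

<p>Inside the executable script, the final line would then be
<code>run_r_script <i>my_script.R</i> "$@"</code>, so whatever appears on the
<code>arguments</code> line of the submit file reaches the R script unchanged.</p>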
<a name="submit"></a>
<h1>3. Submitting Jobs</h1>
<p>A sample submit file can be found in our <a href="{{'/helloworld' | relative_url }}">hello world
</a> example page. You should make the following changes in order to run
R jobs: </p>
<ul>
<li> Your <code>executable</code> should be the script that you wrote
<a href="#script">above</a>. <pre class="sub">executable = run_R.sh</pre></li>
<li>Modify the CPU, memory, and disk request lines. Test a few jobs for disk space/memory usage in
order to make sure your requests for a large batch are accurate!
Disk space and memory usage can be found in the log file after the job completes. </li>
<li> Change <code>transfer_input_files</code> to include:
<pre class="sub">transfer_input_files = http://proxy.chtc.wisc.edu/SQUID/chtc/R<i>###</i>.tar.gz, <i>packages</i>.tar.gz, <i>my_script.R</i></pre></li>
<li>If your script takes arguments (see the box from the
previous section), include those in the arguments line:
<pre class="sub">arguments = value1 value2</pre></li>
</ul>
<a name="squid"></a>
<blockquote>
<h3>How big is your package tarball?</h3>
<p>If your package tarball is larger than 100 MB, you should NOT transfer
the tarball using <code>transfer_input_files</code>. Instead, you should use
CHTC's web proxy, <code>squid</code>. To learn more about <code>squid</code> please
see our user guide
<a href="{{'/file-avail-squid' | relative_url }}">File Availability with
Squid Web Proxy</a>. To request space on <code>squid</code>, email the research computing facilitators at
<a href="mailto:[email protected]">[email protected]</a>.
</p>
</blockquote>
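<p>A quick way to apply the 100 MB rule is to compare the tarball's size in bytes
against the threshold. The sketch below assumes 1 MB means 1,000,000 bytes, which this
page does not specify, and the function name <code>transfer_method</code> is
hypothetical.</p>

```shell
#!/bin/bash
# Sketch: report which transfer method the 100 MB rule above suggests.
# Assumption: 1 MB = 1,000,000 bytes; adjust LIMIT_BYTES if CHTC defines it differently.
LIMIT_BYTES=100000000

transfer_method() {
    local file="$1"
    local size
    # wc -c counts bytes; the arithmetic expansion strips any padding in its output.
    size=$(( $(wc -c < "$file") ))
    if [ "$size" -gt "$LIMIT_BYTES" ]; then
        echo "squid"
    else
        echo "transfer_input_files"
    fi
}
```

<p>Running <code>transfer_method <i>packages</i>.tar.gz</code> on the submit server
prints either <code>squid</code> or <code>transfer_input_files</code>.</p>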