---
highlighter: none
layout: default
title: Running Matlab Jobs on CHTC
---
<p><b>To best understand the information below, users should already have an understanding of:</b>
</p>
<ul>
<li>Using the command line to: navigate within directories, create/copy/move/delete files and directories, and run their intended programs (aka "executables").</li>
<li><a href="{{'/helloworld' | relative_url }}">The CHTC's Intro to Running HTCondor Jobs</a></li>
</ul>
<h1>Overview</h1>
<p>Like most programs, Matlab is not installed on CHTC's high throughput compute system.
One way to run Matlab where it isn't installed is to compile Matlab <code>.m</code>
files into a binary file and run the binary by using a set of files called the Matlab
runtime. In order to run Matlab in CHTC, it is therefore necessary to perform the
following steps, which are detailed in the guide below (click on the links to go
to the relevant section):</p>
<ol>
<li><a href="#prepare">Prepare your Matlab program</a>
<ul>
<li> <a href="#compile">Compile your Matlab code</a></li>
<li><a href="#script">Edit the script that runs Matlab</a></li>
</ul></li>
<li><a href="#submit">Write a submit file that uses the compiled code and script</a></li>
</ol>
<p>If your Matlab code depends on random number generation, using
a function like <code>rand</code> or <code>randperm</code>, please see
the section on <a href="#random">ensuring randomness</a> below.</p>
<a name="supported"></a>
<h1>Supported Versions of Matlab</h1>
<br>
<table class="gtable">
<tr>
<th>Matlab version</th>
</tr>
<tr>
<td>Matlab 2015b</td>
</tr>
<tr>
<td>Matlab 2018b</td>
</tr>
</table>
<a name="prepare"></a>
<h1>1. Preparing Your Matlab Program</h1>
<p>You can compile <code>.m</code> files into a Matlab binary
yourself by requesting an interactive session on one of our build machines.
The session is essentially a job without an executable;
<b>you</b> are the one running the commands instead (in this case, to compile the code).
</p>
<a name="compile"></a>
<h2>A. Start an Interactive Build Job</h2>
<p>Start by uploading all of the Matlab code files (usually <code>.m</code> files, <i>not</i>
<code>.mat</code> files) that you need to run your code to the submit server. </p>
<blockquote>If you have many Matlab code files
(more than about five), it's a good idea to combine them into a <code>.tar.gz</code> file (like a zip
file), so that you can simply transfer the single <code>.tar.gz</code> file for
compiling the code. You can create a tar file by running this command:
<code>tar -czf code.tar.gz <i>files and folders</i></code></blockquote>
<p>Create the following special submit file on the submit server, calling it
something like <code>build.sub</code>. </p>
<pre class="sub">
# Matlab build file
universe = vanilla
log = interactive.log
# List all of your .m files, or a tar.gz file if you've combined them.
transfer_input_files = <i>script</i>.m, <i>functions</i>.tar.gz
+IsBuildJob = true
requirements = (OpSysMajorVer =?= 7)
request_cpus = 1
request_memory = 2GB
request_disk = 2GB
queue
</pre>
<p>Fill in the "transfer_input_files" line with <i>your</i> Matlab .m files, or a
tar.gz file with all of the Matlab files your code uses. </p>
<p>Once this submit file is created, you will start the interactive job by
running the following command: </p>
<pre class="term">
[alice@submit]$ condor_submit -i <i>build.sub</i>
</pre>
<p>It may take a few minutes for the build job to start.</p>
<a name="compile"></a>
<h2>B. Compile Matlab Code and Exit Interactive Job</h2>
<p>Once the interactive job has started, you can compile your code.
In this example, <code>script.m</code> represents the
name of the primary Matlab script; you should replace <code>script.m</code>
with the name of your own primary script. Note that if your main
script references other <code>.m</code> files, as long as they are
present in the working directory, they will all be compiled
together with the main script into one binary.
</p>
<blockquote>If you combined your Matlab <code>.m</code> files into one <code>.tar.gz</code>
file, make sure to "un-tar" that file before running the compiling steps below. </blockquote>
<p>To access the Matlab compiler on the build node, you'll need to load the
appropriate Matlab module. For Matlab 2015b, the module load command will look
like this:
<pre class="term">[alice@build]$ module load MATLAB/R2015b</pre>
If you want to use Matlab 2018b, load module <code>MATLAB/R2018b</code>.
Once the module is loaded, run the compilation command:
<pre class="term">[alice@build]$ mcc -m -R -singleCompThread -R -nodisplay -R -nojvm <i>script</i>.m</pre>
</p>
<blockquote>
<b>Compilation Options</b>
<p>There are other options for the <code>mcc</code> Matlab compiler
that might be necessary for specific
compiling situations. For example, if your main .m script uses a set of
Matlab functions or .m files that are contained in a subdirectory (called, say,
<code>functions</code>), then your compiling command will need to use the
<code>-a</code> flag at the end of the command like so:
<pre class="term">[alice@build]$ mcc -m \
-R -singleCompThread -R -nodisplay -R -nojvm \
script.m -a functions/</pre>
</p>
<p>(The backslashes, \, are there just to break up the full command.) </p>
<p>If you have questions about compiling your particular code, <a href="mailto:[email protected]">contact
a facilitator</a> or see
the <a href="http://www.mathworks.com/help/compiler/mcc.html">Matlab documentation</a> for
more information about using <code>mcc</code>.
</p>
</blockquote>
<p>Exit the interactive session after you have compiled your code:
<pre class="term">[alice@build]$ exit</pre>
HTCondor will transfer your compiled code and its scripts back automatically.</p>
<p>Back on the submit node, you
should now have the following files:
<pre class="term">
[alice@submit]$ ls -l
-rw-rw-r-- 1 user user 581724 Feb 19 14:21 mccExcludedFiles.log
-rwxrw-r-- 1 user user 94858 Feb 19 14:21 script
-rwxrw-r-- 1 user user 1024 Feb 19 14:00 script.m
-rw-rw-r-- 1 user user 3092 Feb 19 14:21 readme.txt
-rw-rw-r-- 1 user user 581724 Feb 19 14:21 requiredMCRProducts.txt
-rwxrw-r-- 1 user user 1195 Feb 19 14:21 run_script.sh
</pre>
The file <code>script</code> is the compiled Matlab binary. You will not
need the <code>mccExcludedFiles.log</code>, <code>requiredMCRProducts.txt</code>
or <code>readme.txt</code> to run your jobs.
</p>
<a name="script"></a>
<h2>C. Modifying the Executable</h2>
<p>The <code>mcc</code> command should have created a script called
<code>run_*.sh</code> (where * is the name of your Matlab script; our
example uses the name <code>script</code>). This <code>run_*.sh</code>
script will be the executable for your Matlab jobs and already has
almost all the necessary commands for running your Matlab code. </p>
<p>You'll need
to add one line at the beginning of the <code>run_*.sh</code> script that unpacks
the Matlab runtime. We'll also add some extra options to ensure Matlab
runs smoothly on any Linux system. </p>
<p>The commands that need to be added, and their location, look like this
(<b>replace <code>r2015b.tar.gz</code> with the appropriate version
of Matlab, if you used something different to compile</b>):
</p>
<pre class="file">
#!/bin/sh
# script for execution of deployed applications
#
# Sets up the MATLAB Runtime environment for the current $ARCH and executes
# the specified command.
# Add these lines to run_script.sh
tar -xzf <i>r2015b</i>.tar.gz
mkdir cache
export MCR_CACHE_ROOT=$PWD/cache
# Rest of script follows</pre>
<a name="submit"></a>
<h1>2. Running Matlab Jobs</h1>
<p>This section shows the important elements of creating
a submit file for Matlab jobs. The submit file for your job will be
different from the one used
to compile your code. As a starting point for a submit file,
see our "hello world" example: <a href="{{'/helloworld' | relative_url }}">http://chtc.cs.wisc.edu/helloworld.shtml</a>.
In what follows, replace our example <code>script</code> and <code>run_script.sh</code>
with the names of your own binary and script. </p>
<ol>
<li>Use <code>run_script.sh</code> as the executable: <br>
<pre class="sub">executable = run_script.sh</pre>
</li>
<li>In order for your Matlab code to run, you will need to use
a Matlab runtime package. This package is easily downloaded from CHTC's web proxy; the
version <b>must</b> match the version you used to compile
your code. Options available on our proxy include:
<ul>
<li><code>r2015b.tar.gz</code></li>
<li><code>r2018b.tar.gz</code></li>
</ul>
To send the runtime package to your jobs, list a link to the appropriate version
in your <code>transfer_input_files</code> line, as
well as your compiled binary and any necessary input files:
<pre class="sub">transfer_input_files = http://proxy.chtc.wisc.edu/SQUID/<i>r2015b</i>.tar.gz,script,input_data</pre>
</li>
<li> The <code>run_script.sh</code> script expects the runtime directory name to be provided
as its first argument in the submit file (as described in <code>readme.txt</code>).
<table class="gtable">
<tr><th>Runtime package</th> <th>Runtime directory name</th></tr>
<tr><td><code>r2015b.tar.gz</code></td> <td><code>v90</code></td></tr>
<tr><td><code>r2018b.tar.gz</code></td> <td><code>v95</code></td></tr>
</table>
<br>
So to run a Matlab job using <code>r2015b</code> and no additional arguments, the
arguments line in the submit file should read: <br>
<pre class="sub">arguments = v90</pre>
If you <i>are</i> passing additional arguments to the script, they can
go after the first "runtime" argument:
<pre class="sub">arguments = v90 $(Cluster) $(Process) </pre>
</li>
<li>As always, test a few jobs for disk space/memory usage in order to
make sure your requests for a large batch are accurate! Disk space and
memory usage can be found in the log file after the job completes.
<b>If you are using Matlab 2018b, request at least
5.5GB of DISK</b> as the runtime is very large for this version of Matlab. </li>
</ol>
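<p>Putting these pieces together, a complete submit file might look like the
sketch below. It assumes Matlab 2015b, the example names <code>script</code> and
<code>run_script.sh</code> used above, and a hypothetical input file called
<code>input_data</code>. The output, error, and log file names and the resource
requests are placeholders, not recommendations; adjust them after testing a few
of your own jobs.</p>
<pre class="sub">
# Matlab job submit file (illustrative sketch; file names are placeholders)
universe = vanilla
executable = run_script.sh
# First argument is the runtime directory name (v90 for r2015b)
arguments = v90
output = matlab_$(Cluster)_$(Process).out
error = matlab_$(Cluster)_$(Process).err
log = matlab_$(Cluster).log
# Runtime package from the proxy, the compiled binary, and any input files
transfer_input_files = http://proxy.chtc.wisc.edu/SQUID/r2015b.tar.gz,script,input_data
# Placeholder requests; check the log file of a test job for actual usage
request_cpus = 1
request_memory = 2GB
request_disk = 2GB
queue
</pre>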
<a name="random"></a>
<h1>Ensuring Randomness</h1>
<p>This section is only relevant for Matlab scripts that
use Matlab's random number functions like <code>rand</code>.</p>
<p>Whenever Matlab is started for the first time on a new computer,
the random number generator begins from the same state. When you
run multiple Matlab jobs, each job is using a copy of Matlab that
is being used for the first time -- thus, every job will start from
the same random number generator state and draw the same sequence of random numbers. </p>
<p>There are different ways to ensure that each job is using different
randomly generated numbers.
<a href="http://www.mathworks.com/help/matlab/math/why-do-random-numbers-repeat-after-startup.html?refresh=true">This
Mathworks page</a> describes one way to "reset" the random number
generator so that it produces different random values when Matlab
runs for the first time. Deliberately choosing your own different
random seed values for each job can be another way to ensure different
results.</p>
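<p>As a sketch of the second approach, the submit file can hand each job its own
seed. The fragment below assumes Matlab 2015b and that your main <code>.m</code> file
is written as a function that takes the seed as an input argument and passes it to
Matlab's <code>rng</code> function. Compiled programs receive their arguments as text,
so the value would need converting to a number first, for example with
<code>str2double</code>. Using <code>$(Process)</code> as the seed is just one
convenient choice, since it is different for every job in a batch.</p>
<pre class="sub">
# Runtime directory name first, then a per-job seed (0, 1, 2, ...)
arguments = v90 $(Process)
# Queue 10 jobs, each with a different $(Process) value
queue 10
</pre>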
<blockquote>
For an older version of this guide, see our <a href="{{'archived/matlab-jobs' | relative_url }}">archived page</a>.
</blockquote>