Commit 54017db ("docs update"), parent 1bae7dc

28 files changed: +555 -274 lines

CHANGELOG.md (+15 -4)

@@ -1,6 +1,17 @@
-Application Kernels Remote Runner
-=========================================
+# Application Kernels Remote Runner (AKRR) Change Log
 
-## 2019-10-21 v2.1.0
+## 2020-08-10 v2.1.0
 
-- Many improvements and bug fixes over v1.0
+- New CLI interface. Single entry point to all routines: ```akrr [-v] <command> [arguments]```.
+- Converted to Python 3.
+- RPM installation is available for CentOS 7.8.
+- New AppKernels: HPCG, MDTest and ENZO.
+- Support for new resource types: OpenStack and stand-alone machines without a queueing system (shell).
+- Batch appkernel task submission, for example running all appkernels 10 times each as soon as possible.
+  Useful for checking performance before/after an update.
+- Docker-container-based appkernel execution (HPCC on a single node on Docker-enabled resources).
+- Many other improvements and bug fixes.
+
+## 2015-08-15 v1.0
+
+- Initial AKRR v1.0 release.

README.md (+1 -1)

@@ -14,7 +14,7 @@ a given run frequency. Accordingly, through XDMoD, system managers have the
 ability to proactively monitor system performance as opposed to having to rely
 on users to report failures or underperforming hardware and software.
 
-* [Overview](docs/AKRR_Overview.md)
+* [Overview](docs/index.md)
 * [Download](docs/AKRR_Download.md)
 * [Installation](docs/AKRR_Install.md)
 * [Update](docs/AKRR_Update.md)

docs/AKRR_Add_Resource.md (+20 -26)

@@ -1,12 +1,15 @@
-This section describes how to add traditional HPC resource to AKRR. On
-addition of OpenStack check
-[Adding OpenStack Resource and Application Kernels Deployment](AKRR_Add_OpenStack_Resource_and_AppKernels.md.md)
+This section describes how to add a traditional HPC resource to AKRR. AKRR also supports
+execution on systems without a queueing system and on [OpenStack](AKRR_Add_OpenStack_Resource_and_AppKernels.md.md) (limited support at this point).
 
 # Adding a New HPC Resource
 
-Addition of new HPC resource to AKRR consists of two steps: configuration of
-new resource and deployment of AKRR's HPC resource-side scripts and application
-inputs. The last step also performs installation validation.
+An HPC resource is added in two steps:
+
+1) Configuration of the new resource
+2) Deployment of AKRR's HPC resource-side scripts and application inputs.
+
+The last step also performs installation validation.
 
 From the AKRR point of view, an HPC resource is a distinct and **homogeneous
 set** of computational nodes. The resource name should reflect such a set. For

@@ -72,10 +75,10 @@ XDMoD name._
 
 > **Tips and Tricks**
 >
-> If resource headnode do not reply on pinging use __--no-ping__ do disable that check.
+> If the resource headnode does not reply to pings, use the __--no-ping__ argument to disable that check.
 >
 > If your system is fairly non-standard (for example non-default port for ssh,
-> usage of globus-ssh for access and similar) you can use __--minimalistic__ option.
+> usage of globus-ssh for access and similar) you can use the __--minimalistic__ argument.
 > This option sets a minimalistic interactive session and the generated
 > configuration file must be manually edited.
 

@@ -182,12 +185,11 @@ and check to make sure that only the key(s) you wanted were added.
 and move to resource validation and deployment step.
 i.e. execute:
     akrr resource deploy -r ub-hpc
-
 ```
 
 > **Tips and Tricks**
 >
-> reducing number of ssh connection
+> **Reducing the number of ssh connections**:
 > AKRR generates a large number of ssh connections. If you don't want to
 > stress your headnode in this manner you can set ssh to reuse connections. Add
 > the following to ~/.ssh/config :
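The config snippet itself falls outside this hunk. A typical OpenSSH connection-multiplexing entry (a sketch; the host name is a placeholder, adjust it to your headnode) looks like:

```
Host headnode.example.org
    ControlMaster auto
    ControlPath ~/.ssh/control-%r@%h-%p
    ControlPersist 10m
```

With this in place the first ssh connection stays open as a master and subsequent connections to the same host reuse it instead of re-authenticating.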
@@ -287,20 +289,8 @@ batchJobHeaderTemplate template variable
 will become "#SBATCH --nodes=2" in the batch job script if the application kernel should
 run on two nodes.
 
-In order to enter shell curly brackets they should be enter as double curly
-brackets. All double curly brackets will be replaced with single curly bracket
-during batch job script generation.
-
-> Example:
->
-> "awk "{{for (i=0;i<$_TASKS_PER_NODE;++i)print}}"
->
-> in  template variable will become:
->
-> "awk "{for (i=0;i<$_TASKS_PER_NODE;++i)print}"
->
-> in  batch job script.
->
+To enter literal curly brackets, write them as double curly
+brackets (i.e. ${{ENV_VAR}} in the template becomes ${ENV_VAR} in the resulting script).
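The substitution behaves like Python's `str.format` (a minimal sketch, not AKRR's actual code; the variable name `akrr_num_of_nodes` is illustrative):

```python
# Sketch of template formatting: single braces mark substitutions,
# doubled braces become literal braces in the generated script.
template = "#SBATCH --nodes={akrr_num_of_nodes}\necho ${{AKRR_NODELIST}}"

script = template.format(akrr_num_of_nodes=2)
print(script)
# #SBATCH --nodes=2
# echo ${AKRR_NODELIST}
```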
 
 The commented parameters will assume default values. Below is the description of
 the parameters and their default values:
@@ -324,16 +314,18 @@ the parameters and their default values:
 | **Batch job script settings** |
 | batch_scheduler | N | Scheduler type: slurm or pbs. sge might work as well but was not tested | Must be set |
 | batch_job_header_template | N | Header for the batch job script. Describes the resource requests and sets the AKRR_NODELIST environment variable containing the list of all nodes. See below for more detailed information. | Must be set |
+| max_number_of_active_tasks | Y | Maximal number of active tasks; the default is -1, i.e. no limit | -1 |
 
 
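For example, to cap the number of concurrent AKRR tasks on a resource, the limit can be set in the resource configuration file (a sketch; AKRR resource configs use Python syntax and the value shown is illustrative):

```python
# In the resource configuration file: allow at most 4 AKRR tasks
# to be active on this resource at once; -1 (the default) means no limit.
max_number_of_active_tasks = 4
```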
-## How to set _batch_job_header_template_
-
 _batch_job_header_template_ is a template used in the generation of batch job
 scripts. It specifies the resources (e.g. number of nodes) and other parameters
 used by the scheduler.
 
 The following are instructions on how to convert a batch job script header to _batch_job_header_template_.
+
+<!---
 For more details see [Batch Job Script Generation](AKRR_Batch_Job_Script_Generation.md).
+--->
 
 Below is a batch script which executes the NAMD application on a resource which uses Slurm:

@@ -401,9 +393,11 @@ below:
 
 Now, we can generate a test application kernel batch job script and visually
 inspect it for mistakes. Run:
+
 ```bash
 akrr task new --dry-run --gen-batch-job-only -r <resource_name> -a test -n 2
 ```
+
 This command will generate the batch job script and print it to standard output.
 Below is an example of the output
New file (+22)

@@ -0,0 +1,22 @@
+# How AKRR Generates Batch Scripts
+
+AKRR is designed to execute multiple application kernels on multiple HPC resources.
+The complexity of this task comes in part from
+the fact that HPC resources can differ severely in hardware and software.
+For example, HPC resources can use different queueing
+systems and different vendors and versions of MPI, BLAS and other libraries.
+Furthermore, some applications can be compiled in a number of ways,
+which affects how they should be executed (e.g. TCP/IP vs MPI).
+This makes it impossible to create a single job script which will work on all
+platforms.
+A large set of separate job scripts for every application kernel on every HPC resource would be unbearable to maintain.
+AKRR addresses the variability of batch job scripts by the use of templates.
+
+A job script is created from a template for every new task.
+The root template for batch job scripts is located at
+$AKRR_SRC/akrr/default_conf/default.resource.conf and listed below:
+
+```python
+
+```
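Template-based generation can be sketched in Python as follows (the template variables below are illustrative assumptions, not necessarily the names used in default.resource.conf):

```python
# Hypothetical sketch of AKRR-style batch job header generation from a template.
batch_job_header_template = """#!/bin/bash
#SBATCH --nodes={akrr_num_of_nodes}
#SBATCH --ntasks-per-node={akrr_ppn}
#SBATCH --time={akrr_walltime_limit}
"""

# Per-task values are filled in when a new task is created.
header = batch_job_header_template.format(
    akrr_num_of_nodes=2, akrr_ppn=8, akrr_walltime_limit="01:00:00")
print(header)
```

The same template then works unchanged for any node count or walltime, which is the point of keeping one template per resource instead of one script per task.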

docs/AKRR_Deployment_of_Application_Kernel_on_Resource.md (+32 -13)

@@ -34,9 +34,13 @@ rarely installed system-wide and thus they need to be installed first.
 
 ## Generate Initial Configuration File
 
-The initial configuration file is generated with
-_akrr app add_ command. It will generate an initial
-configuration file and place it to $AKRR_HOME/cfg/resource/<app kernelname>.conf.
+The initial configuration file is generated with:
+
+```bash
+akrr app add -r <resource_name> -a <appkernel_name>
+```
+It will generate an initial
+configuration file and place it in _$AKRR_HOME/etc/resource/<resource_name>/<appkernel_name>.conf_.
 
 # Edit Configuration File
 

@@ -47,39 +51,54 @@ application kernel on this particular machine/resource.
 # Generate Batch Job Script and Execute it Manually (Optional)
 
 The purpose of this step is to ensure that the configuration leads to a correct
-(and workable) batch job script. First the batch job script is generated with
-**'akrr_ctl.sh batch_job'**. Then this script is executed in an interactive
-session (this improves the turn-around in case of errors). If the script fails
-to execute, the issues can be fixed first in that script itself and then merged
-with the configuration file.
+(and workable) batch job script. First the batch job script is generated:
+```bash
+# only print the batch job script
+akrr task new --dry-run --gen-batch-job-only -r <resource_name> -a <appkernel_name> -n <number_of_nodes>
+# generate the batch job script and copy it to the resource (without running it)
+akrr task new --gen-batch-job-only -r <resource_name> -a <appkernel_name> -n <number_of_nodes>
+```
+
+Then this script is submitted manually or executed in an interactive session
+(this improves the turn-around in case of errors). If the script fails
+to execute, the issues can be fixed first in that script itself, followed by respective updates in the configuration file.
 
 This step is somewhat optional because it is very similar to the next step.
 However the opportunity to work in an interactive session will often improve the
 turn-around time because there is no need to wait in the queue for each iteration.
 
 # Perform Validation Run
 
-For this step **appkernel_validation.py** utility is used to validate
-application kernel installation on particular resource. It execute the
-application kernel and analyses its results. If it fails the problems need to be
+This step validates the application kernel installation on the resource:
+
+```bash
+akrr app validate -r <resource_name> -a <appkernel_name> -n <number_of_nodes>
+```
+
+It executes the application kernel and analyzes its results. If it fails, the problems need to be
 fixed and another round of validation should be performed.
 
 # Schedule regular execution of application kernel
 
 Finally, if validation was successful the application kernel can be submitted for
 regular execution on that resource.
 
+```bash
+akrr task new -r <resource_name> -a <appkernel_name> -n <list of node counts> -p <periodicity> \
+    -s <first submit date-time>
+```
+
 # Details on the Individual Application Kernels Deployment
 
 * [NAMD Deployment](AKRR_NAMD_Deployment.md)
 * [HPCC Deployment](AKRR_HPCC_Deployment.md)
+* [HPCG Deployment](AKRR_HPCG_Deployment.md)
 * [IMB Deployment](AKRR_IMB_Deployment.md)
 * [IOR Deployment](AKRR_IOR_Deployment.md)
-* [HPCG Deployment](AKRR_HPCG_Deployment.md)
+* [MDTest Deployment](AKRR_MDTest_Deployment.md)
 * [NWChem Deployment](AKRR_NWChem_Deployment.md)
 * [GAMESS Deployment](AKRR_GAMESS_Deployment.md)
 * [Enzo Deployment](AKRR_Enzo_Deployment.md)
-
 * [Creating New Application Kernel](AKRR_Creating_New_Application_Kernel.md)
docs/AKRR_Download.md (+2 -3)

@@ -1,6 +1,5 @@
 # AKRR Download
 
-RPM package and source code of upcomming new version of AKRR (2.0) will be
-available upon release at <https://github.com/ubccr/akrr/releases>.
+RPM packages and source code of recent releases are available at <https://github.com/ubccr/akrr/releases>.
 
-Next: [AKRR installation](AKRR_Install.md)
+Next: [AKRR installation](AKRR_Install.md)

docs/AKRR_ENZO_Deployment.md (+3)

@@ -308,6 +308,9 @@ akrr task new -r $RESOURCE -a $APPKER -n 1,2,4,8
 
 #Start daily execution from today on nodes 1,2,4,8 and distribute execution time between 1:00 and 5:00
 akrr task new -r $RESOURCE -a $APPKER -n 1,2,4,8 -t0 "01:00" -t1 "05:00" -p 1
+
+# Run on all node counts 20 times each (the default number of runs to establish a baseline)
+akrr task new -r $RESOURCE -a $APPKER -n 1,2,4,8 --n-runs 20
 ```
 
 see [Scheduling and Rescheduling Application Kernels](AKRR_Tasks_Scheduling.md) and 

docs/AKRR_GAMESS_Deployment.md (+3)

@@ -351,6 +351,9 @@ akrr task new -r $RESOURCE -a $APPKER -n 1,2,4,8
 
 #Start daily execution from today on nodes 1,2,4,8 and distribute execution time between 1:00 and 5:00
 akrr task new -r $RESOURCE -a $APPKER -n 1,2,4,8 -t0 "01:00" -t1 "05:00" -p 1
+
+# Run on all node counts 20 times each (the default number of runs to establish a baseline)
+akrr task new -r $RESOURCE -a $APPKER -n 1,2,4,8 --n-runs 20
 ```
 
 see [Scheduling and Rescheduling Application Kernels](AKRR_Tasks_Scheduling.md) and 

docs/AKRR_HPCC_Deployment.md (+3)

@@ -356,6 +356,9 @@ akrr task new -r $RESOURCE -a $APPKER -n 1,2,4,8
 
 #Start daily execution from today on nodes 1,2,4,8 and distribute execution time between 1:00 and 5:00
 akrr task new -r $RESOURCE -a $APPKER -n 1,2,4,8 -t0 "01:00" -t1 "05:00" -p 1
+
+# Run on all node counts 20 times each (the default number of runs to establish a baseline)
+akrr task new -r $RESOURCE -a $APPKER -n 1,2,4,8 --n-runs 20
 ```
 
 see [Scheduling and Rescheduling Application Kernels](AKRR_Tasks_Scheduling.md) and 

docs/AKRR_HPCG_Deployment.md (+3)

@@ -264,6 +264,9 @@ akrr task new -r $RESOURCE -a $APPKER -n 1,2,4,8
 
 #Start daily execution from today on nodes 1,2,4,8 and distribute execution time between 1:00 and 5:00
 akrr task new -r $RESOURCE -a $APPKER -n 1,2,4,8 -t0 "01:00" -t1 "05:00" -p 1
+
+# Run on all node counts 20 times each (the default number of runs to establish a baseline)
+akrr task new -r $RESOURCE -a $APPKER -n 1,2,4,8 --n-runs 20
 ```
 
 see [Scheduling and Rescheduling Application Kernels](AKRR_Tasks_Scheduling.md) and 

docs/AKRR_IMB_Deployment.md (+3)

@@ -295,6 +295,9 @@ akrr task new -r $RESOURCE -a $APPKER -n 2,4,8
 
 #Start daily execution from today on nodes 2,4,8 and distribute execution time between 1:00 and 5:00
 akrr task new -r $RESOURCE -a $APPKER -n 2,4,8 -t0 "01:00" -t1 "05:00" -p 1
+
+# Run on all node counts 20 times each (the default number of runs to establish a baseline)
+akrr task new -r $RESOURCE -a $APPKER -n 2,4,8 --n-runs 20
 ```
 
 see [Scheduling and Rescheduling Application Kernels](AKRR_Tasks_Scheduling.md) and 

docs/AKRR_IOR_Deployment.md (+4 -1)

@@ -41,7 +41,7 @@ installed parallel HDF5 library then you might want to skip it.
 Below are brief notes on parallel hdf5 installation; see
 [http://www.hdfgroup.org/HDF5/](http://www.hdfgroup.org/HDF5/) for HDF5 installation details.
 
-ior-3.2.0 does not work yet with hdf5-1.10.5, so use hdf5-1.8.*.
+> **Note:** ior-3.2.0 does not work with hdf5-1.10.\*, so use hdf5-1.8.\* for that version or use the development version of ior (3.3-dev).
 
 **On target resource:**
 ```bash

@@ -799,6 +799,9 @@ akrr task new -r $RESOURCE -a $APPKER -n 1,2,4,8
 
 #Start daily execution from today on nodes 1,2,4,8 and distribute execution time between 1:00 and 5:00
 akrr task new -r $RESOURCE -a $APPKER -n 1,2,4,8 -t0 "01:00" -t1 "05:00" -p 1
+
+# Run on all node counts 20 times each (the default number of runs to establish a baseline)
+akrr task new -r $RESOURCE -a $APPKER -n 1,2,4,8 --n-runs 20
 ```
 
 see [Scheduling and Rescheduling Application Kernels](AKRR_Tasks_Scheduling.md) and 

docs/AKRR_MDTest_Deployment.md (+3)

@@ -283,6 +283,9 @@ akrr task new -r $RESOURCE -a $APPKER -n 1,2,4,8
 
 #Start daily execution from today on nodes 1,2,4,8 and distribute execution time between 1:00 and 5:00
 akrr task new -r $RESOURCE -a $APPKER -n 1,2,4,8 -t0 "01:00" -t1 "05:00" -p 1
+
+# Run on all node counts 20 times each (the default number of runs to establish a baseline)
+akrr task new -r $RESOURCE -a $APPKER -n 1,2,4,8 --n-runs 20
 ```
 
 see [Scheduling and Rescheduling Application Kernels](AKRR_Tasks_Scheduling.md) and 

docs/AKRR_NAMD_Deployment.md (+4 -1)

@@ -504,11 +504,14 @@ DONE, you can move to next step!
 Now this application kernel can be submitted for regular execution:
 
 ```bash
-#Perform a test run on all nodes count
+# Perform a test run on all node counts
 akrr task new -r $RESOURCE -a $APPKER -n 1,2,4,8
 
 #Start daily execution from today on nodes 1,2,4,8 and distribute execution time between 1:00 and 5:00
 akrr task new -r $RESOURCE -a $APPKER -n 1,2,4,8 -t0 "01:00" -t1 "05:00" -p 1
+
+# Run on all node counts 20 times each (the default number of runs to establish a baseline)
+akrr task new -r $RESOURCE -a $APPKER -n all --n-runs 20
 ```
 
 see [Scheduling and Rescheduling Application Kernels](AKRR_Tasks_Scheduling.md) and 

docs/AKRR_NWChem_Deployment.md (+3)

@@ -307,6 +307,9 @@ akrr task new -r $RESOURCE -a $APPKER -n 1,2,4,8
 
 #Start daily execution from today on nodes 1,2,4,8 and distribute execution time between 1:00 and 5:00
 akrr task new -r $RESOURCE -a $APPKER -n 1,2,4,8 -t0 "01:00" -t1 "05:00" -p 1
+
+# Run on all node counts 20 times each (the default number of runs to establish a baseline)
+akrr task new -r $RESOURCE -a $APPKER -n 1,2,4,8 --n-runs 20
 ```
 
 see [Scheduling and Rescheduling Application Kernels](AKRR_Tasks_Scheduling.md) and 

docs/AKRR_Overview.md (-30)

This file was deleted.
