@@ -385,18 +385,29 @@ $ pushd $MODEL_WORK_DIR

To run training, use the following command.

+ > Note: for best performance, use the same value for the `--num-cores` and `--num-intra-threads` arguments, as follows:
+ > For a single-instance run (mpi_num_processes=1): the value is equal to the number of logical cores per socket.
+ > For a multi-instance run (mpi_num_processes > 1): the value is equal to (number of logical cores per socket - 2).
+ > If the `--num-cores` or `--num-intra-threads` args are not specified, they will be calculated based on
+ > the number of logical cores on your system.
+
```bash
$ cd $MODEL_WORK_DIR/models/benchmarks/

$ python3 launch_benchmark.py \
--data-location /path/to/coco-dataset \
--model-source-dir $MODEL_WORK_DIR/tf_models \
- --model-name ssd-resnet34 --framework tensorflow \
+ --model-name ssd-resnet34 \
+ --framework tensorflow \
--precision fp32 --mode training \
- --num-train-steps 100 --num-cores 52 \
- --num-inter-threads 1 --num-intra-threads 52 \
- --batch-size=52 --weight_decay=1e-4 \
- --mpi_num_processes=1 --mpi_num_processes_per_socket=1 \
+ --num-train-steps 100 \
+ --num-cores 52 \
+ --num-inter-threads 1 \
+ --num-intra-threads 52 \
+ --batch-size=100 \
+ --weight_decay=1e-4 \
+ --mpi_num_processes=1 \
+ --mpi_num_processes_per_socket=1 \
--docker-image intel/intel-optimized-tensorflow:2.3.0
```

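The note above boils down to a simple calculation. As a rough illustration (not part of the upstream scripts), the sketch below derives the recommended `--num-cores`/`--num-intra-threads` value; it assumes an English-locale Linux host where `lscpu` is available, and the variable names and the `MPI_NUM_PROCESSES` placeholder are hypothetical.

```bash
# Sketch: derive the recommended --num-cores / --num-intra-threads value from lscpu.
CORES_PER_SOCKET=$(lscpu | awk -F: '/^Core\(s\) per socket/ {gsub(/ /,"",$2); print $2}')
THREADS_PER_CORE=$(lscpu | awk -F: '/^Thread\(s\) per core/ {gsub(/ /,"",$2); print $2}')
LOGICAL_CORES_PER_SOCKET=$((CORES_PER_SOCKET * THREADS_PER_CORE))

MPI_NUM_PROCESSES=1   # hypothetical placeholder; set to your planned --mpi_num_processes value
if [ "$MPI_NUM_PROCESSES" -gt 1 ]; then
    # Multi-instance run: leave 2 logical cores per socket free
    NUM_INTRA_THREADS=$((LOGICAL_CORES_PER_SOCKET - 2))
else
    # Single-instance run: use all logical cores on one socket
    NUM_INTRA_THREADS=$LOGICAL_CORES_PER_SOCKET
fi
echo "Use --num-cores $NUM_INTRA_THREADS --num-intra-threads $NUM_INTRA_THREADS"
```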
@@ -408,22 +419,30 @@ $ pushd $MODEL_WORK_DIR
2. Next, navigate to the benchmarks directory of the intelai/models repository that was cloned earlier.
Use the below command to test performance by training the model for a limited number of steps:

- Note: for best performance, use the same value for the arguments num-cores and num-intra-threads as follows:
- For single instance run (mpi_num_processes=1): the value is equal to number of logical cores per socket.
- For multi-instance run (mpi_num_processes > 1): the value is equal to (#_of_logical_cores_per_socket - 2).
+ > Note: for best performance, use the same value for the `--num-cores` and `--num-intra-threads` arguments, as follows:
+ > For a single-instance run (mpi_num_processes=1): the value is equal to the number of logical cores per socket.
+ > For a multi-instance run (mpi_num_processes > 1): the value is equal to (number of logical cores per socket - 2).
+ > If the `--num-cores` or `--num-intra-threads` args are not specified, they will be calculated based on
+ > the number of logical cores on your system.

```bash
$ cd $MODEL_WORK_DIR/models/benchmarks/
$ python3 launch_benchmark.py \
--data-location <path to coco_training_dataset> \
--model-source-dir <path to tf_models> \
- --model-name ssd-resnet34 --framework tensorflow \
- --precision bfloat16 --mode training \
- --num-train-steps 100 --num-cores 52 \
- --num-inter-threads 1 --num-intra-threads 52 \
- --batch-size=100 --weight_decay=1e-4 \
+ --model-name ssd-resnet34 \
+ --framework tensorflow \
+ --precision bfloat16 \
+ --mode training \
+ --num-train-steps 100 \
+ --num-cores 52 \
+ --num-inter-threads 1 \
+ --num-intra-threads 52 \
+ --batch-size=100 \
+ --weight_decay=1e-4 \
--num_warmup_batches=20 \
- --mpi_num_processes=1 --mpi_num_processes_per_socket=1 \
+ --mpi_num_processes=1 \
+ --mpi_num_processes_per_socket=1 \
--docker-image intel/intel-optimized-tensorflow:2.3.0
```

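For the multi-instance case described in the note, only the core/thread counts and MPI settings change. The command below is a hypothetical variant of the bfloat16 run, assuming the same machine as the examples above (52 logical cores per socket, so 52 - 2 = 50) with two MPI processes, one per socket; it is illustrative, not taken from the upstream documentation.

```bash
# Hypothetical multi-instance (two-socket) variant of the bfloat16 training run.
# Assumes 52 logical cores per socket, so --num-cores/--num-intra-threads = 52 - 2 = 50.
$ python3 launch_benchmark.py \
    --data-location <path to coco_training_dataset> \
    --model-source-dir <path to tf_models> \
    --model-name ssd-resnet34 \
    --framework tensorflow \
    --precision bfloat16 \
    --mode training \
    --num-train-steps 100 \
    --num-cores 50 \
    --num-inter-threads 1 \
    --num-intra-threads 50 \
    --batch-size=100 \
    --weight_decay=1e-4 \
    --num_warmup_batches=20 \
    --mpi_num_processes=2 \
    --mpi_num_processes_per_socket=1 \
    --docker-image intel/intel-optimized-tensorflow:2.3.0
```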