source code: https://github.com/cchen156/Learning-to-See-in-the-Dark
- Run
pip install tensorflow tensorflowonspark
on all the machines (Dom0, VM1 - VM8); a loop doing this over SSH is sketched below.
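A minimal sketch for installing on every node in one pass, assuming passwordless SSH and that the hostnames below (which are placeholders, not the cluster's real names) resolve from the machine you run it on:

```sh
# Hypothetical hostnames; substitute the actual names of Dom0 and VM1 - VM8.
for host in dom0 vm1 vm2 vm3 vm4 vm5 vm6 vm7 vm8; do
  ssh "$host" pip install tensorflow tensorflowonspark
done
```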
- Add the following lines to the /etc/profile file (LD_LIBRARY_PATH must include the libjvm.so and libhdfs.so directories so TensorFlow can reach HDFS, matching the spark.executorEnv.LD_LIBRARY_PATH setting used below):
export QUEUE=default
export LIB_HDFS=$HADOOP_HOME/lib/native
export LIB_JVM=$JAVA_HOME/jre/lib/amd64/server
export SPARK_HOME=/opt/spark-2.4.0-bin-hadoop2.7
export LD_LIBRARY_PATH=${LIB_JVM}:${LIB_HDFS}
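A quick sanity check (a sketch; exact paths vary across Hadoop and JDK layouts) is to confirm the native libraries the executors need are where the variables point:

```sh
# Pick up the new environment and check for libjvm.so / libhdfs.so.
source /etc/profile
ls "$LIB_JVM/libjvm.so" "$LIB_HDFS/libhdfs.so"
```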
- Test run (6 images, 10 epochs, batch size 2). The input directory with the test dataset is hdfs://gpu10:9000/Sony_pickle_test/ and the model output is hdfs://gpu10:9000/Sony_model_test. With 6 images at batch size 2, one epoch is 3 steps, so 10 epochs correspond to the --steps 30 below.
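Before submitting, it is worth confirming the pickled dataset is actually in place on HDFS (paths as above):

```sh
# List the image and ground-truth pickles for the test dataset.
hdfs dfs -ls hdfs://gpu10:9000/Sony_pickle_test/image_data
hdfs dfs -ls hdfs://gpu10:9000/Sony_pickle_test/gt_data
```

With the data in place, submit the training job: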
${SPARK_HOME}/bin/spark-submit \
--master yarn \
--deploy-mode cluster \
--num-executors 15 \
--driver-memory 3G \
--executor-memory 3G \
--py-files /home/hduser/see-in-the-dark/train_Sony.py,/home/hduser/see-in-the-dark/inference_Sony.py,/home/hduser/see-in-the-dark/inference_Sony_our.py \
--conf spark.dynamicAllocation.enabled=false \
--conf spark.yarn.maxAppAttempts=1 \
--conf spark.executorEnv.LD_LIBRARY_PATH=$LIB_JVM:$LIB_HDFS \
--conf spark.driver.memory=3G \
--conf spark.executor.memory=3G \
--conf spark.driver.maxResultSize=2G \
--conf spark.executor.cores=1 \
--conf spark.task.cpus=1 \
/home/hduser/see-in-the-dark/script.py \
--batch_size 2 \
--steps 30 \
--model hdfs://gpu10:9000/Sony_model_test \
--input-dir hdfs://gpu10:9000/Sony_pickle_test/image_data \
--gt-dir hdfs://gpu10:9000/Sony_pickle_test/gt_data
To run in client mode, replace the corresponding lines in the command above with:
--deploy-mode client \
--driver-memory 1G \
--conf spark.yarn.am.memory=1G \
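In cluster mode the driver runs inside YARN, so its output is not printed locally; once the application finishes you can pull the logs with the application ID that spark-submit reports:

```sh
# <application_id> is printed by spark-submit (application_XXXXXXXXXX_XXXX).
yarn logs -applicationId <application_id>
```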
- Full dataset. The input directory with the full dataset is hdfs://gpu10:9000/Sony_pickle/ and the model output is hdfs://gpu10:9000/Sony_model.
${SPARK_HOME}/bin/spark-submit \
--master yarn \
--deploy-mode cluster \
--num-executors 15 \
--driver-memory 3G \
--executor-memory 3G \
--py-files /home/hduser/see-in-the-dark/train_Sony.py,/home/hduser/see-in-the-dark/inference_Sony.py,/home/hduser/see-in-the-dark/inference_Sony_our.py \
--conf spark.dynamicAllocation.enabled=false \
--conf spark.yarn.maxAppAttempts=1 \
--conf spark.executorEnv.LD_LIBRARY_PATH=$LIB_JVM:$LIB_HDFS \
--conf spark.driver.memory=3G \
--conf spark.executor.memory=3G \
--conf spark.driver.maxResultSize=2G \
--conf spark.executor.cores=1 \
--conf spark.task.cpus=1 \
/home/hduser/see-in-the-dark/script.py
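After training completes, it is worth checking that the model actually landed on HDFS before running inference (exact checkpoint file names depend on how script.py saves the model):

```sh
hdfs dfs -ls hdfs://gpu10:9000/Sony_model
```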
- Inference. Run the trained model on a single pickled image and write the result to testResult.pkl:
${SPARK_HOME}/bin/spark-submit \
--master yarn \
--deploy-mode cluster \
--queue ${QUEUE} \
--num-executors 15 \
--driver-memory 3G \
--executor-memory 3G \
--py-files /tmp/pycharm_rustam/train_Sony.py,/tmp/pycharm_rustam/inference_Sony.py,/tmp/pycharm_rustam/inference_Sony_our.py \
--conf spark.dynamicAllocation.enabled=false \
--conf spark.yarn.maxAppAttempts=1 \
--conf spark.executorEnv.LD_LIBRARY_PATH=$LIB_JVM:$LIB_HDFS \
--conf spark.driver.memory=3G \
--conf spark.executor.memory=3G \
--conf spark.driver.maxResultSize=2G \
--conf spark.executor.cores=1 \
--conf spark.task.cpus=1 \
/tmp/pycharm_rustam/script.py \
--mode inference \
--steps 1 \
--model hdfs://gpu10:9000/Sony_model \
--inference our \
--inputfile hdfs://gpu10:9000/predict_images/20005_01_0.1s.ARW20190418-150337.pkl \
--outputfile testResult.pkl
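To take a quick look at the inference output, a minimal sketch, assuming testResult.pkl lands in the local working directory and holds a pickled NumPy image array (neither is guaranteed by the command above):

```sh
# Unpickle the result and report its type and shape.
python -c "import pickle; d = pickle.load(open('testResult.pkl', 'rb')); print(type(d), getattr(d, 'shape', None))"
```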
- To start the Flask application, run the following commands:
cd flask_app
source flaskapp/bin/activate
export FLASK_APP=flask_app.py
flask run --host=0.0.0.0 --port=6000
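Once the server is up, you can confirm it responds before connecting through the VPN (run from another shell on the same host):

```sh
curl -i http://127.0.0.1:6000/
```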
- Connect to vpn.cs.hku.hk, then open http://202.45.128.135:22610/ in a browser.
- Upload an ARW image to the cluster via the web application. Uploading and processing might take 2-4 minutes depending on file size and network speed.