SimDIT - Generation of DNN training spec json file #13

Open
kompalas opened this issue Sep 23, 2023 · 0 comments

Hello! The issue below refers to the genesys project.

I want to use the SimDIT framework for performance analysis of training custom DNN architectures. SimDIT requires a JSON file that describes the entire network in detail, and currently only ResNet50 (for ImageNet classification) is provided as a template (for example, SimDIT/DNN_Spec_Training/ResNet50_training_Spec_Locked.json).

I tried generating such a specification file for ResNet18 on the same dataset using the compile_benchmark.py script with different configuration/architecture files.

Trying the command below:

python compile_benchmark.py --model resnet18 --config genesys/examples/genesys/configs/benchmark_train.json

gave the error trace:

Number fusion layers: 32
Number fusion layers: 32
Traceback (most recent call last):
  File "compile_benchmark.py", line 138, in <module>
    compile_benchmark(fname,
  File "compile_benchmark.py", line 99, in compile_benchmark
    program.compile(verbose=verbose, finalize=True, stop_stage=stop_stage)
  File "/home/balkon00/anaconda3/envs/verigood_ml/lib/python3.8/site-packages/codelets-0.1.0-py3.8.egg/codelets/compiler/program.py", line 1166, in compile
    codelets = self.instantiate_all_codelets(node_sequence, verbose=verbose)
  File "/home/balkon00/anaconda3/envs/verigood_ml/lib/python3.8/site-packages/codelets-0.1.0-py3.8.egg/codelets/compiler/program.py", line 878, in instantiate_all_codelets
    cdlt = self.instantiate_codelet(n)
  File "/home/balkon00/anaconda3/envs/verigood_ml/lib/python3.8/site-packages/codelets-0.1.0-py3.8.egg/codelets/compiler/program.py", line 371, in instantiate_codelet
    cdlt_template = self.get_template_through_mapping(node)
  File "/home/balkon00/anaconda3/envs/verigood_ml/lib/python3.8/site-packages/codelets-0.1.0-py3.8.egg/codelets/compiler/program.py", line 359, in get_template_through_mapping
    raise RuntimeError(f"Unable to match node operation to codelet with the same name:\n"
RuntimeError: Unable to match node operation to codelet with the same name:
Node operation: reduce_sum
Input shape dimensions: [4]
Output shape dimensions: [1]
Codelet: reduce_sum
Codelet shapes: [2, 1]

I also tried lightly modifying existing configuration files to accept a training directive (for example, adding the line "TRAINING": true to genesys/examples/genesys/configs/benchmark_8x8.json), without success. This time the error trace was different:

Traceback (most recent call last):
 File "compile_benchmark.py", line 138, in <module>
   compile_benchmark(fname,
 File "compile_benchmark.py", line 55, in compile_benchmark
   program, _ = compile_full_model(model_name,
 File "/home/balkon00/anaconda3/envs/verigood_ml/lib/python3.8/site-packages/codelets-0.1.0-py3.8.egg/codelets/examples/genesys/genesys_network_sim.py", line 321, in compile_full_model
   program = compile_genesys(model_name,
 File "/home/balkon00/anaconda3/envs/verigood_ml/lib/python3.8/site-packages/codelets-0.1.0-py3.8.egg/codelets/examples/genesys/genesys.py", line 308, in compile_genesys
   graph = run_srdfg_passes(graph, def_cfg, batch_size=batch_size,
 File "/home/balkon00/anaconda3/envs/verigood_ml/lib/python3.8/site-packages/codelets-0.1.0-py3.8.egg/codelets/examples/genesys/genesys.py", line 227, in run_srdfg_passes
   graph = fusion_pass(graph)
 File "/home/balkon00/anaconda3/envs/verigood_ml/lib/python3.8/site-packages/polymath-0.1.0-py3.8.egg/polymath/srdfg/passes/__init__.py", line 229, in __call__
   initialized_node = self.initialize_pass(gcpy, self.ctx)
 File "/home/balkon00/anaconda3/envs/verigood_ml/lib/python3.8/site-packages/polymath-0.1.0-py3.8.egg/polymath/srdfg/passes/dnn_passes.py", line 286, in initialize_pass
   self.fuse_layers(graph, fused_nodes, pf)
 File "/home/balkon00/anaconda3/envs/verigood_ml/lib/python3.8/site-packages/polymath-0.1.0-py3.8.egg/polymath/srdfg/passes/dnn_passes.py", line 392, in fuse_layers
   self.topological_insert(graph, node)
 File "/home/balkon00/anaconda3/envs/verigood_ml/lib/python3.8/site-packages/polymath-0.1.0-py3.8.egg/polymath/srdfg/passes/dnn_passes.py", line 402, in topological_insert
   assert all([i.name in graph.nodes for i in node.inputs])
AssertionError
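
For reproducibility, the config modification described above amounts to roughly the following sketch. It assumes the config file is a single top-level JSON object. The helper name write_training_config and the output filename in the usage note are hypothetical; only the "TRAINING" key and the benchmark_8x8.json path come from the steps above.

```python
import json
from pathlib import Path


def write_training_config(src, dst):
    """Write a copy of the JSON config at src to dst with "TRAINING": true added.

    Assumption: the config is one top-level JSON object (a dict after parsing).
    """
    cfg = json.loads(Path(src).read_text())
    # Only this key is added or overwritten; all other entries are copied unchanged.
    cfg["TRAINING"] = True
    Path(dst).write_text(json.dumps(cfg, indent=4))
    return cfg
```

For example, write_training_config("genesys/examples/genesys/configs/benchmark_8x8.json", "benchmark_8x8_train.json") leaves the original config untouched, and the copy can then be passed to compile_benchmark.py through --config.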

I ensured that all required tools/packages were installed correctly.
How can I generate the DNN training specification file correctly?
