Our pipeline relies on a Python interpreter to execute code generated by LLMs. This is a security risk, since we are executing arbitrary code that we do not fully control. To partially mitigate this, we provide a basic sandbox that we use to execute code and validate the correctness of LLM-generated answers.
Please note that the provided sandbox is not fully secure. We strongly encourage you to set up a properly configured virtual machine so that generated code runs in an unprivileged environment with no external network access unless necessary.
The default sandbox option used in our pipeline is a local Docker container. Check out nemo_skills/code_execution/local_sandbox for implementation details.
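To illustrate the general idea behind a container-based sandbox (this is a minimal sketch, not the actual nemo_skills implementation; the helper names, image, and resource limits below are illustrative assumptions), untrusted code can be launched in a throwaway container with networking disabled and resource caps applied:

```python
import subprocess


def build_sandbox_cmd(code: str) -> list[str]:
    # Hypothetical helper: builds a docker invocation that runs the given
    # Python snippet in a one-off container. The image name and limits are
    # illustrative, not the values used by nemo_skills.
    return [
        "docker", "run", "--rm",
        "--network", "none",     # block external network access
        "--memory", "512m",      # cap memory usage
        "--pids-limit", "64",    # limit the number of processes
        "python:3.11-slim",
        "python", "-c", code,
    ]


def run_in_sandbox(code: str, timeout: float = 10.0) -> subprocess.CompletedProcess:
    # Executes the command; requires a local Docker daemon to be running.
    return subprocess.run(
        build_sandbox_cmd(code), capture_output=True, text=True, timeout=timeout
    )
```

Even with flags like these, a container shares the host kernel, which is why a dedicated virtual machine remains the safer choice for truly untrusted code.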
Support for running code via AWS Lambda functions is coming soon!