In the PPO code, if I add the action to the critic network's input vector by directly appending it as a single extra dimension, training performs very poorly; normalizing the four state values afterwards still did not help. How should the input be processed after adding the action to the critic's input so that training gives good results?
Thanks to the author for providing the code.
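For reference, a minimal sketch of the setup the question describes, assuming a CartPole-style environment (4-dimensional state, 2 discrete actions) and PyTorch; the class name and dimensions are illustrative, not from the repository. One common reason that appending the raw action index as a single dimension hurts is that it imposes a false ordinal scale on discrete actions, so a one-hot encoding is the usual alternative:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ActionInCritic(nn.Module):
    """Hypothetical critic that takes (state, action); the action is
    one-hot encoded instead of being appended as a single raw index."""
    def __init__(self, state_dim=4, n_actions=2, hidden_dim=64):
        super().__init__()
        self.n_actions = n_actions
        self.net = nn.Sequential(
            nn.Linear(state_dim + n_actions, hidden_dim),
            nn.Tanh(),
            nn.Linear(hidden_dim, 1),
        )

    def forward(self, state, action):
        # state: (batch, state_dim); action: (batch,) integer indices.
        # One-hot encoding avoids imposing a fake ordering (0 < 1) on
        # discrete actions, which a single raw scalar input would do.
        a = F.one_hot(action.long(), num_classes=self.n_actions).float()
        return self.net(torch.cat([state, a], dim=-1))
```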
Isn't PPO's critic meant to estimate the state value V(s)? Why add the action? That would turn it into DDPG, wouldn't it?
johnjim0816
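To illustrate the distinction in the reply: in standard PPO the critic estimates the state value V(s) from the state alone, while a critic that takes (state, action) and estimates Q(s, a) is the DDPG-style design. A minimal sketch, with illustrative names and dimensions:

```python
import torch
import torch.nn as nn

class PPOCritic(nn.Module):
    """Standard PPO critic: estimates V(s) from the state only.
    The action's effect enters through the advantage estimate,
    e.g. A(s, a) = r + gamma * V(s') - V(s), which uses the sampled
    reward and next state produced by the taken action."""
    def __init__(self, state_dim=4, hidden_dim=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden_dim),
            nn.Tanh(),
            nn.Linear(hidden_dim, 1),
        )

    def forward(self, state):
        return self.net(state)

class DDPGCritic(nn.Module):
    """DDPG-style critic: estimates Q(s, a), so the (continuous)
    action vector is concatenated into the input."""
    def __init__(self, state_dim, action_dim, hidden_dim=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, 1),
        )

    def forward(self, state, action):
        return self.net(torch.cat([state, action], dim=-1))
```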