Add tips in docs #5566

Open

wants to merge 4 commits into base: main
25 changes: 16 additions & 9 deletions docs/Learning-Environment-Create-New.md
@@ -231,8 +231,8 @@ calculate an analytical solution to the problem.
In our case, the information our Agent collects includes the position of the
target, the position of the agent itself, and the velocity of the agent. This
helps the Agent learn to control its speed so it doesn't overshoot the target
and roll off the platform. In total, the agent observation contains 8 values as
implemented below:

```csharp
public override void CollectObservations(VectorSensor sensor)
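{
    // NOTE: the method body is collapsed in this diff view; the lines below
    // reproduce the tutorial's implementation of the 8 observations described
    // above (Target and rBody are fields of the RollerAgent class).

    // Target and Agent positions: 3 floats each
    sensor.AddObservation(Target.localPosition);
    sensor.AddObservation(this.transform.localPosition);

    // Agent velocity along x and z: 2 floats
    sensor.AddObservation(rBody.velocity.x);
    sensor.AddObservation(rBody.velocity.z);
}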
```

@@ -260,14 +260,18 @@ the first determines the force applied along the x-axis; and the
second determines the force applied along the z-axis. (If we allowed the Agent
to move in three dimensions, then we would need a third action.)

The RollerAgent applies the values from `actionBuffers.ContinuousActions` to its Rigidbody
component `rBody`, using `Rigidbody.AddForce()`:

```csharp
public override void OnActionReceived(ActionBuffers actionBuffers)
{
    var continuousActions = actionBuffers.ContinuousActions;
    Vector3 controlSignal = Vector3.zero;
    controlSignal.x = continuousActions[0];
    controlSignal.z = continuousActions[1];
    rBody.AddForce(controlSignal * forceMultiplier);
}
```
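
Though not part of this hunk, the tutorial also overrides `Heuristic()` so you can drive these two actions from the keyboard and sanity-check the environment before training. A minimal sketch, assuming the default Unity `Horizontal` and `Vertical` input axes:

```csharp
public override void Heuristic(in ActionBuffers actionsOut)
{
    // Map keyboard input onto the two continuous actions for manual testing.
    var continuousActionsOut = actionsOut.ContinuousActions;
    continuousActionsOut[0] = Input.GetAxis("Horizontal");
    continuousActionsOut[1] = Input.GetAxis("Vertical");
}
```

With `Behavior Type` set to `Heuristic Only` in the `Behavior Parameters` component, this method is called in place of the trained policy.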

#### Rewards
@@ -315,9 +319,10 @@ public float forceMultiplier = 10;
```csharp
public override void OnActionReceived(ActionBuffers actionBuffers)
{
    // Actions, size = 2
    var continuousActions = actionBuffers.ContinuousActions;
    Vector3 controlSignal = Vector3.zero;
    controlSignal.x = continuousActions[0];
    controlSignal.z = continuousActions[1];
    rBody.AddForce(controlSignal * forceMultiplier);

    // Rewards (collapsed in this diff view; see the sketch below)
}
```
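
For reference, the collapsed reward section of `OnActionReceived` in the tutorial reads roughly as follows (a sketch from the tutorial source; `Target` is the agent's target Transform field):

```csharp
// Rewards
float distanceToTarget = Vector3.Distance(this.transform.localPosition, Target.localPosition);

// Reached target
if (distanceToTarget < 1.42f)
{
    SetReward(1.0f);
    EndEpisode();
}
// Fell off platform
else if (this.transform.localPosition.y < 0)
{
    EndEpisode();
}
```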
@@ -434,6 +439,8 @@ behaviors:

Hyperparameters are explained in [the training configuration file documentation](Training-Configuration-File.md).

Make sure the `Behavior Name` in the `Behavior Parameters` component matches the behavior key in the config file.
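
For example, if the `Behavior Name` field is `RollerBall`, the key under `behaviors:` in the config YAML must match it exactly, otherwise the trainer will not apply these settings. A trimmed sketch of the tutorial's config (the name `RollerBall` is the tutorial's; substitute your own):

```yaml
behaviors:
  RollerBall:        # must match the Behavior Name field exactly
    trainer_type: ppo
    hyperparameters:
      batch_size: 10
      buffer_size: 100
```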

Since this example creates a very simple training environment with only a few
inputs and outputs, using small batch and buffer sizes speeds up the training
considerably. However, if you add more complexity to the environment or change