Commit 377a25c

Authored by vincentpierre, Ervin T, xstreck1, Chris Elion, and Jonathan Harper
Hotfixes for Release 0.15.1 (#3698)
* [bug-fix] Increase height of wall in CrawlerStatic (#3650)
* [bug-fix] Improve performance for PPO with continuous actions (#3662)
* Corrected a typo in a name of a function (#3670): OnEpsiodeBegin was corrected to OnEpisodeBegin in the Migrating.md document
* Add Academy.AutomaticSteppingEnabled to migration (#3666)
* Fix editor port in Dockerfile (#3674)
* Hotfix memory leak on Python (#3664)
  * Hotfix memory leak on Python
  * Fixing
  * Fixing a bug in the heuristic policy. A decision should not be requested when the agent is done
  * [bug-fix] Make Python able to deal with 0-step episodes (#3671)
  * adding some comments
  Co-authored-by: Ervin T <[email protected]>
* Remove vis_encode_type from list of required (#3677)
* Update changelog (#3678)
* Shorten timeout duration for environment close (#3679): the timeout duration for closing an environment was set to the same duration as the timeout when waiting for a response from the still-running environment. This led to long waits for the error response when the communication version wasn't matching. This change forces a timeout duration of 0 when handling errors. (See the illustrative sketch after this message.)
* Bumping the versions
* handle multiple dones in a single step (#3700)
* [tests] Make end-to-end tests more stable (#3697)
* [bug-fix] Fix entropy computation for GaussianDistribution (#3684)
* Fix how we set logging levels (#3703): cleanup logging; comments and cleanup; pylint, gym
* [skip-ci] Update changelog for logging fix. (#3707)
* [skip ci] Update README
* [skip ci] Fixed a typo

Co-authored-by: Ervin T <[email protected]>
Co-authored-by: Adam Streck <[email protected]>
Co-authored-by: Chris Elion <[email protected]>
Co-authored-by: Jonathan Harper <[email protected]>
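For the #3679 item above, a minimal sketch of the timeout idea in plain Python, with illustrative names only (this is not the mlagents_envs implementation): wait the normal duration while the environment is believed healthy, but poll without blocking once an error has already been detected, so the caller is not stuck waiting for a reply that will never arrive.

import queue

def get_response(response_queue: queue.Queue, had_error: bool, normal_timeout: float = 30.0):
    # When an error has already occurred, check the queue without blocking instead
    # of waiting out the full timeout for a response that may never come.
    if had_error:
        try:
            return response_queue.get_nowait()
        except queue.Empty:
            return None
    try:
        return response_queue.get(timeout=normal_timeout)
    except queue.Empty:
        return None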
1 parent 7507a5d commit 377a25c

45 files changed: +320 −147 lines

.pylintrc (+2)

@@ -44,3 +44,5 @@ disable =
     # Appears to be https://github.com/PyCQA/pylint/issues/2981
     W0201,
 
+    # Using the global statement
+    W0603,

Dockerfile (+4 −2)

@@ -132,7 +132,9 @@ COPY ml-agents /ml-agents
 WORKDIR /ml-agents
 RUN pip install -e .
 
-# port 5005 is the port used in in Editor training.
-EXPOSE 5005
+# Port 5004 is the port used in in Editor training.
+# Environments will start from port 5005,
+# so allow enough ports for several environments.
+EXPOSE 5004-5050
 
 ENTRYPOINT ["mlagents-learn"]
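The port comment in this diff corresponds to how the Python side chooses ports. As a hedged sketch (the build path below is hypothetical; `base_port` and `worker_id` were constructor arguments of `UnityEnvironment` in this release series), each concurrent environment instance communicates on base_port + worker_id, which is why the image now exposes a range starting at the editor port 5004 rather than a single port:

from mlagents_envs.environment import UnityEnvironment

# Editor training uses port 5004 (see k_EditorTrainingPort in Academy.cs below);
# built environments start at 5005, each extra worker using base_port + worker_id.
env = UnityEnvironment(file_name="./builds/3DBall", base_port=5005, worker_id=0)
env.reset()
env.close()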

Project/Assets/ML-Agents/Examples/Crawler/Prefabs/FixedPlatform.prefab (+8 −5)

@@ -1690,8 +1690,8 @@ MonoBehaviour:
   m_InferenceDevice: 0
   m_BehaviorType: 0
   m_BehaviorName: CrawlerStatic
-  m_TeamID: 0
-  m_useChildSensors: 1
+  TeamId: 0
+  m_UseChildSensors: 1
 --- !u!114 &114230237520033992
 MonoBehaviour:
   m_ObjectHideFlags: 0
@@ -1704,6 +1704,9 @@ MonoBehaviour:
   m_Script: {fileID: 11500000, guid: 2f37c30a5e8d04117947188818902ef3, type: 3}
   m_Name:
   m_EditorClassIdentifier:
+  agentParameters:
+    maxStep: 0
+  hasUpgradedFromAgentParameters: 1
   maxStep: 5000
   target: {fileID: 4749909135913778}
   ground: {fileID: 4856650706546504}
@@ -1759,7 +1762,7 @@ MonoBehaviour:
   m_Name:
   m_EditorClassIdentifier:
   DecisionPeriod: 5
-  RepeatAction: 0
+  TakeActionsBetweenDecisions: 0
   offsetStep: 0
 --- !u!1 &1492926997393242
 GameObject:
@@ -2959,8 +2962,8 @@ Transform:
   m_PrefabAsset: {fileID: 0}
   m_GameObject: {fileID: 1995322274649904}
   m_LocalRotation: {x: 0, y: -0, z: -0, w: 1}
-  m_LocalPosition: {x: -0, y: 0.5, z: 0}
-  m_LocalScale: {x: 0.01, y: 0.01, z: 0.01}
+  m_LocalPosition: {x: -0, y: 1.5, z: 0}
+  m_LocalScale: {x: 0.01, y: 0.03, z: 0.01}
   m_Children: []
   m_Father: {fileID: 4924174722017668}
   m_RootOrder: 1

README.md (+2 −1)

@@ -44,7 +44,7 @@ developer communities.
 * Train using concurrent Unity environment instances
 
 ## Releases & Documentation
-**Our latest, stable release is 0.15.0. Click
+**Our latest, stable release is 0.15.1. Click
 [here](docs/Readme.md) to
 get started with the latest release of ML-Agents.**
 
@@ -61,6 +61,7 @@ details of the changes between versions.
 
 | **Version** | **Release Date** | **Source** | **Documentation** | **Download** |
 |:-------:|:------:|:-------------:|:-------:|:------------:|
+| **0.15.0** | March 18, 2020 | [source](https://github.com/Unity-Technologies/ml-agents/tree/0.15.0) | [docs](https://github.com/Unity-Technologies/ml-agents/tree/0.15.0/docs/Readme.md) | [download](https://github.com/Unity-Technologies/ml-agents/archive/0.15.0.zip) |
 | **0.14.1** | February 26, 2020 | [source](https://github.com/Unity-Technologies/ml-agents/tree/0.14.1) | [docs](https://github.com/Unity-Technologies/ml-agents/tree/0.14.1/docs/Readme.md) | [download](https://github.com/Unity-Technologies/ml-agents/archive/0.14.1.zip) |
 | **0.14.0** | February 13, 2020 | [source](https://github.com/Unity-Technologies/ml-agents/tree/0.14.0) | [docs](https://github.com/Unity-Technologies/ml-agents/tree/0.14.0/docs/Readme.md) | [download](https://github.com/Unity-Technologies/ml-agents/archive/0.14.0.zip) |
 | **0.13.1** | January 21, 2020 | [source](https://github.com/Unity-Technologies/ml-agents/tree/0.13.1) | [docs](https://github.com/Unity-Technologies/ml-agents/tree/0.13.1/docs/Readme.md) | [download](https://github.com/Unity-Technologies/ml-agents/archive/0.13.1.zip) |

com.unity.ml-agents/CHANGELOG.md (+11)

@@ -5,6 +5,17 @@ The format is based on [Keep a Changelog](http://keepachangelog.com/en/1.0.0/)
 and this project adheres to [Semantic Versioning](http://semver.org/spec/v2.0.0.html).
 
 
+## [0.15.1-preview] - 2020-03-30
+### Bug Fixes
+- Raise the wall in CrawlerStatic scene to prevent Agent from falling off. (#3650)
+- Fixed an issue where specifying `vis_encode_type` was required only for SAC. (#3677)
+- Fixed the reported entropy values for continuous actions (#3684)
+- Fixed an issue where switching models using `SetModel()` during training would use an excessive amount of memory. (#3664)
+- Environment subprocesses now close immediately on timeout or wrong API version. (#3679)
+- Fixed an issue in the gym wrapper that would raise an exception if an Agent called EndEpisode multiple times in the same step. (#3700)
+- Fixed an issue where logging output was not visible; logging levels are now set consistently (#3703).
+
+
 ## [0.15.0-preview] - 2020-03-18
 ### Major Changes
 - `Agent.CollectObservations` now takes a VectorSensor argument. (#3352, #3389)

com.unity.ml-agents/Runtime/Academy.cs (+1 −1)

@@ -64,7 +64,7 @@ public class Academy : IDisposable
     /// Unity package version of com.unity.ml-agents.
     /// This must match the version string in package.json and is checked in a unit test.
     /// </summary>
-    internal const string k_PackageVersion = "0.15.0-preview";
+    internal const string k_PackageVersion = "0.15.1-preview";
 
     const int k_EditorTrainingPort = 5004;
 
com.unity.ml-agents/Runtime/Agent.cs (+2 −1)

@@ -315,6 +315,7 @@ protected virtual void OnDisable()
 
     void NotifyAgentDone(DoneReason doneReason)
     {
+        m_Info.episodeId = m_EpisodeId;
         m_Info.reward = m_Reward;
         m_Info.done = true;
         m_Info.maxStepReached = doneReason == DoneReason.MaxStepReached;
@@ -376,7 +377,7 @@ public void SetModel(
         // If everything is the same, don't make any changes.
         return;
     }
-
+    NotifyAgentDone(DoneReason.Disabled);
     m_PolicyFactory.model = model;
     m_PolicyFactory.inferenceDevice = inferenceDevice;
     m_PolicyFactory.behaviorName = behaviorName;

com.unity.ml-agents/Runtime/Communicator/RpcCommunicator.cs (+12 −5)

@@ -458,13 +458,20 @@ UnityRLInitializationOutputProto GetTempUnityRlInitializationOutput()
 {
     if (m_CurrentUnityRlOutput.AgentInfos.ContainsKey(behaviorName))
     {
-        if (output == null)
+        if (m_CurrentUnityRlOutput.AgentInfos[behaviorName].CalculateSize() > 0)
         {
-            output = new UnityRLInitializationOutputProto();
-        }
+            // Only send the BrainParameters if there is a non empty list of
+            // AgentInfos ready to be sent.
+            // This is to ensure that The Python side will always have a first
+            // observation when receiving the BrainParameters
+            if (output == null)
+            {
+                output = new UnityRLInitializationOutputProto();
+            }
 
-        var brainParameters = m_UnsentBrainKeys[behaviorName];
-        output.BrainParameters.Add(brainParameters.ToProto(behaviorName, true));
+            var brainParameters = m_UnsentBrainKeys[behaviorName];
+            output.BrainParameters.Add(brainParameters.ToProto(behaviorName, true));
+        }
     }
 }
 
com.unity.ml-agents/Runtime/Policies/HeuristicPolicy.cs (+4 −1)

@@ -29,7 +29,10 @@ public HeuristicPolicy(Func<float[]> heuristic)
     public void RequestDecision(AgentInfo info, List<ISensor> sensors)
     {
         StepSensors(sensors);
-        m_LastDecision = m_Heuristic.Invoke();
+        if (!info.done)
+        {
+            m_LastDecision = m_Heuristic.Invoke();
+        }
     }
 
     /// <inheritdoc />

com.unity.ml-agents/package.json (+1 −1)

@@ -1,7 +1,7 @@
 {
   "name": "com.unity.ml-agents",
   "displayName": "ML Agents",
-  "version": "0.15.0-preview",
+  "version": "0.15.1-preview",
   "unity": "2018.4",
   "description": "Add interactivity to your game with Machine Learning Agents trained using Deep Reinforcement Learning.",
   "dependencies": {

docs/Migrating.md (+3 −1)

@@ -34,6 +34,7 @@ The versions can be found in
 * The interface for SideChannels was changed:
   * In C#, `OnMessageReceived` now takes a `IncomingMessage` argument, and `QueueMessageToSend` takes an `OutgoingMessage` argument.
   * In python, `on_message_received` now takes a `IncomingMessage` argument, and `queue_message_to_send` takes an `OutgoingMessage` argument.
+* Automatic stepping for Academy is now controlled from the AutomaticSteppingEnabled property.
 
 ### Steps to Migrate
 * Add the `using MLAgents.Sensors;` in addition to `using MLAgents;` on top of your Agent's script.
@@ -45,11 +46,12 @@ The versions can be found in
 * We strongly recommend replacing the following methods with their new equivalent as they will be removed in a later release:
   * `InitializeAgent()` to `Initialize()`
   * `AgentAction()` to `OnActionReceived()`
-  * `AgentReset()` to `OnEpsiodeBegin()`
+  * `AgentReset()` to `OnEpisodeBegin()`
   * `Done()` to `EndEpisode()`
   * `GiveModel()` to `SetModel()`
 * Replace `IFloatProperties` variables with `FloatPropertiesChannel` variables.
 * If you implemented custom `SideChannels`, update the signatures of your methods, and add your data to the `OutgoingMessage` or read it from the `IncomingMessage`.
+* Replace calls to Academy.EnableAutomaticStepping()/DisableAutomaticStepping() with Academy.AutomaticSteppingEnabled = true/false.
 
 ## Migrating from 0.13 to 0.14
 
gym-unity/gym_unity/__init__.py (+1 −1)

@@ -1 +1 @@
-__version__ = "0.15.0"
+__version__ = "0.15.1"

gym-unity/gym_unity/envs/__init__.py (+21 −11)

@@ -1,4 +1,3 @@
-import logging
 import itertools
 import numpy as np
 from typing import Any, Dict, List, Optional, Tuple, Union
@@ -8,6 +7,7 @@
 
 from mlagents_envs.environment import UnityEnvironment
 from mlagents_envs.base_env import BatchedStepResult
+from mlagents_envs import logging_util
 
 
 class UnityGymException(error.Error):
@@ -18,9 +18,8 @@ class UnityGymException(error.Error):
     pass
 
 
-logging.basicConfig(level=logging.INFO)
-logger = logging.getLogger("gym_unity")
-
+logger = logging_util.get_logger(__name__)
+logging_util.set_log_level(logging_util.INFO)
 
 GymSingleStepResult = Tuple[np.ndarray, float, bool, Dict]
 GymMultiStepResult = Tuple[List[np.ndarray], List[float], List[bool], Dict]
@@ -364,9 +363,8 @@ def _check_agents(self, n_agents: int) -> None:
 
     def _sanitize_info(self, step_result: BatchedStepResult) -> BatchedStepResult:
         n_extra_agents = step_result.n_agents() - self._n_agents
-        if n_extra_agents < 0 or n_extra_agents > self._n_agents:
+        if n_extra_agents < 0:
             # In this case, some Agents did not request a decision when expected
-            # or too many requested a decision
             raise UnityGymException(
                 "The number of agents in the scene does not match the expected number."
             )
@@ -386,6 +384,10 @@ def _sanitize_info(self, step_result: BatchedStepResult) -> BatchedStepResult:
         # only cares about the ordering.
         for index, agent_id in enumerate(step_result.agent_id):
             if not self._previous_step_result.contains_agent(agent_id):
+                if step_result.done[index]:
+                    # If the Agent is already done (e.g. it ended its epsiode twice in one step)
+                    # Don't try to register it here.
+                    continue
                 # Register this agent, and get the reward of the previous agent that
                 # was in its index, so that we can return it to the gym.
                 last_reward = self.agent_mapper.register_new_agent_id(agent_id)
@@ -528,8 +530,12 @@ def mark_agent_done(self, agent_id: int, reward: float) -> None:
         """
         Declare the agent done with the corresponding final reward.
         """
-        gym_index = self._agent_id_to_gym_index.pop(agent_id)
-        self._done_agents_index_to_last_reward[gym_index] = reward
+        if agent_id in self._agent_id_to_gym_index:
+            gym_index = self._agent_id_to_gym_index.pop(agent_id)
+            self._done_agents_index_to_last_reward[gym_index] = reward
+        else:
+            # Agent was never registered in the first place (e.g. EndEpisode called multiple times)
+            pass
 
     def register_new_agent_id(self, agent_id: int) -> float:
         """
@@ -581,9 +587,13 @@ def set_initial_agents(self, agent_ids: List[int]) -> None:
         self._gym_id_order = list(agent_ids)
 
     def mark_agent_done(self, agent_id: int, reward: float) -> None:
-        gym_index = self._gym_id_order.index(agent_id)
-        self._done_agents_index_to_last_reward[gym_index] = reward
-        self._gym_id_order[gym_index] = -1
+        try:
+            gym_index = self._gym_id_order.index(agent_id)
+            self._done_agents_index_to_last_reward[gym_index] = reward
+            self._gym_id_order[gym_index] = -1
+        except ValueError:
+            # Agent was never registered in the first place (e.g. EndEpisode called multiple times)
+            pass
 
     def register_new_agent_id(self, agent_id: int) -> float:
         original_index = self._gym_id_order.index(-1)
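The logging change above replaces the wrapper's ad-hoc logging setup with mlagents_envs' shared helper. A small usage sketch, assuming only the two functions visible in this diff and the INFO level constant shown there:

from mlagents_envs import logging_util

# Obtain a logger the same way the gym wrapper now does, and set the shared
# ML-Agents verbosity in one place instead of calling logging.basicConfig().
logger = logging_util.get_logger(__name__)
logging_util.set_log_level(logging_util.INFO)
logger.info("ML-Agents logging configured")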

gym-unity/gym_unity/tests/test_gym.py (+48)

@@ -129,6 +129,50 @@ def test_sanitize_action_one_agent_done(mock_env):
         assert expected_agent_id == agent_id
 
 
+@mock.patch("gym_unity.envs.UnityEnvironment")
+def test_sanitize_action_new_agent_done(mock_env):
+    mock_spec = create_mock_group_spec(
+        vector_action_space_type="discrete", vector_action_space_size=[2, 2, 3]
+    )
+    mock_step = create_mock_vector_step_result(num_agents=3)
+    mock_step.agent_id = np.array(range(5))
+    setup_mock_unityenvironment(mock_env, mock_spec, mock_step)
+    env = UnityEnv(" ", use_visual=False, multiagent=True)
+
+    received_step_result = create_mock_vector_step_result(num_agents=7)
+    received_step_result.agent_id = np.array(range(7))
+    # agent #3 (id = 2) is Done
+    # so is the "new" agent (id = 5)
+    done = [False] * 7
+    done[2] = True
+    done[5] = True
+    received_step_result.done = np.array(done)
+    sanitized_result = env._sanitize_info(received_step_result)
+    for expected_agent_id, agent_id in zip([0, 1, 6, 3, 4], sanitized_result.agent_id):
+        assert expected_agent_id == agent_id
+
+
+@mock.patch("gym_unity.envs.UnityEnvironment")
+def test_sanitize_action_single_agent_multiple_done(mock_env):
+    mock_spec = create_mock_group_spec(
+        vector_action_space_type="discrete", vector_action_space_size=[2, 2, 3]
+    )
+    mock_step = create_mock_vector_step_result(num_agents=1)
+    mock_step.agent_id = np.array(range(1))
+    setup_mock_unityenvironment(mock_env, mock_spec, mock_step)
+    env = UnityEnv(" ", use_visual=False, multiagent=False)
+
+    received_step_result = create_mock_vector_step_result(num_agents=3)
+    received_step_result.agent_id = np.array(range(3))
+    # original agent (id = 0) is Done
+    # so is the "new" agent (id = 1)
+    done = [True, True, False]
+    received_step_result.done = np.array(done)
+    sanitized_result = env._sanitize_info(received_step_result)
+    for expected_agent_id, agent_id in zip([2], sanitized_result.agent_id):
+        assert expected_agent_id == agent_id
+
+
 # Helper methods
 
 
@@ -200,6 +244,10 @@ def test_agent_id_index_mapper(mapper_cls):
     mapper.mark_agent_done(1001, 42.0)
     mapper.mark_agent_done(1004, 1337.0)
 
+    # Make sure we can handle an unknown agent id being marked done.
+    # This can happen when an agent ends an episode on the same step it starts.
+    mapper.mark_agent_done(9999, -1.0)
+
     # Now add new agents, and get the rewards of the agent they replaced.
     old_reward1 = mapper.register_new_agent_id(2001)
    old_reward2 = mapper.register_new_agent_id(2002)
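For context, a hedged sketch of how the wrapper under test is typically driven from user code (the build path is hypothetical; the constructor arguments mirror those used in the tests above, and multiagent mode returns the per-agent lists declared in GymMultiStepResult). The fixes in this commit matter when an agent ends its episode on the same step it first appears in the results:

from gym_unity.envs import UnityEnv

# multiagent=True makes reset()/step() return per-agent lists rather than single values.
env = UnityEnv("./builds/CrawlerStatic", use_visual=False, multiagent=True)
observations = env.reset()
for _ in range(10):
    actions = [env.action_space.sample() for _ in observations]
    observations, rewards, dones, info = env.step(actions)
env.close()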
+1 −1

@@ -1 +1 @@
-__version__ = "0.15.0"
+__version__ = "0.15.1"
