The Pyramid environment
The goal in this environment is to train our agent to get the gold brick on top of the Pyramid. To do that, it needs to press a button to spawn a Pyramid, navigate to the Pyramid, knock it over, and move to the gold brick at the top.
The reward function
The reward function is: the agent gets a reward of +2 when it reaches the gold brick, plus a small existential penalty of -0.001 at every step, which pushes it to reach the brick as fast as possible.
In terms of code, the logic looks roughly like this:
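The environment itself is implemented in Unity (C#), so the snippet below is only a minimal Python sketch of that reward logic; the names `GOAL_REWARD`, `STEP_PENALTY`, and `compute_reward` are illustrative, not part of the actual environment code.

```python
# Minimal sketch of the Pyramids reward logic (illustrative Python,
# not the actual Unity C# implementation).
GOAL_REWARD = 2.0       # reward for reaching the gold brick
STEP_PENALTY = -0.001   # existential penalty applied at every step

def compute_reward(reached_gold_brick: bool) -> float:
    """Return the reward for the current step."""
    reward = STEP_PENALTY          # always pay the small per-step penalty
    if reached_gold_brick:
        reward += GOAL_REWARD      # big bonus when the gold brick is reached
    return reward
```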
To train this new agent, which must find the button and then knock over the Pyramid, we’ll use a combination of two types of rewards:
- The extrinsic one given by the environment (the reward function described above).
- But also an intrinsic one called curiosity. This second reward pushes our agent to be curious, in other words, to explore its environment better.
If you want to know more about curiosity, the next section (optional) will explain the basics.
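In practice, ML-Agents treats these as separate reward signals (typically an `extrinsic` and a `curiosity` entry in the trainer configuration), each with its own strength. The snippet below is only a sketch of the weighted-sum idea behind that mixing; the strength values and the `total_reward` helper are illustrative assumptions, not the library's API.

```python
# Sketch: combining the extrinsic reward with the curiosity bonus.
# The strengths below are assumed example values, not official defaults.
EXTRINSIC_STRENGTH = 1.0    # weight of the environment (extrinsic) reward
CURIOSITY_STRENGTH = 0.02   # weight of the intrinsic curiosity bonus

def total_reward(extrinsic_reward: float, curiosity_bonus: float) -> float:
    """Weighted sum of the extrinsic reward and the curiosity bonus."""
    return (EXTRINSIC_STRENGTH * extrinsic_reward
            + CURIOSITY_STRENGTH * curiosity_bonus)
```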
The observation space
In terms of observation, we use 148 raycasts that can each detect objects (switch, bricks, the gold brick, and walls).
We also use a boolean variable indicating the switch state (whether the switch that spawns the Pyramid has been turned on) and a vector that contains the agent’s speed.
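As a rough sketch of how those pieces could be flattened into a single observation vector (the exact layout and ordering inside ML-Agents may differ; `build_observation` and the shapes below are illustrative assumptions):

```python
import numpy as np

N_RAYCASTS = 148  # raycast readings (switch, bricks, gold brick, walls)

def build_observation(raycasts: np.ndarray,
                      switch_on: bool,
                      velocity: np.ndarray) -> np.ndarray:
    """Concatenate the raycast readings, the switch state, and the agent's speed."""
    assert raycasts.shape == (N_RAYCASTS,)
    return np.concatenate([
        raycasts,                       # what the agent "sees"
        np.array([float(switch_on)]),   # has the switch been turned on?
        velocity,                       # agent's speed, e.g. a 3-D vector
    ])
```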
The action space
The action space is discrete with four possible actions: move forward, move backward, rotate left, and rotate right.
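Below is a small illustrative mapping from a discrete action index to the corresponding movement; the index order and the `apply_action` helper are assumptions for the sketch, not the environment's actual code.

```python
# Sketch: mapping a discrete action index to the agent's movement.
# The index order is an assumption; the real mapping lives in the Unity agent.
def apply_action(action: int) -> tuple[float, float]:
    """Return (forward_motion, rotation) for a discrete action index."""
    forward, rotation = 0.0, 0.0
    if action == 0:
        forward = 1.0     # move forward
    elif action == 1:
        forward = -1.0    # move backward
    elif action == 2:
        rotation = -1.0   # rotate left
    elif action == 3:
        rotation = 1.0    # rotate right
    return forward, rotation
```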