The blinker wrapper can wrap any Gym environment, adding an additional, parallel action: observe. Choosing observe costs a configurable amount of reward (i.e. it produces a negative reward), but is required to obtain a fresh observation. If observe is not chosen, the observation remains stale. This forces agents to choose the best times to observe, and to skip observing when they can predict the relevant world state.
The render(human=true) method shows a visual indication whenever an observation is being made.
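
The observe-or-stay-stale behaviour described above can be sketched roughly as follows. This is a minimal illustration, not the wrapper's actual code: the names BlinkerSketch, CountingEnv and observe_cost are hypothetical, the action is assumed to be an (env_action, observe) pair, the classic Gym step() return of (obs, reward, done, info) is assumed, and reset() is assumed to hand out one fresh observation for free. To stay self-contained it uses a toy environment instead of importing Gym.

```python
class CountingEnv:
    """Toy stand-in for a Gym environment (classic reset/step API)."""

    def __init__(self):
        self.t = 0

    def reset(self):
        self.t = 0
        return self.t

    def step(self, action):
        # The observation is simply the current time step; reward is always 1.
        self.t += 1
        done = self.t >= 5
        return self.t, 1.0, done, {}


class BlinkerSketch:
    """Hypothetical sketch: a fresh observation costs `observe_cost` reward."""

    def __init__(self, env, observe_cost=0.1):
        self.env = env
        self.observe_cost = observe_cost  # assumed name for the configurable cost
        self._last_obs = None

    def reset(self):
        # Assumption: the initial observation is free.
        self._last_obs = self.env.reset()
        return self._last_obs

    def step(self, action):
        # Assumption: the parallel action is passed as (env_action, observe).
        env_action, observe = action
        obs, reward, done, info = self.env.step(env_action)
        if observe:
            # Pay the cost and refresh the stored observation.
            self._last_obs = obs
            reward -= self.observe_cost
        # Without observe, the previous (stale) observation is returned.
        return self._last_obs, reward, done, info


env = BlinkerSketch(CountingEnv(), observe_cost=0.25)
first = env.reset()
stale = env.step((0, False))  # no observe: observation stays at the reset value
fresh = env.step((0, True))   # observe: current observation, reward reduced by the cost
```

In this sketch the first step returns the same observation as reset() with the full reward, while the second step returns the up-to-date observation with the reward reduced by observe_cost.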