Deep Reinforcement Learning for Navigation in Dynamic Digital Environments

Non-player characters (NPCs) are a crucial component of modern video game development, as they directly influence player immersion. A core capability of most NPCs is movement: the ability of an agent to take actions in order to navigate from one location in the world to another. The most common approach to navigation is the Navigation Mesh (NavMesh), a graph representation of the traversable environment. As video game budgets have expanded, so too has the demand for both increased NPC capabilities and world complexity. Unfortunately, complex navigation abilities that extend basic locomotion, e.g. double jumps, wall running, or grapple hooks, greatly increase NavMesh complexity and are intractable for large-scale environments. Additionally, environments that incorporate dynamic components, such as moving platforms, often require specialized mechanics to ensure optimal, bug-free pathing at runtime and are therefore rarely implemented in production. To expand the design space of NPC behavior and level design available to game developers, we investigate the application of Deep Reinforcement Learning (DRL) to allow an NPC agent to learn for itself how best to apply complex navigation actions in both static and dynamic environments. Moreover, to improve industry adoption of DRL-based navigation solutions, this research focuses on developing generalized agents, which perform well across many unseen environments, that can be trained on non-server hardware within reasonable timeframes.


Video games are the largest entertainment medium in the world, valued at over $200 billion USD in gross revenue. To maximize revenue, a successful video game product requires both a low development cost and distinctive product features. In the AAA game development space, developing distinctive features such as graphical realism, expansive open worlds, and accurate physics systems is an expensive venture due to long development cycles and high hardware requirements. In the indie game development space, however, distinctive features that reduce development time and cost are ideal, such as unique story concepts, inventive game mechanics, and Procedural Content Generation, as these allow for rapid prototyping and shorter development cycles. One feature seldom focused on in either space is Artificial Intelligence (AI), an integral system in most games and a core factor in establishing player immersion. Due to AI's ubiquity in video games, any improvement over industry-standard methods has the potential to increase the range of player experiences while also reducing production costs if it can be automated across environments.

Artificial Intelligence is utilized in video games to enhance player immersion through convincing and interactive non-playable characters (NPCs). NPCs occur in a variety of types (allies, enemies, bystanders, player proxies, etc.) and perform many actions (attack, converse, observe, etc.), but the majority share one common action: movement. Movement can be performed in a variety of ways (walking, flying, swimming, etc.), but most games implement it with basic NavMesh methods using a typical A* pathfinding algorithm. While this standard handles basic ground movement well, it is unsuitable for non-typical movement patterns such as jumps, dashes, wall-jumps, and grapple hooks without either edge links that connect non-adjacent meshes or sophisticated mathematics to determine possible trajectories; the former results in unnatural movement patterns, while the latter is prone to producing off-mesh agents. Both approaches increase graph complexity and either scale poorly if automated for large environments or, if handled manually, require a large number of man-hours to place and tune for the desired behavior.
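For reference, the A* search that underlies standard NavMesh pathfinding can be sketched as follows. The graph, node names, and straight-line heuristic below are illustrative assumptions for the sketch, not tied to any particular engine's NavMesh representation:

```python
import heapq
import math

def a_star(graph, coords, start, goal):
    """A* over a NavMesh-style graph.

    graph:  dict mapping node -> list of (neighbor, edge_cost)
    coords: dict mapping node -> (x, y) position, used by the heuristic
    """
    def h(n):  # admissible heuristic: straight-line distance to the goal
        return math.dist(coords[n], coords[goal])

    frontier = [(h(start), start)]  # priority queue ordered by f = g + h
    g_cost = {start: 0.0}
    came_from = {}
    while frontier:
        _, node = heapq.heappop(frontier)
        if node == goal:  # reconstruct the path by walking parents backwards
            path = [node]
            while node in came_from:
                node = came_from[node]
                path.append(node)
            return path[::-1]
        for nbr, cost in graph.get(node, []):
            g = g_cost[node] + cost
            if g < g_cost.get(nbr, float("inf")):
                g_cost[nbr] = g
                came_from[nbr] = node
                heapq.heappush(frontier, (g + h(nbr), nbr))
    return None  # goal unreachable

# Toy adjacency graph standing in for NavMesh polygon connectivity
coords = {"A": (0, 0), "B": (1, 0), "C": (1, 1), "D": (2, 1)}
graph = {"A": [("B", 1.0)], "B": [("C", 1.0), ("D", 1.5)], "C": [("D", 1.0)]}
print(a_star(graph, coords, "A", "D"))  # → ['A', 'B', 'D']
```

The heuristic steers the search toward the goal without sacrificing optimality, which is why A* remains the default for static NavMesh graphs; the limitations discussed above arise from what the graph can represent, not from the search itself.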

Additionally, NavMesh methods are typically not used in dynamic environments, i.e. environments with features that affect or react to player actions, as these tend to require custom solutions to update the optimal path or frequently result in off-mesh agents. As such, game designers typically avoid pathing NPCs through dynamic elements or create environments without such features. Alternative methods to NavMesh navigation would give game designers a larger design space for NPC behavior and level design, as they would no longer be constrained by the limitations of the industry standard, allowing for new experiences for players and distinctive features for new video game products.



The core focus of this project is the use of Deep Reinforcement Learning algorithms to train agents to navigate a 3D virtual environment created in the Unity game engine. By taking actions in the environment, the agent learns how best to utilize the actions at its disposal to optimize its path to the goal position.

The agent can observe its environment in a variety of ways:

  • Vector Observations
    • Agent position relative to the goal
    • Agent distance to the goal
    • Agent velocity
    • Agent acceleration
    • Last actions taken by the agent
  • Basic Raycasting
  • Depth Maps
    • Cast rays in front of the agent to observe obstacles, represented as a grayscale texture
  • 3D Occupancy Maps
    • Create a voxel representation of the world via boxcasting
    • Each voxel holds an integer value representing the objects it contains
    • The agent samples its local region each frame

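As an illustration, the vector observations listed above could be assembled into a single flat array as follows. The field sizes (3D world-space vectors, a 4-dimensional last action) are assumptions for the sketch; in practice Unity ML-Agents collects these values through its own sensor API:

```python
import numpy as np

def build_vector_obs(agent_pos, goal_pos, velocity, acceleration, last_action):
    """Concatenate the vector observations into one flat float32 array.

    All inputs are 3D world-space vectors except last_action, whose size
    depends on the agent's action space (assumed 4D here for illustration).
    """
    to_goal = goal_pos - agent_pos                      # position relative to goal
    distance = np.linalg.norm(to_goal, keepdims=True)   # scalar distance to goal
    return np.concatenate(
        [to_goal, distance, velocity, acceleration, last_action]
    ).astype(np.float32)

obs = build_vector_obs(
    agent_pos=np.array([0.0, 1.0, 0.0]),
    goal_pos=np.array([3.0, 1.0, 4.0]),
    velocity=np.zeros(3),
    acceleration=np.zeros(3),
    last_action=np.zeros(4),
)
print(obs.shape)  # → (14,)
```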

Each of these observation types requires a different observation encoding for the neural network to learn from, such as linear layers, 2D Convolutional Neural Networks (CNNs), and 3D CNNs.
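A minimal sketch of how the three encoder types might be laid out in PyTorch follows. The layer sizes, channel counts, and input resolutions are illustrative assumptions, not the project's actual architecture:

```python
import torch
import torch.nn as nn

# Vector observations -> linear layers (input size assumed to be 14)
vector_encoder = nn.Sequential(
    nn.Linear(14, 64), nn.ReLU(),
)

# Depth map (single-channel grayscale texture) -> 2D CNN
depth_encoder = nn.Sequential(
    nn.Conv2d(1, 16, kernel_size=5, stride=2), nn.ReLU(),
    nn.Conv2d(16, 32, kernel_size=3, stride=2), nn.ReLU(),
    nn.Flatten(),
)

# Local voxel occupancy grid -> 3D CNN
occupancy_encoder = nn.Sequential(
    nn.Conv3d(1, 8, kernel_size=3, stride=2), nn.ReLU(),
    nn.Flatten(),
)

v = vector_encoder(torch.zeros(1, 14))                 # -> (1, 64)
d = depth_encoder(torch.zeros(1, 1, 32, 32))           # -> (1, 1152)
o = occupancy_encoder(torch.zeros(1, 1, 16, 16, 16))   # -> (1, 2744)
```

In a combined architecture, the flattened encoder outputs would typically be concatenated and fed to a shared policy/value trunk.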


The algorithms explored in this work are:

  • Soft Actor Critic (SAC)
  • Proximal Policy Optimization (PPO)

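For reference, the core of PPO's clipped surrogate objective can be sketched in a few lines. This is a simplified per-batch illustration with a fixed clip range, not a full training loop:

```python
import numpy as np

def ppo_clip_objective(logp_new, logp_old, advantages, clip_eps=0.2):
    """Clipped surrogate objective from PPO (Schulman et al., 2017).

    logp_new, logp_old: per-sample log-probabilities of the taken actions
    advantages:         per-sample advantage estimates
    """
    ratio = np.exp(logp_new - logp_old)  # pi_new(a|s) / pi_old(a|s)
    clipped = np.clip(ratio, 1.0 - clip_eps, 1.0 + clip_eps)
    # Pessimistic (minimum) bound per sample, averaged over the batch;
    # this is the quantity the policy update ascends.
    return np.mean(np.minimum(ratio * advantages, clipped * advantages))

# Tiny illustrative batch: one positive-advantage and one negative-advantage sample
adv = np.array([1.0, -1.0])
logp_old = np.log(np.array([0.5, 0.5]))
logp_new = np.log(np.array([0.8, 0.3]))
print(ppo_clip_objective(logp_new, logp_old, adv))
```

The clipping removes the incentive to move the probability ratio far outside [1 − ε, 1 + ε] in a single update, which is what makes PPO stable enough to train with large batches on modest hardware.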

Research Methodology

  • Ablation Studies
    • Compare performance and results when adding or removing algorithmic and/or environmental components
  • Hyperparameter Optimization
    • Sweep through hyperparameters to determine how performance depends on their values
  • Trajectory/Position Heat Mapping
    • Use heat maps to understand agent pathing during training, particularly to compare pathing before and after optimization. This can also be extended to action selection to understand agent decision making during navigation; analyzing notable trajectories yields insights into agent decision making and the influence of world topology.
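A position heat map of this kind can be accumulated with a simple 2D histogram over logged agent positions. The bin count and world bounds below are illustrative assumptions:

```python
import numpy as np

def position_heatmap(positions, world_min, world_max, bins=32):
    """Accumulate agent visit counts over the XZ ground plane.

    positions: (N, 2) array of logged agent (x, z) coordinates
    world_min, world_max: bounds of the square region to bin over
    """
    heat, _, _ = np.histogram2d(
        positions[:, 0], positions[:, 1],
        bins=bins, range=[[world_min, world_max]] * 2,
    )
    return heat  # (bins, bins) count grid, ready to render as an image

# Illustrative trajectory: a straight diagonal run across a 10x10 world
traj = np.linspace([0.5, 0.5], [9.5, 9.5], num=100)
heat = position_heatmap(traj, 0.0, 10.0, bins=10)
print(heat.sum())  # → 100.0 (every logged position lands in some bin)
```

Differencing two such grids (before vs. after an optimization) highlights exactly where agent pathing changed, and the same binning idea extends to per-cell action-selection counts.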