This project is my submission for the Lux AI 2021 code competition hosted on Kaggle. This post aims to explain the major technical decisions.
To keep the agent easy to expand, debug and tune, I followed a set of key principles dictated by the round-based nature of the game:
Robustness: The agent must yield similar actions when reading similar game situations. There is no state-transition mechanism between subsequent rounds, and robust bootstrapping of the play from the game observation enables continuity in workers' actions.
Parameterization: As much of the agent's behaviour as possible should be parameterized. This makes it possible to tune the agent and design features for a subsequent ML-driven approach.
Configurability: It should be possible to switch the agent between high-level presets.
Throughout the project, the agent went through 3 major iterations:
In the first agent, all units follow the same hardcoded decision tree, deciding independently based on their situation. Every round, units submit a score for all of their possible actions, signalling their preference to the agent. The agent then finds the best collision-free solution, assigning each worker an action scored as highly as possible.
In later versions, the agent could skew the units' decisions by adding a bonus to some actions.
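As an illustration, here is a minimal sketch of such a resolution step. All names are mine, and the greedy pass is a simplification of whatever search the agent actually ran:

```python
from typing import NamedTuple

class Proposal(NamedTuple):
    unit_id: str
    action: str        # e.g. "move n", "build city"
    target: tuple      # (x, y) cell the unit ends the round on
    score: float       # the unit's preference, plus any agent bonus

def resolve(proposals: list[Proposal]) -> dict[str, Proposal]:
    """Greedily pick one non-colliding action per unit, best scores first."""
    chosen: dict[str, Proposal] = {}
    occupied: set[tuple] = set()
    for p in sorted(proposals, key=lambda p: p.score, reverse=True):
        if p.unit_id in chosen or p.target in occupied:
            continue  # the unit is already served, or the cell is taken
        chosen[p.unit_id] = p
        occupied.add(p.target)
    return chosen
```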
A hardcoded routine is hard to bootstrap. Being essentially clones, the first few units all want to go the same way; this improves as the game gains entropy. Tuning the agent for a better kick-start leads to poor decisions later, so I solved this by overriding the decision tree with a custom bootstrap routine for the first few rounds.
Collisions can be caused by the opponent. Also, one collision can propagate over a chain of units — a unit can't move into a tile that another unit failed to vacate.
The agent would often benefit if the units could cooperate. This solution relies on each unit figuring that out independently, which is unreliable.
This agent couples the tasks of understanding the situation, identifying the opportunities and prioritizing the options into one routine — a design that reached its limits with the inability to orchestrate a multi-unit mission.
From there, I started over with a new, modular agent.
This agent is designed around operations — units of work to be performed by a defined group of units. The main workflow is as follows (a minimal sketch follows the list):
Identifying operations: Specify how many units are needed, where, and what the requirements are (e.g. carried resources).
Scoring operations: Score each operation by its contribution to the game.
Assigning workers: Split the pool of available workers into groups executing the operations.
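A minimal sketch of the pipeline, with invented names and a greedy stand-in for the real assignment search (which is described below):

```python
from dataclasses import dataclass, field

@dataclass
class Operation:
    kind: str                 # e.g. "Develop", "Conquer"
    units_needed: int         # stage 1: how many units, and where
    score: float = 0.0        # stage 2: contribution to the game
    assigned: list[str] = field(default_factory=list)

def plan_round(operations: list[Operation], free_units: list[str]) -> list[Operation]:
    # Stage 3: hand workers to the highest-scoring operations first.
    for op in sorted(operations, key=lambda o: o.score, reverse=True):
        while free_units and len(op.assigned) < op.units_needed:
            op.assigned.append(free_units.pop())
    return operations
```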
The agent identifies possible operations by matching known patterns to the observation of the game. This is moderated by the high-level strategy preset of the agent. The implemented operations are as follows:
Develop: Assigned to workers at or near resources that have already been researched. Mine sustainably and build cities so that they survive the night. This is the first operation of the game.
Conquer: Go to an unoccupied resource cluster and establish the first city tile there.
Attack: Go to a cluster controlled by the opponent and try to penetrate its protection. This operation is surprisingly likely to succeed. Some parts of the opponent's safeguards (city tiles or units) usually fall during the night, and then it's just a race.
Mine: Mine the available resources as aggressively as possible. Usually triggered when a group of workers shares a resource cluster with the opponent.
Protect: Keep the opponent locked out of a resource cluster.
Refuel: Go to a resource cluster that can provide high-density fuel and carry it back to a city that's low on fuel before it dies.
The score for each operation is the number of newly generated or saved city tiles divided by the number of days the operation will take to complete.
The full score applies when the operation is executed with the workers as specified. On top of that, the algorithm embeds into the operation a set of variants with reduced scores and relaxed conditions of execution. That takes into account the following considerations (illustrated in the sketch after the list):
Some operations only start to make sense with a specific number of participating workers (Attack, Protect), some are fine with any number of units participating (Mine).
Some operations need to commit to specific tiles; others are fine with a general direction.
Partial success is acceptable in some cases (Conquer needs just one unit to make it, but the more the better), and unacceptable in others (Refuel, Protect).
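A rough illustration of the base score and the relaxed variants, with all numbers invented for the example:

```python
from dataclasses import dataclass

@dataclass
class Variant:
    units: int       # how many workers this variant assumes
    score: float     # reduced score for the relaxed conditions

def base_score(city_tiles_gained_or_saved: int, days_to_complete: int) -> float:
    # The core metric: city tiles generated or saved per day of effort.
    return city_tiles_gained_or_saved / days_to_complete

# Example: a Conquer worth 4 tiles in 8 days carries its full score with
# 3 workers and degrades gracefully, since even a single worker can succeed.
full = base_score(4, 8)  # 0.5
variants = [Variant(3, full), Variant(2, 0.8 * full), Variant(1, 0.5 * full)]
```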
This final step in the pipeline reduces the set of possible operations to the realistic subset that's achievable with the available resources.
The objective when assigning workers to the operations is to maximize the cumulative score of completed operations while minimizing the risk and impact of a chain of collisions.
The agent explores the space of possible solutions, combining feasible variants of operations and subsets of unassigned workers. The heuristic works roughly as follows (sketched in code after the list):
The final value of a solution is roughly the sum of included operations' values.
The value of an operation is roughly the product of its score, the probability of success and the acceptability of partial success.
The more promising a sub-solution looks, the deeper the heuristic explores when evaluating the rest of the space.
The value of a worker sometimes depends on the operation it's being evaluated for. Releasing a valuable worker by sacrificing an operation is likely to waste its potential, so it pays off to try that combination with as many solutions as possible first.
The probability of operation success generally decreases with
the number of opponent units in the surrounding area and
the number of own units that have already tried to move.
In the end, the heuristic adds to each solution a bonus for unassigned workers that can still perform some useful action outside the existing operations.
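A sketch of the value computation; the functional forms and weights here are my assumptions, not the agent's tuned numbers:

```python
def p_success(opponents_nearby: int, own_moves_spent: int) -> float:
    # Decreases with both factors described above; the exact form is invented.
    return 1.0 / (1.0 + 0.3 * opponents_nearby + 0.1 * own_moves_spent)

def operation_value(score: float, p: float, partial_acceptability: float) -> float:
    # Roughly the product of score, success probability and the tolerance
    # for partial success.
    return score * p * partial_acceptability

def solution_value(op_values: list[float], idle_worker_bonus: float) -> float:
    # Roughly the sum of the included operations' values, plus the bonus
    # for unassigned workers that can still do something useful.
    return sum(op_values) + idle_worker_bonus
```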
This agent evaluates all combinations of available operations, using all free units.
On the smallest maps (12x12) and in the busier games (where there are fewer operations), this is good enough to identify the operations and assign the workers reasonably well.
In larger games with scattered centers of activity, there are too many combinations to handle at once.
The next iteration of the agent splits the map into logical sections based on the game observation and treats them independently.
This agent is still based on the original set of operations, but lets isolated dense areas — clusters — manage themselves.
Clusters are identified as areas of city tiles and resources separated by more than two empty tiles (see the sketch after this list).
Clusters are the exclusive agent on their domain — no one else can act within their perimeter, and they can't act outside it.
Clusters execute the Develop, Mine and Protect operations.
Besides clusters, there's the open space — tiles that don't belong to any cluster.
The agent takes care of the open-space operations Conquer, Attack and Refuel.
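Here is a sketch of the cluster identification referenced above, assuming Manhattan distance and a plain flood fill; a gap of at most two empty tiles corresponds to a coordinate distance of at most three. The real separation rule may differ in detail:

```python
from collections import deque

def find_clusters(tiles: set[tuple[int, int]]) -> list[set[tuple[int, int]]]:
    """Group city/resource tiles separated by at most two empty tiles."""
    clusters, seen = [], set()
    for start in tiles:
        if start in seen:
            continue
        cluster, queue = set(), deque([start])
        seen.add(start)
        while queue:
            x, y = queue.popleft()
            cluster.add((x, y))
            for tx, ty in tiles - seen:
                if abs(tx - x) + abs(ty - y) <= 3:  # gap of <= 2 empty tiles
                    seen.add((tx, ty))
                    queue.append((tx, ty))
        clusters.append(cluster)
    return clusters
```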
The major benefit is that the agent no longer scores and assigns workers to all operations as a whole — they are now separated into many independent pipelines.
As clusters don't act outside their perimeter, they need a mechanism for signalling the need for outside intervention. The same is true the other way around, namely when the agent needs the cluster to export units to the open space. Both message types are sketched after the lists below.
Cluster-agent communication includes the need for refuelling.
Agent-cluster communication includes
the number of units and resources to export, with the preferred coordinates,
the required intensity of unit production and
hints towards the best direction of development.
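Sketched as two message types, with every field name invented for illustration:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class ClusterToAgent:
    needs_refuel: bool              # the cluster can't keep its cities alive

@dataclass
class AgentToCluster:
    export_units: int               # how many units to release to the open space
    export_resources: int           # how much carried resource should leave with them
    export_to: Optional[tuple]      # preferred hand-over coordinates
    production_intensity: float     # how aggressively to produce new units
    develop_hint: Optional[tuple]   # suggested direction of development
```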
The final agent operates as follows (a structural sketch follows the list):
Reconnaissance: The agent identifies clusters, free space and free units.
Cluster-agent communication: The newly instantiated clusters observe their state and signal appropriately to the agent.
Open space: Executing the Conquer, Attack and Refuel operations.
Agent-cluster communication: Based on the observed shortage of units for the open-space operations and other clusters' flags, the agent signals to the clusters.
Clusters: Executing the operations Develop, Mine and Protect.
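A structural sketch of one turn; every name is illustrative and the helpers are placeholders:

```python
class FinalAgent:
    def play_turn(self, observation) -> list:
        clusters, open_space, free_units = self.reconnoiter(observation)  # 1.
        flags = [cluster.report() for cluster in clusters]                # 2.
        actions = self.run_open_space(open_space, free_units)             # 3.
        for cluster in clusters:
            cluster.receive(self.orders_for(cluster, flags, free_units))  # 4.
            actions += cluster.run_operations()                           # 5.
        return actions

    # Placeholder subsystems, so the skeleton stands on its own:
    def reconnoiter(self, observation): return [], None, []
    def run_open_space(self, open_space, free_units): return []
    def orders_for(self, cluster, flags, free_units): return None
```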
I believe that the fundamental part of the job is carefully combining, abstracting and executing the tons of very low-level decisions.
Python is cool. 3 months is not enough.
Also, the top agents blew my mind.