TY - GEN
T1 - Learning Cooperative Behaviours in Adversarial Multi-agent Systems
AU - Wang, Ni
AU - Das, Gautham
AU - Millard, Alan Gregory
PY - 2022/9/1
Y1 - 2022/9/1
AB - This work extends an existing virtual multi-agent platform called RoboSumo to create TripleSumo, a platform for investigating multi-agent cooperative behaviors in continuous action spaces, with physical contact in an adversarial environment. In this paper we investigate a scenario in which two agents, namely ‘Bug’ and ‘Ant’, must team up and push another agent, ‘Spider’, out of the arena. To tackle this goal, the newly added agent ‘Bug’ is trained during an ongoing match between ‘Ant’ and ‘Spider’. ‘Bug’ must develop awareness of the other agents’ actions, infer the strategy of both sides, and eventually learn an action policy to cooperate. The reinforcement learning algorithm Deep Deterministic Policy Gradient (DDPG) is implemented with a hybrid reward structure combining dense and sparse rewards. The cooperative behavior is quantitatively evaluated by the mean probability of winning the match and the mean number of steps needed to win.
U2 - 10.1007/978-3-031-15908-4_15
DO - 10.1007/978-3-031-15908-4_15
M3 - Conference contribution
SN - 9783031159077
T3 - Lecture Notes in Computer Science (LNCS)
SP - 179
EP - 189
BT - Towards Autonomous Robotic Systems
PB - Springer
T2 - 23rd Annual Conference Towards Autonomous Robotic Systems, TAROS 2022
Y2 - 7 September 2022 through 9 September 2022
ER -