Most current reinforcement-learning solutions, although inspired by real-world scenarios, focus on a direct state-action-reward mapping between the agent's actions and the environment's state. This yields agents that can adapt to dynamic scenarios but, when applied to competitive and/or cooperative settings, fail to assess and deal with the impact of their opponents. In most cases, when these agents choose an action, they do not take into consideration how other agents can affect the state of the scenario. In competitive scenarios, agents have to learn decisions that a) maximize their own chances of winning the game and b) minimize their adversaries' chances of achieving their goals, while in cooperative scenarios b) is inverted. Besides dealing with complex scenarios, such solutions have to handle the dynamics between the agents themselves. In this regard, social reinforcement learning still lags behind the mainstream applications and demonstrations of recent years.
We recently introduced a card game scenario for reinforcement learning, named Chef's Hat, whose specific mechanics allow complex dynamics between the players to be incorporated into the development of a winning game strategy. A card game gives us a naturally constrained environment while still eliciting responses that match those of its real-world counterpart. Chef's Hat implements game and interaction mechanics that ease the transfer between the real-world scenario and the virtual environment.
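To make the agent-environment cycle concrete, the sketch below shows the kind of observe-act-reward loop participants would implement. This is a minimal illustration assuming a Gym-style reset/step interface; the class, method, and field names here are hypothetical stand-ins, not the actual Chef's Hat simulation API.

```python
import random


class ChefsHatEnvStub:
    """Hypothetical stand-in for the Chef's Hat simulation environment,
    loosely following the common Gym reset/step convention. It only
    models the observe-act-reward cycle, not the real card mechanics."""

    def reset(self):
        self.turns_left = 10  # toy episode length, not a real game rule
        return {"cards_in_hand": self.turns_left}

    def step(self, action):
        self.turns_left -= 1
        observation = {"cards_in_hand": self.turns_left}
        done = self.turns_left == 0
        reward = 1.0 if done else 0.0  # reward only when the game ends
        return observation, reward, done


class RandomAgent:
    """Baseline agent: picks an action uniformly at random, ignoring the
    other players -- exactly the limitation the challenge asks
    participants to overcome."""

    def act(self, observation, n_actions=200):
        return random.randrange(n_actions)


# One episode of the observe-act-reward loop.
env = ChefsHatEnvStub()
agent = RandomAgent()
obs, done, total_reward = env.reset(), False, 0.0
while not done:
    obs, reward, done = env.step(agent.act(obs))
    total_reward += reward
print(total_reward)
```

A learning agent would replace `RandomAgent` with a policy conditioned on the observation, ideally one that also models how the other three players at the table affect the state.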
Our challenge will be based on Chef's Hat and will be separated into two tracks: a competitive and a cooperative scenario. In the first track, participants will use the already available simulation environment to develop the most effective agent to play the Chef's Hat card game and win it. In the second track, they will have to develop an agent that increases the chances of a dummy agent winning the game.
Challenge Organization
Each participant can submit up to five agents that have learned how to play the game. For each track, the winner will be chosen based on the track's specific goal. For both tracks, each competitor will pass through the following process:
- Validation: Each participant's agent has to pass the baseline test. It will play a single 15-point game against three baseline competitors. The agent has to win this game to be eligible for the next step.
- Track 1: The agents that pass the validation step will be organized in a competition. Brackets of 4 players will be randomly drawn in a cup-style competition. From each bracket, the two best agents advance to the next phase. The agent that finishes the championship in first position will be crowned the winner of the first track.
- Track 2: The agents that pass the validation step will be organized in a competition. Brackets of 4 players, composed of 2 competitor agents and 2 dummy agents, will be randomly drawn in a cup-style competition. Each competitor agent will be paired with one dummy agent. The two best players of each bracket advance to the next competition phase. The agent that reaches the furthest stage together with its associated dummy agent will be crowned the winner of track 2.
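The cup structure described above can be sketched as follows: random brackets of four, with the top two finishers of each bracket advancing until a final bracket decides the champion. The game itself is stubbed out with a random ranking here for illustration; in the real challenge each bracket would play a full Chef's Hat match.

```python
import random


def play_game(agents):
    """Stub for one Chef's Hat match: returns the agents ranked from
    best to worst finishing position. The real challenge would run the
    full simulation; here the outcome is random for illustration."""
    ranking = list(agents)
    random.shuffle(ranking)
    return ranking


def run_cup(agents, bracket_size=4, advance=2):
    """Cup-style competition: draw random brackets of `bracket_size`,
    let the top `advance` finishers of each bracket move on, and repeat
    until a single final bracket decides the champion."""
    pool = list(agents)
    while len(pool) > bracket_size:
        random.shuffle(pool)  # brackets are randomly drawn each round
        next_round = []
        for i in range(0, len(pool), bracket_size):
            bracket = pool[i:i + bracket_size]
            next_round.extend(play_game(bracket)[:advance])
        pool = next_round
    return play_game(pool)[0]  # winner of the final bracket


entrants = [f"agent_{i}" for i in range(16)]
champion = run_cup(entrants)
print(champion)
```

For track 2 the same structure applies, except that each bracket mixes competitor and dummy agents and the ranking criterion is the dummy agent's progress rather than the competitor's own.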
To participate in the competition: https://www.whisperproject.eu/chefshat#competition