After 50,000 hours: Artificial intelligence is supposed to solve humanity’s problems – but fails because of Pokémon Red


Few things in pop culture are as nostalgically glorified as the Game Boy games with the pocket monsters. But what happens when you let an artificial intelligence loose on Pokémon?

AI is seen as a source of hope in many areas of life, from self-driving cars and factory robots to smartphones and home computers.

But can AI also excel at the Game Boy classic Pokémon Red?

A YouTuber with programming skills and a soft spot for the little monsters has tackled this question, with astonishing results.

How does the YouTuber’s Pokémon experiment work?

The AI behind the experiment has played more than 50,000 hours of Pokémon Red in total, guided by software developer and YouTuber Peter Whidden.

How do you train an AI for Pokémon Red? For Peter Whidden, one of the challenges was to train the artificial intelligence to behave like a human player. According to the YouTuber, the AI’s behavior is very similar to that of a human player.

After each action, the AI is supposed to check what is happening on the screen before deciding on the next action, much like a person playing Pokémon Red with a Game Boy in front of their eyes. To train the AI as quickly as possible, Whidden ran 40 test sessions in parallel.
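The look-then-act cycle described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: ToyGame, random_policy and run_sessions are hypothetical names, the toy class is a stand-in and not the PyBoy API, and the sessions are stepped one after another in a single process rather than truly in parallel.

```python
import random

BUTTONS = ["up", "down", "left", "right", "a", "b", "start"]


class ToyGame:
    """Hypothetical stand-in for one emulator session (not the PyBoy API)."""

    def __init__(self, seed):
        self.rng = random.Random(seed)
        self.position = (0, 0)

    def screen(self):
        # A real setup would return the emulator's pixels; here the
        # "screen" is simply the player's coordinates.
        return self.position

    def press(self, button):
        moves = {"up": (0, -1), "down": (0, 1), "left": (-1, 0), "right": (1, 0)}
        dx, dy = moves.get(button, (0, 0))  # other buttons do not move the player
        x, y = self.position
        self.position = (x + dx, y + dy)


def random_policy(observation, rng):
    # Placeholder: a trained agent would choose based on the observation.
    return rng.choice(BUTTONS)


def run_sessions(num_sessions=40, steps=100):
    """Step several independent sessions, one action per session per round."""
    games = [ToyGame(seed) for seed in range(num_sessions)]
    for _ in range(steps):
        for game in games:
            observation = game.screen()                    # look at the screen
            action = random_policy(observation, game.rng)  # decide
            game.press(action)                             # act
    return [game.position for game in games]


final_positions = run_sessions(num_sessions=40, steps=100)
print(len(final_positions))
```

In a real training run, the random policy would be replaced by the model being trained, and each session would wrap its own emulator instance.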

The resourceful inventor used the Game Boy emulator PyBoy for his experiment. (Image: Joaquin Corbalan/Adobe Stock; Peter Whidden)

The trick with the reward system: How do you teach an AI to play Pokémon Red? The YouTuber’s solution: he set up a reward system so that the algorithm pursues the goal of winning the game. Whenever the AI discovers something new in the game, it earns a reward point.

What counts as “new” was measured by the number of pixels that differ on the screen. However, this method had a drawback: the character controlled by the AI stared at a water animation instead of playing on toward completing the game.

Further reward points were added as well, for example for catching Pokémon, winning a gym battle or winning a trainer battle.
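A minimal Python sketch of such a novelty reward, assuming frames are small grids of pixel values. The article only says that novelty was measured by the number of differing pixels; the threshold, the list of remembered frames and the names pixel_difference and NoveltyReward are assumptions made for illustration, not Whidden’s actual code.

```python
def pixel_difference(frame_a, frame_b):
    """Count the positions at which two equally sized frames differ."""
    return sum(
        1
        for row_a, row_b in zip(frame_a, frame_b)
        for pixel_a, pixel_b in zip(row_a, row_b)
        if pixel_a != pixel_b
    )


class NoveltyReward:
    """Give one point for each frame that differs enough from all seen frames."""

    def __init__(self, threshold):
        self.threshold = threshold  # assumed tuning knob, not from the article
        self.seen_frames = []

    def score(self, frame):
        is_new = all(
            pixel_difference(frame, seen) > self.threshold
            for seen in self.seen_frames
        )
        if is_new:
            self.seen_frames.append([row[:] for row in frame])
            return 1
        return 0


# Two frames of a "water animation" that differ in a single pixel.
water_a = [[0, 0], [0, 1]]
water_b = [[0, 0], [1, 1]]

strict = NoveltyReward(threshold=0)
print(strict.score(water_a), strict.score(water_b))

lenient = NoveltyReward(threshold=1)
print(lenient.score(water_a), lenient.score(water_b))
```

With a threshold of 0, the second animation frame counts as new although the player has not moved, which is the kind of loophole described above. Raising the threshold ignores small changes, at the cost of also ignoring small but genuine discoveries.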

Whidden has packed his experiment into a 33-minute YouTube video.

What hurdles did the Pokémon experiment face?

When visiting the Pokémon Center, the AI deposited some of its Pokémon in storage. This lowered the overall level of the team. This bad experience led the AI to avoid the Pokémon Center from then on. The resulting disadvantage: from this point onwards, the team was no longer healed.

Whidden says in his YouTube video about the experiment:

“It [the AI] doesn’t have emotions like a human, but a single event with an extremely high reward value can have a lasting effect on its behavior. […] In this case, it is enough for it to lose its Pokémon just once. This develops a negative association with the entire Pokémon Center, causing the AI to avoid it completely in all future games.”

So Whidden had to adjust his reward system further.


The AI in kamikaze mode: The AI’s initial combat behavior was no less remarkable. At first, it charged into every fight, regardless of its chances of victory. The YouTuber therefore introduced a penalty for lost battles.

What was really curious, though somehow logical, was the AI’s behavior after a lost fight: it lingered on the battle screen and simply stopped advancing, because it did not want to lose any points.

The biggest challenge for Whidden was to teach the algorithm the desired behavior in the game step by step. Since in this case there was no large data set as with text or voice AIs, he had to teach the algorithm the behavior leading to the game goal in small pieces.

Speaking to TechCrunch, Whidden said he was delighted with the great success of his YouTube video, which had reached over 4.3 million viewers as of 10.11.2023. He says:

“Seeing how many people are engaging [with the video] gives me a lot of pleasure.”

What the AI did well

But the AI didn’t just cause trouble; sometimes it was really clever. At a certain point, for instance, it took the same route again and again. What seemed pointless at first glance turned out to be smart: the AI was exploiting a glitch that causes the first Pokémon it encounters to be captured immediately with a single throw.

Can you recreate the YouTuber’s Pokémon experiment? In his video, Whidden also offers some tips on how you can attempt such an experiment yourself.

For example, he used an algorithm called Proximal Policy Optimization (PPO). According to OpenAI, this learning algorithm delivers “comparable or better performance than current approaches”. PPO is also said to be easy to implement and tune.
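For readers curious about what PPO does at its core, here is a small Python sketch of its clipped objective for a single action. The ratio compares how likely the new policy is to take an action versus the old policy, and the advantage says how much better than expected that action turned out. The function name clipped_surrogate and the clip range of 0.2 are choices made for this illustration; the article does not say which settings Whidden used.

```python
def clipped_surrogate(ratio, advantage, clip_eps=0.2):
    """PPO-style clipped objective for one action.

    ratio:     new-policy probability divided by old-policy probability
    advantage: how much better the action was than expected
    clip_eps:  how far the ratio may move away from 1 and still be credited
    """
    clipped_ratio = max(1.0 - clip_eps, min(ratio, 1.0 + clip_eps))
    # Taking the minimum keeps the more pessimistic of the two estimates.
    return min(ratio * advantage, clipped_ratio * advantage)


# A good action that the new policy already favors strongly:
# the credit is capped by the clipped ratio.
print(clipped_surrogate(ratio=1.5, advantage=2.0))

# A bad action that the new policy has made much less likely:
# the clipped term is lower, so pushing further earns nothing extra.
print(clipped_surrogate(ratio=0.5, advantage=-1.0))
```

The clipping removes the incentive for one large policy change based on a single batch of experience, which is one reason PPO is regarded as comparatively stable to train.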

What do you think of the YouTuber’s experiment? Did you find the accompanying video entertaining? Were you flooded with nostalgic feelings at the sight of Pokémon Red, or do you swear by Digimon and turn up your nose at the mere mention of Pikachu & Co.? Let us know what you think in the comments below.