The Quest for Artificial General Intelligence

Where we are, and where we’re going

Abhav Kedia
Towards Data Science


“Artificial Intelligence began with an ancient wish to forge the Gods.” -Pamela McCorduck

Artificial General Intelligence (AGI) refers to the ability of artificial agents/programs to display human-level proficiency in reasoning about and performing tasks in their environment. AGI has long been a mainstay of science fiction movies and books, famously embodied in likeable characters such as Tony Stark’s assistant JARVIS from the Iron Man series and the humanoid robot C-3PO in Star Wars.

Crucially, AGI is also considered the holy grail of most AI research today. In a sense, the formulation of most reinforcement learning problems — problems that require an agent to interact with its environment to maximize reward — is a subclass of the larger problem of general intelligence. Instead of trying to define AGI precisely, I will leave it to the reader’s intuition that it is a system displaying ‘human-like’ intellectual capacity.

Billions of dollars are being spent on AGI research today. For example, the mission of Elon-Musk-backed OpenAI is to achieve or facilitate the development of AGI (and to ensure it is used responsibly). Google’s DeepMind and the Human Brain Project are working towards related goals.

Where do we stand today?

Today, we have the tools to create AI systems that display remarkable levels of understanding. Let’s take a look at some existing programs that display almost human-level reasoning, performance or control — on specific tasks. We will look at progress in three domains: Natural Language Processing, Reinforcement Learning and Autonomous Vehicles.

NLP and Chatbots

Natural Language Processing (NLP) is the analysis and understanding of human language for tasks like translation and sentiment classification. One application of NLP is the creation of chatbots. In 2016, Microsoft revealed an AI chatbot called Tay (Thinking About You). Although Tay made headlines for all the wrong reasons and had to be taken offline 16 hours after its release, its ability to tweet about a wide variety of topics was impressive.

Tay’s tweets on Ted Cruz [source]

Take the above tweet for example. Tay was trained to answer questions based on other examples of tweets discussing similar topics. In particular, to reply to this tweet it was able to identify the tweet as asking for an opinion, and then generated an aggregated answer based on other people’s replies to similar questions. Tay displays good textual analysis, paraphrasing and the ability to form meaningful sentences, but not exactly generalized intelligence. A real AGI system could reasonably come up with a reply all by itself, without paraphrasing other people’s opinions.

To imagine a true AGI system, think about the kind of reasoning the bot would have to perform to come up with such a reply by itself (even assuming it was aware of the ongoing meme comparing Ted Cruz to the Zodiac killer). First, in addition to knowing that it’s being asked for an opinion, it would have to know about the two entities that are being compared — Ted Cruz and the Zodiac killer. Second, it would have to tie in its opinion of Ted Cruz with the actions of the Zodiac killer AND frame it in a humorous way by first disagreeing with the premise and then amplifying it. We’re not quite there yet.

Nevertheless, NLP has made substantial progress since 2016 with the release of large transformer-based models — BERT in 2018 and GPT-2 in 2019 — that handily surpassed the previous state of the art. These models were created by feeding a program hundreds of thousands of documents from Wikipedia and other publicly available sources so that it could “understand” the English language. This is akin to teaching a child to read by exposing them to the meanings of words in context. Once proficient, the child can use this knowledge to reason about new questions and answer them appropriately.

Famously, the GPT-2 model demonstrated an impressive ability to create near-flawless text based on any context that it was supplied with. For example, it was given the following prompt:

In a shocking finding, scientist discovered a herd of unicorns living in a remote, previously unexplored valley, in the Andes Mountains. Even more surprising to the researchers was the fact that the unicorns spoke perfect English.

and here is an extract of GPT-2’s output (the full text can be found in OpenAI’s GPT-2 announcement post).

The scientist named the population, after their distinctive horn, Ovid’s Unicorn. These four-horned, silver-white unicorns were previously unknown to science. Now, after almost two centuries, the mystery of what sparked this odd phenomenon is finally solved. Dr. Jorge Pérez, an evolutionary biologist from the University of La Paz, and several companions, were exploring the Andes Mountains when they found a small valley, with no other animals or humans. Pérez noticed that the valley had what appeared to be a natural fountain, surrounded by two peaks of rock and silver snow. Pérez and the others then ventured further into the valley. “By the time we reached the top of one peak, the water looked blue, with some crystals on top,” said Pérez.

This model works by consuming the input prompt and then generating output text one word at a time. Given this sequential generation process, the text shown above is surprisingly coherent, both syntactically (no grammatical or punctuation errors) and semantically (no nonsensical statements). It even maintains consistent references to invented characters like Pérez and his companions.
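To make the mechanics concrete, here is a minimal sketch of sampling from GPT-2 using the open-source Hugging Face transformers library (my choice for illustration; it is one convenient way to reproduce the idea, not how OpenAI generated the text above):

    # Autoregressive text generation with a pretrained GPT-2.
    # Requires: pip install transformers torch
    from transformers import GPT2LMHeadModel, GPT2Tokenizer

    tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
    model = GPT2LMHeadModel.from_pretrained("gpt2")

    prompt = "In a shocking finding, scientist discovered a herd of unicorns"
    input_ids = tokenizer.encode(prompt, return_tensors="pt")

    # The model predicts one token at a time, appending each prediction
    # to the sequence before predicting the next.
    output = model.generate(input_ids, max_length=80, do_sample=True, top_k=40)
    print(tokenizer.decode(output[0], skip_special_tokens=True))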

NLP is still a rapidly evolving field, with new enhancements almost every day.

RL and game-playing agents

Reinforcement Learning is a sub-field of AI concerned with the design and analysis of agents that act in their environments to maximize a notion of reward (e.g. the score in a game). Here too there has been substantial progress towards intelligent systems. Since the 2013 demonstration[1] of neural networks playing Atari games, reinforcement learning techniques have evolved rapidly.
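At its core, every RL system is built around the same loop: the agent observes the environment, acts, and receives a reward. Here is a minimal sketch using the classic OpenAI Gym interface (versions up to 0.25 return the four-tuple shown here), with a random placeholder policy standing in for a learned one:

    import gym

    # The canonical observe-act-reward loop of reinforcement learning.
    env = gym.make("CartPole-v1")
    obs = env.reset()
    total_reward, done = 0.0, False
    while not done:
        action = env.action_space.sample()          # random placeholder policy
        obs, reward, done, info = env.step(action)  # environment transition
        total_reward += reward
    print(f"episode return: {total_reward}")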

In 2019, OpenAI showcased a bot called OpenAI Five that controlled five agents in a game of Dota 2, a MOBA (Multiplayer Online Battle Arena) game that pits two teams of five players against each other in a contest to destroy the other team’s base. Over the course of a single game of roughly 30 minutes, the players make hundreds of short- and long-term decisions, including skill and item choices that influence other players and may have delayed consequences. The bot defeated Dota 2 world champions OG 2–0 in a publicly broadcast match. To test the agent’s robustness, it was also released publicly for a while and played about 7,000 games with the community, achieving a 99.4% win rate.

This agent was trained almost entirely through self-play. Crucially, the focus of the algorithm was not on the mechanical advantages an AI system typically has over humans, such as faster reaction time — the algorithm was even artificially penalized for highly mechanical objectives like last-hitting and denying. Instead, the training rewards were geared towards a general understanding of the state of the game, including when to pick a fight and the team’s current strength relative to its opponents.
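OpenAI Five’s actual training stack is enormous, but the core self-play idea fits in a few lines. Below is a toy illustration of my own (not OpenAI’s code): a softmax policy learns rock-paper-scissors by repeatedly playing a frozen snapshot of itself, nudging its logits with a REINFORCE-style update:

    import numpy as np

    rng = np.random.default_rng(0)
    theta = np.zeros(3)  # logits over rock/paper/scissors
    PAYOFF = np.array([[ 0, -1,  1],
                       [ 1,  0, -1],
                       [-1,  1,  0]])  # PAYOFF[a, b]: row action vs column action

    def play(logits):
        """Sample an action from a softmax policy."""
        p = np.exp(logits - logits.max())
        p /= p.sum()
        return rng.choice(3, p=p), p

    for generation in range(100):
        opponent = theta.copy()  # frozen snapshot of the current self
        for _ in range(500):
            a, p = play(theta)
            b, _ = play(opponent)
            reward = PAYOFF[a, b]
            grad = -p
            grad[a] += 1.0  # gradient of log pi(a) w.r.t. the logits
            theta += 0.01 * reward * grad  # REINFORCE update

Each generation, the learner exploits whatever weakness its frozen copy has; refreshing the snapshot then forces it towards the balanced strategy, which is the essence of improvement through self-play.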

In mid-2018, OpenAI showcased a robotic hand that was trained entirely in simulation to manipulate objects into various orientations.

Learning Dexterity through simulation [Source: OpenAI blog]

Although the tasks look fairly simple, the key achievement was the ability to perform well in novel situations despite never being trained specifically for them. This was made possible by a training technique called Domain Randomization, which forces the system to identify the key features of the environment, helping it generalize to new situations.
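As a hedged illustration of domain randomization (again a toy of my own, not OpenAI’s code), consider a 1-D block-pushing “simulator” whose mass and friction are re-sampled every episode; a controller that succeeds must rely on what holds across all the sampled worlds rather than the quirks of one fixed simulator:

    import random

    def sample_physics():
        """Re-sample nuisance parameters the real world might vary."""
        return {"mass": random.uniform(0.5, 2.0),       # kg
                "friction": random.uniform(0.2, 1.0)}

    def rollout(push, mass, friction, steps=50, dt=0.1):
        """Toy 1-D block: apply a constant push, return distance travelled."""
        x, v = 0.0, 0.0
        for _ in range(steps):
            v += (push - friction * v) / mass * dt
            x += v * dt
        return x

    # Pick the single push force that lands closest to x = 5 m on average
    # across many randomized worlds, i.e. a controller robust to the
    # randomization.
    best_push = min((p * 0.2 for p in range(1, 50)),
                    key=lambda push: sum(abs(rollout(push, **sample_physics()) - 5.0)
                                         for _ in range(200)))
    print(f"robust push: {best_push:.1f} N")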

One of the systems I personally found most impressive was multi-agent hide-and-seek, released in September 2019: a simulated series of games between two independently evolving teams of agents, one tasked to seek and the other to hide. They were trained over almost 500 million episodes (independent runs of hide-and-seek) and learned to use a variety of fixed and movable tools like walls, ramps and boxes. Here’s a video summarizing the evolution of the agents.

Multi-Agent hide and seek

It is fascinating to watch the two sets of adversaries improve over time, continuously evolving new strategies to counter their opponents’ progress. For example, when the hiders learned to seal themselves inside a room by blocking the entrance with boxes, the seekers learned to use a ramp to ‘jump’ over the wall and find them. In response, the hiders learned to deny the seekers the ramp by dragging it into their room before blockading the entrance.

Autonomous Vehicles

The creation of self-driving cars is arguably the most mainstream application of Artificial Intelligence. Autonomous driving requires an agent with a range of diverse capabilities, from computer vision and robust object classification to continuous assessment of its situation and interaction with its environment (similar to reinforcement learning).

Photo by Roberto Nickson on Unsplash

Big industry players like Uber, Tesla and Alphabet’s Waymo are actively working on self-driving cars, and some have begun testing almost fully automated vehicles in controlled, extensively mapped environments. The front-runners in this race have already achieved what is called Level 4 automation in such environments (the car can operate without human input or oversight, but only under select conditions defined by factors such as road type or geographic area). But the jump to the final Level 5 — intuitively, the stage where the car can operate fully autonomously in any condition or geography — is still some way off.

I suspect that unless all cars on the road are replaced with driverless cars (so that every car can anticipate what the others are doing), it will be difficult to achieve Level 5 automation. It may even be the case that Level 5 automation is only achieved with AGI, given the requirement to handle so many different scenarios and failure modes.

The Road to AGI

The systems described above, although individually powerful at the tasks they have been trained to perform, do not possess a generalized understanding of the world. For example, an agent trained to play Dota 2 cannot read about changes to the game in a patch note and adapt its playing style to incorporate them. A human who plays the game proficiently, on the other hand, can reasonably adapt their playing style after reading about the changes a new update brings.

For tasks like these, we need to build an agent and endow it with abilities that span all the domains of Artificial Intelligence research, from Natural Language Processing to computer vision, to problem solving and game playing.

How can we build such a system? Broadly, there are two schools of thought on how such an AGI system will be built.

Ilya Sutskever, chief scientist at OpenAI, has claimed that we already have the most crucial tool we need to build AGI — good ol’ Deep Learning, the foundational technology used in all the domains of AI described above: NLP, reinforcement learning, computer vision and control. Deep learning essentially allows a system to build hierarchical representations of the problem it is trying to solve. On a recent podcast with Lex Fridman[5], Sutskever claimed that AGI will be achieved through a combination of deep learning “plus some other ideas”. Crucially, these other ideas might already be with us: they could be existing tools like self-play and domain randomization, or a different reformulation of the same problem.
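“Hierarchical representations” sound abstract, but in code a deep network is just a stack of layers, each re-describing the previous layer’s output in more abstract terms (for images: pixels to edges to shapes to object classes). A minimal PyTorch sketch:

    import torch.nn as nn

    # Each layer builds on the representation produced by the one below it.
    net = nn.Sequential(
        nn.Linear(784, 256), nn.ReLU(),  # low-level features of a 28x28 image
        nn.Linear(256, 64), nn.ReLU(),   # mid-level combinations of features
        nn.Linear(64, 10),               # task-level output (e.g. 10 digit classes)
    )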

One step that OpenAI, in collaboration with MIT PhD student Joseph Suarez, has taken in this direction is the Neural MMO — a simulation environment based on Massively Multiplayer Online Role-Playing Games (MMORPGs) like World of Warcraft and Runescape.

The Neural MMO environment: The dark green tiles represent patches of ‘food’ and the blue tiles represent ‘water’. Agents must explore to discover and then compete for fertile lands. [source]

This is a competitive online environment that requires agents to forage for resources like food and water, and engage in melee and ranged combat with other agents in order to stay alive. This environment creates a basic framework similar to the real world, where organisms compete for resources to stay alive and procreate. The rationale[2] behind building this MMO environment is that the problem of creating “agents that scale to the real world” can be split into two subproblems:

  1. Agents that scale to their environment, i.e. learn to perform well in whatever environment they are placed.
  2. Environments that scale to the real world.

“Agents that scale to their environment” is the current focus of most machine learning research: developing better algorithms that let agents maximize their potential in whatever environment they are placed. This in turn demands a better understanding of, and solutions to, the core tasks an agent must perform in its environment, such as exploration and handling memory.

“Environments that scale to the real world” is the crucial complement, because as agents get better there comes a point where they are limited by their environment. Preventing this requires developing simulations that better approximate the real world. Neural MMO is an attempt to create such an environment, and it is more complex than existing environments, which typically present very specific challenges. Basing the environment on MMORPGs, the author argues, also allows it to be extremely scalable. A toy version of such a foraging environment is sketched below.
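Here is a deliberately tiny, hypothetical miniature of that setup (my own simplification for illustration, not the actual Neural MMO API): a grid of food and water tiles, an agent whose reserves tick down every step, and an episode that ends when either reserve runs out:

    import random

    class MiniForageEnv:
        """Toy single-agent stand-in for Neural MMO-style foraging."""
        MOVES = {"N": (-1, 0), "S": (1, 0), "E": (0, 1), "W": (0, -1)}

        def __init__(self, size=8):
            self.size = size
            # Each tile is food ('f'), water ('w') or empty ('.').
            self.grid = [[random.choice("fw....") for _ in range(size)]
                         for _ in range(size)]
            self.pos, self.food, self.water = (0, 0), 10, 10

        def step(self, action):
            dr, dc = self.MOVES[action]
            r = min(max(self.pos[0] + dr, 0), self.size - 1)
            c = min(max(self.pos[1] + dc, 0), self.size - 1)
            self.pos = (r, c)
            self.food, self.water = self.food - 1, self.water - 1  # decay
            if self.grid[r][c] == "f": self.food = 10              # eat
            if self.grid[r][c] == "w": self.water = 10             # drink
            alive = self.food > 0 and self.water > 0
            return self.pos, (1.0 if alive else -10.0), not alive

    env = MiniForageEnv()
    done, t = False, 0
    while not done and t < 100:
        _, reward, done = env.step(random.choice("NSEW"))
        t += 1
    print(f"random agent survived {t} steps")

Even this toy makes the two subproblems visible: a better policy than random survives longer (agents scaling to their environment), while richer grids, more agents and combat make the environment itself harder (environments scaling towards the real world).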

The other school of thought holds that AGI will be radically different from existing systems. Microsoft co-founder Paul Allen famously argued in a 2011 essay in MIT Technology Review that we need radical new technologies to achieve AGI.

“But if the singularity is to arrive by 2045, it will take unforeseeable and fundamentally unpredictable breakthroughs, and not because the Law of Accelerating Returns made it the inevitable result of a specific exponential rate of progress.” — Paul Allen

Timelines

When will AGI be realized? This is where we turn from facts and hard research to conjecture and hope!

The answer, in short, is that nobody knows for certain. Mainstream AI researchers express any optimism cautiously, partly owing to the long history of disappointments in AI. AI research saw a huge first wave in the 1960s and ’70s, with some predicting that fully conversational, human-like AI would be built within a generation. When the research community failed to produce the results it had promised, the field saw a significant slowdown through the ’80s and ’90s.

The dawn of the 21st century, however, saw a resurgence in the field, this time with a focus on AI applicable to the real world. Research advanced steadily throughout the first decade, and then, in 2012, Deep Learning came into the mainstream. The years that followed were something like the Cambrian explosion of modern AI research.

A survey of experts[3] published in 2016 showed optimism that AGI will be achieved within the 21st century. On average, the experts surveyed assigned a 50% chance that high-level machine intelligence will be developed around 2040–2050, rising to a 90% chance by 2075.

Hopefully, this gets you thinking about some of the problems with building Artificial General Intelligence and potential ways to solve them! What is your opinion on how AGI will be achieved, and how far off are we?

Citations and further reading

  1. Playing Atari with Deep Reinforcement Learning [Mnih et al., 2013]
  2. Artificial Life: Objective and Approach [Joseph Suarez]
  3. Future Progress in Artificial Intelligence: A Survey of Expert Opinion
  4. AGI Wikipedia page
  5. Ilya Sutskever: Deep Learning (AI Podcast with Lex Fridman)
