DeepMind’s “Gato” is mediocre, so why build it?

DeepMind’s “Gato” neural network excels at many tasks, including controlling robotic arms that stack blocks, playing Atari 2600 games, and captioning images. Image: DeepMind.

The world is used to seeing headlines about the latest breakthrough in deep-learning forms of artificial intelligence. However, the latest achievement from Google’s DeepMind division could be summed up as “an AI program that does a mediocre job at many things.”

Gato, as the DeepMind program is called, was recently presented as a multimodal program, capable of playing video games, chatting, writing compositions, captioning images, and controlling a robotic arm that stacks blocks. It is a single neural network that can work with multiple types of data to accomplish multiple types of tasks.

“With just one set of weights, Gato can chat, caption pictures, stack blocks with a real robotic arm, outplay humans in Atari games, navigate simulated 3D environments, follow directions, and more,” write lead author Scott Reed and his colleagues in their paper, “A Generalist Agent.”

Setting a precedent

DeepMind co-founder Demis Hassabis cheered the team on in a tweet: “Our most generalist agent to date! Fantastic work from the team!” The only catch is that Gato isn’t actually that good at several of those tasks.

On the one hand, the program does better than a dedicated machine-learning program at controlling a Sawyer robotic arm that stacks blocks. On the other hand, it produces image captions that in many cases are quite poor. Its ability to chat with a human interlocutor is just as mediocre, at times producing contradictory and absurd statements.

And its ability to play Atari 2600 video games is inferior to that of most dedicated machine-learning programs designed to compete on the Arcade Learning Environment benchmark.

Why create a program that does some things quite well and many other things rather poorly? According to the authors, it is a bet on the future. There is precedent for more general types of programs becoming the state of the art in artificial intelligence, and the expectation is that increasing amounts of computing power will fill in the gaps.

Multitasking agent

Generality tends to triumph in artificial intelligence (AI). As the authors note, citing subject-matter expert Richard Sutton, “Historically, generic models that make the best use of computation have also tended to outperform more specialized approaches in a specific area.”

As Richard Sutton wrote in his own blog post, “the biggest lesson that can be learned from 70 years of AI research is that general methods that take advantage of computation are ultimately the most effective, by far.”

In the paper, Scott Reed and his team formally test “the hypothesis that training an agent that is generally capable of performing a large number of tasks is possible, and that this general agent can be adapted with little data to succeed in an even greater number of tasks.”

The model, in this case, is indeed very general. It is a version of the Transformer, the dominant type of attention-based model that has become the basis of many programs, including GPT-3. A Transformer models the probability of a given element given the elements around it, such as the words in a sentence.
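That conditional-probability idea can be sketched in a few lines of NumPy. The weights, dimensions, and single attention layer below are illustrative stand-ins, not Gato’s actual architecture; a real Transformer stacks many learned layers. But the core computation, scoring each token against the tokens before it and emitting a probability distribution over the vocabulary for the next token, looks roughly like this:

```python
import numpy as np

rng = np.random.default_rng(0)
vocab_size, d_model = 10, 8

# Toy parameters, randomly initialized; a real model learns these.
W_embed = rng.normal(size=(vocab_size, d_model))
W_q, W_k, W_v = (rng.normal(size=(d_model, d_model)) for _ in range(3))
W_out = rng.normal(size=(d_model, vocab_size))

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def next_token_probs(tokens):
    """P(next token | preceding tokens) from one causal attention layer."""
    x = W_embed[tokens]                      # (T, d_model)
    q, k, v = x @ W_q, x @ W_k, x @ W_v
    scores = q @ k.T / np.sqrt(d_model)      # (T, T) pairwise attention scores
    # Causal mask: each position attends only to itself and earlier tokens.
    scores[np.triu(np.ones_like(scores, dtype=bool), k=1)] = -np.inf
    attended = softmax(scores) @ v
    logits = attended[-1] @ W_out            # last position predicts what comes next
    return softmax(logits)                   # distribution over the vocabulary

probs = next_token_probs(np.array([1, 4, 2, 7]))
print(probs.shape, probs.sum())  # (10,) and a total of ~1.0
```

The same machinery applies regardless of what the token ids stand for, which is what makes the architecture so general.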

1.18 billion network parameters

In Gato’s case, the DeepMind scientists are able to apply the same conditional-probability modeling across many types of data.

“During Gato’s training phase, data from different tasks and modalities is serialized into a flat sequence of tokens, batched, and processed by a Transformer neural network similar to a large language model. The loss is masked, so Gato only predicts action and text targets,” describe Scott Reed and colleagues regarding the program’s training task.

In other words, Gato treats tokens no differently, whether they are words in a chat or motion vectors in a block-stacking exercise. It’s all the same.
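That serialization and masked loss can be sketched as follows. The token ids and loss values here are made up for illustration; the paper describes Gato’s actual tokenization (SentencePiece for text, with images and continuous values discretized into token ranges), and only the masking logic is the point:

```python
# Hypothetical token ids standing in for three modalities.
text_tokens   = [101, 57, 892]        # e.g. an instruction (a prediction target)
image_tokens  = [3001, 3002, 3003]    # an observation (not a prediction target)
action_tokens = [7, 3]                # e.g. binned motor commands (a target)

# Everything is flattened into one sequence, exactly as a language model sees text.
sequence = text_tokens + image_tokens + action_tokens

# Loss mask: 1 where the model's prediction is scored (text and actions),
# 0 on observation tokens such as image patches.
loss_mask = [1] * len(text_tokens) + [0] * len(image_tokens) + [1] * len(action_tokens)

def masked_loss(per_token_losses, mask):
    """Average cross-entropy over the masked-in target positions only."""
    total = sum(loss * m for loss, m in zip(per_token_losses, mask))
    return total / sum(mask)

# Dummy per-token losses; the large values on image tokens are simply ignored.
losses = [0.5, 0.2, 0.1, 9.0, 9.0, 9.0, 0.3, 0.4]
print(masked_loss(losses, loss_mask))  # ≈ 0.3, the mean over the five scored positions
```

Because the mask zeroes out observation tokens, the model is trained only to predict what it should say or do next, never to reconstruct what it merely perceives.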

Scott Reed and his team’s hypothesis has a corollary: ever-greater computing power will eventually prevail. For now, Gato is limited by the response time of the Sawyer robotic arm that does the block stacking. At 1.18 billion network parameters, Gato is much smaller than very large AI models such as GPT-3. As deep-learning models grow, inference introduces latency that can cause failures in the non-deterministic world of a real-world robot.

But Scott Reed and his colleagues expect that limit to recede as AI hardware gets faster. “We focus our training at the model-scale operating point that enables real-time control of real-world robots, currently around 1.2 billion parameters in Gato’s case,” they write. “As hardware and model architectures improve, this operating point will naturally increase the achievable model size, pushing general-purpose models higher up the scaling-law curve.”

The potential dangers of a generalist program?

Gato, then, is really a model of how computational scale will continue to be the primary driver of machine-learning development, making general-purpose models bigger and bigger. Bigger is better, in other words.

And the authors have some evidence for that. Gato does indeed seem to improve as its scale increases. They compare average scores across the benchmark tasks for three model sizes: 79 million parameters, 364 million, and the main model at 1.18 billion. “We can see that, for an equivalent number of tokens, performance improves significantly with increasing scale,” the authors write.

An interesting question for the future is whether a general-purpose program is more dangerous than other types of AI programs. The authors spend considerable time in the paper discussing potential dangers that are not yet well understood.

The idea of a program that can handle multiple tasks suggests to the layperson a kind of human adaptability, but that can be a dangerous misconception. “For example, physical embodiment could cause users to anthropomorphize the agent, which would lead to misplaced trust in the event of a system malfunction, or could be exploited by bad actors,” write Scott Reed and his team.

“Furthermore, while cross-domain knowledge transfer is often a goal in machine learning research, it could create unexpected and unwanted outcomes if certain behaviors (e.g., arcade game fighting) are transferred into the wrong context.”

Therefore, they state that “the ethical and security considerations of knowledge transfer may require substantial new research as generalist systems progress.”

The field of robotics

Gato is by no means unique in its tendency to generalize. It is part of a broad trend toward generalization and toward larger models that consume enormous amounts of computing power. Among its peers is PaLM, the Pathways Language Model, introduced this year by Google researchers: a model with 540 billion parameters that relies on a new technology for coordinating thousands of chips, known as Pathways, also invented by Google.

What’s new with Gato, it seems, is the intention to take AI used for non-robotic tasks and push it into the realm of robotics. The creators of Gato, noting the achievements of Pathways and other generalist approaches, see the ultimate achievement as an AI that can work in the real world, for any kind of task.

“Future work should consider how to unify these textual capabilities into a fully generalist agent that can also act in real time in the real world, in diverse environments and embodiments.”

So you might consider Gato an important step on the way to solving the toughest problem in AI, robotics.
