ChatGPT explained to my grandma: how does artificial intelligence (AI) learn to speak?

“How does ChatGPT know how to speak?” Well, Grandma, that is the $100,000 question. To begin with, ChatGPT is an artificial intelligence that belongs to a discipline called “machine learning”, and more precisely to a family known as “deep learning”. These are fields of research whose goal is to determine how computer algorithms can learn from examples. The idea is that the algorithm generalizes from a regularity it has observed in a set of data, during what is called the learning phase. Very roughly, it derives a general rule from a series of examples. Then, in the use phase, the algorithm applies this general rule to extrapolate and make predictions.

The gentleman in the video below, Yann LeCun, is one of the inventors of “deep learning”. He is so good at it that Facebook hired him, and he has lectured on the subject at the Collège de France. You can listen to him if you prefer, but it’s a bit technical.

Lots of little “weights”

So far, it’s not too hard, is it? But beware, it gets tougher. To make these predictions on complex questions and tasks, we need an artificial neural network. Very roughly, an artificial neuron is a cell that activates or not depending on what it receives as input. The condition for activation is adjustable; the setting that controls it is called a “weight”. Think of these weights as a series of knobs to turn until you find the right setting. In fact, each neuron performs a simple mathematical function, such as an addition or a multiplication. We must therefore keep in mind that the neuron always deals with numbers: they are its language. And since this is a network, we connect many neurons together to make a “brain”, which will be the seat of our artificial intelligence. By playing on its weights and its structure – there is a very large catalog of network architectures, more or less effective for a given task – we can configure the network very finely. By submitting a large series of examples to the network, we can “turn the knobs” to gradually reduce the gap between what the network produces and the expected result. When the network reaches this goal, its learning is complete: from now on, when it is given new input data, it will be able to process them correctly.
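For the curious, here is a minimal sketch in Python of this “knob turning”. It assumes a single neuron with a single weight, and a hidden rule (the output is twice the input) that we made up for the illustration; real networks have millions of weights and use more refined update rules.

```python
# A toy "neuron" with a single weight: it multiplies its input by the weight.
# Assumption for illustration: the hidden rule behind the examples is y = 2 * x.
examples = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]

def train(examples, steps=200, learning_rate=0.05):
    weight = 0.0  # the knob starts at an arbitrary setting
    for _ in range(steps):
        for x, expected in examples:
            produced = weight * x
            error = produced - expected          # gap between output and target
            weight -= learning_rate * error * x  # turn the knob to shrink the gap
    return weight

weight = train(examples)
print(round(weight, 3))         # the setting the knob ended up on
print(round(weight * 10.0, 2))  # prediction for an input never seen in training
```

The last line is the “use phase”: the neuron never saw the input 10, yet it applies the rule it extracted from the three examples.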

We can see that we are still very far from speaking a language. A little background helps to understand what follows: in recent years, neural networks have become more and more capable thanks to the creation of new architectures, the growing computing power of computers and the availability of large databases, which supply the millions of examples needed to refine the education of an artificial intelligence. AIs have become extremely good at recognizing objects in an image, and even at inventing them. With a sufficient number of examples and the right “connections”, an AI can indeed “hallucinate” an image based on the general characteristics of an object, which it has identified in its database.

Without going into too much detail, let’s take a little detour through image generation to understand the philosophy of this technology. Suppose we take all the images of rabbits in the world and put them in a database: a digital image is an assembly of pixels, each colored according to three values – red, green and blue. By representing these values in a mathematical space – let us say, for simplicity, a two-dimensional plane, with an abscissa and an ordinate – we realize that their distribution is not random: all the images of rabbits in the world occupy the same “region” of this space. If we understand how this distribution works, it becomes possible to do sampling, that is to say to produce a new example that will be located in the “rabbit region”. Still with me, Grandma?
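To make the “rabbit region” a little more concrete, here is a rough sketch in Python. The coordinates are invented, and describing the region by a centre and a spread along each axis (a bell curve) is our simplifying assumption; real image generators use far more elaborate models.

```python
import random
import statistics

# Made-up 2D coordinates standing in for rabbit images (purely illustrative).
rabbit_points = [(2.1, 3.0), (1.9, 3.2), (2.0, 2.8), (2.2, 3.1), (1.8, 2.9)]

def fit_region(points):
    """Describe the region by its centre and spread along each axis."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    return (statistics.mean(xs), statistics.stdev(xs),
            statistics.mean(ys), statistics.stdev(ys))

def sample(region, rng):
    """Draw a new point that falls in the same region as the examples."""
    mean_x, spread_x, mean_y, spread_y = region
    return (rng.gauss(mean_x, spread_x), rng.gauss(mean_y, spread_y))

region = fit_region(rabbit_points)
new_rabbit = sample(region, random.Random(0))  # fixed seed for repeatability
print(new_rabbit)
```

The new point is not a copy of any example, but it lands in the same neighbourhood: that is the idea behind producing a rabbit nobody has ever photographed.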

Semantic field

For “natural language processing” – the name for AI applied to languages – we will also use an abstract mathematical space to convert words into numbers, that is to say into a language the algorithm can understand. Except that it’s a little more complicated. The problem with language is that the algorithm must understand the relationships of meaning between words. This means that the semantic proximity between two terms, such as “balloon” and “sphere”, must be translated numerically. For images, translating a color into a number – we speak of “encoding” – requires a small amount of information. For a word, or even a sentence, more is needed, so we use a “vector”: a list of numbers associated with a word or a sentence, which serves as its coordinates in our abstract mathematical space. Instead of having one number on the abscissa and another on the ordinate, we will therefore have many more. To return to our comparison with the color of a pixel, we go from a two-dimensional space to a space with many dimensions. If we think about it, this space actually becomes a kind of “semantic field”. In this field, two sentences whose meanings are close will therefore have close coordinates.
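A small sketch in Python shows what “close coordinates” means. The three vectors below are invented for the illustration, with only three numbers each; real systems learn vectors with hundreds of numbers.

```python
import math

# Made-up coordinates in a 3-dimensional "semantic field" (illustrative only).
vectors = {
    "balloon": [0.9, 0.8, 0.1],
    "sphere":  [0.8, 0.9, 0.2],
    "soup":    [0.1, 0.2, 0.9],
}

def distance(word_a, word_b):
    """Straight-line distance between the coordinates of two words."""
    return math.dist(vectors[word_a], vectors[word_b])

print(distance("balloon", "sphere"))  # small: the meanings are close
print(distance("balloon", "soup"))    # large: the meanings are far apart
```

Straight-line distance is only one way of measuring proximity; we chose it here because it matches the idea of coordinates on a map.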

A stroke of luck: our neural networks are able to identify semantic proximities and assign vectors to a very large repertoire of words. Once our network has done this work, we move on to a new step: giving these vectors to another neural network so that it learns a language. But we’re not done yet, Grandma: the other big problem with language is that meaning depends on elements scattered throughout the sentence, so we have to teach our algorithm to identify and remember the important words in a sentence. Why is that such a nuisance? Because when the algorithm reads a sentence the conventional way, discovering the words one by one – we speak of “sequential” processing –, the vector that captures the meaning of the sentence gradually “forgets” the first words. More precisely, the first words of the sentence weigh less and less heavily in the series of numbers that is supposed to convey its meaning. Yet it is obvious that the last words of a sentence are not always the ones that contribute the most to its meaning.
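This “forgetting” can be caricatured in a few lines of Python. We assume here a deliberately crude rule: at each new word, the running summary keeps half of what it held and gives the other half to the newcomer. Real sequential networks learn their own rule, but the fading effect is similar in spirit.

```python
def word_influence(words, keep=0.5):
    """Share of each word in the final summary after reading left to right."""
    shares = []
    for word in words:
        # Everything read so far fades a little to make room for the new word.
        shares = [(w, s * keep) for w, s in shares]
        new_share = 1.0 - keep if shares else 1.0
        shares.append((word, new_share))
    return shares

sentence = "the rabbit that Grandma saw yesterday was white".split()
for word, share in word_influence(sentence):
    print(f"{word:10s} {share:.3f}")
```

By the end of the sentence, “rabbit” counts for less than one percent of the summary, while “white” counts for half of it.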

To date, the most effective way to solve this problem is a learning model called the “transformer”. This solution is not new: it dates back to 2017, to a study titled “Attention Is All You Need”, and its first large-scale applications followed in the next few years. So that the algorithm can remember the beginnings of sentences, an additional layer of neurons was introduced into the network, capable of picking out a vector at any place in the sentence and recalling its meaning to the rest of the algorithm. This is referred to as the “attention mechanism”.
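Here is a simplified sketch in Python of what such an attention layer computes. The word vectors are invented, and we let each vector play every role at once; in a real transformer the network learns separate transformations for these roles, so take this as an illustration of the principle only.

```python
import math

def softmax(scores):
    """Turn raw scores into positive weights that add up to 1."""
    highest = max(scores)
    exps = [math.exp(s - highest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attend(query, word_vectors):
    """Blend all word vectors, weighting each by its relevance to the query."""
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, vec)) / scale
              for vec in word_vectors]
    weights = softmax(scores)
    blended = [sum(w * vec[i] for w, vec in zip(weights, word_vectors))
               for i in range(len(query))]
    return weights, blended

# Made-up 2-number vectors for a three-word sentence (illustrative values only).
sentence = {"rabbit": [1.0, 0.0], "eats": [0.0, 1.0], "carrot": [0.9, 0.1]}
weights, blended = attend(sentence["rabbit"], list(sentence.values()))
for word, weight in zip(sentence, weights):
    print(f"{word:8s} {weight:.2f}")
```

The weight given to a word depends only on how relevant its vector is, not on how far away it sits in the sentence, which is why the beginning of a long sentence is no longer forgotten.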

In 2017, researchers at Google discovered that this extra layer of neurons could do the job of processing language on its own. OpenAI built on this finding with the GPT model (for Generative Pre-trained Transformer), whose third version launched in 2020. With this network model, we get rid of sequential processing completely and give the whole sentence to the algorithm at once. GPT processes all the words in parallel, with the place of each word in the sentence also encoded, and manages to determine the relative importance of each term in the overall meaning of a sentence thanks to its attention mechanism. To achieve this, it was necessary to train GPT, now in version 3.5, on tens of billions of sentences, and to adjust hundreds of billions of “weights”. Grandma? Are you asleep?
