Point-E: OpenAI’s new tool generates 3D objects in the blink of an eye

The AI wizards' latest tool may have laid the foundations for a revolution in computer graphics.

In recent months, even the uninitiated have become aware of the dazzling progress of artificial intelligence through innovative tools. The Internet has notably fallen in love with generative tools; with their ability to produce stunning images or larger-than-life pieces of text, among other things, they are as impressive as they are entertaining. And just recently, OpenAI's teams have further extended this already vast range of possibilities: the designers of DALL-E, the image-generation pioneer that many have since emulated, are now tackling 3D objects.

Like DALL-E, their new creation, called Point-E, responds to a simple textual description written in everyday language. The difference is that it does not return an image, but a 3D object in the form of a point cloud.


3D meshes: complex objects

In your favorite video games and animated films, the 3D models that make up the environments, the characters, and the vast majority of solid objects in the scenery are built from a mesh of points called vertices. These can be connected to define edges, which in turn delimit faces. It is the faces that are colored, using textures and lighting effects, to give the model its final appearance.
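The structure described above can be sketched in a few lines of Python; this is an illustrative toy, not any engine's actual data format. A mesh is essentially an array of vertex positions plus an array of faces that index into it, and the normal used for lighting a face falls out of a cross product:

```python
import numpy as np

# A minimal triangle mesh: vertices are 3D points, faces index into them.
vertices = np.array([
    [0.0, 0.0, 0.0],
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
])
faces = np.array([
    [0, 1, 2],
    [0, 3, 1],
    [0, 2, 3],
    [1, 3, 2],
])  # a tetrahedron: 4 vertices, 4 triangular faces

def face_normal(verts, face):
    """Unit normal of one triangular face, used for shading."""
    a, b, c = verts[face]
    n = np.cross(b - a, c - a)
    return n / np.linalg.norm(n)

print(face_normal(vertices, faces[0]))  # [0. 0. 1.]
```

Each face here is just three integer indices; coloring a face then amounts to looking up its texture and combining it with lighting computed from normals like this one.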

A 3D object often consists of a large number of vertices. Even the most rudimentary, like the dolphin below, often number a few hundred; detailed models can easily exceed a million points. Thanks to the technical prowess of Epic Games with the Nanite technology built into Unreal Engine 5, we are even moving toward models with several billion points in a real-time engine.

A mesh of triangular faces representing a dolphin. © Chrschn – Wikimedia Commons

This was long inconceivable, because the computer has to manage all these elements individually. For a very detailed model, it must constantly recalculate the position of each point and the color of each face according to its texture and the various light sources… it is partly for this reason that a superb game like the recent God of War: Ragnarok needs a powerful machine to run well.

And this problem does not only concern video game artists; it is also a major headache for AI researchers, because generating such objects automatically is an immensely complex task in computational terms. Even specialized AIs do not work miracles at this level. Research on 3D model generation has therefore tended to progress quite slowly: algorithms of this kind remain painfully slow, in addition to producing poor results compared with a human artist.

Two intermediate steps for a lightning-fast algorithm

To circumvent this limitation, the OpenAI researchers explored another approach. It all starts with a first "text-to-image" algorithm. The concept is exactly the same as DALL-E's: once trained, the program only needs a textual description to spit out a corresponding image.

Here, however, the goal is not to produce a work of art; the objective is a fairly rudimentary image, which then serves as an intermediate to feed a second subsystem.

Once this intermediate image is generated, it is ingested by another algorithm, which is responsible for converting it into three dimensions. To train it, the researchers simply provided pairs of images and corresponding 3D models; the system was thus able to learn to map one format to the other.

The subtlety is that this is still not the finished product; it is in fact a second intermediate. Instead of immediately producing a standard 3D model based on vertices, edges, and faces, Point-E starts by generating a simple point cloud.
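The two-stage pipeline described above can be sketched with placeholder functions standing in for the two models. Everything here is made up for illustration, including the names and the random outputs; in Point-E, both stages are actually diffusion models.

```python
import numpy as np

rng = np.random.default_rng(0)

def text_to_image(prompt: str) -> np.ndarray:
    """Stage 1 (stub): a text-to-image model would render the prompt
    as a small synthetic view; here we just fake a 64x64 RGB image."""
    return rng.random((64, 64, 3))

def image_to_point_cloud(image: np.ndarray, n_points: int = 1024) -> np.ndarray:
    """Stage 2 (stub): a second model, conditioned on the image,
    would emit XYZ coordinates plus RGB colors for each point."""
    xyz = rng.uniform(-1.0, 1.0, size=(n_points, 3))  # point positions
    rgb = rng.random((n_points, 3))                   # per-point colors
    return np.hstack([xyz, rgb])

cloud = image_to_point_cloud(text_to_image("a traffic cone"))
print(cloud.shape)  # (1024, 6): x, y, z, r, g, b for every point
```

The key design point survives even in this caricature: neither stage has to reason about edges or faces, only about pixels and loose colored points, which is what makes the whole chain so fast.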

© OpenAI

Unlike the vertices of a regular mesh, these points have no connectivity between them. For a computer, it is much easier, and therefore faster, to describe a three-dimensional shape with such a point cloud. From there, one last step remains: converting the point cloud into a real mesh. This is the easiest part, since powerful algorithms capable of performing this task already exist.
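As a minimal illustration of that final conversion (not the method the researchers use, which relies on more sophisticated surface-reconstruction algorithms), the crudest way to turn a loose point cloud into an indexed triangle mesh is to take its convex hull:

```python
import numpy as np
from scipy.spatial import ConvexHull

# A synthetic point cloud: 500 unconnected points in space.
rng = np.random.default_rng(1)
points = rng.normal(size=(500, 3))

# The convex hull wraps the cloud in triangles; real reconstruction
# methods (e.g. Poisson surface reconstruction) also recover concavities.
hull = ConvexHull(points)
mesh_vertices = points[hull.vertices]  # the points that lie on the surface
mesh_faces = hull.simplices            # each row: 3 indices into `points`

print(mesh_faces.shape[1])  # 3, i.e. every face is a triangle
```

The output has exactly the mesh structure described earlier: an array of vertex positions and an array of triangles that index into it.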

Since it relies on two intermediates that are by definition approximate, the resulting object is neither particularly precise nor realistic. On the other hand, the whole process is extremely fast. By introducing these intermediate steps instead of rushing headlong toward the end result, the researchers achieved a staggering speed boost.

According to them, Point-E is faster than the best current 3D generators, and by several orders of magnitude. "Our method performs worse than state-of-the-art techniques in terms of realism, but produces samples in a fraction of the time," the authors explain.

Practical potential even greater than image generation

And that is anything but negligible. Thinking on a global scale, there are ultimately rather few applications that truly require ultra-detailed 3D models. They matter in advertising and in "high end" entertainment products, the big Hollywood blockbusters or AAA video games in particular. But for everything else, lower-quality models can often do the trick, and in those cases a tool like Point-E could represent a real revolution in productivity.

For a simple example, imagine an independent developer who wants to build a mobile app, say a farm-themed management game. To give shape to the animals that will populate his program, he has several options.

DALL-E and its kind certainly produce fascinating results, but their practical usefulness remains limited. © DALL-E 2 / OpenAI

The first is to learn the basics of 3D modeling and create everything himself; but that is a very long process that will inevitably slow down development. He could also hire an artist to make the models for him, but that obviously costs money.

Or he could simply subscribe to a service based on a Point-E-type algorithm. It would then suffice to ask it in turn for "a goat", "a sheep", or "a hen" to generate his whole bestiary in the blink of an eye. And why stop there: while he is at it, he could also order models of vegetables, buildings, and so on.

This is only an isolated and rather reductive example, not enough on its own to capture the capabilities of this technology. But it at least illustrates its enormous potential. Because this is just the beginning.

Towards maturation, then revolution?

Indeed, both general-purpose computing and machine-learning algorithms are progressing very rapidly today. It can therefore be expected that these programs will also improve much faster than before, rather like image and text generators: as a reminder, the latter more or less stagnated for years before experiencing a spectacular explosion.

And if so, it may be an even more important step forward. Because ultimately, fascinating as they are, the practical applications of image generators remain quite limited by the many intellectual-property debates raging at the moment. A system that let anyone produce 3D models on the fly, on the other hand, could eventually be used very concretely in many contexts where this copyright problem is less acute.

Just imagine a Point-E-type program able to produce AAA-caliber 3D models from the studio's own concept art. It would be a real revolution in the world of video games, and for good reason: this modeling work is exceptionally time-consuming. By removing this constraint, small studios could reach a level of quality comparable to that of the industry's titans, while the latter could significantly increase their production rate.

The day when an AI will offer us 3D models of this quality, it will be an unprecedented revolution in the world of video games… and it will only be the beginning. © Santa Monica

And it goes even further. For now, this concerns only 3D modeling, but in principle the approach is applicable to other disciplines as well. It could first be extended to animation, or even to development itself. Eventually, one can even imagine other industries such as architecture and structural engineering… and even the generation of static images like those of DALL-E-type generators!

Particular attention should therefore be paid to the development of these technologies. OpenAI is obviously not the only company working on this; we can also mention Google's DreamFusion, and other big names will no doubt take an interest in the future. That is exactly the kind of competition that could make this technology take off, with profound implications for many industries once it comes of age.

The research paper is available here. See also the GitHub repository.
