Specializing in search, Nuclia has developed a platform that uses AI to analyze large sets of unstructured data.
Founded in 2019 in Barcelona by Eudald Camprubí and Ramon Navarro, the start-up Nuclia benefited from its two founders' experience in the United States. “We had technology that we didn’t know how to sell, and we met the team of an American start-up that didn’t have technology, but was good at sales. They bought our technology and we decided to create an e-discovery platform for the legal sector. We managed to have Facebook, Twilio and Electronic Arts as our first clients. After a few years, we realized that legal tech wasn’t sexy. We didn’t like legal tech and the American way of life. And so we came back to Barcelona to create Nuclia with everything we learned from our conversations with customers, including Facebook.”
Eudald Camprubí, the CEO of Nuclia, whom we met during an IT Press Tour in Lisbon, explains: “So what do we solve? We’re solving a problem you’ve probably all encountered at some point, which is that you have a lot of unstructured data. So you have a lot of PDF documents, videos, PowerPoint files, etc. In fact, in business, around 80% of the data you manage in your day-to-day work is unstructured. It is therefore very difficult to find unstructured data in companies. There is another problem, which is that if you want to index this type of data, it is a huge challenge for IT departments. The data sits in different data sources: in SharePoint, in Amazon S3, or wherever you want. These data are not only in different sources, they are also in different formats, from videos to PDFs and others, and also in different languages. Not all companies work only in English; some work in German, French, Spanish or other languages. So different data sources, different formats, different languages. It’s a nightmare if you want to search through your data to find different terminologies.”
Solving three very complex problems
Setting up the infrastructure to load data, configure indexing and develop search experiences is a particularly difficult task in companies: a good example is the search engine of the site Le Monde Informatique, which is far from satisfactory. The work of collecting data, configuring search algorithms and developing applications is only the first step: adjusting the relevance of responses is an ongoing battle. With the open-sourcing of its NucliaDB vector database and its API giving access to its indexing engine, the young Spanish company is responding to requests from developers who want to make more efficient use of the information contained in PDFs and in audio or video files. Nuclia’s API can connect to applications, sites or web services to automatically index available content and perform multilingual semantic searches across all unstructured data. It is a very interesting proposition, because setting up a personalized, relevant and up-to-date search engine was far from trivial before the advent of the cloud, SaaS with APIs, integration platforms and machine learning.
The data collected is in different data sources and is in different formats and languages.
Eudald Camprubí highlights the capabilities of his search engine, which can expand the reach and scale of a business. “We are facing three very complex problems, and very difficult to solve.” Namely, the ingestion, processing and indexing of unstructured data. Only search engines powered by AI can help companies overcome this chaos. “What we’ve done is create something we call AI search as a service, a search engine as a service. And basically what we’re building is a very easy and fast way to index unstructured data, wherever the data is, and whatever the language and the format, and we put it together with the power of AI search. So by putting it all together, businesses can understand and index any kind of data […] Basically, with Nuclia we have built a pipeline that we can connect to any type of source and we are able to extract all text from any type of file in all languages. The way to do this is to use our REST API, our SDK, or our Nuclia application. The app is something you download to your computer, and from that app you can connect to any data source in your country.”
Automatic OCR to create summaries
“We are able, from any file, to extract all the text [with OCR]. If it is a video, we run an automatic speech-to-text conversion process, in almost all languages. We also run OCR on the text in your images. We are also able to extract data from external URLs. So if you find an interesting article, you can just copy-paste the URL and all that content gets indexed. And there are other content extractions which are much more related to metadata and everything we can extract from forms,” the CEO adds. For the analysis of texts, everything is based on semantics. There are no dictionaries. “In the case of Chinese, for example, we are not the best, neither with Japanese nor with Korean. So older languages that use symbols for writing are ones we’re not good at. But we are not bad at Arabic. We can still improve: the pictograms are still quite difficult to understand. We are able to understand and extract all key information from any file to create a summary if needed. We not only extract all the text, but also identify every paragraph, so that we can return the relevant excerpt as well. We also create a classification model to externally classify this data. Then we store everything in NucliaDB, an open source database we built ourselves in Rust and Python.”
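To make the paragraph-level indexing described above more concrete, here is a minimal sketch in Python. It is not Nuclia's code and does not use NucliaDB or the Nuclia API: the `ParagraphIndex` class, its methods and the sample file names are hypothetical, and simple word counts stand in for the semantic embeddings a real vector database would store. It only illustrates the general idea of splitting extracted text into paragraphs, indexing each one, and returning the best-matching excerpts with their source.

```python
import math
import re
from collections import Counter


def split_paragraphs(text):
    """Split extracted text into non-empty paragraphs on blank lines."""
    return [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]


def vectorize(text):
    """Toy stand-in for a semantic embedding: lowercase word counts."""
    return Counter(re.findall(r"\w+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class ParagraphIndex:
    """Hypothetical in-memory index (an assumption, not NucliaDB)."""

    def __init__(self):
        self.entries = []  # list of (source, paragraph, vector)

    def add(self, source, text):
        """Index every paragraph of the text extracted from one source."""
        for para in split_paragraphs(text):
            self.entries.append((source, para, vectorize(para)))

    def search(self, query, top_k=3):
        """Return up to top_k (score, source, paragraph) tuples, best first."""
        qv = vectorize(query)
        scored = [(cosine(qv, vec), src, para) for src, para, vec in self.entries]
        scored = [s for s in scored if s[0] > 0]
        scored.sort(key=lambda s: s[0], reverse=True)
        return scored[:top_k]


# Sample sources are made up; the text would come from OCR or speech-to-text.
index = ParagraphIndex()
index.add("contract.pdf", "The supplier delivers within 30 days.\n\nPayment is due on receipt of invoice.")
index.add("meeting.mp4", "We discussed the invoice backlog and payment delays.")
results = index.search("invoice payment")
for score, source, paragraph in results:
    print(f"{score:.2f} {source}: {paragraph}")
```

Because each paragraph is stored with its source, a search returns the excerpt that matched rather than a whole file. A real semantic engine would replace `vectorize` with multilingual embeddings, which is what lets a query in one language match content in another.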
Nuclia works with partners and consulting companies to support customers. A plug-in for Drupal has also been launched, and one for WordPress should arrive very soon. “Thus, any website using WordPress, Drupal or an open source clone will be able to use our services with just one click,” the CEO promises. As for pricing, it was difficult to get a clear answer during our discussions: “Pricing. Well, it’s a bit of a nightmare. We price our solution based on three different elements. The first is whether we host the database or our customers host it, so that makes a difference. The second is how much data you want to index: it’s not the same if you want to index a terabyte of data or 10 gigabytes, and therefore the prices are very different. From one to the other, I would say that we start at 5k€ per year, which is very low, up to 60k€. It depends on the negotiation.”
Search engines with built-in, configurable machine learning algorithms provide significant benefits to companies that have multiple business applications and different types of users searching large repositories of information. Nuclia’s search platform provides both quality indexing and machine learning capabilities, including feature enrichment algorithms, automatic relevance tuning and recommendation engines. A start-up to follow.
Nuclia comes to modernize research in companies Computerworld