IARPA (Intelligence Advanced Research Projects Activity), an organization within the U.S. Office of the Director of National Intelligence, has as its mission to push the boundaries of science to develop solutions that allow the IC (intelligence community) to do its job better and more effectively for national security. HIATUS is one of its research programs: it aims to attribute authorship of a text, and to protect authors' privacy, through human-explainable algorithms.

IARPA initiates research programs and communicates the results to its IC customers, who themselves deploy the resulting technologies. The four main areas of research in which it invests are artificial intelligence, quantum computing, machine learning and synthetic biology.

The HIATUS program: Human Interpretable Attribution of Text Using Underlying Structure

Whether spoken or written, language differs from one person to another: the way words and sentences are organized, and what they say, can reveal who spoke or wrote them.

Timothy McKinnon, the HIATUS program manager, told Nextgov in an interview:

“For a bit of context, it’s like if you had 100 different people, and you asked them to describe something simple – like how to open a door – in two sentences or one sentence, you’d probably get about 100 different answers. Each person somehow has their own idiosyncrasies as an author that can potentially be used by authorship attribution systems.”

Every day, a vast quantity of text is written by anonymous authors, human or machine. Timothy McKinnon points out that most of these documents contain linguistic features that can be used to identify who wrote them, or to protect the authors' identity if attribution could put them at risk.

He explains:

“With attribution, we identify stylistic characteristics. So it’s things like word placement and syntax that can identify who wrote a given text. Think of it like your written fingerprint. What characteristics make your writing unique? Thus, the technology would be able to identify this fingerprint against a corpus of other documents and determine whether they come from the same author. On the privacy side, the technology would find ways to alter the text so that it no longer resembles a person’s writing.”
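
To make the "written fingerprint" idea concrete, here is a minimal sketch in Python. It is an illustration only, not the method used by HIATUS: it assumes a fingerprint can be approximated by the relative frequency of a few common function words, and that two fingerprints can be compared with cosine similarity. The word list and the function names are hypothetical choices made for this example.

```python
import math
import re
from collections import Counter

# Hypothetical feature set: a handful of common English function words.
# Real stylometric systems use far richer features (syntax, punctuation, ...).
FUNCTION_WORDS = ("the", "a", "of", "and", "to", "in", "that", "it", "you", "then")


def fingerprint(text):
    """Return the relative frequency of each function word in the text."""
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter(tokens)
    total = len(tokens) or 1  # avoid dividing by zero on empty text
    return [counts[word] / total for word in FUNCTION_WORDS]


def cosine(u, v):
    """Cosine similarity of two vectors; 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def most_similar_author(unknown_text, corpus):
    """Compare an unknown text against a corpus mapping author -> known text.

    Returns the (author, similarity) pair with the highest similarity.
    """
    target = fingerprint(unknown_text)
    scores = {
        author: cosine(target, fingerprint(text))
        for author, text in corpus.items()
    }
    best = max(scores, key=scores.get)
    return best, scores[best]
```

Calling `most_similar_author(text, {"author_a": "...", "author_b": "..."})` returns the candidate whose known writing is closest to the unknown text under this crude measure. The privacy side of the problem runs in the opposite direction: rewriting a text until this kind of similarity to its author's other writing drops.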

Currently, there are three ways to attribute a text to its author. Linguistic experts can do it by analyzing the text. Machine learning methods such as logistic regression or Bayesian models can also be used, but according to Timothy McKinnon these methods do not hold up for all texts. The third option is to use a neural language model, but in his view such models are not sufficiently explainable.
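
The Bayesian approach mentioned above can be illustrated with a small classifier. The sketch below is a generic multinomial naive Bayes model with add-one smoothing over word counts, written for this article as an example; the class name, the tokenizer and the choice to ignore words never seen in training are all assumptions, and nothing here reflects a system actually used by IARPA.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


class NaiveBayesAttributor:
    """Hypothetical multinomial naive Bayes author classifier."""

    def fit(self, samples):
        """Train on an iterable of (author, text) pairs."""
        self.word_counts = defaultdict(Counter)  # author -> word -> count
        self.doc_counts = Counter()              # author -> number of texts
        self.vocab = set()
        for author, text in samples:
            tokens = tokenize(text)
            self.word_counts[author].update(tokens)
            self.doc_counts[author] += 1
            self.vocab.update(tokens)
        return self

    def log_scores(self, text):
        """Return log P(author) + sum of log P(word | author) per author."""
        total_docs = sum(self.doc_counts.values())
        vocab_size = len(self.vocab)
        tokens = [t for t in tokenize(text) if t in self.vocab]  # skip unseen words
        scores = {}
        for author, counts in self.word_counts.items():
            n_words = sum(counts.values())
            score = math.log(self.doc_counts[author] / total_docs)
            for token in tokens:
                # Add-one (Laplace) smoothing keeps probabilities non-zero.
                score += math.log((counts[token] + 1) / (n_words + vocab_size))
            scores[author] = score
        return scores

    def predict(self, text):
        """Return the author with the highest log score."""
        scores = self.log_scores(text)
        return max(scores, key=scores.get)
```

A model like this is easy to inspect, since each word's contribution to the score can be read off directly, but it depends heavily on topic vocabulary, which is one reason such methods may not transfer well from one kind of text to another.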

He says:

“The problem with these models is that even though they are very, very fast and work very well, we don’t really understand what’s going on inside. They are very complex.

“And so what HIATUS is looking to do, among other things, is uncover some of the reasons behind the behavior of these models, so that when we do authorship attribution or authorship privacy, we’re able to really understand why the system behaves the way it does, and be able to verify that it is not detecting false information and that it is doing the right thing.”

The HIATUS program therefore aims to develop new human-usable systems for attributing authorship and protecting the privacy of authors, by identifying and exploiting explainable, actionable linguistic fingerprints across different languages. It is expected to last about 42 months, from September 30, 2022 to March 29, 2026. The BAA (Broad Agency Announcement, the call for proposals) was published on February 25.

