ASTRID Cognitive Warfare - Dedicated support for defence research and innovation work - Cognitive Warfare thematic call

Tracking and detecting fake news & deepfakes on Arab social networks – TRADEF

TRADEF

Tracking and detecting fake news and deepfakes on Arab social networks

Objectives

The aim of cognitive warfare is to destabilise a country's institutions by influencing public opinion. This destabilisation is achieved by manipulating information, in particular via social networks, resulting in the dissemination of what is commonly known as fake news or deepfakes.

Our aim in this project is to detect this type of disinformation, denigration and propaganda at an early stage. TRADEF seeks to detect a suspicious event, which may take the form of a text, podcast or video. This suspicious event is analysed and, above all, tracked over time: a piece of information may be suspicious but later change status and turn out to be true, while information that appears credible at a given moment may be denied or invalidated by new elements.

One of the original features of this project is to track the status of an event longitudinally, explaining it at any given moment.

Another original aspect of this project is its focus on disinformation in the context of Arabic dialects. These dialects are vernacular languages that differ from one country to another, and even from one region to another within the same country, which makes monitoring all the more complicated. Moreover, code-switching is particularly prevalent in this context: the communities concerned frequently switch from one language to another within the same sentence. In France, this phenomenon is even more interesting and takes two forms: social-network texts predominantly in Arabic with a few words of French or, conversely, a few words of Arabic within texts that are mainly in French. The latter case mainly corresponds to people educated in France who have a poor command of Arabic but still incorporate a few Arabic expressions and words into their communications on social networks.

With regard to deepfake processing in our project, our aim is to develop detection methods that remain robust on deepfakes created under conditions different from those seen during training. Indeed, most current facial manipulations seem easy to detect in controlled scenarios: on most existing benchmarks, detection methods achieve very low error rates. However, it is difficult to correctly identify manipulations shared on social networks, because of strong variations such as compression level, resizing and noise. In addition, facial manipulation techniques are continually improving, which calls for new research methods that generalise better and can detect deepfakes created under unprecedented conditions.

The approach we are taking in TRADEF to identify these deepfakes is based on deep and reinforcement learning techniques that use algorithms similar to those used to construct the deepfakes themselves, such as methods based on the principle of the generative adversarial network (GAN).

Approaches based on a GAN-type network are attractive. They consist of a Generator and a Discriminator, each of which is a Convolutional Neural Network (CNN). They are trained together so that the discriminator learns to distinguish false images/videos from real ones and the generator network learns to fool the discriminator. This method is considered to be one of the best neural network inventions of the last decade.
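The adversarial objective described above can be made concrete with the standard GAN losses. This is a minimal numeric sketch of the two objectives, not the project's actual implementation; the function names are ours for illustration:

```python
import numpy as np

# Standard GAN losses (Goodfellow et al., 2014).
# d_real / d_fake are the discriminator's probability outputs in (0, 1)
# on real and on generated samples respectively.

def discriminator_loss(d_real, d_fake):
    # The discriminator wants d_real -> 1 and d_fake -> 0.
    return float(-(np.log(d_real) + np.log(1.0 - d_fake)).mean())

def generator_loss(d_fake):
    # Non-saturating form: the generator wants the discriminator
    # fooled on fakes, i.e. d_fake -> 1.
    return float(-np.log(d_fake).mean())
```

Training alternates minimisation of the two losses: a confident, correct discriminator (d_real high, d_fake low) has a small loss, and as the generator improves, d_fake rises and the generator loss falls.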

We believe that the method used to create manipulated images could be exploited to identify fake images/videos. In other words, we can reverse the process of creating deepfakes, so we propose to carry out a sort of reverse engineering process. We will use multi-GANs, testing with at least two generators to refine the discrimination of manipulated images from the originals.

We believe that it is not appropriate to consider the detection of deepfakes as a simple binary classification problem, but as a one-class anomaly detection problem.

As far as training and test corpora are concerned, we will use existing benchmarks such as github.com/ondyari/FaceForensics, which is publicly accessible, before moving on to more complex reference datasets. At LORIA, in the SMarT team, we have built the ArabDeep corpus, a set of manipulated videos about Arab personalities.

As for the evaluation measures, we will mainly use the accuracy of the models. We will also evaluate the method using the FID score and the conditional FID.
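For reference, the FID compares the Gaussian statistics (mean and covariance) of real and generated feature distributions. A minimal sketch of the standard formula, assuming SciPy is available for the matrix square root:

```python
import numpy as np
from scipy import linalg

def fid(mu1, cov1, mu2, cov2):
    """Frechet Inception Distance between two Gaussians:
    ||mu1 - mu2||^2 + Tr(cov1 + cov2 - 2 * sqrt(cov1 @ cov2))."""
    diff = mu1 - mu2
    covmean = linalg.sqrtm(cov1 @ cov2)
    if np.iscomplexobj(covmean):
        covmean = covmean.real  # discard tiny imaginary numeric noise
    return float(diff @ diff + np.trace(cov1 + cov2 - 2.0 * covmean))
```

In practice the means and covariances are computed over Inception-network features of the two image sets; identical distributions give a FID of 0.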

(last modification: start + 6 months)

We created the BOUTEF corpus (Bolstering Our Understanding Through an Elaborated Fake News Corpus). This corpus contains posts from social media (Facebook, Twitter, YouTube and TikTok) in Algerian dialect, Tunisian dialect, Modern Standard Arabic, French and English. We preserved the script (Latin, Arabic, Arabizi) used in the posts. BOUTEF contains 3,600 fake news items collected from posts dated 2010 to 2024. For each post, we collected the account id, the theme, the type of fake news, and the list of comments associated with the post.
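The annotation just described could be represented per post roughly as follows; the field names below are our own illustration of the schema, not the corpus's actual format:

```python
from dataclasses import dataclass, field

@dataclass
class BoutefPost:
    """One annotated BOUTEF record (illustrative field names)."""
    account_id: str
    platform: str            # "Facebook", "Twitter", "YouTube" or "TikTok"
    language: str            # e.g. "Algerian dialect", "MSA", "French"
    script: str              # "Latin", "Arabic" or "Arabizi"
    theme: str
    fake_news_type: str
    text: str
    year: int                # posts range from 2010 to 2024
    comments: list = field(default_factory=list)
```

Keeping the original script as an explicit field is what lets later experiments study Arabizi and code-switched posts separately.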

We studied the features of bots in order to detect them automatically, but we found that their behaviour is very close to that of real accounts.

For the Arabic dialect transcription task, we built a first reference system. This showed that comparing the system output with the reference is far from obvious because of the high variability of spellings. To address this problem, we started working on error-rate measures based on semantics and pronunciation. Finally, we started working on the detection of false translations for videos.
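One way such a variation-tolerant error rate could work is to let a substitution between two orthographic variants cost less than a full error. The sketch below is our own illustration of the idea (character similarity as a crude proxy for pronunciation closeness), not the measure the project is developing:

```python
def edit_distance(a, b):
    """Plain Levenshtein distance between two sequences."""
    m, n = len(a), len(b)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i
    for j in range(n + 1):
        d[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1, d[i][j - 1] + 1,
                          d[i - 1][j - 1] + cost)
    return d[m][n]

def char_similarity(w1, w2):
    if not w1 and not w2:
        return 1.0
    return 1.0 - edit_distance(w1, w2) / max(len(w1), len(w2))

def soft_wer(ref, hyp, threshold=0.7):
    """Word error rate where substituting a close spelling variant
    (e.g. 'inchallah' vs 'inshallah') costs less than a full error."""
    m, n = len(ref), len(hyp)
    d = [[0.0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = float(i)
    for j in range(n + 1):
        d[0][j] = float(j)
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            sim = char_similarity(ref[i - 1], hyp[j - 1])
            if ref[i - 1] == hyp[j - 1]:
                sub = 0.0
            elif sim >= threshold:
                sub = 1.0 - sim  # partial cost for a spelling variant
            else:
                sub = 1.0
            d[i][j] = min(d[i - 1][j] + 1, d[i][j - 1] + 1,
                          d[i - 1][j - 1] + sub)
    return d[m][n] / max(m, 1)
```

On a reference ["inchallah", "ghedwa"] against the hypothesis ["inshallah", "ghedwa"], standard WER would charge a full substitution (0.5), while the soft measure charges only a small fraction.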

(last modification: start + 6 months)

We will continue to feed the BOUTEF corpus.

The next step will consist in building an automatic fake news classifier from the BOUTEF data.
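As a baseline for such a classifier, a simple bag-of-words model can already be trained on labelled posts. This is a minimal multinomial Naive Bayes sketch under our own assumptions, not the classifier the project will build:

```python
import math
from collections import Counter

class NaiveBayesFakeNewsClassifier:
    """Tiny multinomial Naive Bayes with Laplace smoothing,
    operating on whitespace-tokenised post text."""

    def fit(self, texts, labels):
        self.classes = sorted(set(labels))
        self.word_counts = {c: Counter() for c in self.classes}
        self.class_counts = Counter(labels)
        self.vocab = set()
        for text, label in zip(texts, labels):
            for w in text.lower().split():
                self.word_counts[label][w] += 1
                self.vocab.add(w)
        return self

    def predict(self, text):
        total = sum(self.class_counts.values())
        v = len(self.vocab)
        scores = {}
        for c in self.classes:
            score = math.log(self.class_counts[c] / total)
            denom = sum(self.word_counts[c].values()) + v
            for w in text.lower().split():
                # Laplace smoothing handles unseen words gracefully.
                score += math.log((self.word_counts[c][w] + 1) / denom)
            scores[c] = score
        return max(scores, key=scores.get)
```

For BOUTEF itself the tokenisation would of course need to handle Arabic script, Arabizi and code-switched text, which is precisely where the corpus's script annotation becomes useful.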

(last modification: start + 6 months)

K. Smaïli, A. Hamza, D. Langlois, D. Amazouz, "BOUTEF: Bolstering Our Understanding Through an Elaborated Fake News Corpus", 8th International Conference on Arabic Natural Language Processing, Rabat, Morocco.

Fourth-generation warfare (4GW) is known as information warfare involving non-military populations. It is carried out by national or transnational groups that follow ideologies based on cultural, religious, economic or political beliefs, with the aim of creating chaos in a targeted part of the world. In 1989, the authors of an article on fourth-generation warfare, some of whom were military, explained that it would become widespread and difficult to define in the decades to come. With the advent of social media, this blurred battlefield has found a place for 4GW: one of its points of penetration is the massive use of social networks to manipulate opinions. The objective is to prepare the opinion of a part of the world to accept a state of affairs and to make it humanly acceptable and politically correct.

Like fourth-generation warfare, cognitive warfare aims to blur the mechanisms of political, economic or religious understanding, with the consequence of destabilising and weakening the adversary. This cognitive war targets the brain of what is supposed to be the enemy; the new, ill-defined battlefield of 4GW moves into the opponent's brain or, more specifically, into the subconscious of the opponent's population. This war aims to alter reality, among other things by flooding the opponent's population with misinformation, rumours, fabricated videos or deepfakes. In addition, the proliferation of social bots now makes it possible to generate disinformation automatically on social networks: according to some sources, 19% of the total volume of tweets generated during the 2016 US elections was due to these automatic robots.

With TRADEF, we are interested in a few disinformation channels: fake news and deepfakes. The idea is to detect very quickly, in social networks, the birth of a fake in its textual, audio or video form and its propagation through the networks. It is a question of detecting the birth of a fake and following it over time. Whenever this potential rumour is analysed and assessed, it is tracked through social networks in the reference language as well as in other languages. The score of suspicious information will evolve over time according to the data with which it is confronted. The information to be tested is matched with audio or video data that may invalidate or confirm its credibility. Videos used as sources to denounce a fake can themselves be deepfakes, which leads us to be vigilant when examining them by developing robust deepfake detection methods. Finally, a dimension of explainability of the results is introduced in this project.

Considering the experience of the participating teams in deep learning and in the processing of Standard Arabic and its dialects, we propose to track down and identify fakes in Arabic social networks. This raises further scientific challenges, such as handling the code-switching phenomenon, the variability of Arabic dialects, the identification of named entities in the speech continuum, the development of neural methods for poorly resourced languages, and the explainability of the achieved results.
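The longitudinal scoring described above (a credibility score that rises with confirming evidence and falls with denials) can be sketched as a sequence of Bayesian log-odds updates. This is our own toy illustration of the idea, not the project's scoring model:

```python
def update_credibility(score, likelihood_ratio):
    """One Bayesian odds update of a credibility score in (0, 1).
    likelihood_ratio > 1: the new evidence supports the claim;
    likelihood_ratio < 1: the new evidence contradicts it."""
    odds = score / (1.0 - score) * likelihood_ratio
    return odds / (1.0 + odds)

def track(initial_score, evidence_stream):
    """Longitudinal tracking: replay evidence over time and
    record the full credibility trajectory for explainability."""
    history = [initial_score]
    for lr in evidence_stream:
        history.append(update_credibility(history[-1], lr))
    return history
```

Keeping the whole trajectory, rather than only the latest score, is what allows the status of a claim to be explained at any given moment, as the project requires.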

Project coordination

Kamel SMAILI (Laboratoire lorrain de recherche en informatique et ses applications)

The author of this summary is the project coordinator, who is responsible for its content. The ANR accepts no responsibility for the content of this summary.

Partner

LIA Laboratoire d'Informatique d'Avignon
LORIA Laboratoire lorrain de recherche en informatique et ses applications

ANR grant: 295,715 euros
Beginning and duration of the scientific project: December 2022 - 36 Months

Useful links

Explore our database of funded projects


ANR makes its datasets on funded projects publicly available.
