How Artificial Media Could Impact US Elections

Deep fakes, highly convincing artificial media produced by deep neural networks, have entered the political arena. In 2018, BuzzFeed published a short video on YouTube that appeared to feature President Barack Obama sharing some surprisingly candid viewpoints. In reality, the visual portion of the video was artificially generated, and the voice was an impression by comedian Jordan Peele. Many comments across social media lamented how terrifying this new technology is.

In my presentation, “The Death of Trust: Exploring the Deep Fake Threat,” at BSides Vancouver Island, I discussed the many threats posed by deep fakes. Among them is the expected role that deep fake media will play in the 2020 US Presidential election. While the technology continues to advance, it still has some important limitations that may help mitigate its impact. Moreover, a considerable amount of research is being done on detection techniques, and so far that research boasts some impressive results.

How Deep Fakes are Created

Deep fake videos are created using the learning capabilities of a pair of deep neural networks known as a Generative Adversarial Network (GAN). In a GAN, two neural networks are pitted against each other: a Generator network creates video frames that appear real, while a Discriminator network attempts to judge whether each frame is authentic. In effect, the Generator repeatedly tries to trick the Discriminator into believing the images it creates are real.

Depiction of a GAN
A simplified view of a Generative Adversarial Network (GAN)
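To make the adversarial loop concrete, here is a heavily simplified training sketch in PyTorch. The tiny fully connected Generator and Discriminator below are stand-ins of my own (real deep fake tools use much larger convolutional networks), but the back-and-forth between the two is the same.

```python
import torch
import torch.nn as nn

# Stand-in architectures; production deep fake generators are far larger
# convolutional encoder-decoders, but the adversarial loop is the same.
latent_dim, img_dim = 100, 64 * 64 * 3

generator = nn.Sequential(
    nn.Linear(latent_dim, 512), nn.ReLU(),
    nn.Linear(512, img_dim), nn.Tanh(),       # outputs a fake "frame"
)
discriminator = nn.Sequential(
    nn.Linear(img_dim, 512), nn.LeakyReLU(0.2),
    nn.Linear(512, 1), nn.Sigmoid(),          # probability the frame is real
)

opt_g = torch.optim.Adam(generator.parameters(), lr=2e-4)
opt_d = torch.optim.Adam(discriminator.parameters(), lr=2e-4)
loss_fn = nn.BCELoss()

def train_step(real_frames):
    batch = real_frames.size(0)
    real_labels = torch.ones(batch, 1)
    fake_labels = torch.zeros(batch, 1)

    # 1) Discriminator: learn to separate real frames from generated ones.
    noise = torch.randn(batch, latent_dim)
    fake_frames = generator(noise).detach()
    d_loss = loss_fn(discriminator(real_frames), real_labels) + \
             loss_fn(discriminator(fake_frames), fake_labels)
    opt_d.zero_grad(); d_loss.backward(); opt_d.step()

    # 2) Generator: try to make the Discriminator label fakes as real.
    noise = torch.randn(batch, latent_dim)
    g_loss = loss_fn(discriminator(generator(noise)), real_labels)
    opt_g.zero_grad(); g_loss.backward(); opt_g.step()
    return d_loss.item(), g_loss.item()
```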

Both networks are first trained on a large set of still images of faces, often hundreds of thousands. Once trained, the GAN can be given a relatively small number of still images of the intended subject (for instance, President Obama). The Generator is also typically provided a target video into which the subject’s face will be inserted. Frame by frame, the Generator creates new artificial frames, and the Discriminator decides whether each one belongs with the set of subject images. Each time the Discriminator rejects a frame, the Generator learns and refines its output. The end result is very convincing video content.
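The inference side can be sketched in a similar spirit. The snippet below assumes a generator has already been trained on the subject; `detect_face` and `trained_generator` are hypothetical placeholders of my own, not real library calls, but the frame-by-frame flow mirrors the description above.

```python
import cv2

def swap_faces(target_video, output_path, trained_generator, detect_face):
    """Frame-by-frame face replacement sketch.

    trained_generator and detect_face are hypothetical stand-ins for a GAN
    trained on the subject's face and a face detector, respectively.
    """
    cap = cv2.VideoCapture(target_video)
    fps = cap.get(cv2.CAP_PROP_FPS)
    width = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))
    height = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))
    out = cv2.VideoWriter(output_path, cv2.VideoWriter_fourcc(*"mp4v"),
                          fps, (width, height))

    while True:
        ok, frame = cap.read()
        if not ok:
            break
        # Locate the face to replace in this frame.
        x, y, w, h = detect_face(frame)
        # The generator works at a fixed resolution, so the crop is resized,
        # swapped, then scaled back into place -- the source of the warping
        # artifacts discussed later in this post.
        crop = cv2.resize(frame[y:y + h, x:x + w], (256, 256))
        fake = trained_generator(crop)                      # synthesized face
        frame[y:y + h, x:x + w] = cv2.resize(fake, (w, h))  # paste back
        out.write(frame)

    cap.release()
    out.release()
```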

The Political Threat

Deep fakes are concerning for the political process because they further enable the spread of disinformation. As the capabilities of deep-fake-producing GANs improve, politically motivated actors can create false videos of their adversaries. These can be used to convince voters that a particular candidate said or did things that are detrimental to their reputation. A striking characteristic of this type of disinformation is that once it is in the minds of the public, it is very hard to combat. Even with well-documented evidence that a video is fake, many will still believe it is true.

However, another issue that is less talked about is the opposite case: what happens when compromising video of a politician surfaces and they claim it is a fake? There is already an abundance of “Fake News” claims echoing through political discourse, and proving the authenticity of a video claimed to be fake can be quite challenging. In this way, deep fake technology puts a heavy strain on our ability to trust anything we see or hear.

Limitations of Deep Fake Technology

The good news is that deep fake technology is still far from perfect. Its limitations are constantly changing, but researchers continue to work on methods for exploiting them. One limitation is that, due to processing constraints, GANs are trained on facial images of a fixed size. The generated face must therefore be resized and warped to fit each target frame, leaving subtle artifacts behind. Researchers from the University at Albany, SUNY have been able to train neural networks to find these warping artifacts, which are indicative of deep fake videos.
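As a rough illustration of the idea (my own sketch, not the Albany team's code), those warping artifacts can be approximated by down-scaling a real face crop and scaling it back up, which yields cheap training examples for an artifact detector:

```python
import cv2

def simulate_warping_artifacts(face_crop, low_res=64):
    """Create a 'warped' training example from a real face crop.

    Because GAN output is fixed-size, a deep fake face is effectively
    rendered at low resolution and then scaled/warped back into the frame.
    Down-scaling a real face and scaling it back up approximates those
    artifacts, giving negative examples without needing real deep fakes.
    """
    h, w = face_crop.shape[:2]
    small = cv2.resize(face_crop, (low_res, low_res),
                       interpolation=cv2.INTER_LINEAR)
    warped = cv2.resize(small, (w, h), interpolation=cv2.INTER_LINEAR)
    # A slight Gaussian blur further mimics the smoothing left behind by
    # the affine warp used when the face is pasted back into the frame.
    return cv2.GaussianBlur(warped, (5, 5), 0)

# Pairs of (real, warped) crops can then train an ordinary CNN classifier
# to flag frames whose faces show this resampling signature.
```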

Another limitation of deep fake video creation is that current GANs do not account for context and linguistics. Because training relies on static images, facial habits tied to the specific content and emotions being delivered are not easily replicated. Researchers from Dartmouth released a study earlier this year that analyzes video for consistency with these “soft biometrics.” As of its release, the study achieved a 95% accuracy rate, and the researchers estimate that by the start of the 2020 primary season that accuracy could be as high as 99%.
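A hedged sketch of how such a soft-biometric check might be structured is below. The per-clip feature extraction (e.g., facial action units and head pose from a toolkit such as OpenFace) is assumed rather than shown, and this is not the researchers' actual code; it only illustrates the idea of learning one person's mannerisms and flagging clips that fall outside them.

```python
import numpy as np
from sklearn.svm import OneClassSVM

# Each authentic clip of the subject is assumed to be reduced to a feature
# vector of correlations between facial action units and head movements.
# A one-class model learns what is "normal" for that person; clips that
# fall outside it are flagged as suspect.

def fit_mannerism_model(authentic_features: np.ndarray) -> OneClassSVM:
    """authentic_features: (n_clips, n_features) from known-real videos."""
    model = OneClassSVM(kernel="rbf", gamma="scale", nu=0.05)
    model.fit(authentic_features)
    return model

def is_consistent(model: OneClassSVM, clip_features: np.ndarray) -> bool:
    # +1 means the clip matches the subject's learned mannerisms.
    return model.predict(clip_features.reshape(1, -1))[0] == 1
```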

Finally, more development is needed before fully synthesized (both audio and video) deep fake videos can be reliably produced. Tools like Adobe VoCo and Baidu’s “Deep Voice” can produce very realistic synthesized voices, but combining deep-faked audio and video has yet to be demonstrated with consistently reliable results. That said, it seems reasonable to expect that it is only a matter of time before fully synthesized video can be created from nothing more than a typewritten script.

Proving Authenticity

Researchers have also been working on ways to ensure that truly authentic videos can be validated. NYU researchers recently demonstrated how current high-end digital cameras could be modified to create digital watermarks at the moment of capture. Their study went further, however: they also used neural networks to overcome the loss of forensic data caused by regeneration (re-encoding an image or video). Overall, they were able to build the framework of what could be an all-new approach to digital forensics.
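For contrast, a naive version of capture-time authentication is easy to write, and it shows exactly why the regeneration problem matters. The snippet below is purely illustrative and is not the NYU approach: it signs the raw frame bytes with a (hypothetical) camera-held key, so any re-encoding breaks verification even when the content is genuine.

```python
import hashlib
import hmac

CAMERA_KEY = b"secret-key-provisioned-in-hardware"  # hypothetical key

def sign_frame(frame_bytes: bytes) -> str:
    # Camera signs a hash of the raw frame bytes at capture time.
    return hmac.new(CAMERA_KEY, frame_bytes, hashlib.sha256).hexdigest()

def verify_frame(frame_bytes: bytes, signature: str) -> bool:
    expected = hmac.new(CAMERA_KEY, frame_bytes, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, signature)

# verify_frame() fails the moment the frame is re-compressed, cropped, or
# resized, even though the content is still authentic -- hence the interest
# in watermarks that can survive such transformations.
```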

Looking ahead

It seems clear that in 2020, deep fakes will be part of the (dis)information bombarding the American public. If there is any good news, it’s that we have not yet reached the level of capability described in the many doomsday scenarios regarding deep fakes. Truly limiting the impact of deep fake media will require a coordinated approach of public awareness, careful and responsible journalism, and, of course, technological countermeasures. Security professionals can help shape all three through our evangelism, influence, and research.