(Question) Deepfake Detection

vokermiracle · July 24, 2023

I am working on a deepfake detection project using deep learning (potentially detecting from both audio and video). Does anyone have ideas or resources that could help me approach this project? Any kind of idea or help would be greatly appreciated. Thank you!

Eigenvektor · July 24, 2023

I would assume training your network works the same as any other machine learning project. You "show" it examples of deepfakes and non-deepfakes until it learns to distinguish them. So you'll primarily need a ton of (good) deepfakes and legit media.

starsmine · July 24, 2023

Is that not just half of a GAN?

vokermiracle · July 26, 2023

On 7/24/2023 at 10:07 AM, Eigenvektor said:

I would assume training your network works the same as any other machine learning project. You "show" it examples of deepfakes and non-deepfakes until it learns to distinguish them. So you'll primarily need a ton of (good) deepfakes and legit media.

Yep, but modern deepfakes are nearly indistinguishable from real media. So we need to try a different approach or develop a very, very good network that can tell them apart. And I have no idea where to start in creating such a neural network.

vokermiracle · July 26, 2023

On 7/24/2023 at 10:13 AM, starsmine said:

Is that not just half of a GAN?

Can you please elaborate?

Kisai · July 26, 2023

23 minutes ago, vokermiracle said:

Yep, but modern deepfakes are nearly indistinguishable from real media. So we need to try a different approach or develop a very, very good network that can tell them apart. And I have no idea where to start in creating such a neural network.

Nah. Humans can usually pick out deepfakes pretty easily if they are a fan of the subject being deepfaked. A general-purpose deepfake detector will never work, because a detector can only be trained against a specific generating model, and only for as long as that model remains static.

Here are some telltale deepfake attributes:

- Low resolution / upscaled. Art deepfakes are generated at 512x512 pixels and then upscaled. Voice deepfakes are 16 kHz or 22 kHz instead of 48 kHz.

- Compression artifacts amplified. Video and artwork generated by deep learning often amplify watermarks, compression artifacts and intentional damage. Voice and music often don't "separate" cleanly, and the same goes for voice and noise. There are tools to separate speakers and background audio, but the result often sounds like an over-compressed MP3. Audio that comes from Opus and AAC is supposed to be wideband, but the data used for training often is not.
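The sample-rate clue in the first bullet can be checked mechanically: audio generated at 16 kHz and later resampled to 48 kHz has almost no energy above 8 kHz, the Nyquist limit of the original rate. Here is a minimal sketch in plain Python, assuming synthetic sine tones in place of real recordings. The names `high_band_energy_ratio` and `tone`, and the 8 kHz cutoff, are illustrative choices and not taken from any tool mentioned in this thread. An empty high band only suggests band-limited source material; on its own it does not prove a clip is a deepfake.

```python
import cmath
import math


def high_band_energy_ratio(samples, sample_rate, cutoff_hz):
    """Fraction of spectral energy at or above cutoff_hz (naive DFT, O(n^2))."""
    n = len(samples)
    total = 0.0
    high = 0.0
    # For a real signal, bins 1..n/2 are enough (bin 0, the DC offset, is skipped).
    for k in range(1, n // 2 + 1):
        coeff = sum(s * cmath.exp(-2j * math.pi * k * i / n)
                    for i, s in enumerate(samples))
        energy = abs(coeff) ** 2
        total += energy
        if k * sample_rate / n >= cutoff_hz:
            high += energy
    return high / total if total > 0 else 0.0


def tone(freq_hz, sample_rate, n):
    """n samples of a unit-amplitude sine wave."""
    return [math.sin(2 * math.pi * freq_hz * i / sample_rate) for i in range(n)]


SAMPLE_RATE = 48_000
N = 480  # 10 ms frame, so DFT bins are 100 Hz apart

# Stand-in for genuine wideband audio: content at 1 kHz and at 12 kHz.
wideband = [a + 0.5 * b for a, b in zip(tone(1000, SAMPLE_RATE, N),
                                        tone(12000, SAMPLE_RATE, N))]
# Stand-in for 16 kHz audio resampled to 48 kHz: nothing above 8 kHz.
narrowband = tone(1000, SAMPLE_RATE, N)

wide_ratio = high_band_energy_ratio(wideband, SAMPLE_RATE, 8000)
narrow_ratio = high_band_energy_ratio(narrowband, SAMPLE_RATE, 8000)
print(f"wideband: {wide_ratio:.3f}, narrowband: {narrow_ratio:.3f}")
```

On real recordings you would average this ratio over many windowed frames and use an FFT library rather than the naive DFT, which is used here only to keep the sketch dependency-free.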

For example, the reason voice conversion deepfakes are easily identified is that the base dataset for pretty much all ASR, TTS, and VC comes from LibriTTS and VCTK. So you analyze these datasets, and any time these voices are used, they are easily identified. VoxCeleb is a project for identifying speakers, and as a consequence it can also be used to "deepfake" a speaker, but usually not well, since its clips are just YouTube recordings and often contain background music or noise.

Any time LibriTTS or VCTK is used as the "base" data, or fine-tuned with a new voice, that does not erase the original prosody of the LibriTTS data. A lot of LibriTTS data is not fullband, and some of it has high levels of noise. So you can detect zero-shot TTS and voice conversions that used LibriTTS by looking at how certain words are pronounced in the zero-shot output. Zero-shot only "re-skins" an existing library for a few pieces. No zero-shot output ever sounds like the subject.

Audio deepfakes will never fool family, friends and fans, because the technology is just not reliably capable of doing so. You need an impersonator to "speak like" the subject, and then use the AI to adjust the prosody to sound like the subject. If you want to fool everyone, the model has to be trained ONLY on the subject, and trying to get the necessary audio without their consent is usually impossible. The people most likely to be able to deepfake someone are those who work for production companies that have these people on as guests, because that's the only time celebrities talk like "people" and not like "characters".

Visual data is a lot harder to deal with because it's already inherently lossy, and people change clothes, and their appearance also ages faster than their voice does. So an AI trained to deepfake someone needs a lot of different visual data labeled by age, haircut, clothing, etc.

It has to be said that deepfake detection as a general-purpose tool can't be made, because it requires access to the dataset the deepfake was trained on. So the more popular a deepfake target is, the harder it is to stay on top of it.

Audio deepfakes will always be easier to detect, because it's hard to change your voice and a lot more goes into generating a human voice than simply speaking words. Detecting deepfake video will be easy "for now", because you usually need someone whose performance you "paste over".

(this was produced by the south park people)

Eigenvektor · July 26, 2023

33 minutes ago, vokermiracle said:

Can you please elaborate?

Generative adversarial network (GAN)

The basic idea would be that you train two neural networks: one that generates deepfakes and one that detects them. As the first network gets better and better at generating deepfakes, the second network should get better and better at detecting them. The two are "adversaries" because they compete against each other, with the goal that both get better at what they do.
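To make the alternating updates concrete, here is a deliberately tiny sketch in plain Python. It assumes one-dimensional "media" (single numbers drawn around 4.0 stand in for real samples), a linear generator and a linear detector, with gradients written out by hand. All class names and parameters are hypothetical, and a real deepfake GAN would use deep networks on images or audio. A toy like this can oscillate rather than settle, so treat it as an illustration of the training loop, not as a working detector.

```python
import math
import random


def sigmoid(t):
    # Numerically stable logistic function.
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


class Discriminator:
    """Linear detector: score near 1 means "looks real", near 0 means "looks fake"."""

    def __init__(self):
        self.w, self.c = 0.0, 0.0

    def score(self, x):
        return sigmoid(self.w * x + self.c)

    def loss(self, real, fake):
        # Binary cross-entropy: real samples should score 1, fakes 0.
        eps = 1e-12
        real_term = sum(-math.log(max(self.score(x), eps)) for x in real) / len(real)
        fake_term = sum(-math.log(max(1.0 - self.score(x), eps)) for x in fake) / len(fake)
        return real_term + fake_term

    def step(self, real, fake, lr):
        # One gradient-descent step on the loss above.
        grad_w = (sum((self.score(x) - 1.0) * x for x in real) / len(real)
                  + sum(self.score(x) * x for x in fake) / len(fake))
        grad_c = (sum(self.score(x) - 1.0 for x in real) / len(real)
                  + sum(self.score(x) for x in fake) / len(fake))
        self.w -= lr * grad_w
        self.c -= lr * grad_c


class Generator:
    """Turns noise z into a fake sample a*z + b."""

    def __init__(self):
        self.a, self.b = 1.0, 0.0

    def sample(self, z):
        return self.a * z + self.b

    def step(self, noise, disc, lr):
        # Minimise -log(score(fake)): move fakes toward what the detector calls real.
        grads = [(disc.score(self.sample(z)) - 1.0) * disc.w for z in noise]
        grad_a = sum(g * z for g, z in zip(grads, noise)) / len(noise)
        grad_b = sum(grads) / len(noise)
        self.a -= lr * grad_a
        self.b -= lr * grad_b


def train(steps=200, batch=32, lr=0.05, seed=0):
    rng = random.Random(seed)
    gen, disc = Generator(), Discriminator()
    for _ in range(steps):
        real = [rng.gauss(4.0, 0.5) for _ in range(batch)]   # "legit media"
        noise = [rng.gauss(0.0, 1.0) for _ in range(batch)]
        fake = [gen.sample(z) for z in noise]                # "deepfakes"
        disc.step(real, fake, lr)   # detector learns to tell them apart
        gen.step(noise, disc, lr)   # generator learns to fool the detector
    return gen, disc
```

Calling `train()` returns the two trained objects. The part relevant to detection is the `Discriminator`, which is the "half of a GAN" mentioned earlier in the thread.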

48 minutes ago, vokermiracle said:

Yep, but modern deepfakes are nearly indistinguishable from real media. So we need to try a different approach or develop a very, very good network that can tell them apart. And I have no idea where to start in creating such a neural network.

The approach would be as I said. Train a network on samples of movies where you know they are not deepfakes and samples where you know they are. The more samples you can give your network to train on, the better it should get at distinguishing them. That's also the idea behind using a GAN. Instead of trying to find actual deepfakes to train on, you simply generate your own.
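As a rough illustration of that supervised setup, the sketch below trains a logistic-regression classifier in plain Python on labeled examples and measures accuracy on held-out data. Everything here is an assumption made for the example: the two numeric features per clip are synthetic stand-ins, the classes are made artificially easy to separate, and a real project would replace both the features and the model with a deep network fed frames or spectrograms.

```python
import math
import random


def sigmoid(t):
    # Numerically stable logistic function.
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


def make_synthetic_clips(n_per_class, rng):
    """Return (features, label) pairs; label 1 = deepfake, 0 = genuine.

    The two feature values are made-up stand-ins for real per-clip features.
    """
    data = []
    for _ in range(n_per_class):
        data.append(([rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)], 0))
        data.append(([rng.gauss(3.0, 1.0), rng.gauss(3.0, 1.0)], 1))
    rng.shuffle(data)
    return data


def probability_fake(weights, bias, x):
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)


def train_classifier(data, epochs=500, lr=0.2):
    """Full-batch gradient descent on the cross-entropy loss."""
    weights, bias = [0.0, 0.0], 0.0
    n = len(data)
    for _ in range(epochs):
        grad_w, grad_b = [0.0, 0.0], 0.0
        for x, y in data:
            err = probability_fake(weights, bias, x) - y
            for j, xi in enumerate(x):
                grad_w[j] += err * xi / n
            grad_b += err / n
        weights = [w - lr * g for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b
    return weights, bias


def predict(weights, bias, x):
    return 1 if probability_fake(weights, bias, x) >= 0.5 else 0


def accuracy(weights, bias, data):
    correct = sum(1 for x, y in data if predict(weights, bias, x) == y)
    return correct / len(data)


rng = random.Random(0)
train_set = make_synthetic_clips(200, rng)   # labeled examples to learn from
test_set = make_synthetic_clips(100, rng)    # held-out examples to evaluate on
weights, bias = train_classifier(train_set)
test_accuracy = accuracy(weights, bias, test_set)
print(f"held-out accuracy: {test_accuracy:.2f}")
```

Keeping a held-out test set matters here: a detector that only does well on the samples it was trained on tells you nothing about deepfakes from a generator it has not seen.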

Just FYI, there are big companies working on deepfake detection already, so this is not a new idea:

https://www.intel.com/content/www/us/en/newsroom/news/intel-introduces-real-time-deepfake-detector.html#gs.31ao8f

Here's a random Python tutorial I found for detecting deepfakes:
