The rise of deepfakes and how to stop them
FT AI correspondent Melissa Heikkilä steps inside the world of deepfakes - swapping faces with a colleague, dissecting scams, and examining what it will take to stop their spread
Produced by Tom Hannen. Filmed by Petros Gioumpasis, Joe Sinclair, Gregory Bobillot and Chris Kimling
Transcript
New AI technologies have made deepfakes - digitally altered images and videos of someone's face - easier than ever to create. Anyone with a smartphone can steal your image and reuse it.
It really takes no technical skill. It takes an app store. It takes a keyboard.
In the age of deepfakes, no one is safe. Bad actors can use freely available AI tools, such as Grok, to transform images of people against their will and without their consent. This technology has also been embraced by criminals, who have been able to use deepfakes to scam companies and individuals out of millions of pounds.
If you can see the person, and the person sounds exactly like someone that you know, it's going to be spectacularly effective.
Thanks to these AI tools, it's becoming almost impossible to verify what's real and what's not. We're looking at the rise of deepfakes, where they came from, why they matter, and what we can do about them. And the best way to do that is to figure out how to make one myself.
We're going to make a deepfake of us.
Mm-hmm.
It might be creepy.
I think it probably will be.
But I'll show you how easy it is. All I have to do is drag this video of you, which we found in our archives, into the platform, and then my headshot. And this is completely free software - it can run locally on your laptop. OK, so it's processing.
Oh my God.
I know, I know. OK. Now that's done. Ready?
Yeah.
This is so eerie already.
So this year as we approach busy season, KPMG's junior auditors won't be able to use their recharger system any more. They won't be able to claim back for any hours they work over their 50 contracted hours, which means that they'll lose about £1,000 to £2,000 that they were able to claim back before, roughly.
I feel like this is what it would look like if we had a baby.
It looks pretty good. I sound Great British.
I feel like if I saw this on social media, ooh, I don't know. I don't know whether I would be able to tell that it was AI. I think that I would, but maybe that's because I know that it's AI.
I looked like you or...
I looked like you.
Yeah, exactly. But it still had your voice. But if we wanted to change that, so put my voice on top of yours, that would be really easy as well. Using ElevenLabs technology, I just drag your audio, Generate Speech.
So this year, as we approach busy season, KPMG's junior auditors won't be able to use their recharger system anymore. They won't be able to claim back for any hours they work over their 50 contracted hours, which means that they'll lose about £1,000 to £2,000 that they were able to claim back before, roughly.
This still has a British accent.
I know.
So it clearly can't convert the accent. But yeah, my voice is a lot less nasal in that clip than it usually is.
Which means that they'll lose about £1,000 to £2,000.
Which means that they'll lose about £1,000 to £2,000.
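For readers curious what that drag-and-drop voice swap amounts to under the hood, here is a minimal sketch of a speech-to-speech call against ElevenLabs' API. The API key, voice ID, and file names are placeholders, and the exact endpoint and parameters should be checked against ElevenLabs' current documentation - treat this as an illustration of the workflow, not a recipe.

```python
# Illustrative sketch of the voice-conversion step demonstrated above:
# send a clip of one person's speech to a speech-to-speech endpoint and
# get the same words back in a cloned voice. Key, voice ID, and file
# names are placeholders; check ElevenLabs' docs for current parameters.
import requests

API_KEY = "YOUR_ELEVENLABS_API_KEY"   # placeholder credential
VOICE_ID = "cloned-voice-id"          # placeholder ID of a cloned voice

with open("original_clip.wav", "rb") as f:
    response = requests.post(
        f"https://api.elevenlabs.io/v1/speech-to-speech/{VOICE_ID}",
        headers={"xi-api-key": API_KEY},
        files={"audio": f},           # the source speech to re-voice
    )
response.raise_for_status()

# The response body is audio: the same words and timing as the input,
# rendered in the cloned voice.
with open("converted_clip.mp3", "wb") as out:
    out.write(response.content)
```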
So our producer has combined the two deepfake clips into one video, and we sent it to this company called Pindrop to see if they could figure out what we did.
Right.
Do you want to hear what they said?
Yes.
OK.
If you look at the audio and video that you folks created, they're really, really good. In fact, when we asked some humans to look at it and say: what do you see that's weird here?
The thing that they pointed to was the hand movement. And that hand movement was exactly the hand movement that the other reporter actually used. They moved their hands uncomfortably, so the humans picked up on completely the wrong thing in those deepfakes, while the AI engines actually caught them.
They picked up on the hands, but not the face.
Yeah.
And these were also humans that were primed to ask the question, is this AI or is this real? Right.
So they were already looking out for any issues. And even then, they didn't necessarily spot that the faces looked uncanny.
In the age of deepfakes, no one is safe. The FT's Martin Wolf experienced this firsthand when his likeness was used by scammers to create a deepfake of him to sell investment advice on social media.
I received a WhatsApp message, I think, from a former colleague, somebody I knew fairly well, to tell me that he'd seen me on an advertisement on Instagram.
If you don't want to miss it, join us now.
It was clearly an attempt to represent me talking. Though the accent wasn't perfect, it was pretty good and the appearance was pretty plausible. And it was selling or recommending investments.
I predicted Amazon would be the biggest winner. Soon after, it surged 89 per cent. I'm Martin Wolf.
I was surprised and shocked at first. It seemed so weird to see this. But afterwards I realised the scale of this scam: basically a million people had viewed this in the EU alone.
So if you include the UK and the US, it could easily have been 2mn or 3mn. It was impossible not to imagine that quite a few people had clicked on it. And quite probably they'd lost a lot of money. And that's very, very upsetting.
But in order to protect ourselves from these scams we need to know what we're dealing with. So how can we detect deepfakes? We called Pindrop, a company trying to combat digital fakery. They specialise in detecting deepfakes in video and audio calls.
We do that by answering two questions - is this a human? And second, is this the right human? We have partnerships with Zoom, Cisco, Microsoft Teams, and soon Google, so that we're able to provide these capabilities in each of these video conferencing systems. We're looking for minor anomalies in both audio and video to determine if it's a deepfake.
So to give you an example: when we are doing this video call, there are 16,000 samples of my voice every single second. And what we're looking for is whether there is an issue with that voice in each second, and as it progresses over time. Those are both spatial and temporal characteristics. And we're doing that on both the audio and the video side.
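As an aside for technically minded readers, the "16,000 samples a second, scored over time" idea can be illustrated with a toy sketch like the one below. This is emphatically not Pindrop's detector - production systems use trained models over far richer features - but it shows the shape of windowed, temporal analysis on audio.

```python
# Toy illustration of windowed audio analysis as described above:
# 16,000 samples per second, scored second by second for anomalies.
# This is NOT Pindrop's system; real detectors use learned models
# over much richer spatial and temporal features.
import numpy as np

SAMPLE_RATE = 16_000  # samples per second, as mentioned in the interview

def spectral_flatness(window: np.ndarray) -> float:
    """Geometric mean / arithmetic mean of the power spectrum.
    Synthetic speech sometimes shows unusually flat or spiky spectra."""
    power = np.abs(np.fft.rfft(window)) ** 2 + 1e-12
    return np.exp(np.mean(np.log(power))) / np.mean(power)

def score_per_second(audio: np.ndarray) -> list[float]:
    """Split audio into one-second windows and score each one."""
    n_windows = len(audio) // SAMPLE_RATE
    return [
        spectral_flatness(audio[i * SAMPLE_RATE:(i + 1) * SAMPLE_RATE])
        for i in range(n_windows)
    ]

def flag_anomalies(scores: list[float], z_thresh: float = 3.0) -> list[int]:
    """Flag seconds whose score deviates sharply from the clip's own mean,
    i.e. a temporal inconsistency of the kind detectors look for."""
    s = np.array(scores)
    z = (s - s.mean()) / (s.std() + 1e-12)
    return [i for i, v in enumerate(z) if abs(v) > z_thresh]
```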
So the better the quality of the audio and video you have, the higher-quality the deepfake you can create. But with GenAI, what's also happened is that GenAI has rapidly advanced denoising capabilities in both audio and video. So even if you have fairly weak audio, these GenAI systems are able to clean up that audio or video and then make really good representations of you.
And so can you do it with just a regular laptop, or even a smartphone?
You could create an audio deepfake with a smartphone. For a video deepfake, you'll probably need a laptop with a beefy enough processing unit, because especially if you're trying to do this in real time, that's when things get a little messy. If you don't have the real-time constraint, you can absolutely create a video deepfake on a smartphone itself.
One of the things that gives me great hope is the fact that there is a cost asymmetry between generating a deepfake and detecting one. It's four orders of magnitude - roughly 10,000 times - cheaper to detect than to generate. In order to create a deepfake, it has to take care of everything about you - how you speak, how you move, your accent, your demographics, all of that.
In order to detect a deepfake, all we need to do is catch one mistake. These deepfake engines are going to get better, but the deepfake detection engines are going to keep pace, because it's so much cheaper to detect a deepfake.
The first face-swapping algorithms came out in the late '90s, but even by the time this paper came out in 2008 it was still difficult, time-consuming, and expensive to achieve - which is why you could argue that the first face-swapping scams were not digital at all. If you wanted to scam someone out of a lot of cash in 2015, your best bet was a Skype call and a fake face made out of rubber, which is exactly what Gilbert Chikli did.
So by putting a desk, a telephone, some documents, and a couple of flags behind him, he was able to pull off the most remarkable scam. The ruse worked surprisingly well - until it didn't, and Chikli was caught. The FT's Victor Mallet was there for the trial.
The rubber mask scammer in France was this guy called Gilbert Chikli, who was like a Franco-Israeli crook, essentially. And he and an accomplice pretended to be Jean-Yves Le Drian, the then defence minister of France. Most of the calls were video calls on Skype from an alleged secret location below the Defence Ministry, in the basement, as it were. The screen was slightly fuzzy, so you couldn't see his features that well.
So the rubber mask obviously was sufficiently convincing, in the circumstances of that rather poor-quality Skype call, for it all to happen. And they extracted something like $85mn from the people who fell for this. Basically, he was asking for money, pretending to be on a secret mission from the French state to get money to pay ransoms for hostages. Then, of course, they were told to keep absolute secrecy about this and to transfer the money to a secret account, which they did.
There were also lots of recordings of Chikli when he was in jail, boasting about it as being the scam of the century, which in some ways, it was. The known victims were the Aga Khan, who was the head of the Ismaili Muslims of the world at the time. I think he gave something like €20mn to different accounts in places like China and Poland and so on.
And then there was the owner of the Chateau Margaux vineyard in France, Corinne Mentzelopoulos, who gave something like €5mn more. And then there was also a Turkish businessman who sort of fessed up and said that he had been scammed as well. And the interesting thing is that we don't know whether there might have been more people who, of course, didn't come forward and admit that they had been scammed.
So in the course of that trial, it emerged that there were rubber masks of other people that could be used to deceive potential victims. I think they actually found a rubber mask of Prince Albert, but they also suspected that they were going to try and impersonate President Emmanuel Macron.
And in a way, what it shows is that low-tech solutions can still be very effective. Would I have fallen for it? I don't think so. But then, I would never have been a mark because I don't have enough money to hand over to free a hostage.
So how do you make a mask that is so good it could fool a stranger? To answer that question, we spoke to AnthroTek, a company making high-quality masks for the film industry.
I am Raoul Peltier. I am CEO of AnthroTek. I used to be a scientist for most of my life. And on the side of being a scientist I used to have a hobby where I was making monster masks in my garage. And eventually, it started to pick up. I started to be contacted by artists - musical artists at the start, a lot - and then eventually a little bit by the cinema industry, which pushed me to expand the business a little bit, from the spare room to a bigger garage.
And eventually I took a leap of faith and decided to do masks full time. And that's when I met my co-founder, Nazmus, who convinced me that a science background and a mask-making hobby could all be fused together into a company that does hyperrealistic simulation for the medical and robotics industries, which is where we are now.
We make synthetic versions of the human body. So we make synthetic replicas of skin that behave like skin. We make synthetic organs that behave similarly to the human body - with the right blood pressure, microfluidics, and so on. Or sometimes we simply make realistic faces that look like a real human.
How did the rubber mask face swap scam work?
So quite an elaborate scam. Also, 10 years ago, it probably would have been more difficult than it is now to get your hands on a hyperrealistic mask. One thing that is unknown is where he got the mask from, whether he did it himself - this is quite unlikely - or whether he managed to source it from a company that maybe doesn't have such an ethical attitude as we do.
I promise you, I was never contacted to try and impersonate the minister of defence. The story was crazy and I certainly remember my friends all forwarding me the article.
Yes, so this is a hyper-realistic face that was created by an award-winning special effects artist, David Malinowski, and painted by Anna, one of our artists. And that just exemplifies how realistic a face you can make with years of training. This is a silicone slab, and this particular face is a training face - they're used by special effects artists to practise painting.
So the combination of various technologies - using hyper-realistic masks and AI to fake voices - certainly has very strong potential to enhance scams. And that's something as a business we take very seriously. This is why we don't really make masks of someone if we're not able to physically have them in the building to check their identity.
With the AI ability to do deepfake videos and face swap, there is potentially less need for hyper-realistic masks, and people are going to switch to a more virtual-based way to scam people. But to be honest, we still have the same amount of work just because the technology isn't there yet. When you do a face swap using AI, it does not move naturally.
And for that reason, for any sort of stunt or face replacement, most movie productions, if they can afford it, will rely on hyper-realistic masks. The motion, and the way the light interacts with the mask, is just a lot more natural; it's very hard to replicate that lighting effect digitally. And I don't think this will change too soon.
To understand what the deepfake landscape of today looks like, I spoke to Henry Ajder, an expert in synthetic media.
The landscape, when it comes to detecting deepfakes, has always been adversarial. That is, it's kind of a cat-and-mouse game - models are released, they have certain flaws. Those flaws are identified. They're often kind of broadcasted as, this is what to look for.
And then the models change. And often those flaws are trained out. What this means is that over the last eight years there have been lots of different flaws that were the telltale sign of how to spot a deepfake - whether that was not blinking, which was one of the very early signs, all the way through to six fingers, or text in backgrounds being jumbled nonsense.
The problem is, it's evolving so quickly, and people are fine-tuning to train these flaws out, that by the time an article or a piece of information is shared saying, here are the top five things to look for, it's almost always redundant. And in some respects, I think it actually places the emphasis in the wrong place. It's a little bit like talking about how you can reduce your carbon footprint.
And by saying to the everyday person: here's what you can do to spot a deepfake, it places an unreasonable responsibility on the everyday person to become a kind of digital Sherlock Holmes. When actually we're talking about a much bigger problem with the way our digital infrastructure now functions, in a world where anything could be fake, because everything can now be generated using AI to a hyper-realistic level.
And when a deepfake generator creates a deepfake, what is actually happening? Could you walk us through the technology and what's happening under the hood?
There are so many different kinds of AI-generated content out there now. To say, how do you make a deepfake... we've got kind of diffusion-based models for creating images that learn how to basically take noise and remove noise to create images based on a prompt. We have specific lip synchronisation tools that learn how to replace very particular elements of the mouth.
We have the voice cloning, which works on a different level. And then we also have things like creating full-body avatars. So there are loads of different techniques. It's not like there's one GenAI algorithm to rule them all, so to speak, when it comes to creating deepfakes. There's a whole range of different cases. But for face swapping, autoencoders still rule the roost.
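To make the autoencoder point concrete, here is a minimal sketch of the classic face-swap training setup, written in PyTorch. The layer sizes and image resolution are illustrative assumptions, not taken from any particular tool; the essential idea is one shared encoder and one decoder per person.

```python
# Minimal sketch of the autoencoder face-swap idea described above:
# one shared encoder, one decoder per person. Train decoder A on
# person A's faces and decoder B on person B's; at swap time, encode
# a frame of A and decode it with B's decoder. All sizes are
# illustrative assumptions, not from any production tool.
import torch
import torch.nn as nn

LATENT = 256  # size of the shared face representation (assumption)

def make_encoder() -> nn.Module:
    return nn.Sequential(
        nn.Conv2d(3, 32, 4, stride=2, padding=1), nn.ReLU(),   # 64 -> 32
        nn.Conv2d(32, 64, 4, stride=2, padding=1), nn.ReLU(),  # 32 -> 16
        nn.Conv2d(64, 128, 4, stride=2, padding=1), nn.ReLU(), # 16 -> 8
        nn.Flatten(),
        nn.Linear(128 * 8 * 8, LATENT),
    )

def make_decoder() -> nn.Module:
    return nn.Sequential(
        nn.Linear(LATENT, 128 * 8 * 8),
        nn.Unflatten(1, (128, 8, 8)),
        nn.ConvTranspose2d(128, 64, 4, stride=2, padding=1), nn.ReLU(),
        nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),
        nn.ConvTranspose2d(32, 3, 4, stride=2, padding=1), nn.Sigmoid(),
    )

encoder = make_encoder()
decoder_a, decoder_b = make_decoder(), make_decoder()

# Training: each decoder learns to reconstruct its own person's faces
# from the shared latent space (loss shown for person A only).
faces_a = torch.rand(8, 3, 64, 64)  # stand-in for aligned face crops
recon_a = decoder_a(encoder(faces_a))
loss_a = nn.functional.mse_loss(recon_a, faces_a)

# The swap: encode person A, decode with person B's decoder, so B's
# learned facial identity is rendered with A's pose and expression.
with torch.no_grad():
    swapped = decoder_b(encoder(faces_a))
```

Because the encoder is shared, it learns pose and expression common to both people, while each decoder memorises one person's identity - which is why decoding A's latent with B's decoder produces the swap.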
For myself, I've got quite a lot of footage of me online. If someone wants to try and clone my face or my voice, this documentary is going to be a great sample for them to be able to create this content. And because it now only takes a couple of images to create something which is good enough, it's not feasible to say to people, completely banish yourself and your online presence.
So it's really hard, particularly for young women facing deepfake image abuse, which is still the biggest challenge that we face in terms of number of victims. A lot of young women want to know how they can stop their image being abused. And it's unfortunately something where I say: I wish I could tell you, this is what you have to do. But there isn't really an easy answer there.
Regulators around the world have woken up to the harms caused by deepfake image abuse. Lawmakers and companies are racing to find ways to hold perpetrators accountable and develop these technologies responsibly.
I'm Claire Leibowicz, and I've worked for the past decade at the Partnership on AI, leading work on AI in society and what AI means for image authenticity, the future of media, and also what it means to be human. This concept of fakery, fraud, and deception is not new, but the speed, scale, and commercialisation we've seen over the past few years have made it enormously threatening to public discourse and to belief in information, and have enabled the weaponisation of doubt and plausible deniability.
And what are tech companies doing? What's the gold standard for verifying content online now?
So when you say technology companies, there are the technology companies like TikTok, who are ingesting videos that people like you or me can upload, and then having to decide how those get shared. But there are also companies like OpenAI, who create Sora, and who also build the models and the underlying infrastructure that bake trust and verification into content.
There's a big effort called the Coalition for Content Provenance and Authenticity, which is an independent effort to have this consistent way of verifying content. But there's also something called digital watermarking and invisible watermarking. That's somewhat separate.
There are lots of different technical mechanisms, but the idea is that you can bake these signals into content so that they travel with it throughout the web, and a technology company can read that information and make decisions - as TikTok has, giving users control over their feeds.
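To show the "signals baked into content" idea at its very simplest, here is a toy least-significant-bit watermark in Python. Real invisible watermarks - and C2PA, which relies on cryptographically signed metadata rather than pixel tricks - are far more robust than this; a screenshot or recompression would destroy this toy version.

```python
# Toy illustration of an invisible watermark: hide a short provenance
# tag in the least significant bit of an image's pixels. This is only
# to show the idea of a signal travelling inside the content itself;
# production schemes are far more resilient to editing and re-encoding.
import numpy as np

def embed(image: np.ndarray, tag: str) -> np.ndarray:
    """Write the tag's bits into the lowest bit of the first pixels."""
    bits = [int(b) for byte in tag.encode() for b in f"{byte:08b}"]
    flat = image.reshape(-1).copy()
    flat[:len(bits)] = (flat[:len(bits)] & 0xFE) | np.array(bits, dtype=flat.dtype)
    return flat.reshape(image.shape)

def extract(image: np.ndarray, n_chars: int) -> str:
    """Read back n_chars worth of bits from the lowest pixel bits."""
    bits = image.reshape(-1)[:n_chars * 8] & 1
    data = bytes(int("".join(map(str, bits[i:i + 8])), 2)
                 for i in range(0, len(bits), 8))
    return data.decode(errors="replace")

img = np.random.randint(0, 256, (64, 64, 3), dtype=np.uint8)
marked = embed(img, "ai-generated")
assert extract(marked, len("ai-generated")) == "ai-generated"
```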
So there are a lot of policies about the underlying bits - the 1s and 0s - of what's in content. But there's also a ton of work being done on AI labelling - what goes on a video, whether it's a visible signal for an audience - that will hopefully help people make sense of this very fast-paced world where fact and fiction are collapsing into each other.
And how effective are these technical verification methods? There's no global standard, right? So it kind of relies on the goodwill of these companies.
That's absolutely right. And when you say effective, there are several ways you might think about effectiveness. One is, will that actually travel over the lifespan of a piece of content? So let's say the Financial Times is using a camera that actually has this signal. When you guys post that on Instagram and post that on LinkedIn and when I share it, will that travel and be technically resilient to manipulation or forgery?
That's one measure of effectiveness, but it's not the final one. The final one is what the end user understands, which is a much more complicated question. So how do they label it, how do they say: was this AI-generated? Whether those signals actually help users is really up for debate. So we're moving in the right direction, but the signals aren't resilient to being removed or stripped out.
And second, users interpret them in many different ways. So for instance, Meta faced a ton of backlash when they started labelling content as 'Made with AI.' Ostensibly, they were doing what we said, right? They were adding context to content. But filmmakers, photographers were very angry that content that just used Photoshop or tweaked out a blemish was being labelled as 'Made with AI.' And that was perceived as punitive.
And what's the state of regulation? Are there any laws governing this at the moment?
In America, there's been a push towards deregulation. In the past, at the federal level, there was a lot of interest in adding transparency, and there were voluntary norms under the Biden administration that have since been rolled back. California is really the regulatory trendsetter in the US for mandating this transparency. But there's also a suite of legislation on adjacent interventions for deep fakery, about what it means to control your likeness.
Right now in the United States, there's a proposal, the NO FAKES Act, which would let me control my face and my voice in an era when they're increasingly manipulated, because at the moment those rights are fractured across the states. But Europe is really where we're seeing the attention on this transparency question. The EU AI Act is working on implementing Article 50.
Not to get really technical with terminology, but the question it asks is: how do we mandate this type of transparency so that verification becomes almost a human right across the web, such that you can know what you're seeing? There's this aura of inevitability in technology development that I think we need to push back on. And that's not to be naive about the geopolitical realities, or the fact that AI is here.
But I want to make sure that people - real people, whether you're in AI or not - feel they can have a say, or post on Reddit, or push back, or talk to journalists about how this technology is being used. That's one. And, two, those of us who have any position of influence - whether you're a journalist, a video producer, or a human rights official, and especially people at the companies - need to not have this sense of resignation that this is here and therefore we can't do anything about it.
Because I have seen that from some of my peers. And I think there's a fine line between pragmatism and working within constraints, which we embrace, and shrugging your shoulders and saying: this isn't the world we want, but it's here.
What started as an academic experiment in cutting-edge technology has now grown into a multimillion-dollar industry, which is causing real harm to people. We've only just begun to grapple with the consequences of unleashing AI technologies into the wild. In the meantime, they are warping our sense of reality and identity. And as image-generation models become better, it will become harder to know what is real and what is not.