My deepfake shows how valuable our data is in the age of AI [MIT Tech Review]

This story originally appeared in The Algorithm, our weekly newsletter on AI. To get stories like this in your inbox first, sign up here.

Deepfakes are getting good. Like, really good. Earlier this month I went to a studio in East London to get myself digitally cloned by the AI video startup Synthesia. They made a hyperrealistic deepfake that looked and sounded just like me, with realistic intonation. It is a long way away from the glitchiness of earlier generations of AI avatars. The end result was mind-blowing. It could easily fool someone who doesn’t know me well.

Synthesia has managed to create AI avatars that are remarkably humanlike after only one year of tinkering with the latest generation of generative AI. It’s equally exciting and daunting to think about where this technology is going. It will soon be very difficult to differentiate between what is real and what is not, and this is a particularly acute threat given the record number of elections happening around the world this year.

We are not ready for what is coming. If people become too skeptical about the content they see, they might stop believing in anything at all, which could enable bad actors to take advantage of this trust vacuum and lie about the authenticity of real content. Researchers have called this the “liar’s dividend.” They warn that politicians, for example, could claim that genuinely incriminating information was fake or created using AI. 

I just published a story on my deepfake creation experience, and on the big questions about a world where we increasingly can’t tell what’s real. Read it here.

But there is another big question: What happens to our data once we submit it to AI companies? Synthesia says it does not sell the data it collects from actors and customers, although it does release some of it for academic research purposes. The company uses avatars for three years, at which point actors are asked if they want to renew their contracts. If so, they come into the studio to make a new avatar. If not, the company deletes their data.

But other companies are not that transparent about their intentions. As my colleague Eileen Guo reported last year, companies such as Meta license actors’ data—including their faces and expressions—in a way that allows the companies to do whatever they want with it. Actors are paid a small up-front fee, but their likeness can then be used to train AI models in perpetuity without their knowledge. 

Even if contracts for data are transparent, they don’t apply if you die, says Carl Öhman, an assistant professor at Uppsala University who has studied the online data left by deceased people and is the author of a new book, The Afterlife of Data. The data we input into social media platforms or AI models might end up benefiting companies and living on long after we’re gone. 

“Facebook is projected to host, within the next couple of decades, a couple of billion dead profiles,” Öhman says. “They’re not really commercially viable. Dead people don’t click on any ads, but they take up server space nevertheless,” he adds. This data could be used to train new AI models, or to make inferences about the descendants of those deceased users. The whole model of data and consent with AI presumes that both the data subject and the company will live on forever, Öhman says.

Our data is a hot commodity. AI language models are trained by indiscriminately scraping the web, and that also includes our personal data. A couple of years ago I tested whether GPT-3, the predecessor of the language model powering ChatGPT, had anything on me. It struggled, but I found I was able to retrieve personal information about MIT Technology Review’s editor in chief, Mat Honan. 

High-quality, human-written data is crucial to training the next generation of powerful AI models, and we are on the verge of running out of free online training data. That’s why AI companies are racing to strike deals with news organizations and publishers to access their data treasure chests. 

Old social media sites are also a potential gold mine: when companies go out of business or platforms stop being popular, their assets, including users’ data, get sold to the highest bidder, says Öhman. 

“MySpace data has been bought and sold multiple times since MySpace crashed. And something similar may well happen to Synthesia, or X, or TikTok,” he says. 

Some people may not care much about what happens to their data, says Öhman. But securing exclusive access to high-quality data helps cement the monopoly position of large corporations, and that harms us all. This is something we need to grapple with as a society, he adds. 

Synthesia said it will delete my avatar after my experiment, but the whole experience did make me think of all the cringeworthy photos and posts that haunt me on Facebook and other social media platforms. I think it’s time for a purge.


Now read the rest of The Algorithm

Deeper Learning

Chatbot answers are all made up. This new tool helps you figure out which ones to trust.

Large language models are famous for their ability to make things up—in fact, it’s what they’re best at. But their inability to tell fact from fiction has left many businesses wondering if using them is worth the risk. A new tool created by Cleanlab, an AI startup spun out of MIT, is designed to provide a clearer sense of how trustworthy these models really are. 

A BS-o-meter for chatbots: Called the Trustworthy Language Model, it gives any output generated by a large language model a score between 0 and 1, according to its reliability. This lets people choose which responses to trust and which to throw out. Cleanlab hopes that its tool will make large language models more attractive to businesses worried about how much stuff they invent. Read more from Will Douglas Heaven.
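
To make the scoring idea concrete, here is a minimal Python sketch of how a per-response trustworthiness score between 0 and 1 might be used to decide which answers to keep and which to escalate. The `get_scored_answer` function and the 0.8 threshold are hypothetical stand-ins for illustration, not Cleanlab’s actual API.

```python
# Illustrative sketch only -- not Cleanlab's actual API.
# Assumes a hypothetical wrapper that returns a chatbot answer together
# with a trustworthiness score between 0 and 1.

from typing import NamedTuple


class ScoredAnswer(NamedTuple):
    text: str
    trustworthiness: float  # 0.0 = likely fabricated, 1.0 = highly reliable


def get_scored_answer(prompt: str) -> ScoredAnswer:
    """Hypothetical stand-in for a trust-scoring LLM service."""
    # A real system would call the model and the scoring service here.
    return ScoredAnswer(text="Paris is the capital of France.", trustworthiness=0.97)


def answer_or_escalate(prompt: str, threshold: float = 0.8) -> str:
    """Return the model's answer only if its score clears the threshold;
    otherwise flag it for human review instead of passing it along."""
    answer = get_scored_answer(prompt)
    if answer.trustworthiness >= threshold:
        return answer.text
    return f"[Low confidence: {answer.trustworthiness:.2f}] Escalating to a human reviewer."


if __name__ == "__main__":
    print(answer_or_escalate("What is the capital of France?"))
```

In a setup like this, the threshold is a business decision: a customer-facing chatbot might only surface high-scoring answers, while lower-scoring ones are routed to a person rather than thrown out entirely.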

Bits and Bytes

Here’s the defense tech at the center of US aid to Israel, Ukraine, and Taiwan
President Joe Biden signed a $95 billion aid package into law last week. The bill will send a significant quantity of supplies to Ukraine and Israel, while also supporting Taiwan with submarine technology to aid its defenses against China. (MIT Technology Review)

Rishi Sunak promised to make AI safe. Big Tech’s not playing ball.
The UK’s prime minister thought he had secured a political win when he got AI power players to agree to voluntary safety testing with the UK’s new AI Safety Institute. Six months on, it turns out pinkie promises don’t go very far. OpenAI and Meta have not granted the AI Safety Institute access to do prerelease safety testing on their models. (Politico)

Inside the race to find AI’s killer app
The AI hype bubble is starting to deflate as companies try to find a way to make profits out of the eye-wateringly expensive process of developing and running this technology. Tech companies haven’t solved some of the fundamental problems slowing its wider adoption, such as the fact that generative models constantly make things up. (The Washington Post)  

Why the AI industry’s thirst for new data centers can’t be satisfied
The current boom in data-hungry AI means there is now a shortage of parts, property, and power to build data centers. (The Wall Street Journal)

The friends who became rivals in Big Tech’s AI race
This story is a fascinating look into one of the most famous and fractious relationships in AI. Demis Hassabis and Mustafa Suleyman are old friends who grew up in London and went on to cofound AI lab DeepMind. Suleyman was ousted following a bullying scandal, went on to start his own short-lived startup, and now heads rival Microsoft’s AI efforts, while Hassabis still runs DeepMind, which is now Google’s central AI research lab. (The New York Times)

This creamy vegan cheese was made with AI
Startups are using artificial intelligence to design plant-based foods. The companies train algorithms on data sets of ingredients with desirable traits like flavor, scent, or stretchability. Then they use AI to comb troves of data to develop new combinations of those ingredients that perform similarly. (MIT Technology Review)


