If David Attenborough ever narrated a film scene in which the antagonist brutally murders someone on screen, his voice would still calm your nerves as the scene played out, just as it has so many times when, say, a killer whale attacks a minke.
His voice is possibly the most famous and widely recognized on the planet today, perhaps on par with Morgan Freeman's.
That may be why Charlie Holtz, an engineer at the machine learning startup Replicate, had an AI clone of Attenborough’s voice narrate and vividly describe his every action as he moved around in front of a camera.
Imitating the great Attenborough
He combined OpenAI's GPT-4 Vision with ElevenLabs' voice cloning technology. The result? An unauthorized AI imitation featuring the unmistakable narration of the legendary naturalist. The AI-generated voice of the 97-year-old British broadcaster was less raspy than his real voice is today.
David Attenborough is now narrating my life
Here's a GPT-4-vision + @elevenlabsio python script so you can star in your own Planet Earth: pic.twitter.com/desTwTM7RS
— Charlie Holtz (@charliebholtz) November 15, 2023
In the video, Holtz can be seen writing code that instructs his webcam to take a photo of him every five seconds. As he hits send, Attenborough’s voice flows through the speakers: “Here we have a remarkable specimen of Homo sapiens distinguished by his silver circular spectacles and a mane of tousled curly locks.”
"He's wearing what appears to be a blue fabric covering, which can only be assumed to be part of his mating display," continues Attenborough.
He also shared the code for the real-time narration on GitHub.
Here's the code! https://t.co/cQwtYt3Y7o
— Charlie Holtz (@charliebholtz) November 15, 2023
In Holtz’s setup, a Python script named "narrator" ties the two technologies together. OpenAI's multimodal language model GPT-4V lets Holtz upload images of himself as input and converse with the model, while an ElevenLabs AI voice profile, trained on audio samples of Attenborough's distinctive speech, supplies the voice.
The "narrator" script is like a director orchestrating a show. It uses the services' APIs to link the two tools. The script takes pictures from a webcam, sends them to GPT-4V, and gets back text that reads like Attenborough describing the pictures. This text is then sent to ElevenLabs, which acts like a voice actor imitating Attenborough's voice. The end result is a narrated story in Attenborough's style and voice.
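The flow described above can be sketched in a few lines of Python. This is only an illustrative outline under stated assumptions, not Holtz's actual script: the function names `encode_frame` and `narrate`, the prompt text, and the three injected callables (`capture`, `describe`, `speak`) are hypothetical stand-ins for the webcam, the GPT-4V request, and the ElevenLabs request. Sending the image as a base64 data URL is likewise an assumption about how the picture is handed to the vision model.

```python
import base64
import time
from typing import Callable, List

# Hypothetical prompt; the wording is an assumption, not taken from the source.
SYSTEM_PROMPT = (
    "Narrate the picture of the human in the style of a nature documentary."
)


def encode_frame(jpeg_bytes: bytes) -> str:
    """Wrap raw JPEG bytes in a base64 data URL for a vision-model request."""
    return "data:image/jpeg;base64," + base64.b64encode(jpeg_bytes).decode("ascii")


def narrate(
    capture: Callable[[], bytes],                  # stand-in for the webcam
    describe: Callable[[str, str, List[str]], str],  # stand-in for the vision model
    speak: Callable[[str], None],                  # stand-in for the cloned voice
    frames: int = 3,
    interval: float = 5.0,                         # the article mentions five seconds
    sleep: Callable[[float], None] = time.sleep,
) -> List[str]:
    """Capture a frame, get a narration for it, speak it, then wait and repeat."""
    history: List[str] = []
    for i in range(frames):
        image_url = encode_frame(capture())
        # Earlier narrations are passed along so the model can avoid repeating itself.
        text = describe(SYSTEM_PROMPT, image_url, list(history))
        speak(text)
        history.append(text)
        if i < frames - 1:
            sleep(interval)
    return history


if __name__ == "__main__":
    # Dry run with fake components; no camera or network access is needed.
    spoken: List[str] = []
    narrate(
        capture=lambda: b"fake-jpeg-bytes",
        describe=lambda prompt, url, past: f"Observation {len(past) + 1}.",
        speak=spoken.append,
        frames=2,
        sleep=lambda seconds: None,
    )
    print(spoken)
```

Passing the three components in as functions keeps the loop itself independent of any particular camera library or API client, so it can be tried out with fakes before real services are plugged in.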
Misuse via voice cloning
But, as far as we know, Attenborough hasn’t signed a contract allowing his voice or likeness to be cloned or duplicated using AI technology. While voice cloning makes it possible to produce content at scale, it also raises security concerns and carries a high potential for misuse and fraud. Some musicians have taken a different approach: Grimes has consented to the use of her voice in AI-generated songs as long as the revenues are split 50/50 with her.
As a slew of generative AI tools arrived, many expected automation to reach low-paying jobs first. In reality, it appears to be rapidly replacing skilled professionals instead. The pace and evolving nature of automation challenge conventional expectations about which jobs are most susceptible to technological disruption.
Reacting to the video, one Reddit user wrote, “Can you imagine how tough it would be to get an acting gig when a computer-generated AI version of Robert De Nero (in his prime) is available for less money.”
Originally published on Interesting Engineering.