Linus AI Voice Clone

AnujSaharan

I've been training my own GAN-based TTS, and now diffusion-based TTS, models for quite a while (the eventual goal is to have a 'teacher' model teach a cloned voice how to sing and rap, fwiw). I've seen the guys try a couple of different models zero-shot to clone their voices - it's never quite hit, so I'm trying to fix that.

Here's a super early preliminary attempt on a ~500M-parameter TTS model fine-tuned with Linus' voice from the most recent WAN Show. Only fine-tuned it for ~15 minutes on a single 3080, so it's very undertrained obviously - can probably get much better with more time. 🙂

Just making a thread to track progress until it sings.

Novel text from a random The Verge review to test Linus' voice against:

[screenshot of the test passage]

Generated Audio:

The model is autoregressive, a la GPT-2 and Tortoise. Based on the speech and words it's seen before, it may change the emotional tone, add pauses, substitute different words, etc. while generating, depending on the training data and the preceding text - for example, it added "i.e." and uhhs and umms near the end of the clip on its own. I straight copy-pasted the highlighted text above.
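Those improvised ums and pauses fall out of temperature sampling in the autoregressive loop: each token is drawn from a distribution conditioned on everything generated so far, so a plausible filler can win the draw. A toy numpy sketch of the decoding loop (not the actual model - `step_fn` is a stand-in for the network's forward pass):

```python
import numpy as np

def sample_next_token(logits, temperature=0.8, rng=None):
    """Draw one token id from softmax(logits / temperature)."""
    if rng is None:
        rng = np.random.default_rng()
    scaled = logits / temperature
    scaled -= scaled.max()            # subtract max for numerical stability
    probs = np.exp(scaled)
    probs /= probs.sum()
    return int(rng.choice(len(probs), p=probs))

def generate(step_fn, prompt_tokens, max_new=20, temperature=0.8, seed=0):
    """Autoregressive decoding: every new token is conditioned on all
    previous ones, which is where the improvised fillers come from."""
    rng = np.random.default_rng(seed)
    tokens = list(prompt_tokens)
    for _ in range(max_new):
        logits = step_fn(tokens)      # model forward pass (stubbed here)
        tokens.append(sample_next_token(logits, temperature, rng))
    return tokens
```

Higher temperature makes those improvisations more frequent; as temperature approaches 0 it degenerates to greedy decoding and sticks much closer to the script.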

Rate on a scale of 1-10 in its current state?

---

If you're interested in TTS btw - I post some experiments on my Twitter - (Anuj Saharan (@theAnujSaharan) / Twitter).


7. Try training for 1hr+ and try again. There is a very odd stutter before "put the two".

I try to be a human, but I cannot, because I have returned to monke.

Spoiler

Hehe boi

Spoiler

POV- when it can run crysis-

( ͡° ͜ʖ ͡°)


Have you thought to check with Linus as to whether he's comfortable with his voice being cloned or no? If no, then 0. Should get folks' permission first.

Intel HEDT and Server platform enthusiasts: Intel HEDT Xeon/i7 Megathread

Main PC

CPU: i9 7980XE @4.5GHz/1.22v/-2 AVX offset

Cooler: EKWB Supremacy Block - custom loop w/360mm + 280mm rads

Motherboard: EVGA X299 Dark

RAM: 4x8GB HyperX Predator DDR4 @3200MHz CL16

GPU: Nvidia FE 2060 Super/Corsair HydroX 2070 FE block

Storage: 1TB MP34 + 1TB 970 Evo + 500GB Atom30 + 250GB 960 Evo

Optical Drives: LG WH14NS40

PSU: EVGA 1600W T2

Case & Fans: Corsair 750D Airflow - 3x Noctua iPPC NF-F12 + 4x Noctua iPPC NF-A14 PWM

OS: Windows 11

Display: LG 27UK650-W (4K 60Hz IPS panel)

Mouse: EVGA X17

Keyboard: Corsair K55 RGB

Mobile/Work Devices: 2020 M1 MacBook Air (work computer) - iPhone 13 Pro Max - Apple Watch S3

Other Misc Devices: iPod Video (Gen 5.5E, 128GB SD card swap, running Rockbox), Nintendo Switch


16 hours ago, Zando_ said:

Have you thought to check with Linus as to whether he's comfortable with his voice being cloned or no? If no, then 0. Should get folks' permission first.

Waiting for Linus to copyright his voice and likeness.

^^^^ That's my post ^^^^
<-- This is me --- That's your scrollbar -->
vvvv Who's there? vvvv


Of course, I am happy to stop experimenting and building if he doesn't approve or is uncomfortable - very obviously I haven't posted or distributed the model itself or any inference scripts, for privacy reasons. Will let @nicklmg or someone from the team make that call. If this gets good enough, I'm happy to even share the model for video editing voiceovers, dubbing into foreign languages, and whatever other use cases come up.

Although I will say: similar technology is already out there on the web. Anyone can take a small snippet and try zero-shot cloning on ElevenLabs or something like that - irrespective of whether the end result is any good - and that'd be fully anonymous sources cloning with fully untraceable models that live behind at least LLC-level protection. Taking offence on someone else's behalf at the output of a model is a conversation that goes far beyond this forum thread - it applies to GPT, DALL-E, Stable Diffusion, etc. (all of which are in the open domain and easily accessible to everyone) - and I am happy to follow wherever that public discourse goes.


29 minutes ago, AnujSaharan said:

Will let @nicklmg or someone from the team make that call.

Could also just ask @LinusTech

I'm not actually trying to be as grumpy as it seems.

I will find your mentions of Ikea or Gnome and I will /s post.

Project Hot Box

CPU 13900k, Motherboard Gigabyte Aorus Elite AX, RAM CORSAIR Vengeance 4x16GB 5200MHz, GPU Zotac RTX 4090 Trinity OC, Case Fractal Pop Air XL, Storage Sabrent Rocket Q4 2TB, CORSAIR Force Series MP510 1920GB NVMe, CORSAIR Force Series MP510 960GB NVMe, PSU CORSAIR HX1000i, Cooling Corsair XC8 CPU block, Bykski GPU block, 360mm and 280mm radiators, Displays Odyssey G9, LG 34UC98-W 34-inch, Keyboard Mountain Everest Max, Mouse Mountain Makalu 67, Sound AT2035, Massdrop 6XX headphones, GoXLR

Oppbevaring

CPU i9-9900k, Motherboard ASUS ROG Maximus Code XI, RAM 48GB Corsair Vengeance LPX 3200MHz (2x16GB + 2x8GB), GPUs Asus ROG Strix 2070 8GB, PNY 1080, Nvidia 1080, Case Mining Frame, Storage 2x Samsung 860 Evo 500GB, PSU Corsair RM1000x and RM850x, Cooling Asus ROG Ryuo 240 with Noctua NF-F12 fans

Why is the 5800x so hot?


It's a start, but the pace and tone make it sound more like somebody else doing a Linus impression.

I threw your WAV at Audacity, sped the clip up by 6.9%, then sped the tempo up a further 8%, and manually tightened up some of the weird pauses it put in the middle of sentences. Here's the result:
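For anyone who'd rather script that than click through Audacity: a plain speed change is just resampling (pitch rises along with speed), which a short numpy sketch can do; a pitch-preserving tempo change like Audacity's "Change Tempo" needs a real time-stretcher (e.g. `librosa.effects.time_stretch`). The 6.9% figure is just the value from this post:

```python
import numpy as np

def change_speed(samples, factor):
    """Speed up (factor > 1) or slow down (factor < 1) a mono clip by
    linear-interpolation resampling. Pitch shifts along with speed; a
    pitch-preserving tempo change needs a phase-vocoder time-stretch."""
    n_out = int(round(len(samples) / factor))
    old_idx = np.arange(len(samples))
    new_idx = np.linspace(0, len(samples) - 1, n_out)
    return np.interp(new_idx, old_idx, samples)

# e.g. the 6.9% speed-up from the post:
# faster = change_speed(clip, 1.069)
```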

It still has problems choosing appropriate pacing and inflection like all other AI-generated speech, but it's a good start! I want to sic this on Majel Barrett and Lorenzo Music's voices.

AI-generated WAN Show Forever let's gooooooooooo

I sold my soul for ProSupport.


3 hours ago, AnujSaharan said:

Although I will say: similar technology is already out there on the web. Anyone can take a small snippet and try zero-shot cloning on ElevenLabs or something like that - irrespective of whether the end result is any good - and that'd be fully anonymous sources cloning with fully untraceable models that live behind at least LLC-level protection. Taking offence on someone else's behalf at the output of a model is a conversation that goes far beyond this forum thread - it applies to GPT, DALL-E, Stable Diffusion, etc. (all of which are in the open domain and easily accessible to everyone) - and I am happy to follow wherever that public discourse goes.

"Someone else would do the wrong thing if I didn't first" is a poor argument. Ask folks' permission first. This isn't some crazy new ground we're treading as a society. Disney has already reproduced dead actors' likenesses (face and voice), only after permission/license from their estates (as obviously they weren't around to ask). Not a wild stretch to expect the same for the living.

4 hours ago, LogicalDrm said:

Waiting for Linus to copyright his voice and likeness.

Yeah... even remotely public figures are going to have to start doing that, aren't they? :/



On 4/8/2023 at 1:56 PM, Needfuldoer said:

It's a start, but the pace and tone make it sound more like somebody else doing a Linus impression.

I threw your WAV at Audacity, sped the clip up by 6.9%, then sped the tempo up a further 8%, and manually tightened up some of the weird pauses it put in the middle of sentences. Here's the result:

It still has problems choosing appropriate pacing and inflection like all other AI-generated speech, but it's a good start! I want to sic this on Majel Barrett and Lorenzo Music's voices.

Seems to be fine on the tempo now, thanks for that callout.

First line from the new video to test - "It looks like a children's toy but it's actually one of the most versatile hacking tools to ever hit the market. And if you've been on TikTok in the last six months, there's a good chance you've seen people using it to change gas station signs, set off department store PA systems and open up Tesla charging ports."

The base model is meant to sound more 'conversational' than a presenter voice, and that's what's reflected here. The WAN Show fine-tune data is also conversational, unscripted audio, so the model makes its own choices about where to take breaths, pauses, etc. (it adds ums and ahs even though they're not explicitly in the sentence it should be generating), which is obviously uncharacteristic of edited audio like all the videos on the channel - so the comparison isn't apples to apples.

WAN Show conversation also isn't as high-energy or fast-tempo. That said, it did learn to speed up and sample better with just a little more training. I can try making the fine-tune dataset more diverse later for better results.
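One way a more diverse fine-tune set could be cut from long recordings is a crude energy-based splitter that keeps speech segments and drops silence. This is a hedged numpy sketch, not anything from the actual pipeline - real setups use a proper voice-activity detector, and the thresholds here are made up:

```python
import numpy as np

def split_on_silence(samples, rate, frame_ms=30, threshold=0.01, min_clip_s=1.0):
    """Return (start, end) sample indices of segments whose per-frame RMS
    energy stays above `threshold`, skipping anything shorter than
    `min_clip_s` seconds. A crude stand-in for a real VAD."""
    frame = int(rate * frame_ms / 1000)
    n = len(samples) // frame
    # per-frame RMS energy over non-overlapping frames
    rms = np.sqrt(np.mean(samples[:n * frame].reshape(n, frame) ** 2, axis=1))
    speech = rms > threshold
    clips, start = [], None
    for i, is_speech in enumerate(speech):
        if is_speech and start is None:
            start = i
        elif not is_speech and start is not None:
            if (i - start) * frame / rate >= min_clip_s:
                clips.append((start * frame, i * frame))
            start = None
    if start is not None and (n - start) * frame / rate >= min_clip_s:
        clips.append((start * frame, n * frame))
    return clips
```

Each returned (start, end) pair can then be written out as a separate training clip and paired with a transcript.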


21 hours ago, Zando_ said:

"Someone else would do the wrong thing if I didn't first" is a poor argument. Ask folks' permission first. This isn't some crazy new ground we're treading as a society. Disney has already reproduced dead actors' likenesses (face and voice), only after permission/license from their estates (as obviously they weren't around to ask). Not a wild stretch to expect the same for the living.

Disney made money from it and publicly distributed the likeness for monetary gain. I would be in the wrong if I were publicly sharing checkpoints and inference scripts for someone else's voice myself - I fully agree with you. I have no plans to do that.

Asked for permission above - if unacceptable, happy to stop posting the little snippets.


Now all we need is a Luke bot voice.

AI-generated WAN Show Forever let's gooooo!

I sold my soul for ProSupport.

