
I've been training my own GAN-based and now diffusion-based TTS models for quite a while (the eventual goal is to have a 'teacher' model teach a cloned voice how to sing and rap, fwiw). I've seen the guys try a couple of different models zero-shot to clone their voices - none has quite hit yet, so I'm trying to fix that.

 

Here's a super early preliminary attempt on a ~500M-parameter TTS model fine-tuned on Linus' voice from the most recent WAN Show. I only fine-tuned it for ~15 minutes on a single 3080, so it's obviously very undertrained and can probably get much better with more time. 🙂

 

Just making a thread to track progress until it sings.

 

Novel text from a random The Verge review to test Linus' voice against:

image.png.16d7a577ab5e682d2a5597e769ecbf1b.png

Generated Audio:

 

 

The model is autoregressive, a la GPT-2 and Tortoise. So, based on the speech and words it has seen before, it may choose to change the emotional tone, add pauses, use different words, etc., depending on the training data and the preceding text while generating. For example, it added "i.e." and uhhs and umms near the end of the clip on its own - I copy-pasted the highlighted text above verbatim.
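For anyone curious what "autoregressive" means in practice, here's a minimal toy sketch in Python. It is not the actual model or its code: `VOCAB`, `toy_next_logits`, and the numbers in it are made up for illustration. The only point is that each token is sampled from a distribution conditioned on everything generated so far, which is why fillers or pauses can show up even when the input text doesn't contain them.

```python
import math
import random

# Hypothetical toy vocabulary; a real TTS model predicts audio/text tokens.
VOCAB = ["the", "review", "uh", "um", "<pause>"]


def sample_next(logits, temperature=1.0, rng=random):
    """Sample one token index from unnormalised logits (softmax sampling)."""
    scaled = [value / temperature for value in logits]
    top = max(scaled)
    # Subtract the max before exponentiating for numerical stability.
    weights = [math.exp(value - top) for value in scaled]
    return rng.choices(range(len(logits)), weights=weights, k=1)[0]


def toy_next_logits(tokens):
    """Stand-in for a trained model: scores for the next token given history."""
    logits = [1.0, 1.0, 0.2, 0.2, 0.5]
    # Made-up rule: fillers become more likely right after a pause.
    if tokens and VOCAB[tokens[-1]] == "<pause>":
        logits[2] += 1.0
        logits[3] += 1.0
    return logits


def generate(next_logits, prompt, max_new_tokens, temperature=1.0, seed=0):
    """Autoregressive loop: every new token is conditioned on all earlier ones."""
    rng = random.Random(seed)
    tokens = list(prompt)
    for _ in range(max_new_tokens):
        logits = next_logits(tokens)
        tokens.append(sample_next(logits, temperature, rng))
    return tokens


if __name__ == "__main__":
    output = generate(toy_next_logits, prompt=[0], max_new_tokens=10)
    print(" ".join(VOCAB[i] for i in output))
```

Lowering `temperature` makes the sampler stick to the most likely token; raising it makes the output more varied, fillers included.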

 

Rate on a scale of 1-10 in its current state?

 

---

If you're interested in TTS btw - I post some experiments on my Twitter - (Anuj Saharan (@theAnujSaharan) / Twitter).

 

 

 

Link to comment
https://linustechtips.com/topic/1499411-linus-ai-voice-clone/

7. Try training for 1hr+ and try again. There is a very odd stutter before put the two

I try to be a human, but I cannot, because I have returned to monke.

Spoiler

Hehe boi

Spoiler

POV- when it can run crysis-

 ( ͝° ͜ʖ͡°)

 

Link to comment
https://linustechtips.com/topic/1499411-linus-ai-voice-clone/#findComment-15883033

Have you thought to check with Linus as to whether he's comfortable with his voice being cloned or no? If no, then 0. Should get folk's permission first.

Component | Gaming PC | NAS | Laptop | Workstation
CPU | i5 12600KF 6P+4E | Ryzen 7 3700X | M4 SoC 4P+6E | Xeon X5690 6c12t
Cooler | Noctua NH-D15S | Wraith Stealth w/NF-A9 | Passive | Apple CPU Cooler
Motherboard | ASRock Z690 ITX/ax | ASUS Pro B550M-C/CSM | Apple J713AP | Mac-F221BEC8 (Mac Pro 5,1)
RAM | 2x16GB 3600MHz DDR4 | 2x16GB 2400MHz DDR4 | 24GB Micron LPDDR5 | 4x8GB 1333MHz ECC DDR3
GPU | Sapphire Pulse Radeon 9060 XT 16GB | Radeon WX2100 | M4 SoC 10C | Radeon RX 5700
Storage | 1TB MP34 + 2TB P41 | 500GB SSD + 2x4TB IronWolf Pro in ZFS Mirror | Apple AP0512Z | 1TB Crucial MX500
ODD | LG WH14NS40 | None | LG GP65NB60 USB DVD Writer | Don't know
PSU | EVGA 850W GM | Silverstone SST-TX300 | 53.8Wh LiPo Battery | Delta DPS-980BB
Case | Silverstone Sugo 14 | Dell Inspiron 530S | Mac16,12 chassis (13" MBA) | 2009-2012 Mac Pro "Cheese Grater"
OS | Gentoo Linux | TrueNAS Scale | macOS 26 Tahoe | Fedora Linux

 

Display: LG 27UK650-W (4K 60Hz IPS panel)

Mouse: EVGA X17

Keyboard: Corsair K55 RGB

 

Mobile/Work Devices: 14" M5P MacBook Pro (work) - iPhone 17 Pro - Apple Watch S11

 

Other Misc Devices: iPod Video (Gen 5.5E, iFlash Solo w/128GB SD Card, Rockbox), Nintendo Switch

 

Vehicles: 2002 Ford F150, 2003 Harley-Davidson Sportster 1200, 2022 Kawasaki KLR650, 1994 DR350SE

Link to comment
https://linustechtips.com/topic/1499411-linus-ai-voice-clone/#findComment-15883096

16 hours ago, Zando_ said:

Have you thought to check with Linus as to whether he's comfortable with his voice being cloned or no? If no, then 0. Should get folk's permission first.

Waiting for Linus to copyright his voice and likeness.

^^^^ That's my post ^^^^
<-- This is me --- That's your scrollbar -->
vvvv Who's there? vvvv

Link to comment
https://linustechtips.com/topic/1499411-linus-ai-voice-clone/#findComment-15884015

Of course, I am happy to stop experimenting and building if he doesn't approve or is uncomfortable - very obviously, I haven't posted or distributed the model itself or any inference scripts, for privacy reasons. Will let @nicklmg or someone from the team make that call. If this gets good enough, I'm happy to even share the model for video editing voiceovers, dubbing into foreign languages, and whatever other use cases come up.

 

Although I will say - similar technology is out there on the web. Anyone can take a small snippet and try zero-shot cloning on ElevenLabs or something like that, irrespective of whether the result is any good in the end - and that'd be fully anonymous sources cloning with fully untraceable models that live behind at least LLC-level protection. Taking offence on someone else's behalf at the output of a model is a conversation that goes far beyond just this forum thread - it applies to GPT, DALL-E, Stable Diffusion, etc. (all of which are in the open domain and easily accessible to everyone) - and I am happy to take the direction of wherever that public discourse goes.

Link to comment
https://linustechtips.com/topic/1499411-linus-ai-voice-clone/#findComment-15884091

29 minutes ago, AnujSaharan said:

Will let @nicklmg or someone from the team make that call.

Could also just ask @LinusTech

I'm not actually trying to be as grumpy as it seems.

I will find your mentions of Ikea or Gnome and I will /s post. 

Project Hot Box

CPU 13900k, Motherboard Gigabyte Aorus Elite AX, RAM CORSAIR Vengeance 4x16gb 5200 MHZ, GPU Zotac RTX 4090 Trinity OC, Case Fractal Pop Air XL, Storage Sabrent Rocket Q4 2tb, CORSAIR Force Series MP510 1920GB NVMe, CORSAIR FORCE Series MP510 960GB NVMe, PSU CORSAIR HX1000i, Cooling Corsair XC8 CPU block, Bykski GPU block, 360mm and 280mm radiator, Displays Odyssey G9, LG 34UC98-W 34-Inch, Keyboard Mountain Everest Max, Mouse Mountain Makalu 67, Sound AT2035, Massdrop 6xx headphones, Go XLR

Oppbevaring

CPU i9-9900k, Motherboard, ASUS Rog Maximus Code XI, RAM, 48GB Corsair Vengeance LPX 32GB 3200 mhz (2x16)+(2x8) GPUs Asus ROG Strix 2070 8gb, PNY 1080, Nvidia 1080, Case Mining Frame, 2x Storage Samsung 860 Evo 500 GB, PSU Corsair RM1000x and RM850x, Cooling Asus Rog Ryuo 240 with Noctua NF-12 fans

 

Why is the 5800x so hot?

 

 

Link to comment
https://linustechtips.com/topic/1499411-linus-ai-voice-clone/#findComment-15884125

It's a start, but the pace and tone make it sound more like somebody else doing a Linus impression.

 

I threw your WAV at Audacity, sped the clip up by 6.9%, then sped the tempo up a further 8%, and manually tightened up some of the weird pauses it put in the middle of sentences. Here's the result:

 

 

It still has problems choosing appropriate pacing and inflection like all other AI-generated speech, but it's a good start! I want to sic this on Majel Barrett and Lorenzo Music's voices.
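As a rough worked example of what those two adjustments add up to: this assumes the first step was Audacity's Change Speed effect (which shifts pitch along with length) and the second was Change Tempo (which changes length only). The 60-second clip length is a hypothetical number, not the real clip's duration.

```python
import math


def combined_change(duration_s, speed_pct, tempo_pct):
    """Return (new duration in seconds, pitch shift in semitones).

    Assumes a 'speed' change resamples the audio (pitch and length change
    together) while a 'tempo' change only shortens or lengthens the clip.
    """
    speed = 1 + speed_pct / 100
    tempo = 1 + tempo_pct / 100
    new_duration = duration_s / (speed * tempo)
    # Resampling by a factor r raises pitch by 12 * log2(r) semitones.
    pitch_shift = 12 * math.log2(speed)
    return new_duration, pitch_shift


if __name__ == "__main__":
    # Hypothetical 60 s clip with the 6.9% speed and 8% tempo changes.
    new_len, semitones = combined_change(60.0, 6.9, 8.0)
    print(f"new length: {new_len:.1f} s, pitch shift: {semitones:+.2f} semitones")
```

Under those assumptions the clip plays roughly 15% faster overall and sits a little over one semitone higher than the original.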

 

AI-generated WAN Show Forever let's gooooooooooo

I sold my soul for ProSupport.

Link to comment
https://linustechtips.com/topic/1499411-linus-ai-voice-clone/#findComment-15884131

3 hours ago, AnujSaharan said:

Although I will say - similar technology is out there on the web. Anyone can take a small snippet and try zero-shot cloning on ElevenLabs or something like that, irrespective of whether the result is any good in the end - and that'd be fully anonymous sources cloning with fully untraceable models that live behind at least LLC-level protection. Taking offence on someone else's behalf at the output of a model is a conversation that goes far beyond just this forum thread - it applies to GPT, DALL-E, Stable Diffusion, etc. (all of which are in the open domain and easily accessible to everyone) - and I am happy to take the direction of wherever that public discourse goes.

"Someone else would do the wrong thing if I didn't first" is a poor argument. Ask folk's permission first. This isn't some crazy new ground we're treading as a society. Disney has already reproduced dead actors' likenesses (face and voice), only after permission/license from their estates (as obviously they weren't around to ask). Not a wild stretch to expect the same for the living.

4 hours ago, LogicalDrm said:

Waiting for Linus to copyright his voice and likeness.

Yeah... even remotely public figures are going to have to start doing that aren't they :/.


Link to comment
https://linustechtips.com/topic/1499411-linus-ai-voice-clone/#findComment-15884284

On 4/8/2023 at 1:56 PM, Needfuldoer said:

It's a start, but the pace and tone make it sound more like somebody else doing a Linus impression.

 

I threw your WAV at Audacity, sped the clip up by 6.9%, then sped the tempo up a further 8%, and manually tightened up some of the weird pauses it put in the middle of sentences. Here's the result:

 

It still has problems choosing appropriate pacing and inflection like all other AI-generated speech, but it's a good start! I want to sic this on Majel Barrett and Lorenzo Music's voices.

Seems to be fine on the tempo now, thanks for that callout.

 

First line from the new video to test - "It looks like a children's toy but it's actually one of the most versatile hacking tools to ever hit the market. And if you've been on TikTok in the last six months, there's a good chance you've seen people using it to change gas station signs, set off department store PA systems and open up Tesla charging ports."

 

 

 

The base model is meant to be more 'conversational' than a presenter voice or whatever it may be, and that's what's reflected here. Of course, the WAN Show fine-tune data is also conversational, unscripted audio, so the model seems to be making choices around where to take breaths, pauses, etc. (it adds uhms and ahhs even though they're not explicitly in the sentence it should be generating). That is obviously uncharacteristic of edited audio like on all the channel's videos, so the comparison isn't apples to apples.

 

WAN show conversation isn't as high energy and tempo etc - that being said, it did just learn to speed it up and sample better with a little bit more training. I can try making the fine-tune dataset more diverse later for better results.

Link to comment
https://linustechtips.com/topic/1499411-linus-ai-voice-clone/#findComment-15885325

21 hours ago, Zando_ said:

"Someone else would do the wrong thing if I didn't first" is a poor argument. Ask folk's permission first. This isn't some crazy new ground we're treading as a society. Disney has already reproduced dead actors' likenesses (face and voice), only after permission/license from their estates (as obviously they weren't around to ask). Not a wild stretch to expect the same for the living.

Disney made money from it and publicly distributed the likeness for monetary gains. I would be in the wrong if I were publicly sharing checkpoints and inference scripts myself for someone else's voice - I fully agree with you. I have no plans to do that.

 

Asked for permission above - if unacceptable, happy to stop posting the little snippets.

Link to comment
https://linustechtips.com/topic/1499411-linus-ai-voice-clone/#findComment-15885328