Posted November 12, 2019

Source video ^ His sources are cited in the description, for the particularly scientific among you.

Some of you may be aware of previous techniques that could realistically synthesize human speech, but those require hours of input data. This new system needs just seconds of source data.

Quote
The timbre of the voice is very similar [to the original], and it is able to synthesize sounds and consonants that have to be inferred because they were not heard in the original voice sample. This requires a certain kind of intelligence, and quite a bit of that. [...] The speaker encoder is a neural network that was trained on thousands and thousands of speakers and is meant to squeeze all this learned data into a compressed representation. In other words, it tries to learn the essence of human speech from many many speakers. [...] This training step needs to be done only once, and after that it was allowed just 5 seconds of speech data from someone they haven't heard [before].

I'm sure you don't need me to explain that this has absolutely loads of possible applications, ranging from incredibly useful to incredibly bad. I don't have test results showing whether this can fool voice print security systems, but it's definitely conceivable. It will also call into question the reliability of any recorded audio, which has implications for politics, blackmail, criminal cases, etc.

The fact that it is so easy to do is what really makes it interesting. Doing it with hours of samples is one thing, but that much audio is not always easy or even possible to get. 5 seconds is quite a different story.

On the flip side, imagine the usefulness of having retired or deceased actors posthumously fulfill a role, particularly in a cartoon where video is not necessary, or of repairing or redoing dialog in a movie or voiceover.
Imagine using this to replace traditional voice synthesizers that, though better now than they used to be, often still sound noticeably more robotic than people. Imagine if, like the creator of this video, you produce a lot of spoken content: you could just synthesize the audio from a script when you're feeling ill or lazy. Imagine using old home movies to capture the voice of someone who, for one reason or another, is no longer able to speak, and through this technology giving them the ability to sound like themselves again.
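For anyone wondering what a "compressed representation" of a voice might look like in practice, here is a toy sketch. It is not the researchers' code: a real speaker encoder is a trained neural network, whereas this just averages made-up per-frame feature vectors into one fixed-size vector and compares two such vectors with cosine similarity, which is roughly how a voice print check could decide whether two clips match. The names `mean_pool`, `cosine_similarity`, and `same_speaker`, the 0.8 threshold, and all the numbers are assumptions for illustration only.

```python
import math

def mean_pool(frames):
    """Collapse a variable-length list of per-frame feature vectors
    into a single fixed-size vector (a stand-in for a speaker embedding)."""
    dim = len(frames[0])
    return [sum(frame[i] for frame in frames) / len(frames) for i in range(dim)]

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; values near 1.0 mean very similar."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def same_speaker(enrolled, candidate, threshold=0.8):
    """Hypothetical voice print check: accept if the embeddings are close enough.
    The threshold is an arbitrary illustrative value."""
    return cosine_similarity(enrolled, candidate) >= threshold

# Made-up 3-dimensional "features"; real systems use far larger vectors.
enrolled = mean_pool([[1.0, 0.0, 0.5], [0.8, 0.2, 0.5]])
cloned = mean_pool([[0.9, 0.1, 0.4], [0.9, 0.1, 0.6]])
stranger = mean_pool([[0.0, 1.0, 0.1]])

print(same_speaker(enrolled, cloned))    # the clone lands close to the enrolled voice
print(same_speaker(enrolled, stranger))  # an unrelated voice does not
```

The point of the sketch is only that once a voice is reduced to a vector, "sounding like someone" becomes a distance check, which is why a convincing clone could plausibly worry voice print systems.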
Posted November 12, 2019

It won't be long before video and audio evidence can be spoofed so well that such evidence is considered flimsy in courts.
Posted November 12, 2019

Pretty soon a 00s Shaggy song will be a legal defense.
Posted November 12, 2019

This + deepfakes and we can't trust anything anymore.
Posted November 12, 2019 Author

5 minutes ago, Praesi said:
This + deepfakes and we can't trust anything anymore.

Absolutely. I'm sure that, as with deepfakes, new AIs will be created to detect content made this way, but that does little to decrease the risk.
Posted November 12, 2019

New tech both solves old problems and creates new ones. The progression of technology is exponential, and the curve is heading towards vertical as we speak.
Posted November 12, 2019

From listening to the audio samples, they still lack emotion and sound monotone. Take Trump as an example: his speaking pattern often goes up and down with different inflections, so I'm really curious how it does on a sample of someone speaking freely, versus reading a preset sentence.
Posted November 12, 2019

New AI able to clone any voice with just seconds of input data
Posted November 12, 2019 Author

9 minutes ago, Dissitesuxba11s said:
From listening to the audio samples, they still lack emotion and sound monotone. Take Trump as an example: his speaking pattern often goes up and down with different inflections, so I'm really curious how it does on a sample of someone speaking freely, versus reading a preset sentence.

I think the most interesting example is the third from last on the page. It takes someone singing in what sounds to me like Chinese (apologies if it is not) and from it renders their voice speaking in English, without any perceptible accent, and yet with the melodic quality of their speech retained.
Posted November 12, 2019

9 minutes ago, Ryan_Vickers said:
I think the most interesting example is the third from last on the page. It takes someone singing in what sounds to me like Chinese (apologies if it is not) and from it renders their voice speaking in English, without any perceptible accent, and yet with the melodic quality of their speech retained.

Ooh, I totally missed those. It's interesting that it reused the melody of the song. From that, it looks like the synthesized voice might be limited to however long the reference is. In other words, if they used that Chinese(?) singing as a reference, it would repeat the melody every ~7 seconds for longer synthesized samples.
Posted November 12, 2019 Author

Just now, Dissitesuxba11s said:
Ooh, I totally missed those. It's interesting that it reused the melody of the song. From that, it looks like the synthesized voice might be limited to however long the reference is. In other words, if they used that Chinese(?) singing as a reference, it would repeat the melody every ~7 seconds for longer synthesized samples.

I'm not sure, but it gives me hope that "emotion" and inflections can be preserved, if not now, then perhaps with the next version.
Posted November 12, 2019

1 hour ago, Praesi said:
This + deepfakes and we can't trust anything anymore.

You still trusted things? I stopped long ago.

Maybe we'll get lucky and some intrepid do-gooder will bring audio and video analysis technology up to match this kind of shit, making it easy to spot fakes.

The good news is that maybe the wider majority of people will realize that the media lies about almost fucking everything when someone uses this technology to troll the shit out of them.
Posted November 12, 2019

14 minutes ago, Trik'Stari said:
You still trusted things? I stopped long ago. Maybe we'll get lucky and some intrepid do-gooder will bring audio and video analysis technology up to match this kind of shit, making it easy to spot fakes. The good news is that maybe the wider majority of people will realize that the media lies about almost fucking everything when someone uses this technology to troll the shit out of them.

No. But that's a new level again.
Posted November 13, 2019

4 hours ago, Ryan_Vickers said:
I think the most interesting example is the third from last on the page. It takes someone singing in what sounds to me like Chinese (apologies if it is not) and from it renders their voice speaking in English, without any perceptible accent, and yet with the melodic quality of their speech retained.

The quoted reference is "輕輕敲醒沉睡的心靈,慢慢張開你的眼睛" (roughly: "Gently wake the sleeping soul, slowly open your eyes"). The voice seems like it's from someone else in the research team. The last two references are French, but I'm not fluent enough to translate.
Posted November 13, 2019

I knew that video I saw of Betty White in the gym deadlifting 600 lbs while reading Harry Potter was a bit iffy.
Posted November 13, 2019

5 hours ago, mr moose said:
It won't be long before video and audio evidence can be spoofed so well that such evidence is considered flimsy in courts.

They've been able to do this for about 5 years already. It just used to cost a lot of money. Now everyone is going to call every captured audio or video a "deep fake". Politicians of the world rejoice at this news.
Posted November 13, 2019

5 hours ago, mr moose said:
It won't be long before video and audio evidence can be spoofed so well that such evidence is considered flimsy in courts.

It already can be, though. Video/audio evidence alone is already considered flimsy.
Posted November 13, 2019

8 minutes ago, Ehmc130 said:
I knew that video I saw of Betty White in the gym deadlifting 600 lbs while reading Harry Potter was a bit iffy.

Yeah, it's well documented that she never goes below 750 and that she can only read in Klingon!
Posted November 13, 2019

5 hours ago, Praesi said:
This + deepfakes and we can't trust anything anymore.

Well, RIP. Big corporations have already been collecting both our face and voice data for years.