
Did Google fake its Google Duplex demonstration?

9 hours ago, RedRound2 said:

They do that on purpose. Google says it's to give the AI time to process and compute information, just as we use them, lol. It's also meant to make it sound natural.

I don't see the link. It knows what it has to ask. Why would it try to mimic an exaggerated type of phrasing?


1 hour ago, Trixanity said:

They said it's to stall for time when it's processing, but other times it remains silent, so I'm guessing it only does it occasionally so as not to sound too weird. That and to make it sound more human, but that's pointless to me because it honestly shouldn't sound human in the first place.

 

I don't think adding stuff like this is the hard part. It's processing the speech of the person on the line and formulating a proper reply. They've shown before that they've mostly nailed the natural sounding voice (although I could hear several wrong intonations in sentences in that demo). It's how to use that voice that's the hard part now. Some things seem somewhat hard coded which makes things sound less natural.

Except it used it when it didn't have to process anything: when the woman told it to wait a bit.


35 minutes ago, laminutederire said:

Except it used it when it didn't have to process anything: when the woman told it to wait a bit.

That's one exception and that's only a maybe (we don't exactly know the details - it may be trying to predict what answer it may need to formulate). The other times it was stalling a bit. There are latencies and processing times to take into account. It's not exactly instantaneous considering the steps it needs to take.


1 hour ago, Trixanity said:

That's one exception and that's only a maybe (we don't exactly know the details - it may be trying to predict what answer it may need to formulate). The other times it was stalling a bit. There are latencies and processing times to take into account. It's not exactly instantaneous considering the steps it needs to take.

Sure, but that hmm-hmm is weird anyway.

Processing time really depends on what network they use, to be fair. I'm personally waiting on a paper or patent on Duplex to assess the credibility of it.


6 minutes ago, laminutederire said:

Sure, but that hmm-hmm is weird anyway.

Processing time really depends on what network they use, to be fair. I'm personally waiting on a paper or patent on Duplex to assess the credibility of it.

Well, it'll probably not launch before the next Google IO anyway so it'll be a long while before it actually rolls out. Things may or may not change. Who knows? They may end up scrapping it because it isn't functional or working as intended. A demo is just a demo and in this case the very best case scenario. I'm sure if it's real (which I think it is to some degree) that they've had many attempts to get the perfect take and probably trained the AI in this particular scenario for demo purposes. I doubt any current version of it is particularly versatile. The exchange was pretty short and it avoided any difficult questions like what services the client wanted by just giving a basic answer.


13 hours ago, laminutederire said:

What I'd personally be sceptical about is the intonation of the presumed AI voice. Why the fuck would an AI assistant seem to hesitate on phrasing or things like this?

 

4 hours ago, TheSLSAMG said:

But yeah, it seemed way too natural. Why would Google program "mhms", "uhhs" and "umms" in it? Yes, it sounds more natural, but there's really little point. I'm not sure that most people would even notice the lack of them.

I do just want to point out that this is a neural net that's processing all of this, and it could *very* easily be an emergent behavior of the typical conversation flow.

 

This isn't even a single neural net handling things. It's going to be many of them stacked on top of each other, each handling a different single task. It's very possible that the creation of umms and uh-huhs is just an initial net's reaction to a conversational pattern that calls for "acknowledgement, but also a time delay".

 

With neural nets they're not programming these things explicitly into their networks, they don't tweak individual items for things like this, and many of the speech traits could simply be emergent learned patterns.

 

And all of this is even more the case if the nets used in Duplex are being created by their neural net that makes neural nets, where it could increase the chances of such emergent properties if they add to the likelihood of successful communication.

 

P.S. Why is everybody under the impression that this was a call to a real salon? With the way wiretapping laws in Canada and the US work, I would have assumed it was just a call between the assistant and a Googler to test and train the system.


33 minutes ago, Sniperfox47 said:

 

Well, they had to train that thing on a data set comprising mostly TOEFL/IELTS/whatever else from education-based resources. Because it sounds like those, which aren't really natural speech.

I know how neural networks work, but they did seem to imply they were using text-to-speech components, in which case those hesitation marks wouldn't be present in the written text, and should probably be discarded as noise by the neural network because of that fact.


3 minutes ago, laminutederire said:

but they did seem to imply they were using text-to-speech components, in which case those hesitation marks wouldn't be present in the written text,

Except that for it to generate tone and emphasis contextually (via WaveNet) it's not going to be just generating raw strings for your text-to-speech engine to recreate. You're going to be creating a markup with words and tones and emotional indicators based on your conversation flow.

 

WaveNet is *far* more complex than traditional text-to-speech in terms of the kinds of cues you can provide to it and that it can adjust for.


8 hours ago, RedRound2 said:

Actually, it would. As long as people can understand what is going on, it would be a great demo and much more convincing. This seemed too much like an ideal-world scenario.

 

They could've easily used a well-established chain of restaurants and a fake name (anyway, Californian law requires the consent of both parties for the call to be recorded). The point is, they could've made it a lot more realistic but they didn't, which calls the legitimacy of the demo into question. Plus, Google still hasn't responded to any of this as of now.

 

I merely quoted from the source link

 

I know it's not your fault, I was referring to the Android Authority writer.



Google has plenty of places to make the call from. In Georgia, where they have a data center, they could have done that, since it is legal to do so without consent from both parties. Also, I have a Google Home, and it sounds that natural when it talks to me. Whenever my mom schedules an appointment at a nail salon, they just ask for a name, and it's the same at a restaurant: they only ask for a name for a reservation.



11 hours ago, RedRound2 said:

They could've easily used a well-established chain of restaurants and a fake name (anyway, Californian law requires the consent of both parties for the call to be recorded). The point is, they could've made it a lot more realistic but they didn't, which calls the legitimacy of the demo into question. Plus, Google still hasn't responded to any of this as of now.

Major chain or not, revealing the specific hair salon or restaurant would still have been problematic -- I don't think enabling harassment is okay just because it's, say, Outback Steakhouse.

 

It's true that California does require the consent of both parties for a call to be recorded, and that part is problematic.  However, Bloomberg recently learned through sources that Google will notify businesses that it's recording calls in states where that's required.  (It also learned that Google only edited the calls to remove identities.)

 

I suspect Google isn't responding because Axios' story is long on speculation and short on actual evidence.  I respect Axios most of the time because they're very accurate when citing sources, but their piece on Duplex doesn't have much meat to it.  A lack of response isn't evidence of guilt by itself.


15 minutes ago, Commodus said:

Major chain or not, revealing the specific hair salon or restaurant would still have been problematic -- I don't think enabling harassment is okay just because it's, say, Outback Steakhouse.

 

What kind of harassment do you expect to happen just because the restaurant turned out to be TGI Fridays? I am pretty sure no one would give a shit.

Quote

It's true that California does require the consent of both parties for a call to be recorded, and that part is problematic.  However, Bloomberg recently learned through sources that Google will notify businesses that it's recording calls in states where that's required.  (It also learned that Google only edited the calls to remove identities.)

 

That confirmation was a response to the ethics concerns. At no point, I believe, have they said that they informed the people on the receiving end of the calls shown in the demo.

Quote

I suspect Google isn't responding because Axios' story is long on speculation and short on actual evidence.  I respect Axios most of the time because they're very accurate when citing sources, but their piece on Duplex doesn't have much meat to it.  A lack of response isn't evidence of guilt by itself.

Really? Even if it is, say, a baseless allegation, it has gained this much traction across various publications, so it's idiotic not to shut it down. It's far more likely that there was some kind of partial modification to this whole thing and Google is afraid to completely deny it for obvious reasons.

31 minutes ago, ElfFriend said:

Sure just like this https://deepmind.com/blog/wavenet-generative-model-raw-audio/ is also fake.... /s

 

Point is they've shown all the tech that went into that demo over the past year or two... That demo was basically just the result of them stitching it all together into a nice use case.

The demo they showed sounded far more realistic than WaveNet. Just listen to the pronunciation, the pacing and the general tone of the voice in the demo.


6 minutes ago, RedRound2 said:

What kind of harassment do you expect to happen just because the restaurant turned out to be TGI Fridays? I am pretty sure no one would give a shit.

People calling in a bid to pretend they're Google Assistant, to book non-existent appointments or just give the place grief.  Did you consider that these places don't want to be flooded with crank calls and bogus appointments that prevent real customers from getting through?

 

8 minutes ago, RedRound2 said:

That confirmation was a response to the ethics concerns. At no point, I believe, have they said that they informed the people on the receiving end of the calls shown in the demo.

They didn't clarify that situation, and that's a problem.  I was just stating that this wasn't going to be a sustained pattern.

 

12 minutes ago, RedRound2 said:

Really? Even if it is, say, a baseless allegation, it has gained this much traction across various publications, so it's idiotic not to shut it down. It's far more likely that there was some kind of partial modification to this whole thing and Google is afraid to completely deny it for obvious reasons.

Idiotic?  Maybe.  But my concern wasn't that -- it's that people are leaping to conclusions (the whole thing was faked, etc.) because Google isn't entertaining their theories.  You're probably right in that there was partial modification, but I don't think we should be upset.  As it is, if Google denied the claims, that probably wouldn't silence the people convinced it's all a fraud.


And why would Google fake something like this?

It is not like people expected a new jump in AI like this. So they had no pressure to show something like it.

 

Scripting stuff for the show is pretty common and I honestly never expect something to be live. At best it is recorded beforehand, and you can guess how many failed attempts they would then show on stage.

 

That being said: it is not even an AI thing, just a bunch of trained reactions to data sets. The real kicker was how natural the voice sounded, and you will have it on your phone soon enough. So they would be pretty dumb to fake that. ;-)


On 19/05/2018 at 12:28 AM, RedRound2 said:

Source: https://www.androidauthority.com/google-duplex-calls-edited-faked-866951/

 

So yeah, it's a tin foil hat thingie but I did feel that the demo was too perfect. One of the things that raised my skepticism was how natural the Google Assistant voice sounded (not talking about fillers but rather the pronunciation and the continuity). But now more reputable media outlets (Axios) have done a more thorough check and it kinda raises questions.

Maybe they edited out the outlet and employee name to avoid unwanted attention, but they could've easily used a well-established outlet and a fake name

The second and third may not necessarily need to happen, but Axios did contact a bunch of restaurants and hair salons, and there were background noises and they did ask for more customer details

What adds fuel to the fire is the fact that Google hasn't responded to any of these claims and is seemingly being very tight-lipped about all this. 

 

It's a pretty impressive demo, no doubt. But I personally felt that this was a huge jump in voice assistant advancement given what we have today. Yes, Google Assistant is really good, but I don't think it would fare well with accents, especially the one shown in the second call.

Maybe they didn't call an actual hair salon or restaurant, maybe they called a Google employee sat in a back room somewhere.

 

Does that mean they faked the call? No, it doesn't. It's not the call answerer that was being shown off, it was the assistant's ability to make the call.

 



2 minutes ago, Master Disaster said:

Maybe they didn't call an actual hair salon or restaurant, maybe they called a Google employee sat in a back room somewhere.

 

Does that mean they faked the call? No, it doesn't. It's not the call answerer that was being shown off, it was the assistant's ability to make the call.

 

They said it was a real call. Plus, if it was a Google employee on the other end, then surely the entire thing was scripted. If it was scripted, it raises questions about how well Duplex can handle curveball questions, because the way they showed it in the demo, it seemed like Duplex could handle anything.
