Google Now beats out Siri and Cortana in new PAAIST (Personal Assistant Artificial Intelligence Strength Test)

According to the PAAIST (Personal Assistant Artificial Intelligence Strength Test), created by Aimee Hendrycks, Google Now correctly answered 33.3333% of the questions, Siri 25.8%, and Cortana 11.7%.
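For a rough sense of scale, the reported percentages can be converted back into question counts out of 60. This is only a sketch: `implied_points` is a hypothetical helper, not part of PAAIST, and it assumes every question carries equal weight, which the post does not state.

```python
# Hypothetical helper (not part of PAAIST): convert a reported percentage
# into the score it implies on a 60-question test.
# Assumption: every question carries equal weight.
TOTAL_QUESTIONS = 60  # from the post: "The 60 questions asked"

REPORTED_PERCENT = {  # figures quoted in the post
    "Google Now": 33.3333,
    "Siri": 25.8,
    "Cortana": 11.7,
}


def implied_points(percent: float, total: int = TOTAL_QUESTIONS) -> float:
    """Return the score (in questions) that a percentage implies."""
    return percent / 100 * total


if __name__ == "__main__":
    for name, pct in REPORTED_PERCENT.items():
        print(f"{name}: about {implied_points(pct):.2f} of {TOTAL_QUESTIONS}")
```

Siri's figure works out to roughly 15.5 questions rather than a whole number, so the scoring may allow partial credit or the percentage may be rounded; the post does not say which.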

 

The 60 questions asked 

What is the principle quantum number of selenium?
Is radon an actinide?
What will be the weather in two days?
What will be the weather Sunday?
How long is the movie Fanny and Alexander?
What is the genre of the movie The Fighter?
What is the genre of the show The Sopranos?
Who is the director of the series John Doe?
What is The University of Kansas's rank in Higher Education Administration?
What is UCLA's rank in Finance?
Find the limit of 5x/(1+x^3) as x approaches infinity.
Determine the interior angles of a rhombus with side lengths 1 and 3.
What is the special ability of the Pokemon Scrafty?
What type or types is the Pokemon Victreebel?
Which studio developed the game Mario Tennis?
How long does it typically take to complete the game Dragon Age: Origins?
What is the governmental leader of Greece?
What is the median home value in Macedonia?
What was the release date of Where the Wild Things Are?
Who wrote The Girl with the Dragon Tattoo?
What animals are related to anteaters?
What is the diet of a cichlid?
Where and when was Peter Norvig born?
Who are relatives of Niels Henrik Abel?
What is the midcareer salary of a Aircraft Maintenance Engineer (Structures)?
What is the midcareer salary of a Fire Fighter?
Where, near here, can I get a pineapple?
How much vitamin D3 is in a margarine teaspoon?
Give me directions to the nearest Chinese restaurant.
Give me directions to the nearest bakery.
What are ways to prevent typhoid fever?
Does scarlet fever go away on its own?
What is Commercial International Bank's total assets?
What is the YTM of a zero-coupon bond with a face value of $3000 a current price of 2500 and a maturity of 26 years?
What is the cost of miracle grow?
What is a complementary product for pots?
What is "cat" translated to Lao?
Did the Houston Texans win?
Did the Dallas Cowboys win?
Show me sad images related to Bleach, if available.
Show me animated images related to Middle of the Earth in Ecuador, if available.
Tell me an argument for and against the claim, "It is morally permissible to kill one innocent person to save the lives of more innocent people."
Tell me an argument for and against the claim, "Justice requires the recognition of animal rights."
How do people get alzheimer's?
How does mindfulness reduce depression?
Why is gold considered valuable?
Why are people against cloning?
When is the next IMO test?
When is the next Chicago Marathon?
Whose epitaph reads Lived a philosopher died a Christian?
According to the proverb which fruit tastes sweetest?
Sphere A, with a charge of 258 micro C, is located near another charged sphere B. Sphere B has a charge of 784 micro C, and is located 47.6 cm to the right of A. What is the force of sphere B on sphere A?
Zelda strikes a 0.436 kg golf ball with a force of 106 N and gives it a velocity of 3 m/s. How long was Zelda's club in contact with the ball?
Play me a song or piece in the genre K-pop.
Play me a song or piece in the genre trad jazz.
Bring me to a credible report describing that perceived price of an object affects the experience one has with the object.
List the countries that allow autonomous vehicles.
Advise me how to reduce my mortality rate given that I am a middle-aged woman.
Advise me how to improve sustained attention and executive processing.

 

source - http://htmlpreview.github.io/?https://github.com/AimeeHendrycks/PAAIST/blob/master/Results/results.html

Main Rig

{
Intel Core i7 4770k, MSI Gaming z87-g45, EVGA GTX 770 dual SLI,  ARC Midi R2 , 16GB Crucial Ballistix, 256Gb SSD, 4TB HDD}

Peripherals

{
Keyboard: Razer Black Widow      Mouse: Razer Naga     Monitor: Asus 27in x 3}

1/3 is kinda weak...

ITX Monster: CPU: I5 4690K GPU: MSI 970 4G Mobo: Asus Formula VI Impact RAM: Kingston 8 GB 1600MHz PSU: Corsair RM 650 SSD: Crucial MX100 512 GB HDD: laptop drive 1TB Keyboard: logitech G710+ Mouse: Steelseries Rival Monitor: LG IPS 23" Case: Corsair 250D Cooling: H100i

Mobile: Phone: Broken HTC One (M7) Totaly Broken OnePlus ONE Samsung S6 32GB  :wub:  Tablet: Google Nexus 7 2013 edition
 


Tbh this seems like the sort of thing Google Now would be good at: answering questions, while Siri seems to focus on actual commands for your phone/tablet

Tea, Metal, and poorly written code.


"Show me sad images related to Bleach, if available." What the hell kind of question is that?

CPU: Intel 3570 GPUs: Nvidia GTX 660Ti Case: Fractal design Define R4  Storage: 1TB WD Caviar Black & 240GB Hyper X 3k SSD Sound: Custom One Pros Keyboard: Ducky Shine 4 Mouse: Logitech G500

 


This test seems pretty flawed in that most of the tasks are information-based. Google Now is better at information-based tasks, but Siri is better at completing command-based tasks, e.g. "text my fiancée that I am on my way home".


1/3 is kinda weak...

Those kinds of questions require some very advanced thinking for a human being (especially towards the end) so for a computer to 'understand' the intentions of a human's speech is extraordinary in my opinion, despite the accuracy in the end. The things that GNow is best for work beautifully on it and that's all that really matters. (IMO.)

BLACKWIDOW: i5 4690k - MSI GTX970 - 2x 4gb GSkill RJ X - 256gb MX100 - 1tb WD Blue - NZXT H440


look at the questions.

that was by far the hardest one

 

at least 1/3 of the questions are easy, and those are (presumably) the only ones it got right

 

For example

 

What will be the weather in two days?

What will be the weather Sunday?

How long is the movie Fanny and Alexander?

What is the genre of the movie The Fighter?

What is the genre of the show The Sopranos?

Who is the director of the series John Doe?

 

What was the release date of Where the Wild Things Are?

Who wrote The Girl with the Dragon Tattoo?

Where and when was Peter Norvig born?

Who are relatives of Niels Henrik Abel?

 

Those kinds of questions require some very advanced thinking for a human being (especially towards the end) so for a computer to 'understand' the intentions of a human's speech is extraordinary in my opinion, despite the accuracy in the end. The things that GNow is best for work beautifully on it and that's all that really matters. (IMO.)

 

same as i said above

ITX Monster: CPU: I5 4690K GPU: MSI 970 4G Mobo: Asus Formula VI Impact RAM: Kingston 8 GB 1600MHz PSU: Corsair RM 650 SSD: Crucial MX100 512 GB HDD: laptop drive 1TB Keyboard: logitech G710+ Mouse: Steelseries Rival Monitor: LG IPS 23" Case: Corsair 250D Cooling: H100i

Mobile: Phone: Broken HTC One (M7) Totaly Broken OnePlus ONE Samsung S6 32GB  :wub:  Tablet: Google Nexus 7 2013 edition
 


IRRELEVANT

 

Siri and Google don't have tits

Cortana does

 

/Thread

-The Bellerophon- Obsidian 550D-i5-3570k@4.5Ghz -Asus Sabertooth Z77-16GB Corsair Dominator Platinum 1866Mhz-x2 EVGA GTX 760 Dual FTW 4GB-Creative Sound Blaster XF-i Titanium-OCZ Vertex Plus 120GB-Seagate Barracuda 2TB- https://linustechtips.com/main/topic/60154-the-not-really-a-build-log-build-log/ Twofold http://linustechtips.com/main/topic/121043-twofold-a-dual-itx-system/ How great is EVGA? http://linustechtips.com/main/topic/110662-evga-how-great-are-they/#entry1478299


"Show me sad images related to Bleach, if available." What the hell kind of question is that?

Not going to lie, I thought it was first referring to the anime...


Not going to lie, I thought it was first referring to the anime...

It's not?


Did anyone honestly expect any other result? 

 Motherboard: MSI Z97S Krait Edition █ CPU: Intel i7-4790K █ GPU: Nvidia Geforce GTX 780Ti █ RAM: 8GB AVEXIR DDR3 1600  █ Storage: 120GB Kingston HyperX SSD + 1TB Seagate Barracuda HDD 


█ Monitor: 21.5" 1080p 60Hz  PSU: 700w █ Case: Fractal Define R4 █       ...LTT Dark Theme master race.


Project MiniConsole



The point of Cortana is to be a personal digital assistant, not the result of Wikipedia and Wolfram Alpha having a baby.

CPU: i7 4790K  RAM: 32 GB 2400 MHz  Motherboard: Asus Z-97 Pro  GPU: GTX 770  SSD: 256 GB Samsung 850 Pro  OS: Windows 8.1 64-bit


...while Siri seems to focus on actual commands for your phone/tablet

Only if the command is to search the web or call Alison if you say to call John

I run my browser through NSA ports to make their illegal jobs easier. :P
If it's not broken, take it apart and fix it.
http://pcpartpicker.com/b/fGM8TW


Not going to lie, I thought it was first referring to the anime...

You most likely thought correctly. It's the best explanation for the "B" being capitalized.


If this were a voice recognition/speech-to-text test, then I would understand why the questions were so complex. Most of those questions seem really irrelevant to being a good and intelligent personal assistant, since they're mostly about asking for facts. A good personal assistant should be more about doing tasks for you.
It would be interesting to see how Watson handled those questions.
 
 

We consider questions beyond what are typically asked to personal assistants
because PA's in the future should be able to answer more query types than just
those similar to "Will it be cold tomorrow?"

Okay fair enough, but it seems kind of irrelevant. It's like making a marathon for babies that can barely walk. Of course they won't get a good score, and their results today will be very different from the ones in a few years.


And here Microsoft is trying to tell us Cortana is a leading competitor.

 

No wonder both Cortana and the X1 commands are awful.


They will all take turns being the best, especially if you consider the parameters that each one is designed for. This is how tech evolves: one company makes a great product, then another company betters it. When a company can't go one better, it ends up at the bottom of the consumer pile and eventually gets bought out by Microsoft for spare parts. Nokia, anyone?

Grammar and spelling is not indicative of intelligence/knowledge.  Not having the same opinion does not always mean lack of understanding.  


I'd rather have apple, google and ms work on better localization than answering these kinds of questions.

Personally I'd be fine with talking English, but that's not the only thing not working properly in Finland.

Stock coolers - The sound of bare minimum


Google Now won because it searches whatever you say on Google. Isn't it obvious?

Doesn't Siri do that... oh wait, it uses Bing


Doesn't Siri do that... oh wait, it uses Bing

 

Microsoft must've thrown money at Apple and begged them to do it. Apple, on the other hand, accepted it but also included Wikipedia and Wolfram Alpha so the customer experience doesn't degrade as much.

