
The Asian kid is always better - Baidu tops GLUE benchmark

williamcll


Recently, Baidu beat Microsoft and Google on GLUE (General Language Understanding Evaluation), a benchmark that measures an AI system's ability to understand human language. This is the first time a company from a non-English-speaking country has reached the top score.

Quote

Earlier this month, a Chinese tech giant quietly dethroned Microsoft and Google in an ongoing competition in AI. The company was Baidu, China’s closest equivalent to Google, and the competition was the General Language Understanding Evaluation, otherwise known as GLUE. GLUE is a widely accepted benchmark for how well an AI system understands human language. It consists of nine different tests for things like picking out the names of people and organizations in a sentence and figuring out what a pronoun like “it” refers to when there are multiple potential antecedents. A language model that scores highly on GLUE, therefore, can handle diverse reading comprehension tasks. Out of a full score of 100, the average person scores around 87 points. Baidu is now the first team to surpass 90 with its model, ERNIE.

 

The public leaderboard for GLUE is constantly changing, and another team will likely top Baidu soon. But what’s notable about Baidu’s achievement is that it illustrates how AI research benefits from a diversity of contributors. Baidu’s researchers had to develop a technique specifically for the Chinese language to build ERNIE (which stands for “Enhanced Representation through kNowledge IntEgration”). It just so happens, however, that the same technique makes it better at understanding English as well.

To appreciate ERNIE, consider the model it was inspired by: Google’s BERT. (Yes, they’re both named after the Sesame Street characters.) Before BERT (“Bidirectional Encoder Representations from Transformers”) was created in late 2018, natural-language models weren’t that great. They were good at predicting the next word in a sentence (and thus well suited for applications like Autocomplete), but they couldn’t sustain a single train of thought over even a small passage. This was because they didn’t comprehend meaning, such as what the word “it” might refer to.

But BERT changed that. Previous models learned to predict and interpret the meaning of a word by considering only the context that appeared before or after it, never both at the same time. They were, in other words, unidirectional. BERT, by contrast, considers the context before and after a word all at once, making it bidirectional. It does this using a technique known as “masking.” In a given passage of text, BERT randomly hides 15% of the words and then tries to predict them from the remaining ones. This allows it to make more accurate predictions because it has twice as many cues to work from. In the sentence “The man went to the ___ to buy milk,” for example, both the beginning and the end of the sentence give hints at the missing word: the ___ is a place you can go and a place you can buy milk.
The use of masking is one of the core innovations behind dramatic improvements in natural-language tasks and is part of the reason why models like OpenAI’s infamous GPT-2 can write extremely convincing prose without deviating from a central thesis.
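The masking objective the article describes can be sketched in a few lines. This is a toy illustration of the idea only, not BERT's actual preprocessing code; the function name is made up here, and the 15% default comes from the description above.

```python
import random

MASK = "[MASK]"

def bert_style_mask(tokens, mask_rate=0.15, rng=None):
    """Randomly hide ~15% of the tokens and return the masked
    sequence plus the position -> original-word labels the model
    would be trained to predict from the surrounding context."""
    rng = rng or random.Random()
    n = max(1, round(len(tokens) * mask_rate))  # mask at least one token
    positions = rng.sample(range(len(tokens)), n)
    masked = list(tokens)
    labels = {}
    for i in positions:
        labels[i] = masked[i]
        masked[i] = MASK
    return masked, labels

sentence = "the man went to the store to buy milk".split()
masked, labels = bert_style_mask(sentence, rng=random.Random(0))
print(" ".join(masked))
```

Because the model sees the words on both sides of every `[MASK]`, it can use bidirectional context to fill in the blank, which is the point of the technique.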

 

 

When Baidu researchers began developing their own language model, they wanted to build on the masking technique. But they realized they needed to tweak it to accommodate the Chinese language.

In English, the word serves as the semantic unit—meaning a word pulled completely out of context still contains meaning. The same cannot be said for characters in Chinese. While certain characters do have inherent meaning, like fire (火, huŏ), water (水, shuĭ), or wood (木, mù), most do not until they are strung together with others. The character 灵 (líng), for example, can either mean clever (机灵, jīlíng) or soul (灵魂, línghún), depending on its match. And the characters in a proper noun like Boston (波士顿, bōshìdùn) or the US (美国, měiguó) do not mean the same thing once split apart.

So the researchers trained ERNIE on a new version of masking that hides strings of characters rather than single ones. They also trained it to distinguish between meaningful and random strings so it could mask the right character combinations accordingly. As a result, ERNIE has a greater grasp of how words encode information in Chinese and is much more accurate at predicting the missing pieces. This proves useful for applications like translation and information retrieval from a text document.

The researchers very quickly discovered that this approach actually works better for English, too. Though not as often as in Chinese, English similarly has strings of words that express a meaning different from the sum of their parts. Proper nouns like “Harry Potter” and expressions like “chip off the old block” cannot be meaningfully parsed by separating them into individual words.

 

So for the sentence:

Harry Potter is a series of fantasy novels written by J. K. Rowling.

BERT might mask it the following way:

[mask] Potter is a series [mask] fantasy novels [mask] by J. [mask] Rowling.

But ERNIE would instead mask it like this:

Harry Potter is [mask] [mask] [mask] fantasy novels by [mask] [mask] [mask].
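The difference between the two masking strategies can be illustrated with a toy sketch that hides known multi-token units whole. The phrase list and function name here are invented for illustration; ERNIE identifies such spans during pretraining rather than reading them from a fixed lexicon.

```python
MASK = "[MASK]"

# Hypothetical phrase lexicon. In the real model these spans are
# learned from data; a hand-written list is enough to show the idea.
PHRASES = [["Harry", "Potter"], ["J.", "K.", "Rowling"]]

def ernie_style_mask(tokens, phrases=PHRASES):
    """Mask whole multi-token units (entities/phrases) as single
    blocks, instead of masking tokens independently as BERT does."""
    masked = list(tokens)
    i = 0
    while i < len(masked):
        for phrase in phrases:
            if masked[i:i + len(phrase)] == phrase:
                masked[i:i + len(phrase)] = [MASK] * len(phrase)
                i += len(phrase) - 1  # skip past the masked span
                break
        i += 1
    return masked

tokens = "Harry Potter is a series of fantasy novels written by J. K. Rowling .".split()
print(" ".join(ernie_style_mask(tokens)))
# → [MASK] [MASK] is a series of fantasy novels written by [MASK] [MASK] [MASK] .
```

Predicting “Harry Potter” with both tokens hidden forces the model to use the meaning of the rest of the sentence, rather than guessing “Potter” from the single cue “Harry”.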

 

ERNIE thus learns more robust predictions based on meaning rather than statistical word usage patterns.

The latest version of ERNIE uses several other training techniques as well. It considers the ordering of sentences and the distances between them, for example, to understand the logical progression of a paragraph. Most important, however, it uses a method called continuous training that allows it to train on new data and new tasks without forgetting those it learned before. This allows it to get better and better at performing a broad range of tasks over time with minimal human interference.
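One common way to train on new tasks without forgetting old ones is to keep replaying batches from every previously learned task whenever a new one is introduced. The schedule below is a schematic of that general idea, not ERNIE's actual training pipeline; the function names and round counts are invented for illustration.

```python
def continual_train(train_step, tasks_in_order, rounds_per_task=3):
    """Toy continual multi-task schedule: each time a new task is
    added, keep interleaving steps from all earlier tasks so the
    model is never trained on the newest task alone (which is what
    causes catastrophic forgetting)."""
    seen = []
    schedule = []
    for task in tasks_in_order:
        seen.append(task)
        for _ in range(rounds_per_task):
            for t in seen:          # replay every task learned so far
                train_step(t)
                schedule.append(t)
    return schedule

# With two tasks, task "A" keeps being revisited while "B" is learned:
log = continual_train(lambda t: None, ["A", "B"])
print(log)
# → ['A', 'A', 'A', 'A', 'B', 'A', 'B', 'A', 'B']
```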

 

 

Baidu actively uses ERNIE to give users more applicable search results, remove duplicate stories in its news feed, and improve its AI assistant Xiao Du’s ability to accurately respond to requests. It has also described ERNIE’s latest architecture in a paper that will be presented at the Association for the Advancement of Artificial Intelligence conference next year. In the same way that their team built on Google’s work with BERT, the researchers hope others will also benefit from their work with ERNIE. “When we first started this work, we were thinking specifically about certain characteristics of the Chinese language,” says Hao Tian, the chief architect of Baidu Research. “But we quickly discovered that it was applicable beyond that.”

Source: https://www.technologyreview.com/s/614996/ai-baidu-ernie-google-bert-natural-language-glue/

https://gluebenchmark.com/leaderboard/

https://github.com/PaddlePaddle/ERNIE

Thoughts: While interesting, it will still take a lot more effort to convince people to use alternatives to Google Translate. I recently ran into its limits on vacation in Japan, where it struggled to phrase Japanese grammar properly in English. I do suggest everyone give Microsoft Translator a try though; the UI honestly feels better than Google's design.

Specs: Motherboard: Asus X470-PLUS TUF gaming (Yes I know it's poor but I wasn't informed) RAM: Corsair VENGEANCE® LPX DDR4 3200Mhz CL16-18-18-36 2x8GB

            CPU: Ryzen 9 5900X          Case: Antec P8     PSU: Corsair RM850x                        Cooler: Antec K240 with two Noctura Industrial PPC 3000 PWM

            Drives: Samsung 970 EVO plus 250GB, Micron 1100 2TB, Seagate ST4000DM000/1F2168 GPU: EVGA RTX 2080 ti Black edition


Ernie also drove the fastest milk cart in the west.   What a champ...

Grammar and spelling is not indicative of intelligence/knowledge.  Not having the same opinion does not always mean lack of understanding.  


Dang, I wonder how well/quickly this could be implemented into video games. In a game like Bethesda's epics, allowing characters to go off on tangents, as well as understand the player, and important NPCs can then guide the conversation back on topic "naturally". Would make for a whole 'nother level of immersion. Or having characters have full-on conversations you could eavesdrop on, instead of millions of pre-recorded dialogue options that repeat after a few times.

Spoiler

CPU: Intel i7 6850K

GPU: nVidia GTX 1080Ti (ZoTaC AMP! Extreme)

Motherboard: Gigabyte X99-UltraGaming

RAM: 16GB (2x 8GB) 3000Mhz EVGA SuperSC DDR4

Case: RaidMax Delta I

PSU: ThermalTake DPS-G 750W 80+ Gold

Monitor: Samsung 32" UJ590 UHD

Keyboard: Corsair K70

Mouse: Corsair Scimitar

Audio: Logitech Z200 (desktop); Roland RH-300 (headphones)

 


30 minutes ago, The1Dickens said:

Dang, I wonder how well/quickly this could be implemented into video games. In a game like Bethesda's epics, allowing characters to go off on tangents, as well as understand the player, and important NPCs can then guide the conversation back on topic "naturally". Would make for a whole 'nother other level of immersion. Or having characters have full on conversations you could eavesdrop on, instead of millions of pre-recorded dialogue options that repeat after a few times.

The first problem is that you assume Bethesda will ever code well.



11 hours ago, The1Dickens said:

Dang, I wonder how well/quickly this could be implemented into video games. In a game like Bethesda's epics, allowing characters to go off on tangents, as well as understand the player, and important NPCs can then guide the conversation back on topic "naturally". Would make for a whole 'nother other level of immersion. Or having characters have full on conversations you could eavesdrop on, instead of millions of pre-recorded dialogue options that repeat after a few times.

I can see it now, when talking to an NPC,  you say: "I wish we had a car to get to the house quicker",  NPC responds:  "how much would you pay for that DLC"?



Well, it's still getting a beating, along with questions about where the other 9.9% went.


One more example that a headline often has very little to do with the actual content, and not infrequently actually contradicts it.

Not a pro, not even very good.  I’m just old and have time currently.  Assuming I know a lot about computers can be a mistake.

 

Life is like a bowl of chocolates: there are all these little crinkly paper cups everywhere.


I'll defeat this AI easily


It being China though, probably cheated to get that score.  Is their algo an IM window feeding questions to 30 asian nerds?

Workstation:  13700k @ 5.5Ghz || Gigabyte Z790 Ultra || MSI Gaming Trio 4090 Shunt || TeamGroup DDR5-7800 @ 7000 || Corsair AX1500i@240V || whole-house loop.

LANRig/GuestGamingBox: 9900nonK || Gigabyte Z390 Master || ASUS TUF 3090 650W shunt || Corsair SF600 || CPU+GPU watercooled 280 rad pull only || whole-house loop.

Server Router (Untangle): 13600k @ Stock || ASRock Z690 ITX || All 10Gbe || 2x8GB 3200 || PicoPSU 150W 24pin + AX1200i on CPU|| whole-house loop

Server Compute/Storage: 10850K @ 5.1Ghz || Gigabyte Z490 Ultra || EVGA FTW3 3090 1000W || LSI 9280i-24 port || 4TB Samsung 860 Evo, 5x10TB Seagate Enterprise Raid 6, 4x8TB Seagate Archive Backup ||  whole-house loop.

Laptop: HP Elitebook 840 G8 (Intel 1185G7) + 3080Ti Thunderbolt Dock, Razer Blade Stealth 13" 2017 (Intel 8550U)


36 minutes ago, AnonymousGuy said:

It being China though, probably cheated to get that score.  Is their algo an IM window feeding questions to 30 asian nerds?

Sad, yeah, not sure why but I instantly assumed they cheated too lol


Not long ago, Baidu was one of the most nefarious bundleware companies around. You couldn't remove their software from a computer as a common user; you had to delete registry keys to stop it from reinstalling itself.

Project Diesel 5.0: Motherboard: ASRock Fatal1ty X370 Professional Gaming /// CPU: Ryzen 5 3600X  /// CPU Cooler: Scythe Ninja 5 /// GPU: Zotac AMP Extreme RTX 2070 /// RAM: 2x 16gb G.Skill Ripjaws V @3200mhz /// Chassis: Lian Li Lancool One Digital (black) /// PSU: Super Flower Leadex III 750w /// Storage: Inland Premium 1TB NVME + Toshiba X300 4TB

 

Peripherals: Mice: Cooler Master MM720 /// Keyboard: Corsair K70 MK2 SE (Cherry Silver), Blitzwolf BW-KB1 (Gateron Reds) /// Monitor: Acer XZ320Q 32' (VA, 1080p @240hz) /// AMP: Topping PA3 (Onkyo Integra A-817XD undergoing restoration) /// DAC: Weiliang SU5 /// Speakers: AAT BSF-100 /// Mike: Alctron CS35U /// Headphones: Blon B8, ISK MDH-9000

 

Living room: TV: Samsung QLED Q7FN 55' 4k /// Amplifier: Denon AVR-X2400H /// Speakers: DALI Zensor 7 /// Consoles: Sony PS4 Pro 1TB, Sony PS3 500gb /// LD/CD/DVD: Pioneer DVL-909 /// Power Supplies: Upsai ACF-2100T + GR Savage CDR2200EX

