
Scientist who created LSTM developed an LSTM-based GPT rival that beats GPT-4 in preliminary testing, but cannot research it further due to lack of funding

n0stalghia

Summary

Sepp Hochreiter is the scientist who invented Long Short-Term Memory (LSTM) in the 1990s. LSTM is the technical foundation for a lot of modern AI-based tech, including Apple Siri, Amazon Alexa, Google Translate, Google Voice, Google DeepMind's StarCraft II agent, and OpenAI Five (a Dota 2 agent). He and his team at the Johannes Kepler University Linz, Austria have developed a model which, based on preliminary tests, yields better results than GPT-3.5 and GPT-4. However, he cannot develop and test it further because Austria does not offer enough funding for AI research.

 

[Image: Sepp Hochreiter]

 

Quotes

[Words in square brackets] have been added by me to provide context and explanations.

Quote

Now he and his team have created a model that, based on preliminary testing, overshadows ChatGPT. What makes it possible is a connection between the transformer technique used in current large language models and Hochreiter's LSTM. "That's why I [sic] am faster and can analyze much longer sentences and much longer texts," he said in an interview with Austrian radio Ö1's Digital.Leben.

 

Quote

But Austria is funding AI research with only 7 million euros. By comparison, the Netherlands is investing two billion euros into AI research. Hochreiter, even as a star researcher, has no resources to test his model on a larger scale. "It's such a catastrophe. People who [compared to him] are not [even] that far along in their research get funded. It is really frustrating."

 

Quote

It [the University of Tübingen] receives 30 million euros each year. By contrast, Austria's financial support for AI research matches that of Uganda.

 

Quote

A couple of years ago, Hochreiter recounted how he met Amazon employees at an AI conference. They congratulated him on his scientific publication [LSTM] and told him how Amazon had gained two billion in revenue [using his technology]. As a thank-you, they invited him for a mojito.

 

Now, he fears a similar mojito moment awaits him. "We now have a thing that at the moment is better than ChatGPT. But we cannot run it", says Hochreiter. "We cannot train it. We just don't have enough funding. Now I have to see how I can give it to Amazon or Facebook, so I can continue my research, because there is no money flowing in Austria".

 

My thoughts

The original article by the Austrian media predominantly focuses on the problems of AI research funding in Austria. However, I believe that the news that there is an LSTM-based LLM that is faster than GPT-4 and can take more information as context, thanks to its use of LSTM, should be newsworthy enough in itself to get international attention. It seems that, due to the lack of state funding, the technology is destined to end up in the hands of a gigantic corporation like Amazon or Facebook, instead of being released publicly (i.e., freely and openly).
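For anyone wondering why an LSTM core could plausibly mean "faster" and "longer texts": a recurrent cell carries a fixed-size state from token to token, so compute and memory grow linearly with sequence length, while a transformer's self-attention scores every token against every other token and grows quadratically. A minimal sketch with a plain PyTorch LSTM (purely illustrative; Hochreiter's actual architecture has not been published):

```python
# Purely illustrative: a classic LSTM, not Hochreiter's unpublished model.
# The point: a recurrent cell carries a fixed-size state, so processing
# cost grows linearly with sequence length instead of quadratically.
import torch
import torch.nn as nn

lstm = nn.LSTM(input_size=256, hidden_size=512, batch_first=True)

tokens = torch.randn(1, 10_000, 256)  # embeddings for a 10,000-step text
state = None                          # (hidden, cell) state starts empty

# Feed the text in chunks; the carried state is all the model needs to
# continue, so memory stays constant no matter how long the text gets.
for chunk in tokens.split(1_000, dim=1):
    output, state = lstm(chunk, state)

# Self-attention, by contrast, materializes an n-by-n score matrix:
# 10,000 tokens means 100,000,000 pairwise scores per head, per layer.
```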

 

Sources

Austrian state media ORF: https://science.orf.at/stories/3218956/. Original text in German.


20 minutes ago, n0stalghia said:

Sepp Hochreiter is the scientist who invented Long Short-Term Memory (LSTM) in the 1990s. […]

There are things better than LSTM out there, but you don't throw the baby out with the bathwater until you see proof.

 

I think it would be interesting to see some actual competition on the LLM side of AI, since it's extortionately expensive to actually build one, let alone the hardware needed to train it. 7 million euros gets you what? 13 DGXs and two researchers?
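That rough math more or less checks out; a quick back-of-the-envelope sketch (the DGX price and the fully loaded researcher cost are my assumptions, not figures from the article):

```python
# Back-of-the-envelope only: hardware and staff prices are assumptions.
budget = 7_000_000        # euros: Austria's reported AI research funding
dgx_price = 500_000       # euros per NVIDIA DGX system (assumed)
researcher = 250_000      # euros per researcher-year, fully loaded (assumed)

hardware = 13 * dgx_price                 # 6,500,000 euros for 13 DGXs
print((budget - hardware) // researcher)  # -> 2 researchers left over
```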


35 minutes ago, Arika S said:

"we made an AI better than chat GPT....but we can't prove it....just trust us"

 

excuse me, waiter, this meal has a bit too much salt for me.

He has a good track record and he's looking for funding to prove his claims, as opposed to selling a product without proof. Don't think this is a good take, imo.


7 minutes ago, n0stalghia said:

He has a good track record and he's looking for funding to prove his claims, as opposed to selling a product without proof. Don't think this is a good take, imo.

i don't care what someone's track record is, if they make the claim that they have the best x but no one is able to verify those claims, then i'm going to take their claim with a mountain of salt.

 

especially when the guy says:

Quote

"But we cannot run it", says Hochreiter. "We cannot train it. We just don't have enough funding"

 

If that's a "bad take", then i have some magic beans i would like to sell you.


15 minutes ago, Arika S said:

no one is able to verify those claims

I think you may have misunderstood the translation. The entire point of the article is him asking someone to step up and verify it. He's a researcher, not a startup founder - his goal is to create new tech and publish a paper on it. They cannot do that if they can't afford a server farm that can parse the internet the way OpenAI can. Is that clear enough now?

 

Hence the second quote - they cannot run/train it on the same scale as OpenAI. Consider a localized scale: say, 1,000 texts, each 10,000 words long. You feed this data to both GPT-4 and their model and ask each to, say, write a summary. On this localized scale, their model beats GPT-4. But they want to verify that on bigger datasets. Hence the article.
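In code, the kind of small-scale comparison I mean would look something like this (entirely hypothetical: the summarizer callables and the scoring metric are stand-ins, not the team's actual harness):

```python
# Hypothetical evaluation harness for the comparison described above.
# The summarizer functions and scoring metric are stand-ins; nothing
# here reflects the team's actual code.
from statistics import mean

def compare_models(texts, summarize_a, summarize_b, score):
    """Summarize each long text with both models and average the scores."""
    scores_a = [score(text, summarize_a(text)) for text in texts]
    scores_b = [score(text, summarize_b(text)) for text in texts]
    return mean(scores_a), mean(scores_b)

# Usage, assuming a corpus of 1,000 texts of ~10,000 words each and a
# summary-quality metric such as ROUGE against reference summaries:
# gpt4_avg, new_model_avg = compare_models(corpus, summarize_gpt4,
#                                          summarize_new_model, rouge_l)
```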

 

I hope this clears things up for you.

 


2 minutes ago, n0stalghia said:

The entire point of the article is him asking someone to step up and verify it.

but HE still made the claim that his LLM is better than GPT-4. i'm not going to trust HIS claims regardless of his background, that's not the kind of person i am. If someone does step up and verify it, great, but his whole push to get funding/verification is that "we are better", not "we want to see if we are better".


I don't see any recent papers about a new LLM architecture that beats GPT-3.5 in preliminary tests:
https://scholar.google.at/citations?hl=en&user=tvUH3WMAAAAJ&view_op=list_works&sortby=pubdate

 

Maybe this researcher has a good idea, but he is not owed a month's worth of runtime on hundreds of A100s, plus trillions of tokens, to train a new GPT-4-scale LLM. And I don't see why he can't just train a much smaller model and compare it with similarly sized models.

 

My uni had a small cluster and there was lots of competition to use it. We simply do not have the resources to test every resource-intensive idea. It's up to him to prove his idea will work if scaled up.


Just let those AIs mingle together now.



To beat these models you need that kind of data, so I have doubts about how it "beats" anything, but it could still be interesting and have its own goals.

Of course, there are papers all the time about models that use less data, and about fine-tuning GPT and other LLMs to add more to them, up to recent fine-tuners that add larger context or handle more information.

