
OpenAI's 12 Days of Shipmas

Summary

Yesterday, OpenAI announced their "12 days of shipmas", where they said they would do 12 live streams over the course of 12 days. The live streams would include both big and small announcements, and some speculated that things like Sora, their video-generation model, might be released.

 

This thread will be updated as new things get announced, but at the time of writing we have had 1 day of shipmas.

 


 

 

 

Day 1 - Full o1 release, and ChatGPT Pro

 

o1 is finally out of "preview". This is their "reasoning" model, which can spend additional time essentially questioning its own answers over and over in order to get better results. It significantly outperforms the 4o model in some areas, although it also has some limits, such as not being able to process files you upload. o1 is available to Plus and Teams members at the time of writing. I have access to it, but it might take a little while before it appears for you. o1 significantly outperforms o1-preview in OpenAI's new way of testing their models: they gave each model the same questions four times and only counted a question as solved if the model got it right all four times. If it got it right 3 out of 4 times, it counted as not solving the question.
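To make that scoring rule concrete, here is a minimal sketch of how such a "correct on all four attempts" metric could be computed. This is my own illustration, not OpenAI's actual evaluation harness, and the attempt data is made up:

```python
# Toy illustration of the "solved only if correct on all 4 attempts" scoring rule.
# Not OpenAI's real harness; the attempt data below is invented for the example.

def strict_score(results: dict[str, list[bool]]) -> float:
    """results maps a question id to its four per-attempt pass/fail results."""
    solved = sum(1 for attempts in results.values() if len(attempts) == 4 and all(attempts))
    return solved / len(results)

example = {
    "q1": [True, True, True, True],    # 4/4 -> counts as solved
    "q2": [True, True, True, False],   # 3/4 -> counts as NOT solved
    "q3": [False, True, True, True],   # also not solved
}

print(f"strict 4/4 score: {strict_score(example):.2f}")  # 0.33
```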

 

OpenAI also announced a new "ChatGPT Pro" plan, a 200 dollar a month tier whose price alone suggests it really is meant for professionals, not consumers.

It seems like this is "just" a new tier where they let o1 "think" for longer. The benchmark numbers OpenAI shows are very interesting. For example, in coding, science questions, and math, o1 pro mode outperforms the regular o1 model quite heavily.

 

OpenAI also ended the post by saying "we'll also continue to bring many of these new capabilities to our other subscribers", which seems to indicate that the longer reasoning time might come to Plus and Teams subscribers in the future, not just the super expensive Pro plan.

 


 

 

Source:

https://openai.com/index/introducing-chatgpt-pro/

 

 

Day 2 - Reinforcement Fine-Tuning (new model customization technique)

OpenAI has developed a new way of fine-tuning the o1 model, which is launching as an alpha program now but will become publicly accessible (through their APIs) in Q1 2025.

The claim is that this new fine-tuning mode (which I couldn't find much info about) will let developers (not really consumers) fine-tune the o1 model for their specific domain with very little additional training data.

 

 

Source:

https://openai.com/form/rft-research-program/

 

Day 3 - Sora

Sora has finally been released (in some countries)! Not just Sora, but Sora Turbo, a new and improved model compared to the one shown earlier this year.

For those who don't know, Sora is OpenAI's AI model that can generate video from text prompts.

 

OpenAI showed off Sora quite a while ago but kept it a private beta. Meanwhile, companies like Runway, Luma, and recently Tencent have all released their video-generation models.

Judging by the clips I have seen, the videos generated remind me a lot of the early days of image-generation models. It remains to be seen how well it works.

 

You will (at the time of writing) need a paid ChatGPT subscription to use Sora. It is included in the current paid subscriptions, and we will see if it becomes free in the future, like for example GPT-4o did.

ChatGPT Plus subscribers (the 20 dollar tier) can create up to 50 "priority videos" a month (I assume this means you might be able to make more, but it depends on server load), up to 5 seconds each, at a resolution of up to 720p. The videos are watermarked.

The 200 dollar ChatGPT Pro plan unlocks up to 500 "priority videos" a month, unlimited "relaxed videos" (which again probably take longer to create), up to 1080p resolution, up to 20 seconds of duration, and the ability to create up to 5 videos at the same time. You can also remove the watermark if you want.

 

 

Sora won't be available in every country today, but it seems like OpenAI will roll it out to more countries as time goes on. They probably want to do a controlled launch and not overload their servers.

 

 

Source:

 

https://sora.com/

 

 

 

 

 

Day 4 - Updates to Canvas

Canvas was introduced by OpenAI in October this year. It's a tool for using ChatGPT to help you with things like coding or writing, by analyzing your text/code and giving feedback.

It is especially suited for work where you need to make iterative updates.

With this update, Canvas is now available to all users using GPT-4o, free and paid. Canvas can now also be enabled for GPTs in the GPT creator, and it has become a selectable tool in the composer: you can now find Canvas in the toolbox along with DALL-E, search, and so on. Also, with the new Canvas update you can actually run Python code inside ChatGPT.

 

With this update, Canvas actually feels like it is becoming a proper coding tool. You get things like Python syntax highlighting, auto-complete, and other things you would find in a code editor. Of course, it is not as fully featured as an editor like VS Code, but it is a major step up from just pasting text into ChatGPT.
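Purely as an illustration (any small script would do), this is the kind of self-contained snippet you can now paste into Canvas and run directly instead of copying it out to a local interpreter:

```python
# A small self-contained script of the sort that can be run inside Canvas.
from collections import Counter

def word_frequencies(text: str, top_n: int = 5) -> list[tuple[str, int]]:
    """Return the top_n most common words in a piece of text."""
    words = [w.strip(".,!?").lower() for w in text.split()]
    return Counter(w for w in words if w).most_common(top_n)

sample = "Canvas can now run Python code, and Canvas can show the output inline."
print(word_frequencies(sample))  # e.g. [('canvas', 2), ('can', 2), ...]
```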

 

 

Source:

 

 

Day 5 - ChatGPT x Apple Intelligence

A demo of how ChatGPT will integrate into iOS and macOS. This is similar to what we have seen before in Apple's demos.

 

Source:

 

 

Day 6 - Santa mode and video in advanced voice

The "advanced voice" mode now supports video. In essence, this means you can film things with your camera and then ask ChatGPT about the video feed, in real time.

 

The "Santa Mode" is a mode where ChatGPT's advanced voice function will sound like Santa Clause, and also respond in the style of Santa. This will be available throughout December.

 

Source:

 

 

Day 7 - Projects

Projects is a new feature in ChatGPT. It allows you to create a kind of "folder" for chats; each folder can have a name and include files and instructions that are shared across all the chats inside it.

This will be useful for the times when you want to split tasks into different chats, but want to reference the same files and/or instructions in all of them. For example if you work as a project manager you might want to use the same templates for all project plans, as well as give ChatGPT some instructions such as "help me write a professional sounding project plan for IT projects. You should write it in English and follow the template I have uploaded".

Then you can just start a new chat inside this project for each new project plan you want to make.

 

I could also see this being useful for just grouping similar chats together. Right now the chat history becomes a bit of a mess after a while.

 

Source:

 

 

Day 8 - ChatGPT Search

ChatGPT Search is a feature that rolled out about two months ago for ChatGPT Plus users. Starting today, the feature will be available to free (logged in) ChatGPT users as well.

On top of making it available to free users, they have also made it faster, improved how it works on mobile, and added some new features.

 

Source:

 

 

Day 9 - Developer demos and API updates

This day was quite packed with updates, but they were primarily for developers.

Some of the highlights were:

  • Access to o1 in the API (the full version, not the preview).
  • o1 gets support for some of the features available in 4o, such as structured output and vision input (see the sketch after this list).
  • The full version of o1 is faster and cheaper (it requires fewer tokens) than o1-preview.
  • The real-time API now supports WebRTC.
  • GPT-4o audio tokens get a 60% reduction in price.
  • New SDKs for Go and Java.
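To give a rough idea of what the o1 + structured output combination looks like from the API side, here is a minimal sketch using the OpenAI Python SDK. Treat the details as assumptions on my part: the model name "o1" and the exact response_format shape mirror how the existing 4o structured-output feature works, so check the current API reference before relying on it.

```python
# Hedged sketch: calling the full o1 model with a structured-output schema.
# Assumes the model is exposed as "o1" and accepts the same response_format
# parameter as gpt-4o; verify against the current API documentation.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="o1",
    messages=[
        {
            "role": "user",
            "content": "Classify this bug report and give a severity: the app crashes when uploading files over 2 GB.",
        },
    ],
    response_format={
        "type": "json_schema",
        "json_schema": {
            "name": "bug_triage",
            "strict": True,
            "schema": {
                "type": "object",
                "properties": {
                    "category": {"type": "string"},
                    "severity": {"type": "string", "enum": ["low", "medium", "high"]},
                },
                "required": ["category", "severity"],
                "additionalProperties": False,
            },
        },
    },
)

print(response.choices[0].message.content)  # JSON conforming to the schema above
```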

 

 

Source:

 

 

Day 10 - 1-800-ChatGPT and WhatsApp

You can now, in the US, call ChatGPT from a regular phone and talk directly to it by dialing 1-800-ChatGPT (1-800-242-8478).

They have also added support for messaging ChatGPT through WhatsApp.

 

Source:

 

 

Day 11 - Work with Apps

A demo of the Windows and macOS app. Not much new here, except a reminder that the program exists.

 

Source:

 

 

Day 12 - New frontier models announced - o3 and o3-mini

This probably deserves its own topic, but OpenAI has just announced o3 and o3-mini. These models are follow-up versions of o1 and o1-mini. Apparently there is a company out there that already owns the trademark for o2, so that's why they skipped that.

 

Despite o1 being shown just 3 months ago, these new models are drastically superior in the tests they showed.

For example, in the coding tests this new model now outperforms pretty much all of OpenAI's own developers.

 

Benchmarks: (charts from OpenAI's announcement)

 

 

The plan is for o3-mini to be released at the end of January, and the full o3 shortly after that.

 

Source:

 


56 minutes ago, Sauron said:

How much is "significantly" here?

Well, significant according to the tests formulated by the company who would like to sell you a subscription for the product.

 

This 12 day marketing push smells strongly of desperation. The public doesn't seem very excited about AI any more. The cash burn of AI is insane, with relatively few paying customers.


3 hours ago, Monkey Dust said:

Well, significant according to the tests formulated by the company who would like to sell you a subscription for the product.

I mean percentage-wise, are we talking 10%? 20%? 1000%? The promise a couple of years back was of exponential improvement year on year; I'm trying to work out exactly how much of a lie that was...


All this hype about being able to score better on math tests reminds me of this

 

Sadly, it really feels like instead of building a more complex model that is able to do more proper reasoning, a lot of places are relying on additional training to get better results, which can correct some of the above but can't necessarily work through novel thoughts or apply reasoning to what it's saying/doing.

 

Like, to an extent it needs to learn what it doesn't know and find solutions using logic/references (which would help with hallucinations). E.g. "which Canadian superstar shaved their head while singing the national anthem": instead of recognizing that it doesn't know the answer, and that in the depths of its research it couldn't find one, it produces word-vomit hallucinations talking about Terry Fox.

Quote

The Canadian superstar who famously shaved their head while singing the national anthem is Terry Fox, but there's a mix-up here. It was not Terry Fox, but rather a performance art tribute inspired by his memory, which might have been done by an unnamed group trying to recreate it fully of a similar evoking memory--- while Fox famously ran the Marathon for hope or raised awareness

While it's good that it's getting better at math problems, overall it doesn't help if it will still end up making things up without awareness that what it's saying has no backing behind it.

 

It will be a good tool, but I think their push to say it's good at reasoning has a fundamental flaw, in that it lacks some basic-level reasoning, which means it just looks like it's reasoning; and the fact that it presents a good score masks from people that it still has flaws.

 

 

 

 

edit: I would also want to know the methodology behind their testing. From my understanding, in the past some of the "math" questions were actually still entered by a human, who formulated the question in a way the neural network could solve... so how they presented the questions is of major importance as well. After all, "is 9.11 > 9.9" can trip up many networks.


1 hour ago, wanderingfool2 said:

Sadly, it really feels like instead of building a more complex model that is able to do more proper reasoning, a lot of places are relying on additional training to get better results, which can correct some of the above but can't necessarily work through novel thoughts or apply reasoning to what it's saying/doing.

 

Like, to an extent it needs to learn what it doesn't know and find solutions using logic/references (which would help with hallucinations). E.g. "which Canadian superstar shaved their head while singing the national anthem": instead of recognizing that it doesn't know the answer, and that in the depths of its research it couldn't find one, it produces word-vomit hallucinations talking about Terry Fox.

 

While it's good that it's getting better at math problems, overall it doesn't help if it will still end up making things up without awareness that what it's saying has no backing behind it.


That right there is the problem: lack of self-awareness. All the data that goes into training a giant BLOB file (an LLM) won't exhibit sentience as an emergent phenomenon. There are many theories as to why, and solutions to achieve this, but it would require storing the LLM on hardware that can self-update as it learns, versus taking months to train a new one each time you want to update it. But for now, AI has hit a scaling wall, which is why you get diminishing returns.

That's important because without self-awareness, AI is nothing more than a probabilistic machine. Raw math computation doesn't start with statistics, but AI does. So you get answers that are close, but close is still wrong mathematically speaking if you're after an absolute number. AI just falls apart unless you keep patching and massaging the LLM training. It doesn't scale. A whole new paradigm of compute is required to reach AGI.


11 hours ago, Monkey Dust said:

Well, significant according to the tests formulated by the company who would like to sell you a subscription for the product.

 

This 12 day marketing push smells strongly of desperation. The public doesn't seem very excited about AI any more. The cash burn of AI is insane, with relatively few paying customers.

The public is as excited about AI as they are about 3D Televisions. 

 

There was very little general uptake because the majority of the people who "might" buy into it, didn't see how it would improve their experience over the status quo.

 

The direction government regulation on AI should go is to defer to the existing regulatory bodies/unions of each industry (e.g. trade unions, state bars, etc.): if those bodies say AI can be used, then AI can be used, and they decide which models. If they say not to use any AI, that stands (lawyers could say "AI lawyers" aren't admitted to the bar and thus cannot give legal advice or practice law, period). If there is no existing regulatory body (as there clearly won't be for, say, someone applying for a job), then businesses themselves need to have their IT networks block access to unauthorized AI tools on work devices, plus strict no-personal-devices-at-work rules. Which, you know, never works the way people expect it to.

 


14 hours ago, Sauron said:

How much is "significantly" here?

The benchmarks are published on their website, but here are some of them.

OpenAI picked a set of tests from various sources and asked each model to solve each question four times. A question only counted as "correct" if the model successfully solved it every single time it was asked, on the first try.

 

 

Keep this in mind. It has to solve it on the first try when it is prompted, on all four attempts.

 

Competitive coding (challenges from Codeforces):

o1-preview - 26%

o1 - 64%

o1 pro mode - 75%

 

Competitive math (challenges from AIME 2024):

o1-preview - 37%

o1 - 67%

o1 pro mode - 80%

 

 

 

All models perform better if given multiple attempts and don't have to get it right every time. In those scenarios, the difference is smaller. 
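One way to see why the strict scoring stretches the gap: if a model solves a question with probability p on any single attempt, and we assume (my simplification, not OpenAI's stated methodology) that attempts are independent, then the chance of going 4-for-4 is p^4, so modest differences in per-attempt accuracy turn into large differences in the reported score:

```python
# How a "correct on all 4 attempts" rule amplifies per-attempt differences.
# Assumes independent attempts, which is a simplification for illustration only.
for p in (0.6, 0.8, 0.9, 0.95):
    print(f"per-attempt accuracy {p:.2f} -> 4/4 score {p**4:.3f}")
# per-attempt accuracy 0.60 -> 4/4 score 0.130
# per-attempt accuracy 0.80 -> 4/4 score 0.410
# per-attempt accuracy 0.90 -> 4/4 score 0.656
# per-attempt accuracy 0.95 -> 4/4 score 0.815
```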

 

 

I think the point to be made here is that this is not worth it for a consumer. It is more aimed at businesses, for which 200 dollars a month isn't a big expense.

If this can help free up a few hours every week for someone, then it's worth it. For a consumer who just wants slightly better answers when prompting for various things, it is not worth it.

 

The pro mode also offers some extra benefits, like unlimited access to o1 as well as the voice mode (the Plus plan has limited access to those).

 

 

 

 

 

6 hours ago, StDragon said:

That's important because without self-awareness, AI is nothing more than a probabilistic machine. Raw math computation doesn't start with statistics, but AI does. So you get answers that are close, but close is still wrong mathematically speaking if you're after an absolute number. AI just falls apart unless you keep patching and massaging the LLM training. It doesn't scale. A whole new paradigm of compute is required to reach AGI.

I think there’s an argument to be made that we’ve already reached one definition of AGI. ChatGPT o1 and 4o are better than the average person at a wide range of tasks. Sure, they don’t outperform humans at everything, and you can throw "gotchas" at them to trip it up. But if you asked a random person off the street to solve math problems, write code, interpret text, or plan something, I’d bet ChatGPT would hold its own or even beat them in many cases. Is that not AGI (which is different than ASI).

 

I also think you’re oversimplifying these models. If we’re going to be that general, humans could also be seen as probabilistic machines. We make decisions based on patterns and probabilities too, our brains just hide it behind intuition.

 

Emergent behaviors in LLMs are already happening. For example, models have shown abilities like reasoning and coding without explicit programming for those tasks. That doesn’t mean they’re sentient, but it does show how scaling up can produce unexpected capabilities. So the idea that AI has hit a “scaling wall” doesn’t hold up. Techniques like sparse models and retrieval-augmented generation are already pushing efficiency forward, and hardware continues to improve.

 

Lastly, being probabilistic might not be a flaw. It lets AI model uncertainty and adapt to incomplete data, which is something humans do all the time. While new hardware could take AI further, I don’t think a whole new paradigm is required for AGI, just steady improvements to what we already have. I also think we should be careful with stating things as absolute. There are plenty of people, very smart and knowledgeable people, who have stated their beliefs about the future of technology and computers and made utter buffoons of themselves with hindsight.

 


11 hours ago, LAwLz said:

I think there’s an argument to be made that we’ve already reached one definition of AGI. ChatGPT o1 and 4o are better than the average person at a wide range of tasks. Sure, they don’t outperform humans at everything, and you can throw "gotchas" at them to trip it up. But if you asked a random person off the street to solve math problems, write code, interpret text, or plan something, I’d bet ChatGPT would hold its own or even beat them in many cases. Is that not AGI (which is different than ASI).

It's really hard to define what a general intelligence is, however I think a clear difference between the average human and an LLM is the ability to adapt to new tasks in time (whereas LLMs would require retraining of the model, or at least external additions and prompts by the humans working on them, to do something they aren't currently capable of) and draw on extensive past experience that isn't just the last few thousand words they were given, as well as the ability to actually reason on what they're doing rather than continue the task based on what's likely to come next (I don't think feeding a model its own output over and over is the same type of reasoning humans do). Sure, in many situations the latter is enough - but for that matter we can construct non-LLM programs that also outperform humans at a variety of tasks and we don't really consider them to be AGI, so I don't see why we'd consider an LLM to be one.


On 12/6/2024 at 3:58 PM, Monkey Dust said:

A significant one?

I watched this a couple weeks ago, it was quite insightful 

 

 


9 hours ago, Sauron said:

I think a clear difference between the average human and an LLM is the ability to adapt to new tasks in time and draw on extensive past experience that isn't just the last few thousand words they were given

Do you have any examples of where a human could do this but an LLM can't?

There are plenty of LLMs that can do tasks they were never trained to do. For example, PaLM 2 was used to do zero-shot machine translation for Google Translate. What this means is that the LLM used its existing knowledge about languages to understand new languages, even if those new languages weren't included in a training set. This is how Google is able to translate into Seychellois Creole (aka Kreol), even though there isn't anywhere near enough data to train a model to understand it. Because of its knowledge regarding the French language, it can adapt and understand Kreol as well.

 

I think that is a great example of an LLM adapting to a new task (translating Kreol) by drawing on its past experience.

 

 

 

 

9 hours ago, Sauron said:

as well as the ability to actually reason on what they're doing rather than continue the task based on what's likely to come next (I don't think feeding a model its own output over and over is the same type of reasoning humans do).

I think you are getting fairly close to a "spiritual" type of argument here, where ill-defined things get thrown around and we draw lines in the sand based on feelings rather than facts.

Reasoning means thinking about something in a logical way.

 

I just gave o1 a question regarding best practices for connecting two data centers over MPLS and gave it a few design limitations and goals. This is the "reasoning steps" it took, which I think aligns fairly close to how I would reason if given this assignment.

(screenshot of o1's reasoning steps)

 

 

I have seen a lot of people say o1 is basically just feeding itself its own output over and over. From what I have read (in the research papers from OpenAI, and just judging the output for myself), this is not really the case. You can't get the same results by just inputting something into 4o, then taking the output and inputting it again and again.

o1 works by creating a Tree of Thoughts (ToT) and then creating Chains of Thoughts (CoT) inside each branch. I guess you could say each branch is just the AI feeding its own output into itself again, but that would only be true for the final branch where it reaches an answer and it would be missing a lot of important details. It would basically be like saying a CPU isn't impressive because "it's just a rock that electricity flows through". It would be a very big oversimplification of a rather complex system.

o1 works by breaking a task down into subtasks, keeping track of the answers created in each branch of the tree, iterating on the branches, and adapting the main tree's structure based on the feedback it gets from the branches. It also has to be able to judge the various answers to determine which is the best one, and it can backtrack several steps if it reaches a dead end.

 

The tree structure and the various systems that enable it to work are what make o1 different from 4o.
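o1's internals are not public, so the following is only a toy sketch of the general tree-search idea described above (generate candidate "thoughts", score them, keep the promising branches, drop the rest), not OpenAI's actual implementation. The generate_thoughts and score functions are placeholders that a real system would back with model calls:

```python
# Toy Tree-of-Thoughts style search: expand candidate thoughts, keep the best
# branches, and implicitly backtrack by dropping dead ends. Illustration only;
# generate_thoughts() and score() are placeholders for LLM calls.
from dataclasses import dataclass, field

@dataclass
class Node:
    thought: str
    children: list["Node"] = field(default_factory=list)

def generate_thoughts(partial_solution: str, k: int = 3) -> list[str]:
    """Placeholder: a real system would ask a model to propose k next steps."""
    return [f"{partial_solution} -> step{i}" for i in range(k)]

def score(thought: str) -> float:
    """Placeholder: a real system would ask a model (or heuristic) to rate the branch."""
    return -len(thought)  # pretend shorter partial solutions look more promising

def tree_search(problem: str, depth: int = 3, beam: int = 2) -> str:
    frontier = [Node(problem)]
    for _ in range(depth):
        candidates = []
        for node in frontier:
            for t in generate_thoughts(node.thought):
                child = Node(t)
                node.children.append(child)
                candidates.append(child)
        # keep only the most promising branches; the rest are effectively abandoned
        frontier = sorted(candidates, key=lambda n: score(n.thought), reverse=True)[:beam]
    return frontier[0].thought

print(tree_search("connect two data centers over MPLS"))
```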

 

 

9 hours ago, Sauron said:

Sure, in many situations the latter is enough - but for that matter we can construct non-LLM programs that also outperform humans at a variety of tasks and we don't really consider them to be AGI, so I don't see why we'd consider an LLM to be one.

Got any example of a non-LLM program that can do this? I can't think of any.

Before answering, please remember what the G stands for.

 

 

 

3 hours ago, GOTSpectrum said:

I watched this a couple weeks ago, it was quite insightful 

I think Sabine has really fallen from grace in the last couple of years. It feels like she makes videos about fields she has little to no knowledge of, which means she makes a lot of errors, and honestly, I feel like a lot of her videos are just clickbait and her being controversial because it gives her a lot of clicks, which in turn means she gets a lot of money.

 

 

I think what happened was this:

  1. She made some videos about things in her field and they got attention from experts in said field.
  2. People noticed that she was somewhat highly regarded and listened to by experts, creating trust.
  3. She gained minor-celebrity status and had people listening to her.
  4. Now that she had an audience that trusted and listened to her, she started pushing her opinions and views on things not really in her field.
  5. People watch those videos and fall into the trap of "appeal to authority", not realizing she isn't an authority in these fields.
  6. Her job becomes to get clicks on her videos, so she starts making videos about whatever is the current thing or some controversial thing that brings in viewers from both sides.

 

 

I mean, just look at the last month of videos from her. She covers everything from biology to computer hardware, to cryptography, physics, cosmology, quantum computing, sociology, economics, environmental science and so on. Each one of these is in and of itself such a complex field that people who spend their entire lives doing nothing but studying a particular topic within those fields disagree with each other all the time. Yet here we have a single woman who through her videos wants to come across as an expert in all of them, often presenting things in an extremely one-sided and simplistic way.

 

If you are a regular watcher of Sabine, I recommend you look up some of the criticism of her as well. 


3 minutes ago, LAwLz said:

If you are a regular watcher of Sabine, I recommend you look up some of the criticism of her as well. 

I'm not, and I'm aware of the issues with some of her content. But that doesn't change the fact that you can read into what she talks about, which you should do as standard. You will see that it is not completely untrue. If you are being diligent about informing your views, having multiple views on a subject is essential.


On 12/7/2024 at 10:39 PM, LAwLz said:

Do you have any examples of where a human could do this but an LLM can't?

There are plenty of LLMs that can do tasks they were never trained to do. For example, PaLM 2 was used to do zero-shot machine translation for Google Translate. What this means is that the LLM used its existing knowledge about languages to understand new languages, even if those new languages weren't included in a training set. This is how Google is able to translate into Seychellois Creole (aka Kreol), even though there isn't anywhere near enough data to train a model to understand it. Because of its knowledge regarding the French language, it can adapt and understand Kreol as well.

That's not my point. The ability to do that, while not specifically trained for (although arguably it was, considering it is a language model), was already present in the model. Something it can't do, however, will not be teased out of it no matter how much you try. This is just a consequence of having a limited context space; even if you could eventually "teach" it to do something, you're limited by the amount of context it can hold before it needs to be forgotten in order to add more. A human can, for example, learn to play an instrument despite knowing nothing about it at birth; this is done through extensive practice and experience.

On 12/7/2024 at 10:39 PM, LAwLz said:

I think you are getting fairly close to a "spiritual" type of argument here, where ill-defined things get thrown around and we draw lines in the sand based on feelings rather than facts.

Reasoning means thinking about something in a logical way.

Something that LLMs literally can't do. As much as it may seem like it's more than that, the output is always one of the most likely next words, chosen with some randomness to prevent fully deterministic answers.

 

And I don't think I'm being spiritual at all. I'm talking mechanisms. The way these models work is just different from the way we work, even if the outputs are sometimes similar. Surely you'll agree that what we can put into writing isn't the entirety of the human experience.

 

You could also argue that a calculator "thinks about something in a logical way", after all the output always logically follows the input. It can even give you the right answer to mathematical problems it's never seen! But for some reason we don't give it the AGI title, probably because it doesn't speak english and therefore we don't irrationally anthropomorphize it.

On 12/7/2024 at 10:39 PM, LAwLz said:

I just gave o1 a question regarding best practices for connecting two data centers over MPLS and gave it a few design limitations and goals. This is the "reasoning steps" it took, which I think aligns fairly close to how I would reason if given this assignment.

Yes, it's what a human writing about this would likely write. That's what any LLM is built to do. That doesn't mean there's any understanding of what it's doing. As I said, in many cases that's good enough. But is it the same type of reasoning we do? I don't think so.

On 12/7/2024 at 10:39 PM, LAwLz said:

I have seen a lot of people say o1 is basically just feeding itself its own output over and over. From what I have read (in the research papers from OpenAI, and just judging the output for myself), this is not really the case. You can't get the same results by just inputting something into 4o, then taking the output and inputting it again and again.

That's likely in part because o1 is a different model. And I'm sure there's some additional logic beyond just feeding the answer right back into the model, I don't want to trivialize the work of the engineers behind this. However, if you believe there's some other method to doing this that somehow gets the machine to "know" what it's talking about and actually reason, contrary to all other known LLM tech, I'd love to hear about it...

On 12/7/2024 at 10:39 PM, LAwLz said:

It would basically be like saying a CPU isn't impressive because "it's just a rock that electricity flows through". It would be a very big oversimplification of a rather complex system.

I never said it's not impressive, just that it's not the same thing we do, despite the output looking similar.

On 12/7/2024 at 10:39 PM, LAwLz said:

Got any example of a non-LLM program that can do this? I can't think of any.

Before answering, please remember what the G stands for.

Are you saying an LLM can do, or learn how to do, *any* task within its physical reach? Because otherwise you're setting a completely arbitrary line at how many tasks are enough to define something as a "general" intelligence... and also I think you're biasing the question towards programs that "sound" human, whereas that's not necessarily a requirement of intelligence. In a sense almost any sufficiently complex statistical model can mimic "general intelligence" by successfully applying to unseen scenarios; that doesn't mean it's "thinking", just that the data it was based on sufficiently represents the problem space.

On 12/7/2024 at 10:39 PM, LAwLz said:

I think Sabine has really fallen from grace in the last couple of years. It feels like she makes videos about fields she has little to no knowledge of, which means she makes a lot of errors, and honestly, I feel like a lot of her videos are just clickbait and her being controversial because it gives her a lot of clicks, which in turn means she gets a lot of money.

I have no interest in defending the YouTuber, but the paper that's likely being referenced is real, and it shows pretty convincingly that incremental improvements require exponentially more training data. You can do some clever tricks to work around that a bit, but eventually you will hit heavy diminishing returns, which is probably why OpenAI's releases have mostly been geared towards adding features on the side that don't require making the underlying model better, or different models that perform better in specific situations and worse in others.

https://arxiv.org/pdf/2303.03955


On 12/7/2024 at 10:39 PM, LAwLz said:

It feels like she makes videos about fields she has little to no knowledge of, which means she makes a lot of errors, and honestly,

Agreed.

1 hour ago, Sauron said:

And I don't think I'm being spiritual at all. I'm talking mechanisms. The way these models work is just different from the way we work, even if the outputs are sometimes similar. Surely you'll agree that what we can put into writing isn't the entirety of the human experience.

I'm very confident our brain has no supernatural circuitry either. It's what makes me absolutely certain we can build a 20W human-grade AGI silicon chip.

1 hour ago, Sauron said:

Yes, it's what a human writing about this would likely write. That's what any LLM is built to do. That doesn't mean there's any understanding of what it's doing.

I'm not sure it's that big of a leap from pattern matching to pattern synthesis. Small kids definitely learn language, walking, etc...by imitation, trial and error. Now, is a recursive call of an LLM >understanding< ? I think not, but also I don't think it takes that much more. It might be as simple as having a different data representation.

 

E.g. instead of a linear vector of high-dimensional symbols, a transformer that directly processes a tree of high-dimensional symbols might be enough to get native reasoning. If you look at what math engines and compilers do, it's all trees, and rules to manipulate trees: the first step is always to translate a string into a tree, then do the smart stuff on the tree, then reverse-translate into a string. If the rule is a deep model, the leaf is a high-dimensional representation, and the link is a high-dimensional representation, I speculate that it would >understand< if the high-dimensional latent space converges during training.

5 hours ago, StDragon said:

This is a better explanation of the "wall"

Some of it is information theory. There is a limit to how far you can compress information losslessly, so of course a model that holds more information should have more parameters, given the same architecture.

 

While not apples to apples, if you think of a synapse as a parameter, our brain has on the order of 10^15 parameters (a petaparameter), far in excess of current models, which get to about 10^11 (100 giga) parameters.

 

Speed-wise, the neuron firing rate gives an idea of the compute. Our brain can do on the order of 10^13 integration events per second, so around 10^17 equivalent boolean operations per second, while a high-end processor can do a lot more than that, at around 10^19 transistor switch operations per second.

 

It suggests that to get to an AGI, we already have far in excess of what we need in terms of compute, but our software and hardware architectures are very inefficient and outclassed in terms of power and parameter storage. Which makes sense: our brain isn't spending large amounts of power moving parameters from primary memory to the execution units like in a von Neumann architecture.
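Putting the rough orders of magnitude above into a quick back-of-envelope comparison (these are the post's own estimates, not measured values):

```python
# Back-of-envelope comparison using the estimates quoted above, not measured data.
brain_params = 1e15      # ~synapse count treated as parameters ("peta parameters")
model_params = 1e11      # ~large current models ("100 giga parameters")
brain_bool_ops = 1e17    # equivalent boolean operations per second
cpu_switch_ops = 1e19    # transistor switch operations per second

print(f"parameter gap: the brain holds ~{brain_params / model_params:,.0f}x more parameters")
print(f"compute gap: silicon performs ~{cpu_switch_ops / brain_bool_ops:,.0f}x more raw switching")
```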

 

It's why I speculate "The Wall" is just AI researchers hitting the limits of a von Neumann architecture being used for something it's unsuited for. It's meant for general serial computation, not for fast fixed-function parallel computation.

 

I also speculate it's possible to get an AGI out of a 200-core CPU hooked up to 10 terabytes of RAM. It just needs a different architecture, perhaps something based on deep sparse nets.

 

 

 

 

 


42 minutes ago, 05032-Mendicant-Bias said:

Small kids definitely learn language, walking, etc...by imitation, trial and error.

Sure, imitation is part of learning... but when I speak or write, I'm not trying to figure out what someone else would likely say in that scenario. Kids quickly go from imitating sounds to knowing what those sounds mean, what concepts they are expressing. And very importantly, kids and humans in general, as well as animals for that matter, can be aware of a concept without knowing how to express it in a given language. LLMs pretty much cease to exist as an entity in the time where they're not being actively executed to produce text, and they certainly have no inner thoughts they're incapable of expressing.

 

Another big difference is that LLMs are incapable of knowing they don't know something - because, of course, they don't actually know anything. To them all produced sentences are equally correct, and all output will be "expressed" with equal "confidence". Of course the human psyche is susceptible to delusion, but that's not the same as always being certain you can correctly respond to any factual question. If I asked you to finish a famous quote I actually made up, you'd either call my bluff or say you don't know or don't remember; an LLM will just make it up with no awareness that that is what it's doing.


10 hours ago, Sauron said:

That's not my point. The ability to do that, while not specifically trained for (although arguably it was, considering it is a language model), was already present in the model. Something it can't do, however, will not be teased out of it no matter how much you try. This is just a consequence of having a limited context space; even if you could eventually "teach" it to do something, you're limited by the amount of context it can hold before it needs to be forgotten in order to add more. A human can, for example, learn to play an instrument despite knowing nothing about it at birth; this is done through extensive practice and experience.

If that's not your point then I am not sure what you are trying to say exactly.

If you don't think "using knowledge about French to translate Kreol even if it has never seen Kreol" is an example of drawing from past experience to do something it wasn't trained to do then I fail to think of anything that would satisfy your criteria. I am not even sure I could come up with something a human does that would fit it.

 

This to me feels like moving the goalpost. The moment we show LLMs doing something they weren’t explicitly trained for, the argument could shift to "Well, it was always in there, you just teased it out." Isn't that the case with humans too? We’re born with a brain architecture that allows us to "tease out" skills like reading, playing instruments, and solving math problems. You could argue that we have "the capacity for music at birth," but we still need external stimuli, guidance, and feedback to develop it. LLMs "tease out" new abilities in much the same way, using feedback loops and reinforcement.

 

You mention humans learning an instrument, but I'd argue LLMs can also "learn" a task in a similar way. For example, AutoGPT agents are able to loop their output as input, learn from past mistakes, and adjust. Sure, their context window is limited, but so is human working memory. The reason humans can play the piano after years of practice isn't because they keep everything in short-term memory. It's because they convert learned experience into long-term memory. The equivalent for an LLM is fine-tuning or external memory systems (like vector databases). If you allow humans to develop "muscle memory" through practice, you have to allow for the equivalent in machines. Memory persistence and model updates.

 

If you mean humans start from "zero" with music, I would challenge that. We have innate faculties for pattern recognition, auditory processing, and motor control, which we leverage to learn instruments. LLMs have their own "innate faculties" namely, the ability to recognize relationships between words, symbols, and concepts. Just like humans build on their initial faculties, LLMs build on their own pre-training.

 

More importantly, learning a new skill often requires an entire support system (teachers, YouTube tutorials, etc). If you give an LLM access to external tools (like web searches, plugins, or APIs), it can achieve similar feats. This is why tools like AutoGPT exist. They create "persistent memory" and "long-term planning" that let LLMs go beyond the context window.
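To make the "external memory" idea concrete, here is a minimal sketch of the retrieval pattern being described: store past notes, find the ones most similar to a new question, and prepend them to the prompt. The embed() function here is a deliberately crude stand-in; a real setup would use an embedding model and a vector database:

```python
# Minimal sketch of "persistent memory" via retrieval. embed() is a toy
# bag-of-words stand-in for a real embedding model, so the example is
# self-contained; in practice you would call an embedding API and a vector DB.
import math
from collections import Counter

def embed(text: str) -> Counter:
    """Toy 'embedding': a bag-of-words count of the lowercased tokens."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

memory = [
    "user prefers terse answers",
    "project uses MPLS between two data centers",
    "deadline for the migration is March",
]

def build_prompt(question: str, top_k: int = 2) -> str:
    q = embed(question)
    recalled = sorted(memory, key=lambda note: cosine(q, embed(note)), reverse=True)[:top_k]
    return "Relevant notes:\n- " + "\n- ".join(recalled) + f"\n\nQuestion: {question}"

print(build_prompt("what did we decide about the data center link?"))
```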

 

 

10 hours ago, Sauron said:

Something that LLMs literally can't do. As much as it may seem like it's more than that, the output is always one of the most likely next words, chosen with some randomness to prevent fully deterministic answers.

 

And I don't think I'm being spiritual at all. I'm talking mechanisms. The way these models work is just different from the way we work, even if the outputs are sometimes similar. Surely you'll agree that what we can put into writing isn't the entirety of the human experience.

I think you’re conflating "mechanism" with "outcome." It’s true that human reasoning and LLM "reasoning" differ in how they’re implemented. But if you look at the results of that reasoning, it’s not so clear-cut. If I ask o1 for best practices on setting up an MPLS connection, and it logically decomposes the problem, evaluates multiple approaches, and then presents a justified recommendation, how is that not reasoning? Because it doesn’t "experience" the problem the way a human does? That feels like you're moving the goalpost to "reasoning requires consciousness," which, respectfully, seems more like a "spiritual" or at least philosophical position rather than a technical one.

 

You claim that LLMs just predict the "most likely next token" as if this somehow disqualifies it from reasoning. But that's also a huge oversimplification. Humans don't "generate tokens" per se, but we definitely engage in a form of predictive processing. When you’re mid-conversation, your brain is constantly predicting what someone will say and pre-loading your response. Ever have that moment where you mishear someone because you "autofilled" their sentence? Sounds a lot like token prediction.

 

The real question is: Does it matter how the reasoning works, as long as it produces the same effect? Imagine an alien species with a totally different brain structure that still produces arguments, logic, and plans just as humans do. Would you say they "aren’t reasoning" because they don't have neurons that fire action potentials like we do?

 

Lastly, the Tree of Thoughts (ToT) method and Chain of Thought (CoT) prompting directly address your criticism. These are deliberate reasoning methods where the model is forced to articulate its intermediate reasoning steps, reflect, and backtrack if it hits a dead end. If I asked a human to "think out loud" while solving a problem, they'd do something extremely similar. Humans don’t hold entire problems in their heads at once. They break them down into subtasks and iterate on them. Some LLMs now do that as well.

 

 

Also, I would recommend you read this paper titled "LLMs are Not Just Next Token Predictors". It's a fairly short paper which I think addresses your view of LLMs quite well. If we are going to oversimplify things to this extent then we might as well say humans are just next-token predictors.

 

 

10 hours ago, Sauron said:

You could also argue that a calculator "thinks about something in a logical way", after all the output always logically follows the input. It can even give you the right answer to mathematical problems it's never seen! But for some reason we don't give it the AGI title, probably because it doesn't speak english and therefore we don't irrationally anthropomorphize it.

This is a flawed analogy because a calculator does not engage in multi-step planning, decomposition of tasks, or reflection. It operates on fixed, pre-programmed logic. When a calculator solves "1+1", it does not generate possible solution paths, compare them, and select the optimal one. LLMs do. If I ask o1 for the best way to plan an international trip with specific constraints (budget, time, visa issues), it has to generate multiple possible plans, compare them, and select the optimal one. This is beyond "input, output" logic.

Calculators do not reason. They are not able to make conclusions, nor are they able to break down more complex tasks into simpler ones and understand correlations between these subtasks.

 

Another big reason why we do not give calculators the title of "AGI" is that the G in AGI stands for "general". Calculators are not "general", they are specialized.

 

If your standard for AGI is that it has to "experience the world as humans do," then yes, LLMs aren’t AGI. But if the standard is "can it generalize to a wide variety of tasks and reason through them logically," then I’d say o1 is closer than you might think.

 

I am not "irrationally anthropomorphizing" anything. I am neither being irrational nor am I anthropomorphizing anything.

I am simply pointing out that there is an argument to be made that we have achieved AGI already (depending on how you define it) and that the textbook definition of "reasoning" seems to align fairly well with how o1 works.

 

 

10 hours ago, Sauron said:

Are you saying an LLM can do, or learn how to do, *any* task within its physical reach?

I think this is where you’re misapplying the "General" in AGI. General intelligence isn’t about being able to master every possible task — it’s about being able to generalize across a wide variety of tasks it wasn’t explicitly programmed to solve. Humans are the same. If you tell me to learn a game like Go, I can, but if you tell me to survive at the bottom of the ocean, I can’t, because it’s beyond my physical limits.

 

If you want an LLM to solve tasks that require long-term persistence, hands-on physical interaction, or new sensory input, then yeah, it’s not going to do that. But that's just like saying "Humans aren't AGI because they can't echolocate like bats." For every task you name that humans can do but LLMs can’t, I can name a task that an LLM can do but a human can’t, like memorizing the entire contents of Wikipedia and instantly cross-referencing it.

You don't judge a fish by its ability to climb a tree.

 

 

10 hours ago, Sauron said:

In a sense almost any sufficiently complex statistical model can mimic "general intelligence" by successfully applying to unseen scenarios

Yes, and you just described humans. Humans are statistical pattern recognizers. The difference is that humans are doing it with neurons, and LLMs are doing it with weights and embeddings. But I don't see how that disqualifies an LLM from being considered an AGI. If it can generalize to solve new problems, work through constraints, and reason out multi-step logic, it’s closer to AGI than you seem to want to admit.

 

If you insist that the internal mechanism must look like human reasoning to count, then you're arguing from a form of biological essentialism. And that seems arbitrary.

 

If we’re honest, humans also fail at "true" general intelligence. A human raised in isolation doesn’t spontaneously learn language, math, or music. LLMs require prompts too, but when given prompts, they can generalize across a surprising range of new tasks. That's pretty "general" to me.

 

If ChatGPT 4o, and especially o1 outperforms the average person on logic puzzles, writing essays, coding, and translation, then what is your actual definition of "General" intelligence? It feels like you’re clinging to an arbitrary definition of AGI as needing to be "human-like intelligence" rather than what AGI actually means. A system that can match or surpass a human's cognitive abilities across a wide range of tasks.


1 hour ago, LAwLz said:

Yes, and you just described humans. Humans are statistical pattern recognizers. The difference is that humans are doing it with neurons, and LLMs are doing it with weights and embeddings. But I don't see how that disqualifies an LLM from being considered an AGI. If it can generalize to solve new problems, work through constraints, and reason out multi-step logic, it’s closer to AGI than you seem to want to admit.

I think it will generally all come down to personal preference on where someone wants to draw the line in regards to AGI.

 

Like, personally I still consider LLMs to be almost specialist-class AI. Yes, they can sort of adapt to solve problems, etc... but there is still something fundamentally missing to make it feel as though they have general intelligence.

 

e.g. Tesla FSD 13... would you consider that AGI? Some things, like taking cues from drivers, have been an emergent property of the most recent build, for example. It's able to do tasks that generally were never witnessed before; and yet even with version 12 it didn't realize it should stop for a giant yoga ball rolling into the street. (From my understanding, some of the pieces of FSD actually run on a process that is effectively like an LLM.)

 

Overall, I think the issue is that LLMs are complex enough that they aren't fully understood, but at the same time they lack the adaptability that humans have. They can translate languages they weren't specifically trained on because they were still trained on, effectively, the meanings of words and their associations to other words/semantics, etc.

 

Like, I don't know what o1 is really like, as I haven't tried it out... but speaking from using 4o (which, btw, OpenAI has such a terrible naming scheme), it has many faults which I personally think preclude it from being considered AGI.

 

As an example: asking a question for which there is no answer, and it's clear there won't be an answer, yet many of the NNs will still try answering it as though there is a correct answer.

 

"which canadian superstar shaved their head while singing the national anthem" While it can be prompted that it isn't actually real, it generally responded originally as though it knew the answer.  It's to the larger part that it doesn't know it's boundaries of what is and isn't a hallucination.  While it's true that humans do this as well, I feel that LLM's have a whole lot more issues with this.

 

So yeah, I haven't seen o1, so I can't really say where it stands in that regard... but generally speaking, I think many of the LLMs out there are still fundamentally flawed, and while they might be able to match/beat many humans at tasks, I'm not sure one can say it is general intelligence... it's just been trained on a broad spectrum of topics, which makes it appear more general than it currently is. [e.g. there are specific rewards you can use during training to get an LLM to actually understand that 9.11 < 9.9... but under the hood, when asked questions like this, what actually happens these days is that it writes Python code and runs it to check the answer]
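For what it's worth, the throwaway check being described really is trivial once it is written as code, which is exactly why falling back on tool use papers over the model's shaky number sense (an illustrative snippet, not an actual ChatGPT transcript):

```python
# The sort of one-line check a model can write and execute instead of guessing.
print(9.11 < 9.9)   # True: treated as decimal numbers, 9.11 is the smaller one
```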

 

  

On 12/7/2024 at 1:39 PM, LAwLz said:

I think Sabine has really fallen from grace in the last couple of years. It feels like she makes videos that are regarding fields she has little to no knowledge of which means she makes a lot of errors, and honestly, I feel like a lot of her videos are just clickbait and her being controversial because it gives her a lot of clicks, which in turn means she gets a lot of money.

This is true. I think she generally doesn't bother checking her scripts for inaccuracies... I still remember her video where she confuses bits with bytes and hasn't made any edits to correct it (with the comments all pointing out the inaccuracy). Sometimes nothing is more dangerous than speaking on a topic from a place of authority when you don't actually know the information you are talking about [which is also sort of where AGI is, I feel].


1 hour ago, wanderingfool2 said:

So yeah, I haven't seen o1, so I can't really say where it stands in that regard... but generally speaking, I think many of the LLMs out there are still fundamentally flawed, and while they might be able to match/beat many humans at tasks, I'm not sure one can say it is general intelligence... it's just been trained on a broad spectrum of topics, which makes it appear more general than it currently is. [e.g. there are specific rewards you can use during training to get an LLM to actually understand that 9.11 < 9.9... but under the hood, when asked questions like this, what actually happens these days is that it writes Python code and runs it to check the answer]

I agree with pretty much everything you said, but I think it’s worth considering a few additional points.

 

 

The most common definition of AGI is that it matches or surpasses human cognitive capabilities across a broad range of tasks. Importantly, this doesn't necessarily mean it has to perform at the level of the best humans. By most interpretations, it only needs to match the cognitive abilities of an average person. To be honest, the "average person" isn't exactly a high bar. For context, the average American has a literacy level comparable to a 7th grader. Nearly half of American adults struggle with reading or writing beyond the levels of a 13-year-old. So when people think AGI needs to reach "human-level intelligence," it’s crucial to remember that this bar isn't as high as it might seem.

 

 

Regarding o1, it does seem to demonstrate significant improvements over 4o in many areas, but not all. It performs especially well on tasks that require careful consideration or logical reasoning, where earlier models often fell short.

 

On the point about 9.11 vs. 9.9, I think it's important to highlight that context matters. If we're talking about versioning (like software version 9.11 vs. 9.9), then 9.11 would logically come after 9.9. While LLMs sometimes make mistakes with numerical comparisons like this, I feel these types of questions are the type of "gotcha" questions that people put way too much weight on. These types of edge cases often distract from the broader, more important capabilities of the system. I can't help but feel like it is some type of reassurance thing. It's like hearing someone say "yes, it writes better than me at 30 different things, but I know that 9.9 is larger than 9.11, so therefore I am still superior!". I feel like those types of gotchas stem from insecurities. It's the whole "judge a fish by its ability to climb a tree" thing again.


5 hours ago, LAwLz said:

For every task you name that humans can do but LLMs can’t, I can name a task that an LLM can do but a human can’t, like memorizing the entire contents of Wikipedia and instantly cross-referencing it.

Except it can't actually do that. 😄
That ain't how it works...

VGhlIHF1aWV0ZXIgeW91IGJlY29tZSwgdGhlIG1vcmUgeW91IGFyZSBhYmxlIHRvIGhlYXIu

^ not a crypto wallet


7 hours ago, LAwLz said:

The most common definition of AGI is that it matches or surpasses human cognitive capabilities across a broad range of tasks. Importantly, this doesn't necessarily mean it has to perform at the level of the best humans. By most interpretations, it only needs to match the cognitive abilities of an average person. To be honest, the "average person" isn't exactly a high bar. For context, the average American has a literacy level comparable to a 7th grader. Nearly half of American adults struggle with reading or writing beyond the levels of a 13-year-old. So when people think AGI needs to reach "human-level intelligence," it’s crucial to remember that this bar isn't as high as it might seem.

The thing about that is I would argue we had specialist AIs that could do a bunch of things like that well before the whole GPT approach, and in some cases hand-crafted AI has done better than what GPT has. I personally take a more rigid view of AGI: it requires actual emergent behaviour that lets it generalize to genuinely novel tasks it hasn't seen before, the kind a human would be capable of solving with human-level reasoning.

 

I don't think language translation really counts, because even if it strictly wasn't trained on a given language, it still "understands" the constructs of language semantics and syntax, which means it can do the interpretation based on what it has been presented with before.

 

The thing is, it has also been trained on so much of the internet that it has been exposed to a huge amount of material it can in effect draw from.  It's complicated, and I can't properly describe exactly what I mean, but personally I wouldn't classify it as AGI; it is nonetheless a really powerful tool.

 

7 hours ago, LAwLz said:

On the point about 9.11 vs. 9.9, I think it’s important to highlight that context matters. If we're talking about versioning (like software version 9.11 vs. 9.9), then 9.11 would logically come after 9.9. While LLMs sometimes make mistakes with numerical comparisons like this, I feel these are the kind of "gotcha" questions that people put way too much weight on.

It's the gotchas that I think show where the cracks are in terms of how it actually "thinks", and why it can't be called general.  While I do admit that in a versioning context the answer can make sense, you could specify that you are talking about the mathematical comparison and it would still pretty much insist it was correct.

 

Overall, it's because I think some of these edge cases can manifest as improper logic that ends up poisoning its later reasoning.  In some of the training data I've reviewed (not OpenAI's), where the step-by-step process and backend traces are visible, it would hallucinate or generate errors in reasoning like that, which then went on to create bad responses (sometimes it recovers, but only because it gets lucky in recovering correctly).

 

It's what Andrej Karpathy calls "jagged intelligence".  Take a task like asking it how many "r"s are in "strawberry".  It highlights that LLMs are effectively probability engines that don't necessarily have an underlying notion of what they're processing... which is why I wouldn't really classify them as general intelligence.
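To illustrate why that task trips it up (just my rough sketch; the exact token splits vary by model): counting letters is trivial for code that sees characters, but the model sees subword tokens, not letters.

print("strawberry".count("r"))  # 3, trivially, at the character level

# An LLM instead sees something like ["str", "aw", "berry"] (tokenizer-dependent),
# so "how many r's" has to be inferred statistically rather than actually counted.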

3735928559 - Beware of the dead beef


12 hours ago, LAwLz said:

If you don't think "using knowledge about French to translate Kreol even if it has never seen Kreol" is an example of drawing from past experience to do something it wasn't trained to do then I fail to think of anything that would satisfy your criteria. I am not even sure I could come up with something a human does that would fit it.

It doesn't know French. It's a statistical model. The statistical characteristics of French and other languages are evidently close enough to those of Kreol that a sufficiently powerful model can predict both despite only having been modeled on one.

12 hours ago, LAwLz said:

This to me feels like moving the goalpost. The moment we show LLMs doing something they weren’t explicitly trained for, the argument could shift to "Well, it was always in there, you just teased it out." Isn't that the case with humans too?

No. A baby shown writing can't be expected to ever read it before being taught how to speak (whether deliberately or through exposure) and later how to read; it's not baked into humans, it's learned. By contrast, if just doing something correctly without being shown it before is a sufficient criterion for you, then almost all programs fit your definition. You're ignoring the fact that we *know* what an LLM is and how it works; we *know* that beyond the limits of its context space it relies exclusively on fixed, preexisting weights. The LLM can be thought of as "learning" during its training phase, hence the name machine learning, but beyond that it's a deterministic automaton whose state depends entirely on its input space, and to see any variance you have to deliberately add randomness.
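To illustrate the "deterministic unless you add randomness" point, here's a minimal toy sketch (my own example, not any real model's code) of picking the next token from fixed probabilities:

import random

# Toy next-token distribution produced by fixed, frozen weights for some prompt.
probs = {"cat": 0.6, "dog": 0.3, "fish": 0.1}

def greedy(dist):
    # No randomness: the same input always yields the same output.
    return max(dist, key=dist.get)

def sample(dist):
    # Variance only appears because randomness is deliberately injected here.
    return random.choices(list(dist), weights=list(dist.values()))[0]

print(greedy(probs))  # always "cat"
print(sample(probs))  # varies run to run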

12 hours ago, LAwLz said:

More importantly, learning a new skill often requires an entire support system (teachers, YouTube tutorials, etc). If you give an LLM access to external tools (like web searches, plugins, or APIs), it can achieve similar feats. This is why tools like AutoGPT exist. They create "persistent memory" and "long-term planning" that let LLMs go beyond the context window.

This is like saying that a notebook I use to write notes down is part of my brain. Yes, if you give it access to external resources (most of which are human-generated, of course) you can get more accurate answers, because you're directly telling it what the answers are. It's like saying I know physics because I copied the answers off Wikipedia during a physics exam. It doesn't change the way the LLM works.

12 hours ago, LAwLz said:

I think you’re conflating "mechanism" with "outcome." It’s true that human reasoning and LLM "reasoning" differ in how they’re implemented. But if you look at the results of that reasoning, it’s not so clear-cut.

I'm specifically differentiating the mechanism from the output, because written output can be emulated in a variety of ways that do not require any reasoning or knowledge. If you want to infer "intelligence" just from written output you need to assume that that is not the case.

12 hours ago, LAwLz said:

This is a flawed analogy because a calculator does not engage in multi-step planning, decomposition of tasks, or reflection. It operates on fixed, pre-programmed logic. When a calculator solves "1+1", it does not generate possible solution paths, compare them, and select the optimal one. LLMs do.

No, they don't. They literally only predict the most likely next word based on a statistical model. That can look like planning and comparing and even reach the same result sometimes, but it isn't. Again, we *know* what an LLM is and how it works. Everything else is us projecting onto it abilities it does not possess.

 

On the other hand, a dynamic programming algorithm does do all those things with perfect accuracy; it just doesn't tell you about it in a way that fools you into ascribing it thought.
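For example (my own toy illustration), a classic dynamic programming routine genuinely decomposes a problem into subproblems and provably picks the best option at each step; it just doesn't narrate the process:

# Minimum number of coins to make an amount, via dynamic programming.
def min_coins(coins, amount):
    best = [0] + [float("inf")] * amount
    for total in range(1, amount + 1):
        for c in coins:
            if c <= total:
                best[total] = min(best[total], best[total - c] + 1)
    return best[amount]

print(min_coins([1, 5, 12], 16))  # 4 (5+5+5+1); greedily grabbing 12 first would need 5 coins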

12 hours ago, LAwLz said:

I think this is where you’re misapplying the "General" in AGI. General intelligence isn’t about being able to master every possible task — it’s about being able to generalize across a wide variety of tasks it wasn’t explicitly programmed to solve. Humans are the same. If you tell me to learn a game like Go, I can, but if you tell me to survive at the bottom of the ocean, I can’t, because it’s beyond my physical limits.

I said *within its physical reach*.

 

Just being able to solve tasks it wasn't explicitly programmed for is too vague and applies to many non-LLM machines, as I already said.

 

As a side note on Go: LLMs behave weirdly in games like Go and chess because, go figure, they don't actually understand the rules.

https://nicholas.carlini.com/writing/2023/chess-llm.html

12 hours ago, LAwLz said:

Yes, and you just described humans. Humans are statistical pattern recognizers. The difference is that humans are doing it with neurons, and LLMs are doing it with weights and embeddings. But I don't see how that disqualifies an LLM from being considered an AGI. If it can generalize to solve new problems, work through constraints, and reason out multi-step logic, it’s closer to AGI than you seem to want to admit.

Ok then, I guess all statistics are AGI and indistinguishable from humans beyond specific performance metrics.

Don't ask to ask, just ask... please 🤨

sudo chmod -R 000 /*


1 hour ago, LAwLz said:

Your post reminds me of this thread I saw on Reddit recently.

[Screenshot of the Reddit thread]

 

You made a claim and provided a really bad example.
Saying "I was wrong, here is a better example..." is a lot harder than whatever defense mechanism this is. 😄

VGhlIHF1aWV0ZXIgeW91IGJlY29tZSwgdGhlIG1vcmUgeW91IGFyZSBhYmxlIHRvIGhlYXIu

^ not a crypto wallet

