
Apple Siri powered by ReALM LLM

Summary

A research paper from Apple details ReALM, a local LLM with 0.08B to 3B parameters, which Apple researchers claim performs comparably to GPT-3.5 across several benchmarks.

 

Quotes

Quote

We present our results in Table 3. Overall, we find that our approach outperforms the MARRS model in all types of datasets. We also find that our approach is able to outperform GPT-3.5, which has a significantly larger number of parameters than our model by several orders of magnitude.

 

My thoughts

Apple's ReALM models detailed in the paper range from 0.08B to 3B parameters. For reference, GPT-3.5 is estimated at 175B parameters and GPT-4 at roughly 1,760B.

 

If the researchers aren't overselling their models, it's an enormous improvement in performance per parameter, so much so that the model could feasibly run locally, using neural engine acceleration, within a smartphone's power and RAM limitations.
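To put the RAM limitation in numbers, here's a back-of-the-envelope sketch (my own arithmetic, not from the paper) of the weight memory needed at common quantization levels:

```python
# Rough weight-memory estimate for a model of a given parameter count.
# Illustrative only: real inference also needs KV cache, activations,
# and runtime overhead on top of this.

def model_ram_gb(params_billion: float, bits_per_weight: int) -> float:
    """Approximate weight memory in GiB (1 GiB = 2**30 bytes)."""
    total_bytes = params_billion * 1e9 * bits_per_weight / 8
    return total_bytes / 2**30

for params in (0.08, 1.0, 3.0, 175.0):
    for bits in (16, 8, 4):
        print(f"{params:6.2f}B @ {bits:2d}-bit: {model_ram_gb(params, bits):7.2f} GiB")
```

Even at 16-bit, a 3B model's weights come in under 6 GiB, while a 175B-class model needs hundreds of GiB, which is why the parameter count is the whole ballgame for on-device use.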

 

Facebook and Microsoft, for contrast, have acquired and are still acquiring hundreds of thousands of Nvidia H100 accelerators to train and run inference for their LLMs as a cloud service, which is expensive. It's wild that something like Bing Chat is free, given how much it costs to run.

 

If the performance of Siri powered by ReALM is even close to GPT-3.5, Siri could become much cheaper for Apple to run, and if Apple is able to run the model locally on smartphones, there is also a privacy advantage to LLM use.

 

Sources


Either I'm not reading the table right, or it actually is equivalent to GPT-4 rather than GPT-3.5?

 

Edit:

Ah yes, I wasn't just being dumb

Quote

We also find that our approach performs in the same ballpark as the latest GPT-4 despite being a much lighter (and faster) model. We especially wish to highlight the gains on onscreen datasets, and find that our model with the textual encoding approach is able to perform almost as well as GPT-4 despite the latter being provided with screenshots.

 


12 minutes ago, leadeater said:

Either I'm not reading the table right, or it actually is equivalent to GPT-4 rather than GPT-3.5?

My take is that the narrow-field performance they care about is closer to GPT-4, while the general-purpose performance is closer to GPT-3.5.

 

GPT-4 is also multimodal, and as far as I can tell, ReALM is not; instead it's given special embeddings for screen hints that may come from another model (maybe something like CLIP) that describes an image. Until more is known, I think the comparison with GPT-3.5 is more apples-to-apples.


2 hours ago, 05032-Mendicant-Bias said:

if Apple is able to run the model locally on smartphones, there is also a privacy advantage to LLM use.

At lower quality than GPT-3.5 you have some version of LLaMA, and even light models take about 16 GB of VRAM to run. There is no way they would run an equivalent of GPT-3.5 locally on phones. I currently run a small LLaMA model on an RTX A4500 and it takes the full 20 GB of VRAM the card has to give.


44 minutes ago, Franck said:

There is no way they would run an equivalent of GPT-3.5 locally on phones.

At around 10 GB of use I can run a LLaMA-7B-derived model, but those models aren't even close to the free Bing Chat.

Going by the parameter counts claimed in the paper, it's doable to run it locally, which is why it would be a really big deal if it's true. A 3B model performing like a 175B model for Siri-like workloads would be huge.


7 hours ago, 05032-Mendicant-Bias said:

A 3B model performing like a 175B model for Siri-like workloads would be huge.

In the end, Siri-like workloads are best done on-device, given you want low latency and there are so many possible devices making requests that doing this server-side would cost a small fortune. Just piping Siri to a cloud-based LLM would have many downsides from a load perspective: Siri traffic likely spikes throughout the day, and you can't pipe requests halfway around the world to have a single server handle that load without massive added latency, so you would end up with a lot of compute spread around the world, much of it used only at peak times for its timezone.

Being able to do as much as possible locally on the device, possibly even letting that on-device model figure out how to query remote data sources (even remote LLMs), would filter out most of this load... it would be a huge waste of money to have a huge cloud-based LLM handle people adding groceries to their shopping list or turning the lights in the house on and off (this sort of thing makes up most Siri requests)... people are not asking Siri to write them a 100-page essay.
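The split being described, cheap local handling for the common commands and a cloud fallback only for the rare open-ended requests, can be sketched as a toy intent router. All handler names here are hypothetical, purely to show the shape of the idea:

```python
# Toy sketch of routing: simple, common intents go to local handlers;
# only open-ended requests fall through to a remote LLM.

LOCAL_INTENTS = {
    "lights_off": lambda req: "Turning off the lights.",
    "add_grocery": lambda req: f"Added '{req.split('add ', 1)[1]}' to your list.",
}

def classify(request: str) -> str:
    # Trivial keyword classifier standing in for a small on-device model.
    if "lights" in request and "off" in request:
        return "lights_off"
    if request.startswith("add "):
        return "add_grocery"
    return "remote"

def handle(request: str) -> str:
    intent = classify(request)
    if intent in LOCAL_INTENTS:          # cheap, on-device path
        return LOCAL_INTENTS[intent](request)
    return "(forwarding to cloud LLM)"   # rare, expensive path

print(handle("turn the lights off"))
print(handle("add milk"))
print(handle("write me a 100 page essay"))
```

If most Siri traffic really is lights and grocery lists, almost everything takes the first branch and the cloud fleet only sees the long tail.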


8 hours ago, hishnash said:

In the end, Siri-like workloads are best done on-device, given you want low latency and there are so many possible devices making requests that doing this server-side would cost a small fortune. Just piping Siri to a cloud-based LLM would have many downsides from a load perspective: Siri traffic likely spikes throughout the day, and you can't pipe requests halfway around the world to have a single server handle that load without massive added latency, so you would end up with a lot of compute spread around the world, much of it used only at peak times for its timezone.

Being able to do as much as possible locally on the device, possibly even letting that on-device model figure out how to query remote data sources (even remote LLMs), would filter out most of this load... it would be a huge waste of money to have a huge cloud-based LLM handle people adding groceries to their shopping list or turning the lights in the house on and off (this sort of thing makes up most Siri requests)... people are not asking Siri to write them a 100-page essay.

"Hey Siri, turn off the lights"

 

*phone catches fire*


On 4/4/2024 at 2:47 PM, 05032-Mendicant-Bias said:

If the researchers aren't overselling their models, it's an enormous improvement in performance per parameter, so much so that the model could feasibly run locally, using neural engine acceleration, within a smartphone's power and RAM limitations.

This is likely still not desirable since you'd still be working with a limited battery life. It would certainly save a lot of money for the service provider though.

 


On 4/5/2024 at 10:43 AM, Sauron said:

This is likely still not desirable since you'd still be working with a limited battery life. It would certainly save a lot of money for the service provider though.

 

If you had 16 GB of RAM, in theory you could run an 8B LLM on a smartphone with enough NPU performance to do the inference. So yes, these 0.08B to 3B models could run on an iPhone natively to perform complex iOS tasks.

Finally, an intelligent Siri that would be actually useful.

https://www.forbes.com/sites/kateoflahertyuk/2024/03/15/apples-new-ai-move-just-changed-the-game-for-all-iphone-users/?sh=3b8ce72b7277

Apple bought DarwinAI earlier this year, and the firm’s employees have joined the iPhone maker’s AI division, according to Bloomberg, which cites “people with knowledge of the matter, who asked not to be identified because the deal hasn’t been announced.”

Another advantage of DarwinAI that will benefit Apple specifically is the company has developed tech that can make AI systems smaller and faster. “That could be helpful to Apple, which is focused on running AI on devices rather than entirely in the cloud,” Bloomberg writes.


So yes, rumored to be implemented in iOS 18.


On 4/5/2024 at 5:43 PM, Sauron said:

This is likely still not desirable since you'd still be working with a limited battery life.

I think Apple's track record shows that we can rely on them figuring out the energy efficiency part just fine.


5 hours ago, Dracarris said:

I think Apple's track record shows that we can rely on them figuring out the energy efficiency part just fine.

Uhhh... not really. Battery life has been largely hit-or-miss on iDevices, as exemplified by this utter monstrosity:

 

[image attachment]

 


2 hours ago, Sauron said:

Uhhh... not really. Battery life has been largely hit-or-miss on iDevices, as exemplified by this utter monstrosity:

 

Emphasis mine...

That was an issue, but primarily due to 3rd party apps such as Facebook and WeChat running in the background consuming CPU cycles and thus battery. iOS itself wasn't the problem.


Anyway, Apple addressed this by providing "Battery Usage by App" status within iOS --> Settings --> Battery. Also, the A-series chips have improved in both performance and battery efficiency, thanks in large part to the process nodes they're fabbed on.


1 hour ago, StDragon said:

That was an issue, but primarily due to 3rd party apps such as Facebook and WeChat running in the background consuming CPU cycles and thus battery. iOS itself wasn't the problem.

iOS was not the problem; the insufficient battery size was. Apple hardware and software are not magic: if the battery is too small you'll get bad battery life, and if you overuse the hardware by running heavy LLMs, that battery life will be shortened further.


3 hours ago, Sauron said:

if you overuse the hardware by running heavy LLMs, that battery life will be shortened further.

Apple's solution for this over the years has been to let devs (and the OS) defer as much heavy work as possible via async dispatch to later, when the user has the device attached to power (overnight charging, etc.).

I would not at all be surprised to see some form of LLM slicing that uses the on-device data to do reinforcement training and model reduction while charging, so that the model that runs on demand is tiny but still has the needed user data within it. Apple has been encouraging devs to do this for the last few years already, and they already do it for their photo ML workloads.
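The charge-gated deferral pattern being described can be sketched generically. This is a toy illustration only; on iOS the real mechanism is the system's background task scheduling, not application code like this:

```python
# Sketch of charge-gated deferral: queue heavy ML jobs and only run
# them while the device reports it is on external power. The charging
# check is a stand-in callable, not a real platform API.

import collections

class DeferredWorkQueue:
    def __init__(self, is_charging):
        self._jobs = collections.deque()
        self._is_charging = is_charging  # callable returning bool

    def submit(self, job):
        self._jobs.append(job)

    def drain(self):
        """Run queued jobs only while on external power; return results."""
        done = []
        while self._jobs and self._is_charging():
            done.append(self._jobs.popleft()())
        return done

q = DeferredWorkQueue(is_charging=lambda: True)
q.submit(lambda: "compressed model")
q.submit(lambda: "updated photo index")
print(q.drain())
```

The design point is simply that battery-hungry work (retraining, compression, indexing) is queued and paid for only when energy is effectively free.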


On 4/7/2024 at 4:59 AM, StDragon said:

Apple bought DarwinAI earlier this year, and the firm’s employees have joined the iPhone maker’s AI division, according to Bloomberg, which cites “people with knowledge of the matter, who asked not to be identified because the deal hasn’t been announced.”

If you call your company `DarwinAI` then you clearly have your sights set on being purchased by Apple.


27 minutes ago, hishnash said:

If you call your company `DarwinAI` then you clearly have your sights set on being purchased by Apple.

It was a very smart move to sell. They got a bigger payout from Apple than they would otherwise have gotten going it alone.

 

There's nothing special about what they've done. The only reason Apple bought them is to get a head-start. Google and Huawei will be doing the same thing with their phones too.


6 hours ago, hishnash said:

Apple's solution for this over the years has been to let devs (and the OS) defer as much heavy work as possible via async dispatch to later, when the user has the device attached to power (overnight charging, etc.).

doesn't work for interactive tasks

6 hours ago, hishnash said:

I would not at all be surprised to see some form of LLM slicing that uses the on-device data to do reinforcement training and model reduction while charging, so that the model that runs on demand is tiny but still has the needed user data within it. Apple has been encouraging devs to do this for the last few years already, and they already do it for their photo ML workloads.

afaik that's not really a thing you can do... if there's retraining involved it can be done asynchronously, but the meat of the operation would still be the inference of a response, which you can't get from just cutting down the model (if we knew which specific parts of the network are needed to answer specific types of questions, this would be a whole lot easier...).

 

To be fair I'm not specialized in neural networks so maybe it's possible and I just don't know about it.
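For what it's worth, "cutting down the model" does exist as a generic family of techniques (pruning, distillation, quantization); whether quality survives for open-ended inference is exactly the open question here. A minimal magnitude-pruning sketch, purely to show the mechanics and not a claim about what Apple does:

```python
# Minimal magnitude pruning: zero out the fraction of weights with the
# smallest absolute values. A generic compression technique sketch.

def prune_smallest(weights, fraction):
    """Return a copy with the `fraction` smallest-|value| weights zeroed."""
    k = int(len(weights) * fraction)
    if k == 0:
        return list(weights)
    threshold = sorted(abs(w) for w in weights)[k - 1]
    pruned, zeroed = [], 0
    for w in weights:
        if abs(w) <= threshold and zeroed < k:
            pruned.append(0.0)   # prune this weight
            zeroed += 1
        else:
            pruned.append(w)     # keep this weight
    return pruned

w = [0.9, -0.05, 0.4, 0.01, -0.7, 0.03]
print(prune_smallest(w, 0.5))  # half the weights zeroed
```

In real systems the pruned model is then fine-tuned to recover accuracy; the debate above is about whether that recovery holds for a general-purpose assistant.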


15 hours ago, Sauron said:

Uhhh... not really. Battery life has been largely hit-or-miss on iDevices, as exemplified by this utter monstrosity:

That's advanced cherry-picking on your end, congrats. How old is the iPhone shown in this picture? Battery life has been anything but hit-or-miss for many, many years.

 

And, btw, battery life per mWh is a thing, and so is a more or less optimized SW/HW stack. It's not magic, it's engineering.


34 minutes ago, Dracarris said:

And, btw, battery life per mWh is a thing, and so is a less or more optimized SW/HW stack. It's not magic, it's engineering.

The hardware is kind of beside the point since we're comparing software processes on the same device. Obviously more efficient hardware will draw comparatively less battery power, but it will still be proportionally constrained by the capacity of the battery itself.

 

The question here is whether running an LLM like this won't draw significantly more power (on the same device of course) compared to a more constrained, but arguably more than adequate, assistant chatbot. And my impression is that no amount of "engineering" will significantly change that ratio, because there just isn't much you can do in terms of software optimization to reduce that workload. Maybe with specialized hardware acceleration the difference won't be that large, but it remains to be seen and it's certainly not a given as you seem to imply.

45 minutes ago, Dracarris said:

That's advanced cherry-picking on your end, congrats. How old is the iphone shown in this picture? Battery life has been anything but hit or miss for many many years.

Here, have a graph of battery life by generation:

https://www.statista.com/statistics/1308110/iphone-battery-life-comparison-by-model/

 

you'll notice that while overall the trend has been increasing (mainly due to the batteries just getting larger, once Apple got over the obsession with hyper-thin phones and tiny screens), it varies year by year, with the 12 for example having a significantly shorter battery life than the 11, and notably abysmal performance from SE models and some top-end ones even long after the example I gave - in fact, the 8, X and Xs had even worse battery lives than the 6 and 6s, which spawned the camel-back battery cover.


31 minutes ago, Sauron said:

Maybe with specialized hardware acceleration the difference won't be that large, but it remains to be seen and it's certainly not a given as you seem to imply.

Well unfortunately, Apple cannot design such specialized acceleration hardware in-house, right.

31 minutes ago, Sauron said:

Here, have a graph of battery life by generation:

https://www.statista.com/statistics/1308110/iphone-battery-life-comparison-by-model/

 

you'll notice that while overall the trend has been increasing (mainly due to the batteries just getting larger, once Apple got over the obsession with hyper-thin phones and tiny screens), it varies year by year, with the 12 for example having a significantly shorter battery life than the 11, and notably abysmal performance from SE models and some top-end ones even long after the example I gave - in fact, the 8, X and Xs had even worse battery lives than the 6 and 6s, which spawned the camel-back battery cover.

That Statista statistic is not accessible without a subscription. But any of these phones had battery life of well over a day unless the battery was severely degraded. The SEs were an exception; I can tell from my own experience that an SE 2020 was a bit on edge for whole-day battery after 3 years without a battery replacement. Still far from the abysmal performance you try to paint it as, and the fact you are still referring to models as old as the 8 and even the infamous 6S is showing. For the X and Xs I know people who operate these phones to this day without any battery life issues. And I personally fully support the strategy of properly optimizing the phone instead of just slapping in a 25Wh battery, making it cookable via ultra-turbo fast charging plus-ultra (tm) and calling it a day.


14 minutes ago, Dracarris said:

Well unfortunately, Apple cannot design such specialized acceleration hardware in-house, right.

It may not even be currently possible to make it that efficient; engineering is not magic, as you said. Maybe they can do it, maybe not - I only take issue with people just assuming they can, as if by magic.

16 minutes ago, Dracarris said:

That Statista statistic is not accessible without a subscription.

[image attachment]

Here you go, I don't have a subscription so I'm not sure what's going on on your end.

25 minutes ago, Dracarris said:

the fact you are still referring to models as old as the 8 and even the infamous 6S is showing.

I referred to models as recent as the Xs and 12...

28 minutes ago, Dracarris said:

Still far from abysmal performance as you try to paint it

Less than half the battery life as competitors and even other models from the same company is abysmal in my eyes.

[image attachment]

I'm sure it's fine for many use cases, but comparatively it's terrible, and Apple seems to agree, as evidenced by the camel-back case.

18 minutes ago, Dracarris said:

And I personally fully support the strategy of properly optimizing the phone instead of just slapping in a 25Wh battery, making it cookable via ultra-turbo fast charging plus-ultra (tm) and calling it a day.

Again you're acting as though optimization is magic. If the phone is doing things then it will consume battery charge; this is just physics. You can make it a bit more efficient by not running pointless operations, but it's not like Android phones don't do that... aside from poorly made apps, which are not platform-specific, there's only so much you can do in software if you want the device to carry out a given task. You can clearly see just across iPhones how much of a difference simply having a larger battery makes. Unless you're about to argue that the software on the Xs was just particularly unoptimized, despite running many of the same iOS versions as other models that came out shortly before and after.


26 minutes ago, Sauron said:

I referred to models as recent as the Xs and 12...

You specifically stressed and cherry-picked the 8 and 6S. And your "recent" Xs is 5.5 years old at this point, and the 12 has more than jolly fine battery life.

26 minutes ago, Sauron said:

Less than half the battery life as competitors and even other models from the same company is abysmal in my eyes.

Under what conditions? On paper? A quick Google search says the X has 11 hours or 660 min of battery life, not the 560 claimed in your graph. I actually bought a heavily used X as a backup phone last year after I broke my SE, and it easily lasted a full day for me. So even if the competition has such better battery life as you claim, I honestly don't give a flying fuck if a 6.5-year-old phone can easily bring me through the day and I'd rather take a smaller and lighter phone.

26 minutes ago, Sauron said:

 

[image attachment]

That's 58 to 75h of battery life, or approx 3 days. I really doubt that all these phones were tested in the same way as the iPhones in the other graph, or that the way these were tested is meaningful for actual usage scenarios.

 

And with all that said, it's still almost solely peeps with Android phones that I constantly see charging their phones while sitting at their desk at work or always having a charger or power bank with them. My personal anecdotal evidence for sure, but still a bit strange IMHO.

26 minutes ago, Sauron said:

Again you're acting as though optimization is magic.

Not at all. Try reading my replies properly. I never even touched the word magic; it's just the typical scheme of putting words in the mouths of people who speak even remotely positively about an Apple product.

Good engineering goes a long way. It's not magic, it's good SW/HW design and physics. Energy spent for an operation can be drastically lowered and optimized if you're willing to put in the engineering effort/resources and cost.

Just one more time: It's not magic, and please don't ever again try to make it look as if I've said or implied that.


12 minutes ago, Dracarris said:

Under what conditions? On paper? A quick Google search says the X has 11hours or 660min of battery life, and not 560 as claimed in your graph.

I'm using the graph for comparisons with itself and other stats from the same source. I'm not interested in specific numbers because of course these stats depend on the type of workload. All I'm talking about is relative performance; whatever these tests were, the X had worse performance in them than the 6/s and vastly lower than, say, a 13 or 14.

14 minutes ago, Dracarris said:

So even if the competition has such better battery life as you claim, I honestly don't give a flycing fuck if a 6.5year old phone can easily bring me through the day and I'd rather take a smaller and lighter phone.

Anecdote, irrelevant. The point being argued is whether iphone battery life is consistent (it isn't) and consistently better than competitors (it also isn't). I don't care if you personally thought it was enough.

18 minutes ago, Dracarris said:

That's 58 to 75h of battery life, or approx 3 days.

...no it's not...? 1800 minutes is 30 hours...

23 minutes ago, Dracarris said:

I really doubt that all these phones were tested in the same way as the iphones in the other graph

It's the same source so... why would you assume that? Are you just so invested in iphones having a better battery life that you'll ignore data to the contrary?

24 minutes ago, Dracarris said:

And with all that said, it's still almost solely peeps with Android phones that I constantly see charging their phones while sitting at their desk at work or always having a charger or power bank with them. My personal anecdotal evidence for sure, but still a bit strange IMHO.

Yeah, it is just your personal anecdotal impression, so I'll just ignore it. Maybe you don't do that but a lot of people, myself included, keep their phones under charge while at their desk even when they're not low on battery - because why not?

 

Not to mention, it's not like all android phones have equally amazing battery life, and I never argued as much. The android market is vast and diverse, offering models across all price ranges and of varying quality. All I'm saying is that compared to some of the most popular competitors, iphone battery life is not especially impressive and is really bad in some models.

29 minutes ago, Dracarris said:

Not at all. Try reading my replies properly. I never even touched the word magic, it's just the typical scheme of putting those words in mouths of people that remotely speak positive about an Apple product.

I'm saying your usage of the word "engineering" is interchangeable with magic, because you throw it out as a thought-terminating cliché. You can't just say "engineering" will solve a problem without explaining how it might do that.

 

If you say a bridge can't hold because the beams are too thin, I can't just say "engineering" will make it work and expect you to believe me. I'd at least have to show you how the beams might be made stronger, or how adding cables would make them sufficient, or something like that. The same goes here. I'm not even saying it's impossible to make hardware that can run this in a way that doesn't significantly impact battery life, but you haven't really made an argument for the opposite.

