Intel working with the open-source community to take aim at Nvidia in the Enterprise LLM space?

kenial

 

Summary

An interesting repo has appeared on Intel's GitHub page titled "intel-extension-for-transformers". Although it has been around for a while, it has recently been updated with code that provides a "...toolkit to accelerate Transformer-based models on Intel platforms", specifically Sapphire Rapids Xeon processors. In short, the code allows for more efficient compression, training and inference of transformer models, something Nvidia has been king of for as long as CUDA has been around. Throughout the documentation there are many references to open-source transformer projects such as llama.cpp and Stable Diffusion, which makes the presence of the repo interesting: the open-source "AI" community (I'm not proud of using that term here, but it helps get the point across) has been making leaps and bounds ever since the release of the first Stable Diffusion models last summer and the unofficial release (or leak, if you prefer) of the model weights for Meta's LLaMA models, drastically lowering the bar to entry with well-researched compression methods (8-bit, then 4-bit quantization) and hyper-efficient fine-tuning techniques.
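To make the compression part a little less abstract, here is a tiny, self-contained sketch (my own illustration, not code from Intel's repo) of the round-to-nearest "absmax" 8-bit weight quantization that these projects build on. The point is just to show why quantization slashes memory use at a small accuracy cost.

```python
# Minimal illustration (not from the Intel repo): symmetric "absmax"
# round-to-nearest INT8 quantization of a weight matrix, the basic idea
# behind the 8-bit / 4-bit compression methods mentioned above.
import torch

def quantize_int8(weight: torch.Tensor):
    # One scale per output row; the largest magnitude in a row maps to 127.
    scale = weight.abs().amax(dim=1, keepdim=True) / 127.0
    q = torch.clamp(torch.round(weight / scale), -128, 127).to(torch.int8)
    return q, scale

def dequantize(q: torch.Tensor, scale: torch.Tensor) -> torch.Tensor:
    # Recover an approximation of the original FP32 weights.
    return q.to(torch.float32) * scale

w = torch.randn(4096, 4096)            # a typical transformer weight matrix
q, scale = quantize_int8(w)
w_hat = dequantize(q, scale)
print("memory: %.0f MB -> %.0f MB" % (w.numel() * 4 / 2**20, q.numel() / 2**20))
print("mean abs error:", (w - w_hat).abs().mean().item())
```

The 4-bit schemes the community has moved to work on the same principle, just with a coarser grid and smarter grouping, which is why quality barely drops while memory is cut in half again.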

 


 

Quotes


There is no news article, but here are some quotes from the documentation: 

 

"ITREX Graph is an experimental c++ bare metal LLM inference solution that mainly references and borrows from the llama.cpp project. llama.cpp puts almost all core code and kernels in a single file and use a large number of macros, making it difficult for developers to read and modify...

 

Intel® Extension for Transformers is an innovative toolkit to accelerate Transformer-based models on Intel platforms, in particular effective on 4th Intel Xeon Scalable processor Sapphire Rapids (codenamed Sapphire Rapids). The toolkit provides the below key features and examples...


...Transformers-accelerated Neural Engine is one of reference deployments that Intel® Extension for Transformers provides. Neural Engine aims to demonstrate the optimal performance of extremely compressed NLP models by exploring the optimization opportunities from both HW and SW."
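To give a feel for what that drop-in acceleration looks like, here is a sketch in the style of the repo's README. I haven't run it, and the exact import path, the `load_in_4bit` flag, and the example checkpoint are my assumptions from skimming the docs, so treat it as illustrative only.

```python
# Sketch only: the intel_extension_for_transformers import path and the
# load_in_4bit flag are assumptions based on the repo's README, not verified.
from transformers import AutoTokenizer
from intel_extension_for_transformers.transformers import AutoModelForCausalLM

model_name = "EleutherAI/gpt-j-6b"  # stand-in; any causal LM from the Hub
tokenizer = AutoTokenizer.from_pretrained(model_name)
# Drop-in replacement for the Hugging Face class; weights get quantized
# to 4-bit for CPU (Xeon) inference.
model = AutoModelForCausalLM.from_pretrained(model_name, load_in_4bit=True)

inputs = tokenizer("Sapphire Rapids is", return_tensors="pt").input_ids
outputs = model.generate(inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

If it works as advertised, the appeal is that existing Hugging Face code barely changes: you swap one import and the heavy lifting happens on the CPU.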

 

 

My thoughts

To start, I have not verified the claims made in the repo, and it is clearly more a proof of concept than a ready-to-ship solution. That said, I find this newsworthy because it shows that Intel is not taking this "AI" war lying down. They are teaming up with open-source projects to repurpose Xeon processors (which many medium to large-sized businesses may already have, or need anyway) and trying to make them a compelling choice for budget-conscious IT departments that don't want to spend an arm and a leg on Azure OpenAI fees or Nvidia H100s. This could be a middle ground for cases where you don't need the largest LLM around to achieve your goals (such as a chatbot that mostly regurgitates context pulled from a vector DB and needs little creativity), or for a larger proof-of-concept project, without having to build out any additional infrastructure. This also takes aim at AMD's Epyc chips. Yes, AMD may still have the most compelling option on the market, but can they be as efficient and performant on the soon-to-be-everywhere "AI" workloads? What do you think?

 

P.S. I understand if this isn't the right place for this post; I couldn't find anywhere else that seemed appropriate.

 

Sources

https://github.com/intel/intel-extension-for-transformers/tree/main

https://github.com/intel/intel-extension-for-transformers/blob/main/docs/architecture.md

https://arxiv.org/abs/2211.07715


For anyone interested in this topic area, I suggest watching the latest LevelOne video on ROCm; near the end it also mentions improvements for running on CPU, and it shows how much better ROCm has gotten in the last 8 months.


So would these Intel tools work with AMD cards?

Specs: Motherboard: Asus X470-PLUS TUF Gaming (Yes I know it's poor but I wasn't informed) RAM: Corsair VENGEANCE® LPX DDR4 3200MHz CL16-18-18-36 2x8GB

CPU: Ryzen 9 5900X    Case: Antec P8    PSU: Corsair RM850x    Cooler: Antec K240 with two Noctua Industrial PPC 3000 PWM

Drives: Samsung 970 EVO Plus 250GB, Micron 1100 2TB, Seagate ST4000DM000/1F2168    GPU: EVGA RTX 2080 Ti Black Edition


No, the extension only works with the newest Xeon chips for now; I'm not sure if it would even be compatible with older generations.


I don't think this can properly be considered news; this is just Intel adding support for their devices to Hugging Face's transformers (which is a framework like PyTorch or TensorFlow).

 

They have done similar projects for other major frameworks:

https://github.com/intel/intel-extension-for-tensorflow

https://github.com/intel/intel-extension-for-pytorch

 

The two above also work with their GPU offerings instead of being CPU-only; being CPU-only strikes me as somewhat dumb, unless all you want is really small models running at a snail's pace.
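For what it's worth, the PyTorch one linked above is basically a one-line opt-in. This is a sketch following the pattern from its docs (the `ipex.optimize()` call is the documented entry point; the toy model and the bf16 choice are just my stand-ins), not something I've benchmarked:

```python
# Sketch of the intel-extension-for-pytorch pattern: hand an ordinary
# PyTorch model to ipex.optimize(), which swaps in Intel-friendly kernels
# and memory layouts, then run inference as usual. The model is a toy stand-in.
import torch
import intel_extension_for_pytorch as ipex

model = torch.nn.Sequential(
    torch.nn.Linear(1024, 4096),
    torch.nn.GELU(),
    torch.nn.Linear(4096, 1024),
).eval()

model = ipex.optimize(model, dtype=torch.bfloat16)  # bf16 is where newer Xeons' AMX should help

with torch.no_grad(), torch.cpu.amp.autocast(dtype=torch.bfloat16):
    out = model(torch.randn(8, 1024))
print(out.shape)
```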

FX6300 @ 4.2GHz | Gigabyte GA-78LMT-USB3 R2 | Hyper 212x | 3x 8GB + 1x 4GB @ 1600MHz | Gigabyte 2060 Super | Corsair CX650M | LG 43UK6520PSA
ASUS X550LN | i5 4210u | 12GB
Lenovo N23 Yoga

