Intel working with the open-source community to take aim at Nvidia in the Enterprise LLM space?

kenial

 

Summary

An interesting repo has appeared on Intel's GitHub page titled "intel-extension-for-transformers". Although it has been around for a while, it has recently been updated with code that provides a "...toolkit to accelerate Transformer-based models on Intel platforms", specifically Sapphire Rapids Xeon processors. In short, the code allows for more efficient compression, training and inference of transformer models, something Nvidia has been king of for as long as CUDA has been around. Throughout the documentation there are many references to open-source transformer projects such as llama.cpp and Stable Diffusion, which makes the presence of the repo interesting: the open-source "AI" community (I'm not proud of using that term here, but it helps get the point across) has been making leaps and bounds ever since the release of the first Stable Diffusion models last summer and the unofficial release (or leak, if you prefer) of the model weights for Meta's LLaMA models, drastically lowering the bar to entry with well-researched compression methods (8-bit, then 4-bit quantization) and hyper-efficient fine-tuning techniques.
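To make the compression part a little less abstract, here is a tiny, self-contained sketch (my own illustration, not code from Intel's repo) of the round-to-nearest "absmax" 8-bit weight quantization that these projects build on. The point is just to show why quantization slashes memory use at a small accuracy cost.

```python
# Minimal illustration (not from the Intel repo): symmetric "absmax"
# round-to-nearest INT8 quantization of a weight matrix, the basic idea
# behind the 8-bit / 4-bit compression methods mentioned above.
import torch

def quantize_int8(weight: torch.Tensor):
    # One scale per output row; the largest magnitude in a row maps to 127.
    scale = weight.abs().amax(dim=1, keepdim=True) / 127.0
    q = torch.clamp(torch.round(weight / scale), -128, 127).to(torch.int8)
    return q, scale

def dequantize(q: torch.Tensor, scale: torch.Tensor) -> torch.Tensor:
    # Recover an approximation of the original FP32 weights.
    return q.to(torch.float32) * scale

w = torch.randn(4096, 4096)            # a typical transformer weight matrix
q, scale = quantize_int8(w)
w_hat = dequantize(q, scale)
print("memory: %.0f MB -> %.0f MB" % (w.numel() * 4 / 2**20, q.numel() / 2**20))
print("mean abs error:", (w - w_hat).abs().mean().item())
```

The 4-bit schemes the community has moved to work on the same principle, just with a coarser grid and smarter grouping, which is why quality barely drops while memory is cut in half again.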

 


 

Quotes


There is no news article, but here are some quotes from the documentation: 

 

"ITREX Graph is an experimental c++ bare metal LLM inference solution that mainly references and borrows from the llama.cpp project. llama.cpp puts almost all core code and kernels in a single file and use a large number of macros, making it difficult for developers to read and modify...

 

Intel® Extension for Transformers is an innovative toolkit to accelerate Transformer-based models on Intel platforms, in particular effective on 4th Intel Xeon Scalable processor Sapphire Rapids (codenamed Sapphire Rapids). The toolkit provides the below key features and examples...


...Transformers-accelerated Neural Engine is one of reference deployments that Intel® Extension for Transformers provides. Neural Engine aims to demonstrate the optimal performance of extremely compressed NLP models by exploring the optimization opportunities from both HW and SW."
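To give a feel for what that drop-in acceleration looks like, here is a sketch in the style of the repo's README. I haven't run it, and the exact import path, the `load_in_4bit` flag, and the example checkpoint are my assumptions from skimming the docs, so treat it as illustrative only.

```python
# Sketch only: the intel_extension_for_transformers import path and the
# load_in_4bit flag are assumptions based on the repo's README, not verified.
from transformers import AutoTokenizer
from intel_extension_for_transformers.transformers import AutoModelForCausalLM

model_name = "EleutherAI/gpt-j-6b"  # stand-in; any causal LM from the Hub
tokenizer = AutoTokenizer.from_pretrained(model_name)
# Drop-in replacement for the Hugging Face class; weights get quantized
# to 4-bit for CPU (Xeon) inference.
model = AutoModelForCausalLM.from_pretrained(model_name, load_in_4bit=True)

inputs = tokenizer("Sapphire Rapids is", return_tensors="pt").input_ids
outputs = model.generate(inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

If it works as advertised, the appeal is that existing Hugging Face code barely changes: you swap one import and the heavy lifting happens on the CPU.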

 

 

My thoughts

To start, I have not verified the claims made in the repo, and it is clearly more a proof of concept than a ready-to-ship solution. That said, I find this newsworthy because it shows that Intel is not taking this "AI" war lying down. They are teaming up with open-source projects to repurpose Xeon processors (which many medium to large-sized businesses may already have, or need anyway) and trying to make them a compelling choice for budget-conscious IT departments that don't want to spend an arm and a leg on Azure OpenAI fees or Nvidia H100s. This could be a middle ground for cases where you don't need the largest LLM around to achieve your goals (such as a chatbot that mostly regurgitates context pulled from a vector DB and needs little creativity), or for a larger proof-of-concept project, without having to build out any additional infrastructure. This also takes aim at AMD's Epyc chips. Yes, AMD may still have the most compelling option on the market, but can they be as efficient and performant on the soon-to-be-everywhere "AI" workloads? What do you think?

 

P.S. I understand if this isn't the right place for this post; I couldn't find anywhere else that seemed appropriate.

 

Sources

https://github.com/intel/intel-extension-for-transformers/tree/main

https://github.com/intel/intel-extension-for-transformers/blob/main/docs/architecture.md

https://arxiv.org/abs/2211.07715


For anyone interested in this topic area, I suggest watching the latest LevelOne video on ROCm; near the end it also mentions improvements for running on CPU, and it shows how much better ROCm has gotten in the last 8 months.


So would these Intel tools work with AMD cards?

Specs: Motherboard: Asus X470-PLUS TUF Gaming (Yes I know it's poor but I wasn't informed) RAM: Corsair VENGEANCE® LPX DDR4 3200MHz CL16-18-18-36 2x8GB

CPU: Ryzen 9 5900X    Case: Antec P8    PSU: Corsair RM850x    Cooler: Antec K240 with two Noctua Industrial PPC 3000 PWM

Drives: Samsung 970 EVO Plus 250GB, Micron 1100 2TB, Seagate ST4000DM000/1F2168    GPU: EVGA RTX 2080 Ti Black Edition


No, the extension only works with the newest Xeon chips for now; I'm not sure if it would even be compatible with older generations.


I don't think this can properly be considered news; this is just Intel adding support for their devices to Hugging Face's transformers (which is a framework like PyTorch or TensorFlow).

 

They have done similar projects for other major frameworks:

https://github.com/intel/intel-extension-for-tensorflow

https://github.com/intel/intel-extension-for-pytorch

 

The two above also work with their GPU offerings instead of being CPU-only; being CPU-only strikes me as somewhat dumb, unless all you want is really small models running at a snail's pace.
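For what it's worth, the PyTorch one linked above is basically a one-line opt-in. This is a sketch following the pattern from its docs (the `ipex.optimize()` call is the documented entry point; the toy model and the bf16 choice are just my stand-ins), not something I've benchmarked:

```python
# Sketch of the intel-extension-for-pytorch pattern: hand an ordinary
# PyTorch model to ipex.optimize(), which swaps in Intel-friendly kernels
# and memory layouts, then run inference as usual. The model is a toy stand-in.
import torch
import intel_extension_for_pytorch as ipex

model = torch.nn.Sequential(
    torch.nn.Linear(1024, 4096),
    torch.nn.GELU(),
    torch.nn.Linear(4096, 1024),
).eval()

model = ipex.optimize(model, dtype=torch.bfloat16)  # bf16 is where newer Xeons' AMX should help

with torch.no_grad(), torch.cpu.amp.autocast(dtype=torch.bfloat16):
    out = model(torch.randn(8, 1024))
print(out.shape)
```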

FX6300 @ 4.2GHz | Gigabyte GA-78LMT-USB3 R2 | Hyper 212x | 3x 8GB + 1x 4GB @ 1600MHz | Gigabyte 2060 Super | Corsair CX650M | LG 43UK6520PSA
ASUS X550LN | i5 4210u | 12GB
Lenovo N23 Yoga

