Anyone who is somewhat familiar with Nvidia’s (NVDA) strategy knows that the company is all-in on AI, both with regards to software and hardware. AI was the center technology around which Nvidia’s claim for a $1 trillion market opportunity was based at its recent investor day. In the future, Nvidia envisions that all enterprises will build AI data centers.
However, Nvidia isn’t the only company pursuing this market opportunity. After many years, Intel (NASDAQ:INTC) is finally getting its AI act together in 2022. This could be detrimental to Nvidia’s market share and, conversely, opens up a very lucrative new market for Intel. In particular, Intel just launched its newest Habana part, which blows Nvidia’s current-generation A100 out of the water. This means Nvidia’s AI training leadership is finally being seriously challenged.
Intel has a multi-pronged AI strategy. Basically, Intel’s strategy has been to infuse all of its silicon with AI capabilities. So when you hear a company like AMD (AMD) make fanfare about putting Xilinx AI in its silicon, Intel has actually been pursuing a similar strategy for years already. In particular:
- In CPUs, Intel has used its AVX-512 leadership and combined this with specific 8-bit and 16-bit AI instructions. In the upcoming Sapphire Rapids Xeon, Intel will finally include a full-blown AI accelerator called AMX that makes Nvidia’s midrange AI accelerators obsolete. (Intel claims 30x performance vs. Ice Lake, although I estimate the theoretical TOPS improvement at around 10x.)
- In FPGAs, Intel launched an FPGA with the equivalent of Nvidia’s Tensor Cores a few years ago, although this part is still on 14nm.
- In GPUs, Intel’s upcoming Ponte Vecchio should be about on-par with Nvidia’s upcoming Hopper.
- Besides this “traditional” silicon, Intel also has dedicated Habana NPUs (neural processing unit) for training and inference.
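The theoretical TOPS estimate in the CPU bullet above can be sketched with the standard peak-throughput formula: operations per cycle per core, times cores, times clock frequency. All parameter values below (the per-core INT8 throughput of AVX-512 VNNI and AMX, the core counts, and the clocks) are rough illustrative assumptions on my part, not confirmed Intel specifications:

```python
# Back-of-the-envelope theoretical INT8 TOPS: ops/cycle/core * cores * GHz.
# All parameter values are illustrative assumptions, not Intel specs.

def theoretical_tops(ops_per_cycle_per_core: int, cores: int, freq_ghz: float) -> float:
    """Peak throughput in tera-operations per second."""
    return ops_per_cycle_per_core * cores * freq_ghz * 1e9 / 1e12

# Assumed: AVX-512 VNNI at 256 INT8 ops/cycle/core (two 512-bit FMA ports),
# AMX at 2048 INT8 ops/cycle/core (one tile-multiply unit per core).
ice_lake = theoretical_tops(256, cores=40, freq_ghz=2.0)
sapphire = theoretical_tops(2048, cores=56, freq_ghz=2.0)

print(f"Ice Lake (VNNI):       ~{ice_lake:.0f} INT8 TOPS")
print(f"Sapphire Rapids (AMX): ~{sapphire:.0f} INT8 TOPS")
print(f"Speedup: ~{sapphire / ice_lake:.1f}x")
```

With these assumptions, the theoretical gap comes out near 11x, in the ballpark of my ~10x estimate rather than Intel’s 30x claim, which presumably reflects more than raw TOPS.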
To be clear upfront, I have been very critical about Intel’s Habana strategy and execution. Especially given that Intel will soon have a like-for-like competitor to Nvidia in the form of its Ponte Vecchio GPU, the necessity for the Habana accelerator isn’t quite clear. This is even more so given that the Gaudi part that launched on AWS last year was on an outdated 16nm process.
However, what the Habana Gaudi lacked in performance, it made up for in pricing. Intel/Habana claimed that it delivered a 40% higher performance/price compared to the A100. Habana was able to achieve this in four ways.
First, by being a dedicated NPU, Gaudi lacks all GPU functionality. This allowed for more deep learning hardware, which made Gaudi faster than Nvidia’s 16nm V100 part, and hence reduced the performance delta compared to the 7nm A100. Secondly, not only is Gaudi1 faster than V100, it also has a smaller silicon area, making it cheaper to manufacture. Thirdly, 16nm wafers are inherently cheaper than 7nm wafers, further widening the cost gap. Lastly, Nvidia is using its monopoly position in AI training to the fullest extent by charging ultra-high margin prices. By giving up on some gross margin, Habana was ultimately able to deliver a 16nm part that has a favorable performance per dollar compared to Nvidia’s offerings.
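The four factors above boil down to a performance-per-dollar comparison, which can be sketched as follows. The prices and relative performance figures are hypothetical placeholders chosen only to illustrate how a slower but cheaper 16nm part can still come out ahead on perf/$; they are not actual list prices:

```python
# Illustrative performance-per-dollar comparison. Prices and relative
# performance numbers are hypothetical placeholders, not actual list prices.

def perf_per_dollar(relative_perf: float, price_usd: float) -> float:
    return relative_perf / price_usd

# Assumption: normalize A100 performance to 1.0; assume Gaudi1 is somewhat
# slower per chip but priced well below the A100 (smaller die, cheaper
# 16nm wafers, lower target margins).
a100  = perf_per_dollar(relative_perf=1.00, price_usd=10_000)
gaudi = perf_per_dollar(relative_perf=0.85, price_usd=6_000)

advantage = gaudi / a100 - 1
print(f"Gaudi1 perf/$ advantage: {advantage:+.0%}")  # ~+42% with these inputs
```

Under these placeholder inputs, the advantage lands close to the ~40% figure Intel/Habana claimed.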
Nevertheless, the obvious next step was a move to 7nm, and this is what Habana just launched with Gaudi2. The remarkable thing here is that Gaudi2 launches just about half a year after Gaudi1 became available in AWS, or 1.5 years after the initial Gaudi1 announcement. So while Habana is still behind in process technology, as the A100 launched about two years ago already, this quick pace does restore some confidence in Habana’s execution.
Gaudi2 triples the number of cores compared to Gaudi. This allows Habana to claim a 2x performance advantage over the A100, as measured in a number of benchmarks (one shown below). In other words, Habana currently has definite leadership in AI performance (ignoring Cerebras’ Wafer-Scale Engine). Furthermore, Gaudi2 also leapfrogs the A100’s 80GB memory capacity and, like the first Gaudi, still relies on the open Ethernet interconnect instead of Nvidia’s proprietary NVLink:
Gaudi2 triples the in-package memory capacity from 32GB to 96GB of HBM2E at 2.45TB/sec bandwidth, and integrates 24 x 100GbE RoCE RDMA NICs, on-chip, for scaling-up and scaling-out using standard Ethernet.
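As a quick sanity check on the networking figure in the quote, the aggregate scale-out bandwidth of the 24 integrated 100GbE ports is simple arithmetic:

```python
# Aggregate scale-out bandwidth from Gaudi2's integrated NICs,
# per the quoted spec: 24 x 100GbE RoCE ports on-chip.
ports = 24
gbps_per_port = 100

total_gbps = ports * gbps_per_port   # 2400 Gb/s
total_gbytes = total_gbps / 8        # 300 GB/s

print(f"Aggregate Ethernet bandwidth: {total_gbps} Gb/s ({total_gbytes:.0f} GB/s)")
```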
While people may remark that Nvidia will launch its H100 Hopper GPU in Q3, making Habana’s leadership only short-lived (Nvidia has claimed 3x baseline performance over the A100), Gaudi2 will remain a compelling alternative, especially since some of the previously mentioned advantages (7nm wafers being less expensive than 5nm wafers, and Habana not targeting the same gross margins as Nvidia) remain valid.
Perhaps as one caveat, Gaudi2 does have a significantly higher TDP than the A100, so in terms of performance per watt, the difference is smaller.
Ultimately, what’s most bullish about Gaudi2 is that it delivers a 2x performance advantage over the A100 in the same process node. This suggests that Habana simply has a superior architecture that lacks all the legacy bloat due to the A100 being a GPU infused with AI capabilities. So when Habana finally reaches process node time to market parity, it will likely have a clear leadership. (If Gaudi2 had launched in 2020, it would have had a 2x leadership for 2 years instead of 2 months.)
Potential stock impact
A successful foray into AI could not only fuel Intel’s growth, but it could also serve as a key indicator of Intel’s technology leadership. If investors recognize this (along with Intel fixing some of its other issues over the coming years), then perhaps the stock market may reward Intel with a higher multiple, similar to Nvidia and others.
As a reminder, this was one of the two pillars of Pat Gelsinger’s “double double” strategy for the stock price: double the earnings at double the multiple.
The launch of Gaudi2 is mostly a moral victory. On the same process node, Habana claims that it delivers a 2x performance advantage over the A100. However, with Nvidia moving to the H100 next quarter, Habana’s leadership will admittedly be short-lived. Nevertheless, of the many AI chip startups that have come into existence, Habana’s Gaudi line is the first to seriously challenge Nvidia. The other encouraging sign is that Gaudi2 is launching just six months after Gaudi became available at AWS (although that was with a delay of six months), which suggests that Habana is steadily reducing its process disadvantage. Although AWS hasn’t announced any plans to introduce Gaudi2 yet, the part should allow Habana to maintain its existing performance per dollar advantage in the cloud.
From a higher level, 2022 marks an important year for Intel’s AI strategy, after many years of development. After Gaudi2, Intel will further launch its Ponte Vecchio GPU (which should have a similar performance as the H100) as well as Sapphire Rapids Xeon CPU with AMX matrix acceleration instructions – those are like Tensor cores inside each CPU, eliminating the need for a separate GPU. In the second half of the year, Intel will also launch Sapphire Rapids with HBM.
In summary, Nvidia is far from the only competitor anymore when it comes to AI. Nvidia’s leadership in market share mostly concerns the training part of creating AI models, not inference, where Xeons have been most widely deployed for years already. By the end of the year, Intel will have not one but three leadership products to compete in this space. Given Nvidia’s premium pricing and margins, the current status quo in market share seems untenable.