Highlights from NVIDIA's GTC Conference 2024

NVIDIA announces new Blackwell compute platform for trillion-parameter-scale generative AI

In a few short years, NVIDIA’s annual GPU Technology Conference (GTC, for short) has grown beyond a developer event into the key venue for finding out what’s happening in AI. At the 2024 opening keynote, the company’s CEO, Jensen Huang, made several important product, platform and partnership announcements that showcased NVIDIA’s dominance in hardware and its leadership in the generative AI space. I won’t enumerate all the details, as these announcements have been covered heavily by the tech community. Here are a few thoughts and highlights:

Hardware and History

The big hardware unveil was the Blackwell compute platform and its new AI chip, christened the B200. For the techies: the new Blackwell architecture links two dies with a 10 terabyte/second chip-to-chip interconnect, creating a single GPU with over 200 billion transistors. This was the anticipated big leap forward.

NVIDIA is building with immense scale, and demand for such scale, in mind. The newest DGX B200 systems (with eight B200 GPUs each) and DGX SuperPODs (32 of the DGX B200 systems) enable enormous workloads. The largest tech companies will be lining up to purchase thousands of these to power their data centers.

It is crazy to think that a mere 12 years ago, Alex Krizhevsky, Ilya Sutskever and Geoff Hinton published their stunning results on the ImageNet computer vision challenge using only two GPUs (each with 3 billion transistors) to perform the convolutional operations. The model trained for that image recognition task had 60 million parameters. NVIDIA is now aiming at customers who will train models scaling past 1 trillion parameters; OpenAI’s GPT-4 is thought to have in the neighborhood of 1.76 trillion. AI companies are seeking such vast compute to test and eventually achieve artificial general intelligence, or AGI, and NVIDIA’s hardware is integral to that roadmap, at least for those who believe AGI can be reached with existing algorithms and compute.

Software and Platforms from NVIDIA: NIM

A major component of NVIDIA’s strategy is to build out a complete platform, hardware plus software, that lets organizations of any size jump in with all the technology they need for their AI journey. The NVIDIA presentations and marketing made this clear. The most important example is the announcement of NIM, NVIDIA Inference Microservices: enterprise-grade generative AI microservices that businesses can use to build applications. Each NIM can be viewed as a container packaging a model (open or proprietary) that can run anywhere and, importantly for businesses, can be deployed anywhere as well: in the cloud, on a server, or even in a serverless mode. The containers are highly optimized for NVIDIA’s GPUs and bundle software to speed up inference. The NIM microservices are available in NVIDIA AI Enterprise 5.0, the software release announced at GTC 2024.
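To give a feel for what building against a NIM looks like, here is a minimal sketch. It assumes a chat-model NIM running locally that exposes an OpenAI-compatible endpoint (as NVIDIA’s LLM microservices are designed to do); the port, model name and API key below are illustrative assumptions, not details from the announcement.

```python
# Minimal sketch: querying a locally deployed NIM container through an
# OpenAI-compatible endpoint. The base URL, port and model name are
# assumptions for illustration; check the specific NIM's documentation.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8000/v1",  # hypothetical local NIM endpoint
    api_key="not-used",                   # a local container typically needs no real key
)

response = client.chat.completions.create(
    model="meta/llama3-8b-instruct",      # example model; substitute the NIM you deployed
    messages=[{"role": "user",
               "content": "Summarize the Blackwell announcement in one sentence."}],
    max_tokens=128,
)
print(response.choices[0].message.content)
```

The appeal for enterprises is that the same client code works whether the container runs on a laptop, in a private data center, or in the cloud; only the base URL changes.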

Business Impact: RAG deployment models

One of the strongest Gen AI use cases that NVIDIA appears to support well is retrieval-augmented generation (RAG). At the conference and on NVIDIA’s website, key vector database vendors and libraries such as Weaviate, Redis and Faiss were announced as supporting NIM. NVIDIA already offers NeMo Retriever microservices within NIM deployments, and integrating NeMo Retriever with vector databases should make RAG applications more robust.
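To make the pattern concrete, below is a stripped-down sketch of retrieve-then-generate. It assumes an embedding microservice and an LLM microservice reachable over OpenAI-compatible endpoints; the ports, model names and the tiny in-memory “vector store” are illustrative stand-ins. A production deployment would use a real vector database such as Weaviate or Redis and NeMo Retriever for embedding and reranking.

```python
# Sketch of the retrieve-then-generate (RAG) pattern.
# Assumptions: an embedding service on port 8001 and an LLM service on port 8000,
# both OpenAI-API compatible; model names are examples only.
import numpy as np
from openai import OpenAI

embed_client = OpenAI(base_url="http://localhost:8001/v1", api_key="not-used")
llm_client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-used")

def embed(texts):
    # Return one embedding vector per input text.
    resp = embed_client.embeddings.create(
        model="nvidia/nv-embedqa-e5-v5",  # example embedding model name
        input=texts,
    )
    return np.array([d.embedding for d in resp.data])

# 1. Index a few documents (a real deployment would store these in a vector DB).
docs = [
    "The DGX B200 system packs eight Blackwell GPUs.",
    "NIM microservices package models with an optimized inference stack.",
    "NeMo Retriever provides embedding and reranking microservices.",
]
doc_vectors = embed(docs)

# 2. Retrieve the document most similar to the question (cosine similarity).
question = "What does a DGX B200 contain?"
q_vec = embed([question])[0]
scores = doc_vectors @ q_vec / (
    np.linalg.norm(doc_vectors, axis=1) * np.linalg.norm(q_vec)
)
context = docs[int(scores.argmax())]

# 3. Generate an answer grounded in the retrieved context.
answer = llm_client.chat.completions.create(
    model="meta/llama3-8b-instruct",  # example LLM microservice
    messages=[
        {"role": "system", "content": f"Answer using this context: {context}"},
        {"role": "user", "content": question},
    ],
)
print(answer.choices[0].message.content)
```

The point of the sketch is the shape of the pipeline, embed, retrieve, then generate, which is exactly the part that NIM plus a vector database is meant to package for the enterprise.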

Use Cases and Industry Presentations

The conference featured tons of presentations on Gen AI applications from nearly every industry. I will feature a few that I attended in another blog post. I was particularly struck by advancements in healthcare and drug discovery.
