
REGIONAL CHANNELS
NORTH AMERICA

SNOWFLAKE PARTNERS WITH META TO HOST LLAMA 3.1 MULTILINGUAL OPEN SOURCE LLMS

Snowflake, the AI Data Cloud company, announced that it will host the Llama 3.1 collection of multilingual open source large language models in Snowflake Cortex AI, enabling enterprises to easily harness the models and build powerful AI applications at scale. The offering includes Meta’s largest and most powerful open source LLM, Llama 3.1 405B, with Snowflake developing and open sourcing the inference system stack to enable real-time, high-throughput inference and further democratise powerful natural language processing and generation applications.
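For context, a hedged sketch of what calling a Cortex-hosted Llama model can look like from Python via Snowflake’s snowflake-ml-python package. The model identifier, credentials, and prompt below are illustrative assumptions, not details from the announcement:

# Illustrative only: model name, credentials and prompt are assumptions;
# check Snowflake's documentation for the identifiers in your account.
from snowflake.snowpark import Session
from snowflake.cortex import Complete

connection_parameters = {
    "account": "<account>",   # placeholder credentials
    "user": "<user>",
    "password": "<password>",
}
session = Session.builder.configs(connection_parameters).create()

# Cortex runs inference server-side; the client just sends the prompt.
reply = Complete("llama3.1-405b",
                 "Summarise this quarter's pipeline in two sentences.",
                 session=session)
print(reply)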

Snowflake’s industry-leading AI Research Team has optimised Llama 3.1 405B for both inference and fine-tuning, supporting a massive 128K context window from day one, while enabling real-time inference with up to 3x lower end-to-end latency and 1.4x higher throughput than existing open source solutions.
Moreover, it allows for fine-tuning of the massive model using just a single GPU node, eliminating cost and complexity for developers and users, all within Cortex AI.
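The article does not detail Snowflake’s fine-tuning method, but the general memory-saving idea behind single-node fine-tuning of very large models can be sketched with off-the-shelf tools: freeze quantised base weights and train only small adapter matrices. A minimal sketch, assuming a Hugging Face checkpoint and hypothetical hyperparameters; this is not Snowflake’s stack:

# Illustrative only: quantised base weights plus LoRA adapters, the common
# recipe for fitting fine-tuning onto far less GPU memory.
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

model_id = "meta-llama/Llama-3.1-405B"  # hypothetical checkpoint name

# Load the frozen base weights in 4-bit to shrink the memory footprint.
bnb = BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_compute_dtype=torch.bfloat16)
model = AutoModelForCausalLM.from_pretrained(
    model_id, quantization_config=bnb, device_map="auto"
)

# Train only low-rank adapter matrices instead of all 405B parameters.
lora = LoraConfig(r=16, lora_alpha=32,
                  target_modules=["q_proj", "v_proj"], task_type="CAUSAL_LM")
model = get_peft_model(model, lora)
model.print_trainable_parameters()  # a tiny fraction of the full model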
By partnering with Meta, Snowflake is providing customers with easy, efficient, and trusted ways to seamlessly access, fine-tune, and deploy Meta’s newest models in the AI Data Cloud, with a comprehensive approach to trust and safety built in at the foundational level.
Massive model scale and memory requirements pose significant challenges for users aiming to achieve low-latency inference for real-time use cases, high throughput for cost effectiveness, and long context support for various enterprise-grade generative AI use cases. The memory requirements of storing model and activation states also make fine-tuning extremely challenging, with the large GPU clusters required to fit the model states for training often inaccessible to data scientists.
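To make that scale concrete, a back-of-envelope calculation using standard rules of thumb rather than figures from the article: at two bytes per parameter in bfloat16, the 405B weights alone need roughly 810 GB, and naive Adam fine-tuning multiplies that several times over before activations are counted.

# Back-of-envelope memory arithmetic for a 405B-parameter model (rules of
# thumb, not figures from the article).
params = 405e9

weights_bf16_gb = params * 2 / 1e9    # 2 bytes/param in bfloat16 -> ~810 GB
# Naive Adam fine-tuning holds fp32 master weights, gradients and two
# optimiser moments: roughly 16 bytes per parameter before activations.
train_state_gb = params * 16 / 1e9    # -> ~6,480 GB

print(f"inference weights: ~{weights_bf16_gb:,.0f} GB")
print(f"training state:    ~{train_state_gb:,.0f} GB")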
“Snowflake’s world-class AI Research Team is blazing a trail for how enterprises and the open source community can harness state-of-the-art open models like Llama 3.1 405B for inference and fine-tuning in a way that maximises efficiency,” said Vivek Raghunathan, VP of AI Engineering, Snowflake.
“We are not just bringing Meta’s cutting-edge models directly to our customers through Snowflake Cortex AI. We are arming enterprises and the AI community with new research and open source code that supports 128K context windows, multi-node inference, pipeline parallelism, 8-bit floating point quantisation, and more to advance AI for the broader ecosystem.”
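Of the techniques the quote lists, 8-bit floating point quantisation is the simplest to illustrate. A minimal sketch of per-tensor FP8 (E4M3) weight quantisation in PyTorch; the scaling scheme is a common textbook approach, not Snowflake’s implementation:

# Per-tensor FP8 (E4M3) weight quantisation; illustrative only.
import torch

def quantise_fp8(w: torch.Tensor):
    """Scale a tensor into the FP8 E4M3 range, then cast down to 1 byte/value."""
    fp8_max = torch.finfo(torch.float8_e4m3fn).max    # ~448 for E4M3
    scale = w.abs().max().clamp(min=1e-12) / fp8_max  # per-tensor scale factor
    return (w / scale).to(torch.float8_e4m3fn), scale

def dequantise_fp8(w8: torch.Tensor, scale: torch.Tensor) -> torch.Tensor:
    return w8.to(torch.bfloat16) * scale              # cast back up for compute

weights = torch.randn(4096, 4096)                     # stand-in weight matrix
w8, s = quantise_fp8(weights)
print(w8.element_size())                              # 1 byte, vs 2 for bf16
print((dequantise_fp8(w8, s) - weights).abs().max())  # small quantisation error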
In tandem with the launch of Llama 3.1 405B, Snowflake’s AI Research Team is now open sourcing its Massive LLM Inference and Fine-Tuning System Optimisation Stack in collaboration with DeepSpeed, Hugging Face, vLLM, and the broader AI community. This breakthrough establishes a new state-of-the-art for open source inference and fine-tuning systems for multi-hundred-billion-parameter models.
Snowflake’s Massive LLM Inference and Fine-Tuning System Optimisation Stack addresses these challenges. By using advanced parallelism techniques and memory optimisations, Snowflake enables fast and efficient AI processing without needing complex and expensive infrastructure. •
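As a hedged illustration of one such parallelism technique, the sketch below shows the basic placement idea behind pipeline parallelism: successive layers live on different GPUs and activations are handed from one stage to the next. Real systems additionally overlap micro-batches across stages; this is not Snowflake’s stack.

# Minimal two-stage pipeline placement (assumes two CUDA devices). Each stage
# holds a slice of the network and activations are handed between GPUs.
import torch
import torch.nn as nn

stage0 = nn.Sequential(nn.Linear(4096, 4096), nn.GELU()).to("cuda:0")
stage1 = nn.Sequential(nn.Linear(4096, 4096)).to("cuda:1")

def forward(x: torch.Tensor) -> torch.Tensor:
    h = stage0(x.to("cuda:0"))     # stage 0 computes on GPU 0
    return stage1(h.to("cuda:1"))  # stage 1 continues on GPU 1

out = forward(torch.randn(8, 4096))
print(out.shape)                   # torch.Size([8, 4096])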