ChannelLife New Zealand - Industry insider news for technology resellers

How Together AI is accelerating AI with purpose-built cloud infrastructure

Yesterday

Together AI is reshaping cloud infrastructure to better support artificial intelligence (AI) workloads. The company's Vice President of Engineering, Charles Srisuwananukorn, recently outlined how rethinking every layer of the technology stack has driven efficiency and performance.

"AI workloads are massively parallel to a level that we've rarely seen before," Charles said.

"They typically use specialised hardware, almost always GPUs, and they push this hardware all the way to the edge."

To address this, Together AI has adopted a comprehensive approach — from hardware optimisation to software improvements and managed services.

Hardware foundations

Together AI's infrastructure is built on NVIDIA's latest GPUs, including the GB200 NVL72, B200, H200, H100, and A100.

"Every time a new chip comes out from NVIDIA, we quickly optimise all of our software and integrate it into our platform," Charles explained.

The GPUs are connected with InfiniBand and Spectrum-X Ethernet networks to ensure fast, non-blocking communication. AI-native storage solutions such as WEKA and VAST Data are also integrated to speed access to training data and boost write throughput for checkpointing.

Charles emphasised that much of Together AI's effort is spent refining the hardware itself. "We're constantly poring over network diagrams and making sure that it follows the NVIDIA reference architecture," he said. "We run benchmark after benchmark and test after test to ensure everything is running perfectly and efficiently."

Software advancements

Alongside hardware improvements, Together AI has prioritised optimising its software to fully utilise GPU capabilities. A key development has been the company's proprietary Together Kernel Collection, which significantly improves model training and inference speeds.

"Flash Attention, created by our chief scientist Tri Dao, is a great example of our work," Charles said. "By understanding hardware deeply and writing software tailored to it, we've achieved significant gains in efficiency and performance."

Flash Attention speeds up large language model (LLM) training by up to three times, he explained, helping developers train models faster and more efficiently. Together AI's kernel collection is available to developers to help improve their own projects.
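As a rough illustration of what fused kernels like Flash Attention improve on, the sketch below computes standard attention in NumPy, materialising the full sequence-by-sequence score matrix. This is not Together AI's kernel code; it is a minimal reference computation showing the quadratic intermediate that tiled, hardware-aware kernels avoid writing to memory.

```python
import numpy as np

def naive_attention(q, k, v):
    """Standard attention: materialises the full (seq_len x seq_len) score
    matrix, whose memory grows quadratically with sequence length. Fused
    kernels such as Flash Attention compute the same result in tiles,
    never storing this intermediate in full."""
    scores = q @ k.T / np.sqrt(q.shape[-1])          # (seq, seq) intermediate
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)   # row-wise softmax
    return weights @ v                               # (seq, head_dim) output

rng = np.random.default_rng(0)
seq_len, head_dim = 1024, 64  # illustrative sizes
q, k, v = (rng.standard_normal((seq_len, head_dim)) for _ in range(3))

out = naive_attention(q, k, v)
print(out.shape)  # (1024, 64)
# The transient score matrix alone holds seq_len**2 values, per head per
# layer, which is what makes long-context attention memory-bound:
print(seq_len ** 2 * 8 / 1e6, "MB")  # ~8.4 MB at float64 for one head
```

At production sequence lengths the score matrix dwarfs the inputs, which is why recomputing it in on-chip tiles, as Flash Attention does, trades cheap FLOPs for scarce memory bandwidth.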

"These kernels can accelerate Llama 2 70B model training by about 90%," Charles said. "For inference, optimised kernels improve performance by 75% using FP8 and BF16 quantisation."

This comprehensive optimisation strategy has allowed Together AI to outpace competitors in adopting new technologies.

"Shortly after DeepSeek R1 was released, Together AI became the fastest inference provider by a significant margin," Charles said. "This success comes from having an optimised stack ready to deploy new models quickly."

Managed services and instant clusters

Recognising that some developers prefer to avoid the complexities of managing infrastructure, Together AI also provides managed services for serverless inference and fine-tuning.

"Our APIs and developer tools provide a seamless experience for deploying AI," Charles explained. "We also offer AI advisory services to ensure customers are using the latest techniques and frameworks effectively."

In addition, Together AI launched Together Instant Clusters, which offer self-service GPU clusters that users can set up within minutes. These clusters are designed for distributed AI workloads such as training and inference and provide performance equivalent to traditional bare-metal deployments.

"We've benchmarked Together Instant Clusters repeatedly to ensure they provide bare-metal performance," Charles said.

"They also offer flexible deployment options, allowing users to adjust their cluster size and software configurations without long-term commitments."

Driving innovation

Together AI's optimised platform has already delivered significant benefits to forward-thinking companies. Charles pointed out that the rapidly evolving nature of AI demands ongoing refinement of cloud infrastructure.

"As AI continues to evolve with innovations such as reasoning models, we will face new challenges that require us to rethink how we provision, manage and run infrastructure," he said. "That's what we do at Together AI — to help our customers push the boundaries of what's possible."
