Red Hat unlocks Generative AI

Red Hat AI Inference Server, powered by vLLM and enhanced with Neural Magic technologies, delivers faster, higher-performing and more cost-efficient AI inference across the hybrid cloud.

  • Sunday, 25th May 2025. Posted 16 hours ago by Phil Alsop

Red Hat has introduced Red Hat AI Inference Server, a significant step towards democratising generative AI (gen AI) across the hybrid cloud. A new offering within Red Hat AI, the enterprise-grade inference server is born from the powerful vLLM community project and enhanced by Red Hat’s integration of Neural Magic technologies. It offers greater speed, accelerator efficiency and cost-effectiveness, helping deliver Red Hat’s vision of running any gen AI model on any AI accelerator in any cloud environment. Whether deployed standalone or as an integrated component of Red Hat Enterprise Linux AI (RHEL AI) and Red Hat OpenShift AI, the platform empowers organisations to deploy and scale gen AI in production with greater confidence.

Inference is the critical execution engine of AI, where pre-trained models translate data into real-world impact. It’s the pivotal point of user interaction, demanding swift and accurate responses. As gen AI models explode in complexity and production deployments scale, inference can become a significant bottleneck, devouring hardware resources, crippling responsiveness and inflating operational costs. Robust inference servers are no longer a luxury but a necessity for unlocking the true potential of AI at scale and for navigating its underlying complexities with greater ease.

Red Hat directly addresses these challenges with Red Hat AI Inference Server — an open inference solution engineered for high performance and equipped with leading model compression and optimisation tools. This innovation empowers organisations to fully tap into the transformative power of gen AI by delivering dramatically more responsive user experiences and unparalleled freedom in their choice of AI accelerators, models and IT environments.

vLLM: Extending inference innovation

Red Hat AI Inference Server builds on the industry-leading vLLM project, which started at the University of California, Berkeley in mid-2023. The community project delivers high-throughput gen AI inference, support for large input contexts, multi-GPU model acceleration, continuous batching and more.
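To make those capabilities concrete, here is a minimal sketch using vLLM’s offline Python API; the model ID, GPU count and sampling settings are illustrative assumptions, not details from this article.

```python
# Minimal vLLM offline-inference sketch (assumes a GPU host with vLLM
# installed: pip install vllm). Model ID and settings are illustrative only.
from vllm import LLM, SamplingParams

prompts = [
    "Summarise the benefits of paged attention in one sentence.",
    "What is continuous batching?",
]

# Low-temperature decoding with a cap on generated tokens.
sampling = SamplingParams(temperature=0.2, max_tokens=128)

# tensor_parallel_size shards the model across GPUs (multi-GPU acceleration);
# vLLM schedules the prompts with continuous batching under the hood.
llm = LLM(model="meta-llama/Llama-3.1-8B-Instruct", tensor_parallel_size=2)

for output in llm.generate(prompts, sampling):
    print(output.outputs[0].text)
```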

vLLM’s broad support for publicly available models – coupled with its day zero integration of leading frontier models including DeepSeek, Gemma, Llama, Mistral, Phi and others, as well as open, enterprise-grade reasoning models like Llama Nemotron – positions it as a de facto standard for future AI inference innovation. Leading frontier model providers are increasingly embracing vLLM, solidifying its critical role in shaping gen AI’s future.

Introducing Red Hat AI Inference Server

Red Hat AI Inference Server packages the leading innovation of vLLM and forges it into an enterprise-grade product. It is available as a standalone containerised offering or as part of both RHEL AI and Red Hat OpenShift AI.
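Because vLLM-based servers expose an OpenAI-compatible HTTP API, client code can stay the same wherever the server runs. The sketch below assumes an inference server already listening on localhost:8000; the base URL, API key and model name are placeholders, not values from this article.

```python
# Querying a running vLLM-based inference server through its OpenAI-compatible
# endpoint (pip install openai). base_url, api_key and model name below are
# assumptions for illustration.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

response = client.chat.completions.create(
    model="demo-model",  # whatever model the server was launched with
    messages=[{"role": "user",
               "content": "Give one use case for hybrid cloud inference."}],
    max_tokens=100,
)
print(response.choices[0].message.content)
```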

Across any deployment environment, Red Hat AI Inference Server provides users with a hardened, supported distribution of vLLM, along with:

  • Intelligent LLM compression tools for dramatically reducing the size of both foundational and fine-tuned AI models, minimising compute consumption while preserving and potentially enhancing model accuracy (see the compression sketch after this list).

  • An optimised model repository, hosted in the Red Hat AI organisation on Hugging Face, offering instant access to a validated and optimised collection of leading AI models ready for inference deployment and helping to accelerate efficiency by 2-4x without compromising model accuracy (see the serving sketch after this list).

  • Red Hat’s enterprise support and decades of expertise in bringing community projects to production environments.

  • Third-party support for even greater deployment flexibility, enabling Red Hat AI Inference Server to be deployed on non-Red Hat Linux and Kubernetes platforms pursuant to Red Hat’s third-party support policy.
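As a rough sketch of what the compression workflow can look like, the example below follows the public one-shot FP8 quantisation flow of the open source LLM Compressor project that emerged from Neural Magic’s work; the import paths, model ID and quantisation scheme are assumptions drawn from that project’s examples rather than from this article.

```python
# One-shot FP8 quantisation sketch in the style of the open source
# llm-compressor project (pip install llmcompressor). Import paths, model ID
# and scheme are assumptions based on that project's public examples.
from transformers import AutoModelForCausalLM, AutoTokenizer
from llmcompressor.modifiers.quantization import QuantizationModifier
from llmcompressor.transformers import oneshot

MODEL_ID = "meta-llama/Llama-3.1-8B-Instruct"  # placeholder model

model = AutoModelForCausalLM.from_pretrained(MODEL_ID, torch_dtype="auto")
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)

# Quantise all Linear layers to dynamic FP8, leaving the output head intact.
recipe = QuantizationModifier(
    targets="Linear", scheme="FP8_DYNAMIC", ignore=["lm_head"]
)

# One-shot compression (data-free for dynamic FP8); the saved checkpoint is
# smaller and can be served directly by vLLM.
oneshot(model=model, recipe=recipe)

SAVE_DIR = MODEL_ID.split("/")[-1] + "-FP8-dynamic"
model.save_pretrained(SAVE_DIR)
tokenizer.save_pretrained(SAVE_DIR)
```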
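And consuming a pre-optimised checkpoint from a Hugging Face organisation is a one-liner in vLLM; the repository ID below is hypothetical, standing in for whichever validated model is published in the Red Hat AI organisation.

```python
# Serving a pre-optimised checkpoint with vLLM. The repo ID is hypothetical;
# substitute a validated model from the Red Hat AI Hugging Face organisation.
from vllm import LLM, SamplingParams

llm = LLM(model="RedHatAI/Llama-3.1-8B-Instruct-FP8-dynamic")  # placeholder ID
out = llm.generate(["What does FP8 quantisation buy me?"],
                   SamplingParams(max_tokens=64))
print(out[0].outputs[0].text)
```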

Red Hat’s vision: Any model, any accelerator, any cloud

The future of AI must be defined by limitless opportunity, not constrained by infrastructure silos. Red Hat sees a horizon where organisations can deploy any model, on any accelerator, across any cloud, delivering an exceptional, more consistent user experience without exorbitant costs. To unlock the true potential of gen AI investments, enterprises require a universal inference platform – a standard for more seamless, high-performance AI innovation, both today and in the years to come.

Just as Red Hat pioneered the open enterprise by transforming Linux into the bedrock of modern IT, the company is now poised to architect the future of AI inference. vLLM has the potential to become a linchpin for standardised gen AI inference, and Red Hat is committed to building a thriving ecosystem around not just the vLLM community but also llm-d for distributed inference at scale. The vision is clear: regardless of the AI model, the underlying accelerator or the deployment environment, Red Hat intends to make vLLM the definitive open standard for inference across the new hybrid cloud.
