Why Replicate is Joining Cloudflare
Replicate, a platform for running AI models, has joined Cloudflare to build a comprehensive AI infrastructure layer, combining Replicate's model primitives with Cloudflare's network, Workers, and storage for full-stack AI development.

We are delighted to announce that Replicate has officially joined Cloudflare today.
When Replicate was founded in 2019, OpenAI had recently open-sourced GPT-2. At the time, few outside the machine learning community paid much attention to AI. However, for those of us within the field, it was clear that something monumental was on the horizon. Remarkable models were emerging from academic labs, yet running them required specialized expertise.
Our mission became to democratize these research models, making them accessible to developers. We envisioned programmers creatively adapting and transforming these models into innovative products that researchers might never have conceived.
We approached this challenge as a tooling problem. Just as platforms like Heroku simplified website deployment by abstracting server management, we aimed to build tools that let developers run models without needing to understand backpropagation or troubleshoot CUDA errors.
The first tool we developed was Cog: a standardized packaging format for machine learning models. Subsequently, we built Replicate as the platform to deploy Cog models as API endpoints in the cloud. We abstracted away both the low-level machine learning complexities and the intricate GPU cluster management necessary for large-scale inference.
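To give a flavor of what this looks like in practice, a Cog model is described by a `cog.yaml` file that declares its environment and points at a predictor. The fields below follow Cog's documented schema, but the specific versions and packages are illustrative, not taken from any real model:

```yaml
build:
  # Run on a GPU machine with a pinned Python version
  gpu: true
  python_version: "3.11"
  python_packages:
    - "torch==2.1.0"
# The entrypoint: a Predictor class defined in predict.py
predict: "predict.py:Predictor"
```

The referenced `predict.py` defines a `Predictor` class (subclassing Cog's `BasePredictor`) whose `setup` method loads weights once and whose `predict` method handles each request, which is what lets Replicate serve the packaged model as an API endpoint.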
Our timing proved opportune. When Stable Diffusion was released in 2022, we had mature infrastructure ready to handle the surge of developer interest in running these models. Many fantastic applications and products were built on Replicate, often packaging a single model behind a sleek user interface to address a specific use case.
Since then, AI Engineering has evolved into a sophisticated discipline. Modern AI applications extend beyond merely running models. Today's AI stack encompasses model inference alongside microservices, content delivery, object storage, caching, databases, telemetry, and more. We observe many of our customers constructing complex, heterogeneous stacks where Replicate models constitute one component within a broader system spanning multiple platforms.
This evolution is precisely why we are joining Cloudflare. Replicate offers the foundational tools and primitives for running models, while Cloudflare provides an unparalleled network, Workers, R2, Durable Objects, and all the other essential primitives needed to construct a complete AI stack.
The modern AI stack fundamentally resides on the network. Models execute on data center GPUs and are interconnected by compact cloud functions that interface with vector databases, retrieve objects from blob storage, communicate with MCP servers, and so forth. The adage, "The network is the computer," has never been more pertinent.
At Cloudflare, we will now be empowered to build the AI infrastructure layer we have envisioned since our inception. This includes capabilities such as running fast models at the edge, deploying model pipelines on instantly booting Workers, and streaming model inputs and outputs via WebRTC.
We are immensely proud of our achievements at Replicate. We pioneered the generative AI serving platform, establishing abstractions and design patterns that many of our peers have since adopted. We have cultivated a remarkable community of builders and researchers around our product.
Cloudflare's connectivity cloud offers robust protection for entire corporate networks, enables customers to efficiently build internet-scale applications, accelerates websites and internet applications, effectively wards off DDoS attacks, keeps hackers at bay, and supports organizations on their journey to Zero Trust.