🗞️PatelBytes🗞️: Cloudflare Release Edition
Cloudflare is amazing!
Cloudflare: Jack of all trades, Master of All?
Cloudflare had some interesting releases. If you are not following, Cloudflare is bringing out products in non-CDN areas such as AI, Big Data, UI deployment, etc.
I'll give an overview of the non-CDN releases from Cloudflare.
Side note: Cloudflare just made my week bad. So many releases that I only wrote down the things I know about. Read more on their blog here or follow someone else. Sorry.
1. Workers AI
a. Async Batch API
The new async Batch API allows queuing requests that can be batched together. Why is this needed? Good question: for many use cases, such as processing long PDFs, it is fine for LLM inference (aka the LLM's response) to arrive at a later stage rather than fail instantly.
It is as simple as enabling a flag: queueRequest.
let res = await env.AI.run("<your-model>", {
  "requests": [{
    "prompt": "Explain mechanics of wormholes"
  }, {
    "prompt": "List different plant species found in America"
  }]
}, {
  queueRequest: true // queue the batch instead of running it synchronously
});
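When queued, the call does not return the model output right away. A hedged follow-up sketch, assuming (per my reading of the Batch API announcement) the queued response carries a request_id you can pass back to the same model to fetch results once they are ready:

// Assumption: a queued run responds with a request_id instead of results.
// Later (e.g., from a cron-triggered Worker), poll the same model with it.
const status = await env.AI.run("<your-model>", {
  request_id: res.request_id, // field name assumed from the announcement
});
console.log(status); // results once the batch has completed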
b. Expanded LoRA model support
Low-Rank Adaptation (LoRA) is a way to fine-tune open-source models. Cloudflare provided this last year with the Llama 2 (Meta) and Gemma (Google) models. You can fine-tune by leveraging the PEFT (Parameter-Efficient Fine-Tuning) library along with Hugging Face's AutoTrain library.
Cloudflare has expanded this year to:
| Model | Status | Company | Parameters |
|---|---|---|---|
| Deepseek R1 | Soon | Deepseek | 32B |
| Gemma 3 | Soon | Google | 12B |
| Mistral Small 3.1 | Soon | MistralAI | 24B |
| Llama 3.2 Vision | Soon | Meta | 11B |
| Llama 3.3 Instruct | Soon | Meta | 70B, FP8 |
| Llama Guard | Ready | Meta | 8B |
| Llama 3.1 | Soon | Meta | 8B |
| Qwen 2.5 Coder | Ready | Alibaba | 32B |
| Qwen | Ready | Alibaba | 32B |
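At inference time, you point a normal run at your adapter. A minimal sketch based on the BYO-LoRA flow; the model name is a placeholder, and the exact shape of the lora option should be checked against the docs:

// Run a LoRA-compatible base model with your fine-tuned adapter applied.
const out = await env.AI.run("<lora-compatible-model>", {
  prompt: "Classify this support ticket...",
  raw: true,                // skip the default chat template (assumed)
  lora: "<your-adapter-id>" // id of the LoRA adapter you uploaded
});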
c. Speculative decoding
LLM inference is the process of serving LLM models at runtime. Since an LLM works by predicting the next token (token n+1), Cloudflare optimizes this by adding a draft model which speculates several tokens ahead (up to token n+x). The base model then checks the draft's predictions and decides which of them can be incorporated. Evaluating the draft model's output with the base model is computationally less expensive than generating each token itself, hence leading to faster LLM inference.
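To make that concrete, here is a tiny, model-agnostic sketch of the draft-then-verify loop (draftNext, baseVerify, and baseNext are hypothetical stand-ins, not Cloudflare APIs):

// Minimal sketch of speculative decoding with hypothetical model functions.
type Token = string;

function speculativeDecode(
  prompt: Token[],
  draftNext: (ctx: Token[]) => Token,                     // cheap draft model: next token
  baseVerify: (ctx: Token[], drafts: Token[]) => number,  // how many drafts the base model accepts
  baseNext: (ctx: Token[]) => Token,                      // expensive base model: next token
  draftLen = 4,
  maxTokens = 64,
): Token[] {
  const out = [...prompt];
  while (out.length - prompt.length < maxTokens) {
    // 1. Draft model speculates draftLen tokens ahead (cheap).
    const drafts: Token[] = [];
    for (let i = 0; i < draftLen; i++) {
      drafts.push(draftNext([...out, ...drafts]));
    }
    // 2. Base model verifies all drafts in one pass (cheaper than
    //    generating them one by one) and accepts a prefix of them.
    const accepted = baseVerify(out, drafts);
    out.push(...drafts.slice(0, accepted));
    // 3. If a draft was rejected, the base model supplies the next token.
    if (accepted < draftLen) out.push(baseNext(out));
  }
  return out;
}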
2. Data
a. R2 Data Catalog: Apache Iceberg tables
If you don't know, S3's use case is not just dumb object storage. It can also be used as a datastore for OLTP and OLAP workloads. And for the uninitiated on R2 and Apache Iceberg:
- Cloudflare R2 is an S3-compatible object storage.
- Apache Iceberg is a table format for querying analytical data from object storage such as S3, stored in file formats such as Parquet and ORC. It supports ACID transactions, time-travel queries, and schema evolution.
Why is this an important release? Because AWS S3 is expensive in terms of egress fees, and Cloudflare provides R2, which is S3-compatible with ZERO egress fees!
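Because R2 Data Catalog speaks the standard Iceberg REST catalog protocol, any Iceberg client can point at it. A hedged sketch using plain fetch; the catalog URI and token are placeholders, and the /v1/namespaces path comes from the Iceberg REST spec rather than anything Cloudflare-specific:

// Placeholders: take the real catalog URI and API token from your dashboard.
const CATALOG_URI = "<your-r2-data-catalog-uri>";
const TOKEN = "<cloudflare-api-token>";

// List namespaces via the standard Iceberg REST catalog endpoint.
const res = await fetch(`${CATALOG_URI}/v1/namespaces`, {
  headers: { Authorization: `Bearer ${TOKEN}` },
});
console.log(await res.json());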
b. Streaming ingestion with Arroyo and Pipelines
Arroyo is a real-time stream processing engine for analytical workloads, with subsecond query times for joins, window functions, aggregate functions, and OLAP joins on real-time data. It is an improvement over the de-facto streaming engine, Apache Flink, which carries a lot of overhead and operational complexity due to it being JVM-based.
Cloudflare has bought Arroyo to work together with Pipelines, a Cloudflare product that ingests MBs of data in real time into Cloudflare R2.
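What ingestion looks like in practice, as a hedged sketch: Pipelines gives each pipeline an HTTP ingest endpoint you can POST JSON records to (the URL below is a placeholder, not copied from the docs):

// Placeholder ingest URL: each pipeline exposes its own HTTP endpoint.
const PIPELINE_URL = "<your-pipeline-http-endpoint>";

// POST a small batch of JSON records; Pipelines buffers them and
// writes them out to R2 in the background.
await fetch(PIPELINE_URL, {
  method: "POST",
  headers: { "Content-Type": "application/json" },
  body: JSON.stringify([
    { event: "page_view", url: "/pricing", ts: Date.now() },
    { event: "click", target: "signup", ts: Date.now() },
  ]),
});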
c. Read Replication with D1
Cloudflare provides a relational database on the edge called D1 (SQLite underneath).
With D1 globally distributed, it is extremely fast, as users are always close to the database, but that speed comes at the cost of consistency.
So Cloudflare added read replication with consistency in mind: it uses D1's Sessions API along with bookmarking to keep track of the state of the data.
The old way of querying D1 looked something like this (a minimal sketch):
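// Plain D1 query: the read goes to whichever copy the binding routes it to,
// with no way to reason about how fresh that copy is.
const { results } = await env.DB
  .prepare("SELECT * FROM orders WHERE user_id = ?") // hypothetical table
  .bind(userId)
  .all();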
Now, with bookmarking, the bookmark records how fresh the data a read sees must be. For example, if your session's bookmark is 100, the read can only be served by a replica that has caught up to bookmark 100 or beyond. This ensures that the latest data is queried, solving those pesky consistency bugs.
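A minimal sketch of the new way, based on the Sessions API shape (withSession / getBookmark); treat the constraint string and field names as things to verify against the D1 docs:

// Start a session. Passing a stored bookmark makes every read in this
// session at least as fresh as that bookmark; "first-unconstrained"
// lets the first query hit any replica.
const session = env.DB.withSession(previousBookmark ?? "first-unconstrained");

const { results } = await session
  .prepare("SELECT * FROM orders WHERE user_id = ?")
  .bind(userId)
  .all();

// Persist this (e.g., in a cookie) and pass it back on the next request.
const bookmark = session.getBookmark();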
3. UI all the way
Cloudflare has brought out so many releases to make UI deployment a breeze.
Their Pages product is for static sites such as SPAs and plain HTML pages.
And their Workers product is like a serverless platform. But it runs on V8, so only a limited set of runtimes is supported (a minimal Worker sketch follows the list):
- JS and TypeScript
- Rust (done via JS plumbing using wasm-bindgen)
- Python (with the help of Pyodide, which is a WASM-based runtime for Python)
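For reference, a minimal Worker is just an exported fetch handler:

// Minimal Cloudflare Worker (module syntax): one fetch handler on V8.
export default {
  async fetch(request: Request): Promise<Response> {
    return new Response("Hello from the edge!");
  },
};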
a. Vite and React Router v7 plugin
Vite is one of the fastest-growing bundler tools, and Cloudflare has provided an adapter for it:
// vite.config.ts
import { defineConfig } from "vite";
import { cloudflare } from "@cloudflare/vite-plugin";
export default defineConfig({
plugins: [cloudflare()],
});
React Router v7 is based on Vite and provides full-stack capabilities in its "framework" mode, which is different from its SPA-like "data" mode. Framework mode has additional features such as SSR and SPR (Server-Side Rendering and Static Pre-Rendering, respectively).
Cloudflare has provided deployment for React Router v7 too, making it a home for deploying all kinds of UI applications.
b. Deploying Next.js using OpenNext
Next.js deployment on non-Vercel platforms is not an easy path. Sure, hosting simple, trivial projects is easy based on their docs, but a lot of people run into issues due to ever-changing Next.js internals across versions. There is an open issue by Vercel's Lee Robinson to create an adapter to alleviate these issues.
OpenNext is a non-Vercel alternative to self-host Next.js. It is jointly maintained by the Netlify, SST.dev, and Cloudflare teams.
Most of the features are supported and can be found here. Only caching seems to be unsupported.
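Getting started is mostly configuration. A hedged sketch of the adapter config file; the import path and helper name are from my reading of the OpenNext docs, so verify against the current release:

// open-next.config.ts — minimal OpenNext Cloudflare adapter config.
// Assumption: defineCloudflareConfig is the adapter's config entrypoint.
import { defineCloudflareConfig } from "@opennextjs/cloudflare";

export default defineCloudflareConfig();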
This makes Next.js Vercel-agnostic and can help with removing the cardinal sin of cloud computing, which is "vendor lock-in".
c. Vendor lock-in for Cloudflare
Ok, I am a hypocrite! I just said no "vendor lock-in", but damn, Cloudflare has everything under the sun.
✅ Run all UI frameworks such as React with Vite, React Router v7, Astro, Hono (a web framework that runs on Cloudflare Workers, Deno, Bun, etc.), Vue.js, Nuxt, SvelteKit.
✅ Run Next.js using the OpenNext Cloudflare adapter
✅ Connect to any Postgres or MySQL database using Hyperdrive
✅ More supported runtimes in Workers (remember, it is V8 and is not 100% Node compatible)
✅ Image API for serving optimized images
✅ Preview URLs for UI deployment for feature branches
✅ Use any kind of storage:
- S3, you got it with R2
- Simple KV, got it with Workers KV
- Database with D1
- Stateful collaborative apps? Cloudflare has got you covered with Durable Objects
- Want to follow the AI hype? Workers AI can help you with that. Oh, and the cherry on top is RAG. Yeah, Cloudflare has that too, with AutoRAG
So go build your million dollar Cloudflare lock-in AI startup.
Have a nice day, and "Be curious, not Judgemental"
PS1: This was written by me for humans. Just used LLMs for learning and research.
PS2: V bgqm ygp, Cdmfc
Hello there on the web