Two patterns for long jobs on Cloud Run

May 24, 2026

serverless, cloud-run, architecture

I have two side projects with the same shape: a Next.js app on Vercel that needs to send 20+ second jobs to Cloud Run and stream progress back to the browser. MirrAI is built on after() + Supabase Realtime. Lumorem is built on Cloud Tasks + PartyKit. Here is the comparison.

Why I dropped SSE first

MirrAI shipped the obvious version first. The client POSTs to a generation endpoint, and the worker streams progress back through Server-Sent Events as it runs. Clean, no extra infrastructure.

client                       Vercel + Cloud Run worker
  │   POST /generation/create     │
  ├──────────────────────────────►│
  │◄── text/event-stream ─────────┤   event: step
  │                               │   data: "preprocessing"
  │                               │   event: step
  │                               │   data: "calling VTO"
  │                               │   event: done
  │                               │   data: { resultUrl }

It broke on mobile. SSE streams drop the moment the client side gets even slightly unreliable, and that is most of the time on a phone:

Going through a subway tunnel.
Wi-Fi handing off between access points.
The screen sleeping and idling the TCP socket.
CGNAT idle timeouts dropping the connection without telling anyone.

The fix would have been client-side reconnect with resume semantics. To resume, the server has to know where this client left off, which means I needed progress state on the server regardless of how it traveled to the client. Once I had that state, the open HTTP stream was doing nothing the state could not do better.
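
To make "where this client left off" concrete, here is a minimal sketch of the lookup a resuming server would need. It assumes each stored step carries an increasing order number, as the generation_steps rows later in this post do. The names `ProgressStep` and `stepsAfter` are hypothetical, not MirrAI code.

```typescript
// Hypothetical shape of one stored progress step.
interface ProgressStep {
  order: number; // increases with every step of a job
  message: string;
}

// Return the steps a reconnecting client has not seen yet, oldest first.
// `lastSeenOrder` is the highest order the client has received (-1 if none).
function stepsAfter(stored: ProgressStep[], lastSeenOrder: number): ProgressStep[] {
  return stored
    .filter((s) => s.order > lastSeenOrder)
    .sort((a, b) => a.order - b.order);
}
```

With SSE, `lastSeenOrder` would come from the Last-Event-ID header that EventSource sends on reconnect. Either way the function only works if the steps are stored somewhere, which is the point of the paragraph above.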

The server side has a quieter version of the same problem. Before shutting an instance down, Cloud Run's container runtime contract sends SIGTERM and gives the instance 10 seconds before SIGKILL, and Google is explicit that "graceful termination is therefore not always guaranteed." This is rare, but when it happens the SSE handler is in the same position as the subway-tunnel client, just from the other end.
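
A worker can at least use the 10-second window to record which jobs were cut off, so the client sees a failure instead of a stalled progress bar. This is a sketch under assumptions: `InFlightJobs` and the `markInterrupted` callback are hypothetical and not taken from either project.

```typescript
// Hypothetical callback that persists "this job was interrupted" (e.g. a DB update).
type MarkInterrupted = (jobId: string) => Promise<void>;

class InFlightJobs {
  private readonly ids = new Set<string>();

  start(jobId: string): void {
    this.ids.add(jobId);
  }

  finish(jobId: string): void {
    this.ids.delete(jobId);
  }

  // Try to mark every running job as interrupted, giving up after budgetMs.
  // Resolves to the ids that were marked before the budget ran out.
  async flush(mark: MarkInterrupted, budgetMs: number): Promise<string[]> {
    const marked: string[] = [];
    let timer: ReturnType<typeof setTimeout> | undefined;
    const deadline = new Promise<void>((resolve) => {
      timer = setTimeout(resolve, budgetMs);
    });
    const work = Promise.all(
      Array.from(this.ids).map(async (id) => {
        try {
          await mark(id);
          marked.push(id);
        } catch {
          // Best effort: one failed write must not block the others.
        }
      }),
    );
    await Promise.race([work, deadline]);
    if (timer !== undefined) clearTimeout(timer);
    // Copy, so writes that finish after the deadline do not mutate the result.
    return marked.slice();
  }
}
```

In a Node worker this could be wired up as `process.on("SIGTERM", () => jobs.flush(markInterrupted, 8_000))`, with the budget kept under the 10 seconds Cloud Run allows. It does not help when the instance is killed without warning, which is the case the quote above describes.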

The cost side seals it. Vercel function timeouts cap the SSE handler at 60 seconds on Hobby, 300 on Pro. A 90-second animation fits on Pro, but it eats most of that budget for a handler that is mostly idle waiting on Vertex AI. On Cloud Run, SSE works natively over HTTP streaming and the handler keeps CPU while the stream is open, but every connection occupies one slot of the per-instance concurrency budget (default 80) and I am billed for the whole request, including the long stretches where my code is asleep.

Migration: commit d10e18a, "Replace worker SSE with DB progress and Supabase Realtime for resilient mobile generation."

Pattern A: `after()` + DB + Supabase Realtime

This is MirrAI today.

client                    Vercel (Next.js)          Cloud Run worker
  │   POST /generation/create   │
  ├────────────────────────────►│
  │                             │  insert generation row (PENDING)
  │   200 { generationId }      │
  │◄────────────────────────────┤
  │                       after()├── fetch /jobs/try-on ──►│
  │                             │   (AbortSignal 8s)        │ process
  │                             │                           │ insert generation_steps
  │  Supabase Realtime subscribe ───────────────────────────► update generation row
  │◄────────── progress events ──────────────────────────────┤

The Vercel route returns the id immediately, then dispatches the worker as fire-and-forget. AbortSignal.timeout(8_000) is the load-bearing detail: I am not waiting for the worker to finish, only confirming the request was received. Cloud Run keeps running long after Vercel disconnects.

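// app/api/generation/create/route.ts
import { after } from "next/server";

// Inside the POST handler, once the PENDING row exists.
// after() runs this callback once the response has been sent.
after(async () => {
  try {
    await fetch(`${workerUrl}/jobs/try-on`, {
      method: "POST",
      headers: {
        "Content-Type": "application/json",
        Authorization: `Bearer ${workerSecret}`,
      },
      body: JSON.stringify({ generationId, userId, ...payload }),
      // Wait only long enough to confirm receipt, not for the job to finish.
      signal: AbortSignal.timeout(8_000),
    });
  } catch (err) {
    // The 8s timeout is the expected path here, so only other failures are logged.
    if (
      (err as Error).name !== "AbortError" &&
      (err as Error).name !== "TimeoutError"
    ) {
      logger.error("worker dispatch failed", err);
    }
  }
});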

The worker writes progress as rows. The client subscribes via Supabase Realtime.

// worker/src/lib/progress.ts
export async function step(
  generationId: string,
  message: string,
  order: number,
) {
  await prisma.generationStep.create({
    data: { generationId, message, order },
  });
}

This one cost me half a day: Realtime channels are RLS-protected. The client has to call supabase.realtime.setAuth(jwt) before subscribing, otherwise the channel silently never delivers events. No error, no warning, just no events. That requirement is in the Supabase docs but not where you would look first.
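
The ordering is easier to see in code. This is a sketch, not the MirrAI client. To stay self-contained it describes only the slice of a supabase-js client it touches (`RealtimeClientLike` and `ChannelLike` are hypothetical stand-ins for the real client type). The table name generation_steps comes from the diagram above. The `generation_id` column, the "public" schema, and the channel name are assumptions.

```typescript
// Assumed row shape; the real column names depend on the Prisma mapping.
interface StepRow {
  generation_id: string;
  message: string;
  order: number;
}

// Hypothetical stand-ins for the parts of a supabase-js client used here.
interface ChannelLike {
  on(
    type: "postgres_changes",
    filter: { event: "INSERT"; schema: string; table: string; filter?: string },
    callback: (payload: { new: StepRow }) => void,
  ): ChannelLike;
  subscribe(): ChannelLike;
}

interface RealtimeClientLike {
  realtime: { setAuth(token: string): void | Promise<void> };
  channel(name: string): ChannelLike;
}

async function subscribeToSteps(
  client: RealtimeClientLike,
  jwt: string,
  generationId: string,
  onStep: (row: StepRow) => void,
): Promise<ChannelLike> {
  // Must run before subscribe(); without it RLS hides every row from this
  // connection and the channel stays silent.
  await client.realtime.setAuth(jwt);

  return client
    .channel(`generation:${generationId}`)
    .on(
      "postgres_changes",
      {
        event: "INSERT",
        schema: "public",
        table: "generation_steps",
        filter: `generation_id=eq.${generationId}`,
      },
      (payload) => onStep(payload.new),
    )
    .subscribe();
}
```

In a real app the first argument would be the client returned by createClient(), typed with the library's own types instead of these stand-ins.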

Pattern B: Cloud Tasks + PartyKit

This is Lumorem.

client       Vercel (Next.js)    Cloud Tasks     Cloud Run worker     PartyKit room
  │  POST /api/audits │
  ├──────────────────►│
  │                   │ createTask
  │                   ├──────────►   ▼
  │  200 { auditId }  │              │
  │◄──────────────────┤              │  POST /audits/run (OIDC)
  │                                  ├──────────────────────►│
  │                                  │                       │  202 ACK
  │                                  │◄──────────────────────┤
  │                                  (queue releases)        │  run audit
  │                                                          │  publish to room
  │  ws://parties/main/<auditId> ─────────────────────────────────────────►│
  │◄──────────────── broadcast ────────────────────────────────────────────┤

The Vercel route enqueues a task. Cloud Tasks delivers it to the worker. Auth is OIDC: the service account email is baked into the task, and Cloud Tasks attaches a token minted for that account to each delivery, so there is no shared secret on the wire.

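// lib/cloud-tasks.ts
import { CloudTasksClient } from "@google-cloud/tasks";

const client = new CloudTasksClient();

const [task] = await client.createTask({
  parent: client.queuePath(PROJECT_ID, REGION, QUEUE),
  task: {
    httpRequest: {
      httpMethod: "POST",
      url: `${WORKER_URL}/audits/run`,
      headers: { "Content-Type": "application/json" },
      // The client expects the body as bytes; a string must be base64-encoded.
      body: Buffer.from(JSON.stringify({ auditId })).toString("base64"),
      // Cloud Tasks signs a token for this service account on every delivery.
      oidcToken: {
        serviceAccountEmail: INVOKER_SA,
        audience: WORKER_URL,
      },
    },
  },
});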
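
On the receiving side, a middleware like requireOidc has two jobs: verify the token's signature, then check that the claims match the expected caller. The sketch below covers only the second half. It assumes the signature has already been verified by a library (google-auth-library's `verifyIdToken` is one option), and `OidcClaims`, `ExpectedCaller`, and `checkOidcClaims` are hypothetical names, not Lumorem's code.

```typescript
// Claims read from a Google-signed OIDC token AFTER a library has verified
// its signature. This sketch does not verify signatures itself.
interface OidcClaims {
  iss?: string;
  aud?: string;
  email?: string;
  email_verified?: boolean;
  exp?: number; // expiry, in seconds since the Unix epoch
}

interface ExpectedCaller {
  audience: string; // assumed to equal WORKER_URL from the task
  invokerEmail: string; // assumed to equal INVOKER_SA from the task
}

const GOOGLE_ISSUERS = ["https://accounts.google.com", "accounts.google.com"];

// Returns null when the claims are acceptable, otherwise a short reason.
function checkOidcClaims(
  claims: OidcClaims,
  expected: ExpectedCaller,
  nowSeconds: number,
): string | null {
  if (!claims.iss || GOOGLE_ISSUERS.indexOf(claims.iss) === -1) {
    return "unexpected issuer";
  }
  if (claims.aud !== expected.audience) return "audience mismatch";
  if (claims.email !== expected.invokerEmail) return "unexpected service account";
  if (claims.email_verified !== true) return "email not verified";
  if (typeof claims.exp !== "number" || claims.exp <= nowSeconds) {
    return "token expired";
  }
  return null;
}
```

The audience check matters as much as the email check: it stops a token minted for some other service from being replayed against this worker.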

The worker ACKs 202 immediately so Cloud Tasks releases the slot, then runs the pipeline.

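// worker/src/index.ts
app.post("/audits/run", requireOidc, async (req, res) => {
  const { auditId } = req.body;
  // ACK first so Cloud Tasks treats the task as delivered and does not retry.
  res.status(202).json({ accepted: true, auditId });

  // The audit continues after the response. On Cloud Run this needs
  // instance-based billing (CPU always allocated); otherwise CPU is throttled
  // once no request is in flight.
  runAudit(auditId).catch(handleFailure);
});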

Progress goes through a PartyKit room (one room per audit id). The DB row stays the source of truth. PartyKit is purely a notification channel.
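
"Source of truth" versus "notification channel" comes down to ordering and error handling: write the row first, then notify on a best-effort basis. Here is a minimal sketch of that rule with both side effects injected. `AuditProgress`, `publishProgress`, and the two callback types are hypothetical names, not Lumorem's code.

```typescript
// Hypothetical progress payload.
interface AuditProgress {
  auditId: string;
  stage: string;
  at: number; // epoch milliseconds
}

type SaveProgress = (p: AuditProgress) => Promise<void>; // e.g. a DB update
type NotifyRoom = (p: AuditProgress) => Promise<void>; // e.g. a POST to the room

// Persist first, then notify. A failed save aborts (the caller must know);
// a failed notify is swallowed, because clients can re-read the DB row.
async function publishProgress(
  progress: AuditProgress,
  save: SaveProgress,
  notify: NotifyRoom,
): Promise<{ notified: boolean }> {
  await save(progress);
  try {
    await notify(progress);
    return { notified: true };
  } catch {
    return { notified: false };
  }
}
```

The practical consequence is that a client joining a room late, or missing a broadcast, loses nothing: it reads the row once on connect and treats room messages as hints to refresh.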

The 202-ACK is a trade-off. I lose Cloud Tasks' retry-on-5xx; a watchdog cron picks up stuck audits instead. Honest second-guess: for this workload (slow jobs, low concurrency), I should probably ACK at the end and let the queue handle retry. The 202-first pattern earns its keep at concurrency I do not have yet.
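
The watchdog's core is a selection rule: which audits have sat in a non-terminal state for too long. A sketch of that rule follows. The status values, the `updatedAt` field, and the function name are assumptions for illustration, not the actual Lumorem schema.

```typescript
// Hypothetical audit row; only the fields the watchdog needs.
interface AuditRow {
  id: string;
  status: "PENDING" | "RUNNING" | "DONE" | "FAILED";
  updatedAt: Date; // last time the worker touched the row
}

// Ids of audits that are still in flight but have been idle longer than maxIdleMs.
function findStuckAudits(rows: AuditRow[], now: Date, maxIdleMs: number): string[] {
  return rows
    .filter((r) => r.status === "PENDING" || r.status === "RUNNING")
    .filter((r) => now.getTime() - r.updatedAt.getTime() > maxIdleMs)
    .map((r) => r.id);
}
```

The cron would then re-enqueue or fail each returned id. The threshold has to exceed the longest legitimate gap between progress writes, or healthy audits get restarted.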

Side-by-side

| | Pattern A | Pattern B |
|---|---|---|
| Setup | Supabase is already in the stack. | Queue, OIDC service account, PartyKit deployment. |
| Auth on the wire | Shared secret header. | OIDC token signed per task. |
| Retry on 5xx | None. Worker handles it. | Free from Cloud Tasks, lost if you ACK 202 first. |
| Queue depth visibility | None. Read the DB. | Native in the Cloud Tasks console. |
| Where pub/sub lands | Postgres WAL. | One Cloudflare worker per room. |
| Vendor surface | Vercel + Supabase. | Vercel + Cloud Tasks + PartyKit. |
| Local dev | Worker on laptop with the same secret. | `LOCAL_WORKER_DIRECT_FIRE=true` bypasses the queue. |
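
The local-dev row for Pattern B boils down to one branch at dispatch time. A sketch follows. The flag name comes from the table. The function name and the production guard are assumptions of this example, not a description of how Lumorem handles it.

```typescript
type DispatchMode = "direct" | "queue";

// `env` is passed in (rather than read from process.env) to keep this testable.
function chooseDispatchMode(env: Record<string, string | undefined>): DispatchMode {
  const flagOn = env["LOCAL_WORKER_DIRECT_FIRE"] === "true";
  // Assumption of this sketch: never bypass the queue in production,
  // even if the flag leaks into that environment.
  const isProduction = env["NODE_ENV"] === "production";
  return flagOn && !isProduction ? "direct" : "queue";
}
```

In "direct" mode the route would POST to the local worker itself. That request carries no OIDC token, so the worker needs a matching local-only bypass in its auth middleware.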

How I weigh them

Pattern A has the smaller setup when Supabase is already in the stack. No new infrastructure, Realtime handles fan-out off the WAL for free, and the operational surface stays narrow. The costs are quieter: the progress channel sits on the same Postgres instance running the product, and the dispatch from Vercel to Cloud Run leans on a shared secret in a header.

Pattern B asks for more upfront: a queue to configure, a service account to provision, a PartyKit deployment to keep up. In return, the queue is a component I can see and rate-limit, OIDC removes the shared-secret rotation problem, and pub/sub lives on a runtime built for it instead of a database doing it on the side.

My current lean: when Supabase is already load-bearing, Pattern A is a reasonable default and what I run on MirrAI today. Starting clean, I would probably reach for Pattern B, less out of conviction than out of preference for the kind of visibility a real queue gives me. I would not argue the opposite case is wrong.

The shared move

Both patterns drop the same assumption: that the open HTTP connection owns the job. It does not. The DB does. The rest is a question of where you pay: on Postgres via Realtime, or on a queue and a relay. Every architecture pays somewhere. The right one is the one that pays where it costs you least.