How We Cut Our Demo Hosting Costs to Near-Zero with Scale-to-Zero on Control Plane

control plane, rails | June 19, 2026 | by Justin Gordon

We run a small fleet of live demo apps for React on Rails: a flagship demo, a React Server Components demo, a Hacker News clone, a TanStack starter, a Gumroad-style storefront, and the classic webpack tutorial. They exist so people can click a link and see the thing running - no clone, no bundle install, no waiting.

The problem: a demo that's useful 2 minutes a day still bills 24 hours a day.

If you run demos, staging environments, or review apps, you are the one stuck paying for all that idle time. ShakaCode's job is to guide you to a simpler plan: let the app sleep when no one is using it, wake it on the first click, and make the wait feel intentional instead of broken.

This is the story of how we made every demo's web tier sleep when nobody's looking and wake on demand - cutting the avoidable always-on compute bill for those apps by roughly 80-90% - and how we turned the unavoidable cold-start into a branded proof point instead of a dead-air spinner.

The motivation: paying full price for mostly-idle apps

Across the demo environments we were reserving about 10 vCPU and 22 GiB of memory, running around the clock. Control Plane's pricing is straightforward and usage-based (prices checked in June 2026):

$62.06 per vCPU-month
$7.23 per GiB-month

The expensive, addressable part is the Rails web tier - the rails (and background worker) workloads that actually serve and process requests. Adding those up across the six eligible demos:

Component	Reserved	Monthly ceiling
`rails` web tier (6 demos)	3.4 vCPU / 7.5 GiB	~$265/mo
Background `worker`s (2 demos)	0.4 vCPU / 1.0 GiB	~$32/mo
Addressable total		~$297/mo
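
For anyone who wants to check the arithmetic behind the table, here is a minimal sketch using the two rates quoted above. The function and constant names are ours for illustration; the reservations are the totals from the table.

```javascript
// Rates quoted above (checked June 2026).
const USD_PER_VCPU_MONTH = 62.06
const USD_PER_GIB_MONTH = 7.23

// Monthly reservation ceiling for a given CPU and memory reservation.
function monthlyCeiling(vcpu, gib) {
  return vcpu * USD_PER_VCPU_MONTH + gib * USD_PER_GIB_MONTH
}

const railsTier = monthlyCeiling(3.4, 7.5) // six `rails` workloads combined
const workers = monthlyCeiling(0.4, 1.0) // two `worker` workloads combined

console.log(Math.round(railsTier)) // 265
console.log(Math.round(workers)) // 32
console.log(Math.round(railsTier + workers)) // 297
```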

That's the reservation ceiling; Control Plane's Capacity AI already trims actual usage below it. But the deeper waste isn't the rate - it's the duty cycle. A staging demo might get a handful of visits a day. It's genuinely needed maybe an hour or two out of every 24. We were paying for the other ~90%.

And that's just the named demos. The same pattern multiplies across review/PR preview environments - full app stacks spun up per pull request, used for a few minutes of QA, then idle until someone tears them down. Idle environments are the silent line item on every staging bill.

The question almost asks itself: why are these awake at all when no one is using them?

The fix: scale to zero, wake on the first click

Control Plane supports scale-to-zero for serverless workloads: set minScale: 0 with a concurrency-based autoscaler and a scaleToZeroDelay, and after a period of no traffic the workload scales down to zero replicas. There is no web replica compute to bill until traffic wakes the workload again.

Our plan had three parts:

Put only the stateless web workloads on HTTP-driven scale-to-zero.
Keep shared datastores small and always available.
Put an edge Worker in front so visitors see a branded wake-up page, not a raw cold-start error.

The web-tier setting is the important bit:

spec:
  type: serverless
  defaultOptions:
    autoscaling:
      metric: concurrency
      minScale: 0
      maxScale: 1
      scaleToZeroDelay: 900 # 15 minutes idle, then sleep

Datastores (Postgres, MySQL, Mongo, Redis, Memcached, Elasticsearch) stay always-on and small - they're shared, they hold state, and they do not receive external HTTP, so a browser visit cannot wake them in this pattern. Background workers we simply suspended. The Node SSR renderer (a separate React on Rails workload that handles server-side React rendering) stays warm at one replica so the first server-render after wake is fast.

So far, so good - except for one thing that turns a clever cost optimization into a bad user experience.

The catch: a cold start is a wait, and waits look broken

When a workload is asleep, the first visitor pays the cold-start cost - the container has to boot. For our apps that ranged from ~15 seconds to ~90 seconds depending on the app's weight. A blank, spinning browser tab for 30 seconds doesn't say "we're saving money," it says "this is broken."

In our tests, Control Plane's serverless layer did not hold that first request and quietly wait. At zero replicas it returned a fast 503 "no healthy upstream" and triggered the scale-up in the background. If you do nothing, your visitor's first impression of your demo is an error page.

So we needed a front door - something at the edge that catches that moment and makes it pleasant.

Turning the wait into a feature: an edge splash

Our demo domains already sit behind Cloudflare Workers, so we put a tiny Cloudflare Worker in front of every demo. Its logic is deliberately simple:

Proxy the request to the Control Plane origin.
If the origin is up, stream the response straight through. (When the app is awake, the Worker is invisible.)
If the origin returns a 503/502/504 or won't connect, the app is asleep. Trigger a wake in the background and return a branded "waking up" page instead.

In code, that shape looks like this:

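export default {
  async fetch(request, env, ctx) {
    // Same path and query as the incoming request, pointed at the Control Plane origin.
    const originUrl = new URL(request.url)
    originUrl.hostname = env.ORIGIN_HOST

    // A failed connection is treated like a gateway error: the app is asleep.
    const originRequest = new Request(originUrl, request)
    const originResponse = await fetch(originRequest).catch(() => null)

    // Awake: pass the origin's response straight through.
    if (originResponse && ![502, 503, 504].includes(originResponse.status)) {
      return originResponse
    }

    // Asleep: trigger a wake in the background without delaying the splash.
    ctx.waitUntil(fetch(env.HEALTH_URL).catch(() => null))

    // renderWakePage (not shown) returns the HTML for the branded splash.
    return new Response(renderWakePage(env.APP_NAME), {
      headers: { "content-type": "text/html; charset=utf-8" },
      status: 503,
    })
  },
}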

The production Worker also handles details like preserving the original path, setting retry headers, and polling the app's health endpoint from the splash page before reloading.
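
The production code isn't reproduced here, but as a rough sketch of what "setting retry headers" could look like, the helper below builds the status and headers for the splash response. The function name, the 5-second default, and the no-store choice are assumptions for illustration, not the production values.

```javascript
// Hypothetical helper: status and headers for the "waking up" splash response.
// Retry-After tells clients and crawlers the 503 is temporary;
// Cache-Control: no-store keeps caches from holding on to the splash.
function wakeResponseInit(retryAfterSeconds = 5) {
  return {
    status: 503,
    headers: {
      "content-type": "text/html; charset=utf-8",
      "retry-after": String(retryAfterSeconds),
      "cache-control": "no-store",
    },
  }
}
```

In the Worker shown earlier, the result could be passed as the second argument: new Response(renderWakePage(env.APP_NAME), wakeResponseInit()).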

That splash page is the trick. Instead of a dead spinner, the visitor sees a friendly card that:

explains why there was a pause - "To keep our demos cheap, Control Plane sleeps this app after 15 minutes of inactivity and wakes it on demand,"
shows an elapsed timer and a progress shimmer so it feels alive,
tells them how long it'll stay awake once it's up,
links to a deeper explainer, and
quietly credits the capability: "Powered by Control Plane scale-to-zero."

Behind the scenes, the page polls a health endpoint every couple of seconds and reloads itself the instant the app is ready. The cold start goes from "is this broken?" to "oh, neat - they sleep these to save money, and it's already loading."
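
The polling loop on the splash page might look something like the sketch below. It is not the production script: the function name, the 2-second interval, the 2-minute timeout, and the /up path in the usage comment are all assumptions. The health check and the sleep function are passed in so the loop can be exercised without a browser.

```javascript
// Poll `checkHealth` until it reports true or the timeout passes.
// Resolves to true when the app is ready, false if we gave up.
async function waitUntilHealthy(checkHealth, options = {}) {
  const {
    intervalMs = 2000,
    timeoutMs = 120000,
    sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms)),
  } = options

  let waitedMs = 0
  while (true) {
    // A thrown error or rejected promise just means "not ready yet".
    const healthy = await Promise.resolve().then(checkHealth).catch(() => false)
    if (healthy) return true
    if (waitedMs >= timeoutMs) return false
    await sleep(intervalMs)
    waitedMs += intervalMs
  }
}

// In the splash page, usage might look like this (the path is hypothetical):
// waitUntilHealthy(() => fetch("/up", { cache: "no-store" }).then((r) => r.ok))
//   .then((ready) => { if (ready) location.reload() })
```

Because the health request goes through the same Worker, it keeps getting the 503 splash until the origin answers, at which point the check passes and the page reloads into the real app.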

A wait you explain is a completely different experience from a wait you don't.

What we shipped

Six of seven demos now use this scale-to-zero path. Real measured cold-start times:

Demo	Wakes in	Notes
flagship	~24s	single Rails app
webpack tutorial	~37s	+ the apex domain, also protected
RSC demo	~42s	Server Components stream through the Worker fine
Hacker News	~15s	renderer kept warm
TanStack starter	~32s	renderer kept warm
Gumroad clone	~91s	heaviest stack; splash sets a 60–90s expectation

The seventh, our changelog demo, is the lone holdout. It stores data on a local SQLite volume, which makes it a poor fit for this HTTP wake-on-first-click pattern until the data moves to a shared datastore.

The savings

The Rails web tier - about $265/month of reservation ceiling - now bills only while an app is awake: during each visit and the 15-minute idle window that follows it, instead of around the clock. Suspending the background workers removes another ~$32/month. For apps that are genuinely idle most of the day, that's roughly an 80-90% reduction on that ~$297/month of addressable compute, on the order of $240-267/month ($2.9k-$3.2k/year) reclaimed on staging demos alone - with zero change to the datastores and no day-to-day babysitting.
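
To get a feel for the duty cycle, here is a back-of-envelope model, not a billing calculator. It assumes each visit is a single instant, ignores cold-start time and windows that cross midnight, and uses made-up visit times rather than our real traffic. The function name is ours.

```javascript
const IDLE_DELAY_MINUTES = 15 // matches scaleToZeroDelay: 900

// visitMinutes: minutes since midnight for each request, in any order.
// Returns how many minutes the web tier would be awake, merging windows
// that overlap because a visit arrived before the previous one expired.
function awakeMinutes(visitMinutes, idleDelay = IDLE_DELAY_MINUTES) {
  const sorted = [...visitMinutes].sort((a, b) => a - b)
  let total = 0
  let windowStart = null
  let windowEnd = null
  for (const t of sorted) {
    if (windowEnd !== null && t <= windowEnd) {
      windowEnd = t + idleDelay // still awake: the visit extends the window
    } else {
      if (windowEnd !== null) total += windowEnd - windowStart
      windowStart = t
      windowEnd = t + idleDelay
    }
  }
  if (windowEnd !== null) total += windowEnd - windowStart
  return total
}

// Five illustrative visits; the first three share one awake window.
const minutes = awakeMinutes([540, 545, 550, 780, 1020])
console.log(minutes) // 55
console.log((1 - minutes / 1440).toFixed(2)) // 0.96 of the day asleep
```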

The bigger prize is the pattern, not the dollar figure: it generalizes directly to review apps, where idle-time waste is even worse. The customer wins by keeping every preview environment available for QA while refusing to fund idle compute.

Lessons (the sharp edges)

A few things we learned the hard way, in case you're doing this yourself:

You can't change a workload's type in place. Converting standard to serverless is rejected (405: Workload type may not be changed). You have to delete and recreate. We scripted it: back up the live spec, transform it, recreate, and verify the canonical endpoint is unchanged (it is - it's derived from the workload's identity).
Serverless doesn't hold the first request - it 503s immediately and scales up in the background. Don't pre-probe "is it warm?" and then proxy; that races, and a workload scaling down between the two can leak a raw 503. Just try the proxy and treat a 502, 503, or 504 (or a failed connection) as "show the splash."
Datastores don't belong in this HTTP scale-to-zero path. They're shared, stateful, and receive no HTTP to wake them. Keep them always-on and small. We added guardrails so the tooling refuses to put datastore workloads into the web-tier sleep path.
Health paths differ per app. Newer Rails has /up; older apps don't. Pick a path per app that returns 200 when warm (we used /robots.txt and /login for the two that lacked /up).
Cover the apex. Make sure every hostname that routes to the app - including the bare apex and www - goes through the same front door, or one of them will show the raw cold-start error.
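
The "transform it" step and the datastore guardrail can be sketched roughly as follows. This is not our actual tooling: it assumes the backed-up spec has already been parsed into a plain object, the function name is ours, and the datastore name list is hypothetical. A real conversion may need to adjust other fields as well.

```javascript
// Autoscaling values taken from the web-tier setting shown earlier.
const SLEEP_AUTOSCALING = {
  metric: "concurrency",
  minScale: 0,
  maxScale: 1,
  scaleToZeroDelay: 900,
}

// Hypothetical guardrail list; real tooling may identify datastores differently.
const DATASTORE_NAMES = new Set([
  "postgres", "mysql", "mongo", "redis", "memcached", "elasticsearch",
])

// Returns a new spec for the recreated workload. The input is left
// untouched so it can double as the backup.
function toSleepingWebSpec(workloadName, spec) {
  if (DATASTORE_NAMES.has(workloadName)) {
    throw new Error(`${workloadName} is a datastore; refusing to scale it to zero`)
  }
  const defaults = spec.defaultOptions || {}
  return {
    ...spec,
    type: "serverless",
    defaultOptions: {
      ...defaults,
      autoscaling: { ...defaults.autoscaling, ...SLEEP_AUTOSCALING },
    },
  }
}
```

Returning a fresh object keeps the original spec available for a rollback if the recreate step fails.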

Want this for your apps?

This is exactly the kind of thing Control Plane Flow is built to make easy: Heroku-style review apps and staging on Control Plane.

The plan is straightforward:

Identify the stateless web workloads that spend most of their life idle.
Move them to Control Plane serverless scale-to-zero.
Add the Cloudflare Worker splash so the first visitor understands the wake-up instead of seeing an error.

We're working on making scale-to-zero-with-a-splash a first-class, one-flag option for review apps, so every ephemeral environment can sleep when idle and wake with a branded page. If your CI spins up a preview app per pull request, this is found money - and a ShakaCode team can guide the migration without making your team become infrastructure specialists.

Stop paying for idle. Wake on the click.

Closing Remark

Could your team use some help with topics like this and others covered by ShakaCode's blog and open source? We specialize in optimizing Rails applications, especially those with advanced JavaScript frontends, like React. We can also help you optimize your CI processes with lower costs and faster, more reliable tests. Scraping web data and lowering infrastructure costs are two other areas of specialization. Feel free to reach out to ShakaCode's CEO, Justin Gordon, at [email protected] or schedule an appointment to discuss how ShakaCode can help your project!
