Building Rainfall: A Browser IDE with Live Collaboration and Cloud Execution

Overview

Rainfall is a cloud code sandbox — a browser IDE where you create projects, edit code, preview apps live, share with others, and run everything in isolated cloud environments. No local setup. Open a tab, pick a template, start building.

You can create a React project, invite a teammate to edit alongside you, hit Run to spin up a dev server in the cloud, preview the app in an iframe, and drop into a real terminal to debug — all without leaving the browser.

This post isn't a walkthrough of the codebase. It's about what it actually took to get there: the problems that kept me stuck, the wrong turns, the things that looked simple on a whiteboard but weren't, and what I'd tell myself if I started over.

Why I built it

I've used Replit, CodeSandbox, and StackBlitz enough to know what a great in-browser dev experience feels like. I wanted to build my own version — not just an editor in a textarea, but a full workspace: files, tabs, preview, terminal, sharing, and real-time collaboration.

That sounds like a feature list. The real challenge is making those features work together.

Consider a normal session: you're editing App.jsx while a collaborator's cursor moves in the same file. You click Run — files need to reach a remote machine, dependencies install, a dev server starts, a preview URL appears. Meanwhile you want to open a shell into that same machine to check logs or run npm commands. When you close the tab, the VM should die so you're not paying for idle sandboxes.

Each piece is doable on its own. The hard part is that they share state, have different lifetimes, and fail in different ways. Editing is continuous. Runs are batch jobs. Terminals are long-lived streams. Storage is the durable backup. Getting those boundaries wrong means lost edits, stale previews, or orphaned VMs billing in the background.

Rainfall is my attempt to hold all of that in one product.

What I was aiming for

At a high level, Rainfall needed to do six things well:

  • Edit code like VS Code — Monaco editor, file tree, tabs, resizable panels that remember your layout
  • Preview apps in the browser — one click to run React, Express, or static sites and see the result live
  • Run a real terminal — not a log viewer, an actual shell with colors, arrow keys, and Ctrl+C
  • Collaborate live — multiple people in the same project, with cursors, roles, and sensible defaults for who can edit vs watch
  • Share and discover — private sandboxes, public projects on an Explore page, fork someone else's work into your account
  • Run user code safely — isolated cloud environments, not processes spawned on my own API server

Six starter templates cover the common cases: React (JS and TS), vanilla HTML, Express, Node scripts, and Python. I deliberately skipped AI code generation and offline editing for v1 — there was already enough to figure out without adding a copilot.

Under the hood it's a monorepo: a Next.js frontend, an Express API, a background worker for code execution, PostgreSQL for metadata, Cloudflare R2 for files, Redis for run jobs, Liveblocks for collaboration, and E2B for sandboxes. But the product goal was always the experience, not the diagram.

The big challenges

1. Where does the truth live?

This was the first major roadblock and it never fully went away — it just got clearer over time.

When you're editing alone, the model is straightforward: you type, changes debounce to cloud storage, done. But add collaboration and suddenly there are three layers:

  • A live shared document while people are in the room (via Yjs + Liveblocks)
  • Durable storage in R2 for downloads, runs, and when nobody's online
  • Local UI state in the browser — open tabs, scroll position, what's visible in the file tree

The mistake I almost made was treating all three as interchangeable. They're not.

Concrete example: two people co-edit a file. One clicks Run. If the live document hasn't been written to storage yet, the remote VM gets the old version. I debugged this for longer than I'd like to admit because everything looked synced in the editor — both users saw the same text — but the run pipeline reads from storage, not the collab room.

Another example: opening a tab and binding the editor to stale local cache instead of the shared document. User A and User B see different content in the same file. Terrifying.

What I learned: pick one source of truth per mode, and be explicit about the handoff. Live collaboration owns open sessions. Object storage owns persistence and execution. The browser store is just a cache. Before every run and every download, flush everything from the collab layer to storage.

That one rule — flush before run — exists because I got burned thinking "it's synced" when it only synced between browsers, not to the place runs actually read from.
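
The flush-before-run rule fits in a few lines. This is a minimal sketch, not Rainfall's actual code: `CollabRoom`, `ObjectStore`, `flushRoomToStorage`, and `requestRun` are hypothetical names, the store is an in-memory stand-in for R2, and the real calls would be asynchronous.

```typescript
type FilePath = string;

// Hypothetical view of the live room: the latest shared text per open file.
interface CollabRoom {
  snapshot(): Map<FilePath, string>;
}

// Hypothetical view of durable storage (R2 in the post).
interface ObjectStore {
  put(path: FilePath, content: string): void;
  get(path: FilePath): string | undefined;
}

class InMemoryStore implements ObjectStore {
  private files = new Map<FilePath, string>();
  put(path: FilePath, content: string): void {
    this.files.set(path, content);
  }
  get(path: FilePath): string | undefined {
    return this.files.get(path);
  }
}

// Copy every live file into storage; returns the paths that were written.
function flushRoomToStorage(room: CollabRoom, store: ObjectStore): FilePath[] {
  const written: FilePath[] = [];
  room.snapshot().forEach((content, path) => {
    if (store.get(path) !== content) {
      store.put(path, content);
      written.push(path);
    }
  });
  return written;
}

// The run path never enqueues a job before the flush has finished.
function requestRun(
  room: CollabRoom,
  store: ObjectStore,
  enqueue: (sandboxId: string) => void,
  sandboxId: string
): FilePath[] {
  const flushed = flushRoomToStorage(room, store);
  enqueue(sandboxId);
  return flushed;
}
```

The point of routing every run through one function like `requestRun` is that there is no code path that can enqueue a job against stale storage.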

2. Building collaboration myself vs. buying it

My first plan was ambitious: a self-hosted collaboration server (Hocuspocus-style), custom WebSocket authentication, Redis so multiple server instances could share state, a background worker to persist edits to R2. Classic "I'll build the infrastructure" energy.

I sketched the architecture. It was clean. Then I estimated the work: auth on connect, reconnection handling, awareness/cursors, debounced persistence, conflict with the existing REST save path, room lifecycle when sandboxes are created or deleted. Months of plumbing before anyone could co-edit a single line of code.

I switched to Liveblocks for managed real-time sync with Yjs. The trade-offs are real:

  • It's a paid service after the free tier
  • Persistence back to R2 is webhook-driven — roughly once per minute, not per keystroke
  • Active document state lives on Liveblocks while a session is open

But I had two cursors in the same Monaco file in weeks, not months. Presence (who's online, who's on which file), reconnection, and multi-region sync came for free.

What I learned: for a solo project optimizing for "does multiplayer editing work reliably," buying managed infra early is often the right call. Self-hosting Yjs might be cheaper at scale. It isn't cheaper in calendar time. You can migrate later if vendor cost or control becomes the bottleneck. You can't get back the months you'd spend debugging WebSocket edge cases.

3. Running user code without running it on my server

This was the scariest problem and the one with the highest stakes.

The obvious approach — spawn child processes on the API server, pipe terminal I/O over WebSockets — works in a demo. One user, one project folder, you're careful about paths. It falls apart when you think about production:

  • Isolation: one malicious or buggy npm install shouldn't affect other users
  • Scaling: every active run wants CPU, memory, and open ports on your machine
  • Cost: long-running dev servers don't stop when the user closes the laptop
  • Preview URLs: Vite on localhost doesn't help if the browser can't reach it

I explored a Kubernetes-style path — pods per sandbox, ingress per preview URL, a controller watching a job queue. Architecturally sound. Weeks of cluster setup, image builds, and network policy before a single React app previewed. Overkill for proving the Run button.

I went with E2B — managed sandboxes with a clean SDK. The flow:

  1. User clicks Run → API enqueues a job on Redis and returns immediately
  2. A separate worker process picks up the job, downloads files from R2, creates a fresh E2B VM, installs deps, starts the dev server
  3. Worker reports status back: syncing → installing → running, with a preview URL when ready
  4. The web app polls the run status until the preview URL is ready, then loads it in the iframe

The API never executes user code. It just enqueues and tracks state. When the user leaves the editor, the run stops and the VM is killed.
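
The status reporting in that flow maps onto a small state machine. A sketch under assumptions: `syncing`, `installing`, and `running` come from the steps above, while `queued`, `stopped`, `failed`, the `RunTracker` class, and the in-memory map (standing in for Redis and Postgres) are illustrative.

```typescript
type RunStatus = "queued" | "syncing" | "installing" | "running" | "stopped" | "failed";

interface RunJob {
  sandboxId: string;
  status: RunStatus;
  previewUrl?: string;
}

// Allowed transitions; anything else is rejected as a bug in the worker.
const NEXT: Record<RunStatus, RunStatus[]> = {
  queued: ["syncing", "stopped", "failed"],
  syncing: ["installing", "stopped", "failed"],
  installing: ["running", "stopped", "failed"],
  running: ["stopped", "failed"],
  stopped: [],
  failed: [],
};

class RunTracker {
  private jobs = new Map<string, RunJob>();

  // API side: record the job and return immediately.
  // Enqueuing again while a run is active returns the existing job.
  enqueue(sandboxId: string): RunJob {
    const existing = this.jobs.get(sandboxId);
    if (existing && existing.status !== "stopped" && existing.status !== "failed") {
      return existing;
    }
    const job: RunJob = { sandboxId, status: "queued" };
    this.jobs.set(sandboxId, job);
    return job;
  }

  // Worker side: report progress, optionally with the preview URL.
  report(sandboxId: string, status: RunStatus, previewUrl?: string): RunJob {
    const job = this.jobs.get(sandboxId);
    if (!job) throw new Error(`no run for ${sandboxId}`);
    if (NEXT[job.status].indexOf(status) === -1) {
      throw new Error(`invalid transition ${job.status} -> ${status}`);
    }
    job.status = status;
    if (previewUrl !== undefined) job.previewUrl = previewUrl;
    return job;
  }

  // Web side: what a status-polling endpoint would return.
  poll(sandboxId: string): RunJob | undefined {
    return this.jobs.get(sandboxId);
  }
}
```

Making `enqueue` idempotent while a run is active also covers the double-click case: a second click gets the same job back instead of a second VM.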

What I learned: don't run user code where you run your API. Outsource isolation to something purpose-built. Design the queue so the execution backend is swappable — E2B today, Kubernetes tomorrow — but never pretend child_process is a sandbox.
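
One way to keep the backend swappable is to have the worker depend on a narrow interface. The shape below is hypothetical: it is not E2B's SDK or a Kubernetes client, and `ExecutionBackend`, `SandboxHandle`, `RecordingBackend`, and `runSandbox` are names made up for this sketch.

```typescript
interface SandboxHandle {
  id: string;
  previewUrl: string | null;
}

// The only surface the worker is allowed to touch.
interface ExecutionBackend {
  create(files: Map<string, string>): SandboxHandle;
  exec(handle: SandboxHandle, command: string): void;
  kill(handle: SandboxHandle): void;
}

// Test double that records calls instead of starting a VM.
class RecordingBackend implements ExecutionBackend {
  log: string[] = [];
  private nextId = 1;

  create(files: Map<string, string>): SandboxHandle {
    const handle: SandboxHandle = { id: `vm-${this.nextId++}`, previewUrl: null };
    this.log.push(`create:${handle.id}:${files.size}`);
    return handle;
  }
  exec(handle: SandboxHandle, command: string): void {
    if (command === "") throw new Error("empty command");
    this.log.push(`exec:${handle.id}:${command}`);
  }
  kill(handle: SandboxHandle): void {
    this.log.push(`kill:${handle.id}`);
  }
}

// Worker logic: create, run commands in order, and kill the VM on any failure
// so a broken install never leaves a sandbox billing in the background.
function runSandbox(
  backend: ExecutionBackend,
  files: Map<string, string>,
  commands: string[]
): SandboxHandle {
  const handle = backend.create(files);
  try {
    for (const command of commands) {
      backend.exec(handle, command);
    }
  } catch (err) {
    backend.kill(handle);
    throw err;
  }
  return handle;
}
```

An E2B adapter and a Kubernetes adapter would each implement `ExecutionBackend`; the queue and the worker loop would not need to change.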

First roadblock after wiring this up: Vite binds to localhost by default, so E2B's port forwarding couldn't reach the dev server. Small detail, completely blocked preview until starter templates used --host 0.0.0.0. The kind of bug that only shows up end-to-end.
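
For reference, the same fix can live in the template's Vite config instead of the command line. This is a sketch of what a starter template's `vite.config.ts` might contain, not the actual Rainfall template; pinning the port and enabling `strictPort` are my assumptions about what makes forwarding predictable.

```typescript
// vite.config.ts (sketch). Vite accepts a plain object as its default export.
const config = {
  server: {
    host: "0.0.0.0",  // listen on all interfaces, not only loopback
    port: 5173,       // Vite's default port, pinned so the forwarded port is known
    strictPort: true, // exit if the port is taken instead of picking another one
  },
};

export default config;
```

The command-line equivalent is the `--host 0.0.0.0` flag mentioned above.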

4. The terminal almost became its own product

Once runs worked, I wanted a real shell in the browser. Not captured stdout from a one-shot command — a PTY where you can cd, rerun npm, hit Ctrl+C, use arrow keys in vim if you're brave.

The roadblock: E2B's terminal API requires an API key, and a browser can't hold that key without exposing it to every user. Something on the server has to authenticate the user, connect to the already-running sandbox, and relay bytes both ways.

I considered putting WebSockets on the run worker. It already talks to E2B. But that worker is a background job processor — it consumes Redis queues, not user sessions. Adding long-lived WebSocket connections, cookie auth, and reconnect logic would turn it into a second API server.

The Express API already handles auth for everything else. The terminal gateway lives there: WebSocket upgrade, verify session, confirm the user owns the sandbox and a run is active, bridge to E2B's PTY.
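
The checks the gateway runs before bridging can be written as one pure function. A sketch with hypothetical types: `Session`, `Sandbox`, `Run`, and the choice of status codes are assumptions, and the actual WebSocket upgrade and PTY bridge are left out.

```typescript
interface Session {
  userId: string;
}

interface Sandbox {
  id: string;
  ownerId: string;
}

interface Run {
  sandboxId: string;
  status: "syncing" | "installing" | "running" | "stopped";
  vmId: string;
}

type GateResult =
  | { ok: true; vmId: string }
  | { ok: false; code: 401 | 403 | 409; reason: string };

// Decide whether a terminal connection may be bridged to the sandbox VM.
function authorizeTerminal(
  session: Session | null,
  sandbox: Sandbox | undefined,
  run: Run | undefined
): GateResult {
  if (!session) {
    return { ok: false, code: 401, reason: "no valid session" };
  }
  if (!sandbox || sandbox.ownerId !== session.userId) {
    return { ok: false, code: 403, reason: "user does not own this sandbox" };
  }
  if (!run || run.sandboxId !== sandbox.id || run.status !== "running") {
    return { ok: false, code: 409, reason: "no active run to attach to" };
  }
  return { ok: true, vmId: run.vmId };
}
```

Only an `ok: true` result carries the VM id, so the bridging code cannot reach the sandbox without passing all three checks.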

Nested roadblock: v1 sent terminal I/O as JSON with base64-encoded payloads. Typing felt mushy — noticeable lag on every keystroke. Switching to raw binary WebSocket frames fixed it. Nobody writes blog posts about binary vs base64, but it's the difference between "this feels broken" and "this feels like a real terminal."
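
Some back-of-the-envelope arithmetic shows how much a JSON envelope inflates small payloads. The envelope format below is a guess at what a v1 frame might have looked like, not the real one, and the numbers ignore WebSocket frame headers.

```typescript
// Base64 encodes every 3 bytes as 4 characters, padding the last group.
function base64Length(byteCount: number): number {
  return 4 * Math.ceil(byteCount / 3);
}

// Size of a hypothetical text frame: {"type":"input","data":"<base64>"}
function jsonFrameBytes(payloadBytes: number): number {
  const envelope = '{"type":"input","data":""}'.length; // ASCII, so chars == bytes
  return envelope + base64Length(payloadBytes);
}

// A binary frame carries the payload bytes as they are.
function binaryFrameBytes(payloadBytes: number): number {
  return payloadBytes;
}

// Ratio of JSON frame size to binary frame size for a given payload.
function overheadFactor(payloadBytes: number): number {
  return jsonFrameBytes(payloadBytes) / binaryFrameBytes(payloadBytes);
}
```

Under these assumptions a single keystroke is 30 bytes as JSON versus 1 byte as binary. Size alone probably doesn't explain the lag; each keystroke also pays for a base64 encode, a JSON serialize, a JSON parse, and a base64 decode on the way through.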

Another one: Next.js rewrites proxy HTTP API calls to Express, but they don't reliably proxy WebSocket upgrades. The terminal has to connect directly to the API origin. Easy to miss in local dev when everything is on localhost.

What I learned: streaming features (terminal, live logs) have different lifetimes than batch features (sync files, install, start server). Don't force them through the same pipe. And the small feel details (latency, reconnect, resize) matter as much as "do bytes flow."

Stop-on-leave was its own mini-crisis: users navigated away and E2B sandboxes kept running. Now leaving the editor schedules a deferred stop (with a grace period for React remounts in dev). Obvious in hindsight.
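
A deferred stop with a grace period is essentially a cancellable timer per sandbox. A minimal sketch, with the timer functions injected so the logic can be exercised without waiting on a real clock; `DeferredStop`, `Timers`, and `FakeTimers` are hypothetical names and the grace period is whatever the caller picks.

```typescript
interface Timers {
  set(callback: () => void, delayMs: number): number;
  clear(handle: number): void;
}

class DeferredStop {
  private pending = new Map<string, number>();
  private timers: Timers;
  private graceMs: number;
  private stopRun: (sandboxId: string) => void;

  constructor(timers: Timers, graceMs: number, stopRun: (sandboxId: string) => void) {
    this.timers = timers;
    this.graceMs = graceMs;
    this.stopRun = stopRun;
  }

  // Editor unmounted: schedule the stop instead of stopping right away.
  onLeave(sandboxId: string): void {
    this.cancel(sandboxId);
    const handle = this.timers.set(() => {
      this.pending.delete(sandboxId);
      this.stopRun(sandboxId);
    }, this.graceMs);
    this.pending.set(sandboxId, handle);
  }

  // Editor mounted again (for example a dev-mode remount): keep the run alive.
  onReturn(sandboxId: string): void {
    this.cancel(sandboxId);
  }

  private cancel(sandboxId: string): void {
    const handle = this.pending.get(sandboxId);
    if (handle !== undefined) {
      this.timers.clear(handle);
      this.pending.delete(sandboxId);
    }
  }
}

// Manual clock for tests; production code would wrap setTimeout/clearTimeout.
class FakeTimers implements Timers {
  private now = 0;
  private nextHandle = 1;
  private tasks = new Map<number, { at: number; callback: () => void }>();

  set(callback: () => void, delayMs: number): number {
    const handle = this.nextHandle++;
    this.tasks.set(handle, { at: this.now + delayMs, callback });
    return handle;
  }
  clear(handle: number): void {
    this.tasks.delete(handle);
  }
  advance(ms: number): void {
    this.now += ms;
    const due: Array<() => void> = [];
    this.tasks.forEach((task, handle) => {
      if (task.at <= this.now) {
        due.push(task.callback);
        this.tasks.delete(handle);
      }
    });
    due.forEach((run) => run());
  }
}
```

A quick unmount followed by a remount cancels the pending stop, while a real navigation lets the timer fire and kill the VM.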

5. Collaboration UX is a product problem, not just a tech problem

Getting two cursors in the same file is the easy part. The hard part is deciding what viewers should see.

Three roles emerged: owner (full control), editor (can change code, can't run or delete the sandbox), viewer (read-only, live sync). Share links let you invite someone as "can edit" or "view only."

Editors need independent tabs — they're pair programming, not watching a screen share. If you sync everyone's tab list into the shared document, three people with three files open creates a tug-of-war. Messy and wrong.

Viewers need to follow along when the owner switches from App.jsx to styles.css during a demo. But they shouldn't force editors onto the same tab.

The solution that worked: viewers follow the owner's active file through presence — lightweight signals about who's looking at what — not through shared document state. A subtle "Following owner" label in the header. Editors ignore it entirely.

What I learned: not everything belongs in the CRDT. Some behaviors are better as presence signals. Simpler to build, simpler to explain to users, easier to change later (there's even an unfollow toggle now). Product rules like "viewers watch the owner, editors work independently" shouldn't leak into document structure.
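
The follow rule reduces to a small function over presence data, with nothing written to the shared document. A sketch with hypothetical shapes: `Presence`, `SelfState`, and `resolveVisibleFile` are illustrative, and real presence would arrive through the collab provider.

```typescript
type Role = "owner" | "editor" | "viewer";

// What each connected user broadcasts about themselves.
interface Presence {
  userId: string;
  role: Role;
  activeFile: string | null;
}

// Local, per-browser state for the current user.
interface SelfState {
  role: Role;
  localActiveFile: string | null;
  following: boolean; // the unfollow toggle flips this to false
}

// Decide which file this user's editor should show.
function resolveVisibleFile(self: SelfState, others: Presence[]): string | null {
  // Owners and editors always keep their own tab.
  if (self.role !== "viewer" || !self.following) {
    return self.localActiveFile;
  }
  const owner = others.filter((p) => p.role === "owner")[0];
  // Owner offline or with no file open: stay on the current file.
  if (!owner || owner.activeFile === null) {
    return self.localActiveFile;
  }
  return owner.activeFile;
}
```

Because the rule lives in one function over presence, changing it (say, following a specific editor instead of the owner) doesn't touch the document schema.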

Permissions had their own friction: Liveblocks room access has to stay in sync with Postgres share rows. Revoke someone's edit access → update the room ACL → their next token refresh disconnects them. Easy to forget one side of that pair.
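
One way to avoid forgetting half of that pair is to route every access change through a single function that updates both sides and rolls back if the second write fails. A sketch with in-memory stand-ins: `AccessStore`, `MapAccessStore`, and `setAccess` are hypothetical, and the real versions would be a Postgres query and a call to the collab provider's permissions API.

```typescript
type Access = "edit" | "view" | null; // null means revoked

interface AccessStore {
  read(sandboxId: string, userId: string): Access;
  write(sandboxId: string, userId: string, access: Access): void;
}

// Stand-in used for both the share table and the room ACL in this sketch.
class MapAccessStore implements AccessStore {
  private rows = new Map<string, Access>();
  failNextWrite = false; // lets a test simulate an outage

  read(sandboxId: string, userId: string): Access {
    const value = this.rows.get(`${sandboxId}:${userId}`);
    return value === undefined ? null : value;
  }
  write(sandboxId: string, userId: string, access: Access): void {
    if (this.failNextWrite) {
      this.failNextWrite = false;
      throw new Error("write failed");
    }
    const key = `${sandboxId}:${userId}`;
    if (access === null) this.rows.delete(key);
    else this.rows.set(key, access);
  }
}

// Single entry point for grant, change, and revoke.
function setAccess(
  shares: AccessStore,
  roomAcl: AccessStore,
  sandboxId: string,
  userId: string,
  access: Access
): void {
  const previous = shares.read(sandboxId, userId);
  shares.write(sandboxId, userId, access);
  try {
    roomAcl.write(sandboxId, userId, access);
  } catch (err) {
    // Keep the two sides consistent: undo the share row and surface the error.
    shares.write(sandboxId, userId, previous);
    throw err;
  }
}
```

This is not a distributed transaction, and a crash between the two writes can still leave them out of step, so a periodic reconciliation job would be a reasonable companion.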

6. Monaco + real-time sync is finicky

Monaco feels rock solid until you bind it to a collaborative document with y-monaco.

Problems I hit in order:

  1. Sync timing: attach the binding before the shared doc has loaded from the room, and an empty Y.Text can overwrite real file content from storage. Wait for sync first.
  2. Tab remounts: switching tabs destroys and recreates the Monaco instance. Bindings must tear down on unmount or you get ghost cursors and duplicate listeners.
  3. Large files: syncing a 2 MB JSON file through CRDTs makes everything sluggish. Files over ~512 KB skip collab binding and use normal saves instead.
  4. First open race: two users open a new sandbox simultaneously, both try to seed the room from storage. Needed server-side seeding with a lock so one wins cleanly.
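
The first three items can be expressed as a binding decision plus a slot that owns at most one binding. A sketch only: `decideBinding`, `BindingSlot`, and `Disposable` are hypothetical, and the slot assumes the real binding object exposes a `destroy()` method, as y-monaco's `MonacoBinding` does to my knowledge.

```typescript
const COLLAB_SIZE_LIMIT_BYTES = 512 * 1024; // the ~512 KB cap from item 3

type BindDecision = "wait-for-sync" | "bind-collab" | "plain-save";

// Items 1 and 3: never bind before the room has synced, never bind huge files.
function decideBinding(fileSizeBytes: number, roomSynced: boolean): BindDecision {
  if (fileSizeBytes > COLLAB_SIZE_LIMIT_BYTES) return "plain-save";
  return roomSynced ? "bind-collab" : "wait-for-sync";
}

interface Disposable {
  destroy(): void;
}

// Item 2: one binding per editor slot, torn down before the next one attaches.
class BindingSlot {
  private current: Disposable | null = null;

  attach(create: () => Disposable): void {
    this.detach();
    this.current = create();
  }

  // Call from the editor's unmount handler.
  detach(): void {
    if (this.current) {
      this.current.destroy();
      this.current = null;
    }
  }
}
```

In a React component the `detach()` call would sit in the effect cleanup, so a tab switch can't leave an old binding listening to a disposed editor.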

Each of these produced a "where did my code go?" or "why is this empty?" bug report from my own testing before any user saw it.

What I learned: CRDT + Monaco is powerful and unforgiving. Sync-before-bind, destroy-on-unmount, size caps, and explicit seeding aren't polish — they're data integrity. Budget time for this; it's not a config flag.
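
Explicit seeding is the least obvious of those four, so here is a sketch of seed-once behind a lock. Everything in it is an assumption: `SeedLock`, `InMemoryLock`, `Room`, and `seedRoomOnce` are made-up names, and a real lock would live somewhere shared (Redis or Postgres, for example) with an expiry in case the seeding process dies.

```typescript
interface SeedLock {
  // Returns true for exactly one caller per room.
  tryAcquire(roomId: string): boolean;
}

class InMemoryLock implements SeedLock {
  private held = new Set<string>();
  tryAcquire(roomId: string): boolean {
    if (this.held.has(roomId)) return false;
    this.held.add(roomId);
    return true;
  }
}

interface Room {
  seeded: boolean;
  files: Map<string, string>;
}

// Seed a new room from storage; concurrent callers lose the race cleanly.
function seedRoomOnce(
  lock: SeedLock,
  roomId: string,
  room: Room,
  loadFromStorage: () => Map<string, string>
): "seeded" | "skipped" {
  if (room.seeded) return "skipped";
  if (!lock.tryAcquire(roomId)) return "skipped"; // someone else is seeding
  loadFromStorage().forEach((content, path) => {
    room.files.set(path, content);
  });
  room.seeded = true;
  return "seeded";
}
```

Clients that get `"skipped"` simply wait for the room to sync, which ties back to the sync-before-bind rule.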

Wrong turns and compromises

Not everything was a clean decision. Some paths I almost took, some trade-offs I accepted knowing they'd hurt later.

Almost building a collab server. The architecture doc was beautiful. The timeline wasn't. Would have worked eventually. Wouldn't have shipped this year.

Almost running terminals locally on the API. Fastest way to a demo. Would have made public deployment a security and cost non-starter — one backend instance per active session adds up fast.

Webhook persistence lag. Collaborative edits flush to R2 on a delay (~once per minute via webhook). Runs and downloads force an explicit flush, so execution is safe. But if you hard-refresh expecting the files in R2 to match the collab room exactly, you might get a surprise. Acceptable for now; not invisible to power users.

Public preview URLs. E2B gives you a host like https://5173-xxx.e2b.app — works instantly, no setup. Also not behind Rainfall auth. Fine for MVP and personal projects; needs a proxy or token if you want private previews.

Console vs preview sandboxes. React sandboxes auto-start Vite on Run. Node and Python sandboxes only sync files and install deps — you run npm start or python main.py yourself in the shell. That split made sense technically (not everything is a web server) but required a UX pass so users didn't click Run and wonder why nothing appeared in the preview panel.
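
That split is easier to reason about when each template declares its kind as data. A sketch with made-up values: the template ids, commands, and hint text below are illustrative, not Rainfall's actual configuration.

```typescript
type TemplateKind = "preview" | "console";

interface TemplateSpec {
  kind: TemplateKind;
  install: string | null; // null when there is no install step
  start: string | null;   // only preview templates auto-start a server
  port: number | null;
}

// Hypothetical descriptors for three of the templates.
const TEMPLATES: Record<string, TemplateSpec> = {
  "react-js": { kind: "preview", install: "npm install", start: "npm run dev -- --host 0.0.0.0", port: 5173 },
  "node": { kind: "console", install: "npm install", start: null, port: null },
  "python": { kind: "console", install: "pip install -r requirements.txt", start: null, port: null },
};

interface RunPlan {
  commands: string[];
  previewPort: number | null;
  hint: string | null; // shown in the UI when there is nothing to preview
}

// Turn a template id into what the worker runs and what the UI shows.
function planRun(templateId: string): RunPlan {
  const spec = TEMPLATES[templateId];
  if (!spec) throw new Error(`unknown template: ${templateId}`);
  const commands: string[] = [];
  if (spec.install !== null) commands.push(spec.install);
  if (spec.kind === "preview" && spec.start !== null) commands.push(spec.start);
  return {
    commands,
    previewPort: spec.kind === "preview" ? spec.port : null,
    hint: spec.kind === "console" ? "Dependencies installed. Start your program from the shell." : null,
  };
}
```

The `hint` field is the UX pass in miniature: a console template returns a message for the preview panel instead of leaving it blank.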

Six templates at once. Should have stayed on React-only until the run pipeline was boring. Expanding to Express, vanilla, Python, etc. surfaced edge cases — different ports, no install step, pip vs npm — that multiplied testing surface.

How it came together

I didn't build everything at once. Each phase had a gate: if this doesn't work, the next thing is pointless.

Phase 1 — Editor and storage. Create a sandbox from a template. Edit files. Tree operations. Download as zip. Session state (open tabs) persists. Boring and essential — if saving is flaky, nothing else matters.

Phase 2 — Sharing. Public/private visibility, share links, Explore page, fork. This forced a real permission model early instead of bolting it on after collab.

Phase 3 — Collaboration. Liveblocks rooms, y-monaco, presence avatars, viewer follow, collaborative file tree. The hardest user-facing feature. Had to work before runs, because runs depend on a reliable flush from collab → storage.

Phase 4 — Execution. Redis queue, E2B worker, preview panel, run/stop, status polling. First time the Run button actually did something. React-only first, then other templates.

Phase 5 — Terminal. WebSocket PTY bridge, xterm in the shell panel, reconnect, resize. Only makes sense once runs produce a live VM to attach to.

Phase 6 — Hardening. Rate limits, webhook signature verification, seed locks, stop-on-leave, stale tab when someone deletes a remote file, connection banner when Liveblocks drops. The unglamorous work that makes it feel production-ish.

The order wasn't accidental. Collaboration before runs. Runs before terminal. Vertical slices over big-bang integration.

What I'd do differently

Design flush paths from day one. Even before collaboration existed, I should have treated "what happens immediately before execution" as a first-class flow — not an afterthought bolted on when runs returned stale code.

Document transport boundaries early. REST for CRUD, Liveblocks for collab, Redis for run jobs, WebSocket for terminal. Four ways data moves. Each is correct for its job. Future features need to land in the right bucket — don't stream terminal bytes through the collab doc.

Test unhappy paths before happy paths feel done. Owner deletes a file a collaborator has open. Liveblocks disconnects mid-typing. User double-clicks Run. User closes the laptop with a VM running. These scenarios drove most of the hardening phase and would have been cheaper to catch earlier.

Ship one template type first. React-only until run + preview + terminal is boring. Then add Express, Python, etc. I expanded too early and paid in edge-case debugging.

Invest in "feel" earlier for the terminal. Binary frames, reconnect, fit-on-resize — users forgive a slow preview once. They don't forgive a laggy shell.

Where it stands today

Rainfall works end to end. You can sign up, create a sandbox, edit alone or with collaborators, share with view/edit roles, run a live preview, open a shell into the running environment, publish to Explore, and fork public projects.

What's not there yet: AI-assisted coding, auth-gated preview URLs, offline editing, aggregated run logs for console projects. The core loop — edit, collaborate, run, debug — is solid.

Deployment is realistic without heroic ops: web frontend on Vercel or similar, API and run worker on Railway/Fly/a small VPS, managed Postgres and Redis, R2 for files. No per-user backend instance. E2B sandboxes spin up on demand and die when you leave. That's a deliberate contrast to architectures where every session needs a dedicated server process — those don't scale down to solo-maintainer projects easily.

Takeaways

The product is the integration, not any single feature. An editor alone is a weekend project. An editor plus preview plus terminal plus collaboration plus safe execution is where the months go. The bugs live in the seams.

Isolation is non-negotiable. If user code runs on your infrastructure without real sandboxing, you're in the security business whether you intended to be or not. Outsource it or invest heavily — there's no casual middle ground.

Buy the hard infra, build the product. Liveblocks and E2B cost money. They bought back time I would have spent on sync servers, container orchestration, and PTY isolation. Worth it for velocity on a small team of one.

UX rules belong outside the sync layer. Viewer follow-owner, role permissions, owner-only run — product decisions that shouldn't require changing document schema or CRDT structure.

Flush before execute. If your run pipeline reads from storage and your editor writes to something else, you need an explicit, tested handoff. Assume it will fail until proven otherwise.

Ship the vertical slice. One sandbox type, one successful collab session, one run with preview, one terminal session — then expand. Depth before breadth.

Try it

Rainfall is open source. Clone the repo, copy the env examples, start the web app, API, and run worker, build the E2B templates once, and open a React sandbox. Share it with a second browser profile to see collab in action.

Check out the full code on GitHub.


Building something similar? I'd love to hear from you.