Cloudsteading

Firecrawl on Cloudflare

Crawler, markdown extraction, and crawl jobs rebuilt with Workers, Queues, Browser Rendering, D1, and R2.

Would replace: Firecrawl · web-scraping

Web crawling and LLM-ready extraction APIs.

How it would work

A URL goes in, an API-ready result comes out. These are the stages the request flows through.

  1. Input: submit a URL or crawl seed

    Send a URL, crawl depth, output format, and optional extraction hints.

  2. Render: browser only when needed

    Use Browser Rendering for JavaScript-heavy pages and a plain Workers fetch for simple pages.

  3. Extract: clean pages into useful output

    Turn pages into markdown, metadata, links, screenshots, and optional structured JSON.

  4. Store: persist crawl history

    Keep job state in D1 and larger artefacts in R2 so results can be inspected later.

  5. Deliver: return API-ready results

    Expose job status, retry failures, and return outputs through a small dashboard/API.
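The render stage's auto-detect could start as a crude heuristic: pages that ship lots of scripts but almost no server-rendered text probably need the browser. A minimal TypeScript sketch; the signals and thresholds here are illustrative assumptions, not anything Firecrawl or Cloudflare prescribes.

```typescript
// Heuristic sketch: decide whether a fetched page likely needs
// Browser Rendering instead of the plain Workers fetch path.
// Thresholds (3 scripts, 200 chars of text) are assumptions to tune.
function needsBrowser(html: string): boolean {
  const scriptTags = (html.match(/<script\b/gi) ?? []).length;

  // Strip scripts, styles, and tags to estimate how much visible
  // text the plain fetch already produced.
  const visibleText = html
    .replace(/<script[\s\S]*?<\/script>/gi, "")
    .replace(/<style[\s\S]*?<\/style>/gi, "")
    .replace(/<[^>]+>/g, " ")
    .trim();

  // Many scripts plus almost no server-rendered text is a strong hint
  // that content is assembled client-side.
  return scriptTags >= 3 && visibleText.length < 200;
}
```

A queue consumer could try the cheap fetch first, run this check on the response body, and only then fall back to a Browser Rendering session.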

Concrete jobs it does

If you're paying Firecrawl for any of these, the proposed build would let you stop.

  • Scrape one URL into clean, LLM-ready markdown
  • Crawl a small site with depth + page-count limits
  • Render JavaScript-heavy pages (browser only when needed)
  • Extract structured fields with Workers AI
  • Store raw HTML, screenshots, and JSON artefacts in R2
  • Expose an API with key-based quotas and a tiny dashboard
  • Replay or rerun a job from job state in D1
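The depth and page-count limits above could live in a small crawl frontier that the queue consumer drains. A sketch under assumed names (`Frontier`, `CrawlLimits` are hypothetical, not part of any existing API):

```typescript
// Sketch of a crawl frontier: breadth-first over allowed hosts,
// with depth and page-count caps and URL de-duplication.
interface CrawlLimits {
  maxDepth: number;
  maxPages: number;
  allowedHosts: string[];
}

class Frontier {
  private seen = new Set<string>();
  private queue: { url: string; depth: number }[] = [];

  constructor(private limits: CrawlLimits) {}

  // Returns true if the URL was accepted into the frontier.
  add(url: string, depth: number): boolean {
    if (depth > this.limits.maxDepth) return false;
    if (this.seen.size >= this.limits.maxPages) return false;
    if (!this.limits.allowedHosts.includes(new URL(url).hostname)) return false;
    if (this.seen.has(url)) return false;
    this.seen.add(url);
    this.queue.push({ url, depth });
    return true;
  }

  // Next page to fetch, FIFO for breadth-first order.
  next(): { url: string; depth: number } | undefined {
    return this.queue.shift();
  }
}
```

Because the frontier is plain state, it can be serialized into D1 alongside the job row, which is what makes replaying or rerunning a job straightforward.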

Config knobs

What you'll be able to tune from day one.

  • Crawl mode: single page / small site / scheduled recrawl
  • Rendering: auto-detect, browser only when needed
  • Output: markdown, links, screenshot, structured JSON
  • Storage: D1 for jobs, R2 for artefacts
  • Limits: depth, page count, timeout, domain allowlist
  • Auth: API keys with per-key quota
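These knobs could map onto a single job config with defaults filled in at submission time. A sketch only; the field names and default values are assumptions, not a settled API.

```typescript
// Sketch of the tunable job config, mirroring the knobs listed above.
type Rendering = "auto" | "browser" | "fetch";
type Output = "markdown" | "links" | "screenshot" | "json";

interface JobConfig {
  mode: "single" | "site" | "scheduled";
  rendering: Rendering;
  outputs: Output[];
  maxDepth: number;
  maxPages: number;
  timeoutMs: number;
  allowedHosts: string[]; // empty = restrict to the seed URL's host
}

// Fill unspecified fields with conservative defaults (values are
// illustrative assumptions).
function withDefaults(partial: Partial<JobConfig>): JobConfig {
  return {
    mode: "single",
    rendering: "auto", // browser only when needed
    outputs: ["markdown"],
    maxDepth: 2,
    maxPages: 50,
    timeoutMs: 30_000,
    allowedHosts: [],
    ...partial,
  };
}
```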
Proposed architecture

The Cloudflare loadout, wired together.

Architecture

  • Public interface: browser and first-party clients
  • App: Workers entry point (request handler)
Will build
  • Single-page scrape API returning cleaned markdown and metadata.
  • Queued crawl jobs for small sites, with progress state and retries.
  • JavaScript rendering path for pages that need a browser.
  • Stored crawl outputs, raw artefacts, and simple API-key-based access.
  • A small dashboard to inspect recent jobs, failures, usage, and outputs.
Will not build
  • Massive managed crawl infrastructure for very large or hostile sites.
  • A perfect clone of every Firecrawl endpoint and extraction option.
  • Enterprise compliance, SLAs, team governance, or managed support.
  • Guaranteed bypassing of bot protection, paywalls, CAPTCHAs, or site restrictions.
  • Every weird website edge case on day one.
Readme

Firecrawl on Cloudflare

Firecrawl is valuable because it turns messy websites into clean, LLM-ready data without making the team own crawling infrastructure. This candidate asks whether the practical small-team version can be rebuilt with Cloudflare primitives while staying honest about scale, reliability, and edge cases.

What this is

Firecrawl is useful because it hides crawl orchestration, page rendering, extraction, retries, and storage behind a simple API.

How we would build it

Use Browser Rendering, D1, Queues, R2, Workers to cover the core workflow without adding rented infrastructure.
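One way to wire D1 and R2 together is to key everything by job id: a row per job in D1, and per-page artefacts under a predictable R2 prefix so a job can be inspected or replayed later. A sketch of a possible layout; both the key scheme and the schema are assumptions.

```typescript
// Sketch: build an R2 object key for a per-page artefact, keyed by job.
// Encoding the URL keeps it safe as a key segment and reversible.
function artefactKey(jobId: string, url: string, kind: string): string {
  return `jobs/${jobId}/${kind}/${encodeURIComponent(url)}`;
}

// D1 speaks SQLite SQL; a minimal job-state table could look like this.
const JOBS_SCHEMA = `
CREATE TABLE IF NOT EXISTS jobs (
  id         TEXT PRIMARY KEY,
  seed_url   TEXT NOT NULL,
  status     TEXT NOT NULL DEFAULT 'queued',
  pages_done INTEGER NOT NULL DEFAULT 0,
  created_at TEXT NOT NULL DEFAULT (datetime('now'))
);`;
```

With this layout, listing `jobs/<id>/` in R2 recovers every artefact for a job, and the D1 row alone is enough to rerun it from its seed URL.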

The honest limit

Excellent for small teams, internal agents, and controlled crawl workloads. It will not immediately match Firecrawl's managed reliability or every extraction edge case.