Cloudsteading

Firecrawl on Cloudflare

Crawler, markdown extraction, and crawl jobs rebuilt with Workers, Queues, Browser Rendering, D1, and R2.

Would replace: Firecrawl · web-scraping

Web crawling and LLM-ready extraction APIs.

How it would work

A URL goes in, an API-ready result comes out. These are the stages the request flows through.

  1. Input: submit a URL or crawl seed

    Send a URL, crawl depth, output format, and optional extraction hints.

  2. Render: browser only when needed

    Use Browser Rendering for JavaScript-heavy pages and a plain Workers fetch for simple pages.

  3. Extract: clean pages into useful output

    Turn pages into markdown, metadata, links, screenshots, and optional structured JSON.

  4. Store: persist crawl history

    Keep job state in D1 and larger artefacts in R2 so results can be inspected later.

  5. Deliver: return API-ready results

    Expose job status, retry failures, and return outputs through a small dashboard/API.
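The render stage's auto-detect could start as a crude heuristic: pages that ship lots of scripts but almost no server-rendered text probably need the browser. A minimal TypeScript sketch; the signals and thresholds here are illustrative assumptions, not anything Firecrawl or Cloudflare prescribes.

```typescript
// Heuristic sketch: decide whether a fetched page likely needs
// Browser Rendering instead of the plain Workers fetch path.
// Thresholds (3 scripts, 200 chars of text) are assumptions to tune.
function needsBrowser(html: string): boolean {
  const scriptTags = (html.match(/<script\b/gi) ?? []).length;

  // Strip scripts, styles, and tags to estimate how much visible
  // text the plain fetch already produced.
  const visibleText = html
    .replace(/<script[\s\S]*?<\/script>/gi, "")
    .replace(/<style[\s\S]*?<\/style>/gi, "")
    .replace(/<[^>]+>/g, " ")
    .trim();

  // Many scripts plus almost no server-rendered text is a strong hint
  // that content is assembled client-side.
  return scriptTags >= 3 && visibleText.length < 200;
}
```

A queue consumer could try the cheap fetch first, run this check on the response body, and only then fall back to a Browser Rendering session.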

Concrete jobs it does

If you're paying Firecrawl for any of these, the proposed build would let you stop.

  • Scrape one URL into clean, LLM-ready markdown
  • Crawl a small site with depth + page-count limits
  • Render JavaScript-heavy pages (browser only when needed)
  • Extract structured fields with Workers AI
  • Store raw HTML, screenshots, and JSON artefacts in R2
  • Expose an API with key-based quotas and a tiny dashboard
  • Replay or rerun a job from job state in D1
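The depth and page-count limits above could live in a small crawl frontier that the queue consumer drains. A sketch under assumed names (`Frontier`, `CrawlLimits` are hypothetical, not part of any existing API):

```typescript
// Sketch of a crawl frontier: breadth-first over allowed hosts,
// with depth and page-count caps and URL de-duplication.
interface CrawlLimits {
  maxDepth: number;
  maxPages: number;
  allowedHosts: string[];
}

class Frontier {
  private seen = new Set<string>();
  private queue: { url: string; depth: number }[] = [];

  constructor(private limits: CrawlLimits) {}

  // Returns true if the URL was accepted into the frontier.
  add(url: string, depth: number): boolean {
    if (depth > this.limits.maxDepth) return false;
    if (this.seen.size >= this.limits.maxPages) return false;
    if (!this.limits.allowedHosts.includes(new URL(url).hostname)) return false;
    if (this.seen.has(url)) return false;
    this.seen.add(url);
    this.queue.push({ url, depth });
    return true;
  }

  // Next page to fetch, FIFO for breadth-first order.
  next(): { url: string; depth: number } | undefined {
    return this.queue.shift();
  }
}
```

Because the frontier is plain state, it can be serialized into D1 alongside the job row, which is what makes replaying or rerunning a job straightforward.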

Config knobs

What you'll be able to tune from day one.

  • Crawl mode: single page / small site / scheduled recrawl
  • Rendering: auto-detect, browser only when needed
  • Output: markdown, links, screenshot, structured JSON
  • Storage: D1 for jobs, R2 for artefacts
  • Limits: depth, page count, timeout, domain allowlist
  • Auth: API keys with per-key quota
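These knobs could map onto a single job config with defaults filled in at submission time. A sketch only; the field names and default values are assumptions, not a settled API.

```typescript
// Sketch of the tunable job config, mirroring the knobs listed above.
type Rendering = "auto" | "browser" | "fetch";
type Output = "markdown" | "links" | "screenshot" | "json";

interface JobConfig {
  mode: "single" | "site" | "scheduled";
  rendering: Rendering;
  outputs: Output[];
  maxDepth: number;
  maxPages: number;
  timeoutMs: number;
  allowedHosts: string[]; // empty = restrict to the seed URL's host
}

// Fill unspecified fields with conservative defaults (values are
// illustrative assumptions).
function withDefaults(partial: Partial<JobConfig>): JobConfig {
  return {
    mode: "single",
    rendering: "auto", // browser only when needed
    outputs: ["markdown"],
    maxDepth: 2,
    maxPages: 50,
    timeoutMs: 30_000,
    allowedHosts: [],
    ...partial,
  };
}
```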
Proposed architecture

The Cloudflare loadout, wired together.

Architecture

  • Public interface: browser and first-party clients
  • App: Workers entry point (request handler)
Will build
  • Single-page scrape API returning cleaned markdown and metadata.
  • Queued crawl jobs for small sites, with progress state and retries.
  • JavaScript rendering path for pages that need a browser.
  • Stored crawl outputs, raw artefacts, and simple API-key-based access.
  • A small dashboard to inspect recent jobs, failures, usage, and outputs.
Will not build
  • Massive managed crawl infrastructure for very large or hostile sites.
  • A perfect clone of every Firecrawl endpoint and extraction option.
  • Enterprise compliance, SLAs, team governance, or managed support.
  • Guaranteed bypassing of bot protection, paywalls, CAPTCHAs, or site restrictions.
  • Every weird website edge case on day one.
Readme

Firecrawl on Cloudflare

Firecrawl is valuable because it turns messy websites into clean, LLM-ready data without making the team own crawling infrastructure. This candidate asks whether the practical small-team version can be rebuilt with Cloudflare primitives while staying honest about scale, reliability, and edge cases.

What this is

Firecrawl is useful because it hides crawl orchestration, page rendering, extraction, retries, and storage behind a simple API.

How we would build it

Use Browser Rendering, D1, Queues, R2, Workers to cover the core workflow without adding rented infrastructure.
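One way to wire D1 and R2 together is to key everything by job id: a row per job in D1, and per-page artefacts under a predictable R2 prefix so a job can be inspected or replayed later. A sketch of a possible layout; both the key scheme and the schema are assumptions.

```typescript
// Sketch: build an R2 object key for a per-page artefact, keyed by job.
// Encoding the URL keeps it safe as a key segment and reversible.
function artefactKey(jobId: string, url: string, kind: string): string {
  return `jobs/${jobId}/${kind}/${encodeURIComponent(url)}`;
}

// D1 speaks SQLite SQL; a minimal job-state table could look like this.
const JOBS_SCHEMA = `
CREATE TABLE IF NOT EXISTS jobs (
  id         TEXT PRIMARY KEY,
  seed_url   TEXT NOT NULL,
  status     TEXT NOT NULL DEFAULT 'queued',
  pages_done INTEGER NOT NULL DEFAULT 0,
  created_at TEXT NOT NULL DEFAULT (datetime('now'))
);`;
```

With this layout, listing `jobs/<id>/` in R2 recovers every artefact for a job, and the D1 row alone is enough to rerun it from its seed URL.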

The honest limit

Excellent for small teams, internal agents, and controlled crawl workloads. It will not immediately match Firecrawl's managed reliability or every extraction edge case.