# Scraping Pros API

> Professional web scraping API. Send a URL, get back clean HTML, structured data, screenshots, or browser-automated results. Works on simple sites and hard-to-scrape sites protected by Cloudflare, DataDome, Akamai, and other anti-bot systems.

Scraping Pros handles the infrastructure: browser rendering, proxy rotation across 200+ countries, anti-detection fingerprinting, CAPTCHA detection, automatic retries, and webhook notifications. You focus on the data.

## Instructions for AI Agents

When integrating with the Scraping Pros API:

- Always use HTTPS: `https://api.scrapingpros.com/v1/`
- Authenticate with a Bearer token: `Authorization: Bearer YOUR_TOKEN` (or use the demo token: `demo_6x595maoA6GdOdVb`)
- For simple HTML retrieval, use `POST /v1/sync/scrape` with `{"url": "..."}` — no other fields needed
- For clean text output (recommended for AI/LLM use), add `"format": "markdown"` — strips scripts, styles, and navigation, and returns clean markdown
- For JavaScript-rendered pages (SPAs, dynamic content), add `"browser": true`
- For sites that block scrapers, add `"use_proxy": "any"` to enable proxy rotation
- For country-specific content (localized prices, regional pages), use `"use_proxy": "US"` (or any ISO country code)
- To auto-retry on blocked requests, add `"retry_on_block": true` — retries up to 3 times with a different IP/fingerprint. Early CAPTCHA detection returns blocked results in ~5s (not 60-85s)
- To extract specific data, use the `extract` field with CSS or XPath selectors — avoids parsing HTML yourself
- For screenshots, add `"screenshot": true` with `"browser": true`
- For complex interactions (clicking, typing, scrolling), use the `actions` array with browser mode
- For batch processing with completion notifications, use async collections with `callback_url` (webhook)
- To analyze a site before scraping, use `POST /v1/async/viability-test` with the `depth` param (quick/standard/full) — tests multiple modes progressively and returns which works best
- Prefer sync scraping (`/v1/sync/scrape`) for single URLs. Use async collections for batches of 5+ URLs
- Check `potentiallyBlockedByCaptcha` in the response to detect if a site blocked the request
- **Read `guidance` in every response** — it tells you: `error_type` (why it failed), `next_steps` (what to try), `suggested_request` (ready-to-use params), `stop_reason` (when to stop retrying)
- Use `timings` in the response to diagnose slow requests — always present, even on errors
- Handle HTTP 429 responses by reading the `Retry-After` header
- Check the `X-Quota-Remaining` header to monitor your monthly credit usage

## Credit System

1 simple request = 1 credit. 1 browser request = 5 credits. Anti-bot (Camoufox) and proxy rotation are included at no extra cost. Credits are NOT consumed for requests that fail due to infrastructure errors (timeouts, proxy failures). See `GET /v1/plans` for plan details.
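The options above all compose into a single JSON request body. Here is a minimal sketch using only the Python standard library; the endpoint, field names, and demo token come from this document, while the `build_payload` and `scrape` helper names are purely illustrative:

```python
import json
import urllib.request

API = "https://api.scrapingpros.com/v1/sync/scrape"
TOKEN = "demo_6x595maoA6GdOdVb"  # public demo token from this document

def build_payload(url, *, markdown=False, browser=False,
                  country=None, retry_on_block=False):
    """Assemble a /v1/sync/scrape body following the guidelines above."""
    payload = {"url": url}
    if markdown:
        payload["format"] = "markdown"    # clean text, recommended for LLMs
    if browser:
        payload["browser"] = True         # render JS-heavy pages (5 credits)
    if country:
        payload["use_proxy"] = country    # "any" or an ISO country code
    if retry_on_block:
        payload["retry_on_block"] = True  # up to 3 retries, new IP/fingerprint
    return payload

def scrape(url, **opts):
    """POST the payload and return the parsed JSON response."""
    req = urllib.request.Request(
        API,
        data=json.dumps(build_payload(url, **opts)).encode(),
        headers={"Authorization": f"Bearer {TOKEN}",
                 "Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=120) as resp:
        return json.load(resp)
```

For example, `scrape("https://example.com", markdown=True, retry_on_block=True)` sends a 1-credit request that returns clean markdown and auto-retries if the page is blocked.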
## Webhooks (Async Completion Notifications)

When creating an async collection, add `callback_url` to receive a POST notification when all jobs complete:

```json
POST /v1/async/collections
{"name": "my-batch", "requests": [...], "callback_url": "https://your-server.com/webhook"}
```

When the run completes, Scraping Pros sends a signed POST to your URL with:

- `event`: "run.completed"
- `run_id`, `collection_id`, `status`, `total_requests`, `success_requests`, `failed_requests`
- `job_ids`: array of job IDs to fetch individual results
- Security: HMAC-SHA256 signature in the `X-SP-Signature` header, timestamp in `X-SP-Timestamp`

Track delivery status via the `callback_status` field in the run response (pending → sent / failed / retrying).

## MCP Server (for AI agents)

Scraping Pros has a Model Context Protocol (MCP) server for direct integration with AI assistants (Claude, GPT, Cursor).

Endpoint: `https://api.scrapingpros.com/mcp` (Streamable HTTP transport)

Available tools: `scrape_url`, `scrape_as_markdown`, `discover`, `list_proxy_countries`, `check_billing`, `health_check`.

All scrape tools support `retry_on_block` for automatic retries on CAPTCHA/blocked pages with a different IP/fingerprint. The `discover` tool is unique: it analyzes a URL before scraping and returns actionable recommendations — specific blockers detected (CAPTCHA provider, Cloudflare, login wall), difficulty level, and ready-to-use scrape parameters. No other scraping API offers this. All scraped content from MCP tools is wrapped with anti-injection markers to prevent prompt injection from malicious web pages.

## API Reference

- [Scrape a URL](https://api.scrapingpros.com/docs#/scraping/scrape_v1_sync_scrape_post): POST /v1/sync/scrape — Core endpoint. Returns HTML, extracted data, screenshots, network requests.
- [Download a file](https://api.scrapingpros.com/docs#/scraping/download_v1_sync_download_post): POST /v1/sync/download — Download files (PDFs, images) via browser.
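Verifying the webhook signature on your server might look like the sketch below. The header names (`X-SP-Signature`, `X-SP-Timestamp`) come from this document, but the exact message that is signed is not specified here; the sketch assumes the common `"<timestamp>.<body>"` convention with a hex-encoded digest, so confirm the scheme against the official docs before relying on it.

```python
import hmac
import hashlib

def verify_webhook(secret: str, timestamp: str, body: bytes,
                   signature: str) -> bool:
    """Check an X-SP-Signature value against an HMAC-SHA256 of the payload.

    ASSUMPTION: the signed message is b"<timestamp>.<body>" and the
    signature is hex-encoded; verify this against the official docs.
    """
    message = timestamp.encode() + b"." + body
    expected = hmac.new(secret.encode(), message, hashlib.sha256).hexdigest()
    # compare_digest avoids leaking information through comparison timing
    return hmac.compare_digest(expected, signature)
```

In addition to the signature, compare `X-SP-Timestamp` against your server clock and reject stale deliveries to guard against replay attacks.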
- [Create collection](https://api.scrapingpros.com/docs#/async/create_collection_v1_async_collections_post): POST /v1/async/collections — Create a batch of URLs for async processing. Supports `callback_url` for webhook notifications.
- [Run collection](https://api.scrapingpros.com/docs#/async): POST /v1/async/collections/{id}/run — Execute a batch and poll for results.
- [Get run status](https://api.scrapingpros.com/docs#/async): GET /v1/async/collections/{id}/runs/{run_id} — Check batch completion, webhook delivery status.
- [Get job result](https://api.scrapingpros.com/docs#/async): GET /v1/async/collections/{id}/runs/{run_id}/jobs/{job_id}/result — Fetch individual job results (24h TTL).
- [List proxy countries](https://api.scrapingpros.com/docs#/proxy/list_countries_v1_proxy_countries_get): GET /v1/proxy/countries — Available countries for geo-targeted proxies.
- [Request country proxy](https://api.scrapingpros.com/docs#/proxy/request_country_v1_proxy_request_country_post): POST /v1/proxy/request-country — Request access to country-specific proxies.
- [Plans](https://api.scrapingpros.com/docs#/plans): GET /v1/plans — List all plans with pricing, credits, and features (no auth required).
- [Billing](https://api.scrapingpros.com/docs#/scraping/billing_v1_sync_billing_get): GET /v1/sync/billing — View your credit usage and costs.
- [Health check](https://api.scrapingpros.com/docs#/default/health_v1_health_get): GET /v1/health — API status (no auth required).
- [OpenAPI spec](https://api.scrapingpros.com/openapi.json): Full machine-readable API specification.

## Demo Access (no signup required)

Try the API immediately with this public demo token:

```
Authorization: Bearer demo_6x595maoA6GdOdVb
```

Demo limits: 5,000 credits/month, 30 requests/minute. All features enabled except country-specific proxies. For production use, contact the Scraping Pros team for a dedicated API key with higher limits.
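Because the demo token is capped at 30 requests/minute, clients should honor 429 responses and watch the quota header as the instructions above describe. A small stdlib-only sketch (the helper names are illustrative; the header names come from this document):

```python
def backoff_seconds(status: int, headers: dict, default: float = 5.0) -> float:
    """Seconds to wait before retrying; honors Retry-After on HTTP 429."""
    if status != 429:
        return 0.0  # not rate-limited, no wait needed
    try:
        return float(headers.get("Retry-After", default))
    except (TypeError, ValueError):
        return default  # unparsable header, fall back to a default wait

def quota_remaining(headers: dict):
    """Parse X-Quota-Remaining, or None when the header is absent."""
    value = headers.get("X-Quota-Remaining")
    return int(value) if value is not None else None
```

Note that `Retry-After` may also be an HTTP-date rather than a number of seconds; this sketch only handles the numeric form.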
## Authentication

All endpoints except /v1/health and /v1/plans require a Bearer token in the Authorization header:

```
Authorization: Bearer demo_6x595maoA6GdOdVb
```

Each token is tied to a client ID, a plan (with rate limits and a monthly credit quota), and optional feature flags.

## Quick Start Examples

- [Simple scrape](https://api.scrapingpros.com/llms-full.txt#simple-scrape): Retrieve HTML from any URL
- [Markdown output](https://api.scrapingpros.com/llms-full.txt#markdown-output): Get clean text instead of HTML (recommended for AI/LLM)
- [Browser scrape](https://api.scrapingpros.com/llms-full.txt#browser-scrape): Render JavaScript-heavy pages
- [Data extraction](https://api.scrapingpros.com/llms-full.txt#data-extraction): Extract structured data with CSS/XPath
- [Screenshot](https://api.scrapingpros.com/llms-full.txt#screenshot): Capture full-page screenshots
- [Proxy with country](https://api.scrapingpros.com/llms-full.txt#proxy-country): Access geo-restricted content
- [Browser actions](https://api.scrapingpros.com/llms-full.txt#browser-actions): Click, type, scroll, and evaluate JavaScript
- [Async batch with webhooks](https://api.scrapingpros.com/llms-full.txt#async-batch): Process hundreds of URLs efficiently with completion notifications
- [Retry on block](https://api.scrapingpros.com/llms-full.txt#retry-on-block): Auto-retry blocked requests with different IP/fingerprint

## Optional

- [Viability test](https://api.scrapingpros.com/docs#/viability): POST /v1/async/viability-test — Test if a site is scrapeable before committing.
- [Client metrics](https://api.scrapingpros.com/docs#/scraping/client_metrics_v1_sync_client_metrics_get): GET /v1/sync/client-metrics — Detailed usage analytics per domain.
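Tying the pieces together, the `potentiallyBlockedByCaptcha` flag and the `guidance` block described earlier can drive a simple retry loop. This sketch assumes `guidance` is a top-level object in the response body; the field names (`stop_reason`, `suggested_request`) come from this document, but the exact nesting may differ, so check a real response first:

```python
def next_action(result: dict):
    """Decide the follow-up for a scrape result using its `guidance` block.

    Returns ("stop", reason) when guidance says to stop retrying,
    ("retry", params) with the ready-to-use params from `suggested_request`
    when the request was blocked, or ("done", None) otherwise.
    """
    guidance = result.get("guidance") or {}
    if guidance.get("stop_reason"):
        return ("stop", guidance["stop_reason"])
    if result.get("potentiallyBlockedByCaptcha") and guidance.get("suggested_request"):
        return ("retry", guidance["suggested_request"])
    return ("done", None)
```

A caller would loop on `("retry", params)`, re-issuing the scrape with the suggested parameters merged in, and bail out as soon as `stop_reason` appears.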