
Commit 88ba53a

VinciGit00 and claude committed
docs: align CLI and MCP pages with v2 releases
CLI (just-scrape#13): - scrape: document 8 formats, multi-format via comma-separated -f, and new --html-mode / --scrolls / --prompt / --schema flags - search: document --location-geo-code, --time-range, --format - crawl: document -f / --format - Add the Fetch Modes enum table (auto|fast|js|direct+stealth|js+stealth) that replaces the legacy --stealth boolean MCP server (scrapegraph-mcp#16): - Replace the stale v1 tool list with the v2 surface: markdownify, smartscraper, searchscraper, scrape (formats[]), smartcrawler_* (markdown default), crawl_stop/resume, monitor_* lifecycle, credits, sgai_history - Note removal of sitemap and agentic_scrapper - Document SCRAPEGRAPH_API_BASE_URL override and v2 auth headers Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
1 parent e9876d6 commit 88ba53a

2 files changed

Lines changed: 124 additions & 36 deletions


services/cli.mdx

Lines changed: 36 additions & 3 deletions
````diff
@@ -96,22 +96,42 @@ just-scrape search <query>
 just-scrape search <query> --num-results <n>
 just-scrape search <query> -p <prompt>
 just-scrape search <query> --schema <json>
+just-scrape search <query> --format markdown # or html
+just-scrape search <query> --location-geo-code <iso> # e.g. us, it, gb
+just-scrape search <query> --time-range past_week # past_hour|past_24_hours|past_week|past_month|past_year
 just-scrape search <query> --headers <json>
 ```
 
 ### Scrape
 
-Scrape content from a URL in various formats: markdown (default), html, screenshot, or branding. [Full docs →](/api-reference/scrape)
+Scrape a URL into one or more of 8 output formats. Multi-format is supported via comma-separated `-f`. [Full docs →](/api-reference/scrape)
 
 ```bash
-just-scrape scrape <url>
+just-scrape scrape <url> # markdown (default)
 just-scrape scrape <url> -f html
 just-scrape scrape <url> -f screenshot
-just-scrape scrape <url> -f branding
+just-scrape scrape <url> -f markdown,links,images # multi-format
+just-scrape scrape <url> -f json -p "Extract the title" # json format requires --prompt
+just-scrape scrape <url> -f json -p <prompt> --schema <json>
+just-scrape scrape <url> --html-mode reader # normal | reader | prune
+just-scrape scrape <url> --scrolls 3
 just-scrape scrape <url> -m direct+stealth
 just-scrape scrape <url> --country <iso>
 ```
 
+#### Formats
+
+| Format | Description |
+|---|---|
+| `markdown` | Clean markdown conversion (default). Respects `--html-mode`. |
+| `html` | Raw / processed HTML. Respects `--html-mode`. |
+| `screenshot` | Page screenshot (PNG). |
+| `branding` | Extracted brand assets (logos, colors, fonts). |
+| `links` | All links on the page. |
+| `images` | All images on the page. |
+| `summary` | AI-generated page summary. |
+| `json` | Structured JSON via `--prompt` (+ optional `--schema`). |
+
 ### Markdownify
 
 Convert any webpage to clean markdown (convenience wrapper for `scrape --format markdown`). [Full docs →](/api-reference/scrape)
@@ -132,9 +152,22 @@ just-scrape crawl <url> --max-pages <n>
 just-scrape crawl <url> --max-depth <n>
 just-scrape crawl <url> --max-links-per-page <n>
 just-scrape crawl <url> --allow-external
+just-scrape crawl <url> -f markdown # or html, json, etc.
 just-scrape crawl <url> -m direct+stealth
 ```
 
+### Fetch Modes
+
+Use `-m / --mode` on `extract`, `search`, `scrape`, `markdownify`, and `crawl` to choose how pages are fetched. The legacy `--stealth` boolean is replaced by the mode enum below.
+
+| Mode | Description |
+|---|---|
+| `auto` | Automatic selection (default) |
+| `fast` | Fastest, no JS rendering |
+| `js` | Full JS rendering |
+| `direct+stealth` | Direct fetch with anti-bot bypass |
+| `js+stealth` | JS rendering with anti-bot bypass |
+
 ### History
 
 Browse request history for any service.
````
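The multi-format `-f` flag documented above takes a comma-separated list drawn from the 8 formats in the table. A minimal sketch of how such a value might be validated, assuming only the format names from the docs; the `parse_formats` helper is illustrative, not just-scrape's actual code:

```python
# The 8 output formats listed in the CLI's Formats table.
VALID_FORMATS = {
    "markdown", "html", "screenshot", "branding",
    "links", "images", "summary", "json",
}

def parse_formats(flag_value: str) -> list[str]:
    """Split a comma-separated -f value and reject unknown formats."""
    formats = [f.strip() for f in flag_value.split(",") if f.strip()]
    unknown = [f for f in formats if f not in VALID_FORMATS]
    if unknown:
        raise ValueError(f"unknown format(s): {', '.join(unknown)}")
    return formats

print(parse_formats("markdown,links,images"))  # → ['markdown', 'links', 'images']
```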

services/mcp-server.mdx

Lines changed: 88 additions & 33 deletions
````diff
@@ -22,11 +22,16 @@ A production‑ready Model Context Protocol (MCP) server that connects LLMs to t
 
 ## Key Features
 
-- 8 tools covering markdown conversion, AI extraction, search, crawling, sitemap, and agentic flows
+- Full v2 API coverage: extract, search, scrape, crawl (+ stop/resume), monitor lifecycle, credits, and history
+- Uses the v2 API base URL (`https://api.scrapegraphai.com/api/v2`) with `Authorization: Bearer` + `SGAI-APIKEY` headers
 - Remote HTTP MCP endpoint and local Python server support
 - Works with Cursor, Claude Desktop, and any MCP‑compatible client
 - Robust error handling, timeouts, and production‑tested reliability
 
+<Warning>
+The MCP server is now on **v2** (`scrapegraph-mcp@2.0.0`). The v1 tools `sitemap`, `agentic_scrapper`, `markdownify_status`, and `smartscraper_status` have been removed. See [scrapegraph-mcp#16](https://github.com/ScrapeGraphAI/scrapegraph-mcp/pull/16) for the migration details.
+</Warning>
+
 ## Get Your API Key
 
 Create an account and copy your API key from the [ScrapeGraph Dashboard](https://scrapegraphai.com/dashboard).
````
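The v2 auth scheme noted in the Key Features (a Bearer token plus an `SGAI-APIKEY` header) and the `SCRAPEGRAPH_API_BASE_URL` override can be sketched as follows; `build_session_config` is an illustrative helper, not part of the server's API:

```python
import os

# Default v2 base URL from the docs; SCRAPEGRAPH_API_BASE_URL overrides it.
DEFAULT_BASE_URL = "https://api.scrapegraphai.com/api/v2"

def build_session_config(api_key: str) -> dict:
    """Resolve the API base URL (env override wins) and the v2 auth headers."""
    base_url = os.environ.get("SCRAPEGRAPH_API_BASE_URL", DEFAULT_BASE_URL)
    return {
        "base_url": base_url,
        "headers": {
            # v2 carries the key in both headers, per the feature list above
            "Authorization": f"Bearer {api_key}",
            "SGAI-APIKEY": api_key,
        },
    }
```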
````diff
@@ -164,19 +169,30 @@ python -m scrapegraph_mcp.server
 
 ---
 
+## Configuration
+
+The server reads the ScrapeGraph API key from `SGAI_API_KEY` (local) or the `X-API-Key` header (remote). Optional environment overrides:
+
+| Variable | Description | Default |
+|---|---|---|
+| `SGAI_API_KEY` | ScrapeGraph API key ||
+| `SCRAPEGRAPH_API_BASE_URL` | Override the v2 API base URL | `https://api.scrapegraphai.com/api/v2` |
+
 ## Available Tools
 
-The server exposes 8 enterprise‑ready tools:
+The server exposes the full v2 API surface.
 
-### 1. markdownify
+### Content tools
+
+#### markdownify
 Convert a webpage to clean markdown.
 
 ```python
 markdownify(website_url: str)
 ```
 
-### 2. smartscraper
-AI‑powered extraction with optional infinite scrolls.
+#### smartscraper
+AI‑powered extraction (v2 `/extract`) with optional infinite scrolls.
 
 ```python
 smartscraper(
@@ -186,8 +202,8 @@ smartscraper(
 )
 ```
 
-### 3. searchscraper
-Search the web and extract structured results.
+#### searchscraper
+Search the web and extract structured results (v2 `/search`).
 
 ```python
 searchscraper(
@@ -197,53 +213,92 @@ searchscraper(
 )
 ```
 
-### 4. scrape
-Fetch raw HTML from a URL.
+#### scrape
+Fetch a URL using the v2 `/scrape` endpoint with configurable formats.
 
 ```python
-scrape(website_url: str)
+scrape(
+    website_url: str,
+    formats: list | None = None,  # e.g. [{"type": "markdown", "mode": "normal"}]
+    content_type: str | None = None,
+)
 ```
 
-### 5. sitemap
-Discover a site’s URLs and structure.
+### Crawl tools
 
-```python
-sitemap(website_url: str)
-```
-
-### 6. smartcrawler_initiate
-Start an async multi‑page crawl (AI or markdown mode).
+#### smartcrawler_initiate
+Start a multi‑page crawl. `extraction_mode` defaults to `markdown` in v2 (also supports `html`).
 
 ```python
 smartcrawler_initiate(
     url: str,
-    prompt: str | None = None,
-    extraction_mode: str = "ai",
-    depth: int | None = None,
+    extraction_mode: str = "markdown",  # markdown | html
+    max_depth: int | None = None,
     max_pages: int | None = None,
-    same_domain_only: bool | None = None
+    max_links_per_page: int | None = None,
+    allow_external: bool | None = None,
+    include_patterns: list[str] | None = None,
+    exclude_patterns: list[str] | None = None,
 )
 ```
 
-### 7. smartcrawler_fetch_results
-Poll results using the returned request_id.
+#### smartcrawler_fetch_results
+Poll status / results for a crawl.
 
 ```python
 smartcrawler_fetch_results(request_id: str)
 ```
 
-### 8. agentic_scrapper
-Agentic, multi‑step workflows with optional schema and session persistence.
+#### crawl_stop
+Stop a running crawl.
 
 ```python
-agentic_scrapper(
+crawl_stop(request_id: str)
+```
+
+#### crawl_resume
+Resume a paused / stopped crawl.
+
+```python
+crawl_resume(request_id: str)
+```
+
+### Monitor tools
+
+Replace v1 "scheduled jobs". All monitor operations sit under a single namespace.
+
+```python
+monitor_create(
     url: str,
-    user_prompt: str | None = None,
-    output_schema: dict | None = None,
-    steps: list | None = None,
-    ai_extraction: bool | None = None,
-    persistent_session: bool | None = None,
-    timeout_seconds: float | None = None
+    interval: str,  # 5-field cron expression
+    name: str | None = None,
+    formats: list | None = None,
+    webhook_url: str | None = None,
+)
+monitor_list()
+monitor_get(monitor_id: str)
+monitor_pause(monitor_id: str)
+monitor_resume(monitor_id: str)
+monitor_delete(monitor_id: str)
+```
+
+### Account tools
+
+#### credits
+Get the remaining credit balance.
+
+```python
+credits()
+```
+
+#### sgai_history
+Browse paginated request history, optionally filtered by service.
+
+```python
+sgai_history(
+    service: str | None = None,  # scrape | extract | search | monitor | crawl
+    page: int | None = None,
+    limit: int | None = None,
 )
 ```
````
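`monitor_create` takes its `interval` as a 5-field cron expression (minute, hour, day-of-month, month, day-of-week). A quick field-count sanity check before submitting could look like this; the helper is illustrative only, not part of the MCP server:

```python
def is_five_field_cron(expr: str) -> bool:
    """True if expr has exactly the 5 whitespace-separated cron fields."""
    return len(expr.split()) == 5

# e.g. run every 6 hours:
print(is_five_field_cron("0 */6 * * *"))    # → True
# a 6-field (seconds-resolution) expression would be rejected:
print(is_five_field_cron("0 0 */6 * * *"))  # → False
```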
