
Commit e9876d6

VinciGit00 and claude committed

docs: sync SDK pages with final v2 API surface

Align sdks/javascript.mdx and sdks/python.mdx with the current schemas from scrapegraph-js#11 and scrapegraph-py#82:

- search(): add locationGeoCode/location_geo_code, timeRange/time_range, prompt, format, mode; correct numResults default to 3
- extract(): drop llmConfig from params (ignored by v2 route); document mode, contentType, html, markdown alternatives to url
- scrape(): document the formats[] array (tagged format entries with per-entry config) and add a multi-format example
- crawl.start(): document maxDepth/max_depth, maxPages/max_pages, maxLinksPerPage, allowExternal, contentTypes
- monitor.create(): drop prompt (not in v2 schema); add formats and webhookUrl/webhook_url
- LlmConfig: clarify it belongs inside scrape json/summary format entries, not on extract/search

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

1 parent cec634f · commit e9876d6

2 files changed: 131 additions & 45 deletions

sdks/javascript.mdx (65 additions & 23 deletions)
@@ -100,11 +100,18 @@ const { data, requestId } = await sgai.extract(
 
 | Parameter            | Type        | Required | Description |
 | -------------------- | ----------- | -------- | -------------------------------------------------------- |
-| url                  | string      | Yes      | The URL of the webpage to scrape |
+| url                  | string      | Yes\*    | The URL of the webpage to scrape |
 | options.prompt       | string      | Yes      | A description of what you want to extract |
 | options.schema       | ZodSchema / object | No | Zod schema or JSON schema for structured response |
+| options.mode         | string      | No       | HTML processing mode: `"normal"`, `"reader"`, `"prune"` |
+| options.contentType  | string      | No       | Override the detected content type (e.g. `"text/html"`) |
 | options.fetchConfig  | FetchConfig | No       | Fetch configuration |
-| options.llmConfig    | LlmConfig   | No       | LLM configuration |
+| options.html         | string      | No       | Raw HTML input (alternative to `url`) |
+| options.markdown     | string      | No       | Raw markdown input (alternative to `url`) |
+
+<Note>
+\*One of `url`, `html`, or `markdown` is required.
+</Note>
 
 <Accordion title="With Zod Schema" icon="code">
 ```javascript
@@ -130,7 +137,7 @@ console.log(`Author: ${data.author}`);
 ```
 </Accordion>
 
-<Accordion title="With FetchConfig and LlmConfig" icon="code">
+<Accordion title="With FetchConfig" icon="code">
 ```javascript
 const { data } = await sgai.extract(
   "https://example.com",
@@ -141,10 +148,6 @@ const { data } = await sgai.extract(
       wait: 2000,
       scrolls: 3,
     },
-    llmConfig: {
-      temperature: 0.3,
-      maxTokens: 1000,
-    },
   }
 );
 ```
@@ -163,13 +166,17 @@ const { data } = await sgai.search(
 
 #### Parameters
 
-| Parameter            | Type        | Required | Description |
-| -------------------- | ----------- | -------- | -------------------------------------------------------- |
-| query                | string      | Yes      | The search query |
-| options.numResults   | number      | No       | Number of results (3-20). Default: 5 |
-| options.schema       | ZodSchema / object | No | Schema for structured response |
-| options.fetchConfig  | FetchConfig | No       | Fetch configuration |
-| options.llmConfig    | LlmConfig   | No       | LLM configuration |
+| Parameter               | Type        | Required | Description |
+| ----------------------- | ----------- | -------- | -------------------------------------------------------- |
+| query                   | string      | Yes      | The search query (1-500 chars) |
+| options.numResults      | number      | No       | Number of results (1-20). Default: 3 |
+| options.prompt          | string      | No       | Prompt used when extracting structured results |
+| options.schema          | ZodSchema / object | No | Schema for structured response (requires `prompt`) |
+| options.format          | string      | No       | `"markdown"` (default) or `"html"` |
+| options.mode            | string      | No       | HTML processing mode: `"normal"`, `"reader"`, `"prune"` (default) |
+| options.locationGeoCode | string      | No       | Geo code for localized search (e.g. `"us"`, `"it"`, `"gb"`) |
+| options.timeRange       | string      | No       | Recency filter: `"past_hour"`, `"past_24_hours"`, `"past_week"`, `"past_month"`, `"past_year"` |
+| options.fetchConfig     | FetchConfig | No       | Fetch configuration |
 
 <Accordion title="Schema Example" icon="code">
 ```javascript
@@ -196,7 +203,7 @@ console.log(`Price: ${data.price}`);
 
 ### scrape()
 
-Convert any webpage to markdown, HTML, screenshot, or branding format.
+Convert any webpage into one or more output formats in a single request.
 
 ```javascript
 const { data } = await sgai.scrape("https://example.com");
@@ -205,11 +212,30 @@ console.log(data);
 
 #### Parameters
 
-| Parameter            | Type        | Required | Description |
-| -------------------- | ----------- | -------- | -------------------------------------------------------- |
-| url                  | string      | Yes      | The URL of the webpage to scrape |
-| options.format       | string      | No       | `"markdown"`, `"html"`, `"screenshot"`, `"branding"` |
-| options.fetchConfig  | FetchConfig | No       | Fetch configuration |
+| Parameter            | Type          | Required | Description |
+| -------------------- | ------------- | -------- | -------------------------------------------------------- |
+| url                  | string        | Yes      | The URL of the webpage to scrape |
+| options.formats      | FormatEntry[] | No       | Array of format entries. Defaults to `[{ type: "markdown", mode: "normal" }]` |
+| options.contentType  | string        | No       | Override the detected content type |
+| options.fetchConfig  | FetchConfig   | No       | Fetch configuration |
+
+Each format entry is a tagged object. Supported `type` values: `"markdown"`, `"html"`, `"screenshot"`, `"links"`, `"images"`, `"summary"`, `"json"`, `"branding"`. Entries can carry their own config:
+
+<Accordion title="Multi-format Example" icon="code">
+```javascript
+const { data } = await sgai.scrape("https://example.com", {
+  formats: [
+    { type: "markdown", mode: "normal" },
+    { type: "screenshot", fullPage: true, width: 1440, height: 900 },
+    {
+      type: "json",
+      prompt: "Extract the product list",
+      schema: { products: [{ name: "string", price: "string" }] },
+    },
+  ],
+});
+```
+</Accordion>
 
 ### crawl
 
@@ -219,7 +245,7 @@ Manage multi-page crawl operations asynchronously.
 // Start a crawl
 const job = await sgai.crawl.start("https://example.com", {
   maxDepth: 2,
-  maxPages: 10,
+  maxPages: 50,
   includePatterns: ["/blog/*", "/docs/**"],
   excludePatterns: ["/admin/*", "/api/*"],
 });
@@ -234,6 +260,21 @@ await sgai.crawl.stop(job.data.id);
 await sgai.crawl.resume(job.data.id);
 ```
 
+#### crawl.start() Parameters
+
+| Parameter               | Type          | Required | Description |
+| ----------------------- | ------------- | -------- | -------------------------------------------------------- |
+| url                     | string        | Yes      | The starting URL |
+| options.formats         | FormatEntry[] | No       | Output formats per page. Defaults to `[{ type: "markdown", mode: "normal" }]` |
+| options.maxDepth        | number        | No       | Maximum crawl depth. Default: `2` |
+| options.maxPages        | number        | No       | Maximum pages to crawl (1-1000). Default: `50` |
+| options.maxLinksPerPage | number        | No       | Maximum links followed per page. Default: `10` |
+| options.allowExternal   | boolean       | No       | Allow crossing domains. Default: `false` |
+| options.includePatterns | string[]      | No       | URL patterns to include |
+| options.excludePatterns | string[]      | No       | URL patterns to exclude |
+| options.contentTypes    | string[]      | No       | Allowed content types |
+| options.fetchConfig     | FetchConfig   | No       | Fetch configuration |
+
 ### monitor
 
 Create and manage site monitoring jobs.
@@ -243,8 +284,9 @@ Create and manage site monitoring jobs.
 const monitor = await sgai.monitor.create({
   name: "Price Tracker",
   url: "https://example.com",
-  prompt: "Track price changes",
   interval: "0 9 * * *", // Daily at 9 AM
+  formats: [{ type: "markdown", mode: "normal" }],
+  webhookUrl: "https://example.com/webhook",
 });
 
 // List all monitors
@@ -305,7 +347,7 @@ Controls how pages are fetched. See the [proxy configuration guide](/services/ad
 
 ### LlmConfig
 
-Controls LLM behavior for AI-powered methods.
+Controls LLM behavior for format entries that run an LLM (scrape `json` and `summary` formats). Pass it inside the format entry; it is not accepted at the top level of `extract` or `search` in v2.
 
 ```javascript
 {

sdks/python.mdx (66 additions & 22 deletions)
@@ -85,10 +85,19 @@ print(response)
 
 | Parameter     | Type        | Required | Description |
 | ------------- | ----------- | -------- | -------------------------------------------------------- |
-| url           | string      | Yes      | The URL of the webpage to scrape |
+| url           | string      | Yes\*    | The URL of the webpage to extract from |
 | prompt        | string      | Yes      | A description of what you want to extract |
-| output_schema | object      | No       | Pydantic model for structured response |
+| output_schema | object      | No       | Pydantic model for structured response (alias for `schema`) |
+| schema        | dict        | No       | JSON schema for structured response |
+| mode          | string      | No       | HTML processing mode: `"normal"`, `"reader"`, `"prune"` |
+| content_type  | string      | No       | Override the detected content type |
 | fetch_config  | FetchConfig | No       | Fetch configuration (stealth, rendering, etc.) |
+| html          | string      | No       | Raw HTML input (alternative to `url`) |
+| markdown      | string      | No       | Raw markdown input (alternative to `url`) |
+
+<Note>
+\*One of `url`, `html`, or `markdown` is required.
+</Note>
 
 <Accordion title="Schema Example" icon="code">
 ```python
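The input rule above can be sketched as a small client-side check. The helper name and the exactly-one assumption are illustrative, not part of the SDK:

```python
def resolve_extract_input(url=None, html=None, markdown=None):
    """Pick the single input source for extract().

    Assumes the three inputs are mutually exclusive, which is an
    interpretation of the documented "one of url, html, or markdown
    is required" note, not confirmed SDK behavior.
    """
    provided = {name: value for name, value in
                [("url", url), ("html", html), ("markdown", markdown)]
                if value is not None}
    if len(provided) != 1:
        raise ValueError("exactly one of url, html, or markdown is required")
    return next(iter(provided.items()))
```

For example, `resolve_extract_input(html="<p>hi</p>")` yields `("html", "<p>hi</p>")`, while calling it with no inputs raises `ValueError`.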
@@ -123,12 +132,18 @@ response = client.search(
 
 #### Parameters
 
-| Parameter     | Type        | Required | Description |
-| ------------- | ----------- | -------- | -------------------------------------------------------- |
-| query         | string      | Yes      | The search query |
-| num_results   | number      | No       | Number of results (3-20). Default: 5 |
-| output_schema | object      | No       | Pydantic model for structured response |
-| fetch_config  | FetchConfig | No       | Fetch configuration |
+| Parameter         | Type        | Required | Description |
+| ----------------- | ----------- | -------- | -------------------------------------------------------- |
+| query             | string      | Yes      | The search query (1-500 chars) |
+| num_results       | int         | No       | Number of results (1-20). Default: 5 |
+| prompt            | string      | No       | Prompt used when extracting structured results |
+| output_schema     | object      | No       | Pydantic model for structured response (alias for `schema`) |
+| schema            | dict        | No       | JSON schema for structured response (requires `prompt`) |
+| format            | string      | No       | `"markdown"` (default) or `"html"` |
+| mode              | string      | No       | HTML processing mode: `"normal"`, `"reader"`, `"prune"` (default) |
+| location_geo_code | string      | No       | Geo code for localized results (e.g. `"us"`, `"it"`, `"gb"`) |
+| time_range        | string      | No       | Recency filter: `"past_hour"`, `"past_24_hours"`, `"past_week"`, `"past_month"`, `"past_year"` |
+| fetch_config      | FetchConfig | No       | Fetch configuration |
 
 <Accordion title="Schema Example" icon="code">
 ```python
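The documented ranges above (query length, `num_results` bounds, the `time_range` vocabulary) can be sketched as a pre-flight validation step. The helper and its error messages are hypothetical; only the ranges come from the table:

```python
ALLOWED_TIME_RANGES = {
    "past_hour", "past_24_hours", "past_week", "past_month", "past_year",
}

def build_search_payload(query, num_results=5, time_range=None,
                         location_geo_code=None):
    """Validate search parameters against the documented ranges."""
    if not 1 <= len(query) <= 500:
        raise ValueError("query must be 1-500 characters")
    if not 1 <= num_results <= 20:
        raise ValueError("num_results must be between 1 and 20")
    if time_range is not None and time_range not in ALLOWED_TIME_RANGES:
        raise ValueError(f"unsupported time_range: {time_range}")
    payload = {"query": query, "num_results": num_results}
    if time_range is not None:
        payload["time_range"] = time_range
    if location_geo_code is not None:
        payload["location_geo_code"] = location_geo_code
    return payload
```

Calling `build_search_payload("best laptops", time_range="past_week", location_geo_code="us")` produces a dict with both optional keys set, while an out-of-range `num_results` raises early instead of failing at the API.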
@@ -154,7 +169,7 @@ print(f"Price: {response['data']['price']}")
 
 ### Scrape
 
-Convert any webpage into markdown, HTML, screenshot, or branding format.
+Convert any webpage into one or more output formats in a single request.
 
 ```python
 response = client.scrape(
@@ -167,9 +182,30 @@ response = client.scrape(
 | Parameter     | Type        | Required | Description |
 | ------------- | ----------- | -------- | -------------------------------------------------------- |
 | url           | string      | Yes      | The URL of the webpage to scrape |
-| format        | string      | No       | Output format: `"markdown"`, `"html"`, `"screenshot"`, `"branding"` |
+| formats       | list[dict]  | No       | Array of format entries. Defaults to `[{"type": "markdown", "mode": "normal"}]` |
+| format        | string      | No       | Legacy single-format shortcut (`"markdown"`, `"html"`, `"screenshot"`, `"branding"`) |
+| content_type  | string      | No       | Override the detected content type |
 | fetch_config  | FetchConfig | No       | Fetch configuration |
 
+Each format entry is a dict with a `type` key. Supported types: `"markdown"`, `"html"`, `"screenshot"`, `"links"`, `"images"`, `"summary"`, `"json"`, `"branding"`. Entries can carry their own config:
+
+<Accordion title="Multi-format Example" icon="code">
+```python
+response = client.scrape(
+    url="https://example.com",
+    formats=[
+        {"type": "markdown", "mode": "normal"},
+        {"type": "screenshot", "fullPage": True, "width": 1440, "height": 900},
+        {
+            "type": "json",
+            "prompt": "Extract the product list",
+            "schema": {"products": [{"name": "string", "price": "string"}]},
+        },
+    ],
+)
+```
+</Accordion>
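The default and the supported `type` values above can be captured in a small normalization helper. This is an illustrative sketch of the documented contract, not the SDK's actual internals:

```python
SUPPORTED_FORMAT_TYPES = {
    "markdown", "html", "screenshot", "links",
    "images", "summary", "json", "branding",
}

def normalize_formats(formats=None):
    """Apply the documented default and validate each tagged entry."""
    if formats is None:
        # Documented default when no formats are given.
        formats = [{"type": "markdown", "mode": "normal"}]
    for entry in formats:
        if entry.get("type") not in SUPPORTED_FORMAT_TYPES:
            raise ValueError(f"unsupported format type: {entry.get('type')!r}")
    return formats
```

With no argument it returns the documented default; an entry like `{"type": "pdf"}` raises before any request is made.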
+
 ### Crawl
 
 Manage multi-page crawl operations asynchronously.
@@ -178,7 +214,8 @@ Manage multi-page crawl operations asynchronously.
 # Start a crawl
 job = client.crawl.start(
     url="https://example.com",
-    depth=2,
+    max_depth=2,
+    max_pages=50,
     include_patterns=["/blog/*", "/docs/**"],
     exclude_patterns=["/admin/*", "/api/*"],
 )
@@ -197,13 +234,19 @@ client.crawl.resume(job["id"])
 
 #### crawl.start() Parameters
 
-| Parameter        | Type        | Required | Description |
-| ---------------- | ----------- | -------- | -------------------------------------------------------- |
-| url              | string      | Yes      | The starting URL to crawl |
-| depth            | int         | No       | Crawl depth level |
-| include_patterns | list[str]   | No       | URL patterns to include (`*` any chars, `**` any path) |
-| exclude_patterns | list[str]   | No       | URL patterns to exclude |
-| fetch_config     | FetchConfig | No       | Fetch configuration |
+| Parameter          | Type        | Required | Description |
+| ------------------ | ----------- | -------- | -------------------------------------------------------- |
+| url                | string      | Yes      | The starting URL to crawl |
+| formats            | list[dict]  | No       | Output formats per page. Defaults to `[{"type": "markdown", "mode": "normal"}]` |
+| max_depth          | int         | No       | Maximum crawl depth. Default: `2` |
+| max_pages          | int         | No       | Maximum pages to crawl (1-1000). Default: `10` |
+| max_links_per_page | int         | No       | Maximum links followed per page. Default: `10` |
+| allow_external     | bool        | No       | Allow crossing domains. Default: `False` |
+| include_patterns   | list[str]   | No       | URL patterns to include (`*` any chars, `**` any path) |
+| exclude_patterns   | list[str]   | No       | URL patterns to exclude |
+| content_types      | list[str]   | No       | Allowed content types |
+| fetch_config       | FetchConfig | No       | Fetch configuration |
+| depth              | int         | No       | Legacy alias for `max_depth` |
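One way to picture the legacy `depth` alias and the documented `max_pages` bound is a small parameter builder. The precedence of `depth` over the `max_depth` default is an assumption here, not documented behavior:

```python
def build_crawl_params(url, max_depth=2, max_pages=10, depth=None):
    """Map the legacy depth alias onto max_depth and enforce the page cap."""
    if depth is not None:
        # Assumed: the legacy alias fills in for max_depth when supplied.
        max_depth = depth
    if not 1 <= max_pages <= 1000:
        raise ValueError("max_pages must be between 1 and 1000")
    return {"url": url, "max_depth": max_depth, "max_pages": max_pages}
```

So `build_crawl_params("https://example.com", depth=3)` carries `max_depth=3`, and `max_pages=0` is rejected before the request is sent.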
 
 ### Monitor
 
@@ -214,8 +257,9 @@ Create and manage site monitoring jobs.
 monitor = client.monitor.create(
     name="Price Tracker",
     url="https://example.com",
-    prompt="Track price changes",
     interval="0 9 * * *",  # Daily at 9 AM
+    formats=[{"type": "markdown", "mode": "normal"}],
+    webhook_url="https://example.com/webhook",
 )
 
 # List all monitors
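The `interval` value is a standard five-field cron expression; a minimal splitter shows how `"0 9 * * *"` maps to 09:00 daily. This is illustrative only and not part of the SDK:

```python
def parse_cron(expr):
    """Split a five-field cron expression into named fields."""
    fields = expr.split()
    if len(fields) != 5:
        raise ValueError("expected 5 cron fields: minute hour day month weekday")
    names = ("minute", "hour", "day_of_month", "month", "day_of_week")
    return dict(zip(names, fields))
```

Here `parse_cron("0 9 * * *")` gives `minute="0"` and `hour="9"` with the remaining fields as wildcards, i.e. every day at 9 AM.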
@@ -273,14 +317,14 @@ config = FetchConfig(
 
 ### LlmConfig
 
-Controls LLM behavior for AI-powered methods.
+Controls LLM behavior for format entries that run an LLM (scrape `json` and `summary` formats). Pass it inside the format entry; it is deprecated at the top level of `extract` and `search` in v2 and is ignored by the API.
 
 ```python
 from scrapegraph_py import LlmConfig
 
 config = LlmConfig(
     model="gpt-4o-mini",   # LLM model to use
-    temperature=0.3,       # Response creativity (0.0-2.0)
+    temperature=0.3,       # Response creativity (0.0-1.0)
     max_tokens=1000,       # Maximum response tokens
     chunker="auto",        # Content chunking strategy ("auto" or custom config)
 )
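The documented ranges for `temperature` and `max_tokens` can be checked before attaching settings to a `json` or `summary` format entry. The helper below is a sketch under those documented ranges, not SDK code:

```python
def validate_llm_settings(temperature=0.3, max_tokens=1000):
    """Check the documented LlmConfig ranges before use in a format entry."""
    if not 0.0 <= temperature <= 1.0:
        raise ValueError("temperature must be within 0.0-1.0")
    if max_tokens < 1:
        raise ValueError("max_tokens must be a positive integer")
    return {"temperature": temperature, "max_tokens": max_tokens}
```

A value like `temperature=1.5` (valid under the old 0.0-2.0 range) now fails this check, matching the corrected comment above.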
@@ -304,7 +348,7 @@ async def main():
     print(response)
 
     # Crawl
-    job = await client.crawl.start("https://example.com", depth=2)
+    job = await client.crawl.start("https://example.com", max_depth=2)
     status = await client.crawl.status(job["id"])
     print(status)
 