Align sdks/javascript.mdx and sdks/python.mdx with the current schemas
from scrapegraph-js#11 and scrapegraph-py#82:
- search(): add locationGeoCode/location_geo_code, timeRange/time_range,
prompt, format, mode; correct numResults default to 3
- extract(): drop llmConfig from params (ignored by v2 route); document
mode, contentType, html, markdown alternatives to url
- scrape(): document the formats[] array (tagged format entries with
per-entry config) and add a multi-format example
- crawl.start(): document maxDepth/max_depth, maxPages/max_pages,
maxLinksPerPage, allowExternal, contentTypes
- monitor.create(): drop prompt (not in v2 schema); add formats and
webhookUrl/webhook_url
- LlmConfig: clarify it belongs inside scrape json/summary format
entries, not on extract/search
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
| url | string | Yes | The URL of the webpage to scrape |
+| options.formats | FormatEntry[] | No | Array of format entries. Defaults to `[{ type: "markdown", mode: "normal" }]` |
+| options.contentType | string | No | Override the detected content type |
+| options.fetchConfig | FetchConfig | No | Fetch configuration |
+
+Each format entry is a tagged object. Supported `type` values: `"markdown"`, `"html"`, `"screenshot"`, `"links"`, `"images"`, `"summary"`, `"json"`, `"branding"`. Entries can carry their own config:
| options.includePatterns | string[]| No | URL patterns to include |
+| options.excludePatterns | string[] | No | URL patterns to exclude |
+| options.contentTypes | string[] | No | Allowed content types |
+| options.fetchConfig | FetchConfig | No | Fetch configuration |
+
 ### monitor

 Create and manage site monitoring jobs.
@@ -243,8 +284,9 @@ Create and manage site monitoring jobs.
 const monitor = await sgai.monitor.create({
   name: "Price Tracker",
   url: "https://example.com",
-  prompt: "Track price changes",
   interval: "0 9 * * *", // Daily at 9 AM
+  formats: [{ type: "markdown", mode: "normal" }],
+  webhookUrl: "https://example.com/webhook",
 });

 // List all monitors
@@ -305,7 +347,7 @@ Controls how pages are fetched. See the [proxy configuration guide](/services/ad

 ### LlmConfig

-Controls LLM behavior for AI-powered methods.
+Controls LLM behavior for format entries that run an LLM (scrape `json` and `summary` formats). Pass it inside the format entry; it is not accepted at the top level of `extract` or `search` in v2.
| url | string | Yes | The URL of the webpage to scrape |
-| format | string | No | Output format: `"markdown"`, `"html"`, `"screenshot"`, `"branding"` |
+| formats | list[dict] | No | Array of format entries. Defaults to `[{"type": "markdown", "mode": "normal"}]` |
+| format | string | No | Legacy single-format shortcut (`"markdown"`, `"html"`, `"screenshot"`, `"branding"`) |
+| content_type | string | No | Override the detected content type |
 | fetch_config | FetchConfig | No | Fetch configuration |

+Each format entry is a dict with a `type` key. Supported types: `"markdown"`, `"html"`, `"screenshot"`, `"links"`, `"images"`, `"summary"`, `"json"`, `"branding"`. Entries can carry their own config:
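
As a rough illustration of the format entries described above, the sketch below builds a multi-format `formats` list as plain dicts and checks each entry's `type` against the eight values the table lists. It makes no SDK or network call. `SUPPORTED_FORMAT_TYPES` and `validate_formats` are hypothetical names introduced here, not part of `scrapegraph_py`, and the only per-entry option used is `mode`, taken from the documented default.

```python
# Supported `type` values, copied from the documentation text above.
SUPPORTED_FORMAT_TYPES = {
    "markdown", "html", "screenshot", "links",
    "images", "summary", "json", "branding",
}


def validate_formats(formats):
    """Hypothetical helper: ensure every entry is a dict with a supported `type`."""
    for index, entry in enumerate(formats):
        if not isinstance(entry, dict) or "type" not in entry:
            raise ValueError(f"formats[{index}] must be a dict with a 'type' key")
        if entry["type"] not in SUPPORTED_FORMAT_TYPES:
            raise ValueError(
                f"formats[{index}] has unsupported type {entry['type']!r}"
            )
    return formats


# A multi-format request: one entry per output wanted from the same page.
formats = validate_formats([
    {"type": "markdown", "mode": "normal"},
    {"type": "links"},
    {"type": "screenshot"},
])
```

Checking entries client-side like this is only a convenience for catching typos early; the v2 API remains the authority on which types and per-entry options are accepted.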
| include_patterns | list[str]| No | URL patterns to include (`*` any chars, `**` any path) |
+| exclude_patterns | list[str] | No | URL patterns to exclude |
+| content_types | list[str] | No | Allowed content types |
+| fetch_config | FetchConfig | No | Fetch configuration |
+| depth | int | No | Legacy alias for `max_depth` |
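
The `depth` row above is documented as a legacy alias for `max_depth`. A minimal sketch of how calling code might migrate old keyword arguments before passing them on is shown below. `normalize_crawl_params` is a hypothetical helper, not an SDK function, and letting an explicit `max_depth` win over `depth` is an assumption of this sketch rather than documented behavior. The `"/blog/**"` pattern is illustrative only.

```python
def normalize_crawl_params(params):
    """Hypothetical helper: rewrite the legacy `depth` key to `max_depth`."""
    normalized = dict(params)  # copy so the caller's dict is left untouched
    if "depth" in normalized:
        legacy_depth = normalized.pop("depth")
        # Assumption: an explicit max_depth takes precedence over the alias.
        normalized.setdefault("max_depth", legacy_depth)
    return normalized


# Example: keyword arguments written against the older parameter name.
legacy_kwargs = {"depth": 2, "include_patterns": ["/blog/**"]}
crawl_kwargs = normalize_crawl_params(legacy_kwargs)
```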

 ### Monitor

@@ -214,8 +257,9 @@ Create and manage site monitoring jobs.
 monitor = client.monitor.create(
     name="Price Tracker",
     url="https://example.com",
-    prompt="Track price changes",
     interval="0 9 * * *",  # Daily at 9 AM
+    formats=[{"type": "markdown", "mode": "normal"}],
+    webhook_url="https://example.com/webhook",
 )

 # List all monitors
@@ -273,14 +317,14 @@ config = FetchConfig(

 ### LlmConfig

-Controls LLM behavior for AI-powered methods.
+Controls LLM behavior for format entries that run an LLM (scrape `json` and `summary` formats). Pass it inside the format entry; it is deprecated at the top level of `extract` and `search` in v2 and is ignored by the API.

```python
 from scrapegraph_py import LlmConfig

 config = LlmConfig(
     model="gpt-4o-mini",  # LLM model to use
-    temperature=0.3,  # Response creativity (0.0-2.0)
+    temperature=0.3,  # Response creativity (0.0-1.0)
     max_tokens=1000,  # Maximum response tokens
     chunker="auto",  # Content chunking strategy ("auto" or custom config)