
Commit e30cc55

docs: rewrite README for v2 API
- Update all API documentation for v2 endpoints
- Add examples table with path and description
- Add scrape_json_extraction example
- Enhance scrape_pdf and scrape_multi_format examples
- Update environment variables section

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
1 parent d71e4d9 commit e30cc55

5 files changed

Lines changed: 247 additions & 141 deletions

File tree

README.md

Lines changed: 133 additions & 132 deletions
@@ -7,7 +7,7 @@
   <img src="https://raw.githubusercontent.com/VinciGit00/Scrapegraph-ai/main/docs/assets/api-banner.png" alt="ScrapeGraph API Banner" style="width: 70%;">
 </p>

-Official TypeScript SDK for the [ScrapeGraph AI API](https://scrapegraphai.com). Zero dependencies.
+Official TypeScript SDK for the [ScrapeGraph AI API](https://scrapegraphai.com) v2.

 ## Install

@@ -20,15 +20,15 @@ bun add scrapegraph-js
 ## Quick Start

 ```ts
-import { smartScraper } from "scrapegraph-js";
+import { scrape } from "scrapegraph-js";

-const result = await smartScraper("your-api-key", {
-  user_prompt: "Extract the page title and description",
-  website_url: "https://example.com",
+const result = await scrape("your-api-key", {
+  url: "https://example.com",
+  formats: [{ type: "markdown" }],
 });

 if (result.status === "success") {
-  console.log(result.data);
+  console.log(result.data?.results.markdown?.data);
 } else {
   console.error(result.error);
 }
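The Quick Start above branches on `result.status`. That pattern can be wrapped in a small value-or-throw helper; the sketch below assumes the `ApiResult<T>` union hinted at in the next hunk header (a success branch carrying `data` and an error branch carrying `error`), which is not fully shown in this diff, and uses a mocked result rather than a live call:

```typescript
// Assumed shape of the SDK's ApiResult<T> union (the real type may differ):
type ApiResult<T> =
  | { status: "success"; data: T }
  | { status: "error"; error: string };

// unwrap: narrow the union, returning data on success and throwing on error.
function unwrap<T>(res: ApiResult<T>): T {
  if (res.status === "success") return res.data;
  throw new Error(`ScrapeGraph request failed: ${res.error}`);
}

// Usage with a mocked result (no network call):
const ok: ApiResult<{ title: string }> = {
  status: "success",
  data: { title: "Example Domain" },
};
console.log(unwrap(ok).title); // "Example Domain"
```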
@@ -47,197 +47,198 @@ type ApiResult<T> = {

 ## API

-All functions take `(apiKey, params)` where `params` is a typed object.
-
-### smartScraper
+### scrape

-Extract structured data from a webpage using AI.
+Scrape a webpage in multiple formats (markdown, html, screenshot, json, etc).

 ```ts
-const res = await smartScraper("key", {
-  user_prompt: "Extract product names and prices",
-  website_url: "https://example.com",
-  output_schema: { /* JSON schema */ }, // optional
-  number_of_scrolls: 5, // optional, 0-50
-  total_pages: 3, // optional, 1-100
-  stealth: true, // optional, +4 credits
-  cookies: { session: "abc" }, // optional
-  headers: { "Accept-Language": "en" }, // optional
-  steps: ["Click 'Load More'"], // optional, browser actions
-  wait_ms: 5000, // optional, default 3000
-  country_code: "us", // optional, proxy routing
-  mock: true, // optional, testing mode
+const res = await scrape("key", {
+  url: "https://example.com",
+  formats: [
+    { type: "markdown", mode: "reader" },
+    { type: "screenshot", fullPage: true, width: 1440, height: 900 },
+    { type: "json", prompt: "Extract product info" },
+  ],
+  contentType: "text/html", // optional, auto-detected
+  fetchConfig: { // optional
+    mode: "js", // "auto" | "fast" | "js"
+    stealth: true,
+    timeout: 30000,
+    wait: 2000,
+    scrolls: 3,
+    headers: { "Accept-Language": "en" },
+    cookies: { session: "abc" },
+    country: "us",
+  },
 });
 ```

-### searchScraper
+**Formats:**
+- `markdown` — Clean markdown (modes: `normal`, `reader`, `prune`)
+- `html` — Raw HTML (modes: `normal`, `reader`, `prune`)
+- `links` — All links on the page
+- `images` — All image URLs
+- `summary` — AI-generated summary
+- `json` — Structured extraction with prompt/schema
+- `branding` — Brand colors, typography, logos
+- `screenshot` — Page screenshot (fullPage, width, height, quality)

-Search the web and extract structured results.
+### extract
+
+Extract structured data from a URL, HTML, or markdown using AI.

 ```ts
-const res = await searchScraper("key", {
-  user_prompt: "Latest TypeScript release features",
-  num_results: 5, // optional, 3-20
-  extraction_mode: true, // optional, false for markdown
-  output_schema: { /* */ }, // optional
-  stealth: true, // optional, +4 credits
-  time_range: "past_week", // optional, past_hour|past_24_hours|past_week|past_month|past_year
-  location_geo_code: "us", // optional, geographic targeting
-  mock: true, // optional, testing mode
+const res = await extract("key", {
+  url: "https://example.com",
+  prompt: "Extract product names and prices",
+  schema: { /* JSON schema */ }, // optional
+  mode: "reader", // optional
+  fetchConfig: { /* ... */ }, // optional
 });
-// res.data.result (extraction mode) or res.data.markdown_content (markdown mode)
+// Or pass html/markdown directly instead of url
 ```

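A scrape response only contains entries for the formats that were requested and succeeded, so callers end up probing `results.<format>?.data` per format. A standalone sketch of that access pattern, using a mocked response whose layout is assumed from the `result.data?.results.markdown?.data` expression in this README (the real SDK types may differ):

```typescript
// Assumed per-format result wrapper and results map (hypothetical names):
type FormatResult<T> = { data: T } | undefined;
type ScrapeResults = {
  markdown?: FormatResult<string>;
  links?: FormatResult<string[]>;
  screenshot?: FormatResult<string>; // e.g. a URL or base64 payload
};

// List whichever formats actually came back, skipping absent ones.
function availableFormats(results: ScrapeResults): string[] {
  return (Object.keys(results) as (keyof ScrapeResults)[])
    .filter((k) => results[k]?.data !== undefined)
    .map((k) => k as string);
}

// Mocked response (no API call):
const results: ScrapeResults = {
  markdown: { data: "# Example" },
  links: { data: ["https://example.com/a"] },
};
console.log(availableFormats(results)); // ["markdown", "links"]
```

Optional chaining keeps the caller safe when a format was not requested or returned nothing.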
-### markdownify
+### search

-Convert a webpage to clean markdown.
+Search the web and optionally extract structured data.

 ```ts
-const res = await markdownify("key", {
-  website_url: "https://example.com",
-  stealth: true, // optional, +4 credits
-  wait_ms: 5000, // optional, default 3000
-  country_code: "us", // optional, proxy routing
-  mock: true, // optional, testing mode
+const res = await search("key", {
+  query: "best programming languages 2024",
+  numResults: 5, // 1-20, default 3
+  format: "markdown", // "markdown" | "html"
+  prompt: "Extract key points", // optional, for AI extraction
+  schema: { /* ... */ }, // optional
+  timeRange: "past_week", // optional
+  locationGeoCode: "us", // optional
+  fetchConfig: { /* ... */ }, // optional
 });
-// res.data.result is the markdown string
 ```

-### scrape
+### generateSchema

-Get raw HTML from a webpage.
+Generate a JSON schema from a natural language description.

 ```ts
-const res = await scrape("key", {
-  website_url: "https://example.com",
-  stealth: true, // optional, +4 credits
-  branding: true, // optional, extract brand design
-  country_code: "us", // optional, proxy routing
-  wait_ms: 5000, // optional, default 3000
+const res = await generateSchema("key", {
+  prompt: "Schema for a product with name, price, and rating",
+  existingSchema: { /* ... */ }, // optional, to modify
 });
-// res.data.html is the HTML string
-// res.data.scrape_request_id is the request identifier
 ```

 ### crawl

-Crawl a website and its linked pages. Async — polls until completion.
+Crawl a website and its linked pages.

 ```ts
-const res = await crawl(
-  "key",
-  {
-    url: "https://example.com",
-    prompt: "Extract company info", // required when extraction_mode=true
-    max_pages: 10, // optional, default 10
-    depth: 2, // optional, default 1
-    breadth: 5, // optional, max links per depth
-    schema: { /* JSON schema */ }, // optional
-    sitemap: true, // optional
-    stealth: true, // optional, +4 credits
-    wait_ms: 5000, // optional, default 3000
-    batch_size: 3, // optional, default 1
-    same_domain_only: true, // optional, default true
-    cache_website: true, // optional
-    headers: { "Accept-Language": "en" }, // optional
-  },
-  (status) => console.log(status), // optional poll callback
-);
-```
-
-### agenticScraper
+// Start a crawl
+const start = await crawl.start("key", {
+  url: "https://example.com",
+  formats: [{ type: "markdown" }],
+  maxPages: 50,
+  maxDepth: 2,
+  maxLinksPerPage: 10,
+  includePatterns: ["/blog/*"],
+  excludePatterns: ["/admin/*"],
+  fetchConfig: { /* ... */ },
+});

-Automate browser actions (click, type, navigate) then extract data.
+// Check status
+const status = await crawl.get("key", start.data.id);

-```ts
-const res = await agenticScraper("key", {
-  url: "https://example.com/login",
-  steps: ["Type user@example.com in email", "Click login button"], // required
-  user_prompt: "Extract dashboard data", // required when ai_extraction=true
-  output_schema: { /* */ }, // required when ai_extraction=true
-  ai_extraction: true, // optional
-  use_session: true, // optional
-});
+// Control
+await crawl.stop("key", id);
+await crawl.resume("key", id);
+await crawl.delete("key", id);
 ```
163-
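Since `crawl.start` returns an id and `crawl.get` reports status, the caller is responsible for polling until the crawl reaches a terminal state. A generic sketch of that loop with the getter injected, so it can be shown with a mock instead of a live crawl (the status field names here are hypothetical; only the poll-until-terminal pattern is the point):

```typescript
// Hypothetical minimal status shape for illustration:
type CrawlStatus = { state: "running" | "completed" | "failed" };

// Poll the injected getter until the state leaves "running", with a cap
// on attempts so a stuck crawl cannot hang the caller forever.
async function pollUntilDone(
  get: () => Promise<CrawlStatus>,
  intervalMs = 1000,
  maxAttempts = 60,
): Promise<CrawlStatus> {
  for (let i = 0; i < maxAttempts; i++) {
    const s = await get();
    if (s.state !== "running") return s;
    await new Promise((r) => setTimeout(r, intervalMs));
  }
  throw new Error("crawl did not finish in time");
}

// Mocked getter that completes on the third call (no API involved):
let calls = 0;
const done = await pollUntilDone(
  async () => (++calls < 3 ? { state: "running" } : { state: "completed" }),
  10,
);
console.log(done.state, "after", calls, "calls"); // "completed" after 3 calls
```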
### generateSchema
155+
### monitor
164156

165-
Generate a JSON schema from a natural language description.
157+
Monitor a webpage for changes on a schedule.
166158

167159
```ts
168-
const res = await generateSchema("key", {
169-
user_prompt: "Schema for a product with name, price, and rating",
170-
existing_schema: { /* modify this */ }, // optional
160+
// Create a monitor
161+
const mon = await monitor.create("key", {
162+
url: "https://example.com",
163+
name: "Price Monitor",
164+
interval: "0 * * * *", // cron expression
165+
formats: [{ type: "markdown" }],
166+
webhookUrl: "https://...", // optional
167+
fetchConfig: { /* ... */ },
171168
});
169+
170+
// Manage monitors
171+
await monitor.list("key");
172+
await monitor.get("key", cronId);
173+
await monitor.update("key", cronId, { interval: "0 */6 * * *" });
174+
await monitor.pause("key", cronId);
175+
await monitor.resume("key", cronId);
176+
await monitor.delete("key", cronId);
172177
```
173178

174-
### sitemap
179+
### history
175180

176-
Extract all URLs from a website's sitemap.
181+
Fetch request history.
177182

178183
```ts
179-
const res = await sitemap("key", {
180-
website_url: "https://example.com",
181-
headers: { /* */ }, // optional
182-
stealth: true, // optional, +4 credits
183-
mock: true, // optional, testing mode
184+
const list = await history.list("key", {
185+
service: "scrape", // optional filter
186+
page: 1,
187+
limit: 20,
184188
});
185-
// res.data.urls is string[]
189+
190+
const entry = await history.get("key", "request-id");
186191
```
187192

188193
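Because `history.list` is paginated via `page` and `limit`, collecting a full history means looping until a short page comes back. A standalone sketch of that loop with the page fetcher injected, exercised against a mocked 45-entry history rather than the real endpoint (the helper name and batch semantics are assumptions, not SDK API):

```typescript
// Fetch every page by asking for page 1, 2, ... until a batch shorter
// than `limit` signals the last page.
async function listAll<T>(
  fetchPage: (page: number, limit: number) => Promise<T[]>,
  limit = 20,
): Promise<T[]> {
  const all: T[] = [];
  for (let page = 1; ; page++) {
    const batch = await fetchPage(page, limit);
    all.push(...batch);
    if (batch.length < limit) break; // short batch: no more pages
  }
  return all;
}

// Mocked 45-entry history served 20 per page (no API call):
const entries = Array.from({ length: 45 }, (_, i) => ({ id: `req-${i}` }));
const fetched = await listAll(async (page, limit) =>
  entries.slice((page - 1) * limit, page * limit),
);
console.log(fetched.length); // 45
```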
### getCredits / checkHealth
189194

190195
```ts
191196
const credits = await getCredits("key");
192-
// { remaining_credits: 420, total_credits_used: 69 }
197+
// { remaining: 1000, used: 500, plan: "pro", jobs: { crawl: {...}, monitor: {...} } }
193198

194199
const health = await checkHealth("key");
195-
// { status: "healthy" }
196-
```
197-
198-
### history
199-
200-
Fetch request history for any service.
201-
202-
```ts
203-
const res = await history("key", {
204-
service: "smartscraper",
205-
page: 1, // optional, default 1
206-
page_size: 10, // optional, default 10
207-
});
200+
// { status: "ok", uptime: 12345 }
208201
```
209202

210203
## Examples
211204

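One common use of `getCredits` is a budget guard before kicking off an expensive crawl or monitor. A minimal sketch, assuming the credits payload shape shown in the comment above (`remaining`, `used`, `plan`) and using a mocked payload instead of a live call:

```typescript
// Assumed credits payload (from the README comment; real shape may differ):
type Credits = { remaining: number; used: number; plan: string };

// Throw before starting work that would exceed the remaining balance.
function assertBudget(credits: Credits, needed: number): void {
  if (credits.remaining < needed) {
    throw new Error(
      `Insufficient credits: have ${credits.remaining}, need ${needed} (plan: ${credits.plan})`,
    );
  }
}

const credits: Credits = { remaining: 1000, used: 500, plan: "pro" }; // mocked
assertBudget(credits, 250); // passes silently when the budget is sufficient
console.log("budget ok");
```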
212-
Find complete working examples in the [`examples/`](https://github.com/ScrapeGraphAI/scrapegraph-js/tree/main/examples) directory:
213-
214-
| Service | Examples |
215-
|---|---|
216-
| [SmartScraper](https://github.com/ScrapeGraphAI/scrapegraph-js/tree/main/examples/smartscraper) | basic, cookies, html input, infinite scroll, markdown input, pagination, stealth, with schema |
217-
| [SearchScraper](https://github.com/ScrapeGraphAI/scrapegraph-js/tree/main/examples/searchscraper) | basic, markdown mode, with schema |
218-
| [Markdownify](https://github.com/ScrapeGraphAI/scrapegraph-js/tree/main/examples/markdownify) | basic, stealth |
219-
| [Scrape](https://github.com/ScrapeGraphAI/scrapegraph-js/tree/main/examples/scrape) | basic, stealth, with branding |
220-
| [Crawl](https://github.com/ScrapeGraphAI/scrapegraph-js/tree/main/examples/crawl) | basic, markdown mode, with schema |
221-
| [Agentic Scraper](https://github.com/ScrapeGraphAI/scrapegraph-js/tree/main/examples/agenticscraper) | basic, AI extraction |
222-
| [Schema Generation](https://github.com/ScrapeGraphAI/scrapegraph-js/tree/main/examples/schema) | basic, modify existing |
223-
| [Sitemap](https://github.com/ScrapeGraphAI/scrapegraph-js/tree/main/examples/sitemap) | basic, with smartscraper |
224-
| [Utilities](https://github.com/ScrapeGraphAI/scrapegraph-js/tree/main/examples/utilities) | credits, health, history |
205+
| Path | Description |
206+
|------|-------------|
207+
| [`scrape/scrape_basic.ts`](examples/scrape/scrape_basic.ts) | Basic markdown scraping |
208+
| [`scrape/scrape_multi_format.ts`](examples/scrape/scrape_multi_format.ts) | Multiple formats (markdown, links, images, screenshot, summary) |
209+
| [`scrape/scrape_json_extraction.ts`](examples/scrape/scrape_json_extraction.ts) | Structured JSON extraction with schema |
210+
| [`scrape/scrape_pdf.ts`](examples/scrape/scrape_pdf.ts) | PDF document parsing with OCR metadata |
211+
| [`scrape/scrape_with_fetchconfig.ts`](examples/scrape/scrape_with_fetchconfig.ts) | JS rendering, stealth mode, scrolling |
212+
| [`extract/extract_basic.ts`](examples/extract/extract_basic.ts) | AI data extraction from URL |
213+
| [`extract/extract_with_schema.ts`](examples/extract/extract_with_schema.ts) | Extraction with JSON schema |
214+
| [`search/search_basic.ts`](examples/search/search_basic.ts) | Web search with results |
215+
| [`search/search_with_extraction.ts`](examples/search/search_with_extraction.ts) | Search + AI extraction |
216+
| [`crawl/crawl_basic.ts`](examples/crawl/crawl_basic.ts) | Start and monitor a crawl |
217+
| [`crawl/crawl_with_formats.ts`](examples/crawl/crawl_with_formats.ts) | Crawl with screenshots and patterns |
218+
| [`monitor/monitor_basic.ts`](examples/monitor/monitor_basic.ts) | Create a page monitor |
219+
| [`monitor/monitor_with_webhook.ts`](examples/monitor/monitor_with_webhook.ts) | Monitor with webhook notifications |
220+
| [`schema/generate_schema_basic.ts`](examples/schema/generate_schema_basic.ts) | Generate JSON schema from prompt |
221+
| [`schema/modify_existing_schema.ts`](examples/schema/modify_existing_schema.ts) | Modify an existing schema |
222+
| [`utilities/credits.ts`](examples/utilities/credits.ts) | Check account credits and limits |
223+
| [`utilities/health.ts`](examples/utilities/health.ts) | API health check |
224+
| [`utilities/history.ts`](examples/utilities/history.ts) | Request history |
225225

 ## Environment Variables

 | Variable | Description | Default |
-|---|---|---|
-| `SGAI_API_URL` | Override API base URL | `https://api.scrapegraphai.com/v1` |
+|----------|-------------|---------|
+| `SGAI_API_URL` | Override API base URL | `https://api.scrapegraphai.com/v2` |
 | `SGAI_DEBUG` | Enable debug logging (`"1"`) | off |
 | `SGAI_TIMEOUT_S` | Request timeout in seconds | `120` |

 ## Development

 ```bash
 bun install
-bun test # 21 tests
-bun run build # tsup → dist/
-bun run check # tsc --noEmit + biome
+bun run test # unit tests
+bun run test:integration # live API tests (requires SGAI_API_KEY)
+bun run build # tsup → dist/
+bun run check # tsc --noEmit + biome
 ```

 ## License
Lines changed: 41 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,41 @@
+import { scrape } from "scrapegraph-js";
+
+const apiKey = process.env.SGAI_API_KEY!;
+
+const res = await scrape(apiKey, {
+  url: "https://scrapegraphai.com",
+  formats: [
+    {
+      type: "json",
+      prompt: "Extract the company name, tagline, and list of features",
+      schema: {
+        type: "object",
+        properties: {
+          companyName: { type: "string" },
+          tagline: { type: "string" },
+          features: {
+            type: "array",
+            items: { type: "string" },
+          },
+        },
+        required: ["companyName"],
+      },
+    },
+  ],
+});
+
+if (res.status === "success") {
+  const json = res.data?.results.json;
+
+  console.log("=== JSON Extraction ===\n");
+  console.log("Extracted data:");
+  console.log(JSON.stringify(json?.data, null, 2));
+
+  if (json?.metadata?.chunker) {
+    console.log("\nChunker info:");
+    console.log("  Chunks:", json.metadata.chunker.chunks.length);
+    console.log("  Total size:", json.metadata.chunker.chunks.reduce((a, c) => a + c.size, 0), "chars");
+  }
+} else {
+  console.error("Failed:", res.error);
+}
