Commit ef8cb35 (parent 70b75d5)
feat: migrate js sdk to api v2

16 files changed: 721 additions, 1209 deletions

README.md

Lines changed: 72 additions & 157 deletions
@@ -3,11 +3,7 @@
 [![npm version](https://badge.fury.io/js/scrapegraph-js.svg)](https://badge.fury.io/js/scrapegraph-js)
 [![License: MIT](https://img.shields.io/badge/License-MIT-blue.svg)](https://opensource.org/licenses/MIT)
 
-<p align="left">
-  <img src="https://raw.githubusercontent.com/VinciGit00/Scrapegraph-ai/main/docs/assets/api-banner.png" alt="ScrapeGraph API Banner" style="width: 70%;">
-</p>
-
-Official TypeScript SDK for the [ScrapeGraph AI API](https://scrapegraphai.com). Zero dependencies.
+Official JavaScript/TypeScript SDK for the ScrapeGraph AI API v2.
 
 ## Install
 
@@ -20,224 +16,143 @@ bun add scrapegraph-js
 ## Quick Start
 
 ```ts
-import { smartScraper } from "scrapegraph-js";
+import { scrapegraphai } from "scrapegraph-js";
 
-const result = await smartScraper("your-api-key", {
-  user_prompt: "Extract the page title and description",
-  website_url: "https://example.com",
-});
+const sgai = scrapegraphai({ apiKey: "your-api-key" });
 
-if (result.status === "success") {
-  console.log(result.data);
-} else {
-  console.error(result.error);
-}
+const result = await sgai.scrape("https://example.com", { format: "markdown" });
+
+console.log(result.data);
+console.log(result._requestId);
 ```
 
-Every function returns `ApiResult<T>` — no exceptions to catch:
+Every method returns:
 
 ```ts
 type ApiResult<T> = {
-  status: "success" | "error";
-  data: T | null;
-  error?: string;
-  elapsedMs: number;
+  data: T;
+  _requestId: string;
 };
 ```
 
 ## API
 
-All functions take `(apiKey, params)` where `params` is a typed object.
-
-### smartScraper
-
-Extract structured data from a webpage using AI.
+Create a client once, then call the available v2 endpoints:
 
 ```ts
-const res = await smartScraper("key", {
-  user_prompt: "Extract product names and prices",
-  website_url: "https://example.com",
-  output_schema: { /* JSON schema */ }, // optional
-  number_of_scrolls: 5, // optional, 0-50
-  total_pages: 3, // optional, 1-100
-  stealth: true, // optional, +4 credits
-  cookies: { session: "abc" }, // optional
-  headers: { "Accept-Language": "en" }, // optional
-  steps: ["Click 'Load More'"], // optional, browser actions
-  wait_ms: 5000, // optional, default 3000
-  country_code: "us", // optional, proxy routing
-  mock: true, // optional, testing mode
+const sgai = scrapegraphai({
+  apiKey: "your-api-key",
+  baseUrl: "https://api.scrapegraphai.com", // optional
+  timeout: 30000, // optional
+  maxRetries: 2, // optional
 });
 ```
 
-### searchScraper
-
-Search the web and extract structured results.
+### scrape
 
 ```ts
-const res = await searchScraper("key", {
-  user_prompt: "Latest TypeScript release features",
-  num_results: 5, // optional, 3-20
-  extraction_mode: true, // optional, false for markdown
-  output_schema: { /* */ }, // optional
-  stealth: true, // optional, +4 credits
-  time_range: "past_week", // optional, past_hour|past_24_hours|past_week|past_month|past_year
-  location_geo_code: "us", // optional, geographic targeting
-  mock: true, // optional, testing mode
+await sgai.scrape("https://example.com", {
+  format: "markdown",
+  fetchConfig: {
+    mock: false,
+  },
 });
-// res.data.result (extraction mode) or res.data.markdown_content (markdown mode)
 ```
 

91-
### markdownify
62+
### extract
9263

93-
Convert a webpage to clean markdown.
64+
Raw JSON schema:
9465

9566
```ts
96-
const res = await markdownify("key", {
97-
website_url: "https://example.com",
98-
stealth: true, // optional, +4 credits
99-
wait_ms: 5000, // optional, default 3000
100-
country_code: "us", // optional, proxy routing
101-
mock: true, // optional, testing mode
67+
await sgai.extract("https://example.com", {
68+
prompt: "Extract the page title",
69+
schema: {
70+
type: "object",
71+
properties: {
72+
title: { type: "string" },
73+
},
74+
},
10275
});
103-
// res.data.result is the markdown string
10476
```
10577

106-
### scrape
107-
108-
Get raw HTML from a webpage.
78+
Zod schema:
10979

11080
```ts
111-
const res = await scrape("key", {
112-
website_url: "https://example.com",
113-
stealth: true, // optional, +4 credits
114-
branding: true, // optional, extract brand design
115-
country_code: "us", // optional, proxy routing
116-
wait_ms: 5000, // optional, default 3000
81+
import { z } from "zod";
82+
83+
await sgai.extract("https://example.com", {
84+
prompt: "Extract the page title",
85+
schema: z.object({
86+
title: z.string(),
87+
}),
11788
});
118-
// res.data.html is the HTML string
119-
// res.data.scrape_request_id is the request identifier
12089
```
12190

-### crawl
-
-Crawl a website and its linked pages. Async — polls until completion.
+### search
 
 ```ts
-const res = await crawl(
-  "key",
-  {
-    url: "https://example.com",
-    prompt: "Extract company info", // required when extraction_mode=true
-    max_pages: 10, // optional, default 10
-    depth: 2, // optional, default 1
-    breadth: 5, // optional, max links per depth
-    schema: { /* JSON schema */ }, // optional
-    sitemap: true, // optional
-    stealth: true, // optional, +4 credits
-    wait_ms: 5000, // optional, default 3000
-    batch_size: 3, // optional, default 1
-    same_domain_only: true, // optional, default true
-    cache_website: true, // optional
-    headers: { "Accept-Language": "en" }, // optional
-  },
-  (status) => console.log(status), // optional poll callback
-);
+await sgai.search("What is the capital of France?", {
+  numResults: 5,
+});
 ```
 
-### agenticScraper
-
-Automate browser actions (click, type, navigate) then extract data.
+### schema
 
 ```ts
-const res = await agenticScraper("key", {
-  url: "https://example.com/login",
-  steps: ["Type user@example.com in email", "Click login button"], // required
-  user_prompt: "Extract dashboard data", // required when ai_extraction=true
-  output_schema: { /* */ }, // required when ai_extraction=true
-  ai_extraction: true, // optional
-  use_session: true, // optional
-});
+await sgai.schema("A product with name and price");
 ```
 
-### generateSchema
-
-Generate a JSON schema from a natural language description.
+### credits
 
 ```ts
-const res = await generateSchema("key", {
-  user_prompt: "Schema for a product with name, price, and rating",
-  existing_schema: { /* modify this */ }, // optional
-});
+await sgai.credits();
 ```
 
-### sitemap
-
-Extract all URLs from a website's sitemap.
+### history
 
 ```ts
-const res = await sitemap("key", {
-  website_url: "https://example.com",
-  headers: { /* */ }, // optional
-  stealth: true, // optional, +4 credits
-  mock: true, // optional, testing mode
+await sgai.history({
+  page: 1,
+  limit: 10,
+  service: "scrape",
 });
-// res.data.urls is string[]
 ```
 
-### getCredits / checkHealth
+### crawl
 
 ```ts
-const credits = await getCredits("key");
-// { remaining_credits: 420, total_credits_used: 69 }
+const crawl = await sgai.crawl.start("https://example.com", {
+  maxPages: 10,
+  maxDepth: 2,
+});
 
-const health = await checkHealth("key");
-// { status: "healthy" }
+await sgai.crawl.status((crawl.data as { id: string }).id);
 ```
 
-### history
-
-Fetch request history for any service.
+### monitor
 
 ```ts
-const res = await history("key", {
-  service: "smartscraper",
-  page: 1, // optional, default 1
-  page_size: 10, // optional, default 10
+await sgai.monitor.create({
+  url: "https://example.com",
+  prompt: "Notify me when the price changes",
+  interval: "1h",
 });
 ```
 
-## Examples
-
-Find complete working examples in the [`examples/`](https://github.com/ScrapeGraphAI/scrapegraph-js/tree/main/examples) directory:
-
-| Service | Examples |
-|---|---|
-| [SmartScraper](https://github.com/ScrapeGraphAI/scrapegraph-js/tree/main/examples/smartscraper) | basic, cookies, html input, infinite scroll, markdown input, pagination, stealth, with schema |
-| [SearchScraper](https://github.com/ScrapeGraphAI/scrapegraph-js/tree/main/examples/searchscraper) | basic, markdown mode, with schema |
-| [Markdownify](https://github.com/ScrapeGraphAI/scrapegraph-js/tree/main/examples/markdownify) | basic, stealth |
-| [Scrape](https://github.com/ScrapeGraphAI/scrapegraph-js/tree/main/examples/scrape) | basic, stealth, with branding |
-| [Crawl](https://github.com/ScrapeGraphAI/scrapegraph-js/tree/main/examples/crawl) | basic, markdown mode, with schema |
-| [Agentic Scraper](https://github.com/ScrapeGraphAI/scrapegraph-js/tree/main/examples/agenticscraper) | basic, AI extraction |
-| [Schema Generation](https://github.com/ScrapeGraphAI/scrapegraph-js/tree/main/examples/schema) | basic, modify existing |
-| [Sitemap](https://github.com/ScrapeGraphAI/scrapegraph-js/tree/main/examples/sitemap) | basic, with smartscraper |
-| [Utilities](https://github.com/ScrapeGraphAI/scrapegraph-js/tree/main/examples/utilities) | credits, health, history |
-
-## Environment Variables
+## Breaking Changes In v2
 
-| Variable | Description | Default |
-|---|---|---|
-| `SGAI_API_URL` | Override API base URL | `https://api.scrapegraphai.com/v1` |
-| `SGAI_DEBUG` | Enable debug logging (`"1"`) | off |
-| `SGAI_TIMEOUT_S` | Request timeout in seconds | `120` |
+- The SDK now uses `scrapegraphai(config)` instead of flat top-level functions.
+- Requests target the new `/v2/*` API surface.
+- Old helpers like `smartScraper`, `searchScraper`, `markdownify`, `agenticScraper`, `sitemap`, and `generateSchema` are not part of the v2 client.
+- `crawl` and `monitor` are now namespaced APIs.
 
 ## Development
 
 ```bash
 bun install
-bun test # 21 tests
-bun run build # tsup → dist/
-bun run check # tsc --noEmit + biome
+bun test
+bun run check
+bun run build
 ```
 
 ## License
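The core migration in this diff is the result shape: v1's `ApiResult<T>` was a discriminated union callers branched on, while v2 resolves with `data` plus `_requestId` and is expected to surface failures as thrown errors. A minimal self-contained sketch of what call sites change; the `v1Call`/`v2Call` helpers are hypothetical stand-ins, not real scrapegraph-js functions:

```typescript
// v1 result shape (removed by this commit): callers branched on `status`
// instead of catching exceptions.
type ApiResultV1<T> = {
  status: "success" | "error";
  data: T | null;
  error?: string;
  elapsedMs: number;
};

// v2 result shape (added by this commit): data plus a request id.
type ApiResultV2<T> = {
  data: T;
  _requestId: string;
};

// Hypothetical stand-ins used only to show the two call-site styles.
function v1Call(): ApiResultV1<string> {
  return { status: "success", data: "Example Domain", elapsedMs: 42 };
}

function v2Call(): ApiResultV2<string> {
  return { data: "Example Domain", _requestId: "req_123" };
}

// v1 style: narrow the union before touching `data`.
const v1 = v1Call();
const v1Title = v1.status === "success" ? v1.data : null;

// v2 style: read `data` directly; wrap the call in try/catch for failures.
const v2 = v2Call();
const v2Title = v2.data;

console.log(v1Title, v2Title, v2._requestId);
```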

biome.json

Lines changed: 9 additions & 5 deletions
@@ -1,7 +1,11 @@
 {
-  "$schema": "https://biomejs.dev/schemas/1.9.4/schema.json",
-  "organizeImports": {
-    "enabled": true
+  "$schema": "https://biomejs.dev/schemas/2.4.9/schema.json",
+  "assist": {
+    "actions": {
+      "source": {
+        "organizeImports": "on"
+      }
+    }
   },
   "formatter": {
     "enabled": true,
@@ -16,7 +20,7 @@
   },
   "overrides": [
     {
-      "include": ["tests/**"],
+      "includes": ["tests/**"],
       "linter": {
         "rules": {
           "suspicious": {
@@ -27,6 +31,6 @@
     }
   ],
   "files": {
-    "ignore": ["node_modules", "dist", "bun.lock", ".claude", "examples"]
+    "includes": ["**", "!dist", "!node_modules", "!bun.lock", "!.claude", "!examples"]
   }
 }
