<img src="https://raw.githubusercontent.com/VinciGit00/Scrapegraph-ai/main/docs/assets/api-banner.png" alt="ScrapeGraph API Banner" style="width: 70%;">
</p>

Official TypeScript SDK for the [ScrapeGraph AI API](https://scrapegraphai.com) v2.

## Install

```bash
bun add scrapegraph-js
```
## Quick Start

```ts
import { scrape } from "scrapegraph-js";

const result = await scrape("your-api-key", {
  url: "https://example.com",
  formats: [{ type: "markdown" }],
});

if (result.status === "success") {
  console.log(result.data?.results.markdown?.data);
} else {
  console.error(result.error);
}
```
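
Every SDK call resolves to a result object instead of throwing on API errors. The full `ApiResult<T>` definition is elided here; a minimal sketch that is consistent with the `status`/`data`/`error` fields used in Quick Start (the exact union shape is an assumption):

```typescript
// Sketch only: the discriminated-union shape is inferred from the
// Quick Start example, not copied from the SDK's published types.
type ApiResult<T> =
  | { status: "success"; data: T }
  | { status: "error"; error: string };

// Hypothetical helper: convert the result style back into throw style.
function unwrap<T>(result: ApiResult<T>): T {
  if (result.status === "success") return result.data;
  throw new Error(result.error);
}
```

Narrowing on `status` gives a fully typed `data` in the success branch.
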

## API

### scrape

Scrape a webpage in multiple formats (markdown, html, screenshot, json, etc).

```ts
const res = await scrape("key", {
  url: "https://example.com",
  formats: [
    { type: "markdown", mode: "reader" },
    { type: "screenshot", fullPage: true, width: 1440, height: 900 },
    { type: "json", prompt: "Extract product info" },
  ],
  contentType: "text/html", // optional, auto-detected
  fetchConfig: {            // optional
    mode: "js",             // "auto" | "fast" | "js"
    stealth: true,
    timeout: 30000,
    wait: 2000,
    scrolls: 3,
    headers: { "Accept-Language": "en" },
    cookies: { session: "abc" },
    country: "us",
  },
});
```

**Formats:**
- `markdown` — Clean markdown (modes: `normal`, `reader`, `prune`)
- `html` — Raw HTML (modes: `normal`, `reader`, `prune`)
- `links` — All links on the page
- `images` — All image URLs
- `summary` — AI-generated summary
- `json` — Structured extraction with prompt/schema
- `branding` — Brand colors, typography, logos
- `screenshot` — Page screenshot (fullPage, width, height, quality)

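
Each requested format comes back under its own key. Following the `result.data?.results.markdown?.data` path from Quick Start, a small hypothetical helper for pulling one format's payload out of a response (the `results.<format>?.data` shape is an assumption inferred from that example):

```typescript
// Shape inferred from the Quick Start access path; treat as an
// assumption, not the SDK's published types.
type FormatResult = { data?: unknown };
type ScrapeResults = { results: Record<string, FormatResult | undefined> };

// Return one format's payload, or undefined if it was not requested.
function pickFormat(data: ScrapeResults, format: string): unknown {
  return data.results[format]?.data;
}
```
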
### extract

Extract structured data from a URL, HTML, or markdown using AI.

```ts
const res = await extract("key", {
  url: "https://example.com",
  prompt: "Extract product names and prices",
  schema: { /* JSON schema */ }, // optional
  mode: "reader",                // optional
  fetchConfig: { /* ... */ },    // optional
});
// Or pass html/markdown directly instead of url
```
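
The `schema` parameter takes standard JSON Schema. A minimal example matching the prompt above; the field names (`products`, `name`, `price`) are illustrative, not mandated by the API:

```typescript
// Illustrative JSON Schema for "Extract product names and prices".
const productSchema = {
  type: "object",
  properties: {
    products: {
      type: "array",
      items: {
        type: "object",
        properties: {
          name: { type: "string" },
          price: { type: "number" },
        },
        required: ["name", "price"],
      },
    },
  },
} as const;
```
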

### search

Search the web and optionally extract structured data.

```ts
const res = await search("key", {
  query: "best programming languages 2024",
  numResults: 5,                // 1-20, default 3
  format: "markdown",           // "markdown" | "html"
  prompt: "Extract key points", // optional, for AI extraction
  schema: { /* ... */ },        // optional
  timeRange: "past_week",       // optional
  locationGeoCode: "us",        // optional
  fetchConfig: { /* ... */ },   // optional
});
```

### generateSchema

Generate a JSON schema from a natural language description.

```ts
const res = await generateSchema("key", {
  prompt: "Schema for a product with name, price, and rating",
  existingSchema: { /* ... */ }, // optional, to modify
});
```

### crawl

Crawl a website and its linked pages.

```ts
// Start a crawl
const start = await crawl.start("key", {
  url: "https://example.com",
  formats: [{ type: "markdown" }],
  maxPages: 50,
  maxDepth: 2,
  maxLinksPerPage: 10,
  includePatterns: ["/blog/*"],
  excludePatterns: ["/admin/*"],
  fetchConfig: { /* ... */ },
});

// Check status
const status = await crawl.get("key", start.data.id);

// Control
await crawl.stop("key", start.data.id);
await crawl.resume("key", start.data.id);
await crawl.delete("key", start.data.id);
```
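
`crawl.start` returns immediately, so callers typically poll `crawl.get` until the job settles. A generic polling sketch; the terminal status strings (`"completed"`, `"failed"`) are assumptions, so check the API's actual values before relying on them:

```typescript
// Generic poller: repeatedly call `get` until a terminal status or timeout.
async function pollUntilDone<T extends { status: string }>(
  get: () => Promise<T>,
  intervalMs = 2000,
  maxAttempts = 60,
): Promise<T> {
  for (let i = 0; i < maxAttempts; i++) {
    const state = await get();
    // "completed" / "failed" are assumed terminal states.
    if (state.status === "completed" || state.status === "failed") return state;
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
  throw new Error("crawl polling timed out");
}
```

With the SDK, something like `pollUntilDone(() => crawl.get("key", start.data.id).then((r) => r.data))` would wait for the crawl started above (assuming the status lives on the result's `data`).
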

### monitor

Monitor a webpage for changes on a schedule.

```ts
// Create a monitor
const mon = await monitor.create("key", {
  url: "https://example.com",
  name: "Price Monitor",
  interval: "0 * * * *", // cron expression
  formats: [{ type: "markdown" }],
  webhookUrl: "https://...", // optional
  fetchConfig: { /* ... */ },
});

// Manage monitors
await monitor.list("key");
await monitor.get("key", cronId);
await monitor.update("key", cronId, { interval: "0 */6 * * *" });
await monitor.pause("key", cronId);
await monitor.resume("key", cronId);
await monitor.delete("key", cronId);
```
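
`interval` is a cron expression. A few common five-field schedules, assuming standard cron semantics (minute, hour, day-of-month, month, day-of-week):

```typescript
// Common cron intervals for monitor.create / monitor.update.
const schedules = {
  everyFifteenMinutes: "*/15 * * * *",
  hourly: "0 * * * *",          // as in the create example above
  everySixHours: "0 */6 * * *", // as in the update example above
  dailyAtNine: "0 9 * * *",
} as const;
```
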

### history

Fetch request history.

```ts
const list = await history.list("key", {
  service: "scrape", // optional filter
  page: 1,
  limit: 20,
});

const entry = await history.get("key", "request-id");
```

### getCredits / checkHealth

```ts
const credits = await getCredits("key");
// { remaining: 1000, used: 500, plan: "pro", jobs: { crawl: {...}, monitor: {...} } }

const health = await checkHealth("key");
// { status: "ok", uptime: 12345 }
```
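
One practical use is a pre-flight budget check before starting a large crawl. A hypothetical guard over the `remaining` field shown in the response above:

```typescript
// Shape assumed from the getCredits example response.
type Credits = { remaining: number; used: number };

// Hypothetical guard: only proceed when enough credits remain.
function hasBudget(credits: Credits, needed: number): boolean {
  return credits.remaining >= needed;
}
```
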

## Examples

| Path | Description |
|------|-------------|
| [`scrape/scrape_basic.ts`](examples/scrape/scrape_basic.ts) | Basic markdown scraping |
| [`scrape/scrape_multi_format.ts`](examples/scrape/scrape_multi_format.ts) | Multiple formats (markdown, links, images, screenshot, summary) |
| [`scrape/scrape_json_extraction.ts`](examples/scrape/scrape_json_extraction.ts) | Structured JSON extraction with schema |
| [`scrape/scrape_pdf.ts`](examples/scrape/scrape_pdf.ts) | PDF document parsing with OCR metadata |
| [`scrape/scrape_with_fetchconfig.ts`](examples/scrape/scrape_with_fetchconfig.ts) | JS rendering, stealth mode, scrolling |
| [`extract/extract_basic.ts`](examples/extract/extract_basic.ts) | AI data extraction from URL |
| [`extract/extract_with_schema.ts`](examples/extract/extract_with_schema.ts) | Extraction with JSON schema |
| [`search/search_basic.ts`](examples/search/search_basic.ts) | Web search with results |
| [`search/search_with_extraction.ts`](examples/search/search_with_extraction.ts) | Search + AI extraction |
| [`crawl/crawl_basic.ts`](examples/crawl/crawl_basic.ts) | Start and monitor a crawl |
| [`crawl/crawl_with_formats.ts`](examples/crawl/crawl_with_formats.ts) | Crawl with screenshots and patterns |
| [`monitor/monitor_basic.ts`](examples/monitor/monitor_basic.ts) | Create a page monitor |
| [`monitor/monitor_with_webhook.ts`](examples/monitor/monitor_with_webhook.ts) | Monitor with webhook notifications |
| [`schema/generate_schema_basic.ts`](examples/schema/generate_schema_basic.ts) | Generate JSON schema from prompt |
| [`schema/modify_existing_schema.ts`](examples/schema/modify_existing_schema.ts) | Modify an existing schema |
| [`utilities/credits.ts`](examples/utilities/credits.ts) | Check account credits and limits |
| [`utilities/health.ts`](examples/utilities/health.ts) | API health check |
| [`utilities/history.ts`](examples/utilities/history.ts) | Request history |

## Environment Variables

| Variable | Description | Default |
|----------|-------------|---------|
| `SGAI_API_URL` | Override API base URL | `https://api.scrapegraphai.com/v2` |
| `SGAI_DEBUG` | Enable debug logging (`"1"`) | off |
| `SGAI_TIMEOUT_S` | Request timeout in seconds | `120` |
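
Inside a Node or Bun process, these resolve roughly like the following sketch; the exact lookup logic is the SDK's internal concern, this only shows how the documented defaults would apply when the variables are unset:

```typescript
// Apply the documented defaults when the variables are unset.
const apiUrl = process.env.SGAI_API_URL ?? "https://api.scrapegraphai.com/v2";
const debug = process.env.SGAI_DEBUG === "1";
const timeoutS = Number(process.env.SGAI_TIMEOUT_S ?? "120");
```
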
233233
234234## Development
235235
236236``` bash
237237bun install
238- bun test # 21 tests
239- bun run build # tsup → dist/
240- bun run check # tsc --noEmit + biome
238+ bun run test # unit tests
239+ bun run test:integration # live API tests (requires SGAI_API_KEY)
240+ bun run build # tsup → dist/
241+ bun run check # tsc --noEmit + biome
241242```
242243
243244## License