Align with scrapegraph-py PR #82: base URL /api/v2, Bearer + SGAI-APIKEY
headers, X-SDK-Version, and v2 endpoints for scrape, extract, search, crawl,
credits, history, and monitor.
BREAKING CHANGE: Removes sitemap, agentic_scrapper, markdownify_status, and
smartscraper_status. Crawl supports markdown/html only (no AI crawl mode).
Adds crawl_stop, crawl_resume, credits, sgai_history, and monitor_* tools.
Optional SCRAPEGRAPH_API_BASE_URL for custom API hosts.
Made-with: Cursor
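The auth scheme described above (Bearer token plus `SGAI-APIKEY`, an `X-SDK-Version` tag, and an optional `SCRAPEGRAPH_API_BASE_URL` override) can be sketched roughly as follows. This is a minimal illustration, not the SDK's actual code; the SDK-version string is a placeholder.

```python
import os

# Default v2 base URL from the commit message; SCRAPEGRAPH_API_BASE_URL
# overrides it for custom API hosts.
DEFAULT_BASE_URL = "https://api.scrapegraphai.com/api/v2"

def resolve_base_url() -> str:
    # Fall back to the public host when no override is set.
    return os.environ.get("SCRAPEGRAPH_API_BASE_URL", DEFAULT_BASE_URL)

def build_v2_headers(api_key: str, sdk_version: str = "2.0.0") -> dict:
    # v2 sends the key both as a Bearer token and as SGAI-APIKEY,
    # plus an X-SDK-Version tag on every request.
    return {
        "Authorization": f"Bearer {api_key}",
        "SGAI-APIKEY": api_key,
        "X-SDK-Version": sdk_version,
        "Content-Type": "application/json",
    }
```

Sending the key in both header forms keeps older server-side auth paths working while the Bearer scheme is rolled out.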
- ✅ Migrated MCP client and tools to **API v2** ([scrapegraph-py#82](https://github.com/ScrapeGraphAI/scrapegraph-py/pull/82)): base `https://api.scrapegraphai.com/api/v2`, Bearer + SGAI-APIKEY, new crawl/monitor/credits/history tools; removed sitemap, agentic_scrapper, status polling tools.
### January 2026

- ✅ Added `time_range` parameter to SearchScraper for filtering results by recency (v1-era; **ignored on API v2**)
- ✅ Supported time ranges: `past_hour`, `past_24_hours`, `past_week`, `past_month`, `past_year`
- ✅ Documentation updated to reflect SDK changes (scrapegraph-py#77, scrapegraph-js#2)
**File changed:** `.agent/system/project_architecture.md` (35 additions, 24 deletions)
# ScrapeGraph MCP Server - Project Architecture

**Last Updated:** April 2026
**Version:** 2.0.0

## Table of Contents
- [System Overview](#system-overview)
The ScrapeGraph MCP Server is a production-ready [Model Context Protocol](https://modelcontextprotocol.io/introduction) (MCP) server that provides seamless integration between AI assistants (such as Claude and Cursor) and the [ScrapeGraphAI API](https://scrapegraphai.com). It enables language models to leverage advanced AI-powered web scraping capabilities with enterprise-grade reliability.

**Key Capabilities (API v2):**
- **Scrape** (`markdownify`, `scrape`) — POST `/api/v2/scrape`
- **Extract** (`smartscraper`) — POST `/api/v2/extract` (URL-only)
- **Search** (`searchscraper`) — POST `/api/v2/search`
- **Crawl** — POST/GET `/api/v2/crawl` (+ stop/resume); markdown/html crawl only
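The tool-to-endpoint mapping above can be captured in a small lookup table. This is an illustrative sketch of the routing, not the server's actual dispatch code:

```python
# Illustrative mapping from MCP tool names to v2 endpoints,
# following the capability list above.
V2_ENDPOINTS = {
    "markdownify": ("POST", "/scrape"),
    "smartscraper": ("POST", "/extract"),
    "searchscraper": ("POST", "/search"),
    "smartcrawler_initiate": ("POST", "/crawl"),
    "smartcrawler_fetch_results": ("GET", "/crawl"),
}

def endpoint_for(tool: str, base: str = "https://api.scrapegraphai.com/api/v2"):
    # Resolve an MCP tool name to its (HTTP method, full URL) pair.
    method, path = V2_ENDPOINTS[tool]
    return method, f"{base}{path}"
```

A table like this keeps the v1-era tool names stable for clients while routing every call to the v2 paths.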
- v2 methods include `scrape_v2`, `extract`, `search_api`, `crawl_*`, `monitor_*`, `credits`, `history`, plus compatibility wrappers used by MCP tools

**FastMCP Server:**
- Created with `FastMCP("ScapeGraph API MCP Server")`
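The compatibility wrappers mentioned above might look like the stub below. This is a hypothetical sketch (the `website_url` field name and method bodies are assumptions); the real client issues HTTP calls instead of returning literals:

```python
class ScrapeGraphClient:
    """Illustrative client stub; the real methods issue HTTP requests."""

    def scrape_v2(self, url: str) -> dict:
        # Real code would POST a payload to /api/v2/scrape;
        # stubbed here to show the call shape only.
        return {"endpoint": "/api/v2/scrape", "website_url": url}

    def markdownify(self, url: str) -> dict:
        # v1-era name retained as a thin wrapper over the v2 method,
        # so existing MCP tools keep working unchanged.
        return self.scrape_v2(url)
```

Keeping the v1 names as one-line delegates means the MCP tool surface stays stable across the v2 migration.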
## MCP Tools

The server exposes many `@mcp.tool()` handlers (see repository `README.md` for the full table). The detailed subsections below still use **v1-style endpoint names** in several places; treat them as illustrative and prefer the v2 mapping in **API Integration**.
The server enables sophisticated queries across various scraping scenarios:

- **SearchScraper**: "Research and summarize recent developments in AI-powered web scraping"
- **SearchScraper**: "Search for the top 5 articles about machine learning frameworks and extract key insights"
- **SearchScraper**: "Find recent news about GPT-4 and provide a structured summary"
- **SearchScraper**: v2 does not apply `time_range`; phrase queries to bias recency in natural language instead
### Website analysis

- Use **`smartcrawler_initiate`** (markdown/html) plus **`smartcrawler_fetch_results`** to map and capture multi-page content; there is no separate **sitemap** tool on v2.
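The initiate-then-poll crawl flow can be sketched as a small loop. `fetch_status` stands in for a GET against the v2 crawl endpoint; the `status` values shown are assumptions about the response shape:

```python
import time

def wait_for_crawl(fetch_status, task_id: str, interval: float = 2.0, max_polls: int = 60) -> dict:
    """Poll a crawl task until it reports a terminal status.

    fetch_status: callable taking a task id and returning the parsed
    status response (stands in for a GET /api/v2/crawl request).
    """
    for _ in range(max_polls):
        result = fetch_status(task_id)
        if result.get("status") in ("completed", "failed"):
            return result
        time.sleep(interval)
    raise TimeoutError(f"crawl {task_id!r} still pending after {max_polls} polls")
```

Injecting the fetch callable keeps the loop testable without network access, and the poll cap avoids spinning forever on a stuck task.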
### Multi-page crawling

- **SmartCrawler (markdown/html)**: "Crawl the blog in markdown mode and poll until complete"
- For structured fields per page, run **`smartscraper`** on individual URLs (or **`monitor_create`** on a schedule)
### Monitors and account

- **Monitor**: "Run this extract prompt on https://example.com every day at 9am" (`monitor_create` with cron)
- **Credits / history**: `credits`, `sgai_history`
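A `monitor_create` request for the "every day at 9am" example above might be assembled like this. The payload field names here are assumptions for illustration, not the documented API schema:

```python
def build_monitor_payload(url: str, prompt: str, cron: str) -> dict:
    # Hypothetical payload builder for monitor_create; validates that the
    # schedule is a standard 5-field cron expression before sending.
    if len(cron.split()) != 5:
        raise ValueError("expected a standard 5-field cron expression")
    return {"url": url, "prompt": prompt, "cron": cron}

# "Every day at 9am" as a cron expression: minute=0, hour=9, any day/month/weekday.
payload = build_monitor_payload("https://example.com", "Extract the headline", "0 9 * * *")
```

Validating the cron string client-side gives the assistant a clear error before a round trip to the API.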