Commit fad0e1c

feat!: migrate MCP server to ScrapeGraph API v2
Align with scrapegraph-py PR #82: base URL /api/v2, Bearer + SGAI-APIKEY headers, X-SDK-Version, and v2 endpoints for scrape, extract, search, crawl, credits, history, and monitor.

BREAKING CHANGE: Removes sitemap, agentic_scrapper, markdownify_status, and smartscraper_status. Crawl supports markdown/html only (no AI crawl mode). Adds crawl_stop, crawl_resume, credits, sgai_history, and monitor_* tools. Optional SCRAPEGRAPH_API_BASE_URL for custom API hosts.

Made-with: Cursor
1 parent f8161ff commit fad0e1c
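The header scheme named in the commit message (Bearer plus SGAI-APIKEY plus X-SDK-Version) can be sketched as follows. This is an illustrative assumption of the request-header shape, not the server's actual code; the helper name and the Content-Type header are inventions for the sketch, while the SDK-version string mirrors the one documented below.

```python
# Hypothetical sketch of the v2 auth headers described in the commit message.
# Helper name and Content-Type are assumptions; header names come from the commit text.
def build_v2_headers(api_key: str, sdk_version: str = "scrapegraph-mcp@2.0.0") -> dict:
    """Both Bearer and SGAI-APIKEY carry the same key on API v2."""
    return {
        "Authorization": f"Bearer {api_key}",
        "SGAI-APIKEY": api_key,
        "X-SDK-Version": sdk_version,
        "Content-Type": "application/json",
    }

headers = build_v2_headers("sgai-demo-key")
```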

6 files changed

Lines changed: 665 additions & 1391 deletions


.agent/README.md

Lines changed: 15 additions & 10 deletions
@@ -12,7 +12,7 @@ Complete system architecture documentation including:
 - **Technology Stack** - Python 3.10+, FastMCP, httpx dependencies
 - **Project Structure** - File organization and key files
 - **Core Architecture** - MCP design, server architecture, patterns
-- **MCP Tools** - All 5 tools (markdownify, smartscraper, searchscraper, smartcrawler_initiate, smartcrawler_fetch_results)
+- **MCP Tools** - API v2 tools (markdownify, scrape, smartscraper, searchscraper, crawl, credits, history, monitor, …)
 - **API Integration** - ScrapeGraphAI API endpoints and credit system
 - **Deployment** - Smithery, Claude Desktop, Cursor, Docker setup
 - **Recent Updates** - SmartCrawler integration and latest features
@@ -95,7 +95,7 @@ Complete Model Context Protocol integration documentation:

 **...available tools and their parameters:**
 - Read: [Project Architecture - MCP Tools](./system/project_architecture.md#mcp-tools)
-- Quick reference: 5 tools (markdownify, smartscraper, searchscraper, smartcrawler_initiate, smartcrawler_fetch_results)
+- Quick reference: see README “Available Tools” table (v2: + scrape, crawl_stop/resume, credits, sgai_history, monitor_*; removed sitemap, agentic_scrapper, *\_status tools)

 **...error handling:**
 - Read: [MCP Protocol - Error Handling](./system/mcp_protocol.md#error-handling)
@@ -134,6 +134,7 @@ npx @modelcontextprotocol/inspector scrapegraph-mcp
 **Manual Testing (stdio):**
 ```bash
 echo '{"jsonrpc":"2.0","method":"tools/call","params":{"name":"markdownify","arguments":{"website_url":"https://scrapegraphai.com"}},"id":1}' | scrapegraph-mcp
+# (v2: same tool name; backend calls POST /scrape)
 ```

 **Integration Testing (Claude Desktop):**
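The JSON-RPC line used in the manual stdio test above can be built programmatically before piping it to the server. A minimal sketch, assuming only the standard tools/call message shape shown in the echo command (the helper name is an invention for illustration):

```python
import json

# Build the same JSON-RPC 2.0 tools/call message used in the stdio test above.
def tools_call(name: str, arguments: dict, req_id: int = 1) -> str:
    msg = {
        "jsonrpc": "2.0",
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
        "id": req_id,
    }
    return json.dumps(msg)

# One line of stdin for the MCP server, e.g. `print(line)` piped to scrapegraph-mcp.
line = tools_call("markdownify", {"website_url": "https://scrapegraphai.com"})
```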
@@ -174,13 +175,14 @@ echo '{"jsonrpc":"2.0","method":"tools/list","id":1}' | docker run -i -e SGAI_AP

 Quick reference to all MCP tools:

-| Tool | Parameters | Purpose | Credits | Async |
-|------|------------|---------|---------|-------|
-| `markdownify` | `website_url` | Convert webpage to markdown | 2 | No |
-| `smartscraper` | `user_prompt`, `website_url`, `number_of_scrolls?`, `markdown_only?` | AI-powered data extraction | 10+ | No |
-| `searchscraper` | `user_prompt`, `num_results?`, `number_of_scrolls?`, `time_range?` | AI-powered web search | Variable | No |
-| `smartcrawler_initiate` | `url`, `prompt?`, `extraction_mode`, `depth?`, `max_pages?`, `same_domain_only?` | Start multi-page crawl | 100+ | Yes (returns request_id) |
-| `smartcrawler_fetch_results` | `request_id` | Get crawl results | N/A | No (polls status) |
+| Tool | Notes |
+|------|-------|
+| `markdownify` / `scrape` | POST /scrape (v2) |
+| `smartscraper` | POST /extract; URL only |
+| `searchscraper` | POST /search; num_results 3–20 |
+| `smartcrawler_*`, `crawl_stop`, `crawl_resume` | POST/GET /crawl |
+| `credits`, `sgai_history` | GET /credits, /history |
+| `monitor_*` | /monitor namespace |

 For detailed tool documentation, see [Project Architecture - MCP Tools](./system/project_architecture.md#mcp-tools).

@@ -376,8 +378,11 @@ npx @modelcontextprotocol/inspector scrapegraph-mcp

 ## 📅 Changelog

+### April 2026
+- ✅ Migrated MCP client and tools to **API v2** ([scrapegraph-py#82](https://github.com/ScrapeGraphAI/scrapegraph-py/pull/82)): base `https://api.scrapegraphai.com/api/v2`, Bearer + SGAI-APIKEY, new crawl/monitor/credits/history tools; removed sitemap, agentic_scrapper, status polling tools.
+
 ### January 2026
-- ✅ Added `time_range` parameter to SearchScraper for filtering results by recency
+- ✅ Added `time_range` parameter to SearchScraper for filtering results by recency (v1-era; **ignored on API v2**)
 - ✅ Supported time ranges: `past_hour`, `past_24_hours`, `past_week`, `past_month`, `past_year`
 - ✅ Documentation updated to reflect SDK changes (scrapegraph-py#77, scrapegraph-js#2)

.agent/system/project_architecture.md

Lines changed: 35 additions & 24 deletions
@@ -1,7 +1,7 @@
 # ScrapeGraph MCP Server - Project Architecture

-**Last Updated:** January 2026
-**Version:** 1.0.0
+**Last Updated:** April 2026
+**Version:** 2.0.0

 ## Table of Contents
 - [System Overview](#system-overview)
@@ -19,11 +19,12 @@

 The ScrapeGraph MCP Server is a production-ready [Model Context Protocol](https://modelcontextprotocol.io/introduction) (MCP) server that provides seamless integration between AI assistants (like Claude, Cursor, etc.) and the [ScrapeGraphAI API](https://scrapegraphai.com). This server enables language models to leverage advanced AI-powered web scraping capabilities with enterprise-grade reliability.

-**Key Capabilities:**
-- **Markdownify** - Convert webpages to clean, structured markdown
-- **SmartScraper** - AI-powered structured data extraction from webpages
-- **SearchScraper** - AI-powered web searches with structured results
-- **SmartCrawler** - Intelligent multi-page web crawling with AI extraction or markdown conversion
+**Key Capabilities (API v2):**
+- **Scrape** (`markdownify`, `scrape`) — POST `/api/v2/scrape`
+- **Extract** (`smartscraper`) — POST `/api/v2/extract` (URL-only)
+- **Search** (`searchscraper`) — POST `/api/v2/search`
+- **Crawl** — POST/GET `/api/v2/crawl` (+ stop/resume); markdown/html crawl only
+- **Monitor, credits, history** — `/api/v2/monitor`, `/credits`, `/history`

 **Purpose:**
 - Bridge AI assistants (Claude, Cursor, etc.) with web scraping capabilities
@@ -129,7 +130,7 @@ AI Assistant (Claude/Cursor)
 ↓ (stdio via MCP)
 FastMCP Server (this project)
 ↓ (HTTPS API calls)
-ScrapeGraphAI API (https://api.scrapegraphai.com/v1)
+ScrapeGraphAI API (default https://api.scrapegraphai.com/api/v2)
 ↓ (web scraping)
 Target Websites
 ```
@@ -139,10 +140,10 @@ Target Websites
 The server follows a simple, single-file architecture:

 **`ScapeGraphClient` Class:**
-- HTTP client wrapper for ScrapeGraphAI API
-- Base URL: `https://api.scrapegraphai.com/v1`
-- API key authentication via `SGAI-APIKEY` header
-- Methods: `markdownify()`, `smartscraper()`, `searchscraper()`, `smartcrawler_initiate()`, `smartcrawler_fetch_results()`
+- HTTP client wrapper for ScrapeGraphAI API v2 ([scrapegraph-py#82](https://github.com/ScrapeGraphAI/scrapegraph-py/pull/82))
+- Base URL: `https://api.scrapegraphai.com/api/v2` (override with env `SCRAPEGRAPH_API_BASE_URL`)
+- Auth: `Authorization: Bearer`, `SGAI-APIKEY`, `X-SDK-Version: scrapegraph-mcp@2.0.0`
+- v2 methods include `scrape_v2`, `extract`, `search_api`, `crawl_*`, `monitor_*`, `credits`, `history`, plus compatibility wrappers used by MCP tools

 **FastMCP Server:**
 - Created with `FastMCP("ScapeGraph API MCP Server")`
@@ -185,7 +186,9 @@ The server follows a simple, single-file architecture:

 ## MCP Tools

-The server exposes 5 tools to AI assistants:
+The server exposes many `@mcp.tool()` handlers (see repository `README.md` for the full table). The detailed subsections below still use **v1-style endpoint names** in several places; treat them as illustrative and prefer the v2 mapping in **API Integration**.
+
+**v2 tool names:** `markdownify`, `scrape`, `smartscraper`, `searchscraper`, `smartcrawler_initiate`, `smartcrawler_fetch_results`, `crawl_stop`, `crawl_resume`, `credits`, `sgai_history`, `monitor_create`, `monitor_list`, `monitor_get`, `monitor_pause`, `monitor_resume`, `monitor_delete`.

 ### 1. `markdownify(website_url: str)`

@@ -388,21 +391,29 @@ If status is "completed":

 ### ScrapeGraphAI API

-**Base URL:** `https://api.scrapegraphai.com/v1`
+**Base URL:** `https://api.scrapegraphai.com/api/v2` (configurable via `SCRAPEGRAPH_API_BASE_URL`)

 **Authentication:**
-- Header: `SGAI-APIKEY: your-api-key`
+- Headers: `Authorization: Bearer <key>`, `SGAI-APIKEY: <key>`
 - Obtain API key from: [ScrapeGraph Dashboard](https://dashboard.scrapegraphai.com)

-**Endpoints Used:**
-
-| Endpoint | Method | Tool |
-|----------|--------|------|
-| `/v1/markdownify` | POST | `markdownify()` |
-| `/v1/smartscraper` | POST | `smartscraper()` |
-| `/v1/searchscraper` | POST | `searchscraper()` |
-| `/v1/crawl` | POST | `smartcrawler_initiate()` |
-| `/v1/crawl/{request_id}` | GET | `smartcrawler_fetch_results()` |
+**Endpoints used (v2):**
+
+| Endpoint | Method | MCP tools (typical) |
+|----------|--------|---------------------|
+| `/scrape` | POST | `markdownify`, `scrape` |
+| `/extract` | POST | `smartscraper` |
+| `/search` | POST | `searchscraper` |
+| `/crawl` | POST | `smartcrawler_initiate` |
+| `/crawl/{id}` | GET | `smartcrawler_fetch_results` |
+| `/crawl/{id}/stop` | POST | `crawl_stop` |
+| `/crawl/{id}/resume` | POST | `crawl_resume` |
+| `/credits` | GET | `credits` |
+| `/history` | GET | `sgai_history` |
+| `/monitor` | POST, GET | `monitor_create`, `monitor_list` |
+| `/monitor/{id}` | GET, DELETE | `monitor_get`, `monitor_delete` |
+| `/monitor/{id}/pause` | POST | `monitor_pause` |
+| `/monitor/{id}/resume` | POST | `monitor_resume` |

 **Request Format:**
 ```json
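The endpoint table in this hunk can be mirrored as a plain lookup structure, which is handy when reviewing the diff. A minimal sketch, assuming only what the table itself states; the dict and helper names are inventions, not the `ScapeGraphClient` implementation:

```python
# Tool → (HTTP method, v2 path) mapping, transcribed from the table above.
V2_ENDPOINTS = {
    "markdownify": ("POST", "/scrape"),
    "smartscraper": ("POST", "/extract"),
    "searchscraper": ("POST", "/search"),
    "smartcrawler_initiate": ("POST", "/crawl"),
    "smartcrawler_fetch_results": ("GET", "/crawl/{id}"),
    "crawl_stop": ("POST", "/crawl/{id}/stop"),
    "crawl_resume": ("POST", "/crawl/{id}/resume"),
    "credits": ("GET", "/credits"),
    "sgai_history": ("GET", "/history"),
}

def request_for(tool: str,
                base: str = "https://api.scrapegraphai.com/api/v2",
                **path_params) -> tuple:
    """Resolve a tool name to (method, full URL), filling in path params like {id}."""
    method, path = V2_ENDPOINTS[tool]
    return method, base + path.format(**path_params)
```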

README.md

Lines changed: 37 additions & 128 deletions
@@ -26,18 +26,21 @@ A production-ready [Model Context Protocol](https://modelcontextprotocol.io/intr
 - [Technology Stack](#technology-stack)
 - [License](#license)

+## API v2
+
+This MCP server targets **ScrapeGraph API v2** (`https://api.scrapegraphai.com/api/v2`), aligned with
+[scrapegraph-py PR #82](https://github.com/ScrapeGraphAI/scrapegraph-py/pull/82). Auth sends both
+`Authorization: Bearer` and `SGAI-APIKEY`. Override the base URL with **`SCRAPEGRAPH_API_BASE_URL`** if needed.
+
 ## Key Features

-- **8 Powerful Tools**: From simple markdown conversion to complex multi-page crawling and agentic workflows
-- **AI-Powered Extraction**: Intelligently extract structured data using natural language prompts
-- **Multi-Page Crawling**: SmartCrawler supports asynchronous crawling with configurable depth and page limits
-- **Infinite Scroll Support**: Handle dynamic content loading with configurable scroll counts
-- **JavaScript Rendering**: Full support for JavaScript-heavy websites
-- **Flexible Output Formats**: Get results as markdown, structured JSON, or custom schemas
-- **Easy Integration**: Works seamlessly with Claude Desktop, Cursor, and any MCP-compatible client
-- **Enterprise-Ready**: Robust error handling, timeout management, and production-tested reliability
-- **Simple Deployment**: One-command installation via Smithery or manual setup
-- **Comprehensive Documentation**: Detailed developer docs in `.agent/` folder
+- **Scrape & extract**: `markdownify` / `scrape` (POST /scrape), `smartscraper` (POST /extract, URL only)
+- **Search**: `searchscraper` (POST /search; `num_results` clamped 3–20)
+- **Crawl**: Async multi-page crawl in **markdown** or **html** only; `crawl_stop` / `crawl_resume`
+- **Monitors**: Scheduled jobs via `monitor_create`, `monitor_list`, `monitor_get`, pause/resume/delete
+- **Account**: `credits`, `sgai_history`
+- **Easy integration**: Claude Desktop, Cursor, Smithery, HTTP transport
+- **Developer docs**: `.agent/` folder

 ## Quick Start

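The `SCRAPEGRAPH_API_BASE_URL` override introduced in this hunk can be sketched as a small resolver. The function name and the trailing-slash normalization are assumptions for illustration, not the server's confirmed behavior; the env-var name and default come from the diff:

```python
import os

# Default v2 base URL, overridable via SCRAPEGRAPH_API_BASE_URL (per the README hunk above).
DEFAULT_BASE = "https://api.scrapegraphai.com/api/v2"

def resolve_base_url(env=None) -> str:
    """Return the API base URL, preferring the env override when set."""
    env = os.environ if env is None else env
    # Trailing-slash stripping is an assumption to keep path joining predictable.
    return env.get("SCRAPEGRAPH_API_BASE_URL", DEFAULT_BASE).rstrip("/")
```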
@@ -62,112 +65,20 @@ That's it! The server is now available to your AI assistant.

 ## Available Tools

-The server provides **8 enterprise-ready tools** for AI-powered web scraping:
-
-### Core Scraping Tools
-
-#### 1. `markdownify`
-Transform any webpage into clean, structured markdown format.
-
-```python
-markdownify(website_url: str)
-```
-- **Credits**: 2 per request
-- **Use case**: Quick webpage content extraction in markdown
-
-#### 2. `smartscraper`
-Leverage AI to extract structured data from any webpage with support for infinite scrolling.
-
-```python
-smartscraper(
-    user_prompt: str,
-    website_url: str,
-    number_of_scrolls: int = None,
-    markdown_only: bool = None
-)
-```
-- **Credits**: 10+ (base) + variable based on scrolling
-- **Use case**: AI-powered data extraction with custom prompts
-
-#### 3. `searchscraper`
-Execute AI-powered web searches with structured, actionable results.
-
-```python
-searchscraper(
-    user_prompt: str,
-    num_results: int = None,
-    number_of_scrolls: int = None,
-    time_range: str = None  # Filter by: past_hour, past_24_hours, past_week, past_month, past_year
-)
-```
-- **Credits**: Variable (3-20 websites × 10 credits)
-- **Use case**: Multi-source research and data aggregation
-- **Time filtering**: Use `time_range` to filter results by recency (e.g., `"past_week"` for recent results)
-
-### Advanced Scraping Tools
-
-#### 4. `scrape`
-Basic scraping endpoint to fetch page content with optional heavy JavaScript rendering.
-
-```python
-scrape(website_url: str, render_heavy_js: bool = None)
-```
-- **Use case**: Simple page content fetching with JS rendering support
-
-#### 5. `sitemap`
-Extract sitemap URLs and structure for any website.
-
-```python
-sitemap(website_url: str)
-```
-- **Use case**: Website structure analysis and URL discovery
-
-### Multi-Page Crawling
-
-#### 6. `smartcrawler_initiate`
-Initiate intelligent multi-page web crawling (asynchronous operation).
-
-```python
-smartcrawler_initiate(
-    url: str,
-    prompt: str = None,
-    extraction_mode: str = "ai",
-    depth: int = None,
-    max_pages: int = None,
-    same_domain_only: bool = None
-)
-```
-- **AI Extraction Mode**: 10 credits per page - extracts structured data
-- **Markdown Mode**: 2 credits per page - converts to markdown
-- **Returns**: `request_id` for polling
-- **Use case**: Large-scale website crawling and data extraction
-
-#### 7. `smartcrawler_fetch_results`
-Retrieve results from asynchronous crawling operations.
-
-```python
-smartcrawler_fetch_results(request_id: str)
-```
-- **Returns**: Status and results when crawling is complete
-- **Use case**: Poll for crawl completion and retrieve results
-
-### Intelligent Agent-Based Scraping
-
-#### 8. `agentic_scrapper`
-Run advanced agentic scraping workflows with customizable steps and structured output schemas.
-
-```python
-agentic_scrapper(
-    url: str,
-    user_prompt: str = None,
-    output_schema: dict = None,
-    steps: list = None,
-    ai_extraction: bool = None,
-    persistent_session: bool = None,
-    timeout_seconds: float = None
-)
-```
-- **Use case**: Complex multi-step workflows with custom schemas and persistent sessions
+| Tool | Role |
+|------|------|
+| `markdownify` | POST /scrape (markdown) |
+| `scrape` | POST /scrape (`output_format`: markdown, html, screenshot, branding) |
+| `smartscraper` | POST /extract (requires `website_url`; no inline HTML/markdown body on v2) |
+| `searchscraper` | POST /search (`num_results` 3–20; `time_range` / `number_of_scrolls` ignored on v2) |
+| `smartcrawler_initiate` | POST /crawl — `extraction_mode` **`markdown`** or **`html`** (default markdown). No AI crawl across pages. |
+| `smartcrawler_fetch_results` | GET /crawl/:id |
+| `crawl_stop`, `crawl_resume` | POST /crawl/:id/stop \| resume |
+| `credits` | GET /credits |
+| `sgai_history` | GET /history |
+| `monitor_create`, `monitor_list`, `monitor_get`, `monitor_pause`, `monitor_resume`, `monitor_delete` | /monitor API |
+
+**Removed vs older MCP releases:** `sitemap`, `agentic_scrapper`, `markdownify_status`, `smartscraper_status` (no v2 endpoints).

 ## Setup Instructions

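The asynchronous initiate-then-poll flow kept by `smartcrawler_initiate` / `smartcrawler_fetch_results` can be sketched as below. The fetcher here is a stub, and the status values ("processing", "completed") are assumptions for illustration; a real client would sleep between polls and call the MCP tool instead of the stub:

```python
import itertools

def poll_crawl(fetch, request_id: str, max_polls: int = 10) -> dict:
    """Poll a crawl job until its status is 'completed' (a real client would
    time.sleep() between iterations)."""
    for _ in range(max_polls):
        result = fetch(request_id)
        if result.get("status") == "completed":
            return result
    raise TimeoutError(f"crawl {request_id} not finished after {max_polls} polls")

# Stub standing in for smartcrawler_fetch_results: completes on the third poll.
_states = itertools.chain(["processing", "processing"], itertools.repeat("completed"))

def fake_fetch(request_id: str) -> dict:
    return {"status": next(_states), "request_id": request_id}
```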
@@ -482,7 +393,7 @@ root_agent = LlmAgent(
 - Adjust based on your use case (crawling operations may need even longer timeouts)

 **Tool Filtering:**
-- By default, all 8 tools are exposed to the agent
+- By default, all registered MCP tools are exposed to the agent (see [Available Tools](#available-tools))
 - Use `tool_filter` to limit which tools are available:
 ```python
 tool_filter=['markdownify', 'smartscraper', 'searchscraper']
@@ -520,20 +431,18 @@ The server enables sophisticated queries across various scraping scenarios:
 - **SearchScraper**: "Research and summarize recent developments in AI-powered web scraping"
 - **SearchScraper**: "Search for the top 5 articles about machine learning frameworks and extract key insights"
 - **SearchScraper**: "Find recent news about GPT-4 and provide a structured summary"
-- **SearchScraper with time_range**: "Search for AI news from the past week only" (uses `time_range="past_week"`)
+- **SearchScraper**: v2 does not apply `time_range`; phrase queries to bias recency in natural language instead

-### Website Analysis
-- **Sitemap**: "Extract the complete sitemap structure from the ScrapeGraph website"
-- **Sitemap**: "Discover all URLs on this blog site"
+### Website analysis
+- Use **`smartcrawler_initiate`** (markdown/html) plus **`smartcrawler_fetch_results`** to map and capture multi-page content; there is no separate **sitemap** tool on v2.

-### Multi-Page Crawling
-- **SmartCrawler (AI mode)**: "Crawl the entire documentation site and extract all API endpoints with descriptions"
-- **SmartCrawler (Markdown mode)**: "Convert all pages in the blog to markdown up to 2 levels deep"
-- **SmartCrawler**: "Extract all product information from an e-commerce site, maximum 100 pages, same domain only"
+### Multi-page crawling
+- **SmartCrawler (markdown/html)**: "Crawl the blog in markdown mode and poll until complete"
+- For structured fields per page, run **`smartscraper`** on individual URLs (or **`monitor_create`** on a schedule)

-### Advanced Agentic Scraping
-- **Agentic Scraper**: "Navigate through a multi-step authentication form and extract user dashboard data"
-- **Agentic Scraper with schema**: "Follow pagination links and compile a dataset with schema: {title, author, date, content}"
+### Monitors and account
+- **Monitor**: "Run this extract prompt on https://example.com every day at 9am" (`monitor_create` with cron)
+- **Credits / history**: `credits`, `sgai_history`
 - **Agentic Scraper**: "Execute a complex workflow: login, navigate to reports, download data, and extract summary statistics"

 ## Error Handling

pyproject.toml

Lines changed: 5 additions & 1 deletion
@@ -1,6 +1,6 @@
 [project]
 name = "scrapegraph-mcp"
-version = "1.0.1"
+version = "2.0.0"
 description = "MCP server for ScapeGraph API integration"
 license = {text = "MIT"}
 readme = "README.md"
@@ -52,6 +52,10 @@ line-length = 100
 target-version = "py312"
 select = ["E", "F", "I", "B", "W"]

+[tool.ruff.lint.per-file-ignores]
+# MCP tool docstrings and embedded guides exceed 100 cols by design.
+"src/scrapegraph_mcp/server.py" = ["E501"]
+
 [tool.mypy]
 python_version = "3.12"
 warn_return_any = true
