
Commit 0289ac3

VinciGit00 and claude committed
feat: update docs for v2 API - new services, langchain integration, and navigation
- Add v2 service pages: Extract, Search, Crawl, Monitor
- Update Scrape service for v2 format-based API
- Update LangChain integration for v2 tools (ExtractTool, SearchTool, etc.)
- Update v2 navigation: remove old services (SmartScraper, SearchScraper, Markdownify, SmartCrawler, Sitemap, AgenticScraper, Langflow)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
1 parent e3f3927 commit 0289ac3

7 files changed: 1,075 additions & 276 deletions

File tree

docs.json

Lines changed: 4 additions & 7 deletions
@@ -47,13 +47,11 @@
       {
         "group": "Services",
         "pages": [
-          "services/smartscraper",
-          "services/searchscraper",
-          "services/markdownify",
+          "services/extract",
+          "services/search",
           "services/scrape",
-          "services/smartcrawler",
-          "services/sitemap",
-          "services/agenticscraper",
+          "services/crawl",
+          "services/monitor",
       {
         "group": "CLI",
         "icon": "terminal",
@@ -102,7 +100,6 @@
         "integrations/llamaindex",
         "integrations/crewai",
         "integrations/agno",
-        "integrations/langflow",
         "integrations/vercel_ai",
         "integrations/google-adk",
         "integrations/x402"

integrations/langchain.mdx

Lines changed: 154 additions & 51 deletions
@@ -25,81 +25,72 @@ pip install langchain-scrapegraph
 
 ## Available Tools
 
-### SmartScraperTool
+### ExtractTool
 
 Extract structured data from any webpage using natural language prompts:
 
 ```python
-from langchain_scrapegraph.tools import SmartScraperTool
+from langchain_scrapegraph.tools import ExtractTool
 
 # Initialize the tool (uses SGAI_API_KEY from environment)
-tool = SmartscraperTool()
+tool = ExtractTool()
 
 # Extract information using natural language
 result = tool.invoke({
-    "website_url": "https://www.example.com",
-    "user_prompt": "Extract the main heading and first paragraph"
+    "url": "https://www.example.com",
+    "prompt": "Extract the main heading and first paragraph"
 })
 ```
 
 <Accordion title="Using Output Schemas" icon="code">
 Define the structure of the output using Pydantic models:
 
 ```python
-from typing import List
 from pydantic import BaseModel, Field
-from langchain_scrapegraph.tools import SmartScraperTool
+from langchain_scrapegraph.tools import ExtractTool
 
 class WebsiteInfo(BaseModel):
-    title: str = Field(description="The main title of the webpage")
-    description: str = Field(description="The main description or first paragraph")
-    urls: List[str] = Field(description="The URLs inside the webpage")
+    title: str = Field(description="The main title of the page")
+    description: str = Field(description="The main description")
 
-# Initialize with schema
-tool = SmartScraperTool(llm_output_schema=WebsiteInfo)
+# Initialize with output schema
+tool = ExtractTool(llm_output_schema=WebsiteInfo)
 
 result = tool.invoke({
-    "website_url": "https://www.example.com",
-    "user_prompt": "Extract the website information"
+    "url": "https://example.com",
+    "prompt": "Extract the title and description"
 })
 ```
 </Accordion>
 
-### SearchScraperTool
+### SearchTool
 
-Process HTML content directly with AI extraction:
+Search the web and extract structured results using AI:
 
 ```python
-from langchain_scrapegraph.tools import SearchScraperTool
+from langchain_scrapegraph.tools import SearchTool
 
-
-tool = SearchScraperTool()
+tool = SearchTool()
 result = tool.invoke({
-    "user_prompt": "Find the best restaurants in San Francisco",
+    "query": "Find the best restaurants in San Francisco",
 })
-
 ```
 
-<Accordion title="Using Output Schemas" icon="code">
-```python
-from typing import Optional
-from pydantic import BaseModel, Field
-from langchain_scrapegraph.tools import SearchScraperTool
+### ScrapeTool
 
-class RestaurantInfo(BaseModel):
-    name: str = Field(description="The restaurant name")
-    address: str = Field(description="The restaurant address")
-    rating: float = Field(description="The restaurant rating")
+Scrape a webpage and return it in the desired format:
 
+```python
+from langchain_scrapegraph.tools import ScrapeTool
 
-tool = SearchScraperTool(llm_output_schema=RestaurantInfo)
+tool = ScrapeTool()
 
-result = tool.invoke({
-    "user_prompt": "Find the best restaurants in San Francisco"
-})
+# Scrape as markdown (default)
+result = tool.invoke({"url": "https://example.com"})
 
+# Scrape as HTML
+result = tool.invoke({"url": "https://example.com", "format": "html"})
 ```
-</Accordion>
 
 ### MarkdownifyTool
 
@@ -112,34 +103,146 @@ tool = MarkdownifyTool()
 markdown = tool.invoke({"website_url": "https://example.com"})
 ```
 
+### Crawl Tools
+
+Start and manage crawl jobs with `CrawlStartTool`, `CrawlStatusTool`, `CrawlStopTool`, and `CrawlResumeTool`:
+
+```python
+import time
+from langchain_scrapegraph.tools import CrawlStartTool, CrawlStatusTool
+
+start_tool = CrawlStartTool()
+status_tool = CrawlStatusTool()
+
+# Start a crawl job
+result = start_tool.invoke({
+    "url": "https://example.com",
+    "depth": 2,
+    "max_pages": 5,
+    "format": "markdown",
+})
+print("Crawl started:", result)
+
+# Check status
+crawl_id = result.get("id")
+if crawl_id:
+    time.sleep(5)
+    status = status_tool.invoke({"crawl_id": crawl_id})
+    print("Crawl status:", status)
+```
+
+### Monitor Tools
+
+Create and manage monitors (replaces scheduled jobs) with `MonitorCreateTool`, `MonitorListTool`, `MonitorGetTool`, `MonitorPauseTool`, `MonitorResumeTool`, and `MonitorDeleteTool`:
+
+```python
+from langchain_scrapegraph.tools import MonitorCreateTool, MonitorListTool
+
+create_tool = MonitorCreateTool()
+list_tool = MonitorListTool()
+
+# Create a monitor
+result = create_tool.invoke({
+    "name": "Price Monitor",
+    "url": "https://example.com/products",
+    "prompt": "Extract current product prices",
+    "cron": "0 9 * * *",  # Daily at 9 AM
+})
+print("Monitor created:", result)
+
+# List all monitors
+monitors = list_tool.invoke({})
+print("All monitors:", monitors)
+```
+
+### HistoryTool
+
+Retrieve request history:
+
+```python
+from langchain_scrapegraph.tools import HistoryTool
+
+tool = HistoryTool()
+history = tool.invoke({})
+```
+
+### GetCreditsTool
+
+Check your remaining API credits:
+
+```python
+from langchain_scrapegraph.tools import GetCreditsTool
+
+tool = GetCreditsTool()
+credits = tool.invoke({})
+```
+
 ## Example Agent
 
 Create a research agent that can gather and analyze web data:
 
 ```python
-from langchain.agents import initialize_agent, AgentType
-from langchain_scrapegraph.tools import SmartScraperTool
+from langchain.agents import AgentExecutor, create_openai_functions_agent
+from langchain_core.messages import SystemMessage
+from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder
 from langchain_openai import ChatOpenAI
+from langchain_scrapegraph.tools import ExtractTool, GetCreditsTool, SearchTool
 
-# Initialize tools
+# Initialize the tools
 tools = [
-    SmartScraperTool(),
+    ExtractTool(),
+    GetCreditsTool(),
+    SearchTool(),
 ]
 
-# Create an agent
-agent = initialize_agent(
-    tools=tools,
-    llm=ChatOpenAI(temperature=0),
-    agent=AgentType.ZERO_SHOT_REACT_DESCRIPTION,
-    verbose=True
-)
-
-# Use the agent
-response = agent.run("""
-    Visit example.com, make a summary of the content and extract the main heading and first paragraph
-""")
+# Create the prompt template
+prompt = ChatPromptTemplate.from_messages([
+    SystemMessage(
+        content=(
+            "You are a helpful AI assistant that can analyze websites and extract information. "
+            "You have access to tools that can help you scrape and process web content. "
+            "Always explain what you're doing before using a tool."
+        )
+    ),
+    MessagesPlaceholder(variable_name="chat_history", optional=True),
+    ("user", "{input}"),
+    MessagesPlaceholder(variable_name="agent_scratchpad"),
+])
+
+# Initialize the LLM
+llm = ChatOpenAI(temperature=0)
+
+# Create the agent
+agent = create_openai_functions_agent(llm, tools, prompt)
+agent_executor = AgentExecutor(agent=agent, tools=tools, verbose=True)
+
+# Example usage
+response = agent_executor.invoke({
+    "input": "Extract the main products from https://www.scrapegraphai.com/"
+})
+print(response["output"])
 ```
 
+## Migration from v1
+
+If you're upgrading from v1, here are the key changes:
+
+| v1 Tool | v2 Tool |
+|---------|---------|
+| `SmartScraperTool` | `ExtractTool` |
+| `SearchScraperTool` | `SearchTool` |
+| `SmartCrawlerTool` | `CrawlStartTool` / `CrawlStatusTool` / `CrawlStopTool` / `CrawlResumeTool` |
+| `CreateScheduledJobTool` | `MonitorCreateTool` |
+| `GetScheduledJobsTool` | `MonitorListTool` |
+| `GetScheduledJobTool` | `MonitorGetTool` |
+| `PauseScheduledJobTool` | `MonitorPauseTool` |
+| `ResumeScheduledJobTool` | `MonitorResumeTool` |
+| `DeleteScheduledJobTool` | `MonitorDeleteTool` |
+| `MarkdownifyTool` | `MarkdownifyTool` (unchanged) |
+| `GetCreditsTool` | `GetCreditsTool` (unchanged) |
+| `AgenticScraperTool` | Removed |
+| -- | `HistoryTool` (new) |
+
 ## Configuration
 
 Set your ScrapeGraph API key in your environment:
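For codebases with many call sites, the v1-to-v2 renames in the migration table added by this commit can be captured as a plain lookup. This is an editor's illustrative sketch, not part of langchain-scrapegraph; the `v2_name` helper is hypothetical, and the names come straight from the table:

```python
from typing import Optional

# v1 tool class name -> v2 replacement, per the migration table.
# None marks a tool removed in v2; SmartCrawlerTool maps to the
# CrawlStartTool family (Status/Stop/Resume variants exist too).
V1_TO_V2 = {
    "SmartScraperTool": "ExtractTool",
    "SearchScraperTool": "SearchTool",
    "SmartCrawlerTool": "CrawlStartTool",
    "CreateScheduledJobTool": "MonitorCreateTool",
    "GetScheduledJobsTool": "MonitorListTool",
    "GetScheduledJobTool": "MonitorGetTool",
    "PauseScheduledJobTool": "MonitorPauseTool",
    "ResumeScheduledJobTool": "MonitorResumeTool",
    "DeleteScheduledJobTool": "MonitorDeleteTool",
    "MarkdownifyTool": "MarkdownifyTool",  # unchanged
    "GetCreditsTool": "GetCreditsTool",    # unchanged
    "AgenticScraperTool": None,            # removed in v2
}

def v2_name(v1: str) -> Optional[str]:
    """Return the v2 replacement for a v1 tool name, or None if removed."""
    return V1_TO_V2.get(v1, v1)

print(v2_name("SmartScraperTool"))  # -> ExtractTool
```

A simple grep for the dictionary's keys plus this mapping covers every rename in the commit; only the crawl and monitor tools need manual attention, since one v1 tool fans out into several v2 tools with different arguments.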
