
Commit 69b4202

feat: add knowledge base

1 parent 295c6c0 · commit 69b4202

11 files changed: 944 additions & 1 deletion

docs.json (20 additions & 1 deletion)

```diff
@@ -50,7 +50,17 @@
       "services/smartcrawler",
       "services/sitemap",
       "services/agenticscraper",
-      "services/cli",
+      {
+        "group": "CLI",
+        "icon": "terminal",
+        "pages": [
+          "services/cli/introduction",
+          "services/cli/commands",
+          "services/cli/json-mode",
+          "services/cli/ai-agent-skill",
+          "services/cli/examples"
+        ]
+      },
       {
         "group": "MCP Server",
         "icon": "/logo/mcp.svg",
@@ -125,6 +135,15 @@
         "knowledge-base/introduction"
       ]
     },
+    {
+      "group": "CLI",
+      "pages": [
+        "knowledge-base/cli/getting-started",
+        "knowledge-base/cli/json-mode",
+        "knowledge-base/cli/ai-agent-skill",
+        "knowledge-base/cli/command-examples"
+      ]
+    },
     {
       "group": "AI Tools",
       "pages": [
```

knowledge-base/cli/ai-agent-skill.mdx (66 additions & 0 deletions)

---
title: Using just-scrape as a coding agent skill
description: 'Give AI coding agents access to web scraping through the just-scrape skill'
---

`just-scrape` can be installed as a **skill** for AI coding agents via [Vercel's skills.sh](https://skills.sh). This lets agents like Claude, Cursor, and others call ScrapeGraphAI commands directly during a coding session.

## Install the skill

```bash
bunx skills add https://github.com/ScrapeGraphAI/just-scrape
```

Browse the skill page: [skills.sh/scrapegraphai/just-scrape/just-scrape](https://skills.sh/scrapegraphai/just-scrape/just-scrape)

## What this enables

Once installed, your coding agent can:

- Scrape a website to gather data needed for a task
- Convert documentation pages to markdown for context
- Search the web and extract structured results
- Check your credit balance mid-session
- Browse request history
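
For the last two, here is a minimal sketch (assuming `credits` accepts the same `--json` flag the other commands do):

```bash
# Check remaining credits (assumes credits supports --json like other commands)
just-scrape credits --json

# List recent Smart Scraper requests
just-scrape history smartscraper --json
```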

## How agents use it

Agents invoke the skill in `--json` mode so output is clean and token-efficient:

```bash
just-scrape smart-scraper https://api.example.com/docs \
  -p "Extract all endpoint names, methods, and descriptions" \
  --json
```

```bash
just-scrape search-scraper "latest release notes for react-query" \
  --num-results 3 --json
```

## Manual setup with Cursor

If you are using Cursor without the skills.sh integration, configure `just-scrape` via the [MCP Server](/services/mcp-server/cursor) for the best experience.

Alternatively, add a script to your project that Cursor can call:

```bash
#!/bin/bash
# .cursor/scrape.sh
just-scrape smart-scraper "$1" -p "$2" --json
```

Then tell Cursor: *"Run `.cursor/scrape.sh <url> <prompt>` to scrape a page."*

## Tips

- Set `SGAI_API_KEY` in your shell profile so the skill picks it up automatically across all agent sessions.
- Use `--json` every time; agents don't need spinners or banners.
- Pass `--schema` with a JSON schema to get typed, predictable output that agents can parse reliably:

```bash
just-scrape smart-scraper https://example.com \
  -p "Extract company info" \
  --schema '{"type":"object","properties":{"name":{"type":"string"},"founded":{"type":"number"},"employees":{"type":"string"}}}' \
  --json
```

knowledge-base/cli/command-examples.mdx (168 additions & 0 deletions)

---
title: CLI command examples
description: 'Practical examples for every just-scrape command'
---

## Smart Scraper

Extract structured data from any URL using AI.

```bash
# Basic extraction
just-scrape smart-scraper https://news.ycombinator.com \
  -p "Extract the top 10 story titles and their URLs"

# Enforce a strict output schema
just-scrape smart-scraper https://news.example.com \
  -p "Get all article headlines and dates" \
  --schema '{"type":"object","properties":{"articles":{"type":"array","items":{"type":"object","properties":{"title":{"type":"string"},"date":{"type":"string"}}}}}}'

# Scroll to load more content, then extract
just-scrape smart-scraper https://store.example.com/shoes \
  -p "Extract all product names, prices, and ratings" \
  --scrolls 5

# Bypass anti-bot protection (costs +4 credits)
just-scrape smart-scraper https://app.example.com/dashboard \
  -p "Extract user stats" \
  --stealth

# Pass cookies and custom headers
just-scrape smart-scraper https://example.com/protected \
  -p "Extract the protected content" \
  --cookies '{"session": "abc123"}' \
  --headers '{"X-Custom-Header": "value"}'
```

## Search Scraper

Search the web and extract structured data from results.

```bash
# Research across multiple sources
just-scrape search-scraper "What are the best Python web frameworks in 2025?" \
  --num-results 10

# Get raw markdown only (2 credits instead of 10)
just-scrape search-scraper "React vs Vue comparison" \
  --no-extraction --num-results 5

# Structured output with schema
just-scrape search-scraper "Top 5 cloud providers pricing" \
  --schema '{"type":"object","properties":{"providers":{"type":"array","items":{"type":"object","properties":{"name":{"type":"string"},"free_tier":{"type":"string"}}}}}}'
```

## Markdownify

Convert any webpage to clean markdown.

```bash
# Convert a blog post
just-scrape markdownify https://blog.example.com/my-article

# Save to file
just-scrape markdownify https://docs.example.com/api \
  --json | jq -r '.result' > api-docs.md

# Bypass Cloudflare or anti-bot protections
just-scrape markdownify https://protected.example.com --stealth
```

## Crawl

Crawl multiple pages and extract data from each.

```bash
# Crawl a docs site and collect code examples
just-scrape crawl https://docs.example.com \
  -p "Extract all code snippets with their language" \
  --max-pages 20 --depth 3

# Crawl only blog pages
just-scrape crawl https://example.com \
  -p "Extract article titles and summaries" \
  --rules '{"include_paths":["/blog/*"],"same_domain":true}' \
  --max-pages 50

# Get raw markdown from all pages (no AI extraction, cheaper)
just-scrape crawl https://example.com \
  --no-extraction --max-pages 10
```

## Scrape

Get raw HTML from a URL.

```bash
# Basic HTML fetch
just-scrape scrape https://example.com

# Geo-targeted request with anti-bot bypass
just-scrape scrape https://store.example.com \
  --stealth --country-code DE

# Extract branding info (logos, colors, fonts)
just-scrape scrape https://example.com --branding
```

## Sitemap

Get all URLs from a website's sitemap.

```bash
# List all pages
just-scrape sitemap https://example.com

# Pipe URLs to another command
just-scrape sitemap https://example.com --json | jq -r '.urls[]'
```
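
The URL list composes with the other commands. A sketch that converts the first five sitemap entries to local markdown files, reusing the `.urls` and `.result` fields from the `--json` examples above:

```bash
# Convert the first 5 sitemap URLs to markdown files
just-scrape sitemap https://example.com --json \
  | jq -r '.urls[:5][]' \
  | while read -r url; do
      just-scrape markdownify "$url" --json \
        | jq -r '.result' > "$(basename "$url").md"
    done
```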

## Agentic Scraper

Browser automation with AI: log in, click, navigate, and fill forms.

```bash
# Log in and extract dashboard data
just-scrape agentic-scraper https://app.example.com/login \
  -s "Fill email with user@test.com,Fill password with secret,Click Sign In" \
  --ai-extraction -p "Extract all dashboard metrics"

# Navigate a multi-step form
just-scrape agentic-scraper https://example.com/wizard \
  -s "Click Next,Select Premium plan,Fill name with John,Click Submit"

# Persist browser session across runs
just-scrape agentic-scraper https://app.example.com \
  -s "Click Settings" --use-session
```

## Generate Schema

Generate a JSON schema from a natural-language description.

```bash
# Generate a schema
just-scrape generate-schema "E-commerce product with name, price, ratings, and reviews array"

# Refine an existing schema
just-scrape generate-schema "Add an availability field" \
  --existing-schema '{"type":"object","properties":{"name":{"type":"string"},"price":{"type":"number"}}}'
```

## History

Browse request history interactively or export it.

```bash
# Interactive history browser (arrow keys to navigate)
just-scrape history smartscraper

# Fetch a specific request by ID
just-scrape history smartscraper abc123-def456-7890

# Export last 100 crawl jobs as JSON
just-scrape history crawl --json --page-size 100 \
  | jq '.requests[] | {id: .request_id, status}'
```

Supported services for `history`: `markdownify`, `smartscraper`, `searchscraper`, `scrape`, `crawl`, `agentic-scraper`, `sitemap`

knowledge-base/cli/getting-started.mdx (83 additions & 0 deletions)

---
title: Getting started with just-scrape
description: 'Install and configure the just-scrape CLI in minutes'
---

`just-scrape` is the official command-line interface for ScrapeGraphAI. It gives you AI-powered web scraping, data extraction, search, and crawling directly from your terminal.

## Installation

<CodeGroup>

```bash npm
npm install -g just-scrape
```

```bash pnpm
pnpm add -g just-scrape
```

```bash yarn
yarn global add just-scrape
```

```bash bun
bun add -g just-scrape
```

```bash npx (no install)
npx just-scrape --help
```

```bash bunx (no install)
bunx just-scrape --help
```

</CodeGroup>

Package: [just-scrape](https://www.npmjs.com/package/just-scrape) on npm | [GitHub](https://github.com/ScrapeGraphAI/just-scrape)

## Setting up your API key

The CLI needs a ScrapeGraphAI API key. Get one from the [dashboard](https://dashboard.scrapegraphai.com). The CLI checks for it in this order:

1. **Environment variable**: `export SGAI_API_KEY="sgai-..."`
2. **`.env` file**: `SGAI_API_KEY=sgai-...` in the project root
3. **Config file**: `~/.scrapegraphai/config.json`
4. **Interactive prompt**: the CLI will ask for the key and save it automatically

On a new machine, the easiest approach is simply to run any command: the CLI will prompt you for the key and save it to `~/.scrapegraphai/config.json`, so you never need to set it again.
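
If you prefer to set the key explicitly (method 1), a one-time shell-profile setup looks like this (zsh shown; use `~/.bashrc` for bash):

```bash
# Persist the API key for all future sessions
echo 'export SGAI_API_KEY="sgai-..."' >> ~/.zshrc
source ~/.zshrc
```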

## Environment variables

| Variable | Description | Default |
|---|---|---|
| `SGAI_API_KEY` | ScrapeGraphAI API key | (none) |
| `JUST_SCRAPE_API_URL` | Override the API base URL | `https://api.scrapegraphai.com/v1` |
| `JUST_SCRAPE_TIMEOUT_S` | Request/polling timeout in seconds | `120` |
| `JUST_SCRAPE_DEBUG` | Set to `1` to enable debug logging to stderr | `0` |
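
For example, a one-off run with a longer timeout and debug logging, using only the variables above:

```bash
# 5-minute timeout, debug output on stderr, applied to a single invocation
JUST_SCRAPE_TIMEOUT_S=300 JUST_SCRAPE_DEBUG=1 \
  just-scrape crawl https://docs.example.com --no-extraction --max-pages 10
```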

## Verify your setup

Run a quick health check to confirm the key is valid:

```bash
just-scrape validate
```

Check your credit balance:

```bash
just-scrape credits
```

## Your first scrape

```bash
just-scrape smart-scraper https://news.ycombinator.com \
  -p "Extract the top 5 story titles and their URLs"
```
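
The same scrape in machine-readable form, using the `--json` flag covered in the [JSON mode](/services/cli/json-mode) guide:

```bash
just-scrape smart-scraper https://news.ycombinator.com \
  -p "Extract the top 5 story titles and their URLs" \
  --json
```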

<Note>
See the [full CLI reference](/services/cli) for all commands and options.
</Note>
