Commit a28f70c

re-organise repo structure
1 parent 6fd674f commit a28f70c

43 files changed

Lines changed: 451 additions & 238 deletions

Some content is hidden

README.md

Lines changed: 229 additions & 5 deletions
@@ -1,15 +1,239 @@
-install Node
+# CodeWiki: Automated Repository-Level Documentation Generation
+
+<div align="center">
+
+![CodeWiki Architecture](img/framework-overview.png)
+
+<!-- [![License](https://img.shields.io/badge/license-MIT-blue.svg)](LICENSE) -->
+[![Python](https://img.shields.io/badge/python-3.12+-blue.svg)](https://www.python.org/downloads/)
+<!-- [![Paper](https://img.shields.io/badge/paper-arXiv-red.svg)](https://arxiv.org/abs/XXXXX) -->
+
+**The first open-source framework for holistic, structured repository-level documentation across multilingual codebases**
+
+[Features](#features) • [Installation](#installation) • [Quick Start](#quick-start) • [Benchmark](#benchmark) • [Documentation](#documentation-structure) • [Citation](#citation)
+
+</div>
+
+---
+
+## 🎯 Overview
+
+Developers spend **58% of their working time** understanding codebases, yet maintaining comprehensive documentation remains a persistent challenge. CodeWiki addresses this by providing automated, scalable repository-level documentation generation that captures:
+
+- 🏗️ **System Architecture** - High-level design patterns and module relationships
+- 🔄 **Data Flow Visualizations** - How information moves through your system
+- 📊 **Cross-Module Dependencies** - Interactive dependency graphs and sequence diagrams
+- 📝 **Comprehensive API Documentation** - From architectural overviews to implementation details
+
+### Key Innovations
+
+| Feature | Description | Impact |
+|---------|-------------|--------|
+| **Hierarchical Decomposition** | Dynamic programming-inspired strategy that partitions repositories into coherent modules | Scales to large codebases (tested on 86K to 1.4M LOC) |
+| **Recursive Agentic System** | Adaptive processing with dynamic delegation for complex modules | Maintains quality while scaling |
+| **Multi-Format Synthesis** | Generates textual documentation, architecture diagrams, data flows, and sequence diagrams | Comprehensive understanding from multiple perspectives |
+| **Multilingual Support** | Supports 7 languages: Python, Java, JavaScript, TypeScript, C, C++, C# | Universal applicability |
+
+---
+
+## Features
+
+### Documentation Generation
+
+- **Repository-Level Documentation** - First framework to generate complete repo-level docs at scale
+- **Visual Artifacts** - Automatic generation of architecture diagrams and data flow visualizations
+- **Cross-Module References** - Intelligent reference management prevents redundancy
+- **Hierarchical Structure** - Multi-level documentation from high-level overviews to detailed APIs
+
+### Benchmark
+
+- **[CodeWikiBench](https://github.com/FSoft-AI4Code/CodeWikiBench.git)** - First benchmark specifically designed for repository-level documentation
+
+#### Performance Results
+
+CodeWiki has been evaluated on the **CodeWikiBench** dataset, the first benchmark specifically designed for repository-level documentation quality assessment.
+
+| Language Category | CodeWiki Avg | Improvement over Baseline |
+|-------------------|--------------|---------------------------|
+| High-Level (Python, JS, TS) | **79.14%** | +10.47% |
+| Managed (C#, Java) | **68.84%** | +4.04% |
+| Systems (C, C++) | 53.24% | -3.15% |
+| **Overall Average** | **68.79%** | **+4.73%** |
+
+CodeWiki outperforms the baseline on high-level (+10.47%) and managed (+4.04%) languages but trails it on systems languages (-3.15%), for an overall improvement of 4.73%.
+
+---
+
+## Installation
+
+### Prerequisites
+
+- Python 3.12+
+- Node.js (for Mermaid diagram validation)
+- Docker (optional, for containerized deployment)
+
+### Standard Installation
+
 ```bash
-# macos
+# Clone the repository
+git clone https://github.com/yourusername/codewiki.git
+cd codewiki
+
+# Create and activate virtual environment
+python3.12 -m venv .venv
+source .venv/bin/activate # On Windows: .venv\Scripts\activate
+
+# Install dependencies
+pip install -r requirements.txt
+
+# Install Node.js (if not already installed)
+# macOS
 brew install node
 
-# linux
+# Linux
 sudo apt update && sudo apt install -y nodejs npm
 ```
 
+### Docker Installation
+
 ```bash
-python3.12 -m venv .venv
+# Copy environment configuration
+cp env.example .env
+# Edit .env with your API keys
+
+# Create network
+docker network create codewiki-network
+
+# Start services
+docker-compose up -d
+```
+
+---
+
+## Quick Start
+
+### 1. Configure API Keys
+
+Create a `.env` file from the template:
+
+```bash
+cp env.example .env
+```
+
+Edit `.env` with your configuration:
+
+```bash
+# LLM Configuration
+MAIN_MODEL=claude-sonnet-4
+FALLBACK_MODEL_1=glm-4p5
+LLM_BASE_URL=http://litellm:4000/
+LLM_API_KEY=your-api-key-here
+
+# Application
+APP_PORT=8000
+
+# Optional: Logfire for monitoring
+LOGFIRE_TOKEN=your-logfire-token
+```
+
+### 2. Run the Web Application
+
+```bash
+# Activate virtual environment
 source .venv/bin/activate
-pip install -r requirements.txt
+
+# Start the web application
 python run_web_app.py
 ```
+
+Open `http://localhost:8000` to generate documentation from a GitHub repository URL and, optionally, a commit ID.
+
+---
+
+## Workflow
+
+```mermaid
+graph TB
+A[Repository Input] --> B[Dependency Graph Construction]
+B --> C[Hierarchical Decomposition]
+C --> D[Module Tree]
+D --> E[Recursive Agent Processing]
+E --> F{Complexity Check}
+F -->|Complex| G[Dynamic Delegation]
+F -->|Simple| H[Generate Documentation]
+G --> E
+H --> I[Cross-Module References]
+I --> J[Hierarchical Assembly]
+J --> K[Comprehensive Documentation]
+K --> L[Architecture Diagrams]
+K --> M[Data Flow Visualizations]
+K --> N[API Documentation]
+```
+
+### Processing Pipeline
+
+1. **Repository Analysis** - AST parsing and dependency graph construction
+2. **Hierarchical Decomposition** - Feature-based module partitioning
+3. **Recursive Documentation** - Agent-based processing with dynamic delegation
+4. **Hierarchical Assembly** - Bottom-up synthesis of comprehensive docs
+
+---
+
+## Documentation Structure
+
+Generated documentation includes:
+
+### 📄 Textual Documentation
+- **README Overview** - High-level project introduction
+- **Architecture Guide** - System design and component relationships
+- **API Reference** - Detailed interface specifications
+- **Usage Examples** - Practical code samples and patterns
+
+### 📊 Visual Artifacts
+- **System Architecture Diagrams** - Component relationships and hierarchies
+- **Data Flow Visualizations** - Information flow through the system
+- **Sequence Diagrams** - Inter-component communication patterns
+- **Dependency Graphs** - Module and function dependencies
+
+
+---
+
+## Citation
+
+If you use CodeWiki in your research, please cite:
+
+```bibtex
+@article{codewiki2025,
+title={CodeWiki: Automated Repository-Level Documentation Generation with Hierarchical Decomposition and Agentic Processing},
+author={Your Name},
+journal={arXiv preprint arXiv:XXXXX},
+year={2025}
+}
+```
+
+<!-- ---
+
+## 📄 License
+
+This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.
+
+--- -->
+
+
+<!-- ---
+
+## 📧 Contact
+
+- **Issues**: [GitHub Issues](https://github.com/yourusername/codewiki/issues)
+- **Discussions**: [GitHub Discussions](https://github.com/yourusername/codewiki/discussions)
+- **Email**: your.email@domain.com -->
+
+---
+
+<div align="center">
+
+**Made with ❤️ by the CodeWiki Team**
+
+[⬆ Back to Top](#codewiki-automated-repository-level-documentation-generation)
+
+</div>
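A quick arithmetic check on the Performance Results table in this README: the Overall Average row is not the plain mean of the three category rows, but it is reproduced to within rounding if each category is weighted by how many languages it covers (3, 2 and 2). The sketch below assumes that weighting; the README itself does not say how the overall figure was computed.

```python
# Check whether the README's "Overall Average" row matches a per-language mean.
# Assumption: each category figure is the mean over its languages, so weighting
# by language count (3, 2, 2) should recover the mean over all seven languages.

categories = {
    # category: (number of languages, CodeWiki avg %, improvement over baseline %)
    "High-Level (Python, JS, TS)": (3, 79.14, 10.47),
    "Managed (C#, Java)": (2, 68.84, 4.04),
    "Systems (C, C++)": (2, 53.24, -3.15),
}


def weighted_mean(column: int) -> float:
    """Language-count-weighted mean of a column (1 = score, 2 = improvement)."""
    n_langs = sum(row[0] for row in categories.values())
    return sum(row[0] * row[column] for row in categories.values()) / n_langs


def unweighted_mean(column: int) -> float:
    """Plain mean over the three category rows, for comparison."""
    return sum(row[column] for row in categories.values()) / len(categories)


if __name__ == "__main__":
    print(f"weighted score:       {weighted_mean(1):.2f}%")    # 68.80 (README: 68.79)
    print(f"weighted improvement: {weighted_mean(2):+.2f}%")   # +4.74 (README: +4.73)
    print(f"unweighted score:     {unweighted_mean(1):.2f}%")  # 67.07
```

The remaining 0.01 gaps are consistent with the category figures themselves having been rounded to two decimals.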

img/framework-overview.png

1.17 MB
Lines changed: 8 additions & 9 deletions
@@ -35,24 +35,23 @@
     logger.warning(f"Failed to configure logfire: {e}")
 
 # Local imports
-from agent_tools.deps import CodeWikiDeps
-from agent_tools.read_code_components import read_code_components_tool
-from agent_tools.str_replace_editor import str_replace_editor_tool
-from agent_tools.generate_sub_module_documentations import generate_sub_module_documentation_tool
-from llm_services import fallback_models
-from prompt_template import (
+from .agent_tools.deps import CodeWikiDeps
+from .agent_tools.read_code_components import read_code_components_tool
+from .agent_tools.str_replace_editor import str_replace_editor_tool
+from .agent_tools.generate_sub_module_documentations import generate_sub_module_documentation_tool
+from .llm_services import fallback_models
+from .prompt_template import (
     SYSTEM_PROMPT,
     LEAF_SYSTEM_PROMPT,
     format_user_prompt,
 )
-from utils import is_complex_module
-from cluster_modules import cluster_modules
+from .utils import is_complex_module
 from config import (
     Config,
     MODULE_TREE_FILENAME,
 )
 from utils import file_manager
-from dependency_analyzer.models.core import Node
+from .dependency_analyzer.models.core import Node
 
 
 class AgentOrchestrator:
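The orchestrator's local imports now start with a dot, so they are resolved against the importing module's own package rather than against `sys.path`. The standard library function `importlib.util.resolve_name` makes that mapping visible. This sketch assumes the relocated code is importable as a top-level package named `be` (inferred from the `src/be/...` paths in this commit), that the orchestrator sits directly inside it, and that the agent tools live one level down in `be.agent_tools`, which is why they need two dots.

```python
# Resolve the relative import names used in this commit to absolute module names.
# Assumption: the relocated code is importable as a package named "be", the
# orchestrator sits directly in it, and the agent tools live in "be.agent_tools".
from importlib.util import resolve_name

examples = [
    # (name as written in the diff, package of the importing module)
    (".agent_tools.deps", "be"),
    (".llm_services", "be"),
    (".dependency_analyzer.models.core", "be"),
    ("..dependency_analyzer.models.core", "be.agent_tools"),
    ("..utils", "be.agent_tools"),
    ("config", "be"),  # no leading dot: returned unchanged, found via sys.path
]

for name, package in examples:
    print(f"{name:38} in {package:16} -> {resolve_name(name, package)}")
```

Names without a leading dot come back unchanged, so the imports this commit leaves absolute (`from config import ...`, `from utils import file_manager`) are still looked up on `sys.path` and presumably refer to modules outside the `be` package.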
Lines changed: 1 addition & 1 deletion
@@ -1,5 +1,5 @@
 from dataclasses import dataclass
-from dependency_analyzer.models.core import Node
+from ..dependency_analyzer.models.core import Node
 
 @dataclass
 class CodeWikiDeps:
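`CodeWikiDeps` now reaches `dependency_analyzer` through a two-dot import, which only works when the module is loaded as part of its package. The self-contained sketch below builds a throwaway package in a temporary directory, imports a module that uses `from ..utils import ...`, and then executes the same file directly, where Python rejects the relative import. The layout (`be/utils.py`, `be/agent_tools/deps.py`) is a hypothetical miniature, not the project's actual files.

```python
# Show that a two-dot relative import works when its module is imported as part
# of a package, and fails when the same file is executed as a script.
# The package layout is a hypothetical miniature of src/be/.
import importlib
import subprocess
import sys
import tempfile
from pathlib import Path


def build_package(root: Path) -> Path:
    """Create be/utils.py and be/agent_tools/deps.py under root; return deps.py."""
    pkg = root / "be"
    sub = pkg / "agent_tools"
    sub.mkdir(parents=True)
    (pkg / "__init__.py").write_text("")
    (sub / "__init__.py").write_text("")
    (pkg / "utils.py").write_text("VALUE = 'resolved via ..utils'\n")
    deps = sub / "deps.py"
    # Two dots: up from be.agent_tools to be, then into be.utils.
    deps.write_text("from ..utils import VALUE\n")
    return deps


def import_as_package(root: Path) -> str:
    """Import be.agent_tools.deps with root on sys.path and return its VALUE."""
    sys.path.insert(0, str(root))
    importlib.invalidate_caches()  # modules were created after startup
    try:
        return importlib.import_module("be.agent_tools.deps").VALUE
    finally:
        sys.path.remove(str(root))


def run_as_script(deps: Path) -> subprocess.CompletedProcess:
    """Execute deps.py directly; as __main__ it has no parent package."""
    return subprocess.run([sys.executable, str(deps)], capture_output=True, text=True)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        deps_file = build_package(Path(tmp))
        print(import_as_package(Path(tmp)))
        result = run_as_script(deps_file)
        print(result.returncode, result.stderr.strip().splitlines()[-1])
```

In practice, entry points need to import the package (or run modules with `python -m`) rather than execute files inside `be/` directly.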

src/agent_tools/generate_sub_module_documentations.py renamed to src/be/agent_tools/generate_sub_module_documentations.py

Lines changed: 4 additions & 5 deletions
@@ -1,13 +1,12 @@
 from pydantic_ai import RunContext, Tool, Agent
-import json
 
 from .deps import CodeWikiDeps
 from .read_code_components import read_code_components_tool
 from .str_replace_editor import str_replace_editor_tool
-from llm_services import fallback_models
-from prompt_template import SYSTEM_PROMPT, LEAF_SYSTEM_PROMPT, format_user_prompt
-from utils import is_complex_module, count_tokens
-from cluster_modules import format_potential_core_components
+from ..llm_services import fallback_models
+from ..prompt_template import SYSTEM_PROMPT, LEAF_SYSTEM_PROMPT, format_user_prompt
+from ..utils import is_complex_module, count_tokens
+from ..cluster_modules import format_potential_core_components
 from config import MAX_TOKEN_PER_LEAF_MODULE
 
 
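This tool imports `is_complex_module`, `count_tokens` and `MAX_TOKEN_PER_LEAF_MODULE`, which appears to correspond to the README's Workflow: a complexity check routes large modules to dynamic delegation and small ones straight to documentation, followed by hierarchical assembly. The sketch below shows only that recursive shape. `Module`, `document_leaf`, the whitespace token count and the threshold of 50 are hypothetical placeholders, and the real tool delegates to LLM agents rather than formatting strings.

```python
# Illustrative recursion for "Complexity Check -> Dynamic Delegation ->
# Hierarchical Assembly". All names and the threshold are hypothetical.
from __future__ import annotations

from dataclasses import dataclass, field

MAX_TOKENS_PER_LEAF = 50  # hypothetical stand-in for MAX_TOKEN_PER_LEAF_MODULE


@dataclass
class Module:
    name: str
    source: str = ""  # code owned directly by this module
    children: list[Module] = field(default_factory=list)


def count_tokens(text: str) -> int:
    """Crude whitespace token count; a real system would use a model tokenizer."""
    return len(text.split())


def total_tokens(module: Module) -> int:
    """Tokens in this module plus all of its descendants."""
    return count_tokens(module.source) + sum(total_tokens(c) for c in module.children)


def is_complex(module: Module) -> bool:
    """Complex = has sub-modules and is too large to document in one pass."""
    return bool(module.children) and total_tokens(module) > MAX_TOKENS_PER_LEAF


def document_leaf(module: Module) -> str:
    # Placeholder for an agent call that documents a small module directly.
    return f"## {module.name}\n{total_tokens(module)} tokens documented directly."


def document(module: Module) -> str:
    if not is_complex(module):
        return document_leaf(module)
    # Dynamic delegation: document each child recursively first, then put the
    # parent overview on top of the children's output (bottom-up assembly).
    child_docs = [document(child) for child in module.children]
    overview = f"## {module.name}\nOverview of {len(module.children)} sub-modules."
    return "\n\n".join([overview, *child_docs])


if __name__ == "__main__":
    repo = Module("repo", children=[
        Module("parser", source="tok " * 30),
        Module("agents", children=[
            Module("planner", source="tok " * 40),
            Module("writer", source="tok " * 20),
        ]),
    ])
    print(document(repo))
```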

File renamed without changes.
Lines changed: 1 addition & 1 deletion
@@ -22,7 +22,7 @@
 from pydantic_ai import RunContext, Tool
 
 from .deps import CodeWikiDeps
-from utils import validate_mermaid_diagrams
+from ..utils import validate_mermaid_diagrams
 
 
 # There are some super strange "ascii can't decode x" errors,
Original file line numberDiff line numberDiff line change
@@ -3,11 +3,11 @@
33
import logging
44
logger = logging.getLogger(__name__)
55

6-
from dependency_analyzer.models.core import Node
7-
from llm_services import call_llm
8-
from utils import count_tokens
6+
from .dependency_analyzer.models.core import Node
7+
from .llm_services import call_llm
8+
from .utils import count_tokens
99
from config import MAX_TOKEN_PER_MODULE, CLUSTER_MODEL
10-
from prompt_template import format_cluster_prompt
10+
from .prompt_template import format_cluster_prompt
1111

1212

1313
def format_potential_core_components(leaf_nodes: List[str], components: Dict[str, Node]) -> tuple[str, str]:
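This module imports `MAX_TOKEN_PER_MODULE`, `CLUSTER_MODEL` and `call_llm`, which suggests that clustering is triggered by a token budget and that the grouping itself is done by a model. The sketch below is a deterministic stand-in for illustration only: if the components fit in one budget they stay together, otherwise they are grouped by top-level directory. The budget value, the path-to-token-count input and the directory rule are all hypothetical; the README describes the real partitioning as feature-based.

```python
# Deterministic stand-in for budget-triggered clustering. The project appears
# to use an LLM for the real grouping; budget and rule here are hypothetical.
from collections import defaultdict

MAX_TOKENS_PER_MODULE = 100  # hypothetical stand-in for MAX_TOKEN_PER_MODULE


def cluster_by_directory(
    components: dict[str, int], budget: int = MAX_TOKENS_PER_MODULE
) -> dict[str, list[str]]:
    """Group component paths (mapped to token counts) into named modules.

    If all components fit in one budget they form a single "root" group;
    otherwise they are grouped by their first path segment.
    """
    if sum(components.values()) <= budget:
        return {"root": sorted(components)}
    groups: dict[str, list[str]] = defaultdict(list)
    for path in sorted(components):
        top = path.split("/", 1)[0] if "/" in path else "root"
        groups[top].append(path)
    return dict(groups)


if __name__ == "__main__":
    components = {
        "be/agent_tools/deps.py": 40,
        "be/utils.py": 30,
        "fe/app.py": 50,
        "setup.py": 10,
    }
    for name, members in cluster_by_directory(components).items():
        print(name, members)
```

A fuller version would recurse into any group that still exceeds the budget.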
File renamed without changes.
