1 | | -install Node |
| 1 | +# CodeWiki: Automated Repository-Level Documentation Generation |
| 2 | + |
| 3 | +<div align="center"> |
| 4 | + |
| 5 | + |
| 6 | + |
| 7 | +<!-- [](LICENSE) --> |
| 8 | +[](https://www.python.org/downloads/) |
| 9 | +<!-- [](https://arxiv.org/abs/XXXXX) --> |
| 10 | + |
| 11 | +**The first open-source framework for holistic, structured repository-level documentation across multilingual codebases** |
| 12 | + |
| 13 | +[Features](#features) • [Installation](#installation) • [Quick Start](#quick-start) • [Benchmark](#benchmark) • [Documentation](#documentation-structure) • [Citation](#citation) |
| 14 | + |
| 15 | +</div> |
| 16 | + |
| 17 | +--- |
| 18 | + |
| 19 | +## 🎯 Overview |
| 20 | + |
| 21 | +Developers spend **58% of their working time** understanding codebases, yet maintaining comprehensive documentation remains a persistent challenge. CodeWiki addresses this by providing automated, scalable repository-level documentation generation that captures: |
| 22 | + |
| 23 | +- 🏗️ **System Architecture** - High-level design patterns and module relationships |
| 24 | +- 🔄 **Data Flow Visualizations** - How information moves through your system |
| 25 | +- 📊 **Cross-Module Dependencies** - Interactive dependency graphs and sequence diagrams |
| 26 | +- 📝 **Comprehensive API Documentation** - From architectural overviews to implementation details |
| 27 | + |
| 28 | +### Key Innovations |
| 29 | + |
| 30 | +| Feature | Description | Impact | |
| 31 | +|---------|-------------|--------| |
| 32 | +| **Hierarchical Decomposition** | Dynamic programming-inspired strategy that partitions repositories into coherent modules | Handles codebases of arbitrary size (86K-1.4M LOC tested) | |
| 33 | +| **Recursive Agentic System** | Adaptive processing with dynamic delegation for complex modules | Maintains quality while scaling | |
| 34 | +| **Multi-Format Synthesis** | Generates textual documentation, architecture diagrams, data flows, and sequence diagrams | Comprehensive understanding from multiple perspectives | |
| 35 | +| **Multilingual Support** | Supports 7 languages: Python, Java, JavaScript, TypeScript, C, C++, C# | Universal applicability | |
| 36 | + |
| 37 | +--- |
| 38 | + |
| 39 | +## Features |
| 40 | + |
| 41 | +### Documentation Generation |
| 42 | + |
| 43 | +- ✅ **Repository-Level Documentation** - First framework to generate complete repo-level docs at scale |
| 44 | +- ✅ **Visual Artifacts** - Automatic generation of architecture diagrams and data flow visualizations |
| 45 | +- ✅ **Cross-Module References** - Intelligent reference management prevents redundancy |
| 46 | +- ✅ **Hierarchical Structure** - Multi-level documentation from high-level overviews to detailed APIs |
| 47 | + |
| 48 | +### Benchmark |
| 49 | + |
| 50 | +- ✅ **[CodeWikiBench](https://github.com/FSoft-AI4Code/CodeWikiBench.git)** - First benchmark specifically designed for repository-level documentation |
| 51 | + |
| 52 | +#### Performance Results |
| 53 | + |
| 54 | +CodeWiki has been evaluated on the **CodeWikiBench** dataset, the first benchmark specifically designed for repository-level documentation quality assessment. |
| 55 | + |
| 56 | +| Language Category | CodeWiki Avg | Improvement over Baseline | |
| 57 | +|-------------------|--------------|---------------------------| |
| 58 | +| High-Level (Python, JS, TS) | **79.14%** | +10.47% | |
| 59 | +| Managed (C#, Java) | **68.84%** | +4.04% | |
| 60 | +| Systems (C, C++) | 53.24% | -3.15% | |
| 61 | +| **Overall Average** | **68.79%** | **+4.73%** | |
| 62 | + |
 | 63 | +CodeWiki improves on the baseline for high-level (+10.47%) and managed (+4.04%) languages but trails it on systems languages (-3.15%), for an overall improvement of 4.73%. |
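As a quick sanity check on the table, the overall row looks consistent with a mean weighted by the number of languages in each category. That weighting is an assumption inferred from the category labels, not an aggregation rule stated here; the sketch below only reproduces the arithmetic.

```python
# Scores and deltas are copied from the table above.
# ASSUMPTION: each category is weighted by its number of languages (3, 2, 2).
categories = {
    "High-Level": {"languages": 3, "score": 79.14, "delta": 10.47},
    "Managed": {"languages": 2, "score": 68.84, "delta": 4.04},
    "Systems": {"languages": 2, "score": 53.24, "delta": -3.15},
}


def weighted_mean(field: str) -> float:
    """Average a per-category field, weighting by language count."""
    total_languages = sum(c["languages"] for c in categories.values())
    weighted_sum = sum(c["languages"] * c[field] for c in categories.values())
    return weighted_sum / total_languages


overall_score = weighted_mean("score")
overall_delta = weighted_mean("delta")
print(f"overall score: {overall_score:.2f}%")
print(f"overall delta: {overall_delta:+.2f}%")
```

Both values land within a few hundredths of the reported 68.79% and +4.73%; the remaining gap is presumably rounding in the per-category figures.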
| 64 | + |
| 65 | +--- |
| 66 | + |
| 67 | +## Installation |
| 68 | + |
| 69 | +### Prerequisites |
| 70 | + |
| 71 | +- Python 3.12+ |
| 72 | +- Node.js (for mermaid validation) |
| 73 | +- Docker (optional, for containerized deployment) |
| 74 | + |
| 75 | +### Standard Installation |
| 76 | + |
2 | 77 | ```bash |
3 | | -# macos |
| 78 | +# Clone the repository |
| 79 | +git clone https://github.com/yourusername/codewiki.git |
| 80 | +cd codewiki |
| 81 | + |
| 82 | +# Create and activate virtual environment |
| 83 | +python3.12 -m venv .venv |
| 84 | +source .venv/bin/activate # On Windows: .venv\Scripts\activate |
| 85 | + |
| 86 | +# Install dependencies |
| 87 | +pip install -r requirements.txt |
| 88 | + |
| 89 | +# Install Node.js (if not already installed) |
| 90 | +# macOS |
4 | 91 | brew install node |
5 | 92 |
|
6 | | -# linux |
| 93 | +# Linux |
7 | 94 | sudo apt update && sudo apt install -y nodejs npm |
8 | 95 | ``` |
9 | 96 |
|
| 97 | +### Docker Installation |
| 98 | + |
10 | 99 | ```bash |
11 | | -python3.12 -m venv .venv |
| 100 | +# Copy environment configuration |
| 101 | +cp env.example .env |
| 102 | +# Edit .env with your API keys |
| 103 | + |
| 104 | +# Create network |
| 105 | +docker network create codewiki-network |
| 106 | + |
| 107 | +# Start services |
| 108 | +docker-compose up -d |
| 109 | +``` |
| 110 | + |
| 111 | +--- |
| 112 | + |
| 113 | +## Quick Start |
| 114 | + |
| 115 | +### 1. Configure API Keys |
| 116 | + |
| 117 | +Create a `.env` file from the template: |
| 118 | + |
| 119 | +```bash |
| 120 | +cp env.example .env |
| 121 | +``` |
| 122 | + |
| 123 | +Edit `.env` with your configuration: |
| 124 | + |
| 125 | +```bash |
| 126 | +# LLM Configuration |
| 127 | +MAIN_MODEL=claude-sonnet-4 |
| 128 | +FALLBACK_MODEL_1=glm-4p5 |
| 129 | +LLM_BASE_URL=http://litellm:4000/ |
| 130 | +LLM_API_KEY=your-api-key-here |
| 131 | + |
| 132 | +# Application |
| 133 | +APP_PORT=8000 |
| 134 | + |
| 135 | +# Optional: Logfire for monitoring |
| 136 | +LOGFIRE_TOKEN=your-logfire-token |
| 137 | +``` |
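For readers wiring these variables into their own tooling, here is a minimal sketch of reading them in Python. The `Settings` class and `load_settings` function are hypothetical names for illustration; how CodeWiki itself consumes `.env` is not shown here, and support for further `FALLBACK_MODEL_<n>` entries is an assumption.

```python
import os
from dataclasses import dataclass, field
from typing import List, Mapping, Optional


@dataclass
class Settings:
    """Hypothetical container for the variables listed in .env."""
    main_model: str
    fallback_models: List[str] = field(default_factory=list)
    llm_base_url: str = ""
    llm_api_key: str = ""
    app_port: int = 8000


def load_settings(env: Optional[Mapping[str, str]] = None) -> Settings:
    """Build Settings from a mapping (defaults to the process environment)."""
    env = os.environ if env is None else env
    # ASSUMPTION: fallbacks are numbered FALLBACK_MODEL_1, _2, ... with no gaps.
    fallbacks = []
    index = 1
    while f"FALLBACK_MODEL_{index}" in env:
        fallbacks.append(env[f"FALLBACK_MODEL_{index}"])
        index += 1
    return Settings(
        main_model=env["MAIN_MODEL"],  # required: raises KeyError if missing
        fallback_models=fallbacks,
        llm_base_url=env.get("LLM_BASE_URL", ""),
        llm_api_key=env.get("LLM_API_KEY", ""),
        app_port=int(env.get("APP_PORT", "8000")),
    )


# Example values taken from the template above.
example_env = {
    "MAIN_MODEL": "claude-sonnet-4",
    "FALLBACK_MODEL_1": "glm-4p5",
    "LLM_BASE_URL": "http://litellm:4000/",
    "LLM_API_KEY": "your-api-key-here",
    "APP_PORT": "8000",
}
settings = load_settings(example_env)
print(settings.main_model, settings.fallback_models, settings.app_port)
```

Passing a plain dictionary instead of `os.environ` keeps the loader easy to test without touching the real environment.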
| 138 | + |
| 139 | +### 2. Run the Web Application |
| 140 | + |
| 141 | +```bash |
| 142 | +# Activate virtual environment |
12 | 143 | source .venv/bin/activate |
13 | | -pip install -r requirements.txt |
| 144 | + |
| 145 | +# Start the web application |
14 | 146 | python run_web_app.py |
15 | 147 | ``` |
| 148 | + |
 | 149 | +Open `http://localhost:8000` and enter a GitHub repository URL and, optionally, a commit ID to generate documentation. |
| 150 | + |
| 151 | +--- |
| 152 | + |
| 153 | +## Workflow |
| 154 | + |
| 155 | +```mermaid |
| 156 | +graph TB |
| 157 | + A[Repository Input] --> B[Dependency Graph Construction] |
| 158 | + B --> C[Hierarchical Decomposition] |
| 159 | + C --> D[Module Tree] |
| 160 | + D --> E[Recursive Agent Processing] |
| 161 | + E --> F{Complexity Check} |
| 162 | + F -->|Complex| G[Dynamic Delegation] |
| 163 | + F -->|Simple| H[Generate Documentation] |
| 164 | + G --> E |
| 165 | + H --> I[Cross-Module References] |
| 166 | + I --> J[Hierarchical Assembly] |
| 167 | + J --> K[Comprehensive Documentation] |
| 168 | + K --> L[Architecture Diagrams] |
| 169 | + K --> M[Data Flow Visualizations] |
| 170 | + K --> N[API Documentation] |
| 171 | +``` |
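To make the first step of the diagram concrete, the sketch below builds a small import-level dependency graph with Python's standard `ast` module. It covers Python sources only and works on in-memory strings; `build_import_graph` is a hypothetical helper, and CodeWiki's actual multi-language parser may work quite differently.

```python
import ast
from collections import defaultdict
from typing import Dict, Set


def build_import_graph(sources: Dict[str, str]) -> Dict[str, Set[str]]:
    """Map each module name to the set of in-repo modules it imports."""
    graph = defaultdict(set)
    for module_name, code in sources.items():
        graph[module_name]  # make sure every module appears as a node
        for node in ast.walk(ast.parse(code)):
            if isinstance(node, ast.Import):
                targets = [alias.name for alias in node.names]
            elif isinstance(node, ast.ImportFrom) and node.module:
                targets = [node.module]
            else:
                continue
            for target in targets:
                # Keep only edges to modules inside the repository.
                if target in sources and target != module_name:
                    graph[module_name].add(target)
    return dict(graph)


# Toy repository: module name -> source text.
sources = {
    "app": "import services\nfrom models import User\nimport os\n",
    "services": "from models import User\n",
    "models": "class User:\n    pass\n",
}
graph = build_import_graph(sources)
for module, deps in sorted(graph.items()):
    print(module, "->", sorted(deps))
```

Standard-library imports such as `os` are dropped because they are not keys in `sources`, which leaves only the cross-module edges a decomposition step would care about.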
| 172 | + |
| 173 | +### Processing Pipeline |
| 174 | + |
| 175 | +1. **Repository Analysis** - AST parsing and dependency graph construction |
| 176 | +2. **Hierarchical Decomposition** - Feature-based module partitioning |
| 177 | +3. **Recursive Documentation** - Agent-based processing with dynamic delegation |
| 178 | +4. **Hierarchical Assembly** - Bottom-up synthesis of comprehensive docs |
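The recursion in steps 3 and 4 can be sketched roughly as follows. This is a toy model under stated assumptions: `MAX_COMPONENTS`, `Module`, `split`, and `document` are hypothetical names, the complexity check is a simple component count, and a formatted string stands in for whatever an agent would actually write.

```python
from dataclasses import dataclass, field
from typing import List

MAX_COMPONENTS = 3  # ASSUMPTION: hypothetical complexity threshold


@dataclass
class Module:
    name: str
    components: List[str] = field(default_factory=list)
    children: List["Module"] = field(default_factory=list)


def split(module: Module) -> List[Module]:
    """Stand-in for dynamic delegation: chunk components into sub-modules."""
    chunks = [
        module.components[i:i + MAX_COMPONENTS]
        for i in range(0, len(module.components), MAX_COMPONENTS)
    ]
    return [
        Module(f"{module.name}.part{n}", chunk)
        for n, chunk in enumerate(chunks, start=1)
    ]


def document(module: Module) -> dict:
    """Document children first, then assemble the parent from their results."""
    # Complexity check: an oversized leaf is delegated to sub-modules.
    if not module.children and len(module.components) > MAX_COMPONENTS:
        module.children = split(module)
        module.components = []
    child_docs = [document(child) for child in module.children]
    covered = len(module.components) + sum(
        doc["component_count"] for doc in child_docs
    )
    return {
        "name": module.name,
        "component_count": covered,
        # A real system would generate prose here; this is a placeholder.
        "summary": f"{module.name} documents {covered} component(s)",
        "children": child_docs,
    }


repo = Module(
    "repo",
    children=[
        Module("parser", ["lex", "parse"]),
        Module("core", ["a", "b", "c", "d", "e"]),
    ],
)
doc = document(repo)
print(doc["summary"])
print([child["name"] for child in doc["children"][1]["children"]])
```

Because `split` never produces a chunk larger than the threshold, delegated sub-modules are not split again, so the recursion terminates.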
| 179 | + |
| 180 | +--- |
| 181 | + |
| 182 | +## Documentation Structure |
| 183 | + |
| 184 | +Generated documentation includes: |
| 185 | + |
| 186 | +### 📄 Textual Documentation |
| 187 | +- **README Overview** - High-level project introduction |
| 188 | +- **Architecture Guide** - System design and component relationships |
| 189 | +- **API Reference** - Detailed interface specifications |
| 190 | +- **Usage Examples** - Practical code samples and patterns |
| 191 | + |
| 192 | +### 📊 Visual Artifacts |
| 193 | +- **System Architecture Diagrams** - Component relationships and hierarchies |
| 194 | +- **Data Flow Visualizations** - Information flow through the system |
| 195 | +- **Sequence Diagrams** - Inter-component communication patterns |
| 196 | +- **Dependency Graphs** - Module and function dependencies |
| 197 | + |
| 198 | + |
| 199 | +--- |
| 200 | + |
| 201 | +## Citation |
| 202 | + |
| 203 | +If you use CodeWiki in your research, please cite: |
| 204 | + |
| 205 | +```bibtex |
| 206 | +@article{codewiki2025, |
| 207 | + title={CodeWiki: Automated Repository-Level Documentation Generation with Hierarchical Decomposition and Agentic Processing}, |
| 208 | + author={Your Name}, |
| 209 | + journal={arXiv preprint arXiv:XXXXX}, |
| 210 | + year={2025} |
| 211 | +} |
| 212 | +``` |
| 213 | + |
| 214 | +<!-- --- |
| 215 | +
|
| 216 | +## 📄 License |
| 217 | +
|
| 218 | +This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details. |
| 219 | +
|
| 220 | +--- --> |
| 221 | + |
| 222 | + |
| 223 | +<!-- --- |
| 224 | +
|
| 225 | +## 📧 Contact |
| 226 | +
|
| 227 | +- **Issues**: [GitHub Issues](https://github.com/yourusername/codewiki/issues) |
| 228 | +- **Discussions**: [GitHub Discussions](https://github.com/yourusername/codewiki/discussions) |
| 229 | +- **Email**: your.email@domain.com --> |
| 230 | + |
| 231 | +--- |
| 232 | + |
| 233 | +<div align="center"> |
| 234 | + |
| 235 | +**Made with ❤️ by the CodeWiki Team** |
| 236 | + |
| 237 | +[⬆ Back to Top](#codewiki-automated-repository-level-documentation-generation) |
| 238 | + |
| 239 | +</div> |