A full-stack web application that converts PowerPoint presentations into narrated videos using AI-powered script generation and text-to-speech synthesis.
This application accepts a .pptx file, uses Google Gemini to generate a natural presenter script for each slide, synthesizes speech with Coqui TTS, and assembles the final narrated video using MoviePy and FFmpeg. It includes both a modern React web interface and a standalone command-line tool.
- AI-powered script generation using Google Gemini vision models
- High-quality text-to-speech synthesis via Coqui TTS (offline, no API fees)
- Drag-and-drop web interface built with React, TypeScript, and Tailwind CSS
- Real-time progress tracking with detailed status updates
- Script editing and regeneration without full reprocessing
- Multi-job management with persistent state
- Cross-platform slide conversion using LibreOffice headless mode
- Multiple video codec fallbacks (H.264, MP4V) for broad compatibility
- CLI support via the standalone auto_presenter.py script
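The codec fallback listed above can be pictured as a simple "try each encoder in order" loop. The following is a minimal sketch, not the project's actual code: `encode_with_fallback` and the `encode` callback are hypothetical names, and the identifiers `libx264` and `mpeg4` are assumed FFmpeg-style encoder names standing in for H.264 and MP4V.

```python
from typing import Callable, Sequence


def encode_with_fallback(
    encode: Callable[[str], None],
    codecs: Sequence[str] = ("libx264", "mpeg4"),  # assumed encoder names
) -> str:
    """Try each codec in order and return the first one that succeeds.

    `encode` is a hypothetical callback that writes the video with the
    given codec and raises an exception if that codec is unavailable.
    """
    errors = []
    for codec in codecs:
        try:
            encode(codec)
            return codec
        except Exception as exc:  # remember the failure, try the next codec
            errors.append(f"{codec}: {exc}")
    raise RuntimeError("all codecs failed: " + "; ".join(errors))
```

In practice the callback would wrap whatever call writes the output file, so the fallback logic stays independent of the video library in use.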
- Python 3.11 or higher
- Node.js 18 or higher
- LibreOffice (headless mode for PPTX-to-PDF conversion)
- FFmpeg (video encoding)
- A Google Gemini API key
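Before installing, it can help to confirm that the prerequisites above are present. The script below is a rough sketch and not part of the repository: `check_prerequisites` is a hypothetical helper, it only checks that `node` exists on the PATH (not its version), and it assumes the API key has been exported as the `GEMINI_API_KEY` environment variable, whereas the project itself stores it in a `.env` file.

```python
import os
import shutil
import sys


def check_prerequisites(env=None, which=shutil.which, version=None) -> list[str]:
    """Return a list of human-readable problems; an empty list means all good."""
    env = os.environ if env is None else env
    version = sys.version_info[:2] if version is None else version
    problems = []

    if version < (3, 11):
        problems.append("Python 3.11 or higher is required")
    if which("node") is None:
        problems.append("Node.js not found on PATH")
    # LibreOffice ships its launcher as `soffice`; some distros add `libreoffice`.
    if which("soffice") is None and which("libreoffice") is None:
        problems.append("LibreOffice not found on PATH")
    if which("ffmpeg") is None:
        problems.append("FFmpeg not found on PATH")
    if not env.get("GEMINI_API_KEY"):
        problems.append("GEMINI_API_KEY is not set (assumed environment variable)")
    return problems


if __name__ == "__main__":
    for problem in check_prerequisites():
        print("missing:", problem)
```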
GitHub Codespaces (Recommended):
- Create a new Codespace from this repository. The devcontainer will automatically install all dependencies.
- Set up your Gemini API key:
echo "GEMINI_API_KEY=your_api_key_here" > .env
- Run the startup script:
bash start-dev.sh
Local Development:
- Clone the repository:
git clone https://github.com/danielcregg/powerpoint-to-video-old.git
cd powerpoint-to-video-old
- Install backend dependencies:
cd backend
pip install -r requirements.txt
echo "GEMINI_API_KEY=your_api_key_here" > ../.env
- Install frontend dependencies:
cd ../frontend
npm install
Web Interface:
- Start the backend:
cd backend && python app.py
- Start the frontend (in a new terminal):
cd frontend && npm run dev
- Open http://localhost:3000 in your browser and upload a .pptx file.
Command-Line Interface:
python auto_presenter.py presentation.pptx
API Endpoints:
| Method | Endpoint | Description |
|---|---|---|
| POST | /upload | Upload a PowerPoint file and start conversion |
| GET | /status/{job_id} | Get conversion progress and status |
| GET | /download/{job_id} | Download completed video |
| GET | /scripts/{job_id} | Get generated scripts for editing |
| PUT | /scripts/{job_id} | Update scripts and regenerate audio |
| GET | /jobs | List all conversion jobs |
| GET | /health | Check service availability |
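A client would typically upload a file and then poll the status endpoint until the job finishes. The sketch below shows only the polling half, using the standard library. Several details are assumptions, since the response schema is not documented here: the backend address `http://localhost:8000`, the `status` field, and the terminal values `completed` and `failed` are all hypothetical, as are the helper names.

```python
import json
import time
import urllib.request

BASE_URL = "http://localhost:8000"  # assumption: backend host and port


def status_url(job_id: str, base_url: str = BASE_URL) -> str:
    """Build the URL for GET /status/{job_id}."""
    return f"{base_url}/status/{job_id}"


def fetch_json(url: str) -> dict:
    """Fetch a URL and decode the JSON response body."""
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)


def wait_for_job(job_id, fetch=fetch_json, interval=2.0, max_polls=300,
                 sleep=time.sleep) -> dict:
    """Poll the status endpoint until the job reaches a terminal state."""
    for _ in range(max_polls):
        payload = fetch(status_url(job_id))
        # "status", "completed" and "failed" are assumed names, not confirmed.
        if payload.get("status") in ("completed", "failed"):
            return payload
        sleep(interval)
    raise TimeoutError(f"job {job_id} did not finish after {max_polls} polls")
```

Passing `fetch` and `sleep` as parameters keeps the loop testable without a running server; once the job reports completion, the video can be retrieved from `/download/{job_id}`.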
- Python -- Backend logic and AI orchestration
- FastAPI -- REST API framework with async support
- React 18 -- Frontend UI with TypeScript
- Tailwind CSS -- Utility-first styling
- Google Gemini -- AI vision model for slide script generation
- Coqui TTS -- Offline text-to-speech synthesis
- MoviePy -- Video assembly from images and audio
- PyMuPDF -- PDF-to-image extraction
- LibreOffice -- Headless PPTX-to-PDF conversion
- FFmpeg -- Video encoding and processing
- Vite -- Frontend build tooling
This project is licensed under the MIT License. See the LICENSE file for details.