This project demonstrates a simple data engineering workflow: extracting, transforming, and loading (ETL) employee records from a CSV file into a SQLite database, and visualizing key insights from the data using Python.
- ETL Pipeline:
- Reads employee data from a CSV file
- Cleans and transforms the data (type conversion, enrichment, validation)
- Loads the processed data into a SQLite database
- Data Visualization:
- Plots salary distribution
- Plots number of hires per year
- Saves charts as PNG files for easy access
DE/
│ employee_records.csv # Source data (CSV)
│ employees.db # SQLite database (output)
│ README.md # Project documentation
│ .gitignore # Git ignore rules
│
├───visualizations/ # Folder for saved chart images
│ salary_distribution.png
│ hires_per_year.png
│
└───DE/
│ etl_employee_records.py # ETL pipeline script
│ visualize_employees.py # Data visualization (interactive)
│ all_visualizations.py # Script to save all charts as PNGs
- Clone or download this repository.
- Install Python 3.8+ (if not already installed).
- Install required Python packages:
pip install pandas sqlalchemy matplotlib seaborn
- Place your
employee_records.csvin the project root (DE/). - Run the ETL script to process the CSV and load data into the database:
python DE/etl_employee_records.py
- This will create/update
employees.dbwith a tableemployeescontaining the cleaned and enriched data.
- This will create/update
- Run the visualization script to generate charts interactively:
python DE/visualize_employees.py
- This will display:
- A histogram of salary distribution.
- A bar chart of number of hires per year.
- This will display:
- Run the batch visualization script to save charts as PNG files:
python DE/all_visualizations.py
- You can run this script from either the project root or the
DE/subfolder; it will automatically find the correct database and save images in thevisualizations/folder. - The following files will be created/updated:
visualizations/salary_distribution.pngvisualizations/hires_per_year.png
- You can run this script from either the project root or the
- Open the
visualizations/folder in your project root to view the generated PNG files. - These images can be shared, embedded in reports, or used for presentations.
- Salary Distribution: Histogram showing the spread of employee salaries.
- Number of Hires per Year: Bar chart showing hiring trends over time.
- Saved Images: PNG files in the
visualizations/folder for easy access.
- Automate the ETL process (e.g., with a scheduler or Airflow)
- Add more data validation and error logging
- Build an API to serve employee data
- Integrate with cloud storage (AWS S3 or GCP)
- Add more visualizations (e.g., by department, gender)
I've provided this project for educational use and experimentation only..

