Himanchal-Mishra/Flight-Price-Prediction

Flight Price Prediction – Exploratory Data Analysis & Feature Engineering

Project Overview

This project focuses on performing Exploratory Data Analysis (EDA) and feature engineering on flight pricing data to understand the factors that influence ticket prices and to prepare a clean, model-ready dataset for machine learning.

The emphasis is not only on transforming data but on making justified, data-driven decisions through EDA, closely mirroring real-world data science workflows.


Objective

  • Understand the distribution and behavior of flight prices
  • Identify key factors affecting ticket pricing
  • Perform EDA-driven feature engineering
  • Prepare a clean dataset suitable for predictive modeling

Dataset Description

The dataset contains information related to flight bookings, including:

  • Airline – Name of the airline
  • Source – City of departure
  • Destination – City of arrival
  • Date_of_Journey – Date of travel
  • Route – Cities covered during the journey
  • Dep_Time – Departure time
  • Arrival_Time – Arrival time
  • Duration – Total journey duration
  • Total_Stops – Number of stops
  • Additional_Info – Additional flight details
  • Price – Ticket price (Target variable)

Exploratory Data Analysis (EDA)

EDA was conducted to understand data structure, identify patterns, and guide preprocessing and feature engineering decisions.

EDA Activities Included:

  • Dataset shape, data types, and missing value analysis
  • Distribution and outlier analysis of flight prices
  • Airline-wise and stop-wise price comparison
  • Time-based analysis (journey month, weekday, departure and arrival hours)
  • Investigation of missing values in critical columns

Key EDA Insights

  • Flight prices are right-skewed, with high-value outliers that appear to be realistic fares rather than data errors
  • Airline and number of stops strongly influence ticket prices
  • Clear seasonality is observed across journey months
  • Time-of-day features show non-linear relationships with price
  • Flight duration is an important feature but is stored as text and requires conversion

These insights directly informed feature engineering decisions.


Feature Engineering

Feature engineering was performed in an iterative and EDA-driven manner to preserve meaningful information while making the data suitable for machine learning models.

Key Feature Engineering Steps:

  • Extracted day, month, and weekday from journey date
  • Extracted hour and minute features from departure and arrival times
  • Converted flight duration into total minutes
  • Encoded Total_Stops as an ordinal numerical feature
  • Removed Route after extracting stop-related information to avoid redundancy
  • Applied one-hot encoding to categorical variables
  • Removed records where Route and Total_Stops are missing, because the two fields are logically dependent (stop information is derived from the route)
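
The date, time, and duration steps above might be sketched as follows. The string formats (`DD/MM/YYYY` dates, `HH:MM` times optionally followed by a date, durations such as `2h 50m`), the new column names, and both helper functions are assumptions made for illustration; the actual notebook may parse these fields differently.

```python
import re

import pandas as pd


def duration_to_minutes(text):
    """Convert strings like '2h 50m', '19h' or '45m' to total minutes."""
    hours = re.search(r"(\d+)\s*h", text)
    minutes = re.search(r"(\d+)\s*m", text)
    total = int(hours.group(1)) * 60 if hours else 0
    total += int(minutes.group(1)) if minutes else 0
    return total


def add_time_features(df):
    """Return a copy with numeric date, time and duration features."""
    out = df.copy()

    # Assumed day-first date format.
    journey = pd.to_datetime(out["Date_of_Journey"], format="%d/%m/%Y")
    out["Journey_Day"] = journey.dt.day
    out["Journey_Month"] = journey.dt.month
    out["Journey_Weekday"] = journey.dt.dayofweek  # Monday=0 ... Sunday=6

    # Keep only the leading 'HH:MM' token; arrival times may carry a trailing date.
    for col, prefix in [("Dep_Time", "Dep"), ("Arrival_Time", "Arrival")]:
        clock = pd.to_datetime(out[col].str.split().str[0], format="%H:%M")
        out[f"{prefix}_Hour"] = clock.dt.hour
        out[f"{prefix}_Minute"] = clock.dt.minute

    out["Duration_Minutes"] = out["Duration"].map(duration_to_minutes)
    return out.drop(columns=["Date_of_Journey", "Dep_Time", "Arrival_Time", "Duration"])


# Hypothetical sample rows in the assumed formats.
sample = pd.DataFrame({
    "Date_of_Journey": ["24/03/2019", "01/05/2019"],
    "Dep_Time": ["22:20", "05:50"],
    "Arrival_Time": ["01:10 22 Mar", "13:15"],
    "Duration": ["2h 50m", "19h"],
})
features = add_time_features(sample)
print(features)
```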

All transformations were justified based on exploratory analysis.
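
As a rough illustration of the encoding and row-removal steps, the sketch below drops rows with missing Route or Total_Stops, maps stop labels to integers, removes Route, and one-hot encodes the remaining categoricals. The label spellings in `STOP_ORDER`, the sample rows, and the `encode` helper are assumptions, not the notebook's code.

```python
import pandas as pd

# Assumed label spellings; adjust to whatever the dataset actually contains.
STOP_ORDER = {"non-stop": 0, "1 stop": 1, "2 stops": 2, "3 stops": 3, "4 stops": 4}

# Hypothetical sample rows; the last one is missing Route and Total_Stops.
raw = pd.DataFrame({
    "Airline": ["IndiGo", "Air India", "IndiGo", "Air India"],
    "Source": ["Delhi", "Kolkata", "Delhi", "Delhi"],
    "Route": ["DEL-BOM", "CCU-DEL-BLR", "DEL-BLR", None],
    "Total_Stops": ["non-stop", "1 stop", "non-stop", None],
    "Price": [3897, 7662, 4200, 7480],
})


def encode(df):
    """Drop incomplete rows, ordinal-encode stops, one-hot encode categoricals."""
    out = df.dropna(subset=["Route", "Total_Stops"]).copy()
    out["Total_Stops"] = out["Total_Stops"].map(STOP_ORDER)  # ordinal: more stops -> larger value
    out = out.drop(columns=["Route"])                        # stop count already captured
    return pd.get_dummies(out, columns=["Airline", "Source"], dtype=int)


encoded = encode(raw)
print(encoded)
```

An ordinal mapping keeps the natural ordering of stop counts, whereas one-hot encoding treats airline and city names as unordered categories. For linear models, passing `drop_first=True` to `pd.get_dummies` is a common way to avoid perfectly collinear dummy columns.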


Final Dataset

  • Contains only numeric, model-ready features
  • No missing values
  • Suitable for regression-based machine learning models
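
The properties above can be verified programmatically before modeling. The following is a minimal sketch; `check_model_ready` is a hypothetical helper, and the two small frames are made-up examples.

```python
import pandas as pd


def check_model_ready(df, target="Price"):
    """Return a list of problems; an empty list means the frame passes."""
    problems = []

    # Boolean dummy columns are accepted alongside numeric ones.
    non_numeric = df.select_dtypes(exclude=["number", "bool"]).columns.tolist()
    if non_numeric:
        problems.append(f"non-numeric columns: {non_numeric}")

    missing = df.columns[df.isna().any()].tolist()
    if missing:
        problems.append(f"columns with missing values: {missing}")

    if target not in df.columns:
        problems.append(f"target column {target!r} not found")

    return problems


# Hypothetical examples: one frame that passes, one that does not.
clean = pd.DataFrame({"Total_Stops": [0, 1], "Price": [3897, 7662]})
dirty = pd.DataFrame({"Airline": ["IndiGo", None], "Price": [3897, 7662]})

print(check_model_ready(clean))
print(check_model_ready(dirty))
```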

Workflow Summary

Raw Data
   ↓
Initial EDA
   ↓
Key Insights
   ↓
Feature Engineering
   ↓
Final Clean Dataset

This workflow ensures that preprocessing decisions are transparent, explainable, and data-driven.

Repository Structure

├── EDA_and_Feature_Engineering_Flight_Price_Prediction.ipynb
├── README.md
└── data/
    └── flight_price_dataset.xlsx

Future Scope

  • Train and evaluate machine learning models
  • Perform feature importance analysis
  • Tune hyperparameters
  • Deploy the model through a simple web interface

👤 Author
Himanchal Mishra
Engineering Student | Data Analytics Enthusiast
