This project builds a deep learning classification model to predict whether an Indian Initial Public Offering (IPO) will list at a gain or at a loss. Using historical IPO data from the Indian market, the model uses subscription metrics and issue characteristics to forecast listing-day performance.
Investment firms face uncertainty when deciding which Indian IPOs to invest in. This project addresses that challenge by:
- Analyzing patterns in historical IPO performance
- Identifying key factors that influence listing day gains
- Building a predictive model to classify IPOs as likely profit or loss scenarios
- Enabling data-driven investment decisions
Source: Moneycontrol
Records: 319 historical IPOs from the Indian market (2010-2022)
| Feature | Description | Type |
|---|---|---|
| Date | IPO listing date | Temporal |
| IPOName | Company name | Categorical |
| Issue_Size | IPO size in INR Crores | Numeric |
| Subscription_QIB | Subscription multiple from Qualified Institutional Buyers | Numeric |
| Subscription_HNI | Subscription multiple from High Net Worth Individuals | Numeric |
| Subscription_RII | Subscription multiple from Retail Individual Investors | Numeric |
| Subscription_Total | Total subscription multiple across all investor categories | Numeric |
| Issue_Price | IPO issue price in INR | Numeric |
| Listing_Gains_Percent | Percentage gain/loss on listing day | Target (continuous) |
| Listing_Gains_Profit | Binary classification: 1 (profit) or 0 (loss) | Target (classification) |
- Profit (1): 174 IPOs (54.5%)
- Loss (0): 145 IPOs (45.5%)
- The two classes are fairly balanced (roughly 55/45), so plain accuracy is a meaningful metric and resampling is not strictly required
- ✅ No missing values across all 319 records
- All features are complete and ready for modeling
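
The class balance and completeness checks above can be reproduced with a few pandas calls. This is a minimal sketch that assumes the column names from the feature table; the helper `summarize_quality` and the small inline DataFrame are illustrative stand-ins, not the notebook's code or the real dataset.

```python
import pandas as pd


def summarize_quality(df: pd.DataFrame, target: str = "Listing_Gains_Profit") -> dict:
    """Return the total missing-value count and the class shares (in %) of a binary target."""
    missing = df.isnull().sum()
    shares = df[target].value_counts(normalize=True).sort_index()
    return {
        "total_missing": int(missing.sum()),
        "class_share": {int(k): round(float(v) * 100, 1) for k, v in shares.items()},
    }


# Illustrative stand-in rows (made-up values); in the notebook this would be
# pd.read_csv("Indian_IPO_Market_Data.csv").
demo = pd.DataFrame({
    "Issue_Size": [189.6, 328.7, 56.25, 199.8],
    "Listing_Gains_Percent": [11.82, -84.21, 17.13, -11.28],
    "Listing_Gains_Profit": [1, 0, 1, 0],
})
report = summarize_quality(demo)
print(report)
```

Run against the full CSV, the same helper should report zero missing values and class shares close to the 54.5% / 45.5% split listed above.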
| Metric | Listing_Gains_Percent |
|---|---|
| Mean | 4.74% |
| Std Dev | 47.65% |
| Min | -97.15% |
| Max | +270.40% |
| Median | 1.81% |
- Issue_Size: Highly right-skewed (4.85), with most IPOs below ₹1,100 Cr, but outliers reaching ₹21,000 Cr
- Subscription Metrics: All positively skewed, indicating varying investor demand patterns
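- Issue_Price: Moderately right-skewed (1.70), ranging from ₹0 to ₹2,150 (a ₹0 minimum is implausible for an issue price and is worth checking as a possible data-entry error)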
- Outliers Detected: Multiple features contain outliers, particularly in Issue_Size and subscription metrics
- Subscription_HNI: Shows the highest variability and most outliers among investor categories
- Feature Correlation: Subscription metrics show moderate correlation with listing gains, suggesting predictive value
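
The skewness and outlier observations above can be quantified per feature. The sketch below uses pandas' sample skewness and the common 1.5 × IQR fence rule; the rule, the helper name `skew_and_outliers`, and the demo values are assumptions for illustration, since the source does not state how outliers were detected beyond the boxplots.

```python
import pandas as pd


def skew_and_outliers(df: pd.DataFrame, columns: list[str]) -> pd.DataFrame:
    """Per-column skewness plus the count of points outside the 1.5 * IQR fences."""
    rows = []
    for col in columns:
        q1, q3 = df[col].quantile([0.25, 0.75])
        iqr = q3 - q1
        lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
        n_out = int(((df[col] < lower) | (df[col] > upper)).sum())
        rows.append({"feature": col, "skew": df[col].skew(), "n_outliers": n_out})
    return pd.DataFrame(rows).set_index("feature")


# Made-up values: one very large issue mimics the long right tail of Issue_Size.
demo = pd.DataFrame({
    "Issue_Size": [100, 120, 150, 180, 200, 250, 300, 21000],
    "Issue_Price": [90, 100, 110, 120, 130, 140, 150, 160],
})
summary = skew_and_outliers(demo, ["Issue_Size", "Issue_Price"])
print(summary)
```

In the demo, the single very large issue drives both the positive skew and the outlier count for `Issue_Size`, while the evenly spaced `Issue_Price` values show neither.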
Independent Variables (Features):
- Issue characteristics: Size, Price
- Investor demand signals: QIB, HNI, RII, and Total subscription multiples
Target Variable:
- Primary: `Listing_Gains_Profit` (binary classification: 0 or 1)
- Secondary: `Listing_Gains_Percent` (continuous regression)
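
As a rough illustration of the feature/target definition above, the sketch below selects the six numeric inputs and the binary target. The helper `split_features_target` is hypothetical, and deriving the label as `Listing_Gains_Percent > 0` is an assumption about how the column was built; the real dataset already ships with `Listing_Gains_Profit`.

```python
import pandas as pd

FEATURES = [
    "Issue_Size",
    "Subscription_QIB",
    "Subscription_HNI",
    "Subscription_RII",
    "Subscription_Total",
    "Issue_Price",
]
TARGET = "Listing_Gains_Profit"


def split_features_target(df: pd.DataFrame) -> tuple[pd.DataFrame, pd.Series]:
    """Return the numeric feature matrix X and the binary target y."""
    X = df[FEATURES].astype("float64")
    y = df[TARGET].astype("int64")
    return X, y


# Made-up rows for illustration only.
demo = pd.DataFrame({
    "Date": ["2010-02-03", "2010-02-08", "2010-02-15"],
    "IPOName": ["A Ltd", "B Ltd", "C Ltd"],
    "Issue_Size": [189.6, 328.7, 56.25],
    "Subscription_QIB": [48.4, 59.4, 1.1],
    "Subscription_HNI": [106.0, 51.9, 3.5],
    "Subscription_RII": [11.1, 3.8, 2.3],
    "Subscription_Total": [43.2, 31.1, 2.0],
    "Issue_Price": [165, 145, 75],
    "Listing_Gains_Percent": [11.82, -84.21, 17.13],
})
# Assumption: a positive listing-day percentage counts as profit (1), anything else as loss (0).
demo[TARGET] = (demo["Listing_Gains_Percent"] > 0).astype(int)

X, y = split_features_target(demo)
print(X.shape, y.tolist())
```

`Date` and `IPOName` are left out of `X` because they identify an IPO rather than describe its demand or pricing.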
The notebook follows the complete machine learning workflow:
- Exploratory Data Analysis (EDA)
  - Data loading and inspection
  - Missing value analysis
  - Statistical summaries
  - Distribution analysis
- Data Visualization
  - Correlation heatmaps
  - Distribution plots
  - Boxplots for outlier detection
  - Category comparisons
- Data Preprocessing
  - Feature scaling/normalization
  - Outlier handling
  - Feature engineering
  - Train-test splitting
- Deep Learning Model Development
  - TensorFlow/Keras neural network
  - Classification architecture
  - Hyperparameter tuning
  - Model evaluation
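
The preprocessing stage of this workflow could look roughly like the sketch below: a stratified train-test split, outlier clipping at the 1.5 × IQR fences, and standard scaling. The function `preprocess`, the 70/30 split, the clipping rule, and the synthetic data are all assumptions; the source does not specify which techniques the notebook uses. Fences and scaler statistics are computed on the training rows only, so no information leaks from the test set.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler


def preprocess(X: pd.DataFrame, y: pd.Series, test_size: float = 0.3, seed: int = 42):
    """Stratified split, IQR clipping, and scaling, all fitted on the training rows."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=test_size, stratify=y, random_state=seed
    )
    # Clip each column to its training-set 1.5 * IQR fences.
    q1, q3 = X_train.quantile(0.25), X_train.quantile(0.75)
    iqr = q3 - q1
    lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    X_train = X_train.clip(lower=lower, upper=upper, axis=1)
    X_test = X_test.clip(lower=lower, upper=upper, axis=1)
    # Standardize to zero mean and unit variance using training statistics.
    scaler = StandardScaler().fit(X_train)
    return (
        scaler.transform(X_train),
        scaler.transform(X_test),
        y_train.to_numpy(),
        y_test.to_numpy(),
    )


# Synthetic, right-skewed stand-in data (not the real IPO records).
rng = np.random.default_rng(0)
X_demo = pd.DataFrame({
    "Issue_Size": rng.lognormal(mean=6.0, sigma=1.0, size=40),
    "Subscription_Total": rng.lognormal(mean=2.0, sigma=1.2, size=40),
})
y_demo = pd.Series([0, 1] * 20, name="Listing_Gains_Profit")

X_train_s, X_test_s, y_train, y_test = preprocess(X_demo, y_demo)
print(X_train_s.shape, X_test_s.shape)
```

Clipping is only one way to handle outliers; a log transform of the skewed columns is a common alternative worth comparing.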
- About 54.5% of IPOs (174 of 319) list at a profit, a modest majority rather than evidence of a uniformly favorable market
- Subscription demand (especially from institutional and high-net-worth investors) appears to be a useful indicator of listing performance, consistent with the moderate correlations seen in the EDA
- Significant variability in gains (-97% to +270%) highlights the importance of predictive modeling
- Outliers and skewness require careful preprocessing before model training
- Feature Engineering: Create derived features from subscription ratios
- Data Preprocessing: Handle outliers and scale features appropriately
- Model Development: Build and train a deep learning classifier
- Model Evaluation: Cross-validation, confusion matrix, ROC-AUC
- Production Ready: Model serialization and deployment considerations
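
For the model development and evaluation steps listed above, a small Keras binary classifier might be sketched as follows. The layer sizes, dropout rate, optimizer, and epoch count are illustrative assumptions rather than the notebook's architecture, `build_classifier` is a hypothetical helper, and the data is random noise used only to show that the pieces fit together.

```python
import numpy as np
import tensorflow as tf


def build_classifier(n_features: int) -> tf.keras.Model:
    """Small feed-forward network with a sigmoid output for binary classification."""
    model = tf.keras.Sequential([
        tf.keras.Input(shape=(n_features,)),
        tf.keras.layers.Dense(32, activation="relu"),
        tf.keras.layers.Dropout(0.2),  # light regularization for a ~300-row dataset
        tf.keras.layers.Dense(16, activation="relu"),
        tf.keras.layers.Dense(1, activation="sigmoid"),
    ])
    model.compile(
        optimizer="adam",
        loss="binary_crossentropy",
        metrics=["accuracy", tf.keras.metrics.AUC(name="auc")],
    )
    return model


# Random stand-in data with six features, matching the six numeric inputs.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(64, 6)).astype("float32")
y_demo = (X_demo[:, 1] > 0).astype("float32")

model = build_classifier(n_features=6)
history = model.fit(
    X_demo, y_demo, epochs=2, batch_size=16, validation_split=0.25, verbose=0
)
probs = model.predict(X_demo, verbose=0)  # predicted probability of class 1 (profit)
print(probs.shape, sorted(history.history))
```

With only 319 records, a small network plus a validation split (or cross-validation) is a sensible starting point; larger architectures are likely to overfit.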
- pandas: Data manipulation and analysis
- numpy: Numerical computations
- matplotlib & seaborn: Data visualization
- TensorFlow/Keras: Deep learning framework (for modeling phase)
- scikit-learn: Preprocessing and evaluation (anticipated)
- Ensure the dataset `Indian_IPO_Market_Data.csv` is in the working directory
- Run cells sequentially to reproduce the analysis
- Use insights from EDA to inform preprocessing decisions
- Extend with your own feature engineering and modeling approaches
- Temporal Patterns: IPO performance may vary across different market cycles (2010-2022 covers multiple market phases)
- Survivorship Bias: Only completed IPOs are included; failed IPOs may follow different patterns
- Market Conditions: Model predictions should account for current market volatility and conditions
- Regulatory Changes: Indian IPO regulations have evolved; older data may not reflect current conditions
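
One way to respect the temporal and regulatory caveats above is to evaluate on the most recent IPOs only, instead of a random split. The sketch below is an assumption about how that could be done, not part of the original notebook: `chronological_split` is a hypothetical helper, the 80/20 fraction is arbitrary, and the `Date` column is assumed to be already parsed as datetimes (the real CSV's date format would need to be checked).

```python
import pandas as pd


def chronological_split(df: pd.DataFrame, date_col: str = "Date", train_frac: float = 0.8):
    """Train on the earliest rows and test on the latest ones."""
    ordered = df.sort_values(date_col).reset_index(drop=True)
    cutoff = int(len(ordered) * train_frac)
    return ordered.iloc[:cutoff], ordered.iloc[cutoff:]


# Made-up listing dates for illustration.
demo = pd.DataFrame({
    "Date": pd.to_datetime(
        ["2021-06-01", "2010-03-15", "2018-11-20", "2015-01-05", "2022-09-30"]
    ),
    "Listing_Gains_Profit": [1, 0, 1, 0, 1],
})
train, test = chronological_split(demo)
print(len(train), len(test))
```

A model that scores well on a random split but poorly on this split is probably relying on patterns tied to a particular market phase.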
Author's Note: This project demonstrates the complete data science workflow from raw data to actionable insights, suitable for investment decision-making in the Indian IPO market.