
Flight Delay Analysis

Project Description

Technologies: Python, SQL, scikit-learn, Power BI

Flight delays are a common problem for travelers and airlines, and I wanted to explore whether historical data could be used to predict future delays. I collected and processed over 14 years of airline and airport records, which included departure times, arrival times, delays, and weather features.

After cleaning and transforming the dataset in SQL and Python, I engineered features for weather conditions, flight distance, and airport congestion. I then trained several models, including Linear Regression, KNN, Decision Trees, and Gradient Boosting, and evaluated them with metrics such as RMSE and F1-score. Gradient Boosting performed best, achieving the lowest RMSE of the models compared.
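As an illustration of the feature engineering step, here is a minimal sketch of one way an airport congestion feature could be derived: counting departures from the same airport in the same hour. The records and field names (`origin`, `dep_time`, `distance_mi`, `origin_congestion`) are hypothetical and are not the BTS schema or the project's actual code; weather features would be joined in a similar way.

```python
from collections import Counter
from datetime import datetime

# Hypothetical flight records; field names are illustrative, not the BTS schema.
flights = [
    {"origin": "TPA", "dep_time": "2019-07-01 18:05", "distance_mi": 1005},
    {"origin": "TPA", "dep_time": "2019-07-01 18:40", "distance_mi": 392},
    {"origin": "ATL", "dep_time": "2019-07-01 18:15", "distance_mi": 406},
    {"origin": "TPA", "dep_time": "2019-07-01 21:30", "distance_mi": 1005},
]


def parse_time(record):
    """Parse the departure timestamp of a record."""
    return datetime.strptime(record["dep_time"], "%Y-%m-%d %H:%M")


def hour_key(record):
    """Return (airport, date, hour), identifying one departure slot."""
    ts = parse_time(record)
    return record["origin"], ts.date(), ts.hour


# Congestion proxy: departures from the same airport within the same hour.
slot_counts = Counter(hour_key(f) for f in flights)

features = [
    {
        "distance_mi": f["distance_mi"],
        "dep_hour": parse_time(f).hour,
        "origin_congestion": slot_counts[hour_key(f)],
    }
    for f in flights
]

for row in features:
    print(row)
```

The departure hour is kept as its own feature because time of day is a plausible predictor of delay in its own right.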

To make the results interpretable, I built a Power BI dashboard that visualized flight delay probabilities, cost of delays, and historical trends. This dashboard provides a clear interface for understanding which factors most influence delays and how they vary across routes and seasons.

Data Source

Bureau of Transportation Statistics: Airline On-time and Delay Causes

https://www.transtats.bts.gov/OT_Delay/OT_DelayCause1.asp


Four regression methods were trained to predict arrival delay and then compared to select the best model:

  1. Linear Regression

  2. Decision Tree Regression

  3. Gradient Boosting Regression

  4. KNN Regression
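The comparison of the four methods listed above can be sketched with scikit-learn as follows. This is a minimal example on synthetic stand-in data, not the BTS dataset, and the predictors, hyperparameters, and train/test split are assumptions chosen only for illustration.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsRegressor
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in data (NOT the BTS dataset): three made-up predictors.
rng = np.random.default_rng(0)
n = 500
X = np.column_stack([
    rng.uniform(100, 2500, n),  # flight distance (miles)
    rng.uniform(0, 30, n),      # precipitation (mm)
    rng.integers(1, 40, n),     # departures in the same hour
])
# Made-up arrival delay in minutes: a simple signal plus noise.
y = 0.8 * X[:, 1] + 0.5 * X[:, 2] + rng.normal(0, 5, n)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Hyperparameters here are illustrative defaults, not tuned values.
models = {
    "Linear Regression": LinearRegression(),
    "Decision Tree": DecisionTreeRegressor(max_depth=5, random_state=0),
    "Gradient Boosting": GradientBoostingRegressor(random_state=0),
    "KNN": KNeighborsRegressor(n_neighbors=5),
}

# Fit each model and record its RMSE on the held-out test set.
rmse = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    pred = model.predict(X_test)
    rmse[name] = float(np.sqrt(mean_squared_error(y_test, pred)))

best = min(rmse, key=rmse.get)
for name, score in sorted(rmse.items(), key=lambda kv: kv[1]):
    print(f"{name}: RMSE = {score:.2f} min")
print("Lowest RMSE:", best)
```

The ranking this prints reflects only the synthetic data, so it says nothing about how the models performed on the real flight records.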


Skills Showcase:

  • Data cleaning and feature engineering from 14+ years of flight records

  • Predictive modeling with Linear Regression, Decision Trees, Gradient Boosting, KNN

  • Model evaluation using RMSE, Precision, Recall, F1-score

  • Building interpretable dashboards in Power BI with DAX

  • SQL integration for large dataset querying
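The evaluation metrics named above mix a regression metric (RMSE) with classification metrics (Precision, Recall, F1-score). One way to reconcile them, shown as a sketch below, is to score the predicted delay in minutes with RMSE and then label a flight as "delayed" when it is 15 or more minutes late, the threshold BTS uses in its on-time statistics. Whether the project applied this exact threshold is an assumption, and the sample values are made up.

```python
import math


def rmse(actual, predicted):
    """Root mean squared error between actual and predicted delays (minutes)."""
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def precision_recall_f1(actual, predicted, threshold=15):
    """Binarize delays at `threshold` minutes and score the 'delayed' class."""
    actual_late = [a >= threshold for a in actual]
    pred_late = [p >= threshold for p in predicted]
    pairs = list(zip(actual_late, pred_late))
    tp = sum(1 for a, p in pairs if a and p)
    fp = sum(1 for a, p in pairs if not a and p)
    fn = sum(1 for a, p in pairs if a and not p)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Made-up arrival delays in minutes, for illustration only.
actual_delay = [0, 20, 45, 5, 30]
predicted_delay = [5, 25, 10, 18, 28]

error = rmse(actual_delay, predicted_delay)
precision, recall, f1 = precision_recall_f1(actual_delay, predicted_delay)
print(f"RMSE: {error:.2f} min")
print(f"Precision: {precision:.2f}  Recall: {recall:.2f}  F1: {f1:.2f}")
```

Reporting both views is useful because a model can have a low RMSE overall and still miss many of the flights that cross the delay threshold.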


Key Insights:

  • Delay probability increases significantly for flights departing in the late evening

  • Weather (precipitation, storms) was a top predictive feature

  • Gradient Boosting model outperformed others with the lowest RMSE


MS Business Analytics and Information Systems

University of South Florida
