"The goal is to turn data into information, and information into insight."
— Carly Fiorina



Flight Delay Analysis
Project Description
Technologies: Python, SQL, scikit-learn, Power BI
Flight delays are a common problem for travelers and airlines, and I wanted to explore whether historical data could be used to predict future delays. I collected and processed over 14 years of airline and airport records, which included departure times, arrival times, delays, and weather features.
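The sketch below shows roughly how that load-and-clean step can look in Python; the SQLite file, table name, and column names are illustrative assumptions rather than the exact schema used in the project.

    import pandas as pd
    import sqlite3

    # Assumed local SQLite database and schema; the real project may use a different engine.
    conn = sqlite3.connect("flights.db")
    flights = pd.read_sql_query(
        """
        SELECT fl_date, op_carrier, origin, dest,
               crs_dep_time, dep_delay,
               arr_delay, distance, weather_delay
        FROM on_time_performance
        WHERE cancelled = 0 AND diverted = 0
        """,
        conn,
    )
    conn.close()

    # Basic cleaning: parse dates, drop rows missing the target, clip extreme outliers.
    flights["fl_date"] = pd.to_datetime(flights["fl_date"])
    flights = flights.dropna(subset=["arr_delay"])
    flights["arr_delay"] = flights["arr_delay"].clip(lower=-60, upper=360)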
After cleaning and transforming the dataset in SQL and Python, I engineered features such as weather conditions, flight distance, and airport congestion to use as predictors. I then trained multiple models, including Linear Regression, KNN, Decision Trees, and Gradient Boosting, and evaluated their performance with metrics such as RMSE and F1-score. Gradient Boosting emerged as the best-performing model with strong predictive accuracy.
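A rough sketch of the feature engineering step, continuing from the cleaned DataFrame above. The congestion proxy (scheduled departures per origin airport per hour) and the weather flag are illustrative choices, not necessarily the exact features used in the project.

    # Departure hour and a simple late-evening indicator.
    flights["dep_hour"] = flights["crs_dep_time"] // 100
    flights["late_evening"] = flights["dep_hour"].isin([20, 21, 22, 23]).astype(int)

    # Congestion proxy: scheduled departures per origin airport per hour.
    congestion = (
        flights.groupby(["fl_date", "origin", "dep_hour"])
        .size()
        .rename("origin_congestion")
        .reset_index()
    )
    flights = flights.merge(congestion, on=["fl_date", "origin", "dep_hour"], how="left")

    # Weather indicator: flag flights with any recorded weather-attributed delay.
    flights["weather_flag"] = (flights["weather_delay"].fillna(0) > 0).astype(int)

    feature_cols = ["dep_hour", "late_evening", "distance", "origin_congestion", "weather_flag"]
    X, y = flights[feature_cols], flights["arr_delay"]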
To make the results interpretable, I built a Power BI dashboard that visualized flight delay probabilities, cost of delays, and historical trends. This dashboard provides a clear interface for understanding which factors most influence delays and how they vary across routes and seasons.
Data Source
Bureau of Transportation Statistics: Airline On-time and Delay Causes
https://www.transtats.bts.gov/OT_Delay/OT_DelayCause1.asp
I trained the following regression models to predict arrival delay and compared their performance to determine the best one:
Linear Regression
Decision Tree Regression
Gradient Boosting Regression
KNN Regression
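A minimal comparison loop in scikit-learn, assuming the X and y features from the sketches above; the hyperparameters shown are defaults or placeholder values, not the tuned settings from the project.

    from sklearn.model_selection import train_test_split
    from sklearn.linear_model import LinearRegression
    from sklearn.tree import DecisionTreeRegressor
    from sklearn.ensemble import GradientBoostingRegressor
    from sklearn.neighbors import KNeighborsRegressor
    from sklearn.metrics import mean_squared_error

    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

    models = {
        "Linear Regression": LinearRegression(),
        "Decision Tree": DecisionTreeRegressor(max_depth=10),
        "Gradient Boosting": GradientBoostingRegressor(),
        "KNN": KNeighborsRegressor(n_neighbors=10),
    }

    # Fit each model and report test-set RMSE; the lowest RMSE wins.
    for name, model in models.items():
        model.fit(X_train, y_train)
        rmse = mean_squared_error(y_test, model.predict(X_test)) ** 0.5
        print(f"{name}: RMSE = {rmse:.2f}")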
Skills Showcase:
Data cleaning and feature engineering from 14+ years of flight records
Predictive modeling with Linear Regression, Decision Trees, Gradient Boosting, and KNN
Model evaluation using RMSE, Precision, Recall, F1-score (see the metrics sketch after this list)
Building interpretable dashboards in Power BI with DAX
SQL integration for large dataset querying
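For the precision, recall, and F1 numbers, delay prediction can also be framed as a binary classification task: delayed by more than 15 minutes or not. The sketch below continues from the model-comparison loop above; the 15-minute cutoff follows the common BTS convention but is an assumption about this project.

    from sklearn.metrics import precision_score, recall_score, f1_score

    # Treat a flight as "delayed" if the actual or predicted arrival delay exceeds 15 minutes.
    best_model = models["Gradient Boosting"]
    y_true_cls = (y_test > 15).astype(int)
    y_pred_cls = (best_model.predict(X_test) > 15).astype(int)

    print("Precision:", precision_score(y_true_cls, y_pred_cls))
    print("Recall:   ", recall_score(y_true_cls, y_pred_cls))
    print("F1-score: ", f1_score(y_true_cls, y_pred_cls))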
Key Insights:
Delay probability increases significantly for flights departing in the late evening
Weather (precipitation, storms) was a top predictive feature
Gradient Boosting model outperformed others with the lowest RMSE



