SkyFlow: AI-Powered Flight Delay Predictions
Dipti Aswath | LinkedIn | Email | Early SkyFlow Prototype
License
This project is licensed under the Apache License 2.0. You may use, modify, and distribute this code under the terms of the license. See the LICENSE file for more details.
Attribution: Please credit the original author listed above when reusing or redistributing the code.
Executive Summary
Problem Statement:
Airlines and airports face significant operational challenges due to flight delays, which can be caused by a variety of factors including flight status, weather conditions, air traffic congestion, aircraft specifics, and inefficiencies in ground and passenger handling. The objective is to predict flight delays by developing a multi-class classification model that considers both departure and arrival delays, helping improve operational planning and customer satisfaction.
Rationale:
Flight delays can have widespread consequences for airlines, from passenger dissatisfaction to operational disruptions. Developing a predictive model for flight delays not only addresses the core issue of minimizing delays but also enhances decision-making processes across various facets of airline operations.
Business Case 1: Enhancing Operational Efficiency
Predicting flight delays enables airlines to optimize their operations, routing, and resource management.
-
Route Optimization and Scheduling Adjustments: Airlines can reroute flights to avoid congested airspace or adverse weather, minimizing delays. Predictions also allow real-time adjustments to schedules, gates, and crew to manage disruptions efficiently.
-
Resource Allocation: By anticipating delays, airlines can proactively allocate ground crew, gates, and equipment, reducing the cascading effects on other flights.
-
Operational Resilience: Dynamic rerouting and resource realignment minimize the operational impacts of weather or high-traffic delays, enhancing resilience in crisis situations.
-
Cost Management: Avoiding delays lowers costs linked to operational disruptions, improving resource utilization and overall profitability.
Business Case 2: Improving Customer Experience
Accurate delay predictions lead to better customer service and proactive communication, enhancing the passenger experience.
-
Proactive Passenger Communication: Accurate predictions allow airlines to update passengers promptly, manage expectations, and offer rebooking or compensation options.
-
Improved Customer Service: Delay forecasts support better service recovery, leading to a smoother passenger experience and increased loyalty.
-
Competitive Advantage: Effective rerouting and communication give airlines an edge in maintaining on-time performance and customer satisfaction.
By addressing these areas, airlines can significantly improve operational efficiency, enhance passenger experience with better customer satisfaction scores, and better manage resources and disruptions. Predictive modeling for flight delays is not just about minimizing delays but also about fostering a more responsive and resilient airline operation.
Example Usage: An AI system that predicts flight delays could also:
-
Suggest alternate flight paths that are less likely to experience delays.
-
Provide passengers with timely updates and rebooking options.
-
Dynamically adjust flight schedules to manage disruptions effectively.
-
Allocate resources efficiently to minimize the impact on subsequent flights.
Research Question:
How can we develop an AI and machine learning-powered smart system to accurately predict flight delays by assessing multiple factors, including departure and arrival times, flight status, weather conditions, air traffic, aircraft specifics, and ground operations?
Flight Delay Predictions - Key Metrics:
SkyFlow is an advanced tool that helps predict how flights might perform. It looks at many factors like weather, how busy the airport is, and how well the airline usually does. Then, it puts each flight into one of three groups:
-
On Time: These flights are expected to leave and arrive as scheduled.
-
Partial Delay: These flights are likely to be delayed either leaving or arriving, but not both.
-
Full Delay: These flights are likely to be delayed both leaving and arriving.
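The three groups can be expressed as a small labeling rule. The sketch below assumes two boolean flags per flight, `dep_delayed` and `arr_delayed`; these names are hypothetical, and how a flight is flagged as delayed (for example, a minutes threshold) is not specified in this section.

```python
def delay_class(dep_delayed: bool, arr_delayed: bool) -> int:
    """Map departure/arrival delay flags to the three SkyFlow groups.

    0 = On Time, 1 = Partial Delay, 2 = Full Delay.
    """
    # Summing the two flags gives 0 (neither), 1 (exactly one), or 2 (both).
    return int(dep_delayed) + int(arr_delayed)


LABELS = {0: "On Time", 1: "Partial Delay", 2: "Full Delay"}

for dep, arr in [(False, False), (True, False), (False, True), (True, True)]:
    print(dep, arr, "->", LABELS[delay_class(dep, arr)])
```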
Why is this important?
It helps everyone plan better:
-
Airlines: Can manage their schedules more effectively.
-
Airports: Can prepare for busy times.
-
Passengers: Can adjust their plans if needed.
How do we know if SkyFlow is doing a good job?
We look at five main things to evaluate SkyFlow’s performance:
-
Precision: How often SkyFlow correctly identifies delay groups when it predicts a delay.
-
Recall: How often SkyFlow correctly identifies actual delays out of all delayed flights.
-
F1 Score: How well SkyFlow balances precision and recall.
-
Precision-Recall Area Under the Curve (PR AUC): How well SkyFlow performs across different thresholds for classifying delays.
-
Receiver Operating Characteristic Area Under the Curve (ROC AUC): How well SkyFlow distinguishes between delayed and on-time flights.
Our goal is to make SkyFlow as accurate as possible, so everyone can rely on its predictions to make their travel smoother and more predictable.
To monitor overall performance, we use the Precision-Recall Area Under the Curve (PR AUC) and Receiver Operating Characteristic Area Under the Curve (ROC AUC).
For evaluating the balance between correctly identifying delays and avoiding false alarms, we rely on the F1 Score as the primary metric, which combines precision and recall into a single value. Further, the F1 score is optimized to give more weight to the full-delay group.
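To make the metric concrete, here is a minimal sketch of a weighted F1 score for the three delay classes. It assumes "weighted" means averaging per-class F1 by class support, which is the convention of scikit-learn's `average="weighted"`; any extra emphasis on the full-delay class is not reproduced here. The function names and the toy labels are illustrative only.

```python
from collections import Counter


def per_class_prf(y_true, y_pred, label):
    """Precision, recall, and F1 for one class, treated as one-vs-rest."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != label and p == label)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == label and p != label)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


def weighted_f1(y_true, y_pred):
    """Average the per-class F1 scores, weighting each class by its support."""
    support = Counter(y_true)
    total = len(y_true)
    return sum(
        per_class_prf(y_true, y_pred, label)[2] * count / total
        for label, count in support.items()
    )


# Toy labels: 0 = on time, 1 = partial delay, 2 = full delay (made-up data).
y_true = [0, 0, 0, 0, 1, 1, 2, 2, 2, 2]
y_pred = [0, 0, 0, 2, 0, 1, 2, 2, 2, 0]
print(round(weighted_f1(y_true, y_pred), 4))
```

Because each class is weighted by how often it occurs, a large on-time class can dominate this score, which is why per-class F1 is worth reporting alongside it.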
Approach
CRISP-DM Framework:
For the Flight Delay Prediction problem, the CRISP-DM (Cross Industry Standard Process for Data Mining) framework was applied to provide a structured solution. The process was as follows:
-
Business Understanding: The goal was to predict flight delays to improve airline operational efficiency and enhance customer satisfaction by reducing unexpected delays.
-
Data Understanding: A detailed analysis of the dataset was performed, identifying key patterns and relationships, such as flight times, delays, and distances, that could significantly influence prediction outcomes.
-
Data Preparation: The raw data was preprocessed, and relevant features were engineered. This included detailed delay metrics such as departure and arrival times, distances, and other flight-specific attributes to ensure high-quality inputs for model training.
-
Modeling: Various machine learning models were trained and evaluated, focusing on performance metrics like Precision-Recall AUC, ROC AUC, and F1 score. These models were iteratively tuned to optimize predictive performance.
-
Deployment: The best-performing model was integrated into SkyFlow’s prototype application, enabling real-time flight delay predictions. Future iterations aim to further enhance the operational decision-making.
Feature Engineering:
During the data preparation phase, significant feature engineering was conducted as outlined in a later Methodology section. Initially, features that captured the relationship between departure and arrival delays were found to introduce data leakage, leading to overly optimistic predictions. As a result, these features were excluded in Phase 2.
To improve delay predictions in Phase 2, new features were engineered by tracking flight segment sequences for each tail number on a given day (SEGMENT_NUMBER).
Historical flight information, such as previous airports (PREVIOUS_AIRPORT), prior delays (PREVIOUS_ARR_DELAY), and flight durations (PREVIOUS_DURATION), was incorporated. This was done by merging each current flight record, which carries its own FLIGHT_DURATION, with the corresponding previous segment's data, providing a richer dataset for predicting delays.
Please refer to the Enhanced Feature Engineering Algorithm section for details on the algorithm.
Key Findings from Exploratory Data Analysis:
Highest Departure and Arrival delays by Carriers (2019): Identifying the carriers with the highest delays directly relates to improved customer experience and financial impact. By pinpointing these carriers, airlines can better manage customer expectations, offer targeted support, and address issues that could lead to costly disruptions and compensation claims.
Top 30 Congested Airports with Flight Delays (2019): This finding supports enhanced operational efficiency and operational resilience. By focusing on the most congested airports, airlines can optimize resource allocation and improve scheduling to alleviate delays at these critical points, leading to smoother operations and better crisis management.
SMOTE Resampling on Training Data: Demonstrates the importance of data-driven decision making. By improving model performance through resampling, airlines can make more accurate predictions about delays, leading to better strategic planning and performance monitoring.
Delay Trends Across Distance Groups and Flight Segments (2019): This finding shows how aircraft operational schedules and the number of daily flights contributed to 2019 delays, which bears on operational efficiency and contingency planning. Understanding how delay patterns vary with flight distance and segment numbers helps airlines plan better turnaround times and manage operational schedules more effectively to prevent delays.
-
Segment Number Decreases with Distance: As flight distance increases, the number of segments (flights) decreases. Aircraft flying longer routes complete fewer flights in a day due to time constraints.
-
Delays Correlate with Higher Segment Numbers: Flights scheduled for more segments in a day are more prone to delays, regardless of distance. These delays are likely due to operational factors, such as shorter turnaround times, leading to delayed departures and arrivals.
Median Departure and Arrival Delays per Carrier (2019): Identified the top 20 carriers with the highest median delays. For each carrier, the top 20 airports with the most significant contribution to delays were also identified. By examining median delays, airlines can gain insights into typical delay experiences and ensure compliance with regulations. Focusing on specific carriers and airports with high delays can enhance overall safety and customer satisfaction.
-
Comprehensive Delay Analysis: By considering both departure and arrival delays, we provide a more holistic view of 2019 airline performance and airport efficiency. Endeavor Air Inc shows the highest delay, at Miami International Airport. Comair Inc follows with the next highest delay, at Portland International Airport.
-
Focus on median delays: The use of median delays helped identify typical delay experiences, filtering out the effect of extreme delays that skewed averages.
-
Unique Operational Factors: The variation in delay trends suggests that delays may be influenced by distinct factors specific to each carrier and airport, rather than being caused by common issues across multiple locations. For instance, both Endeavor Air Inc and Comair Inc experienced higher-than-usual precipitation at the airports on their flight day, which could have contributed to their delays.
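The reason for preferring the median can be seen in a tiny example. The delay values below are made up for illustration: a single extreme delay pulls the mean far above what a typical flight experiences, while the median barely moves.

```python
from statistics import mean, median

# Made-up delays in minutes for one carrier at one airport; one extreme outlier.
delays = [5, 8, 10, 12, 15, 240]

print("mean:", round(mean(delays), 1))
print("median:", median(delays))
```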
Analyzing Trends in Flight Delays by Distance Groups (2019): This focuses on understanding how flight delays vary across different distance categories, which helps airlines optimize their operations to inform strategies to mitigate delays.
Flights traveling short and moderate distances tend to have higher delays than flights in the other distance categories.
Analyzing Trends in Flight Delays by Season, Time of Day and Day of Week (2019): This trend analysis aims to assist airlines in optimizing their operations by informing strategies to mitigate delays.
-
Seasonal Trends: Summer months generally experience the highest rates of both arrival and departure delays, while winter months also show significant arrival delays.
-
Time of Day: Early morning and late-night flights are associated with the highest arrival delays, whereas afternoon and evening flights tend to experience more departure delays.
-
Weekly Patterns: There is a noticeable dip in arrival delays on Saturdays, while other days of the week exhibit a relatively even distribution of both arrival and departure delays.
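How the season and part-of-day categories behind these trends might be derived can be sketched as follows. The month-to-season mapping and the hour boundaries are assumptions for illustration, since the exact buckets are not stated here, and the function names are hypothetical.

```python
def season_from_month(month: int) -> str:
    """Map a calendar month (1-12) to a season (assumed meteorological seasons)."""
    if not 1 <= month <= 12:
        raise ValueError(f"month must be in 1..12, got {month}")
    if month in (12, 1, 2):
        return "Winter"
    if month in (3, 4, 5):
        return "Spring"
    if month in (6, 7, 8):
        return "Summer"
    return "Fall"


def part_of_day(hhmm: int) -> str:
    """Bucket a scheduled time given as HHMM (e.g. 1435) into a part of day.

    The bucket boundaries are illustrative assumptions.
    """
    hour = hhmm // 100
    if 5 <= hour < 9:
        return "Early Morning"
    if 9 <= hour < 12:
        return "Morning"
    if 12 <= hour < 17:
        return "Afternoon"
    if 17 <= hour < 21:
        return "Evening"
    return "Late Night"


print(season_from_month(7), part_of_day(1435))
```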
Analyzing Historical Average Delays (2019): This analysis visualizes two historical averages: DEP_BLOCK_HIST, the historical average delay for different departure time blocks aggregated by month, and DEP_AIRPORT_HIST, the historical average delay rate for flights departing from specific airports per month. It examines how these metrics fluctuate with time-related and seasonal factors, aiming to provide insights into delay patterns across different times of day, days of the week, and seasons.
-
Seasonal Trends: Historical average delays are generally higher during the summer months, followed by winter and spring.
-
Weekly Trends: Historical delays are evenly distributed throughout the week.
-
Time of Day: Average delays for different departure time blocks are notably higher in the afternoons and evenings.
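A minimal pandas sketch of how such historical averages could be computed is shown below. It assumes a 0/1 `DELAYED` flag and a `DEP_TIME_BLK` time-block column; both names, and all the values, are assumptions for illustration rather than the project's actual schema.

```python
import pandas as pd

# Made-up flights; DELAYED is a hypothetical 0/1 flag, DEP_TIME_BLK a time block.
flights = pd.DataFrame({
    "MONTH": [1, 1, 1, 1, 7, 7],
    "DEP_TIME_BLK": ["0600-0659", "0600-0659", "1700-1759",
                     "1700-1759", "1700-1759", "1700-1759"],
    "DEPARTING_AIRPORT": ["A", "B", "A", "A", "A", "B"],
    "DELAYED": [0, 0, 1, 0, 1, 1],
})

# Average delay rate per (time block, month), broadcast back to each flight.
flights["DEP_BLOCK_HIST"] = (
    flights.groupby(["DEP_TIME_BLK", "MONTH"])["DELAYED"].transform("mean")
)
# Average delay rate per (departure airport, month).
flights["DEP_AIRPORT_HIST"] = (
    flights.groupby(["DEPARTING_AIRPORT", "MONTH"])["DELAYED"].transform("mean")
)
print(flights)
```

Using `transform("mean")` keeps one row per flight, so the aggregates can be used directly as model features.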
Analyzing Average Weather features by Airlines and Airports (2019): This analysis was done to understand how selected weather features (PRCP, TMAX, AWND, SNOW, SNWD) vary across different carriers, departing airports, and previous airports, to observe any patterns with how weather conditions impact flight operations.
No significant trend was observed in the average values of the selected weather features when grouped by the specified columns (CARRIER_NAME, DEPARTING_AIRPORT, PREVIOUS_AIRPORT).
Actionable Insights - Recommendations from Exploratory Data Analysis:
Finding | Recommendation |
---|---|
Highest Departure and Arrival Delays by Carriers | - Implement targeted training and support programs for high-delay carriers to improve operational efficiency. - Use delay data to manage customer communications proactively. |
Top 30 Congested Airports with Flight Delays | - Allocate more resources and staff during peak times at congested airports to minimize delays. - Develop contingency plans for high-traffic airports to handle surges in passenger volume effectively. |
Delay Trends Across Distance Groups and Flight Segments | - Analyze operational schedules to optimize turnaround times for flights, especially those with multiple segments. - Review scheduling for short and moderate-distance flights to reduce potential delays. |
Seasonal Trends | - Increase staffing and operational resources during summer months to manage higher delay rates effectively. - Monitor weather patterns and adjust scheduling in advance to minimize disruptions during winter months. |
Time of Day | - Consider adjusting flight schedules to reduce the number of early morning and late-night flights that experience high arrival delays. - Increase capacity and resources during afternoon and evening hours to mitigate departure delays. |
Weekly Patterns | - Evaluate operational strategies to understand the factors contributing to increased delays on specific days. - Promote Saturday travel incentives to balance the load and improve operational efficiency. |
Model Evaluation and Performance Summary:
The following machine learning models were evaluated for predicting flight delays, listed in order:
-
Dummy Classifier (Baseline)
-
Multinomial Logistic Regression Classifier
-
Decision Tree Classifier with hyperparameter tuning
-
Ensemble Models – Bagging with Bagging Classifier with Decision Trees, Random Forest; Boosting with XGBoost, CatBoost and Light Gradient Boosting Machine (LGBM)
-
Hybrid Ensemble Models – Voting Classifier as an ensemble of XGBoost and Random Forest; Stacking Classifiers (with and without hyperparameter tuning) with XGBoost and Random Forest as base estimators and a one-vs-rest Logistic Regression meta classifier; and a custom Hybrid Ensemble that combines the Voting Classifier and the tuned Stacking Classifier
The ensemble and hybrid ensemble models outperformed the baseline, Logistic Regression, and Decision Tree models. This section summarizes and compares the key metrics across these model groups; the final recommendation for production deployment follows below.
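As an illustration of how a voting ensemble combines its base models, the sketch below averages class probabilities and picks the most likely class. This is soft voting, which corresponds to scikit-learn's `VotingClassifier(voting="soft")`; whether the project used soft or hard voting is not stated here, so treat that as an assumption. The probabilities are made up and `soft_vote` is a hypothetical helper.

```python
def soft_vote(probas_per_model, weights=None):
    """Combine class-probability rows from several models by weighted averaging.

    probas_per_model: one [p_class0, p_class1, p_class2] list per model.
    Returns (predicted_class, averaged_probabilities).
    """
    n_models = len(probas_per_model)
    weights = weights or [1.0] * n_models
    total = sum(weights)
    n_classes = len(probas_per_model[0])
    avg = [
        sum(w * p[c] for w, p in zip(weights, probas_per_model)) / total
        for c in range(n_classes)
    ]
    predicted = max(range(n_classes), key=lambda c: avg[c])
    return predicted, avg


# Made-up probabilities for one flight from two base models.
xgb_proba = [0.50, 0.10, 0.40]  # stands in for an XGBoost output
rf_proba = [0.20, 0.10, 0.70]   # stands in for a Random Forest output

label, avg = soft_vote([xgb_proba, rf_proba])
print(label, [round(p, 2) for p in avg])
```

Here the first model alone would predict class 0, but the averaged probabilities favor class 2, which shows how the ensemble can override a single model's call.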
Actionable Insights - Recommendations for Model Selection and Deployment for Flight Delay Predictions:
Best Model: Voting Classifier
The Voting Classifier emerges as the best overall model for flight delay predictions due to its performance across multiple metrics:
-
Highest weighted F1 score (0.7944)
-
Highest accuracy (0.8290)
-
Best weighted PR AUC (0.82)
-
Best weighted ROC AUC (0.80)
Key strengths:
-
Strong performance in identifying on-time flights (class 0)
-
Good balance between precision and recall across delay classes
Deployment considerations:
-
Implement as the primary model for flight delay predictions
-
Use for real-time predictions and operational decision-making
-
Integrate into airline and airport management systems
Alternate Model: Hybrid Ensemble Classifier
The Hybrid Ensemble Classifier is an alternate choice:
-
High weighted F1 score (0.7935)
-
Good accuracy (0.8234)
-
High weighted PR AUC (0.82)
-
High weighted ROC AUC (0.80)
Key strengths:
-
Performance comparable to the Voting Classifier
-
Good balance between precision and recall for delay classes 0 and 2
Deployment considerations:
-
Use as a complementary model to the Voting Classifier
-
Use where compute resources and infrastructure allow for multiple model deployments
Actionable Insights - Recommendations based on influential Features in Flight Delay Predictions:
Feature | Recommendation |
---|---|
PREVIOUS_ARR_DELAY | - Implement robust systems to track and analyze previous flight delays. - Develop strategies to mitigate the cascading effect of delays (e.g., buffer time between connected flights). |
SEGMENT_NUMBER | - Optimize flight schedules, especially for aircraft making multiple trips per day. - Consider maintenance and crew scheduling to minimize delays in later segments. |
PREVIOUS_DURATION | - Analyze routes with consistently longer durations and consider adjustments. - Improve accuracy of flight duration estimates for better scheduling. |
DEP_PART_OF_DAY | - Adjust departure times to less congested periods of the day. - Allocate more resources during peak departure times. |
PREVIOUS_AIRPORT | - Identify problematic connections or airports. - Optimize route networks to minimize impact of delay-prone airports. |
DISTANCE | - Allocate appropriate aircraft to routes based on distance. - Consider fuel stops or direct flights for very long distances. |
DEP_BLOCK_HIST | - Use historical data to predict and prepare for delays during specific time blocks. - Adjust staffing and resources based on historically problematic time periods. |
CARRIER_NAME | - Benchmark airline performance against industry standards. - Share best practices within the organization to improve overall efficiency. |
PRCP (Precipitation) | - Enhance weather forecasting capabilities. - Develop contingency plans for various weather scenarios. - Invest in equipment and training for efficient operations during adverse weather. |
DAY_OF_WEEK | - Adjust resources and schedules based on weekly patterns. - Implement dynamic pricing strategies to manage demand across different days. |
Deep Dives
Enhanced Feature Engineering Algorithm
Input:
- Raw flight data
- Aircraft data
- Weather data
- Airport data
- Airline data
Output:
- Enriched dataset with engineered features for flight delay prediction
Algorithm:
1. Initialize empty dataset D for engineered features
2. For each flight record F in raw flight data:
2.1. Extract basic flight information (date, origin, destination, etc.)
2.2. Compute SEGMENT_NUMBER:
a. Group flights by TAIL_NUM and DAY_OF_MONTH
b. Sort by DEP_TIME within each group
c. Assign sequential numbers starting from 1
2.3. Add SEGMENT_NUMBER to D
3. For each flight record F in D:
3.1. Identify previous flight P with same TAIL_NUM
3.2. If P exists:
a. Set PREVIOUS_AIRPORT = P.DESTINATION
b. Set PREVIOUS_ARR_DELAY = P.ARR_DELAY
c. Set PREVIOUS_DEP_DELAY = P.DEP_DELAY
d. Set PREVIOUS_DURATION = P.ACTUAL_ELAPSED_TIME
3.3. Else:
Set all PREVIOUS_* features to null or appropriate default values
3.4. Add PREVIOUS_* features to D
4. Compute FLIGHT_DURATION:
4.1. FLIGHT_DURATION = CRS_ARR_TIME - CRS_DEP_TIME
4.2. Add FLIGHT_DURATION to D
5. Merge weather data with D based on date and airport
6. Compute temporal features:
6.1. Extract MONTH, DAY_OF_WEEK from date
6.2. Compute SEASON based on MONTH
6.3. Compute DEP_PART_OF_DAY based on CRS_DEP_TIME
6.4. Add temporal features to D
7. Merge airport and airline data with D
8. Compute flight statistics, passenger statistics, and employee statistics:
8.1. Add all statistics features to D
9. Compute historical performance metrics:
9.1. Calculate CARRIER_HISTORICAL (average delay by carrier and month)
9.2. Calculate DEP_AIRPORT_HIST (average delay by departure airport and month)
9.3. Calculate DEP_BLOCK_HIST (average delay by departure time block and month)
9.4. Add historical metrics to D
10. Handle missing values and perform necessary data type conversions
11. Return enriched dataset D
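Steps 2 and 3 of the algorithm can be sketched in pandas as follows. This is a minimal illustration on made-up records, not the project's implementation: it covers only SEGMENT_NUMBER, PREVIOUS_ARR_DELAY, and PREVIOUS_DURATION, and it assumes all flights fall within one month so that DAY_OF_MONTH and DEP_TIME are enough to order them.

```python
import pandas as pd

# Made-up records for two aircraft on one day; column names follow the
# algorithm above, but the values are purely illustrative.
flights = pd.DataFrame({
    "TAIL_NUM": ["N1", "N2", "N1", "N1"],
    "DAY_OF_MONTH": [1, 1, 1, 1],
    "DEP_TIME": [900, 700, 600, 1300],
    "ARR_DELAY": [25, 5, 0, 40],
    "ACTUAL_ELAPSED_TIME": [150, 90, 120, 160],
})

# Step 2.2: sort within each aircraft and day, then number segments from 1.
flights = flights.sort_values(
    ["TAIL_NUM", "DAY_OF_MONTH", "DEP_TIME"]
).reset_index(drop=True)
flights["SEGMENT_NUMBER"] = (
    flights.groupby(["TAIL_NUM", "DAY_OF_MONTH"]).cumcount() + 1
)

# Step 3: look up the previous flight of the same aircraft with shift(1).
# The first flight of each aircraft has no predecessor and gets NaN (step 3.3).
previous = flights.groupby("TAIL_NUM")[["ARR_DELAY", "ACTUAL_ELAPSED_TIME"]].shift(1)
flights["PREVIOUS_ARR_DELAY"] = previous["ARR_DELAY"]
flights["PREVIOUS_DURATION"] = previous["ACTUAL_ELAPSED_TIME"]

print(flights)
```

Sorting before `cumcount` and `shift` matters: both operate on row order within each group, so an unsorted frame would assign segment numbers and previous-flight values incorrectly.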
Performance comparison across Baseline, Logistic Regression and Decision Tree
Model | Strengths | Weaknesses | Key Observations | Important Features |
---|---|---|---|---|
Baseline | - Simple and fast | - Very poor weighted F1 score (0.0373) - Low weighted PR AUC (0.63) - Poor weighted ROC AUC (0.50) - Low accuracy (0.1461) - Unable to distinguish between classes effectively | - Performs poorly across all metrics - Not suitable for this classification task | N/A |
Multinomial Logistic Regression | - Best overall performance - Highest weighted F1 score (0.7329) - Highest weighted PR AUC (0.77) - Best weighted ROC AUC (0.74) - Best accuracy (0.7051) - Good balance between precision and recall | - Still struggles with minority class (class 1) - Slightly lower interpretability compared to Decision Tree | - Shows the best overall performance - Outperforms other models in most weighted metrics - Provides a good balance across different metrics and classes | Positive influence on class 2: - DAY_OF_WEEK - CARRIER_NAME - PREVIOUS_ARR_DELAY - MONTH - ARR_PART_OF_DAY - DEP_PART_OF_DAY - SEASON Negative influence on class 2: - PREVIOUS_DURATION_CATEGORY - FLIGHT_DURATION_CATEGORY - DISTANCE_GROUP_DESC |
Hyperparameter-tuned Decision Tree | - Competitive weighted F1 score (0.7422) - Good weighted PR AUC (0.74) - Decent weighted ROC AUC (0.70) - Highest accuracy (0.7359) - Better interpretability than Logistic Regression | - Slightly lower weighted F1 score than Logistic Regression - Lower weighted PR AUC and ROC AUC compared to Log
Performance comparison across Ensemble Bagging and Boosting Classifiers
Model | Strengths | Weaknesses | Key Observations | Important Features |
---|---|---|---|---|
BaggingClassifier (Decision Tree) | - High weighted F1 score (0.7888) - High weighted PR AUC (0.81) - Good weighted ROC AUC (0.78) | - Slightly lower weighted ROC AUC compared to some other models | - Balanced performance across weighted metrics - Good overall predictive power | Top 5 (Permutation Importance): 1. PREVIOUS_ARR_DELAY: 0.1340 2. PREVIOUS_DURATION: 0.0802 3. SEGMENT_NUMBER: 0.0766 4. DEP_PART_OF_DAY: 0.0597 5. ARR_PART_OF_DAY: 0.0199 |
Random Forest Classifier | - High weighted F1 score (0.7887) - High weighted PR AUC (0.81) - Best weighted ROC AUC (0.79) | - Marginally lower weighted F1 score than BaggingClassifier | - Very similar performance to BaggingClassifier - Slightly better at handling class imbalance | Top 5 (Built-in Importance): 1. PREVIOUS_ARR_DELAY: 0.1370 2. DISTANCE: -0.0006 3. TMAX: -0.0000 4. FLIGHT_DURATION: -0.0005 5. AWND: 0.0001 |
XGBoost Classifier | - High weighted PR AUC (0.81) - High weighted ROC AUC (0.79) | - Lower weighted F1 score (0.7682) compared to BaggingClassifier and Random Forest | - Good balance between precision and recall - Strong performance in AUC metrics | Top 5 (Built-in Importance): 1. PREVIOUS_ARR_DELAY: 0.1718 2. DEP_PART_OF_DAY: 0.0503 3. PREVIOUS_DURATION_CATEGORY: -0.0040 4. PRCP: 4.3230 5. ARR_PART_OF_DAY: 4.4493 |
LightGBM | - High weighted PR AUC (0.81) - High weighted ROC AUC (0.79) | - Lower weighted F1 score (0.7182) | - Underperforms in F1 score compared to other models - Maintains strong AUC performance | Top 5 (Built-in Importance): 1. AIRLINE_AIRPORT_FLIGHTS_MONTH: 1207.0000 2. AIRLINE_FLIGHTS_MONTH: 996.0000 3. PREVIOUS_ARR_DELAY: 1031.0000 4. DISTANCE: 915.0000 5. DEP_AIRPORT_HIST: 856 |
CatBoost | - Relatively high weighted PR AUC (0.78) | - Lowest weighted F1 score (0.5134) - Lowest weighted ROC AUC (0.75) | - Significantly underperforms compared to other models - Struggles with overall predictive power | Top 5 (Built-in Importance): 1. PREVIOUS_ARR_DELAY: 64.3610 2. DEP_PART_OF_DAY: 11.9855 3. ARR_PART_OF_DAY: 4.4493 4. PRCP: 4.3230 5. SEGMENT_NUMBER: 2.8930 |
Performance comparison across Hybrid Ensemble Classifiers
Model | Strengths | Weaknesses | Key Observations | Important Features |
---|---|---|---|---|
Voting Classifier | - Highest weighted F1 score (0.7944) - Highest accuracy (0.8290) - Best weighted PR AUC (0.82) - Best weighted ROC AUC (0.80) | - Low F1 score for class 1 (0.0677) | - Best overall performance - Strong in identifying on-time flights (class 0) - Good balance between precision and recall | Top 5 (Permutation Importance): 1. PREVIOUS_ARR_DELAY: 0.1311 2. SEGMENT_NUMBER: 0.0552 3. PREVIOUS_AIRPORT: 0.0476 4. PREVIOUS_DURATION: 0.0429 5. DEP_PART_OF_DAY: 0.0184 |
Stacking Classifier | - Good weighted F1 score (0.7896) - Good accuracy (0.8118) - High weighted PR AUC (0.81) - High weighted ROC AUC (0.79) | - Lower performance on class 1 (F1 score: 0.0936) compared to other classes | - Slightly lower performance than Voting Classifier - Better performance on class 1 compared to Voting Classifier | Top 5 (Permutation Importance): 1. PREVIOUS_ARR_DELAY: 0.1059 2. PREVIOUS_AIRPORT: 0.0247 3. SEGMENT_NUMBER: 0.0200 4. PREVIOUS_DURATION: 0.0192 5. DEP_PART_OF_DAY: 0.0173 |
Tuned Stacking Classifier | - Improved weighted F1 score (0.7921) - Improved accuracy (0.8180) - High weighted PR AUC (0.81) - High weighted ROC AUC (0.79) | - Still struggles with class 1 (F1 score: 0.0901) | - Performance improvement over base Stacking Classifier - Better balance across all classes | Top 5 (Permutation Importance): 1. PREVIOUS_ARR_DELAY: 0.1320 2. PREVIOUS_AIRPORT: 0.0734 3. SEGMENT_NUMBER: 0.0530 4. PREVIOUS_DURATION: 0.0472 5. DEP_PART_OF_DAY: 0.0256 |
Hybrid Ensemble Classifier | - High weighted F1 score (0.7935) - Good accuracy (0.8234) - High weighted PR AUC (0.82) - High weighted ROC AUC (0.80) | - Struggles with class 1 (F1 score: 0.0813) | - Performance comparable to other ensemble methods - Good balance between precision and recall for class 0 and 2 | Top 5 (Permutation Importance): 1. PREVIOUS_ARR_DELAY: 0.1324 2. PREVIOUS_AIRPORT: 0.0495 3. PREVIOUS_DURATION: 0.0458 4. SEGMENT_NUMBER: 0.0431 5. DEP_PART_OF_DAY: 0.0216 |
Features influencing Flight Delay Predictions
Based on the feature importance results across these models, the following features are consistently influential in flight delay predictions (ref: feature descriptions):
-
PREVIOUS_ARR_DELAY: This is consistently the most important feature across all models. It represents the arrival delay of the previous flight for the same aircraft.
-
SEGMENT_NUMBER: This feature, which represents the order of flights for an aircraft on a given day, is highly influential in several models.
-
PREVIOUS_DURATION: The duration of the previous flight is an important factor in predicting delays.
-
DEP_PART_OF_DAY: The time of day when the current flight departs is a significant predictor of delays.
-
PREVIOUS_AIRPORT: The airport the aircraft departed from on its previous segment has a notable impact on delay predictions.
-
DISTANCE: The flight distance appears to be moderately important in several models.
-
DEP_BLOCK_HIST: Historical average delay for different departure time blocks is influential.
-
CARRIER_NAME: The airline operating the flight is a relevant factor in some models.
-
PRCP: Precipitation at the airport on the day of the flight is a notable weather-related feature.
-
DAY_OF_WEEK: The day of the week when the flight occurs has some influence on delay predictions.
These features consistently appear among the top influential factors across different models (Bagging Classifier, Random Forest, XGBoost, LightGBM, and ensemble methods like Voting and Stacking Classifiers). While the exact order and magnitude of importance varies between the models, these features represent a mix of temporal factors (previous delays and time of day), operational aspects (segment number and carrier), geographical elements (distance and previous airport), and weather conditions (precipitation).
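Several of the rankings above come from permutation importance. The idea can be sketched in plain Python: shuffle one feature column at a time and measure how much the model's accuracy drops. The toy model, data, and function names below are hypothetical stand-ins, and accuracy is used as the score for simplicity; the project's actual scoring metric for permutation importance is not stated here.

```python
import random


def accuracy(model, X, y):
    """Fraction of rows where the model's prediction matches the label."""
    return sum(model(row) == label for row, label in zip(X, y)) / len(y)


def permutation_importance(model, X, y, n_repeats=10, seed=0):
    """Mean drop in accuracy when each feature column is shuffled in turn."""
    rng = random.Random(seed)
    baseline = accuracy(model, X, y)
    importances = []
    for col in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            shuffled = [row[col] for row in X]
            rng.shuffle(shuffled)
            X_perm = [row[:col] + [v] + row[col + 1:] for row, v in zip(X, shuffled)]
            drops.append(baseline - accuracy(model, X_perm, y))
        importances.append(sum(drops) / n_repeats)
    return importances


# Toy data: column 0 mimics a previous arrival delay in minutes, column 1 is noise.
X = [[0, 3], [5, 1], [40, 2], [60, 3], [2, 1], [55, 2], [10, 3], [90, 1]]
y = [0, 0, 1, 1, 0, 1, 0, 1]


def toy_model(row):
    """Hypothetical stand-in for a trained classifier: delayed if column 0 > 30."""
    return 1 if row[0] > 30 else 0


print(permutation_importance(toy_model, X, y))
```

A feature the model ignores gets an importance of exactly zero, while shuffling a feature the model depends on degrades accuracy, which mirrors why PREVIOUS_ARR_DELAY ranks first in the tables above.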
Partial Dependence Plots - Visualize Feature Impact on Flight Delay Predictions for each Delay Class
Data Sources
Kaggle dataset from here, comprising the multiple CSV files listed below.
-
Air Carrier Summary
-
Aircraft Inventory
-
Air Carrier employee support (Ground Crew, Flight Attendants)
-
Flight On Time Reporting Status with Air Carrier info for 2019-2020
-
Airport Weather
-
Airport and Carrier look-up codes
Methodology Used for Data Preparation and Modeling
Data Preparation: Involved cleaning and merging multiple raw CSV files to create a unified dataset with ~4M entries (for training) and ~2M entries (for testing), with 34 predictor variables and 1 target variable. The raw dataset description is here.
Feature Engineering:
-
Delay Categories: Classified delays into three distinct categories for more granular analysis of flight performance:
Class0: On-time Departure and Arrival - Flights that depart and arrive within their scheduled times.
Class1: Either departure or arrival delayed - Flights that experience delays either during arrival or departure.
Class2: Delayed Departure and Arrival - Flights that experience delays both in departure and arrival times.
-
Aggregation Features: Developed historical delay averages, to identify patterns and trends in airline operations.
CARRIER_HISTORICAL = captures the historical average delay rate of each carrier per month
DEP_AIRPORT_HIST = captures historical average delay rates for flights departing from specific airports per month
PREV_AIRPORT_HIST = captures historical average delay rate for the airport from which the aircraft arrived before the current departure
DAY_HISTORICAL = captures historical average delays associated with each day of the week, adjusted monthly
DEP_BLOCK_HIST = captures historical average delay rate for different departure time blocks, aggregated by month
-
Time-Based Features: Extracted seasonal information from the month and categorized parts of the day using departure and arrival time blocks to enhance temporal analysis of flight data.
-
Distance-Based Features: Mapped distance groups to descriptive labels, providing clearer insights into flight range categories for more intuitive analysis.
-
Delay-Based Features: Created new features by combining actual departure and arrival times with scheduled times, generating detailed delay metrics to enhance analysis of flight performance and punctuality. However, in Phase 2 of model evaluation, these features were removed due to data leakage, as they resulted in nearly 100% prediction accuracy.
ELAPSED_TIME_DIFF, DEP_DELAY, ARR_DELAY
- Flight Duration, Previous Flight Duration and Arrival Delay: Phase 2 also introduced new delay-based features. Flight duration is the total duration of the current flight, calculated from the planned departure and arrival times. This feature helps assess whether longer flight durations correlate with increased delays. Previous Flight Duration and Previous Arrival Delay were introduced as historical features; the approach to engineering them is outlined in the executive summary.
FLIGHT_DURATION, FLIGHT_DURATION_CATEGORY, PREVIOUS_DURATION,
PREVIOUS_DURATION_CATEGORY, PREVIOUS_ARR_DELAY
- Employee Statistics Features: Developed features to analyze staffing and resourcing in airline and carrier operations, providing insights into workforce allocation, scheduling efficiency, and resource optimization.
FLT_ATTENDANTS_PER_PASS, PASSENGER_HANDLING
-
Removed highly correlated features with VIF – see before and after removal:
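As a rough illustration of the VIF-based pruning mentioned above, the sketch below computes variance inflation factors from scratch. It relies on the identity that the VIF of feature j equals the j-th diagonal entry of the inverse correlation matrix, which is the same as 1 / (1 - R_j^2). The data values and the `DISTANCE` column name are synthetic assumptions, and the cutoff used in this project is not stated here. In practice a library routine such as statsmodels' `variance_inflation_factor` would replace the hand-written linear algebra.

```python
from math import sqrt


def correlation_matrix(columns):
    """Pearson correlation matrix for a list of equal-length numeric columns."""
    n = len(columns[0])
    means = [sum(col) / n for col in columns]
    centered = [[x - m for x in col] for col, m in zip(columns, means)]
    norms = [sqrt(sum(x * x for x in col)) for col in centered]
    return [
        [sum(a * b for a, b in zip(ci, cj)) / (ni * nj)
         for cj, nj in zip(centered, norms)]
        for ci, ni in zip(centered, norms)
    ]


def invert(matrix):
    """Invert a small square matrix with Gauss-Jordan elimination."""
    size = len(matrix)
    # Augment the matrix with the identity on the right-hand side.
    aug = [list(row) + [float(i == j) for j in range(size)]
           for i, row in enumerate(matrix)]
    for col in range(size):
        # Partial pivoting keeps the elimination numerically stable.
        pivot = max(range(col, size), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("perfectly collinear features: VIF is infinite")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for row in range(size):
            if row != col:
                factor = aug[row][col]
                aug[row] = [v - factor * p for v, p in zip(aug[row], aug[col])]
    return [row[size:] for row in aug]


def vif_scores(features):
    """Map each feature name to its variance inflation factor."""
    names = list(features)
    inverse = invert(correlation_matrix([features[n] for n in names]))
    return {name: inverse[i][i] for i, name in enumerate(names)}


# Synthetic example: duration tracks distance almost linearly,
# so both should receive a very large VIF.
features = {
    "DISTANCE": [100, 200, 300, 400, 500, 600],        # hypothetical column
    "FLIGHT_DURATION": [32, 48, 65, 79, 96, 110],
    "PASSENGER_HANDLING": [5, 3, 6, 2, 7, 4],
}
scores = vif_scores(features)
for name, score in scores.items():
    print(f"{name}: VIF = {score:.1f}")
```

Features with a high VIF are largely explained by the other features, so one of each collinear pair can usually be dropped before the scores are recomputed.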
Data Pre-Processing: Detected missing values and outliers were removed. SMOTE Tomek was applied to the training dataset only. This combines SMOTE’s oversampling of the minority classes (classes 0, 1 and 2) with Tomek links’ under-sampling. Categorical features were target encoded and numerical features were scaled.
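To make the two halves of SMOTE Tomek concrete, here is a toy sketch written from scratch rather than with the library the project used. It is deliberately simplified: real SMOTE interpolates toward one of several nearest minority neighbours chosen at random, whereas this version uses only the single nearest one, and it needs at least two minority points. The points, labels and function names are all invented for illustration. As in the project, resampling of this kind belongs on the training split only.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest_index(i, points):
    """Index of the point closest to points[i], excluding i itself."""
    return min((j for j in range(len(points)) if j != i),
               key=lambda j: sq_dist(points[i], points[j]))


def smote_like(minority, n_new, rng):
    """Create n_new synthetic points on segments between minority neighbours."""
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        j = nearest_index(i, minority)
        gap = rng.random()  # position along the segment, in [0, 1)
        synthetic.append(tuple(a + gap * (b - a)
                               for a, b in zip(minority[i], minority[j])))
    return synthetic


def tomek_links(points, labels):
    """Pairs of mutual nearest neighbours that carry different labels."""
    nn = [nearest_index(i, points) for i in range(len(points))]
    return [(i, j) for i, j in enumerate(nn)
            if i < j and nn[j] == i and labels[i] != labels[j]]


rng = random.Random(0)
majority = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (0.5, 0.5), (3.0, 3.1)]
minority = [(3.0, 3.0), (4.0, 4.0)]

# Step 1: oversample the minority class up to the majority size.
synthetic = smote_like(minority, len(majority) - len(minority), rng)
points = majority + minority + synthetic
labels = [0] * len(majority) + [2] * (len(minority) + len(synthetic))

# Step 2: drop both members of every Tomek link (one common cleaning choice).
links = tomek_links(points, labels)
drop = {i for pair in links for i in pair}
cleaned = [(p, y) for i, (p, y) in enumerate(zip(points, labels)) if i not in drop]
print(f"{len(points)} rows after oversampling, {len(cleaned)} after removing Tomek links")
```

The oversampling step adds minority examples, while the Tomek step removes borderline pairs where a majority and a minority point are each other's closest neighbour, which tends to clean up the class boundary.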
Model Evaluation with Training, Validation and Test dataset:
The dataset was initially split into Training (70%, 4.542M entries) and Test (30%, 1.946M entries) sets. The training set was further divided, with 20% retained for validation. From the remaining training data, a sample of up to 500,000 entries was extracted for model training, ensuring that the sample size did not exceed the available data.
All splits were performed using stratified sampling to maintain class distribution. This approach was adopted to manage the large dataset by creating a more manageable training set size while still preserving a substantial validation set.
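The splitting scheme above can be sketched in a few lines. If the 20% validation share is taken from the full 4.542M training rows, that is roughly 0.91M validation rows, leaving roughly 3.63M rows from which the capped 500,000-row sample is drawn. The sketch below reproduces the same three steps on a small synthetic label list; the class mix, the cap of 200 and the helper name `stratified_split` are assumptions for illustration, not the project's code.

```python
import random
from collections import Counter, defaultdict


def stratified_split(row_ids, labels, holdout_fraction, rng):
    """Split row_ids into (kept, holdout), preserving each class's share."""
    by_class = defaultdict(list)
    for row in row_ids:
        by_class[labels[row]].append(row)
    kept, holdout = [], []
    for rows in by_class.values():
        rng.shuffle(rows)
        cut = round(len(rows) * holdout_fraction)
        holdout.extend(rows[:cut])
        kept.extend(rows[cut:])
    return kept, holdout


rng = random.Random(42)
labels = [0] * 700 + [1] * 200 + [2] * 100  # synthetic class mix, not the real one
rows = list(range(len(labels)))

train, test = stratified_split(rows, labels, 0.30, rng)  # 70/30 train/test
train, val = stratified_split(train, labels, 0.20, rng)  # 20% of training for validation

SAMPLE_CAP = 200  # stands in for the 500,000-row cap
sample_size = min(SAMPLE_CAP, len(train))  # never exceed the available rows
_, sample = stratified_split(train, labels, sample_size / len(train), rng)

print(len(train), len(val), len(test), len(sample))
print(Counter(labels[r] for r in sample))
```

With scikit-learn, `train_test_split(..., stratify=y)` performs the same kind of class-preserving split without the manual bookkeeping.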
Project Structure
Data:
Analysis and Visualization:
-
AutoViz Plots (Credit: AutoViML/AutoViz)
Notebooks:
Links to the latest set of Notebooks from this folder are noted below. Please note, earlier revisions continue to be available in the same folder to track iterations.
Model Artifacts:
Folder here contains:
-
Recommended Model for production deployment
-
Performance Metrics for model evaluations in csv
Streamlit and FastAPI interface:
-
FastAPI as backend API deployed to AWS EC2 here
-
Streamlit application deployed to AWS EC2 here
-
Model deployed to AWS EC2 is this
Repository with Git LFS:
This project uses Git Large File Storage (LFS) to handle large files efficiently. Git LFS replaces large files with text pointers inside Git, while storing the file contents on a remote server.
To work with this repository:
-
Ensure you have Git LFS installed. If not, install it from git-lfs.com.
-
After cloning the repository, run:
git lfs install
git lfs pull
- When adding new large files, track them with:
git lfs track "path/to/large/file"
- Commit and push as usual, including the .gitattributes file that git lfs track updates. Git LFS will handle the large files automatically. For more information on Git LFS, refer to the official documentation.
Project Infrastructure
This project utilized Google Colab Pro to handle computationally intensive notebook operations for data exploration and modeling. Key components include:
Notebooks:
-
Data exploration and modeling results from Colab Pro are captured in notebooks available in this GitHub repository.
-
Direct links to key external notebooks for results: Exploration Notebook, Modeling Notebook
AutoViz Visualizations:
- Comprehensive AutoViz plots generated during data exploration are externally stored here due to size constraints on GitHub.
Decision Tree and Random Forest Artifacts
- Decision tree and Random Forest tree structures are available externally - view here
MLOps with SkyFlow
-
SkyFlow is a Streamlit application deployed on an Amazon EC2 instance, which serves as the hosting environment. The application is accessible via a registered domain name (skyflow-kvgrowth.com), managed through AWS Route 53.
-
Route 53 is configured with an A record that points the domain (skyflow-kvgrowth.com) to the EC2 instance’s Elastic IP address, ensuring a stable connection even if the instance is restarted. For secure access, an SSL/TLS certificate is provisioned through AWS Certificate Manager (ACM), with an Application Load Balancer (ALB) handling SSL termination.
-
The EC2 instance’s security group is configured to allow inbound traffic on the necessary ports (8000, 443, and 8501 for Streamlit). This setup provides a secure, scalable, and easily manageable environment for hosting the Streamlit application, with the flexibility to handle increased traffic and maintain high availability.
Key Insights from Phase1 to Phase2 of Project
-
Switched from predicting four classes to three, dropping the distinction between a flight having only an arrival delay or only a departure delay, to see whether performance on the minority delay classes would improve
-
Experimented with F2 scores as an evaluation metric
-
Switched back to focus on F1 Score to: a) Decrease false positives for delayed flights, especially Class2, b) Improve accuracy of on-time flight predictions - Class1, c) Increase precision for Class2 and Class1
-
Tuned models by adjusting the decision threshold for Class2 to optimize for F1 score, and added class weights where needed
-
Revisited the SMOTETomek sampling strategy to improve prediction performance for the minority classes: Class1 (either departure or arrival delayed) and Class2 (both arrival and departure delayed)
-
Added a Stacking Classifier and a Hybrid Ensemble to improve F1 scores by combining the strengths of multiple models, allowing them to capture diverse patterns in the data. This approach helped achieve a better balance between precision and recall, improving overall F1 performance
-
Enhanced feature engineering, as outlined in the executive summary, to further improve model performance on minority class predictions
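The move between F2 and F1, and the Class2 threshold tuning, can be illustrated with the general F-beta formula, F_beta = (1 + beta^2) * P * R / (beta^2 * P + R). With beta = 2 recall counts more heavily than precision, while beta = 1 weighs them equally, which is why returning to F1 puts more pressure on false positives. The sketch below is one plausible form of threshold tuning, not the project's implementation: the probabilities are invented, and the rule "predict Class2 only when its probability clears the threshold" is an assumption.

```python
def f_beta(precision, recall, beta=1.0):
    """General F-beta score; beta > 1 favours recall, beta < 1 favours precision."""
    if precision == 0 and recall == 0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


def precision_recall(y_true, y_pred, positive):
    """One-vs-rest precision and recall for the class `positive`."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def predict_with_threshold(probabilities, threshold, target=2):
    """Predict `target` when its probability clears the threshold; otherwise
    fall back to the most probable of the remaining classes."""
    preds = []
    for row in probabilities:
        if row[target] >= threshold:
            preds.append(target)
        else:
            others = [c for c in range(len(row)) if c != target]
            preds.append(max(others, key=lambda c: row[c]))
    return preds


def best_threshold(y_true, probabilities, candidates, target=2, beta=1.0):
    """Candidate threshold with the highest F-beta score for the target class."""
    def score(threshold):
        preds = predict_with_threshold(probabilities, threshold, target)
        return f_beta(*precision_recall(y_true, preds, target), beta=beta)
    return max(candidates, key=score)


# Invented validation labels and class probabilities (Class0, Class1, Class2).
y_true = [2, 2, 2, 0, 0, 1, 0, 2]
probabilities = [
    [0.10, 0.10, 0.80], [0.20, 0.20, 0.60], [0.30, 0.30, 0.40], [0.30, 0.20, 0.50],
    [0.50, 0.20, 0.30], [0.20, 0.45, 0.35], [0.60, 0.10, 0.30], [0.40, 0.35, 0.25],
]
candidates = [0.3, 0.4, 0.5, 0.6]
chosen = best_threshold(y_true, probabilities, candidates, target=2, beta=1.0)
print("Class2 threshold chosen on F1:", chosen)
```

The threshold should be selected on validation data and then held fixed when scoring the test set, so that the reported F1 is not tuned on the data it is measured on.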
Future Work
Feature Engineering: Improve flight prediction performance of the minority classes (Class1 and Class2) with engineered features.
Use of Principal Component Analysis (PCA): Apply PCA with 2D visualization to explore patterns within the current delay classes. If the analysis reveals significant overlap between classes or a lack of distinct patterns, it may be beneficial to consider a more granular classification, such as separating arrival delays and departure delays into their own distinct classes.
Extend Forecast Horizon and Implement Multi-Step Forecasting: Increase the prediction timeframe beyond the current 24-hour forecast, implementing a multi-step forecasting approach that provides:
- Short-term predictions (24 hours)
- Medium-term predictions (48-72 hours)
- Long-term predictions (up to 7 days)
This multi-horizon approach allows for both immediate operational adjustments and longer-term strategic planning.
Explore use of Deep Learning Architectures: Investigate if performance can be improved further by:
- Implementing LSTM (Long Short-Term Memory) networks to capture long-term dependencies in flight data
- Exploring Transformer models for their ability to handle sequential data and long-range dependencies
- Experimenting with hybrid models that combine CNN-LSTM architectures to capture both spatial and temporal patterns in flight and weather data
Expand SkyFlow: Refine its Streamlit interface beyond the initial prototype to include dashboards and to work with a reduced number of inputs.
Real-time Updates: Incorporate real-time data to provide predictions as the departure time approaches.
Appendix
Baseline Dummy Classifier
Multinomial Logistic Regression Classifier
Decision Tree – HyperParameter tuned Decision Tree
Ensemble and Hybrid Ensemble model evaluation metrics
Similar metrics for the ensemble and hybrid classifiers can be found in this notebook here
References
How are airlines using AI to minimize disruptions
Case Study with JetBlue’s use of Tomorrow.io
KDD2018: Predicting Estimated Time of Arrival for Commercial Flights
Mamdouh, M., Ezzat, M. & A.Hefny, H. A novel intelligent approach for flight delay prediction. J Big Data 10, 179 (2023). https://doi.org/10.1186/s40537-023-00854-w
Yuemin Tang. 2021. Airline Flight Delay Prediction Using Machine Learning Models. In 2021 5th International Conference on E-Business and Internet (ICEBI 2021), October 15-17, 2021, Singapore, Singapore. ACM, New York, NY, USA, 7 Pages. https://doi.org/10.1145/3497701.3497725