SkyFlow: AI-Powered Flight Delay Predictions

Dipti Aswath | LinkedIn | Email | Early SkyFlow Prototype

License

This project is licensed under the Apache License 2.0. You may use, modify, and distribute this code under the terms of the license. See the LICENSE file for more details.

Attribution: Please credit the original author listed above when reusing or redistributing the code.

Table of Contents

  1. Executive Summary

  2. Deep Dives

  3. Data Sources

  4. Methodology Used for Data Preparation and Modeling

  5. Project Structure

  6. Project Infrastructure

  7. Key Insights from Phase1 to Phase2 of Project

  8. Future Work

  9. Appendix

  10. References

Executive Summary

Problem Statement:

Airlines and airports face significant operational challenges due to flight delays, which can be caused by a variety of factors including flight status, weather conditions, air traffic congestion, aircraft specifics, and inefficiencies in ground and passenger handling. The objective is to predict flight delays by developing a multi-class classification model that considers both departure and arrival delays, helping improve operational planning and customer satisfaction.

Rationale:

Flight delays can have widespread consequences for airlines, from passenger dissatisfaction to operational disruptions. Developing a predictive model for flight delays not only addresses the core issue of minimizing delays but also enhances decision-making processes across various facets of airline operations.

Business Case 1: Enhancing Operational Efficiency

Predicting flight delays enables airlines to optimize their operations, routing, and resource management.

Business Case 2: Improving Customer Experience

Accurate delay predictions lead to better customer service and proactive communication, enhancing the passenger experience.

By addressing these areas, airlines can significantly improve operational efficiency, enhance passenger experience with better customer satisfaction scores, and better manage resources and disruptions. Predictive modeling for flight delays is not just about minimizing delays but also about fostering a more responsive and resilient airline operation.

Example Usage: An AI system that predicts flight delays could also:

  1. Suggest alternate flight paths that are less likely to experience delays.

  2. Provide passengers with timely updates and rebooking options.

  3. Dynamically adjust flight schedules to manage disruptions effectively.

  4. Allocate resources efficiently to minimize the impact on subsequent flights.

Research Question:

How can we develop an AI and machine learning-powered smart system to accurately predict flight delays by assessing multiple factors, including departure and arrival times, flight status, weather conditions, air traffic, aircraft specifics, and ground operations?

Flight Delay Predictions - Key Metrics:

SkyFlow is an advanced tool that helps predict how flights might perform. It looks at many factors like weather, how busy the airport is, and how well the airline usually does. Then, it puts each flight into one of three groups:

Why is this important?

It helps everyone plan better:

How do we know if SkyFlow is doing a good job?

We look at five main things to evaluate SkyFlow’s performance:

Our goal is to make SkyFlow as accurate as possible, so everyone can rely on its predictions to make their travel smoother and more predictable.

To monitor overall performance, we use the Precision-Recall Area Under the Curve (PR AUC) and Receiver Operating Characteristic Area Under the Curve (ROC AUC).

For evaluating the balance between correctly identifying delays and avoiding false alarms, we rely on the F1 Score as the primary metric, which combines precision and recall into a single value. Further, the F1 score is optimized to give more weight to the full-delay groups.
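
As a rough illustration of the F1 weighting described above, the sketch below computes a one-vs-rest F1 per class and then averages across classes. By default each class is weighted by its support; passing `class_weights` shifts the emphasis toward chosen classes. The labels, the example weights, and the choice of class 2 as the emphasized class are assumptions for illustration, not values taken from the project.

```python
from collections import Counter

def per_class_f1(y_true, y_pred):
    """Return {class: F1}, computed one-vs-rest for every class in y_true."""
    scores = {}
    for c in sorted(set(y_true)):
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        scores[c] = 2 * precision * recall / denom if denom else 0.0
    return scores

def weighted_f1(y_true, y_pred, class_weights=None):
    """Average the per-class F1 scores, weighting by support unless weights are given."""
    f1 = per_class_f1(y_true, y_pred)
    if class_weights is None:
        class_weights = Counter(y_true)  # support of each class
    total = sum(class_weights[c] for c in f1)
    return sum(f1[c] * class_weights[c] for c in f1) / total

# Hypothetical labels for a three-class delay problem.
y_true = [0, 0, 0, 0, 1, 1, 2, 2, 2, 2]
y_pred = [0, 0, 0, 1, 1, 0, 2, 2, 2, 0]

support_weighted = weighted_f1(y_true, y_pred)
# Hypothetical weights that emphasize class 2.
emphasized = weighted_f1(y_true, y_pred, class_weights={0: 1, 1: 1, 2: 3})
```

Because class 2 has the highest per-class F1 in this toy data, giving it more weight raises the averaged score; on real data the effect depends on how well that class is predicted.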

Approach

CRISP-DM Framework:

For the Flight Delay Prediction problem, the CRISP-DM (Cross Industry Standard Process for Data Mining) framework was applied to provide a structured solution. The process was as follows:

  1. Business Understanding: The goal was to predict flight delays to improve airline operational efficiency and enhance customer satisfaction by reducing unexpected delays.

  2. Data Understanding: A detailed analysis of the dataset was performed, identifying key patterns and relationships, such as flight times, delays, and distances, that could significantly influence prediction outcomes.

  3. Data Preparation: The raw data was preprocessed, and relevant features were engineered. This included detailed delay metrics such as departure and arrival times, distances, and other flight-specific attributes to ensure high-quality inputs for model training.

  4. Modeling: Various machine learning models were trained and evaluated, focusing on performance metrics like Precision-Recall AUC, ROC AUC, and F1 score. These models were iteratively tuned to optimize predictive performance.

  5. Deployment: The best-performing model was integrated into SkyFlow’s prototype application, enabling real-time flight delay predictions. Future iterations aim to further enhance the operational decision-making.

Feature Engineering:

During the data preparation phase, significant feature engineering was conducted as outlined in a later Methodology section. Initially, features that captured the relationship between departure and arrival delays were found to introduce data leakage, leading to overly optimistic predictions. As a result, these features were excluded in Phase 2.

To improve delay predictions in Phase 2, new features were engineered by tracking flight segment sequences for each tail number on a given day (SEGMENT_NUMBER).

Historical flight information, such as the previous airport (PREVIOUS_AIRPORT), prior arrival delay (PREVIOUS_ARR_DELAY), and previous flight duration (PREVIOUS_DURATION), was incorporated. This was done by merging each current flight record, which carries its own FLIGHT_DURATION, with the data for the corresponding previous segment, providing a richer dataset for predicting delays.

Please refer to this section for details on the algorithm.

Key Findings from Exploratory Data Analysis:

Highest Departure and Arrival delays by Carriers (2019): Identifying the carriers with the highest delays directly relates to improved customer experience and financial impact. By pinpointing these carriers, airlines can better manage customer expectations, offer targeted support, and address issues that could lead to costly disruptions and compensation claims.

Top 30 Congested Airports with Flight Delays (2019): This finding supports enhanced operational efficiency and operational resilience. By focusing on the most congested airports, airlines can optimize resource allocation and improve scheduling to alleviate delays at these critical points, leading to smoother operations and better crisis management.

SMOTE Resampling on Training Data: Demonstrates the importance of data-driven decision making. By improving model performance through resampling, airlines can make more accurate predictions about delays, leading to better strategic planning and performance monitoring.

Delay Trends Across Distance Groups and Flight Segments (2019): This finding helps provide valuable insights into how aircraft operational schedules and the number of daily flights contributed to 2019 delays, effectively addressing operational efficiency and contingency planning. Understanding how delay patterns vary with flight distance and segment numbers helps airlines plan better turnaround times and manage operational schedules more effectively to prevent delays.

Median Departure and Arrival Delays per Carrier (2019): Identified the top 20 carriers with the highest median delays. For each carrier, the top 20 airports with the most significant contribution to delays were also identified. By examining median delays, airlines can gain insights into typical delay experiences and ensure compliance with regulations. Focusing on specific carriers and airports with high delays can enhance overall safety and customer satisfaction.

Analyzing Trends in Flight Delays by Distance Groups (2019): This analysis shows how flight delays vary across distance categories, which helps airlines optimize their operations and informs strategies to mitigate delays.

It can be observed that flights traveling short and moderate distances tend to have higher delays compared to the remainder of the distance categories.

Analyzing Trends in Flight Delays by Season, Time of Day and Day of Week (2019): This trend analysis aims to assist airlines in optimizing their operations by informing strategies to mitigate delays.

Analyzing Historical Average Delays (2019): Visualize the average historical delays of DEP_BLOCK_HIST, which represents the historical average delay for different departure time blocks aggregated by month, and DEP_AIRPORT_HIST, indicating the historical average delay rates for flights departing from specific airports per month. This analysis examines how these metrics fluctuate due to various time-related and seasonal factors, aiming to provide insights into delay patterns across different times of day, days of the week, and seasons.

Analyzing Average Weather features by Airlines and Airports (2019): This analysis was done to understand how selected weather features (PRCP, TMAX, AWND, SNOW, SNWD) vary across different carriers, departing airports, and previous airports, to observe any patterns with how weather conditions impact flight operations.

There was no significant trend observed in the average values of the selected weather features, when grouped by the specified columns (CARRIER_NAME, DEPARTING_AIRPORT, PREVIOUS_AIRPORT).

Actionable Insights - Recommendations from Exploratory Data Analysis:

Finding: Highest Departure and Arrival Delays by Carriers
Recommendations:
  - Implement targeted training and support programs for high-delay carriers to improve operational efficiency.
  - Use delay data to manage customer communications proactively.

Finding: Top 30 Congested Airports with Flight Delays
Recommendations:
  - Allocate more resources and staff during peak times at congested airports to minimize delays.
  - Develop contingency plans for high-traffic airports to handle surges in passenger volume effectively.

Finding: Delay Trends Across Distance Groups and Flight Segments
Recommendations:
  - Analyze operational schedules to optimize turnaround times for flights, especially those with multiple segments.
  - Review scheduling for short and moderate-distance flights to reduce potential delays.

Finding: Seasonal Trends
Recommendations:
  - Increase staffing and operational resources during summer months to manage higher delay rates effectively.
  - Monitor weather patterns and adjust scheduling in advance to minimize disruptions during winter months.

Finding: Time of Day
Recommendations:
  - Consider adjusting flight schedules to reduce the number of early morning and late-night flights that experience high arrival delays.
  - Increase capacity and resources during afternoon and evening hours to mitigate departure delays.

Finding: Weekly Patterns
Recommendations:
  - Evaluate operational strategies to understand the factors contributing to increased delays on specific days.
  - Promote Saturday travel incentives to balance the load and improve operational efficiency.

Model Evaluation and Performance Summary:

The following machine learning models were evaluated for predicting flight delays, listed in order:

The ensemble and hybrid ensemble models outperformed the baseline, Logistic Regression, and Decision Tree models. This section summarizes and compares the key metrics across these model groups, and the final recommendation for production deployment is made here.

Actionable Insights - Recommendations for Model Selection and Deployment for Flight Delay Predictions:

Best Model: Voting Classifier

The Voting Classifier emerges as the best overall model for flight delay predictions due to its performance across multiple metrics:

Key strengths:

Deployment considerations:

Alternate Model: Hybrid Ensemble Classifier

The Hybrid Ensemble Classifier is an alternate choice:

Key strengths:

Deployment considerations:

Actionable Insights - Recommendations based on influential Features in Flight Delay Predictions:

Feature: PREVIOUS_ARR_DELAY
Recommendations:
  - Implement robust systems to track and analyze previous flight delays.
  - Develop strategies to mitigate the cascading effect of delays (e.g., buffer time between connected flights).

Feature: SEGMENT_NUMBER
Recommendations:
  - Optimize flight schedules, especially for aircraft making multiple trips per day.
  - Consider maintenance and crew scheduling to minimize delays in later segments.

Feature: PREVIOUS_DURATION
Recommendations:
  - Analyze routes with consistently longer durations and consider adjustments.
  - Improve accuracy of flight duration estimates for better scheduling.

Feature: DEP_PART_OF_DAY
Recommendations:
  - Adjust departure times to less congested periods of the day.
  - Allocate more resources during peak departure times.

Feature: PREVIOUS_AIRPORT
Recommendations:
  - Identify problematic connections or airports.
  - Optimize route networks to minimize impact of delay-prone airports.

Feature: DISTANCE
Recommendations:
  - Allocate appropriate aircraft to routes based on distance.
  - Consider fuel stops or direct flights for very long distances.

Feature: DEP_BLOCK_HIST
Recommendations:
  - Use historical data to predict and prepare for delays during specific time blocks.
  - Adjust staffing and resources based on historically problematic time periods.

Feature: CARRIER_NAME
Recommendations:
  - Benchmark airline performance against industry standards.
  - Share best practices within the organization to improve overall efficiency.

Feature: PRCP (Precipitation)
Recommendations:
  - Enhance weather forecasting capabilities.
  - Develop contingency plans for various weather scenarios.
  - Invest in equipment and training for efficient operations during adverse weather.

Feature: DAY_OF_WEEK
Recommendations:
  - Adjust resources and schedules based on weekly patterns.
  - Implement dynamic pricing strategies to manage demand across different days.

Deep Dives

Enhanced Feature Engineering Algorithm

    Input:
    - Raw flight data
    - Aircraft data
    - Weather data
    - Airport data
    - Airline data

    Output:
    - Enriched dataset with engineered features for flight delay prediction

    Algorithm:

    1. Initialize empty dataset D for engineered features

    2. For each flight record F in raw flight data:

        2.1. Extract basic flight information (date, origin, destination, etc.)

        2.2. Compute SEGMENT_NUMBER:

            a. Group flights by TAIL_NUM and DAY_OF_MONTH

            b. Sort by DEP_TIME within each group

            c. Assign sequential numbers starting from 1

        2.3. Add SEGMENT_NUMBER to D

    3. For each flight record F in D:

        3.1. Identify previous flight P with same TAIL_NUM

        3.2. If P exists:

            a. Set PREVIOUS_AIRPORT = P.DESTINATION

            b. Set PREVIOUS_ARR_DELAY = P.ARR_DELAY

            c. Set PREVIOUS_DEP_DELAY = P.DEP_DELAY

            d. Set PREVIOUS_DURATION = P.ACTUAL_ELAPSED_TIME

        3.3. Else:

            Set all PREVIOUS_* features to null or appropriate default values

        3.4. Add PREVIOUS_* features to D

    4. Compute FLIGHT_DURATION:

        4.1. FLIGHT_DURATION = CRS_ARR_TIME - CRS_DEP_TIME

        4.2. Add FLIGHT_DURATION to D

    5. Merge weather data with D based on date and airport

    6. Compute temporal features:

        6.1. Extract MONTH, DAY_OF_WEEK from date

        6.2. Compute SEASON based on MONTH

        6.3. Compute DEP_PART_OF_DAY based on CRS_DEP_TIME

        6.4. Add temporal features to D

    7. Merge airport and airline data with D

    8. Compute flight statistics, passenger statistics, and employee statistics:

        8.1. Add all statistics features to D 

    9. Compute historical performance metrics:

        9.1. Calculate CARRIER_HISTORICAL (average delay by carrier and month)

        9.2. Calculate DEP_AIRPORT_HIST (average delay by departure airport and month)

        9.3. Calculate DEP_BLOCK_HIST (average delay by departure time block and month)

        9.4. Add historical metrics to D

    10. Handle missing values and perform necessary data type conversions

    11. Return enriched dataset D
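
Steps 2 and 3 of the algorithm above can be sketched in plain Python as follows. This is a minimal sketch, not the project's implementation: it uses the column names from the pseudocode, restricts the "previous flight" lookup to the same tail number and day (an assumption), uses `None` as the default for first segments, and runs on made-up sample records.

```python
from collections import defaultdict

def add_segment_features(flights):
    """Return copies of the flight dicts with SEGMENT_NUMBER and PREVIOUS_* added."""
    # Step 2.2a: group flights by aircraft and day.
    groups = defaultdict(list)
    for flight in flights:
        groups[(flight["TAIL_NUM"], flight["DAY_OF_MONTH"])].append(flight)

    enriched = []
    for key in sorted(groups):
        previous = None
        # Step 2.2b: order each aircraft's flights by departure time.
        ordered = sorted(groups[key], key=lambda f: f["DEP_TIME"])
        # Step 2.2c: number the segments starting from 1.
        for number, flight in enumerate(ordered, start=1):
            row = dict(flight, SEGMENT_NUMBER=number)
            # Step 3: copy attributes of the previous segment, if there is one.
            row["PREVIOUS_AIRPORT"] = previous["DESTINATION"] if previous else None
            row["PREVIOUS_ARR_DELAY"] = previous["ARR_DELAY"] if previous else None
            row["PREVIOUS_DEP_DELAY"] = previous["DEP_DELAY"] if previous else None
            row["PREVIOUS_DURATION"] = previous["ACTUAL_ELAPSED_TIME"] if previous else None
            enriched.append(row)
            previous = flight
    return enriched

# Hypothetical records; the values are illustrative only.
flights = [
    {"TAIL_NUM": "N101", "DAY_OF_MONTH": 5, "DEP_TIME": 1300, "DESTINATION": "SFO",
     "ARR_DELAY": 20, "DEP_DELAY": 15, "ACTUAL_ELAPSED_TIME": 95},
    {"TAIL_NUM": "N101", "DAY_OF_MONTH": 5, "DEP_TIME": 800, "DESTINATION": "LAX",
     "ARR_DELAY": 12, "DEP_DELAY": 5, "ACTUAL_ELAPSED_TIME": 80},
    {"TAIL_NUM": "N202", "DAY_OF_MONTH": 5, "DEP_TIME": 900, "DESTINATION": "SEA",
     "ARR_DELAY": 0, "DEP_DELAY": 0, "ACTUAL_ELAPSED_TIME": 120},
]
rows = add_segment_features(flights)
```

With pandas, the same idea is usually expressed with `groupby(...).cumcount()` for the segment number and `groupby(...).shift()` for the previous-segment columns. Data spanning more than one month would need the full date in the grouping key rather than the day of month alone.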

Performance comparison across Baseline, Logistic Regression and Decision Tree

Model: Baseline
Strengths:
  - Simple and fast
Weaknesses:
  - Very poor weighted F1 score (0.0373)
  - Low weighted PR AUC (0.63)
  - Poor weighted ROC AUC (0.50)
  - Low accuracy (0.1461)
Key Observations:
  - Unable to distinguish between classes effectively
  - Performs poorly across all metrics
  - Not suitable for this classification task
Important Features: N/A

Model: Multinomial Logistic Regression
Strengths:
  - Best overall performance
  - Highest weighted F1 score (0.7329)
  - Highest weighted PR AUC (0.77)
  - Best weighted ROC AUC (0.74)
  - Best accuracy (0.7051)
  - Good balance between precision and recall
Weaknesses:
  - Still struggles with minority class (class 1)
  - Slightly lower interpretability compared to Decision Tree
Key Observations:
  - Shows the best overall performance
  - Outperforms other models in most weighted metrics
  - Provides a good balance across different metrics and classes
Important Features:
  Positive influence on class 2:
  - DAY_OF_WEEK
  - CARRIER_NAME
  - PREVIOUS_ARR_DELAY
  - MONTH
  - ARR_PART_OF_DAY
  - DEP_PART_OF_DAY
  - SEASON
  Negative influence on class 2:
  - PREVIOUS_DURATION_CATEGORY
  - FLIGHT_DURATION_CATEGORY
  - DISTANCE_GROUP_DESC

Model: Hyperparameter-tuned Decision Tree
Strengths:
  - Competitive weighted F1 score (0.7422)
  - Good weighted PR AUC (0.74)
  - Decent weighted ROC AUC (0.70)
  - Highest accuracy (0.7359)
  - Better interpretability than Logistic Regression
Weaknesses:
  - Slightly lower weighted F1 score than Logistic Regression
  - Lower weighted PR AUC and ROC AUC compared to Log

Performance comparison across Ensemble Bagging and Boosting Classifiers

Model: BaggingClassifier (Decision Tree)
Strengths:
  - High weighted F1 score (0.7888)
  - High weighted PR AUC (0.81)
  - Good weighted ROC AUC (0.78)
Weaknesses:
  - Slightly lower weighted ROC AUC compared to some other models
Key Observations:
  - Balanced performance across weighted metrics
  - Good overall predictive power
Important Features, Top 5 (Permutation Importance):
  1. PREVIOUS_ARR_DELAY: 0.1340
  2. PREVIOUS_DURATION: 0.0802
  3. SEGMENT_NUMBER: 0.0766
  4. DEP_PART_OF_DAY: 0.0597
  5. ARR_PART_OF_DAY: 0.0199

Model: Random Forest Classifier
Strengths:
  - High weighted F1 score (0.7887)
  - High weighted PR AUC (0.81)
  - Best weighted ROC AUC (0.79)
Weaknesses:
  - Marginally lower weighted F1 score than BaggingClassifier
Key Observations:
  - Very similar performance to BaggingClassifier
  - Slightly better at handling class imbalance
Important Features, Top 5 (Built-in Importance):
  1. PREVIOUS_ARR_DELAY: 0.1370
  2. DISTANCE: -0.0006
  3. TMAX: -0.0000
  4. FLIGHT_DURATION: -0.0005
  5. AWND: 0.0001

Model: XGBoost Classifier
Strengths:
  - High weighted PR AUC (0.81)
  - High weighted ROC AUC (0.79)
Weaknesses:
  - Lower weighted F1 score (0.7682) compared to BaggingClassifier and Random Forest
Key Observations:
  - Good balance between precision and recall
  - Strong performance in AUC metrics
Important Features, Top 5 (Built-in Importance):
  1. PREVIOUS_ARR_DELAY: 0.1718
  2. DEP_PART_OF_DAY: 0.0503
  3. PREVIOUS_DURATION_CATEGORY: -0.0040
  4. PRCP: 4.3230
  5. ARR_PART_OF_DAY: 4.4493

Model: LightGBM
Strengths:
  - High weighted PR AUC (0.81)
  - High weighted ROC AUC (0.79)
Weaknesses:
  - Lower weighted F1 score (0.7182)
Key Observations:
  - Underperforms in F1 score compared to other models
  - Maintains strong AUC performance
Important Features, Top 5 (Built-in Importance):
  1. AIRLINE_AIRPORT_FLIGHTS_MONTH: 1207.0000
  2. AIRLINE_FLIGHTS_MONTH: 996.0000
  3. PREVIOUS_ARR_DELAY: 1031.0000
  4. DISTANCE: 915.0000
  5. DEP_AIRPORT_HIST: 856

Model: CatBoost
Strengths:
  - Relatively high weighted PR AUC (0.78)
Weaknesses:
  - Lowest weighted F1 score (0.5134)
  - Lowest weighted ROC AUC (0.75)
Key Observations:
  - Significantly underperforms compared to other models
  - Struggles with overall predictive power
Important Features, Top 5 (Built-in Importance):
  1. PREVIOUS_ARR_DELAY: 64.3610
  2. DEP_PART_OF_DAY: 11.9855
  3. ARR_PART_OF_DAY: 4.4493
  4. PRCP: 4.3230
  5. SEGMENT_NUMBER: 2.8930

Performance comparison across Hybrid Ensemble Classifiers

Model: Voting Classifier
Strengths:
  - Highest weighted F1 score (0.7944)
  - Highest accuracy (0.8290)
  - Best weighted PR AUC (0.82)
  - Best weighted ROC AUC (0.80)
Weaknesses:
  - Low F1 score for class 1 (0.0677)
Key Observations:
  - Best overall performance
  - Strong in identifying on-time flights (class 0)
  - Good balance between precision and recall
Important Features, Top 5 (Permutation Importance):
  1. PREVIOUS_ARR_DELAY: 0.1311
  2. SEGMENT_NUMBER: 0.0552
  3. PREVIOUS_AIRPORT: 0.0476
  4. PREVIOUS_DURATION: 0.0429
  5. DEP_PART_OF_DAY: 0.0184

Model: Stacking Classifier
Strengths:
  - Good weighted F1 score (0.7896)
  - Good accuracy (0.8118)
  - High weighted PR AUC (0.81)
  - High weighted ROC AUC (0.79)
Weaknesses:
  - Lower performance on class 1 (F1 score: 0.0936) compared to other classes
Key Observations:
  - Slightly lower performance than Voting Classifier
  - Better performance on class 1 compared to Voting Classifier
Important Features, Top 5 (Permutation Importance):
  1. PREVIOUS_ARR_DELAY: 0.1059
  2. PREVIOUS_AIRPORT: 0.0247
  3. SEGMENT_NUMBER: 0.0200
  4. PREVIOUS_DURATION: 0.0192
  5. DEP_PART_OF_DAY: 0.0173

Model: Tuned Stacking Classifier
Strengths:
  - Improved weighted F1 score (0.7921)
  - Improved accuracy (0.8180)
  - High weighted PR AUC (0.81)
  - High weighted ROC AUC (0.79)
Weaknesses:
  - Still struggles with class 1 (F1 score: 0.0901)
Key Observations:
  - Performance improvement over base Stacking Classifier
  - Better balance across all classes
Important Features, Top 5 (Permutation Importance):
  1. PREVIOUS_ARR_DELAY: 0.1320
  2. PREVIOUS_AIRPORT: 0.0734
  3. SEGMENT_NUMBER: 0.0530
  4. PREVIOUS_DURATION: 0.0472
  5. DEP_PART_OF_DAY: 0.0256

Model: Hybrid Ensemble Classifier
Strengths:
  - High weighted F1 score (0.7935)
  - Good accuracy (0.8234)
  - High weighted PR AUC (0.82)
  - High weighted ROC AUC (0.80)
Weaknesses:
  - Struggles with class 1 (F1 score: 0.0813)
Key Observations:
  - Performance comparable to other ensemble methods
  - Good balance between precision and recall for class 0 and 2
Important Features, Top 5 (Permutation Importance):
  1. PREVIOUS_ARR_DELAY: 0.1324
  2. PREVIOUS_AIRPORT: 0.0495
  3. PREVIOUS_DURATION: 0.0458
  4. SEGMENT_NUMBER: 0.0431
  5. DEP_PART_OF_DAY: 0.0216

Features influencing Flight Delay Predictions

Based on the feature importance results from across these models, the following features are consistently influential in flight delay predictions – ref: feature descriptions:

These features consistently appear among the top influential factors across different models (Bagging Classifier, Random Forest, XGBoost, LightGBM, and ensemble methods like Voting and Stacking Classifiers). While the exact order and magnitude of importance varies between the models, these features represent a mix of temporal factors (previous delays and time of day), operational aspects (segment number and carrier), geographical elements (distance and previous airport), and weather conditions (precipitation).
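
Several of the rankings above come from permutation importance. The sketch below shows the general idea: shuffle one feature's values across rows and measure how much the model's score drops. It is a generic illustration, not the project's evaluation code. The toy model, its threshold of 15, the accuracy metric, and the sample rows are all hypothetical.

```python
import random

def accuracy(predict, rows, targets):
    """Fraction of rows where the prediction matches the target."""
    return sum(predict(r) == t for r, t in zip(rows, targets)) / len(rows)

def permutation_importance(predict, rows, targets, feature, n_repeats=10, seed=0):
    """Mean drop in accuracy after shuffling one feature's values across rows."""
    rng = random.Random(seed)
    baseline = accuracy(predict, rows, targets)
    drops = []
    for _ in range(n_repeats):
        shuffled = [r[feature] for r in rows]
        rng.shuffle(shuffled)
        # Rebuild the rows with only the chosen feature permuted.
        permuted = [dict(r, **{feature: v}) for r, v in zip(rows, shuffled)]
        drops.append(baseline - accuracy(predict, permuted, targets))
    return sum(drops) / n_repeats

def toy_model(row):
    """Hypothetical classifier: flags a delay when the previous arrival was late."""
    return 1 if row["PREVIOUS_ARR_DELAY"] > 15 else 0

# Hypothetical rows: (previous arrival delay in minutes, distance in miles).
rows = [{"PREVIOUS_ARR_DELAY": d, "DISTANCE": dist}
        for d, dist in [(0, 300), (5, 1200), (40, 450), (60, 900),
                        (10, 2000), (90, 700), (25, 1500), (2, 350)]]
targets = [toy_model(r) for r in rows]

delay_importance = permutation_importance(toy_model, rows, targets, "PREVIOUS_ARR_DELAY")
distance_importance = permutation_importance(toy_model, rows, targets, "DISTANCE")
```

A feature the model ignores, such as DISTANCE in this toy model, scores exactly zero. On real data, values near zero or slightly negative indicate that shuffling the feature made little difference to the score.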

Partial Dependence Plots - Visualize Feature Impact on Flight Delay Predictions for each Delay Class

Data Sources

The data comes from a Kaggle dataset (available here) that comprises multiple CSV files, listed below.

Methodology Used for Data Preparation and Modeling

Data Preparation: Involved cleaning and merging multiple raw CSV files to create a unified dataset with ~4M entries (for training) and ~2M entries (for testing), with 34 predictor variables and 1 target variable. A description of the raw dataset is here.

Feature Engineering:

    CARRIER_HISTORICAL = captures the historical average delay rate of each carrier per month

    DEP_AIRPORT_HIST = captures historical average delay rates for flights departing from specific airports per month

    PREV_AIRPORT_HIST = captures historical average delay rate for the airport from which the aircraft arrived before the current departure

    DAY_HISTORICAL = captures historical average delays associated with each day of the week, adjusted monthly

    DEP_BLOCK_HIST = captures historical average delay rate for different departure time blocks, aggregated by month
    ELAPSED_TIME_DIFF, DEP_DELAY, ARR_DELAY
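
The `*_HIST` features above share one pattern: an average delay rate per group and month. A minimal sketch of that aggregation is shown below. The `DELAYED` flag, the `DEPARTING_AIRPORT` values, and the sample records are assumptions for illustration; the project's exact delay definition is not stated here.

```python
from collections import defaultdict

def historical_delay_rate(records, group_col, delayed_col="DELAYED"):
    """Return {(group value, month): share of delayed flights} for one grouping column."""
    totals = defaultdict(lambda: [0, 0])  # (group, month) -> [delayed count, flight count]
    for record in records:
        cell = totals[(record[group_col], record["MONTH"])]
        cell[0] += 1 if record[delayed_col] else 0
        cell[1] += 1
    return {key: delayed / flights for key, (delayed, flights) in totals.items()}

# Hypothetical flight records.
records = [
    {"DEPARTING_AIRPORT": "JFK", "MONTH": 1, "DELAYED": True},
    {"DEPARTING_AIRPORT": "JFK", "MONTH": 1, "DELAYED": False},
    {"DEPARTING_AIRPORT": "JFK", "MONTH": 1, "DELAYED": False},
    {"DEPARTING_AIRPORT": "JFK", "MONTH": 1, "DELAYED": False},
    {"DEPARTING_AIRPORT": "ORD", "MONTH": 1, "DELAYED": True},
    {"DEPARTING_AIRPORT": "ORD", "MONTH": 1, "DELAYED": False},
    {"DEPARTING_AIRPORT": "JFK", "MONTH": 2, "DELAYED": False},
]

dep_airport_hist = historical_delay_rate(records, "DEPARTING_AIRPORT")
```

Calling the same function with a carrier or departure-block column would give the analogue of CARRIER_HISTORICAL or DEP_BLOCK_HIST. Computing these rates on the training split only, then joining them onto validation and test rows, avoids leaking target information into evaluation.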

    FLIGHT_DURATION, FLIGHT_DURATION_CATEGORY, PREVIOUS_DURATION, 
    PREVIOUS_DURATION_CATEGORY, PREVIOUS_ARR_DELAY
    FLT_ATTENDANTS_PER_PASS, PASSENGER_HANDLING

Data Pre-Processing: Detected missing values and outliers were removed. SMOTE Tomek was applied to the training dataset only. It combines SMOTE’s oversampling of the minority classes (classes 1 and 2) with Tomek links’ under-sampling. Categorical features were target encoded, and numerical features were scaled.
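
To make the two resampling steps concrete, here is a small pure-Python sketch of the ideas behind SMOTE and Tomek links. It is a simplified illustration under stated assumptions: it oversamples a single minority class, uses Euclidean distance on two made-up numeric features, and drops both members of each Tomek link. Libraries such as imbalanced-learn provide a `SMOTETomek` resampler; the source does not say which implementation or settings the project used.

```python
import math
import random

def nearest(i, points, candidates):
    """Index among candidates (other than i) whose point is closest to points[i]."""
    return min((j for j in candidates if j != i),
               key=lambda j: math.dist(points[i], points[j]))

def smote(points, labels, minority, n_new, k=2, seed=0):
    """Append n_new synthetic minority points interpolated toward a near neighbour."""
    rng = random.Random(seed)
    minority_idx = [i for i, y in enumerate(labels) if y == minority]
    synthetic = []
    for _ in range(n_new):
        i = rng.choice(minority_idx)
        neighbours = sorted((j for j in minority_idx if j != i),
                            key=lambda j: math.dist(points[i], points[j]))[:k]
        j = rng.choice(neighbours)
        u = rng.random()  # position along the segment from point i to point j
        synthetic.append(tuple(a + u * (b - a) for a, b in zip(points[i], points[j])))
    return points + synthetic, labels + [minority] * n_new

def remove_tomek_links(points, labels):
    """Drop both members of every mutual-nearest-neighbour pair with different labels."""
    everyone = range(len(points))
    drop = set()
    for i in everyone:
        j = nearest(i, points, everyone)
        if labels[i] != labels[j] and nearest(j, points, everyone) == i:
            drop.update((i, j))
    keep = [i for i in everyone if i not in drop]
    return [points[i] for i in keep], [labels[i] for i in keep]

# Hypothetical 2-D training points: five of class 0, three of class 1.
points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1), (0.6, 0.6),
          (0.7, 0.7), (1.0, 1.0), (1.2, 1.0)]
labels = [0, 0, 0, 0, 0, 1, 1, 1]

balanced_points, balanced_labels = smote(points, labels, minority=1, n_new=2)
clean_points, clean_labels = remove_tomek_links(balanced_points, balanced_labels)
```

Resampling only the training split, as the text describes, keeps the validation and test sets at their original class distribution so the reported metrics reflect real-world proportions.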

Model Evaluation with Training, Validation and Test dataset:

The dataset was initially split into Training (70%, 4.542M entries) and Test (30%, 1.946M entries) sets. The training set was further divided, with 20% retained for validation. From the remaining training data, a sample of up to 500,000 entries was extracted for model training, ensuring that the sample size did not exceed the available data.

All splits were performed using stratified sampling to maintain class distribution. This approach was adopted to manage the large dataset by creating a more manageable training set size while still preserving a substantial validation set.
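
The splitting scheme described above can be sketched with a small stratified splitter. This is an illustration on a hypothetical 1,000-row label vector, not the project's code: the class proportions, seeds, and helper name `stratified_split` are assumptions, while the 30% test share, 20% validation share, and 500,000-row cap come from the text.

```python
import random
from collections import defaultdict

def stratified_split(indices, labels, holdout_fraction, seed=0):
    """Split indices into (kept, held_out) while preserving class proportions."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i in indices:
        by_class[labels[i]].append(i)
    kept, held_out = [], []
    for cls in sorted(by_class):
        members = by_class[cls]
        rng.shuffle(members)
        cut = round(len(members) * holdout_fraction)
        held_out.extend(members[:cut])
        kept.extend(members[cut:])
    return kept, held_out

# Hypothetical labels: 70% class 0, 10% class 1, 20% class 2.
labels = [0] * 700 + [1] * 100 + [2] * 200
all_idx = list(range(len(labels)))

# 70/30 train/test split, then 20% of the training portion held out for validation.
train_idx, test_idx = stratified_split(all_idx, labels, 0.30)
train_idx, val_idx = stratified_split(train_idx, labels, 0.20, seed=1)

# Cap the training sample at 500,000 rows without exceeding the available data.
MAX_TRAIN_SAMPLE = 500_000
sample_fraction = min(1.0, MAX_TRAIN_SAMPLE / len(train_idx))
_, sample_idx = stratified_split(train_idx, labels, sample_fraction, seed=2)
```

On this small example the cap is not reached, so the sample is simply all remaining training rows; on the full dataset the same call would draw a stratified 500,000-row subset.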

Project Structure

Data:

Analysis and Visualization:

Notebooks:

Links to the latest set of Notebooks from this folder are noted below. Please note, earlier revisions continue to be available in the same folder to track iterations.

Model Artifacts:

Folder here contains:

Streamlit and FastAPI interface:

Repository with GitLFS:

This project uses Git Large File Storage (LFS) to handle large files efficiently. Git LFS replaces large files with text pointers inside Git, while storing the file contents on a remote server.

To work with this repository:

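    # One-time setup: install the Git LFS hooks
    git lfs install
    # Download the large files referenced by the current checkout
    git lfs pull
    # Start tracking an additional large file
    git lfs track "path/to/large/file"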

Project Infrastructure

This project utilized Google Colab Pro to handle computationally intensive notebook operations for data exploration and modeling. Key components include:

Notebooks:

AutoViz Visualizations:

Decision Tree and Random Forest Artifacts

MLOps with SkyFlow

Key Insights from Phase1 to Phase2 of Project

Future Work

Feature Engineering: Improve prediction performance for the minority classes (Class 1 and Class 2) with additional engineered features.

Use of Principal Component Analysis (PCA): Apply PCA with 2D visualization to explore patterns within the current delay classes. If the analysis reveals significant overlap between classes or a lack of distinct patterns, it may be beneficial to consider a more granular classification, such as separating arrival delays and departure delays into their own distinct classes.

Extend Forecast Horizon and Implement Multi-Step Forecasting: Increase the prediction timeframe beyond the current 24-hour forecast, implementing a multi-step forecasting approach that provides:

Explore use of Deep Learning Architectures: Investigate if performance can be improved further by:

Expand SkyFlow: Refine its Streamlit interface beyond the initial prototype to include dashboards and to work with a reduced number of inputs.

Real-time Updates: Incorporate real-time data to provide predictions as the departure time approaches.

Appendix

Baseline Dummy Classifier

Multinomial Logistic Regression Classifier

Decision Tree – Hyperparameter-tuned Decision Tree

PlotTree

Ensemble and Hybrid Ensemble model evaluation metrics

Similar metrics for the ensemble and hybrid classifiers can be found in the notebook here.

References

How are airlines using AI to minimize disruptions

Case Study with JetBlue’s use of Tomorrow.io

KDD2018: Predicting Estimated Time of Arrival for Commercial Flights

Mamdouh, M., Ezzat, M. & A.Hefny, H. A novel intelligent approach for flight delay prediction. J Big Data 10, 179 (2023). https://doi.org/10.1186/s40537-023-00854-w

Yuemin Tang. 2021. Airline Flight Delay Prediction Using Machine Learning Models. In 2021 5th International Conference on E-Business and Internet (ICEBI 2021), October 15-17, 2021, Singapore, Singapore. ACM, New York, NY, USA, 7 Pages. https://doi.org/10.1145/3497701.3497725