A new unit for our predictions: minutes

Because arrival rate can be tough to interpret

Mar 03, 2023

Spring Break travel kicks off this weekend, with 183 universities and colleges starting their breaks, including University of Florida’s 68k students and Arizona State’s 58k students. We’d bet TSA screenings peak during the weekend of March 18—with the benefit of travelers both setting out for and returning from break—but still expect something like 2.45 million screened today. That’d be a nice week-over-week bump, though still short of this year’s high-water mark (2.50 million from the Friday of President’s Day weekend).

But! We’re not here to prognosticate about what travelers face this weekend. At least not directly. Today’s post is filed under our favorite category: Product launch.

We sent our first such post in November, wherein we shared our airport arrival capacity predictions.

Aerology

Introduce? Unveil? Announce? Anyways, we've got our first deep learning model to share

We’ve been writing for about a year now and when a post veered into more quantitative waters, it was generally underpinned by Tim working in Excel. Don’t get us wrong, Excel is great—communication ✅, trust ✅, physical intimacy ✅, curiosity ✅, conflict resolution ✅. But its suitability for blog posts doesn’t extend to machine learning and ML engineering…

3 years ago · 7 likes · 2 comments · Tim Donohue

Somewhere between the Excel sex joke and discussion about our forecast error reduction, we wrote in that product launch:

Admittedly there’s an interpretability hurdle for travelers, especially… For now, we’d recommend travelers pair our [capacity] predictions with the FAA’s airport arrival demand chart (AADC), which helpfully encodes hourly arrival demand by way of the stacked columns. It’s an important piece of information—and not yet piped into Aerology—to triangulate capacity, demand and more-interpretable delay.

Capacity is generally measured in arriving aircraft handled per hour; it’s a sneakily technical concept that some operators don’t entirely grasp. We instructed users on how to manually perform that triangulation but knew it was a tall task for travelers. And even if users gleaned some sense of delay from our capacity predictions, any insight was more or less binary: They could determine if unpublished delay existed or not, but how much delay remained obfuscated.

Minutes, not arrival rates

So the next mile marker on our data science roadmap was a pipeline to gauge demand; that was followed by an algorithm to translate that enigmatic demand overage to more-interpretable delay minutes. With a minimally viable pipeline and translation program built, we’re excited to launch our first delay beta. We think it represents a step change in usability.

Ergo, we don’t think it’s necessary to spend a few paragraphs instructing users on how to combine this product with other information. But we should define what type of delay we’re predicting (and pin a few qualifiers to these forecasts). In service of a definition, let’s review: When arrival demand for an airport’s runway(s) exceeds its capacity, air traffic control will explicitly or effectively assign arrivals a landing slot. It’s these landing slots we’re aiming to predict—after which we can measure delay relative to some planned landing time. For starters, we’re aggregating the resulting delays; later we’ll reproduce how airlines swap slots among their flights.
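To make that definition concrete, here is a minimal sketch of slot assignment as a first-come, first-served queue. It is an illustration, not our production model: the function name `assign_slots`, the constant hourly capacity, the evenly spaced slots and the sample times are all assumptions.

```python
def assign_slots(planned_minutes, capacity_per_hour):
    """Give each arrival the earliest free landing slot at or after its planned time.

    planned_minutes: planned landing times, in minutes after a reference time.
    capacity_per_hour: arrivals handled per hour (assumed constant here).
    Returns (planned, slot, delay) tuples in planned-time order.
    """
    spacing = 60.0 / capacity_per_hour  # minutes between consecutive landings
    next_free = 0.0
    results = []
    for planned in sorted(planned_minutes):
        slot = max(planned, next_free)  # wait only if the runway is still busy
        results.append((planned, slot, slot - planned))
        next_free = slot + spacing
    return results


# Hypothetical demand: six arrivals bunched inside ten minutes, capacity of 30 per hour.
arrivals = [0, 1, 2, 3, 8, 9]
schedule = assign_slots(arrivals, capacity_per_hour=30)
average_delay = sum(delay for _, _, delay in schedule) / len(schedule)
print(f"average delay: {average_delay:.1f} minutes")
```

Delay only appears where planned landings are packed tighter than the two-minute spacing that a rate of 30 per hour allows. The aggregate (here, the average) is the kind of figure we mean by delay minutes.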

We want to highlight a few corollaries to this definition. It reflects air traffic dynamics at the arrival airport—not an imbalance in en-route airspace[1] or at the departure airport. So we're not trying to assimilate any delay that occurs at a flight's departure gate or during taxi to takeoff. (Yet.) Similarly, these predictions stop short of modeling any taxi-in delay; nor do they simulate how this landing delay can ripple through the network. (For now.)

That’s a lot of things it’s not. But the type of delay it is—let’s call it airport arrival imbalance delay—was responsible for at least one-third of arrival delays at Core 30 airports in 2019[2]. With a lot of delay types to eventually account for, we think this is a good place to start.

It’s also analogous to the delays resulting from ground delay programs and ground stops[3] that the FAA communicates via their NAS dashboard. We’re frequent visitors to their dashboard, but Google Trends suggests we’re in a pretty small minority. FlightAware, however—which had more site visitors than Geico in January—has done their part to promote awareness of these delays. They repackage FAA data: when we grabbed the FlightAware screenshot, the NAS dashboard coincidentally featured an active BOS ground delay program with 73-minute average delays.

Delay minimization and you

So why go to some lengths to predict what the FAA already shares? Because of the FAA’s wait-and-see approach to administering these delays. On average, ground delay programs are published less than 60 minutes[4] before the first affected flight lands; delays from ground stops are effective immediately with no notice. It's a rational, if optimistic, tack from the FAA. By assuming demand won't materialize or the weather forecast verifies favorably, they avoid creating unnecessary delay. But their wait-and-see approach betrays airlines' ability to plan and squanders rebooking options for travelers.

For the moment, Aerology isn’t charged with minimizing delay. We “just” fit known demand[5] to our best, unbiased estimate of capacity. Our delay modeling shares our capacity predictions' 16-hour forecast horizon; flights scheduled or estimated to land inside of 16 hours may be delayed beyond the horizon. Like our capacity predictions, we've used Newark (EWR) to prototype when modeling delay—though our infrastructure and modeling readily scale with customers. And we’ll initially re-forecast delay hourly—around the 6th minute of the hour after the capacity model runs—though we could refresh more frequently with the latest demand information.

For example, flights scheduled to land in the 1pm hour were planned to land, on average, approximately 1 minute early (i.e. a negative delay) at the time of the screenshot; we predict these flights will absorb, on average, 4.1 minutes of unpublished delay.

We’re ultimately trying to identify the still-lurking downside. To that end, we encourage you to focus on the ‘Predicted Unpublished Delay’ series; for flights scheduled to land in a given hour, this answers how many minutes of delay, on average, arrivals can still expect to absorb. We’ve also included average published delays to contextualize the situation—are we predicting 15 additional minutes of delay on top of 44 or 4 minutes of already-posted delay?
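The two series can be pictured as simple per-hour averages. The sketch below is hypothetical: the record layout, the `hourly_averages` helper and every number are made up, and it assumes unpublished delay is the predicted total delay minus whatever delay is already published.

```python
from collections import defaultdict

# Hypothetical per-flight records:
# (scheduled landing hour, published delay, predicted total delay), delays in minutes.
flights = [
    (13, -3.0, 2.0),
    (13, 1.0, 4.5),
    (13, -1.0, 2.8),
    (14, 10.0, 25.0),
    (14, 0.0, 12.0),
]


def hourly_averages(records):
    """Average published and unpublished delay per scheduled landing hour."""
    by_hour = defaultdict(list)
    for hour, published, predicted_total in records:
        # Assumption: unpublished delay is what remains after published delay.
        by_hour[hour].append((published, predicted_total - published))
    summary = {}
    for hour, rows in sorted(by_hour.items()):
        count = len(rows)
        summary[hour] = {
            "avg_published": sum(p for p, _ in rows) / count,
            "avg_unpublished": sum(u for _, u in rows) / count,
        }
    return summary


for hour, stats in hourly_averages(flights).items():
    print(
        f"{hour}:00  published {stats['avg_published']:+.1f} min, "
        f"unpublished {stats['avg_unpublished']:.1f} min"
    )
```

The made-up 1pm flights were chosen to mirror the example above: planned about a minute early on average, with roughly 4.1 minutes of delay still to come.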

Delay, disruption and denominators

If you're comparing our delay predictions to the averages provided by the FAA—whether directly via their dashboard or piped to places like FlightAware—you’ll find they diverge in cases where the FAA geographically partitions flights or airlines implement cancellations. In the case of the former, the Command Center may exclude trans- or mid-continental flights from delay assignment; because we spread predicted delay across all arrivals scheduled to land in an hour, our per-flight amount could be lower.
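A quick bit of arithmetic shows why the denominator matters in the partitioning case. All figures here are invented, and treating the FAA average as delay divided by only the non-exempt flights is our simplifying assumption for the sake of illustration.

```python
# Hypothetical hour: 40 scheduled arrivals, 10 of them exempt from delay assignment
# (for example, long-haul flights outside the program's geographic scope).
total_arrivals = 40
exempt_arrivals = 10
total_delay_minutes = 900.0  # made-up delay to be absorbed in this hour


def average_delay(total_minutes, flight_count):
    """Average delay per flight for a given denominator."""
    return total_minutes / flight_count


avg_over_assigned = average_delay(total_delay_minutes, total_arrivals - exempt_arrivals)
avg_over_all = average_delay(total_delay_minutes, total_arrivals)

print(f"averaged over delayed flights only: {avg_over_assigned:.1f} min")
print(f"averaged over all arrivals:         {avg_over_all:.1f} min")
```

The delay minutes are identical in both lines; only the flight count changes, which is enough to move the per-flight figure from 30.0 to 22.5 minutes.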

In the case of the latter, that slinking downside is somewhat amorphous. In response to FAA-assigned slots, airlines will occasionally cancel flights to mitigate the ensuing delays[6]. This, too, is on our data science roadmap; however, until we’re reproducing this behavior, we’re liable to over-predict unpublished delay at longer lead hours. In the interim, as airlines work through their schedule reductions, we’ll re-forecast delays: Absent the cancelled demand, our unpublished delay predictions should drift down to reflect the amount of delay allocable to still-operating flights. While we re-forecast with the latest demand, the FAA continues to broadcast the delay associated with their original advisory.

EWR Flight Delays @FlyFAA_EWR

Due to WEATHER/WIND traffic mgmt prgrm causing some arriving flight delays averaging 3 hours and 14 minutes. #EWR

And when the Command Center originally runs a ground delay program, average delays can be eye-watering. But after the fact—with the cancellation dust having settled—average hourly delays exceeded 60 minutes just 2% of the time at EWR[7]. So, until we're predicting cancellations, we'd bet on a schedule reduction when unpublished delay approaches 60 (or even 40) minutes.

In this way, you can think of our predictions as unpublished disruption—cancellations might reduce average delays, but they still leave planes, crews and passengers out of place.
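To see why re-forecasting after cancellations pulls average delay down, consider a toy first-come, first-served queue. This is a sketch under stated assumptions, not how we or the FAA compute delay: `average_queue_delay`, the fixed landing rate and the sample demand are all hypothetical.

```python
def average_queue_delay(planned_minutes, capacity_per_hour):
    """Average delay when arrivals land first come, first served at a fixed rate."""
    spacing = 60.0 / capacity_per_hour  # minutes between consecutive landings
    next_free = 0.0
    total_delay = 0.0
    for planned in sorted(planned_minutes):
        slot = max(planned, next_free)
        total_delay += slot - planned
        next_free = slot + spacing
    return total_delay / len(planned_minutes)


# Hypothetical demand: six flights planned a minute apart, capacity of 12 per hour.
original_demand = [0, 1, 2, 3, 4, 5]
# Two flights cancel; everybody behind them moves up a spot.
remaining_demand = [0, 3, 4, 5]

before = average_queue_delay(original_demand, capacity_per_hour=12)
after = average_queue_delay(remaining_demand, capacity_per_hour=12)
print(f"average delay before cancellations: {before:.1f} min")
print(f"average delay after cancellations:  {after:.1f} min")
```

In this toy case the average falls from 10.0 to 4.5 minutes, but the two cancelled flights (and their passengers) never land at all, which is why we frame the earlier, larger number as disruption.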

Like our capacity forecasts, we’re launching a decidedly beta version of delay predictions. If you’re curious about what you’ve read and can put up with some bugginess, reach out to tim@aerology.ai: We’ll get you set up with beta access!

[1] Our airport arrival capacity model considers TRACON-adjacent weather that would prompt airborne holding or a reduction in arrival rate, but cruise circuity (e.g. a LAX-EWR flight deviating around ZKC) is not modeled.

[2] Gate arrival delays totaled 42.67 million minutes in 2019 for Core 30 airports while EDCT minutes totaled 13.9 million minutes (per ASPM). FAA Core 30 airports are ATL, BOS, BWI, CLT, DCA, DEN, DFW, DTW, EWR, FLL, HNL, IAD, IAH, JFK, LAS, LAX, LGA, MCO, MDW, MEM, MIA, MSP, ORD, PHL, PHX, SAN, SEA, SFO, SLC, TPA.

[3] Likewise, it’s equivalent to the more generic “arrival delay” category published to their dashboard when they elect to manage a demand overage with metering (or are forced to manage it with vectoring/airborne holding). Notably, the FAA reports arrival delays only when they reach 15 minutes; we have no such threshold.

[4] From 6,622 GDP advisories sent between 7/12/18-10/28/22 for ATL, BOS, DCA, DEN, DFW, EWR, IAH, JFK, LAS, LGA, MCO, ORD, PHL, SEA, SFO.

[5] Another item on the to-do list: forecast unscheduled demand (e.g. private jets and cargo flights not included in OAG’s database). In the absence of an unscheduled demand model, delay may be under-predicted at longer lead hours due to unknown demand.

[6] Imagine somebody gets out of line for a bank teller—everybody behind them moves up a spot.

[7] Source: ASPM. From 9/28/21 (when the 04R/22L continuous closure ended) to 2/26/23 for all arrivals (not just arrivals with an EDCT).

Don Wolford

Mar 3, 2023

Most GDPs are issued 45 minutes prior to the first flight affected’s IGTD. While some are issued from status (flights about to push could be included), that usually indicates an unforecasted change in the AAR. As you have described, airborne holding and delay are two different things. At United, we found that airborne holding would reduce the actual accomplished AAR by 10-20%, as flights are not perfectly sequenced as they depart the holding patterns and fixes. I taught my ATC coordinators to advocate for other tactical TMIs such as MIT or APREQ for individual flights if a modeled GDP generated EDCTs of 15-18 minutes or less, and if the constraint on the AAR was a high value forecasted event. When some tactical delay is necessary, and a GDP is not used, it’s hard to pin down which flights might absorb the delay... although use of TBM (low visibility to the airline) release times might be sufficient. Good reasons why the CDM process is so valuable.

Kevin Alexander

"But their wait-and-see approach betrays airlines' ability to plan..."

Story of my life- especially at a line station!
