Airport Data for Artificial Intelligence Forecasting of Air Passenger Throughput

Published: 6 September 2023 | Version 1 | DOI: 10.17632/4dsy9vxxgx.1


The dataset comprises 974 daily observations for each of five selected major airports: Hartsfield-Jackson Atlanta International Airport (ATL), Denver International Airport (DEN), O'Hare International Airport (ORD), Los Angeles International Airport (LAX), and Dallas/Fort Worth International Airport (DFW). It covers the period from February 15, 2020, to October 15, 2022. Observations include daily airport passenger flow, aggregated from airport TSA security checkpoint throughput (pax). Also included are anonymized Google measurements of visitor numbers to retail & recreation, grocery & pharmacy stores, parks, transit stations, and workplaces, and of duration of stay in residential locations (retail, groc, park, transit, work, resident). These cover the county immediately surrounding each sample airport (Fulton, GA for ATL; Denver, CO for DEN; Tarrant, TX for DFW; Los Angeles, CA for LAX; and Cook, IL for ORD) and are normalized as relative change from baseline days before the pandemic outbreak. Google Trends data denoting the search intensity in each airport's metropolitan statistical area for the keywords "COVID", "flight", and "airport" (srch_cov, srch_flght, srch_airprt) are included. Lastly, daily COVID-19 related deaths and confirmed COVID-19 cases for each airport's county (cov_death, cov_case) are included, obtained from the COVID-19 Data Repository by the Center for Systems Science and Engineering (CSSE) at Johns Hopkins University.
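As a quick sanity check on the stated size, the observation count can be reproduced from the date range alone. The sketch below uses only the dates and airport codes given in the description; the total row count assumes the data are stacked as one row per airport per day, which the description does not state.

```python
from datetime import date

START = date(2020, 2, 15)
END = date(2022, 10, 15)
AIRPORTS = ["ATL", "DEN", "DFW", "LAX", "ORD"]

# Both endpoints are part of the study period, hence the + 1.
days_per_airport = (END - START).days + 1

# Assumption: one row per airport per day (long format).
total_rows = days_per_airport * len(AIRPORTS)

print(days_per_airport)  # 974
print(total_rows)        # 4870
```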


Steps to reproduce

"pax" from the TSA's FOIA Electronic Reading Room: Visit the TSA website at Download the weekly PDF files. Ensure that you select files covering the relevant time period you are interested in (e.g., from 2/15/2020 to 10/15/2022). Convert the PDF files into machine-readable format (e.g., CSV or Excel). Open the converted files and inspect the data to ensure it contains details such as date, hour of the day, airport, city, state, checkpoint name, and total passenger numbers. Given the data's hourly granularity, you will need to aggregate the data to daily figures for individual airports. After completing the transformation, you will have a dataset that provides daily figures of TSA total air passenger throughput at security checkpoints for individual airports. "retail", "groc", "park", "transit", "work", "resident" from the Google's COVID-19 Community Mobility Reports: Visit the website On the page, scroll down to the "COVID-19 Community Mobility Reports" section and click "Region CSVs." After completing the file download and opening the file, extract and open the 2021_US_Region_Mobility_Report.csv and 2022_US_Region_Mobility_Report.csv files. From here the two spreadsheets can be filtered on the county column. "COVID," "flight," and "airport" using the longtrends Python package: Install the longtrends Python package. You can install it using pip: pip install longtrends Create a Python script for each keyword ("COVID," "flight," and "airport") and each airport (ATL, DEN, DFW, LAX, ORD). You will have a total of 15 Python scripts, one for each combination. Modify the pytrend payload to include the appropriate metro area code for each airport region. For example, for ATL (Atlanta), use "US-GA-524." Adjust this code for other airports as needed (e.g., "US-CO-751" for DEN, "US-TX-623" for DFW, "US-CA-803" for LAX, and "US-IL-602" for ORD). 
pytrend.build_payload(kw_list=["COVID"], timeframe="2020-02-15 2022-10-15", geo="US-GA-524") Execute the longtrends package using the pytrend object to fetch Google Trends data for the specified keyword and airport region: data = pytrend.interest_over_time() Save the retrieved data to a file for further analysis: data.to_csv("keyword_airport_data.csv") "cov_death" and "cov_case" from the COVID-19 Data Repository by the Center for Systems Science and Engineering (CSSE) at Johns Hopkins University: Visit the GitHub repository, the URL is Once on the repository page, navigate to the "COVID-19/csse_covid_19_data/csse_covid_19_time_series" directory. Inside the "csse_covid_19_time_series" directory, you will find various CSV files containing COVID-19 data. Look for the files named "time_series_covid19_deaths_US.csv" and "time_series_covid19_confirmed_US.csv". Click on the files to access, then click "Download" button.
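The hourly-to-daily aggregation described for "pax" could be sketched with pandas roughly as follows. The column names (date, hour, airport, checkpoint, total_pax) and the sample rows are hypothetical; the converted TSA files will likely use different headers, so adjust them after inspecting the data.

```python
import pandas as pd


def hourly_to_daily(hourly: pd.DataFrame, airports: list[str]) -> pd.DataFrame:
    """Sum hourly checkpoint counts into one 'pax' value per airport per day."""
    df = hourly[hourly["airport"].isin(airports)].copy()
    # Drop any time-of-day component so all hours of a day share one key.
    df["date"] = pd.to_datetime(df["date"]).dt.normalize()
    return (
        df.groupby(["airport", "date"], as_index=False)["total_pax"]
        .sum()
        .rename(columns={"total_pax": "pax"})
    )


# Made-up rows, only to show the expected input shape.
sample = pd.DataFrame({
    "date": ["2020-02-15", "2020-02-15", "2020-02-15", "2020-02-16"],
    "hour": [5, 6, 5, 5],
    "airport": ["ATL", "ATL", "ATL", "ATL"],
    "checkpoint": ["Main", "Main", "North", "Main"],
    "total_pax": [100, 150, 50, 80],
})

daily = hourly_to_daily(sample, ["ATL", "DEN", "DFW", "LAX", "ORD"])
print(daily)
```

Summing over both hours and checkpoints in a single groupby gives one row per airport and day, which matches the daily granularity of the published dataset.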
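For the mobility variables, filtering on the county column and shortening the column names might look like the sketch below. The report column names (sub_region_1, sub_region_2, and the *_percent_change_from_baseline columns) and the county labels such as "Fulton County" are assumptions about the downloaded CSV layout, not taken from the source; verify them against the files before use.

```python
import pandas as pd

# Assumed (state, county) labels in the report; check the actual CSV values.
COUNTIES = {
    "ATL": ("Georgia", "Fulton County"),
    "DEN": ("Colorado", "Denver County"),
    "DFW": ("Texas", "Tarrant County"),
    "LAX": ("California", "Los Angeles County"),
    "ORD": ("Illinois", "Cook County"),
}

# Assumed report column names mapped to the dataset's short variable names.
RENAME = {
    "retail_and_recreation_percent_change_from_baseline": "retail",
    "grocery_and_pharmacy_percent_change_from_baseline": "groc",
    "parks_percent_change_from_baseline": "park",
    "transit_stations_percent_change_from_baseline": "transit",
    "workplaces_percent_change_from_baseline": "work",
    "residential_percent_change_from_baseline": "resident",
}


def county_mobility(report: pd.DataFrame, airport: str) -> pd.DataFrame:
    """Keep only the rows for the airport's county and rename the columns."""
    state, county = COUNTIES[airport]
    mask = (report["sub_region_1"] == state) & (report["sub_region_2"] == county)
    out = report.loc[mask, ["date", *RENAME]].rename(columns=RENAME)
    out.insert(0, "airport", airport)
    return out.reset_index(drop=True)


# Made-up rows standing in for a yearly report file.
sample = pd.DataFrame({
    "sub_region_1": ["Georgia", "Georgia"],
    "sub_region_2": ["Fulton County", "Cobb County"],
    "date": ["2021-01-01", "2021-01-01"],
    **{col: [-30, -10] for col in RENAME},
})

atl = county_mobility(sample, "ATL")
print(atl)
```

With real data, each yearly file would be read with pd.read_csv, filtered per airport in this way, and the yearly results concatenated.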
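Instead of maintaining 15 near-identical scripts, the keyword and airport combinations for the Google Trends step can be generated in a loop. This is a minimal sketch: the metro codes, keywords, timeframe, and the three pytrend calls come from the steps above, while the helper names (build_specs, fetch) and the output file naming are illustrative. How the pytrend object is created is not shown in the source, so it is passed in as an argument here.

```python
from itertools import product

# Keyword -> variable name used in the dataset description.
KEYWORDS = {"COVID": "srch_cov", "flight": "srch_flght", "airport": "srch_airprt"}

# Airport -> Google Trends metro area code, as listed in the steps.
METRO_CODES = {
    "ATL": "US-GA-524",
    "DEN": "US-CO-751",
    "DFW": "US-TX-623",
    "LAX": "US-CA-803",
    "ORD": "US-IL-602",
}

TIMEFRAME = "2020-02-15 2022-10-15"


def build_specs():
    """Return one request spec per keyword/airport pair (3 x 5 = 15)."""
    return [
        {
            "keyword": kw,
            "airport": airport,
            "geo": geo,
            "timeframe": TIMEFRAME,
            # Hypothetical naming scheme for the saved files.
            "outfile": f"{KEYWORDS[kw]}_{airport}.csv",
        }
        for kw, (airport, geo) in product(KEYWORDS, METRO_CODES.items())
    ]


def fetch(spec, pytrend):
    """Run one request on an existing pytrend object and save the result."""
    pytrend.build_payload(
        kw_list=[spec["keyword"]], timeframe=spec["timeframe"], geo=spec["geo"]
    )
    data = pytrend.interest_over_time()
    data.to_csv(spec["outfile"])
    return data


specs = build_specs()
print(len(specs))  # 15
```

Calling fetch(spec, pytrend) for each entry in specs would then produce one CSV per combination; no network request is made until fetch is called.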
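The CSSE time series files store one row per county and one column per date, with cumulative counts. The sketch below extracts one county and differences the cumulative series to obtain daily values. It rests on several assumptions not stated in the source: the column names Admin2 and Province_State, date headers in m/d/yy form, county labels without the word "County" (for example "Fulton"), and that the dataset's daily figures were derived by differencing. Retroactive corrections in the repository can make some differences negative.

```python
import re
from datetime import datetime

import pandas as pd

# Date columns are assumed to look like "2/15/20".
DATE_COL = re.compile(r"\d{1,2}/\d{1,2}/\d{2}")


def county_daily(wide, state, county, name,
                 start="2020-02-15", end="2022-10-15"):
    """Turn one county's cumulative wide-format row into daily counts."""
    row = wide[(wide["Province_State"] == state) & (wide["Admin2"] == county)]
    date_cols = [c for c in wide.columns if DATE_COL.fullmatch(c)]
    cumulative = row[date_cols].iloc[0].astype("int64")
    cumulative.index = pd.DatetimeIndex(
        [datetime.strptime(c, "%m/%d/%y") for c in date_cols]
    )
    # Daily value = today's cumulative total minus yesterday's.
    daily = cumulative.diff().loc[start:end]
    return daily.rename(name).rename_axis("date").reset_index()


# Made-up rows standing in for time_series_covid19_confirmed_US.csv.
sample = pd.DataFrame({
    "Admin2": ["Fulton", "Cobb"],
    "Province_State": ["Georgia", "Georgia"],
    "2/14/20": [0, 0],
    "2/15/20": [2, 1],
    "2/16/20": [5, 1],
})

atl_cases = county_daily(sample, "Georgia", "Fulton", "cov_case")
print(atl_cases)
```

The same function applies to the deaths file by passing "cov_death" as the name.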


Embry-Riddle Aeronautical University


Artificial Intelligence, Machine Learning, Airport, Air Passenger Transport, Neural Network