5  Results (continued)

5.1 Depth of Target by State and Stage of Season (Outdoor Stadiums Only)

Another question we’d like to explore is whether teams that play in colder states throw shallower passes during the later, colder weeks of the season than teams in southern states that stay warm. To do this, we first remove all games played in dome stadiums, where playing conditions are controlled regardless of the time of year. Then we group the remaining plays into two categories: early season (weeks 1-4) and late season (weeks 15-18).

Our expectation is that in colder states like Wisconsin, Illinois, New York, and Washington, teams throw the ball deep less often during the colder weeks late in the season, while warmer states like California, Florida, and Tennessee are less affected.

Let’s test this idea by calculating the change in Average Depth of Target (ADOT) between these two groupings for every state that hosts outdoor football games:

Code
library(dplyr)
library(ggplot2)
library(maps)
library(lubridate)
library(tidyr)

# Reload supplementary data fresh to ensure we have all data
supplementary_fresh <- read.csv("data/supplementary_data.csv")

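# Define dome teams to exclude
# Note: LAR shares SoFi Stadium with LAC but is not in this list,
# so Rams home games remain in the filtered data
dome_teams <- c("NO", "DET", "MIN", "DAL", "HOU", "ARI", "ATL", "IND", "LV", "LAC")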

# Map team abbreviations to states
team_to_state <- data.frame(
  team_abbr = c("ARI", "ATL", "BAL", "BUF", "CAR", "CHI", "CIN", "CLE", "DAL", "DEN",
                "DET", "GB", "HOU", "IND", "JAX", "KC", "LV", "LAC", "LAR", "MIA",
                "MIN", "NE", "NO", "NYG", "NYJ", "PHI", "PIT", "SF", "SEA", "TB",
                "TEN", "WAS"),
  state = c("Arizona", "Georgia", "Maryland", "New York", "North Carolina", "Illinois",
            "Ohio", "Ohio", "Texas", "Colorado",
            "Michigan", "Wisconsin", "Texas", "Indiana", "Florida", "Missouri",
            "Nevada", "California", "California", "Florida",
            "Minnesota", "Massachusetts", "Louisiana", "New Jersey", "New Jersey",
            "Pennsylvania", "Pennsylvania", "California", "Washington", "Florida",
            "Tennessee", "Maryland"),
  stringsAsFactors = FALSE
)

# Filter supplementary data for outdoor stadiums and valid pass plays
# Create early season (weeks 1-4) vs late season (weeks 15-18) groupings
pass_data <- supplementary_fresh |>
  filter(
    !home_team_abbr %in% dome_teams,
    pass_result %in% c("C", "I", "IN"),
    !is.na(pass_length),
    pass_length >= 0,  # Exclude negative/invalid values
    !is.na(week)
  ) |>
  mutate(
    season_period = case_when(
      week %in% 1:4 ~ "Early",
      week %in% 15:18 ~ "Late",
      TRUE ~ "Mid"
    )
  ) |>
  left_join(team_to_state, by = c("home_team_abbr" = "team_abbr")) |>
  filter(!is.na(state))

# Calculate average depth of target by state for early vs late season
state_early_late <- pass_data |>
  filter(season_period %in% c("Early", "Late")) |>
  group_by(state, season_period) |>
  summarise(
    avg_depth = mean(pass_length, na.rm = TRUE),
    n_plays = n(),
    .groups = "drop"
  )

# Add region column
state_early_late <- state_early_late |>
  mutate(region = tolower(trimws(state)))

# Pivot to get Early and Late in separate columns
state_comparison <- state_early_late |>
  tidyr::pivot_wider(
    names_from = season_period,
    values_from = c(avg_depth, n_plays)
  ) |>
  mutate(
    depth_change = avg_depth_Late - avg_depth_Early  # Positive = deeper passes late season
  ) |>
  filter(!is.na(depth_change))

# Print summary
state_comparison |>
  select(state, avg_depth_Early, avg_depth_Late, depth_change, n_plays_Early, n_plays_Late) |>
  print()
# A tibble: 15 × 6
   state  avg_depth_Early avg_depth_Late depth_change n_plays_Early n_plays_Late
   <chr>            <dbl>          <dbl>        <dbl>         <int>        <int>
 1 Calif…            9.2            9.89       0.690            100          200
 2 Color…            9.10          10.6        1.53              92          178
 3 Flori…           10.6           10.7        0.0757           288          581
 4 Illin…            8.97          10.9        1.97              89          163
 5 Maryl…            9.28          10.2        0.919            186          357
 6 Massa…           11.2           10.9       -0.289            111          169
 7 Misso…           11.4            9.78      -1.62              94          142
 8 New J…            8.94           9.11       0.172            213          362
 9 New Y…            9.04          10.6        1.58              85          156
10 North…            9.57           9.51      -0.0590            79          188
11 Ohio             10.9            9.38      -1.49             228          402
12 Penns…           10.5            8.89      -1.64             205          370
13 Tenne…           10.8           10.0       -0.737             88          237
14 Washi…           10.0           10.1        0.0877           109          191
15 Wisco…           11.0           10.7       -0.302            113          152
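
To make the pivot-and-difference step concrete, here is a minimal sketch on a tiny, made-up data frame. The states "A" and "B", their weeks, and their pass lengths are hypothetical values chosen only for illustration; the column names mirror the ones used in the pipeline above.

```r
library(dplyr)
library(tidyr)

# Hypothetical toy data: two states, four pass plays each (not real NFL data)
toy_passes <- data.frame(
  state = c("A", "A", "A", "A", "B", "B", "B", "B"),
  week = c(1, 2, 16, 17, 3, 4, 15, 18),
  pass_length = c(10, 12, 6, 8, 7, 9, 11, 13)
)

toy_comparison <- toy_passes |>
  # Label each play as early or late season (the toy data has no mid-season weeks)
  mutate(season_period = if_else(week <= 4, "Early", "Late")) |>
  group_by(state, season_period) |>
  summarise(avg_depth = mean(pass_length), n_plays = n(), .groups = "drop") |>
  # One row per state, with columns such as avg_depth_Early and avg_depth_Late
  pivot_wider(names_from = season_period, values_from = c(avg_depth, n_plays)) |>
  # Positive = deeper passes late in the season
  mutate(depth_change = avg_depth_Late - avg_depth_Early)

print(toy_comparison)
```

In this toy data, state A throws shorter late in the season and state B throws deeper, so the two `depth_change` values have opposite signs.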

Putting this table on a map, we can see visually whether our hypothesis holds, namely that southern states are less affected by cold weather when it comes to throwing the ball deep:

Code
# Get US state map data
us_states <- map_data("state")

# Merge with our comparison data
plot_data <- us_states |>
  left_join(state_comparison, by = "region")

# Create choropleth showing change in ADOT (Late - Early season)
# Use a more sensitive color scale that highlights small changes
ggplot(plot_data, aes(x = long, y = lat, group = group, fill = depth_change)) +
  geom_polygon(color = "white", linewidth = 0.3) +
  coord_fixed(1.3) +
  scale_fill_gradientn(
    colors = c("#2166ac", "#67a9cf", "#d1e5f0", "white", "#fddbc7", "#ef8a62", "#b2182b"),
    values = scales::rescale(c(
      min(state_comparison$depth_change, na.rm = TRUE),
      min(state_comparison$depth_change, na.rm = TRUE) * 0.5,
      min(state_comparison$depth_change, na.rm = TRUE) * 0.1,
      0,
      max(state_comparison$depth_change, na.rm = TRUE) * 0.1,
      max(state_comparison$depth_change, na.rm = TRUE) * 0.5,
      max(state_comparison$depth_change, na.rm = TRUE)
    )),
    na.value = "grey90",
    name = "ADOT Change\n(Wks 15-18 minus 1-4)\nyards",
    breaks = scales::pretty_breaks(n = 7)
  ) +
  labs(
    title = "Change in Average Depth of Target: Early Season to Late Season",
    subtitle = "Outdoor stadiums only | Weeks 1-4 vs 15-18 | Blue = shorter passes late, Red = deeper passes late",
    caption = "States without data shown in grey"
  ) +
  theme_minimal() +
  theme(
    plot.title = element_text(face = "bold", size = 14),
    plot.subtitle = element_text(size = 9),
    axis.text = element_blank(),
    axis.title = element_blank(),
    axis.ticks = element_blank(),
    panel.grid = element_blank(),
    legend.position = "right"
  )
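
Because the map is built with a left join on `region`, any state whose key does not match a map region would silently appear in grey. The sketch below shows one way to check the keys before plotting. The helper `unmatched_regions()` is hypothetical and not part of the original analysis, and base R's built-in `state.name` vector is used as a stand-in for `unique(us_states$region)` so the example runs without the maps package.

```r
# Hypothetical helper: return join keys with no counterpart among the map regions
unmatched_regions <- function(keys, map_regions) {
  setdiff(unique(keys), unique(map_regions))
}

# Assumption: lowercased built-in state names stand in for unique(us_states$region)
map_regions <- tolower(state.name)

# The 15 states in the summary table, keyed the same way as in the analysis
analysis_states <- c("California", "Colorado", "Florida", "Illinois", "Maryland",
                     "Massachusetts", "Missouri", "New Jersey", "New York",
                     "North Carolina", "Ohio", "Pennsylvania", "Tennessee",
                     "Washington", "Wisconsin")
region_keys <- tolower(trimws(analysis_states))

# Keys that would fail to join (ideally none)
print(unmatched_regions(region_keys, map_regions))

# A city name used in place of a state would be flagged
print(unmatched_regions(c(region_keys, "seattle"), map_regions))
```

In the real pipeline, the same check could be run against `state_comparison$region` and the `region` column returned by `map_data("state")`.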

From the map, we see that our hypothesis holds in some ways but is contradicted in others. Pennsylvania, Wisconsin, Ohio, Missouri, and Massachusetts all get extremely cold by the later weeks of the football season, and accordingly the ADOT for passing plays in these states drops compared to the beginning of the season, particularly in Missouri, Ohio, and Pennsylvania (each down roughly 1.5 to 1.6 yards). Additionally, warm states like California and Florida do not see a decrease in ADOT, and in fact see a slight increase. However, contrary to our expectations, passing plays in Washington, Colorado, New York, and especially Illinois do not see a decrease in ADOT. In fact, Illinois, New York, and Colorado show the three largest increases in ADOT of any state in our dataset. Similarly, Tennessee, which is generally a “warm” state in comparison to most others, sees one of the larger decreases in ADOT in our dataset.

So, overall, these results do not let us conclude that colder weather causes teams to lower their ADOT.
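
As a rough cross-check of this reading, the sketch below tallies the direction of change and averages `depth_change` within the informal "cold" and "warm" groupings used in the discussion. The values are transcribed from the printed summary table. The group labels are an assumption based on how states are described in the text, and states not mentioned there are left as "unlabeled". The group means are unweighted, so this is an illustration and not a formal test.

```r
# Values transcribed from the printed summary table
state_changes <- data.frame(
  state = c("California", "Colorado", "Florida", "Illinois", "Maryland",
            "Massachusetts", "Missouri", "New Jersey", "New York",
            "North Carolina", "Ohio", "Pennsylvania", "Tennessee",
            "Washington", "Wisconsin"),
  depth_change = c(0.690, 1.53, 0.0757, 1.97, 0.919, -0.289, -1.62, 0.172,
                   1.58, -0.0590, -1.49, -1.64, -0.737, 0.0877, -0.302)
)

# Assumption: informal climate labels taken from the discussion in the text
cold_states <- c("Wisconsin", "Illinois", "New York", "Washington", "Colorado",
                 "Pennsylvania", "Ohio", "Missouri", "Massachusetts")
warm_states <- c("California", "Florida", "Tennessee")

state_changes$climate <- ifelse(
  state_changes$state %in% cold_states, "cold",
  ifelse(state_changes$state %in% warm_states, "warm", "unlabeled")
)

# Number of states whose ADOT fell (-1) or rose (1) late in the season
direction_counts <- table(sign(state_changes$depth_change))

# Unweighted mean ADOT change within each informal climate group
group_means <- aggregate(depth_change ~ climate, data = state_changes, FUN = mean)

# States ordered from largest decrease to largest increase
ranked <- state_changes[order(state_changes$depth_change), ]

print(direction_counts)
print(group_means)
print(ranked)
```

If the cold and warm group means come out close to each other, that is consistent with the conclusion above; weighting by `n_plays_Early` and `n_plays_Late` would be a natural refinement.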