game_data_all <- rio::import("https://raw.githubusercontent.com/nflverse/nfldata/refs/heads/master/data/games.csv") |> filter(season %in% c(2024, 2025) & !is.na(result))
The load_schedules() function returns a data frame with 46 variables for metrics including game time, temperature, wind, playing surface, outdoor or dome, point spreads, and more. Run print(dictionary_schedules) to see a data frame with a data dictionary of all the fields.
Now that I have the data, I need to process it. I’m going to remove some ID fields I know I don’t want and keep everything else:
cols_to_remove <- c("old_game_id", "gsis", "nfl_detail_id", "pfr", "pff",
"espn", "ftn", "away_qb_id", "home_qb_id", "stadium_id")
games <- game_data_all |>
  select(-all_of(cols_to_remove))
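One detail worth knowing about that step: all_of() is strict, so it raises an error if any name in the vector is missing from the data frame, which helps catch typos in a column list. Here is a minimal sketch of that behavior next to the more lenient any_of(). The toy_games data frame and its values are made up for illustration and are not nflverse data.

```r
library(dplyr)

# Hypothetical toy data standing in for the real schedule data
toy_games <- tibble(
  game_id = c("g1", "g2"),
  home_score = c(24, 17),
  espn = c(1L, 2L)
)

# "pff" is deliberately absent from toy_games
cols_to_drop <- c("espn", "pff")

# any_of() silently skips names that are not present
lenient <- toy_games |> select(-any_of(cols_to_drop))

# all_of() raises an error when a name is missing; catch it here
strict <- tryCatch(
  toy_games |> select(-all_of(cols_to_drop)),
  error = function(e) "error"
)
```

If the list of columns might drift between data releases, any_of() is the more forgiving choice; all_of() is the safer one when a missing column should stop the script.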
Although it’s obvious from the scores which teams won and lost, there aren’t actually columns for the winning and losing teams. In my tests, the LLM didn’t always write appropriate SQL when I asked about winning percentages. Adding team_won and team_lost columns makes that clearer for a model and simplifies the SQL queries needed. Then, I save the results to a feather file, a fast format for either R or Python: