Project structure

The project is structured as follows:

  • source for EDA jobs in the root of the src/ directory:

    These run autonomously, but can also be run manually from the demo notebooks in the notebooks/ directory (listed below)
    • nfl_00_load_nflverse_data.py - downloads the nflverse data into local storage

    • nfl_01_build_nfl_database.py - builds the nfl database from the nflverse data

    • nfl_02_prepare_weekly_stats.py - merges metrics from all datasets into a single stats dataset

    • nfl_03_perform_feature_selection.py - performs feature selection on the nfl data

    • nfl_04_merge_game_features.py - merges our features with the core nfl play actions dataset

    • nfl_main.py - orchestrates the jobs above

    • config.py - configuration file for the project

  • three notebooks in the root of the notebooks/ directory:

    • nfl_load_nflverse_data_demo.ipynb - demos manually running the load and build jobs (nfl_00 - 01)

    • nfl_perform_feature_selection_demo.ipynb - demos manually running weekly stats and feature selection jobs (nfl_02 - 04)

    • nfl_win_loss_classification_experiment2.ipynb - experiment 2 notebook

  • ML models in the root of the models/ directory

  • documentation in the doc/ directory

  • schemas in the schemas/ directory - used during loading to validate the incoming data from nflverse
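
The numeric prefixes on the job scripts (nfl_00 through nfl_04) encode the order in which the jobs run. The following is a minimal sketch of how that ordering could be derived from the file names alone. It is not taken from nfl_main.py, whose actual implementation is not shown here; the helper name ordered_jobs and the regular expression are assumptions made for illustration.

```python
import re

# Assumption: job scripts follow the naming pattern nfl_NN_<description>.py,
# where NN is a two-digit step number. Files such as nfl_main.py and
# config.py do not match and are therefore skipped.
JOB_PATTERN = re.compile(r"^nfl_(\d{2})_\w+\.py$")


def ordered_jobs(filenames):
    """Return the job script names sorted by their numeric prefix.

    Hypothetical helper: files that do not match the job naming
    pattern are ignored.
    """
    jobs = []
    for name in filenames:
        match = JOB_PATTERN.match(name)
        if match:
            # Keep the step number alongside the name so we can sort on it.
            jobs.append((int(match.group(1)), name))
    return [name for _, name in sorted(jobs)]


# File names as listed for the src/ directory, deliberately out of order.
src_files = [
    "nfl_main.py",
    "nfl_03_perform_feature_selection.py",
    "config.py",
    "nfl_00_load_nflverse_data.py",
    "nfl_04_merge_game_features.py",
    "nfl_01_build_nfl_database.py",
    "nfl_02_prepare_weekly_stats.py",
]

for step, job in enumerate(ordered_jobs(src_files)):
    print(f"step {step}: {job}")
```

An orchestrator could then invoke each script in this order, for example with the standard library's runpy.run_path or subprocess.run. Whether the real jobs are run as scripts or imported as modules is not stated in this page, so treat that part as an open design choice.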
