3. Project structure
A good project structure makes the code easier to maintain, collaborate on, and evolve. An article by Baran Köseoğlu (Towards Data Science) describes the problem very well. It covers the Cookiecutter tool in particular, along with the cookiecutter-data-science template, which was built specifically for data science projects.
Here is how to install Cookiecutter and generate a project from that template:
pip install cookiecutter
cookiecutter https://github.com/drivendata/cookiecutter-data-science
And here is the project structure:
├── LICENSE
├── Makefile  # makefile with commands like `make data` or `make train`
├── README.md  # the top-level README for developers using this project.
├── config  # all files about database configuration, path, etc.
├── data
│   ├── external  # data from third party sources.
│   ├── interim  # intermediate data that has been transformed.
│   ├── processed  # the final, canonical data sets for modeling.
│   └── raw  # the original, immutable data dump.
│
├── docs  # a default Sphinx project; see sphinx-doc.org for details
│
├── models  # trained and serialized models, model predictions, or model summaries
│
├── notebooks  # jupyter notebooks. Naming convention is a number (for ordering),
│              # the creator's initials, and a short `-` delimited description,
│              # e.g. 1.0-jqp-initial-data-exploration.
│
├── refs  # data dictionaries, manuals, and all other explanatory materials.
│
├── reports  # generated analysis as HTML, PDF, LaTeX, etc.
│   └── figures  # generated graphics and figures to be used in reporting
│
├── requirements.txt  # the requirements file for reproducing the analysis environment
│
├── setup.py  # makes project pip installable (pip install -e .) so src can be imported
│
├── src  # source code for use in this project
│   ├── __init__.py  # makes src a Python module
│   ├── main.py  # main file of the process
│   │
│   ├── data  # scripts to download or generate data
│   │   └── make_dataset.py
│   │
│   ├── eda  # scripts to analyse the data
│   │
│   ├── features  # scripts to turn raw data into features for modeling
│   │   └── build_features.py
│   │
│   ├── models  # scripts to train models and then use trained models to make predictions
│   │   ├── predict_model.py
│   │   └── train_model.py
│   │
│   ├── utils  # transverse scripts
│   │
│   └── visualization  # scripts to create exploratory and results oriented visualizations
│       └── visualize.py
│
├── tests  # test code for testing the project
│
├── tox.ini  # tox file with settings for running tox; see tox.testrun.org
│
└── venv  # virtual environment
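
If you would rather not depend on Cookiecutter, the directory skeleton above can be approximated with a few lines of standard-library Python. This is only a minimal sketch, not what the template itself does: the `scaffold` function, the `DIRECTORIES` list, and the `my_project` name are hypothetical, and only the folders are created (files such as `LICENSE`, `Makefile`, or `setup.py`, and the `venv` environment, are left out).

```python
from pathlib import Path
import tempfile

# Folders copied from the tree above. Top-level files and the venv
# directory are intentionally omitted from this sketch.
DIRECTORIES = [
    "config",
    "data/external",
    "data/interim",
    "data/processed",
    "data/raw",
    "docs",
    "models",
    "notebooks",
    "refs",
    "reports/figures",
    "src/data",
    "src/eda",
    "src/features",
    "src/models",
    "src/utils",
    "src/visualization",
    "tests",
]


def scaffold(root: Path) -> list[Path]:
    """Create the directory skeleton under `root` and return the created paths."""
    created = []
    for relative in DIRECTORIES:
        path = root / relative
        # parents=True builds intermediate folders; exist_ok=True makes reruns safe.
        path.mkdir(parents=True, exist_ok=True)
        created.append(path)
    # An empty __init__.py lets `src` be imported as a package.
    (root / "src" / "__init__.py").touch()
    return created


if __name__ == "__main__":
    # A temporary directory keeps the demonstration from touching real files.
    with tempfile.TemporaryDirectory() as tmp:
        project = Path(tmp) / "my_project"  # hypothetical project name
        for path in scaffold(project):
            print(path.relative_to(project))
```

Because `mkdir` is called with `exist_ok=True`, running the function again on an existing project only fills in missing folders and leaves existing content alone.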