
Project structure - Cookiecutter

Industrialisation
IndustrialisationProjet - This article is part of a series.
Part 3: This article

3. Project structure

A good project structure makes code easier to maintain, to collaborate on, and to evolve. This article by Baran Köseoğlu (Towards Data Science) describes the problem very well. In particular, he discusses the Cookiecutter tool and a structure created specifically for data science projects (the cookiecutter-data-science template).

Here is how to install Cookiecutter and generate a project from that template:

pip install cookiecutter
cookiecutter https://github.com/drivendata/cookiecutter-data-science
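The same generation step can also be run from Python through Cookiecutter's `cookiecutter()` function, for example inside a setup script. The sketch below mirrors the command above under a few assumptions: `no_input=True` makes Cookiecutter use the template's default answers instead of prompting, and the `project_name` override is an assumption about the template's prompts (check the questions Cookiecutter actually asks you). `generate_project` is a hypothetical helper name.

```python
"""Minimal sketch: generate the project from Python instead of the CLI.

Assumes `pip install cookiecutter` has been run. The "project_name" key is an
assumption about the template's prompts; adjust it to what the template asks.
"""
from typing import Optional

TEMPLATE_URL = "https://github.com/drivendata/cookiecutter-data-science"


def generate_project(output_dir: str = ".", project_name: Optional[str] = None) -> str:
    """Hypothetical helper: render the template into `output_dir`."""
    # Imported lazily so this file can be loaded even where cookiecutter is absent.
    from cookiecutter.main import cookiecutter

    # Values in extra_context override the template defaults; with
    # no_input=True, Cookiecutter uses those defaults instead of prompting.
    extra_context = {"project_name": project_name} if project_name else None
    return cookiecutter(
        TEMPLATE_URL,
        no_input=True,
        extra_context=extra_context,
        output_dir=output_dir,
    )


if __name__ == "__main__":
    # Requires network access to GitHub to fetch the template.
    print(generate_project(project_name="my_project"))
```

Running the script should clone the template and print the path of the generated folder, exactly as the shell command would create it in the current directory.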

And here is the project structure:

├── LICENSE
├── Makefile           # makefile with commands like `make data` or `make train`
├── README.md          # the top-level README for developers using this project.
├── config             # configuration files (database connection, paths, etc.)
├── data
│   ├── external       # data from third party sources.
│   ├── interim        # intermediate data that has been transformed.
│   ├── processed      # the final, canonical data sets for modeling.
│   └── raw            # the original, immutable data dump.
│
├── docs               # a default Sphinx project; see sphinx-doc.org for details
│
├── models             # trained and serialized models, model predictions, or model summaries
│
├── notebooks          # jupyter notebooks. Naming convention is a number (for ordering),
│                      #   the creator's initials, and a short `-` delimited description,
│                      #   e.g. 1.0-jqp-initial-data-exploration.
│
├── refs               # data dictionaries, manuals, and all other explanatory materials.
│
├── reports            # generated analysis as HTML, PDF, LaTeX, etc.
│   └── figures        # generated graphics and figures to be used in reporting
│
├── requirements.txt   # the requirements file for reproducing the analysis environment
│
├── setup.py           # makes project pip installable (pip install -e .) so src can be imported
│
├── src                # source code for use in this project
│   ├── __init__.py    # makes src a Python package
│   ├── main.py        # main file of the process
│   │
│   ├── data           # scripts to download or generate data
│   │   └── make_dataset.py
│   │
│   ├── eda            # scripts to analyse the data
│   │
│   ├── features       # scripts to turn raw data into features for modeling
│   │   └── build_features.py
│   │
│   ├── models         # scripts to train models and then use trained models to make predictions
│   │   ├── predict_model.py
│   │   └── train_model.py
│   │
│   ├── utils          # cross-cutting helper scripts
│   │
│   └── visualization  # scripts to create exploratory and results oriented visualizations
│       └── visualize.py
│
├── tests              # test code for testing the project
│
├── tox.ini            # tox file with settings for running tox; see tox.testrun.org
│
└── venv               # virtual environment
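The naming convention given for `notebooks` in the tree (an ordering number, the creator's initials, and a short `-` delimited description, e.g. `1.0-jqp-initial-data-exploration`) is easy to check automatically, for example in a test or a pre-commit step. A minimal sketch, assuming notebooks are `.ipynb` files, initials are 2 to 4 lowercase letters, and description words are lowercase letters or digits; these character classes and the helper names are assumptions, not part of the template.

```python
import re
from pathlib import Path
from typing import List, NamedTuple, Optional

# Assumed pattern: "<number>-<initials>-<description>.ipynb",
# e.g. "1.0-jqp-initial-data-exploration.ipynb".
NOTEBOOK_NAME = re.compile(
    r"^(?P<number>\d+(?:\.\d+)?)"                    # ordering number, e.g. 1 or 1.0
    r"-(?P<initials>[a-z]{2,4})"                     # creator's initials, e.g. jqp
    r"-(?P<description>[a-z0-9]+(?:-[a-z0-9]+)*)"    # dash-delimited description
    r"\.ipynb$"
)


class NotebookName(NamedTuple):
    number: str
    initials: str
    description: str


def parse_notebook_name(filename: str) -> Optional[NotebookName]:
    """Return the parsed parts, or None if the name breaks the convention."""
    match = NOTEBOOK_NAME.match(filename)
    if match is None:
        return None
    return NotebookName(match["number"], match["initials"], match["description"])


def invalid_notebooks(folder: str = "notebooks") -> List[str]:
    """List the notebook files in `folder` whose names break the convention."""
    return sorted(
        path.name
        for path in Path(folder).glob("*.ipynb")
        if parse_notebook_name(path.name) is None
    )


if __name__ == "__main__":
    print(parse_notebook_name("1.0-jqp-initial-data-exploration.ipynb"))
    print(invalid_notebooks())
```

Here `parse_notebook_name("1.0-jqp-initial-data-exploration.ipynb")` returns `NotebookName(number='1.0', initials='jqp', description='initial-data-exploration')`, while a default name such as `Untitled.ipynb` returns `None` and would be reported by `invalid_notebooks()`.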

Sources

Thibault CLEMENT - Intechnia
Author
Data scientist