Big refactor

Summary

Refactor the repository into a consistent project structure.

Current behaviour/setbacks

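Everything is a mess: files are not sorted into sub-directories, most of the source code sits in yeast8model.py, and large blocks of code are repeated across the board.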

Desired behaviour/advantages

Follow some kind of project structure: (inspiration: https://github.com/drivendata/cookiecutter-data-science and aliby/skeletons)

  • data
    • gemfiles (genome-scale metabolic models)
    • interim (for e.g. .pkl files)
    • processed
  • docs (dump .org files in here)
  • notebooks (naming convention: 01-XXXX.ipynb, for ordering. The number has nothing to do with issues)
  • reports (e.g. PDF outputs)
    • figures (e.g. PNG/PDF files)
  • poetry.lock & pyproject.toml
  • src (source code -- most of yeast8model.py will go into this)
    • __init__.py
    • data (download/generate data)
    • constants
    • gem (dealing with cobrapy model, naming it gem rather than cobra to prevent confusion)
    • calc (performing various calculations)
    • viz (visualisation, e.g. plots)
  • scripts (run scripts, should produce usable outputs)
  • dev (dev scripts, use for developing new features & debugging)
  • tests (unit tests)
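
The layout above could be created in one go with a small script. This is a minimal sketch, not part of the original plan: `LAYOUT` and `scaffold` are hypothetical names, and it deliberately leaves out the children of src (data, constants, gem, calc, viz), since the list does not say whether these are single modules or sub-packages.

```python
from pathlib import Path

# Directories from the proposed structure, relative to the project root.
# The children of src are omitted on purpose (assumption: not yet decided
# whether they are modules or sub-packages).
LAYOUT = [
    "data/gemfiles",
    "data/interim",
    "data/processed",
    "docs",
    "notebooks",
    "reports/figures",
    "src",
    "scripts",
    "dev",
    "tests",
]


def scaffold(root):
    """Create every directory in LAYOUT under root; return the paths."""
    root = Path(root)
    created = []
    for rel in LAYOUT:
        path = root / rel
        # exist_ok=True makes the script safe to re-run on a partly sorted repo.
        path.mkdir(parents=True, exist_ok=True)
        created.append(path)
    # An empty __init__.py lets src be imported as a package.
    (root / "src" / "__init__.py").touch()
    return created
```

Calling `scaffold(".")` from the repository root only adds missing directories and does not move or overwrite existing files.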

Implementation sketch

  • Isolate all of this on a separate branch (with frequent merges from master?)
  • Create sub-directories and sort files according to them.
  • Update references to modules.
    • src
    • scripts
    • notebooks
    • dev
  • Update references to files.
    • data/gemfiles
    • data/interim
    • data/lookup
  • Add snippet to all notebooks to allow import of local functions.
  • Test that stuff still works -- start from the highest-use scripts & notebooks (the rest can be fixed as errors arise)
  • (maybe?) jupytext to convert notebooks to files that play well with emacs.
  • Convert notebooks that don't have much literate programming into scripts.
  • Remove large blocks of repetitive code across the board.
  • Low-effort refactoring, i.e. the current issues that have 'refactor' labels.
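
One possible form of the notebook snippet for importing local functions is sketched below. It assumes that pyproject.toml sits at the project root and that notebooks are run from somewhere inside the repository; `find_project_root` and `enable_local_imports` are hypothetical helper names, not existing code.

```python
import sys
from pathlib import Path


def find_project_root(start=None, marker="pyproject.toml"):
    """Walk upwards from start until a directory containing marker is found."""
    start = Path(start) if start is not None else Path.cwd()
    start = start.resolve()
    for candidate in [start, *start.parents]:
        if (candidate / marker).is_file():
            return candidate
    raise FileNotFoundError(f"No {marker} found at or above {start}")


def enable_local_imports(start=None):
    """Put the project root on sys.path so that `import src` works."""
    root = find_project_root(start)
    root_str = str(root)
    if root_str not in sys.path:
        sys.path.insert(0, root_str)
    return root


# Typical use in the first cell of a notebook (left commented out here,
# because it needs to run inside the repository):
# root = enable_local_imports()
# gemfiles_dir = root / "data" / "gemfiles"
```

Returning the root also gives notebooks a single anchor for file references such as data/gemfiles and data/interim, so paths do not depend on the directory a notebook is launched from.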
Edited by Arin Wongprommoon