Data analysis with Python and Poetry

In my current role I work alongside data scientists, supporting them with code and infrastructure. Our aim is to use data to make development work in low- and middle-income countries more effective as those countries work towards the Sustainable Development Goals.

I’m always keen to expand my skills, so I have been looking for opportunities to take on interesting data science projects and get a better understanding of the full range of work we do.

Recently I found some interesting data that I was keen to explore, with a particular problem in mind that I hoped it could help solve. I wanted to explore the data locally first, using Python and Jupyter notebooks. Here is the best way I found to set that environment up.

First, you will need to install Python locally. For this I use pyenv, which makes it easy to manage different Python versions across multiple projects.

We are going to use Poetry to manage our dependencies for this project. For those familiar with the JavaScript ecosystem, Poetry takes a similar approach to npm in Node.js projects: it defines a pyproject.toml file which resembles a package.json. Poetry will also handle our virtual environment for us, as we will see later.
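As an illustration, the pyproject.toml for a project like this might look roughly as follows. The name, author, and version constraints are placeholders, not the exact file Poetry will generate for you:

```toml
[tool.poetry]
name = "my-project"
version = "0.1.0"
description = ""
authors = ["Your Name <you@example.com>"]

# Declared dependencies, analogous to the dependencies
# section of a package.json
[tool.poetry.dependencies]
python = "^3.10"
pandas = "^2.0"
jupyter = "^1.0"
ipykernel = "^6.0"

[build-system]
requires = ["poetry-core"]
build-backend = "poetry.core.masonry.api"
```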

For now we need to set up our project structure. You can either bootstrap the entire project with poetry new my-project, or, as we will do here, create the folder structure yourself and then use poetry init to generate the necessary files. So we run

mkdir my-project
cd my-project
poetry init

On running init, Poetry will walk you through the configuration options, most of which have sensible defaults. You will also have the opportunity to define any packages you would like to install (you can also add them later with poetry add). In this case we search for and add pandas for the data analysis, plus jupyter and ipykernel to let us spin up a notebook.

Note that these packages are not installed globally on your machine. Running jupyter --version at this point will return something like zsh: command not found: jupyter. Instead we need to install the packages into a virtual environment and open a shell inside it. We do this by running

poetry install
poetry shell

We now have a terminal process running which has access to the aforementioned packages, without having needed to install them globally. (If poetry shell is not available in your version of Poetry, prefixing individual commands with poetry run achieves the same thing.)

Within this process we can run

jupyter notebook

to start the Jupyter server and use any of the other packages we defined to start digging into our data.
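As a sketch of where this leaves us, a first notebook cell might look something like the following. The inline data is made up purely for illustration; with a real dataset you would load a file with something like pd.read_csv instead:

```python
import pandas as pd

# Illustrative data only; in practice you would load a real
# dataset, e.g. df = pd.read_csv("my-data.csv")
df = pd.DataFrame({
    "country": ["Kenya", "Nepal", "Bolivia"],
    "indicator": [0.58, 0.61, 0.70],
})

# A quick first look at the data
print(df.head())
print(df["indicator"].mean())
```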

Resources I found useful for developing this process
[1] Hypermodern Python
[2] Poetry Documentation

Thanks for reading

I hope you enjoyed this article. If you have any thoughts about it please feel free to reach out. You can also follow me on Twitter where I post my latest updates.