Intro to packaging and dependency management for python with poetry

February 2, 2022

Intro

Python has an amazing learning curve! You don’t need to jump between multiple files in a complex project structure to find and run your code. Instead, you can just open up an interactive shell and start playing around. As soon as you reach the limits of typing into the python interactive shell, just open up your favourite text editor, write your code, save it with a .py extension, and you have a working python script. What if your script gets a bit too long? Take some of your code and put it into a different file. Then start importing. It just works.

But software engineering is not just the code you write, it’s also the collaboration that’s required to tackle the complexity and scale of modern development projects.

In most programming projects, you will want to use open source (and possibly internal or closed source) packages, document your code, test it, make sure it adheres to your (and community) standards and maybe even publish it either internally or to PyPI.

In many programming languages, a lot of tools for all of your collaboration needs are either a part of the core language ecosystem (like cargo in Rust) or are so ubiquitous that it’s really hard to learn the language without them (like npm in JavaScript). In python… well, it’s a bit more complicated. There are standard tools and ways of doing things, like setuptools and requirements.txt, but a lot of them seem antiquated and miss so many important functions that it’s really hard to recommend them. Some, like an autoformatter, are just entirely missing.

You might have gotten to the point at which you’re ready to abandon the relative simplicity of one file scripts, or maybe you’ve gotten tired of juggling between virtual environments to keep your dependencies clean and separate for every project, or you’re just curious about how to standardize your development in a modern and highly collaborative way. Looks like a great moment for you to be introduced to the python development stack we have at Curvestone.

Packaging python

In the age of cloud computing, you never know where your code will be deployed and what it will have to be compatible with. That’s why we think it is really important to have a standard structure for all of your code: one that is well defined, contains all of the specifications required to use it, and is recognized by as many tools and people as possible. For python, that structure is a package.

The problem is, python packaging standards are currently being redefined, and the tools for package management are lagging far behind their counterparts in other modern programming languages.

Out of many alternatives, we have decided on poetry. What convinced me was its similarity to Rust’s cargo (it also borrows a lot of features from tools like JavaScript’s npm and yarn).

Poetry replaces all of the manual work involved in packaging python with handy automations and makes it so efficient and easy that you’ll want to put every little piece of your code into a package.

Your manually created requirements.txt files will all be replaced by completely automated poetry.lock files. You will never have to maintain your own development virtual environments. All you need to do is jump into the project directory, and the environment will already be there.

What about defining and updating dependencies? You will get a simple CLI with multiple useful commands, a fast resolver, and access to all of the modern ways of defining version constraints.

Creating a project

To install and understand the basics of poetry, follow the excellent instructions at https://python-poetry.org/docs/

For now, we’ll skip the basics and jump straight into the action with:

poetry new example-project --src

For a project name, use lowercase letters and dashes. Poetry will create the main package with underscores instead, so that it is importable.
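The naming convention can be sketched in a few lines of python. This is only an illustration of the dash-to-underscore rule, not poetry’s actual implementation, and the helper name to_package_name is made up for this example.

```python
def to_package_name(project_name: str) -> str:
    """Turn a dashed project name into an importable package name.

    Hypothetical helper: it only mirrors the dash-to-underscore convention
    and is not how poetry itself is implemented.
    """
    return project_name.lower().replace("-", "_")


package = to_package_name("example-project")
print(package)                 # example_project
print(package.isidentifier())  # True, so "import example_project" is valid python
```

A name with a dash in it is not a valid python identifier, which is why the package directory cannot simply reuse the project name.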

With packaged python, the trick is to never import modules from your local directory. Instead, you install them first and then import them from your installed packages. That way the package structure and dependencies will be managed by either poetry or pip.

Adding --src makes poetry put our package code in a folder named src. As it is common to run your code from the project root directory, this avoids the ambiguity of having a package with the same name both installed and on the local path.
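To see why the src folder removes that ambiguity, here is a small self-contained sketch. It builds two throwaway project layouts in a temporary directory and asks python’s import machinery whether example_project can be found from each project root. The directory names flat_project and src_project are assumptions made up for the demo.

```python
import tempfile
from importlib.machinery import PathFinder
from pathlib import Path


def importable_from(directory: Path, package: str) -> bool:
    """Check whether `package` can be found when `directory` is on the import path."""
    return PathFinder.find_spec(package, [str(directory)]) is not None


def compare_layouts() -> tuple:
    """Build a flat and a src layout and check each from its project root."""
    with tempfile.TemporaryDirectory() as tmp:
        # Flat layout: the package sits directly in the project root.
        flat_root = Path(tmp) / "flat_project"
        (flat_root / "example_project").mkdir(parents=True)
        (flat_root / "example_project" / "__init__.py").write_text("")

        # src layout: the package is hidden one level down, inside src.
        src_root = Path(tmp) / "src_project"
        (src_root / "src" / "example_project").mkdir(parents=True)
        (src_root / "src" / "example_project" / "__init__.py").write_text("")

        return (
            importable_from(flat_root, "example_project"),
            importable_from(src_root, "example_project"),
        )


print(compare_layouts())  # (True, False)
```

With the flat layout, running python from the project root can pick up the local folder instead of the installed package. With the src layout it cannot, so any import that works has to come from the installed package.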

If you jump straight into the newly created project folder, you’ll get a bunch of new superpowers for your shell, all provided by poetry.

Using a dev environment

poetry install will install your package into the complimentary poetry environment that comes with every poetry project. It is just a python virtual environment, and if you are already in one, poetry will use that one instead. However, I recommend deactivating your environment and letting poetry do its magic. That way all of your dev environments will be separate and specific to the project you are currently working on.

What’s more, poetry will create a poetry.lock file, where all of the dependencies are pinned. That way, you’ll never end up experiencing the nausea of trying to understand why a slight version discrepancy in a dependency of your dependency breaks the code when your colleague tries to run it. If you’re already anxious about the micromanagement that requirements.txt usually requires, don’t worry. Poetry manages the lock file for you.

poetry shell is the simplest way to access your new environment. This works the same as activating a regular python virtual environment.

Instead of jumping into a poetry shell, you can also make a single command use your env by prefixing it with poetry run. For example, poetry run python will open up a python interactive shell with your poetry environment activated!
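If you are ever unsure which environment a python session is actually using, a quick check from inside python can help. This is a generic sketch that relies only on the standard library; the function name in_virtual_environment is my own.

```python
import sys


def in_virtual_environment() -> bool:
    """Return True when the interpreter is running inside a virtual environment."""
    # Inside a virtual environment sys.prefix points at the environment,
    # while sys.base_prefix still points at the base python installation.
    return sys.prefix != sys.base_prefix


print(sys.executable)            # path of the interpreter that is running
print(in_virtual_environment())
```

Pasted into a session started with poetry run python, the printed interpreter path should point into the environment that poetry manages for the project.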

If you don’t have any code in your project yet, add some and try importing it from the interactive python shell. You don’t have to reinstall your package via poetry. It has been installed in editable mode, so every change you make is immediately importable (you might need to restart your interactive environments or use importlib.reload).
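Here is a minimal sketch of what importlib.reload does in a running session. To stay self-contained, it uses a throwaway module in a temporary directory instead of a real poetry package. The module name scratch_module and its VALUE constant are invented for the demo.

```python
import importlib
import sys
import tempfile
from pathlib import Path


def reload_demo() -> tuple:
    """Import a module, edit its source on disk, and reload it."""
    with tempfile.TemporaryDirectory() as tmp:
        module_file = Path(tmp) / "scratch_module.py"
        module_file.write_text("VALUE = 1\n")

        sys.path.insert(0, tmp)
        try:
            importlib.invalidate_caches()
            module = importlib.import_module("scratch_module")
            before = module.VALUE

            # Simulate editing the file while the session is still running.
            module_file.write_text("VALUE = 22\n")
            importlib.reload(module)
            after = module.VALUE
        finally:
            # Clean up so the demo leaves no trace in the session.
            sys.path.remove(tmp)
            sys.modules.pop("scratch_module", None)

    return before, after


print(reload_demo())  # (1, 22)
```

Keep in mind that reload only re-executes the module you pass in. Names that were copied out earlier with from module import name keep their old values, which is why restarting the shell is often the simpler option.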

Poetry in VSCode

If, like me, you’re using VSCode, you might wonder how to have your linter recognize the poetry virtual environment as its interpreter. As VSCode is now compatible with poetry, it should happen automatically.

If it doesn’t, make sure that your poetry project is open at the root of your VSCode project, and then try setting the interpreter for your project manually.

If you are in a monorepo, just add the project folder to your workspace. You don’t need to have a workspace already, as one will be created automatically when you add the folder. Now you should be able to set the interpreter manually for each of your roots. Try this for your monorepos with other programming languages too. This setup solves a lot of compatibility issues for VSCode!

Dependency management

We ML and data people love to import numpy. To do this we need to specify it as a dependency. In poetry, there are two ways to do this. You can go on and just modify the pyproject.toml file in the project directory, or use poetry commands:

poetry add numpy will add numpy as a regular dependency. It might fail, as numpy tends to require a specific range of python versions that is not compatible with the caret-defined “^3.8”. If that happens, just update the python version in pyproject.toml as required and try again.

But what is this caret requirement? Great question! This is one of the ways in which you can specify dependencies in poetry and many other modern tools. Caret requirements focus on compatibility and conciseness, and if your dependency follows semantic versioning, you can be almost sure that it will not break until its version no longer fits your specification.

How do caret requirements work? “^1.0.1” means that your code will be compatible with any version from “1.0.1” up to, but not including, the next major release. So “2.0.0” will be the first version that will not work, which makes sense, as a change in major version will inevitably come with incompatibilities. Keep in mind that caret requirements work a bit differently for versions under “1.0.0”. Make sure to have a look at the examples in the poetry docs!
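The rule can be written down as a small worked example. The sketch below is my own simplified reading of caret requirements, not poetry’s resolver: it only handles plain numeric versions with up to three components, and the function names parse, caret_bounds and satisfies are made up for illustration.

```python
def parse(version: str) -> tuple:
    """Parse "1.2" or "1.2.3" into a three-part tuple of integers."""
    parts = [int(p) for p in version.split(".")]
    return tuple(parts + [0] * (3 - len(parts)))


def caret_bounds(requirement: str) -> tuple:
    """Return (inclusive lower bound, exclusive upper bound) for a caret requirement."""
    parts = [int(p) for p in requirement.lstrip("^").split(".")]
    lower = tuple(parts + [0] * (3 - len(parts)))
    # The upper bound bumps the leftmost non-zero component.
    # If every given component is zero, the last given one is bumped.
    bump = next((i for i, p in enumerate(parts) if p != 0), len(parts) - 1)
    upper = [0, 0, 0]
    upper[bump] = parts[bump] + 1
    return lower, tuple(upper)


def satisfies(version: str, requirement: str) -> bool:
    """Check whether a version falls inside the caret range."""
    lower, upper = caret_bounds(requirement)
    return lower <= parse(version) < upper


print(caret_bounds("^1.0.1"))        # ((1, 0, 1), (2, 0, 0))
print(satisfies("1.9.4", "^1.0.1"))  # True
print(satisfies("2.0.0", "^1.0.1"))  # False
print(caret_bounds("^0.2.3"))        # ((0, 2, 3), (0, 3, 0))
```

The last line shows the special case for versions under “1.0.0”: with a zero major version, the minor version is treated as the breaking one, so “^0.2.3” stops before “0.3.0”.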

Don’t worry if you’re accustomed to requirements.txt or setuptools style of specifying dependencies and don’t want to change to using caret requirements. It still works!

If you instead decide to just modify the pyproject.toml, make sure to run poetry update afterward, which will fix your poetry.lock and install new packages. You can also use poetry update to bring your poetry.lock up to date when your locked packages get outdated. If you don’t want to update all of your packages or you don’t want to install them, have a look at the documentation for poetry update and poetry lock.

Dev dependencies

Sometimes you may want to add a dependency that won’t be used in production. Those are called dev dependencies. Examples include pytest (a python unit-testing package), black (a package that formats your code), and jupyterlab (often used to edit notebooks for data science and machine learning).

To add a dev dependency, use the --dev flag with poetry add.

If you want to use jupyterlab, you will need to install ipykernel as well. You can do this by running:

poetry add jupyterlab ipykernel --dev

then start it with poetry run jupyter lab.

This will install jupyter in your dev environment. In case you don’t want it to be bloated when running tests, you might want to use extras instead.

Outdated dependencies

One of the most useful commands in poetry is:

poetry show -o

It will show all of the outdated (-o) dependencies in your project, including indirect dependencies. Feel free to pin those in your pyproject.toml if you need to update them, e.g. due to potential vulnerabilities.

To update a dependency in your pyproject.toml, either do it manually or use:

poetry add numpy@latest
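To confirm which version actually ended up in the environment, you can also ask python directly. This sketch uses importlib.metadata from the standard library (python 3.8 and newer). The helper name installed_version is my own.

```python
from importlib import metadata
from typing import Optional


def installed_version(distribution: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is missing."""
    try:
        return metadata.version(distribution)
    except metadata.PackageNotFoundError:
        return None


# Prints the installed numpy version, or None if numpy is not in this environment.
print(installed_version("numpy"))
```

Run it with poetry run python so that it inspects the project environment rather than your global one.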

Build and publish

If you want to install your code, you can do it via pip, thanks to PEP 517. Just run:

pip install path/to/your/project/.

This will, however, ignore your poetry.lock.

What you can do instead is install your package using poetry install in any virtual environment. Just make sure it is activated. If you want to go back to using the poetry environment, just deactivate your env. Poetry will never install anything in the global python environment. When run from the global env, it will always use its own environments.

Remember to add the --no-dev flag when installing in production!

If you are unable to access your source code during installation, as with packages installed from a package index, make sure that your code is tested against multiple versions of your dependencies. Tox is a great tool for this.

If you need a wheel (a standard format for installable python packages), poetry build will do the trick.

For publishing use poetry publish or follow the instructions for publishing wheels in your package index of choice. We at Curvestone use Azure Artifact Feed with twine authentication.

Monorepo

What if you are trying to use poetry in a monorepo? Sadly it’s not perfect. Functionalities similar to cargo workspaces or yarn workspaces are not there, and poetry.lock is always individual for every package. If you want a detailed example of how a monorepo using poetry can be organized, have a look at this great Medium post from Opendoor. Here I’ll provide some basics:

You can use path dependencies in your pyproject.toml, but you will have to edit it manually, e.g.:

example-dependency = {path = "../example-dependency", develop = true}

develop here is really important: it basically means editable.

The problem with this is that now building and publishing basically doesn’t work, as the requirements for the built packages will contain those local paths, which cannot be resolved once the wheel is installed elsewhere.

If you want to publish your package and have it work with poetry install, you can set it up with both a versioned dependency and a path dev dependency.

[tool.poetry.dependencies]
...
example-dependency = "0.1.0"

[tool.poetry.dev-dependencies]
...
example-dependency = {path = "../example-dependency", develop = true}

This works almost perfectly. Installing with poetry install will work. Building a wheel will completely ignore the path dependency and work correctly. Just make sure to put all of your path dependencies into a wheelhouse (a directory with wheels) and install with pip, using -f path/to/your/wheelhouse.

Sadly, poetry install --no-dev will not see the path dependency at all, and if you have a path dependency that itself has different path dependencies, the setup will break. Nevertheless, it might be a helpful setup for someone who has a relatively small monorepo and is willing to pin path dependencies of path dependencies.

There is also a small poetry workspace plugin that we haven’t had a chance to experiment with yet. We really hope that poetry will provide more tools around monorepos in the future.

Next steps

This was a short intro to poetry, but this is not where tools for collaborative code management end. I’ve mentioned tox, black, and pytest, but there are many more! My favourite best practice is to regularly review and evolve the tools for and ways of working. If you have a gut feeling something can be automated, there’s probably a bunch of tools that do exactly that. Have fun learning!
