by Alex Becker

Scaling Your Team with Python Packages

When you think of Python packages, you probably think of open-source software. But there's a strong argument for companies to package their Python code even if they never distribute it to the broader world: it helps scale your Python development team.

Sharing Code

The most obvious benefit of packaging your Python code is that you can share it between applications. Anything that makes sense as a public package makes sense as a private package, and there is also plenty of business-specific code you might want to extract into packages.

Shared code is a force-multiplier for scaling—a single engineer's work on the shared code can benefit every application that uses it.

Enforcing Boundaries

A lot of companies turn to microservices as a way to enforce boundaries between units of code so that teams can maintain them separately. Microservices have other advantages as well, but they come with a lot of costs—having to maintain separate deployments, having to upgrade other services whenever the API changes, etc.

You can get the same boundaries within a single, monolithic service by splitting code into packages. Much like a microservice, a package can expose a small public API, and can be versioned based on that public API. But unlike a microservice, you can upgrade consumers of your API individually. You can also test these upgrades much more easily—in unit tests rather than much slower and more costly integration tests.

Oftentimes, sharing code and enforcing boundaries go hand-in-hand. For example, if you extract the ORM layer defining the tables in your database as a package, it can be managed by a DBA and shared across applications that access the database.

Forking Public Packages

Another common use for private packages is forking a public package. This is often a good compromise between reinventing the wheel by writing all of the code yourself, and using an existing package that doesn't quite fit your needs.

However, you should be careful not to make your fork a maintenance headache. The most common problem with forks is that it becomes hard to incorporate upstream changes into the fork. This can be avoided by structuring your fork as a thin wrapper around the upstream package: make your package depend on the upstream package (with an appropriate version constraint), and have your package import the required functionality from upstream and patch whatever it needs to.

With a package index like PyDist, you can fork a package entirely transparently to its consumers by publishing your fork under the same name as the public package.

Distributing Packages

The biggest downside to using private packages is that you need a way to distribute them to your developers, your CI/CD tools, and your build environments. You can distribute them directly from GitHub or other VCS services by specifying a Git URL (for example, one starting with git+https:// or git+ssh://) in your dependencies, but this has significant drawbacks.

For these reasons it is usually worth using a private Python package index for your private packages, in addition to PyPI for the public packages you depend on—or use PyDist, which mirrors PyPI so you can download public packages from it too. How you integrate with a private package index depends on your use case, so I've written a separate post on the topic.
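To compare the two distribution routes side by side, here is a small sketch that builds (but does not run) the corresponding pip commands. The package name `acme-billing`, the repository URL, and the index URL are placeholders rather than real endpoints. `--index-url` is pip's option for replacing the default index, which suits an index that also mirrors PyPI.

```python
import sys


def pip_install_command(requirement, index_url=None):
    """Build, without running it, a pip command that installs one requirement."""
    command = [sys.executable, "-m", "pip", "install"]
    if index_url is not None:
        # --index-url makes pip resolve packages from this index instead of PyPI.
        command += ["--index-url", index_url]
    command.append(requirement)
    return command


# Route 1: a direct VCS reference; the repository URL and tag are placeholders.
from_git = pip_install_command(
    "acme-billing @ git+https://github.com/example-org/acme-billing.git@v1.2.0"
)

# Route 2: a name and version resolved by a private index (placeholder URL).
from_index = pip_install_command(
    "acme-billing==1.2.0", index_url="https://pypi.example.com/simple/"
)
```

With the index route, consumers depend on a name and a version rather than on a repository location, so the dependency line stays the same if the code moves.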