Scaling Your Team with Python Packages
When you think of Python packages, you probably think of open-source software. But there's a strong argument for companies to package their Python code even if they never distribute it to the broader world: it helps scale your Python development team.
The most obvious benefit of packaging your Python code is that you can share it between applications. Anything that makes sense as a public package makes sense as a private package, but there is also all kinds of business-specific code you might want to extract into packages:
- ORM layers defining tables used by multiple applications
- Shared configuration for development tools
- Common business logic
Shared code is a force-multiplier for scaling—a single engineer's work on the shared code can benefit every application that uses it.
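As a minimal sketch of the idea (the package and function names here are hypothetical), a shared business-logic package might expose one well-tested function that every application imports instead of reimplementing:

```python
# pricing.py -- a module from a hypothetical shared internal package
# (say, `acme-billing`), installed by every application that needs it.
from decimal import Decimal


def apply_discount(price: Decimal, percent: Decimal) -> Decimal:
    """Apply a percentage discount, rounding to cents."""
    if not (Decimal("0") <= percent <= Decimal("100")):
        raise ValueError("percent must be between 0 and 100")
    discounted = price * (Decimal("100") - percent) / Decimal("100")
    return discounted.quantize(Decimal("0.01"))
```

If a pricing bug is found, one engineer fixes it here and releases a new version; every consuming application picks up the fix on its next dependency upgrade.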
A lot of companies turn to microservices as a way to enforce boundaries between units of code so that teams can maintain them separately. Microservices have other advantages as well, but they come with a lot of costs—having to maintain separate deployments, having to upgrade other services whenever the API changes, etc.
You can get the same boundaries within a single, monolithic service by splitting code into packages. Much like a microservice, a package can expose a small public API, and can be versioned based on that public API. But unlike a microservice, you can upgrade consumers of your API individually. You can also test these upgrades much more easily—in unit tests rather than much slower and more costly integration tests.
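Concretely (package names hypothetical), each consumer can pin the shared package's public API at its own pace:

```
# app-checkout/requirements.txt -- already upgraded to the 2.x API
acme-orm>=2.0,<3

# app-reporting/requirements.txt -- still on the 1.x API, upgrades later
acme-orm>=1.4,<2
```

With a microservice, by contrast, every consumer hits whatever API version is currently deployed.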
Oftentimes, sharing code and enforcing boundaries go hand-in-hand. For example, if you extract the ORM layer defining the tables in your database as a package, it can be managed by a DBA and shared across applications that access the database.
Forking Public Packages
Another common use for private packages is forking a public package. This is often a good compromise between reinventing the wheel by writing all of the code yourself, and using an existing package that doesn't quite fit your needs.
However, you should be careful not to make your fork a maintenance headache. The most common problem with forks is that it becomes hard to incorporate upstream changes into the fork. This can be avoided by structuring your fork as a thin wrapper around the upstream package: make your package depend on the upstream package (with an appropriate version constraint), and have your package import the required functionality from upstream and patch whatever it needs to.
With a package index like PyDist, you can fork a package entirely transparently to its consumers by publishing your fork under the same name as the public package.
The biggest downside to using private packages is that you need a way to distribute them to your developers, your CI/CD tools, and your build environments. You can distribute them directly from GitHub or other VCS services by specifying a git:// URL in your dependencies, but this has a couple of drawbacks:
- GitHub and other VCS services do not offer API keys, so you will have to create accounts for the services that need to download your dependencies, and distribute credentials to them. If you enforce a 2FA policy, this gets very tricky.
- You can specify either a branch or a git tag to download, but even if your git tag looks like a version, pip and other tools will not treat it as one, so you cannot make use of version constraints.
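To illustrate the second point (the repository URL and package name here are placeholders), a VCS dependency pins one exact ref, while an index-hosted package can float within a range:

```
# requirements.txt -- a VCS dependency pins a single branch or tag:
acme-orm @ git+https://github.com/acme/acme-orm.git@v2.1.0

# ...whereas a package served from an index can use a version range:
# acme-orm>=2.1,<3
```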
For these reasons it is usually worth using a private Python package index for your private packages, in addition to PyPI for the public packages you depend on—or use PyDist, which mirrors PyPI so you can download public packages from it too. How you integrate with a private package index depends on your use case, so I've written a separate post on the topic.
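In the simplest case, wiring up a private index is one line of pip configuration; the URL below is a placeholder for your index:

```
# pip.conf (pip.ini on Windows) -- placeholder index URL
[global]
extra-index-url = https://pypi.example.com/simple/
```

With an index that mirrors PyPI, such as PyDist, you can instead set `index-url` to the private index alone and fetch public and private packages from one place.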