Announcing PyDist—NPM for Python
I love Python. It lets me focus on what I need to get done and makes me a more productive software engineer. A large part of that is because of its package ecosystem. Pretty much anything I need to do has a package for it—usually a good one! One at a time these have made their way into my dependencies, and it's let me get a lot done very quickly.
That being said, Python's story for managing dependencies leaves a lot to be desired.
Frustrated, I ended up on a year-long deep dive into the world of Python packaging, which has culminated in the launch of PyDist, my answer to NPM.
Python, it turns out, has a much more complicated packaging ecosystem than NPM for a couple of reasons:
- Because packages may use Python's C API, they may need extensive compilation. But compiling e.g. numpy from source is a time-consuming and error-prone process, so the Python ecosystem provides several ways to embed pre-compiled binary files in distributions (so-called eggs and later wheels). However, these compiled binaries are platform-specific, so each release may have multiple different distributions, and pip or other tools need to make a decision about which to download.
- Because packages are defined using setup.py, which allows arbitrary Python code, there are no hard guarantees about what a package depends on. Nothing stops me from making a package that depends on X on Thursdays and Y on Fridays.
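To make the first point concrete, here is a rough sketch of the kind of decision a client has to make when a release ships several distributions. It is not pip's actual algorithm, which is considerably more involved; the package name examplepkg, the file list, and the helper names wheel_tags and pick_distribution are all made up for illustration. The only thing taken from the real ecosystem is the wheel filename layout, which ends in python tag, ABI tag, and platform tag.

```python
from itertools import product
from typing import List, Optional, Sequence, Set, Tuple

Tag = Tuple[str, str, str]  # (python tag, abi tag, platform tag)


def wheel_tags(filename: str) -> Set[Tag]:
    """Expand the tag triple(s) encoded at the end of a wheel filename."""
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel: {filename}")
    parts = filename[: -len(".whl")].split("-")
    if len(parts) not in (5, 6):  # name, version, optional build tag, 3 tags
        raise ValueError(f"unexpected wheel filename: {filename}")
    python, abi, platform = parts[-3:]
    # Each field may be a dot-separated set, e.g. "py2.py3-none-any".
    return set(product(python.split("."), abi.split("."), platform.split(".")))


def pick_distribution(filenames: Sequence[str],
                      supported: Sequence[Tag]) -> Optional[str]:
    """Pick a file for this platform; `supported` is ordered best-first."""
    wheels = [name for name in filenames if name.endswith(".whl")]
    for tag in supported:
        for name in wheels:
            if tag in wheel_tags(name):
                return name
    # No compatible wheel: fall back to a source distribution, if any.
    for name in filenames:
        if name.endswith((".tar.gz", ".zip")):
            return name
    return None


# Hypothetical release with one wheel per platform plus a source tarball.
FILES: List[str] = [
    "examplepkg-1.0-cp39-cp39-win_amd64.whl",
    "examplepkg-1.0-cp39-cp39-macosx_10_9_x86_64.whl",
    "examplepkg-1.0-cp39-cp39-manylinux2014_x86_64.whl",
    "examplepkg-1.0.tar.gz",
]

# Assumed tag preferences for CPython 3.9 on 64-bit Linux.
LINUX_CP39: List[Tag] = [
    ("cp39", "cp39", "manylinux2014_x86_64"),
    ("py3", "none", "any"),
]

print(pick_distribution(FILES, LINUX_CP39))
```

On a platform with no matching wheel the sketch falls back to the source tarball, which is exactly the slow compile-from-source path that wheels exist to avoid.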
Before PyDist, I created a package management client called dotlock. It locks down to the level of distributions instead of just releases, and it uses dependency information instead of setup.py when it is available through PyPI. But using a new client is a lot of friction, and there are still so many problems it can't solve:
- Releases and even entire packages on PyPI can be deleted, breaking everything that depends on them.
- The PyPI API doesn't have dependency information for most packages, so they must be downloaded just to determine their dependencies. Other package indices are even worse.
- If you want to have public and private dependencies you have to use multiple package indices, which comes with all the drawbacks of both indices and additional configuration headaches.
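The second problem is easy to see from the client side. The sketch below assumes the shape of PyPI's JSON API as I understand it: a document per package at https://pypi.org/pypi/NAME/json whose info.requires_dist field is either a list of requirement strings or null when the index has nothing recorded. The helper names and both sample payloads are invented for illustration, not taken from dotlock or PyDist.

```python
import json
import urllib.request
from typing import List, Optional


def fetch_payload(name: str) -> dict:
    """Download the JSON document PyPI serves for a package (needs network)."""
    with urllib.request.urlopen(f"https://pypi.org/pypi/{name}/json") as resp:
        return json.load(resp)


def declared_dependencies(payload: dict) -> Optional[List[str]]:
    """Return the recorded requirement strings, or None if the index has none."""
    requires = payload.get("info", {}).get("requires_dist")
    if requires is None:
        # Nothing recorded: a client would have to download a distribution
        # and inspect or build it to learn what it depends on.
        return None
    return list(requires)


# Hand-written payloads standing in for real API responses.
WITH_METADATA = {"info": {"name": "examplepkg",
                          "requires_dist": ["requests (>=2.0)", "six"]}}
WITHOUT_METADATA = {"info": {"name": "legacypkg", "requires_dist": None}}

print(declared_dependencies(WITH_METADATA))
print(declared_dependencies(WITHOUT_METADATA))
```

When the second case comes back, the only way forward is to fetch the package itself, which is the cost described in the list above.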
That's why I created PyDist. It mirrors the public PyPI index, and keeps packages and releases that have been deleted from PyPI. It allows organizations to upload their own private dependencies, and seamlessly create private forks of public packages. And it integrates with standard Python tools almost as well as PyPI does.
But closing the gap with NPM is just the beginning. By controlling the index, I can do so much more for users:
- Show them what packages they are using, more easily and reliably than any client can.
- Enforce organization-wide policies on package and version usage.
- Warn users about security vulnerabilities in their dependencies.
- Build binary distributions for packages that don't have them.
So far I've only implemented the first of these features, which is PyDist's Insights page. I'm sure this list will grow and get re-ordered as I talk to users. I'm excited to see where it goes, and I hope you will be too.