The Problem with --extra-index-url
It is possible to use pip (and, by extension, other Python tools) with multiple package indices at the same time using the --extra-index-url flag. This is often done to use both a private package index and PyPI for different dependencies. However, packages are specified to pip by name only, so how can pip decide which index to find each one in? Many people, myself included (even in a prior blog post), assume that it checks the main index (usually PyPI) and then falls back to the extra index if the package is not found.
But in reality, pip makes no such distinction. It gives no index priority over another: it looks for the package name in every index it was given and installs the best match by version number, whichever index that comes from. This is less than ideal. Say that your company has a private package called core-infra. If there is already a package on PyPI by the same name, you can't reliably install your package with pip. Worse, even if there is no name conflict now, someone could publish a package by the same name to PyPI at any time, potentially with malicious code!
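To make the name conflict concrete, here is a minimal sketch of the selection behaviour. It is a toy model, not pip's code. It assumes what pip's documentation describes: the locations are unordered, all of them are checked, and the best match by version wins. The private index URL, the version numbers and the function names are made up for illustration.

```python
# Toy model only: this is not pip's implementation, and the index
# contents below are invented for illustration.
INDICES = {
    "https://pypi.org/simple": {
        "core-infra": ["9.9.9"],  # a stranger's package with the same name
        "requests": ["2.31.0"],
    },
    "https://pypi.example.internal/simple": {  # hypothetical private index
        "core-infra": ["1.4.0"],
    },
}


def parse_version(version):
    """Turn a simple dotted version such as '1.4.0' into a comparable tuple."""
    return tuple(int(part) for part in version.split("."))


def find_candidates(name, indices):
    """Collect (version, index_url) pairs for `name` from every index."""
    return [
        (version, url)
        for url, packages in indices.items()
        for version in packages.get(name, [])
    ]


def pick(name, indices):
    """Choose the highest version, regardless of which index offers it."""
    candidates = find_candidates(name, indices)
    if not candidates:
        raise LookupError(f"no index provides {name!r}")
    return max(candidates, key=lambda candidate: parse_version(candidate[0]))


version, url = pick("core-infra", INDICES)
print(version, url)  # 9.9.9 https://pypi.org/simple
```

In this model, anyone who can publish a higher version number to PyPI under the same name wins the selection, which is exactly the risk described above.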
Some Python tools, like poetry, pipenv and dotlock, let you specify the source for each of your dependencies individually. To support this, they have to pip install packages one-by-one, passing --index-url instead of --extra-index-url to pip to guarantee which index it will search. As a tool maintainer myself, I can testify that this workaround is both inefficient and cumbersome to maintain. And yet it can still fail: if a new version of a private package introduces a new private dependency, and that dependency isn't already specified as living in the private index, pip or most tools built on top of it will still try to install the dependency from PyPI.
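The one-by-one workaround can be sketched roughly as follows. This is not how poetry, pipenv or dotlock are actually implemented. It only builds the pip command lines such a tool might run, without executing them. The SOURCES mapping, the private index URL and the core-infra-utils package are hypothetical.

```python
import sys

DEFAULT_INDEX = "https://pypi.org/simple"

# Hypothetical per-dependency sources, as a lock file might record them.
SOURCES = {
    "core-infra": "https://pypi.example.internal/simple",
    "requests": DEFAULT_INDEX,
}


def index_for(name, sources=SOURCES):
    """Return the index recorded for `name`, or the default if none is."""
    return sources.get(name, DEFAULT_INDEX)


def install_command(requirement, index_url):
    """Build one pip invocation pinned to exactly one index."""
    return [
        sys.executable, "-m", "pip", "install",
        "--index-url", index_url,
        requirement,
    ]


def plan(requirements, sources=SOURCES):
    """Build one command per requirement, each with its own --index-url."""
    commands = []
    for requirement in requirements:
        name = requirement.split("==")[0]
        commands.append(install_command(requirement, index_for(name, sources)))
    return commands


for command in plan(["core-infra==1.4.0", "requests==2.31.0"]):
    print(" ".join(command))

# The failure mode: a new private dependency that nobody recorded yet
# silently maps to the public default.
print(index_for("core-infra-utils"))  # https://pypi.org/simple
```

Every requirement costs a separate pip process, and any name missing from the mapping quietly falls through to PyPI.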
These problems are among the reasons why I chose to combine a PyPI mirror with a private package index when I created PyDist. Instead of using multiple indices, PyDist users can install solely from PyDist with --index-url. This will install both their own packages and public PyPI packages, but in the event of a name conflict will always prefer their private packages.
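The preference rule can be illustrated with a small sketch. This is only a model of the idea of a single index that serves private packages first and mirrored public packages otherwise. It is not PyDist's implementation, and the function name and data are invented.

```python
def resolve(name, private_packages, public_mirror):
    """Return (origin, versions) for `name`, always preferring private packages."""
    if name in private_packages:
        # A private package shadows any public package of the same name.
        return "private", private_packages[name]
    if name in public_mirror:
        return "mirror", public_mirror[name]
    raise LookupError(f"{name!r} is neither private nor mirrored")


# Hypothetical contents of a combined index.
private_packages = {"core-infra": ["1.4.0"]}
public_mirror = {"core-infra": ["9.9.9"], "requests": ["2.31.0"]}

print(resolve("core-infra", private_packages, public_mirror))  # ('private', ['1.4.0'])
print(resolve("requests", private_packages, public_mirror))    # ('mirror', ['2.31.0'])
```

Because the decision is made on the server by name, the client needs only one --index-url, and a same-named public package can never displace the private one, however high its version number.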