Python -m pip

Apr 01, 2023 12:29


At work we have two application repos that use Poetry to manage dependencies. They both rely on a common repo that uses a Pip requirements.txt. One application is still on Python 3.7, while the other is on a more recent version. I just spent a few hours on the common repo.



Compared to DNF, which is fast and whose solver is reused in many places, or Maven, where dependencies are managed in a hierarchy, the Pip approach feels like herding cats or rabbits. The package repository is not a strictly controlled one. Dependencies do not have generations like in Fedora (a new OS version rarely installs packages from a previous one, and they are binary-incompatible in many cases). There is no way to globally "manage" a common version, say limit the Python version to >= 3.7.2 for all repos, or to inherit such a constraint; instead, multiple repos have to negotiate among themselves to figure it out. Within a single repo, Maven is a mess because it allows mixed versions until a parent module starts to "manage" them, while Pip enforces a single version in one repo.

Both DNF and Maven can also have "weak" dependencies or "head" dependencies, "pin" important packages, and "remove" or "downgrade" unsupported ones. Pip, on the other hand, treats all packages equally: it does not care if some random leaf package breaks or locks boto3 to a particular version, even though boto3 is vital compared to that random package. Pip treats open-source modules as a democracy and wants every package to live, and it assumes all versions are equal too: if it has to scan 100 patch versions of boto3, it goes from highest to lowest in that order rather than taking a guess by bisecting; if it has to try downloading pandas from 1.5.x all the way down to 1.1.x, at roughly 100 MB per version, it does not hesitate either. That's where the manual work comes in: one has to lock some versions and retry. How painful is that?
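
The workaround I keep coming back to is exactly that: pin the few "head" packages up front so the resolver has less room to wander, for example with Pip's constraints file (the versions below are made up):

    # constraints.txt: pin only the "head" packages (versions are examples)
    #   boto3==1.26.100
    #   pandas==1.3.5
    # requirements.txt stays loose; the constraints cap the search space
    python -m pip install -r requirements.txt -c constraints.txt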

I wonder why it does not use metadata alone but has to fetch the packages themselves to resolve dependencies, or why it does not share a local package cache the way Maven does.
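
To be fair, Pip does keep a per-user cache of downloaded wheels and HTTP responses, so repeated downloads across virtualenvs are at least avoided; it is just not a browsable repository like Maven's ~/.m2. A quick look:

    # show where pip keeps its wheel/HTTP cache
    python -m pip cache dir
    # list cached wheels matching a name (example pattern)
    python -m pip cache list pandas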

Sometimes it just works: Pip can install any version of a package without doing any dependency checks, like running RPM and ignoring deps. It is dangerous, but sometimes needed to avoid running dependency checks at all. A well-tested set of versions may not work in a new environment, and re-creating a working set is very painful: to bootstrap a usable set of versions, with the only constraints being the minimum Python version and the platform, it seems one has to go through the whole search process all over again. At the same time, locally installed packages are "polluting" the dependency resolution. A few times I had to clean up packages in the system path, because Pip was executed before activating the virtualenv. How do I disable writing to the system path, or force Pip to check for a virtualenv?
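
As far as I can tell, Pip has switches for both of these cases: skipping dependency resolution entirely, and refusing to touch anything outside a virtualenv.

    # install exact versions with no dependency resolution at all (RPM --nodeps style)
    python -m pip install --no-deps -r requirements.txt
    # refuse to install anything unless a virtualenv is active
    export PIP_REQUIRE_VIRTUALENV=true
    # or persist the same setting in pip's config
    python -m pip config set global.require-virtualenv true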

Thanks to Podman I was able to install Fedora and do all the testing in a Python 3.7 environment. It was fun to use, once the user's folder was mounted into the default machine. The default machine image is CoreOS, and the container I ran is "fedora:31", a minimal image; I had to install many packages, including procps-ng and findutils, to make it really useful.
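
Roughly the setup, from memory (the paths are examples, and the volume option on machine init needs a fairly recent Podman):

    # create the default machine with the home folder mounted into it
    podman machine init -v $HOME:$HOME
    podman machine start
    # run the minimal Fedora 31 image (its python3 is 3.7)
    podman run -it -v $HOME/work:/work fedora:31 bash
    # inside the container: the minimal image is missing common tools
    dnf install -y procps-ng findutils python3-pip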

Somehow Poetry feels faster than Pip at updating dependencies. Poetry only considers what is in the config file and does not care about "what's next": it tries to install the latest version, complains about an incompatible environment, and then just stops. Pip, on the other hand, starts guessing versions and eventually installs something. When the constraint is a very old Python version, Poetry won't help you, while Pip can find a possible version. But installing that way with Pip might result in a bad installation, because other packages may require even lower versions of that package. Either way it takes manual work to coordinate them, but one rule from the DNF or Maven world still applies: always give the package manager enough information. I see many Stack Overflow posts using "xargs" to add packages to Poetry one at a time (see the sketch below); that does not work well, because Poetry only gets a little information each time.
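
That is, give Poetry the whole list in a single invocation so the solver sees every constraint at once (packages.txt here is a hypothetical one-name-per-line file):

    # what many answers suggest: one `poetry add` per package, so the solver
    # only ever sees one new constraint at a time
    xargs -n1 poetry add < packages.txt
    # better: a single invocation with the whole list
    xargs poetry add < packages.txt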

Here is what I did (a command sketch follows the list):

- list all packages in the Pip requirements, without versions

- use Pip --dry-run to find possible versions, then manually revise the list to freeze a few "head" packages and leave the rest open to change

- initialize the Poetry config file

- add the list of packages to Poetry; if the list is not compatible, revise it to freeze more packages

- let Poetry resolve, install and lock the versions

- export the locked versions to update the Pip requirements.
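
Roughly, in commands (the file names are mine; --dry-run and --report need pip 22.2 or newer, and poetry export may require the poetry-plugin-export plugin):

    # 1. strip version specifiers from the existing requirements (drop comments too)
    grep -v '^#' requirements.txt | sed 's/[<>=!~;[].*//' > packages.txt
    # 2. resolve without installing, to see which versions pip would pick
    python -m pip install --dry-run -r packages.txt --report pip-report.json
    # (edit packages.txt here: pin the few "head" packages, leave the rest open)
    # 3. initialize Poetry and feed it the whole list at once
    poetry init --no-interaction
    xargs poetry add < packages.txt
    # 4. make sure everything is installed and poetry.lock is written
    poetry install
    # 5. export the locked versions back to the Pip requirements file
    poetry export -f requirements.txt --output requirements.txt --without-hashes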

Both the "to Poetry" and "from Poetry" steps involve a lot of manual work. The biggest issue is that I don't know which "head" packages to choose to keep the list minimal. In Maven it is easy to list the dependency:tree for a module, but that does not help if unnecessary dependencies were listed in the first place. There is also a practical factor: including all package versions can speed up installation. Lastly, people prefer to freeze all package versions whether they use them or not. Frankly, that does not work once there are 10+ people on the team. If anything breaks, a developer either rolls back or bumps versions without considering compatibility, because, as Aquinas told us, even judges cannot know every impact and can only decide the current and present case. We need human laws!
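
Poetry does at least have a rough counterpart to Maven's dependency:tree, which helps tell the real "head" packages from the transitive ones when trimming the list:

    # print the resolved dependency tree, roughly `mvn dependency:tree` for Poetry
    poetry show --tree
    # or only the subtree under one package (example name)
    poetry show --tree boto3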

Little things
