
It's rare to see something as systematically broken as the Python package/dependency ecosystem.

What I don't understand is: what makes this so difficult to solve in Python? It seems that many other platforms solved this a long time ago - Maven 2.0 was released almost 20 years ago. While it wasn't and isn't perfect, its fundamentals were decent even back then.

One thing which I think messed this up from the beginning was applying the Unix philosophy - several/many individual tools as opposed to one cohesive system. requirements.txt, setuptools, pip, pipx, pipenv, venv... were always woefully inadequate individually, but produced a myriad of possible combinations to support. It seems like simplicity was the main motivation for such a design, but these certainly seem like examples of being too simplistic for the job.

I recently tried to run a Python app (after a couple of years' break from Python) which used conda, and I got lost there quickly. The project README described using conda, mamba, anaconda, conda-forge, miniforge, miniconda... In the end, nothing I tried worked.



> what makes this so difficult to solve in Python?

Python creates the perfect storm for package management hell:

- Most of the valuable libraries are natively compiled (so you get all the fun of distributing binaries for every platform without any of the traditional benefits of native compilation)

- The dynamic nature makes it challenging to understand the non-local impacts of changes without a full integration test suite (library developers break each other all the time without realizing it, semantic versioning is a farce)

- Too many fractured packaging solutions, not a single one well designed. And they all conflict.

- A bifurcated culture of interactive use vs production code - while they both ostensibly use the same language, they have wildly different sub-cultures and best practices.

- Churn: a culture that largely disavows strong backwards compatibility guarantees, in favor of the "move fast and break things" approach. (Consequence: you have to move fast too just to keep up with all the breakage)

- A culture that values ease of use above simplicity of implementation. Python developers would rather save 1 line of code in the moment, even if it pushes the complexity off to another part of the system. The quite obvious consequence is an ever-growing backlog of complexity.

Some of the issues are technical. But I'd argue that the final bullet is why all of the above problems are getting worse, not better.
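To make the "semantic versioning is a farce" point concrete, here's a toy sketch (hypothetical library code, no real package implied) of how a "minor" release slips past import time in a dynamic language:

```python
# Toy illustration: a "minor" release changes a return type.
# Duck typing means nothing fails at import time; the break
# only surfaces when the affected code path actually runs.

def fetch_ids_v1_0():
    """v1.0: returns a list (callers came to rely on indexing it)."""
    return [1, 2, 3]

def fetch_ids_v1_1():
    """v1.1: 'optimized' to return a generator -- same name, same call."""
    return (i for i in (1, 2, 3))

# Downstream code written against v1.0:
def first_id(fetch):
    return fetch()[0]

print(first_id(fetch_ids_v1_0))  # works: 1

try:
    first_id(fetch_ids_v1_1)
except TypeError:
    print("v1.1 breaks downstream code at runtime, not at install time")
```

A static, compiled ecosystem would reject this at build time; in Python only an integration test that exercises the call site catches it.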


> Too many fractured packaging solutions, not a single one well designed. And they all conflict.

100% this.

For the last 4 years, one of the most frustrating parts of SWE that I've had to deal with on a daily basis is packaging data science & machine learning applications and APIs in Python.

Maybe this is a very mid solution, but one that worked for me was dockerized local environments with all dependencies pinned via Poetry [1]. The initial setup is not easy, but with a Makefile wrapping it, it now takes me only ~4 hours with a DS to explain and run together, and it saves tons of hours of debugging and dependency conflicts.

> Python developers would rather save 1 line of code in the moment, even if it pushes the complexity off to another part of the system.

It sounds odd, but in several projects that I worked on, folks brought in the entire Scikit-Learn dependency just for the train_test_split function [2], because the team thought it would be simpler and easier than writing a function that splits the dataset.

[1] - https://github.com/orgs/python-poetry/discussions/1879 [2] - https://scikit-learn.org/1.5/modules/generated/sklearn.model...
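For scale, a basic shuffle-and-slice split needs only the stdlib. A sketch (hypothetical helper; no stratification or multi-array support like scikit-learn's version):

```python
import random

def train_test_split(data, test_size=0.25, seed=None):
    """Shuffle-and-slice split using only the stdlib.
    Covers the basic case only -- no stratification, no multi-array support."""
    rng = random.Random(seed)
    indices = list(range(len(data)))
    rng.shuffle(indices)
    n_test = round(len(data) * test_size)
    test_idx = set(indices[:n_test])
    train = [x for i, x in enumerate(data) if i not in test_idx]
    test = [x for i, x in enumerate(data) if i in test_idx]
    return train, test

train, test = train_test_split(list(range(100)), test_size=0.2, seed=42)
print(len(train), len(test))  # 80 20
```

Whether that trade-off beats pulling in scikit-learn depends on how many of its other features you end up needing anyway.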


I'm trying to do the same but with uv instead of poetry. So far so good, and it helps that for me delivering as a docker container is a requirement, but I have no idea what's going to happen if I need to run "real" ML stuff. (Just doing a lot of plotting so far.)


I agree with all of these, and it makes me wonder, as I do from time to time:

has anyone managed to make a viable P# - a clean break which retains most of what most people love about the language and environment, and cheerfully asserts new and immutable changes in things like <the technical parts of the above>?

When I have looked into this it seems people can't help but improve one-more-thing or one-other-thing and end up just enjoying vaguely-pythonic language design.


Googling P# led me to this delight which is 100% unrelated:

https://couragetotremble.blog/2007/08/09/p-language/


IronPython? The problem with that is compatibility with, and easy access to, existing libraries, which is the main reason to use Python in the first place.

I also think some of the criticisms in the GP comment are not accurate. Most of the valuable libraries are natively compiled? Some important ones are, but not all.

I think a lot of the problem is that Python's usage has changed. It's great for a wide range of uses (scripting, web apps and other server stuff, even GUIs), but it's really not a great match for scientific computing and the like, yet it has become widely used there because it is easy to learn (and now has lots of libraries for that!).


The problem is that Python refuses to take responsibility for the whole ecosystem. One of the biggest success stories in programming language development has been Rust's realization that all of it matters: language, version management, package management, and build tools. To have a truly outstanding experience you need to take responsibility for the whole ecosystem. Python and many other older languages just focus on one part of the ecosystem, while letting others take care of different parts.

If Python leadership had true visionaries they would sit down, analyze every publicly available Python project and build a single set of tools that could gradually and seamlessly replace the existing clusterfuck.

Python developers will pretend the language is all about simplicity and then hand you over to the most deranged ecosystem imaginable. It sure is easy to pretend that you have a really simple ecosystem when you cover your eyes and focus on a small segment of the overall experience.


You can kind of see this in golang. Originally it came with tooling to download dependencies, but it had major issues with more complex projects, and some community-made tools became popular instead. That meant multiple tools were used in different places, and it was kind of a mess. Later, a new system (Go modules) was added to the default toolchain, and even though it has problems, it's good enough that it's now surprising for somebody to use non-default tools.


Who will pay for all this responsibility?


I don't know, but are we going to pretend that it would be particularly difficult to get funding for drastically simplifying and improving the tooling for one of the world's most popular programming languages?

I'm not sure how Rust is doing it, but the problem is hardly insurmountable.


The PSF does have massive financial challenges. I don't know how Rust does it either, but I think there's far less general overhead due to its specificity. Python has a far broader reach, with a lot of diverse use cases to cater to.


Yeah, who will pay for this drastic reduction in wasted time?


> What I don't understand - what makes this so difficult to solve in Python?

I think there are many answers to this, and many factors contributing to it, but if I had to pick one: the setup.py file. It needs to be executed to determine the dependencies of a project. Since it's a script, any maintainer of any package you are using can do arbitrarily complex/dumb stuff in it, like conditionally adding dependencies based on host-system-specific environment markers, or introducing assumptions about the environment it is being installed into. That makes achieving all the things you'd want from a modern package manager so much harder.

This also means that the problem isn't just concentrated in 1-2 central package management projects, but scattered throughout the ecosystem (and some of the worst offenders are some of Python's most popular sub-ecosystems).

There is some light with the introduction of the pyproject.toml, and now uv as a tool taking advantage of it.
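A made-up but representative sketch of the kind of setup.py logic that defeats static resolution (package and env-var names are illustrative):

```python
# setup.py -- dependencies are computed by running code, so no tool can
# know them without executing this script in the target environment.
import os
import sys

install_requires = ["requests"]

if sys.platform == "win32":                      # host-specific deps
    install_requires.append("pywin32")
if sys.version_info < (3, 9):                    # Python-version-specific deps
    install_requires.append("importlib-metadata")
if os.environ.get("WITH_GPU"):                   # env-var-driven deps(!)
    install_requires.append("some-gpu-backend")

# setuptools.setup(..., install_requires=install_requires) then consumes
# this list -- a resolver sees different dependencies on every machine.
print(install_requires)
```

Because the list differs per machine, a lockfile or offline resolver can never be sure what such a package actually depends on.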


> The setup.py file. It needs to be executed to determine the dependencies of a project.

Yes, this should never have been allowed. It solved a problem in the short term but in the long term has caused no end of pain.


setup.py allowed arbitrary things, but at least it always went through setuptools (or closely related predecessors, such as distribute or distlib). There is now pyproject.toml, but at the same time there are tons of build backends that can do different things. And one of the most popular modern packaging tools, Poetry, uses a non-standard section for the package metadata.
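For illustration, the standardized PEP 621 metadata and Poetry's traditional section side by side (a sketch; recent Poetry versions have been adding support for the standard table):

```toml
# Standard (PEP 621) metadata -- any compliant tool can read this:
[project]
name = "example-pkg"
version = "0.1.0"
dependencies = ["requests>=2.0"]

# Poetry's traditional, non-standard equivalent -- only Poetry reads it:
[tool.poetry]
name = "example-pkg"
version = "0.1.0"

[tool.poetry.dependencies]
python = "^3.9"
requests = ">=2.0"
```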


I think at least part of it is that there are so many solutions for Python packaging, which are often intermixed or only half-supported by developers. It's a tough ask to provide dedicated support for pip, conda, poetry and what else is there plus a couple different ways to create virtual environments. Of course if you do everything right, you set it up once (if even that) and it just keeps working forever, but it is never like that. Someone will use a tool you haven't and it will not work correctly and they will find a workaround and the mess starts.

Also, I think the fact that Python packages are sometimes distributed as shared libraries is a problem. When I think about conan or vcpkg (package managers for C and C++), they usually suck because some dependencies are available on some platforms and not on others, or even in one version on one platform and another version on another, and you get messes all around if you need to support multiple platforms.

I think generally binary package managers are almost always bad* and source based package managers almost always work well (I think those are essentially easy mode).

*: unless they maintain a source package of their own that they actually support and have a fixed set of well-supported platforms (like system package managers on most Linux distros do).


The problem is that a lot of Python "source" is actually C/C++ files, so simply having a "source-based package manager for Python" is very annoying, as you'd have to manage your C/C++ sources with some other mechanism.

This is exactly the reason I've moved from pip to conda for some projects: "pip" was acting as a source-based package manager, and thus asking for C tools, libraries and dev headers to be installed - but not providing them, as they were non-Python and thus declared out of scope. Especially on older Linux distributions, getting dependencies right can be quite a task.


This used to be a big headache for me, especially having developers on Windows but deployment targets in Linux, but a lot of the libraries I commonly use these days are either pure python or ship wheels for the platforms I use.

Were your issues recent or from several years ago?


The issues were recent (as of few months ago), but the OS's were pretty old - Ubuntu 20.04 and even 18.04. Those are still officially supported with Ubuntu Pro (free for individuals), but have ancient libraries and Python versions.


A lot of path dependency, but essentially:

  1. A good Python solution needs to support native extensions. Few other languages solve this well, especially across Unix + Windows.
  2. Python itself does not include a package manager.

I am not sure solving 2 alone is enough, because it will be hard to fix 1 then. And of course 2 would need a solution for older Python versions.

My guess is that we're stuck in a local maximum for a while, with uv looking like a decent contender.


PHP and composer do. You can specify native extensions in the composer.json file, along with an optional version requirement, and install them using composer just fine. Dependencies can in turn depend on specific extensions, or just recommend them without mandating an installation. This works across UNIX and Windows, as far as I’m aware.
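For reference, a composer.json along those lines might look like this (a sketch; composer at minimum verifies these ext-* platform requirements are present, and whether it can install the extensions itself is exactly what's questioned downthread):

```json
{
    "require": {
        "php": ">=8.1",
        "ext-yaml": "*"
    },
    "suggest": {
        "ext-redis": "Faster cache backend if the extension is available"
    }
}
```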


> PHP and composer do.

Is that a new feature? Pretty sure it didn't exist a few years ago. If the thing I needed required the libfoo C library, then I first had to install libfoo on my computer using apt/brew/etc. If a new version of the PHP extension came out that used libfoo 2.0, then it was up to me to update libfoo first. There was no way for composer to install and manage libfoo.


Does not seem so... Something as simple as "yaml" already requires reaching for apt-get: http://bd808.com/pecl-file_formats-yaml/

> Php-yaml can be installed using PHP's PECL package manager. This extension requires the LibYAML C library version 0.1.0 or higher to be installed.

    $ sudo apt-get install libyaml-dev
This is basically how "pip" works, and while it's fine for basic stuff, it gets pretty bad if you want to install a fancy numerical or cryptography package on an LTS Linux system that's at the end of its support period.

I am guessing that PHP might simply have less need for native packages, being more web-oriented.


Nix solves it for me. Takes a bit more effort upfront, but the payoff is "Python dependency determinism," which is pretty much unachievable in any other way, so...
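For the curious, a minimal sketch of what that looks like (package names follow nixpkgs conventions; a real setup would pin the nixpkgs revision for full determinism):

```nix
# shell.nix -- drop into a shell with a reproducible Python environment:
#   nix-shell
with import <nixpkgs> {};

mkShell {
  buildInputs = [
    (python3.withPackages (ps: with ps; [ numpy requests ]))
  ];
}
```

The key point: numpy's C and Fortran dependencies come from the same store as the interpreter, so nothing leaks in from the host system.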


The answer is not Yet Another Tool In The Chain. Python community itself needs to address this. Because if they don’t then you’ll have requirements.txt, setuptools, pyproject, pip, pipx, pipenv, pyenv, venv, nix.


Agreed. Often there's quite a tight coupling between the core platform devs and package management - node.js has its npm, Rust its cargo, Go has one as well - and for the most part it seems to have worked out fine for them. Java and .NET (and I think PHP) are different in the sense that the package management systems have no relation to the platform developers, but industry standards (Maven, Gradle, NuGet, Composer) still appeared and are widely accepted.

But with Python it seems completely fractured - everyone tries to solve it their own way, with nothing becoming a truly widely used solution. More involvement from the Python project could make a difference. From my perspective, this mess is currently Python's biggest problem and should be prioritized accordingly.


FWIW, NuGet is to .NET what Cargo and crates are to Rust, not what Maven and Gradle are to Java. The package manager is just a part of the SDK.

Even the CLI workflow is identical: dotnet add package / cargo add (.NET had it earlier too, it's nice that Cargo now also has it).


Wait, newer versions of the JDK now bundle Maven and Gradle? Then what does everyone use mvnw/gradlew for?


This was referring to package manager being just a part of .NET's SDK. Gradle and Maven continue to ship separately.


Right, I forgot NuGet got adopted by Microsoft. But it started and gained prominence independently.


Nix is cross-language though. So it will be useful even if the Python mess is cleaned up a bit.


Nix isn't 'yet another tool in the chain'; Nix demands to run the whole show, and in the Nix world native dependencies in all programming languages are first-class citizens that the ecosystem is already committed to handling.

> Python community itself needs to address this.

The Python community can't address it, really, because that would make the Python community responsible for a general-purpose package management system not at all limited to Python, but including packages written in C, C++, and Rust to start, and also Fortran, maybe Haskell and Go, too.

The only role the Python community can realistically play in such a solution is making Python packages well-behaved (i.e., no more arbitrary code at build time or install time) and standardizing a source format rich with metadata about all dependencies (including non-Python dependencies). There seems to be some interest in this in the Python community, but not much.

The truth, perhaps bitter, is that for languages whose most important packages all have dependencies foreign to the ecosystem, the only sane package management strategy is slotting yourself into polyglot software distributions like Nix, Guix, Spack, Conda, Pkgsrc, MacPorts, MSYS2, your favorite Linux distro, whatever. Python doesn't need a grand, unifying Python package manager so much as a limited, unified source package format.


Well, there is no way to address it then; no magic will eliminate everything from the list.

So another tool isn't meaningfully different (and it can be the answer): if "the community" migrates to the new tool, it wouldn't matter that there are a dozen other unused tools.

Same thing if "the community" fixes an existing tool and migrates to it: the other unused tools will still exist.


Nix isn't another tool, it's a tool that subsumes all other tools.


> The answer is not Yet Another Tool In The Chain

Normally, that would be true, but Nix has wiped out my need for homebrew, asdf, and a bunch of other tooling, so it still satisfies your requirement by leaving fewer dependencies overall instead of additional ones!


The thing is, Nix is not Yet Another Tool, it is the tool.


And so was Docker before Nix


Docker is kinda the opposite of Nix in this respect— Docker is fundamentally parasitic on other tools for dependency management, and Nix handles dependencies itself.

That parasitism is also Docker's strength: bring along whatever knowledge you have of your favorite language ecosystem's toolchain; it'll not only apply but it'll likely be largely sufficient.

Build systems like Buck and Bazel are more like Nix in this respect: they take over the responsibilities of some tools in your language's toolchain (usually high-level build tools, sometimes also dependency managers) so they can impose a certain discipline and yield certain benefits (crucially, fine-grained incremental compilation).

Anyway, Docker doesn't fetch or resolve the dependencies of Python packages. It leaves that to other tools (Nix, apt-get, whatever) and just does you the favor of freezing the result as a binary artifact. Immensely useful, but solves a different problem than the main one here, even if it eases some of the same burdens.


Docker is the cached output of a build that just so happened to succeed

Nix guarantees builds succeed so it doesn't need to cache the output


The inevitable reality: https://xkcd.com/927/


Agreed. But the problem is now fully solved by https://docs.astral.sh/uv/.


I'm enjoying uv but I wouldn't say the problem is "fully" solved -- for starters it's not uncommon to do `uv add foo` and then 5K lines of gobbledygook later it says "missing foo-esoterica.dll" and I have to go back to the multiplatform drawing board.


Could it be a problem with a specific Python package being installed rather than uv itself?


Easy if you start from scratch, hard if you want to get existing projects working.

Also it doesn't always work - I got stuck with some dependencies - but when it works, it's amazing.


Charlie Marsh (one of the people behind astral.sh, the creators of ruff and uv) explains it well in this talk:

https://www.youtube.com/watch?v=zOY9mc-zRxk


It is not a new discovery that Python is terrible for packaging and distribution. Unfortunately, very little has been done about it. The fact that Python is used in particular environments controlled by the developers, mainly machine learning, makes this even more difficult to fix.


It's not really true to say "very little has been done." Thousands of person-hours have been invested into this problem! But the results have been mixed.

At least uv is nice! https://docs.astral.sh/uv/


Time was spent, but on what? Creating 15+ different, competing tools? That won’t improve things. Blessing one tool and adopting something equivalent to node_modules could, but the core team is not interested in improving things this way.


> what makes this so difficult to solve in Python?

I think the answer is the same thing that makes it difficult to make a good package manager for C++.

When a language doesn't start with decent package management, it becomes really hard to retrofit a good one later in the lifespan of that language. Everyone can see "this sucks" but there's simply no good route to change the status quo.

I think Java is the one language I've seen that has successfully done the switch.


Java, C#, JavaScript (node) all disagree. If the Python core team wanted good packaging, they could have done it ages ago. Sure, a good solution might not be applicable for past Python versions, but they aren’t doing anything to make it any better.


Python packaging's complexities are difficult to attribute to any single cause. But path dependency, extremely broad adoption, and social conventions within the Python community (which has historically preferred standards over picking single tools) are all contributing factors.

Most of these aspects have significantly improved over the last decade, at least for the standard packaging ecosystem. I don’t know about Conda, which has always been its own separate thing.


PyPI was always broken due to weird ideas for problems that were long solved in other languages or distributions. They had/have the backing of fastly.net, which created an arrogant and incompetent environment where people listened to no one.

Conda suffers from the virtual environment syndrome. Virtual environments are always imperfect and confusing. System libraries sometimes leak through. The "scientific" Python stack has horrible mixtures of C/C++/Cython etc., all poorly written and difficult to build.

Projects deteriorated in their ability to build from source due to the availability of binary wheels and the explosion of build systems. In 2010 there was a good chance that building a C project worked. Now you fight with meson versions, meson-python, cython versions, libc versions and so forth.

There is no longer any culture of correctness and code cleanliness in the Python ecosystem. A lot of good developers have left. Some current developers work for the companies who sell solutions for the chaos in the ecosystem.


> The "scientific" Python stack has horrible mixtures of C/C++/Cython

Don't forget a whole lot of FORTRAN :)


Python packaging is broken mostly because bootstrapping is broken, and it cascades into packaging, but people don't realize the bootstrapping is responsible and blame packaging.

Not saying packaging doesn't have faults, but on its own, on a good Python setup, it's actually better than average. But few people have a good setup. In fact, most people don't know what a good setup looks like.

And here is why bootstrapping is broken: https://www.bitecode.dev/p/why-is-the-python-installation-pr...


uv solves this issue nicely. uv manages the Python version, and being a single binary, installing uv involves downloading a file and adding it to PATH.


Yes, that's one of the most important successes of the tool. Being written in Rust, it is completely independent from the Python setup, and therefore it doesn't care if you botched it. And with the indygreg standalone builds, it can even avoid the pyenv pitfall of compiling on your machine on Linux.


My single setup routine has served me well for years, with little to no change: pipx as the tools manager, miniconda for env bootstrap and management, poetry (installed with pipx) for project management (works great with conda envs) and autoenv to ensure the correct env is always active for any project I'm currently in. The only issue I may potentially have is if I install anything apart from Python via conda, as that won't be reflected in the pyproject file.


>> One thing which I think messed this up from the beginning was applying the Unix philosophy with several/many individual tools as opposed to one cohesive system

Well, Unix IS the cohesive system..


my approach is to ignore all the *conda stuff and:

    yay -S python-virtualenv   # I'm on arch, do not confuse with 12 similarly named alternatives
    pyenv virtualenv 3.10 random-python-crap
    pyenv local 3.10.6/envs/random-python-crap
    pip install -r requirements.txt

and it works (sometimes deps are in some other places, or you have to pass -c constraints.txt or there is no file and you need to create it in various ways)

At least by not using local .env directories, I always know where to find them.

I install a lot of AI projects, so I have around 1TB just for the same Python dependencies installed over and over again.

Sometimes I can get away with using the same venv for two different projects, but 8 times out of 10 the deps get broken.


python developers run pyenv inside of docker containers.. they just have no clue what good dependency management could even possibly look like


I don’t think the Python community has a culture of thinking about software engineering in a principled and systematic way like you would see in places like Haskell, Rust or Clojure communities.

Python's strength (and weakness) is an emphasis on quick scripts, data science and statistics.

There’s simply not the right people with the right mindset.


Not "right" or "wrong" mindset. Just different.


No, it's wrong because of the mess it makes - which makes even the things that that crowd of people wants to focus on, like quick scripts or data science, harder.


If your objective is to reliably build software then it’s objectively worse.


Many, probably the majority, just want to build something quickly and be done, or get to the next iteration. It's a huge reason why Python is widely adopted in classrooms and for ML/AI. It's objectively better than other languages that force extra overhead on users by default.


I would argue that it should be possible to make something rigorous and easy to use here. The Python model is pure incidental complexity.


That argument doesn't stand up so far in this context. Otherwise, a decent number of the languages most preferred by experienced software engineers would be used more generally by people outside that set. And Python would then either be very different, or have far less mind share.

Also keep in mind that a) Python has been around longer than most other "popular" languages, and so b) it has a lot of baggage that it has to maintain in order to avoid another 2to3 fiasco.


or LuaRocks



