Python's Caduceus syndrome
This year, according to StackOverflow’s annual survey, Python became the second-most beloved programming language (behind Rust, which is the language people most love telling you they’re aspirationally learning).
Python’s meteoric rise is due in part to the web framework Django, which is well-used and has amazing documentation, but mainly due to the enormous boom in data science, driven by scientific computing packages like NumPy that power Python’s numerical computing ecosystem:
@scikit_learn is used by 55,634 other repositories
@TensorFlow by 39,969
numpy, by 205,143 (woot!)...
As much love as there is for Python (including from yours truly - Python is one of the first programming languages I learned and the one I use most frequently), its ecosystem is not without controversy. Two recent indicators of growing pains are Guido van Rossum’s abdication as Python’s BDFL (Benevolent Dictator For Life) and a discussion at the recent Python Language Summit that’s been making the rounds, titled “Batteries Included, But They're Leaking,” a critique of Python’s “batteries included” approach.
To understand what’s going on, it’s helpful to get some context. The impetus for many of the changes and discussions in Pythonland has been the upgrade from Python 2 to Python 3. As any beleaguered Pythonista knows, the move from 2 to 3 has been stretching on interminably. At a PyCon I attended all the way back in 2014, Guido came on stage in a shirt that showed just the number 2.8, struck through by a big red “prohibited” road sign, alluding to the fact that Python 2 would not be progressing past version 2.7.
But Python 2 is finally set to reach end of life in 2020. What does this mean? Usually, when software and packages are actively maintained, broken code is fixed, and any holes in the code are patched as new security threats become obvious. Ending support for Python 2 means that core Python developers, as well as developers of most other significant Python projects (including TensorFlow, Requests, pandas, IPython, and other packages that developers and data scientists rely on), will no longer fix bugs, patch security holes, or add features for it.
The main reason the upgrade has been taking so long is that the core developers decided to not make Python 3 backwards compatible, which breaks many codebases and has resulted in a number of complications, especially for many companies that can’t upgrade past 2.7 due to centralized IT bureaucracies and security concerns.
But at the same time that support for 2 is withering, the Python 3 ecosystem is in full bloom. Core developers working on Python 3 have been building in language features that have been missing for some time: type annotations that mimic (but do not offer the full feature set of) the type systems in compiled languages like Java and C++, data classes similar to case classes in Scala, and other goodies like asyncio. These bring Python closer to what some developers see as “real” programming languages: compiled languages that are easier to package and deploy to production, languages that offer a way around Python’s single-threaded execution model, and languages whose type systems can sometimes make it easier to catch bugs before they happen.
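To make those three features concrete, here is a minimal sketch that uses all of them at once. The Package class and the describe functions are names I made up for illustration, and the dependents figure is the numpy count quoted above; it needs Python 3.7 or later.

```python
import asyncio
from dataclasses import dataclass


@dataclass
class Package:
    # Hypothetical record type, invented for this example.
    # The annotations document intent; Python does not enforce them at runtime.
    name: str
    dependents: int


def describe(pkg: Package) -> str:
    # A type-annotated function: a checker such as mypy can verify callers,
    # but the interpreter itself ignores the hints.
    return f"{pkg.name} is used by {pkg.dependents:,} repositories"


async def describe_later(pkg: Package) -> str:
    # A coroutine: it yields control to the event loop before answering.
    await asyncio.sleep(0)
    return describe(pkg)


numpy = Package(name="numpy", dependents=205143)
print(describe(numpy))
print(asyncio.run(describe_later(numpy)))
```

The data class gets its constructor and equality comparison generated for free, which is the part that resembles a Scala case class.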
The problem is that Python, by its nature, is not those other languages. It’s so popular because it’s dead simple to get started with compared to, say, Java, which makes you learn about types and classes from your very first program, whereas Python abstracts them away in favor of just writing code.
Python is a fantastic language for exploring, prototyping, writing smaller applications and scripts, manipulating strings, and web scraping - in short, the bread and butter of half of data science work, and a significant part of devops, system administration, and automating boring office clerical work.
But the other problem is that because Python is (was?) so simple and so ergonomic to use, new developers have been flocking to it and adopting it to build large production systems that might have been better off written in other languages, ones that are much less suited to exploring but more suited to long-term, stable systems.
In a way, Python doesn’t necessarily scale. For example, Instagram, which runs the largest Django install, has resorted to disabling garbage collection to squeeze out more performance.
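If you’re wondering what “disabling garbage collection” even means in Python, here is a rough sketch using the standard gc module. It is not Instagram’s actual setup, just an illustration with a made-up Node class: turning the collector off only stops the automatic cleanup of reference cycles, while ordinary reference counting keeps working.

```python
import gc


class Node:
    """A tiny made-up object that can point at another Node."""

    def __init__(self):
        self.other = None


def make_cycle():
    # Two objects that reference each other: reference counting alone
    # can never free them once the local names go away.
    a, b = Node(), Node()
    a.other, b.other = b, a


gc.collect()    # clear out any garbage that already exists
gc.disable()    # switch off the automatic cyclic collector
try:
    collector_on = gc.isenabled()
    make_cycle()            # the cycle now lingers in memory
    found = gc.collect()    # an explicit collection still works while disabled
finally:
    gc.enable()             # restore the default behaviour

print(collector_on, found)
```

The trade-off is that any cycles your code creates pile up until something collects them by hand, which is why this is a last resort rather than a tuning knob.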
Of course, Python’s core developers want it to be able to scale, and as a result they have been working on the aforementioned features, including some that are completely puzzling to most Python users. The straw that finally broke the camel’s back was PEP 572. (All Python changes are governed by PEPs, Python Enhancement Proposals, which are open to everyone in the community to read and which lay out the reasons for making changes, with much detail from the core committers. They make great reading if you want to understand a Python concept to its very core, or are having trouble with insomnia.) PEP 572 introduced something called assignment expressions, which change code from looking like this:
match = pattern.match(line)
if match:
    return match.group(1)
to this:
if match := pattern.match(line):
    return match.group(1)
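For anyone who wants to try the two forms side by side, here is a minimal runnable sketch. The pattern, the function names, and the sample line are made up for illustration, and the second function needs Python 3.8 or later, the release that shipped PEP 572.

```python
import re

# Hypothetical pattern, chosen only for illustration.
pattern = re.compile(r"user=(\w+)")


def first_group_classic(line):
    # Pre-3.8 style: assign first, then test.
    match = pattern.match(line)
    if match:
        return match.group(1)
    return None


def first_group_walrus(line):
    # Assignment expression: bind and test in a single step.
    if match := pattern.match(line):
        return match.group(1)
    return None


sample = "user=guido logged in"
print(first_group_classic(sample), first_group_walrus(sample))
```

Both functions behave identically; the only difference is whether the assignment gets its own line.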
For some developers who have used other languages, this is a familiar pattern. For many Python users (myself included), this is a completely foreign syntax that makes Python code harder to read than it used to be. It created an enormous controversy (as big as you can get in the world of programming language design mailing lists, I guess), and many people were angry at Guido for introducing the change. He, in turn, became angry that they were resisting what he saw as a major improvement, and finally responded on the Python mailing list by quitting his benign dictatorship over the project he has spent most of his life’s energy on. He wrote,
"Now that PEP 572 is done, I don't ever want to have to fight so hard for a PEP and find that so many people despise my decisions. I would like to remove myself entirely from the decision process. I'll still be there for a while as an ordinary core dev, and I'll still be available to mentor people -- possibly more available. But I'm basically giving myself a permanent vacation from being BDFL, and you all will be on your own."
This was an important moment for the Python language community. What happens when charismatic leaders leave projects and leave a leadership vacuum in their wake? What would happen to the future of Python? There was much speculation for a few months.
Which brings us to the most recent issue, Python’s leaky batteries. During the latest Python Language Summit, where core developers get together to discuss the language’s future, Amber Brown, who works on maintaining Twisted, an event-driven networking framework, critiqued Python’s standard library as not up to par and hard to change, and said that some of the older packages need to be taken out of the standard library in favor of higher-quality dependencies.
Her reasoning was that many of her framework’s users were still on Python 2, and as a result, Twisted couldn’t stop supporting them. Guido, fresh from the PEP 572 controversy and after years of dealing with Python 2 migration issues, was frustrated:
Brown said her point was to move asyncio to PyPI, along with most new feature development. “We should embrace PyPI,” she exhorted. Some ecosystems such as Javascript rely too much on packages, she conceded, but there are others like Rust that have small standard libraries and high-quality package repositories. She thinks that Python should move farther in that direction.
Van Rossum argued instead that if the Twisted team wants the ecosystem to evolve, they should stop supporting older Python versions and force users to upgrade. Brown acknowledged this point, but said half of Twisted users are still on Python 2 and it is difficult to abandon them. The debate at this point became personal for Van Rossum, and he left angrily.
What do all these problems mean? Really, what’s happening now is that Python is popular enough that it has to keep evolving, with shiny new features in Python 3 (and a shiny new steering council), and it is being used everywhere in the latest hotness, from deep learning to cloud computing. This is the first head of the caduceus: looking forward.
But the language is still very much bound by the rules and history of its past successes: Python 2, and its creator, Guido, whose time, energy, devotion, and passion have made the language what it is today. This is the second head of the caduceus - the technical debt and the ecosystem of users who are still stuck in migration limbo.
Which of the two should the ecosystem tackle as a priority? Should they clean up all backwards compatibility first (that would be PEP 594, “removing dead batteries from the standard library”)? Or should they focus on moving everyone over to the new features?
The caduceus, for now, like many open-source endeavors driven by lots of volunteers with lots of opinions, seems to be bent on moving forward while still carrying the weight of Python 2 on its back with every Python 3 decision it tries to make.
As a very enthusiastic Python developer, of course I’m interested in the whole ecosystem’s success and hope that these things get worked out and the snake comes back together, but I’m just as interested in seeing which direction “being worked out” will take.
Image: Mozes and the brass snake, Anthony van Dyck, 1620
Links
Gabe Weinberg on Duck Duck Go and the advertising industry (podcast)
Vassily Grossman, an extremely underrated Soviet writer
These people buying old houses are crazy
About the Author
I’m a data scientist in Philadelphia. Most of my free time is spent kid-wrangling, reading, and writing bad tweets. I also have longer opinions on things. Find out more here or follow me on Twitter.