> If the version shown is 4.87.1 or 4.87.2, treat the environment as compromised.
More generally speaking, one would have to treat the computer/container/VM as compromised. User-level malware still sucks. We saw just the other day that Python code can run at startup time via .pth files (and probably in many other ways). With a source distribution, it can run at install time, too (see e.g. https://zahlman.github.io/posts/python-packaging-3/).
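For anyone who hasn't run into it: `site.py` executes any line of a `*.pth` file that begins with `import` at interpreter startup. A harmless sketch of the mechanism (the filename is made up, and `site.getsitepackages()` may behave differently in some virtualenv setups):

import pathlib
import site

# Any line of a .pth file starting with "import" is exec()'d by site.py at startup.
payload = 'import sys; sys.stderr.write("hello from a .pth file\\n")\n'
target = pathlib.Path(site.getsitepackages()[0]) / "demo_startup.pth"
target.write_text(payload)
# Every subsequent `python` run in this environment now prints the message.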
> What to Do If Affected
> Downgrade immediately:
> pip install telnyx==4.87.0
Even if only the "environment" were compromised, that includes pip in the standard workflow. You can use an external copy of pip instead, via the `--python` option (and also avoid duplicating pip in each venv, wasting 10-15MB each time, by passing `--without-pip` at creation). I touch on both of these in https://zahlman.github.io/posts/python-packaging-2/ (specifically, showing how to do it with Pipx's vendored copy of pip). Note that `--python` is a hack that re-launches pip using the target environment; pip won't try to import things from that environment, but you'd still be exposed to .pth file risks.
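Roughly what that looks like (illustrative paths; the `--python` global option needs a reasonably recent pip):

# venv without its own copy of pip
python -m venv --without-pip .venv

# drive the venv with an external pip instead of one installed inside it
pip --python .venv/bin/python install telnyx==4.87.0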
Yes (but not for a browser). My terminal windows are 80x24, pretty much always. I do this today on Linux, I've done it through multiple versions of Windows, and I did it in my childhood on a 9" B&W "luggable" Mac screen.
> uv is just a package manager that actually does its job for resolving dependencies.
Pip resolves dependencies just fine. It just also lets you try to build the environment incrementally (which is actually useful, especially for people who aren't "developers" on a "project"), and is slow (for a lot of reasons).
> I think the python community, and really all package managers, need to promote standard cache servers as first class citizens as a broader solution to supply chain issues. What I want is a server that presents pypi with safeguards I choose. For instance, add packages to the local index that are no less than xxx days old (this uv feature), but also freeze that unless an update is requested or required by a security concern, scan security blacklists to remove/block packages and versions that have been found to have issues. Update the cache to allow a specific version bump. That kind of thing.
If "my own curated pypi" extends as far as a whitelist of build artifacts, you can just make a local "wheelhouse" directory of those, and pass `--no-index` and `--find-links /path/to/wheelhouse` in your `pip install` commands (I'm sure uv has something analogous).
> It's unfair because it's a different algorithm with fundamentally different memory characteristics. A fairer comparison would be to stream the file in C++ as well and maintain internal state for the count.
The C++ code is still building a tally by incrementing keys of a hash map one at a time, and then dumping (reversed) key/value pairs out into a list and sorting. The file is small and the Python code is GCing the `line` each time through the outer loop. At any rate it seems like a big chunk of the Python memory usage is just constant (sort of; stuff also gets lazily loaded) overhead of the Python runtime, so.
Sure, but making one string from the file contents is surely much better than having a separate string per word in the original data.
... Ah, but I suppose the existing code hasn't avoided that anyway. (It's also creating regex match objects, but those get disposed each time through the loop.) I don't know that there's really a way around that. Given the file is barely a KB, I rather doubt that the illustrated techniques are going to move the needle.
In fact, it looks as though the entire data structure (whether a dict, a Counter, etc.) should be a relatively small part of the total reported memory usage. The rest seems to be internal Python stuff.
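A rough way to sanity-check that (a sketch, not a rigorous measurement: `tracemalloc` only sees allocations made through Python's allocator, and `ru_maxrss` is reported in KiB on Linux):

import resource
import sys
import tracemalloc

tracemalloc.start()
counts = {}
with open(sys.argv[1]) as f:
    for line in f:
        for word in line.split():
            counts[word] = counts.get(word, 0) + 1
_, peak = tracemalloc.get_traced_memory()

# Compare the tally's own allocations with the whole process footprint.
rss_kib = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
print(f"tally (dict + strings): ~{peak / 1024:.0f} KiB")
print(f"process peak RSS:       ~{rss_kib} KiB")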
> This sounds like a job for Python. Indeed, an implementation takes fewer than 30 lines of code.
I don't know if the implementation is written in a "low-level" way to be more accessible to users of other programming languages, but it can certainly be done more simply by leveraging the standard library:
from collections import Counter
import sys

with open(sys.argv[1]) as f:
    words = Counter(word for line in f for word in line.split())

for word, count in words.most_common():
    print(count, word)
At the very least, manually creating a (count, word) list from the dict items and then sorting and reversing it in-place is ignoring common idioms. `sorted` creates a copy already, and it can be passed a sort key and an option to sort in reverse order. A pure dict version could be:
import sys

with open(sys.argv[1]) as f:
    counts = {}
    for line in f:
        for word in line.split():
            counts[word] = counts.get(word, 0) + 1

stats = sorted(counts.items(), key=lambda item: item[1], reverse=True)
for word, count in stats:
    print(count, word)
(No, of course none of this is going to improve memory consumption meaningfully; maybe it's even worse, although intuitively I expect it to make very little difference either way. But I really feel like if you're going to pay the price for Python, you should get this kind of convenience out of it.)
Anyway, none of this is exactly revelatory. I was hoping we'd see some deeper investigation of what is actually being allocated. (Although I guess really the author's goal is to promote this Pystd project. It does look pretty neat.)
I'd try this, but I often find that I want to repeat a cycle of two or more commands. Yes, I probably should edit and put them on one line with semicolons (or even make a function), but.
Or put && between them. I had "compile; run", and when the compile failed, it still ran (the old build, that is). Took me a while to figure out. With &&, the second command runs only if the first one succeeds.
Anyway, it's well worth combining commands into one line for easy re-running.
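For anyone else who got bitten by this, the difference in one (made-up) compile/run pair:

cc main.c -o app ;  ./app   # ./app runs even if the compile failed (stale binary)
cc main.c -o app && ./app   # ./app runs only if the compile succeeded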