Possibly the most interesting anti-pattern I saw was:
a_list_of_words = "my list of words".split(" ")
I never enquired why, since there were bigger issues in the code, e.g. "unit testing" by running the code, taking the result and pasting it in as the expected value: running repr(value), copying out the string, then asserting self.assertEqual(repr(value), '[<Object1: unicode_value>, ...]')
I do that all the time in the interpreter, especially when slicing pandas DataFrame objects, e.g.:
df_subset = df['date buyer nwidgets'.split()]
That is far easier to type than the explicit list, with all its punctuation. Now, it's definitely weird that they did a `split(" ")` rather than just using the default, but the idea is the same.
I do try to strip stuff like that out before I put it into a script, replacing it with the explicit list, but I'm never sure if that actually improves anything. It's not as if the explicit list is any easier to read.
Yeah, or re.split(r'\s+', ...), or something - the issue here is that you need to know about this.
From the comment a few levels up I understood that the use of str.split with a " " argument didn't mean that whoever wrote it knew about its semantics. If they did and it was really what was intended then ofc it's completely OK, but if not, it can easily lead to bugs.
For example, if the user is required to input several ints separated with whitespace, this:
map(int, input_str.split())
will raise only in expected cases, while this:
map(int, input_str.split(" "))
can lead to rejecting correct input just because someone pressed space twice. It's very frustrating for the user, too, because whitespace is hard to spot visually.
So, I don't know if this qualifies as an antipattern, but I think if I saw .split(" ") instead of .split() in the code I'd at the very least expect a comment explaining why it's used.
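To make the difference concrete, here is a minimal sketch using a made-up input string with an accidental double space (the variable names are only for illustration):

```python
# Hypothetical user input: three ints, with the space bar hit twice.
input_str = "1 2  3"

# split() with no argument splits on runs of whitespace and
# drops leading/trailing whitespace, so no empty strings appear.
default_parts = input_str.split()
print(default_parts)                 # ['1', '2', '3']
print(list(map(int, default_parts))) # [1, 2, 3]

# split(" ") splits on every single space character, so the
# double space produces an empty string in the middle.
single_space_parts = input_str.split(" ")
print(single_space_parts)            # ['1', '2', '', '3']

# int('') is not a valid literal, so otherwise-correct input is rejected.
try:
    list(map(int, single_space_parts))
except ValueError as exc:
    print("rejected:", exc)
```

The same applies to tabs and trailing newlines: split() treats them as separators, split(" ") leaves them inside the pieces.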
That's a pandas dataframe (idiomatically denoted df), not a list. It has funky slicing properties, and he's selecting columns of the dataframe in a perfectly valid way.
This is actually useful: you may want to experiment with different list_of_words in the future and typing the words between [" ", " ", " "] is time consuming. It's also less readable.
...because it's a list? Tuples were supposed to have a structure (at least that's what all the rest of the world thinks of them), so iterating through combination of apples, cars and languages makes no sense whatsoever.
But yes, Python misses entirely the point of tuples, treating them as read-only lists.
> Tuples were supposed to have a structure (at least that's what all the rest of the world thinks of them)
No, a structure is, you know, a structure -- what C calls a struct. Python calls it a namedtuple. If some people call it just a tuple, well, that's a difference in terminology, but it doesn't mean Python is confused about the concepts, it's just using terminology you're not used to.
Also, if we're going to be pedantic about the meaning of data types, your blog post is wrong about lists. You say "position in the list doesn't matter", but that means ordering doesn't matter, and an unordered collection of similar objects is a set, not a list. Python makes this distinction clear: a list is ordered, a set is not.
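A small sketch of that list/set distinction, with made-up values:

```python
# Two hypothetical sequences with the same elements in a different order.
readings = [True, False, False]
reordered = [False, True, False]

# Lists compare element by element, so position is part of the value.
print(readings == reordered)            # False

# Sets ignore order (and duplicates), so the same data compares equal.
print(set(readings) == set(reordered))  # True
```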
> [...] that's a difference in terminology, but it doesn't mean Python is confused about the concepts
No, it means exactly this. The term "tuple" and its use predates Python. Sorry, no banana.
> [...] your blog post is wrong about lists. You say "position in the list doesn't matter", but that means ordering doesn't matter
Oh, so what's the difference in meaning of element True on position 1 and element True on position 20? Position in list doesn't matter if we're talking about meaning of the elements.
References, please? And not mathematical references; programming references. C was using the keyword "struct" long before Python to refer to what you are calling a tuple.
> what's the difference in meaning of element True on position 1 and element True on position 20?
The fact that the index is 1 instead of 20. Both elements have the same type, and might well refer to the same property of some sequence of things; but the index being 1 instead of 20 means the element True is describing that property relative to the first item in some sequence, instead of the 20th item. That's why position in the list makes a difference: the ordering of the items, as well as the type of the items, carries information.
(Of course, in Python the list items don't even have to be of the same type; but most uses of Python lists in practice that I've seen do assume that all the elements are "the same kind of thing".)
I'm not sure what you're getting at here, or what you are expecting tuples to be like. They can have as much "structure" as you need--they're just a collection.
Lighter-weight, immutable collections have a use case. The code in OP appears to be one where it makes sense. I follow the rule where variables are mutable IFF they need to be mutable.
Pythonistas use tuples as if they were mere lists, just immutable. This is clearly displayed by Python's own interface.
For the rest of the world, tuples are not immutable lists. They are tuples, i.e. collections of "objects" that could share nothing about their type. Tuples often are not even iterable! (Erlang, Haskell)
The fact that tuples in Python can have as much structure as one wants is derived from dynamic typing, not from the tuples' nature. The same you could say about Python's lists.
This is a really subtle issue. You need to know more languages to see it clearly.
Have you looked at named tuples? They shipped in the python standard library sometime in the last few years (they are at least in 3.3) and are clearly intended for storing structured data.
A typical rule of thumb in Python land is that heterogeneous data probably belongs in a tuple, so practice goes a little further than immutable lists.
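For anyone who hasn't used them, a minimal sketch of collections.namedtuple holding heterogeneous data (the Order type and its fields are invented for illustration):

```python
from collections import namedtuple

# Hypothetical record type: each position has its own meaning and type.
Order = namedtuple("Order", ["date", "buyer", "nwidgets"])

order = Order(date="2013-05-01", buyer="alice", nwidgets=3)

# Fields are accessible by name, like a struct...
print(order.buyer)       # alice

# ...but the object is still an ordinary tuple underneath,
# so indexing and unpacking keep working.
print(order[1])          # alice
date, buyer, nwidgets = order
print(isinstance(order, tuple))  # True
```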
I think you could improve your demonstration of the usage in the standard library by examining a random selection of usages to try to find out what is typical. But maybe you already looked at more than you talk about in the article (and I understand that this might not be an interesting use of your time).
No, I haven't looked at them. Python 2 has them since release 2.6, so it's out of my reach for any practical purpose at the moment (I need to preserve compatibility with Python 2.4).
> A typical rule of thumb in Python land is that heterogeneous data probably belongs in a tuple, so practice goes a little further than immutable lists.
The problem with Python tuples is that they are two things mixed together: immutable lists and a container for heterogeneous data. It's the same situation as JavaScript's objects.
If it's so subtle, does it matter? This sounds like you just have a problem with the word "tuple" applied to an object that behaves differently from tuples in a statically-typed language.
Would you feel better if they named it "ImmutableList" instead?
Can't speak for GP, but I would [feel better with that name].
(Although I agree with you that statically-typed-language-tuples don't seem to make sense in Python.)
But hey... Python's weird choice of how to name the ImmutableList could be worse, right?
For example, someone could be malicious enough to call their general-purpose associative array a "hash", just because a hashmap (note: not a hash) is a good implementation for large associative arrays. Wow, that'd be hilariously misleading, wouldn't it? Good times!
Or imagine someone was silly enough to name their auto-resizing arrays "vectors", even though in all previously existing contexts a "vector" is a sort of thing which absolutely cannot be meaningfully resized/extended. Ha. Think of the tiny cognitive burden placed on generations of future programmers-who-study-math, trying to juggle these two very-similar-but-distinct concepts, multiplied by the number of such future programmers. Amazing practical joke, right?
No, it's not a problem with the word "tuple" behaving differently from statically-typed languages. It's a problem with the word "tuple" behaving differently from all the rest of the world.
Yes, I would feel better if it was named "ImmutableList" or any other way that is not misleading about the purpose.
3. Construct a tuple of a length not known at compile-time
Python allows these because "why not?" but it does break their "one and only one way to do it" rule and confuses beginners a hell of a lot.
There are definitely borderline cases. For instance, should a Vector be a list or a tuple? A Vec3 type is obviously a tuple, but a large Vector destined for BLAS is obviously a list.
No, it allows them because the distinction that those restrictions are founded on is only useful in statically-typed languages, and Python isn't statically typed.
> For instance, should a Vector be a list or a tuple?
A real vector/array should be its own data type (probably implemented in a C, or similar low-level, extension) that happens to implement the interface expected of an indexable, iterable collection, neither a list nor a tuple.
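As a rough pure-Python sketch of that idea (a real implementation would more likely live in a C extension; the Vector class here is hypothetical and backed by the standard array module):

```python
from array import array
from collections.abc import Sequence


class Vector(Sequence):
    """Hypothetical numeric vector: indexable and iterable,
    but neither a list nor a tuple."""

    def __init__(self, values):
        # Store the elements as C doubles in a compact buffer.
        self._data = array("d", values)

    def __len__(self):
        return len(self._data)

    def __getitem__(self, index):
        # Slices return a new Vector; plain indexes return a float.
        if isinstance(index, slice):
            return Vector(self._data[index])
        return self._data[index]

    def __repr__(self):
        return "Vector(%r)" % list(self._data)


v = Vector([1, 2, 3])
print(len(v), v[0], list(v))            # 3 1.0 [1.0, 2.0, 3.0]
print(isinstance(v, (list, tuple)))     # False
```

Sequence supplies iteration, `in`, index() and count() from the two methods defined above, so code that only expects an indexable, iterable collection works unchanged.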
Yeah, while snarkily made, it's a good point that Erlang does make use of it without being statically typed.
There is a deep difference in language approach between Python and Erlang here, one that goes beyond the use of tuples and comes down to types: Erlang, while dynamically typed, has a deep concern for types in its pattern matching system to make path decisions, while Python is very much centered on using dynamic OO techniques -- how objects respond to messages -- to do that.
So I'd still say it's the same kind of deep language approach difference at work.
Even in Haskell, though, people often write all kinds of type-class magic to allow "iterating" over a tuple. For example, a Binary instance over a tuple wants to call "put" on each element.
Haskell's (Oleg's) HList is basically a tuple with iteration/list-like operations.
The distinction between "tuple" and "immutable list" doesn't make any sense outside of a statically-typed language, since the only difference is what other values a particular value is type-compatible with.