Hacker News
ValueObject (martinfowler.com)
89 points by ZainRiz on Nov 14, 2016 | 104 comments


Equality is an often overlooked complexity in denotational semantics. Dijkstra wrote an EWD [1] on the subject, which should give insight into the difficulties of this presumably simple relation.

In most languages, I prefer to think of equality as a property defined outside of a structure or class. Languages such as Java bind equals() tightly to the object, but this is a mistake. First, it imposes an asymmetry between receiver and argument where there is none: equality is reflexive and symmetric. Second, it does not allow for multiple forms of equality. Lastly, since it is defined on Object, it forces an equality relation onto all types, even those for which no equality is defined.

[1] https://www.cs.utexas.edu/users/EWD/transcriptions/EWD10xx/E...
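That "equality outside the type" view can be sketched in Java with standalone relations over the same type, rather than a method bound to the object. The names EXACT and CASE_INSENSITIVE here are my own, not a standard API:

```java
import java.util.function.BiPredicate;

class ExternalEquality {
    // Two different equivalence relations over the same type,
    // defined outside the type itself.
    static final BiPredicate<String, String> EXACT = String::equals;
    static final BiPredicate<String, String> CASE_INSENSITIVE = String::equalsIgnoreCase;

    public static void main(String[] args) {
        System.out.println(EXACT.test("Foo", "foo"));            // false
        System.out.println(CASE_INSENSITIVE.test("Foo", "foo")); // true
    }
}
```

Nothing privileges either relation; the call site picks the one that fits its use case.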


This is another thing that I think .NET got (mostly) right, though the overall shape of things is complicated enough that it's still easy to get tripped up if you don't know all the details.

.NET has:

- Several basic ways of doing comparisons: the == operator, the object.Equals() method, and the object.ReferenceEquals() method. For reference types, == defaults to referential equality, and object.Equals() can optionally be overridden to implement value equality. == can also be overloaded (though it's generally not recommended), which is why there's the separate ReferenceEquals() method for when you absolutely must compare references.

- An IEqualityComparer interface for implementing custom equality. Collection classes that rely on determining if objects are equal (e.g., hash sets and dictionaries) let you supply one of these so that you can do purpose-specific equality.

- IComparable interface for when you want to implement a default ordering on a type.

- IComparer interface for when you want to use custom ordering rules.
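For readers coming from Java, a similar (if messier) split exists there: `==` for reference comparison, an overridable `equals`, `Comparable` for the natural ordering, and `Comparator` for custom orderings. A quick sketch, not .NET code:

```java
import java.util.Arrays;
import java.util.Comparator;

class ComparisonKinds {
    public static void main(String[] args) {
        String a = new String("hi");
        String b = new String("hi");

        System.out.println(a == b);      // false: reference comparison
        System.out.println(a.equals(b)); // true: overridden value equality

        // Comparable supplies the natural ordering; Comparator supplies a custom one.
        String[] words = { "pear", "Apple", "fig" };
        Arrays.sort(words);  // natural (case-sensitive) order
        System.out.println(Arrays.toString(words));        // [Apple, fig, pear]
        Arrays.sort(words, Comparator.comparing(String::length));  // custom order
        System.out.println(Arrays.toString(words));        // [fig, pear, Apple]
    }
}
```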


This is one of the things .NET did pretty terribly wrong, because it includes Equals and GetHashCode on every object.

Not only is that autocompletion (and API) pollution of the worst kind (it infects every "thing" in the language!), it also lends itself to bugs; so most people simply don't implement Equals or GetHashCode at all, or only with tools. If you do, it's easy to get wrong, and the built-in API has no guardrails to help avoid trivial mistakes (why is plain equality by composition so hard to express?). Furthermore, the default choice of reference equality means that reference types in particular need some GC-surviving notion of identity. And that means every reference type has a larger object header - and memory density matters hugely to performance. It's quite conceivable to imagine a system with no object header at all or, more conservatively, only a (potentially optional) type id.

The built-in API further makes it natural to implement non-symmetric or non-reflexive "equality", and it binds the definition of equality tightly to the type, when in fact (as you point out with IEqualityComparer) equality can easily be perspective-dependent.

Edit: Probably by virtue of professional brain damage through years of getting used to the status quo, I almost overlooked another bit of insanity: the fact that Equals is not equivalent to operator ==! That simply makes no sense whatsoever; you just have to get used to it. To be clear: it's fine (in rare but necessary niche cases) to have multiple equality relationships; it's a nasty gotcha to spring that on the reader without warning. And to add insult to injury, the operators == and != are barely related - there's little help in ensuring they're mutually consistent.

Fortunately, .net has value types, and those by default implement sane equality rules, right? Well... structs work in some sense, but it's not always desirable to couple something with such specific GC and performance characteristics to a semantic, so it's not a trivial switch. And the default struct implementation only "works" in some academic sense of the word. Sure, it correctly determines equality by composition, but it doesn't implement all the APIs (operators and generic interfaces), so it's not always practical; the performance is consistently very bad (reflection!), and sometimes surprisingly terrible: in particular, the hash-code computation ignores some fields, and can even ignore all fields, due to what can only be described as the worst hash-code algorithm in common use. Even when GetHashCode does use all fields it's very slow, and it mixes bits so poorly that common data causes unnecessarily many hash collisions. Frankly: struct equality is a trap, and it's not broadly applicable anyhow.

All in all, to implement something that should be trivial (say: A is equal to B if and only if all components of A are equal to the respective components of B) correctly and in a broadly applicable way, you need to implement not just one member but five: object.Equals, object.GetHashCode, IEquatable<T>.Equals, operator ==, and operator !=. Just don't make any typos when copy-pasting all that boilerplate, because testing these for correctness isn't easy, and it's easy to end up in a situation that appears superficially correct but has bugs in corner cases.

So, I'd call equality and related features in .net pretty terribly designed. It looks to me like they didn't think this through and simply copied java 1.0 a little too closely. In fact, I challenge you to name a language that's worse in this regard.
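The Java side of that comparison has the same copy-paste failure mode, just with fewer members: equals and hashCode must be kept manually consistent, and forgetting one field in either leaves something superficially correct. A sketch with a hypothetical Money type:

```java
import java.util.Objects;

final class Money {
    final String currency;
    final long cents;

    Money(String currency, long cents) {
        this.currency = currency;
        this.cents = cents;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof Money)) return false;
        Money m = (Money) o;
        // Forgetting a field here (or in hashCode below) silently
        // breaks the equals/hashCode contract.
        return cents == m.cents && Objects.equals(currency, m.currency);
    }

    @Override
    public int hashCode() {
        return Objects.hash(currency, cents);
    }

    public static void main(String[] args) {
        System.out.println(new Money("EUR", 100).equals(new Money("EUR", 100))); // true
    }
}
```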


Agree with everything here. I posted an example [1] of how this can be solved now with C# (although in reality it just adds another 'equality solution' to the mix). It uses ad-hoc polymorphism to implement a type-class (interface) and instance (struct) approach, similar to Haskell. There's no IEqualityComparer instance allocation cost either.

[1] https://news.ycombinator.com/item?id=12954790


The only thing .NET could have done differently was to make Equals and GetHashCode into interfaces, and perhaps to have included some best practices.

== is there for legacy reasons, and having it mean reference equality, with the ability to overload it, seems like a reasonable compromise for object types. What we're missing is the ability to create custom value types that consist of other value types. We already know that it works with F#, so the CLR does support it. The question is whether C# the language can be extended with it.


Yes, Equals and GetHashCode should have been in an interface.

After 13 years of full-time .NET, it's the one thing I repeat over and over to newer devs: don't override Equals and GetHashCode. Use explicit comparers in your dictionaries, because equality is rarely a property of your THING; it's a property of the use case. I'd prefer the noise of having to always pass an explicit comparer to a Dictionary constructor, just for clarity.
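Java's sorted collections at least allow this per-use-site style: the relation is passed to the collection rather than baked into the element type (Java's hash-based collections, unlike .NET's, don't accept a pluggable equality at all). A sketch:

```java
import java.util.TreeSet;

class PerUseEquality {
    public static void main(String[] args) {
        // Case-insensitive membership for this one use case,
        // without touching String's own equals().
        TreeSet<String> names = new TreeSet<>(String.CASE_INSENSITIVE_ORDER);
        names.add("Alice");
        System.out.println(names.contains("ALICE")); // true
        System.out.println("Alice".equals("ALICE")); // false
    }
}
```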

A similar, related thing they got wrong was the ability to format strings without an explicit format provider. The result? Run the unit test suites for all C# projects on a French/Swedish machine (decimal comma) and I can guarantee that most tests involving any kind of formatting will fail because of it. I'd be delighted if they changed this behavior, breaking compatibility with every single C# program ever written; that's how bad it is.
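The culture pitfall is easy to reproduce on the JVM too: the same format call yields different text depending on the locale, which is exactly why tests pass on one machine and fail on a French or Swedish one. A sketch with explicit locales:

```java
import java.util.Locale;

class LocaleFormatting {
    public static void main(String[] args) {
        double price = 1234.5;
        // Same value, different text depending on the (often implicit) locale.
        System.out.println(String.format(Locale.US, "%.2f", price));     // 1234.50
        System.out.println(String.format(Locale.FRANCE, "%.2f", price)); // 1234,50
    }
}
```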


You don't have much say in Equals and GetHashCode when using objects as, for instance, the ItemsSource of a ComboBox. But it's not that bad; you just need a senior dev to sign off on the implementations.


A Control that does comparisons isn't much different from any other class that does internal comparisons, just like a Set or Dictionary. If objects didn't have a default Equals/GetHashCode, then all constructors for Dictionaries (and ComboBoxes) would take an IEqualityComparer argument.


It does complicate things quite a bit because WPF declares things in XAML, and giving constructor arguments is a major pain. You could make it a DependencyProperty, but it's still convoluted.


I'm sure it's already possible; you could make the argument implicit, so that if you don't pass an equality comparer it just uses DefaultComparer.Instance, which does what the current default (object) comparer does.


.net could have made equality entirely optional; and it could have exposed machinery to implement equality (including all aspects of it!) by composition, not manual fiddling - at least not by default!

That would have made .net less memory-hungry, faster, less bugprone, and simpler.

There's absolutely no good reason for this horribly complicated and inefficient state of affairs; not all platforms/languages suffer from these limitations. As you point out, even F# manages to do better despite the considerable extra limitation of needing to maintain some sort of compatibility even at runtime with the current crazy state, and despite what is certainly a much more limited development budget.


In hindsight there is a lot they should have done differently. Immutable by default and non-nullable by default are two of the obvious ones. But they were targeting Java developers, so they couldn't just bring a completely different language to market. They had to be a better Java.


Equality in general is a surprisingly subtle subject. "When is one thing equal to some other thing?"[1] is a fun article about it.

Java's equals method has two core problems in my eyes: it forces every type to define equality (when some types really shouldn't) and it lets you compare any two values, even if they don't have compatible types. My Java is rusty, but I think an Eq<T> interface analogous to Comparable<T> would be a reasonable design:

    public interface Eq<T> {
      public boolean equals(T o);
    }
In particular, I don't mind the fact that there can be at most one equals per type: it's a sensible default 99% of the time. Most things that admit some sort of equality have an "obvious" default equality that makes sense; it's fine to make other comparison functions more explicit. Weird types with no obvious default equals method just shouldn't implement that interface. Worst case, you could just construct a wrapper that exposes a particular equals for that type.
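A minimal use of that sketched interface might look like the following (Point is just an illustration; I've renamed the method to eq so it doesn't accidentally overload Object.equals(Object), which is itself a classic bug source):

```java
// Equality as an opt-in, type-safe interface: only types that
// implement it can be compared, and only against the same type.
interface Eq<T> {
    boolean eq(T o);
}

final class Point implements Eq<Point> {
    final int x, y;

    Point(int x, int y) { this.x = x; this.y = y; }

    @Override
    public boolean eq(Point o) {
        return o != null && x == o.x && y == o.y;
    }

    public static void main(String[] args) {
        System.out.println(new Point(1, 2).eq(new Point(1, 2))); // true
        // new Point(1, 2).eq("hello") would not compile - the
        // cross-type comparison Object.equals allows is ruled out statically.
    }
}
```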


What is an example of a type that should not have an equality?


The most obvious are lambdas (anything in this package, basically):

https://docs.oracle.com/javase/8/docs/api/java/util/function...

Iterable/Iterator in general might also be problematic, since they might yield an infinite sequence of values (trying to equate them would loop indefinitely). Obviously you can write a concrete class that is guaranteed to iterate a finite number of times, but then you should implement an equality interface only for that class.

Also, anything closely related to Streams, references to things in the outside world, or things that aren't easily Serializable (and thus are probably not Value Objects), like file handles/sockets (you might compare them by reference, obviously).

Also, Float and Double should arguably not implement an equality interface. Rust, for example, has them implement not Eq but PartialEq, due to NaN.

Any product (composite) or sum type containing such a value might also not be able to implement value equality meaningfully.

PS: I think that Tikhon forgot to link the document he mentioned, this should be it: http://abel.math.harvard.edu/~mazur/preprints/when_is_one.pd...


While anonymous functions should really only have reference equality, I still think the Java designers fucked up in extending this to method references. this::foo != this::foo, which is a bit painful when registering event listeners that you want to remove later. For those cases you have to store the reference in a field.
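That gotcha is easy to reproduce: each evaluation of a capturing method reference produces a distinct object (on current HotSpot JVMs, at least; the spec doesn't guarantee identity either way), so you can't deregister a listener by re-deriving the reference:

```java
class MethodRefIdentity {
    void onEvent() {}

    public static void main(String[] args) {
        MethodRefIdentity obj = new MethodRefIdentity();
        Runnable first = obj::onEvent;
        Runnable second = obj::onEvent;
        // Two evaluations, two objects: removal by == or equals() won't match.
        System.out.println(first == second);      // false
        System.out.println(first.equals(second)); // false
    }
}
```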


I like the way Haskell does it; equality is provided via the "Eq" typeclass. The language supports auto-deriving "Eq" instances that do a straightforward structural and sub-structural equality check (e.g. if I have data Foo a b = Foo a b Int, then (Foo a1 b1 i1) == (Foo a2 b2 i2) = (a1 == a2) && (b1 == b2) && (i1 == i2)), but you can define it yourself for types that have more representations than semantic values, like sets or maps or finger trees.
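Java (16+) records give a similar auto-derived structural equality: equals and hashCode are generated field-by-field from the components, much like `deriving Eq`:

```java
class DerivedEquality {
    // equals/hashCode are derived structurally from the components,
    // roughly mirroring Haskell's `data Foo a b = Foo a b Int deriving Eq`.
    record Foo(String a, boolean b, int n) {}

    public static void main(String[] args) {
        System.out.println(new Foo("x", true, 1).equals(new Foo("x", true, 1))); // true
        System.out.println(new Foo("x", true, 1).equals(new Foo("y", true, 1))); // false
    }
}
```

As with Haskell, the derived instance is only right when the representation has no redundancy; a set-like or map-like record would still need a hand-written equals.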


Multiple forms of equality: I think there should be just one "equality": the obvious structural one, which can be inferred by a compiler. What examples are there of cases where there is no obvious equality but still the official "equality" tag is needed?

EDIT: got downvoted, why? It's a serious question. I've never had any use for any equality besides the structural one. Honestly want to see if anyone can come up with a valid case.


Is the angle -90 equal to 270?

Is the quotient 1/2 equal to 3/6?

Is the string "Bébé" equal to "Bebe"?

Is the float 0.555555559 equal to 0.6?

Is the color RGB(1,1,1) equal to Lab(100,0,0)?

Is the equation y=3x+2 equal to y-2=3x?

Is the function x=>2+x equal to y=>2+y?
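Several of these already have off-the-shelf answers that differ from structural equality. For the "Bébé"/"Bebe" question, for instance, java.text.Collator at primary strength deliberately ignores accents and case:

```java
import java.text.Collator;
import java.util.Locale;

class CollationEquality {
    public static void main(String[] args) {
        Collator c = Collator.getInstance(Locale.FRENCH);
        c.setStrength(Collator.PRIMARY); // compare base letters only
        System.out.println(c.compare("Bébé", "Bebe") == 0);  // true under this relation
        System.out.println("Bébé".equals("Bebe"));           // false structurally
    }
}
```

Same pair of strings, two defensible answers - which is the point of the question list.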


> Is the angle -90 equal to 270?

> Is the quotient 1/2 equal to 3/6?

I think these are good examples that come up in the real world a lot. The representation structures we use are not sufficient for precise specification. So we have to overspecify (making a superfluous choice in representation, -90, 270, 630?).

My opinion still is that data must be normalized with explicit procedure calls. Doing stuff implicitly is neither beneficial to efficiency nor to comprehensibility. Explicit normalization allows one to leverage what the compiler _can_ infer.

The other questions I have to throw back, because it's not even clear to me: should they be?


> My opinion still is that data must be normalized with explicit procedure calls. Doing stuff implicitly is neither beneficial to efficiency nor to comprehensibility.

It can certainly benefit comprehensibility. Needing to explicitly normalize before I can apply standard comparisons leaves more space for things to go wrong. "Having a value of this type means that I have a normalized value" means I have fewer things to keep track of. (I do note that there is significant space between "it can benefit comprehensibility" and "it will always benefit comprehensibility" - that depends on what I need to be readily able to pull out of the implementation from a quick read of surface syntax)

Also, some things may have an easy equality check while normalizing may be expensive or impossible - although I sadly can't think of any examples off the top of my head.


Whether they should be depends on your needs, so it's nice to be able to choose the equality semantics for your application's types.


Turns out, there are many more interesting equivalence relations on values than simply structural equality!


... that all need to be called .equals()? No - that's actually the best argument for structural equality, since that is the "minimal" equality.


I agree. Lots of languages make it too hard to use custom equivalence or ordering relations.


>Is the equation y=3x+2 equal to y-2=3x?

Sorry, had to be pedantic. These equations are not equal under usual mathematical and logical definitions of equality applied to equations. They do possibly express the similar relationships between the variables.

But I agree with your general point. "Equals" has different definitions in different contexts.


> These equations are not equal under usual mathematical and logical definitions of equality applied to equations. They do possibly express the similar relationships between the variables.

Could you elaborate? The second sentence in particular surprises me - don't these necessarily express exactly the same relationship between the variables?


I'm not sure he's right at all. I'd define any equation "a" which can be rearranged into equation "b" to be equal, i.e. a==b.

They are the same thing. Thusly

    y=3x+2
is equal to

    10y-20=30x
is equal to

    5y-15x=10
etc

It's been a while since uni, but I don't remember any other mathematical definition of equations being considered equal. I guess the thought is something about what is being defined in terms of what (y= vs x=), but I don't think that's a meaningful distinction between two equations.


> Is the angle -90 equal to 270?

This is trickier than it might seem. Is this a description of static geometry, or is this a delta that I'm going to want to interpolate/integrate/etc?

In the latter case, -90 is very different than 270, which is different than 630...


    "ABC" == "abc"
Depending on the context, case insensitive equality checking may be required. Also precision on floating point value comparisons.

An interesting approach to this in C# (and other languages) is to use ad-hoc polymorphism:

    public interface Eq<A>
    {
        bool AreEqual(A x, A y);
    }

    public struct EqInt : Eq<int>
    {
        public bool AreEqual(int x, int y) => x == y;
    }

    public struct EqDbl : Eq<double>
    {
        public bool AreEqual(double x, double y) => Math.Abs(x - y) < 0.00001;
    }

    public struct EqStringExact : Eq<string>
    {
        public bool AreEqual(string x, string y) => x == y;
    }

    public struct EqStringIgnoreCase : Eq<string>
    {
        public bool AreEqual(string x, string y) =>
            String.Equals(x, y, StringComparison.CurrentCultureIgnoreCase);
    }

    public static class GenericCode
    {
        public static A UseFirstIfEqual<EQ, A>(A fst, A snd)
            where EQ : struct, Eq<A> =>
            default(EQ).AreEqual(fst, snd)
                ? fst
                : snd;
    }

    GenericCode.UseFirstIfEqual<EqInt, int>(10, 10); // 10
    GenericCode.UseFirstIfEqual<EqInt, int>(10, 20); // 20
    GenericCode.UseFirstIfEqual<EqDbl, double>(10.0, 10.0); // 10.0
    GenericCode.UseFirstIfEqual<EqDbl, double>(10.0, 20.0); // 20.0
    GenericCode.UseFirstIfEqual<EqStringExact, string>("ABC", "abc"); // "abc"
    GenericCode.UseFirstIfEqual<EqStringIgnoreCase, string>("ABC", "abc"); // "ABC"
It's slightly more novel than the interface approach, and allows for retrospective equality behaviours to be authored for a sealed type, and importantly, for the selection of the most appropriate equality operation for any particular context.

I don't have a compiler to hand, so apologies for any errors!


    new Rational(2, 3) == new Rational(4, 6)
You could implement Rationals to be structurally equal by simplifying them immediately on construction, but that may be a lot of unnecessary computation in some cases.


Clojure supports a rational type, and it does simplify them.

    (def a (/ 1 2))
    (def b (/ 8 16))
    (= a b) ;; => true


With more context we could probably agree that it's not unnecessary computation if you want to do anything useful with the data.


In a formula with multiple Rationals, it wouldn't always be necessary to simplify them before using them. You could just simplify the result if/when it's compared to another result.

Also, the UI may need to display the unsimplified fraction. If you simplified them immediately, you'd have to store a factor somewhere to reconstruct the original (and not in the Rational class because that would ruin structural equality).

In other words, "equal" for I/O purposes and "equal" for mathematical purposes could be different.


How'd you compare the unsimplified number then? How would it be obvious at the calling site what happens? Better explicitly compare only the simplified part (again, structurally).


The most straightforward way would be to simplify during the equivalence check and maybe cache the value. Or maybe there is some trick to compare them without simplifying.

Point is that you can have a notion of "equal but not identical". No doubt it's also possible to represent rationals with only structural equality, but I'd have to see two competing implementations to judge which is more intuitive to use.


> Or maybe there is some trick to compare them without simplifying.

a / b == c / d is equivalent to a * d == c * b, although whether that's a good approach depends on a lot of things.
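A sketch of that trick with a hypothetical Rational class. Note the assumptions: denominators are nonzero, and the cross products can overflow for large components, which Math.multiplyExact at least turns into an exception rather than a wrong answer:

```java
final class Rational {
    final long num, den;  // invariant: den != 0

    Rational(long num, long den) {
        if (den == 0) throw new ArithmeticException("zero denominator");
        this.num = num;
        this.den = den;
    }

    // a/b == c/d  iff  a*d == c*b: no normalization needed,
    // and it handles signs correctly (though not ordering).
    boolean sameValue(Rational o) {
        return Math.multiplyExact(num, o.den) == Math.multiplyExact(o.num, den);
    }

    public static void main(String[] args) {
        System.out.println(new Rational(2, 3).sameValue(new Rational(4, 6))); // true
        System.out.println(new Rational(1, 2).sameValue(new Rational(2, 3))); // false
    }
}
```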


Indeed. And you only have to do it once; you can then store the result (you should probably do that calculation lazily, though).


Lazily or not, that would be a global design decision (use a lazy language or not)


I feel like all of the responses are missing most of your question. As I read you, you understand that there are other questions we might want to ask, but why not call them "compareRationals" or whatnot rather than giving them all the same name?

I think that it comes down to a question of interfaces/abstraction. Quite a few languages provide you interfaces for asking both questions, "are these structurally equivalent" and "are these practically equivalent" for whatever choice of "practical".


Thanks for putting it that way. I realize "equality" and "equivalence" get confused a lot here. Equality is a special case of equivalence. It seems absurd to me to have more than one equality.


> It seems absurd to me to have more than one equality.

Mathematics has several.

Probably the most useful for CS is the notion from logic of "extensional equality" - two things are equal if I can't tell them apart. But we quickly want to restrict what approaches we can use, or for most languages that would necessarily be reference equality. Structural equality is one such restriction.

For an abstract data type, there's a strong line of reasoning that would say two values should be equal if they aren't distinguishable through the abstraction.


Quotients are often used to describe new behavior. You could reasonably say that this ought to define a new type, but it's still an example of a meaningful coarsening of equality.

For example, to define rational numbers using only integers you take a rational to be a pair (n, d) but with a new equality such that (1, 2) and (2, 4) are equated.


"Has the same hash value" is a very important set of equality relations which can depend on many factors.


IMO comparing hashes instead of just comparing would only make sense if it's semantic, i.e. explicit at the call site. Why not call equals explicitly on the hash values? Can you make a case for keeping it implicit?


I don't know why implicit equality is relevant here. The idea is that you'd explicitly recognize that there are multiple forms of equality, and that the one you want depends almost entirely on the problem domain. So you shouldn't have any 'implicit' or 'preferred' equality. You'd always let the call site dictate which equality relation it needs.

You asked for a non-structural equality relation, and I gave you a whole class of them. There are many more. There isn't even one kind of 'structural equality': are two things equal if they have the same memory layout? The same AST representation? What if the layout depends on the allocator? What if they are identical but must be treated differently through aliases?


You agree it should be explicit. But notice that explicit hash equality is just hashing followed by structural equality.

Considering different abstraction levels like AST or memory layout: This is moving up and down the levels of abstraction. It's conventional wisdom that you shouldn't do this in a single context (like a function, module, or even a programming language). But it's a good point - powerful languages like C present all these levels at once. In other words, there is often an impedance mismatch. Structural equality might be not quite the same to the compiler and to the programmer.


And structural equality is just a special (also arbitrary) kind of hash equality. You still have to specify what kind of structure you're comparing. The "structural equality" you are referring to is vague, not uniquely determined.


You made great points, and I've realized that the compiler/interpreter doesn't always know what part of the context we are actually interested in. There's often too much context available, and it's not always easy to throw away the information we're not interested in. However, when one does manage to write a function to a structural representation that contains exactly the interesting things, then equality is that function followed by structural equality.


Found a use case: performance optimizations. If there are functional dependencies that the compiler doesn't know, some fields don't need to be compared.


How would you want the compiler to compare graphs for equality? Graph isomorphism is a hard problem.


I tend to think of equality as something trivial. Granted, structural equality isn't graph isomorphism. But I don't see why something as specific and performance-sensitive as that deserves the tag "equality". Make a function graphsIsomorphic(); where's the problem?

Btw. in my practice (writing really dull code, and some algorithms) I've never seen the need for complex equality functions. In practice I test IDs - typically integers or short strings.


Fairly long post with a simple message: data is the truth; it is useful to think about the values that live in the program and about their relations. It is useful to think about which things are "objects" and thus need identities.

This is not obvious to OOP people. In OOP everything (each object) has an implied identity: its address in memory. This is brutally wrong for most use cases in most programming domains. What use case justifies new Integer(3) != new Integer(3)? Most languages support overriding an equality method as a partial fix, but that's far from elegant.

On the other hand it should be fairly obvious to people who like relational databases.
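The Integer example plays out like this on the JVM (using a tiny Box class of my own to sidestep the long-deprecated Integer constructor; the point is the inherited default of reference identity):

```java
class IdentityDemo {
    static final class Box {
        final int v;
        Box(int v) { this.v = v; }
        // No equals override: the inherited default is reference identity.
    }

    public static void main(String[] args) {
        Box a = new Box(3);
        Box b = new Box(3);
        System.out.println(a == b);      // false: distinct addresses
        System.out.println(a.equals(b)); // false: default equals is ==
    }
}
```

Two structurally identical values that the language insists are different, unless the author remembered to override equals.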


Well put and spot on. The trouble with OO is it's not a data model. Objects aren't rows of a database table. Programmers mutate data fields of objects and enforce the consistency logic manually. As a thought experiment I've considered how you'd write, say, a window system in a logic language like SQL or Prolog. Which pixels are visible and which hidden by other windows? The answer is a join over entities like Window, Pixel, and maybe a ZOrder... something like that, you get the idea. All state transitions bounded by transactions. No way to throw your program into an inconsistent state.


> This is not obvious to OOP people.

Citation needed!!

> In OOP everything (each object) has implied identity: their address in memory.

That's not an implied identity at all. That's a default identity. You're meant to override it. What sane default would you propose? Objects are not structs or records.

> This is brutally wrong for most use cases in most programming domains. What usecase justifies new Integer(3) != new Integer(3)?

What OOP language provides numeric literals that don't have a built-in integer class with an equality method that makes sense!? Complete strawman.

> Most languages support overriding an equality method as a partial fix, but that's far from elegant.

A partial fix? It is how you solve the problem with objects. I am not sure what other method you would propose.


It is an identity you can't get rid of, and which is often misused, or accidentally used, to create a lot of complexity. No, you can't override '==' in Java and other OO languages. You can override .equals() which defaults to ==.

> A partial fix? It is how you solve the problem with objects. I am not sure what other method you would propose.

You mentioned it yourself: structs or records, a.k.a value semantics. Here is a resource from people with more authority (esp. Guy Steele) http://cr.openjdk.java.net/~jrose/values/values-0.html


> It is an identity you can't get rid of, and which is often misused, or accidentally used, to create a lot of complexity. No, you can't override '==' in Java and other OO languages. You can override .equals() which defaults to ==.

Using Java as an example to prove something about OO isn't very useful IMO. It'd be like holding up Rust as an example of how functional languages do things - yes, it's fairly functional, but hardly canonical.

In ruby, for example, you can indeed override ==.

> You mentioned it yourself: structs or records, a.k.a value semantics. Here is a resource from people with more authority (esp. Guy Steele) http://cr.openjdk.java.net/~jrose/values/values-0.html

I think you're conflating structural equality on the one hand, and reference semantics on the other. The two aren't related in real OO. For example, here's a class that defines equality the same way an ML record might:

  class Value
    def ==(other)
      if other.class != self.class then 
        return false
      end
      
      self.instance_variables.each do |v|
        if instance_variable_get(v) != other.instance_variable_get(v) then
          return false
        end
      end
      
      return true    
    end
  end
Then we can use it to define a simple point class:

  class Point < Value
    def initialize(x, y)
      @x, @y = x, y
    end
  end
And there you go - structural equality, but still passed as references.

  Point.new(0, 0) == Point.new(1, 2) => false
  Point.new(0, 0) == Point.new(0, 0) => true


TXR Lisp:

  This is the TXR Lisp interactive listener of TXR 157.
  Use the :quit command or type Ctrl-D on empty line to exit.
  1> (defstruct point nil
       x y
       (:method equal (me) (list me.x me.y)))
  #<struct-type point>
  2> (equal (new point x 0 y 0) (new point x 1 y 1))
  nil
  3> (equal (new point x 0 y 0) (new point x 0 y 0))
  t
  4> (hash-equal (new point x 0 y 0))
  536869888
  5> (hash-equal (new point x 0 y 0))
  536869888
  6> (hash-equal (list 0 0))
  536869888
See what I did there? When designing this object system, I realized that rather than a binary equal method which takes this object and that, what we need is a unary equal method. When an object must be compared or hashed for equality under the equal function, and it has the equal method, then that method is called to retrieve a representative object. That object is then used in place of the original object. The method just has to return something which supports equal directly.

This feature is called equality substitution.

Doc link: http://www.nongnu.org/txr/txr-manpage.html#N-00790C76


That will be terribly inefficient for asserting inequality of large objects.


So: don't write huge objects---it's a code-smell anti-pattern anyway; or don't make every single member of a large object participate in equality; or else, cache the equality substitution and re-compute it if the object is modified.

I'm adding a dirty flag to the object system in the next release. If any slot is modified, the object will be marked dirty; an API will be provided for testing and clearing the dirty flag. This will make it a cinch to write the above point class so that it returns the same list if the object is clean. Also, when the object is dirty and the representative list must be recomputed, the cons cells of the old list can be re-used; we don't have to allocate a new list.

Most of the time, you don't want that anyway; objects should be immutable as much as possible. If you have objects in a hash table, you don't want to fiddle with them in ways that affect their equality.

Speaking of which, this is one of the ways in which the equality substitution is used in real code. Certain slots of the objects are treated as immutable and the equality substitution is based on those slots. The objects are put into hash tables based on that equality. Yet they have other slots treated as mutable. Those don't count toward equality. If they did, the hashing would be in trouble, obviously.


> So: don't write huge objects---it's a code-smell anti-pattern anyway

I tend to agree. But even for structures of only, say, 4 primitives, it's still faster to compare just two primitives in the common case than to hash all eight and compare the two hashes.

In the same way it could be argued that needing hash functions to test equality is a code smell. Equality only makes sense for small, clean data structures of primitives, and equality for those should be inferred by the compiler from the primitives' equalities.

Also, how can hash collisions be avoided?


Please avoid hurting Java's reputation by using outdated code.

1. Since Java 8 there are LocalDateTime/ZonedDateTime/etc. classes. They are immutable. LocalDateTime even works with Hibernate.

2. date.setDate(int date) has been deprecated for nearly 20 years. The method's Javadoc clearly states that you are changing the object. The method even returns void. Don't use it. Why are you still using it?

3. If you still want to use the less comfortable Date and Calendar, they work fine if you use them as you are supposed to.


He does actually refer to this, but he could be more explicit on point (2) rather than referring to it obliquely.


> In many situations using references rather than values makes sense. If I'm loading and manipulating a bunch of sales orders, it makes sense to load each order into a single place. If I then need to see if the Alice's latest order is in the next delivery, I can take the memory reference, or identity, of Alice's order and see if that reference is in the list of orders in the delivery.

Presumably, an order would consist of much more than a list of items (order #, delivery address, etc.), including a unique one (order #), so comparing by value here would still work?

If you're properly building your hierarchies, I'm not sure I see cases where comparing by reference would logically be desirable (of course, in the real world you probably want to still compare by reference in cases where performance might matter, or you're doing lower level work).

I guess this isn't disagreeing with the author's main point - that Value Objects are desirable - but pushing it further: Value Objects are desirable the vast majority of the time.

The second part, regarding immutability, I'm fully onboard with.


I like Rust.

`==` is implemented by types as they desire and as makes sense, via the PartialEq trait (and the type can mark whether it’s a partial equality or a total equality by whether Eq is also implemented).

Because of Rust’s strong ownership model, the whole referential equality matter actually becomes comparatively irrelevant, though if you actually need it you can cast to raw pointers and compare those (`x as *const _ == y as *const _`).

This strong ownership model also means that the whole aliasing problem becomes irrelevant also: it’s obvious from the code where you alias things, and you can only do so in a memory-safe way.

Really, this whole article becomes delightfully irrelevant for Rust.


I have always used "value" vs "identity" to distinguish these classes. e.g. "Two men called John" (value, equal, no identity) vs "That man called John, and that man called John" (with identity, so "non-identical", but may have the same value in some aspect). Every comparison must first be distinguished by whether you are comparing identity or some value of the object. With that distinction made, the arguments about referential integrity (and when it matters) are easier to digest. Pretty obvious I guess, but coming from C++, it is amazing how many people fumble through without having this distinction clear.


I've realized that there's seldom an obvious "identity", because identity is never absolute, but needs context ("semantics"). Mathematically identity is just a function (of some key data to more data) but in practice part of the key is implied (in the program, runtime, database connection, external policies, whatever).

For example, conventional URLs are often frowned upon because they don't "exactly" identify a document. Instead URLs involving hashsums, GUIDs etc. are proposed. I think that's a bad idea because while the hash sum is a quite precise identity, it's actually often too precise. I want to be able to edit a web document without having to create a new identity. When someone surfs the URL and gets the latest "version" that's fine. We could consider all versions "identical" for our purposes. Or alternatively we could say that some of the identity was implied in the context, not the URL (the "get the newest" part).


Hmm .. I think you might have that the wrong way around. URLs do exactly identify a document in the sense that the document that is returned is the document in question. That you might get a different document each time is a different matter and relates, again, to value, not identity. (Then there's idempotency, which is a useful concept that only appears to cause eyeballs to roll. Even the top hit in Google is snarky.) And identity and value are both arbitrary -- John today is John tomorrow, but even that will also cause philosophers to twitch. I guess it's not that surprising that programming languages and the like encode the same confusion!


The confusion only goes to show that there is no one obvious definition. That's the beauty of the relational model. John is John, but what it can identify is context-dependent (a string, a human in a group of humans, a human in a group of humans in time?).


URLs are like mutable cells in a namespace. Content-based address systems usually have separate solutions for named references, like branches in Git or the various naming schemes for IPFS etc.


Other good examples of value objects are:

- EmailAddress

- PhoneNumber

- FullName

- Address

Perhaps it's just my domain, but I find myself very often defining equality to various degrees for these concepts.

In fact I create value classes for any value that can have multiple valid encodings (e.g. keys and hashes in crypto, dollar value, etc). It makes it very easy to have all encoding/decoding code in one obvious place (in or near the object code). Encoding and decoding can then happen at the edges, and the core of the logic becomes much more readable ... but this gets into the Domain Object territory.
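For instance, here's a minimal sketch of what an EmailAddress value class might look like in JS. The normalization rules and the regex are illustrative simplifications, not a spec-compliant validator:

```javascript
// Sketch of an EmailAddress value object: parse/normalize once at the
// edge, compare by normalized value. Lowercasing the whole address and
// the validity regex are deliberate simplifications for illustration.
class EmailAddress {
  constructor(raw) {
    const s = String(raw).trim().toLowerCase();
    if (!/^[^@\s]+@[^@\s]+\.[^@\s]+$/.test(s)) {
      throw new Error(`invalid email: ${raw}`);
    }
    this.value = s;
    Object.freeze(this); // immutable, as a value object should be
  }
  equals(other) {
    return other instanceof EmailAddress && this.value === other.value;
  }
  toString() { return this.value; }
}

console.log(new EmailAddress(" Bob@Example.COM ").equals(new EmailAddress("bob@example.com"))); // true
```

All the encoding/decoding and comparison logic lives in one obvious place, and the rest of the code never has to second-guess whether a string has been normalized.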


Let me attempt to improve upon Fowler's ValueObject pattern in JS, and tell you the upsides:

    function Point(x, y) {
        const hash = Point.__hash(x, y);
        const cached = Point.__cache.get(hash);
        if (cached) return cached;
        const newPoint = {get x() { return x; }, get y() { return y; } };
        Point.__cache.set(hash, newPoint);
        return newPoint;
    }
    Point.__hash = function(ptx, pty) {
        return `${ptx}|${pty}`;
    }
    Point.__cache = new WeakMap();
This does what you would expect a ValueObject (or an algebraic type or struct, depending on the languages you normally use) to do in other, non-JS languages - it interns all equivalent objects so that simple reference equality is sufficient to determine equality - rendering the nonstandard "equals" method unneeded. This also solves all the issues with ".includes" and so forth, again, because the built-in reference equality is sufficient to determine equality. This also has much better memory characteristics in a system where many equivalent objects are created, as only one copy is ever stored in memory. The only drawback is the small overhead of the WeakMap used to cache all the references and the overhead of "hashing" at object creation time - neither of which should be noticeable in most applications, and the memory benefits should outweigh these concerns in most performance-sensitive applications regardless.


Unfortunately your example won't work. You can't use a primitive value as a key in a weak map, which negates many of their possible use cases.


You're right, I hadn't realized that the spec forbade primitives as WeakMap keys - so while the approach should actually work in other languages, in JS it would leak memory as you'd be forced to use a normal map (or clean up after yourself - ick).
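For what it's worth, here's roughly what the corrected version looks like with a plain Map (a sketch; as noted, the Map holds strong references, so without explicit eviction this leaks in a long-running program):

```javascript
// Interning Points via a regular Map keyed by a primitive hash string.
// Caveat: the Map holds strong references, so entries are never
// collected unless you evict them yourself (no weak map over primitives).
function Point(x, y) {
  const key = `${x}|${y}`;
  const cached = Point._cache.get(key);
  if (cached) return cached;
  const p = Object.freeze({ x, y }); // immutable, safe to share
  Point._cache.set(key, p);
  return p;
}
Point._cache = new Map();

console.log(Point(2, 3) === Point(2, 3)); // true: reference equality suffices
```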


Yeah, weak maps have weak references to the keys, not the values.


No. Non-reference plain old structs are typically more efficient for most operations, up to a certain size (how large, 16, 32, 64 bytes?).

Objects have a certain overhead, for example each reference uses 8 bytes, and they typically come with a vtable.

Also cache efficiency is very important these days. Sharing is detrimental unless memory consumption is an issue.

And let's not think about GC overhead. A point class holding two integers is a really bad idea when you have millions of points.

Conversely, as objects get larger, it's less likely for two objects to be equal. Interning is just a bad idea for most cases.


> const p1 = {x: 2, y: 3};

> const p2 = {x: 2, y: 3};

> assert.notEqual(p1,p2); // NOT what I want

> Sadly that test passes. It does so because JavaScript tests equality for js objects by looking at their references, ignoring the values they contain.

What? Has 'assert' been standardized?

Shouldn't it read: "It does so because my assert function tests equality for js objects by looking at their references, ignoring the values they contain."


I think his point would be the same (and maybe clearer) if he said assert.isFalse(p1 == p2).

I.e. he's not concerned about the testing semantics, but JavaScript's equality semantics which we're expected to assume the test function is using.
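Right - and for anyone following along, the behavior plus a hand-rolled value comparison look like this (a shallow sketch; it doesn't handle nested objects):

```javascript
const p1 = { x: 2, y: 3 };
const p2 = { x: 2, y: 3 };

console.log(p1 == p2);  // false: distinct references
console.log(p1 === p2); // false: same reason

// A shallow structural comparison: same own keys, same values.
function shallowEqual(a, b) {
  const ka = Object.keys(a), kb = Object.keys(b);
  return ka.length === kb.length && ka.every(k => a[k] === b[k]);
}
console.log(shallowEqual(p1, p2)); // true
```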


this sounds suspiciously like "labeled products are good", which seems fairly intuitive.

easy-ish to do in statically typed languages, but something like attrs (https://attrs.readthedocs.io/en/stable/ ) or just namedtuples will do this in python.


One of the reasons I love Golang is that structs are compared by their values, not their references.

To make this work, the compiler complains if you have circular struct type dependencies, but I think that's OK.


Reminds me of Python articles from 2005


Does anybody get anything from his writing? I have seen several posts of his here and they all are either trivial and/or give a fancy name to some standard coding construct. I think what he is describing here is the advantages of immutability which anyone who has heard only a little about FP knows already.


Sure, I find his articles are often "duh" to an experienced programmer. But on the other hand, it's a great resource to be able to point junior devs to. Instead of say explaining why value objects should usually be immutable, you can give a brief summary and then say "read this".

I find it really similar to stuff in Effective Java. Sure there's nothing too revolutionary in that book, but in the aggregate it makes up my best coding practices, and when somebody questions one small piece of those coding practices, I can send them to a brief, to-the-point item in Effective Java.


This isn't about immutability, this is about the power of introducing value objects to a system. (Immutability is just a property of good value objects.)

Most code I read does not use value objects. It's more common for me to see primitive obsession [1], data clumps [2], arbitrary structs, or the "struct + service" antipattern (looking at you, Angular 1). Granted, I'm not working with FP languages.

When you migrate code to use value objects, then follow up by moving associated behavior into the value object, there are significant knock-on benefits to your design.

Value objects may be simple, but they're not obvious. If they were, more code would use them.

[1] http://www.jamesshore.com/Blog/PrimitiveObsession.html

[2] http://www.martinfowler.com/bliki/DataClump.html


Value objects may be simple, but they're not obvious. If they were, more code would use them.

To the extent that the concept is relevant, the difference is obvious unless you only ever use languages that are ambiguous about reference/value semantics.

If you use a language like C or C++ or even assembly, where pointers and taking an address are explicit and fundamental concepts, there's mostly no problem. Your data is a value by default, and if you want a reference (pointer) instead, you just make one, explicitly. The exception is arrays, and in not entirely unrelated news, array semantics in C and C++ are a significant source of bugs.

If you use a functional programming language, again data is typically immutable unless you explicitly use some sort of reference type, so again no problem.

I have long maintained that one of the worst of Java's many design flaws was using the exact same syntax to represent two distinct but related concepts, depending on whether you were referring to a primitive or a class-based type. At least with the static typing you can see which you've got, though.

I can't fathom why you'd deliberately design a language to work like JS or Python, where not only can the exact same syntax mean two completely different things but which you get depends on dynamic typing information that you can't even see until runtime, or even on the exact values you're using rather than their types, in a case like Python's handling of small integers.

I shudder to think how much productivity has been lost by now to bugs caused by mixing up .equals() with == in Java, == with === in JS, and == with `is` in Python.
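For the JS case at least, the traps are easy to reproduce - these are all standard coercion behavior:

```javascript
// Classic == coercion traps that === avoids.
console.log(0 == "");            // true  (string coerced to number)
console.log(0 === "");           // false
console.log(null == undefined);  // true
console.log(null === undefined); // false
// And neither operator helps with objects, which always compare by reference:
console.log({ x: 1 } == { x: 1 }); // false
```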


> Immutability is just a property of good value objects.

It sounds rather like value objects are strictly superior to immutable data structures, based on this statement. I'd love to hear an expansion of this rationale.


It's actually the other way around. If your object is immutable, whether it has identity or not (i.e. whether it's a "value object" or not) doesn't matter! There's nothing useful you could derive from that identity, so it might as well not exist.

So, languages should stop thinking in terms of "this is a reference type" and "this is a value type", and start thinking about "this references something mutable" vs "this references something immutable". For the latter, the implementation can then use by-value semantics for perf reasons, where appropriate.


Not sure what you mean. Identity comparisons are more useful for mutable objects, but even an immutable object can use one as an alternative to a unique ID. For example, you might have a queue of input events, where once an event is posted nothing about it needs to change, but multiple events with the same properties are possible (e.g. pressing the same key twice in a row).

On the other hand, mutable objects with value semantics can provide the ergonomic advantage that mutation has in some cases (e.g. 'point.y += 10' rather than 'point = Point(point.x, point.y + 10)'), as well as more predictable performance, while avoiding bugs caused by accidental aliasing.
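A quick JS illustration of the aliasing bug in question, next to the immutable-update style that avoids it (a sketch using plain objects for brevity):

```javascript
// Accidental aliasing: mutating through one reference surprises the other.
const origin = { x: 0, y: 0 };
const cursor = origin;   // intended as a copy, actually an alias
cursor.y += 10;
console.log(origin.y);   // 10: origin changed too

// The immutable-update style avoids this by construction:
const origin2 = Object.freeze({ x: 0, y: 0 });
const cursor2 = { ...origin2, y: origin2.y + 10 }; // fresh object
console.log(origin2.y, cursor2.y); // 0 10
```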


> Identity comparisons are more useful for mutable objects, but even an immutable object can use one as an alternative to a unique ID.

I think that's part of a problem. Object identity should not be used for unique IDs - when you need a unique ID, use a UniqueIdGenerator or something like that.

Java and C# and others have got this so very wrong, when they did things like, "okay, all objects have identity, let's just reuse it for other stuff", and made it possible to e.g. synchronize on arbitrary objects - synchronize(x) in Java, lock(x) in C#. Now they can't get rid of object identity, because it's part of the language semantics - even if you never synchronize on objects of your class, something else might, and making them identity-less will break that.

At the very least, let's make it explicit. When you define a class, it should be stated upfront whether it has identity or not. If it doesn't have identity, it should be immutable. If it does have identity, it still can be immutable if you want (if you need that identity for something, like your event example). But this state of affairs - immutable with identity - definitely shouldn't be normal. If it's needed, it should be requested explicitly.

I guess the real point here is that object identity has a heavier cost than it would seem from the first glance, and so it should be opt-in rather than opt-out (and it should definitely be possible to opt out).

As for `point.y += 10`, it doesn't preclude point from being immutable. It just means that the language has to desugar it into `point = Point(point.x, point.y + 10)` for you. That way, there's no accidental aliasing (since objects are still immutable - it's the reference that is mutating - so no alias can observe the change). And then the code generator, knowing that there's no aliasing, can replace it with an actual in-place field update, when and where it makes sense.


> point.y += 10

How is that having value semantics? Or are you just referring to equality testing? Even then you're playing a dangerous game if you have multiple threads (one mutating, one running an equality check) and aren't using locks.


An immutable value object provides a richer contract. An immutable data structure like a tuple is going to provide a guarantee the data hasn't changed, but it doesn't provide a guarantee of what the data is.


His book on refactoring was one of the most revelatory programming books I've ever read, and was a major influence on how I program. I'm sure if I re-read it today I'd raise a few eyebrows at sections, but it's 17 years old now. I've no idea how good it is now, given how much we've moved on, but it was very good back in the day.

He's had some real hits.

But I must admit, I find this article a bit silly. He's had some stinkers too. Worse still, this is an obvious problem: the solution, overloading equivalence, has been in other languages for decades; it's just not in JavaScript. And it's just not a common programming problem outside of certain domains.

And, he knows it.

So I really do not understand the point of that post. There was a time when every single one of the "learn .Net 1.0 in 28 days" books would have at least a paragraph about overloading Equals and GetHashCode. This is from 2005, for example, and even uses the same example:

https://msdn.microsoft.com/ru-ru/library/ms173147(v=vs.80).a...

The blog post seems a decade too late.


IMO, the issue with this article is that it conflates a desire for immutable record types (as you'd see in a functional language) with something which requires equivalence overloading in a language which does not support them (as you'd find in a heavy OO language). I think while he presented _A_ solution to the specific issue he encountered (wanting to compare algebraic types as cohesive units in Java), it is not the _best_ general solution to the issue - in fact, it's really not the best solution in any language without built-in record types, as it can lead to costly hidden comparison times and unneeded memory growth (most design patterns, I would say, should discourage this kind of hidden performance issue). This is a data structure problem which should be attacked much like how a compiler implementer would attack it - by being lazy and doing as little work as possible. Instead of making similar objects appear equivalent - make identical objects be references to the same object! Since we already decided to let records be immutable, it actually turns out quite nicely in JS, as I have written elsewhere - and I'm confident that the same cached-reference/object-interning strategy could be used in Java, too.


Key points in here:

- Thinking differently about Values vs. References (which you may or may not be used to depending on your language)

- Mutable "values" are dangerous (and it might be worth thinking about what constitutes a "value")

- Translating incoming low-level numbers and strings into proper "values" helps avoid errors.


He has been doing a bit of summarising both sides of an argument too, more as a sort of not-too-biased software development journalist.

That can be useful, maybe not when you are in the thick of it, but for newcomers or people who have missed the great ideological battles of the last 6 months.


I felt like this quite strongly about his "Collection Pipeline" article (http://martinfowler.com/articles/collection-pipeline/) - he's definitely guilty of fancy name syndrome in that one (I think "Value Object" is a fairly well-known term, though - certainly one I was using a decade ago).

On the other hand, both this article and the "Collection Pipeline" one contain a lot of useful information, concisely and informatively expressed, which would be beneficial to a junior developer as a general introduction to the subject.


As someone told me in college as we watched some Saturday morning cartoons: They don't need to make new cartoons because they keep making new kids.


You would be surprised how many people get ValueObject wrong. Most of the time the wrongness is in how the value objects are implemented. The concept of keeping instances consistent with equals (i.e. immutability) comes later in most people's careers. It doesn't help that hugely popular languages like Java seem to push people away from immutability.


Agreed. This post is the kind of centithread we were having on Java mailing-lists fifteen years ago, talking about EJB's, DTO, the dangers of mutability, and the like. It's all established knowledge. Every programmer worth their salt understands the difference between value and reference equality and is proficient in how their language of choice supports these concepts.

Also, for perceived thought leaders, all these people have completely failed to update their knowledge with modern technologies and languages. All you see from the Fowlers and Uncle Bobs of the world is Java and JavaScript. How about some Rust, Elm, or Kotlin, which encompass the more modern trends of programming and which address the kinds of problems described in this article in one-liners?

Also, I still chuckle at the fact that Fowler's web site (which he developed himself and called a "bliki", blog + wiki) is still incapable of supporting article titles with spaces in them, hence "ValueObject". That's not exactly a resounding endorsement of software engineering abilities.


Also, I still chuckle at the fact that Fowler's web site (which he developed himself and called a "bliki", blog + wiki) is still incapable of supporting article titles with spaces in them, hence "ValueObject". That's not exactly a resounding endorsement of software engineering abilities.

http://www.martinfowler.com/bliki/BimodalIT.html


A concise explanation of a pattern that developers of nearly every level can understand and link to? Nope, no value there!

/s


It just seems that this has been discussed to death already. The only new thing is that he calls it "ValueObject". I already dread people talking about "ValueObjects" and I will have no idea what this is. I thought "immutable objects" is the standard term.


>The only new thing is that he calls it "ValueObject".

That's not a new name. It's been around forever.


It makes more sense in the context of domain-driven design (DDD), but I don't believe he references this at all in the article. Based on his other articles, though, I think that is what he is getting at.



