Hacker News | new | past | comments | ask | show | jobs | submit | lexi-lambda's comments

I can say that I am not particularly concerned with compile-time range checking. I agree with you that it is a massive headache that is almost always a huge waste of time. Even in dependently-typed languages, tracking ranges and bounds ends up requiring an incredible amount of bookkeeping that definitely does not seem worth the effort in the vast majority of application code.

When I wrote this blog post, I used a very simple datatype because it was an extraordinarily simple example, but given many of the comments here, it seems it may have been too simple (and thus too contrived). It is only an illustration; don’t read into it too much.


> In Java, you'd implement this by making a class with a private constructor, no mutator methods, and a static factory method that throws an exception if the parsing fails.

This is similar, and is indeed quite useful in many cases, but it’s not quite the same. I explained why in this comment: https://qht.co/item?id=35059886 (The comment is talking about TypeScript, but really everything there also applies to Java.)
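
For concreteness, here is a minimal TypeScript sketch of the pattern described above (the class and method names are my own, purely illustrative; TypeScript behaves analogously to the Java version here):

```typescript
// A "smart constructor" class: the only way to obtain a NonEmptyList is via
// the static factory, which rejects empty input at runtime.
class NonEmptyList<T> {
  private constructor(private readonly items: readonly T[]) {}

  static parse<U>(items: readonly U[]): NonEmptyList<U> {
    if (items.length === 0) {
      throw new Error("list must be non-empty");
    }
    return new NonEmptyList([...items]);
  }

  get head(): T {
    return this.items[0]; // safe: the factory guaranteed non-emptiness
  }

  toArray(): T[] {
    return [...this.items];
  }
}

const xs = NonEmptyList.parse([1, 2, 3]);
console.log(xs.head); // 1
```

The guarantee rests entirely on the constructor being private and no method ever emptying `items`, which is precisely the "trust in the library code" distinction the linked comment gets at.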


Thanks for the reply! I wasn't at all expecting one from you.

If I'm understanding the difference correctly, it's that the constructive data modeling approach can be proven entirely in the type system without any trust in the library code, while the Java approach I recommended depends on there being no other way to construct an instance of the class, which can be tricky to guarantee. Is that accurate?


Yes, that’s about right. But really do read the followup blog post (https://lexi-lambda.github.io/blog/2020/11/01/names-are-not-...), as it explains that in much more depth! In particular, it says:

> To some readers, these pitfalls may seem obvious, but safety holes of this sort are remarkably common in practice. This is especially true for datatypes with more sophisticated invariants, as it may not be easy to determine whether the invariants are actually upheld by the module’s implementation. Proper use of this technique demands caution and care:

> * All invariants must be made clear to maintainers of the trusted module. For simple types, such as NonEmpty, the invariant is self-evident, but for more sophisticated types, comments are not optional.

> * Every change to the trusted module must be carefully audited to ensure it does not somehow weaken the desired invariants.

> * Discipline is needed to resist the temptation to add unsafe trapdoors that allow compromising the invariants if used incorrectly.

> * Periodic refactoring may be needed to ensure the trusted surface area remains small. It is all too easy for the responsibility of the trusted module to accumulate over time, dramatically increasing the likelihood of some subtle interaction causing an invariant violation.

> In contrast, datatypes that are correct by construction suffer none of these problems. The invariant cannot be violated without changing the datatype definition itself, which has rippling effects throughout the rest of the program to make the consequences immediately clear. Discipline on the part of the programmer is unnecessary, as the typechecker enforces the invariants automatically. There is no “trusted code” for such datatypes, since all parts of the program are equally beholden to the datatype-mandated constraints.

They are both quite useful techniques, but it’s important to understand what you’re getting (and, perhaps more importantly, what you’re not).


There is really no difference between doing this and returning a `Maybe`, which is the standard Haskell pattern, except that the `Maybe` result also allows the result to be structurally different rather than simply a refinement of the input type. In a sense, the TypeScript approach is a convenience feature that allows you to write a validation function that returns `Bool`, which normally erases the gained information, yet still preserve the information in the type system.

This is quite nice in situations where the type system already supports the refinement in question (which is true for this NonEmpty example), but it stops working as soon as you need to do something more complicated. I think sometimes programmers using languages where the TS-style approach is idiomatic can get a little hung up on that, since in those cases, they are more likely to blame the type system for being “insufficiently powerful” when in fact it’s just that the convenience feature isn’t sufficient in that particular case. I presented an example of one such situation in this followup blog post: https://lexi-lambda.github.io/blog/2020/08/13/types-as-axiom...
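
As a concrete TypeScript illustration of that convenience feature, the "validation function that still preserves the information" is a user-defined type guard (the function names here are mine, not from the post):

```typescript
type NonEmpty<T> = [T, ...T[]];

// A type predicate: at runtime this returns a plain boolean, but the
// `xs is NonEmpty<T>` annotation tells the typechecker to narrow the
// argument's type in the branch where the predicate holds.
function isNonEmpty<T>(xs: T[]): xs is NonEmpty<T> {
  return xs.length > 0;
}

// Accepts only the refined type, so calling it is statically safe.
function first<T>(xs: NonEmpty<T>): T {
  return xs[0];
}

const input: number[] = [1, 2, 3];
if (isNonEmpty(input)) {
  console.log(first(input)); // typechecks: input was narrowed to NonEmpty<number>
}
```

This works because the type system can already express "a list with at least one element" structurally; the guard merely recovers that fact from a boolean check.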


Hi lexi.

Just wanted to say that fp-ts (now effect-ts, a ZIO port to TypeScript) author Giulio Canti is a great fan of your "parse don't validate" article. He's linked it many times in the TypeScript and functional programming channels (such as the fp slack).

Needless to say, both the fp-ts-derived io-ts library and the effect-ts schema library[1] are obviously quite advanced parsers (and in the case of schema, there's decoding, encoding, APIs, guard, arbitrary, and many other nice things I haven't seen in any functional language).

[1]https://github.com/Effect-TS/schema


> As a result I have not found this article a good one to share with junior developers to help them understand how to design types to capture the notion of validity, and to replace validation with narrowing type conversions (which amount to ‘parsing’ when the original type is something very loose like a string, a JSON blob, or a dictionary).

This is sort of true. It is a good technique, but it is a different technique. I went into how it is different in quite some detail in this followup blog post: https://lexi-lambda.github.io/blog/2020/11/01/names-are-not-...

I think a common belief among programmers is that the true constructive modeling approach presented in the first blog post is not practical in languages that aren’t Haskell, so they do the “smart constructor” approach discussed in the link above instead. However, I think that isn’t actually true, it’s just a difference in how respective communities think about their type systems. In fact, you can definitely do constructive data modeling in other type systems, and I gave some examples using TypeScript in this blog post: https://lexi-lambda.github.io/blog/2020/08/13/types-as-axiom...


Thanks for responding - just to reiterate, I am a big fan of this original post, and indeed your other writing - my only critique here is that I'm looking for ways to make the insights in them more transparent to, particularly, people who aren't well-positioned to analogize how to apply Haskell concepts to other languages.

I see you read 'narrowing type conversions' rather literally in my statement - that might be me making my own analogy that doesn't go over very well. I literally mean that using 'constructively modeled types' is a way to create true type-narrowing conversions, in the sense that a 'nonempty list' is a narrower type than 'list', or 'one to five' is a narrower type than 'int'.


You know what, you’re right—I misread your original comment. I was just going through this thread and replying to a number of comments making that particular misconception, since it is particularly common, but upon taking a closer look, you were saying something else. I apologize!

As for the difficulty in applying these ideas in other languages, I am sympathetic. The problem I always run into is that there is necessarily a tension between (a) presentations that are accessible to working programmers, (b) explanations that distill the essential ideas so they aren’t coupled to particular languages or language features, and (c) examples small enough to be clarifying and to fit in a blog post. Haskell is certainly not the best choice along that first axis, but it is quite exceptionally good along the second two.

For a somewhat concrete example of what I mean, see this comment I wrote a few years ago that translates the NonEmpty example into Java: https://qht.co/item?id=21478322 I think the added verbosity and added machinery really does detract significantly from understanding. Meanwhile, a TypeScript translation would make a definition like this one quite tempting:

    type NonEmpty<T> = [T, ...T[]]
However, I find this actually obscures the application of the technique because it doesn’t scale to more complex examples (for the reasons I discussed at quite some length in https://lexi-lambda.github.io/blog/2020/08/13/types-as-axiom...).

There are probably ways to thread this needle, but I don’t think any one “solution” is by any means obviously the best. I think the ways that other people have adapted the ideas to their respective ecosystems is probably a decent compromise.


> If you are trying to mix NonEmpty and AllEven and AllGreaterThan100 for the List example, then you would get the combinatorial explosion of types.

This is overthinking it. Usually, when people are not used to doing constructive data modeling, they get caught up on this idea that they need to have a datatype that represents their data in some canonical representation. If you need a type that represents an even number, then clearly you must define a type that is an ordinary integer, but rules out all odd numbers, right?

Except you don’t have to do that! If you need a number to always be even (for some reason), that suggests you are storing the wrong thing. Instead, store half that number (e.g. store a radius instead of a diameter). Now all integers are legal values, and you don’t need a separate type. Similarly, if you want to store an even number n greater than 100, then use a natural number type (i.e. a type that only allows non-negative integers; Haskell calls this type `Natural`) and store (n - 102) / 2. This means that, for example, 0 represents 102, 1 represents 104, 2 represents 106, 3 represents 108, etc.

If you think this way, then there is no need to introduce a million new types for every little concept. You’re just distilling out the information you actually need. Of course, if this turns out to be a really important concept in your domain, then you can always add a wrapper type to make the distinction more formal:

    newtype EvenGreaterThan100 = EvenGreaterThan100 Natural

    evenGreaterThan100ToInteger :: EvenGreaterThan100 -> Integer
    evenGreaterThan100ToInteger (EvenGreaterThan100 n) = (toInteger n * 2) + 102

    integerToEvenGreaterThan100 :: Integer -> Maybe EvenGreaterThan100
    integerToEvenGreaterThan100 n
      | n <= 100 = Nothing
      | otherwise = case (n - 102) `quotRem` 2 of
          (q, 0) -> Just (EvenGreaterThan100 (fromInteger q))
          (_, _) -> Nothing
Of course, this type seems completely ridiculous like this, and it is. But that’s because no real program needs “an even number greater than one hundred”. That’s just a random bag of arbitrary constraints! A real type would correspond to a domain concept, which would have a more useful name and a more useful API, anyway.

I wrote a followup blog post here that goes into more detail about this style of data modeling, with a few more examples: https://lexi-lambda.github.io/blog/2020/08/13/types-as-axiom...


What if you want it to only be less than 100, though? Not everything is so easily expressible the way you're saying.


You can go as far (or as short) as the application warrants. The more static evidence you want, the more cumbersome it's going to become. That seems like a natural progression, but the point is that you get to choose the trade-off. It's often easier to prove a special case than a more general case.

(Certainly, Haskell is probably not the most concise language for this kind of thing. LiquidHaskell adds interesting proof capabilities wrt. arithmetic.)

Regardless, even just parsing at the boundary and using an opaque type

    MySuperSpecialInt = Int @Even Min(2) Max(100)
(or whatever syntax you want) is still better than just using Int. At least you'll know that you'll always be handed a value that's in range (post-parsing).


Yes, certainly. Constructive data modeling is useful for many things, but it’s not a panacea. Other techniques are useful in cases where it’s not practical; I discuss some of the tradeoffs in this followup post: https://lexi-lambda.github.io/blog/2020/11/01/names-are-not-...


Certainly I don’t think `parseNonEmpty` would be especially useful in a real program; it’s only there to provide a particularly simple contrast against `validateNonEmpty`. The example earlier in the blog post using the `nonEmpty` function (which returns an optional result) is a more realistic example of how such things are actually used in practice, since that allows you to raise a domain-appropriate error message.

Tangentially, in Haskell specifically, I have actually written a library specifically designed for checking the structure of input data and raising useful error messages, which is somewhat ironically named `monad-validate` (https://hackage.haskell.org/package/monad-validate). But it has that name because similar types have historically been named `Validation` within the Haskell community; using the library properly involves doing “parsing” in the way this blog post advocates.


You are sort of mistaken. I wrote a followup blog post that discusses what you are describing at some length: https://lexi-lambda.github.io/blog/2020/11/01/names-are-not-...

However, TypeScript does not really provide any facility for nominal types, which in my opinion is something of a failure of the language, especially considering that it is at odds with the semantics of `class` and `instanceof` in dynamically-typed JavaScript (which have generative semantics). Other statically typed languages generally provide some form of nominal typing, even gradually typed ones. Flow even provided nominal types in JavaScript! But TypeScript is generally also quite unsound (https://twitter.com/lexi_lambda/status/1621973087192236038), so the type system doesn’t really provide any guarantees, anyway.

That said, TypeScript programmers have developed a way to emulate nominal typing using “brands”, which does allow you to obtain some of these benefits within the limitations of TS’s type system. You can search for “TypeScript branded types” to find some explanations.
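
For readers who don't want to search, one common shape of the trick looks like this (a sketch; the brand name and validation rule are made up for illustration):

```typescript
// A "brand": a phantom property that exists only in the type system. No
// runtime value actually carries a __brand field, so a plain string does
// not satisfy UserId, and the only way to mint one is the cast inside the
// trusted parsing function below.
type UserId = string & { readonly __brand: "UserId" };

// Hypothetical validation rule, purely for illustration.
function parseUserId(raw: string): UserId | null {
  return /^[a-z0-9-]+$/.test(raw) ? (raw as UserId) : null;
}

const id = parseUserId("abc-123");
console.log(id); // "abc-123"

// const bad: UserId = "anything"; // error: plain strings lack the brand
```

As with the smart-constructor pattern, the guarantee is only as strong as the discipline around the cast: any code that writes `as UserId` elsewhere silently bypasses the check.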


This is fantastic, thank you very much!

When I wrote the GP I had in mind branding as the "right" way to get those benefits - though I was unaware of the name - however I see that it is still constrained by the limitations of the TS compiler.

So then, going back to the initial snippets, my issue is that the Prime type is essentially behaving like a newtype, thus the inner calls cannot actually rely on the value being prime, yes?

I have to admit that quite a few of the things in the blog are beyond my current understanding. Do you have any recommended reading for post grads with rudimentary understanding of Haskell who would like to get deeper into type systems?


Haskell’s `newtype` keyword defines a genuinely fresh (nominal) type that is distinct from all other types. There is no direct analogue in TypeScript, but using branded types would be the closest you could get. That’s still not quite the same because TypeScript doesn’t really allow the same strong encapsulation guarantees that Haskell provides (which, to be clear, many other languages provide as well!), but it’s a decent approximation.

The problem with your `Prime` type is that it is just a type alias: a new way to refer to the exact same type. It’s totally interchangeable with `number`, so any `number` is necessarily also a `Prime`… which is obviously not very helpful. (As it happens, the Haskell equivalent of that would be basically identical, since Haskell also uses the `type` keyword to declare a type alias.)

As for recommended reading, it depends on what you’d like to know, really. There are lots of different perspectives on type systems, and there’s certainly a lot of stuff you can learn if you want to! But I think most working programmers probably don’t benefit terribly much from the theory (though it can certainly be interesting if you’re into that sort of thing). Perhaps you could tell me which things you specifically find difficult to understand? That would make it easier for me to provide suggestions, and it would also be useful to me, as I do try to make my blog posts as accessible as possible!


I'd be interested in more about Generic, as that section went over my head - though that's not an issue with your blog but rather a gap in my own knowledge.

I find it quite interesting though I never had the time to study it further until now, so any recommendations are appreciated!


Generic is quite specific to Haskell, so it is probably difficult to explain without a little more understanding of Haskell-like type systems. (Rust has some similar capabilities, so that would help, too.) I wouldn’t worry about it too much, though; it doesn’t contain any particularly deep knowledge about type systems in general.


Okay, is there like a book or some other resource besides your awesome blog that you'd recommend for people looking to get into this some more?


Well, like I said, the subject is extremely broad, so it is difficult to give concrete suggestions without knowing what specifically you’d like to get into. But I can give some potential options.

If you’d like to learn Haskell, I think https://www.cis.upenn.edu/~cis1940/spring13/ is still a pretty nice resource. It is quick and to the point, and it provides some exercises to work through. There are lots of things in the Haskell ecosystem that you could explore if you wanted to after getting a handle on the basics.

If you want to learn about programming languages and type systems, you could read Programming Languages: Application and Interpretation (https://cs.brown.edu/courses/cs173/2012/book/), which has a chapter on type systems. Alternatively, if you want a more thorough treatment of type systems, you could read Types and Programming Languages by Benjamin Pierce. However, both PLAI and TAPL are textbooks, and they are primarily intended to be used as supporting material in a university course with an instructor. I think PLAI is relatively accessible, but TAPL is more likely to be a challenge without some existing background in programming languages.


The textbooks are exactly what I needed! Thank you!


There is a fairly obvious difference between a dynamic enforcement mechanism like contracts or SQL constraints. Though I think it is a bit silly to suggest that I have “never even considered” such things given the blog post itself is rendered using Racket, a dynamically-typed language with a fairly sophisticated higher-order contract system.

SQL constraints are certainly useful. But they don’t really solve the same problem. SQL constraints ensure integrity of your data store, which is swell, but they don’t provide the same guarantees about your program that static types do, nor do they say much at all about how to structure your code that interacts with the database. I also think it is sort of laughable to claim that XSL is a good tool for solving just about any data processing problem in 2023, but even if you disagree, the same points apply.

Obviously, constructive data modeling is hardly a panacea. There are lots of problems it does not solve or that are more usefully solved in other ways. But I really have applied it to very good effect on many, many real engineering problems, not just toys, and I think the technique provides a nice framework for reasoning about data modeling in many scenarios. Your comments here seem almost bafflingly uncharitable given the article in question doesn’t make any absolutist claims and in fact discusses at some length that the technique isn’t always applicable.

See also: my other comment about using encapsulation instead of constructive modeling (https://qht.co/item?id=35059113) and my followup blog post about how more things can be encoded using constructive data modeling than perhaps you think (https://lexi-lambda.github.io/blog/2020/08/13/types-as-axiom...).


> There is a fairly obvious difference between a dynamic enforcement mechanism like contracts or SQL constraints.

What on Earth are you talking about? What dynamic enforcement?

> Though I think it is a bit silly to suggest that I have “never even considered”

In the context of this conversation you showed no signs of such concerns. Had you had such concerns previously, you wouldn't have arrived at the conclusions you apparently have.

> a dynamically-typed language

There's no such thing as dynamically-typed languages, just like there aren't blue or savory programming languages. Dynamically-typed is just a word combo that a lot of wannabe computer scientists are using, but there's no real meaning behind it. When "dynamic" is used in the context of types, it refers to the concrete type obtained during program execution, whereas "static" refers to the type that can be deduced w/o executing the program. For example, union types cannot be dynamic. Similarly, it's not possible to have generic dynamic types. Every language thus has dynamic and static types, except, in some cases, the static analysis of types isn't very useful because the types aren't expressive enough, or the verification is too difficult. Conversely, in some languages there's no mechanism to find out exact runtime types because the information about types is considered to be extraneous to the program and is removed from the runtime.

The division that wannabe computer scientists are thus trying to make between "dynamically-typed" and "statically-typed" lies roughly along the lines of "languages without a useful static analysis method" and "languages that may be able to erase types from the runtime in most cases". Where "useful" and "most cases" are a matter of subjective opinion. Oftentimes such boundaries lead claimants to ironically confusing conclusions, such as admitting that languages like Java aren't statically typed or that languages like Bash are statically typed, and so on.

Note that "wannabe computer scientist" also applies to people with degrees in CS (I've met more than a dozen); some had even published books on this subject. This only underscores the ridiculous state in which this field is.

> discusses at some length that the technique isn’t always applicable.

This technique is not applicable to the overwhelming majority of everyday problems. It's so niche it doesn't warrant a discussion, but it's instead presented as a thing to strive for. It's not a useful approach, and at the moment, there's no hope of making it useful.

Validation, on the other hand, is a very difficult subject, but I think that if we really want to deal with this problem, then TLA+ is a good approach, for example. But it's still too difficult for the average programmer. Datalog would be my second choice, which also seems appropriate for the general public. Maybe even something like XSL, which, in my view, lacks a small core that would allow one to construct it from first principles, but is still able to solve a lot of practical tasks when it comes to input validation.

ML-style types aren't competitive in this domain. They are very clumsy tools when it comes to expressing problems that programmers have to solve every day. We, as community, keep praising them because they are associated with the languages for the "enlightened" and thus must be the next best thing after sliced bread.


You come off as a crank.

Perhaps you are one, perhaps you are not, I don’t know, but either way, you certainly write like one. If you want people to take you seriously, I think it would behoove you to adopt a more measured writing style.

Many of the claims in your comment are absurd. I will not pick them apart one by one because I suspect it will do little to convince you. But for the benefit of other passing readers, I will discuss a couple points.

> What on Earth are you talking about? What dynamic enforcement?

SQL constraints are enforced at runtime, which is to say, dynamically. Static types are enforced without running the program. This is a real advantage.

> There's no such thing as dynamically-typed languages, just like there aren't blue or savory programming languages. […] The division that wannabe computer scientists are thus trying to make between "dynamically-typed" and "statically-typed" lies roughly along the lines of "languages without useful static analysis method" and "languages that may be able to erase types from the runtime in most cases".

I agree that the distinction is not black and white, and in fact I am on the record in various places as saying so myself (e.g. https://twitter.com/lexi_lambda/status/1219486514905862146). Java is a good example of a language with a very significant dynamic type system while also sporting a static type system. But it is certainly still useful to use the phrase “dynamically-typed language,” because normal people know what that phrase generally refers to. It is hardly some gotcha to point out that some languages have some of both, and there is certainly no need to insult my character.

> This technique is not applicable to overwhelming majority of everyday problems. It's so niche it doesn't warrant a discussion, but it's instead presented as a thing to strive for. It's not a useful approach and at the moment, there's no hope of making it useful.

This is simply not true. I know because I have done a great deal of real software engineering in which I have applied constructive data modeling extensively, to good effect. It would be silly to list them because it would simply be listing every single software project I have worked on for the past 5+ years. Perhaps you have not worked on problems where it has been useful. Perhaps you do not like the tradeoffs of the technique. Fine. But in this discussion, it’s ultimately just your word against mine, and many other people seem to have found the techniques quite useful—and not just in Haskell. Just look at Rust!

> Datalog would be my second choice, which also seems appropriate for general public.

The idea that datalog, a first-order relational query language, solves data validation problems (without further clarification) is so laughable that merely mentioning it reveals that you are either fundamentally unserious or wildly uninformed. It is okay to be either or both of those things, of course, but most people in that position do not have the arrogance and the foolishness to leave blustering comments making an ass of themselves on the subject on an internet forum.

Please be better.


Since by now we're arguing style, you would achieve your purposes (whatever they are) if you completed certain aggressive sentences, for example:

> You come off as a crank. [... because of X, Y, Z]

..

> Please be better. [in the following manner: ... even if it takes summarizing what was said]


> If you can't create a value of type PrimeNumber that doesn't contain a prime number, there's a bit more to it than naming.

Yes, indeed. This is quite useful! But crabbone isn’t entirely wrong that it isn’t quite what the original article was about.

I’ve written quite a bit of code where constructive data modeling (which is what the original article is really about) was both practical and useful. Obviously it is not a practical approach everywhere, and there are lots of examples where other techniques are necessary. But it would be quite silly to write it off as useless outside of toy examples. A pretty massive number of domain concepts really can be modeled constructively!

But when they can’t, using encapsulation to accomplish similar things is absolutely a helpful approach. It’s just important to be thoughtful about what guarantees you’re actually getting. I wrote more about that in a followup post here: https://lexi-lambda.github.io/blog/2020/11/01/names-are-not-...


I discussed how/why the point of this article is very much not to “parse everything” in this followup: https://lexi-lambda.github.io/blog/2020/01/19/no-dynamic-typ... In particular, it articulates precisely why it is fine to use relatively wide types for any input which the program treats opaquely, and it gives an example of how the same techniques can still be useful even in that context.

