
> The reason that code represented as XML or JSON looks horrible is not because representing code as data is a bad idea, but because XML and JSON are badly designed serialization formats.

By that same token, a Volkswagen Beetle is a badly-designed boat.

XML was never designed as a data serialization format. It's a markup language. It was designed to sprinkle structure and metadata into large human-readable plaintext documents.

Likewise, JSON is a subset of a general-purpose programming language's literal notation that happened to be very fast to parse in a browser by virtue of the browser implementing that language.

Personally, I don't think s-exprs are a particularly great serialization format either. The problem is that there's no one-size-fits-all for serialization. What we value is brevity, but basic information theory tells us that we can only make expressing some things more terse by making others more verbose.

When you say some format is badly-designed, all you're really saying is that it isn't optimized for the kinds of data you happen to want to serialize.



> XML was never designed as a data serialization format. It's a markup language.

Those two things are not mutually exclusive.

> Likewise, JSON is a subset of a general-purpose programming language's literal notation that happened to be very fast to parse in a browser by virtue of the browser implementing that language.

That's true. That is not in conflict with anything I said.

> The problem is that there's no one-size-fits-all for serialization.

No, that's not true. S-exprs really are a global optimum in the space of serialization designs. All the alternatives are logically equivalent to S-exprs but with extra punctuation that makes them arguably harder to read, but inarguably harder to write. That is why S-exprs are the ONLY syntax ever designed (some would say "discovered") by humans that has been successfully used to represent both code and data.


>> XML was never designed as a data serialization format. It's a markup language.

> Those two things are not mutually exclusive.

I beg to differ. I just replied to someone else about this: https://qht.co/item?id=9509110

I agree with your last paragraph, though. There is a timelessness about S-expressions.

As a side point, I would add that the distinction between strings and symbols is important, and neither XML nor JSON has it.


The comment you were responding to got deleted, which makes it a little hard to figure out what's going on there.

But I am completely nonplussed at your assertion that markup and serialization are mutually exclusive. There is a 1-to-1 correspondence (actually multiple 1-to-1 mappings) between XML and S-expressions, so whatever you can do with sexprs you can do with XML modulo some trivial transformation. The ONLY difference is in the amount of punctuation and redundancy.

> the distinction between strings and symbols is important

Yeah, that's a good point.

> neither XML nor JSON has it

That's not quite true. It's not that JSON doesn't have symbols, it's that Javascript doesn't have symbols. And XML doesn't have symbols natively, but you can easily gin them up yourself, e.g. <symbol>foo</symbol> or <symbol name="foo"/>.
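To make that ginning-up concrete, here's a hypothetical round trip in Python's standard library (the `Symbol` class and the `string`/`symbol` element names are my own convention, not anything standard):

```python
import xml.etree.ElementTree as ET

class Symbol:
    """Minimal symbol type: compared by name, distinct from str."""
    def __init__(self, name):
        self.name = name
    def __eq__(self, other):
        return isinstance(other, Symbol) and other.name == self.name

def to_xml(value):
    # Strings get one wrapper element, symbols another, so the
    # distinction survives serialization.
    if isinstance(value, Symbol):
        return ET.Element("symbol", name=value.name)
    el = ET.Element("string")
    el.text = value
    return el

def from_xml(el):
    if el.tag == "symbol":
        return Symbol(el.get("name"))
    return el.text or ""

assert from_xml(to_xml(Symbol("foo"))) == Symbol("foo")
assert from_xml(to_xml("foo")) == "foo"
assert from_xml(to_xml("foo")) != Symbol("foo")
```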


Obviously you can encode S-expressions in XML (including symbols). But you have to add additional structure to do it. The point is that XML, following the markup metaphor, doesn't work this way out of the box. And, in fact, I've never seen anyone (except, I guess, you) go to the trouble of making all the distinctions in XML, such as the string/number distinction, that S-expressions make -- and I have seen people get into trouble for failure to do this.

It's a psychological/sociological point rather than a technical one, but metaphors matter in design.


Is there any lossless mapping of XML to s-expressions? I've never seen one.

Usually the examples where I see people rewrite XML as s-expressions (like in this thread) are very lossy -- it's easy to be pretty when you throw away most of the information!


DTDs mercifully aside, there's this: http://okmij.org/ftp/Scheme/SXML.html

I'm not sure what example in this thread you consider to be throwing information away - in a case like

    <Term term="slick">
where the attribute name is clearly redundant and exists only to satisfy the syntax, nothing is lost in the transformation to

    (term "slick")


> Is there any lossless mapping of XML to s-expressions? I've never seen one.

You haven't looked hard enough. Mapping XML to sexprs is trivial:

<tag attr=value ...>content</tag>

-->

((tag attr value ...) content)
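That mapping can be sketched mechanically; here's a minimal Python version (the helper name `to_sexpr` is mine), using only the standard library and keeping tags, attributes, and content:

```python
import xml.etree.ElementTree as ET

def to_sexpr(el):
    """Map an XML element to ((tag attr val ...) content...),
    recursing into child elements."""
    head = [el.tag]
    for attr, val in el.attrib.items():
        head.extend([attr, val])
    body = []
    if el.text and el.text.strip():
        body.append(el.text.strip())
    for child in el:
        body.append(to_sexpr(child))
        # Text trailing a child element is part of the parent's content.
        if child.tail and child.tail.strip():
            body.append(child.tail.strip())
    return [head] + body

tree = ET.fromstring('<tag attr="val">content</tag>')
print(to_sexpr(tree))  # [['tag', 'attr', 'val'], 'content']
```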


True, but to be honest, that isn't really any more compact (well, it saves the closing tags, I suppose).


I'm not sure what you could express in XML that you couldn't translate to s-expressions in one way or another.


One downside to S-exprs compared to, say, JSON: they do not have direct support for unordered mappings (hash tables, dictionaries, whatever you want to call them). You can represent them as trees, but basically every language these days (including, of course, Lisps) has a mapping type as a core concept; requiring the user to figure out what parts of the input data should be converted to that type is annoying, and makes the format less self-documenting (i.e. it may not be immediately apparent whether there can be duplicate keys or not).

http://eli.thegreenplace.net/2012/03/04/some-thoughts-on-jso...
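To illustrate the "figure it out yourself" burden: a common convention (one of several, nothing standard) is to encode a map as an association list of (key value) pairs, and the reader has to pick a policy for duplicates. A Python sketch (function name mine):

```python
def alist_to_dict(pairs):
    """Convert [('a', 1), ('b', 2)] to a dict. Later keys win --
    exactly the kind of policy the serialized form leaves unstated."""
    d = {}
    for key, value in pairs:
        d[key] = value
    return d

assert alist_to_dict([('a', 1), ('b', 2)]) == {'a': 1, 'b': 2}
# Duplicate keys are silently legal in the alist but not in the dict:
assert alist_to_dict([('a', 1), ('a', 2)]) == {'a': 2}
```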


There is no "ISO Standard Sexp" way to have a hash table.

However, Sexps easily support hash tables.

Example:

  $ txr -p "(let ((x (hash))) (set [x 'a] 1) (set [x 'b] 2) x)"
  #H(() (b 2) (a 1))
Here I have chosen a "hash capital H" syntax for hash tables. The first part () holds the hash attributes (there are none, so it's an eql-equality-based hash table, with strong keys and values). Then come the entries, each a two-element list giving a key and its value.

This hash prefix notation builds on the existing Lisp concept of using simple prefixes to distinguish various kinds of objects. In Common Lisp we have:

  #(1 2 3)     ;; this is a vector
  #C(3.0 4.9)  ;; this is the complex number 3.0 + 4.9i
The scanning is very simple: you just recognize the prefix like #C( or #( and then recurse into the scanner for list elements that stops at a closing parenthesis.

No whitespace is allowed: it cannot be # (1 2 3) or # C (3.0 4.9).
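That scanning step fits in a few lines. A toy Python reader (names mine; it handles only flat lists of unquoted atoms, which is enough to show the prefix-then-recurse shape):

```python
def read_list(s, i):
    """Scan space-separated atoms up to ')'; return (items, next_index)."""
    items, token = [], ''
    while i < len(s):
        ch = s[i]
        if ch == ')':
            if token:
                items.append(token)
            return items, i + 1
        if ch == ' ':
            if token:
                items.append(token)
            token = ''
        else:
            token += ch
        i += 1
    raise SyntaxError("unterminated list")

def read(s):
    # Recognize the '#(' prefix, then reuse the ordinary list scanner.
    if s.startswith('#('):
        items, _ = read_list(s, 2)
        return ('vector', items)
    if s.startswith('('):
        items, _ = read_list(s, 1)
        return ('list', items)
    return ('atom', s)

assert read('#(1 2 3)') == ('vector', ['1', '2', '3'])
assert read('(a b)') == ('list', ['a', 'b'])
```

Because the prefix must be glued to the open paren, the check is a plain `startswith` with no whitespace skipping in between.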

TXR Lisp above not only reads back the hash notation, so it can be used as a literal, but allows backquoting over it. We can splice keys and values into the syntax to produce a hash table:

  $ txr -p "(let ((keys '(a b c)) (vals '(1 2 3)))
               ^#H(() ,*(zip keys vals)))"
  #H(() (c 3) (b 2) (a 1))


> What we value is brevity, but basic information theory tells us that we can only make expressing some things more terse by making others more verbose.

But the reverse is not necessarily true. It's possible for a badly-designed serialization format to be longer in all cases than some other format.

And, in particular, I suspect that for the same data, XML is longer in all cases than S expressions.
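As a toy illustration of that suspicion (element names mine, and a single example proves nothing in general): counting bytes for the same record in both syntaxes, the closing tags and attribute punctuation are pure overhead.

```python
# The same two-field record, serialized both ways.
xml = '<point><x>1</x><y>2</y></point>'
sexpr = '(point (x 1) (y 2))'
assert len(xml) > len(sexpr)
print(len(xml), len(sexpr))  # 31 19
```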



