Random note. The most commonly "abused" Unix command is cat. The name is short f...

fhars · on Sept 6, 2012

It may be an abuse, but at least it has no influence on the correctness of the result as pipes are a monad http://okmij.org/ftp/Computation/monadic-shell.html

shorter · on Sept 6, 2012

I've sometimes heard it's "catenate" not "concatenate".

Do they mean the same thing?

As for cat, the utility, I'm afraid we'll never stop seeing people doing

   cat file|prog1|prog2

even when it makes absolutely no sense whatsoever.

If it did, I might as well do

   cat file|cat|prog|cat|prog2|cat -

I mean, why not? It "looks nicer" than

   prog file|prog2 or 
   prog < file|prog2

Programmers love their cats.

ralph · on Sept 6, 2012

It's most definitely catenate. I understand catenate to mean chain and concatenate to be to chain together. Since "cat foo bar xyzzy" doesn't modify the files to join them in any way I don't think they're chained together.

Besides, ken & Co. aren't daft. con would be short for concatenate. :-)

EchoAbstract · on Sept 6, 2012

According to the 1st Edition and 6th Edition manual pages: NAME cat -- concatenate and print

See: http://man.cat-v.org/unix-1st/1/cat http://man.cat-v.org/unix-6th/1/cat

I'm pretty sure based on the timeline of pipe (~1973, roughly V4) that the cat command (~1971 V1) predates pipes.

ralph · on Sept 6, 2012

Fortunately they fixed that errant man page. :-) http://man.cat-v.org/plan_9/1/cat

ishkur101 · on Sept 6, 2012

From the man page cat doc1 doc2 > doc.all concatenates the files doc1 and doc2 and writes the result to doc.all

ralph · on Sept 6, 2012

I think I'm missing your point. doc[12] aren't changed.

ishkur101 · on Sept 6, 2012

Doc 1 & 2 are not changed but chained together to create doc.all

jlgreco · on Sept 5, 2012

I have long thought that some sort of zsh completion that detects that abuse of cat and converts it into the more appropriate `< file` might be a good idea. If it did it silently it probably wouldn't be worth it but if it actually preformed the substitution in front of you then it might help users get more comfortable with the carrot syntax.

morsch · on Sept 5, 2012

The ` < file` has to appear at/near the end of the line, right? Using cat has the advantage of being able to read the line from left to right along with the data flow. I often add more piped commands to the end of a line as I refine it, while the source data remains the same. (To be fair, sometimes the opposite is true.)

ephemient · on Sept 5, 2012

Nope,

    <file ./command --args

and even

    ./command <file --args

works fine in Bash and Zsh.

a3_nm · on Sept 6, 2012

Interestingly, there are some circumstances where you actually want "cat file | program" and not "program < file". The case I have in mind is when file is actually a named FIFO which was not opened for writing. If you use cat, program will still run and only reads to stdin will block (but it can perform other things, possibly in different threads). If you use '<', opening stdin will block and program will probably block altogether.

ralph · on Sept 6, 2012

With < on a FIFO it's the shell that blocks on opening before the command, e.g. cat, is run.

ori_b · on Sept 5, 2012

Does it really matter that you're starting an extra process?

veyron · on Sept 5, 2012

it doesnt matter for a file of size 1kb. For a file of size 10Gb, every process matters.

For the downvoters: please time how long it takes to do something like `cat $file | awk '{print $1}' ` and `awk <$file '{print $1}'`

barrkel · on Sept 6, 2012

Not exactly convincing:

    ~/desktop$ du -h c.dat
    11G     c.dat
    ~/desktop$ time cat c.dat | awk '{ print $1 }' > /dev/null
    
    real    0m53.997s
    user    0m52.930s
    sys     0m7.986s
    ~/desktop$ time < c.dat awk '{ print $1 }' > /dev/null
    
    real    0m53.898s
    user    0m51.074s
    sys     0m2.807s

cat CPU usage didn't exceed 1.6% at any time. The biggest cost is in redundant copying, so the more actual work you're doing on the data, the less and less it matters.

ori_b · on Sept 6, 2012

I was curious, so, here goes; 'foo' was a file of ~1G containing lines made up of of 999 'x's and one '\n'.

    $ ls -lh foo
    -rw-r--r-- 1 ori ori 954M Sep  5 22:57 foo

    $ time cat foo | awk '{print $1}' > /dev/null

    real	0m1.631s
    user	0m1.452s
    sys 	0m0.540s

    $ time awk <foo '{print $1}' > /dev/null 

    real	0m1.541s
    user	0m1.376s
    sys 	0m0.160s

This was run from a warm cache, so that the overhead of the extra IO from a pipe would dominate.

dleary · on Sept 6, 2012

Both invocations take similiar amounts of "real" time because the task is IO-bound and it takes roughly 1.5s on your machine to read the file.

But if you add up the "user" and "sys" time in the cat example, you see that it took 1.992s of actual cpu-time... Which is actually about a 30% increase in cpu-time spent.

The perf decrease wasn't visible because you have multiple cores parallelizing the extra cpu-time, but it was there.

njs12345 · on Sept 6, 2012

So the two are different because awk's call to read() is effectively the same as a read directly from a file, whereas copying is taking place through the pipe with the pipeline approach?

jlgreco · on Sept 6, 2012

Basically you see a linear increase in time. If it was going to take a coffee break's worth of time one way, it will take a slightly longer coffee break worth of time the other. It is fairly rare that the additional time involved matters and there isn't something else that you should be doing anyway.

ralph · on Sept 6, 2012

The difference between

    cat file | foo
    foo <file

assuming foo only reads stdin so `foo file' isn't possible, is that with the latter the shell will open file for reading on file descriptor 0 (stdin) before execing foo and the only cost is the read(2)s that foo does directly from file.

With the needless cat we have cat having to read the bytes and then write(2) them whereupon foo reads them as before. So the number of system calls goes from R to R+W+R assuming all reads and writes use the same block size and more byte copying may be required.

jlgreco · on Sept 5, 2012

No.

It is pretty much just a matter of principle.

derleth · on Sept 5, 2012

> carrot syntax

Heh. Be careful with this, though: ^ is the caret (note spelling) according to most sources of information about these things.

Random Fun Geekery Time: Back in the Before-Before, the grapheme in ASCII at the codepoint ^ is now was an up-arrow character, which is why BASIC uses ^ for exponentiation even though FORTRAN, which came first and which early BASIC dialects greatly copied, uses .

These days, ↑ is U+2191, or ↑ in HTML.

http://www.alanwood.net/unicode/arrows.html

http://www.fileformat.info/info/unicode/char/2191/index.htm

http://en.wikipedia.org/wiki/Caret

derleth · on Sept 6, 2012

> Back in the Before-Before, the grapheme in ASCII at the codepoint ^ is now was an up-arrow character, which is why BASIC uses ^ for exponentiation even though FORTRAN, which came first and which early BASIC dialects greatly copied, uses .

That is, two asterisks in a row.

antiterra · on Sept 6, 2012

> Therefore every time you use it to spool one file into a pipeline, that is technically an abuse!

Since everything's a file* in UNIX (and ilk,) it's actually not an abuse.

* Pedantic variations such as "everything's a bytestream" or "everything's a file descriptor" notwithstanding.

ralph · on Sept 6, 2012

No, cat(1) stands for catenate, not concatenate. http://en.wiktionary.org/wiki/catenate

lloeki · on Sept 6, 2012

hmmm, cat(1) mentions interesting flags, and tac, which comes in handy at times.

sigjuice · on Sept 6, 2012

Useless use of cat award! http://partmaps.org/era/unix/award.html

leif · on Sept 5, 2012

shell input redirection doesn't work inside eshell, the only way I know to pipe into stdin from eshell is to use cat