Small C Projects

mahmud · on March 27, 2011

It really surprises me when people look for "small" programming projects.

Boot a Unix, leave X, sit still for 20 minutes. There.

Get the sources to your Unix and just go read the code for your everyday utils. This is best done with a BSD, which isn't bloated with GNUisms.

If you don't believe me about GNU bloat, just look here:

http://www.freebsd.org/cgi/cvsweb.cgi/src/bin/

Get a BSD and devour the beauty that is Unix, unfuckedwith.

FreeBSD also comes with all the papers & research docs you need; love how the troff formatting is readable in console with zcat. The CRT radiation kept me glowing green for many an enjoyable night.

microtonal · on March 27, 2011

If you want minimalism, you could study the BusyBox sources as well. E.g. BusyBox coreutils:

http://git.busybox.net/busybox/tree/coreutils

Most BusyBox utilities leave out more options than FreeBSD equivalents.

psadauskas · on March 27, 2011

I almost always prefer the GNU utils to the BSD ones. The GNU ones usually allow arguments in nicer orders. GNU `sort -h` can sort the human-sized output of GNU `du -h`. I've noticed tons of niceties from these sorts of tools on Ubuntu that are completely missing from BSD-based OS X.

One man's bloat is another man's convenience.

barrkel · on March 27, 2011

The GNU toolset may have combobulated source, but it's a damn sight more usable and useful than bare-bones utilities.

mahmud · on March 27, 2011

But the metric of this thread was "small C projects", not useful, cushy coreutils.

derleth · on March 27, 2011

> bloat

This word, which I see so often, is almost entirely content-free. At most, it conveys a vague sense of 'big is bad'; the association of 'bigness' is the only thing that saves it from being a purely content-free snarl word.

http://rationalwiki.org/wiki/Loaded_language#Snarl_words

("When used as a snarl words, these words are essentially meaningless; most of them can be used with meaning, but rarely are.")

To say what ought to be obvious, one person's bloat is another person's essential feature. And, yes, I do use that paragon of GNU software, GNU Emacs, and surely you realize the folly of trying to tell me my editor of choice is bloated.

mahmud · on March 27, 2011

Bloat is real, and it affects one in five American teenagers, and 100% of GNU code. Bloat can be defined as `cat` being 7 pages in BSD, and 20 in GNU. It's an aesthetic value judgment, made by someone qualified to say .. "do not want":

http://www.freebsd.org/cgi/cvsweb.cgi/~checkout~/src/bin/cat...

http://git.savannah.gnu.org/cgit/coreutils.git/tree/src/cat....

jbarham · on March 27, 2011

Plan 9's cat is written in a mere 35 lines: http://plan9.bell-labs.com/sources/plan9/sys/src/cmd/cat.c

As always, if you want to read C code as written by the same people who invented C, the Plan 9 source code (http://plan9.bell-labs.com/sources/plan9/sys/src/) is a great resource.

mahmud · on March 27, 2011

Then what? Spend an eternity in heart-break hotel with LispM & BeOS fans? ;-)

jbarham · on March 27, 2011

Nah, come join the fun w/ the Plan 9 crew at http://golang.org!

Confusion · on March 27, 2011

314 vs. 784 lines, not adjusted for amount of comments. Bloat or simply an increased set of features? Being used to GNU commands, I've always found BSD commands lacking features I often use. Although you can usually compose the same operation using some additional pipes and although I understand how this would be considered 'prettier', 'cleaner' or 'better', I still feel the GNU commands I am used to are just right and do not constitute 'bloat'.

dfox · on March 27, 2011

In many cases the "GNU bloat" lies not in the added features but in unnecessary cleverness and complexity. See how GNU cat does its own complex buffering, while the FreeBSD version just calls read/write in a loop in the simple case and uses stdio in the complex one.
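A minimal sketch of the "read/write in a loop" simple case, for illustration only. It is not taken from the FreeBSD or GNU sources; the function name `copy_fd` and the 4096-byte buffer are assumptions.

```c
#include <assert.h>
#include <errno.h>
#include <unistd.h>

/* Copy everything from descriptor `in` to descriptor `out` using plain
 * read(2) and write(2). Returns 0 on success, -1 on error (errno set).
 * copy_fd is a hypothetical name, not from any real cat source. */
int copy_fd(int in, int out)
{
    char buf[4096];              /* buffer size is an arbitrary choice */
    ssize_t n;

    assert(in >= 0 && out >= 0); /* callers must pass open descriptors */

    while ((n = read(in, buf, sizeof buf)) != 0) {
        if (n < 0) {
            if (errno == EINTR)
                continue;        /* interrupted by a signal: retry */
            return -1;
        }
        /* write(2) may accept fewer bytes than requested, so loop. */
        for (ssize_t off = 0; off < n; ) {
            ssize_t w = write(out, buf + off, (size_t)(n - off));
            if (w < 0) {
                if (errno == EINTR)
                    continue;
                return -1;
            }
            off += w;
        }
    }
    return 0;                    /* read returned 0: end of input */
}
```

Calling `copy_fd(STDIN_FILENO, STDOUT_FILENO)` from `main` would give roughly the no-options behaviour of cat; option handling and per-file error messages are left out.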

bdonlan · on March 27, 2011

From a quick skim of the two sources, the difference in sizes can be attributed to:

1) The GNU version has MUCH more verbose commenting.
2) The GNU version reads input in blocks in cooked mode, rather than a character at a time as the BSD cat does. This is MUCH faster, but it leads to a lot more complexity, and thus more code. It also carefully calculates the optimal block size to use for this; this is also quite complex and carefully commented (see, e.g., the 20 line comment at line 736).
3) The GNU version has to be portable to multiple unixes, and thus has a number of places where it has to test for multiple error codes, and/or missing features.
4) The GNU version has a very verbose --help output, which consumes a good page or so of code by itself.

I don't really see any 'bloat' there. Sure, it's okay to have a simple cat, but it's not a bad idea to optimize a tool that's used so frequently. And the verbose --help output, verbose commenting, and portability are all part of the GNU coding standards. You can argue about whether you want to spend all that effort on it, but I don't think the sheer volume of code is a good measure for whether the code is good or not.

Someone · on March 27, 2011

Seven? That is at least around 85% bloat. http://minnie.tuhs.org/cgi-bin/utree.pl?file=3BSD/usr/src/cm... needed only one page.

As the poster above said, one man's bloat is another man's essential feature.

Back to the subject: reading those old sources really teaches you why, back in the seventies, people found Unix so appealing. Even ignoring the feature growth/creep (or whatever you want to call it), you do not have to wade through a zillion copyright header lines, option parsing that goes on for ages, locale-specific stuff, etc., before getting to the meat of the program. The disadvantage is that some code dives into assembler fairly quickly (for example, printf is mostly assembly in the system I refer to above).

mahmud · on March 27, 2011

Early C source code kept its figure by forgoing any stupid bounds checking.

microtonal · on March 27, 2011

The immediate difference I noticed is that the GNU version contains a lot more comments describing what actually happens.

Now, what do you think is more appropriate for educational purposes: source code with plenty of comments, or source code with nearly no comments at all?

mahmud · on March 27, 2011

If you just wanna see how a few library functions and system calls are used? Yeah, just get APUE and dig into the simplest implementations you can find, which are often BSDish.

I routinely gloss over comments when reading code anyway; the most accurate documentation is found via reflection & introspection on the system itself, not comments.

akjj · on March 27, 2011

It's a little different from what the OP was asking for, but my favorite toy programming project: write a shell. This brings you in contact with many of the basic aspects of the Unix system: pipes, fork, exec, environment variables. In around a thousand lines, you can implement a shell with pipes, backgrounding, backquote, variable expansion, conditionals, looping, and lots and lots of bugs.
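As a rough sketch of the pipe/fork/exec core such a toy shell is built around, here is one way to run a two-command pipeline. The helper name `run_pipe` is hypothetical, and parsing, backgrounding, and variable expansion are all left out.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Run `left | right` and return the exit status of `right`, or -1 on
 * failure. Both arguments are NULL-terminated argv arrays.
 * run_pipe is a hypothetical helper, not part of any real shell. */
int run_pipe(char *const left[], char *const right[])
{
    int fd[2];
    int status = 0;

    assert(left && left[0] && right && right[0]);

    if (pipe(fd) < 0)
        return -1;

    pid_t lpid = fork();
    if (lpid < 0) {
        close(fd[0]);
        close(fd[1]);
        return -1;
    }
    if (lpid == 0) {                  /* left child: stdout -> pipe */
        dup2(fd[1], STDOUT_FILENO);
        close(fd[0]);
        close(fd[1]);
        execvp(left[0], left);
        _exit(127);                   /* only reached if exec failed */
    }

    pid_t rpid = fork();
    if (rpid < 0) {
        close(fd[0]);
        close(fd[1]);
        waitpid(lpid, NULL, 0);
        return -1;
    }
    if (rpid == 0) {                  /* right child: stdin <- pipe */
        dup2(fd[0], STDIN_FILENO);
        close(fd[0]);
        close(fd[1]);
        execvp(right[0], right);
        _exit(127);
    }

    /* The parent must close both ends, or `right` never sees EOF. */
    close(fd[0]);
    close(fd[1]);

    waitpid(lpid, NULL, 0);
    if (waitpid(rpid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Forgetting to close the pipe ends in the parent is probably the first of the "lots and lots of bugs": the right-hand command then blocks forever waiting for end of input.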

In general, I like the idea of trying to implement absolutely minimal versions of common programs. Other possibilities are: an HTTP server or proxy, a Lisp interpreter.

l0nwlf · on March 27, 2011

To make a Lisp interpreter you need to know Lisp. But I liked the idea of writing a shell.

dfox · on March 27, 2011

Writing a Lisp interpreter is actually a good way to learn most of the core Lisp concepts.

husted · on March 27, 2011

Buy an embedded system like the Arduino and learn C while hacking a bit of hardware too. Yes, the Arduino is not 100% ANSI C but I suspect it's close enough.

For me learning a new programming language is always about finding a problem to solve, something that will keep me interested and make it fun to learn.

rdtsc · on March 27, 2011

One of the good suggestions was exercises from K&R. I also like "Practical C" by Steve Oualline.

Then the best thing is to just work on something you like.

If you like networking, write an RPC server using the basic Linux socket interface or use 0mq to write a pub/sub:

http://www.zeromq.org/intro:read-the-manual

If you like audio or DSP, write an audio processing program that adds an echo or other effect to an input audio stream. You can try the PortAudio interface:

http://www.portaudio.com/trac/browser/portaudio/trunk/test/p...

If you like file systems & Linux, write a FUSE file system. Or a kernel module:

http://lwn.net/Articles/68106/

If you like graphics you can try libSDL:

http://friedspace.com/SDLTest.c

metageek · on March 27, 2011

An XML parser makes a nice medium-sized project; it'll get you experience with C data structures and string handling. Plus it has the nice property that you can start small and build on, until you've got the whole spec done, and every bit along the way is useful.

par1970 · on March 27, 2011

These questions always seem so ridiculous. There are thousands of examples of projects that fall into systems programming.

Do one of these things: keyboard/mouse driver, hard drive diagnostics poller, implement raw sockets on Windows, system call hooking pattern, a VM for a scripting language, a VM/crypter software protection scheme, etc.

There's all kinds of stuff.

shadowfox · on March 27, 2011

> These questions always seem so ridiculous

Maybe I am misreading this. But I don't quite see why. The poster seemed to want to relearn C and was asking other people's opinions on what would be interesting small projects to start out with.

par1970 · on March 27, 2011

There is no reason to ask such a question because potential projects are outlined in _thousands_ of places across the net. It seems that he put absolutely no effort into googling and finding such projects on his own.

temporarius · on March 27, 2011

Write a Brainfuck interpreter. Seriously, it is trivial to write a simple one, slightly harder to write a compact one, and then you can just hack on to optimize it for speed. There are great test cases too, including an ASCII Mandelbrot renderer :)
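To give an idea of the "simple one", here is a minimal, unoptimized sketch. The function name `bf_run`, the 30000-cell tape, and storing 0 on end of input are assumptions made for this example; real interpreters differ on those conventions.

```c
#include <assert.h>
#include <stddef.h>

#define TAPE_LEN 30000   /* conventional tape size, an assumption here */

/* Interpret `prog`. ',' reads from the NUL-terminated `input` (0 at end
 * of input); '.' appends to `out`, keeping at most cap-1 bytes plus a
 * terminating NUL. Returns the number of bytes stored, or -1 on
 * unbalanced brackets or a tape pointer that runs off either end.
 * bf_run is a hypothetical name. */
long bf_run(const char *prog, const char *input, char *out, size_t cap)
{
    unsigned char tape[TAPE_LEN] = {0};
    size_t ptr = 0, n = 0;

    assert(prog && input && out);

    for (size_t pc = 0; prog[pc] != '\0'; pc++) {
        switch (prog[pc]) {
        case '>': if (++ptr == TAPE_LEN) return -1; break;
        case '<': if (ptr-- == 0) return -1; break;
        case '+': tape[ptr]++; break;
        case '-': tape[ptr]--; break;
        case '.': if (n + 1 < cap) out[n++] = (char)tape[ptr]; break;
        case ',': tape[ptr] = *input ? (unsigned char)*input++ : 0; break;
        case '[':
            if (tape[ptr] == 0) {        /* skip ahead to matching ']' */
                int depth = 1;
                while (depth > 0) {
                    pc++;
                    if (prog[pc] == '\0') return -1;
                    if (prog[pc] == '[') depth++;
                    else if (prog[pc] == ']') depth--;
                }
            }
            break;
        case ']':
            if (tape[ptr] != 0) {        /* jump back to matching '[' */
                int depth = 1;
                while (depth > 0) {
                    if (pc == 0) return -1;
                    pc--;
                    if (prog[pc] == ']') depth++;
                    else if (prog[pc] == '[') depth--;
                }
            }
            break;
        default:
            break;                       /* any other byte is a comment */
        }
    }
    if (cap > 0)
        out[n] = '\0';
    return (long)n;
}
```

Rescanning for the matching bracket on every jump keeps the code short but slow; precomputing a jump table is the usual first step when optimizing for speed.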

Someone · on March 27, 2011

For the 'more distinct parts of the C language', I would suggest reading a couple of the obfuscated C contest entries, then writing one of your own.