I don't think there are local optima here. The gradient with respect to the individual characters always points you at the solution, from every state. Pathologically so, even. But it makes for a good learning problem, since the problem space actually fits in human heads and you can clearly see what the GA is doing. When you fire a GA at a real problem you won't have that luxury.
> I don't think there are local optima here. The gradient always points you directly at the solution, at all times, in all states.
Well, there are no local optima, but there are plenty of global optima other than the correct answer, I think: "Hello, World!" and "!dlroW ,olleH" have the same fitness, for example. So no, the gradient won't necessarily lead you to the solution.
Now, suppose we instead defined the fitness function as the number of correct characters plus the number of characters in the right place; then we'd have local optima, and a better chance of getting to the optimum.
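For concreteness, here's a minimal Python sketch of that proposed fitness (higher is better). The name `alt_fitness` and the Mastermind-style multiset overlap are my reading of the proposal, not code from the thread:

```python
# Hypothetical sketch of the proposed fitness: characters present anywhere
# in the target (counted with multiplicity) plus characters in the right place.
from collections import Counter

def alt_fitness(target: str, guess: str) -> int:
    # Characters in the correct position.
    in_place = sum(t == g for t, g in zip(target, guess))
    # Multiset intersection: how many characters of the guess appear in the
    # target at all, regardless of position.
    overlap = sum((Counter(target) & Counter(guess)).values())
    return in_place + overlap

print(alt_fitness("Hello, World!", "!dlroW ,olleH"))  # 18: all 13 characters present, 5 in place
print(alt_fitness("Hello, World!", "Hello, World!"))  # 26: the maximum
```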
There is only one global optimum with the defined fitness function, and it is the one with f = 0. There are also local minima when there is no mutation (for example, if no chromosome in the initial population has the letter W, it will be impossible to create the target unless mutation is used).
The other answers that you are thinking of have a different fitness (the comparison is target[i] ?= guess[i], position by position).
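Assuming the fitness is simply the count of position mismatches (the thread only says the comparison is target[i] ?= guess[i] and that the optimum has f = 0), a minimal sketch:

```python
# Assumed form of the fitness under discussion: count of positions where
# the guess differs from the target. Lower is better; f = 0 only for the target.
def fitness(target: str, guess: str) -> int:
    return sum(t != g for t, g in zip(target, guess))

print(fitness("Hello, World!", "Hello, World!"))  # 0 -- the unique global optimum
print(fitness("Hello, World!", "!dlroW ,olleH"))  # 8 -- the reversal is not tied with it
```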
In any case, I like these examples because they clearly show the process the GA uses to converge, in a way that is easy to see and understand. Of course, in real life GAs are used more when there is no information about the function to minimize (optimize), or when the function is not convex over the search domain, which can leave other search methods stuck in a local minimum.
I have always felt that the fancy name increased interest in the topic.
Update: Looks like the guy's description of his function is different from his actual function. You're right: there is a single global optimum and no local optima.
I guess "local optimum" wasn't the right word. I got suck in a configuration in which my breed() method couldn't get closer to the goal, because it was just shuffling around characters that existed at the start. (I had started with 1001 randomly generated 13-byte sequences.)
With GAs, you're frequently better off allowing your breeding / mutation to make insane changes. Mimic reality: some changes flat-out kill the offspring immediately. Sure, you need letters... but allow breeding to slice the binary strings representing those letters, as in the sketch below. Otherwise you rely too heavily on mutation for changes, which is typically very slow.
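A hedged sketch of what that bit-level slicing could look like, assuming fixed-length byte strings as chromosomes (the function name and single-point scheme are illustrative, not the parent poster's code):

```python
import random

def bit_crossover(a: bytes, b: bytes) -> bytes:
    """Single-point crossover at bit granularity: the child takes the high
    bits from parent a and the low bits from parent b. When the cut falls
    mid-byte, the boundary byte mixes bits from both parents, so the child
    can contain a character that appears in NEITHER parent."""
    assert len(a) == len(b)
    n = len(a) * 8
    cut = random.randrange(1, n)          # crossover point, in bits
    ia = int.from_bytes(a, "big")
    ib = int.from_bytes(b, "big")
    low_mask = (1 << (n - cut)) - 1       # the low n - cut bits come from b
    child = (ia & ~low_mask) | (ib & low_mask)
    return child.to_bytes(len(a), "big")

print(bit_crossover(b"Hello, World!", b"Goodbye Moon!"))
```

Character-level crossover, by contrast, can only rearrange bytes already present in the population, which is exactly the shuffling trap described above.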