Toyota's killer firmware: Bad design and its consequences (2013) (edn.com)
102 points by Sanddancer on April 26, 2015 | 71 comments


>> The Camry ETCS code was found to have 11,000 global variables. Barr described the code as “spaghetti.” Using the Cyclomatic Complexity metric, 67 functions were rated untestable (meaning they scored more than 50). The throttle angle function scored more than 100 (unmaintainable).

>> Toyota loosely followed the widely adopted MISRA-C coding rules but Barr’s group found 80,000 rule violations. Toyota's own internal standards make use of only 11 MISRA-C rules, and five of those were violated in the actual code. MISRA-C:1998, in effect when the code was originally written, has 93 required and 34 advisory rules. Toyota nailed six of them.

How the ACTUAL FUCK did this happen!? The article makes Toyota's engineering team seem egregiously irresponsible. Is it typical for vehicle control systems to be this complicated? I would love to hear the other side of the story (from Toyota's engineers). Maybe the MISRA-C industry standard practices are ridiculous, out of touch and impractical.


I'm going to go out on a huge limb here and am prepared to get shot down. I've seen this first hand as inheritor of spaghetti firmware.

The limb is: don't let EEs lead a firmware group.

It's a natural division of labor. The hardware guys are familiar with all the component datasheets and the bus timings and all the other low level details. If the prototype is misbehaving, they'll grab a scope and figure it out. They're probably the only guys who can. Software is an afterthought, a "free" component.

Those same EEs who have performed that role get promoted to buck stoppers for hardware production. They're familiar with hardware design, production, prototyping, Design for Manufacturability, and component vendor searches. They might cross over with production engineering and all that six sigma goodness. They're wrapped up in cost-per-unit to produce, which is direct ROI. Software remains a "free" component; you just type it up and there it is.

The culture of software design for testability, encapsulation, code quality, code review, reuse, patterns, CMM levels, etc etc is largely orthogonal to hardware culture.


As a hardware designer I agree that hardware people are typically not very good software engineers.

I find we can reliably write small programs that will do everything they need to. But as the program grows, we are not adept at managing the explosion in complexity.

However I don't believe that's because hardware culture doesn't value software or anything like that. We just have no proper education in software engineering.

Speaking for myself, I grasp most fundamental software concepts. Memory structure, search algorithms, that sort of thing. I appreciate the value of abstract code qualities like testability, simplicity, reusability, etc.

I just have no actual education in how to design large programs from scratch that will achieve those goals.


I don't think it's about education. It's just that software isn't what hardware people mostly work with and you only get experienced in what you mostly work with. And when you're experienced, you gain some insight into how things will work.

I know some basics of EE, but I have no gut feeling for how to build or analyze any real hardware. It's something I've read about and made an effort to understand, but it's not something I know.

Conversely, I've been writing software on the lowest to highest levels for decades and I have a hunch of how to start building something that will eventually grow really big and complex. I don't know exactly how I do it but depending on what I want to achieve I have, already at the very beginning, a quite strong sense of what might work and also what will definitely NOT work.

My take on hardware is that it's mostly a black box that usually does most of what's advertised (though workarounds are regularly needed, and I guess it must be really difficult to build features into silicon and have them work 100% as designed). Those quirks had better be encapsulated in the lowest levels of the driver, so that we get some real software building blocks sooner.

This would be no basis on which to design hardware. I would make such a terrible mess of it that even if I managed to design a chip and make it appear to work somewhat, it would fail spectacularly in all kinds of naive corner cases that a real EE would never have to solve, because s/he would never venture to build any of them in the first place; s/he would know better from the start.


Oh yeah.

Corollary: Don't let EEs hire your firmware engineers. They have no idea what to look for.


> The culture of software design for testability, encapsulation, code quality, code review, reuse, patterns, CMM levels, etc etc is largely orthogonal to hardware culture.

I'm confused by this statement. Substitute say ASME for CMMI and the rest of this sounds like good engineering practices in general. What, for example, about "hardware culture" is orthogonal to testability?


Well, for one thing, every instance of mission critical hardware needs a full test, because hardware is messy and stuff happens. You wouldn't want someone hurt because a gate on a chip got a stuck-at fault or one of your vendors had a little extra variation etc etc.

Software only needs testing when the tools or inputs change. So design for testability of software is driven by the needs of a development group (eg simulators, debug statements), while hardware (at a minimum) is aimed at a post production group (eg JTAG, test fixtures).

I agree there are many parallels, but I think the culture is different.


> Maybe the MISRA-C industry standard practices are ridiculous, out of touch and impractical.

I currently work in the static analysis industry. Broadly speaking, the MISRA-C ruleset is, in my experience and in the experience of many customers who apply it, not impractical but clearly of benefit, and the majority of rule violations (including a number of the rules Toyota violated) are easily found with static analysis tools.

I even know that Toyota is a customer of (at least) one of the static analysis tool companies (obviously Toyota is a huge company, and I've no idea if the specific muppets making this clusterf had any static analysis tools; just that Toyota as a company definitely has someone buying them). It seems that they either just didn't use it, or just didn't care (or were told not to care).

As an extra point of data, I sit opposite someone who is on the MISRA-C committee. He is a solid C coder who knows a great deal about the language and how to get it wrong. His day job is writing/maintaining static analysis tools for C programs. Obviously this is no guarantee that the committee as a whole is good at maintaining the MISRA-C ruleset, but they do have at least one active, experienced and competent C coder (with lots of experience of actually automating detection of rule violations) at the table.


It's a matter of scale and economics. Every dollar cost reduction per unit is significant over the multiyear production cycle. The changes for cost reduction make the design process more complex and a Camry or any car is full of legacy features (the model designation is nearly 35 years old).

On top of that, car companies are massive and capital intensive, with all the lack of agility that implies... retooling a production line takes many years from design to cars at the dealer. The critical decisions about the programming stack had to be made in the early 1980s, when the first microprocessor-based control systems appeared in automobiles.

Back then, automotive engineers and executives could hardly have predicted the software complexity that was on the horizon. It was state-level actors who came up with Ada and DRAKON... but those were inspired by money-is-no-object, failure-is-not-an-option programs. Toyota probably made money at its level of software quality... shutting down production for a year to get it right would have cost billions. It followed the bean counting and rode the tiger.


More interesting: how does it compare to the rest of the industry at that time?


I'm not really qualified here (and if someone is, please post), but I've been poking around with the ECU on my 2001 Audi and it doesn't seem to have ECC memory either. A section of the AM29F800BB flash is used as random-access memory. It seems to stick a couple of CRC bits in every line, so maybe this is how they're doing it?
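For anyone unfamiliar with the idea, here's a sketch of what a per-line check scheme can look like. This is purely illustrative: the real ECU's checksum algorithm and memory layout are unknown, and the CRC-8 polynomial and line size here are made up for the example.

```c
#include <stddef.h>
#include <stdint.h>

/* Bitwise CRC-8 with the common 0x07 polynomial (illustrative choice). */
static uint8_t crc8(const uint8_t *data, size_t len) {
    uint8_t crc = 0;
    for (size_t i = 0; i < len; i++) {
        crc ^= data[i];
        for (int b = 0; b < 8; b++) {
            crc = (uint8_t)((crc & 0x80u) ? ((crc << 1) ^ 0x07u) : (crc << 1));
        }
    }
    return crc;
}

/* A "line": 8 data bytes tagged with their stored CRC. */
typedef struct {
    uint8_t data[8];
    uint8_t crc;
} line_t;

/* Recompute and store the CRC after writing the line. */
static void line_seal(line_t *ln) { ln->crc = crc8(ln->data, sizeof ln->data); }

/* Verify on read: any single-bit flip in the data desynchronizes the CRC. */
static int line_ok(const line_t *ln) {
    return crc8(ln->data, sizeof ln->data) == ln->crc;
}
```

A scheme like this detects corruption but, unlike ECC, cannot correct it; the system still needs a fail-safe response when a check fails.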


I am starting to think that safety critical systems need to have schematics, pcb layouts, design documents and source code be registered with the gov before it touches the public.

I remember when nearly everything came with schematics so it could be fixed.


When hardware designers write code, the results are often ugly. Device drivers, firmware, embedded code, ...


Engineers tend to read specs and apply standards.


>I would love to hear the other side of the story

Me too, it's probably indeed bad, but there's ALWAYS another side to such stories. Unfortunately, we rarely get to hear those.


I'm in no way excusing Toyota. For those who want to understand how this happens, I can shed some light.

Early firmware was a replacement for analog electronics. Consider modeling the classic lunar lander: a simple digital computational loop with no branches, just math, has advantages over analog circuitry, such as temperature stability, noise immunity, reproducibility, etc.

From there, embedded firmware grew in the same way complex mechanical systems grew, like mills and robots. Meanwhile, non-embedded software on workstations was growing, theories were developing, algorithms and organization and business models were applied. And firmware grew and grew, but mostly like an engine grows in that abstractions are minimized, optimization happens in-place and theory is largely tribal knowledge.

It's only in the last 10 years or so that firmware has finished that metamorphosis and is confronting the rest of the software world. The clash looks especially stark to high-level devs working in Ruby or what-have-you.

So yeah, this is no excuse for negligence and ignorance. I just hope my perspective helps ppl evolve.


The Apollo Guidance Computer actually used a sophisticated software interpreter for most of its code, not just a "computational loop with no branches".

http://en.wikipedia.org/wiki/Apollo_Guidance_Computer#Softwa...

In addition, there is no way in hell that the control algorithms it was using could have been developed without the use of computers. State-space control theory was specifically developed to take advantage of discrete-time control systems.

https://www.hq.nasa.gov/alsj/ApolloDescentGuidnce.pdf


MISRA-C is practical enough that NASA's JPL used it as a basis for their coding standard (http://lars-lab.jpl.nasa.gov/JPL_Coding_Standard_C.pdf), and everyone would agree those Martian robots work pretty well...


> Barr’s group found 80,000 rule violations

I'm a bit puzzled by this figure given that according to the article the typical LOC count of this kind of software is within that order of magnitude ("tens of thousands of lines of code").


"Normal" C code that is written in a coding style that disagrees with the MISRA guidelines can have more than 1 violation per line of code, even if the quality of the codebase is otherwise quite good.

For example, the static analyser I'm working on finds 32700 violations of MisraC2012 in the SQLite codebase (130 kLOC).

This is because Misra is quite strict about the use of C: a simple statement like `if (ptr) f();` already violates two rules:

* use explicit `ptr != NULL` comparison

* always use braces with `if`

Once you get around to integer arithmetic, where MISRA is quite strict about not relying on implicit conversions or numeric promotion (which is non-portable due to depending on the size of `int`), it's not rare to have 5 or more violations in a single line of code.
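To make the violation density concrete, here's an illustrative sketch. The rule wording is paraphrased from this comment, not quoted from the standard, and the helper names are made up for the example:

```c
#include <stddef.h>

static void f(void) { /* stub so the examples link */ }

/* Violating style: pointer tested implicitly, no braces on the if.
 * As described above, this one line racks up two violations. */
static void bad(int *ptr) {
    if (ptr) f();
}

/* Conforming style: explicit NULL comparison, braces always present. */
static void good(int *ptr) {
    if (ptr != NULL) {
        f();
    }
}

/* Implicit promotion pitfall: in `a * b` the operands are promoted to
 * int, so on a 16-bit-int target the multiply can overflow before the
 * widening to long. Making the widening explicit removes the
 * portability hazard. */
static long area(short a, short b) {
    return (long)a * (long)b;
}
```

Notice that `bad` and `good` compile to the same behavior; the violations are purely about style and portability, which is why an otherwise sound codebase can still report tens of thousands of them.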


How much code reuse is there between the firmware in a manufacturer's different car models? I'm surprised there isn't a consortium (like Symbian was) to create a standard firmware kernel to share the benefits (and costs) of testing and auditing the code.

I'm also surprised C is still so commonly used for mission-critical software. I understand that C is familiar, has many static analysis tools (to make up for the language's deficiencies), and has a straightforward translation from C to object code (though only when using simple optimizations). For example, if MISRA-C's coding guidelines disallow recursion, why not design a language that only supports DAG function dependencies?


OSEK is an industry-standard RTOS specification used by almost all automotive players; it is specifically designed for the automotive environment. Toyota actually claimed to use an OSEK-compliant RTOS, but it later surfaced in this lawsuit that they had written their own implementation that was never certified by an outside organization. OSEK is in the process of being superseded by AUTOSAR, which defines much more than just the OS and includes a large HAL that allows for plug-and-play middleware libraries. Unfortunately it isn't economical for every ECU to make use of AUTOSAR: it has heavy resource requirements (> 2 MB RAM), so many applications don't use it.

Also on the horizon is ISO 26262, which mandates quality assurance for automotive embedded code in the form of paper trails. Unfortunately, due to the huge amount of work the standard requires, some automakers are choosing to ignore it and hope it doesn't become mandatory.


Thanks! I'll read more about OSEK and AUTOSAR.


I'm honestly surprised that this kind of thing is running ANY OS, rather than direct hardware control paths. Let alone the fact that they didn't use FB ECC RAM... and that's even before going down the path of code quality. Nor did they have a safety case for a hardware-off or powerdown state... though if they had such a thing, odds are it would have been botched worse than the rest of the system.

Personally, I could never handle the stress of designing or building something like this... I worked in a security software team for a bank for a year, that was about as far as I can get in terms of job stress. Just the same, it seems to me some of these choices were clearly rookie level mistakes that shouldn't have been put in production to control motor vehicles.

I also agree, that it would really make more sense for car mfgs to get together to form some standards for production and core controls. If they're determined to use an actual OS, then it better damned well be the most thoroughly tested OS, running on certified hardware, with certified controls in place.

In most software, if you mess up the code, people don't die... Tasks that control fast-moving and/or heavy machinery, medical devices, etc. should have very tight controls in place... this is the kind of situation where software is, and should be treated as, an engineering discipline. Most of the time it's more of a craft; this isn't one of those times.


> I'm also surprised C is still so commonly used for mission critical software.

If you're surprised at that then you probably have not seen much of the embedded world where (gasp!) assembler is still in use and C is considered a 'high level' language.


A java programmer acquaintance of mine calls C "portable assembly language".


It's not that far from the truth. C was designed to be very close to the machine and allows mixing in of inline assembly.

This latter feature is almost unthinkable in what most people would call high level languages today (Java, C#, Swift, etc).

For better or worse, Java is increasingly being used in embedded scenarios. Blu-ray players are a prime example, but televisions, settop boxes of various kinds also use Java these days.


Java, originally called Oak, was created specifically for programming settop boxes for cable TV.


There have been languages designed for verifiably safe, hard realtime operations. The most interesting one among them is probably Esterel[1] (that compiles to C or even VHDL/Verilog), though I think the language has mostly been supplanted by graphical representations[2]. I believe that such development environments (and other relatively expensive ones like Ada and realtime Java) are mostly used in aviation/defense. I don't know whether it's due to tradition or cost.

[1]: http://en.wikipedia.org/wiki/Esterel

[2]: http://www.esterel-technologies.com/products/scade-suite/


We have to thank such developers for the rise of Ada in High Integrity Computing Systems, outside its original military domain.


I believe that when the recalls occurred, at least several models of Toyota/Lexus cars were affected.

Ultimately, it's impossible to know without being inside. I would imagine there's a fair bit of reuse within each manufacturer and very little sharing between manufacturers.


Considering

1) how much other components are shared between manufacturers, and having companies like Delphi Automotive doing a lot of work for many manufacturers

2) how often car manufacturers own parts of each other, and how often these ownership structures change (look at Ford and Volvo and Mazda, or Daimler-Chrysler, etc)

I would imagine there are shared platforms and components, also software, between manufacturers.


I have been upset with Toyota's designs, product quality, and business practices since I bought my first (and last) Toyota vehicle. Mr. Barr's testimony reassures me that I've made the right decision in writing them off.

One item I'd like to point out is that Mr. Barr criticizes the lack of hardware logic to close the throttle if the driver rides the brake.

In 1987, BMW introduced the 1988 model year 750i, with a V12 engine that has two intake manifolds and two throttle valves. The engine controls and electronic throttle system were made by Bosch. Whether the logic is in hardware or software, I don't know, but it doesn't take long for it to slam the throttles shut if you hold the brake pedal down. When this happens while the engine is delivering power, it's a very severe shock as engine power is removed and stops working against the brakes.

Given the time it takes to go from the design stage to manufacturing, the Germans obviously had this figured out by the mid-1980s. BMW went on to use the electronic throttle system, called EML, in a bunch of '90s-model 5 and 7 series cars with normal six-cylinder engines. The only problems I know this system caused were for owners and technicians who did not understand how it works. In other words, when it failed, the result was that the car would not go anywhere.

And a colleague of mine with a late 90's Volkswagen also proved that the throttle slams shut if he rides the brakes while requesting engine power with the gas pedal.

Mr. Barr points out that in 2005, the Camry had no such logic. I seem to remember a youtube video where Consumer Reports guys test this out in some kind of Toyota, and they could have gone all day long until the brakes melted.

Toyota did not include this logic for safety even though it had been in cars released to the market decades before. Everything Toyota has done in handling this situation, from blaming the operators, the potentiometer supplier, and the floormats, to their brazen delays in producing discovery, reinforces my bad experiences owning a Toyota and my conclusion that they are a bad actor.


> And a colleague of mine with a late 90's Volkswagen also proved that the throttle slams shut if he rides the brakes while requesting engine power with the gas pedal.

Maybe automatic cars do this but when driving a stick shift a common advanced driving technique is to heel-and-toe on downshift so that you can rev up the engine to the correct RPM to not upset the car. That requires revving up the engine under braking. As far as I know this is still possible in most cars.

The usual guarantee is that the brakes are always specified to be able to overpower the engine even at full throttle. So even if you have a stuck throttle for some reason you should always be able to safely stop just by standing on the brakes, even if it takes you a little longer. Naturally in a manual car you should just stand on the brakes and clutch for emergency braking and then the engine is completely disconnected no matter what the throttle is doing.

[1] https://en.wikipedia.org/wiki/Heel-and-toe


Blipping the throttle while clutched to match RPM between the forced road speed and the given engine RPM during a gear shift, to alleviate weight shift (heel-and-toe), is still possible on modern cars, but that is not the technique being mentioned.

The technique being mentioned is more akin to 'left-foot braking', used to pivot the weight balance from the rear to the front in order to induce certain driving characteristics that may be beneficial in a turn. It is a common technique to balance out the inherent understeer of a front-wheel-drive car to a more neutral balance mid-turn, in an effort to reduce lap times. It's quite common in rally.[0]

And as was said, many cars disallow left-foot braking now, with the worst responses triggering semi-permanent CELs (CELs which require mechanic intervention, as opposed to being cleared by more drive cycles).

Probably not a bad idea, as it's an advanced technique that quite easily upsets a car mid-turn, and is often just compensatory for an ill-setup race car.

[0]: http://en.wikipedia.org/wiki/Left-foot_braking


You're mixing up techniques. Heel-toe is so called because you are using your right toe to operate the brake and your right heel to operate the gas pedal (at the same time) while the left foot operates the clutch.

The technique is used in racing during corner entry to allow the driver to rev-match the downshift while under hard braking. This way they are already in the optimal gear as they release the brakes and initiate the turn.


> the technique being mentioned is more akin to 'left-foot braking'

Both heel-and-toe and left foot braking require revving up the engine under braking. The difference between the two is if the car is in gear and the clutch engaged at the time. Maybe modern cars have enough sensors to know if the engine is actually driving the wheels so they can disable the throttle under braking in those situations.


> Maybe modern cars have enough sensors to know if the engine is actually driving the wheels so they can disable the throttle under braking in those situations.

This is possible, any car with traction control has wheelspeed sensors for all wheels, and manual shift cars have a pushbutton switch on the clutch pedal arm (for enabling the starter, or for a car with electronic throttle, for cancelling the cruise control.)


Those are not enough. You need a sensor in the gearbox itself to know if you are in gear. Otherwise you can have the clutch engaged and the wheels spinning but not have the engine connected to the wheels as naturally happens in the middle of a double clutch shift.


Couldn't it work by checking that crankshaft speed increases remarkably without any corresponding increase in wheel speed?

I don't know how racetrack drivers do this, but when I downshift, the crankshaft has some opportunity to fall a little while I get the gearbox in neutral and engage the clutch again. Then the crankshaft speed rises to 4000 rpm or more.


It takes more than 1 s for the throttle to shut, so it's still possible to spin up the gearbox in neutral to downshift.

My friend's VW was a manual shift, some variant of Golf, and he did encounter this behavior on a track day.


Yeah, a time lag is probably a good solution for this. Only kill the throttle if brakes are applied and the throttle has been open for >Xs. That would allow most shifting techniques without issue but wouldn't work for left-foot braking.
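The delay idea sketched above can be made concrete. This is only an illustrative sketch of the proposal in this thread; all names and the 1-second threshold are made up, and a real ECU would be built very differently (redundant sensors, watchdogs, certified toolchains, etc.):

```c
#include <stdbool.h>
#include <stdint.h>

#define OVERRIDE_DELAY_MS 1000u  /* hypothetical: throttle open > 1 s */

typedef struct {
    uint32_t throttle_open_ms;   /* how long the throttle has been open */
} override_state_t;

/* Called once per control tick of dt_ms milliseconds.
 * Returns true when the throttle should be forced shut. */
static bool brake_override(override_state_t *s, bool brake_pressed,
                           bool throttle_open, uint32_t dt_ms) {
    if (throttle_open) {
        s->throttle_open_ms += dt_ms;
    } else {
        s->throttle_open_ms = 0;
    }
    /* Kill the throttle only if the brake is applied AND the throttle
     * has been open past the delay, so a quick heel-and-toe blip
     * survives but a stuck-open throttle does not. */
    return brake_pressed && s->throttle_open_ms > OVERRIDE_DELAY_MS;
}
```

The design choice is exactly the trade-off discussed above: the delay preserves short rev-matching blips while still catching a sustained open throttle, but it cannot distinguish deliberate left-foot braking from a fault.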


I found a test plan for this on the old 5 series [1]. BMW say the engine should return to idle speed "immediately." The wiring diagrams imply that for this car, the electronic throttle was only fitted to cars with automatic gearboxes and traction control, the benefit being that the electronic throttle can act to limit engine power during gearbox shifting or during traction control intervention.

So it looks like VW was rather thoughtful to introduce this delay, when there is a manual gearbox.

Wouldn't the left foot braking be more useful in a car with a really light front end, like MR2 or other mid-engines?

[Edit: My 5 series is a manual shift with traction control, with a v8 engine that never got electronic throttle on this series. There is a second throttle plate ahead of the usual throttle that the traction control will close during intervention.]

[1] https://www.bmwtechinfo.com/repair/main/741en/images/7410265...


I didn't get whether they were able to show an actual path by which UA could have occurred (as, for example, in the case of Ariane 5 or that Canadian radiation machine; that is what makes those cases useful and supports all the good recommendations produced as a result). Nevertheless the slant seems clear... Google's self-driving car has just become significantly more expensive and pushed several years further into the future. And Toyota, I guess, is going to have many more people hired into its engineering department who won't do actual engineering ("Senior rule 62 compliance engineer") and will produce a lot more and better TPS reports, NASA-style :)


The problem with comparing it to Ariane 5 or the Therac-25 is that those were situations where there was a single bug you could blame. Toyota's software had so many bugs that it was impossible /to/ tell which paths would lead to UA, just that there was no accountability or quality assurance.

Regarding your second point, similar complaints were made about other safety features, like mandatory turn signals or airbags. Yes, there will be additional time spent ensuring that they are compliant with safety standards. But in this case, a lot of that time should have already been taking place. Some code needs more review time, a second or third or even fourth set of eyes on it. When people can die because of a bug, then the bar for what is acceptable code can and should be higher.


>Toyota's software had so many bugs that it was impossible /to/ tell what paths would lead to UA,

Your statement sounds kind of contradictory to me. If there are many, just choose any path. It may be impossible to say which one actually happened, yet it should be possible to show at least one path that could plausibly lead to UA.


> i didn't get whether they were able to show an actual path of how UA could have occurred

I think the answer is no -- there was never a clear demonstration of an extant buggy execution trace. Just a preponderance of evidence that such a trace could exist.


The lack of ECC memory means that you don't even need that... the effects of a little solar flare, or even noise from a powerline nearby could have caused the issue.


No. There were multiple reports of UA; solar flares, etc. can't cause such consistency.


Yes, I'm sure it was a bug in this case. However, this paper shows that RAM errors are so common you really don't want non-ECC in anything safety-critical.

http://www.cs.toronto.edu/~bianca/papers/sigmetrics09.pdf

   For example, we observe DRAM error rates that are orders of magnitude higher
   than previously reported, with 25,000 to 70,000 errors per billion
   device hours per Mbit and more than 8% of DIMMs affected
   by errors per year. We provide strong evidence that memory
   errors are dominated by hard errors, rather than soft errors, which
   previous work suspects to be the dominant error mode
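When ECC hardware is absent, one common software mitigation (in safety-oriented C codebases generally; there's no indication Toyota did this) is to mirror each critical variable with its bitwise complement and check the pair on every read. A minimal sketch, with all names invented for illustration:

```c
#include <stdbool.h>
#include <stdint.h>

/* A guarded variable: the value plus its bitwise complement.
 * A bit flip in either half desynchronizes the pair. */
typedef struct {
    uint16_t value;
    uint16_t check;   /* always ~value when the pair is healthy */
} guarded_u16;

static void guarded_write(guarded_u16 *g, uint16_t v) {
    g->value = v;
    g->check = (uint16_t)~v;
}

/* Returns false if corruption is detected, so the caller can
 * fail safe instead of acting on a flipped bit. */
static bool guarded_read(const guarded_u16 *g, uint16_t *out) {
    if ((uint16_t)~g->check != g->value) {
        return false;
    }
    *out = g->value;
    return true;
}
```

Like any software check, this only detects corruption at read time and costs memory and cycles; it's a stopgap, not a substitute for ECC.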


For systems that are so critical to the safety of human life, I'm surprised that there isn't a mandatory third-party review of the firmware source code.


Is there evidence that auto firmware issues are a major cause of injury, enough to mandate such a thing? There is already a heavy burden to comply with standards.

I recently called our city's public works dept to propose changing the traffic signage/signals at an extremely confusing and potentially dangerous intersection (every person I've talked to about it hates making the turn there...). The lead traffic engineer was very understanding and agreed it was a terrible intersection. But the hard stats showed few accidents there, so he'd never be able to redirect money from more problematic areas...


Interestingly, the fact that the intersection feels confusing forces people to pay close attention to figure out what's going on. That can lead to fewer accidents in some cases.

I can't currently find it, but I once read about a roundabout that had accident problems, but had very good signage. People felt entirely in control going into it and weren't paying as much attention as they could be. By removing the signage people wouldn't inadvertently take for granted the behavior of the other drivers, they'd slow down more, they'd pay closer attention, and as a result the accident rate dropped.


Is there a mandatory third-party review of, for example, the mechanical design of a car? Serious question.


By and large, yes. NHTSA testing could be considered one aspect of review, and there are all sorts of other DoT reviews of various aspects of car design to ensure that safety lessons learned through the years are applied: standards like putting the gas tank in front of the rear axle to help prevent it from being punctured, or the placement and configuration of the gear shifter (it's why Chryslers no longer have push-button gear selectors and why no one has column-mounted manual shifters any more). Auto ECUs really are an outlier in terms of auditing and examination.


Do these ECUs do anything considered special that you couldn't just mandate their specs to be open and let public ridicule do the rest?


There are some bits in them that would/could rightly be considered proprietary, air/fuel maps, ignition timing, transmission shift logic, etc. I don't see any reason why those proprietary bits can't be segregated from the code we're interested in.

So, to answer your question, maybe they do, but probably not by necessity.



I think those are two separate issues.

Big corporations thrive with complex standards being imposed by governments, because it increases the barrier to competition. $50m in compliance costs might hurt Toyota a bit, but make the next Tesla unfeasible to get off the ground.

Therefore, they'd most likely see it as a positive, even if some government employees got to see their trade secrets and the potential for leaks.

On the other hand, there's no upside to aftermarket hacking of your product, and plenty of downside (from the corporate perspective).


Given that we barely test drivers beyond a drive around the local Walmart lot, I would argue the level of review is more than adequate.

Need to focus where it actually makes a difference.


One of the fortunate things about driving an older car is mechanical accelerator and mechanical steering.

I fear for when I have to upgrade someday to all drive by wire.

As coders we all know how bad some code can be or even how the best code has flaws.


Recursion, though it gives me a big programmer woodie because of the beauty of the code, in my opinion is a poor programming practice. You can always use a loop in lieu of recursion.
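The claim above is easy to demonstrate: any structural recursion can be rewritten as a loop with an accumulator. A minimal sketch of the usual factorial example:

```c
#include <stdint.h>

/* Recursive form: elegant, but stack use grows linearly with n. */
static uint64_t fact_rec(unsigned n) {
    return (n <= 1) ? 1 : n * fact_rec(n - 1);
}

/* Loop form: same result, constant stack, which is what
 * MISRA-style "no recursion" rules are after. */
static uint64_t fact_loop(unsigned n) {
    uint64_t acc = 1;
    while (n > 1) {
        acc *= n--;
    }
    return acc;
}
```

The loop version's worst-case stack use is fixed at compile time, which is exactly the property that matters when the stack budget is a hard number in a linker script.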


Unless of course the language is designed and optimised for it. For instance, Jane Street use OCaml - in which recursion is a standard language primitive - and they sometimes prove their mission critical code correct.


Some languages require you to use recursion. For instance Erlang, without recursion you're not going to get very far.


Yeah and they have TCO, which C doesn't.

It's not recursion that is bad, it's unchecked stack growth, in environments where beancounters made sure the hardware is adequate but not more...


C does have tail call optimization in some instances; it's a function of the compiler, and while not all compilers can do this, some can:

http://ridiculousfish.com/blog/posts/will-it-optimize.html

C the language does not specify how optimizing compilers should handle this situation. -O2 on gcc will get you this optimization (and many more), it will even do TCO on really hard cases, some where FP languages/compilers would fail!
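For a concrete example of the kind of code this applies to: a function whose recursive call is in tail position. Whether the compiler actually turns it into a jump depends on the compiler and flags (gcc/clang typically do at -O2 via sibling-call optimization), which is exactly why safety standards won't let you rely on it:

```c
#include <stdint.h>

/* Accumulator-passing sum of 1..n. The recursive call is the very
 * last thing the function does (tail position), so an optimizing
 * compiler can replace it with a jump and use constant stack.
 * Nothing in the C standard guarantees that transformation. */
static uint64_t sum_to(uint32_t n, uint64_t acc) {
    if (n == 0) {
        return acc;
    }
    return sum_to(n - 1, acc + n);   /* tail call */
}
```

Compiled without optimization, this consumes one stack frame per step; with the optimization, it runs in constant stack. The behavior difference is invisible until the input is large enough to overflow the stack.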


Why do you say that recursion is a poor programming practice (curious)? What if the solution using recursion results in more readable or declarative code? (especially with languages/compilers that do tail-call optimization properly?)


It's optimal if readability is the goal, but not if reliability is. Programmers often assume ideal interpreters/compilers and runtime environments, which is often not the reality, especially as you get closer to the hardware. Recursive implementations in particular can more easily expose peculiarities in the stack implementation, yes?


Readability is a very important contributor to reliability -- minimizing WTFs is important.

That said, writing code that fits the construction of the language is probably more useful than siding between "loops or recursion". If I'm writing Scala, my code is usually chock-full-of-recursion. If I'm writing C, not really as much, because I'm thinking "which bits go into which memory". With Python I tend to think in terms of list comprehensions, etc.


Recursion is perfectly ok when the code is not mission critical, for example for rapid prototyping.

Recursion is A BIG ERROR for mission-critical software because it needs a variable amount of stack memory that depends on the implementation and on whatever else is on the stack (depending on where it is called), creating stack overflows that break the system completely.

This means you write software today for a car and it works fine. Tomorrow you reuse it for a motorcycle, and only in specific cases does it break dramatically, because the motorcycle people used another debugger or another OS with different assumptions.

Now, when you debug something new, you expect errors, and errors in the new code. But when something already runs and fails sporadically, it is extremely difficult to isolate the bug, because you see not the bug itself but its cascading consequences.

"What if the solution using recursion results in more readable or declarative code?"

Recursion is something that looks perfectly OK in code; on paper it can look gorgeous, but it can fail terribly in the implementation.

The judge is not interested when you tell her: yeah, it broke and killed someone, but it looks OK in the code!

"(especially with languages/compilers that do tail-call optimization properly?)"

If compilers were perfect, there would be no problem. In real life no compiler is perfect.

We design compilers and they are pretty stupid; it is really hard to represent all the complexity of the world, for every possibility, in a compiler.


Off-by-one errors are surely more common in loops than in the base case of a recursive function, because the recursive description is clearer. This is just about knowing the environment your code runs in: if the compiler doesn't optimize tail calls and you have limited stack, then yes, recursion is the wrong choice.



