Space Shuttle computer has 1MB of RAM

tomkinstinch · on March 29, 2010

Here is an article about developing the 'perfect' software that runs the shuttle:

"They Write the Right Stuff"

http://www.fastcompany.com/node/28121/print

aidenn0 · on March 29, 2010

And it's 260 engineers to write 420 KLOC. This is why software is buggy; it's too expensive to make it not buggy.

izend · on March 29, 2010

And it required 380 engineers to build the Burj Dubai.

The only difference is for most software projects it is acceptable to have "bugs" as customers will still pay for it, even if it's in a poor state of quality. But "bugs" in a 2,717 ft tower are unacceptable.

It comes down to this: in general, the cost of a bug in most software is relatively low compared to other engineering disciplines. Of course, if the software is a critical support system for astronauts, that's a different story.

wreel · on March 29, 2010

> But "bugs" in a 2,717 ft tower are unacceptable.

People have a tendency to overlook "bugs" in architecture because of the immutability of the medium. But once you start noticing that the HVAC is uneven on your floor, the hot water takes five minutes to ramp up in your office's break room, and the conduit runs for networking are bizarre, you start to understand that architects and builders run into design and implementation problems as well.

In fact they have their own word for patching. Renovations.

electromagnetic · on March 30, 2010

Having worked doing building renovations I can honestly say it's rather frequent that you look at how the building is laid out and how the wiring/water/gas was run and say "what the heck was this person thinking?" - and that's not even including the times I've seen things completely impractical or wholly illegal.

In my present house we cannot have the central air turned on in the downstairs toilet if we want heating to the smallest bedroom (the 'infant' room) upstairs. Similarly the basement air vent was so distant from the conduits that the main level essentially has a heated floor (hot air can be felt escaping through almost every gap in the basement ceiling except the air vent it's supposed to come out of). Similarly the plumbing is so unbalanced that the shower can essentially be turned off by a good configuration of tap turns and toilet flushes.

Architecture is a joke, and the 'engineers' who design them should really consider building a house themselves, because I've seen every crazy design from doors being hinged to trap people in the corner of a hallway (literally you'd have to step into the far corner to squeeze around the door to exit the bedroom), to lighting layouts that didn't illuminate the room, or switches completely impractically placed.

You're entirely right, people overlook 'bugs' in architecture, although in my experience it's largely because they don't know what 'right' looks like. How do you compare a poor plumbing design in your house? Do you run through your best friend's place screwing around with his sinks and toilets while he's in the shower?

It's easy to compare two software programs to find a less buggy option, however it's hard to compare two houses to find a less buggy option.

kjhgfvgbjnm · on March 29, 2010

The Burj Dubai, like any other structure is full of bugs.

The point of engineering is to make a system that is resilient to a certain level of faults. That's why the tower doesn't collapse if there is a 0.1 mm crack in one of the bolts.

Groxx · on March 29, 2010

It also depends on how you define a "bug". A car that crumples too much and injures the driver when hit at this angle or that angle within these speeds could be considered a bug. In which case, counting the number of knobs that fall off, plastic that cracks, and parts that wear out prematurely, not to mention discoveries of toxic fumes / paint / etc, cars do have thousands of bugs.

kjhgfvgbjnm · on March 29, 2010

The main problem with software 'engineering' is the fragility of software. In a structure you know there will be cracks, so you choose a material that is ductile enough that a crack can't grow to dangerous size in the lifetime of the component. The problem with software is that generally a single bit wrong is a total failure.

I was involved in a case with a turbine blade fracture. The user claimed that there must have been a flaw, and yes, the crack must have started at a single atom-sized crystal flaw in the metal. But the choice of alloy was such that the crack would have grown at a rate which means it was under the failure size for at least twice the inspection interval - where the customer missed it.

abstractbill · on March 29, 2010

> The problem with software is that generally a single bit wrong is a total failure.

This doesn't mesh with my experience to be honest. I often come across code that contains nasty bugs but is still somehow working accidentally. And even more often, there are bugs that stop just one feature from working while the rest of a large system pretty much acts as if the bug didn't exist.

khafra · on March 29, 2010

Perhaps the problem with software is that, after the original QA process, there's seldom a regular inspection except adversarially.

dkl · on March 29, 2010

The door knob example is terrible. How about this: an office that is either too hot or cold? Would that qualify as a bug? I think it does, and it happens all the time in new buildings.

nostrademons · on March 30, 2010

Tunnels in Boston or bridges in San Francisco, however...

andr · on March 30, 2010

Last week all the public parts of the Burj Dubai were closed for maintenance. I guess they did have some bugs.

rbanffy · on March 30, 2010

That's just a bad excuse.

pmiller2 · on March 30, 2010

>That's just a bad excuse.

No, it's good economics. In commercial software development, after a certain point, the marginal cost of fixing another bug outweighs the marginal benefit and exceeds what the market will bear for consumer software. We literally get what we pay for when it comes to commercial software.

rbanffy · on March 30, 2010

I believe we have come a long way from the days when the shuttle computer was cutting edge.

In fact, it's a bit of a shame we still haven't figured out how to make correct software, but it's in no way impossible and should be getting cheaper by the day.

I think should is the important word here. It's not and we have nobody but ourselves to blame.

Groxx · on March 29, 2010

Must be nice to have such clearly defined problems.

edit: woah, feature-fail == bug? Why'd it take them until 2007 to not have to reboot whenever they reached a new year? I'd call that a rather significant, costly bug, even if they don't, and it may very well be one of the longest-running in actively-developed software.

jonknee · on March 29, 2010

I'd imagine they go by the rule that if it's not in the spec, it's not a bug. They can't do something not in the spec. It was a known limitation, not a "bug".

Groxx · on March 29, 2010

> It was a known limitation, not a "bug".

That pretty strongly implies that all known strange-behavior is not a bug. Also, all known bugs are not bugs, as they can be known and worked around.

If you had to shut down your computer during the year cross-over, would you consider it a bug? Or what if you couldn't go on your business trip because of the year cross-over?

And a bug in the spec is still a bug. Oversights in specifications are equivalent to oversights in programming.

jonknee · on March 29, 2010

If I commissioned a computer to be made that wasn't supposed to be able to span the new year, then no, I would not consider it a bug that it could not span the new year. Sure it would be great if it would do that (though only if I had asked that it would), but as long as it behaves exactly as expected then I'm happy. No astronauts die and my budget won't get cut. Success.

A similar parallel in consumer computing is switchable graphics. The first models that supported this required a reboot to accomplish it. That wasn't a bug, just part of the plan. They have since gotten more advanced and allow switching without a restart.

Groxx · on March 29, 2010

Which is why I specifically included the business trip part. They had to schedule around those dates, losing a week or more of possible launch times, because they couldn't handle a year+1 operation. Given how picky they are with launch windows, this could easily mean large delays, meaning large amounts of money.

Not a minor feature, methinks.

jonknee · on March 29, 2010

I don't see why scheduling a launch a week later, when it's planned several years in advance, means a huge cost. They never scheduled a flight and had to change it because of the limitation; it was built into the calendar. Not having shuttles in the air during New Year's might actually save money: less overtime pay.

derefr · on March 30, 2010

The difference is that NASA uses systems engineers. From their point of view, it is not the "program" which can have bugs, but rather the "system" in its entirety—where "system" consists of the software, the hardware in the field, the hardware in the control center, and all the operations staff paid to perform certain tasks (i.e. execute algorithms—basically additional hardware.) If one part of your system (the software) needs to be rebooted, but another part (the people) is "programmed" to do it autonomously, then the system as a whole has no bugs.

It's sort of the thinking that goes into Erlang programs—rather than, say, thinking out complex GC strategies, you just create little process-sized vessels which grow, overflow, are killed, and then are recreated by other processes. It's not a "bug" that an individual process has died any more than it's a bug when one of the cells in your body dies.
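To make the "let it overflow, then recreate it" idea concrete, here is a minimal sketch in Python rather than Erlang. It is only an illustration of the pattern described above: `Worker`, `supervise`, and the `capacity` parameter are hypothetical names, and a real Erlang supervisor is considerably more involved.

```python
class Worker:
    """A small vessel that accumulates state until it overflows."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity  # hypothetical per-worker budget (must be >= 1)
        self.items = []

    def handle(self, item) -> None:
        if len(self.items) >= self.capacity:
            # No cleanup strategy here: the worker simply fails.
            raise MemoryError("worker overflowed")
        self.items.append(item)


def supervise(jobs, capacity: int):
    """Feed jobs to a worker, replacing it with a fresh one whenever it dies.

    Returns (jobs_handled, restarts).
    """
    worker = Worker(capacity)
    handled = 0
    restarts = 0
    for job in jobs:
        try:
            worker.handle(job)
        except MemoryError:
            # The death is expected behaviour, not an error in the system:
            # throw the old worker away and retry on a new one.
            worker = Worker(capacity)
            restarts += 1
            worker.handle(job)
        handled += 1
    return handled, restarts


if __name__ == "__main__":
    print(supervise(range(10), capacity=3))
```

From the caller's point of view every job is handled; the individual worker deaths only show up in the restart count.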

robryan · on March 29, 2010

I think the article skims over the cost of developing software in this way, touching on it only briefly at the end. For the vast majority of non-critical systems there really isn't much value in this methodology.

wallflower · on March 29, 2010

I remember my Digital Circuits prof saying the Apollo Flight Computer was constructed out of thousands of NOR-gates.

At the time I was struggling to comprehend how to construct a basic circuit, so I was wow'd (and still am).

http://en.wikipedia.org/wiki/Apollo_Guidance_Computer

sandGorgon · on March 29, 2010

That isn't very surprising for a person working in the semiconductor design industry. Actually there are no AND/OR/NOT gates fabricated on a modern IC.

They are all NAND gates, since they have better electrical characteristics and are Universal Gates (they can form AND/OR/NOT gates in some combination).

What is surprising is that they used NOR gates - while they are Universal Gates as well, their fabrication apparently leads to poorer electrical properties.
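The "Universal Gate" claim is easy to check by brute force. The sketch below (Python, with function names made up for the illustration) builds NOT, AND, and OR from nothing but a two-input NAND, then does the same from a two-input NOR, and verifies every input combination. It says nothing about electrical characteristics, only about logical completeness.

```python
def nand(a: int, b: int) -> int:
    """Two-input NAND on bits 0/1."""
    return 0 if (a and b) else 1


def nor(a: int, b: int) -> int:
    """Two-input NOR on bits 0/1."""
    return 0 if (a or b) else 1


# NOT, AND, OR built only from NAND.
def not_from_nand(a):
    return nand(a, a)

def and_from_nand(a, b):
    return not_from_nand(nand(a, b))

def or_from_nand(a, b):
    return nand(not_from_nand(a), not_from_nand(b))  # De Morgan


# NOT, OR, AND built only from NOR.
def not_from_nor(a):
    return nor(a, a)

def or_from_nor(a, b):
    return not_from_nor(nor(a, b))

def and_from_nor(a, b):
    return nor(not_from_nor(a), not_from_nor(b))  # De Morgan


# Exhaustive check over all input combinations.
for a in (0, 1):
    assert not_from_nand(a) == not_from_nor(a) == 1 - a
    for b in (0, 1):
        assert and_from_nand(a, b) == and_from_nor(a, b) == (a & b)
        assert or_from_nand(a, b) == or_from_nor(a, b) == (a | b)
```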

sparky · on March 30, 2010

NAND gates have two PMOS (pull-up) transistors in parallel and two NMOS (pull-down) transistors in series. This matches well with most CMOS processes, in which NMOS are faster. NOR gates have two PMOS in series and two NMOS in parallel, so either your pull up time will be significantly longer than your pull down time, or you're going to have to make the PMOS pretty big.

None of this really mattered at the time; the Apollo used resistor-transistor logic (RTL), which in turn used bipolar junction transistors (BJTs) instead of the CMOS (C is for complementary, meaning NMOS and PMOS) used for most digital logic today. In RTL (there are analogous logic families using CMOS too), instead of having a pull-up network of transistors, there is a weak pull-up resistor that makes the output high by default, unless it is pulled down by a network of (usually NPN) BJTs. In the case of an RTL NOR gate, it is just a bunch of NPN BJTs in parallel for the pull-down network, so it's pretty efficient ( http://www.play-hookey.com/digital/experiments/rtl_nor4.html ).

Sorry for the longish post.

sandGorgon · on March 30, 2010

oh wow.. I had completely forgotten about RTL.

Thanks for the explanation.

jonsen · on March 29, 2010

Electrical switching circuits have the possibility of two logical interpretations, e.g.:

  Electrical  Positive   Negative
  function    logic      logic
  
   L L | H    0 0 | 1    1 1 | 0
   L H | H    0 1 | 1    1 0 | 0
   H L | H    1 0 | 1    0 1 | 0
   H H | L    1 1 | 0    0 0 | 1
                NAND       NOR
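The table can be reproduced mechanically. In this Python sketch (the names `electrical`, `POSITIVE`, `NEGATIVE`, and `truth_table` are invented for the example), one fixed electrical behaviour is read under the two encodings, and the results are compared against NAND and NOR.

```python
def electrical(a: str, b: str) -> str:
    """The circuit from the table: output is Low only when both inputs are High."""
    return "L" if (a == "H" and b == "H") else "H"


POSITIVE = {"L": 0, "H": 1}  # High voltage means logical 1
NEGATIVE = {"L": 1, "H": 0}  # High voltage means logical 0


def truth_table(encoding):
    """Logical truth table of the same circuit under a given voltage encoding."""
    decode = {bit: level for level, bit in encoding.items()}
    table = {}
    for x in (0, 1):
        for y in (0, 1):
            out_level = electrical(decode[x], decode[y])
            table[(x, y)] = encoding[out_level]
    return table


NAND = {(x, y): 1 - (x & y) for x in (0, 1) for y in (0, 1)}
NOR = {(x, y): 1 - (x | y) for x in (0, 1) for y in (0, 1)}

assert truth_table(POSITIVE) == NAND
assert truth_table(NEGATIVE) == NOR
```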

joezydeco · on March 29, 2010

The hand-woven core memory "rope" is what blew me away.

http://authors.library.caltech.edu/5456/1/hrst.mit.edu/hrs/a...

tewks · on March 29, 2010

http://en.wikipedia.org/wiki/Functional_completeness

nand is more likely.

drbaskin · on March 29, 2010

I don't think that the wiki article you cite supports your claim -- it states that nand and nor are both functionally complete but does not (at the time of this writing) indicate why nand would be more likely.

tewks · on March 30, 2010

The last sentence of that article states, "In particular, all logic gates can be assembled from binary NAND gates." It doesn't mention NOR in the same context for good reason. NAND gates are preferred because they are more cost effective.

"In complicated logical expressions, normally written in terms of other logic functions such as AND, OR, and NOT, writing these in terms of NAND saves on cost, because implementing such circuits using NAND gate yields a more compact result than the alternatives."

http://en.wikipedia.org/wiki/NAND_gate

Also see sandGorgon's comment above: http://news.ycombinator.org/item?id=1226971

icefox · on March 29, 2010

"Similarly, the Russian Soyuz capsule’s computer ran on only 6 kilobytes of RAM until it was replaced with newer systems in 2003, which most probably was the cause of its subsequent crash-landing in Kazakhstan."

Really? Really?

ugh · on March 29, 2010

Really (http://en.wikipedia.org/wiki/Soyuz_TMA-1). That was the first flight of the new Soyuz TMA with a glass cockpit.

The capsule’s failsafe mechanisms were triggered and it fell back to the harsher ballistic reentry instead of the normal controlled one (nobody was harmed). It seems that a system which has been in use since 1979 somehow got confused by the sensory data. Some sort of odd bug; they have never been able to reproduce it.

Fallback to ballistic reentries also happened later with TMA-10 and TMA-11 – in those two cases a damaged cable and a pyro bolt malfunction (which nearly got the crew killed) were responsible respectively.

russss · on March 29, 2010

The guidance computer doesn't have enough RAM to contain the code for the entire mission, though. They have to load in a new program once the orbiter is in space, and another one before it de-orbits.

There's a fascinating (and very detailed) series of lectures on the Shuttle's design online on MIT's OpenCourseware site: http://ocw.mit.edu/OcwWeb/Aeronautics-and-Astronautics/16-88...

henning · on March 29, 2010

For those interested in software issues in aerospace, you might want to listen to Episode 100 of Software Engineering Radio: http://www.se-radio.net/podcast/2008-06/episode-100-software...

It features an extensive interview with a guy from DLR, the German equivalent of NASA. They talk at length about culture (freedom to fail), practices (extensive re-use), and ballpark performance metrics (on the order of < 10 LOC per programmer per day).

vital101 · on March 29, 2010

The shuttle computer does what computers do best: crunch numbers. I'd be interested in knowing what sort of hardware that RAM plugs in to, and also exactly how fast operations need to be completed in order for the shuttle to not fall out of the sky.

Retric · on March 29, 2010

The shuttle uses 4 separate 1.2MHz IBM AP-101s operating in lockstep as redundant systems, with a 5th running an independent system. I assume the fifth is set up in case of software failures.

http://en.wikipedia.org/wiki/IBM_AP-101

http://en.wikipedia.org/wiki/Space_Shuttle

PS: It's easy to forget just how old and hacked together the Space Shuttle is: Historically, the Shuttle was not launched if its flight would run from December to January (a year-end rollover or YERO). Its flight software, designed in the 1970s, was not designed for this, and would require the orbiter's computers be reset through a change of year, which could cause a glitch while in orbit. In 2007, NASA engineers devised a solution so Shuttle flights could cross the year-end boundary.

brk · on March 29, 2010

> I assume the fifth is set up in case of software failures.

It is. The 5th has just enough to get the shuttle home, and is designed by a separate team. The presumption being that anything that knocked out the first 4 is probably a systematic failure.
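A rough way to picture the arrangement is a majority vote over the four primary outputs with a fallback to the independent fifth. The Python sketch below is purely illustrative: `majority_vote` and `select_command` are hypothetical names, and it does not claim to reflect how the real flight computers compare results or how the switch to the backup is actually decided.

```python
from collections import Counter
from typing import Optional, Sequence


def majority_vote(outputs: Sequence[int]) -> Optional[int]:
    """Return the value a strict majority of units agree on, or None."""
    if not outputs:
        return None
    value, count = Counter(outputs).most_common(1)[0]
    return value if count > len(outputs) / 2 else None


def select_command(primary_outputs: Sequence[int], backup_output: int) -> int:
    """Use the voted primary result; fall back to the backup if there is none."""
    voted = majority_vote(primary_outputs)
    return voted if voted is not None else backup_output


if __name__ == "__main__":
    print(select_command([7, 7, 7, 9], backup_output=42))  # one faulty unit is outvoted
    print(select_command([7, 7, 9, 9], backup_output=42))  # no majority, backup is used
```

Note what voting cannot do: if all four primaries run the same software and hit the same bug, they agree on the same wrong answer and the vote passes. That is the case an independently written fifth system is meant to cover.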

skoob · on March 29, 2010

That's weird. Why would they even need date logic in the flight software?

sophacles · on March 29, 2010

It probably has something to do with where to point the thing during re-entry. The shuttle would need to be able to land on different days in case something went wrong. Actually, the date stuff is probably used for all sorts of orbital calculations.

kjhgfvgbjnm · on March 29, 2010

It doesn't - but can you guarantee that everything it is talking to doesn't also change?

A GPS string might erroneously say it's day 367, or be one character too long because it printed 20010, or a telemetry link might ask for 4 million readings because it calculated "-1".
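The "day 367" case is simple to reproduce. This Python sketch contrasts a counter that just keeps incrementing the day-of-year from launch with one that goes through a real calendar. The function names and the launch date are hypothetical, chosen only for illustration; this is not a description of the actual flight software.

```python
from datetime import date, timedelta


def naive_day_of_year(launch: date, days_elapsed: int) -> int:
    """Day-of-year that keeps counting up from launch and never wraps."""
    return launch.timetuple().tm_yday + days_elapsed


def calendar_day_of_year(launch: date, days_elapsed: int) -> int:
    """Day-of-year computed from the real calendar date."""
    return (launch + timedelta(days=days_elapsed)).timetuple().tm_yday


# Hypothetical mission: launch on 28 December 2009, check the clock 5 days in.
launch = date(2009, 12, 28)
print(naive_day_of_year(launch, 5))     # 367, a day that does not exist
print(calendar_day_of_year(launch, 5))  # 2, i.e. 2 January 2010
```

Any system downstream that expects a day number between 1 and 366 now has to cope with two clocks that disagree by a whole year.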

zandorg · on March 29, 2010

Always when this story comes up, I reference this: http://www.dreamsongs.com/LessonsFromNothing.html - a Lisp guru talking about this story.

bmalicoat · on March 29, 2010

Pretty interesting. Designing the whole system from scratch and being able to remove all unused overhead associated with 'modern' OSs certainly helps being able to fit it all in 1MB.

jcromartie · on March 29, 2010

That, and it was built a few decades ago.

ThomPete · on March 29, 2010

The kinds of programs that go into space shuttles and airplanes are quite different from the Windows and OS X systems out there.

Their tasks are very specific, not really leaving room for flexibility. You can almost compare the importance of these systems working exactly as expected with the physical parts of the shuttle.

dschn · on March 29, 2010

Some (if not most) of the on-board shuttle programs are written in HAL/S.

http://history.nasa.gov/computers/Appendix-II.html

http://en.wikipedia.org/wiki/HAL/S

jorgecastillo · on March 29, 2010

I just hope they don't follow the example of the British Navy (installing Windows XP in nuclear subs).

P.S. A BSoD would indeed really become a Blue Screen of Death.

ugh · on March 29, 2010

And now it’s going to the junkyard. Together with the shuttles that never lived up to their aspirations. Sad, really.

robryan · on March 29, 2010

Given budget constraints and the level of safety required in something like the shuttle I think they have lived up to realistic aspirations.

ugh · on March 30, 2010

The Shuttle did well for what it was. I still think, though, that the original plan for a lean little ship which can bring people into orbit would have led the Shuttle to a better future. Instead it became this overblown space truck.

But, yeah, it’s not all black and white. Building the ISS would have been very hard without the overblown space truck.

code_duck · on March 29, 2010

I wonder how that was all written. Is it mainly hardware, assembly or a language such as Forth?

ChillyWater · on March 29, 2010

Most of it is written in a custom language called HAL/S (High-order Assembly Language / Shuttle). Some of the nitty gritty is written in assembly.

HAL/S looks a little bit like BASIC or FORTRAN. It was designed to be very readable and uses some interesting formatting. When you print out a module (let's say GG1ASC), it uses three lines for each line of code so that the superscripts and subscripts are above and below like they should be.

zokier · on March 29, 2010

I have heard that Ada is popular in systems such as this, although I'm not sure if it's actually used here.

imd · on March 29, 2010

I'd like to hear how the code of the new commercial space companies, like SpaceX, compares.

kjhgfvgbjnm · on March 29, 2010

They use the same industrial computers and real-time OSes that modern airliners use, and which NASA uses on more modern kit.

The shuttle software isn't necessarily a good engineering solution. Spending the same time/effort/budget on a more critical part might have saved some of the shuttle failures.

If you think of it in aircraft terms, is it better to spend $ making the avionics software 99.9999% reliable rather than 99.999% or is it better to fit smoke hoods/more exits/better weather radar etc.

protomyth · on March 29, 2010

And probably some of the most audited code on the planet.