Software Does Not Fail
Paul Niquette
Copyright ©1996 Resource Books, Inc. All rights reserved.
Part 1
The Meaning of Work

No need to read the title again. I'll repeat it for you: "Software does not fail."

"Everything fails," people say. "Hey, especially software."

Whatever their intelligence, their knowledge, their experience, these people have one thing in common.


Software either works or it does not work.

Software that does not work has not failed. It did not work in the first place.

Neither Moth nor Dust

If software ever worked, it will always work. Always.

People think that software worked in the first place and then stopped working.

Hardware Fails

To work, software must run, which means commanding hardware to do what hardware is supposed to do. Software, then, depends upon hardware. And hardware...

Hardware Always Fails

It is possible for hardware to work and then stop working.

No question about it, when hardware works and then stops working, that's a failure.

Repair Applies to Failures

A hardware failure can be fixed by replacement of the hardware in whole or in part. Replacement by identical hardware, in fact. That's what 'repair' is.

If software does not comply with the specification, replacing it in whole or in part with identical software will not make it comply with the specification. Replacing software that does not work with redesigned software that does work is not 'repair.' {HyperNote 5}

Reliability Applies to Failures

Reliability is a term that has meaning only for hardware, not for software. Reliability is defined as the probability of not failing within a given period of time.

Software does not fail within any period of time. The statistical expression 'mean time between failures' (MTBF) has no meaning for software, inasmuch as the inverse, 'failure rate,' is zero for software.{HyperNote 6}

Concrete Versus Abstract

Hardware is 'hard,' which is to say concrete, not abstract. Hardware can be made, in some sense, harder: more reliable. Hardware can be...

Harder the better, presumably. But never hard enough. Sooner or later, hardware fails.

Software, being abstract, is -- well, 'soft.' No reason to make software hard, though.

Nothing Else Is Software

There is nothing softer than software. Other things may be as abstract as software but hardly softer.

Software is a procedure but not all procedures are software. In fact no other procedures are software.

People like to call books and films and videos "software."

The Specification

The specification for either hardware or software is abstract and therefore soft. The specification does not do what software does, however. So the specification cannot be said to work or not work.

The specification, which is derived from functional requirements, can be dichotomized many ways:

To go from one to another, the specification itself would need to be 'revised.' Or the functional requirements would need to be 'amended.'

One Dichotomy

Only one dichotomy applies to software: either it works or it does not work, which is determined by whether the software complies with the specification. Same for hardware.

A failure, however, takes hardware from working to not working; conversely, repair takes hardware from not working to working.

For the record, then, complying with a specification that is...

...does not mean failure of either hardware or software.{HyperNote 8}

Objects

What software does is access 'objects' and assign values to 'objects.'

Accessing an object in memory is called 'reading.' Assigning a value to an object in memory is called 'writing.' Software does something else, too.

Internal States

Software references its 'internal state' and changes its 'internal state.'

Hardware does what software does, but hardware does something else, too. Hardware fails.

Cases

To be said to work, software must comply with the specification in all 'cases.' Not some, all.

Now, depending on its 'internal state,' software generally does something different with a given combination of input values. A given case, then, is not sufficient to determine what software does.

Sequences

Each internal state is reached as the result of software being acted upon by a sequence of input values -- cases. Software changes its internal state whenever necessary to comply with the specification. Which means that a sequence of cases is necessary to determine what software does.

The statement that software must be designed to work in all sequences of cases is general enough to accommodate software's ability to reference and change its internal state.
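
As a minimal illustration in C (my own sketch, not anything from the text; the names are hypothetical), the same case can produce different outputs depending on the internal state left behind by the preceding sequence of cases:

    /* Hypothetical sketch: the same case (input value 1) yields different
       outputs depending on the internal state reached by the preceding
       sequence of cases. */
    #include <stdio.h>

    static int state = 0;                 /* the software's internal state */

    int step(int input)
    {
        if (input == 9)                   /* this case changes the state */
            state = 1 - state;
        return state;                     /* the output references the state */
    }

    int main(void)
    {
        printf("%d\n", step(1));          /* sequence {1}     : outputs 0 */
        step(9);                          /* changes the internal state   */
        printf("%d\n", step(1));          /* sequence {1,9,1} : outputs 1 */
        return 0;
    }

The case 'input equals one' cannot be judged in isolation; only the sequence of cases determines what the software does.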

Doing the Right Thing

Software necessarily does the same thing in each sequence of cases -- the right thing. If what the software does in a given sequence of cases complies with the specification, the software is said to work -- for that sequence of cases.

Not only that, but the specification must determine software's response to a class of inputs called 'interrupts,' which are distinguished by the fact that an 'interrupt' can directly change software's internal state -- but not without authorization by the software. The specification must also determine software's treatment of the 'non-maskable interrupt' and 'reset.'

Specification in Code

To do what it does, software must run on hardware, which means commanding the hardware on which it runs.

The specification is not code, and more than one code may comply with the specification. Each code can be designed differently and produce a different sequence of internal states.

In other words, a specification determines what software is supposed to do but not how the software is supposed to do it. A specification must only be explicit enough to determine what values software will assign to output objects in each sequence of cases -- the external consequences of each sequence.


A sophisticated person would describe 'design' as a process, not a procedure, the latter implying the power to determine. Code is not determined by a procedure.

Software Does Not Fail
Part 2
Quality Assurance

The expression 'quality assurance' was initially coined for hardware. Today the term also applies to software.

To be said to work, quality assurance must prove that software works. More to the point, quality assurance must not allow software that does not work to be said to work.


Quality assurance is a procedure and therefore soft -- as soft as software. It either works or it does not work.

That depends on the complexity of software. Complexity is indeed a matter of degree. Degrees, plural.

The quantity of cases and sequences of cases can be exceedingly large. But not infinite. Quality can in principle always be assured.

Testing: Myth and Reality

Testing is the best way to assure quality, some people say. They are not sophisticated.

If the software does not do what it is supposed to do in a given test, the software does not work. On the other hand, if the software does do what it is supposed to do in a given test, the software may or may not work.

A typical 'test' would be prepared as a sequence of values systematically assigned to the input objects, thereby acting upon the software. The sequence of values assigned by the software to the output objects would then be observed and compared with an output sequence predicted independently from the specification.
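
A sketch of that procedure in C (my own construction; the function and array names are assumptions, not the essay's):

    /* Sketch of quality-assurance-by-testing: apply a sequence of input
       values and compare the software's outputs with values predicted
       independently from the specification. */
    #include <stdio.h>

    /* arbitrary stand-in for the software being assured */
    static int software_under_test(int input)
    {
        return input + 1;
    }

    /* Returns 1 if the software did the right thing in every case of this
       sequence. Note that returning 1 does NOT prove the software works. */
    static int run_sequence(const int *inputs, const int *predicted, int n)
    {
        for (int i = 0; i < n; i++) {
            int observed = software_under_test(inputs[i]);
            if (observed != predicted[i]) {
                printf("case %d: predicted %d, observed %d\n",
                       i, predicted[i], observed);
                return 0;                 /* proven not to work */
            }
        }
        return 1;
    }

    int main(void)
    {
        int inputs[]    = { 1, 2, 3 };
        int predicted[] = { 2, 3, 4 };
        printf("sequence passed: %d\n", run_sequence(inputs, predicted, 3));
        return 0;
    }

Reporting that every selected sequence passed is the most such a procedure can do; it cannot, by itself, prove that the software works.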

Cleverness: The Myth

A clever selection of tests might make quality assurance by testing practical.

The clever selection of a test from all possible tests can be exceedingly difficult. More difficult, in fact, than designing the code. That testing is not the best way to assure quality can be demonstrated for software of any complexity.


Simple Example

Consider the following simple specification:

  1. Software is required to assign a value to one output object based on the value of one input object. The output object is a bit, and the input object is a byte.{HyperNote 13}
  2. Starting out in a 'reset' state, the output is set to a value of zero.
  3. Whenever the binary value of the input is greater than one hundred, the output is set to the value one, which is sustained until...
  4. The binary value of the input equals zero, whereupon the output is reset to a value of zero.
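
One code that would comply with this specification, sketched in C under my own choice of names (the essay presents no code), is a two-state machine driven by the input byte:

    /* A sketch of one code complying with the Simple Example specification.
       The names and structure are mine, not the author's. */
    #include <stdio.h>

    static int state_on = 0;              /* reset state: 'off', output zero */

    int process(unsigned char input)      /* returns the output bit */
    {
        if (input > 100)                  /* spec item 3: switch 'on'    */
            state_on = 1;
        else if (input == 0)              /* spec item 4: reset to 'off' */
            state_on = 0;
        return state_on;                  /* 'on' gives one, 'off' gives zero */
    }

    int main(void)
    {
        printf("%d", process(42));        /* still 'off': 0   */
        printf(" %d", process(150));      /* switches 'on': 1 */
        printf(" %d", process(42));       /* stays 'on': 1    */
        printf(" %d\n", process(0));      /* resets 'off': 0  */
        return 0;
    }

This is only one of many possible codes; as noted above, the specification determines what the software is supposed to do, not how.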

Many practical applications rely on such a specification. The input object may represent...

...the output object in each case being used to allow or to limit some action.


Trivial Sequence

There are merely two 'internal states' required to fulfill the specification in the Simple Example: Call them 'on' and 'off,' which, in conjunction with the input object, determine the value of the output object to be one and zero, respectively.

After 101 tests, then, the software would not have been proven to work. The software may, in fact, be utterly ignoring the input values.

After 102 tests, the software has not been proven to work. There are still 154 more tests to go. {HyperNote 14}

Cleverness: The Reality

For the Simple Example, tests do indeed relate to each other. There are four sets of cases:

  1. those which are specified to leave the internal state 'off,'
  2. those which are specified to switch the internal state from 'off' to 'on,'
  3. those which are specified to leave the internal state 'on,'
  4. those which are specified to switch the internal state from 'on' to 'off'

For quality-assurance-by-testing, sets 1 and 2 are the most interesting, since 256 cases must be tested. Still, it is tempting to suppose that a 'clever' selection of cases will make quality-assurance-by-testing practical. There seems to be no need to bother with input values much above or below the dividing line, 100. The idea would be to concentrate testing in the range of, say, 90 to 110.

The Rational Software Engineer

Assume there to be a rational software engineer who, given the specification for the Simple Example, would yawn and then design his or her software to perform a simple arithmetic comparison between the binary value of the input byte and a constant value of 100.

For example, he or she might have neglected to code the software to treat the input object as an unsigned integer. If so, then for all tests of input values higher than 127, the software would interpret the binary number as negative.{HyperNote 15}

Accordingly, the software does not work.
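
In C, the mistake described above might look like the following (a hypothetical reconstruction; the essay gives no code). Declaring the input as a signed type makes every value above 127 arrive as a negative number, which is never greater than 100:

    /* Hypothetical reconstruction of the rational engineer's mistake: the
       input byte is treated as a signed integer, so on a typical
       two's-complement machine the value 151 arrives as -105. */
    #include <stdio.h>

    static int state_on = 0;

    int process(signed char input)        /* MISTAKE: should be unsigned char */
    {
        if (input > 100)                  /* -105 > 100 is false */
            state_on = 1;
        else if (input == 0)
            state_on = 0;
        return state_on;
    }

    int main(void)
    {
        printf("%d\n", process((signed char)151));  /* spec requires 1; prints 0 */
        return 0;
    }

A 'clever' suite of tests confined to the range of 90 to 110 never presents a value above 127, so this code would have been said to work.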

Finding Out the Hard Way

For that particular mistake in code, then, a test having an input value larger than 127 would prove that the software does not work. On the other hand, quality assurance based only on a 'clever' selection of tests near the value 100 would allow software with a design mistake to be said to work.

Then one of the untested input values comes along and poof:

Hardly the preferred way to find out that the software -- what? -- did not work.

Quality Assurance Did Not Fail

Some people will say it was the quality assurance that failed.

Quality assurance does not fail. For the Simple Example, what was demonstrated is...

...by proving that the software did not work.{HyperNote 17}

Quality assurance did not work for a few tests and then stop working; that would have been evidence of a failure. Instead, quality assurance did not work, period.

Cleverness: The Impossibility

If performing all possible tests is not practical, then quality-assurance-by-testing will work if and only if the selected tests are the ones that do not allow software that does not work to be said to work.

A sophisticated person will say that it would be impossible to select tests that assure quality without knowledge of the code. Some people will say that 'improbable' is a better word, but software deserves more stringent treatment:

Knowing the specification is not enough, as we shall see.

The Reasonable Software Engineer

Consider the same Simple Example that was coded by the rational software engineer in the hands of a different person -- a reasonable software engineer. After yawning, he or she designs the software to use a table look-up procedure.

Rather than arithmetically comparing the binary value of each input byte to a constant 100, the software is coded to treat the byte as a 'pointer' which references an object in memory called a 'bit-map.'

There are reasons for the reasonable software engineer to adopt this more general design approach. A 'translator,' for example, would have called for a byte as the output, which is best coded as a table-lookup.
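
A sketch of the table look-up design in C (my own reconstruction under assumed names; the essay gives no code): the input byte indexes a 256-entry 'bit-map' recording, one bit per case, whether the specification calls for switching the output on.

    /* Sketch of the reasonable engineer's design: the input byte acts as a
       pointer into a 'bit-map' with one bit per possible input value. */
    #include <stdint.h>
    #include <stdio.h>

    static uint8_t bit_map[32];           /* 256 bits: set means 'switch on' */
    static int state_on = 0;              /* reset state: 'off'              */

    void build_bit_map(void)
    {
        for (int value = 101; value <= 255; value++)
            bit_map[value / 8] |= (uint8_t)(1u << (value % 8));
    }

    int process(uint8_t input)
    {
        if (bit_map[input / 8] & (1u << (input % 8)))
            state_on = 1;                 /* specified to switch 'on' */
        else if (input == 0)
            state_on = 0;                 /* specified to reset 'off' */
        return state_on;
    }

    int main(void)
    {
        build_bit_map();
        printf("%d\n", process(151));     /* prints 1 */
        return 0;
    }

A single wrong entry anywhere in such a table would be revealed only by the one test that happens to present that particular input value.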

Let us again postulate that quality assurance is to be achieved by testing -- without knowledge of the code, only the specification. As before, there are 256 possible tests. The problem would be to select fewer than all possible tests and still assure quality.

Quality-assurance-by-testing would not work unless the selected tests happen -- by pure chance -- to include the one or more in which the software does not work. After the software is put into use and the untested input value arises, the consequences would be attributed to a software failure by some people -- and to quality assurance failure by other people.

A sophisticated person would say...

The tests that might have proven that the software does not work in the first place were not performed, so the software was said to work.{HyperNote 18}


The Simple Example is not uncommon, and the coding errors postulated above are not far-fetched. The 'software age' is with us. Whatever shall we do? {HyperNote 19}


Software Does Not Fail
Part 3
The Code Walk-Through

A sophisticated person will say that testing does not assure quality unless all possible tests are performed.

In software, complexity makes testing all possible cases not practical, and making a 'clever' selection of tests that will assure quality without knowledge of the code is not possible. The next sentence may need to be read more than once.


First Principles Apply

Setting aside quality-assurance-by-testing, a sophisticated review of First Principles would include...

  1. Assuring that a given thing will work requires knowledge of the specification of that thing plus...
  2. Assuring that a given thing will work requires knowledge of how the thing works.
  3. Knowledge of anything requires study.
  4. Knowledge of how things work requires a special kind of study, called the 'Design Review.'
  5. There needs to be a procedure for conducting the Design Review.
  6. The procedure used for conducting the Design Review of software is called the 'Code Walk-Through.'
  7. As a procedure, the Code Walk-Through is abstract and therefore soft.
  8. Being abstract and soft means that the Code Walk-Through either works or it does not work.
  9. To be said to work, the Code Walk-Through must find all mistakes in the code. Not some, all.
  10. The Code Walk-Through runs on a special kind of hardware: humans.
  11. The Code Walk-Through does not have the power to command the hardware on which it runs.
  12. The Code Walk-Through, therefore, is not software; nevertheless,...
  13. The Code Walk-Through does not fail. Instead,...
  14. The Code Walk-Through which does not find all mistakes did not work.

Undersight

The human conducting the code walk-through can make an 'undersight,' of course, much as a software engineer designing software can make a mistake. {HyperNote 21}

If there is no undersight but not all the mistakes are found, the Code Walk-Through did not work.

Unlike quality-assurance-by-testing, the Code Walk-Through cannot be automated much.

Unlike quality-assurance-by-testing, the Code Walk-Through is mind labor. Like designing software.

Soft Things Made Soft

Quality assurance by the Code Walk-Through can be exceedingly difficult. But no more difficult than designing the software itself. And far less difficult than testing all possible sequences of cases.

Consider the Simple Example again. Only a few lines of code were needed to meet the specification.

Performing the Code Walk-Through of a few lines would surely be less difficult than quality-assurance-by-testing, which means...

Then too, many of these steps are vulnerable to human error -- undersights.

Managing Complexity

Some people will say that complexity makes the Code Walk-Through not practical.

If the code is too complex to walk-through, it is too complex to design -- and too complex to test, since without knowledge of the code, quality assurance requires testing all sequences of cases, which is not practical.

Managing Practicality

One way to do that is to partition the code into 'modules.' A specification must then be prepared for what each module is supposed to do and for what all the modules together are supposed to do.

On the other hand, if the code is modular, the Code Walk-Through can indeed more readily assure quality of software.
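
As an illustration only (the interface and names below are hypothetical, not the essay's), a module boundary in C might be expressed as a small header; the module's own specification states what the functions declared here are supposed to do for every sequence of cases, and nothing about how:

    /* simple_example.h -- hypothetical interface for one module. A Code
       Walk-Through can study the code behind this interface separately,
       against the module's own specification. */
    #ifndef SIMPLE_EXAMPLE_H
    #define SIMPLE_EXAMPLE_H

    /* Places the module in its 'reset' state: output zero. */
    void simple_example_reset(void);

    /* Applies one case (one input byte) and returns the output bit. */
    int simple_example_process(unsigned char input);

    #endif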

The Sophisticated Code Walk-Through

The sophisticated Code Walk-Through must include procedures...


A human must perform all of those procedures -- and more -- with no undersights. The consequent software may still not work.


HyperNotes


{1} References That Fail

See for example Pressman, Roger S., Software Engineering (1987, McGraw-Hill) or von Mayrhauser, Anneliese, Software Engineering (1990, Harcourt Brace Jovanovich). As for hearsay references, suffice it to say that forcible expressions of these sentiments have reached my ears from a hundred sources since the mid-fifties. {Return}


{2} Cannon-Balls

My license for bluntness here is appropriated from Ralph Waldo Emerson (1803-1882).


{3} Unintended Features

Permit me to quote William Makepeace Thackeray (1811-1863):

Words well suited to lamenting the unintended consequences of features gratuitously designed into software. {Return}


{4} Give me a break.

Some people will say that not working in the first place is just another expression for 'failure.'

An immense distinction is being presented here. We need to restore the verb 'fail' to its full meaning. One might substitute 'break' -- but only for hardware. Nobody says software breaks. {Return}


{5} Words to Work By

Terms selected here are somewhat arbitrary but intended to serve clarity by virtue of consistency, thus one might...

Software 'maintenance' in every meaningful respect...

...does not even faintly resemble hardware maintenance. {Return}


{6} References That Crash

Unhelpful indeed are statements such as are found in Siewiorek, Daniel P. and Swarz, Robert S., The Theory and Practice of Reliable System Design (1982, Digital Press, Bedford, MA):

The passage goes on to attribute software 'failures' to "the introduction of new features" or "previously undetected faults." {Return}


{7} Softening Hardware

Ironically, hardware gets effectively harder when more of its functions are given over to software, as in embedded controllers or, in the extreme, the 'reduced instruction set computer' (RISC).


{8} Words in Use

The distinction being drawn is between 'software' and 'hardware'; the latter can be taken to mean a 'computer,' 'processor,' 'controller,' whatever.

Professional 'programmers' were replaced by 'software engineers' in the mid-seventies; the other kind became 'hackers.' As a term of disparagement, 'hacker' has two meanings: 'cybumblers' and 'cyburglers.' {Return}


{9} Generosity to Generality

The sophisticated term 'object,' which is general in the extreme, gives renewed evidence of a linguistic struggle dating back to the dawn of the 'software age' and the realization that calling 'software engineering' 'computer science' is tantamount to calling 'mechanical engineering' 'automobile science.' {Return}


{10} Who cares?

Not all cases can arise in real life; thus we have the term 'don't-care' for those cases.


{11} Reference to Knowledge

In Cultural Literacy (Random House, 1988), E. D. Hirsch frames a compelling picture of the knowledge-bound character of all cognitive skills. {Return}


{12} Code of Honor

The term 'code' is taken here to mean 'source code,' which is what software engineers produce.

Unless the development software does not work. {Return}


{13} Taking a Byte

The word 'byte' may be the purest neologism of the 'software age,' giving evidence of the forlorn and tardy impact of the 'computer science' on the language we speak. I dare to use the term here without explanation. {Return}


{14} Numbers to Count On

The 256 test cases in the Simple Example are not too many to do, even by hand. A minor tweak of the example, though, to 16 bits on the input will change that.

The issue is, of course, Sir Karl Popper's principle of falsifiability. {Return}


{15} Song of a Bit

The leftmost digit of a binary number of a given length can be treated either as a binary bit or as the sign of a binary number one bit shorter in length.


{16} Reference to Nonsense

A compendium of the most sensational consequences of -- and misguided attributions to -- software failures appears in Leonard Lee's The Day the Phones Stopped (1991, Donald I. Fine, Inc.), which inspired the present polemic. The book is listed by the Library of Congress under the category 'Computer software -- Reliability.' {Return}


{17} Bugs and Glitches

Most people like to say that if software works in some cases but does not work in other cases, the software has a 'bug' in it. Which is more meaningful than saying that the software failed. But not by a lot. The software did not get a bug in it. The software engineer made a mistake.

A few people like to say that if software works in some cases but does not work in other cases, the software has a 'glitch' in it. Which is less meaningful than saying that the software failed. By a lot.


{18} Crap Dusting

One handy software engineering tool is the 'debugger,' which is a misnomer tantamount to calling a 'magnifying glass' an 'insecticide.' {Return}


{19} Mole Hill Mountain

The Simple Example appeared as a portion of a real-life specification, which called for the software to use the binary value of the input byte to select an object from memory for reading.


{20} Verified Validity

However inconclusive, testing of software will continue to be practiced until the sun flickers from the sky.


{21} Oversight and Undersight

The word 'oversight,' like 'cleave,' can cut both ways ("split apart" or "cling together").

A sophisticated writer might use 'undersight' to denote the latter. {Return}

