Steps to Diagnose a Trap
The intent of the following material is to illustrate a proven method for
finding the cause of a trap in an application program. By first learning
how to solve the simplest problems, one will have a much better basis for
approaching more difficult problems. Historically, problem-solving skills
have been largely self-taught, and much can be learned by watching others
solve problems. Many problems can be solved quickly by making significant
short-cuts and assumptions and then verifying them. When a novice watches
an experienced diagnostician work this way, the activities are difficult
to follow. They may suggest that each problem has its own special method
of solution, which in turn raises the question of when to use which method.
The following process will lead you to the cause of a trap.
Remember to take notes as you proceed. Notes help if you are interrupted
and want to continue later, or if you need to explain to someone else what
you found and which facts led you to your analysis of the situation.
You can keep notes by hand, but a log file is easier. Just type ?' followed
by whatever you wish to log. The tools will evaluate the string, supply the
trailing quote, and display the string, thus adding your thoughts to the log.
1. Locate the failing instruction.
If you cannot do this, you have no place to start. Most operating systems
provide at least an excellent clue to the location of the failing
instruction, if not its exact address.
2. Determine why the failing instruction will not execute.
A knowledge of hardware operation, or a reference manual kept handy, is
essential for this step. At the very worst, each of the possible exceptions
described in the manual can be eliminated one by one until the cause is
found.
Until you know why the instruction will not execute, you do not know what
went wrong at the machine level. Conversely, as soon as you do know, you
are ready to begin diagnosing how things got into that state.
Note that this step does not require knowledge of C, FORTRAN, COBOL,
Smalltalk, etc. It requires only hardware knowledge.
3. Analyse how the conditions for failure occurred.
The failure may be due to an incorrect address calculation or to an invalid
parameter. If it was an incorrect calculation, you now need only discover
which program did it, and where in that program the error occurred. Skip
steps 4 and 5.
4. If an invalid parameter was received, you must update your notion of the
cause of the problem. Treat the call as the location of the failure, and
the specific parameter value as the reason the called routine could not
execute.
5. Analyse how the parameter was created and where it came from. Unwind one
stack frame, and return to step 3.
6. You now know what caused the problem. It is time to identify the failing
program, locate the failing line, find the values of the program's
variables, and, in general, collect all the data the programmer would have
had if the failure had occurred at his desk. This step is usually a
mechanical one.
Once this is done, go find the programmer and turn over all you know about
the problem. Be prepared to continue helping, or to show the programmer how
to get additional information.
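The loop formed by steps 3 to 5 (analyse, and if a bad parameter was received, unwind one frame and analyse again) can be sketched in Python. This is a minimal illustration under stated assumptions, not a feature of any debugger: it assumes the hardware-level analysis of step 2 has already been done for every frame and recorded in a hypothetical `Frame` record, with `bad_parameter` set when the bad value arrived from the caller. The routine names `copy_record`, `build_reply`, and `main` are invented for the example.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Frame:
    """One stack frame of a hypothetical, already-analysed trap.

    The fields are illustrative assumptions, not any debugger's format.
    """
    routine: str
    # Why the failing instruction (or, after unwinding, the call)
    # could not execute, as worked out in step 2.
    cause: str
    # Name of the parameter if the bad value was received from the
    # caller; None if this routine produced the bad value itself.
    bad_parameter: Optional[str] = None


def diagnose(stack: List[Frame]) -> Tuple[Frame, List[str]]:
    """Return the frame where the bad value originated, plus notes.

    stack[0] holds the failing instruction; stack[i + 1] is the
    caller of stack[i].
    """
    notes: List[str] = []
    for frame in stack:
        # Step 3: note why this frame failed.
        notes.append(f"{frame.routine}: {frame.cause}")
        if frame.bad_parameter is None:
            # The bad value was created here (for example by a wrong
            # address calculation), so this is the failing program.
            return frame, notes
        # Steps 4 and 5: the call in the caller becomes the new point
        # of failure, so unwind one frame and repeat step 3.
        notes.append(f"  {frame.bad_parameter!r} came from the caller")
    raise LookupError("no frame left to unwind; origin not found")


# Hypothetical example: copy_record faults on a null destination
# pointer that build_reply passed to it.
stack = [
    Frame("copy_record", "general protection fault storing through dest",
          bad_parameter="dest"),
    Frame("build_reply", "passed a null dest pointer to copy_record"),
    Frame("main", "called build_reply"),
]
culprit, trail = diagnose(stack)
print(culprit.routine)  # build_reply
```

Returning the trail of notes together with the culprit mirrors the advice to take notes as you proceed: the trail records which facts led from the failing instruction back to the program that created the bad value.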