Sunday, April 1, 2012

Human Mind - The Users Manual

After having used our minds for many years, the only thing we know for sure about them is that we don't know enough! It is true that after many years of research, scientists have an inkling of what they do and don't know. It is also true that they have some understanding of what they think they know but really don't. Interestingly, if we ask ourselves the right questions, we can occasionally come up with things we didn't think we knew, but actually did. Largely, though, the functioning of the human mind remains secretive, seemingly out of reach and full of surprises.

But you might be thinking: many researchers have told us so already, so what is new? Quite a lot, as I hope to show you in this post. Freud is closely linked with the idea of an unconscious mind, but he did not invent the term, nor is his rather narrow view of the unconscious widely accepted by modern psychologists. They do not reject the idea of an unconscious mind, but they do dispute the strict Freudian interpretation of its role in our mental lives.

The purpose of this post is to make you, the everyday mind user and the erring human, more familiar with the mysteries of your own mind and in the process, maybe tell you something that you did not already know. This post will also help introduce the larger subject of why Humans Err and commit violations.

The Tip of the Tongue.   The "Tip of the Tongue (TOT)" is a fairly commonplace experience, and there is nothing like it to expose the subtleties of knowing and not knowing the things that go on in your own mind. A TOT experience begins with an attempt to retrieve from your memory an item that you are sure you know, but the search fails to yield an immediate and "felt to be correct" response. Instead, it produces something that you know to be "close" or "similar", but not exactly the item you are sure you know to be correct.

When you persist with the search, the same incorrect response keeps coming to mind in an irritating and obstructive manner! What makes this experience even more frustrating is that this thought blocker is recognized as being "very close" to what you want to recall, and yet not exactly the correct one. We recognize that it may have very similar properties, like sound, meaning or spelling, and yet we are certain that it is wrong. So, how do we know all this when we do not know the correct item? Well, some part of our mind knows, but it's not the conscious part. After many attempts, we may suddenly "hit" on the correct response, or some external stimulus may point us in the right direction, leaving us feeling stupid or inadequate for not being able to recall something "so simple and basic".

The Conscious and Automatic modes of control.   The TOT experience shows us that a mind user is in direct conscious contact with only a part of the whole. The conscious part is obviously located somewhere between the ears and behind the eyes, yet, at any given time, within this very limited space, the larger part of our current waking thoughts and feelings is experienced, our sense data interpreted, and our actions planned, initiated and monitored. It is also this tiny space that is at any instant most closely identified with our innermost selves - our personal beliefs, attitudes, values, memories, likes, dislikes, loves, hates and the other passing clutter and 'baggage' that goes to make up our mental lives. But we are only aware of a very limited amount at any one time. The ideas, feelings, images and sensations seem to flow like a stream past a blinkered observer standing on its bank. We can't see very far upstream or downstream, but we can take in between one and two seconds' worth of what goes past. This is what comprises our conscious workspace, our "here and now".

Beyond this present experience lies a vast and only partially accessible knowledge base. Some of the information contained therein is in the form of previous life experiences and events (though this gets very patchy for the time before we were five years old, maybe even later). Other parts are used to make sense of the world. And yet other knowledge structures control our routine perceptions, thoughts and actions.

We do have a rough idea of the contents of this long-term knowledge base - not all of them, of course, but enough to be aware of the general headings. What we don't know is how stored items are called to mind. Such retrievals can be so accurate and immediate as to convince us - incorrectly, as I hope the TOT example has shown - that we have direct voluntary access to all parts of the store. The reality is that while we are conscious of the products of such retrievals - the words, feelings, images, thoughts and actions - we have little or no awareness of the processes that seek them out and call them to mind. Understanding this is very important, since most of our mental life involves a continuous interaction between the conscious workspace and the long-term memory store. Sometimes we deliberately call items to mind; at other times, they simply pop up unbidden; and at yet other times, they fail to be recalled despite repeated and conscious attempts.

The "conscious workspace" and the "long-term storage" are the two co-existing, and sometimes competing, controllers of our mental lives. They work in harmony for much of the time. But they can also compete for command of the body's output mechanisms, both in the observable physical world (through unintended words and actions) and in the conscious workspace (into which items may be delivered without conscious intent). This is hardly surprising, given their radically differing properties and the power of familiar environments to evoke habitual responses.

Comparing properties of the "Conscious Workspace" and "Long-Term Storage".  The Conscious Workspace is:

  • directly accessible to consciousness, and closely linked to attention and working memory;
  • selective and resource-limited;
  • slow, laborious and serial (i.e. one thing after another);
  • intermittently analytical (it sets intentions and plans and can monitor them at various choice points, but often fails to);
  • computationally powerful (it accepts inputs from nearly all the senses, with vision dominating); and
  • able to access long-term memory by generating "calling conditions" or retrieval cues.

The Long-Term Memory, on the other hand:

  • makes its products (actions, thoughts, images, etc.) available to consciousness, while the underlying processes remain largely outside its reach;
  • has apparently unlimited capacity, both in the amount of information stored and in the length of time for which it is retained;
  • is fast, effortless, parallel (i.e. it can handle many things at the same time) and automatic in operation;
  • is governed by stored, specialized knowledge structures, called schemas, that respond only to related sensory inputs and then do their own thing; and
  • relies on two basic retrieval processes: similarity-matching (matching like with like) and frequency-gambling (resolving possible conflicts in favor of the most frequent, recent or emotionally charged items).

Three Levels of Performance.   The extent to which our current actions are governed either directly by conscious attention or, more remotely, by pre-programmed habit patterns gives rise to three levels of performance: knowledge-based, rule-based and skill-based.

KNOWLEDGE BASED PERFORMANCE.  All human performance, with the exception of what comes "hard wired" at birth, begins at the knowledge-based level, in which our actions are governed online by the slow, limited and laborious application of conscious attention. This level relies very heavily on conscious images or words to guide our actions, either in the form of inner speech or through the instructions of others. While this type of control is flexible and computationally powerful, it is also highly effortful, enormously tiring, extremely restricted in scope and very error prone - and we don't like it very much.

Although we all know what attention feels like, its precise function in mental life is not at all obvious. An optimum amount of attention is needed for successful performance in all spheres of activity, but both too little and too much can be highly disruptive. The consequences of inattention are clear enough. To understand over-attention, try using the keyboard while thinking about what the index finger of your right hand is doing. The greater your typing skill, the more likely it is that this will cause problems.

SKILL BASED PERFORMANCE. At the other end of the spectrum, there is skill based performance. By regular practice, self-discipline, and the reshaping of our perceptions, we can gradually acquire the rudiments of a skill - that is, the ability to mix conscious goal-setting and guidance with the largely automatic control of our individual actions. This is what habits are made of. Instead of thinking of each individual word and action, we are able to package them into a series of automated actions or pre-packaged sequences. Habits diminish the conscious attention with which our acts are performed. But there are no "free lunches". Automation comes with the penalty of occasional absent-mindedness, when our actions do not go as planned.

RULE BASED PERFORMANCE.  Intermediate between the two above comes rule-based performance. This arises when we need to break off from a sequence of largely habitual (skill-based) activity to deal with some kind of problem, or when our behavior needs to be modified to accommodate some change of circumstances. The commonest kinds of problems are those for which we have a pre-packaged solution, something acquired through training, experience or some written procedure. These solutions are typically expressed as: if [problem X] then [apply solution Y], or if [indications A and B are present] then [it is a type C problem]. These rules are what our "experience" is made of.

However, even here there are no "free lunches". Rule-based performance is associated with its own variety of errors: we can misapply a normally good rule to the wrong situation because we did not notice the contraindications, we can apply a bad rule, or we can fail to apply a normally good rule - a mistaken violation!
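To make the rule-based idea concrete, here is a minimal Python sketch. It is purely illustrative: the rule format, the example problems, the contraindications and the function name apply_rules are my own toy constructs, not part of any operations manual or formal framework (the de-icing example loosely echoes the constraint described in the Dryden case study further down this page). It shows pre-packaged if-then rules and how a normally good rule gets misapplied when a contraindication goes unnoticed.

```python
# A toy illustration of rule-based performance: stored "if [problem] then
# [solution]" rules, each with contraindications that should block its use.
# All problems, conditions and solutions here are invented examples.

RULES = [
    # (problem, contraindications, solution)
    ("ice on wings", {"engines running"}, "apply de-icing fluid"),
    ("engine shut down", {"no ground-start unit", "APU unserviceable"}, "restart engine"),
]

def apply_rules(problem, noticed_conditions):
    """Return the first pre-packaged solution whose problem matches and whose
    contraindications are all absent from the conditions we actually noticed."""
    for rule_problem, contraindications, solution in RULES:
        if rule_problem == problem and not (contraindications & noticed_conditions):
            return solution
    return None  # no stored rule applies: fall back to knowledge-based reasoning

# If the contraindication is noticed, the rule is correctly withheld...
print(apply_rules("ice on wings", {"engines running"}))   # prints: None
# ...but if it goes unnoticed, a normally good rule is misapplied.
print(apply_rules("ice on wings", set()))                  # prints: apply de-icing fluid
```

Note that the error lies not in the rule itself but in the match between the rule and the situation as perceived; that is the sense in which a "good rule" can still be misapplied.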

When we run out of pre-programmed solutions, as in some novel or unexpected situation, we are forced to work out a solution "on the hoof" using the slow, effortful, but computationally powerful conscious control mode. This is a highly error-prone level of performance and is subject to a range of systematic biases. These knowledge-based mistakes will be discussed later.

Interacting with the "Long-term knowledge base".  

The selective properties of long-term memory and the process by which stored items are recalled to mind lie at the heart of the mind user's misunderstandings of his or her mental function. While there appear to be many different mechanisms involved in this, similarity matching and frequency-gambling are the two that are automatic, unconscious and continuously operative. 

When the initial search cues are detailed or highly specific, matching these calling conditions to the characteristics of stored memory schemas on a like-to-like basis is the primary retrieval process. However, when the search cues match several stored knowledge structures, the mind gambles that the most frequently used knowledge item in that particular context will be the one that is required.

For example, if we are asked "what is it that has four legs, barks, cocks its leg at lamp posts and is regarded as man's best friend?", most of us will almost instantly retrieve the word "Dog" from memory. This retrieval is so quick that it feels as though we reached out and retrieved it in a conscious and deliberate fashion. However, if we are asked to list examples of "four-legged animals", it is unlikely that "Dog" would be the first item listed by everyone who responded with "Dog" in the first example. The items called to mind will most likely include dog, cat, horse and cow, but the order and rapidity will depend on each responder's familiarity with the animal. Familiarity being a function of frequency of encounter, frequency-gambling is the primary search process in this divergent memory search.
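For readers who like to see the idea mechanized, here is a minimal Python sketch of the two retrieval heuristics, under the assumption of a toy schema store: the items, attributes and frequency numbers are invented for illustration, and this is a sketch of the behavior described above rather than a claim about how memory is actually implemented.

```python
# A toy model of the two automatic retrieval heuristics: similarity-matching
# (match the calling conditions to stored schemas, like with like) and
# frequency-gambling (resolve near-ties in favor of the most frequently
# encountered item). Attributes and frequencies are invented for illustration.

SCHEMAS = {
    # item: (attributes, rough frequency of encounter)
    "dog":   ({"four legs", "barks", "cocks leg at lamp posts", "man's best friend"}, 90),
    "cat":   ({"four legs", "meows", "purrs"}, 80),
    "horse": ({"four legs", "neighs", "can be ridden"}, 30),
    "cow":   ({"four legs", "moos", "gives milk"}, 25),
}

def retrieve(calling_conditions):
    """Rank stored items by similarity to the cues, breaking ties by frequency."""
    def score(item):
        attributes, frequency = SCHEMAS[item]
        return (len(calling_conditions & attributes), frequency)
    return sorted(SCHEMAS, key=score, reverse=True)

# Specific cues: similarity-matching dominates and "dog" wins outright.
print(retrieve({"four legs", "barks", "man's best friend"})[0])
# Vague cues ("four-legged animals"): every schema matches equally well,
# so the ordering is, in effect, a frequency gamble.
print(retrieve({"four legs"}))
```

With specific cues the similarity score separates the candidates; with only "four legs" as a cue, the ranking collapses onto the frequency numbers, mirroring the divergent search described above.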

Memory searches are strongly influenced by the feeling of knowing. While we will continue to search for items we "feel we know", we quickly abandon the search for an item we think we do not know, even if we actually did know it in some other form. This is the genesis of the "tip of the tongue" state mentioned earlier in this post. So, from a mind-user's point of view, these feelings about the contents of memory are of considerable value. They are not always right, of course, but they are right often enough for us to rely on them and treat them as a handy guide to whether or not we should invest mental effort in a memory search. There are also things we don't realize we know, but actually do, as well as things we think we know, but actually don't!

The human mind is exceptionally good at simplifying complex information-handling tasks. It does this by relying as far as possible on the automatic mode of control and by using intuitive "rules of thumb", or heuristics. These are unconscious strategies that work well most of the time, but they can be over-used and produce predictable forms of error. The two most common heuristics are the ones mentioned above: matching like with like (similarity-matching) and resolving any competition for limited conscious access in favor of the most frequently encountered candidate (frequency-gambling).

Under conditions of stress and danger, we are inclined to fall back on well-tried solutions or past plans rather than ones that might be more appropriate for the current circumstances. The moral here is: Be wary of the commonplace and familiar response for any situation. It may indeed be appropriate, but it needs to be considered carefully as it may well be an automatic heuristic response that could equally well be inappropriate for the situation! This is the stuff "strong-but-wrong" error responses are made of!

So, where is all this leading us? We will take that up in the next post, where we discuss the nature and varieties of Human Error.

Until then,

The Erring Human.

Thursday, March 1, 2012

Organisational Factors in Human Error Accidents - A case study

There is no better example of how Organizational deficiencies can precipitate Human Error than the crash of Air Ontario Flight 1363 at Dryden on 10 March 1989.
Ice and snow on the wings of a Fokker F-28 Fellowship commuter plane caused it to crash minutes after take-off near Dryden, Ontario, Canada.
Flight 1363 departed Thunder Bay, Ontario at 11:55 a.m. on March 10th, 1989 and landed in Dryden an hour later for refueling and passenger discharge/boarding before heading on to its final destination of Winnipeg, Manitoba.

As a rule, Dryden was not a refueling stop, but the small plane was restricted in the amount of fuel it could take on because it was carrying a full passenger load: with a full fuel tank, the aircraft would have exceeded the maximum weight allowance. Dryden airport was not a full-service facility, and this caused a problem for the pilot, Captain George C. Morwood.

Stopover at Dryden Airport

 

There were no ground-start facilities at the Dryden airport; therefore, Captain Morwood could not restart the main engines if they were shut down. The Auxiliary Power Unit (APU), located at the rear of the plane, could have been used to restart the engines, but the APU was not working on this particular aircraft.

As a result, Captain Morwood was forced to keep the main engines running while refueling the plane. A thin layer of ice and snow had accumulated on the wings of the plane but de-icing fluid could not be used when the main engines were running due to the chance of toxic fumes leaking into the cabin.

Approximately 40 minutes later, Air Ontario Flight 1363 departed the Dryden airport with 65 passengers and 4 crew members on board. The plane failed to gain altitude, flew through trees, and crashed and caught fire less than one mile from the runway, resulting in the deaths of 24 people and severe injuries to most of the 45 survivors.

Cause of the Crash of Air Ontario Flight 1363.

A judicial inquiry was held under the supervision of the Honorable Virgil P. Moshansky. The cause of the crash was attributed to the climate created by the recent airline deregulation, with its less stringent safety procedures and equipment maintenance, in addition to insufficient pilot training.

The aircraft was operating with an excessive number of unrectified defects; it should not have been scheduled to refuel at an airport which did not have the proper equipment; and neither training nor manuals had sufficiently warned the pilot of the dangers of ice on the wings.

Learning from the Tragedy.

There is no denying that the flight crew of Air Ontario 1363 failed in its role as the last line of defense against system deficiencies. However, not all the unsafe acts surrounding the crash of Flight 1363 were committed by the flight crew. Other operational personnel, including ground handlers, dispatchers and even the cabin crew, contributed by their actions or inactions to deny Capt. George Morwood and First Officer Keith Mills feedback that could have contained the consequences of their own unsafe acts. Arguably, many of these unsafe acts were none other than behaviors fostered by the system, the behaviors that the different personnel perceived the system expected of them. Let us analyze the causal factors one by one.

No ground de-icing by the flight crew.  The failure to de-ice was clearly the most obvious unsafe act committed by the crew of Flight 1363. That Capt. Morwood was of a conservative nature and conscious about de-icing is a matter of record. Earlier on the very day of the accident, for example, he had de-iced C-FONF before departing Winnipeg because of a layer of frost over its wings. He had walked to the terminal in his shirtsleeves; it is impossible that he was not aware of the weather conditions. His decision not to de-ice at Dryden demands more than the simplistic observation that "with his experience, he should have known better." Capt. Morwood did not choose to make a bad decision. His error on 10 March 1989 must be understood within the context and the constraints in which it was made.

No walk-around by the flight crew.  Neither flight crew member performed an exterior walk-around inspection. It remains, however, a matter of argument whether such a walk-around would have accomplished anything, given the inaccurate and incomplete knowledge regarding wing contamination that existed among Air Ontario's crews at the time of the accident.

Dispatch with unserviceable APU by SOC.  Had he followed the operational restrictions contained in a company memorandum issued by the director of maintenance, the Air Ontario Systems Operations Control (SOC) dispatcher would have advised the pilots of Flight 1363 to overfly Dryden on the day in question, because of the potential necessity for de-icing with an unserviceable auxiliary power unit (APU) at a station without ground-start facilities.

Inaccurate flight release from SOC.  The flight release provided to Flight 1363 contained numerous errors, including an erroneous maximum take-off weight from Winnipeg, incorrect fuel figures for the revised alternate (Sault Ste. Marie) and an incorrect, greater-than-allowable payload. Similar errors were found in the flight release for the Thunder Bay to Dryden leg. Inaccuracies in flight releases occurred often, and pilots would telephone SOC to notify staff of the discrepancies. Because Capt. Morwood did not communicate any problem to SOC, the Dryden Report concludes that throughout 10 March he relied on erroneous information.

Revised forecast not disseminated by SOC.   An amended Dryden terminal weather forecast, as well as the Dryden terminal weather forecast issued at 16:30 GMT (11:30 a.m. EST), called for freezing rain at Dryden during the time span of the operation of Flight 1363. Both were available to Air Ontario SOC while Flight 1363 was still on the ground at Thunder Bay. This information, which could have induced Capt. Morwood to overfly Dryden, was never transmitted to the pilots of Flight 1363.

Failure to follow up by the ground handler.  There was no follow-up by the ground handler to Capt. Morwood's inquiry about the availability of de-icing, even though evidence suggests that the ground handler knew that the wings were covered in snow.

The cabin crew's failure to communicate.  Both flight attendants were aware of the snow covering the wings, but they never attempted to bring this fact to the attention of the pilots.

As later discussed, serious flaws in organizational processes underlie this unsafe act, including an industry culture which did not (and to a large extent still does not) encourage cabin crew to discuss operational matters with flight crews. In all these unsafe acts it is possible to identify numerous contributory situational and task factors such as poor communications, time pressure, inadequate tools and equipment, poor procedures and instructions, and inadequate training. Personal factors such as preoccupation, distraction, false perceptions, incomplete or inaccurate knowledge, and misperception of hazards are also readily identifiable. However, flawed organizational processes and latent organizational failures are the source of most unsafe acts committed by operational personnel.

Error-producing conditions

Numerous error-producing conditions led the pilots of Flight 1363 to decide to take off without de-icing the wings, and led other operational personnel to commit their unsafe acts. These conditions are the by-product of latent organizational failures. The Commission demonstrated that latent organizational failures not only generate error-producing conditions but also have the potential to create a working environment in which violations are inevitable if operational personnel are to accomplish their assigned tasks.

Ambiguous operating procedures.  These ambiguities include not only flight deck procedures but also maintenance and dispatch procedures. They include incomplete information regarding take-off with contamination on the wings and cold-weather operations in general, the lack of a corporate policy regarding hot refueling and de-icing, and the informal "blessing" by management of unapproved procedures carried over from the propeller-driven fleet, including a disregard, widespread among Convair 580 pilots, for the effects of wing contamination. These ambiguities are strictly relevant to the events of 10 March. In the larger picture, however, the Commission's overall appraisal of the F28 operation reflected operational procedures which are not recommended in jet operations.

Lack of standardized operations manuals.  Some Air Ontario F28 pilots used the Piedmont F28 Operations Manual while others used the USAir F28 Pilot's Handbook, since Air Ontario did not have its own F28 operations manual. Although both manuals are comprehensive and both obviously deal with the same type of aircraft, there were sufficient differences in the operating procedures of the two carriers to create potential problems on the flight deck. Air Ontario F28 pilots were often left to learn and discover for themselves the best operational flight procedures for the F28 ... an additional and unnecessary burden on the pilots.

Training deficiencies.  Aircraft wing contamination, the cold-soaking phenomenon and runway contamination were subjects in which the Commission verified diverging depths of awareness and understanding among Air Ontario pilots. Lack of CRM training was another shortcoming identified, although CRM or equivalent training cannot alleviate operational problems associated with lack of management stability and consistent direction. Deficiencies in cabin attendant training, ground handling training and aircraft refueling training were also discussed. 

Pairing of inexperienced crew members.  Although both pilots of Flight 1363 had considerable experience, they were "newcomers" to the F28. Capt. Morwood had only 62 hours in the type, and had received his line check on 25 January 1989, after 27.5 hours of line indoctrination. First Officer Mills had 66 hours, having accumulated 29.5 hours of line indoctrination before receiving his line check on 17 February 1989. Although both were legally certificated to operate the F28, evidence from both accident investigations and research has alerted us to the dangers of pairing crew members who are "new" to the type.

Crew frustration.   It had not been a good day for the crew of Flight 1363. The unserviceable APU and other deferred maintenance items, the confusion over de-fuelling versus deplaning passengers at Thunder Bay, an inexperienced SOC dispatcher, the absence of ground support facilities, concern over passenger connections and the ground hold for the Cessna 150 are some of the local conditions which fostered crew frustration. The Dryden Report leaves no room for doubt that Capt. Morwood was exhibiting distinct symptoms of stress when he landed in Dryden on the return trip. Stress degrades the ability of humans to process information.

Corporate merger and corporate cultures.  Air Ontario is the product of a merger between Austin Airways Ltd. - a northern or "bush" operation - and Air Ontario Ltd. - a scheduled service operation in the region of the Great Lakes. Austin Airways was the "winning" party, or buyer; Air Ontario Ltd. the "loser", or acquired company. The two companies were different in almost every respect: their fleets, their operating environments, their employee groups and their management styles. The harsher demands of flying in the Canadian north are qualitatively different from those of flying in the south, demands which were reflected in the experiences of each pilot group. Furthermore, in the non-unionized northern environment, employee responsibilities were rather unstructured, while in the southern, unionized environment, employee tasks were clearly delineated. It was not a happy marriage. The two very different corporate cultures were incompatible; yet their effects were enduring and difficult to change. Among other conflicts, the negotiations to merge the two pilot groups under the representation of the Canadian Airline Pilots Association (CALPA) ended in a prolonged labor strike between March and May 1988. As with any corporate rationalization, resources were greatly taxed. The efficiency with which the various organizational processes were managed is worth examining; of immediate relevance to the events of 10 March is the fact that while Capt. Morwood came from Air Ontario Ltd., First Officer Mills came from Austin Airways. The Dryden Report includes the contention that the working relationship between the pilots over the previous two days had probably not been cordial, with a consequent impact on crew coordination.

Latent organizational failures. 

Three distinct groups of high-level management "contributed" to harboring the latent organizational failures which eventually led to the crash of Flight 1363: the operator itself, the regulatory agency and the parent company.

Corporate reorganizations generate anxieties among employee groups. In this case, there is evidence of high management turnover, low employee morale and poor job performance, all with potential effects on flight safety. The period following the merger was turbulent. The basic issue examined by the Commission was "... whether Air Ontario management was able to support the flight safety imperative during this period of distraction."

In the two years before the accident, there had been significant changes in the management of flight operations. There was instability within the flight operations organization, and individuals who had been expected to play a major role in the introduction and management of the F28 programme had left the company. The Dryden Report reveals a situation in which effective coordination of efforts had been essential, but which had instead been characterized by a troubling lack of both coordination and effective management.

Management turnover and selection.  There were changes in two critical areas of operational management during the period from June 1987 until 10 March 1989: Vice President of Flight Operations and Director of Flight Operations. The instability and problems of supervision created by the lack of management continuity were an obstacle to the implementation of the changes required by the introduction of a new aircraft type. The Dryden Report introduces evidence that the President and Chief Executive Officer of the company would personally select all senior management personnel, not always based on the merit principle, but rather on what the report describes as "... the entrepreneurial management style of a man who has built his company from a small family business."

Some of the appointments, such as those of the President's close relatives to key managerial positions, were " ... the subject of considerable discussion at the Air Ontario committee meetings." The outcome of this process was that the operational management of Air Ontario was dominated by individuals whose experience had been mostly in charter operations in the northern, economically regulated environment, while the new company operated in the southern, deregulated environment as a scheduled carrier.  Air Ontario managers were thus confronted by demands for which their experience may not have been adequate. 

Operational control.  Canadian legislation grants operational control departments the functions of flight dispatch and flight following, including the authority to initiate, continue, divert or terminate a flight. Operational control personnel provide crucial support to flight crews by supplying updated information to enable them to make safe and efficient decisions. Such control is intended precisely to prevent circumstances like those presented to Capt. Morwood at Dryden. However, the report pointed out, "... it was stated by all of the operational control personnel who testified that the training and qualification of the Air Ontario dispatchers was inadequate."

The inquiry revealed that when weather was poor, when aircraft had unserviceable equipment or when irregular circumstances were present - situations in which operational control is an asset - SOC performance usually deteriorated. The Commission concluded that this was a consequence "... of poor planning and organization within SOC, a lack of training and qualification of Air Ontario SOC personnel, and the failure of SOC personnel to appreciate the importance of their function."

The F28 programme. The introduction of the F28 was the first exposure of Air Ontario's management to the operation of a transport category jet aircraft in commercial scheduled service. The management problems discussed revealed themselves in the various flaws and safety shortcomings within the F28 operation, and can be grouped into two general areas: lack of standard operating procedures, manuals and documentation for the F28, and inconsistencies and deficiencies in training the F28 flight crews, cabin crews and ground support personnel. 

Programme management.   The F28 project manager had the responsibility to ensure that the implementation and operation of the F28 programme were properly monitored and supervised. The appointed manager - a relative of the president of the company - lacked experience in the F28. He was also overburdened beyond reason, since he had additional responsibilities as F28 chief pilot, F28 training pilot, F28 company check pilot, Convair 580 chief pilot and F28 line pilot. These combined responsibilities led to ineffective management of the programme, allowing operational standards to deteriorate below acceptable levels. The report describes the project manager as "a well intentioned individual." It must nevertheless always be remembered that the best intentions, if not carried into deed, are as good as no intentions at all.

Maintenance management. Senior maintenance management was stable during the period June 1987 to 10 March 1989. Nevertheless, numerous maintenance problems were evident in the F28 operation, including lack of familiarity with the aircraft and an aircraft purchase decision which did not include an adequate supply of spare parts. This, combined with the enthusiasm and subsequent organizational over-commitment to the F28, pressured maintenance personnel and pilots alike to defer and carry maintenance snags for long periods of time. 

Safety management.  Although mission statements included flight safety as part of Air Ontario's objectives, compelling evidence presented in the Dryden Report suggests a rather haphazard approach, à la "safety is everybody's responsibility" (if employees do their jobs correctly, then safety will be optimized). Such a simplistic view denies the technological and sociological realities of contemporary aviation - realities which impose the imperative of professional safety management. Air Ontario's flight safety officer had resigned in late 1987 because of lack of management support, including lack of access to the CEO. He was not replaced, and the position remained vacant until February 1989, even though Air Ontario suffered a fatal accident involving a Douglas DC-3 in November 1988.

The lack of continuity in the position of flight safety officer, the lack of adequate support for the FSO position by senior management, and the lack of a flight safety organization over the material time span amounted to a managerial omission. Management assigned a low priority to filling the vacant FSO position. This period of instability, carried over into the introduction of the F28 programme, had an impact on flight safety. Air Ontario was not ready in June 1988 to put the F28 aircraft into service as a public carrier.

The regulatory authority 

Transport Canada is the federal agency responsible to the people of Canada for ensuring that aviation is carried out effectively and at an acceptable level of safety. The 1980s had not been a sympathetic decade to Transport Canada. Economic deregulation - a policy of the federal government - had brought a considerable increase in workload, while at the same time measures aimed at federal deficit reduction had led to downsizing of the workforce. This produced a situation where the agency saw its ability for surveillance, inspection and monitoring greatly reduced. Numerous warning flags were raised by different sectors within the Canadian aviation industry, with little or no effect. The decrease in personnel, inadequate training policies and supporting programmes, and mismanagement of human resources left Transport Canada in a very precarious position to discharge its responsibilities in a timely and effective manner.

Regulatory audit of Air Ontario.  An audit of Air Ontario by Transport Canada was scheduled for February 1988. While the airworthiness, passenger safety, and dangerous goods portions of the audit were completed as scheduled, the operations part of it was postponed because Air Ontario did not have an approved flight manual in place. The operations portion was re-scheduled for June 1988 and eventually conducted between October and November 1988. Because the audit team leader had no jet experience, the audit did not cover the F28 programme. The Dryden Report considers this " . . . a serious omission. Had the F28 been audited, it is reasonable to assume that a number of deficiencies relating to Air Ontario's F28 operation would have been discovered prior to the Dryden crash." 

Notwithstanding federal policy to release audit reports within 10 working days of the completion of the audit, Air Ontario was not presented with a report until after the crash, more than five months after the audit had been completed. The Dryden Report characterized the Transport Canada audit of Air Ontario as " ...poorly organized, incomplete and ineffective." 

The operating rules. The inquiry accumulated evidence that existing regulations applicable to Canadian air carriers were "... deficient, outdated and in need of overhaul and outright replacement." Areas in which deficiencies were identified, or in which there existed no regulations at all, included flight dispatch requirements, minimum equipment lists, shoulder harnesses for flight attendants, approval of aircraft operating manuals, and qualifications for air carrier managerial personnel. The hearings also disclosed ambiguities in aviation regulations and air navigation orders. In the case of the minimum equipment list (MEL), none of the witnesses could define with reasonable precision one of its most critical terms: essential airworthiness item.

As another case in point, the pilots of C-FONF carried two aircraft operating manuals, different in form and content, and without amendment service (Capt. Morwood had the Piedmont manual and F/O Mills had the USAir manual). Neither manual was approved by Transport Canada, since no regulatory requirement existed to this effect. Transport Canada operational staff who testified at the inquiry were unanimous in their views about the inadequacy of existing regulations and "... the chronic inaction on the part of Transport Canada senior management in many areas of urgent concern ...".

Safety management.   The inquiry revealed that because of resource constraints, an inadequate regulatory framework and organizational deficiencies, Transport Canada was not ideally able to ensure an efficient and uniform level of safety. Deficiencies uncovered included distinctly separated lines of reporting to the top of the organization and the apparent inability of different internal groups to work together in identifying and addressing safety issues. The Commission also expressed concern that Transport Canada was " ... spending too much energy on minor violations that were of little safety consequence, while not enough effort was being put into overall education and safety promotion." 

The Dryden Report concluded that Transport Canada:
  
  • did not provide clear guidance for carriers and crews regarding the need for deicing;
  • did not enforce the provision of performance data on contaminated runways;
  • did not closely monitor Air Ontario for regulatory compliance following the merger and during the initiation of the jet service; 
  • did not require licensing or effective training of flight dispatchers; 
  • did not provide clear requirements for the qualification of candidates to management positions, including director of flight operations, chief pilot and company check pilot; 
  • did not develop a policy for the training and operational priorities of air carrier inspectors;
  • delayed the audit of Air Ontario and did not include the F-28 programme in it;
  • followed an excessively complex MEL approval process; and 
  • did not have a clear definition of what constitutes an essential airworthiness item.

The report stated that these oversights and flaws nurtured the trajectory of opportunity and, combined with local triggers at Dryden on 10 March 1989, led to a break in the system defenses, safeguards and barriers, permitting the accident. 

Air Ontario, as a commercial air carrier, was not operating in a vacuum. Transport Canada, as the regulator, had a duty to prevent the serious operational deficiencies in the F28 programme. Had the regulator been more diligent in scrutinizing the F28 implementation at Air Ontario, many of the operational deficiencies that had a bearing on the crash of flight 1363 could have been avoided. 

The parent company. 

The controlling interest in Air Ontario was owned by Air Canada. Air Ontario was marketed as part of Air Canada's network, and a public perception of an integrated company had been fostered. Air Canada dedicated a significant effort to present a close integration in the marketing functions. These marketing efforts had been rewarded by a measure of success; many of the passengers of Flight 1363 believed that they were in fact flying with Air Canada. In specific relation to the crash of Flight 1363, the Commission raised the issue of the lack of application of Air Canada's expertise in scheduled jet operations to the Air Ontario F28 programme. The evidence in the report reveals that " ...these initiatives were not in any way directed towards verifying and monitoring the operational procedures and flight safety standards of its new subsidiary. On the contrary, Air Canada deliberately maintained its corporate distance from the operational end of Air Ontario."

The regulatory standards defined by Transport Canada - and by any other civil aviation administration - represent minimum standards, referred to in the Dryden Report as "the threshold level of operational safety." The evidence demonstrates that Air Canada operated at a greater level of safety than that required by Transport Canada. The evidence also demonstrates that Air Canada management "... while imposing on Air Ontario its own high marketing standards, required Air Ontario only to comply with Transport Canada's threshold operational safety standards."

The report discusses Air Canada's lack of support to Air Ontario during the introduction of the jet service, and compares standards in specific areas such as operational policies for dispatch with an unserviceable APU; minimum equipment lists; manuals; aircraft defects; hot refueling policies; de-icing policies; operational control and flight planning and dispatcher training. 

The evidence reveals the existence of a double safety standard. The report also reviews Air Canada's flight safety organization and its involvement with Air Ontario. In spite of the fact that Air Canada had significant experience in introducing jet service (on several types), this experience was not made available to Air Ontario when it introduced F28 service. The assistance Air Canada planned to provide its connector was limited to the provision of information relating to flight safety and playback facilities for flight data recorders. In practice, this intention was further reduced to a post-accident response seminar in 1985 and another in May 1989, three months after Dryden. The report also describes with detail Air Canada's flight safety organization, leaving no room for doubt about its importance and the corporate commitment which supported it. The double standards again become obvious when reviewing Air Ontario's flight safety organization. The director of flight safety for Air Canada testified that he was under the impression that Air Ontario had a flight safety officer. It did not. He also assumed that computer recording and trend analysis was being carried out by Air Ontario. It was not. When asked about the degree of integration between the flight safety organizations of the parent company and the subsidiary, he conceded that there was none. Lastly, the representative of Air Canada on the board of directors of Air Ontario appeared to be unaware that for more than one year and during the crucial time frame of the F28 introduction, there was no flight safety officer or flight safety organization in Air Ontario. 

In conclusion

The Dryden Report found: "... The corporate mission statements of Air Canada and Air Ontario both contain words to the effect of the primacy of safety considerations. The evidence disclosed that other corporate concerns, important in their own right, were allowed to intervene and subordinate safety. The difference between the attention and resources expended by Air Canada and Air Ontario on marketing, as compared with safety of operations, must, when held up to their  respective mission statements, be described as inadequate and short-sighted." 

When the moment arrived to close the file, the evidence obtained and discussed over 20 grueling months led Justice Moshansky to conclude: "Capt. Morwood, as the pilot-in-command, must bear responsibility for the decision to land and take off in Dryden on the day in question. However, it is equally clear that the air transportation system failed him by allowing him to be placed in a situation where he did not have all the necessary tools that should have supported him in making the proper decision."

This statement holds the key to securing and advancing safety and effectiveness in modern, complex socio-technical systems. The captain, first officer, cabin crew, SOC dispatchers, ground handler and other personnel involved in the operational events surrounding Flight 1363 failed in their roles as the last line of defense and thereby precipitated the accident. For this, they must be held accountable. If we are looking for scapegoats, we need go no further. But if what we seek is to avoid future tragedies like Dryden, we must examine the organizational processes which generate gaps in the system defenses and induce properly qualified, healthy and well-intentioned individuals to make such damaging mistakes.

The message from the Dryden Report is two-fold. On the one hand, there should be no doubt: there is still no substitute for a properly trained, professional flight crew; they are the goalkeepers of aviation safety. On the other hand, no matter how hard they try and no matter how professional they might be, humans can never be expected to outperform the system which bounds and constrains them. System flaws will, sooner or later, defeat individual human performance. 

So, are we absolving the humans of their mistakes by blaming the Organizations? Are we trying to escape responsibilities? Are we looking for ways to save humans from culpability? We will discuss these questions in the next post. 

Until then, 

The Erring Human.

Saturday, February 18, 2012

Managerial and Organizational contributions to accidents

The main purpose of this post is to provide a principled basis for understanding the contribution of organizational management to accidents and how it can be remedied. The main question here is: is this organizational approach just another passing fad, or is there some real substance to it? To answer this question, we need to look at the tangled question of the Individual vs. the Collective.

Individual or Collective Errors?

This issue has a number of dimensions. The first is a moral one, relating to blame, responsibility and legal liability. The second is scientific, having to do with the nature of cause and effect in an accident sequence. The third is entirely practical and concerns which standpoint, individual or collective, leads to more effective countermeasures.

The Moral Dimension.

From a Moral or Legal perspective, there is much to be gained from pursuing an individual rather than a collective approach to accident causation. The main reasons for this are:

  • It is much easier to pin the legal responsibility for an accident upon the errors and violations of those in direct control of the aircraft or vessel at the time of the accident. The connection between these individuals and the disastrous consequences is far easier to prove than any possible links between earlier management decisions and the accident.
  • This is further compounded by the willingness of professionals such as aircraft Captains and ships' Masters to accept this responsibility. They are accorded substantial authority, power and prestige, and in return they are expected to "carry the can" when things go wrong. The buck, and the blame, traditionally stops with them.
  • Most people place a large value on personal autonomy, on their sense of free will. We also impute this to others, so that when we learn that someone has committed an error with bad consequences, we assume that this individual actually chose an error-prone rather than a 'sensible' course of action. In other words, we tend to perceive the errors of others as having an intentional element, particularly when their training and status suggest that 'they should have known better'. Such voluntary actions attract blame and recrimination, which in turn are felt to deserve various sanctions.
  • Our judgements of human actions are subject to similarity bias. We have a natural tendency to assume that disastrous outcomes are caused by equally monumental blunders. In reality, of course, the magnitude of the disaster is determined more by situational factors than by the extent of the errors. Many monumental disasters have resulted from relatively minor failings in different parts of the system (e.g. the Tenerife runway disaster).
  • Finally, it cannot be denied that there is a great deal of emotional satisfaction to be gained from having someone (rather than something) to blame when things go badly wrong. Few of us are able to resist the pleasure of venting our psychic spleens on some convenient scapegoat. And in the case of organizations, of course, there is considerable financial advantage in being able to detach individual fallibility from corporate responsibility.

Flip side of the Moral Dimension.

Let's say you just had an accident. It destroyed one of your beloved and expensive aircraft and put your company in a lot of financial and social distress. You are sitting at your desk thinking about it, and any way you look at it, the pilot f..... up! Of course, someone with that kind of experience should have known better. Isn't that what they were trained and paid for? Any way you look at it, the pilot's actions were absolutely unacceptable. You are sitting at your desk thinking of all the bad things that you would like to do.

Question:  Will this do any good?  Will it change the pilot's behavior?  Will it change the behavior of other pilots? Will it prevent this type of Accident in the future?

These are good questions. You deserve answers to them before you reach for your skinning knife. Let's discuss a few ideas:

  • Do we want revenge?  Maybe, but how much are we going to get out of the Pilot's body - or bank account? Before we let the Pilot fly our aircraft, we should have covered ourselves with insurance. We might wish that we had either trained the pilot better or withheld some clearances, but it's a little late for that now. Our revenge, if any, is going to come from the insurance company. Punishing the pilot isn't going to help.
  • Do we need to protect society from this Pilot? Probably not. As long as we are talking about errors of judgement or technique and not willful violations of flying regulations, society is not in much danger. Maybe society needs protection from a system that will not prevent an individual eventually achieving PIC status based entirely on longevity or time and space.
  • Do we need to change the behavior of this Pilot?  If we are still talking of errors of judgement or technique, it is safe to say that the Pilot had absolutely no intention of having that accident in the first place. Now that it has occurred, the Pilot has even less intention of having it again. Applying punishment won't improve on that.
  • Do we need to make an example of this Pilot to others?   This is an interesting question and can be argued a number of different ways. My view is that it depends entirely on whether the other Pilots were planning on doing the same thing our prospective recipient of the punishment did. If the accident was genuinely the result of mistakes and misjudgement, punishment probably has no effect on others. They were not planning on making those mistakes in the first place, and seeing another Pilot punished won't change their minds. On a negative note, the punishment could influence them to avoid the circumstances where there might be an opportunity to make the same mistake. Maybe that's what we want. Be careful, though. That's how we develop students who have never been taught the hard things, instructors who won't get off the controls, and pilots who can't operate their aircraft in the corners of the Flight Envelope. On the other hand, let's suppose that our other pilots were planning on doing the same thing that got this pilot in trouble. Here, punishment can be very effective - provided it is applied to the act and not the result. By this I mean that we always punish pilots who disregard our rules, regardless of whether it results in an accident or not. This is an effective way to manage all our Pilots. If we wait for an accident to occur and then apply punishment, we are behaving inconsistently and lose any benefit that punishment might have on the rest of our Pilots. They realize that we are willing to tolerate their misbehavior as long as they don't have an accident.

Another inherent weakness in the idea of punishment as a means of deterring others is the short-term nature of its effect. There is a certain amount of turnover in any organization, and the effects of punishment are completely wasted on people who join after the event.

On the whole, punishing poor judgement or technique doesn't seem to satisfy any of the classic reasons for punishment (unless, of course, we include "making me feel better" as a reason).

To summarize, there is nothing wrong with punishment fairly and consistently applied.  We can do it to change behavior provided we consistently punish behavior and not the result. Used this way, we can even use it to change the behavior of others. We cannot, however, punish a judgmental error and expect that error will never occur again. Not only have we not prevented a future error, we have also missed the opportunity to take some other action (training? Change in procedures?), which might have prevented a future error.  

Punishment may do a lot of things, but preventing accidents is rarely one of them. So, the Moral Dimension falls flat if prevention of accidents is one of our goals.

The Scientific Dimension.

Should one halt the search for causes after identifying the human and/or component failures immediately responsible for the accident (as has been done in many accident investigations), or is it scientifically more appropriate to track back to their organizational root causes? On the face of it, the answer seems obvious. Yes, it must be better (in the sense of being a more accurate representation of the true state of affairs) to try and find all the systemic factors responsible for the accident. But the issue is not quite so simple. Let us examine some of the problems.

  • Why should we stop at the Organizational roots? In a deterministic world, everything has a prior cause. In theory, therefore, we could go back to the Big Bang! Seen from this broader historical perspective, an analytical stop-rule located at the organizational root causes is just as arbitrary as one located close to the proximal individual failures.
  • The scientific logic to apply here is that, in seeking the reasons for an accident, we should go far enough back to identify causal factors that, if corrected, would enhance the system's resistance to subsequent challenges. The people most concerned and best equipped to do this are within the organization(s) involved, so it makes sense to stop at these organizational boundaries. However, these boundaries may often be indistinct, particularly in Aviation, where a large number of inter-related sub-systems are involved.
  • Perhaps the most serious scientific problem has to do with the particular nature of accidents and how they change our perceptions of preceding events. In retrospect, an accident appears to be a point of convergence of a number of causal chains. Looking back down these lines of causation, our perceptions are colored by the certain knowledge that they caused a bad outcome. But if we were to freeze any system in time, without an accident having occurred, we would see very similar imperfections, latent failures and technical problems. No system is ever perfect. The only thing that gives these same kinds of systemic weaknesses causal significance is that, in a few intensely investigated events, they were implicated in the accident sequence. If all that distinguishes these latent factors is the subsequent occurrence of a bad outcome, should we not limit our attention only to those proximal events that transformed such commonplace shortcomings into an accident sequence? In other words, should we not run with the Moral and Legal tide and simply concentrate on those individual failures having an immediate impact upon the integrity of the system?

The Remedial Dimension.

The answer here depends crucially upon two factors: first, whether or not latent organizational and managerial factors can be identified and corrected BEFORE an accident occurs, and second, the degree to which such interventions can improve the system's natural resistance to local accident-producing factors.

A recent survey of the Human Factors literature revealed that the estimated involvement of human error in the breakdown of hazardous technologies had increased fourfold between the 1960s and the 1990s, from a minimum of around 20% to a maximum of more than 80%. During this period it also became apparent that these contributory errors were not restricted to the 'sharp end', to the Captains, Masters, ships' officers, control room operators, pilots, drivers, etc. in direct control of the operation. Nor can we take account only of those human failures that were the proximal causes of the accident. Major accident inquiries (like Three Mile Island, Challenger, King's Cross, etc.) indicate that the human causes of major accidents are distributed very widely, both within the organization as a whole and often over several years prior to the actual event.

The only way to proceed in such a scenario is to ask: what do all of these complex, well-defended technologies have in common? The answer is: organizational processes and their associated cultures; a variety of different workplaces involving a variety of local conditions; and defenses, barriers and safeguards designed to protect people, assets and the environment from the adverse effects of the local hazards. Each of these aspects is addressed in the Reason Model of accident causation discussed in the earlier post and reproduced below:

The Reason Model of Accident Causation

The organizational processes - decisions taken in the higher echelons of the system - seed Organizational Pathogens into the system at large. These resident pathogens take many forms: Managerial oversights, ill-defined policies, lack of foresight or awareness of risks, inadequate budgets, lack of legal control over contractors, poor design, specifications and construction, deficient maintenance management, excessive cost cutting, poor training and selection of personnel, blurred responsibilities, unsuitable tools and equipment, commercial pressures, missing or flawed defenses and the like. The adverse consequences of these pathogens are transported along two principal pathways to the various workplaces, where they act upon the defenses to create latent conditions and upon local workplace conditions to promote active failures.

Subsequently, these active and latent failures act to create an event (a complete or partial trajectory through the defensive layers). Events may arise from a complex interaction between active and latent failures, or from factors present predominantly in one or the other pathway. Both local triggering factors and random variations can assist in creating trajectories of accident opportunity.
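To make the trajectory idea a little more tangible, here is a rough Python sketch of the model as just described, under invented assumptions: the layer strengths, the erosion factor and the effect of a local trigger are arbitrary numbers chosen purely for illustration, not figures from Reason's work or from any accident report.

```python
import random

# A toy simulation of the Reason Model: organizational pathogens become latent
# conditions that weaken each defensive layer, active failures and local
# triggers open further gaps, and an accident requires a complete trajectory
# through every layer. All numbers below are arbitrary, for illustration only.

def defence_holds(base_strength, latent_conditions):
    """Probability that this layer stops the trajectory, eroded by each
    latent condition seeded into it by organizational processes."""
    strength = base_strength * (0.8 ** latent_conditions)
    return random.random() < strength

def run_operation(layers, local_trigger):
    """Return True if an accident trajectory breaches every defence."""
    for base_strength, latent_conditions in layers:
        if local_trigger:
            latent_conditions += 1        # active failures open extra gaps
        if defence_holds(base_strength, latent_conditions):
            return False                  # trajectory stopped at this layer
    return True                           # complete trajectory: an accident

# Each defence is (base strength, number of latent conditions already present).
layers = [(0.95, 2), (0.90, 3), (0.99, 1)]
accidents = sum(run_operation(layers, local_trigger=True) for _ in range(100_000))
print(f"accidents per 100,000 operations: {accidents}")
```

Re-running the sketch with fewer latent conditions per layer produces far fewer complete trajectories, which is the practical argument for identifying and correcting latent failures before they combine.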

By specifying the organizational and situational factors involved in the causal pathways, it is possible to identify potentially dangerous latent failures before they combine to cause an accident. Hence you can have a measure of control over Human Errors and over us, the Erring Humans who work for you.

But why do we Err? We will discuss that in the subsequent posts.

Until Then,

The Erring Human.