Saturday, December 3, 2011

Case Study: Ueberlingen Midair Collision

Before we move any further, let's do a case study to drive home the point of Errors and Violations.

On the night of July 1, 2002, a Boeing 757 collided with a Tupolev-154 at 35,000 feet, resulting in 71 fatalities. Initially, the accident was blamed on two individuals: first, the pilot of the Tupolev and second, the controller on duty. Let us re-examine the event, highlighting the fundamental human and system errors that occurred that night: errors that contributed to one of the worst midair collisions in recent history.

Kindly visit the following links to view video reconstructions of the events. The first link is a 10-minute video that gives the essentials of the accident for those of you who are short of time. Those with more time at hand may prefer the second video, which discusses the case in much more detail over 45 minutes.

The following narrative draws heavily on the research paper presented by Dr. Ashley Nunes and Dr. Tom Laursen (University of Illinois, Aviation Human Factors Division, Savoy, IL, in coordination with Skyguide, Air Traffic Control Operations, Zurich Area Control Center, Switzerland) at the 48th Annual Meeting of the Human Factors and Ergonomics Society, September 20 - 24, 2004, New Orleans, LA, USA.

Known Sequence of Events

The Boeing 757 (registered to DHL) was en route from Bergamo (Italy) to Brussels on a heading of 004 degrees at FL 260. The Tupolev-154 (registered to Bashkirian Airlines) was flying from Moscow to Barcelona on a heading of 254 degrees at FL 360, correcting its heading twice within the last minute to end up on a heading of 274 degrees. Both aircraft were equipped with the Traffic Alert and Collision Avoidance System (TCAS), and their trajectories put them on a converging course at a 90° angle in the airspace above Lake Constance, Germany.

Under a contractual agreement between the German and Swiss governments, this airspace was under the authority of the Zurich Area Control Center (ACC). After making contact with the B757, the Swiss controller issued it two clearances: first to climb to FL 320 and then, at time 21.26.36, to climb to FL 360. At time 21.30.11 the T-154 called in. After that, the Swiss controller did not initiate any contact with either aircraft until just seconds after the TCAS aboard each aircraft had given its crew a traffic advisory. He then instructed the T-154 to descend from FL 360 to FL 350 to avoid a collision with the B757. However, the TCAS units on board the T-154 and the B757 instructed their pilots to climb and descend respectively. Faced with contradictory instructions, the T-154 pilot opted to obey the controller and began a descent to FL 350, where his aircraft collided with the B757, which had followed its own TCAS advisory to descend. All 71 people aboard the two aircraft were killed.

[Figure: Trajectories of the B757 and T-154.]

At first glance, the timeline of events would suggest that two individuals were solely to blame for the accident. First, the Russian pilot, who disobeyed his TCAS and followed the controller's instruction to descend instead of climbing. Second, and more importantly, the controller, who was fully aware of the presence of both aircraft in his sector but waited more than four minutes before issuing a descent clearance and a traffic information report to the Russian pilot. The controller's most important task is to ensure safety in the sector. The controller failed in that task. Or did he?

Identification of Contributing Factors

Contributing Factor 1. Single Man Operations. The presence of only one controller working the radar screen represents one of the underlying causes of the accident, namely the lack of supervision or assistance in a safety-critical situation. This Single Man Operation (SMOP) was a controversial procedure implemented in 2001 despite numerous protests from the controllers' union.

Contributing Factor 2. Downgraded Radar. Procedures in force stated that when SMOP is in effect, a conflict detection system must be on and fully functional. The Zurich ACC's system, known as the Short Term Conflict Alert (STCA), gave the controller a two-minute warning by visually indicating the presence of a conflict. On that night, maintenance work was being done on the main radar system, which placed radar services in their fallback mode. As a result, separation minimums between aircraft were increased from 5 miles to 7 miles (corresponding to approximately one minute of flying time). The fallback radar mode also meant that the STCA was not available. Unit procedures specifically mandated that the STCA be available whenever SMOP was taking place, but it was not.
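To make the idea of a two-minute conflict alert a little more concrete, here is a minimal sketch of the kind of check such a tool performs. It is not the Zurich ACC's actual STCA algorithm: it assumes flat geometry, constant speed and heading, and made-up positions, and every function name in it is hypothetical. It also shows the arithmetic behind "7 miles is about one minute", assuming a ground speed of roughly 420 knots.

```python
import math


def time_to_cover(distance_nm, ground_speed_kt):
    """Seconds needed to fly distance_nm at ground_speed_kt."""
    return distance_nm / ground_speed_kt * 3600.0


def velocity(heading_deg, ground_speed_kt):
    """(east, north) velocity in NM per second for a compass heading."""
    rad = math.radians(heading_deg)
    speed = ground_speed_kt / 3600.0
    return (speed * math.sin(rad), speed * math.cos(rad))


def closest_approach(pos_a, vel_a, pos_b, vel_b):
    """Return (time_s, distance_nm) of the closest horizontal approach,
    assuming both aircraft hold their current speed and heading."""
    rx, ry = pos_b[0] - pos_a[0], pos_b[1] - pos_a[1]
    vx, vy = vel_b[0] - vel_a[0], vel_b[1] - vel_a[1]
    v2 = vx * vx + vy * vy
    # If the relative velocity is zero, the distance never changes.
    t = 0.0 if v2 == 0 else max(0.0, -(rx * vx + ry * vy) / v2)
    return t, math.hypot(rx + vx * t, ry + vy * t)


def conflict_alert(pos_a, vel_a, alt_a_ft, pos_b, vel_b, alt_b_ft,
                   min_sep_nm=7.0, min_vertical_ft=1000.0,
                   look_ahead_s=120.0):
    """True if separation is predicted to be lost within the look-ahead.

    The thresholds are illustrative assumptions, not Skyguide parameters.
    """
    t, d = closest_approach(pos_a, vel_a, pos_b, vel_b)
    return (t <= look_ahead_s
            and d < min_sep_nm
            and abs(alt_a_ft - alt_b_ft) < min_vertical_ft)


if __name__ == "__main__":
    # 7 NM at an assumed 420 kt ground speed is about one minute.
    print(round(time_to_cover(7.0, 420.0)), "seconds")

    # Two aircraft at the same level on crossing tracks, each 12 NM from
    # the crossing point (illustrative positions, not the real geometry).
    northbound = ((0.0, -12.0), velocity(0, 480))
    westbound = ((12.0, 0.0), velocity(270, 480))
    print(conflict_alert(northbound[0], northbound[1], 36000,
                         westbound[0], westbound[1], 36000))
```

With the fallback radar in use that night, no such automatic check was running on the controller's behalf; the prediction had to come from his own scan of the screen.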

Contributing Factor 3. Dual Frequency Responsibility. The controller had to monitor two display consoles that were separated by over a meter, which forced him to divide his attention between them for a sustained period of time.

Contributing Factor 4. Phone System. The automated phone system used in the Zurich ACC enabled controllers to communicate with one another at the touch of a button. In addition to inter-facility coordination, the controller could also communicate with ATC facilities in Germany to coordinate local approaches such as that to the Friedrichshafen (FHA) airport. On the night of the accident the main telephone system was also out for maintenance, and the back-up system had a software failure that no one in the company had noticed, not even during tests run three months before the accident. As a result, when the controller tried to contact the FHA tower to inform them that an inbound aircraft was requesting a different approach, he could not get through. Given that the phone system had worked perfectly since its implementation more than four years earlier, the controller had a high degree of trust in it. He did not think the system had failed and instead believed he had dialed the wrong number. He continued his attempts to reach the FHA tower while neglecting to maintain his usual scanning pattern on the other radar console, which depicted the B757 and T-154 converging at the same altitude.

The severity of the phone system malfunction cannot be overstated. Two minutes before the collision, controllers working the Upper Area Sector at Karlsruhe, Germany, noticed the situation unfolding because their own STCA had gone off, and tried to contact the Swiss controller to warn him. Despite numerous attempts, they could not get through to him because of the malfunction in the phone system. The controller's communication with the outside world was essentially cut off. The next line of defense at this point was TCAS.

Contributing Factor 5. TCAS. TCAS is designed to provide not only traffic advisories but also resolution advisories to avoid midair collisions. It was in fact this system that alerted the pilots of both aircraft to the pending conflict a full seven seconds before the controller, who was busy vectoring another aircraft in for landing using a separate radar screen. After alerting the pilots to the conflict, TCAS instructed the DHL pilot to descend and the T-154 pilot to climb. However, the T-154 had already been instructed by the controller to descend.

This conflict requires that two technical issues be considered. First, TCAS does not provide the controller with information regarding resolution advisories: only the pilots see them. Therefore, the controller had no way of knowing that the system had instructed the T-154 to climb, resulting in an 'honest' decision error on his part. Second, and more importantly, TCAS does not account for situations where one of the aircraft does not follow its instructions. In the present case, the T-154 disobeyed its own TCAS instruction to climb (the pilot opting to follow the controller's instructions) and descended towards FL 350.

The result in the B757 cockpit was an instruction to increase the rate of descent rather than to remain level at its original altitude of FL 360. Had the B757 remained level, safe separation would have been maintained.

The inability of TCAS to make the controller aware of the resolution advisories issued to the pilots, or to account for a pilot taking a different action, represents a major limitation of the system; a limitation that played a role in this event.
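The reasoning above can be put into a few lines of arithmetic. The sketch below is a toy model only, not real TCAS logic: it assumes both aircraft start at the same level, react instantly, and hold an equal, constant vertical rate of 1,500 feet per minute, and all names in it are hypothetical. It simply compares how much vertical separation builds up when both crews follow complementary advisories, and when one crew does the opposite.

```python
CLIMB, DESCEND, LEVEL = "climb", "descend", "level"

# Direction of vertical movement for each action.
_SIGN = {CLIMB: 1, DESCEND: -1, LEVEL: 0}


def complementary_advisories():
    """The advisories described in the text: opposite directions so that
    the aircraft move apart vertically."""
    return {"T-154": CLIMB, "B757": DESCEND}


def vertical_separation_ft(action_a, action_b, seconds,
                           rate_fpm=1500.0, start_sep_ft=0.0):
    """Vertical separation after `seconds`, assuming each aircraft holds
    the same constant vertical rate (an illustrative assumption)."""
    change_a = _SIGN[action_a] * rate_fpm * seconds / 60.0
    change_b = _SIGN[action_b] * rate_fpm * seconds / 60.0
    return abs(start_sep_ft + change_a - change_b)


if __name__ == "__main__":
    advised = complementary_advisories()
    scenarios = {
        "both follow TCAS": (advised["T-154"], advised["B757"]),
        "T-154 follows ATC, B757 follows TCAS": (DESCEND, DESCEND),
        "T-154 follows ATC, B757 stays level": (DESCEND, LEVEL),
    }
    for label, (t154, b757) in scenarios.items():
        sep = vertical_separation_ft(t154, b757, seconds=30)
        print(f"{label}: {sep:.0f} ft after 30 s")
```

In this simplified picture, the only scenario that builds no separation at all is the one that actually happened: both aircraft descending together.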

Contributing Factor 6. Corporate Culture. Whereas the B757 pilot followed the TCAS advisory to descend, the T-154 pilot did not follow his advisory to climb and instead followed the controller's instruction to descend. This raises the question of why the pilots of two separate aircraft would respond to the system in such different ways. When presented with conflicting information from ATC and TCAS, European pilots were advised to follow TCAS, whereas Russian pilots were trained to take both into account before making a decision. In most instances, the latter group chose to follow ATC. This may help explain why the B757 pilot (who was British) and the T-154 pilot acted in the manner observed.

Today, of course, we train all pilots to follow TCAS without needing approval from, or prior notification to, the controller. However, this was not the case back in 2002.

Conclusion. 

As can be seen, what appeared to be a simple case of Human Error on the part of the controller and the pilot turns out, on deeper analysis, to be a combination of organisational and systemic failures. Let's try and understand this more clearly.

Single Man Operation (SMOP). The Zurich ACC had implemented SMOP despite objections from the controllers' union and applied it during the night as well, removing the Safety Layer of supervision and assistance from the system. This put the controller under stress and set the stage for a Human Failure. We can classify this as a case of Routine Violation by the organisation.

Short Term Conflict Alert (STCA).  The procedures required STCA to be available when SMOP was in force, but it was not. This can be classified as an Exceptional Violation by the organisation.

The Phone System. When the controller was unable to contact FHA over the phone, he did not think the system had failed and instead believed he had dialed the wrong number. He continued his attempts to reach the FHA tower while neglecting to maintain his usual scanning pattern on the other radar console, which depicted the B757 and T-154 converging at the same altitude. There are two errors here. The first is a case of applying an incorrect solution to a given problem, or Problem Solution Error. The second is the breakdown of his scanning pattern, classified as a Technique Error. However, it can be clearly seen that both were precipitated by an overworked controller facing an automation surprise, which resulted in cognitive tunneling: a totally avoidable situation!

The Russian Pilots. There is some debate over the actions of the Russian pilots in following the controller instead of TCAS, and some analysts tend to classify them as an Exceptional Violation. However, it must be understood that the pilots acted purely in accordance with their training and the corporate culture prevailing in their company.

This also brings us to another very important cultural point. Individuals who were born and brought up in the technology-savvy Western world tend to be more comfortable with technology and more likely to trust a machine's input. Those who grew up in developing and third-world countries tend to be less comfortable with technology and more likely to believe me, The Erring Human, over the machine. Hence the comfort level of the Russian pilots in following the controller over TCAS ("...he is guiding us down!").

So, as we delve deeper into the details of an error or a violation, we can clearly see its linkages to the higher levels in the food chain described earlier. If the concept is clear thus far, we are ready to move on to study the next level in the accident causation food chain.

I look forward to your comments and questions before moving further in the subject.

Until next week,

The Erring Human.
