How working in a windowless basement with artificial intelligence could make aviation safer – a field report

Mysterious and important

An image from the daylight-free basement of the DLR Institute of Aerospace Medicine.

The rather dated eye tracker whirs at me like an annoying but lovable droid from Star Wars. Above it, twelve semicircular displays glow on a monitor – with minimal colour and little movement. However, should any of these indicators shift, I must immediately alert both my human colleagues and the digital operators monitoring the system. It's the middle of the night; I've been awake for 22 hours, and my eyelids are growing heavy as I battle fatigue. Just two more hours to go – hopefully without incident.

Since 4 p.m. yesterday, I've been stationed in the windowless basement of DLR's Institute of Aerospace Medicine in Cologne, participating in the LOKI study (Kollaboration von Luftfahrt-Operateuren und KI-Systemen; Collaboration of Aviation Operators and AI Systems). Six DLR institutes, together with external partners from aviation and research, are investigating acceptance, explainability, user satisfaction and predictability in the collaboration between human operators and AI systems. This partnership must function effectively not only under optimal working conditions, but also in high-stress scenarios such as fatigue-induced impairment during night shifts. That is precisely what is being simulated here – in a basement environment sealed off from natural light, under carefully controlled conditions. And that's why I'm here.

From caffeine withdrawal to sleep deprivation

It all begins with a newsletter in my inbox. A brief expression of interest leads to an invitation for a preliminary meeting and training session. In the week before the study starts, I must adhere to a strict sleep schedule from 11 p.m. to 7 a.m., wear an activity tracker to monitor my movements (including any unauthorised inactivity, like napping) and avoid alcohol and caffeine entirely. Maintaining the sleep diary and keeping to the schedule proves straightforward enough, but giving up coffee triggers severe headaches on the first day of withdrawal. Gradually, however, my body adapts to both the new sleep pattern and the absence of caffeine, so that by the Sunday before the study begins, I feel unusually fit.

Motivated and well rested, I descend into the institute's basement where the sleep laboratory is located. Drinks, fruit and snacks are laid out on tables, while a large sofa in the lounge area invites us to relax and pass the time with board games. After a brief welcome and the handover of my activity tracker and sleep diary, my fellow test subject Alex and I are led into the sleeping quarters. The winding, colourless corridors bathed in fluorescent light remind me of the television series 'Severance', in which employees toil away in the labyrinthine, sterile basement of a corporation. This association will return to haunt me.

The experimental setup
The test subject – and author of this article – monitors power plant machinery and has to respond to system malfunctions.

No sooner have we stowed our suitcases and exchanged our trainers for slippers than training begins. The tests assess our general reaction times and ocular responses – some using smartphones, others using computers fitted with eye trackers that initially struggle with my glasses. Nevertheless, we're given the green light and move on to the task at hand. As we discover shortly afterwards, our first proper test run takes place on our very first night – meaning we must stay awake, with no sleeping or even dozing, and then perform our duties from 3 a.m. to 7 a.m., exhausted from sleep deprivation.

Monotony and sluggish glances

We battle to stay awake with TV series, video games and snacks. Rising fatigue around 11 p.m. gives way to a sudden burst of alertness around 1 a.m., as our bodies appear to cross some physiological threshold. At 3 a.m., we're escorted to the testing room. Once again, the eye trackers require calibration – this time with even older models than those in the preliminary assessment. The camera system, housed in bulky casing, produces a surprising array of buzzing, whirring and squeaking sounds. Alex and I – both Star Wars enthusiasts – are reminded of the quirky droids from the films and series, which immediately endears us to the eye trackers. Apart from their growl and our occasional mouse clicks, the next few hours pass in complete silence.

Contrary to what the study title suggests, we're not working in an aviation scenario but in a simulated production line environment. In essence, our task involves overseeing three fictional production sites, each equipped with a power plant and three unspecified machines. Alex and I are designated as human operators A and B, working alongside an AI operator that functions in the background, periodically relaying information to our screens. The first run commences.

As soon as a machine's target or actual reading changes, we must report it by clicking on a small button beneath the corresponding machine.

Daniel Beckmann

For thirty minutes, we have to stare intently at the screen, watching for changes in one or more of the twelve displays. What is supposedly being 'produced' here remains a mystery. Memories of another Severance scene spring to mind – the characters also sit in a sparse room, staring at screens and clicking on numbers. When they ask what they're doing there, a supervisor replies: "The work is mysterious and important."

The machines and their usage are displayed as semicircular gauges showing target and actual values, whilst the power plants appear as additional semicircular scales ranging from green through yellow to red. If a target or actual reading shifts, we must report it by clicking on a small button beneath the corresponding machine. We assign risk scores to each parameter according to rules we've learnt beforehand and submit our proposed actions to resolve possible machine failures and their combinations – along with an assessment of our confidence in these actions. If we agree on the diagnosis, the malfunction is resolved. With each passing hour, our performance noticeably and rapidly deteriorates.

The actometer
The movement and heart rate of the test subjects are monitored using activity trackers.

This is where cooperation with – and trust in – the AI becomes crucial. The computer operator has already suggested solutions at the start of each diagnosis, so we first review its recommended actions to see whether we can identify alternative options or whether the existing ones need correcting. At around 5:30 a.m., I experience a surprising moment of clarity when I identify a third, superior solution that neither the AI nor Alex had considered. After roughly three and a half hours, we finally complete the session. Then, we must perform another reaction test. The assessment feels endless, and we both notice our eyes beginning to fail us – closing involuntarily or darting about unfocused. It's finally bedtime.

The hardest part is over

The following day, a gong rouses us from our five-hour power nap at 1 p.m. For the remainder of the day until bedtime at 11 p.m., we enjoy our leisure time, catch up on work and chat about our interests.

Our final joint operator session takes place on the last morning, with only fifty minutes between the wake-up call and the start of the experiment. The process is essentially identical to that of the first night, except this time we're well-rested and the monotonous screen-watching troubles us considerably less. Then, we're finished with everything – operator sessions, reaction tests, questionnaires – and we pack our bags and emerge from the basement back into daylight. With the satisfying sense of having contributed to scientific research, we savour the gentle breeze and revitalising sunshine.

Plot twist: supposed AI with a script

Weeks later, we discover that the artificial intelligence in our experiments wasn't genuine AI at all but had merely followed a predetermined script. Ultimately, it's all about relationships, trust and support – between people as well as in relation to artificial intelligence. Can AI team members provide the same technical and psychological support as humans? Under what circumstances? Does the transparency of AI colleagues make a difference? And what does AI actually do after hours...?

Contact

Editorial team DLRmagazine

German Aerospace Center (DLR)
Corporate Communications
Linder Höhe, 51147 Cologne