What You Look At is What You Get: Eye Movement User Interfaces

Robert J.K. Jacob
Naval Research Laboratory
Washington, D.C.

As computers become more powerful, the critical bottleneck in their use is often the interface to the user rather than the computer processing itself. A goal of research in human-computer interaction is to increase the communication bandwidth between the user and the machine. At NRL, we are studying hitherto unused methods by which users and computers can communicate information, focusing on obtaining input from the user's eye movements. That is, the computer will identify the point on its display screen at which the user is looking and use that information as part of its dialogue with the user. For example, if a display showed several icons, a user might request additional information about one of them. Instead of requiring the user to indicate the desired icon by pointing at it with a mouse or by typing its name on a keyboard, the computer can determine which icon the user is looking at and display the information about it immediately.

Eye trackers have existed for a number of years, but their use has largely been confined to laboratory experiments. The equipment is now becoming robust and inexpensive enough to consider for use in a real user-computer interface. What is needed now are appropriate "interaction techniques" that incorporate eye movements into the user-computer dialogue in a convenient and natural way. Developing such techniques is the focus of research currently under way at NRL and at several other laboratories investigating human-computer interaction.

A user interface based on eye movement inputs has the potential for faster and more effortless interaction than current interfaces, because people can move their eyes extremely rapidly and with little conscious effort.
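Determining which icon the user is looking at amounts to a hit test of the reported gaze point against the objects on the screen. The following is a minimal Python sketch under assumed conventions, not the NRL implementation: the `Icon` class, the `icon_under_gaze` function, and the 30-pixel tolerance are all hypothetical. The tolerance reflects the assumption that a tracker's reported gaze point may land slightly outside the object the user is actually looking at, so the nearest icon within a small margin is accepted.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class Icon:
    """A rectangular screen object (hypothetical); coordinates in pixels."""
    name: str
    x: float       # left edge
    y: float       # top edge
    width: float
    height: float

    def distance_to(self, px: float, py: float) -> float:
        """Distance from (px, py) to the nearest point of the icon (0 if inside it)."""
        dx = max(self.x - px, 0.0, px - (self.x + self.width))
        dy = max(self.y - py, 0.0, py - (self.y + self.height))
        return (dx * dx + dy * dy) ** 0.5


def icon_under_gaze(icons: Sequence[Icon], gx: float, gy: float,
                    tolerance: float = 30.0) -> Optional[Icon]:
    """Return the icon nearest the gaze point (gx, gy) if it lies within `tolerance` pixels."""
    best = min(icons, key=lambda icon: icon.distance_to(gx, gy), default=None)
    if best is not None and best.distance_to(gx, gy) <= tolerance:
        return best
    return None
```

With two 64-pixel icons whose left edges are at x = 100 and x = 300, a gaze point at (180, 130) lies 16 pixels to the right of the first icon and still selects it, while a point at (230, 130) is more than 30 pixels from both and selects nothing.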
A simple thought experiment suggests the speed advantage: before you operate any mechanical pointing device, you usually look at the destination to which you wish to move. The eye movement is thus available as an indication of your goal before you could actuate any other input device.

However, people are not accustomed to operating devices in the world simply by moving their eyes. Our experience is that, at first, it is empowering to be able simply to look at what you want and have it happen, rather than having to look at it and then point at it and click with the mouse. Before long, though, it becomes like the Midas Touch. Everywhere you look, another command is activated; you cannot look anywhere without issuing a command. The challenge in building a useful eye movement interface is to avoid this Midas Touch problem. Carefully designed new interaction techniques are therefore necessary, techniques that are not only fast but also use eye input in a natural and unobtrusive way. Our approach is to treat eye position less as the intentional actuation of the principal input device and more as one piece of information available to a user-computer dialogue involving a variety of input devices.

A further problem arises because people do not normally move their eyes in the same slow and deliberate way they operate conventional computer input devices. Eyes continually dart from point to point in rapid and sudden "saccades." Even when a user thinks he or she is viewing a single object, the eyes do not remain still for long. It would therefore be inappropriate simply to plug in an eye tracker as a direct replacement for a mouse. Instead, wherever possible, we attempt to obtain information from the natural movements of the user's eye while viewing the display, rather than requiring the user to make specific trained eye movements to actuate the system.

We partition the problem of using eye movement data into two stages.
First, we process the raw data from the eye tracker to filter noise, recognize fixations, compensate for local calibration errors, and generally try to reconstruct the user's more conscious intentions from the available information. This processing stage uses a model of eye motions (fixations separated by saccades) to drive a fixation recognition algorithm that converts the continuous, somewhat noisy stream of raw eye position reports into discrete tokens that represent the user's intentional fixations. The tokens are passed to our user interface management system, along with tokens generated by other input devices being used simultaneously, such as the keyboard or mouse. Second, we design generic interaction techniques that take these tokens as inputs.

The first interaction technique we have developed is for object selection. The task is to select one object from among several displayed on the screen, for example, one of several file icons on a desktop. With a mouse, this is usually done by pointing at the object and then pressing a button. With the eye tracker, there is no natural counterpart of the button press. We reject using a blink as a signal because it detracts from the naturalness possible with an eye movement-based dialogue by requiring the user to think about when he or she blinks. We tested two alternatives. In one, the user looks at the desired object and then presses a button on a keypad to indicate his or her choice. The second alternative uses dwell time: if the user continues to look at the object for a sufficiently long time, it is selected without further operations. We implemented the two together, with the button as an optional way to avoid waiting for the dwell time to expire, and at first this seemed like a good combination. In practice, however, the dwell time approach proved much more convenient. A long dwell time could ensure that no inadvertent selection is made when the user is simply "looking around" the display, but it erodes the speed advantage of using eye movements for input and also reduces the responsiveness of the interface.
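The two stages described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the algorithm used at NRL. Stage 1 substitutes a simplified dispersion-threshold fixation recognizer, a common technique, and its thresholds (25 pixels of dispersion, 100 ms minimum fixation) are hypothetical, as are the `Sample` and `Fixation` token types and the `DwellSelector` class. Stage 2 implements both selection alternatives on the resulting tokens: a dwell rule that accumulates looking time across consecutive fixations on the same object, and a button press that selects whatever is being looked at now.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Iterator, List, Optional


@dataclass
class Sample:
    """One raw eye-tracker report (hypothetical format): time in ms, position in pixels."""
    t_ms: float
    x: float
    y: float


@dataclass
class Fixation:
    """Token for one recognized fixation: its time span and the centroid of its samples."""
    start_ms: float
    end_ms: float
    x: float
    y: float

    @property
    def duration_ms(self) -> float:
        return self.end_ms - self.start_ms


def _dispersion(window: List[Sample]) -> float:
    # Spread of the window: horizontal extent plus vertical extent.
    xs = [s.x for s in window]
    ys = [s.y for s in window]
    return (max(xs) - min(xs)) + (max(ys) - min(ys))


def _to_fixation(window: List[Sample]) -> Fixation:
    n = len(window)
    return Fixation(window[0].t_ms, window[-1].t_ms,
                    sum(s.x for s in window) / n, sum(s.y for s in window) / n)


def recognize_fixations(samples: Iterable[Sample], max_dispersion: float = 25.0,
                        min_duration_ms: float = 100.0) -> Iterator[Fixation]:
    """Stage 1: convert raw samples into fixation tokens (simplified dispersion threshold)."""
    window: List[Sample] = []
    for s in samples:
        if window and _dispersion(window + [s]) > max_dispersion:
            # The new sample lies too far from the current cluster, so the eye has
            # moved (saccade or noise). Keep the cluster only if it lasted long enough.
            if window[-1].t_ms - window[0].t_ms >= min_duration_ms:
                yield _to_fixation(window)
            window = []
        window.append(s)
    if window and window[-1].t_ms - window[0].t_ms >= min_duration_ms:
        yield _to_fixation(window)


class DwellSelector:
    """Stage 2: select an object from fixation tokens by dwell time or by button press."""

    def __init__(self, hit_test: Callable[[float, float], Optional[str]], dwell_ms: float):
        self.hit_test = hit_test          # maps a screen point to an object name or None
        self.dwell_ms = dwell_ms
        self.gazed: Optional[str] = None  # object under the most recent fixation
        self.accumulated_ms = 0.0         # looking time on `gazed` across consecutive fixations
        self.selected: Optional[str] = None

    def on_fixation(self, fix: Fixation) -> Optional[str]:
        """Return the newly selected object, or None if nothing new is selected."""
        obj = self.hit_test(fix.x, fix.y)
        if obj != self.gazed:
            # Gaze moved to a different object (or to empty space): restart the count.
            self.gazed, self.accumulated_ms = obj, 0.0
        self.accumulated_ms += fix.duration_ms
        if obj is not None and obj != self.selected and self.accumulated_ms >= self.dwell_ms:
            self.selected = obj
            return obj
        return None

    def on_button(self) -> Optional[str]:
        """Button alternative: select the object being looked at now, without waiting."""
        if self.gazed is not None:
            self.selected = self.gazed
        return self.gazed
```

Two simplifications matter here. A single outlying sample ends the current fixation, whereas a more tolerant recognizer would ride over isolated noise. More importantly for responsiveness, this recognizer emits a token only after a fixation ends, so a dwell selection cannot fire until the eye moves away; an interactive system would report a fixation while it is still in progress, so that the dwell time can expire while the user is still looking at the object.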
To reduce dwell time, we make a further distinction. If the result of selecting the wrong object can be undone trivially (selecting a wrong object and then the right one causes no adverse effect, because the second selection instantly overrides the first), then a very short dwell time can be used. For example, if selecting an object causes a display of information about that object to appear and the information display can be changed instantaneously, then the effect of selecting wrong objects is immediately undone as long as the user eventually reaches the right one. This approach, using a dwell time of 150-250 ms, gives excellent results. The lag between eye movement and system response (required to reach the dwell time) is hardly detectable to the user, yet long enough to accumulate sufficient data for our fixation recognition and processing. The subjective feeling is of a highly responsive system, almost as though the system is executing the user's intentions before he or she expresses them. For situations where a selection is more difficult to undo, button confirmation is used rather than a longer dwell time.

Other interaction techniques we have developed and are studying in our laboratory include: continuous display of the attributes of an eye-selected object (instead of explicit user commands to request the display); moving an object by selecting it with the eye, pressing a button down, "dragging" the object by moving the eye, and releasing the button to stop dragging; moving an object by selecting it with the eye and then dragging it with the mouse; pull-down menu commands that use dwell time to select an item, or looking away to cancel the menu, plus an optional accelerator button; and forward and backward eye-controlled text scrolling.

The next step in this research is to perform more controlled observations of the new techniques to put our results on a more objective footing. Our first experiment will compare object selection by dwell time with conventional selection by mouse pick.
Initial pilot runs of this procedure suggest a 30 percent decrease in selection time with the eye compared with the mouse, although the eye trials show more variability.

Finally, eye movement-based interaction techniques are an exemplar of a new, non-command style of interaction. In a non-command-based dialogue, the user does not issue specific commands; instead, the computer passively observes the user and provides appropriate responses. Because the inputs in this style of interface are often non-intentional, they must be interpreted carefully to avoid annoying the user with unwanted responses to inadvertent actions. Our research with eye movements provides an example of how these problems are being attacked.

FURTHER READING (EYE MOVEMENT INTERFACES)

T.E. Hutchinson, K.P. White, W.N. Martin, K.C. Reichert, and L.A. Frey, "Human-Computer Interaction Using Eye-Gaze Input," IEEE Transactions on Systems, Man, and Cybernetics, Vol. 19(6) pp. 1527-1534 (1989).

R.J.K. Jacob, "The Use of Eye Movements in Human-Computer Interaction Techniques: What You Look At is What You Get," ACM Transactions on Information Systems, Vol. 9(3) pp. 152-169 (April 1991).

I. Starker and R.A. Bolt, "A Gaze-Responsive Self-Disclosing Display," Proc. ACM CHI'90 Human Factors in Computing Systems Conference pp. 3-9, Addison-Wesley/ACM Press (1990).

FURTHER READING (GENERAL)

J. Nielsen, "Noncommand User Interfaces," Comm. ACM, Vol. 36(4) pp. 83-99 (April 1993).

B. Shneiderman, Designing the User Interface: Strategies for Effective Human-Computer Interaction, Second Edition, Addison-Wesley, Reading, Mass. (1992).

See the Proceedings of the ACM CHI'83-CHI'93 Human Factors in Computing Systems Conferences, Addison-Wesley/ACM Press (1983-1993).

BIOGRAPHY

Robert Jacob is a Computer Scientist in the Human-Computer Interaction Lab at the Naval Research Laboratory in Washington, D.C., and head of its Advanced Interfaces Section, where he does research on interaction techniques and user interface software.
He received his Ph.D. from Johns Hopkins University and is also on the faculty of George Washington University. He is Vice-Chair of ACM SIGCHI and a member of the editorial board of ACM Transactions on Computer-Human Interaction.