Standardizing Visual Rehabilitation using Simple Virtual Tests

Many different visual rehabilitation approaches are being used to offer visual information to the blind. User proficiency and functional ability are currently evaluated either via ad-hoc tests or via standardized visual tests which are not sensitive enough in the range of extreme low vision. Unfortunately, this is the functional level that these approaches typically offer. This is especially important as the main criterion by which most users will judge the efficacy of these rehabilitation approaches is the functional benefit they grant. Furthermore, there are currently no accepted benchmarks or clear comparative tests of the different rehabilitation approaches, leading to the development of many new aids but the practical adoption of few. Combined, these indicate a need to add standardized functional tests to the evaluation toolbox. Indeed, several functional tests have recently been suggested, but their adoption has been very limited. Here, we review current tests and then conduct a formative study consulting experts in the field to map issues with current standardization attempts. This formative study offered a list of practical design suggestions for functional standardization tests. We then suggest using simple virtual environments as one such family of tests. Virtual scenarios meet many of the experts’ suggestions - they are easy to share, flexible, affordable, safe, identical wherever run, can be run by a single operator and offer control over external parameters, enabling a focus on the offered visual information. Finally, we demonstrate this approach via a freely available virtual version of a relatively standard functional test - finding a door - in a 10-minute paradigm which includes 30 trials. We find that congenitally-blind and sighted-blindfolded subjects cannot perform this task without the device, but that they perform it successfully with it, demonstrating the test's potential viability.


I. INTRODUCTION & MOTIVATION
Vision is one of the most dominant senses for humans, with severe visual impairments and blindness leading to significant everyday challenges. Many different approaches are being used to deal with these challenges and to develop visual rehabilitation and assistive tools (reviewed in [1]). These include, among many others, retinal (reviewed in [2]) and cortical (reviewed in [3]) prostheses, gene therapy (reviewed in [4]) and sensory substitution devices (reviewed in [5]). Though none of these approaches is currently mature enough to offer users an adequate functional solution and none has been widely adopted, each of these methods seems to hold great potential and to have its own advantages and disadvantages. An important step which is currently missing is standardization of the field [3,5-7]. Standardization will enable clear comparative evaluation of differences between approaches and devices, evaluation of the state and progress of each project using common benchmarks, and assessment of the proficiency level and functional skills of a given user in a way that is practical in the average clinic.
To aid in standardizing the field, we first review the existing methods and then conduct a formative study with visual rehabilitation experts to determine the needs and requirements for standardized tests. We will then suggest a potential family of simple virtual tests which meet many of these requirements and demonstrate an implementation of one such test.

II. HOW ARE VISUAL SKILLS CURRENTLY EVALUATED?
We will first review how visual skills are currently evaluated during and following visual rehabilitation. These include three main categories of tools:
1. Dedicated questionnaires.
One option is using standardized questionnaires such as VisQoL and Massof. However, these are not sensitive enough within very low vision levels. To address the need for higher sensitivity, researchers working with the Argus-2 retinal implant developed standardized questionnaires such as FLORA [8] and ULV-VFQ [9]. A common downside to using questionnaires is their subjective nature, which makes it harder to generalize across different devices and subjects [10].
ophthalmology. However, while these tests have also been used to test assistive technology [6,13-15], it is important to note that they too are not considered sensitive to differences within the ranges of extreme low vision, with reports typically focusing on whether or not users could pass the threshold of legal blindness.

3. Functional ability tests.
A third way of assessing visual skills is by testing the ability of users to perform a specific task rather than measuring a specific parameter (e.g. acuity, field of view) [9]. Most studies attempting to create new assistive tools use an ad-hoc functional test to demonstrate the tool's abilities. Many of these tasks share common themes - locating the door to a room and navigating to it, recognizing colors or specific objects, testing eye-hand coordination, finding an object or person around the user, avoiding obstacles and more (e.g. [10,14-20], Figure 1). However, none of these tasks have become standardized or have been adopted across the community, despite several impressive recent efforts (e.g. the Brainport team created a kit for assembling a standardized obstacle course [21]). A notable exception is the work done within the retinal implant community, with emphasis on the Argus-2 retinal implant, driven by FDA regulations and requirements as part of approving the implants for clinical use (review: [6,16]). These included a series of detailed tests for evaluating basic functionality - light detection (e.g. BaLM, detecting if lights are on/off), light localization, movement detection, color identification, and shape identification. However, these tests are not fully standardized even within the bionic eye community [2] and have not spread out of it to other visual rehabilitation approaches. Retinal implant researchers are also the driving force behind an ongoing attempt to create community-wide standards and guidelines for test design [22], though its current status is unclear.
Here we will focus on the third category of testing functional abilities, which is one of the main aspects potential users care about. Why have none of these tasks become standardized across all fields?

III. FORMATIVE STUDY - INTERVIEWS AND SURVEYS WITH VISUAL REHABILITATION EXPERTS
There are several excellent studies exploring the expectations and needs of potential users (e.g. [23,24]). However, there is far less information available from the complementary perspective of the developers, personal trainers and researchers who use these devices with groups of users. For example, from the perspective of standardization, a personal trainer who has trained several users will have insights into how to gauge the relative progress of each participant, a viewpoint which a single blind trainee may lack. The same holds for many developers and researchers who have personal experience working with multiple users and multiple devices. Thus, we decided to gather the opinions of these groups about the challenges delaying standardization and how to address them.

A. Methods
To understand the obstacles preventing standardization, we gathered information in several ways. First, we interviewed 12 developers and personal trainers in person in a semi-structured interview format, to understand the challenges facing standardization. We pre-defined several themes, starting with general questions about what is hampering the adoption of visual rehabilitation devices, and only then focusing on testing standardization. The main defined themes were (a) which tests they use in practice or think should be used, (b) characterizing the properties of future standardized tests, and (c) querying the specific parameters of interactivity and enjoyment, both of which we expected participants to see as crucial. To further generalize our findings and to avoid over-relying on experts from a specific country or group, we sent a shorter written survey (https://goo.gl/wwXwp6) to experts around the world, gathering responses from 4 additional teams. The participants of this study had experience working with a wide variety of devices, including the Argus 2 retinal implant, the EyeCane and UltraCane virtual canes, and the Brainport, vOICe and EyeMusic Sensory Substitution Devices.

B. Results
1. It's important, but is there something out there? All participants acknowledged the need for standardizing testing and viewed it as important (though many prioritized other issues, with an emphasis on the availability of training programs). Participants agreed in general that they would be happy to run such a test if it existed and met their practical requirements. However, while they were familiar with tests from categories 1-2, and with many non-standardized functional tests, most were not aware of a relevant functional task, or of a centralized attempt to promote one. Two participants noted that they had heard of the existence of a multi-national team led by Ayton [22] attempting to address this issue, but were not sure if this team was still active.

2. Fair test choosing.
A common problem raised was the question of who chooses the test, as each development team might be biased towards tests in which their device works best. Suggestions for how to solve this issue included choosing these tests via an independent committee, gathering community experts to decide on a standard via international consensus, or adapting functional tests from other realms (e.g. using a "water maze" paradigm to assess spatial navigation abilities). Previous attempts at creating international standards have not been successful, with 2 participants noting [22] as a potential future source of authority.
Fig. 1. (A) [14]. (B) Snellen visual acuity test using the Brainport [13]. (C) A standardized obstacle course from [21]. (D) Finding an apple with a specific color [38]. (E) Pairing socks [16].
3. Technical issues. Most of the reasons participants suggested as blocking functional standardization tests were technical in nature. Specifically, they noted that such tests are severely hampered by the complexity, size, cost, and difficulty of creating identical setups in different locations worldwide, limiting both the extent of the test and the ability to share it. Participants felt that most of the non-standardized functional tests they were aware of could currently be performed in a typical clinic setting only with extensive investments in money, time, space and manpower. Participants felt that a simpler test is better than one that will not get used, and that if the test were short enough it could be used multiple times over the course of training to track progress. The challenge of maintaining consistency across multiple sites, especially when the person running the test has not visited the other sites in which it was run, was another frequently mentioned point.
One recent impressive attempt noted by two of the participants [21] has been the creation of a dedicated portable setup kit for assessing navigation and obstacle avoidance, which can be assembled identically in different locations. However, they noted that even this setup requires a specifically sized and isolated space, and can be difficult and expensive to administer. Accordingly, they saw it as more practical for research settings than for a doctor's office.

4. Establishing baselines.
Participants noted that many behavioral results tend to be variations on "The user was able to do X". Without proper baselines, and without assessing the relative contribution of the device compared to test-retest effects without training, it is unclear how much many devices actually contribute. Thus, a good standardization test should have clearly established baselines: a group of users without the device, a wider group of users using it, and ideally also subject-specific results from previous training sessions and from attempting the task without the device.

5. Testing across devices and naturalness.
In their answers, participants split the tests into two kinds: testing the abilities of a given user and testing the functional abilities offered by a given device. Several participants noted that these may require separate tests, or at least separate runs of the same test. Tests which focus on a device need to isolate the effect of the device and control for compensatory abilities and for interaction with other devices. These tests also need to gather results from a group of users to control for biases from the abilities of outlier users. On the other hand, tests that focus on user ability should be as natural as possible and enable maximum utilization of the way the skills will be used in the real world, including relying on their combination with compensatory capabilities. Another key point in that respect was keeping the test agnostic to specific devices: a good test needs to be accessible to as many devices as possible without tweaking it. Otherwise, comparisons are problematic and the additional effort required might limit its adoption for testing new devices that do not fit it. Thus, participants stressed the need for a battery of standardized tests rather than a single one. This is because different devices, such as augmented white-canes on the one hand or visual-to-auditory sensory substitution devices on the other, have different complementary purposes and should be tested differently. For example, a test for color perception would not be suitable for an augmented cane.
6. Interactive nature. Several participants stressed the importance of interactive aspects for assessing functional abilities. For example: "To be truly functional you need to be interactive, otherwise it's more like an auditory or tactile perception task". Participants also noted that this is true for training as well, citing works where active sensing and interactive use of assistive technology boosted learning and performance (e.g. [25,26]).

7. Duration and enjoyment.
The participants had split opinions on this aspect. Some felt that user engagement and enjoyment were critical components and that having tests that were gamified and fun would enable longer tests. Others felt strongly that the tests should be as bare-bones and controlled as possible, and that as long as they were short enough user enjoyment was not a key parameter.

C. Discussion
The participants' reactions indicated that the need for standardization is there, and offered a list of recommended attributes: a test should be cheap, available in identical versions, easy to set up, easy to run, short, interactive, backed by existing baselines and agnostic to device identity. These offer us a concrete set of challenges to overcome, and can be considered recommended design principles for future standardization tasks.
Other aspects, such as a community consensus, are harder to solve, and it is hoped that a test which is easily available and generalizable enough will be able to overcome this barrier as well. A potential solution may arise from [22] or from FDA-regulated devices, as assistive technology becomes closer to being market-ready and will require functional testing to garner official approval.
As researchers working in this field, we found that most of these results matched our expectations. Notable exceptions were (1) the importance assigned to enjoyment, which for many participants was lower than we anticipated, and (2) the broader emphasis participants put on improving training in general, with many participants seeing standardizing testing as a sub-component of standardizing training.
Finally, while the participants interviewed here were experts in visual rehabilitation, most of the answers given here potentially apply to rehabilitation in general.

IV. FUNCTIONAL TESTING WITH SIMPLE VIRTUAL SCENARIOS
Guided by the formative study, we searched for different potential implementations of these principles. One possible class of tests which meets these requirements is simple desktop virtual environments which test specific functional tasks. This approach fits in with a wider trend of the growing significance of Virtual Reality in the realms of rehabilitation and the testing of a wide variety of functional abilities (e.g. [26-30]).
Specifically, the use of virtual environments has several major advantages in the context of standardized testing: (1) They are easy to share identically between centers in a fast and affordable way, by running the same software. (2) They offer control over the parameters within the environment, enabling a focus on specific aspects of a task and controlling for compensatory use of the users' other senses. (3) They do not require dedicated hardware beyond a standard computer and offer multiple flexible environments without requiring large setups and space. (4) They enable tests to be run safely. (5) They enable gamification, which allows longer and more complex testing. (6) This kind of test can be run via the visual output of a standard computer screen without requiring dedicated hardware or costs beyond those of the visual rehabilitation device itself, similar to the use of a computer screen for the standard visual ability tests mentioned above.
On the other hand, virtual environments have critical drawbacks which should be acknowledged. First and foremost, as good as a simulation may be, there is still a large gap between real-world scenarios and simulations. Current virtual reality technology does not engage all of the senses, such as taste and smell, and in the case of screen-based virtual reality also the idiothetic (internal self-motion, e.g. vestibular and proprioceptive) senses. Cost is a further limitation, especially considering that most of the blind and visually impaired population is elderly and resides in developing countries, where more expensive setups are not available. Another problematic factor for immersive VR setups is the potential for nausea ("cybersickness") in many subjects [31]. Combined, these issues suggest that screen-based VR might be better suited for clinical settings than immersive head-mounted VR.
As virtual environments are typically visually based, their use with users who are blind may seem counter-intuitive. However, as demonstrated in [26], even users who are congenitally blind can perceive, conceptualize and even feel immersed in virtual environments when the environment's content is accessible to them.

V. DEMONSTRATING OUR APPROACH WITH A "FIND THE DOOR" VIRTUAL TASK
This section demonstrates the approach of using simple virtual scenarios as standardized tests, with an implementation of a relatively common functional test - finding a door in a room. Importantly, this test is merely an example of the approach and we do not claim any special advantage for the specific implementation.
This test is freely available online at http://brain.huji.ac.il/EM_Training/ and on Github (https://github.com/shacharma/standardization). We are in the process of translating the instructions from Hebrew to other languages. We invite researchers and clinicians to explore its use and welcome input and feedback.

1) Participants.
Eight congenitally blind and 40 sighted participants took part in the experiment. The sighted participants were divided into three groups: 10 participants were blindfolded and did not use any assistive technology ("NoAssistiveDevice" group), 15 participants performed the task visually ("Visual" group) and 15 participants were blindfolded and used EyeMusic ("Blindfolded-Sighted" group). The experiment was approved by the Hebrew University Ethics committee and all participants gave signed informed consent.

2) What are we testing here?
The main functional goal is finding and navigating to a door. Underlying this functional goal are several abilities:
a. Locating the door. In this case, finding a target on a high-contrast background.
b. Using visual principles - different shapes from different angles and distances. A door is a rectangle only when viewed from directly ahead. From any other angle, the shape becomes skewed. The user must understand that this shape is still the door and use the specific shape in order to understand at what angle they are from it. This also incorporates changes in shape during motion.
c. Spatial representation - understanding where to situate yourself, the target and the distance to it, and then planning a travel vector to it accordingly.
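
The visual principle in item (b) can be illustrated numerically. The following is a minimal sketch, assuming an idealized pinhole projection; the function name, the door dimensions and the focal length are hypothetical and are not taken from the test software. It shows that the two vertical edges of a door project to equal heights only when the door is viewed head-on, and that the projected shape grows as the viewer approaches.

```python
import math

def projected_door_edges(distance, angle_deg, door_width=1.0,
                         door_height=2.0, focal=1.0):
    """Project a door with an idealized pinhole camera (illustrative only).

    The viewer is at the origin looking along +z. The door centre is
    `distance` straight ahead, and the door is rotated by `angle_deg`
    about its vertical axis (0 means it is viewed head-on).
    Returns (left_edge_height, right_edge_height, width) on the image plane.
    """
    a = math.radians(angle_deg)
    half = door_width / 2.0
    # (x, z) positions of the left and right vertical edges of the door.
    left = (-half * math.cos(a), distance - half * math.sin(a))
    right = (half * math.cos(a), distance + half * math.sin(a))
    # Pinhole projection: image size scales with 1 / depth.
    left_h = focal * door_height / left[1]
    right_h = focal * door_height / right[1]
    width = focal * right[0] / right[1] - focal * left[0] / left[1]
    return left_h, right_h, width

# Head-on, both edges have the same height (a rectangle); at an oblique
# angle the nearer edge is taller (a trapezoid) and the door looks narrower.
head_on = projected_door_edges(4.0, 0.0)
oblique = projected_door_edges(4.0, 60.0)
```

Under these assumptions, the ratio between the two edge heights carries the information about viewing angle that a user would need to extract, whatever device conveys the image.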

3) Experimental paradigm -the test.
Users received the following instruction: "You are standing in a random location in a room. All of the walls are white and there is a black door somewhere in the wall in front of you. Find it and navigate to it as swiftly as possible without touching any of the other walls. You have either 10 minutes or thirty trials, whichever ends first". They then underwent two training trials with feedback from the instructor which focused on the keyboard controls. Finally, they performed the task itself. In every trial, they were randomly placed in one of five potential starting locations and had to find the door. Touching the walls of the environment ended the trial, with the participant's performance in the trial scored as a success if they reached a door and as a failure if they reached a wall. The visual group performed this task visually. The blind and sighted-blindfolded groups performed the task via the EyeMusic, and the NoAssistiveDevice group performed this task without assistive devices or visual information. The NoAssistiveDevice group was included as a control for verifying users did not find another way to solve the task and to establish a chance level. During the task, no feedback was given by the instructor beyond that supplied automatically by the test software indicating success/fail and the start of a new level. Following task performance, the participants filled out a short questionnaire about their experience.
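
The stopping rule and scoring described above can be sketched as follows. This is an illustrative outline only, not the released Unity implementation: `run_trial` is a hypothetical callback standing in for one interactive trial, and the choice to count the trial in progress when the time limit is reached is an assumption.

```python
import random

def run_session(run_trial, max_trials=30, max_seconds=600.0,
                n_starts=5, seed=None):
    """Run trials until 30 trials or 10 minutes, whichever comes first.

    `run_trial(start_index)` is a hypothetical callback that runs one trial
    from the given starting location and returns (touched, duration), where
    `touched` is "door" or "wall" and `duration` is in seconds.
    """
    rng = random.Random(seed)
    elapsed = 0.0
    outcomes = []
    while len(outcomes) < max_trials and elapsed < max_seconds:
        start = rng.randrange(n_starts)       # one of five starting locations
        touched, duration = run_trial(start)
        elapsed += duration
        outcomes.append(touched == "door")    # success only if the door is reached
    success_rate = sum(outcomes) / len(outcomes) if outcomes else 0.0
    return outcomes, success_rate
```

Passing the trial as a callback keeps the stopping rule separate from the rendering and input handling, so the same session logic could in principle wrap any of the virtual tasks.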

4) Experimental setup.
The test requires only a standard monitor output and a standard keyboard for an interface.
Each specific visual aid comes with its own specific requirements. Here, this included headphones and an installed version of EyeMusic (see "EyeMusic" below) activated in screen-sonification mode. In this mode, EyeMusic sonifies the on-screen content with no direct interface to the software generating that content. Importantly, the task could also be performed simply by aiming a camera at the screen. We verified that this test could also be run via the vOICe SSD in screen-sonification mode. The experiment was created using Unity3D and JavaScript.

5) EyeMusic.
To demonstrate our test we used EyeMusic, which is described in depth in [19]. EyeMusic is a visual-to-auditory Sensory Substitution Device conveying whole-scene location, shape and color information. It does so by using a left-to-right sweep-line technique, which translates the X-axis into time and the Y-axis into musical notes on a blues scale. Different colors are conveyed by different musical instruments to create relatively pleasant auditory stimuli. The captured image can be taken from a file, from a camera, or, as done here, from continuous automatic screenshots of the display. This device has already been used for a series of experiments including object and shape recognition, target reaching and exploring the neural correlates of number representation, for tasks in virtual environments and for real-world tasks such as finding a specific item in a noisy supermarket [19,26,32].
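
To make the sweep-line mapping concrete, the sketch below converts a small grid of color labels into a list of note events. It is not EyeMusic's implementation (see [19] for that); the base note, sweep duration, pitch direction (higher rows mapped to higher notes), treatment of empty pixels as silence, and the color-to-instrument table are all assumptions made for illustration.

```python
# Semitone offsets of a minor blues scale within one octave.
BLUES_STEPS = (0, 3, 5, 6, 7, 10)

# Hypothetical colour-to-instrument table, for illustration only.
DEFAULT_INSTRUMENTS = {"white": "piano", "blue": "trumpet"}

def row_to_midi(row, n_rows, base_note=48):
    """Map an image row (0 = top) to a MIDI pitch on the blues scale.

    Assumes rows nearer the top of the image get higher notes.
    """
    level = n_rows - 1 - row
    octave, degree = divmod(level, len(BLUES_STEPS))
    return base_note + 12 * octave + BLUES_STEPS[degree]

def sonify(image, sweep_seconds=2.0, instruments=None):
    """Sweep the image left to right and emit (onset, pitch, instrument).

    `image` is a list of rows; each pixel is a colour label or None,
    where None is treated as silence (an assumption).
    """
    instruments = instruments or DEFAULT_INSTRUMENTS
    n_rows, n_cols = len(image), len(image[0])
    column_seconds = sweep_seconds / n_cols   # X-axis becomes time
    events = []
    for col in range(n_cols):
        for row in range(n_rows):
            colour = image[row][col]
            if colour is None:
                continue
            events.append((col * column_seconds,        # onset time
                           row_to_midi(row, n_rows),    # Y-axis becomes pitch
                           instruments[colour]))        # colour becomes timbre
    return events

events = sonify([[None, "white"],
                 ["blue", None]])
```

In this toy image, the blue pixel at the bottom left produces an early, low note and the white pixel at the top right a later, higher one, which is the kind of cue a user would rely on to locate a target in the scene.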

6) Statistics.
Due to the limited sample size, all analyses were nonparametric Wilcoxon tests (unpaired rank-sum). All statistics have been corrected for multiple comparisons (Bonferroni).
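
For readers who wish to reproduce this type of analysis, the following is a minimal sketch of an unpaired rank-sum test followed by a Bonferroni correction. It assumes the normal approximation without continuity or tie correction, which may differ from the exact procedure used for the reported results; the function names are illustrative.

```python
import math

def rank_sum_test(x, y):
    """Two-sided Wilcoxon rank-sum test using the normal approximation.

    Tied values receive their average rank. No continuity or tie
    correction is applied (an assumption of this sketch).
    Returns (z, p_value).
    """
    values = sorted(list(x) + list(y))
    rank_of = {}
    i = 0
    while i < len(values):
        j = i
        while j < len(values) and values[j] == values[i]:
            j += 1
        rank_of[values[i]] = (i + 1 + j) / 2.0   # mean of ranks i+1 .. j
        i = j
    n1, n2 = len(x), len(y)
    w = sum(rank_of[v] for v in x)               # rank sum of first sample
    mean = n1 * (n1 + n2 + 1) / 2.0
    sd = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12.0)
    z = (w - mean) / sd
    p = math.erfc(abs(z) / math.sqrt(2.0))       # two-sided p-value
    return z, p

def bonferroni(p_values):
    """Multiply each p-value by the number of comparisons, capped at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]
```

With per-participant success rates for each group, one would call `rank_sum_test` once per comparison against the control group and pass the resulting p-values through `bonferroni` before comparing them with the significance threshold.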

B. Results
We first established the chance and ceiling levels. The chance level was established by the NoAssistiveDevice control group, which had a success rate of 15±8% setting a lower potential bound. The sighted group which performed the task visually had a nearly perfect success rate of 99±3% offering an upper potential bound for comparison. The two groups using the EyeMusic had similar results with 66±19% for the blind and 56±32% for the sighted-blindfolded, both significantly above chance (p<0.0001 and p<0.001 respectively, Figure 3).
We then visualized the paths taken by the users via heat maps of their location-duration. The heat maps (Figure 4) show that the visual group went clearly from the five starting locations to the target. These paths are also visible, though to a lesser extent, in the maps of the Blind and Blindfolded-Sighted groups, but do not exist at all in the map of the NoAssistiveDevice group.
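
A location-duration heat map of this kind can be computed by accumulating the time spent in each cell of a top-down grid. The sketch below assumes positions logged at a fixed time step; the room size, grid resolution and sampling interval are hypothetical values, not those of the test software.

```python
def occupancy_map(path, room_size=(10.0, 10.0), grid=(20, 20), dt=0.1):
    """Accumulate time spent per grid cell, seen from above.

    `path` is a sequence of (x, y) positions sampled every `dt` seconds.
    Returns a list of rows, where heat[r][c] is the time in seconds spent
    in that cell. Positions outside the room are clamped to the edge cells.
    """
    cols, rows = grid
    width, depth = room_size
    heat = [[0.0] * cols for _ in range(rows)]
    for x, y in path:
        c = min(cols - 1, max(0, int(x / width * cols)))
        r = min(rows - 1, max(0, int(y / depth * rows)))
        heat[r][c] += dt
    return heat
```

Summing the maps of all participants in a group gives one image per group, in which frequently travelled routes appear as high-valued cells.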
We next wanted to verify that the task was not arduous and that even this minimalistic environment was enough for immersion. We did so by having the participants complete a questionnaire after performing the task. Enjoyment: reactions by the users who were blind or sighted-blindfolded were generally positive ("this is a fun game, could be any target, not just a door" user B1, "Could be a great game" user SB14). Immersion: users who were sighted, blind and blindfolded-sighted reported feeling immersed (3.3±1, 4.5 and 3.3±1.4 respectively, on a scale of 1-5 where 1 is not at all and 5 is immersed; "really cool, you feel like you're in a room" user SB1, "I could see myself looking for the door" user SB11), in contrast to NoAssistiveDevice users, who reported that they did not feel immersed.

C. Discussion
We demonstrated our suggestion of using simple desktop virtual environments for functional standardization tests via a simple virtual task. This task had a maximum duration of 10 minutes and did not require any specialized equipment. Our results demonstrate that congenitally blind and blindfolded-sighted users could perform it well over a controlled baseline of attempting the task without the device. Their movement patterns indicated an understanding of their environment, and they reported that the task was pleasant and immersive.
The blind and blindfolded users performed the same task but faced very different challenges. The main challenge for the blindfolded was learning to use this new device despite only ~15 minutes of training. For the blind, however, especially for the veteran users, the main challenge was not understanding the device but rather learning to parse the visual information - in a way, learning to see. They needed to learn how the target's shape changes from a rectangle to a trapezoid depending on the viewing angle, and how the target changes with distance, growing as they got closer to it.
The blindfolded-sighted users had less than 15 minutes of experience with sensory substitution devices (SSDs) and were still able to complete the task with an impressive, though far from perfect, success rate. This indicates that a basic understanding of the EyeMusic is simple to acquire, but that a higher level of proficiency is difficult to attain. One of the oft-mentioned main disadvantages of SSDs is the difficulty in learning to use them, and performing any practical task with them is often thought to require significant periods of training. Here we demonstrated that even after only 15 minutes of training with a relatively complex SSD, blindfolded-sighted users are able to successfully complete this task. This was despite the need to utilize a variety of visual skills based on the auditory information - moving through an environment, perceiving the space and target, perceiving the changes in the target from different angles, etc. We suggest that this may be the result of the active sensing aspect of this task, providing another demonstration of the importance of active interaction with training tasks [25,26].
Fig. 3. Virtual door test results: success rates (±SD) for the 4 different groups. The 2 groups using the EyeMusic are compared to the control group without any assistive device and are significantly above the bar set by it, though without reaching the bar set by the visual group.

VI. GENERAL DISCUSSION
Here, we suggested that simple virtual tasks may be useful as a set of standardized functional tasks. We based this suggestion on a list of principles for design which arose from interviews with visual rehabilitation researchers, developers, and personal instructors. Finally, we demonstrated our approach with one such simple virtual task.
We are not the first to think of using virtual environments for testing visual rehabilitation tools. Beyond our own work (e.g. [26]), several other teams have also made use of virtual tasks on an ad-hoc basis (e.g. [6], Figure 5). Indeed, we would recommend integrating these existing tasks into a common toolkit, after adapting them to meet the requirements discussed above. We do not claim any specific advantage for our example task and indeed, as suggested above, an ideal toolbox will cover many different functional aspects. We are currently in the process of adding several more tasks, including a virtual version of [21] and tests of specific features based on the experiment used in [26]. We also wish to emphasize that this combined toolbox of virtual tasks will still need to be complemented by real-world functional tasks as described above.
The critical next step for this demonstration, and for other tasks which will be added, is integrating them into practical use. This includes using them to track user progress over time in a longitudinal study (currently underway in our lab) and to compare multiple devices and visual rehabilitation approaches.
Beyond the advantages of using virtual environments for functional standardization testing, they also have great potential for training, which was another key factor mentioned by our participants in Section III (and see review of this aspect in [5]).
Fig. 4. Heat-maps for the full routes of all participants from each group. All routes are plotted as seen from above. Note that in the visual group the paths from the 5 starting locations to the target door are clear. These 5 main paths can be discerned, though less clearly, also in the maps for the blind and sighted-blindfolded groups, but do not exist at all in the NoAssistiveDevice group.
Training in virtual environments offers users practice on their general skills with the device, and the ability to train safely from their own homes. This is especially important given that in recent years several teams have demonstrated the potential for transferring learned information and skills from virtual to real environments [33-35].
Note that our suggestion here focuses on standard desktop monitors and not on head-mounted displays. Head-mounted displays are more immersive and cutting edge, and most importantly have a much higher ecological validity. However, they are much harder to interact with via current accessibility tools, and require more space, costs and expertise to use. This significantly lowers their availability and the ease of using them identically in different locations. In the future, as these advanced setups become more commonplace these limits will likely be significantly mitigated. Another key future addition will be the use of better motor interfaces, enabling more ecologically valid scenarios in both types of VR.

VII. CONCLUSION AND FUTURE WORK
Here we outlined the need for standardization tests for the visual rehabilitation community. We reviewed the methods currently used, and then interviewed a group of personal visual rehabilitation instructors and surveyed experts. These affirmed the need for standardization and suggested several practical requirements for standardized tests. While the participants interviewed here were experts in visual rehabilitation, most of the answers given here potentially apply to rehabilitation in general. We suggested that simple desktop virtual environments may prove a useful tool for standardized tests of visual rehabilitation methods, with advantages such as ease of sharing, affordability, wide range of options, transfer of information, safety and the ability to isolate specific visual parameters. Finally, we presented an implementation of one such task and demonstrated its use. We call upon the visual rehabilitation community to choose several such scenarios and use them in a combined toolbox with the different approaches to better define the strong and weak points of each method.