My research interests are in the areas of Human-Computer Interaction, Accessibility, and Human-Centered AI. I design interactive systems that empower visually impaired people to access and engage with visual information through AI-augmented human collaboration. Through a User-Centered Design approach, I investigate how AI can enhance and transform collaborative interactions between visually impaired users and their sighted assistants. These investigations, funded by the National Institutes of Health (NIH) and encapsulated in the project “Conversations for Vision: Human-Computer Synergies in Prosthetic Interactions,” advance three areas: 1) computer-vision mediated spatial navigation that augments environmental awareness, localization, and orientation, 2) privacy-aware visual communication that balances user autonomy with effective assistance, and 3) human-AI interactions that expand and enrich engagement in creative, intellectual, and experiential activities. Together, these research directions are redefining how visually impaired people participate in traditionally visual-centric activities across diverse real-world contexts.
I design systems to improve sighted-blind interactions through Remote Sighted Assistance (RSA), a paradigm that connects a visually impaired user (“user” for short) with a remotely located sighted assistant (“assistant” for short) via live video streaming. While RSA enables real-time support ranging from simple text reading to complex navigation, its effectiveness in spatial assistance is limited by challenges in acquiring users’ environmental information and in orienting and localizing users [1]. To address these challenges, I integrate Computer Vision techniques to interpret and augment real-world visual information.
My research began by exploring the design space through a low-fidelity, pictorial prototype using scenario-based design [2]. The prototype enhanced users’ video feeds with distance estimation for collision avoidance and created 3D environmental maps for localization and orientation (Figure 1). Through interviews with 13 professional assistants from Aira (aira.io), I iteratively refined the prototype design. The findings revealed the potential of Computer Vision-mediated RSA systems to empower assistants with spatial knowledge and reduce cognitive load, while highlighting the critical role of human expertise in managing navigation safety.
Building on these insights, I implemented a high-fidelity, interactive RSA system featuring 3D maps with advanced localization and virtual element placement capabilities (Figure 2). I evaluated this system against traditional 2D map-based RSA through a controlled laboratory study with 13 untrained assistants [3]. The results demonstrated significant performance improvements from detailed environmental visualization: task completion time fell by 30% for locating rooms and by 51% for identifying unmarked landmarks. These findings suggest that improved situational awareness can transform sighted-blind collaboration, enabling even non-professional assistants to guide users safely and efficiently.
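To give a concrete sense of the virtual element placement described above, the minimal Python sketch below shows how a labeled landmark stored at a known 3D map coordinate could be projected into the live camera view given an estimated camera pose. The pose convention, intrinsics, and function name are illustrative assumptions, not the system’s actual implementation.

```python
import numpy as np

def project_landmark(point_world, R, t, K):
    """Project a 3D landmark from map (world) coordinates into pixel coordinates.

    point_world: (3,) landmark position in the 3D map frame
    R, t:        world-to-camera rotation (3x3) and translation (3,)
    K:           3x3 camera intrinsics
    Returns (u, v) pixel coordinates, or None if the landmark is behind the camera.
    """
    p_cam = R @ point_world + t
    if p_cam[2] <= 0:                      # behind the camera; nothing to overlay
        return None
    u, v, w = K @ p_cam
    return u / w, v / w

if __name__ == "__main__":
    K = np.array([[600.0, 0.0, 320.0],     # assumed intrinsics for a 640x480 feed
                  [0.0, 600.0, 240.0],
                  [0.0, 0.0, 1.0]])
    R, t = np.eye(3), np.zeros(3)          # camera currently at the map origin
    door = np.array([0.5, 0.0, 3.0])       # e.g., a virtual marker on a door 3 m ahead
    print(project_landmark(door, R, t, K)) # approximate on-screen position of the marker
```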
Privacy challenges emerge during RSA interactions when users inadvertently share sensitive visuals in their surroundings. To address this concern, I developed BubbleCam, a high-fidelity system enabling dynamic visual anonymization [4]. Unlike previous asynchronous, image-based approaches, this solution empowers users to selectively obscure objects beyond specified distances, preserving privacy during real-time video assistance (Figure 3). An exploratory field study demonstrated BubbleCam’s effectiveness (Figure 4), with 22 of 24 Be My Eyes (bemyeyes.com) participants endorsing its privacy controls. The system allows users to conceal personal items, messy areas, or bystanders, while helping assistants focus on task-relevant visual information.
BubbleCam represents a collaborative approach to privacy preservation in sighted-blind interactions, where both parties engage in establishing and maintaining privacy. This marks a significant advancement over regular RSA services, which often require users to sacrifice privacy in exchange for assistance. Moreover, BubbleCam introduces a novel arrangement for managing visual privacy, transforming what was traditionally an individual task of privacy maintenance into a cooperative and engaging experience.
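To illustrate the distance-based obscuring idea concretely, the sketch below blurs every pixel whose estimated depth exceeds a user-chosen radius. This is a simplified illustration under assumed inputs (a per-pixel depth map and Gaussian blurring), not BubbleCam’s actual implementation.

```python
import cv2
import numpy as np

def apply_privacy_bubble(frame, depth_m, radius_m=2.0, blur_ksize=51):
    """Blur every pixel whose estimated depth exceeds the user-chosen radius.

    frame:    H x W x 3 BGR image from the live video feed
    depth_m:  H x W per-pixel depth estimate in meters (e.g., from a depth
              camera or a monocular depth-estimation model)
    radius_m: the "bubble" radius; content farther away is obscured
    """
    blurred = cv2.GaussianBlur(frame, (blur_ksize, blur_ksize), 0)
    outside_bubble = depth_m > radius_m          # True where content should be hidden
    mask = outside_bubble[..., None]             # broadcast over the color channels
    return np.where(mask, blurred, frame)

if __name__ == "__main__":
    # Synthetic frame and depth map; a real system would read camera and depth streams.
    frame = np.random.randint(0, 256, (480, 640, 3), dtype=np.uint8)
    depth = np.random.uniform(0.5, 5.0, (480, 640)).astype(np.float32)
    protected = apply_privacy_bubble(frame, depth, radius_m=2.0)
    print(protected.shape)  # (480, 640, 3)
```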
Recent AI-powered tools built on large vision-language models, such as Be My AI, can understand users’ inquiries in natural language and describe scenes in text that can be read aloud. I investigated whether such AI-powered visual assistance alone could support users in complex activities and potentially substitute for the human-powered RSA paradigm [5]. This investigation extended beyond usability and accessibility assessments to pursue higher-level understandings that, even as specific technologies evolve, can still provide foundational knowledge and guidance for AI-powered assistive systems. Through interviews with 14 users and analysis of real-world cases from 4 social media platforms, this study uncovered both capabilities and limitations of AI-powered tools in assisting users. Be My AI’s context-aware capabilities are undermined by hallucinations and by subjective or inaccurate interpretations of social and stylistic contexts and of people’s identities. Unlike human assistants in RSA services, Be My AI’s intent-oriented capabilities fall short of consistently comprehending and acting on users’ intentions conveyed through real-time feedback. These findings suggest that current AI systems cannot fully replicate human capabilities, whether users’ orientation and mobility skills and targeted prompting, or assistants’ contextual, nuanced understanding of visual scenes.
Given these insights, I explored how human-AI interaction among the user, assistant, and AI tool could enhance assistive support for complex activities. This approach expanded traditional two-way RSA into a multi-way conversation paradigm with multiple interaction channels. I first investigated this paradigm through paired-volunteer RSA [6], where two assistants collaborated to support one user (Figure 5). An exploratory study with 9 users and 8 assistants from Be My Eyes demonstrated that multiple interaction channels enabled richer collaboration, allowing assistants to contribute more perspectives and more knowledge. This arrangement extended two-way RSA to a broader range of intellectual and experiential tasks, from art appreciation and creative crafting to fashion help. However, the study also revealed the importance of coordinated communication between assistants, as simultaneous information delivery created audio overload for the user.
Building on these findings, my current research explores the transition from multi-way, human-only collaboration (paired-volunteer RSA) to human-AI interaction by replacing one human assistant with an AI entity. This evolution examines the handoff among the user, assistant, and AI tool, including (i) task allocation: determining which tasks are better handled by the assistant versus the AI based on accuracy, efficiency, and user satisfaction; (ii) task transition mechanisms: identifying when and how to shift tasks between the assistant and the AI based on complexity and required expertise; (iii) collaborative frameworks: developing effective methods for assistant-AI collaboration on shared tasks; and (iv) intervention protocols: establishing when and how the assistant or the AI should deliver information to the user to prevent information overload. These questions probe the synergy of human creativity, emotional intelligence, and contextual judgment with AI’s knowledge base and analytical capabilities. The goal is to advance human-AI co-design in accessibility, creating enriched experiences for visually impaired individuals across various leisure, creative, and intellectual activities.
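To make the task-allocation and intervention questions concrete, the toy Python sketch below routes a request to either the AI or the human assistant. The criteria, threshold, and data fields are purely illustrative assumptions about one possible policy, not a finished design.

```python
from dataclasses import dataclass
from enum import Enum, auto

class Handler(Enum):
    AI = auto()
    ASSISTANT = auto()

@dataclass
class Request:
    description: str
    ai_confidence: float        # AI's self-reported confidence, 0..1 (assumed available)
    socially_nuanced: bool      # e.g., style, identity, or social-context judgments
    safety_critical: bool       # e.g., street crossings, obstacle avoidance

def allocate(req: Request, confidence_threshold: float = 0.8) -> Handler:
    """Toy policy: route safety-critical, socially nuanced, or low-confidence
    requests to the human assistant; let the AI handle the rest."""
    if req.safety_critical or req.socially_nuanced:
        return Handler.ASSISTANT
    if req.ai_confidence < confidence_threshold:
        return Handler.ASSISTANT
    return Handler.AI

if __name__ == "__main__":
    examples = [
        Request("Read the label on this can", 0.95, socially_nuanced=False, safety_critical=False),
        Request("Does this outfit suit a job interview?", 0.90, socially_nuanced=True, safety_critical=False),
        Request("Guide me across this intersection", 0.70, socially_nuanced=False, safety_critical=True),
    ]
    for r in examples:
        print(f"{r.description!r} -> {allocate(r).name}")
```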
[1] Sooyeon Lee*, Rui Yu*, Jingyi Xie, Syed M Billah, and John M. Carroll. (2022). Opportunities for Human-AI Collaboration in Remote Sighted Assistance. In Proceedings of the 27th International Conference on Intelligent User Interfaces (IUI '22).
[2] Jingyi Xie, Madison Reddie, Sooyeon Lee, Syed M Billah, Zihan Zhou, Chun-Hua Tsai, and John M. Carroll. (2022). Iterative Design and Prototyping of Computer Vision Mediated Remote Sighted Assistance. ACM Transactions on Computer-Human Interaction (TOCHI), 29(4), 1-40.
[3] Jingyi Xie*, Rui Yu*, Sooyeon Lee, Yao Lyu, Syed M Billah, and John M. Carroll. (2022). Helping Helpers: Supporting Volunteers in Remote Sighted Assistance with Augmented Reality Maps. In Proceedings of the 2022 ACM Designing Interactive Systems Conference (DIS '22).
[4] Jingyi Xie*, Rui Yu*, He Zhang, Sooyeon Lee, Syed M Billah, and John M. Carroll. (2024). BubbleCam: Engaging Privacy in Remote Sighted Assistance. In Proceedings of the CHI Conference on Human Factors in Computing Systems (CHI '24).
[5] Jingyi Xie, Rui Yu, He Zhang, Sooyeon Lee, Syed M Billah, and John M. Carroll. (2025). Beyond Visual Perception: Insights from Smartphone Interaction of Visually Impaired Users with Large Multimodal Models. In Proceedings of the CHI Conference on Human Factors in Computing Systems (CHI '25).
[6] Jingyi Xie, Rui Yu, Kaiming Cui, Sooyeon Lee, John M. Carroll, and Syed M Billah. (2023). Are Two Heads Better than One? Investigating Remote Sighted Assistance with Paired Volunteers. In Proceedings of the 2023 ACM Designing Interactive Systems Conference (DIS '23).