Vamsi K Ithapu
My primary research interests are machine learning (ML) and computer vision (CV).
The key technology areas that motivate my work include:
- AI for health and wellness
- AI systems that enhance human perception & reduce cognitive load
- AI for individual & ecological wellbeing
In the past decade, my technical contributions have spanned various AI domains:
- Multimodal ML for conversational & social understanding
- Egocentric (first-person) computer vision
- Human-in-the-loop ML systems
- Personalization of audio-visual content
- On-device multimodal machine learning (specifically for wearables)
- Superhuman hearing & wearables-based hearing aids
- Enhancing the perceptual realism of audio/sounds in virtualization
- Assistants for wearables personalization
@Meta
At Meta Reality Labs Research, my focus is on demonstrating the wearables form factor (smart glasses and AR in particular) as a medium to improve human health and wellbeing, especially in conversational and social scenarios.
- How can we use wearables to enhance human hearing capacity (e.g., enabling conversations in high-noise settings such as restaurants and social events, or enhancing hearing capacity for hearing-impaired individuals)?
- How can we make virtual conversations perceptually realistic, and thereby increase social connection and presence?
- How can we reduce the cognitive load of audio and sounds in daily settings with wearables and Meta AI?
- What general-purpose multimodal AI systems (camera + microphone + gyroscope) are needed to discover and drive audio-driven use cases on the wearables form factor?
My team's approach to answering these questions has been to innovate on all-day egocentric AI, grounded in physics- and domain-driven perceptual and cognitive knowledge, together with end-to-end system-level AI optimization.
My broader org at Reality Labs Research discovers, designs, develops, and builds egocentric multi-sensory systems for audio & conversational experiences by bringing together always-on AI/DSP systems, wearable device physics, and human auditory perception.
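For readers who want a concrete picture of what a "general-purpose multimodal AI system for wearables" can look like, below is a minimal, hypothetical sketch of late fusion over audio, video, and motion-sensor embeddings. This is not our production system: every module name, dimension, and the task head are assumptions chosen purely for illustration.

```python
# Purely illustrative sketch: late fusion of audio, video, and IMU embeddings
# for an egocentric task such as "is someone speaking to the wearer?".
# All architecture choices here are hypothetical, not Meta's actual system.
import torch
import torch.nn as nn


class EgocentricFusion(nn.Module):
    """Projects per-modality features into a shared space and mixes them."""

    def __init__(self, audio_dim=128, video_dim=256, imu_dim=32, fused_dim=128):
        super().__init__()
        # One lightweight projection per sensor stream.
        self.audio_proj = nn.Linear(audio_dim, fused_dim)
        self.video_proj = nn.Linear(video_dim, fused_dim)
        self.imu_proj = nn.Linear(imu_dim, fused_dim)
        # Simple late fusion: concatenate projections, then mix to one logit.
        self.mixer = nn.Sequential(
            nn.Linear(3 * fused_dim, fused_dim),
            nn.ReLU(),
            nn.Linear(fused_dim, 1),  # e.g., a single "speaking to me" logit
        )

    def forward(self, audio_feat, video_feat, imu_feat):
        z = torch.cat(
            [self.audio_proj(audio_feat),
             self.video_proj(video_feat),
             self.imu_proj(imu_feat)],
            dim=-1,
        )
        return self.mixer(z)


if __name__ == "__main__":
    model = EgocentricFusion()
    # A batch of 4 synthetic per-frame embeddings from each sensor stream.
    logits = model(torch.randn(4, 128), torch.randn(4, 256), torch.randn(4, 32))
    print(logits.shape)  # torch.Size([4, 1])
```

In a real always-on wearable setting, these toy projections would be replaced by efficiency-constrained encoders, and the fusion would run under strict power, memory, and latency budgets.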
Some references to the work done by me and my team over the past decade:
Conversation Focus on Smart glasses
-- Connect 2025 https://www.youtube.com/live/D97ILdUbYww?si=7qQE0B9k_5YphDQz&t=2292
-- Connect dev keynote https://www.youtube.com/watch?v=KWdUxc24dIw&t=4930s
Inside Facebook Reality Labs Audio: The future of Audio
-- https://about.fb.com/news/2020/09/facebook-reality-labs-research-future-of-audio/
Ego4d Reveal Session at EPIC Workshop, ICCV 2021
-- https://youtu.be/2dau0W0NVQY?si=mWhfdjzpGcwcKfHJ&t=3302
CVPR 2023 Sight & Sound workshop talk
-- https://www.youtube.com/live/6TaZT2u1jJ8?si=_O8tCl2y_2ppOCfi&t=32716
Open-source Datasets
My group has also played a vital role in building large-scale, open-sourceable datasets for egocentric machine perception research. Check out the following datasets and benchmarks we have designed over the past few years.
Ego4D: Around the World in 3,000 Hours of Egocentric Video
@grad-school
Prior to moving to Meta Reality Labs, I worked primarily in the following two application domains, building novel machine learning and computer vision models and systems.
- Brain imaging and clinical trial design
- Multi-modal predictive modeling of Alzheimer's disease