Accessibility wasn’t the starting point, but the more I work on this the more it feels like a natural fit.
On nonverbal inputs, I’ve focused on gaze and gestures so far. I’ve thought about things like head nods or blink patterns for simple confirm/cancel, but not explored them deeply yet.
Right now the main challenge is keeping everything reliable without adding too much complexity.
For motor disability accessibility, the architecture advantage is real. Most assistive tech sits outside the app and navigates by DOM tree or pixel position, which is brittle. Since you're inside the React tree, you could expose semantic actions — not just "click the third button" but "open this ticket" — which is what users actually want to do. That beats anything screen readers offer today.
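A minimal sketch of what exposing semantic actions from inside the app could look like (everything here, the `ActionRegistry` name, the verb/target shape, is hypothetical, just to make the contrast with DOM-position navigation concrete):

```typescript
// Hypothetical sketch: components register semantic actions instead of
// relying on external tools to guess from the DOM tree or pixel positions.
type SemanticAction = {
  verb: string;      // what the user wants to do, e.g. "open"
  targetId: string;  // the domain object, e.g. a ticket id
  run: () => string; // the app-level handler (returns a label for the demo)
};

class ActionRegistry {
  private actions: SemanticAction[] = [];

  // A component (e.g. a ticket row) exposes its capabilities on mount.
  register(action: SemanticAction): void {
    this.actions.push(action);
  }

  // "open this ticket" resolves to a registered capability,
  // not to "click the third button".
  resolve(verb: string, targetId: string): SemanticAction | undefined {
    return this.actions.find(a => a.verb === verb && a.targetId === targetId);
  }
}

const registry = new ActionRegistry();
registry.register({
  verb: "open",
  targetId: "ticket-42",
  run: () => "opened ticket-42",
});

console.log(registry.resolve("open", "ticket-42")?.run()); // "opened ticket-42"
```

The point is that the unit of interaction is a domain capability, so an assistive input channel never has to reverse-engineer the screen.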
And it's not just permanent disability. Temporary and situational cases are everywhere and constantly overlooked — a parent holding a child, someone with a broken arm, post-surgery recovery. These people aren't going to install a full assistive tech stack for a few weeks or a few minutes. But gaze + voice built into the app they're already using? That's zero-friction.
The real value is combining inputs. Gaze to set context, voice for commands, and simple nonverbal signals (blink, nod) for confirm/cancel. That covers users who have voice but limited mobility and users who have gaze control but inconsistent speech. Most assistive tools force you to pick one input mode. Having all three with shared app context is the differentiator.
Even starting with head nod as a binary yes/no would unlock a lot. Reduces the voice dependency for simple interactions and makes the whole system more resilient when one input channel is unreliable.
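The fusion of the three channels can be sketched as a tiny state machine (names and shapes are made up for illustration): gaze sets the target, voice holds a pending command, and a binary head gesture closes or cancels the loop.

```typescript
// Hypothetical sketch of fusing three input channels: gaze picks the target,
// voice supplies the command, a nod or head shake confirms or cancels.
type Fusion = {
  gazeTarget?: string;
  pendingCommand?: string;
};

function onGaze(state: Fusion, targetId: string): Fusion {
  return { ...state, gazeTarget: targetId };
}

function onVoice(state: Fusion, command: string): Fusion {
  // A command is held as pending until something confirms it.
  return { ...state, pendingCommand: command };
}

// Nod = yes, shake = no. A binary head gesture is enough to close the loop
// without requiring another spoken confirmation.
function onHeadGesture(state: Fusion, gesture: "nod" | "shake"): string {
  if (gesture === "shake" || !state.gazeTarget || !state.pendingCommand) {
    return "cancelled";
  }
  return `${state.pendingCommand} ${state.gazeTarget}`;
}

let s: Fusion = {};
s = onGaze(s, "ticket-42");
s = onVoice(s, "open");
console.log(onHeadGesture(s, "nod")); // "open ticket-42"
```

Note the resilience property: if either gaze or voice is missing, the nod safely cancels instead of executing a half-formed command.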
Really appreciate this! This is one of the strongest framings I’ve seen of where this could go.
The semantic action point is exactly where I think the architecture wants to evolve: less “infer everything from the DOM,” more explicit app-level capabilities like opening, filtering, confirming, assigning, etc.
And I think you’re right on the temporary / situational accessibility angle too. I didn’t start from accessibility, but the more I build this, the more it feels like a natural fit for those cases because it removes the need to install a separate assistive stack.
Head nod as a simple yes/no is also a very interesting idea. I probably wouldn’t start there before hardening the core loop, but it feels like a strong extension once the underlying interaction model is solid.
Oh, one other idea that popped into my mind: using facial and vocal emotion data to drive supportive interactions. One thing that's lost in a lot of these tools is guiding folks when the action taken isn't the one they expected. I think back to when I was trying to get Google Assistant to play a particular song and it kept getting it wrong (I actually had the title wrong, but I didn't know that then). I asked it 4-5 times with my tone getting more and more frustrated, and it just continued playing the same song. If it knew I was getting frustrated, it could have gone, "Sounds like I'm not getting the right song, can you hum the tune or say some of the lyrics?"
That’s a really good point. I think the deeper problem there is not just understanding intent, but knowing when to stop confidently executing and switch into a better recovery mode. Thanks for the very useful feedback! I'll get back to work.
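One way to sketch that switch out of confident execution (the thresholds and the frustration score are placeholder assumptions; real affect detection is a hard problem of its own):

```typescript
// Hypothetical sketch: stop re-executing the same action when repeated
// attempts plus a rising frustration signal suggest we got the intent wrong.
type Attempt = { command: string; frustration: number }; // frustration in [0, 1]

function nextMode(history: Attempt[]): "execute" | "recover" {
  if (history.length < 2) return "execute";
  const last = history[history.length - 1];
  const repeats = history.filter(a => a.command === last.command).length;
  // Two signals together: the user keeps repeating themselves AND sounds
  // increasingly frustrated. Either one alone is weaker evidence.
  if (repeats >= 3 && last.frustration > 0.6) return "recover";
  return "execute";
}

const history: Attempt[] = [
  { command: "play that song", frustration: 0.2 },
  { command: "play that song", frustration: 0.5 },
  { command: "play that song", frustration: 0.8 },
];
console.log(nextMode(history)); // "recover": ask for lyrics or a hummed tune instead
```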
Hey HN, I’ve been working on a side project called Exocor.
It’s a React SDK that lets you control your app with voice, gaze, and hand gestures: no mouse or keyboard needed.
The idea is simple: instead of figuring out where to click, you can just look at something and say what you want.
Example:
look at a row → “open this”
say “navigate to equipment”
say “create a ticket” → it builds one
Some interactions are instant (navigation, selection), while more complex ones use an LLM and take a few seconds.
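Roughly, the split looks like this (a sketch with made-up command names; the LLM fallback is stubbed out):

```typescript
// Hypothetical sketch of the instant-vs-LLM split: deterministic commands
// (navigation, selection) resolve locally and instantly; anything else falls
// back to an LLM planner, which can take a few seconds.
type Dispatch =
  | { path: "instant"; action: string }
  | { path: "llm"; prompt: string };

const instantCommands: Record<string, string> = {
  "navigate to equipment": "route:/equipment",
  "select this row": "select:focused-row",
};

function dispatch(utterance: string): Dispatch {
  const action = instantCommands[utterance];
  if (action) return { path: "instant", action };
  // Unrecognized utterances (e.g. "create a ticket for the broken pump")
  // go to the LLM for interpretation.
  return { path: "llm", prompt: utterance };
}

console.log(dispatch("navigate to equipment")); // instant path
console.log(dispatch("create a ticket"));       // llm fallback
```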
What makes it different from most “AI agents” is that it runs inside the app, not outside of it.
It has access to:
React state, routing, visible UI
So it doesn’t rely on screenshots or DOM guessing, it actually understands what’s on screen.
I originally started this thinking about environments where mouse/keyboard don’t work well (gloves, operating rooms, field work), but it’s also interesting for internal tools and dashboards.
This is v0.1, still rough in places, but the core flow works.
Thanks for the Tennenhouse link; he was definitely ahead of his time on Proactive Computing. You're right that 'Post-Interface' is a rhetorical hook. We can't eliminate the technical interface (APIs/protocols). My argument is about eliminating the cognitive interface for the end user. Right now, even our smartest agents are still reactive because they wait for a prompt. I see generative UI as the bridge to true proactivity: instead of a static dashboard waiting for input, the system constructs a temporary UI only when it senses intent via context/API. The goal isn't no interface, but a zero-friction interface where the UI is a generated byproduct of the agent's action, not a prerequisite for it.
Glad it resonated. You hit the nail on the head regarding agents being a 'workaround', that's exactly why I categorize them as the 'Transitional phase' rather than the destination. They are essentially bots trying to navigate a web that wasn't built for them.
Your point about the visual appearance changing dynamically is the 'Holy Grail' I touch on in the 'Generative UI' section. We are currently stuck designing static screens for dynamic problems.
I agree we haven't seen a true demonstration yet. Do you think that shift happens at the App level first (e.g., a dynamic Spotify), or does it require a whole new OS paradigm (a 'Generative OS') to work?
Good question! I'd say it happens at the app level first because the context of the OS is too big a surface to start with. But a RAG app for a specific vertical could have enough context to dynamically draw a custom UI for every user, given the constraints on what the app is generally about.
That makes a lot of sense, it is definitely the safer place to start.
It implies that design systems are about to change fundamentally. Instead of shipping a library of static components, we'll need to ship a set of constraints and rules that tell the RAG model how it's allowed to construct the UI on the fly.
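To make that concrete, those shipped constraints might look something like this (a made-up schema and validator, purely to illustrate the idea of rules instead of static components):

```typescript
// Hypothetical sketch: instead of shipping a static component library,
// ship rules that bound what a generative model may construct.
type UIConstraints = {
  allowedComponents: string[]; // the palette the model may draw from
  maxDepth: number;            // cap on nesting to keep layouts sane
  requiredLandmarks: string[]; // e.g. navigation must always be present
};

type GeneratedNode = { component: string; children: GeneratedNode[] };

function depth(node: GeneratedNode): number {
  return 1 + Math.max(0, ...node.children.map(depth));
}

// Validate a proposed layout against the design system's rules before render.
function validate(root: GeneratedNode, rules: UIConstraints): boolean {
  const components: string[] = [];
  const walk = (n: GeneratedNode) => {
    components.push(n.component);
    n.children.forEach(walk);
  };
  walk(root);
  return (
    components.every(c => rules.allowedComponents.includes(c)) &&
    depth(root) <= rules.maxDepth &&
    rules.requiredLandmarks.every(l => components.includes(l))
  );
}

const rules: UIConstraints = {
  allowedComponents: ["Nav", "Card", "Table"],
  maxDepth: 3,
  requiredLandmarks: ["Nav"],
};

const proposed: GeneratedNode = {
  component: "Nav",
  children: [{ component: "Card", children: [] }],
};
console.log(validate(proposed, rules)); // true
```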
Curious how you’d see this used in practice?