I would counter that voice assistants are not a failed experiment. I would suggest that Amazon’s and Google’s execution of said experiments has failed.

I am extremely excited about the potential of projects like the Rabbit R1 and Humane.



Things don't seem to be going well at Humane, either: https://www.fastcompany.com/91008117/ai-pin-humane-layoffs


No horse in the race, but the layoff is only ten people from a 250-person company that is still hiring. I'm a little surprised it even merited an article.


I agree, I just think these new projects based on LLMs should be considered a different category, AI agents or something. The traditional grammar-based voice assistant architecture used by Google Assistant, Siri, Alexa, Bixby, etc. is an abject failure, and not for lack of trying.


If Alexa/Echo-category devices are meant to be some widely flexible and pervasively useful device from Star Trek, they're a failure.

If they're meant to be a usable $30 kitchen timer and music player, they're pretty great.


Google Voice Search used the internal codename "Majel" in reference to Majel Barrett, who played the voice of the Star Trek computer. That was explicitly the ambition. It just didn't work out.


OK. I believe you. They accidentally created something else that was very useful but different from what they set out to do.

Is that a story of failure or of success?

Starbucks launched to sell beans and espresso machines. YouTube launched as a video dating site. Are they also failures?


I guess where we disagree is "very useful". If Google Assistant stopped working tomorrow I would hardly care at all. There are a couple of scenarios where it's slightly more convenient than using my phone (assuming I don't encounter one of its many failure modes) and that's about it. I'm sure the hands free aspect is important for certain people in certain situations but I think the vast majority of people just don't see a lot of value from it.


Amazon alone has sold over half a billion Alexa-enabled units (around 10 of them to me).

I think people see more than $30 of value in them, at least as their revealed preferences suggest.


Those sales were subsidized in expectation of future profitability that will never come (at least not without a ground-up redesign of the product around LLM-based AI agents or some other paradigm). Economically Alexa is a "colossal failure": https://arstechnica.com/gadgets/2022/11/amazon-alexa-is-a-co...


Yeah, my crock pot and ceiling fan are "Alexa-enabled". That's two.


But how often did that computer voice-acted by her really do significantly more than "set timer to thirty minutes"? Outside of some broken plots on the original series ("we have insufficient data to know the truth, let's ask the computer who will tell us anyways!"), it really was mostly mundane voice assistant stuff.

(I'm deliberately excluding the "ten words to 'author' a holodeck scene" part. That had always stretched my imagination a little too far, more "this can't work!" than space travel and transporter beams. Then Stable Diffusion happened.)


There were some scenes where Riker on the bridge asked the computer essentially a SQL query: "give me a list of star systems with parameters that fit X and cross-reference by Y..." "There are 3 systems which fit your query: ..."
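A rough sketch of what that kind of spoken request looks like as a literal query. Everything here is made up for illustration: the `systems` and `visits` tables, their columns, the rows, and the helper `systems_matching` are all hypothetical and not taken from the show or from any real assistant.

```python
import sqlite3

# Hypothetical in-memory data set; names and values are invented for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE systems (id INTEGER PRIMARY KEY, name TEXT, star_class TEXT, distance_ly REAL);
CREATE TABLE visits (system_id INTEGER, ship TEXT);
INSERT INTO systems VALUES
    (1, 'Alpha', 'G', 4.0),
    (2, 'Beta',  'M', 12.0),
    (3, 'Gamma', 'G', 9.5),
    (4, 'Delta', 'G', 40.0);
INSERT INTO visits VALUES (1, 'Ship A'), (3, 'Ship A'), (4, 'Ship B');
""")


def systems_matching(conn, star_class, max_distance_ly, ship):
    """Filter systems by parameters, then cross-reference against ship visits."""
    rows = conn.execute(
        """
        SELECT s.name
        FROM systems AS s
        JOIN visits AS v ON v.system_id = s.id
        WHERE s.star_class = ? AND s.distance_ly <= ? AND v.ship = ?
        ORDER BY s.name
        """,
        (star_class, max_distance_ly, ship),
    ).fetchall()
    return [name for (name,) in rows]


# "List G-class systems within 10 light years, cross-referenced by visits from Ship A."
print(systems_matching(conn, "G", 10.0, "Ship A"))
```

The filter-plus-join shape is the whole point: the spoken request maps onto a `WHERE` clause and a cross-reference table, which is well within what a grammar-based assistant could parse if the phrasing is constrained.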


That might have been the initial hope of the team, before Google killed it. It's been in the graveyard for years with zero updates; my Google Assistant Nest Mini is arguably worse than when I bought it.

I believe it would have been possible to make it better, but they didn't try. They just gave up, as with so many Google products.


I think way too much money has been sunk into those projects if their ambition is just to be a $30 kitchen timer and music player.


Yup, as far as something that can turn on a light, run a timer, convert some units, tell me the weather, and have an -okay- shot at some categories of random questions instead of me getting out my computer, the Google Assistant is just fine.


They do succeed in that way many times a day, every day, in my house.


As '80s telescreens they were fantastic.


I think many parts of the architecture can be reused: in Alexa terms, all of the “skills” that integrate the assistant with various other services. IMO one of the main problems with assistants is that I don’t know what skills are available or how to invoke them. It’s like I’m a wizard who has to memorize all the spells I could be casting. It never happens because I don’t care enough. I think LLMs could potentially help by making it easier to discover and invoke those skills.
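To make the discovery point concrete, here is a minimal sketch, assuming a skill is just a name, a human-readable description, and a handler. The `Skill` and `SkillRegistry` classes and the two example skills are hypothetical and are not the Alexa Skills Kit API or any real assistant's interface. The idea is that the same catalog text could be read back to the user or placed in an LLM prompt so the model picks the skill instead of the user memorizing the "spell".

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class Skill:
    name: str                       # identifier used to invoke the skill
    description: str                # what a user (or an LLM) would read to discover it
    handler: Callable[[str], str]   # takes a free-text argument, returns a reply


class SkillRegistry:
    def __init__(self) -> None:
        self._skills: Dict[str, Skill] = {}

    def register(self, skill: Skill) -> None:
        self._skills[skill.name] = skill

    def catalog(self) -> str:
        """Return one 'name: description' line per skill, sorted by name."""
        ordered = sorted(self._skills.values(), key=lambda s: s.name)
        return "\n".join(f"{s.name}: {s.description}" for s in ordered)

    def invoke(self, name: str, argument: str) -> str:
        skill = self._skills.get(name)
        if skill is None:
            # Instead of failing silently, tell the caller what is available.
            return f"Unknown skill '{name}'. Available skills:\n{self.catalog()}"
        return skill.handler(argument)


registry = SkillRegistry()
registry.register(Skill("timer", "Start a countdown, e.g. '30 minutes'",
                        lambda arg: f"Timer set for {arg}"))
registry.register(Skill("lights", "Turn the lights 'on' or 'off'",
                        lambda arg: f"Lights turned {arg}"))

print(registry.catalog())
print(registry.invoke("timer", "30 minutes"))
```

Nothing about this requires an LLM. The point is only that once skills carry descriptions in a uniform shape, listing them and routing to them becomes a lookup problem rather than a memorization problem.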


This "spells" analogy is such a great way to explain how it feels to me to use these assistants. I'll play with one if I'm at a friend's house, but honestly can't see the appeal. Telling Google to change the color or brightness of the lighting just seems like a gimmick, unless you're maybe disabled, in which case it may be a big quality of life improvement. The other stuff doubly so.

With ChatGPT I can see the appeal for certain tasks like having it create a custom text adventure for you, but I can't see it being too useful in my day to day life yet.


"Skills" will be obsolete very soon. AI agents will use the same software tools and services that humans do. They won't need special separate AI-only interfaces.

I'm not excited about the Rabbit R1 as a hardware device but their software vision is exactly right and there are new startups coming out of stealth seemingly every day now attacking this problem.


Skills are just APIs that conform to a similar look. We'll definitely continue to have AI-only or developed-for-AI APIs for future "agents" to act against. They probably won't spend much effort formatting text to sound good to a person, but the infrastructure is here.


I disagree. These special APIs will not have the breadth of capabilities that the human UI does, so AIs will use the human UIs out of necessity. But I think in the long term we will eventually see a simplification of UIs. As it becomes less common for humans to actually use them, they will no longer need fancy animations or dark mode or client-side validation or pretty styling. In the extreme, a return to plain HTML forms that a human can use in a pinch but are mostly used by AI agents. At that point I guess you're blurring the lines between UI and API.


Isn't it the exact opposite? Interfaces we use every day can be dead simple, all they need is that they don't change behind our back. The accelerator pedal does not come with a footover pop-up "keep pressed to make car go". Interfaces we use once in a leap year on the other hand, that's where we need all the hand-holding we can get.


Not sure if any comment that holds Humane in high regard can be taken seriously though.


It’s interesting. I had not heard the latest. Their initial videos looked promising.

I ordered a Rabbit R1. I can see it’s missing some key functionality (Bluetooth for headphones so everyone doesn’t have to hear?). But… I think it’s an example of a first, promising step toward the era of real voice assistants/agents.


> Bluetooth for headphones so everyone doesn’t have to hear?

This is a critical missing feature. I would never ever use this unless it's in my ear alone.

And I hope it doesn't become socially acceptable to carry these around, forcing everyone else to listen to whatever the device has to tell the user. There is already far too much noise, and there are far too many inconsiderate people, in this world.


But how is the input side sufficiently compatible with any state other than being alone? Voice surely does not qualify?

Even gesture control would be a tough sell, and even that only if it's not "AR, where you push buttons projected across your field of vision" but "AR, where you can do the equivalent of gamepad buttons with hand movement anywhere the device can see the hand". Voice output (earbuds) would be far too slow for that kind of interaction, because you can't skim a list. Compared to the strictly sequential nature of audio, screens are the equivalent of embarrassingly parallel.

By the way, that slowness of voice output vs. screen is also what I consider the true motivation companies had for creating those essentially free voice assistants: searching for a product or service on a screen, even if it's just a small handheld screen, makes you pick from a list. With voice, on the other hand, going through the list is so slow and cumbersome that the chances of just picking the first, "I'm feeling lucky", are much, much bigger. The value of placement (bought directly or bought indirectly, "this must be very relevant because we know how much they can spend on our other ad services") is just so much bigger with voice. Chances are people are less likely to listen to the second hit on voice than to go to the second page on screen.



