Isn't this basically what javascript went through with Promise chaining "callback hell" that was cleaned up with async/await (and esbuild can still desugar the latter down to the former)
LLMs already do this and have a system role token. As I understand in the past this was mostly just used to set up the format of the conversation for instruction tuning, but now during SFT+RL they probably also try to enforce that the model learns to prioritize system prompt against user prompts to defend against jailbreaks/injections. It's not perfect though, given that the separation between the two is just what the model learns while the attention mechanism fundamentally doesn't see any difference. And models are also trained to be helpful, so with user prompts crafted just right you can "convince" the model it's worth ignoring the system prompt.
So it's still one stream of tokens as far as the LLM is concerned, but there is some emphasis in training on "trust the system prompt", have I got that right?
This! And even more, the role model extends beyond system and user: system > user > tool > assistant. This reflects "authority" and is one of the best "countermeasure": never inject untrusted content in "user" messages, always use "tool".
Most (all?) of this holds for quantizing convnets too, if you're looking for an easy exercise you can play around with quantizing resnet50 or something and plotting layer activations
I think the point is: If there is an API somewhere in Company's systems that does what the customer wants, why have a phone tree or an LLM in the way? Just add a button to the app itself that calls that API.
most support volume comes through voice, and you need a layer to interpret what the customer intent is
additionally for many use cases it's not feasible from an eng standpoint to expose a separate api for each entire workflow, instead they typically have many smaller composable steps that need to be strung together in a certain order depending on the situation
There's no reason the app itself couldn't string together those composable steps into an action performed when the user invokes it. OP's point is there is that neither an LLM or a voice layer is really required, unless you're deliberately aiming to frustrate the user by adding extra steps (chat, phone call). Customer intent can be determined with good UX.
Here's an evil business idea: Use the LLMs to identify the users most likely to be "vocal influencers" and then prioritize resources for them, ensuring they get the best experience. You can engineer a bubble this way.
And then the next step is to dynamically vary resources based on prediction of user stickiness. User is frustrated and thinking of trying competitor -> allocate full resources. User is profiled as prone to gambling and will tolerate intermittent rewards -> can safely forward requests to gimped models. User is an resolute AI skeptic and unlikely to ever preach the gospels of vibecoding -> no need to waste resources on him.
This is part of why running open models on hardware you control is valuable. They may trail SOTA by 6-12 months (really less for many use cases) but there's more reliability, control etc.
"Here's an evil business idea: Use the LLMs to identify the users most likely to be "vocal influencers" and then prioritize resources for them, ensuring they get the best experience. You can engineer a bubble this way."
Its quite likely this is already happening buddy...
The 'random' degradation across all LLM-based services is obvious at this point.
There is this famous quote from Bentley on asking programmers to write binary search
>I’ve assigned this problem [binary search] in courses at Bell Labs and IBM. Professional programmers had a couple of hours to convert the above description into a program in the language of their choice; a high-level pseudocode was fine. At the end of the specified time, almost all the programmers reported that they had correct code for the task. We would then take thirty minutes to examine their code, which the programmers did with test cases. In several classes and with over a hundred programmers, the results varied little: ninety percent of the programmers found bugs in their programs (and I wasn’t always convinced of the correctness of the code in which no bugs were found).
>I was amazed: given ample time, only about ten percent of professional programmers were able to get this small program right. But they aren’t the only ones to find this task difficult: in the history in Section 6.2.1 of his Sorting and Searching, Knuth points out that while the first binary search was published in 1946, the first published binary search without bugs did not appear until 1962.
The invariants are "tricky", not necessarily hard but also not trivial to where you can convert your intuitive understanding back into code "with your eyes closed". Especially since most implementations you write will only be "subtly flawed" rather than outright broken. Randomizing an array is also one of the algorithms in this class, conceptually easy but most implementations will be "almost right", not actually generating all permutations.
Because the planetary alignments won't allow for another launch any time soon... which I guess has a natural correspondence in that the macroeconomic conditions (read: bubble) mean that now is a great time to play around with things while they're basically giving it away to get you hooked.
This seems like an arbitrary restriction. Tool-use requires a harness, and their whitepaper never defines exactly what counts as valid.
reply