Adding guardrails to large language models (github.com/shreyar)
62 points by swyx on March 16, 2023 | 13 comments


Well, sort of. This is mostly for using a large language model to generate JSON or XML or SQL, something for which there's a syntax checker. It guarantees only that the output has the right syntax. If used for censorship, it's just looking for keywords.


Maybe I am a bit too lazy but I don’t understand what “guardrails” add that I wouldn’t get simply from:

1. Denying outputs with blacklisted words or phrases, and

2. Deserialising the JSON with serde_json [0], and denying output that fails to deserialise. If my requirements are very specific I can use strongly typed structs. If my requirements are looser I can use serde_json's untyped Value type instead.

[0]: https://docs.rs/serde_json/latest/serde_json/
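
For illustration, here is a minimal sketch of those two checks, written in Python with only the standard library rather than in Rust with serde_json. The blocklist, the `Order` fields, and the `check_output` name are all hypothetical, and the dataclass only stands in for a strongly typed struct.

```python
import json
from dataclasses import dataclass

# Hypothetical blocklist; a real one would be application-specific.
BLOCKED_PHRASES = ("drop table", "rm -rf")


@dataclass
class Order:
    # Hypothetical schema standing in for a "strongly typed struct".
    item: str
    quantity: int


def check_output(raw: str) -> Order:
    """Return a parsed Order, or raise ValueError if the output is rejected."""
    # Check 1: deny output containing any blocked phrase (case-insensitive).
    lowered = raw.lower()
    for phrase in BLOCKED_PHRASES:
        if phrase in lowered:
            raise ValueError(f"blocked phrase: {phrase!r}")

    # Check 2: deny output that does not parse as JSON.
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"invalid JSON: {exc}") from exc

    # Check 2, continued: deny output that does not match the expected shape.
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    item, quantity = data.get("item"), data.get("quantity")
    if not isinstance(item, str):
        raise ValueError("'item' must be a string")
    # bool is a subclass of int in Python, so exclude it explicitly.
    if not isinstance(quantity, int) or isinstance(quantity, bool):
        raise ValueError("'quantity' must be an integer")
    return Order(item=item, quantity=quantity)
```

Calling `check_output('{"item": "tea", "quantity": 2}')` returns an `Order`, while non-JSON text, a missing field, or a blocked phrase raises `ValueError`. Note that this is a one-shot accept/reject check: nothing is sent back to the model.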


I've been very mildly involved in this project, so I can give my two cents. While it's true that structural / type checks are not difficult to implement, there's no real need for a back-and-forth when you do static checks -- you either fail out, or run rules to fix.

There's something a bit different that we (should) expect with LLMs (and FMs more generally) since they are fundamentally interactive, so you can actually get them to correct things in interesting ways. Passing the outputs of static checkers back to the models is one nice trick. I (and some friends) have been exploring some stuff with using models in the loop for evaluation (more research side), and I think guardrails is directionally exciting in bringing that kind of vision into more production type settings. There's also just the crud of dealing with LLM code...
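
The "pass the checker's output back to the model" trick could be sketched roughly like this. This is not the guardrails API, just a hedged illustration under stated assumptions: `model` is any hypothetical callable that maps a prompt string to a completion string, and `generate_valid_json`, `fake_model`, and the prompt wording are made up for the example.

```python
import json
from typing import Callable


def generate_valid_json(
    model: Callable[[str], str], prompt: str, max_attempts: int = 3
) -> dict:
    """Call `model`; on a failed check, re-prompt with the checker's error."""
    current_prompt = prompt
    for _ in range(max_attempts):
        output = model(current_prompt)
        try:
            parsed = json.loads(output)
        except json.JSONDecodeError as exc:
            # Feed the static checker's error message back to the model.
            current_prompt = (
                f"{prompt}\n\nYour previous answer was:\n{output}\n"
                f"It failed to parse as JSON: {exc}\n"
                "Reply with corrected JSON only."
            )
            continue
        if isinstance(parsed, dict):
            return parsed
        current_prompt = (
            f"{prompt}\n\nYour previous answer was valid JSON but not an "
            "object. Reply with a JSON object only."
        )
    raise ValueError(f"no valid JSON object after {max_attempts} attempts")


# Stub standing in for a real LLM call (hypothetical behaviour):
# it answers badly at first and fixes itself once it sees the error.
def fake_model(prompt: str) -> str:
    if "failed to parse" in prompt:
        return '{"city": "Paris"}'
    return "{'city': 'Paris'}"  # single quotes are not valid JSON
```

With the stub, `generate_valid_json(fake_model, "Give the capital of France as JSON.")` fails the first check, re-asks with the parser error attached, and returns the parsed object on the second attempt. The difference from a plain static check is the loop: a failure becomes new input to the model instead of a dead end.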


Thank you. That makes a lot of sense! Good idea :)


Wouldn’t the right way to create AI guardrails be to have an antagonistic AI act as a moderator? Like you have one model trained to be as accurate as possible in fulfilling the prompt, and then another AI trained on how human moderators apply the terms of a separate “moderation” prompt. Then you have the two fight on a large training set and when you’re done you have generated a moderated AI.


"LLM moderation" is something that sounds downright dystopic.

Speaking to your comment practically, I feel like it would probably be possible to prompt an LLM to successfully "express X concept that breaks ToS" in such a way that moderation doesn't flag it. It may take clever prompt engineering but that's what these jailbreaks are.


To be effective the moderator AI would need to be as smart as (or smarter than) the source AI. Think of all the ways we have already seen people get around restrictions. Giving instructions for murder isn't allowed, but people said they were writing a novel, wanted to have a murder in it, and asked how it could be done. A smart moderator would see what the user is trying to do and stop it.


It will be fun to watch them argue.


Something like a GAN?

Will probably end up something similar though.


What does FM mean in this context? I already see it mentioned in one top level comment and in the first thread but I don't see a definition here or on the project page.

Edit: after adding Large Language Model to my query it seems I found the explanation: FM stands for "Foundation Model".

https://kagi.com/search?q=llm+large+language+model+fm&r=no&s...


This is great, thanks for sharing. A key component in evolving FM-based applications is making them feel as deterministic as possible rather than probabilistic. A framework like this would help build trust in the outputs of these FMs. Exciting.


This is gonna be great for generating domain-specific mock data.


You can bet people will ask LLMs to generate the rail files ^_^



