Cool stuff. I think there have been projects recently that use LLMs to encode messages in plain text by manipulating the choices of output tokens. Someone with the same version of the LLM can decode. Not sure where to find these projects though.
I created something similar a long, long time ago, but much simpler, using Markov chains: basically just encoding data via the choice of the next word tuple given the current word tuple. It generated mostly gibberish, but it was fun 25 years ago.
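A minimal sketch of that Markov-chain trick (my own reconstruction of the idea, not the original code — the tiny corpus and bit-per-branch scheme are assumptions for illustration): wherever the chain offers more than one next word, the choice between the first two candidates carries one bit of the secret.

```python
# Toy Markov-chain steganography: data lives in the *choice* of next
# word at branch points. Anyone with the same chain can decode.

CORPUS = ("the cat sat on the mat the dog sat on the rug "
          "the cat ran to the dog the dog ran to the mat").split()

# Bigram model: word -> sorted list of possible next words.
chain = {}
for a, b in zip(CORPUS, CORPUS[1:]):
    chain.setdefault(a, set()).add(b)
chain = {w: sorted(s) for w, s in chain.items()}

def encode(bits, start="the"):
    out, i, word = [start], 0, start
    while i < len(bits):
        options = chain.get(word)
        if not options:
            raise ValueError("chain dead-ended before all bits were encoded")
        if len(options) >= 2:
            word = options[bits[i]]  # bit selects among the first two successors
            i += 1
        else:
            word = options[0]        # forced move, carries no data
        out.append(word)
    return " ".join(out)

def decode(text, nbits):
    words, bits = text.split(), []
    for a, b in zip(words, words[1:]):
        if len(chain[a]) >= 2:       # only branch points carry bits
            bits.append(chain[a].index(b))
        if len(bits) == nbits:
            break
    return bits
```

With this corpus, `encode([1, 0, 1, 1, 0])` yields a grammatical-ish but mostly meaningless sentence, and `decode` recovers the bits exactly — which matches the "gibberish, but fun" experience described above.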
This is a really interesting space, and one that I've been playing with since the first GPTs landed. But it's even cooler than simply using completion choice to encode data. It has been mathematically proven that you can use LLMs to do stego that cannot be detected[0]. I'm more than positive that comments on social media are being used to build stego dead drops.
What I find really interesting about this approach is that it's one of the less obvious ways LLMs might be used by the general public to defend themselves against LLM capabilities used by bad actors — here, semantic search. (Compare the more obvious case: LLMs making it easier to find bugs is good for blackhats, but maybe better for whitehats.)
The reasoning in my head is that it creates a statistical firewall: eavesdroppers with privileged access can't use cheap statistical methods to detect that a hidden message even exists (undetectability to such methods is effectively what crypto _is_, ipso facto this is effectively undetectable crypto).
ETA, the abstract for a paper I've been working on related to this:
Mass surveillance systems have systematically eroded the practical security of private communication by eliminating channel entropy through universal collection and collapsing linguistic entropy through semantic indexing. We propose a protocol that reclaims these lost "bits of security" by using steganographic text generation as a transport layer for encrypted communication. Building on provably secure generative linguistic steganography (ADG), we introduce conversation context as implicit key material, per-message state ratcheting, and automated heartbeat exchanges to create a system where the security properties strengthen over time and legitimate users enjoy constant-cost communication while adversaries face costs that scale with the entire volume of global public text. We further describe how state-derived proofs can establish a novel form of Web of Trust where relationship depth is cryptographically verifiable. The result is a communication architecture that is structurally resistant to mass surveillance rather than merely computationally resistant.
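A toy illustration of why generative stego can be provably undetectable (my sketch of the core intuition, not the ADG algorithm from the paper — the function names and power-of-two grouping are assumptions): when the model rates k candidate tokens equally likely, letting the next log2(k) secret bits pick among them is identical to sampling the model honestly, so the output distribution is unchanged and no statistical test on the text alone can reveal the embedding.

```python
import math

def embed_step(candidates, bits):
    """candidates: tokens the model rates equally likely (len = power of 2).
    Consumes log2(len) secret bits; returns (chosen_token, remaining_bits).
    Because the choice is uniform over candidates, it is statistically
    indistinguishable from honest sampling."""
    n = int(math.log2(len(candidates)))
    index = int("".join(map(str, bits[:n])), 2)
    return candidates[index], bits[n:]

def extract_step(candidates, token):
    """Given the same model state (same candidate list), the receiver
    recovers the bits from which token was chosen."""
    n = int(math.log2(len(candidates)))
    return [int(b) for b in format(candidates.index(token), f"0{n}b")]
```

Real schemes have to handle non-uniform distributions (ADG does this by adaptively grouping tokens into near-equal-probability bins), but the uniform case shows where the "provably secure" claim comes from: the stegotext is a genuine sample from the model.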
If Claude Code is written by Claude Code, and AI outputs are not currently considered copyrightable, then how is Anthropic asserting copyright over the leak?
Not saying this gets through to people, but copyright is purely about the legal ability to restrict what other people do. Whereas property rights are about not allowing others to restrict what you do (e.g. by taking your stuff).
Interesting. I don't quite agree. It's one thing to predict what general topics will be hot and popular this year. But that's not the same as what particular research problem will be important and have lasting influence.
There are a few kinds of important research. One is solving a well-defined, well-known problem everyone wants to solve but nobody knows how. Another is proposing a new problem, or a new formulation of it, that people didn't realize was important.
There is also highly-cited research that isn't necessarily important, such as being the next paper to slightly lower a benchmark through some tweaks (you get cited by all the subsequent papers that slightly lower the benchmark even further).
I agree that (while the ethics of this are a different issue) the copyright question is not obviously clear-cut. Though IANAL.
As the LGPL says:
> A "work based on the Library" means either the Library or any derivative work under copyright law: that is to say, a work containing the Library or a portion of it, either verbatim or with modifications and/or translated straightforwardly into another language. (Hereinafter, translation is included without limitation in the term "modification".)
Is v7.0.0 a [derivative work](https://en.wikipedia.org/wiki/Derivative_work)? It seems to depend on the details of the source code (implementing the same API is not copyright infringement).
This is not how computer science publishing works, however. Post it on arxiv, submit to a conference, get 3 peer reviews, accepted, “published”. 99% of papers are effectively open access for free.
I thought the point of passkey security is that you don't have to send the private key around, it can stay on your device. Different passkey per device. Lose or destroy a device, delete that passkey and move on.
None of the password managers (including but not limited to the ones built into iOS/Android) work that way. The Apple one (and I think Google's is the same) keeps the private key inside the secure enclave (security processor), but it is still copied to each new device, though it is end-to-end encrypted during that transmission.
The issue there is that there's a big usability headache with enrolling multiple devices. You really want one device to be able to enroll all your devices (including not-present and offline ones), but there's no mechanism for this in the way the WebAuthn spec works at the moment.
That’s how I use them: passkeys on two Yubikeys. And I tag in my password manager which credentials have which form of auth: UP, TOTP (also stored on the two Yubikeys), WebAuthn, or passkeys (the former indicating 2FA).
https://xkcd.com/2754/