Super confusing... it seems like having some sort of in with the VCs who can pull this program's guests was enough to create a new podcast that is now seen as influential. My guess is this was a side liquidity event for the OpenAI VCs who had somehow invested in the podcast, looking to get some money out of their OpenAI stake.
> It's only true in a universe where Iran would have collapsed from within before the expiration of the sunset clause, and that clearly was not going to happen.
No one can know this hypothetical, but some people definitely bet their entire futures/careers on it: that an Iran with a more prosperous middle class (as a result of the JCPOA) might have had a better chance at social/internal reform, i.e. regime change.
> If you take a lot of chances, that adds up eventually and you'll have some big wins. Just do it safely, so that they don't add up to a lot of big losses, too.
And here is the great contradiction in this whole essay. You can't "safely" take a lot of chances and not lose big, when in most cases, to have big wins, one has to do unsafe things...
This is also why folks who have a safety net (in terms of family wealth, etc) tend to do better as entrepreneurs. Not sure this essay is helpful.
Just to prompt thought on this exact question; I'm interested in answers:
I just ran a benchmark against Haiku on a very simple document classification task that we currently farm out to Haiku in parallel: very naive, same prompt and system, via the same API (AWS Bedrock). A few of the 4B models are a pretty good match, and could easily be run locally or cheaply via a hosted provider. The "how much data and how much improvement" question is one I no longer have good intuition for; I don't even have an order-of-magnitude guess on those two axes.
Here's the raw numbers to spark discussion:
| Model | DocType% | Year% | Subject% | In $/MTok |
|-------|----------|-------|----------|-----------|
Percentages are doc type (categorical), year, and subject-name match against Haiku; this just uses the first 4 pages.
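For anyone curious how I'd score this kind of thing, a minimal sketch of the comparison: treat Haiku's outputs as the reference and compute per-field agreement. The field names and records below are made up for illustration, not my actual data.

```python
# Hypothetical sketch: score a candidate model's outputs against Haiku's
# labels, field by field. Field names mirror the table columns above;
# the records themselves are invented examples.

def agreement(preds, refs, field):
    """Percent of records where the candidate matches the reference on a field."""
    matches = sum(1 for p, r in zip(preds, refs) if p[field] == r[field])
    return 100.0 * matches / len(refs)

haiku = [
    {"doc_type": "invoice", "year": "2021", "subject": "acme corp"},
    {"doc_type": "letter",  "year": "1998", "subject": "tax office"},
]
local = [
    {"doc_type": "invoice", "year": "2021", "subject": "acme corp"},
    {"doc_type": "memo",    "year": "1998", "subject": "tax office"},
]

for field in ("doc_type", "year", "subject"):
    print(f"{field}: {agreement(local, haiku, field):.0f}%")
```

Note this measures agreement with Haiku, not ground-truth accuracy; if Haiku is wrong somewhere, a "better" local model would score lower there.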
In the old world, where these were my own in-house models, I'd be interested in seeing if I could lift those numbers with training, but I haven't done that with the new LLMs in a while. Keen to get even a finger-in-the-air estimate if possible.
Can easily generate tens of thousands of examples.
You can fine tune a small LLM with a few thousand examples in just a few hours for a few dollars. It can be a bit tricky to host, but if you share a rough idea of the volume and whether this needs to be real-time or batched, I could list some of the tradeoffs you'd think about.
Source: Consulted for a few companies to help them finetune a bunch of LLMs. Typical categorical / data extraction use cases would have ~10x fewer errors at 100x lower inference cost than using the OpenAI models at the time.
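To make the "few thousand examples" concrete: most SFT tooling (trl, axolotl, and the hosted fine-tuning APIs) accepts chat-format JSONL, so data prep is mostly just reshaping your labeled pairs. A rough sketch, where the system prompt and field names are assumptions standing in for a real schema:

```python
import json

# Hypothetical sketch: turn labeled documents into chat-format JSONL for SFT.
# The system prompt and label fields are placeholders, not a real spec.

SYSTEM = "Classify the document. Reply with JSON: doc_type, year, subject."

def to_sft_record(text, label):
    """One training example: system prompt, document text, gold JSON answer."""
    return {
        "messages": [
            {"role": "system", "content": SYSTEM},
            {"role": "user", "content": text},
            {"role": "assistant", "content": json.dumps(label)},
        ]
    }

examples = [
    ("INVOICE #42 ... Acme Corp ... 2021",
     {"doc_type": "invoice", "year": "2021", "subject": "acme corp"}),
]

with open("train.jsonl", "w") as f:
    for text, label in examples:
        f.write(json.dumps(to_sft_record(text, label)) + "\n")
```

The assistant turn holding the serialized gold label is what teaches the model to emit your JSON shape directly.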
OK, even that "few thousand examples" heuristic is useful. The use case would be to run this task over, I'd say, somewhere on the order of 100k extractions in a run, batched not real time, and we'd be interested in (and already do) regular reruns with minor tweaks to the extracted blob (1-10 simple fields, nothing complex).
My interest in fine-tuning at all comes from an adjacent interest in self-hosting small models (although I tested this on AWS Bedrock for ease of comparison). My hope is that, given we are self-hosting, fine-tuning and hosting our tuned model shouldn't be terribly difficult, at least compared to the managed fine-tuning solutions on cloud providers, which I'm generally wary of. Happy for those assumptions to be challenged.
Labeling or categorization tasks like this are the bread and butter of small fine tuned models. Especially if you need outputs in a specific json format or whatever.
I did an experiment where I ran very simple SFT on Mistral 7B, and it was extremely good at converting receipt images into structured JSON outputs, using only 1,000 examples. The difficulty is getting a diverse enough set of examples, evaling, etc.
If you have great data with simple input output pairs, you should really give it a shot.
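On the evaling point: one cheap check that catches a lot of structured-output failures is strict validation of every model response before scoring field accuracy. A minimal sketch, where the required keys are an assumption standing in for a real receipt schema:

```python
import json

# Hypothetical sketch: a strict eval gate for structured-output models.
# REQUIRED is a placeholder schema, not the actual receipt fields.

REQUIRED = {"merchant", "date", "total"}

def valid_output(raw):
    """True only if the model emitted parseable JSON with exactly the required keys."""
    try:
        obj = json.loads(raw)
    except json.JSONDecodeError:
        return False
    return isinstance(obj, dict) and set(obj) == REQUIRED

print(valid_output('{"merchant": "Cafe X", "date": "2024-01-05", "total": "4.20"}'))  # True
print(valid_output('Sure! Here is the JSON: {...'))  # False
```

Tracking the invalid-output rate separately from field accuracy makes it obvious whether a fine-tune fixed formatting, content, or both.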
I'm thinking of fine-tuning it to better recognize my handwriting. It already works quite well by default, but my writing is just horrible, so it sometimes has trouble.
> When the system rewards cheating, the rational choice is to cheat—or be disadvantaged.
Don't the current president of the U.S., and indeed his posse, sort of espouse this when you look at their backgrounds? This feels like a bigger cultural issue around what advantaged folks have been doing all along.
This has been endemic for a long time. I’ve always known folk who game the system, regardless of politics or demographics
The change I feel is that nobody even cares to be honorable any longer. There is no benefit, even culturally. As the article says, you’d have to be stupid not to do it. I’ve always tried to be honest idk
But laws don’t matter anymore. There is no shaming bad actors. It’s all blatantly out there and no consequences have been doled out so here we are.
Could Q.ai be commercializing the AlterEgo tech coming out of the MIT Media Lab?
i.e. "detects faint neuromuscular signals in the face and throat when a person internally verbalizes words"
If this works well, then I could finally see AI wearable pins being socially feasible. IMO, speaking aloud in public to an AI doesn't seem like something that will work, but it is also what OpenAI is apparently investing a lot into with their hardware ambitions with Jony Ive [0].
> ...in most people, when they "talk to themselves" in their mind (inner speech or internal monologue), there is typically subtle, miniature activation of the voice-related muscles — especially in the larynx (vocal cords/folds), tongue, lips, and sometimes jaw or chin area. These movements are usually extremely small — often called subvocal or sub-articulatory activity — and almost nobody can feel or see them without sensitive equipment. They do not produce any audible sound (no air is pushed through to vibrate the vocal folds enough for sound). Key evidence comes from decades of research using electromyography (EMG), which records tiny electrical signals from muscles: EMG studies consistently show increased activity in laryngeal (voice box) muscles, tongue, and lip/chin areas during inner speech, silent reading, mental arithmetic, thinking in words, or other verbal thinking tasks
The fact that we can't just spin up Claude Code on our iPhones and have it program and run the end result right there in iOS should be a chargeable offense for Apple (and Android). Looking forward to the day that this capability exists.