It's hard to know for sure. There are good information-theoretic reasons to suspect that general models will always be better than smaller expert models, but maybe a MoE can claw some performance back, albeit with redundant computation. The properties of conditional entropy, for instance, always favor more generality. This assumes that the harness isn't a factor, or is at least equivalent across different models.
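To spell out the conditional entropy point: the property in question is that conditioning never increases entropy. A rough sketch, where Y is the answer you want, X is what a specialist knows, and Z is the extra out-of-domain knowledge a generalist has (these symbols are purely illustrative, not from any particular paper):

```latex
H(Y \mid X, Z) \le H(Y \mid X)
```

Equality holds only when Z tells you nothing about Y beyond what X already does. This is a statement about the information available, not a guarantee that a finite model actually exploits it.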
Yep. These days, simplicity is a massive part of my development style. I don't want to be looking at a codebase, even my own, and thinking "shit, this guy was way smarter than me".
There are good information-theoretic reasons to suspect that general models will be better than specialized ones, because knowledge and skills often overlap across different areas, sometimes in surprising and unintuitive ways.
And yes, I'm aware that that statement might seem to fly in the face of much of the past two years of industry development, where specialized models have been in vogue. I think they'll settle into being appropriate for low-cost "good enough" applications, but I'm less convinced they'll have anywhere near the fidelity of larger frontier models.
It is, but they have different use cases. CadQuery uses a geometry kernel that does boundary representation, which you need for path generation for modern manufacturing tooling. OpenSCAD produces a standard mesh representation (i.e. triangles), which is insufficient for cutting and subtractive manufacturing, but often fine for additive manufacturing (3D printing).
Google is in an incredibly strong position. They're a top tier AI vendor, and in a world where content creation is largely commoditized and outsourced to AI, advertising companies will determine what gets seen, and what gets buried in the noise. They control both generation and visibility of what gets generated. Facebook could be in the same position, but they aren't as strong in AI. OpenAI wants to be Google, but they don't have the advertising reach.
Yeah, they aren't perfect or always necessarily the best in a given area, but to compare them to IBM is probably missing the forest for the trees.
> Yeah, they aren't perfect or always necessarily the best in a given area, but to compare them to IBM is probably missing the forest for the trees.
I think comparing them to IBM is reasonable, just maybe not... today's IBM.
IBM was an absolute hardware and software behemoth leading up to the PC / early Internet era, after which they pivoted from making groundbreaking real things to providing "enterprise support."
They also outlasted almost all of their contemporaries with that pivot, for better or worse.
IBM got to where it is today by being complacent and not keeping up with innovation. Google is notably at the forefront of innovation, in the driving seat, and that innovation directly stands to benefit one of their core businesses in a way that the market is probably only just beginning to understand. It's an entirely different situation, imo.
A technology company that's profitably doing over $50 billion in sales after more than 100 years with no obvious signs of impending doom sounds like an okay position to me, even if the tables have turned since 1984[1].
The context here is that at one point, IBM was an innovator and global leader in the technology space before it got outcompeted. It was the first company to cross a $100 billion market cap. If you think of the IBM of 50 years ago as being roughly analogous to Apple today, the difference is pretty clear. Google is much closer to Apple than it is to IBM, and I don't see that changing.
The policy is how you select your actions -- in this case, the next token. It can be random, but it doesn't have to be. "Deterministically choose the best action" is a valid policy (we would call it the greedy policy), as long as you have some other means of injecting stochasticity so the model explores the space. Uniform random is also a valid policy, as is always selecting the same token (it obviously wouldn't be very performant, and would defeat the purpose here, but it might be fine in, for example, a multi-armed bandit scenario). Most of the time, the policy is a parameterized distribution, and we want to learn the model parameters that maximize some measure of success (the reward component).
Off-policy versus on-policy refers to what data the model is trained on. On-policy training is where the training data is collected by the policy. Off-policy training is where the data was collected by a different sampling process (e.g. we have a standard dataset that we're going to use for supervised training).
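A toy sketch of the policies mentioned above might make this concrete. Everything here is an illustrative assumption: the function names, the `scores` list standing in for per-token model outputs, and the use of a softmax as the "parameterized distribution". It is not how any particular library does it.

```python
import math
import random


def greedy_policy(scores):
    """Deterministically pick the index of the highest score."""
    return max(range(len(scores)), key=lambda i: scores[i])


def uniform_policy(scores, rng):
    """Ignore the scores and pick any index with equal probability."""
    return rng.randrange(len(scores))


def constant_policy(scores, token=0):
    """Always pick the same index, whatever the scores are."""
    return token


def softmax_policy(scores, rng, temperature=1.0):
    """Sample an index from a softmax over the scores (a parameterized distribution)."""
    top = max(scores)  # subtract the max for numerical stability
    weights = [math.exp((s - top) / temperature) for s in scores]
    return rng.choices(range(len(scores)), weights=weights, k=1)[0]


rng = random.Random(0)
scores = [0.1, 2.0, -1.0]  # hypothetical scores for three candidate tokens

# On-policy: the training samples come from the policy being trained.
on_policy_data = [softmax_policy(scores, rng) for _ in range(5)]

# Off-policy: the samples come from some other process (here, uniform random).
off_policy_data = [uniform_policy(scores, rng) for _ in range(5)]
```

The only difference between the two datasets at the end is which process produced the samples, which is the whole on-policy versus off-policy distinction.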
The distillation risk has been brewing for a while now. In a very real sense, the model is the data, so if the data is locked down because of how valuable it is, it was only a matter of time before fully open access to the models would be revoked.
There's also an additional economic concern that rarely gets mentioned: because no one has cracked continual learning, keeping models up-to-date and filling in gaps in performance requires retraining on an ever growing dataset. Granted, you aren't starting from scratch each time, but the scaling required just to stay relevant looks daunting.
I don't know where any of this goes on a societal level, but I've believed since the release of DeepSeek R1 that access to frontier models would eventually be locked up behind contracts, since the only moats protecting the models themselves are purely artificial. It remains to be seen how effective China is at pushing the envelope, and whether they are interested in providing unfettered access. And on top of that, it remains to be seen how well these models actually turn out to scale in the long run.
They are also not getting the same quantity or quality of data as was possible in the first years of "ingest". Compared to the beginning, from here on it is more like a drip feed of new training data. Still immense volumes of data, but we are talking 1 year of data production from society versus centuries of text and data ingested in a short time frame.
For pre-training, yes. But for post-training you need high-quality labelled datasets for reinforcement learning. So far AI has been most successful in coding, because you can translate the usage into such datasets, and thus produce a virtuous cycle: More usage produces more data, which produces better models, which drives more usage.
The question is whether this same model can successfully be applied in disciplines like medicine, law, engineering, etc.
You only get good at the things you actually do. Our ancestors had to maintain a minimum level of fitness in order to be able to eat -- a level that most people today never reach, because the modern world has removed that need. Thinking is a skill just like any other, so what happens when people no longer have to exercise that skill to survive? It's a scary thought.
Tools don’t eliminate work, they abstract and amplify work. Those who miss this point are doomed to become the folks who say “back in my day we walked to school in the snow, uphill both ways”.
The map isn’t the territory; thinking about what to build is just as valid as thinking about how to build it. Architects aren’t carpenters, but that doesn’t mean there’s no value in architecture.
Not following. The demand for architects is gated by the cost of building. And the metaphor is that all of us who used to be carpenters can be architects, in the software sense. Maybe some people don’t want to be, but it is still a very thought-intensive profession.
The question is whether vibe coding requires a lot of thought, and I don't believe it does. The industry is in full-blown idiocracy at the moment, and if you think you're a real engineer despite not understanding what you're building, you're a joke.
The problem is that if you don't stick to truth and make an attempt at objectivity, others will step in to fill the void. This is how you sow division and undermine trust in science.
I'm having a very hard time understanding a society where research is openly conducted on innate physiological differences between people, and bad actors don't use this official research to practice open discrimination. The lesser of the two evils is to draw a line and tell people to just accept these differences.
If anything, the golden age of third places coincided with the golden age of suburbanization, which was obviously heavily car dependent. Their death almost certainly has more to do with financialization making it harder for small businesses to stay afloat, a drop in demand due to competition for attention, and decreasing work-life balance eroding people's ability to socialize.
In my grandfather's day, one income was enough to support a household, and there was less free work being done on the job, which meant fewer hours and being less drained at the end of the day. And yes, people spent less time commuting, meaning they had more time and energy for socializing after work. But communities were also more decentralized, and population centers had fewer people in general. A big part of the problem is that modern cities can be massive, and invariably funnel people to a handful of work districts, which just doesn't scale. When you double the distance to the CBD, you quadruple the number of people coming in (give or take, it's not exact because we tend to increase density close to the CBD as a response to this). Take it from someone who's lived in a place where cars aren't really necessary, the logistics of urbanization are still a crap experience when you're crammed into a train carriage during rush hour. It's common for people to commute for 90 minutes on public transport in Asian megacities, for example.
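The "double the distance, quadruple the people" figure is just area scaling. As a back-of-the-envelope sketch, assuming a circular catchment of radius r around the CBD and a uniform population density ρ (both simplifications, and both symbols are introduced here for illustration):

```latex
N(r) = \rho \pi r^2
\quad\Rightarrow\quad
\frac{N(2r)}{N(r)} = \frac{\rho \pi (2r)^2}{\rho \pi r^2} = 4
```

If density is higher near the center, the ratio comes out below 4, which is the "give or take" above.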