How does a compute shortage relative to demand actually manifest? They obviously never close sign-ups, so the only option is longer queues? But if demand grows like crazy, queues should get longer, yet my Claude Pro plan seems snappy, with only occasional retries due to 429s.
Many years ago, when reading the Redis code, I saw the same pattern: they pass around a plain pointer to the data, but there is fixed-length metadata just before it.
I assume it’s either Antirez’s sds or a variant / ancestor thereof, yes. It stores a control block at the head of the string, but the pointer points past that block, so it has metadata but “is” a C string.
How do those companies make money? Qwen, GLM, Kimi, etc. are all released for free. I have no experience in the field, but from reading HN alone my impression was that training is exceptionally costly and inference can barely be made profitable. How and why do they fund ongoing development of these models? I'd understand if they released some of their less capable models for street cred, but they release all their work for free.
Chinese companies don't always operate on purely capitalistic principles, there is sometimes government direction in the background.
For China, the country, it's a good thing if American AI companies have to scramble to compete with Chinese open models. It might not be massively profitable for the companies producing said models, but that's only part of the equation.
China seems to combine the best points of capitalism (many companies taking many shots on goal, instead of the eastern bloc way of one centrally-mandated solution that either works or not) with the best points of communism (state-sponsored industries that don't have to generate a profit, for the glory and benefit of the state).
Ostensibly, a mix of VC funding and the fact that they host an endpoint that lets you run the big (200+ GB) models on their infrastructure rather than having to build machines with hundreds of gigabytes of LLM-dedicated memory.
But on inference they have to compete with any other inference provider that just has a homepage, a bunch of GPUs running vLLM, and none of the training cost. Their only real advantage is whatever performance optimizations they may have implemented in their inference clusters and not made public.
As someone active in both English- and Chinese-language media, I always feel that relying on only one of them is its own kind of brainwashing, just like with the Wumao. There's no difference here; it's always framed as government control, destroying US companies... In reality, free services have always been a competitive strategy for businesses in China, from ride-hailing to bike-sharing: it's all about grabbing market share and competing for potential users. Daily active users are what Chinese companies care about most.
Adjacent to this are PR reviews. Suggesting a simpler approach in a PR almost always causes friction: the work is done and tested, so why redo it? It also doesn't make good promotion material: keeping the landscape clear of overengineered solutions is not something management recognises as a positive contribution.
Depends on the management and whether they're involved in coding. Any engineering manager, architect, senior / lead developer etc should appreciate lower complexity.
Of course, if it's the person in charge introducing said overengineering there is a problem.
They can recognise it on an informal level, but you can't put it into an end-of-year review document. What would it say? "Kept N PRs from introducing cruft into our systems"? Fixing or building things is much more visible than just maintaining high standards.
Worse, to suggest a simpler approach you have to check existing products/APIs, or even prepare a toy prototype, to be confident in your own advice. This hidden work goes entirely unnoticed even by well-meaning managers/engineers: they simply don't know whether you already knew the simpler solution or had to discover it.