It's all relative. For local use I'd classify it by hardware (VRAM size) using FP8 or Q6 quantization:
1. tiny <2-3B -- easily runnable on lower-spec hardware
2. small 4-8B -- runnable on 8GB GPUs
3. medium 9-12B -- runnable on 12GB GPUs
4. large 13-24B -- runnable on 16GB (for the lower end models) and 24GB GPUs
5. very large 25-32B -- runnable on 32GB GPUs
6. huge >32B -- not easily runnable on consumer GPUs without compromising performance (offloading layers to the CPU/RAM), quality (heavy quantization, esp. at <= Q4), or price (investing in multi-GPU setups and/or server-grade hardware).
You could possibly split huge down further, as 70B models (e.g. Llama 3) are easier to get working than >120B models, and 1T models are completely intractable.
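As a rough sketch of the arithmetic behind those tiers: the size of the weights is approximately the parameter count times the bits per weight, divided by 8. The function names, the tier boundaries (the list leaves gaps, e.g. 3-4B, so each upper bound is treated as inclusive), and the ~6.5 bits per weight figure for Q6 are my assumptions for illustration. The estimate covers weights only, so the KV cache and runtime overhead come on top.

```python
def weight_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate size of the weights alone, in GB (1 GB = 1e9 bytes)."""
    return params_billion * bits_per_weight / 8


def size_class(params_billion: float) -> str:
    """Map a parameter count (in billions) to the tiers listed above.

    Upper bounds are treated as inclusive; this is an assumption, since
    the original list leaves small gaps between tiers.
    """
    tiers = [
        (3, "tiny"),
        (8, "small"),
        (12, "medium"),
        (24, "large"),
        (32, "very large"),
    ]
    for upper_bound, name in tiers:
        if params_billion <= upper_bound:
            return name
    return "huge"


if __name__ == "__main__":
    # 8 bits per weight for FP8; ~6.5 for Q6 is an approximation.
    for params in (2, 8, 12, 24, 32, 70):
        fp8 = weight_gb(params, 8)
        q6 = weight_gb(params, 6.5)
        print(f"{params}B ({size_class(params)}): FP8 ~{fp8:.1f} GB, Q6 ~{q6:.1f} GB")
```

That is why an 8B model at FP8 is already tight on an 8GB card once context is added, and why dropping to Q6 buys some headroom.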
> tiny <2-3B -- could run in a browser even, mac neo
Or a phone. I’m running Gemma 4 E2B in one of my apps on my 14 pro (which may or may not be killing my display through overheating. It might just be a coincidence).
Yeah. I run LLM models locally and for me 22B-32B is the largest I'm willing to invest in trying out.
Even though Mistral 4 has 6B active parameters per token (allowing 3-3.5 per token parameters to be loaded on a 4090), the ~240GB download + storage is pushing the limits of being able to try this out locally, especially if you are downloading and evaluating multiple models.
It also makes it harder for other people to make downstream finetunes like with what happened with the older Mistral/Magistral models.
I think machines like the DGX Spark are about to become a lot more common/popular. It’s big enough to run sparse 150-250B MoEs with enough throughput for a single user. DeepSeek v4 Flash is #1 (in terms of usage) on OpenRouter because it’s good enough to be useful. You can run it on a Spark (though it runs better across 2, which is getting up there in cost).
NVDA is a free screen reader for Windows (written by blind devs) that works with Firefox and Chrome.
You don't need to pay for a specialist browser as all web browsers (Firefox, Chrome, Edge, Safari, etc.) will implement the native accessibility model of the operating system they are running on (IAccessible/MSAA for Windows, etc.).
In Firefox you can press the right mouse button and select "Inspect Accessibility Properties" or select the "Accessibility" tab from the developer window and it will show the accessibility tree (roles, states, properties, etc.) just like the DOM tree in the "Inspect" tab. That is what the browser is displaying to screen readers and other accessibility software and uses the behaviour of the HTML elements along with the ARIA roles/states/properties defined by the webpage to construct that tree. Thus, it will display an ol/ul as a `role=list` unless overridden to be e.g. a `tablist` by the website.
You said "see this article" re: how aria-label is not applicable to div elements, hence the second link which is the WAI-ARIA guide on labelling elements.
You also said that ARIA can't help with custom controls in that post, which is where the other links are applicable as they describe doing just that, i.e. using ARIA attributes to implement tabs, accordions, etc. either with or without a framework library.
ARIA is a solution to a specific problem, not something that should be used on every site. HTML is accessible out of the box when semantic elements are used as intended. If you are using a div as a button, you probably aren't hand writing HTML. It is likely part of a library. Adding the necessary ARIA attributes benefits every site using the library. Your boiling the ocean analogy implies that every web developer needs to scatter ARIA attributes all over their code, which just isn't true.
IIRC, libraries like numpy and pytorch can already do that as they store the matrices as 1D arrays with information on things like the stride length (advancing to the next row). That allows you to implement operations like transposition by editing the stride length and other parameters without manipulating the content of the matrix array.
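A minimal pure-Python sketch of that idea, not how numpy or pytorch actually implement it: the `StridedMatrix` class and its method names are made up for illustration, and the strides here count elements, whereas numpy's `ndarray.strides` counts bytes.

```python
class StridedMatrix:
    """A 2D view over a flat list, addressed through strides (in elements)."""

    def __init__(self, data, shape, strides, offset=0):
        self.data = data        # flat 1D storage, shared between views
        self.shape = shape      # (rows, cols)
        self.strides = strides  # elements to skip per step along each axis
        self.offset = offset    # index of element [0, 0] in the flat storage

    def __getitem__(self, index):
        row, col = index
        return self.data[self.offset + row * self.strides[0] + col * self.strides[1]]

    def transpose(self):
        # Swap shape and strides; the underlying storage is shared, not copied.
        return StridedMatrix(self.data, self.shape[::-1], self.strides[::-1], self.offset)

    def to_lists(self):
        rows, cols = self.shape
        return [[self[r, c] for c in range(cols)] for r in range(rows)]


flat = [1, 2, 3, 4, 5, 6]
m = StridedMatrix(flat, shape=(2, 3), strides=(3, 1))  # row-major 2x3
t = m.transpose()

print(m.to_lists())  # [[1, 2, 3], [4, 5, 6]]
print(t.to_lists())  # [[1, 4], [2, 5], [3, 6]]
```

The transpose only swaps two small tuples, so it costs the same regardless of how large the matrix is, and `flat` is never touched.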
The video [1] links to Ben Eater's fork, which has extensions and configuration specific to his 6502 breadboard computer [2]. That in turn is forked from `mist64/msbasic`, which refers to a blog post [3] that states:
> This episode of “Computer Archeology” is about reverse engineering eight different versions of Microsoft BASIC 6502 (Commodore, AppleSoft etc.), ...
> This article also presents a set of assembly source files that can be made to compile into a byte exact copy of seven different versions of Microsoft BASIC, and lets you even create your own version.
So Ben Eater's version is based on a reverse engineered version of the same program. You should be able to adapt the code released here to run on Ben Eater's 6502 with a bit of work.
A better formulation would be something like Fuzzy Logic [1]. That represents floating point values from 0 (false) to 1 (true), so 0.5 could be "unsure", 0.9 could be "very likely", etc. However, that doesn't make boolean logic invalid.
Boolean logic is also the foundation of computing: logic gates, circuits like BCD, etc.
Fuzzy logic is indeed a better formulation. One nit tho: the intermediary values don't mean 'very likely' or 'unsure' in general. They usually represent degrees of truth or degrees of membership. So it's more like '0.9 tall' means 'quite tall', while '0.5 tall' would be interpreted as 'this guy is tall to a degree of 0.5 out of 1'.
They could technically refer to 'very likely' or 'unsure' only if the predicate you're modeling is itself about certainty or belief. For example, you could say "I'm certain about X to degree 0.8 out of 1" meaning you're quite certain about X. But notice that the 0.8 is about your belief, not about X itself.
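To make the degrees-of-membership reading concrete, here is a small sketch using the common min/max/complement operators (one choice among several in fuzzy logic). The `tall` membership function and its 160-190 cm ramp are invented purely for illustration.

```python
def fuzzy_and(a: float, b: float) -> float:
    """Conjunction as the minimum of the two degrees."""
    return min(a, b)


def fuzzy_or(a: float, b: float) -> float:
    """Disjunction as the maximum of the two degrees."""
    return max(a, b)


def fuzzy_not(a: float) -> float:
    """Negation as the complement of the degree."""
    return 1.0 - a


def tall(height_cm: float) -> float:
    """Hypothetical membership function: 0 below 160 cm, 1 above 190 cm,
    and a linear ramp in between."""
    if height_cm <= 160:
        return 0.0
    if height_cm >= 190:
        return 1.0
    return (height_cm - 160) / 30


print(tall(175))                         # 0.5, i.e. "tall to a degree of 0.5"
print(fuzzy_and(tall(175), tall(195)))   # 0.5, limited by the smaller degree

# Restricted to 0 and 1, the operators reduce to ordinary boolean logic.
for a in (0.0, 1.0):
    for b in (0.0, 1.0):
        assert fuzzy_and(a, b) == float(bool(a) and bool(b))
        assert fuzzy_or(a, b) == float(bool(a) or bool(b))
```

The final loop is the point about boolean logic not being invalidated: it is the special case where every degree is exactly 0 or 1.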
Yeah, this (the OP's) reads like a confused teenager's post, who has just started to explore the intricacies of logic. The whole post disproves itself...
Fuzzy logic is fine, I suspect they saw something like this and got confused. I would recommend they think harder about how very pertinent boolean logic is to everything they are doing before dismissing it...
> First, the discussion is about news, not science (nor about general LLM behaviour).
What if science is the news, such as:
1. advancements in fusion power; or
2. progress/status of the Artemis missions; or
3. new LLM models and/or capabilities (e.g. Project Glasswing).
With things like that you typically have a press announcement/briefing, a research paper/publication, or both. That information is then presented in newspapers/media that may obscure, misrepresent, or overly generalize the original finding/announcement.
There may also be clarifications, retractions, etc. after publication, such as with the first announced proof of Fermat's Last Theorem, which contained an error that was later corrected.
That doesn't work if you have limited or no connectivity (e.g. on a mountain range). There are also privacy concerns, e.g. a doctor using it to transcribe medical information.