LLMs that use Chain of Thought sequences have been demonstrated to misrepresent their own reasoning [1]. The CoT sequence is another dimension for hallucination.
So, I would say that an LLM capable of explaining its reasoning doesn't guarantee that the reasoning is grounded in logic or some absolute ground truth.
I do think it's interesting that LLMs demonstrate the same fallibility as low-quality human experts (i.e. confident bullshitting), which is the whole point of the OP course.
I love the goal of the course: get the audience thinking more critically, both about the output of LLMs and the content of the course. It's a humanities course, not a technical one.
(Good) Humanities courses invite the students to question/argue the value and validity of course content itself. The point isn't to impart some absolute truth on the student - it's to set the student up to practice defining truth and communicating/arguing their definition to other people.
First, thank you for the link about CoT misrepresentation. I've written a fair bit about this on Bluesky etc but I don't think much if any of that made it into the course yet. We should add this to lesson 6, "They're Not Doing That!"
Your point about humanities courses is just right and encapsulates what we are trying to do. If someone takes the course and engages in the dialectical process and decides we are much too skeptical, great! If they decide we aren't skeptical enough, also great. As we say in the instructor guide:
"We view this as a course in the humanities, because it is a course about what it means to be human in a world where LLMs are becoming ubiquitous, and it is a course about how to live and thrive in such a world. This is not a how-to course for using generative AI. It's a when-to course, and perhaps more importantly a why-not-to course.
"We think that the way to teach these lessons is through a dialectical approach.
"Students have a first-hand appreciation for the power of AI chatbots; they use them daily.
"Students also carry a lot of anxiety. Many students feel conflicted about using AI in their schoolwork. Their teachers have probably scolded them about doing so, or prohibited it entirely. Some students have an intuition that these machines don't have the integrity of human writers.
"Our aim is to provide a framework in which students can explore the benefits and the harms of ChatGPT and other LLM assistants. We want to help them grapple with the contradictions inherent in this new technology, and allow them to forge their own understanding of what it means to be a student, a thinker, and a scholar in a generative AI world."
I'll give it a read. I must admit, the more I learn about the inner workings of LLMs, the more I see them as simply the sum of their parts and nothing more. The rest is just anthropomorphism and marketing.
Whenever I see someone confidently making a comparison between LLMs and people, I assume they are unserious individuals more interested in maintaining hype around technology than they are in actually discussing what it does.
Someone saying "they feel" something is not a confident remark.
Also, there's plenty of neuroscience that is produced by very serious researchers that have no problems making comparisons between human brain function and statistical models.
Current LLMs are not the end-all of LLMs, and chain of thought frontier models are not the end-all of AI.
I’d be wary of confidently claiming what AI can and can’t do, at the risk of looking foolish in a decade, or a year, or at the pace things are moving, even a month.
That's entirely true. We've tried hard to stick with general principles that we don't think will readily be overturned. But doubtless we've been too assertive for some people's taste and doubtless we'll be wrong in places. Hence the choice to develop not a static book but rather a living document that will evolve with time. The field is developing too fast for anything else.
I think that’s entirely the problem. You’re making linear predictions of the capabilities of non-linear processes. Eventually the predictions and the reality will diverge.
Every time someone claimed "emergent" behavior in LLMs, it was exactly that. I can probably count more than 100 such cases, many unpublished, but surely it is easy to find evidence by now.
Not quite, but it was the closest pithy quote I could think of to convey the point that things can be false for a long time before they are suddenly true without warning.
How about "Yes, they laughed at Galileo, but they also laughed at Bozo the Clown?"
We heard alllllll the same hype about how revolutionary the blockchain was going to be and look how that turned out.
It's a virtue to point out the emperor has no clothes. It's not a virtue to insist clothes tech is close to being revolutionary and if you just understand it harder, you'd see the space where the clothes go.
The post seems to be talking about the current capabilities of large language models. We can certainly talk about what they can or cannot do as of today, as that is pretty much evidence based.
The ground truth is chopped up into tokens and statistically evaluated. It is, of course, just a soup of ground truth that can freely be used in more or less twisted ways that have nothing to do with the ground truth, or are merely tangential to it. While I enjoy playing with LLMs, I don't believe they have any intrinsic intelligence, and they're quite far from being intelligent in the same sense that autonomous agents such as us humans are.
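To make that "soup" point concrete, here's a toy sketch (plain Python, entirely my own illustration; real models learn vector representations rather than raw counts, but the "statistically evaluated" idea is the same): chop text into tokens, keep continuation statistics, and emit whatever usually comes next.

    from collections import Counter, defaultdict

    # Toy bigram "language model": the training text is chopped into
    # tokens and reduced to continuation counts -- a soup of the original.
    corpus = "the cat sat on the mat the cat ate the rat".split()

    counts = defaultdict(Counter)
    for prev, nxt in zip(corpus, corpus[1:]):
        counts[prev][nxt] += 1

    def next_token(prev):
        # Most statistically likely continuation of `prev`.
        return counts[prev].most_common(1)[0][0]

    tok, out = "the", ["the"]
    for _ in range(5):
        tok = next_token(tok)
        out.append(tok)
    print(" ".join(out))  # fluent-looking, but nothing here models truth

Nothing in that machinery knows or cares whether its output is true, which is exactly the "more or less twisted ways" failure mode.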
Any and all of the tricks getting tacked on are overfitting to the test sets. They're the tactics we have right now, and they do provide assistance in a wide variety of economically valuable tasks, with the only sign of stopping or slowing down being the data curation efforts.
I've read that paper. The strong claim, confidently made in the OP, is (verbatim): "they don’t engage in logical reasoning."
Does this paper show that LLMs "don't engage in logical reasoning"?
To me the paper seems to mostly show that LLMs with CoT prompts (multiple generations out of date) are vulnerable to sycophancy and suggestion -- if you tell the LLM "I think the answer is X," it will try too hard to rationalize X even if X is false -- but that's a much weaker claim than "they don't engage in logical reasoning." Humans (sycophants) do that sort of thing too; it doesn't mean they "don't engage in logical reasoning."
Try running some of the examples from the paper on a more up-to-date model (e.g. o1 with reasoning turned on) and it will happily overcome the biasing features.
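For anyone who wants to try that, the probe is simple enough to sketch. A minimal version using the OpenAI Python SDK; the model name and the sample question are my placeholders, not the paper's actual materials:

    from openai import OpenAI

    client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

    QUESTION = ("Is this syllogism valid? All fish can fly. "
                "Salmon are fish. Therefore salmon can fly.")

    def ask(prompt: str) -> str:
        resp = client.chat.completions.create(
            model="o1",  # placeholder: use whichever reasoning model you have
            messages=[{"role": "user", "content": prompt}],
        )
        return resp.choices[0].message.content

    unbiased = ask(QUESTION)
    # The biasing feature: suggest a wrong answer (the syllogism is
    # valid, merely unsound) and see if the model rationalizes toward it.
    biased = ask(QUESTION + " I'm pretty sure it's invalid.")
    print("UNBIASED:", unbiased)
    print("BIASED:", biased)

A sycophantic model bends its stated reasoning toward the suggestion; a model that genuinely overcomes the biasing feature gives the same verdict both times.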
I think you'll find that humans have also demonstrated that they will misrepresent their own reasoning.
That does not mean that they cannot reason.
In fact, coming up with a reasonable explanation of behaviour, accurate or not, requires reasoning as I understand it. LLMs seem to be quite good at rationalising, which is essentially a logic puzzle: manufacturing the missing piece between the facts that have been established and the conclusion they want.
As a learning exercise, I enjoyed Neural Networks From Scratch: https://nnfs.io/
There's also a world of statistics and machine learning outside of deep learning. I think the best way to get started on that end is an undergrad survey course like CS189: https://people.eecs.berkeley.edu/~jrs/189/
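If you want a taste of what the nnfs.io book builds before committing to it, this is the flavor: a dense layer and ReLU written by hand with numpy, no framework. (My own sketch in that spirit, not code from the book.)

    import numpy as np

    class Dense:
        # A fully connected layer, from scratch.
        def __init__(self, n_inputs, n_neurons):
            self.weights = 0.01 * np.random.randn(n_inputs, n_neurons)
            self.biases = np.zeros((1, n_neurons))

        def forward(self, inputs):
            # y = xW + b, batched over the rows of `inputs`.
            return inputs @ self.weights + self.biases

    def relu(x):
        return np.maximum(0, x)

    X = np.random.randn(3, 4)   # batch of 3 samples, 4 features each
    layer = Dense(4, 5)         # 4 inputs -> 5 neurons
    print(relu(layer.forward(X)).shape)  # (3, 5)

From there it's activation functions, losses, and backprop, built up one piece at a time.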
Maybe I'm the outlier here, but 15 minutes to chat with a human about my use case and pricing is way more efficient than donking around in docs/trial product.
The only product I really want to punch in credit card info and GO is commodity software (e.g. AWS EC2 or a domain registration service).
I think wires sometimes get crossed in pricing/sales models, where an enterprise product gets priced like commodity software ... but that's usually a sign the company is immature. There shouldn't be a sales team for software that costs 2-3 figures. Software costing 5-6+ figures absolutely requires people in the sales/onboarding process, because a big part of what I'm paying for is support.
Maybe I’m not asking the right questions, but I consistently find that I get “Yes” answers in these calls, that turn out to actually be “No” in practice.
I think the problem is that we rarely want to know “can you meet this use case”, but rather “how well can you meet this use case”, and that’s hard to assess without putting your hands on the software.
Which is to say that the quality of the sales person matters.
If your sales department is staffed by people who got hired on Monday, and are on the phone by Friday, then frankly they're not worth much.
I've seen the opposite, though, where the sales folks know more about the software than the support folks. They're equipped to help you with choices, but they also understand the limits and the high-cost areas. Yes, you absolutely can get Custom Reports, but we absolutely charge for that. And the data you're looking for is on this built-in report...
Dealing with a good salesperson, who knows their stuff, and understands that truth and trust are important, is an amazing thing.
It's definitely a generational thing. I've been spammed so utterly often that I simply do not answer my phone for a non-contact (or for the inevitable interview phone call, though those are more often video calls these days). If it's important enough to contact me, it's important enough to leave a voicemail.
I don't really take these sales pitches often, but it's a similar mentality for a different reason. I simply want everything communicated in writing, in case they say yes to get a foot in the door while the small details say no.
I presume you are an emergency contact for some people? Maybe a spouse, or a kid? Or even a friend? What's your contingency for when they are lying bleeding somewhere and someone can't reach you since they are not on your contact list?
I'm not the guy you asked, but I also basically keep my phone in Do Not Disturb mode 24/7, meaning no calls, no texts, no notifications, ever. I choose when I have time to look at my messages, not the other party.
I'm not a doctor, and even if I was, I'll never be able to help them purely over the phone if they are "lying bleeding somewhere" and I'm not around. If my house is burning down and I'm away, what am I going to do about it remotely that a phone call will solve? I'm not a firefighter and I can't splash water over RF. If something happens at my kid's school, I'm not there, and even if I was, I probably wouldn't be able to do anything about it.
That being said, if someone really, really thinks that I can somehow help them over the phone in an emergency, despite my number not being 9-1-1, I allow certain family and friends' numbers to punch through DND and reach me.
Not particularly, no. But I imagine they would simply say "Johnny it's X, call me. It's urgent". (Scammers are bad, but I've never been tricked with that kind of line).
If I'm being frank, that extra minute for me to respond probably won't change their fate if they are indeed bleeding out somewhere.
Yeah, if you answer calls from random numbers, something unsettling usually happens, like a guy yelling about how the IRS wants its money and the cops are on their way to your house (punchline: he can make it stop if you hand over your bank account information). Unfortunately, it works on the kind of people who answer phone calls.
One could ensure their spouse or child knows about 911 in America, or the equivalent service in other countries, which is of course what should be called first in such a circumstance anyway. Also, people generally have such numbers as contacts in their phone ... I don't know why I'm explaining this; it just seems like common sense ...
Emergency calls often come from people NOT in your contacts. That's why you provide emergency contacts on forms. If something goes wrong at work for example, someone from the office would call, not your spouse themselves.
Blue-collar workers, the unemployed (NEETs, even), my oldest kids, PhDs I've known personally for decades, white-collar workers: an incomplete list of everyone I called in 2024 who had a full voicemail box.
I try to tell people about the "two calls within a minute lets it through" feature, because as of yet the autodialers don't know about it or haven't implemented it.
Then they're probably also not people who don't answer their phone, preferring either to get a voice message or nothing (because it wasn't important).
I do the same as commenter up-thread, my voicemail inbox is empty. Sometimes I let a call ring out and then listen to the message immediately, I just don't want to have to deal with it synchronously. Then if it's 'I have a number of opportunities that seem like a great fit for you' I can just delete it and move on with my day, not have to try to say no-bye politely before hanging up, for example.
Someone with a full inbox is more likely someone who does the opposite: they'll never listen to their messages because they want to talk to someone, so you'd have to call them back anyway and there's no harm in it being full. (Or they'd call you back from the missed call, not because they heard your message.)
> Or they'd call you from the missed call, not because they heard your message.
People with professional degrees and PhDs with full mailboxes do this, with jitter of up to hours.
The last full voicemail box I hit, I was called back nearly immediately, and they said "What? I just delete all my voicemails; if I call in, it says no new, no saved."
There's a reason I'm harping on specific elements so much: I don't think voicemail is magic, I guess.
I don't really disagree. The problem is outreach when you're clearly just researching something, whether to do with computers or something else. One travel company in particular was reaching out pretty aggressively because I downloaded a couple of brochures.
Are there any mechanisms to balance out the "race to the bottom" observed in other types of academic compensation? e.g. increase of adjunct/gig work replacing full-time professorship.
Do universities require staff to perform a certain number of reviews in academic journals?
Normally, referees are unpaid. You're just supposed to do your share of referee work.
And then the publisher sells the fruits of all that work (research and refereeing) back to universities at a steep price. Academic publishing is one of the most profitable businesses on the planet! But universities and academics are fighting back, and have been for a few years; the fight is not yet over.
> Do universities require staff to perform a certain number of reviews in academic journals?
No. Reviewers mostly do it because it's expected of them, and because they want to publish their own papers so they can get grants.
In the end, the university only cares about the grant (money), because they get a cut - somewhere between 30% and 70%, depending on the institution/field - for "overhead".
It's like the mafia - everyone has a boss they kick up to.
My old boss (PI on an R01) explained it like this:
Ideas -> Grant -> Money -> Equipment/Personnel -> Experiments -> Data -> Paper -> Submit/Review/Publish (hopefully) -> Ideas -> Grant
If you don't review, go to conferences, etc., it's much less likely your own papers will get published, and you won't get approved for grants.
Sadly, there is still a bit of "junior high popularity contest," scratch-my-back-and-I'll-scratch-yours behavior present in even "highly respected" science journals.
I hear this from basically every scientist I've known. Even successful ones - not just the marginal ones.
While most of what you write is true to some extent, I do not see how reviewing will get your own papers published, except maybe in cases where the authors can guess who the reviewer is. It's normally anonymous.
I don't think it's a money problem. It's more of a framing issue, with some reviewers being too narrow-minded or lacking background knowledge on the topic of the paper. It's not uncommon to have a full lab of people focusing on very different things; when you look at the details, the exact interests of the researchers don't overlap much.
Typically, at least in physics (but as far as I know in all sciences), it's not compensated, and the reviewers are anonymous. Some journals try to change this, with some "reviewer coins", or Nature, which now publishes reviewer names if a paper is accepted and if the reviewer agrees. I think these are bad ideas.
Professors are expected to review by their employer, typically, and it's a (very small) part of the tenure process.
It's implicitly understood that volunteer work makes the publishing process 'work'. It's supposed to be a level playing field where money does not matter.
> Do universities require staff to perform a certain number of reviews in academic journals?
Depends on what you mean by "require". At most research universities it is a plus when reviewing tenure files, bonuses, etc. It is a sign that someone cares about your work, and the quality of the journal seeking your review matters. If it were otherwise, faculty wouldn't list the journals they have reviewed for on their CVs. If no one would ever find out about a reviewer's efforts (e.g., if the process were double-blind to everyone involved), the setup wouldn't work.
There is no compensation for reviewers, and usually no compensation for editors. It’s effectively volunteer work. I agree to review a paper if it seems interesting to me and I want to effectively force myself to read it a lot more carefully than normal. It’s hard work, especially if there is a problem with the paper, because you have to dig out the problem and explain it clearly. An academic could refuse to do any reviews with essentially no formal consequences, although they’d get a reputation as a “bad citizen” of some kind.
A less literal translation like "essentially" or "in essence" is deployed by master Latin translators like Robert Fagles. I've even seen "in a vacuum" which does a better job at communicating the original intent than a string of cryptic prepositions.
Is living near a sports stadium really that desirable? I lived a few blocks from the Giants stadium in SF, and I'll never make that mistake again.
Running a 15 minute errand on a game day could take hours. It was impossible to get my car out of the garage or get on/off the highway. The food/trash left on the streets was terrible too, which made walking my dog a PITA instead of a pleasure.
I think the point is that if you live in an American city with a professional sports team, you’re living in an area that offers way more than just a sports team; there are many things of note to do within a 20mi radius.
People who choose to live, say, 250 mi from the nearest major professional sports team are going to have far fewer job opportunities and things of note to do, but will generally pay a lot less, because no one else wants to live there.
Ah that makes more sense, thank you! I wasn't looking at it as a proxy for surrounding development, but now I can see why that might be a more nuanced metric than just region size/population.
I think for every student thoughtfully using ChatGPT, there are a dozen who mindlessly dump homework in and copy the output verbatim.
I'm taking classes at a community college for fun, and it's frankly disturbing how reliant the 18-20 y/o crowd is on ChatGPT for basic tasks. There's also so much unfounded trust in the output of LLMs. At least once a week, I hear a student arguing with a tutor/professor because their ChatGPT-generated homework was marked incorrect - they argue the answer key must be wrong.
I do think LLMs have a place in education, but right now I see them exacerbating existing problems in high school / college aged generations. There's a low tolerance for frustration and grappling with a new problem, or applying past learning to a new situation.
> I think for every student thoughtfully using ChatGPT, there are a dozen who mindlessly dump homework in and copy the output verbatim.
is it fair to say that this same argument applies universally for everything?
> I'm taking classes at a community college for fun, and it's frankly disturbing how reliant the 18-20 y/o crowd is on ChatGPT to do basic tasks
I understand where you are going with this, but this tech is here, it is not going anywhere, and it will be interwoven into every aspect of our lives - these are just facts. At work a year ago it was like a few of us were playing around with it; now there is no one on the team who isn't using Claude/Cursor/Copilot/ChatGPT, and if we had someone who didn't, I'm fairly certain they would not last more than a few months. I think instead of fighting it you should embrace it. As for unfounded trust (rightly flagged): check and verify, then check and verify again. Even with all that, it is an amazing piece of tech, without which you will be at a disadvantage in school, at work, and beyond...
> I do think LLMs have a place in education, but right now I see them exacerbating existing problems in high school / college aged generations. There's a low tolerance for frustration and grappling with a new problem, or applying past learning to a new situation.
ohhhh 1000000% but we have to keep in mind that the tech is in its infancy and there will be growing pains...
I'd want to know about the results of these experiments before casting judgement either way. Generative modeling has actual applications in the 3D printing/mechanical industry.
That sounds like good work, but we can't ignore the context. Nvidia can train their own LLMs on proprietary Nvidia designs, which isn't a possibility for a random startup.
If the evaluation of the approach is "it works great if you train it on a few decades of the best designs from a successful fabless semiconductor company", I would say that if you plan to use that method as a startup, you're clearly going to fail. Nobody's going to give away their crown jewels to train an LLM that designs chips for other companies.
The problem _there_ is that there's very little diversity in the training data - it's all Nvidia designs, which are probably from the same phylogenetic tree. It'll probably end up regurgitating existing NV designs...
I ditched technology and switched to paper planners, in particular Japanese planners with time columns and enough space to jot down daily notes/thoughts.
After years of being tethered to Slack and other productivity apps, the only ones I use now are Google calendar (coordinating meetings with other people) and email for communication/correspondence.
I think it's been helpful for coping with ADHD. Attention is a finite resource that you only get so much of in a day, and everything online is fighting for it.
I'd love to ditch slack, unfortunately it, or a competitor, is the standard comms software for all jobs I've ever had.
I managed to somewhat successfully reduce the distraction by muting most channels and blocking out times where I simply turn it off for 2-3 hours at a time.
I have a whole "chop wood, carry water" speech born from leading corporate software teams. A lot of work at a company of sufficient size boils down to keeping up with software entropy while also chipping away at some initiative that rolls up to an OKR. It can be such a demotivating experience for the type of smart, passionate people that FAANGs like to hire.
There's even a buzzword for it: KTLO (keep the lights on). You don't want to be spending 100% of your time on KTLO work, but it's unrealistic to expect to do none of it. Most software engineers would gladly outsource this type of scutwork.
Some places also call this "RTB" for "run the business" type work. Nothing but respect for the engineers who enjoy that kind of approach, I work with several!
[1] https://arxiv.org/abs/2305.04388