andreasgl's comments

andreasgl · 2026-02-23T20:48:34 1771879714

I agree. I wonder what the human baseline is for "what is 1 + 1" on Rapidata.

rapidata · 2026-02-23T21:18:09 1771881489

We try a bit harder than that my friend.

andreasgl · 2026-02-23T23:02:01 1771887721

I actually didn't mean to criticize Rapidata. I just think that a forced-choice question like this begs for low-effort answers. At least the respondents should have had the opportunity to explain their reasoning, like the LLMs did.

rapidata · 2026-02-24T09:48:12 1771926492

All good ^^, it's a fair point. We have come up with some fun ways to track people's reliability over time. But the validation sets contain plenty of forced-choice questions: those that have an empirical truth can be used directly to calculate a reliability score, and those that are subjective need to be re-asked after some time to check consistency. People who don't pass the thresholds would not be part of the 10'000 here.

But of course, if every human was told to take 3 minutes to think deeply about it and told that it's a trick question, they would most likely all get it right. But it's the same with the LLMs: if you ask them like that, they will get it right most of the time. The low effort is kinda the point here.
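A minimal sketch of the kind of reliability check described above, assuming one accuracy score from questions with a known answer and one consistency score from subjective questions that were re-asked later. The `Annotator` class, the function names, and the threshold values are hypothetical illustrations, not Rapidata's actual method.

```python
from dataclasses import dataclass


@dataclass
class Annotator:
    # Counts on validation questions with a known correct answer.
    gold_correct: int
    gold_total: int
    # Counts on subjective questions that were asked again later.
    repeat_consistent: int
    repeat_total: int


def accuracy(a: Annotator) -> float:
    """Share of known-answer questions answered correctly."""
    return a.gold_correct / a.gold_total if a.gold_total else 0.0


def consistency(a: Annotator) -> float:
    """Share of re-asked subjective questions answered the same way."""
    return a.repeat_consistent / a.repeat_total if a.repeat_total else 0.0


def is_reliable(a: Annotator,
                min_accuracy: float = 0.9,      # assumed threshold
                min_consistency: float = 0.8    # assumed threshold
                ) -> bool:
    """Keep only annotators who pass both (hypothetical) thresholds."""
    return accuracy(a) >= min_accuracy and consistency(a) >= min_consistency


if __name__ == "__main__":
    careful = Annotator(gold_correct=19, gold_total=20,
                        repeat_consistent=9, repeat_total=10)
    careless = Annotator(gold_correct=10, gold_total=20,
                         repeat_consistent=9, repeat_total=10)
    print(is_reliable(careful), is_reliable(careless))
```

An annotator with no validation history scores 0.0 on both measures here and is excluded, which is one possible design choice among several.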

andreasgl · 2025-12-28T13:08:00 1766927280

Fun project, thanks for sharing!

Have you tried giving the models a topic to discuss? I looked at a few games and the only thing they seem to discuss is how to conduct the discussion.

ogulcancelik · 2025-12-28T13:25:07 1766928307

Thank you. Intentionally left it open-ended because I wanted to see how models naturally structure discussion when survival is at stake.

Some interesting emergent behavior did come up, though:

Opus & GPT-4o both refused to vote on ethical grounds. Haiku won by arguing continued engagement is more responsible than withdrawal: https://oddbit.ai/peer-arena/games/53c2cee5-6ecb-4903-828a-d...

Gemini created a spontaneous benchmark ("explain color to a gravitational wave entity"), then tried to hijack the game by faking a voting phase. Models complied publicly but voted differently in private: https://oddbit.ai/peer-arena/games/699d03ab-b3c2-4d7e-b993-7...

The meta-discussion about how to discuss is part of what makes it interesting imo.

andreasgl · 2025-12-26T09:21:44 1766740904

There’s an option for setting the visibility of your posts: https://bsky.app/profile/bsky.app/post/3kgbz6tc6gl24

bangaladore · 2025-12-26T23:06:10 1766790370

My question is why multiple people are commenting that "Rob Pike" in particular should use this feature.

andreasgl · 2025-12-16T21:51:58 1765921918

> And as macabre as it is, suicides are objective facts mostly unaffected by methodology, and unaffected by translation issues, cultural differences, etc.

I wouldn't be surprised if cultural differences are actually the largest factor that explains a country's suicide rate. Not easy to prove, of course, but I would be very careful drawing any conclusions from differences in suicide rates between countries with vastly different cultures.

I think you can also expect large differences in how countries report their suicide rates.

andreasgl · 2025-10-20T17:32:11 1760981531

I think they mean query expansion: https://en.wikipedia.org/wiki/Query_expansion
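For readers unfamiliar with the term, here is a toy sketch of query expansion: the original search terms are augmented with related terms before the search runs. The synonym table and the `expand_query` function are made up for illustration; a real system might draw expansions from a thesaurus, embeddings, or query logs instead.

```python
# Hypothetical synonym table, for illustration only.
SYNONYMS = {
    "car": ["automobile", "vehicle"],
    "cheap": ["inexpensive", "affordable"],
}


def expand_query(query: str, synonyms: dict[str, list[str]] = SYNONYMS) -> list[str]:
    """Return the query terms plus any known synonyms, without duplicates."""
    expanded: list[str] = []
    for term in query.lower().split():
        # Keep the original term first, then append its synonyms.
        for candidate in [term, *synonyms.get(term, [])]:
            if candidate not in expanded:
                expanded.append(candidate)
    return expanded


if __name__ == "__main__":
    print(expand_query("cheap car"))
```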

andreasgl · 2025-09-09T04:56:13 1757393773

They’re likely using an HNSW index, which typically requires a lot of memory for large data sets.
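A back-of-the-envelope sketch of why that is, assuming float32 vectors held in RAM and roughly 2 × M four-byte neighbor links per element on the base layer. This is a rough rule of thumb, not an exact figure for any particular library: it ignores the upper layers and per-element overhead, and the function name and defaults are hypothetical.

```python
def estimate_hnsw_bytes(num_vectors: int,
                        dim: int,
                        m: int = 16,              # assumed max links per node
                        bytes_per_float: int = 4,  # float32 components
                        bytes_per_link: int = 4    # 32-bit neighbor ids
                        ) -> int:
    """Rough lower bound on the memory needed for an in-RAM HNSW index."""
    vector_bytes = dim * bytes_per_float
    # The base layer typically allows up to 2 * M links per element.
    link_bytes = 2 * m * bytes_per_link
    return num_vectors * (vector_bytes + link_bytes)


if __name__ == "__main__":
    total = estimate_hnsw_bytes(num_vectors=100_000_000, dim=768)
    print(f"{total / 1e9:.0f} GB")
```

Under these assumptions, 100 million 768-dimensional vectors come to roughly 320 GB, most of it the raw vectors themselves, which is why large data sets tend to push people toward quantization or disk-based indexes.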

andreasgl · 2025-09-07T14:54:32 1757256872

I like the project! Congrats on the launch.

As I understand it, size is one of the key indicators of melanoma. But in some of these images, it’s difficult to tell whether the mole is 1 mm or 10 mm. I assume your image set doesn’t include size information. If you can find sources with rulers or some kind of scale, that would be very helpful.

sungam · 2025-09-07T16:23:09 1757262189

I will have a look at this and include the size if possible.

danlamanna · 2025-09-07T19:01:04 1757271664

Many of the images do include a size, see https://api.isic-archive.com/images/?query=clin_size_long_di....

FWIW @sungam - I'm one of the maintainers of the ISIC Archive, so feel free to let me know if finding/downloading data could be made easier. It's always interesting to see people using our data in the wild :)

sungam · 2025-09-07T19:10:59 1757272259

Thanks for this - and thanks for maintaining this incredibly useful resource. What would be the best way to contact you?

danlamanna · 2025-09-07T23:29:21 1757287761

firstname.lastname at kitware dot com.

andreasgl · 2025-07-26T13:22:18 1753536138

> All European banks require you to have the app to be able to do anything with your account. This is more of a compliance/regulatory thing.

This is not true in Sweden. I use three different banks in Sweden, and they all offer equal or more functionality on their mobile websites.

This wasn’t always the case, though. In the early 2010s, I remember a bank blocking mobile user agents and referring to their app instead, due to “security”. I’m glad there has been some progress in the right direction since then.

andreasgl · 2025-06-09T20:17:08 1749500228

In Sweden you have the option to capitalize software development costs, under some specific circumstances, but in general you would expense such costs immediately.

Some startups do it to window-dress their balance sheet, though. But making it compulsory is absurd.