> Data extraction tasks are amongst the easiest to evaluate because there’s a known “right” answer.
Wrong. There can be a lot of subjectivity, and pretending that some golden answer exists does more harm than good and narrows the scope of what you can build.
My other main problem with data extraction tasks, and why I'm not satisfied with any of the existing eval tools, is that the schemas I write can change drastically as my understanding of the problem increases. And nothing really seems to handle that well; I mostly just resort to reading diffs of what happens when I change something and reading the input/output data very closely. Marimo is fantastic for anything visual like this btw.
Also there is a difference between: the problem in reality → the business model → your db/application schema → the schema you send to the LLM. And to actually improve your schema/prompt you have to be mindful of the entire problem stack and how you might separate things that are handled through post processing rather than by the LLM directly.
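To make that stack concrete, here's a minimal sketch (all field names made up) of what I mean by keeping the LLM-facing schema loose and closing the gap to the DB schema in post-processing instead of in the prompt:

```python
import re
from dataclasses import dataclass
from typing import Optional

@dataclass
class LLMExtraction:
    """Schema sent to the LLM: loose, forgiving types models fill reliably."""
    company_name: str
    deal_size: str  # e.g. "$1.2M", "about 3 million USD"

@dataclass
class DBRecord:
    """Schema your application actually stores: strict, normalized types."""
    company_name: str
    deal_size_usd: Optional[int]

def post_process(raw: LLMExtraction) -> DBRecord:
    """Normalize the LLM's loose string into a strict DB value."""
    m = re.search(r"([\d.]+)\s*(m|million)?", raw.deal_size.lower().replace("$", ""))
    amount = None
    if m:
        value = float(m.group(1))
        amount = int(value * 1_000_000) if m.group(2) else int(value)
    return DBRecord(company_name=raw.company_name.strip(), deal_size_usd=amount)
```

The point is that when your understanding of "deal size" changes, you edit `post_process` and `DBRecord` without touching the prompt or invalidating old LLM outputs.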
> Abstract model calls. Make swapping GPT-4 for Claude a one-line change.
And in practice random limitations like structured output API schema limits between providers can make this non-trivial. God I hate the Gemini API.
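For example, any "one-line swap" abstraction I've written ends up growing per-provider schema pruning like this (the keyword sets below are placeholders, not the real restrictions — those live in each provider's docs and change over time):

```python
# Hypothetical per-provider sets of JSON Schema keywords that a provider's
# structured-output endpoint rejects. Real lists are in each provider's docs.
UNSUPPORTED = {
    "provider_a": {"additionalProperties", "$defs", "allOf"},
    "provider_b": {"patternProperties"},
}

def prune_schema(schema, provider):
    """Recursively drop JSON Schema keywords a given provider's API refuses."""
    banned = UNSUPPORTED[provider]
    if isinstance(schema, dict):
        return {k: prune_schema(v, provider) for k, v in schema.items()
                if k not in banned}
    if isinstance(schema, list):
        return [prune_schema(v, provider) for v in schema]
    return schema
```

So the model name is a one-line change, but the schema you can actually send is not.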
This is very true! I could have been more careful/precise in how I worded this. I was really trying to just get across that it's in a sense easier than some tasks that can be much more open-ended.
I'll think about how to word this better, thanks for the feedback!
This is extremely true. In fact, from what we see, many/most of the problems to be solved with LLMs do not have ground-truth values; even hand-labeled data tends to be mostly subjective.
I think they're just saying that data extraction tasks are easy to evaluate because for a given input text/file you can specify the exact structured output you expect from it.
I got claude to reverse engineer the extension and compare it to changedetection, and here's what it came up with. Apologies for the clanker slop, but I think it's in poor taste not to attribute the open-source tool the service is built on (one that's also funded by their SaaS plan).
---
Summary: What Is Objectively Provable
- The extension stores its config under the key changedetection_config
- 16 API endpoints in the extension are 1:1 matches with changedetection.io's documented API
- 16 data model field names are exact matches with changedetection.io's Watch model (including obscure ones like time_between_check_use_default, history_n, notification_muted, fetch_backend)
- The authentication mechanism (x-api-key header) is identical
- The default port (5000) matches changedetection.io's default
- Custom endpoints (/auth/, /feature-flags, /email/, /generate_key, /pregate) do NOT exist in changedetection.io — these are proprietary additions
- The watch limit error format is completely different from changedetection.io's, adding billing-specific fields (current_plan, upgrade_required)
- The extension ships with error tracking that sends telemetry (including user emails on login) to the developer's GlitchTip server at 100% sample rate
The extension is provably a client for a modified/extended changedetection.io backend. The open question is only the degree of modification - whether it's a fork, a proxy wrapper, or a plugin system. But the underlying engine is unambiguously changedetection.io.
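For reference, the request shape described above takes a few lines to reproduce; the `/api/v1/watch` path and `x-api-key` header are from changedetection.io's documented API, while the key and host here are placeholders:

```python
import urllib.request

def build_list_watches_request(api_key: str, base: str = "http://localhost:5000"):
    """Build (but don't send) a request to changedetection.io's watch-list
    endpoint, authenticated the same way the extension does: an x-api-key
    header against port 5000."""
    return urllib.request.Request(
        f"{base}/api/v1/watch",
        headers={"x-api-key": api_key},
        method="GET",
    )

req = build_list_watches_request("your-api-key-here")
```

If the hosted backend answers requests shaped like this, that alone is strong evidence for the fork claim.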
Fair point, and I should have been upfront about this earlier. The backend is a fork of changedetection.io. I've built on top of it — added the browser extension workflow, element picker, billing, auth, notifications, and other things — but the core detection engine comes from their project. That should have been clearly attributed from the start, and I'll add it to the docs and about page.
changedetection.io is a genuinely great project. What I'm trying to build on top of it is the browser-first UX layer and hosted product (plus an AI-focused approach) that makes it easier for non-technical users to get value from it without self-hosting.
Apologies but I will use this thread as an opportunity to report CC VSCode extension bugs because I don't think there's an official channel that actually gets read by humans.
> yeah they're shipping too fast and everything is buggy as shit
- fork conversation button doesn't even work anymore in vscode extension
- sometimes when I reconnect to my remote SSH in VSCode, previously loaded chats become inaccessible. The chats are still there in the .jsonl files but for some reason the CC extension becomes incapable of reading them.
-- this issue happens so frequently that I ended up making a skill to allow CC to dig up info from the bugged sessions
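For anyone hitting the same thing, here's a rough sketch of the recovery idea: pull messages straight out of a session `.jsonl` file when the extension refuses to load it. The `"message"`/`"role"`/`"content"` field names are guesses from inspecting my own files — check yours first:

```python
import json
from pathlib import Path

def dump_session(path):
    """Best-effort dump of role/content pairs from a session .jsonl file,
    skipping blank, truncated, or non-message lines."""
    lines = []
    for raw in Path(path).read_text().splitlines():
        if not raw.strip():
            continue
        try:
            entry = json.loads(raw)
        except json.JSONDecodeError:
            continue  # skip partially written lines
        if not isinstance(entry, dict):
            continue
        msg = entry.get("message", entry)  # field layout is an assumption
        role, content = msg.get("role"), msg.get("content")
        if role and content:
            text = content if isinstance(content, str) else json.dumps(content)
            lines.append(f"{role}: {text}")
    return "\n".join(lines)
```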
Are there good open models out there that beat gemini 2.5 flash on price? I often run data extraction queries ("here is this article, tell me xyz") with structured output (pydantic) and wasn't aware of any feasible (= supports pydantic) cheap enough soln :/
> every single product/feature I've used other than the Claude Code CLI has been terrible
yeah they're shipping too fast and everything is buggy as shit
- fork conversation button doesn't even work anymore in vscode extension
- sometimes when I reconnect to my remote SSH in VSCode, previously loaded chats become inaccessible. The chats are still there in the .jsonl files but for some reason the CC extension becomes incapable of reading them.
Batshit situation, respectable position from Dario throughout.
But there's some irony in this happening to Anthropic after all the constant hawkish fearmongering about the evil Chinese (and the anti-open-source-AI sentiment too).
Horrific comparison point. LLM inference is way more expensive locally for single users than running batch inference at scale in a datacenter on actual GPUs/TPUs.