Hacker Timesnew | past | comments | ask | show | jobs | submit | itissid's commentslogin

Isn't this something to do with their paid pyx(as opposed to ty/ruff etc) thingy?

> are they fairly uniformly similar with gains in one or a few areas, or is it noisier with a lower overall loss?

It seems like you want to know what median, 5-95 or 1-99 differences might be? I also wonder how the "residual" plot looks like... If there are too many residual data points for a scatter plot then a histogram might be useful to visualize the modes. I suspect that as loss decreases multiple modes should condense or altogether collapse into one.


Many times there is really no way of getting around some of the expert-human judgement complexity of the larger question of "How to get agents to build reliably".

One example I have been experimenting is using Learning Tests[1]. The idea is that when something new is introduced in the system the Agent must execute a high value test to teach itself how to use this piece of code. Because these should be high leverage i.e. they can really help any one understand the code base better, they should be exceptionally well chosen for AIs to use to iterate. But again this is just the expert-human judgement complexity shifted to identifying these for AI to learn from. In code bases that code Millions of LoC in new features in days, this would require careful work by the human.

[1] https://anthonysciamanna.com/2019/08/22/the-continuous-value...


I can't find the pricing page for $/Million tokens for completion APIs for this model...Anyone knows where it is?


I tried looking and couldn't find a proper price per token for the chat model. It claims to be free in some places. I did find these prices for the other services: Text to Speech (Bulbul v3): ₹30 per 10K characters Text to Speech (Bulbul v2): ₹15 per 10K characters Sarvam Vision: Free per page Speech to Text: ₹30 per hour Speech to Text with Diarization: ₹45 per hour Speech to Text & Translate: ₹30 per hour Speech to Text, Translate & Diarization: ₹45 per hour Sarvam Translate V1: ₹20 per 10K characters Translate Mayura V1: ₹20 per 10K characters Transliterate: ₹20 per 10K characters Language Identification: ₹3.5 per 10K characters


It appears to be free (like their old Sarvam-M).


One set of applications to build with subscription is to use the claude-go binary directly. Humanlayer/Codelayer projects on GitHub do this. Granted those are not ideal for building a subscription based business to use oathu tokens from Claude and OpenaAI. But you can build a business by building a development env and gating other features behind paywall or just offering enterprise service for certain features like vertical AI(redpanada) offerings knowledge workers, voice based interaction(there was a YC startup here the other day doing this I think), structured outputs and workflows. There is lots to build on.


I have my homenas set up with Node Proxy Manager container forwarding requests to different docker machines:ports e.g. I have some TTS/STT/LLM services locally hosted. To increase bandwidth to internet facing nodes, would you use this or some other simpler solution?


Is it a typo and it's the Nginx Proxy Manager?


I assume so; I use the same thing with my Unraid box and then create the DNS entries in the unifi panel so I get jellyfin.lan, minecraft.lan, etc inside the house.


Oh yeah Nginx* not Node.


There could be another model in the future, one where many more independent people might support self maintained software by non saas companies

e.g. If the supply of labor learning to build software increases and it becomes very close to what are now vocation training, then you can just hire a guy — like you would a consultant — who can quickly get spun up and make fixes. I would think one of the few things preventing this kind of socio economic set up are saas jobs that are siloed off by interview "walls" to most people from entering. Make it like a vocation, like plumbing or electrician, with lots of non saas companies supporting the market and suddenly it will be the death of saas.

The incentives for this future are closer than they were in 2022-23.


Google maps discover feature is a dumpster fire for fomo driven brain fog


I think a few things explain these kinds of projects

1. There are a lot of Agentic Data Plane startups for knowledge workers(not really for coders[1] but for CFOs, Analysts etc) going up. e.g https://www.redpanda.com/ For people to ask "Hey give me a breakdown of last year's sales target by region, type and compare 2026 to 2025 for Q1".

Now this can be done entirely on intranet and only on certain permissioned data servers — by agents or humans — but as someone pointed out the intranet can also be a dangerous place. So I guess this is about protecting DB tables and Jiras and documentation you are not allowed to see.??

2. People who have skills — like the one OP has with wasm (I guess?) — are building random infra projects for enabling this.

3. All the coding people are getting weirded out by its security model because it is ofc not built for them.

[1] As I have commented elsewhere on this thread the moment a coder does webfetch + codeexec its game over from security perspective. Prove me wrong on that please.


Wait. I don't understand the threat vector modelled here. Any agent or two isolated ones that the do Webfetch and code exec, even in separate sandboxes, is pretty much game over as far as defending against threat vectors goes. What am I missing here?


Well, if wasm process is limited on the syscalls it can make, the blast radius is limited. For example you can block network access, and disk access for tools that don't need those capabilities.

That being said, this doesn't sound like they're really thinking through the risks.

> Dynamic Tool Building - Describe what you need, and IronClaw builds it as a WASM tool

If the agent can write it's own insecure plugins, and the wasm processes isn't properly isolated, you've really gained nothing.


even if it is isolated, like no network or host access. Like say the malicious prompt created a wasm tool that patched your project code to leak information like adding a logger.warning. but LOG_LEVEL was set to error or whatever that prevented this from surfacing during testing or dev/beta.

Again running on that was container that code does not reveal anything. But then another isolated wasm tool was responsible to build the binary and ship it to prod.

Shotgunned all over prod logs are spotted by a log watcher within minutes of deploy. Whew... right?

But you are already screwed.


Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: