Hacker News | new | past | comments | ask | show | jobs | submit | more | kbknapp's comments | login

Really cool idea, but this gives me anxiety just thinking about how it has to be maintained. Taking into account versions, command flags changing output, etc. all seems like a nightmare to maintain, to the point where I'm assuming actual usage of this will work great for a few cases but quickly lose its novelty beyond basic ones. Not to mention that using `--<CMD>` for the tool seems like a poor choice, as your help/manpage will end up being thousands of lines long because each new parser will require a new flag.


Would it be fair to think about this as a shim whose scope of responsibility will (hopefully) shrink over time, as command line utilities increasingly support JSON output? Once a utility commits to handling JSON export on its own, this tool can delegate to that functionality going forward.


I'd also assume that a CLI resisting JSON support is likely to have a very stable interface. Maybe wishful thinking...


It would, but I can still see somebody launching this with great enthusiasm and then losing the passion to fix Yet Another Parsing Bug introduced by a new version of dig.


`jc` author here. I've been maintaining `jc` for nearly four years now. Most of the maintenance is choosing which new parsers to include. Old parsers don't seem to have too many problems (see the GitHub issues), and bugs are typically just corner cases that can be quickly addressed along with added tests. In fact, there is a plugin architecture that allows users to get a quick fix so they don't need to wait for the next release. In practice it has worked out pretty well.

Most of the commands are pretty old and do not change anymore. Many parsers are not even commands but standard filetypes (YAML, CSV, XML, INI, X509 certs, JWT, etc.) and string types (IP addresses, URLs, email addresses, datetimes, etc.) which don't change or use standard libraries to parse.
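To illustrate the point about standard libraries: converting one of those filetypes is mostly delegation to an existing parser. A minimal Python sketch with invented sample data (this is not jc's actual code):

```python
import configparser
import csv
import io
import json

# CSV: the stdlib reader does the parsing; we only serialize.
csv_text = "name,shell\nada,zsh\ngrace,bash\n"
rows = list(csv.DictReader(io.StringIO(csv_text)))
print(json.dumps(rows))  # [{"name": "ada", "shell": "zsh"}, ...]

# INI: configparser handles sections, comments, and continuations.
ini_text = "[server]\nhost = example.com\nport = 8080\n"
parser = configparser.ConfigParser()
parser.read_string(ini_text)
ini = {section: dict(parser[section]) for section in parser.sections()}
print(json.dumps(ini))  # {"server": {"host": "example.com", "port": "8080"}}
```

Since the grammar lives in the stdlib (or in libraries like PyYAML for YAML), these parsers rarely need maintenance when upstream tools change.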

Additionally, I get a lot of support from the community. Many new parsers are written and maintained by others, which spreads the load and accelerates development.


> Not to mention using `--<CMD>`

If you read further down in the documentation, you can just prefix your command with `jc` (e.g. `jc ls`). The `--cmd` param is actually a good idea, since it allows you to mangle the data before converting it (e.g. you want to grep a list before converting it).

Regarding maintenance, most of the basic unix commands' output shouldn't change too much (they'd be breaking not only this tool but a lot of scripts). I wouldn't expect it to break as often as you imagine, at least not because of other binaries being updated.
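To make the maintenance burden concrete, here is a toy version of such a parser for `ls -l`-style output (not jc's actual code; the field names are my own guesses):

```python
import json

def parse_ls_l(text):
    """Toy parser for `ls -l`-style output; handles files, dirs, symlinks."""
    entries = []
    for line in text.splitlines():
        fields = line.split(None, 8)       # maxsplit=8: filenames may contain spaces
        if len(fields) != 9 or fields[0][0] not in "-dl":
            continue                       # skip the "total N" line and blanks
        entries.append({
            "flags": fields[0],
            "links": int(fields[1]),
            "owner": fields[2],
            "group": fields[3],
            "size": int(fields[4]),
            "date": " ".join(fields[5:8]),
            "filename": fields[8],
        })
    return entries

sample = (
    "total 8\n"
    "-rw-r--r-- 1 root root 1024 Dec 21 09:00 notes.txt\n"
    "drwxr-xr-x 2 root root 4096 Dec 20 18:30 my dir\n"
)
print(json.dumps(parse_ls_l(sample), indent=2))
```

The real parser has to handle far more variation (locales, `-h` sizes, ACL markers), which is exactly the maintenance surface being discussed.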


I'm sort of torn - yeah, one well-maintained "basket" beats having a bunch of ad-hoc output parsers all over the place, but I want direct json output because I'm doing something complicated and don't want parsing to add to the problem. (I suppose the right way to get comfortable with using this is to just make sure to submit PRs with additional test cases for everything I want to use it with, since I'd have to write those tests anyway...)


This requires collaboration. People submitting parsing info for the tool they need, and people that use it to easily keep it up to date. That is the only way.


This is one of the better use cases for LLMs, which have shown good capability at turning unstructured text into structured objects


If LLMs were local and cheap, sure. They're just too heavyweight a tool to use for simple CLI output manipulation today. I don't want to send everything to the cloud (and pay a fee), and even if it were a local LLM, I don't want it to eat all my RAM and battery to do simple text manipulation.

In 20 years, assuming some semblance of Moore's law still holds for storage/RAM/GPU, I'm right there with you.


On my M1 Pro/16GB RAM mac I get decently fast, fully local LLMs which are good enough to do this sort of thing. I use them in scripts all the time. Granted, I haven’t checked the impact on the battery life I get, but I definitely haven’t noticed any differences in my regular use.


Which models do you run and how?


https://github.com/ggerganov/llama.cpp is a popular local-first approach. LLaMA is a good place to start, though I typically use a model from Vertex AI via API.


Thanks. I have llama.cpp locally. How do you use it in scripts? As in how do you specifically, not how would one.


I have ollama's server running, and I interact with it via the REST API. My preferred model right now is Intel's neural chat, but I'm going to experiment with a few more over the holidays.
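For anyone curious, hitting ollama's REST API needs nothing beyond the Python stdlib. A sketch assuming ollama's documented `/api/generate` endpoint on its default local port; the live network call is left commented out so the snippet stands alone:

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # ollama's default port

def build_request(prompt, model="neural-chat"):
    """Build the POST request for ollama's /api/generate endpoint."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )

req = build_request("Convert to JSON: name=ada shell=zsh")
# With the ollama server running locally, the call itself would be:
# body = json.load(urllib.request.urlopen(req))
# print(body["response"])
```

With `"stream": False`, the server returns one JSON body instead of a stream of chunks, which keeps shell-script usage simple.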


I tried ollama today and it is super easy; finding good models is definitely going to be challenging. I tried a few on some sample (JSON) tasks and it is... frustrating... how they elide or are unable to follow instructions.

Is there a good fine-tuning workflow with ollama?


I haven’t tried any fine tuning so I can’t help there, sorry. Though I will say that neural chat has been pretty good to me, even though I have definitely observed it ignoring instructions at times, like a “here’s your json:” preamble in response to a query that specifically requests only json.
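A cheap guard against that preamble problem is to scan the response for the first parseable JSON value rather than trusting the output wholesale; a small sketch:

```python
import json

def extract_json(response):
    """Return the first parseable JSON object in an LLM response, or None."""
    start = response.find("{")
    while start != -1:
        try:
            # raw_decode parses one value and ignores any trailing text
            obj, _ = json.JSONDecoder().raw_decode(response[start:])
            return obj
        except json.JSONDecodeError:
            start = response.find("{", start + 1)
    return None

chatty = 'Here\'s your json: {"user": "ada", "shell": "zsh"}'
print(extract_json(chatty))  # {'user': 'ada', 'shell': 'zsh'}
```

It doesn't fix a model that invents fields, but it reliably strips the "here's your json:" chatter.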


I use ollama (https://ollama.ai/) which supports most of the big new local models you might've heard of: Llama 2, Mistral, Vicuna, etc. Since I have 16GB of RAM, I stick to the 7B models.


not op, but this is handy https://lmstudio.ai/


Thanks!


Yeah, it would be much better if you could send a sample of the input and desired output and have the LLM write a highly optimized shell script for you, which you could then run locally on your multi-gigabyte log files or whatever.


Problem: some crusty old tty command has dodgy output.

Solution: throw a high-end GPU with 24GB RAM and a million dollars of training at it.

Yeah, great solution.


With fine-tuning, you can get really good results on specific tasks that can run on regular cpu/mem. I'd suggest looking into the distillation research, where large model expertise can be transferred to much smaller models.

Also, an LLM trained to be good at this task has many more applications than just turning command output into structured data. It's actually one of the most compelling business use cases for LLMs


The complaint is less whether it would work, and more a question of taste. Obviously taste can be a personal thing. My opinions are my own and not those of the BBC, etc.

You have a small C program that processes this data in memory, and dumps it to stdout in tabular text format.

Rather than simplify by stripping out the problematic bit (the text output), you suggest adding a large, cutting-edge, hard to inspect and verify piece of technology that transforms that text through uncountable floating point operations back into differently-formatted UTF8.

It might even work consistently (without you ever having 100% confidence it won't hallucinate at precisely the wrong moment).

You can certainly see it being justified for one-off tasks that aren't worth automating.

But to shove such byzantine inefficiency and complexity into an engineered system (rather than just modify the original program to give the format you want) offends my engineering sensibilities.

Maybe I'm just getting old!


If you can modify the original program, then that is by far the best way to go. More often than not, you cannot change the program, and in relation to the broader applicability, most unstructured content is not produced by programs.


> More often than not, you cannot change the program

I’d challenge that. Try working with your upstream. It’s easier than ever nowadays to submit issues and PRs on GitHub.

Building layers upon layers just to work around minor issues in a tool is not wise.


Unfortunately, in my experience it is required more often than not. I have two open issues with upstreams that I'm trying not to work around. They haven't even replied, let alone considered changes or supported contributions. These aren't small projects, either. You'd be surprised how many solo-maintained projects won't even entertain contributions.

Anyway, I do try, but in my experience, if it happens, it's not overnight, and you're going to have to maintain a workaround for some amount of time.


Yes, makes sense. Although this was originally a post about output of common command-line tools. Some of these are built on C libraries that you can just use directly. They are usually open source.


As someone who maintains a solution that solves similar problems to jc, I can assure you that you don't need an LLM to parse most human-readable output.


It's more about the maintenance cost: you don't have to write N parsers for M versions.

Maybe the best middle ground is to have an LLM write the parser. That lowers the development cost without the runtime cost, in theory.


You don’t have to write dozens of parsers. I didn’t.


Part of the appeal is that people who don't know how to program or write parsers can use an LLM to solve their unstructured -> structured problem


This is a terrible idea; I can't think of a less efficient method with worse correctness guarantees. What invariants does the LLM enforce? How do you make sure it always does the right thing? How do you debug it when it fails? What kind of error messages will you get? How will it react to bad inputs: will it detect them (unlikely), or will it hallucinate an interpretation (most likely)?

This is not a serious suggestion


I used to focus on the potential pitfalls and be overly negative. I've come to see that these tradeoffs are situational. After using them myself, I can definitely see upsides that outweigh the downsides

Developers make mistakes too, so there are no guarantees either way. Each of your questions can be asked of handwritten code too


You can ask those questions, but you won't get the same answers.

It's not a question of "is the output always correct". Nothing is so binary in the real world. A well hand-maintained solution will trend further towards correctness as bugs are caught, reported, fixed, regression tested, etc.

Conversely, you could parse an IP address by rolling 4d256 and praying. It, too, will sometimes be correct and sometimes be incorrect. Does that make it an equally valid solution?
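For contrast, the deterministic version of that IP-address parse is a few lines of stdlib code whose behaviour never varies between runs:

```python
import ipaddress
from typing import Optional

def parse_ip(text: str) -> Optional[str]:
    """Deterministic parse: the same input always gives the same answer."""
    try:
        return str(ipaddress.ip_address(text.strip()))
    except ValueError:
        return None

print(parse_ip("192.168.0.1"))  # 192.168.0.1
print(parse_ip("not an ip"))    # None
```

Every bug report against this function is reproducible, which is the property the dice (and the LLM) lack.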


Sure. But we weren’t talking about non-programmers maintaining software.


> people who don't know how to program OR write parsers

there are plenty of programmers who do not know how to write lexers, parsers, and grammars


We are chatting about maintaining a software project written in a programming language. Not some theoretical strawman argument you've just dreamt up because others have rightly pointed out that you don't need an LLM to parse the output of a 20KB command line program.

As I said before, I maintain a project like this. I also happen to work for a company that specialises in the use of generative AI. So I’m well aware of the power of LLMs as well as the problems of this very specific domain. The ideas you’ve expressed here are, at best, optimistic.

By the time you've solved all the little quirks of ML, you'll have likely invested far more time in your LLM than you would have if you'd just written a simple parser, and, ironically, you'd need someone far more specialised to write the LLM than your average developer.

This simply isn't a problem that needs an LLM chucked at it.

You don’t even need to write lexers and grammars to parse 99% of application output. Again, I know this because I’ve written such software.


Give a kid a hammer and he'll find something to fix.


What value does this comment add?


Approximately the same amount as the comment I replied to.


One attempts to nudge a user towards the comment guidelines of HN (https://news.ycombinator.com/newsguidelines.html)

> Be kind. Don't be snarky. Converse curiously; don't cross-examine. Edit out swipes.

> Comments should get more thoughtful and substantive, not less, as a topic gets more divisive.

> Eschew flamebait. Avoid generic tangents. Omit internet tropes.


The old saying "If a hammer is your only tool then everything is a nail" is absolutely pertinent to this comment thread.


how so? what assumptions are you making to reach that conclusion?


This all should be obvious to any human with knowledge of common colloquialisms. You aren't an AI are you?

The latest "hammer" is AI.

Lots of commenters here are suggesting to use a complex AI to solve simple text parsing. Maybe you can't see the problem with that, but it's like using 1000 Watts of power to solve something that should take 1 microwatt, just because "new, shiny" AI is here to save us all from having to parse some text.

I'm not making assumptions about what people are commenting about in this thread. Your comment comes off like a subtle troll.


> Comments should get more thoughtful and substantive, not less, as a topic gets more divisive.

What rule applies when the initial comment is not thoughtful and substantive?


I got a kick out of it.

¯\_(ツ)_/¯


Keep in mind that the maintenance responsibility you're anxious about is currently a cost imposed on all developers.

<rant>

Since I started programming in the 80s, I've noticed a trend where most software has adopted the Unix philosophy of "write programs that do one thing and do it well". Which is cool and everything, but it has created an open source ecosystem of rugged individualism, where the proceeds to the winners so vastly exceed the crumbs left over for workers that there is no ecosystem to speak of, just exploitation. Reflected now in the wider economy.

But curated open source solutions like jc approach problems at the systems level so that the contributions of an individual become available to society in a way that might be called "getting real work done". Because they prevent that unnecessary effort being repeated 1000 or 1 million times by others. Which feels alien in our current task-focussed reality where most developers never really escape maintenance minutia.

So I'm all in favor of this inversion from "me" to "we". I also feel that open source is the tech analog of socialism. We just have it exactly backwards right now, that everyone has the freedom to contribute, but only a select few reap the rewards of those contributions.

We can imagine what a better system might look like, as it would start with UBI. And we can start to think about delivering software resources by rail instead of the labor of countless individual developer "truck drivers". Some low-hanging fruit:

Maybe we need distributions that provide everything and the kitchen sink; then we run our software and a compiler strips out the unused code, rather than relying on luck to decide what we need before we've started, like with Arch or Nix.

We could explore demand-side economics, where humans no longer install software, but dependencies are met internally on the fly, so no more include paths or headers to babysit (how early programming languages worked before C++ imposed headers onto us).

We could use declarative programming more, instead of brittle imperative (hard-coded) techniques. We could filter data through stateless, self-contained code modules communicating via FIFO streams, like Unix executables. We could use more #nocode approaches borrowed from FileMaker, MS Access and Airtable (or something like them).

We could write software from least privilege, asking the user for permission to access files or networks outside the module's memory space, and then curate known-good permission policies instead of reinventing the wheel for every single program. We could (will) write test-driven software where we design the spec as a series of tests and then AI writes the business logic until all of the tests pass.

There's a lot to unpack here, and a wealth of experience available from older developers. But I sympathize with the cognitive dissonance, as that's how I feel every single day witnessing the frantic yak shaving of "modern" programming while having to suppress my desire to use these other proven techniques. Because there's simply no time to do so under the current status quo, where FAANG quite literally has all of the trillions of dollars as the winners and so decides best practices, while the open source community subsists on scraps in their parents' basement hoping to make rent someday.


Many (most?) disagree with this line of thinking, but I believe the "Rust will never have a 2.0" style of thought is what ultimately leads to these multi-year pursuits of perfection (or at least good enough to last "essentially forever"). The Editions provide a certain release valve for some styles of breaking changes, but I don't believe it's quite enough over the entire lifespan of a language, which will undoubtedly grow cruft that even Editions cannot remove.


I would certainly agree that it is a pressure in that direction. But like anything, you take the design constraints you have, and do the best you can with them. Paralysis is not the only possible outcome. Accepting that nothing is ever perfect, that you will make mistakes and then have to deal with it later, and that's okay, is another. There are many instances of the latter happening over the history of Rust's development in the past. It would be nice if the Project could figure out how to strike that balance again.


There are two ways to take it, "Rust will never have a 2.0 ... therefore everything added to it must be perfect." or "Rust will never have a 2.0 ... so don't try to force it to be something that its not and lets make Rust the best it can be."

A language with Rust's priorities but designed with an effect system from the start could be epic. But Rust is not that language and maybe it can't be. And that's ok. We can give Rust a pass since effect systems weren't even invented yet when it was being designed.


> I believe the "Rust will never have a 2.0" style thought is what ultimately leads these multi-year pursuits of perfect

Probably, but if it had instead had a "Rust will break your code once a year/biyear" style of thought, then it likely would have been yet another nice ML-y language no one cared about.


It also doesn't help that the fix for the increased difficulty in implementing AsyncIterator (assuming the approach advocated in the OP winds up being the one selected by the language team) relies on the as-yet unstabilized generators/async generators feature. I'm not really sure why it's not available yet, as the necessary compiler features have been in place for years, but because it's not, this is kind of a hard pill to swallow.


Seems the author is expecting OAI to continue merrily along its way working towards AGI (albeit at a stated slower pace) while MSFT is able to take Altman et al. and circle the wagons on what already exists (GPT-4), squeezing it for all it's worth. While that's entirely possible, there are other outcomes not nearly as positive that would put MSFT at a disadvantage. It's like saying MSFT's AI hedge is built on what appears to be sand; maybe it's stable, maybe it's not.


Don't think they can just outright steal GPT-4, and they definitely won't be taking the world-class data set with them.


I've worked in large companies (thousands of employees) and startups (<20) and I actually felt more like a cog in the machine at the startup size companies.

I was literally just a means to an end to churn out code on a product. I could have been (and eventually was) replaced at any moment with another generic cog willing to churn out the same code without much of a thought.


After working at Google and Startups, I totally agree. You are much more of a cog at a startup due to the desperate need to grind out the next A/B test or customer requirement.

People WAY over glamorize startups.


It's the IATA code for Portland International Airport. Many datacenters use IATA codes for the nearest airport to give a rough approximation of the location. So the PDX datacenter is the one closest to the PDX airport in or around Portland.
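A sketch of the convention (the mapping below is illustrative, not any particular provider's region list):

```python
# Datacenter/region nicknames follow the IATA code of the nearest airport.
IATA_REGIONS = {
    "PDX": "Portland, Oregon",
    "IAD": "Ashburn / Washington-Dulles, Virginia",
    "SJC": "San Jose, California",
    "FRA": "Frankfurt, Germany",
}

def region_for(code):
    """Resolve a datacenter nickname to a rough geographic location."""
    return IATA_REGIONS.get(code.upper(), "unknown code")

print(region_for("pdx"))  # Portland, Oregon
```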


Git terminology is a classic example where many (most? definitely not all) terms make perfect sense once you already understand how it works, but make almost no sense in concert with other terminology or when you don't know the implementation details.

Leaky terminology.


In part, but also, it's because different people worked on different parts of git and came up with different names. Linus originally called it the cache, the most computer sciency term, and then I think Junio renamed it to index, a more DVCS-specific term, but most users called it the staging area, and now the evolution of this term is fossilised into the git UI as well as its internals.


No, it's not sensible, just sloppy naming plain and simple. Good naming takes effort and it just wasn't done by git's early developers.


So basically poor API design

Imagine if you had to know OS/IDE/compiler internals for basic usage


> Imagine if you had to know OS/IDE/compiler internals for basic usage

Back when I used to walk backwards uphill for 6 days to get to school, that's precisely how it was.

Except that "IDE" was a misspelling of "I'd" and nobody ever did that.


hmm?


The "Security. Cryptography. Whatever." podcast just did an episode about the origin of the nistp curves.

https://securitycryptographywhatever.com/2023/10/12/the-nist...


My wife isn't going to be happy about this...

I have an ErgoDox EZ at home, and a Moonlander for work. I absolutely love these keyboards. They have played a significant role in reducing pain I experience from arthritis.


I can totally relate. I bought my ErgoDox EZ a while ago and I still love it.

I changed the switches for some lubed linears, and added some foam and now it even sounds better. It also reduced my wrist pain.

I'm debating with myself whether to get the Moonlander or not. I don't actually need it, but it looks so cool (and the USB port of the ErgoDox is quite outdated haha).


I love my ErgoDox EZ. I got the Moonlander, and it looks cool, but I do not like it. Neither does my friend. The tenting tilts the thumb cluster away, which makes it uncomfortable. I wish I had just gotten another ErgoDox EZ (eventually I did).


The latest Ergodox model now has USB-C.


I love my ErgoDox so much. If it broke, I would immediately purchase a new one. Cannot imagine trying to go without it.


Lead with the pain reducing angle. Tell her it's better to be addicted to keyboards than painkillers!


how's the moonlander btw? i'm thinking about getting one specifically for work.


Steep learning curve if you've never used a split/ortho board, but once you learn how to type again (a couple of weeks for me), it's great; my wrist pain is now gone! The only real downside is people asking why I have a pair of glowing oven gloves on my desk.


I'd like to add that switching to a drastically different keyboard is the perfect opportunity to simultaneously switch keyboard layouts, if that's something you've been considering. I switched from QWERTY to a layout based on Programmer Dvorak when I got my Moonlander. I'm really glad I did.


This was exactly my line of thought when I got the Moonlander (except I thought Colemak). It has been gathering dust on my shelf for almost 2 years now.


What kind of board do you use instead?


I mainly use my M1 MacBook's keyboard (also when connected to an external screen).

Before this I used a ThinkPad's keyboard.


Same, though it was painful. Took me ~6 months to go from 6-fingered pecking to full proficiency with split-keyboard Dvorak. 100% worth it, though.


I have one. Personally, I prefer the Kinesis Advantage2 when it comes to plain typing, because I find dipping my hands into a well of keys a little more comfortable than tenting them. The Moonlander has better software and aesthetics, and is more compact (which would be useful for someone else, but not me). The main reason I got it, though, is that it supports GeminiPR, which makes it suitable for stenography.

Ortholinear split takes a while to get used to but you really never want to go back to a regular keyboard after you get used to it - it makes you feel like you've been breaking your wrists.


Two years or so in, I'm hooked. I've already decided that if I return to office I'll need one for each location (not interested in carrying one back and forth).

I've used an "ergonomic" "split layout" keyboard for years. Specifically, I used the MSFT Natural 4000 since about 2010 due to RSI and typing-related pain. Even with that keyboard, I still had pinky/ring-finger related RSI from modifier keys (shift/ctrl/esc), but it was rather minimal if I was careful and kept my "off-hours" related typing minimal.

I picked up the Moonlander after looking at multiple truly-split keyboards including the Ergodox, Nyquist, Kinesis Advantage 2 (and later 360). I've used a Via/QMK keyboard as well and I will say that Oryx and the Configurator seem much more user-friendly and featureful.

The auto-shift feature is great, as I rarely need to use the shift key for normal typing. Having a toggle key to quickly disable it for things like gaming or the odd application that doesn't detect keydown/up for shifted keys as expected (or perhaps it's on the kbd firmware side and isn't sending a held-down shifted key, who knows) is rather useful.

Layers are your friend and don't be worried about printing out a layer map while you're learning. And don't be afraid to change your layout to try something.

I changed my layout multiple times a day for the first two or so weeks, maybe once a day for another two weeks, then perhaps once a week to once a month over the next few months. I think I last changed my layout 6 months ago to add a new shortcut for a feature added to my IDE.

If you're already accustomed to a split keyboard, expect minimal growing pains beyond learning your layer keys and such. But if it's your first time using a split KB, then you'll have that adjustment too. MSFT used to have a sculpted KB that wasn't a full split, and that was a good "training wheels" board for a split, but I don't think they make it anymore. Basically, if you don't already use a split, expect to reach for keys on the other side a bit until your brain adjusts.

I also suggest using background colors in your layers to indicate hotkeys and such. It started as a crutch for me, but I've come to really like it as it serves as a bit of a reminder for those less-used shortcuts.

The ability to hot-swap your key switches and caps is great. I ended up with a highly tactile switch that isn't much louder than the usual "cherry MX browns", but that is entirely optional. The out of the box keys are perfectly fine for those not interested in that feature.


Your wife micromanages your budget?


H: "Honey, I bought a new $370 keyboard."

W: "Another one?! You already have one for the house and one for the office... Where is this one going to go?"

Doesn't have to be budget micromanagement.


> W: "Another one?! You already have one for the house and one for the office... Where is this one going to go?"

You might be confusing “wife is interested in my hobbies” with “wife is not happy”. The former is what you described and the latter is literally what GGGP said.


Going over expenses together with your partner is important. If you don't have a partner, then you can spend money on whatever you want; you're only going to impact yourself.


Unless GP is close to destitute, bickering over spending $400 on their primary tool that presumably keeps them gainfully employed is the literal definition of micromanagement.

I do have a partner and she has autonomy over the money we earn together as do I, because we trust each other to make reasonable decisions.


You can't both have autonomy over the same resource.

It sounds like you and your partner have a mutual understanding of the other's spending, and your expenses are low enough and/or your income is high enough that it doesn't create problems when you spend below the implicit limits.

That's great for you, but it comes across as narrow-minded to talk as though a $400 expense is unworthy of discussion to every couple.


> unworthy of discussion to every couple.

“Unworthy of discussion” vs. “isn’t going to be happy” is a good way to move the goalposts.

I suppose I’m just tired of the juvenile meme of “my wallet is going to hurt” or “my wife [almost always wife, and never husband; what does that say?] is going to be upset” at <insert hobby>.

If that’s the level of trust one has in their life partner, one ought to take a closer look at what’s really important to oneself.


I have one of these fancy keyboards. It is not the "primary tool that keeps me gainfully employed".


This makes me wonder about newer terminal emulators on macOS like Warp[1], and whether they're, for example, taking all input locally and then sending it to the remote host in a single blob. I imagine doing so would possibly break any sort of raw-mode input being done on the remote host, but I'd also imagine that's a detectable situation in which you could switch to a raw keystroke feed as well.

[1]: https://warp.dev


In general once you’re connecting over SSH the connection itself is always in raw mode and then the remote host deals with its pty normally (which can be in line or raw mode). Terminals with special shell integrations usually need them installed on the remote host too (some have support that does that somewhat transparently though).

This is why mosh can have better behaviour than pure SSH over high latency connections. However this feature isn’t going to apply to mosh.
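That pty mode handling is observable from userspace. A small sketch using Python's stdlib `pty`/`termios` modules (POSIX-only): it allocates a pty pair, checks that canonical ("line") mode is the default, then switches to raw mode the way a full-screen remote program would:

```python
import pty
import termios
import tty

# Allocate a pty pair, as sshd does for a remote session.
master_fd, slave_fd = pty.openpty()

# attrs[3] is the local-modes flag word; ICANON means line ("cooked") mode.
attrs = termios.tcgetattr(slave_fd)
in_line_mode = bool(attrs[3] & termios.ICANON)

# A full-screen program on the remote side would switch its pty to raw mode:
tty.setraw(slave_fd)
attrs = termios.tcgetattr(slave_fd)
in_raw_mode = not (attrs[3] & termios.ICANON)

print(in_line_mode, in_raw_mode)  # True True
```

The SSH byte stream itself never changes; only the remote pty's termios flags decide whether input is delivered per line or per keystroke.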


I wonder if SSH can honor line-buffered mode. It should be able to detect it, but then if it incorrectly switches to line buffering then random stuff might deadlock.


It's really hard for me to imagine that an app that markets "AI for your terminal" is going to be "more secure and private" than some standard Unix tool.

Perhaps some very specific example of a security feature (such as protecting against timing attacks) could be protected against in a new tool, and not in the older more standard one. But it seems far more likely that many other security features would get forgotten in the newer tool, and by adding "AI" so many more attack vectors would be added.

It's honestly hard to even believe in the privacy claims of warp. Almost all NLP tools in today's age seem to fall towards cloud solutions, which almost immediately makes that likelihood of privacy close to nil.


If they’re designed to take in data at some baud rate, wouldn’t the blob feed in at that rate too?


I wonder how well (or if it's even possible) this pairs with hosted WASM runners (like Cloudflare Workers, Fastly's Compute@Edge, etc.).

It looks like there is also (limited?) support for tracing functions and allocations in WASM binaries you didn't compile. I'm looking forward to trying this out!


Yes, it should work in Cloudflare, you'd just need to use the JS adapter for whichever APM you use (Datadog, Honeycomb, Lightstep w/ more coming soon). The library provides import functions you pass into your module instance, and as long as the Wasm was instrumented before you deploy the Worker, you should have no issues. Please let us know if you hit any though!


One caveat: Datadog (specifically traces) probably won't work on the edge just yet. But if this is something you're interested in, kbknapp, feel free to post your questions or requests here: https://github.com/dylibso/observe-sdk/discussions

