More

pocketsand · on Oct 12, 2024

For me, the fact he tried to compel the WPE CEO to work for him or else he would expose that she was in negotiations with him is the most unhinged thing I’ve ever heard in a hiring process. Quite literally an affront to freedom.

My most charitable guess at what is going on is severe mental illness.

kemayo · on Oct 12, 2024

It did all seem to blow up around when he came back from a sabbatical, presumably taken because he really needed a break. Though I guess that he also had a bit of a PR disaster with Tumblr during that sabbatical, so...

tptacek · on Oct 12, 2024

"Quite literally an affront to freedom"? Really? Say more.

pocketsand · on Oct 12, 2024

Work for me or I will publicly shame you, try to damage your reputation, and make your employees potentially question your loyalty.

Threatening people if they don’t take a job with you is in the most literal way a challenge to your autonomy as a human being.

tptacek · on Oct 12, 2024

I think a lot of this whole saga is easier to understand if you just remember that WordPress is a major open source project and a company, so you get to see a lot of the stuff every company --- almost none of whom run major open source projects --- does in the ordinary course of business.

Anyways, if you thought there was any freedom or purity to the tech job market, bad news for you.

FireBeyond · on Oct 12, 2024

> Anyways, if you thought there was any freedom or purity to the tech job market, bad news for you.

Ahh, "everyone else does it" - no, have never been threatened that if I refuse a job offer, or otherwise decline, that my current employer will be told all about how disloyal I was to them.

tptacek · on Oct 12, 2024

Hey, to be clear, I'm not sticking up for it, I'm just saying: commentary about this whole saga sometimes feels like people have lost some perspective, and are holding WordPress to a standard that pure commercial companies aren't held to. Hiring is ruthless! Executive hiring in particular!

appendix-rock · on Oct 12, 2024

Americans relating everything back to “freedom” is so jarring for literally everyone else.

pocketsand · on Sept 2, 2024

Having an ad partner that is doing bad things on its own direction is different from explicitly paying a company to do a specific bad thing.

Far be it from me to defend Meta on privacy —- I won’t —- but we should at least characterize their numerous evils honestly.

stoperaticless · on Sept 7, 2024

> is different

True.

Good thing (glass half full) about clicky-baity version, it forces facebook to take action, because it directly associates their brand with the problem.

pocketsand · on Aug 15, 2024

For what it’s worth, in most cases, an organization that overly focuses on process is superior to one with a naive individualism that throws employees under the bus by default.

pocketsand · on Aug 1, 2024

If you took the "no it won't" side of every argument about "how in X number of years, AI is sure to Y", you'd be way ahead.

In any event, raw parameter/weight count to me seems like a very primitive way to judge "complexity" in comparison to the human brain. Looked at most ways, our brains are for more efficient at doing the incredible things they do than LLMs. Consider how little language young children are exposed to in comparison to LLMs given their abilities to figure out how to produce language.

If the brain doesn't work like an LLM, you can expand the size and "complexity" of these models to the moon and they won't outperform the brain. Current models can write impressively well, but they can barely do math. It's clear they don't reason as we do.

pocketsand · on July 24, 2024

You would think so, but people like Sam Altman have suggested that they can use AI-generated data to train their own models. See here:

https://www.nytimes.com/2024/04/06/technology/tech-giants-ha...

lawn · on July 25, 2024

At no point should you trust anything Sam Altman says.

sailingparrot · on July 24, 2024

Training on ai-generated data isn't a problem, and has been routinely done by everyone for 18 mo +.

The issue is training on 'indiscriminate' ai-generated data. This just leads to more and more degenerate results. No one is doing this however, there is always some kind of filtering to select which generated data to use for training. So the finding of that paper are entirely not surprising, and frankly, intuitive and already well known.

pocketsand · on July 19, 2024

I would disagree. I work in healthcare and we’ve always used SQL Server. While I wouldn’t pick it, it’s been reliable and integrates with auth.

No one “loves” Teams, but honestly it serves its purpose for us at no cost.

No one loves OneDrive but it works.

I think people underestimate how much work it would take to integrate services, train people, and meet compliance requirements when using a handful of the best in class products instead of MS Suite.

mbreese · on July 19, 2024

People use Teams and OneDrive because it’s “Free” when you use Office. IMO, that’s a bit of an anti-trust problem. Both have good competitors (arguably better competitors) that are getting squeezed because of the monopoly pricing with Office.

But with SQL Server, on the other hand, I think you are right. It is a good piece of software. But it also has high quality competition from multiple vendors. Some of it enterprise (Oracle, DB2), some of it FOSS (Postgres, MySQL). Because of this, it has to be better quality to survive… they couldn’t bundle it to get market share, it actually had to compete.

josephd79 · on July 19, 2024

Word, no one uses teams because its great. The only reason it's used is because it's bundled with $M365.

throwaway3306a · on July 19, 2024

People use Teams because it's well integrated into Office, 365, Entra and other MS products, they would (and recently do) pay for it. It has functionalities that no other alternative has, e.g. it can act as a full call centre solution through a SIP gateway.

digging · on July 19, 2024

"Well integrated" is honestly a stretch, but it is fair to say it's integrated with no extra setup.

throwaway3306a · on July 19, 2024

How to manage Slack access control via Azure AD groups? Even the most basic integrations are missing in other options...

ta1243 · on July 19, 2024

> No one “loves” Teams, but honestly it serves its purpose for us at no cost.

Of course there's a cost, its just hidden and you are forced to pay it. Microsoft used its monopoly position to move into a new market.

pocketsand · on July 19, 2024

Yeah, sure. But the marginal cost is zero, whereas a Slack subscription for every person in our org will cost about 1 million dollars a year. And it doesn’t integrate as well with every other piece of functional but mediocre software.

The person approving the $1 million dollar budget item doesn’t really care that Teams isn’t “free” in the sense that there is no free lunch, and while they perhaps have moral qualms of antitrust, that’s outside their purview. We’re locked into Office suite and right now there is no extra charge for Teams.

ta1243 · on July 19, 2024

Which is why the legal process is simply too slow for big tech

Microsoft did a massively illegal thing (again) and got away with it

Time to hold companies responsible for their suppliers.

pocketsand · on July 8, 2024

My experience with DataSpell has not been great. Granted, my workflow leans toward R, and it DataSpell has a Python-first approach, but the app was basically completely broken to even load R, and StackOverflow was full of relatively old posts of people with the same problem. If they really cared about that app that would never happen.

I just do a lot of my R editing in PyCharm now and flip between terminals and RStudio. I was hoping DataSpell could unify that, but it's not ready.

The new RStudio IDE is promising, however.

pocketsand · on June 23, 2024

I do stats and data viz for a living and the article seemed perfectly reasonable to me.

He isn’t dogmatic.

He makes reasonable arguments.

I’m confused by these hopelessly uncharitable readings of the article.

pocketsand · on June 23, 2024

Why? They’re non-parametric and make zero assumptions of normality.

blueflow · on June 23, 2024

How else would you calculate the quartiles to render the boxes?

munch117 · on June 23, 2024

Count data points in each quartile. You can do that for any sortable data, independent of distribution.

blueflow · on June 23, 2024

On second thought, this method makes the outer brackets / whiskers pretty much useless since their position is determined by the largest outliers, which is quite much random.

Falkon1313 · on June 24, 2024

That's not how they're drawn. Outliers (More than 1.5 times the interquartile range outside the 1st/3rd quartile) are plotted as dots beyond the whiskers. The whiskers go at Q1-1.5×IQR and Q3+1.5×IQR.

blueflow · on June 24, 2024

Better is! Look what i was replying to.

blueflow · on June 23, 2024

If you do that in your paper, you better write next to the graph that you did that.

munch117 · on June 23, 2024

Perhaps I expressed myself poorly, and left room for misunderstanding, because I cannot possible imagine that we have any real disagreement on how to compute quartiles.

Any set of numbers I give you, you can compute quartiles for it. There is no algorithm for doing that that breaks down if the numbers don't follow a normal distribution.

blueflow · on June 23, 2024

Look at this SVG from wikipedia: https://upload.wikimedia.org/wikipedia/commons/1/1a/Boxplot_...

When you calculate the box plot using normal distribution parameters, the outliers are outside the outer bracket.

If you split the dataset into 4 equal parts, the bracket will be larger because the outliers are still inside it.

The methodologies are not equal.

This thread is the first time i heard people do the "split dataset into 4 quarters" and using that for box plots.

ColFrancis · on June 23, 2024

For what it's worth, you've convinced me that my beloved box plots need to be explained if I want to use them again.

The SVG you've provided clearly shows that the box plot splits the data in 4. The interquartile range (IQR) is clearly marked and it even has a comparison for what the standard deviation (variance) measure would be.

Secondly, if the data truly came from a normal distribution, there are no outliers. Outliers are data points which cannot be explained by the model and need to be removed. Unless you have a good reason to exclude the data points they should be included. This is why I like the IQR and the median, they are not swayed by a few wide valued data points. The 1.5*IQR rejection filter I think is lazy and unjustified. Happy to discuss this point further as it is a bug bear of mine.

blueflow · on June 24, 2024

When i said "splitting", i meant it like my parent explained: Basically sorting your datasets and then splitting into quarters.

What you want to explain to me (IMHO to the wrong person) is the correct approach of calculating a mean and standard deviation and drawing the box from that. Lets stay with that (and thats what i said earlier in the thread)

After i wrote the post you replied to, i realized that the pure "splitting" method for box plots is nonsensical since the outer brackets interval is determined by the two most extreme values. They are too random to be meaningful. It does not make sense to draw a box plot from that.

ColFrancis · on June 25, 2024

The quartiles are defined by doing the sorting and splitting algorithm. So if you want quartiles (or any other quantile generally) you need to calculate it that way. The mean and standard deviation (sigma) are fundamentally different, which is why the image you linked shows them to contrast against the quantiles.

If you want to represent the standard deviation with your box plot, you can calculate it using standard formulas, many maths libraries have them built in. I don't know how to plot it using any graphing package though. ggplot, plotly and matlab all use the quantiles (the ones I have experience with). Perhaps where ever you learned to read them as mean and standard devation has a reference you could use?

> They are too random to be meaningful. It does not make sense to draw a box plot from that.

This can be a problem. In practice, the distributions I see don't go too crazy and are bounded (production rates can't be negative and can't be infinite). I prefer to use the 10th and 90th percentiles which are well defined and better behaved for most distributions. I do make sure it's very clearly marked on each plot though as it's not standard. Using the 1.5 x IQR cutoff is no better though as when you have enough samples you find that the whiskers just travel out to the cutoff.

pocketsand · on June 23, 2024

As I'm sure you know, there are a lot of variations on how quantiles are calculated in various software. The 25th percentile, e.g., doesn't always line up with a value in the dataset, so sometimes nearest rank methods are used, otherwise a linearly interpolated data point, where interpolation is done in various ways.

In any event, none of these methods assume normality, or rely on CDFs of a normal curve.

If they did, every box plot would be symmetric.

The fact some people think that boxplots are constructed in such a way is a pretty good reason to take the author's article seriously as for how boxplots are confusing.

ColFrancis · on June 25, 2024

As a first pass definition it does well to explain the concept. Even if you're interpolating you will need to rank the samples and find the two nearest neighbours to interpolate between.

It serves to distance it from the moment-based statistics like mean and variance at least.

thaumasiotes · on June 23, 2024

Arguing that nobody who might be professionally expected to look at a box plot can be reasonably expected to understand how box plots are defined doesn't make a compelling case that using them is a good idea.

A4ET8a8uTh0 · on June 23, 2024

It is actually a fascinating argument that shows how little of what is being decided is based on actual data ( or at least our understanding of it ), but rather that data visualization is being used to push already pre-approved decisions with data being used merely as a 'for' argument.

I agree that if there is an indication that if most professionals don't really know what boxplot is supposed communicate, maybe it should not be used.

blueflow · on June 23, 2024

If the method how the plot boxes are calculated is not clear (this thread references at least two different methods), you'll need to explicitly write it down which methods you did use.

thaumasiotes · on June 24, 2024

> this thread references at least two different methods

No, as the sidethread comment notes, there is only one way you can compute quartiles. You seem to be arguing that the correct thing to do is to impute them, and that calculating them is such a deviant practice that it would need to be specially remarked on.

blueflow · on June 24, 2024

Isn't this what i was saying from the beginning?

  Box plots are made for visualizing generalized normal distributions and nothing else.

And now people in this thread argue you can calculate them from something else. Not sure if you are replying to the right post.

thaumasiotes · on June 24, 2024

That might be what you were saying from the beginning, but the only thing that that would establish is that you're completely out of touch with reality. Box plots are made for visualizing quartiles.

Your theory would imply, among other things, that the median line going through the box part of a box plot always divides it in half, which obviously is not the case.

blueflow · on June 24, 2024

No? Exponential Gaussian?

Whatever you do, you should explain first what you do that your whiskers stay meaningful and are not just whatever randomness your outliers produced.

pocketsand · on June 11, 2024

He paid 54bn to have the run of a web site where he creates his own reality, surrounded by sycophants.

It cost a third of that for Boston to do the Big Dig.

To each his own.