As someone who has already tried to replicate neural-net results (they're hard to get right), the fact that the authors answered so thoroughly is really positive.
Unfortunately it's internal.
All FAANG companies spend a lot of time and resources on hiring to make it fairer for everybody and to get signal faster. And nobody internally enjoys doing a lot of coding interviews - it's as frustrating for the interviewer as it is for the candidate.
If there were an easier way, these companies would definitely use it.
A candidate usually has 5-7 interviews (if they don't fail the phone screen), which means 5-7 engineers each spending at least an hour on the interview plus an hour on the feedback.
I conduct 2-5 interviews per week, which means I spend between 4 and 10 hours per week just on hiring.
If we could reduce this number we would definitely do it, as it's costing a loooot of money.
To be fair: there is some innovation and traction in this area, such that some candidates do a coding project at home and then present their solution (to reduce the number of interviews).
Someone who didn't get the job should sue, because if these findings are valid they'll have the data to prove it - and then the rest of us can look it up in the court records.
My boss asked HR about this, and they said that if we want to ask coding questions, we first need to give the coding interview and score the results - but not use that score in the hiring decision. Then, after two years, compare on-the-job performance against how people did in the interview.
> And nobody internally enjoys doing a lot of coding interviews
That's true. I recently had an interview where the interviewer was on their phone the whole time, totally uninterested, except for the first 5 minutes. How do you account for that?
Everybody knows (and the article states as much) that the system can be gamed by memorizing Leetcode solutions, and I choose not to do that. If you don't believe me, have a look here:
(there are a ton of interview experiences and detailed questions and strategies on this site)
There are people who are freaking memorizing behavioral questions!!
Surely, the FAANG companies are aware of this?
This is what I see happening these days:
1. You have an interview candidate solving questions without memorization
2. You have an interview candidate solving questions with memorization
How do you distinguish between the two, and how do you prevent bias? Bias in all its forms, not just the stereotype of a white guy hiring another white guy.
> The idea that Google has internal research indicating this wouldn't be surprising to me.
The problem with internal research on these matters is the nearly total lack of negative examples. Depending on the quality of the pre-on-site screening, the pool of candidates at that point might be so strong that random selection would do just as well (the toy simulation below makes this concrete).
A negative example would be someone who scored well in the interview process but then performed badly on the job (a bad performance rating).
Such examples get discussed internally by the hiring committee (the senior people who make the final hiring decision for every candidate).
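To make the negative-examples problem concrete, here's a toy simulation (all numbers invented for illustration - nothing to do with anyone's internal data): even when the interview genuinely predicts performance across all candidates, you only ever observe performance for the slice you hired, and that range restriction alone crushes the measurable correlation.

```python
# Toy range-restriction simulation: "interview" and "performance" are both
# noisy readings of the same underlying ability, so the interview works.
import numpy as np

rng = np.random.default_rng(7)
n = 100_000

ability = rng.normal(size=n)
interview = 0.6 * ability + 0.8 * rng.normal(size=n)
performance = 0.6 * ability + 0.8 * rng.normal(size=n)

# Across ALL candidates the interview is clearly predictive...
print(np.corrcoef(interview, performance)[0, 1])                # ~0.36

# ...but performance ratings exist only for the top ~5% you hired,
# and within that restricted range the correlation collapses.
hired = interview > np.quantile(interview, 0.95)
print(np.corrcoef(interview[hired], performance[hired])[0, 1])  # ~0.15
```

So an internal study finding that interview scores barely predict ratings among hires is exactly what you'd expect even from a process that works.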
I think the biggest issue with the current FAANG interview approach is that a lot of really good people get filtered out (false negatives) - but FAANG companies get so many applications that they are OK with this.
And yet, Google's previous public comments on the topic are somewhat the reverse: all their interview questions, except for questions about previous work/experience, were uncorrelated with "success".
Scare quotes used because knowing how to define success is an even hairier question that we (industry, humans) don't have a good grasp on either. So any criterion you use might be wrong, or only a subset of the right one (where "wrong" means you would change your conclusion if you saw the bigger picture, which none of us can).
I've also seen people go through 10 interviews and then get rejected :,(
8-15 years ago, FB/Google ran 20+ interviews - and reduced the number based on research showing that after 4 interviews they could predict the outcome really well.
They mention 4 in the article, but that actually means 4 on-site interviews. Before those, a candidate usually has to pass a simple recruiter phone screen and a simple coding phone screen.
> *citation needed
Yes. Citations would be useful. To me, it seems to make sense intuitively. I'd say that at the very least it is a fair/objective way of ranking candidates, particularly where the ratio of candidates to positions is high.
"So if anyone tries to convince you that Commons Clause is wrong because it doesn't meet all the requirements of the Open Source Definition, you should ask them if proprietary is better."
"Freedom for others to commercialize your software comes with starting an open source project, and while that freedom is important to uphold, growth and commercial pressures will inevitably force some projects to close."
> "you should ask them if proprietary is better [than Commons Clause]."
Neither is better, because they are the same. Commons Clause is proprietary.
If a piece of FOSS that I used became proprietary, I would fork (or use a fork) based on the last FOSS version. Just because the Commons Clause kinda looks like FOSS if you squint doesn't make it any different.
Well, the FSF defines “proprietary software” as “synonymous with nonfree software”.[1] But that's kind of a cop-out answer.
"Proprietary" means that the copyright holder retains certain rights, rather than granting the rights to the recipient. In the case of the Commons Clause, the rights that they retain propriety of are the rights to commercial use.
When someone releases their software under GPL, they still retain certain rights rather than granting them to the recipient - notably, the right to sell it commercially without providing the source, and without requiring the buyers to adhere to the terms of the GPL. This is clearly a valuable right, seeing how many companies have business models that are built around dual-licensing GPL'd code for commercial proprietary use.
Strictly speaking, the only license that doesn't have the copyright holder retaining any rights is the lack of one (i.e. releasing to public domain). If you're not releasing to PD, that's necessarily because you want to retain some rights. And then it's just a question of which ones. GPL has one answer, something like MIT has another, and Commons Clause has another still. I fail to see what makes some of them proprietary, while others are not.
From my perspective, if I can get the code, hack on it, and release the changed version to others who can also do all of these things in turn, that's enough to make it non-proprietary already. Proprietary is when the software is closed-source outright, or the source is provided for "educational use only" (i.e. no derived works allowed), or when derived works cannot be redistributed. Licenses that allow redistribution of patches, but not of the original code with patches applied, would be the grey territory.
> Strictly speaking, the only license that doesn't have the copyright holder retaining any rights is the lack of one (i.e. releasing to public domain).
This is a bit of a nitpick, but releasing without a license is pretty much the opposite of releasing to the public domain.
If you release without anything, the raw unmodified copyright laws apply, which are rather strict and give the recipient basically no rights - certainly no right to redistribution.
You have to make some kind of explicit statement if you want to release something into the public domain. That's why things like CC0 exist.
I'm not an expert by any means, but if the Commons Clause's FAQ is accurate, the copyright holder is only retaining a very specific right to commercial use, not all rights to commercial use:
"Commons Clause only forbids you from “selling” the Commons Clause software itself. You may develop on top of Commons Clause licensed software...and you may embed and redistribute Commons Clause software in a larger product, and you may distribute and even “sell” (which includes offering as a commercial SaaS service) your product...You just can’t sell a product that consists in substance of the Commons Clause software and does not add value."
I see how that restricts my rights as a user, but those are rights I simply don't care about. Back in the days of dialup, people were able to make a few bucks by selling Linux CDs to help other users get started. But what examples are there today of unmodified free software being sold by someone other than the primary developer, aside from various app store scams?
The difficulty is deciding what counts as added value. For some it may be a new interface to the software; for others, a promise of support or even some kind of insurance.
With this kind of limitation in the license, it is really hard to tell what the licensors' intentions are - what they will tolerate and what they won't allow.
There is really only one safe position: either all commercial activity is allowed, or none of it is.
Everything else is a legal minefield that is too dangerous for anyone to even touch.
Not at all. MS's Shared Source license didn't permit the creation of derived works at all, much less redistribution of them, for any purpose (commercial or otherwise).
Proprietary software can, and in many cases does, have its source code accessible, so yes, Commons Clause is as good as proprietary software, no more and no less.
In a sense you can always assume that the answer to these problems is just 'make the neural net bigger', but I find this deeply unhelpful for two reasons:
1. This clearly does not seem to be the way humans learn. Humans can learn from very few examples, in entirely unguided environments, and they don't face the same issues that existing algorithms suffer from (for example, humans have no big problem with rotational invariance, whereas ML vision algorithms do - see the sketch after this list).
2. It's essentially surrendering to the fact that we aren't able to understand how cognition works and to build higher-level representations as a result. The goal of AI research cannot just be to feed data blindly into enormous primitive structures; it must also be to get a grasp on what sort of complex structures are part of intelligent agents and how they interact.
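To make the rotational-invariance point in item 1 concrete, here's a minimal, self-contained sketch (plain NumPy/SciPy and a hand-written Sobel-style filter - it illustrates the underlying convolution math, not any particular network): shifting the input just shifts a convolution's feature map, but rotating the input does not simply rotate it.

```python
# Minimal sketch: convolution is translation-equivariant but not
# rotation-equivariant. NumPy/SciPy only; no trained model involved.
import numpy as np
from scipy.signal import correlate2d

rng = np.random.default_rng(0)
image = rng.random((32, 32))
kernel = np.array([[1, 0, -1],
                   [2, 0, -2],
                   [1, 0, -1]], dtype=float)  # Sobel-style vertical-edge filter

response = correlate2d(image, kernel, mode="same")
crop = (slice(8, -8), slice(8, -8))  # ignore boundary effects

# Shift the image and the feature map shifts with it: the filter still "fires".
shifted = np.roll(image, 5, axis=0)
print(np.allclose(correlate2d(shifted, kernel, mode="same")[crop],
                  np.roll(response, 5, axis=0)[crop]))   # True

# Rotate the image and the feature map does NOT simply rotate: a vertical-edge
# detector stops matching structure that is now horizontal.
rotated = np.rot90(image)
print(np.allclose(correlate2d(rotated, kernel, mode="same")[crop],
                  np.rot90(response)[crop]))             # False
```

That's why nets typically have to be fed rotated copies of the training data (augmentation) rather than getting rotation invariance from the architecture itself; humans need no such trick.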
> Humans can learn from very few examples, in entirely unguided environments, and they don't face the same issues that existing algorithms suffer from (for example, humans have no big problem with rotational invariance, whereas ML vision algorithms do).
That's because, contrary to the zeitgeist, humans are not a blank slate. Our brains are the result of billions of years of evolution. They are extremely well adapted to modelling the natural environment and the behaviour of other beings around us. This is in stark contrast to computers, which start from nothing and which we force-feed a huge amount of data without context, only to then expect results. The fact that this approach works at all for some tasks is staggering.
Not an expert, but as long as there is no way of knowing WHAT the algorithm is learning, it seems to me that it could never work reliably. It might look perfectly reasonable until you hit one of the triggers the algorithm used to segment the data.
Someone somewhere shared a story about using machine learning to spot the difference between US and Russian tanks. It apparently worked fine until field testing, where it failed miserably: what the algorithm had learned was the difference between high-quality photos of US tanks and poor-quality photos of Russian ones. True or not, this is exactly the kind of issue that will keep popping up.
Plenty of people are, as we speak, spending plenty of time figuring out how to mess with facial recognition by taking advantage of the same fundamental weakness.
Oh, yep! It's super simple to induce systematic errors like that. Take https://github.com/kevin28520/My-TensorFlow-tutorials/tree/m... . Lighten every dog by 20%. Darken every cat by 20%. Train. Take an image of a cat, lighten it 20%, and watch it get classified as a dog!
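Here's a toy, self-contained version of that trap (synthetic pixels and a bare-bones scikit-learn classifier standing in for the tutorial's CNN - none of this is the linked repo's actual code or data):

```python
# Bake a brightness bias into the training data and the "animal classifier"
# becomes a light meter. Synthetic data for illustration only.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(42)

def make_images(n, brightness_shift):
    """Random stand-in 'photos' in [0, 1], nudged lighter or darker."""
    return np.clip(rng.random((n, 32 * 32)) * (1 + brightness_shift), 0, 1)

dogs = make_images(500, +0.2)  # every dog photo lightened 20%
cats = make_images(500, -0.2)  # every cat photo darkened 20%

X = np.vstack([dogs, cats])
y = np.array([1] * 500 + [0] * 500)  # 1 = dog, 0 = cat

model = LogisticRegression(max_iter=1000).fit(X, y)

cat = make_images(1, -0.2)             # a normal (dark) cat
bright_cat = np.clip(cat * 1.5, 0, 1)  # the same cat, lightened

print("P(dog | cat)           =", model.predict_proba(cat)[0, 1])         # ~0
print("P(dog | lightened cat) =", model.predict_proba(bright_cat)[0, 1])  # ~1
```

The model never sees anything cat- or dog-shaped; mean brightness is simply the easiest separator available, which is exactly the shortcut a real net takes when the dataset bakes one in.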
For large corpora, it's impossible to know which features got selected. They probably aren't features a human would consider.