> "We believe this results from factors that include the lack of Black faces in the algorithms' training data sets..." the researchers wrote in an op-ed for Scientific American.
> The research also demonstrated that Black people are overrepresented in databases of mugshots.
The sort of clear-headed thinking that makes the AI bias field as respected as it is.
The actual quote that the article's mention refers to:
"Using diverse training sets can help reduce bias in FRT performance. Algorithms learn to compare images by training with a set of photos. Disproportionate representation of white males in training images produces skewed algorithms because Black people are overrepresented in mugshot databases and other image repositories commonly used by law enforcement. Consequently AI is more likely to mark Black faces as criminal, leading to the targeting and arresting of innocent Black people."
So they're saying that simultaneously the training set has too few black faces and the set being compared against has too many.
> Consequently AI is more likely to mark Black faces as criminal, leading to the targeting and arresting of innocent Black people.
I don’t see how this relates to simple facial recognition. It doesn’t appear that they’re scanning for “criminal physiognomies” but for specific facial matches.
Furthermore, it seems that this whole line of argumentation implies that facial recognition software may be mistaking innocent Black people for non-Black perpetrators, which I don’t see any evidence for. How does this increase arrest rates for Black people if AI just can’t tell them apart? In all likelihood, the person who got away is also Black.
It doesn't imply that it's matching black people to white perpetrators. The claim is that A) the model itself is worse at matching for black faces and B) the database being searched against is often disproportionately made up of black faces.
Give it a photo of a black person to search on and you're probably getting a black person as a match, but the likelihood that it's actually the same person is lower than it would be if you were searching for a white person.
The quote doesn't say it's increasing arrest rates for black people, but arrest rates for innocent black people. If you use facial recognition and it's 99% accurate for white people and 75% accurate for black people (numbers chosen arbitrarily), you're going to target a lot more black people incorrectly even if you're never incorrectly matching photos of white criminals to black people.
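If it helps, here's a toy simulation of that arithmetic (a sketch: the 99%/75% figures are the arbitrary ones above, everything else is invented):

```python
import random

random.seed(0)

# Hypothetical per-search accuracy, reusing the arbitrary numbers above.
ACCURACY = {"white": 0.99, "black": 0.75}

def wrong_matches(group, n_searches=10_000):
    """Count searches that return the wrong person for this group."""
    return sum(random.random() > ACCURACY[group] for _ in range(n_searches))

for group in ("white", "black"):
    print(f"{group}: ~{wrong_matches(group)} wrong matches per 10,000 searches")
# white: ~100, black: ~2,500 -- same tool, very different innocent-target rates
```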
> It doesn't imply that it's matching black people to white perpetrators. The claim is that A) the model itself is worse at matching for black faces and B) the database being searched against is often disproportionately made up of black faces.
Right, I understand that in the context of this specific quote, but the article implies that claim.
> Give it a photo of a black person to search on and you're probably getting a black person as a match, but the likelihood that it's actually the same person is lower than it would be if you were searching for a white person.
Lower, but by how much? The sample given here is six cases in all. It feels very premature to use "probably" in that sentence. (Edit: misread that as "you're probably going to get a match")
> The quote doesn't say it's increasing arrest rates for black people, but arrest rates for innocent black people.
I meant this quote from the article: “facial recognition leads police departments to arrest Black people at disproportionately high rates.”
But I agree. It seems that there is a disparity in accuracy; it's very unclear how large it is, but so far it appears we're talking about a fraction of a percent. We only have a sample size of six to draw on. We don't know the demographics of the districts this has been deployed in, and it seems strange to assume they're the same as the American population at large. I mean, the first example is from Detroit.
The article posted to HN, in the section relevant to the start of this thread (the part about more/less black people in the data sets), quotes/paraphrases a Scientific American piece (which is where I got the quote with "innocent" in it in my comment), and that piece is itself based on a paper in Government Information Quarterly.
The paper is what the article here links to when they say that facial recognition leads to disproportionate arrests of black people, the part you're mentioning now. That's a finding of the paper separate from the statements about possible reasons why, which are based on the training and search sets.
The main thrust of the paper is actually those numbers: they find that black-white arrest disparity is higher in jurisdictions that use facial recognition.
"FRT deployment exerts opposite effects on the underlying race-specific arrest rates – a pattern observed across all arrest outcomes. LEAs using FRT had 55% (B = 1.55) significantly higher Black arrest rates and 22% lower White arrest rates (B = 0.78) than those not implementing this technology."
They do some stuff I'm not really qualified to opine on to try to control for the fact that obviously facial recognition adoption is also correlated to department size, budget, crime rate and things like that. Of course the usual caveats still apply, particularly that they're not claiming or attempting to show causation.
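For what it's worth, I read those quoted coefficients as multiplicative rate ratios, which is where the 55%/22% figures come from (my interpretation; the paper's exact model specification may differ):

```python
# Reading the quoted coefficients as multiplicative rate ratios
# (an interpretation; the paper's exact model specification may differ).
B_BLACK, B_WHITE = 1.55, 0.78
print(f"Black arrest rate: {B_BLACK - 1:+.0%} vs. non-FRT agencies")  # +55%
print(f"White arrest rate: {B_WHITE - 1:+.0%} vs. non-FRT agencies")  # -22%
```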
This doesn't rescue their claim. If the suggested class imbalance really exists in the training/test sets, the model will preferentially identify whites as criminals.
The claim is that the model is worse at telling black faces apart from each other.
The system is trained to match images of faces, not identify criminals; it's not comparing things to its training set to give a "criminality" score. The training data is just what has taught the system how to extract features to compare. You run an image of an unknown person against your database of known images, and look for a match so you can identify the unknown person.
If the model is just "worse at" black people, it's going to make more mistakes matching to them.
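A minimal sketch of that pipeline, assuming a generic embedding-plus-cosine-similarity design (the embeddings here are random stand-ins; in a real system they come from the CNN that the training set shaped):

```python
import numpy as np

rng = np.random.default_rng(0)

def normalize(v):
    return v / np.linalg.norm(v)

# Stand-in gallery of known faces. In a real system each vector comes from
# a learned feature extractor -- the training set's only role is shaping it.
gallery = {f"person_{i}": normalize(rng.standard_normal(128))
           for i in range(1_000)}

def identify(probe_embedding, threshold=0.6):
    """1:N search: rank every known identity by cosine similarity."""
    p = normalize(probe_embedding)
    best = max(gallery, key=lambda pid: p @ gallery[pid])
    score = float(p @ gallery[best])
    return (best, score) if score >= threshold else (None, score)

# A probe that's a noisy view of person_42. A feature extractor that is
# "worse at" some faces just produces noisier embeddings for them, which
# means more wrong best-matches -- there's no criminality score anywhere.
probe = gallery["person_42"] + 0.05 * rng.standard_normal(128)
print(identify(probe))  # ('person_42', ~0.87)
```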
When this software is being sold to these departments, it's amazing that people in the chain don't seem to be talking enough about the training set used or performance on certain populations. If you are going to arrest or build a case on facial recognition, you would think they would be prepared to defend its accuracy across a broad range of demographics. Embarrassing failures and mistaken arrests hurt their program, not to mention the money the city loses in lawsuits.
The answer to this conundrum might be that neither the departments nor the vendors are particularly interested in avoiding bias. Paying lip service is generally sufficient.
It makes sense to me? The algorithm specialises in distinguishing between the faces in its training set. It works by dimensionality reduction. If there aren't many black faces there, it may dedicate only a few of its dimensions to "distinguishing black face features".
Then if you give it a task that only contains black faces, most of the dimensions will go unused.
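Here's a toy version of that argument with PCA standing in for the learned dimensionality reduction (all numbers invented; real FRT features come from a CNN, not PCA, but the capacity problem is the same):

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)

# Group A faces vary along one set of directions, group B along another;
# the training set is 95% group A -- the imbalance is the whole point here.
dim = 50
basis_a = rng.standard_normal((10, dim))
basis_b = rng.standard_normal((10, dim))
group_a = rng.standard_normal((950, 10)) @ basis_a
group_b = rng.standard_normal((50, 10)) @ basis_b

pca = PCA(n_components=10).fit(np.vstack([group_a, group_b]))

def recon_error(x):
    """How much identity-distinguishing variation the 10 dims preserve."""
    return float(np.mean((x - pca.inverse_transform(pca.transform(x))) ** 2))

print("group A:", recon_error(group_a))  # low: dims spent on A's variation
print("group B:", recon_error(group_b))  # high: B's faces collapse together
```

Embeddings that collapse together are exactly what "worse at telling black faces apart from each other" means.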
Are black faces overrepresented or underrepresented? According to AI researchers, we're faced with Schrödinger's Mugshot--there are simultaneously too many and too few!
It's phrased accurately, if confusingly. The bigger and un-fixable problem is that people are more apt to believe that a computer has calculated the correct answer. By its very nature, popping often-bad images into a facial recognition search is almost always going to produce results, even if most are false matches and the real ID may not even be among them.
Without additional leads, police are strongly incentivized to pick one of the results and run with it. In many cases that gives you enough to get a plea or conviction even if the person didn't do it, especially if the person selected was in the database in the first place because they have a record.
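That "always produces results" property is structural, not a bug in any one product. A sketch, with a made-up `similarity` standing in for the real scorer:

```python
import random

random.seed(0)

def similarity(probe, candidate):
    """Stand-in scorer; a real system compares face embeddings."""
    return random.random()

gallery = [f"person_{i}" for i in range(10_000)]

def top_candidates(probe, k=5):
    """Ranked retrieval has no built-in 'nobody' answer: some k people
    always come back, whether or not the real person is in the gallery."""
    return sorted(gallery, key=lambda g: similarity(probe, g), reverse=True)[:k]

print(top_candidates("grainy CCTV frame"))  # always prints five names
```

Unless the system enforces a score threshold, and investigators actually respect it, somebody is always nominated.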
Convictions/pleas are obtained all the time with similar levels of proof.
This is fundamentally the same problem as dragnet searches of phone GPS to see who was in a space during a range of time. It could be a valuable investigative tool, but it's also a great way to "solve" a crime by finding someone to pin it on.
Because models are trained and validated on real data. Given a training set of crimes and corresponding surveillance footage, arrestee info is a (not noisy) label for “who is the guy in the movie.”
With a moment's thought, even the most emotive amongst us should see that the mugshots will be part of the training set--the photographed individuals are, after all, the class of true positives.
You train a model on a bunch of photos of white people, and a few photos of black people.
You then deploy that model, and use the model to match black person detained by racist officers against a database of photos that the police have from before. In that database the majority of people are black.
Shitty AI, which was not properly taught what black people look like because most of the people in the training data were white, says that it found a probable match for detained black person.
Racist officers do not attempt to second guess the computer, so they throw innocent black person into their car and drive off to the police station.
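A back-of-the-envelope model of how those two factors compound (every number below is invented for illustration):

```python
# Two compounding effects, both with made-up numbers:
PER_FACE_CONFUSION = {"white": 1e-5, "black": 1e-4}  # model worse on black faces
GALLERY_SIZE = {"white": 2_000, "black": 8_000}      # mugshot database skew

def p_false_match(group):
    """Chance an innocent probe matches *someone* in the gallery, assuming
    confusion is roughly within-group, so the two effects multiply."""
    n, p = GALLERY_SIZE[group], PER_FACE_CONFUSION[group]
    return 1 - (1 - p) ** n

for group in ("white", "black"):
    print(f"innocent {group} probe falsely matched: {p_false_match(group):.0%}")
# white: ~2%, black: ~55% -- each factor alone is modest; together they compound
```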
Come on, we know that there is variation, sometimes drastic, between populations on all different facets of life. But this one? No. It would be racist to even broach the subject. That’s why we know White people are to blame for it.