Garbage In/Garbage Out in criminal justice AI systems

I just came across an article about the American criminal justice system that should be very disturbing. With over 2 million Americans in prison or jail (and 7 million when parolees and probationers are included), American criminal justice is necessarily run as a cheap, slipshod operation. Many people accused of serious crimes barely get a chance to talk to a public defender. And, to economize further, AI (“artificial intelligence”) is increasingly used to
• tell police forces where to target patrols;
• advise judges on bail and sentencing; and
• help parole boards make parole decisions.
These AI systems are proprietary, designed by software houses that protect the algorithms as secret intellectual property. Thus judges and parole boards may not even be told what factors produced the scores presented to them.

All in the name of efficiency. Just as licorice candy is extruded at high speed by cleverly designed machines so that children can get tooth decay at modest cost, so defendants and convicts are extruded through the justice system with little need for expensive humans to verify if justice is actually being served.

But anyone familiar with “Artificial Intelligence” knows that if there’s anything AI systems are NOT, it is intelligent. They use pattern matching. “In our training set, the defendants who matched the target defendant received an average prison sentence of 7.4 years, so we recommend a sentence of 5 to 10 years.” Does everyone see the problem with that? The AI is designed to mimic past mistakes. Blacks received disproportionately high sentences in the past? OK, let’s keep it up!

I think this approach is wrong.

AI systems like these, if used properly - I can’t speak for specific implementations - will by definition give you the policy with the best chance of maximizing the parameters you care about.

For example, let’s suppose you are deciding how much bail to require someone to pay. That is, for differing bail amounts, what is the probability that the accused returns to court?

AI systems can be fed data on the past, infer the most likely relationship between bail amount and chance of returning to court, and give you the optimal policy. Humans decide how much they care about suspects skipping bail versus some quantified value of keeping suspects able to return to work and not become destitute.
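To make that concrete, here’s a minimal sketch of the kind of optimization being described. Everything in it is hypothetical: the fitted curve p_return, the dollar amounts, and the two cost weights are invented for illustration, not taken from any real system.

```python
import numpy as np

# Hypothetical model "learned" from past cases: probability the accused
# returns to court as a function of bail amount (illustrative numbers only).
def p_return(bail):
    return 1.0 / (1.0 + np.exp(-(0.0005 * bail - 1.0)))

# Human-chosen value function: how bad a no-show is versus the hardship of
# bail the accused can barely pay. These weights are policy choices.
COST_OF_NO_SHOW = 10_000       # assumed harm units for skipping court
COST_PER_DOLLAR_OF_BAIL = 0.5  # assumed harm units per dollar of bail set

def expected_cost(bail):
    return (1 - p_return(bail)) * COST_OF_NO_SHOW + COST_PER_DOLLAR_OF_BAIL * bail

candidates = np.arange(0, 20_001, 500)
best = min(candidates, key=expected_cost)
print(f"recommended bail: ${best}, P(return) = {p_return(best):.2f}")
```

The model only supplies p_return(); the two weights are the human part, and changing them changes the “optimal” bail.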

Humans generally can’t do any better, given good choices for the value function, clean data, etc.

I think we need more systems like these. We definitely need to look at sentencing and choose punishments more rationally. If 5 years’ imprisonment for a major crime has the same deterrent effect as 20, then 5 years is a better choice (because it leaves more remaining lifespan for the felon to repay society, and it also saves 15 years of incarceration costs).

Similarly, if it turns out that say, spouse murderers who killed the spouse in the heat of passion rarely reoffend, maybe a policy of probation makes sense.

It depends on what the data tell us - the point is, human beings tend to decide emotionally that, because they have read articles about criminals reoffending, it must happen very often (whether or not it actually does), and therefore all prison sentences should be extremely long. And murderers should generally always get life because they took someone else’s life away. These sorts of policies may feel good emotionally, but they may not be *optimal for meeting a justice system’s cost-effectiveness goals.

*I know that sounds mercenary, but consider this. Say you, as a human, really hate major crimes being committed - murders and rapes and armed robberies. And you have a fixed budget in dollars. Lesser crimes you also care about, but perhaps ten times less or more. Shouldn’t you choose policies - prison sentence lengths, how many police, parole policies, whether prisons are punitive or rehabilitative - that minimize a variable you might call TOTAL_CRIME, the weighted sum of all the crimes committed?
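A toy version of that objective, with invented weights and invented predicted counts, just to show that TOTAL_CRIME is nothing more than a weighted sum being minimized:

```python
# Hypothetical severity weights and predicted annual offence counts for two
# equal-cost policy packages; none of these numbers come from real data.
CRIME_WEIGHTS = {"murder": 100, "robbery": 10, "petty_theft": 1}

policy_a = {"murder": 40, "robbery": 900, "petty_theft": 12_000}
policy_b = {"murder": 35, "robbery": 1_000, "petty_theft": 14_000}

def total_crime(predicted_counts):
    """Weighted sum of predicted offences -- the quantity to minimize."""
    return sum(CRIME_WEIGHTS[c] * n for c, n in predicted_counts.items())

print("A:", total_crime(policy_a))  # 40*100 + 900*10 + 12000*1 = 25000
print("B:", total_crime(policy_b))  # 35*100 + 1000*10 + 14000*1 = 27500
# Under these made-up numbers policy A minimizes TOTAL_CRIME, even though
# policy B has fewer murders. The weights decide which trade-off wins.
```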

Wouldn’t any suboptimal policy cause more of your citizens to be victims of crimes?

Also relevant to this topic is Weapons of Math Destruction: How Big Data Increases Inequality and Threatens Democracy by Cathy O’Neil.
https://weaponsofmathdestructionbook.com

Interview: Cathy O'Neil on Weapons of Math Destruction - Econlib

I am not going to say it’s always the case, but my agency is not using these just for efficiency. They’re also being used for consistency between offenders (two similar offenders should be treated similarly) and between officers (an offender’s supervision shouldn’t change drastically simply because she’s transferred from one officer to another). Those expensive humans are not so good at that.

But they’re also being used to target resources - some people are at higher risk of violating the terms of their supervision than others, so resources should be concentrated on them. A drug dealer is more likely to violate than people convicted of certain types of homicides; the risk goes down as time passes, so someone who has been out for a year is lower risk than someone who has been out for a month; and so on. I find that humans are not so good at that, either.

That’s kind of the problem. The AI isn’t necessarily consistent between offenders. If the dataset it was trained on includes disparate treatment for different sorts of people, the algorithm will perpetuate that.

That’s one of the reasons that some people (me included) object to these kinds of things. Another reason is that if the details of the algorithm and the source code are secret, it fundamentally compromises the ability of the defendant to challenge the prosecution’s case.

I’m talking about consistency between similar offenders - I will not claim to know all the inner workings of the risk assessment tools I use, but I do know that if two different offenders have the same answers, they will be given the same risk level. That was not a given before we used these tools. I know there are no questions regarding race or ethnicity or income, and I know that the instruments are not valid for certain groups, such as offenders under the age of 18.

I’m not sure what you are referring to when you say “it fundamentally compromises the ability of the defendant to challenge the prosecution’s case,” since I am unaware that any of these instruments are used in prosecuting cases. They are essentially a high-tech and more detailed version of the old-fashioned grid or point systems that were previously used to determine bail, sentencing, or how parolees or probationers were to be supervised.

When I first started working at my agency, everything was determined by time: once someone was on supervision for X number of months, supervision became less intense. That probably worked fine at one end of the spectrum, but it often resulted in over-supervision at the other end. The articles talk about “sending people to jail,” but they don’t acknowledge that risk assessment also “lets people out of jail” - people that parole boards using their judgment wouldn’t have released - and sometimes those parole boards don’t release people for reasons having nothing to do with the chances that they will reoffend.

What none of these articles shows, as far as I could see, is whether the AI programs provide better or worse decisions than a human in the same situation. Most of your noted drawbacks to AI are also drawbacks with human decision makers. Do you really think that humans, consciously or unconsciously, don’t allow race or socioeconomic background (or vengeance, or “making an example of”, etc.) to influence decisions? Do you think that human decision making is not something of a “black box”? Humans can’t tell you every piece of data used to make a decision, either. We all have ingrained biases and previous experiences that affect our decisions. These have led to many documented cases of poor decision making - sentences too long, too short, let out on parole too early, etc.

The only fact that has been submitted so far is that AI is a different way of doing things.

Keep in mind that all an AI can do is predict, under its model, the probability distribution of outcomes for each policy, and then choose the policy (called the optimal policy) whose weighted score over that distribution is best according to your metric.

For example, if you consider every felony re-offence -3 and every misdemeanor -1, while every month of jail is -0.1, you then weight those outcomes by their predicted probabilities for each candidate length of jail and choose the length with the highest (least negative) expected score.
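Here’s that calculation spelled out as a sketch, with made-up re-offence probabilities for each candidate sentence length (the weights are the ones above; everything else is invented):

```python
# Weights from the example: felony re-offence -3, misdemeanor -1, month of jail -0.1.
W_FELONY, W_MISD, W_MONTH = -3.0, -1.0, -0.1

# Hypothetical model output: (P(felony re-offence), P(misdemeanor)) within some
# fixed follow-up window, for each candidate sentence length in months.
predictions = {
    6:  (0.30, 0.50),
    12: (0.22, 0.45),
    24: (0.15, 0.40),
    48: (0.12, 0.38),
}

def expected_score(months):
    p_felony, p_misd = predictions[months]
    return W_FELONY * p_felony + W_MISD * p_misd + W_MONTH * months

for months in predictions:
    print(months, round(expected_score(months), 2))
# 6 months scores -2.0, 12 months -2.31, 24 months -3.25, 48 months -5.54,
# so under these invented probabilities the shortest sentence wins; different
# probabilities or weights would flip the answer.
```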

I mean, I’d want to use a more fine-grained model; I think a drug dealer getting caught dealing more drugs is not nearly as bad as a rapist raping someone else or a murderer killing someone else. So I’d categorize each offense into smaller buckets with differing weights.

Anyway, even if you do the math right, you will still have bad outcomes - both paying for unnecessary extra months of jail and crimes that get committed anyway. Optimal policies do not guarantee success in such an unpredictable environment.

But, done right, the AI system would be fair. These systems do have a serious bug - one that also affects human decision makers - called “overfitting”. Say the system knows the offender’s name and figures out that offenders whose names start with an “R” are more likely to re-offend. If, in reality, over all the criminals in the entire United States, “name starts with an R” really were a reliable predictor, then it would be rational to give them a longer sentence. But, usually, correlations with variables that have no plausible mechanism of linkage are just chance variation.

Fighting overfitting is doable but it’s a complex problem and this is one reason why the companies selling this software should be required to show their work.
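A tiny simulation of that failure mode, under a completely invented setup: in this toy world “name starts with R” has no real effect on re-offending, but in a small training sample chance alone can make it look predictive, and evaluating on held-out data is one standard way to catch that.

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate(n):
    # In this synthetic world the "R" feature is independent of re-offending.
    starts_with_r = rng.random(n) < 0.08
    reoffended = rng.random(n) < 0.30
    return starts_with_r, reoffended

def rate_gap(starts_with_r, reoffended):
    # Re-offence rate among "R" names minus the rate among everyone else.
    return reoffended[starts_with_r].mean() - reoffended[~starts_with_r].mean()

# Small training sample: the gap can be noticeably non-zero purely by chance...
print("training gap:", round(rate_gap(*simulate(300)), 3))
# ...while on a large held-out sample it shrinks toward the true value of zero.
print("held-out gap:", round(rate_gap(*simulate(100_000)), 3))
```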

The goal with most of these AI assessments is to take the prejudiced human equation out of the system as much as possible and to reduce recidivism. In a law and order culture like the US with a “lock 'em up and throw away the key” mentality, these systems are actually very progressive. They reward treatment and encourage rehabilitation.

The ORAS seems to be catching on. I know the Parole Board in Alabama implemented it, and the Courts and state prison system are in the process of implementing it as well. It wasn’t designed by a corporation, but by researchers at the University of Cincinnati.

The goal is to reduce recidivism, and my impression is that the results have been positive, though I haven’t looked up the data for myself.

However, it’s far from foolproof, as this story attests. This guy was released under ORAS.

I think this is one of the potential advantages of an AI system; it will be objective. A computer will only give outputs based on the data it’s programmed to evaluate. It’s not going to introduce any subconscious prejudices into its conclusions. It’s not going to treat black people differently than white people or men differently than women.

Now I’ll grant that you could program an AI system to be prejudiced. But it would have to be done deliberately; a computer is not going to develop prejudices on its own. And I feel that most prejudices would be screened out by the programming. Most prejudices are subconscious; very few people believe that they act based on them. So when people are programming in standards, they would use the impartial standards they believe they have rather than the biased standards they actually use in life. The AI system would then apply those better standards in the real world. A computer would mimic what we aspire to be rather than what we actually are.

The law is distinct from justice.

It seems absolutely fine to me to use systems to monitor post facto to help consistency, prevent corruption, and so on, and for the data to be used in training sessions and the like, but live use is another matter entirely. I don’t want our society to become Mega-City One, where perps are cubed simply because they are likely to be criminals.

Justice is a human matter and should remain in the control of humans.

I think this is the point septimus is trying to make. There seems to be no oversight of the programming criteria. Just because a program is objective about HOW it judges the criteria doesn’t mean that the criteria being used to judge is objective. And as Doctor Jackson asks, what do we know about how these programs are doing so far? And I’ll ask: is it right to subject vulnerable people to the “shaking out” process?

Just because something has the potential to be a highly useful tool doesn’t mean it currently is!

mc

The inputs are deliberate. Prejudice may not be, but can still happen with the right (or wrong) inputs. For instance, if poor people are more likely to have certain police interactions, say due to living in a more heavily policed area or being more likely to be stopped, and those interactions count against you, that’s a potential problem. Or if the questions are proxies for race or class.

Per the interview I linked to above:

All of this should be avoidable, but some of the questions are already iffy, and we often can’t examine the algorithms.

I had a long response written and lost it, but it was similar to this later quote from the interview

I don’t think anyone thinks these can’t be improved at all - but as someone who uses one of these instruments I can say:
1. The one I use doesn’t have any questions regarding race, or social class, or zip code, or family members having police contact.
2. The instrument is known not to be valid for, and is not used with, certain populations, such as offenders under age 18.
3. In my experience, the use of this instrument has negated the tendency of many officers to consider every offender at high risk of violating. For example, for 25 years my agency has had a policy that low-risk offenders should not be subject to a curfew. Nearly everyone had one anyway. Now the policy is that offenders given a certain level by the tool are not to have curfews, and although the tool can be overridden, “gut feelings” are not sufficient reason for an override.

If, as you say above, you don’t understand how it works, you can’t say whether it has any consistent biases.

The problem with AI algorithms of the sort these programs use is that they take on the biases of the people who develop them. If they are trained on a dataset with built-in biases, the algorithm will have built-in biases.

It would not have to be deliberate. Imagine that unconsciously racist probation officers are stricter on Black people on parole, inflating the stats on how often Black people break parole. This data is given to the algorithm, and the automated system now is accidentally less likely to give Black people parole, since the data shows that they break parole more often. So, you realize that and don’t give the computer access to race data, but even then it is likely to find a proxy for race, like zip code, that correlates well with breaking parole.
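A toy simulation of that proxy effect, with entirely synthetic data (the group names, zip codes, and rates are all invented): race is never given to the “model”, but because zip code correlates with group membership and the recorded violations reflect uneven enforcement rather than different behaviour, the predictions still split along group lines.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100_000

# Synthetic world: zip code correlates with group, and the *recorded* violation
# label reflects heavier enforcement of group B, not different behaviour.
group = rng.choice(["A", "B"], size=n)
zip_code = np.where(group == "A",
                    rng.choice([10001, 10002], size=n, p=[0.8, 0.2]),
                    rng.choice([10001, 10002], size=n, p=[0.2, 0.8]))
true_violation = rng.random(n) < 0.20                  # same underlying rate
caught = np.where(group == "B",
                  rng.random(n) < 0.9,                 # group B watched harder
                  rng.random(n) < 0.5)
recorded = true_violation & caught

# "Model": predicted risk is just the recorded violation rate for the zip code.
risk_by_zip = {z: recorded[zip_code == z].mean() for z in (10001, 10002)}
predicted = np.array([risk_by_zip[z] for z in zip_code])

for g in ("A", "B"):
    print(g, round(float(predicted[group == g].mean()), 3))
# Group was never an input, yet average predicted risk differs by group,
# because zip code acts as a proxy and the labels encode biased enforcement.
```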

I’m not saying that they don’t have any consistent bias. I’m saying that in my experience they are less biased than the alternative, because those judges/parole boards and parole/probation officers also have biases and have direct knowledge of factors (like neighborhood, zip code, socioeconomic class) that the assessment tools don’t. Is it your position that people are not biased, or that we can somehow know exactly how a judge decides to sentence a defendant to the minimum sentence rather than the maximum?

It might - or, like the instrument I use, it could ask a question that almost perfectly correlates with violating parole. That question is, of course, “how many times has this person violated parole?”

I get the feeling that people believe most of these assessment tools just go through the government entity’s computer system and pull out data, without anyone except the software house knowing what data is being taken out. I certainly can’t say that none are like that - but I do know that none of the ones I’ve used or even read about are.* They all involve someone answering a series of questions. Lots of people know what the questions are, and it’s not a matter of the program “finding” a proxy for race, like zip code. People would have had to actively decide to include that question.

* It couldn’t be like that at my agency, because all our data is kept in something circa 1985, greenscreen and all.

You should perhaps re-examine your goals. Let’s suppose there is a clear correlation between class and recidivism. I mean, I don’t assume there is, but I would believe it if the data showed that, because obviously richer people have more ways to stay out of trouble.

But if you’re the system designer, what do you care about? Ethically, anything other than “to minimize the number of victims of crimes, weighted more heavily towards more serious crimes, with the budget I have” is a policy that leads to more crime victims.

Sure, it may be *unfair* to jail people of a lower class longer, and it may also be unfair because, due to a long history of discrimination, some minorities have a much higher probability of being in the lowest class.

But remember, any policy of trying to be “more fair” by definition means you trade off for less effectiveness in preventing other citizens from being a victim of a crime.

What matters more to you? Personally I see only one valid choice, but if you can argue another way and get your peers to agree, it’s possible to set the optimizing criteria on an AI system your way.

Of course, in my not-so-hypothetical where parole officers are stricter on Black parolees, this question will disadvantage Black parole seekers.

The problem is that if race and getting caught breaking parole are very correlated, then any attempt to predict one will naturally also kind of predict the other. I imagine that at the very least, you have to tell the machine what crime the inmate is in for. To use a possibly outdated example, the algorithm will probably predict a higher chance of parole breaking from people convicted of selling crack than of people selling powdered cocaine. I’m not sure if there’s a real solution - this all seems bad, but I agree that it is probably better than humans doing it.

:confused: What do you think my “goals” are?