
    First Draft of Final Paper

    Evaluation of the specific risk assessment tool used in New York City to examine racial inequalities in the criminal justice system

    Juan Marte

    CUNY John Jay College of Criminal Justice

    Introduction (Incomplete)

    In this data-driven era, if you have been arrested, it is increasingly likely that the judge deciding whether to send you home or to jail to await trial will consult actuarial math. Specialized algorithms called risk assessment tools plumb your history, demographics, and other details to produce a score quantifying how likely you are to commit another crime or to show up at your next hearing. But these tools have come under fire for treating people of color more harshly.

    Algorithms and risk assessment tools in the criminal justice system

    Algorithms have been used in some form in criminal justice decision-making since the 1920s, and they are gaining wider use in areas such as pretrial decision-making. These algorithmic tools take in a variety of inputs, ranging from just a few variables to over a hundred, and assign defendants a risk score based on the probability of rearrest, of failure to appear in court, or both. The score is then often shown to judges, who may choose to release defendants with low risk scores on their own recognizance or under some form of limited supervision (Wykstra, 2018). The use of algorithms is increasingly relevant, and increasingly consequential, at the preliminary hearing that determines whether a defendant will await trial in custody or at liberty.
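    To make the mechanics concrete, the following is a minimal sketch of how such a tool might turn a handful of defendant attributes into a score and a risk label. The feature names, weights, and cutoffs are invented for illustration and do not come from any deployed tool.

```python
# Hypothetical pretrial risk score: a weighted sum of defendant attributes
# mapped to a category shown to the judge. Weights and cutoffs are invented
# for illustration and are not taken from any real tool.

FEATURE_WEIGHTS = {
    "prior_felony_convictions": 3,   # count of prior felony convictions
    "prior_failures_to_appear": 2,   # count of prior failures to appear
    "pending_cases": 2,              # count of open cases
    "age_under_25": 1,               # 1 if the defendant is under 25, else 0
}

def risk_score(defendant: dict) -> int:
    """Sum each attribute's value multiplied by its weight."""
    return sum(weight * defendant.get(feature, 0)
               for feature, weight in FEATURE_WEIGHTS.items())

def risk_label(score: int) -> str:
    """Map the raw score to the label a judge would see."""
    if score <= 2:
        return "low"
    if score <= 5:
        return "moderate"
    return "high"

defendant = {"prior_felony_convictions": 1, "pending_cases": 1, "age_under_25": 1}
score = risk_score(defendant)
print(score, risk_label(score))  # 6 high
```

    In a deployed system the weights are typically fit to historical criminal justice data, which is precisely where the bias concerns discussed below enter.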

    Professor Olivia Woods (2020) states: “I think it will be difficult to figure out exactly how the algorithms work, partially because of the advanced math, but more importantly because the formulas are private/proprietary secrets that the companies that make the software don’t want to release. You’ll have to look at the outcomes, and/or use specific examples of individuals to bolster your claims.” In that same context, it is difficult to accept that complex mathematical formulas can remain trade secrets while putting something as valuable as the right to liberty at risk.

    For example, in 2016, journalists at ProPublica criticized as unfair the risk assessment tool called COMPAS, developed by the company Northpointe (later renamed Equivant). After analyzing data they obtained from a jurisdiction in Florida that uses the algorithm, the reporters concluded that the algorithm is racially biased. They found that among defendants who were not rearrested within two years, 45% of those who are Black, compared with 24% of those who are white, had been assigned high-risk scores. Yet when Northpointe responded to the critique, it pointed to a different statistic, supporting a different notion of fairness: within each risk category, Black and white defendants had about the same rearrest rate.
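    The disagreement comes down to two different fairness metrics computed from the same underlying counts. The sketch below uses invented numbers, not ProPublica’s data, purely to show how one group can face a higher false positive rate even while the rearrest rate within the high-risk category is identical across groups.

```python
# Two fairness metrics computed from the same hypothetical counts.
# All numbers below are invented for illustration; they are not ProPublica's data.
# hi_re: labeled high risk and rearrested   hi_no: labeled high risk, not rearrested
# lo_re: labeled low risk and rearrested    lo_no: labeled low risk, not rearrested
groups = {
    "Black": {"hi_re": 300, "hi_no": 200, "lo_re": 100, "lo_no": 400},
    "white": {"hi_re": 150, "hi_no": 100, "lo_re": 150, "lo_no": 600},
}

for name, g in groups.items():
    # ProPublica's lens: among people who were NOT rearrested,
    # what share had been labeled high risk? (false positive rate)
    fpr = g["hi_no"] / (g["hi_no"] + g["lo_no"])
    # Northpointe's lens: among people labeled high risk,
    # what share actually were rearrested? (calibration within the category)
    rearrest_given_high = g["hi_re"] / (g["hi_re"] + g["hi_no"])
    print(f"{name}: false positive rate = {fpr:.0%}, "
          f"rearrest rate among high-risk = {rearrest_given_high:.0%}")
```

    With these invented counts, the rearrest rate among high-risk defendants is identical for both groups, yet the false positive rate is more than twice as high for the Black group. Research on algorithmic fairness has shown that when underlying rearrest rates differ between groups, the two criteria generally cannot be equalized at the same time.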

    In principle, the goal of these algorithms is to reduce discrimination, and they also have the potential to do so by improving our ability to detect it. However, according to the report on measures of fairness in the NYC risk assessment tool: “black defendants were about twice as likely as white defendants to be made ineligible for the supervised release program based on the risk assessment. Hispanic defendants were about 1.5 times as likely to be ineligible as White defendants. Thus, this tool has the potential to disproportionately impact communities of color relative to White communities by denying access to a potentially beneficial program at a higher rate”.
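    The “twice as likely” and “1.5 times as likely” figures are disparity ratios: each group’s ineligibility rate divided by the White ineligibility rate. A minimal illustration, using invented placeholder rates rather than the report’s actual figures, is below.

```python
# Disparity ratio: each group's ineligibility rate divided by the White rate.
# The rates below are invented placeholders, not the figures behind the NYC report.
ineligibility_rate = {"White": 0.10, "Black": 0.20, "Hispanic": 0.15}

reference = ineligibility_rate["White"]
for group, rate in ineligibility_rate.items():
    print(f"{group}: {rate / reference:.1f}x the White ineligibility rate")
```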

    The problem is that algorithms are produced by people, and people discriminate. Those who compile the data and design the tools are humans with prejudices and stigmas, and those flaws can carry over into the algorithms. Hence the need for transparency and continuous evaluation of these tools. Judges must decide whether defendants await trial at home or in jail on the basis of predicting whether they are likely to flee or to commit new crimes. For judges, this task requires the kind of probabilistic thinking that behavioral science tells us is very difficult for everyone, and that may well be infected by racial and other biases.

    Existing data suggest that most risk assessment tools have poor to moderate accuracy in most applications. Typically, more than half of the individuals judged by these tools to be high risk are incorrectly classified: they will not go on to offend. These people may be detained unnecessarily. False positives may be especially common in minority ethnic groups (Douglas, 2017). Thus, not only is the predictive accuracy of risk assessment tools imperfect, but it is also imperfectly presented in the literature. This limited and skewed evidence base creates a risk that decision-makers will rely more heavily on risk assessment scores than their accuracy warrants.

    Tom Douglas (2017) notes that many risk assessment tools take into account an individual’s demographic characteristics, such as ethnicity, age, immigration status, and gender. It has been suggested that risk assessment tools should instead employ only ‘individualized’ information, such as information about declared plans and desires gathered in face-to-face interviews; though even then, judgments may be subject to implicit biases based on the demographic characteristics of the individual being assessed. This means that the algorithms would be fairer if they used only information relevant to the crime in question, eliminating demographic variables that have historically been grounds for discrimination.

    Adam Neufeld, in his article “In Defense of Risk Assessment Tools” published by The Marshall Project, argues that algorithms can help the criminal justice system, but only alongside thoughtful humans. Indeed, algorithms can be useful to the judicial system as long as the operators of justice are objective, impartial, and guided by the good sense to impart justice.

    Defenders of risk assessment say that the tools are inherently fair because a particular score means what it means: a seven is a seven, no matter what your race. Studies suggest that well-designed algorithms may be far more accurate than a judge alone. Fairness is a subjective concept, and judges are human beings who make mistakes influenced by their emotions. Letting judges make critical decisions based on their personal experience, intuition, and whatever else they decide is relevant is itself unfair.

    Nonetheless, algorithms are weak in many ways. The rational and logical part of the human brain remains essential for making optimal decisions, and that is something a machine cannot replicate.

    The most important element in evaluating a risk assessment tool is transparency in its development. The data itself may encode racial bias into the tool, since it comes from the peak years of “Stop, Question, and Frisk,” a policing practice found by courts to be racially discriminatory, said Tarak Shah, a data scientist at the Human Rights Data Analysis Group (HRDAG).

    Critics of algorithms have also pointed to a lack of transparency as a major problem. Should individual defendants be notified of their scores, and should they have a right to see how the scores were calculated? It should work the way it does when you apply for a credit card and are entitled to a credit report. Lack of transparency is what breeds inequality and discrimination.

    Beyond that, should other people, for instance independent researchers, have the access required to check a given tool for computational reproducibility, that is, whether they arrive at the same results using the original data and code?
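    As a rough illustration of what such a reproducibility check could look like in practice, an independent researcher would rerun the released scoring code on the released data and flag any published score that cannot be recomputed. Everything in the sketch below, the column names, the sample rows, and the scoring formula, is an invented stand-in, since vendors do not release their real data or code.

```python
import csv
import io

# Hypothetical reproducibility check: recompute each published score with the
# released scoring code and flag mismatches. The columns, sample rows, and
# scoring formula are invented stand-ins for the original data and code.

RELEASED_DATA = io.StringIO(
    "defendant_id,prior_convictions,pending_cases,published_score\n"
    "1,2,1,5\n"
    "2,0,0,0\n"
    "3,1,2,4\n"
)

def released_scoring_code(row: dict) -> int:
    """Stand-in for the vendor's released scoring function."""
    return 2 * int(row["prior_convictions"]) + int(row["pending_cases"])

mismatches = [row["defendant_id"]
              for row in csv.DictReader(RELEASED_DATA)
              if released_scoring_code(row) != int(row["published_score"])]

print("reproducible" if not mismatches else f"scores did not match for: {mismatches}")
```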

    Risk assessments are pitched as race-neutral, replacing human judgment, which is subjective and fraught with implicit bias, with objective and scientific criteria. The trouble is that the most accurate tools draw on existing criminal justice data: what happened to large numbers of actual people who were arrested in a particular location. And the experience of actual people in the criminal justice system is fraught with racial disparities and implicit bias.

    Indeed, concerns about racial bias have dogged risk assessments almost since their inception. In 2014, then-Attorney General Eric Holder warned that the tools “may exacerbate unwarranted and unjust disparities that are already far too common in our criminal justice system and in our society.”

    The Criminal Justice Agency (CJA) in New York City uses a release assessment with a maximum score of 25 points. The answers to each of the assessment’s eight questions determine whether any points are deducted. CJA’s recommendation depends on the final score: 12-25, recommended for release on recognizance; 0-11, not recommended for release on recognizance.

    The questions are as follows: years since the last bench warrant; two or more bench warrants in the last 5 years; misdemeanor or felony convictions in the last year; number of misdemeanor convictions in the last 3 years; felony convictions in the last 10 years; pending cases; years living at the last two addresses; and whether the defendant is reachable by phone. It is evident that the questionnaire covers areas such as criminal history, demographics, and economic situation. A defendant who does not have a phone, or who lives in an area marginalized by criminality, is more likely to await trial in jail.
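    To illustrate how a point-deduction assessment of this kind could turn answers into a recommendation, here is a minimal sketch. Only the overall structure, a 25-point maximum, deductions for adverse answers, and a recommendation for release on recognizance at 12 points or above, follows the description above; the individual deduction amounts are invented for illustration.

```python
# Simplified sketch of a CJA-style point-deduction release assessment.
# The deduction amounts below are invented; only the 25-point maximum and the
# 12-point recommendation cutoff follow the description in the text.

MAX_SCORE = 25
ROR_CUTOFF = 12  # 12-25: recommended for release on recognizance

# (adverse answer, points deducted when it applies) -- amounts are illustrative
DEDUCTIONS = [
    ("recent_bench_warrant", 4),
    ("two_or_more_bench_warrants_last_5_years", 4),
    ("conviction_last_year", 3),
    ("multiple_misdemeanor_convictions_last_3_years", 3),
    ("felony_conviction_last_10_years", 3),
    ("pending_case", 3),
    ("short_time_at_last_two_addresses", 3),
    ("not_reachable_by_phone", 2),
]

def release_recommendation(answers: dict) -> tuple[int, str]:
    """Deduct points for each adverse answer and compare the total to the cutoff."""
    score = MAX_SCORE - sum(points for question, points in DEDUCTIONS
                            if answers.get(question))
    recommendation = ("recommended for release on recognizance"
                      if score >= ROR_CUTOFF else "not recommended")
    return score, recommendation

answers = {"pending_case": True, "not_reachable_by_phone": True}
print(release_recommendation(answers))  # (20, 'recommended for release on recognizance')
```

    Note how, in this sketch, answers tied to housing stability and phone access reduce the score in the same way criminal-history answers do, which is the mechanism behind the concern raised above.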

    In general, risk assessment models operate by using some information about the defendant, such as criminal history and demographic information, to predict an outcome of interest. The data used to train and evaluate the model at hand were collected during the peak years of Stop, Question, and Frisk (SQF), a practice ultimately ruled to have been applied in a racially discriminatory manner in Floyd v. City of New York (959 F. Supp. 2d 540). According to the New York Civil Liberties Union, from 2009 to 2011, 87 percent of SQF stops were of Black and Latino people.

    The report concluded that the tool is biased against Black and Hispanic people because ‘innocent’ Black and Hispanic people are more likely than White people to be denied access to supervised release or to be labeled a high risk of re-arrest.

    (more)

    Conclusion (Pending)