Study: AI can hide racial disparities in credit and lending
By law, credit and loan decisions cannot discriminate on the basis of race or lead to results that differ significantly by race. But to ensure they don’t discriminate, banks and other lenders generally aren’t allowed to ask applicants’ race on most applications. This makes it difficult for auditors to ensure that credit decisions are fair.
To assess racial disparities in loan decisions, lenders or auditors must infer the races of applicants, typically using a system – known as a proxy – that guesses applicants’ races based on what is known about them, such as their neighborhoods and last names.
But those proxies – including a method the Consumer Financial Protection Bureau uses to audit lenders – can yield very different results depending on small changes in how they guess applicants’ races, according to a new Cornell study.
“It is worrying that these models are being used to determine whether financial institutions are complying with the law,” said Madeleine Udell, Richard and Sybil Smith Sesquicentennial Fellow and assistant professor in the School of Operations Research and Information Engineering. “They clearly aren’t assessing what they’re supposed to be assessing.”
Their paper, “Fairness Under Unawareness: Assessing Disparity When Protected Class Is Unobserved,” will be presented at the ACM Conference on Fairness, Accountability, and Transparency, Jan. 29-31 in Atlanta. Cornell Tech doctoral student Xiaojie Mao is the lead author. Co-authors included Udell; Nathan Kallus, assistant professor of operations research and information engineering at Cornell Tech; and financial industry data scientists Jiahao Chen and Geoffry Svacha.
Understanding the risks of discrimination when using artificial intelligence is especially important as financial institutions increasingly rely on machine learning for lending decisions. Machine learning models can analyze vast amounts of data to arrive at relatively accurate predictions, but their operations are opaque, making fairness difficult to verify.
“How can a computer be racist if you don’t input race? Well, it can, and one of the biggest challenges we are going to face in the coming years is humans using machine learning with unintended harmful consequences that could lead to increased polarization and inequality,” Kallus said. “There has been a lot of advancement in machine learning and artificial intelligence, and we have to be really responsible in our use of it.”
Race is one of the many characteristics protected by federal and state laws; others include age, gender and disability status.
The researchers used mortgage data – the only type of consumer loan for which applicants’ race is recorded – to test the accuracy of the Bayesian Improved Surname Geocoding (BISG) audit system. They found that its results often underestimated or overestimated racial disparities, depending on several factors. Guessing race from the census tracts where applicants live erases black applicants who live in predominantly white neighborhoods and white applicants who live in predominantly black neighborhoods.
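The core idea behind BISG can be sketched as a Bayes’-rule combination of surname-based and geography-based race probabilities, assuming surname and location are independent given race. The probabilities below are hypothetical toy values; the real method draws on Census surname lists and tract-level demographics.

```python
# Minimal sketch of the BISG-style probability combination.
# All numbers here are made up for illustration.

def bisg_posterior(p_race_given_surname, p_race_given_tract, p_race_marginal):
    """Combine surname- and geography-based race probabilities via Bayes' rule,
    assuming surname and tract are conditionally independent given race."""
    unnormalized = {
        race: (p_race_given_surname[race] * p_race_given_tract[race]
               / p_race_marginal[race])
        for race in p_race_given_surname
    }
    total = sum(unnormalized.values())
    return {race: v / total for race, v in unnormalized.items()}

# Toy case: a surname more common among white applicants,
# in a predominantly black census tract.
surname = {"white": 0.70, "black": 0.30}   # P(race | surname)
tract = {"white": 0.20, "black": 0.80}     # P(race | tract)
marginal = {"white": 0.60, "black": 0.40}  # P(race) overall

posterior = bisg_posterior(surname, tract, marginal)
# Geography pulls the estimate toward "black" despite the surname evidence.
```

The example shows why tract-based guessing can misclassify people who don’t match their neighborhood’s majority, as the article describes.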
The BISG model estimates the probability that a person is of a certain race, and in performing calculations a user can set a minimum probability – for example, choosing to use only examples where the probability of a given race is 80% or higher. But differences in this minimum probability resulted in surprisingly large variations in the results, the researchers found.
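The threshold effect can be illustrated with made-up data. This is not the paper’s method – just a sketch of the thresholding mechanic: the same applicants yield different disparity estimates depending on the cutoff, because raising it drops applicants whose proxied race is uncertain.

```python
# Toy illustration of how a probability cutoff changes a disparity estimate.
# Applicant records and probabilities are fabricated for this example.

def disparity_at_threshold(applicants, threshold):
    """Approval-rate gap (white minus black) among applicants whose proxied
    race probability clears the threshold; None if a group is empty."""
    rates = {}
    for race in ("white", "black"):
        group = [a for a in applicants if a["p_race"][race] >= threshold]
        if not group:
            return None
        rates[race] = sum(a["approved"] for a in group) / len(group)
    return rates["white"] - rates["black"]

applicants = [
    {"p_race": {"white": 0.95, "black": 0.05}, "approved": 1},
    {"p_race": {"white": 0.85, "black": 0.15}, "approved": 1},
    {"p_race": {"white": 0.55, "black": 0.45}, "approved": 0},
    {"p_race": {"white": 0.10, "black": 0.90}, "approved": 0},
    {"p_race": {"white": 0.40, "black": 0.60}, "approved": 1},
]

# Loose cutoff keeps uncertain applicants; strict cutoff drops them,
# and the estimated gap changes substantially.
loose = disparity_at_threshold(applicants, 0.55)   # ~0.17
strict = disparity_at_threshold(applicants, 0.80)  # 1.0
```

With these numbers, the estimated gap jumps from about 17 percentage points to 100 depending only on the cutoff – the kind of sensitivity the study reports.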
“Depending on the threshold you chose, you would get very different answers as to the fairness of your credit process,” Udell said.
The researchers’ findings not only shed light on BISG’s accuracy; they could also help developers improve the machine learning models that make credit decisions. Better models could help banks make more informed decisions when approving or rejecting loans, which may lead them to grant loans to qualified but lower-income applicants.
“You can determine who will actually default or not in ways that are fair,” Kallus said. “What we want to do is make sure that we put these constraints on the machine learning systems we build and train, in order to understand what it means to be fair and how we can ensure fairness from the start.”