Newswise — Can artificial intelligence and machine learning tools be harnessed to build a more equitable workforce? It’s a question with implications prompting the New York City Council to consider new rules meant to curb bias in hiring AI, which is increasingly being used to evaluate job applications. 

New research co-authored by Margrét Bjarnadóttir at the University of Maryland’s Robert H. Smith School of Business may offer a way.

Early attempts to incorporate AI into the human resources process haven’t been met with resounding success. Among the most well-known failures: In 2018, Amazon was forced to abandon an AI recruiting tool it had built when the tool was discovered to be discriminating against female job applicants.

In the research, which recently won the Best White Paper award at the 2021 Wharton Analytics Conference, Bjarnadóttir and her co-authors, David Anderson at Villanova University and David Ross at the University of Florida, examine the roots of those AI biases and offer solutions to the challenges they present, solutions that could unlock the potential of analytics tools in the human resources space. For example, they recommend creating a bias dashboard that breaks down a model’s performance for different groups, and they offer checklists for assessing analytics work: one for internal analytical projects and another for adopting a vendor’s tool.

“We wanted to look at the question of: How can we do better? How can we build toward these more equitable workplaces?” says Bjarnadóttir, associate professor of management sciences and statistics at Maryland Smith.

It won’t be simple. The biases stem from an organization’s HR history. In Amazon’s case, the analytics tool was found to be rejecting resumes from applicants for technical job roles because of phrases like “women’s chess team.” With technical jobs long dominated by men, the model had taught itself that factors correlated with maleness were indicators of potential success.

“We can’t simply translate the analytical approaches that we use in accounting or operations over to the HR department,” Bjarnadóttir says. “What is critically different is that, in contrast to, say, determining the stock levels of jeans or which ketchup brand to put on sale, the decisions that are supported in the HR department can have an instrumental impact on employees’ lives: who gets hired, who receives a promotion, or who is identified as a promising employee.”

And because the data these tools draw from is historical, the bias is tough to amend. In other contexts there are more obvious remedies. For example, if a skin-cancer-detecting AI tool fails to detect skin cancers in darker skin tones, one could input more diverse images into the tool and train it to identify the appropriate markers. In the HR context, organizations can’t go back in time and hire a more diverse workforce.

HR data is also typically what is called unbalanced, meaning that not all demographic groups are equally represented, and this causes issues when algorithms interact with the data. A simple example of that interaction: analytical models will typically perform best for the majority group, because good performance for that group simply weighs the most in overall accuracy, the measure that most off-the-shelf algorithms optimize.
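The weighting effect described above can be made concrete with a little arithmetic. The sketch below is an illustration only: the group sizes and accuracy figures are hypothetical, not taken from the research, and the function name `overall_accuracy` is made up for this example. It shows that overall accuracy is a size-weighted average of per-group accuracy, so a model that serves the minority group poorly can still score higher overall.

```python
def overall_accuracy(groups):
    """Return overall accuracy as the size-weighted average of group accuracies.

    `groups` maps a group name to (number_of_people, accuracy_for_that_group).
    """
    total = sum(n for n, _ in groups.values())
    return sum(n * acc for n, acc in groups.values()) / total


# Hypothetical data set: 900 majority-group records, 100 minority-group records.
# Model A is very accurate for the majority and poor for the minority.
model_a = {"majority": (900, 0.95), "minority": (100, 0.60)}
# Model B is equally accurate for both groups.
model_b = {"majority": (900, 0.90), "minority": (100, 0.90)}

acc_a = overall_accuracy(model_a)  # (900 * 0.95 + 100 * 0.60) / 1000
acc_b = overall_accuracy(model_b)  # (900 * 0.90 + 100 * 0.90) / 1000

print(f"Model A overall accuracy: {acc_a:.3f}")
print(f"Model B overall accuracy: {acc_b:.3f}")
```

With these made-up numbers, Model A comes out ahead on overall accuracy (about 0.915 versus 0.900) even though it is far less accurate for the smaller group, which is why an algorithm that optimizes overall accuracy alone can end up favoring the majority.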

So if your data aren’t balanced, even carefully built models won’t lead to equal outcomes for different demographic groups. For example, in a company that has employed mostly male managers, a model is likely to identify a disproportionate number of men as future management candidates, over and above the many men it identifies correctly. And the algorithm is more likely to overlook qualified women.

So, how can organizations “do better,” as Bjarnadóttir says, in future hiring and promotion decisions, and ensure that AI applications are unbiased, transparent and fair? The first step is to apply a bias-aware analytical process that asks the right questions of the data, of modeling decisions and of vendors, and then monitors not only the statistical performance of the models but also, perhaps more importantly, the tool’s impact on the different employee groups.
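As a rough idea of what monitoring a model’s performance for different groups might look like, here is a minimal sketch of a per-group summary. It is not the authors’ dashboard: the function name `bias_dashboard`, the record layout, the choice of metrics and the sample data are all assumptions made for illustration.

```python
from collections import defaultdict


def bias_dashboard(records):
    """Summarize model behavior separately for each group.

    Each record is (group, actual, predicted), where `actual` says whether the
    person really was a strong candidate and `predicted` says whether the
    model flagged them. Both are booleans.
    """
    counts = defaultdict(lambda: {"tp": 0, "fp": 0, "fn": 0, "tn": 0})
    for group, actual, predicted in records:
        if predicted:
            counts[group]["tp" if actual else "fp"] += 1
        else:
            counts[group]["fn" if actual else "tn"] += 1

    dashboard = {}
    for group, c in counts.items():
        n = c["tp"] + c["fp"] + c["fn"] + c["tn"]
        positives = c["tp"] + c["fn"]  # people who really were strong candidates
        negatives = c["fp"] + c["tn"]  # people who were not
        dashboard[group] = {
            "n": n,
            # Share of the group the model flags at all.
            "selection_rate": (c["tp"] + c["fp"]) / n,
            # Share of truly strong candidates the model catches.
            "true_positive_rate": c["tp"] / positives if positives else None,
            # Share of other candidates the model flags by mistake.
            "false_positive_rate": c["fp"] / negatives if negatives else None,
        }
    return dashboard


# Hypothetical predictions from a promotion model (illustrative data only).
records = (
    [("men", True, True)] * 4
    + [("men", False, True)] * 2
    + [("men", False, False)] * 2
    + [("women", True, True)]
    + [("women", True, False)]
    + [("women", False, False)] * 2
)

report = bias_dashboard(records)
for group, metrics in report.items():
    print(group, metrics)
```

In this invented sample, the model flags three-quarters of the men but only a quarter of the women, and it catches every qualified man while missing half of the qualified women. Gaps like these are the kind of signal a per-group view is meant to surface, whichever metrics an organization ultimately chooses to track.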
