We recently celebrated the WomenTech Global Awards 2020. The celebration was a wonderful reminder and showcase of the achievements and perseverance of women, minorities and allies in tech. However, in a way it is also a reminder of the challenges ahead of us as a society: achieving actual gender equality in women's private and professional lives.
The fact is that we are entering the biggest economic transformation since the Industrial Revolution. We are using the power of algorithms to govern our lives better and to discover more profound ways of achieving societal progress.
However, in that process, we are becoming aware that technological progress, if not done with great attention and intention, could take us backward when it comes to certain civilizational values, such as human rights. One of the values to be protected with special care is women’s rights.
If we go into more technical detail, we have to discuss the topic of fairness in artificial intelligence algorithms. It might sound esoteric, but it has very tangible consequences: it is a classic way in which female leaders are excluded when competing for C-suite jobs. With more HR companies using AI, this is a problem to be addressed now, not years from now.
The basic idea of model fairness in machine learning is very well defined in an article written by Fabricio Pretto. In short, it addresses the problem of assessing how fair a machine learning model is when it is trained on data containing pre-existing biases. In this case, the question would be: “Is it fair that a job-matching system favours male candidates for CEO interviews because that matches historical data?”
What is fairness in Artificial Intelligence?
Using AI is becoming a business standard in many industries, and Human Resources is no exception. Machine learning algorithms are being incorporated into the hiring process in order to help employers choose the best candidate for a vacant position. A recent poll among HR departments and recruiters in Germany revealed that around 90% of the interviewed HR departments use automated recruiting platforms. More than ⅓ of recruiting processes are already automated, and machine learning algorithms back up to 10% of procedures such as active recruiting, pre-selection of candidates or the final hiring decision. Surveyed companies believe that in the next 10 years this portion will increase significantly.
Machine learning tools have to be “trained” in order to be useful and achieve accuracy and reliability. This training requires large amounts of empirical data in which certain patterns are recognised and then compared with newly presented data. This 'insight' is used by the algorithm to make predictions or, respectively, classifications. This is a very simple overview of the basic workflow a machine learning tool is based on.
So, in the case of the hiring process, the data of a job applicant is run through a trained machine learning model with the aim of classifying the applicant as suitable or not suitable for the vacant position. To achieve accuracy, the model has to be trained with large amounts of empirical data gathered in previous hiring processes; artificial neural networks, for example, are known to be extremely “data hungry”.
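To make this workflow more concrete, here is a minimal sketch in Python (using scikit-learn) of the train-then-classify loop described above. The feature names, toy data and the choice of logistic regression are purely illustrative assumptions, not a description of any real recruiting platform.

```python
# A minimal sketch of the train-then-classify workflow described above.
# Feature names and data are hypothetical; real systems use far richer inputs.
import numpy as np
from sklearn.linear_model import LogisticRegression

# Historical hiring data: [years_experience, education_level, gender (1 = male, 0 = female)]
# Gender is included here only to illustrate how bias can enter the training data.
X_train = np.array([
    [12, 3, 1], [10, 3, 1], [11, 2, 1], [9, 3, 0],
    [13, 3, 0], [8, 2, 1], [10, 3, 0], [12, 2, 1],
])
# Label: was the candidate invited to a C-suite interview in the past?
y_train = np.array([1, 1, 1, 0, 0, 0, 0, 1])

model = LogisticRegression().fit(X_train, y_train)

# A new applicant is run through the trained model and classified.
new_applicant = np.array([[11, 3, 0]])     # well qualified, female
print(model.predict(new_applicant))        # 0 = not suitable, 1 = suitable
print(model.predict_proba(new_applicant))  # class probabilities
```

Whatever patterns sit in the historical labels, the model will happily reproduce them for new applicants.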
But exactly here, in this training process, comes the problem for female leaders.
Namely, the historical data used as a training reference is itself biased. Until recently, most C-suite positions were occupied by male candidates. According to the Statistisches Bundesamt, the proportion of women holding leading positions in Germany rose from 25.8% in 1992 to just 29.4% in 2019. The European Commission reported that only 27.8% of management board positions in the largest publicly listed companies in the European Union were held by women in 2019. Artificial intelligence, which has been increasingly introduced into HR processes since roughly 2010, is being trained on those proportions.
In this context, the empirical data is biased, and the machine learning model is trained on data skewed in favour of men. So when we follow classic machine learning modeling, we end up in a situation where women are once more pushed into a less favourable position than men. The practical consequence for C-suite positions is that, given the same level of education and professional experience, a male candidate is more likely to be invited to a job interview than a female candidate.
So, this unquestioned and increasingly automated reliance on artificial intelligence tools may lead to unfair outcomes if not handled properly.
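One way to make this kind of skew visible is to compare the model's selection rates across groups of otherwise comparable candidates. The sketch below is a hypothetical illustration of such a check (often called a demographic parity comparison); the predictions and gender labels are invented for the example.

```python
# A hypothetical check of the bias described above: compare the model's
# interview-invitation rate for female vs. male candidates with comparable profiles.
import numpy as np

def selection_rate(predictions, group_mask):
    """Share of candidates in a group that the model classifies as 'invite'."""
    return predictions[group_mask].mean()

# Model predictions (1 = invite) and the gender of each candidate (1 = male, 0 = female)
predictions = np.array([1, 1, 0, 1, 0, 0, 1, 0, 1, 0])
gender      = np.array([1, 1, 0, 1, 0, 0, 1, 1, 1, 0])

male_rate   = selection_rate(predictions, gender == 1)
female_rate = selection_rate(predictions, gender == 0)

# A large gap for otherwise comparable candidates is a red flag for unfairness.
print(f"male selection rate:    {male_rate:.2f}")
print(f"female selection rate:  {female_rate:.2f}")
print(f"demographic parity gap: {male_rate - female_rate:.2f}")
```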
Machine Learning as “Statistics on Steroids”
This kind of bias against women stems from a general machine learning problem summarized in this article by Judea Pearl.
Pearl criticizes the data-centric thinking that dominates both the statistics and machine learning cultures. He argues that this data-centric school assumes the path to rational decisions lies in the data itself, obtained through a supposedly reliable and objective data-mining process. However, data used in machine learning applications can be inherently biased and might not bring the results we are looking for.
Pearl further argues that we need a transformation from a pure data-fitting environment (which he calls 'statistics on steroids') to a data-understanding culture. This data-understanding, or data-interpretation, school does not see data as the sole object of inquiry. It rather views data as an auxiliary means for interpreting reality, where 'reality' stands for the processes that generate the data.
Given this, we are clearly standing at a very interesting crossroads on our tech development path. We as human beings are simply not capable of processing all the incoming data at that scale, putting it into correlation and drawing the respective conclusions, as the process itself gets too complex. On the other hand, pure reliance on data fitting may lead to wrong developments, deepen social inequalities and endanger other civilisational values. The handling of potential female leaders is such an example.
It is clear that the answer, as always, lies in the middle: an optimal mixture of available tools, processes and their variations is needed. Adjusting them to every business need and keeping its specifics in mind is a must. It is necessary to expand our views and use different combinations of statistical modeling, data mining, machine learning algorithms and other techniques to achieve that goal in an optimal way.
What can be done?
These problems are slowly entering the mainstream of artificial intelligence applications. Using representative datasets for training and testing a model, designing a model with fairness as an explicit goal, testing machine learning models for unfair biases, and analyzing the performance of those models with unfairness and bias in mind are just some examples of how to face the matter.
First, we need to raise awareness. Second, data and machine learning professionals need to handle data properly when fitting models, so that the models become objective helpers in decision making.
In order to avoid bias in machine learning models, several strategies are available. Here is a short overview:
- Dataset 1: The performance of a model is closely related to the dataset which is used. This point cannot be stressed enough. The dataset which forms the basis for any stochastic model, i.e. on which a machine learning model is trained, has to be representative: it has to map reality in a proper way. This is one of the major sources of bias.
- Dataset 2: It should always be questioned whether sensitive input features, like race, sex or age, should even be used to build a model. The obvious answer seems to be to simply leave those parameters out in order to avoid bias. Unfortunately, things are not that simple: leaving such parameters out may cause a different kind of bias.
Consider the example of college admission tests in the U.S. In this case, ZIP codes were seen as discriminatory. On the other hand, test scores are affected by the preparatory resources available in a certain area, which is important for evaluating the overall test score. Hence, dropping the ZIP code (which gives an indication of the area) as a discriminatory input feature cuts off very important information and biases the overall model.
- Dataset 3: Before using data in a model, it should be checked and “cleaned”. One major problem in this context is that data can be quite imbalanced. Imbalanced data means that one category in the dataset is underrepresented, e.g. only ⅓ of C-suite positions are held by women and ⅔ by men. Imbalance in a dataset leads to bias, as the machine learning model has a tendency to favour the majority class in the dataset.
Unfortunately, imbalanced datasets are the rule in real life rather than the exception. Data science offers a wide range of techniques to handle this problem (e.g. undersampling, oversampling, generating synthetic data); a short oversampling sketch follows after this list.
- Using the right machine learning model: As a matter of fact, not every machine learning model is suitable for every problem. Artificial neural networks, for example, are extremely well suited to image recognition. For credit default classification, however, they are not the best choice, as their favouring of the majority class (see the issue of imbalanced data) can be overwhelming. It is crucial to find a suitable model for a given problem.
- Assessing model performance: Not all performance figures are equal; it is important to look at the right performance metrics. A machine learning model can have an overall forecast accuracy of 97%, which sounds very good: 97% of all forecasts were correct. But within this 97%, the accuracy for the majority group (e.g. male candidates or non-defaulting credit clients) could be 99%, while the accuracy for the minority group (e.g. female candidates or defaulting credit clients) is a very poor 60%. You therefore have to go into detail in order to assess the performance of a model and to detect bias; this problem is also heavily connected to imbalanced datasets (see the per-group accuracy sketch after this list).
- New assessment procedures: To detect inherent bias in a dataset, the classic range of performance metrics may not be enough. Hence, new performance measures are emerging. One of those approaches is called ‘orthogonal projection’. In short, orthogonal projection makes it possible to remove the linear dependencies between two input features. This ensures that there are no hidden linear dependencies between those features which could be the cause of a biased result; a small sketch of the idea also follows below.
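As referenced in the list above, the following sketch illustrates two of these points with hypothetical data: oversampling a minority class to balance a training set, and reporting accuracy per group instead of relying on one overall number. The toy data, the group encoding and the use of scikit-learn's resample helper are assumptions made purely for the example.

```python
# A minimal sketch (hypothetical data): balance an imbalanced training set by
# oversampling the minority class, and report accuracy per group.
import numpy as np
from sklearn.utils import resample
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(0)

# Imbalanced training set: far more majority-class (0) rows than minority-class (1) rows.
X = rng.normal(size=(100, 4))
y = np.array([0] * 80 + [1] * 20)

# Oversample the minority class (with replacement) until both classes are equally represented.
X_min, y_min = X[y == 1], y[y == 1]
X_min_up, y_min_up = resample(X_min, y_min, replace=True, n_samples=80, random_state=0)
X_balanced = np.vstack([X[y == 0], X_min_up])
y_balanced = np.concatenate([y[y == 0], y_min_up])
print(np.bincount(y_balanced))  # now 80 / 80

# Per-group accuracy: an impressive overall number can hide a weak minority score.
y_true = np.array([1, 0, 1, 0, 0, 1, 0, 0, 1, 0])
y_pred = np.array([0, 0, 1, 0, 0, 0, 0, 0, 1, 0])
group  = np.array([0, 1, 0, 1, 1, 0, 1, 1, 0, 1])  # e.g. 0 = female, 1 = male
print("overall accuracy:", accuracy_score(y_true, y_pred))
for g in (0, 1):
    mask = group == g
    print(f"accuracy for group {g}:", accuracy_score(y_true[mask], y_pred[mask]))
```

In this toy example the overall accuracy looks respectable, while the score for one group is far lower, which is exactly the pattern to look for.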
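The orthogonal projection idea can also be sketched in a few lines: the component of a feature that is linearly explained by a sensitive attribute is projected out, so the remaining part is uncorrelated with that attribute. The data below is synthetic, and this is only one simple way to realise the idea.

```python
# A minimal sketch of orthogonal projection on synthetic data: remove from one
# input feature the component that is linearly explained by a sensitive feature.
import numpy as np

rng = np.random.default_rng(1)

sensitive = rng.integers(0, 2, size=200).astype(float)  # e.g. an encoded sensitive attribute
feature   = 2.0 * sensitive + rng.normal(size=200)      # a feature that "leaks" that attribute

# Project 'feature' onto the column space of [1, sensitive] and keep only the residual.
A = np.column_stack([np.ones_like(sensitive), sensitive])
coef, *_ = np.linalg.lstsq(A, feature, rcond=None)
feature_orthogonal = feature - A @ coef

print("correlation before:", np.corrcoef(sensitive, feature)[0, 1])
print("correlation after: ", np.corrcoef(sensitive, feature_orthogonal)[0, 1])  # ~ 0
```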
When it comes to hiring female leaders and using AI tools, it is important to be aware that no tool should be taken as completely reliable and objective until proven so. We have to make sure special attention is paid both to using high tech and to protecting the core values of society, such as women's rights.
Thus, we will make sure not to give up the progress made so far, and we can confidently go further until gender equality is real on paper, in the algorithms, and in actual reality. It all starts with recognizing the issues to be fixed and taking action.
This article was written by Dzeneta Schitton. She is a Lawyer and Real Estate manager active in the field of digital and social media. Dzeneta combines legal and economic expertise with the newest data science technologies in order to bring changes into Real Estate risk management.