According to the Oxford Lexicon[1], bias is defined as "inclination or prejudice for or against one person or group, especially in a way considered to be unfair", and it is a bigger problem than is often thought.
Bias is especially present in modern applications that are based on Artificial Intelligence. Not every AI application is affected, but those that are trained on human-generated data in particular are at risk of severe bias.
At the website of AI Multiple[2], bias in modern AI is defined as follows: “AI Bias is an anomaly in the output of machine learning algorithms, due to the prejudiced assumptions made during the algorithm development process or prejudices in the training data”. Or, in plain English: it is the assumption that the data generated by our relatively young, mostly male and Western-oriented software developers is the norm and that it is interchangeable with the data generated by "others".
If we focus on Human Language Technology: if the system understands me, it understands everyone who speaks English. But... we often forget that "our" data, norms and values are not simply valid or true for every English-speaking person, or for any other language for that matter. So, an algorithm trained with this kind of data can perform very well if the users are more or less from the same “group”, but the performance will drop if the users are from a different group. This shift in performance is called the bias.
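To make that measurable, here is a minimal sketch (in Python, with made-up group labels and predictions rather than data from any real system) of the usual approach: run one trained model over a labelled test set, compute the score per user group, and look at the gap between the best- and worst-served group.

```python
from collections import defaultdict

def accuracy_per_group(samples):
    """Compute accuracy separately for each user group.

    `samples` is a list of (group, prediction, true_label) tuples,
    e.g. the result of running one trained model on a labelled test set
    that records which demographic group each test item belongs to.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for group, prediction, true_label in samples:
        total[group] += 1
        if prediction == true_label:
            correct[group] += 1
    return {group: correct[group] / total[group] for group in total}

# Hypothetical results: the model does well on the group it was trained on
# and noticeably worse on a group that was under-represented in the data.
results = [
    ("young_male", "yes", "yes"), ("young_male", "no", "no"),
    ("young_male", "yes", "yes"), ("young_male", "no", "no"),
    ("elderly_female", "yes", "no"), ("elderly_female", "no", "no"),
    ("elderly_female", "yes", "no"), ("elderly_female", "yes", "yes"),
]

scores = accuracy_per_group(results)
print(scores)  # {'young_male': 1.0, 'elderly_female': 0.5}
print("gap:", max(scores.values()) - min(scores.values()))  # the bias, expressed as a gap
```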
Bias and data collection
Modern software development uses more and more AI-based routines where the main algorithm is trained on “human generated” data. By “Human Generated Data” (HGD) we mean data that is produced by humans and is characteristic of those humans. Think of your face, your voice, the way you walk or sleep, or the books you read.
Often a project starts with a good idea and a (limited) amount of data; data that you often try to get from your own environment. And that is where the risk starts!
The first clearly recognisable bias in modern software appeared in the recognition of faces. The training and testing set consisted of pictures of young, highly educated (mostly) men. After considerable coding, training and testing, a pretty good result was achieved. The software was ready and it could go to market!
But... it became clear that women were recognised less well than men. So, a database with young women was quickly added and the system was retrained. Some time later, version two was released and now men AND women could be recognised. But... it then became clear that elderly people and/or people with other skin colours were recognised less well. So, new data were added, and this went on for a long time until the database was a non-discriminating, good representation of all kinds of humans.
Is it avoidable?
Unlike many of my colleagues, I’m not really surprised or disappointed by these results. After all, you have to start with what’s available, with people of whom you have a profile, a face or a speech recording. And often these are people who are similar to you. What goes wrong is the time to market. Especially with human-generated data used for training your algorithms, you know that you have to enlarge your dataset, because the data must be a good and honest representation of the people who will use the software. And with the fast increase of AI-based software in our daily life, this often means everyone. So, once you have proved that the principle works, you must continue to collect new data from people who are different from you and then start the training again.
Automatic Speech Recognition
Is there a bias in speech recognition? Unfortunately, yes! It is no different from other AI-based applications that use HGD. With ASR and other speech-based projects the “law of bias” applies. We train the recogniser on how and what WE say, and by WE we mean: our words, our tone of voice and of course our pronunciation. Once Speech Recognition left the laboratories, it started its market introduction as a user-specific application with which we could semi-automatically help certain groups to get something easier, faster and/or cheaper.
But Speech Recognition got better and better, it became popular, and it was used by a growing group of other people. And as the user group expanded, the original assumptions (you speak like me, you say this or that the way I do) were increasingly compromised. Five to ten years ago we could still say that we could recognise the “correctly spoken English” of “native English speakers”. Although still true, this turns out to be less and less useful. English is the lingua franca of our time and it is spoken by a huge variety of people who do not have English as their mother tongue. Of the approximately 1.5 billion people who speak English, fewer than 400 million use it as a first language. That means over 1 billion speak it as a second language, with their own, sometimes typical pronunciation.
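For ASR specifically, this performance shift is usually expressed as a difference in word error rate (WER) between speaker groups. A minimal sketch, assuming the jiwer package is installed and using made-up transcripts rather than real recogniser output:

```python
# pip install jiwer  -- a small library for computing word error rate (WER)
import jiwer

# Hypothetical reference transcripts and recogniser output, grouped by speaker type.
groups = {
    "native": {
        "reference":  ["turn on the kitchen lights", "what is the weather tomorrow"],
        "hypothesis": ["turn on the kitchen lights", "what is the weather tomorrow"],
    },
    "non_native": {
        "reference":  ["turn on the kitchen lights", "what is the weather tomorrow"],
        "hypothesis": ["turn of the kitchen light", "what is de wetter tomorrow"],
    },
}

# A lower WER means better recognition; a large difference between the
# groups is exactly the bias described above.
for name, data in groups.items():
    error_rate = jiwer.wer(data["reference"], data["hypothesis"])
    print(f"{name}: WER = {error_rate:.2f}")
```

The larger that gap, the more urgent it is to add speech from the under-represented group to the training data.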
Moreover, Speech Recognition is not something you build once and then leave as it is for the next 50 years. Languages always change: new generations pronounce existing words differently, the language itself changes under the influence of neighbouring languages, and through immigrants and second-language speakers the language comes to be used by groups who didn’t speak it before.
Just listen to an interview with a non-native English speaker or a broadcast from the 1930s. You can usually follow it, but to our ears it sounds strange. To keep speech recognition up to date, to be able to recognise new, young, older, sick, or dialect-speaking English speakers, and to deliver what they ask for, the Automatic Speech Recogniser must be updated continuously.
You need to gather conversations, retrain your modules and release them. Then, don’t stop, but continue. And once that is done, are you ready? Not quite, because apart from the slowly disappearing bias, we need to focus on the next big step: “understanding what is meant”. But that will be discussed another time.