Check your bias, IBM!
I came across an article from VentureBeat, which brands itself as 'the leading source for transformative tech news and events that provide deep context to help business leaders make smart decisions.' This headline caught my eye: 'IBM toolkit promises to mitigate advertising bias.'
That headline is a strange one. It combines so many incompatible words: IBM and advertising, or advertising and bias. (Also note that it is not IBM making promises; it is the toolkit making the promise. How?)
The article starts with the usual mantra about AI, Machine Learning and bias. We learn that bias can be buried in data. Yes, that’s true.
We also learn that bias can be buried in data used to generate the algorithms. No, that’s not true. Data doesn't generate algorithms! Something is going on here…
Then we learn from this article that in 2018, IBM launched a project called AI Fairness 360. This is an open-source toolkit that is supposed to help us mitigate bias in datasets and ML models. Yes, that's a great objective, especially in areas like bank loans, insurance, and medicine. It's doubly important in the justice system.
The author then explains that IBM is now trying to apply this AI Fairness 360 toolkit to the advertising industry.
These are actual quotes from the article:
'The bias that exists in advertising has historically been ingrained in how we do marketing'.
You don't say? There is bias in marketing? I would never have guessed that.
'We have gotten really good in the advertising industry at targeting people'.
No, you haven't. I have yet to find a website with an ad that really targets me.
'But in the process of targeting people with new ML algorithms, advertisers have also sub-optimized the results for certain groups'.
Now we know. Groups are special entities which have nothing to do with people. Advertisers are really good at finding me. But once they realize that I belong to a certain group, they stop showing me the ads. Do let me know which group that is.
To provide an example of how advertising campaigns can be biased, they describe an educational campaign about the importance of COVID-19 vaccination. That sounds like a worthwhile campaign, and it should reach everyone. Then you learn that the agency used a system that optimized ad serving based on click-through rate, and that the campaign was most successful with women aged 45 to 65.
They ran an algorithm that optimizes for engagement, and then they were surprised that certain people never saw the ad.
Why would you ever use a system that optimizes itself for targeting in a situation where you want to reach everyone!? That has nothing to do with bias. That is stupid.
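If it isn't obvious why the campaign collapsed onto one demographic, here is a toy simulation of a click-optimizing ad server. Everything in it is invented for illustration (the segments, the click rates, the epsilon-greedy logic); it is not IBM's, the agency's, or any real ad platform's code.

```python
import random

# Toy simulation of a click-optimizing ad server (epsilon-greedy bandit).
# All segment names and click rates are invented for illustration.
SEGMENTS = {"women 45-65": 0.08, "men 45-65": 0.03, "everyone else": 0.02}

impressions = {s: 0 for s in SEGMENTS}
clicks = {s: 0 for s in SEGMENTS}

def pick_segment(epsilon=0.1):
    """Serve the best-clicking segment most of the time, explore a little."""
    if random.random() < epsilon or not any(impressions.values()):
        return random.choice(list(SEGMENTS))
    return max(SEGMENTS, key=lambda s: clicks[s] / max(impressions[s], 1))

for _ in range(100_000):
    seg = pick_segment()
    impressions[seg] += 1
    if random.random() < SEGMENTS[seg]:  # simulated click
        clicks[seg] += 1

for seg, n in impressions.items():
    print(f"{seg}: {n} impressions")
# The best-clicking segment ends up with roughly 90% of all impressions.
```

The optimizer did exactly what it was told: find the segment that clicks, stop wasting impressions on everyone else. Great for selling sneakers. For a public-health campaign, reach was the thing to optimize, not clicks.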
Still, I wanted to learn more about this IBM toolkit that should help me mitigate bias. I clicked through a few articles to find the AI Fairness 360 toolkit. This is where you can see the real anti-bias action.
There’s a demo where I could choose the data set. From three available sets, I chose German credit scoring.
The machine should be able to tell me who to lend money to (and who will be denied a loan).
Based on the data, the system indicates that Male is privileged, Female is unprivileged, Old is privileged and Young is unprivileged.
In the next step, the machine checks for bias and confirms it. Money for old guys, no money for young women…
Except the machine didn't detect any bias based on sex. Only on age. Interesting.
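If you want to poke at this outside the demo UI, AI Fairness 360 is a real Python package (aif360). Here is a minimal sketch of the bias check, assuming the bundled German credit dataset and the demo's encoding (old = privileged); treat the details as approximate rather than gospel.

```python
# pip install aif360 -- the raw German credit files must be downloaded
# separately into the package's data folder, as its docs describe.
from aif360.datasets import GermanDataset
from aif360.metrics import BinaryLabelDatasetMetric

# In the bundled dataset, age is binarized: 1 = old (privileged),
# 0 = young (unprivileged).
dataset = GermanDataset()
privileged = [{'age': 1}]
unprivileged = [{'age': 0}]

metric = BinaryLabelDatasetMetric(dataset,
                                  unprivileged_groups=unprivileged,
                                  privileged_groups=privileged)

# Difference in the rate of favorable outcomes (loan granted);
# 0 would mean statistical parity, negative means the young lose out.
print("Mean difference: ", metric.mean_difference())
print("Disparate impact:", metric.disparate_impact())
```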
Now, let's unleash the machine to find a way to mitigate the bias. There are four options available, and I experimented with each of them.
Reweighing: Weights the examples in each (group, label) combination differently to ensure fairness before classification.
Result: Bias against the unprivileged group was reduced to acceptable levels for 1 of 4 previously biased metrics; 3 of 5 metrics still indicate bias for the unprivileged group.
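For reference, here is roughly what that option does in the package, continuing the sketch above (same caveats):

```python
from aif360.algorithms.preprocessing import Reweighing

# Reweighing leaves features and labels alone; it only attaches instance
# weights so that each (group, label) combination counts fairly in training.
rw = Reweighing(unprivileged_groups=unprivileged,
                privileged_groups=privileged)
dataset_rw = rw.fit_transform(dataset)

metric_rw = BinaryLabelDatasetMetric(dataset_rw,
                                     unprivileged_groups=unprivileged,
                                     privileged_groups=privileged)
# The weighted parity metric moves toward 0, but a classifier trained on
# this data can still fail the other fairness metrics, as the demo shows.
print("Mean difference after reweighing:", metric_rw.mean_difference())
```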
Optimized Pre-Processing: Learns a probabilistic transformation that can modify the features and the labels in the training data.
Result: Bias against the unprivileged group was unchanged after mitigation; 4 of 5 metrics indicate bias.
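This one is the most demanding to set up. The sketch below follows my memory of the aif360 demo notebooks, so the helper imports and tuning knobs in particular should be checked against the current docs:

```python
from aif360.algorithms.preprocessing import OptimPreproc
from aif360.algorithms.preprocessing.optim_preproc_helpers.opt_tools import OptTools
from aif360.algorithms.preprocessing.optim_preproc_helpers.data_preproc_functions import (
    load_preproc_data_german)
from aif360.algorithms.preprocessing.optim_preproc_helpers.distortion_functions import (
    get_distortion_german)

# The method needs a hand-written "distortion" function saying which edits
# to features/labels are cheap and which are forbidden. aif360 ships one
# for the German data, but it only fits the package's own preprocessed
# version of the dataset.
dataset_gd = load_preproc_data_german(['age'])
optim_options = {
    "distortion_fun": get_distortion_german,
    "epsilon": 0.05,             # how far the transformed distribution may drift
    "clist": [0.99, 1.99, 2.99],
    "dlist": [0.1, 0.05, 0],     # distortion constraint levels
}

op = OptimPreproc(OptTools, optim_options)
op = op.fit(dataset_gd)
# transform_Y=True allows the transformation to rewrite labels, not just features
dataset_op = op.transform(dataset_gd, transform_Y=True)
```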
Adversarial Debiasing: Learns a classifier that maximizes prediction accuracy and simultaneously reduces an adversary's ability to determine the protected attribute from the predictions. This approach leads to a fair classifier as the predictions cannot carry any group discrimination information that the adversary can exploit.
Result: Bias against the unprivileged group was reduced to acceptable levels for 1 of 4 previously biased metrics; 3 of 5 metrics still indicate bias for the unprivileged group.
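Roughly, in code (a sketch assuming aif360's TensorFlow-1-era implementation, reusing the dataset and groups from above):

```python
import tensorflow.compat.v1 as tf
from aif360.algorithms.inprocessing import AdversarialDebiasing

tf.disable_eager_execution()  # aif360's implementation is TF1-style
sess = tf.Session()

# Two networks are trained jointly: a classifier predicting the loan label,
# and an adversary trying to recover the protected attribute (age) from the
# classifier's predictions. The classifier is penalized when the adversary wins.
model = AdversarialDebiasing(privileged_groups=privileged,
                             unprivileged_groups=unprivileged,
                             scope_name='debiased_classifier',
                             debias=True,
                             sess=sess)
model.fit(dataset)              # in practice you would fit on a train split
predictions = model.predict(dataset)
sess.close()
```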
Reject Option-Based Classification: Changes predictions from a classifier to make them fairer. Provides favorable outcomes to unprivileged groups and unfavorable outcomes to privileged groups in a confidence band around the decision boundary with the highest uncertainty.
Result: Bias against the unprivileged group was unchanged after mitigation; 4 of 5 metrics indicate bias.
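A sketch of this last option, again reusing the earlier names, with a plain scikit-learn logistic regression standing in for 'the classifier' (my choice, not the demo's):

```python
from sklearn.linear_model import LogisticRegression
from aif360.algorithms.postprocessing import RejectOptionClassification

# Any plain classifier will do; this method only edits its predictions afterwards.
clf = LogisticRegression(max_iter=1000)
clf.fit(dataset.features, dataset.labels.ravel())

# Build a copy of the dataset holding the classifier's scores and labels.
dataset_pred = dataset.copy(deepcopy=True)
fav = list(clf.classes_).index(dataset.favorable_label)
dataset_pred.scores = clf.predict_proba(dataset.features)[:, [fav]]
dataset_pred.labels = clf.predict(dataset.features).reshape(-1, 1)

# Flip uncertain predictions near the decision boundary in favor of the
# unprivileged group until the chosen metric lands inside [-0.05, 0.05].
roc = RejectOptionClassification(unprivileged_groups=unprivileged,
                                 privileged_groups=privileged,
                                 low_class_thresh=0.01, high_class_thresh=0.99,
                                 num_class_thresh=100, num_ROC_margin=50,
                                 metric_name="Statistical parity difference",
                                 metric_ub=0.05, metric_lb=-0.05)
roc = roc.fit(dataset, dataset_pred)   # ground truth + scored predictions
fair_pred = roc.predict(dataset_pred)
```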
Conclusion?
Even IBM, with all its algorithms and supercomputers, can't do much about German bias. That's true even when the mitigation is explicitly set up to discriminate against one group and favor the other.
Many companies today claim that they can reduce or eliminate bias. However, when anybody tells you that adding more computers, more technology, more AI and ML will solve the bias problem, they are wrong. Computers just amplify the bias for us. When you add journalists who copy and paste links and repeat the same shit, you get the impression that machines are out there to get us. And that's my biased recurrent pattern for you.