Last month, I wrote a blog post warning about how, if you follow popular trends in NLP, you can easily accidentally make a classifier that is pretty racist. To demonstrate this, I included the very simple code, as a “cautionary tutorial”.
The post got a fair amount of reaction. Much of it was positive and took the problem seriously, so thanks for that. But eventually I heard from some detractors. Of course there were the fully expected “I’m not racist but what if racism is correct” retorts that I knew I’d have to face. But there were also people who couldn’t believe that anyone does NLP this way. They said I was talking about a non-problem that doesn’t show up in serious machine learning, or projecting my own bad NLP ideas, or something.
Well. Here’s Perspective API, made by an offshoot of Google. They believe they are going to use it to fight “toxicity” online. And by “toxicity” they mean “saying anything with negative sentiment”. And by “negative sentiment” they mean “whatever word2vec thinks is bad”. It works exactly like the hypothetical system that I cautioned against.
On this blog, we’ve just looked at what word2vec (or GloVe) thinks is bad. It includes black people, Mexicans, Islam, and given names that don’t usually belong to white Americans. You can actually type my examples into Perspective API and it will actually respond that the ones that are less white-sounding are more “likely to be perceived as toxic”.
- “Hello, my name is Emily” is supposedly 4% likely to be “toxic”. Similar results for “Susan”, “Paul”, etc.
- “Hello, my name is Shaniqua” (“Jamel”, “DeShawn”, etc.): 21% likely to be toxic.
- “Let’s go get Italian food”: 9%.
- “Let’s go get Mexican food”: 29%.
Here are two more examples I didn’t mention before:
- “Christianity is a major world religion”: 37%. Okay, maybe things can get heated when religion comes up at all, but compare:
- “Islam is a major world religion”: 66% toxic.
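One way to make comparisons like these systematic is to treat each pair as a minimal-pair test: two sentences that differ in a single word, with the score gap attributed to that word. Here is a minimal sketch that does this with the scores quoted above, hard-coded rather than fetched from the API. The `PAIRS` structure and the `score_gap` helper are illustrative names of my own, not part of Perspective API.

```python
# Scores quoted in this post, in percent "likely to be perceived as toxic".
# Each entry: (baseline sentence, baseline score, variant sentence, variant score).
PAIRS = [
    ("Hello, my name is Emily", 4, "Hello, my name is Shaniqua", 21),
    ("Let's go get Italian food", 9, "Let's go get Mexican food", 29),
    ("Christianity is a major world religion", 37, "Islam is a major world religion", 66),
]

def score_gap(baseline, variant):
    """Percentage points by which the variant outscores the baseline."""
    return variant - baseline

GAPS = [score_gap(base, var) for _, base, _, var in PAIRS]

for (base_text, _, var_text, _), gap in zip(PAIRS, GAPS):
    print(f"{var_text!r} scores {gap} points higher than {base_text!r}")
```

In every pair, the only thing that changed was a name, a cuisine, or a religion, and the score went up by 17 to 29 points.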
I’ve heard about Perspective API from many directions, but my proximate source is this Twitter thread by Dan Luu, who has his own examples:
“It’s 🤣 to poke around and see what biases the system picked up from the training data. 😰 to think about actual applications, though.” pic.twitter.com/VJ9y9yxz2D (Dan Luu, @danluu, August 12, 2017)
I have previously written positive things about researchers at Google who are looking at approaches to de-biasing AI, such as their blog post on Equality of Opportunity in Machine Learning.
But Google is a big place. It contains multitudes. And it seems it contains a subdivision that will do the wrong thing, which other Googlers know is the wrong thing, because it’s easy.
Google, you made a very bad investment. (That sentence is 61% toxic, by the way.)
As I update this post in April 2018, I’ve had some communication with the Perspective API team and learned some more details about it.
Some details of this post were incorrect, based on things I assumed when looking at Perspective API from outside. For example, Perspective API does not literally build on word2vec. But the end result is the same: it learns the same biases that word2vec learns anyway.
In September 2017, Violet Blue wrote an exposé of Perspective API for Engadget. Despite the details that I had wrong, the Engadget article confirms that the system really is that bad, and provides even more examples.
The Perspective API team has changed their online demo to lower toxicity scores across the board, without fundamentally changing the model. Text with a score under a certain threshold is now labeled as “not toxic”. I believe this remedy could be described technically as “weak sauce”.
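To see why lowering scores and adding a cutoff does not touch the underlying problem, here is a minimal sketch. The scaling factor and the threshold are made-up values (I don't know what the demo uses, or whether its rescaling is linear); the point is only that any monotonic rescaling keeps sentences in the same order.

```python
# Scores quoted earlier in this post, as fractions.
scores = {
    "Christianity is a major world religion": 0.37,
    "Islam is a major world religion": 0.66,
}

SCALE = 0.5       # hypothetical: shrink every score by the same factor
THRESHOLD = 0.25  # hypothetical cutoff below which text is shown as "not toxic"

def rescale(score):
    """Lower a score without changing how it compares to other scores."""
    return score * SCALE

def label(score):
    """Hide scores under the cutoff behind a 'not toxic' label."""
    return "not toxic" if score < THRESHOLD else f"{score:.0%} toxic"

rescaled = {text: rescale(s) for text, s in scores.items()}
ranking_before = sorted(scores, key=scores.get)
ranking_after = sorted(rescaled, key=rescaled.get)
labels = {text: label(s) for text, s in rescaled.items()}

print(ranking_before == ranking_after)
for text, shown in labels.items():
    print(f"{text}: {shown}")
```

With these invented numbers the ranking is unchanged, and the cutoff only decides which of the two scores gets hidden behind a friendlier label.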
The Perspective API team claims that their system has no inherent bias against non-white names, and that the higher toxicity scores that appear for names such as “DeShawn” are an artifact of how they handle out-of-vocabulary words. All the names that are typical for white Americans are in-vocabulary. Make of that what you will.
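For readers unfamiliar with out-of-vocabulary handling, here is a toy sketch of how an artifact like that can arise in general. This is not Perspective API's model: the vocabulary, the weights, and the shared `<UNK>` token are all invented for illustration. In a setup like this, every unknown word is mapped to one shared token, so whatever weight that token ends up with applies to every name the vocabulary lacks.

```python
import math

# Invented toy weights; a real model would learn these from data.
# "emily" is in the vocabulary; anything else falls into the "<UNK>" bucket.
WEIGHTS = {
    "hello": -1.0, "my": -0.5, "name": -0.5, "is": -0.5,
    "emily": -0.5,
    "<UNK>": 1.5,
}

def tokenize(text):
    """Lowercase, split on whitespace, and strip basic punctuation."""
    return [word.strip(",.!?").lower() for word in text.split()]

def toy_score(text):
    """Logistic score over summed token weights, with unknown words as <UNK>."""
    tokens = [t if t in WEIGHTS else "<UNK>" for t in tokenize(text)]
    z = sum(WEIGHTS[t] for t in tokens)
    return 1 / (1 + math.exp(-z))

for name in ["Emily", "Shaniqua", "DeShawn"]:
    print(name, round(toy_score(f"Hello, my name is {name}"), 2))
```

In this toy, "Shaniqua" and "DeShawn" get identical scores because the model never sees either name, only the shared unknown-word token. Whether a gap comes from a bucket like this or from a learned association, the person typing in their name sees the same higher score.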
The Perspective API team continues to promote their product, such as via hackathons and TED talks. Users of the API are not warned of its biases, except for a generic warning that could apply to any AI system, saying that users should manually review its results. It is still sometimes held up as a positive example of fighting toxicity with NLP, misleading lay audiences into thinking that present NLP has a solution to toxicity.