How to make a racist AI without really trying

Perhaps you heard about Tay, Microsoft’s experimental Twitter chat-bot, and how within a day it became so offensive that Microsoft had to shut it down and never speak of it again. And you assumed that you would never make such a thing, because you’re not doing anything weird like letting random jerks on Twitter re-train your AI on the fly.

My purpose with this tutorial is to show that you can follow an extremely typical NLP pipeline, using popular data and popular techniques, and end up with a racist classifier that should never be deployed.
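To make "extremely typical" concrete, here is a minimal sketch of the kind of classifier the tutorial builds: represent a text as the average of pretrained word vectors, then train a logistic regression on a sentiment lexicon. The tiny hand-made vectors below are stand-ins for real GloVe embeddings; with the real vectors, words you never labeled (including people's names) inherit sentiment from their neighborhoods in vector space, which is where the bias comes from. This toy only illustrates the mechanics, not the bias itself.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Toy stand-in for real pretrained embeddings. In the actual pipeline
# you would load hundreds of thousands of 300-dimensional vectors from
# a file such as glove.42B.300d.txt.
embeddings = {
    "excellent": np.array([1.0, 0.9]),
    "wonderful": np.array([0.9, 1.0]),
    "terrible": np.array([-1.0, -0.9]),
    "awful": np.array([-0.9, -1.0]),
}

# A sentiment lexicon labels some words as positive (1) or negative (-1).
lexicon = {"excellent": 1, "wonderful": 1, "terrible": -1, "awful": -1}

X = np.vstack([embeddings[word] for word in lexicon])
y = list(lexicon.values())

classifier = LogisticRegression()
classifier.fit(X, y)

def sentiment(text):
    """Average the vectors of known words and score the result.

    Positive return values mean positive predicted sentiment. Note that
    ANY word with a vector gets a score, whether or not it ever appeared
    in the lexicon -- that is the step where bias sneaks in.
    """
    vectors = [embeddings[w] for w in text.lower().split() if w in embeddings]
    if not vectors:
        return 0.0
    return float(classifier.decision_function([np.mean(vectors, axis=0)])[0])

print(sentiment("an excellent wonderful day"))  # positive score
print(sentiment("a terrible awful day"))        # negative score
```

The de-biasing check in the full notebook works on exactly this kind of scoring function: compute the mean sentiment over lists of names associated with different ethnic groups and measure the gap.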

There are ways to fix it. Making a non-racist classifier is only a little bit harder than making a racist classifier. The fixed version can even be more accurate at evaluations. But to get there, you have to know about the problem, and you have to be willing to not just use the first thing that works.

This tutorial is a Jupyter Python notebook hosted on GitHub Gist. I recommend clicking through to the full view of the notebook, instead of scrolling the tiny box below.


16 thoughts on “How to make a racist AI without really trying”

  1. Very cool work!

    The article specifically mentions racism and sexism, but would that cover religion? I suspect it might cover the higher-level view of religion, as religions are often correlated with race, but what about the lower level (Catholic vs. Protestant)?

    In Canada I know the Charter of Rights and Freedoms explicitly lists “religion, race, national or ethnic origin, colour, sex, age or physical or mental disability,” and the Supreme Court ruled that sexual orientation is considered equivalent in the list.

    I also wonder if this work could be effectively expanded to include age, disability and sexual orientation? I suspect this might be more difficult, as there are many dual-purpose words (which I won’t list) that are used both as derogatory terms toward a class of people and, in informal speech, as negative descriptors of an item.


  2. Great questions!

    Religious bias is something I looked at and described briefly in a previous post. However, this is tricky. I created a test for it, but I’m not convinced it’s measuring the right thing, and I can only improve the measure slightly with the de-biasing code I’ve written so far. And I haven’t seen any publications that try to address religious bias in semantics that I can follow for guidance.

    On the other hand, while the first-names test is mostly about race/ethnicity, you can also see the difference in inferred sentiment between predominantly-Christian and predominantly-Islamic first names. That’s something where ConceptNet’s debiasing helps.

    I also made a brief mention of age, but only in the context of de-porn-ing word embeddings that had learned unfortunate associations for words such as “teen” from the Web.


  3. Dear Rob, many thanks for this lucid tutorial. We would like to use it for a ‘speaking robot’ that shows the issue in an artistic (algoliterary) exhibition in November in Brussels. The idea is to give the visitor the option to score a sentence she would type, or to view the training process. The way you compiled the script, scoring a phrase with a trained/saved model needs the embeddings to be loaded, which takes 3.4 minutes using the GloVe 6B dataset. Would there be a way to circumvent this? Many thanks in advance for your answer! An (you can find information on the exhibition and my email under Core-members/An Mertens)
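    On the load-time question: one common workaround (not part of the original notebook; the function names here are my own) is to convert the GloVe text file once into binary NumPy files. Parsing the text format is what takes minutes; reloading a memory-mapped `.npy` matrix plus a vocabulary index afterwards takes seconds.

    ```python
    import numpy as np

    def glove_to_npy(glove_path, out_prefix):
        """One-time conversion: parse the slow GloVe text format and save
        the vocabulary and the vector matrix in fast binary formats."""
        words, rows = [], []
        with open(glove_path, encoding="utf-8") as f:
            for line in f:
                parts = line.rstrip().split(" ")
                words.append(parts[0])
                rows.append(np.array(parts[1:], dtype=np.float32))
        np.save(out_prefix + ".npy", np.vstack(rows))
        with open(out_prefix + ".vocab.txt", "w", encoding="utf-8") as f:
            f.write("\n".join(words))

    def load_npy_embeddings(prefix):
        """Fast reload: memory-mapped matrix plus a word -> row index."""
        matrix = np.load(prefix + ".npy", mmap_mode="r")
        with open(prefix + ".vocab.txt", encoding="utf-8") as f:
            index = {word: i for i, word in enumerate(f.read().splitlines())}
        return matrix, index
    ```

    Another option, if the exhibit only ever scores typed sentences, is to save just the trained classifier together with vectors for a restricted vocabulary, so the full embedding file never needs loading at exhibition time.
    
    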

