How to make a racist AI without really trying
Perhaps you heard about Tay, Microsoft's experimental Twitter chat-bot, and how within a day it became so offensive that Microsoft had to shut it down and never speak of it again. And you assumed that you would never make such a thing, because you're not doing anything weird like letting random jerks on Twitter re-train your AI on the fly.
My purpose with this tutorial is to show that you can follow an extremely typical NLP pipeline, using popular data and popular techniques, and end up with a racist classifier that should never be deployed.
There are ways to fix it. Making a non-racist classifier is only a little bit harder than making a racist classifier. The fixed version can even be more accurate on evaluations. But to get there, you have to know about the problem, and you have to be willing to not just use the first thing that works.
This tutorial is a Jupyter Python notebook hosted on GitHub Gist. I recommend clicking through to the full view of the notebook at the link below, instead of scrolling the tiny embedded box.
https://gist.github.com/rspeer/ef750e7e407e04894cb3b78a82d66aed