Yes, people do want pre-computed word embeddings

The very informative tutorial by Vlad Niculae on Word Mover’s Distance in Python includes this step:

We could train the embeddings ourselves, but for meaningful results we would need tons of documents, and that might take a while. So let’s just use the ones from the word2vec team.

I couldn’t have asked for a better justification for ConceptNet and Luminoso in two sentences.

When I present new results from Conceptnet Numberbatch, which works much better than word2vec alone, a common objection is that the embeddings are pre-computed and aren’t based on your data. (Luminoso is a SaaS platform that retrains them on your data, for the cases where you do need that.)

Pre-baked embeddings are useful. People are resigning themselves to using word2vec’s pre-baked embeddings because they don’t know they can have better ones. I dream of the day when someone writing a tutorial like this says “So let’s just use Conceptnet Numberbatch.”
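
For anyone who wants to try that substitution today, here’s a minimal sketch using gensim, the library the tutorial builds on. The filename is an assumption; check the releases at https://github.com/commonsense/conceptnet-numberbatch for the current English-only download:

```python
# A sketch, not code from the tutorial: loading Conceptnet Numberbatch
# in place of the Google News word2vec binaries. Numberbatch ships in
# the plain-text word2vec format, so it loads the same way, just with
# binary=False. The filename below is an assumption; see the
# conceptnet-numberbatch GitHub releases for the current one.
from gensim.models import KeyedVectors

vectors = KeyedVectors.load_word2vec_format("numberbatch-en.txt.gz",
                                            binary=False)

# From here the tutorial's steps work unchanged. For example, gensim's
# built-in Word Mover's Distance (which may need an extra EMD solver
# installed, pyemd or POT depending on your gensim version):
doc1 = "obama speaks to the media in illinois".lower().split()
doc2 = "the president greets the press in chicago".lower().split()
print(vectors.wmdistance(doc1, doc2))
```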
