“F.D.R.’s War Plans!” reads a headline from a 1941 Chicago Daily Tribune. Had this article been written today, it might instead have said “21 War Plans F.D.R. Does Not Want You To Know About. Number 6 may shock you!”. Modern writers have become very good at squeezing the maximum clickability out of every headline. But this sort of writing seems formulaic and unoriginal. What if we could automate the writing of these headlines, thus freeing up clickbait writers to do useful work?
If this sort of writing truly is formulaic and unoriginal, we should be able to produce it automatically. Using Recurrent Neural Networks, we can try to pull this off.
Standard artificial neural networks are prediction machines that can learn to map some input to some output, given enough examples of each. Recently, as people have figured out how to train deep (multi-layered) neural nets, very powerful models have been created, increasing the hype surrounding this so-called deep learning. In some sense, the deepest of these models are Recurrent Neural Networks (RNNs), a class of neural nets that feed their state at the previous timestep into the current timestep. These recurrent connections make such models well suited for operating on sequences, like text.
We can show an RNN a bunch of sentences, and get it to predict the next word, given the previous words. So, given a string of words like “Which Disney Character Are __”, we want the network to produce a reasonable guess like “You”, rather than, say, “Spreadsheet”. If this model can learn to predict the next word with some accuracy, we get a language model that tells us something about the texts we trained it on. If we ask this model to guess the next word, and then add that word to the sequence and ask it for the next word after that, and so on, we can generate text of arbitrary length. During training, we tweak the weights of this network so as to minimize the prediction error, maximizing its ability to guess the right next word. Thus RNNs operate on the opposite principle of clickbait: What happens next may not surprise you.
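As a sketch of that training objective, here is a toy next-word predictor in Python. The vocabulary, the weights, and the "RNN state" are all made up for illustration; the point is just the mechanics: a softmax turns scores into next-word probabilities, and the training loss is the negative log-probability of the word that actually came next.

```python
import numpy as np

# Hypothetical setup: a tiny vocabulary and random output weights.
# In the real model, `context` would be the RNN's hidden state after
# reading "Which Disney Character Are".
vocab = ["You", "Spreadsheet", "Me", "Banana"]
rng = np.random.default_rng(0)
W = rng.normal(size=(len(vocab), 8))  # made-up output weights
context = rng.normal(size=8)          # made-up RNN state

logits = W @ context
probs = np.exp(logits - logits.max())
probs /= probs.sum()                  # softmax over the vocabulary

# Cross-entropy loss for the true next word "You"; training tweaks
# the weights to push this number down.
loss = -np.log(probs[vocab.index("You")])
print({w: round(float(p), 3) for w, p in zip(vocab, probs)})
```

Repeatedly sampling from distributions like `probs`, and feeding each sampled word back in, is exactly the generation loop described above.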
I based this on Andrej Karpathy’s wonderful char-rnn library for Lua/Torch, but modified it to be more of a “word-rnn”, so it predicts word-by-word rather than character-by-character. (Here is the code.) Predicting word-by-word uses more memory, but means the model does not need to learn how to spell before it learns how to perform modern journalism. (It still needs to learn some notion of grammar.) A few more changes were useful for this particular use case. First, each input word was represented as a dense vector of numbers. The hope is that a continuous rather than discrete representation will allow the network to make better mistakes, as long as similar words get similar vectors. Second, the Adam optimizer was used for training. Third, the word vectors went through a particular training rigmarole: they received two stages of pretraining, and were then frozen in the final architecture – more details on this later in the article.
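To see why dense vectors allow “better mistakes”, consider cosine similarity between word vectors. The three-dimensional vectors below are invented purely for illustration (real models learn vectors with hundreds of dimensions): because “king” sits close to “queen” and far from “pizza”, confusing the first two is a much smaller error than confusing either with the third.

```python
import numpy as np

# Hand-made toy vectors, not learned ones.
vectors = {
    "king":  np.array([0.90, 0.80, 0.10]),
    "queen": np.array([0.85, 0.75, 0.20]),
    "pizza": np.array([0.10, 0.20, 0.90]),
}

def cosine(a, b):
    """Cosine similarity: 1.0 for parallel vectors, ~0 for unrelated."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Predicting "queen" instead of "king" is a smaller mistake than
# predicting "pizza", because the vectors are closer.
assert cosine(vectors["king"], vectors["queen"]) > cosine(vectors["king"], vectors["pizza"])
```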
The final network architecture looked like this:
One Neat Trick Every 90s Connectionist Will Know
Whereas traditional neural nets are built around stacks of simple units that compute a weighted sum followed by some simple non-linear function (like a tanh), we’ll use a more complicated unit called Long Short-Term Memory (LSTM). This is something two Germans, Sepp Hochreiter and Jürgen Schmidhuber, came up with in the late 90s that makes it easier for RNNs to learn long-term dependencies through time. LSTM units give the network memory cells with read, write and reset operations. These operations are differentiable, so during training the network can learn when it should remember data and when it should throw it away.
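A minimal sketch of one LSTM step, using the standard gate formulation. This is not the training code, and the weights here are random; it only shows the mechanics of the write, reset and read gates acting on the memory cell.

```python
import numpy as np

def lstm_step(x, h_prev, c_prev, W, U, b):
    """One LSTM step. The gates decide what to write into the memory
    cell c, what to keep in it, and how much of it to read out."""
    z = W @ x + U @ h_prev + b          # all four gate pre-activations
    n = h_prev.size
    i = 1 / (1 + np.exp(-z[0*n:1*n]))   # input (write) gate
    f = 1 / (1 + np.exp(-z[1*n:2*n]))   # forget (reset) gate
    o = 1 / (1 + np.exp(-z[2*n:3*n]))   # output (read) gate
    g = np.tanh(z[3*n:4*n])             # candidate content
    c = f * c_prev + i * g              # update the memory cell
    h = o * np.tanh(c)                  # expose part of the cell
    return h, c

rng = np.random.default_rng(1)
n_in, n_hid = 4, 3                      # toy sizes, random weights
W = rng.normal(size=(4 * n_hid, n_in))
U = rng.normal(size=(4 * n_hid, n_hid))
b = np.zeros(4 * n_hid)
h, c = np.zeros(n_hid), np.zeros(n_hid)
for t in range(5):                      # run a short input sequence
    h, c = lstm_step(rng.normal(size=n_in), h, c, W, U, b)
```

Because every operation above is differentiable, gradients flow through the gates during training, which is how the network learns *when* to remember and *when* to forget.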
To generate clickbait, we’ll train such an RNN on ~2 000 000 headlines, scraped from Buzzfeed, Gawker, Jezebel, Huffington Post and Upworthy.
How realistic can we expect the output of this model to be? Even if it can learn to generate text with correct syntax and grammar, it surely can’t produce headlines that contain any new knowledge of the real world? It can’t do reporting? This may be true, but it’s not clear that clickbait needs to have any relation to the real world in order to be successful. When I began this work, the top story on BuzzFeed was “50 Disney Channel Original Movies, Ranked By Feminism”. More recently they published “22 Faces Everyone Who Has Pooped Will Immediately Recognized”. It’s not clear that these headlines are much more than a semi-random concatenation of topics their userbase likes, and as the latter case shows, 100% correct grammar is not a requirement.
The training converges after a few days of number crunching on a GTX980 GPU. Let’s take a look at the results.
Early on in the training, the model strings together words with very little overall coherence. This is what it produces after having seen about 40,000 headlines:
Adobe ‘ s Saving New Japan
Real Walk Join Their Back For Plane To French Sarah York
Dr 5 Gameplay : Oscars Strong As The Dead
Economic Lessons To Actress To Ex – Takes A App
You ‘ s Schools ‘ : A Improve Story
However, after multiple passes through the data, the training converges and the results are remarkably better. Here are its first outputs after completed training:
Earth Defense Force : Record Olympic Fans
Kate Middleton , Prince William & Prince George Leave Kate For The Queen
The Most Creative Part Of U . S . History
Biden Responds To Hillary Clinton ‘ s Speech
The Children Of Free Speech
Adam Smith And Jennifer Lawrence ( And Tiger Woods ” Break The Ice Ball , For This Tornado )
Romney Camp : ‘ I Think You Are A Bad President ‘
Here ‘ s What A Boy Is Really Doing To Women In Prison Is Amazing
L . A . ‘ S First Ever Man Review
Why Health Care System Is Still A Winner
Why Are The Kids On The Golf Team Changing The World ?
2 1 Of The Most Life – Changing Food Magazine Moments Of 2 0 1 3
More Problems For ‘ Breaking Bad ‘ And ‘ Real Truth ‘ Before Death
Raw : DC Helps In Storm Victims ‘ Homes
U . S . Students ‘ Latest Aid Problem
Beyonce Is A Major Woman To Right – To – Buy At The Same Time
Taylor Swift Becomes New Face Of Victim Of Peace Talks
Star Wars : The Old Force : Gameplay From A Picture With Dark Past ( Part 2 )
Sarah Palin : ‘ If I Don ‘ t Have To Stop Using ‘ Law , Doesn ‘ t Like His Brother ‘ s Talk On His ‘ Big Media ‘
Israeli Forces : Muslim – American Wife ‘ s Murder To Be Shot In The U . S .
And It ‘ s A ‘ Celebrity ‘
Mary J . Williams On Coming Out As A Woman
Wall Street Makes $ 1 Billion For America : Of Who ‘ s The Most Important Republican Girl ?
How To Get Your Kids To See The Light
Kate Middleton Looks Into Marriage Plans At Charity Event
Adorable High – Tech Phone Is Billion – Dollar Media
Tips From Two And A Half Men : Getting Real
Hawaii Has Big No Place To Go
‘ American Child ‘ Film Clip
How To Get T – Pain
How To Make A Cheese In A Slow – Cut
WATCH : Mitt Romney ‘ s New Book
Iran ‘ s President Warns Way To Hold Nuclear Talks As Possible
Official : ‘ Extreme Weather ‘ Of The Planet Of North Korea
How To Create A Golden Fast Look To Greece ‘ s Team
Sony Super Play G 5 Hands – On At CES 2 0 1 2
1 5 – Year – Old , Son Suicide , Is Now A Non – Anti
” I ” s From Hell ”
God Of War : The World Gets Me Trailer
How To Use The Screen On The IPhone 3 Music Player
World ‘ s Most Dangerous Plane
The 1 9 Most Beautiful Fashion Tips For ( Again ) Of The Vacation
Miley Cyrus Turns 1 3
This Guy Thinks His Cat Was Drunk For His Five Years , He Gets A Sex Assault At A Home
Job Interview Wins Right To Support Gay Rights
Chef Ryan Johnson On ” A . K . A . M . C . D . ” : ” ” They Were Just Run From The Late Inspired ”
Final Fantasy X / X – 2 HD : Visits Apple
A Tour Of The Future Of Hot Dogs In The United States
Man With Can – Fired Down After Top – Of – The – Box Insider Club Finds
WATCH : Gay Teens Made Emotional Letter To J . K . Williams
It surprised me how good these headlines turned out. Most of them are grammatically correct, and a lot of them even make sense.
Consider the sentence “Mary J. Williams On Coming Out As A Woman”. I suspected this might be a case where the network had simply memorized a headline from the dataset. It turns out this was not the case. The only thing similar to “Coming Out As A Woman” is the headline “Former Marine Chronicles Journey Coming Out As A Trans Woman On YouTube”. The name “Mary J. Williams” does not appear in the dataset. The network has apparently learned that this is a plausible name, and also that such a name is the type of thing that can come out as a woman.
Another good one is “Romney Camp: ‘I Think You Are A Bad President'”. It’s suspiciously good – it wouldn’t surprise me if this was a real headline that some website had published. But it’s not in the dataset, not even close. While “Romney Camp” occurs 17 times in the dataset, none of these contain any statement about the president (or even the word president). “Bad President” occurs only once in the dataset, in the headline “Rubio: Obama Is A ‘Bad President'”. Yet, the network knows that the Romney Camp criticizing the president is a plausible headline. The network knows something about language, and it has some level of knowledge about the world by knowing what words are semantically associated.
Kim Kardashian Is Married With A Baby In New Mexico
Let’s investigate these semantic associations. By seeding the model with the start of a sentence, and getting the RNN to complete it, we can get a peek into what the model knows. For example, we can ask it to complete “Barack Obama Says” and “Kim Kardashian Says”, and compare the outputs.
Here are the 10 first completions of “Barack Obama Says”:
Barack Obama Says He’s Like ‘A Single Mother’ And ‘Over The Top’
Barack Obama Says He Is “The First Threat Gay Woman In The World”
Barack Obama Says About Ohio Law
Barack Obama Says He Is Wrong
Barack Obama Says He Will Get The American Idol
Barack Obama Says Himself Are “Doing Well Around The World”
Barack Obama Says As He Leaves Politics With His Wife
Barack Obama Says He Did 48 Things Over
Barack Obama Says GOP Needs To Be Key To New Immigration Policy
And here are the 10 first completions of “Kim Kardashian Says”:
Kim Kardashian Says ‘Idea’ To Her Mean Baby!
Kim Kardashian Says North West Is Even More ‘Important’
Kim Kardashian Says She Would Love Kanye
Kim Kardashian Says She’s A Hero
Kim Kardashian Says She Looks Fake
Kim Kardashian Says It Was Over Before They Call Her
Kim Kardashian Says Her Book Used To Lose Her Cool
Kim Kardashian Says She’s Married With A Baby In New Mexico
Kim Kardashian Says Kanye West Needs A Break From Her
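The seeding mechanics can be sketched with a toy stand-in for the trained RNN. Here a hypothetical bigram table (made up for this example) plays the role of the model, but the procedure is the same: feed the model a fixed prefix, then repeatedly sample a next word and append it.

```python
import random

# Toy stand-in for the RNN: next-word probabilities given the last
# word only. The real model conditions on the whole prefix through
# its hidden state, but the seeding loop is identical.
bigrams = {
    "Says":  [("He", 0.5), ("She", 0.5)],
    "He":    [("Is", 1.0)],
    "She":   [("Looks", 1.0)],
    "Is":    [("Wrong", 1.0)],
    "Looks": [("Fake", 1.0)],
}

def complete(seed, n_words, rng):
    """Seed the model with a prefix, then sample n_words continuations."""
    words = seed.split()
    for _ in range(n_words):
        options = bigrams.get(words[-1])
        if options is None:
            break
        r, acc = rng.random(), 0.0
        for word, p in options:       # sample from the distribution
            acc += p
            if r < acc:
                words.append(word)
                break
    return " ".join(words)

out = complete("Barack Obama Says", 3, random.Random(0))
print(out)
```

Running the completion many times with different random draws is how the lists of “Barack Obama Says …” and “Kim Kardashian Says …” headlines above were produced.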
By getting the RNN to complete our sentences, we can effectively ask questions of the model. Ilya Sutskever and Geoff Hinton trained a character-level RNN on Wikipedia, and asked it to complete the phrase “The meaning of life is”. The RNN essentially answered “human reproduction”. It’s funny that you can get an RNN to read Wikipedia for a month, and have it essentially tell you that the meaning of life is to have sex. It’s probably also a correct answer from a biological perspective.
We can’t directly replicate this experiment on the clickbait model, because the word “meaning” is not in its vocabulary. But we can ask it to complete the phrase “Life Is About”, for similar effect. These are the first 10 results:
Life Is About The (Wild) Truth About Human-Rights
Life Is About The True Love Of Mr. Mom
Life Is About Where He Were Now
Life Is About Kids
Life Is About What It Takes If Being On The Spot Is Tough
Life Is About A Giant White House Close To A Body In These Red Carpet Looks From Prince William’s Epic ‘Dinner With Johnny’
Life Is About — Or Still Didn’t Know Me
Life Is About… An Eating Story
Life Is About The Truth Now
With some experimentation, I ended with the following architecture and training procedure. The initial RNN had 2 recurrent layers, each containing 1200 LSTM units. Each word was represented as a 200 dimensional word vector, connected to the rest of the network via a tanh. These word vectors were initialized to the pretrained GloVe vectors released by its inventors, trained on 6 billion tokens from Wikipedia. GloVe, like word2vec, is a way of obtaining representations of words as vectors. These vectors were trained for a related task on a very big dataset, so they should provide a good initial representation for our words. During training, we can follow the gradient down into these word vectors and fine-tune the vector representations specifically for the task of generating clickbait, thus further improving the generalization accuracy of the complete model.
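Assuming the standard LSTM parameterization (four gates, each with an input matrix, a recurrent matrix and a bias), a back-of-envelope count of the recurrent layers’ parameters looks like this. The embedding table and the output softmax add more parameters on top; this only counts the two LSTM layers described above.

```python
def lstm_params(n_input, n_hidden):
    # Four gates, each with input weights, recurrent weights and a bias.
    return 4 * (n_input * n_hidden + n_hidden * n_hidden + n_hidden)

layer1 = lstm_params(200, 1200)   # 200-dim word vectors feed layer 1
layer2 = lstm_params(1200, 1200)  # layer 1's output feeds layer 2
print(layer1, layer2, layer1 + layer2)  # ~6.7M + ~11.5M = ~18.2M
```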
It turns out that if we then take the word vectors learned from this model of 2 recurrent layers, and stick them in an architecture with 3 recurrent layers, and then freeze them, we get even better performance. Trying to backpropagate into the word vectors through the 3 recurrent layers turned out to actually hurt performance.
To summarize the word vector story: initially, some good guys at Stanford invented GloVe, ran it over 6 billion tokens, and got a bunch of vectors. We then took these vectors, stuck them under 2 recurrent LSTM layers, and optimized them for generating clickbait. Finally, we froze the vectors and put them in a 3-LSTM-layer architecture.
The network was trained with the Adam optimizer. I found this to be a Big Deal: It cut the training time almost in half, and found better optima, compared to using rmsprop with exponential decay. It’s possible that similar results could be obtained with rmsprop had I found a better learning and decay rate, but I’m very happy not having to do that tuning.
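For reference, a single Adam update is only a few lines. This sketch follows the published algorithm (running means of the gradient and of the squared gradient, with bias correction) and applies it to a toy one-dimensional problem rather than the real network.

```python
import numpy as np

def adam_step(theta, grad, m, v, t, lr=1e-3, b1=0.9, b2=0.999, eps=1e-8):
    """One Adam update (Kingma & Ba). m and v are running means of the
    gradient and squared gradient; t is the 1-based step counter used
    for bias correction."""
    m = b1 * m + (1 - b1) * grad
    v = b2 * v + (1 - b2) * grad ** 2
    m_hat = m / (1 - b1 ** t)
    v_hat = v / (1 - b2 ** t)
    theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)
    return theta, m, v

# Toy problem: minimize f(x) = x^2, gradient 2x, starting from x = 3.
x, m, v = 3.0, 0.0, 0.0
for t in range(1, 2001):
    x, m, v = adam_step(x, 2 * x, m, v, t, lr=0.05)
print(round(x, 4))  # x ends up near the minimum at 0
```

The per-parameter normalization by `sqrt(v_hat)` is what makes Adam so forgiving about the learning rate, which is presumably why it needed so much less tuning than rmsprop here.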
Building The Website
While many headlines produced by this model are good, some of them are rambling nonsense. To filter out the nonsense, we can do what Reddit does and crowdsource the problem.
To this end, I created Click-o-Tron, possibly the first website in the world where all articles are written in their entirety by a Recurrent Neural Network. New articles are published every 20 minutes.
Any user can vote articles up and down. Each article gets an associated score determined by the number of votes and views the article has gotten. This score is then taken into account when ordering the front page. To get a trade-off between clickbaitiness and freshness, we can use the Hacker News ranking algorithm, which divides an article’s score by a power of its age.
In practice, this can look like the following in PostgreSQL:
```sql
CREATE FUNCTION hotness(articles) RETURNS double precision
    LANGUAGE sql STABLE
    AS $_$
        SELECT $1.score /
               POW(1 + EXTRACT(EPOCH FROM (NOW() - $1.publish_date)) / (3 * 3600), 1.5)
    $_$;
```
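The same formula in plain Python makes the decay easy to read: age is measured in units of 3 hours, so after one day the divisor is (1 + 8)^1.5 = 27, and a day-old article needs 27 times the score of a brand-new one to rank alongside it.

```python
# Mirrors the SQL hotness function above, with age in hours.
def hotness(score, age_hours):
    return score / (1 + age_hours / 3) ** 1.5

assert hotness(100, 0) == 100.0                  # brand new: full score
assert abs(hotness(100, 24) - 100 / 27) < 1e-9   # one day old: score / 27
```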
The articles are the result of three separate language models: one for the headlines, one for the article bodies, and one for the author names.
The article body neural network was seeded with the words from the headline, so that the body text has a chance to be thematically consistent with the headline. The headlines themselves were not part of the body model’s training data.
For the author names, a character level LSTM-RNN was trained on a corpus of all first and last names in the US. It was then asked to produce a list of names. This list was then filtered so that the only remaining names were the ones where neither the first nor the last name was in the original corpus. This creates a nice list of plausible, yet original names, such as Flodrice Golpo and Richaldo Aariza.
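The filtering step can be sketched as follows. The corpora and the generated names here are tiny stand-ins (the real corpora covered all US first and last names); a name survives only if neither of its parts appears in the training data, which is what guarantees the surviving names are inventions.

```python
# Tiny stand-in corpora; the real ones held all US first/last names.
corpus_first = {"Mary", "John", "Sarah"}
corpus_last = {"Williams", "Smith", "Johnson"}

# A mix of memorized and invented outputs from the name model.
generated = ["Mary Smith", "Flodrice Golpo", "John Aariza", "Richaldo Aariza"]

def is_novel(name):
    """Keep a name only if neither part occurs in the corpus."""
    first, last = name.split()
    return first not in corpus_first and last not in corpus_last

novel = [name for name in generated if is_novel(name)]
print(novel)  # only the fully invented names survive
```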
Finally, each article’s picture is found by searching the Wikimedia API with the headline text, and selecting an image with a permissive license.
In total, this gives us an infinite source of useless journalism, available at no cost. If I remember correctly from economics class, this should drive the market value of useless journalism down to zero, forcing other producers of useless journalism to produce something else.
As they say on BuzzFeed: Win!