Painting Video with Neural Networks

[Image: the “Take On Me” music video]

The music video for “Take On Me” by A-ha features a mix of sketched animation and live action. The animation was made by hand-drawing 3000 frames, which took 16 weeks to complete. The video went on to win six awards at the 1986 MTV Video Music Awards.

This effect is pretty striking. But who wants to draw 3000 frames by hand? What if we could automate this process? And what if we could summon the ghost of Pablo Picasso to draw the frames for us? It turns out that we can.

https://i.imgur.com/sb8dHcY.png

A photo styled after various famous paintings, by a neural style transfer algorithm.

About three months ago, Gatys, Ecker and Bethge from the University of Tübingen published A Neural Algorithm of Artistic Style. Their algorithm is able to transfer the visual style of one image onto another by means of a pretrained deep convolutional neural network. The output images are phenomenal – in my opinion, this is one of the coolest things to come out of the field of machine learning in a while.

Convolutional Neural Networks decompose images into a hierarchy of image filters, where each image filter can be seen as detecting the presence or absence of a particular feature in the image. The lowest level filters detect things like edges and color gradients at different orientations, while the higher level filters detect compositions of the filters below, thus detecting more complex features.


Visualization of what features the image filters look for in the different layers of a ConvNet, along with images that cause their activation. See the paper for details: Visualizing and Understanding Convolutional Networks, Zeiler & Fergus 2013.

Generating Textures

The idea behind the neural style transfer algorithm starts with texture generation. Gatys and gang had previously found that if you take the correlations between a layer’s feature maps – computed as the dot products of the image filter activations with each other, also known as a Gram matrix – you get a matrix that says which features are in an image, but ignores where in the image they are. Two images have similar textures if their corresponding texture matrices are similar.
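To make that concrete, here is a minimal sketch in Python/NumPy (not the authors’ code): the texture matrix of a layer is just the Gram matrix of its feature maps, i.e. all pairwise dot products between the flattened filter responses.

import numpy as np

def gram_matrix(features):
    # features: the activations of one ConvNet layer, shape (num_filters, height, width)
    num_filters = features.shape[0]
    flat = features.reshape(num_filters, -1)   # one row per filter
    return flat @ flat.T                       # pairwise dot products between filters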

To synthesize a texture from an image, you run the optimization on the image rather than the weights: start from random pixels, backpropagate all the way down to the pixel values, and adjust them so as to minimize the squared difference between these texture matrices at each layer.


Textures synthesized to match the images in the bottom row. From top to bottom, an increasing number of ConvNet layers is taken into account, and the structure of the generated image increasingly matches its inspiration. From Texture Synthesis Using Convolutional Neural Networks, Gatys, Ecker & Bethge 2015.

 

The next thing to do is to figure out a way to synthesize the content of a given image. This is simpler: just adjust the pixels so as to minimize the squared difference between the image filter activations themselves.

We now have a measure of texture similarity and a measure of content similarity. In order to do style transfer, we want to generate an image that has textures similar to one image, and content similar to another. Since we know how to express both as squared differences, we can simply minimize a weighted sum of the two. That is the neural style transfer algorithm.
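Using gram_matrix from the sketch above, the two objectives and their combination might look roughly like this. This is a simplified illustration, not the paper’s implementation: which layers to use and the weights alpha and beta are free choices, and the values here are arbitrary. The output image is then obtained by gradient descent on its pixels so as to minimize total_loss.

import numpy as np

def content_loss(gen_features, content_features):
    # Squared difference between the raw filter activations at one layer.
    return np.sum((gen_features - content_features) ** 2)

def style_loss(gen_layers, style_layers):
    # Squared difference between the texture (Gram) matrices, summed over layers.
    return sum(np.sum((gram_matrix(g) - gram_matrix(s)) ** 2)
               for g, s in zip(gen_layers, style_layers))

def total_loss(gen_layers, content_layers, style_layers, alpha=1.0, beta=1000.0):
    # Content is matched at a single layer here; style is matched across all layers.
    return (alpha * content_loss(gen_layers[-1], content_layers[-1])
            + beta * style_loss(gen_layers, style_layers))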

The hierarchical image filters of a ConvNet have been shown in various ways to be similar to how vision works in human beings. Thus an appealing aspect of the style transfer algorithm is that it makes a quite concrete connection between our perception of artistic style, and the neurons in our brain.

Applying it to Video

Gene Kogan was the first person I saw apply this style transfer algorithm to video. The most straightforward way would be to just run the algorithm on each frame separately. One problem with this is that the style transfer algorithm might end up styling successive frames in very different ways. In order to create smoother transitions between frames, Kogan blended in the stylized version of the previous frame at a low opacity. Check out his awesome rendering of a scene from Alice in Wonderland.

One thing we can do to improve the blending of frames is to calculate the optical flow between the frames. That is, we try to figure out how things move between two frames in the video. We can then take this estimate of motion, and use it to bend and smudge the stylized image with the same motion before blending it in.


Two successive video frames, and the optical flow between them. The bottom image shows the direction and magnitude of the motion from the first picture to the second. The color indicates direction, and the color saturation indicates magnitude.

Luckily, such an optical flow calculation is included in OpenCV. I’ve uploaded some code on GitHub that takes care of computing flow between frames, morphing the stylized image and blending it in.
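As a rough sketch of that flow-then-blend step (not necessarily identical to the code on GitHub; the blend weight here is arbitrary), using OpenCV’s Farnebäck flow:

import cv2
import numpy as np

def warp_and_blend(prev_stylized, prev_frame, curr_frame, blend=0.4):
    # Flow from the current frame back to the previous one, so that for each
    # pixel in the current frame we know where it came from.
    prev_gray = cv2.cvtColor(prev_frame, cv2.COLOR_BGR2GRAY)
    curr_gray = cv2.cvtColor(curr_frame, cv2.COLOR_BGR2GRAY)
    flow = cv2.calcOpticalFlowFarneback(curr_gray, prev_gray, None,
                                        0.5, 3, 15, 3, 5, 1.2, 0)

    # Warp the previous stylized frame along that motion.
    h, w = curr_gray.shape
    grid_x, grid_y = np.meshgrid(np.arange(w), np.arange(h))
    map_x = (grid_x + flow[..., 0]).astype(np.float32)
    map_y = (grid_y + flow[..., 1]).astype(np.float32)
    warped = cv2.remap(prev_stylized, map_x, map_y, cv2.INTER_LINEAR)

    # Blend the motion-compensated stylization into the new frame at low opacity,
    # before handing it to the style transfer step.
    return cv2.addWeighted(curr_frame, 1.0 - blend, warped, blend, 0)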

The optical flow morphing takes a little bit of computation (~1 second), but it is absolutely dwarfed by the run time of the style transfer algorithm. Currently, it takes about 3 minutes to render a single frame at the maximum resolution that fits on a Titan X GPU – and that uses all of its 12 GB of memory while still being sub-HD. In any case, the effect is sweet enough to make it worth the wait.

This is what it looks like when applied to what is a cult classic in certain circles, Bobby Meeks’ part from the 2003 snowboard movie “Lame”:

Here it is applied to the music video for “Islands”, by The xx:

Isn’t that just damn cool?

Auto-Generating Clickbait With Recurrent Neural Networks

Hey! If you are a web developer, you should know about CatchJS. It’s a service for tracking and logging errors in JavaScript, with some pretty exciting features.

[Image: “F.D.R.’s War Plans!” – front page of the Chicago Daily Tribune, 1941]

“F.D.R.’s War Plans!” reads a headline from a 1941 issue of the Chicago Daily Tribune. Had this article been written today, it might instead have said “21 War Plans F.D.R. Does Not Want You To Know About. Number 6 may shock you!”. Modern writers have become very good at squeezing the maximum clickability out of every headline. But this sort of writing seems formulaic and unoriginal. What if we could automate the writing of these, thus freeing up clickbait writers to do useful work?

If this sort of writing truly is formulaic and unoriginal, we should be able to produce it automatically. Using Recurrent Neural Networks, we can try to pull this off.

The Future Of Women's Hair: What's The Secret?

How well can a neural network write clickbait? This screenshot is a hint.

Standard artificial neural networks are prediction machines that can learn to map some input to some output, given enough examples of each. Recently, as people have figured out how to train deep (multi-layered) neural nets, very powerful models have been created, increasing the hype surrounding this so-called deep learning. In some sense the deepest of these models are Recurrent Neural Networks (RNNs), a class of neural nets that feed their state at the previous timestep into the current timestep. These recurrent connections make RNNs well suited for operating on sequences, like text.


Left: RNNs have connections that form a cycle. Right: The RNN unrolled over three timesteps. By unrolling over time we can train an RNN like a standard neural network.

We can show an RNN a bunch of sentences, and get it to predict the next word, given the previous words. So, given a string of words like “Which Disney Character Are __”, we want the network to produce a reasonable guess like “You”, rather than, say, “Spreadsheet”. If this model can learn to predict the next word with some accuracy, we get a language model that tells us something about the texts we trained it on. If we ask this model to guess the next word, and then add that word to the sequence and ask it for the next word after that, and so on, we can generate text of arbitrary length. During training, we tweak the weights of this network so as to minimize the prediction error, maximizing its ability to guess the right next word. Thus RNNs operate on the opposite principle of clickbait: What happens next may not surprise you.
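As a sketch of how that generation loop works (next_word_distribution is a hypothetical stand-in for a trained word-level RNN):

import numpy as np

def generate_headline(model, vocab, seed_words, max_len=20):
    # Repeatedly sample the next word from the model's predicted distribution
    # and feed it back in, until an end-of-headline token or max_len is reached.
    words = list(seed_words)
    for _ in range(max_len):
        probs = next_word_distribution(model, words)  # hypothetical: P(next word | words so far)
        next_word = np.random.choice(vocab, p=probs)
        if next_word == "<END>":
            break
        words.append(next_word)
    return " ".join(words)

Seeding seed_words with something like ["Barack", "Obama", "Says"] gives the kind of completion experiments shown later in this post.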

I based this on Andrej Karpathy’s wonderful char-rnn library for Lua/Torch, but modified it to be more of a “word-rnn”, so it predicts word-by-word rather than character-by-character. (The code is up on GitHub.) Predicting word-by-word uses more memory, but means the model does not need to learn how to spell before it learns how to perform modern journalism. (It still needs to learn some notion of grammar.) Some more changes were useful for this particular use case. First, each input word was represented as a dense vector of numbers. The hope is that having a continuous rather than discrete representation for words will allow the network to make better mistakes, as long as similar words get similar vectors. Second, the Adam optimizer was used for training. Third, the word vectors went through a particular training rigmarole: they received two stages of pretraining, and were then frozen in the final architecture – more details on this later in the article.

The final network architecture looked like this:

[Diagram: the final network architecture]

One Neat Trick Every 90s Connectionist Will Know

Whereas traditional neural nets are built around stacks of simple units that do a weighted sum followed by some simple non-linear function (like a tanh), we’ll use a more complicated unit called Long Short-Term Memory (LSTM). This is something two Germans came up with in the late 90s that makes it easier for RNNs to learn long-term dependencies through time. The LSTM units give the network memory cells with read, write and reset operations. These operations are differentiable, so that during training, the network can learn when it should remember data and when it should throw it away.

To generate clickbait, we’ll train such an RNN on ~2 000 000 headlines, scraped from Buzzfeed, Gawker, Jezebel, Huffington Post and Upworthy.

How realistic can we expect the output of this model to be? Even if it can learn to generate text with correct syntax and grammar, it surely can’t produce headlines that contain any new knowledge of the real world? It can’t do reporting? This may be true, but it’s not clear that clickbait needs to have any relation to the real world in order to be successful. When this work began, the top story on BuzzFeed was “50 Disney Channel Original Movies, Ranked By Feminism”. More recently they published “22 Faces Everyone Who Has Pooped Will Immediately Recognized”. It’s not clear that these headlines are much more than a semi-random concatenation of topics their userbase likes, and as the latter case shows, 100% correct grammar is not a requirement.

The training converges after a few days of number crunching on a GTX980 GPU. Let’s take a look at the results.

Early on in the training, the model strings together words with very little overall coherence. This is what it produces after having seen about 40000 headlines:

2 0 Million 9 0 1 3 Say Hours To Stars The Kids For From Internet
Adobe ‘ s Saving New Japan
Real Walk Join Their Back For Plane To French Sarah York
State 7
Dr 5 Gameplay : Oscars Strong As The Dead
Economic Lessons To Actress To Ex – Takes A App
You ‘ s Schools ‘ : A Improve Story

However, after multiple passes through the data, the training converges and the results are remarkably better. Here are its first outputs after completed training:

John McCain Warns Supreme Court To Stand Up For Birth Control Reform
Earth Defense Force : Record Olympic Fans
Kate Middleton , Prince William & Prince George Leave Kate For The Queen
The Most Creative Part Of U . S . History
Biden Responds To Hillary Clinton ‘ s Speech
The Children Of Free Speech
Adam Smith And Jennifer Lawrence ( And Tiger Woods ” Break The Ice Ball , For This Tornado )
Romney Camp : ‘ I Think You Are A Bad President ‘
Here ‘ s What A Boy Is Really Doing To Women In Prison Is Amazing
L . A . ‘ S First Ever Man Review
Why Health Care System Is Still A Winner
Why Are The Kids On The Golf Team Changing The World ?
2 1 Of The Most Life – Changing Food Magazine Moments Of 2 0 1 3
More Problems For ‘ Breaking Bad ‘ And ‘ Real Truth ‘ Before Death
Raw : DC Helps In Storm Victims ‘ Homes
U . S . Students ‘ Latest Aid Problem
Beyonce Is A Major Woman To Right – To – Buy At The Same Time
Taylor Swift Becomes New Face Of Victim Of Peace Talks
Star Wars : The Old Force : Gameplay From A Picture With Dark Past ( Part 2 )
Sarah Palin : ‘ If I Don ‘ t Have To Stop Using ‘ Law , Doesn ‘ t Like His Brother ‘ s Talk On His ‘ Big Media ‘
Israeli Forces : Muslim – American Wife ‘ s Murder To Be Shot In The U . S .
And It ‘ s A ‘ Celebrity ‘
Mary J . Williams On Coming Out As A Woman
Wall Street Makes $ 1 Billion For America : Of Who ‘ s The Most Important Republican Girl ?
How To Get Your Kids To See The Light
Kate Middleton Looks Into Marriage Plans At Charity Event
Adorable High – Tech Phone Is Billion – Dollar Media
Tips From Two And A Half Men : Getting Real
Hawaii Has Big No Place To Go
‘ American Child ‘ Film Clip
How To Get T – Pain
How To Make A Cheese In A Slow – Cut
WATCH : Mitt Romney ‘ s New Book
Iran ‘ s President Warns Way To Hold Nuclear Talks As Possible
Official : ‘ Extreme Weather ‘ Of The Planet Of North Korea
How To Create A Golden Fast Look To Greece ‘ s Team
Sony Super Play G 5 Hands – On At CES 2 0 1 2
1 5 – Year – Old , Son Suicide , Is Now A Non – Anti
” I ” s From Hell ”
God Of War : The World Gets Me Trailer
How To Use The Screen On The IPhone 3 Music Player
World ‘ s Most Dangerous Plane
The 1 9 Most Beautiful Fashion Tips For ( Again ) Of The Vacation
Miley Cyrus Turns 1 3
This Guy Thinks His Cat Was Drunk For His Five Years , He Gets A Sex Assault At A Home
Job Interview Wins Right To Support Gay Rights
Chef Ryan Johnson On ” A . K . A . M . C . D . ” : ” ” They Were Just Run From The Late Inspired ”
Final Fantasy X / X – 2 HD : Visits Apple
A Tour Of The Future Of Hot Dogs In The United States
Man With Can – Fired Down After Top – Of – The – Box Insider Club Finds
WATCH : Gay Teens Made Emotional Letter To J . K . Williams

It surprised me how good these headlines turned out. Most of them are grammatically correct, and a lot of them even make sense.

Consider the sentence “Mary J. Williams On Coming Out As A Woman”. I suspected this might be a case where the network had simply memorized a headline from the dataset. It turns out this was not the case. The only thing similar to “Coming Out As A Woman” is the headline “Former Marine Chronicles Journey Coming Out As A Trans Woman On YouTube”. The name “Mary J. Williams” does not appear in the dataset. The network has apparently learned that this is a plausible name, and also that such a name is the type of thing that can come out as a woman.

Another good one is “Romney Camp: ‘I Think You Are A Bad President'”. It’s suspiciously good – it wouldn’t surprise me if this was a real headline that some website had published. But it’s not in the dataset, not even close. While “Romney Camp” occurs 17 times in the dataset, none of these contain any statement about the president (or even the word president). “Bad President” occurs only once in the dataset, in the headline “Rubio: Obama Is A ‘Bad President'”. Yet, the network knows that the Romney Camp criticizing the president is a plausible headline. The network knows something about language, and it has some level of knowledge about the world by knowing what words are semantically associated.

Kim Kardashian Is Married With A Baby In New Mexico

Let’s investigate these semantic associations. By seeding the model with the start of a sentence, and getting the RNN to complete it, we can get a peek into what the model knows. For example, we can ask it to complete “Barack Obama Says” and “Kim Kardashian Says”, and compare the outputs.

Here are the 10 first completions of “Barack Obama Says”:

Barack Obama Says It’s Wrong To Talk About Iraq
Barack Obama Says He’s Like ‘A Single Mother’ And ‘Over The Top’
Barack Obama Says He Is “The First Threat Gay Woman In The World”
Barack Obama Says About Ohio Law
Barack Obama Says He Is Wrong
Barack Obama Says He Will Get The American Idol
Barack Obama Says Himself Are “Doing Well Around The World”
Barack Obama Says As He Leaves Politics With His Wife
Barack Obama Says He Did 48 Things Over
Barack Obama Says GOP Needs To Be Key To New Immigration Policy

And here are the 10 first completions of “Kim Kardashian Says”:

Kim Kardashian Says She Wants To Sign Again
Kim Kardashian Says ‘Idea’ To Her Mean Baby!
Kim Kardashian Says North West Is Even More ‘Important’
Kim Kardashian Says She Would Love Kanye
Kim Kardashian Says She’s A Hero
Kim Kardashian Says She Looks Fake
Kim Kardashian Says It Was Over Before They Call Her
Kim Kardashian Says Her Book Used To Lose Her Cool
Kim Kardashian Says She’s Married With A Baby In New Mexico
Kim Kardashian Says Kanye West Needs A Break From Her

Question Answering

By getting the RNN to complete our sentences, we can effectively ask questions of the model. Ilya Sutskever and Geoff Hinton trained a character level RNN on Wikipedia, and asked it to complete the phrase “The meaning of life is”. The RNN essentially answered “human reproduction”. It’s funny that you can get an RNN to read Wikipedia for a month, and have it essentially tell you that the meaning of life is to have sex. It’s probably also a correct answer from a biological perspective.

We can’t directly replicate this experiment on the clickbait model, because the word “meaning” is not in its vocabulary. But we can ask it to complete the phrase “Life Is About”, for similar effect. These are the first 10 results:

Life Is About The Weather!
Life Is About The (Wild) Truth About Human-Rights
Life Is About The True Love Of Mr. Mom
Life Is About Where He Were Now
Life Is About Kids
Life Is About What It Takes If Being On The Spot Is Tough
Life Is About A Giant White House Close To A Body In These Red Carpet Looks From Prince William’s Epic ‘Dinner With Johnny’
Life Is About — Or Still Didn’t Know Me
Life Is About… An Eating Story
Life Is About The Truth Now

Network details

With some experimentation, I ended up with the following architecture and training procedure. The initial RNN had 2 recurrent layers, each containing 1200 LSTM units. Each word was represented as a 200 dimensional word vector, connected to the rest of the network via a tanh. These word vectors were initialized to the pretrained GloVe vectors released by their inventors, trained on 6 billion tokens of Wikipedia and newswire text. GloVe, like word2vec, is a way of obtaining representations of words as vectors. These vectors were trained for a related task on a very big dataset, so they should provide a good initial representation for our words. During training, we can follow the gradient down into these word vectors and fine-tune the vector representations specifically for the task of generating clickbait, thus further improving the generalization accuracy of the complete model.

It turns out that if we then take the word vectors learned from this model of 2 recurrent layers, and stick them in an architecture with 3 recurrent layers, and then freeze them, we get even better performance. Trying to backpropagate into the word vectors through the 3 recurrent layers turned out to actually hurt performance.


To summarize the word vector story: initially, some good guys at Stanford invented GloVe, ran it over 6 billion tokens, and got a bunch of vectors. We then took these vectors, stuck them under 2 recurrent LSTM layers, and optimized them for generating clickbait. Finally, we froze the vectors and put them in a 3 LSTM layer architecture.
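For concreteness, here is that final setup as a minimal sketch in PyTorch. The original code was a modified Lua/Torch char-rnn, so this is a restatement rather than the actual code; glove_vectors stands for the 200-dimensional embedding matrix fine-tuned in the 2-layer run.

import torch
import torch.nn as nn

class HeadlineRNN(nn.Module):
    def __init__(self, glove_vectors, vocab_size):
        super().__init__()
        # 200-dim word vectors: pretrained, fine-tuned earlier, now frozen.
        self.embed = nn.Embedding.from_pretrained(glove_vectors, freeze=True)
        self.lstm = nn.LSTM(input_size=200, hidden_size=1200,
                            num_layers=3, batch_first=True)
        self.out = nn.Linear(1200, vocab_size)

    def forward(self, word_ids, state=None):
        x = torch.tanh(self.embed(word_ids))   # word vectors enter the network through a tanh
        h, state = self.lstm(x, state)
        return self.out(h), state              # logits over the next word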

The network was trained with the Adam optimizer. I found this to be a Big Deal: it cut the training time almost in half, and found better optima, compared to using rmsprop with exponential decay. It’s possible that similar results could be obtained with rmsprop had I found a better learning rate and decay rate, but I’m very happy not having to do that tuning.


Building The Website

While many headlines produced by this model are good, some of them are rambling nonsense. To filter out the nonsense, we can do what Reddit does and crowdsource the problem.

To this end, I created Click-o-Tron, possibly the first website in the world where all articles are written in their entirety by a Recurrent Neural Network. New articles are published every 20 minutes.

[Screenshot: the Click-o-Tron front page]

Any user can vote articles up and down. Each article gets an associated score determined by the number of votes and views the article has gotten. This score is then taken into account when ordering the front page. To get a trade-off between clickbaitiness and freshness, we can use the Hacker News algorithm:

hotness = score / (1 + hours_since_publication / 3)^1.5

In practice, this can look like the following in PostgreSQL:

CREATE FUNCTION hotness(articles) RETURNS double precision
LANGUAGE sql STABLE
AS $_$
SELECT $1.score / POW(1+EXTRACT(EPOCH FROM (NOW()-$1.publish_date))/(3*3600), 1.5)
$_$;

The articles are the result of three separate language models: one for the headlines, one for the article bodies, and one for the author names.

The article body neural network was seeded with the words from the headline, so that the body text has a chance to be thematically consistent with the headline. The headlines were not used during training.

For the author names, a character level LSTM-RNN was trained on a corpus of all first and last names in the US. It was then asked to produce a list of names. This list was then filtered so that the only remaining names were the ones where neither the first nor the last name was in the original corpus. This creates a nice list of plausible, yet original names, such as Flodrice Golpo and Richaldo Aariza.
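The filtering step itself is only a few lines. A sketch, with real_first_names and real_last_names standing in for the US name corpus:

def filter_names(generated_names, real_first_names, real_last_names):
    # Keep only names where neither part appears in the original corpus,
    # so every author name is plausible but brand new.
    kept = []
    for name in generated_names:
        first, _, last = name.partition(" ")
        if first not in real_first_names and last not in real_last_names:
            kept.append(name)
    return kept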

Finally, each article’s picture is found by searching the Wikimedia API with the headline text, and selecting the images with a permissive license.

In total, this gives us an infinite source of useless journalism, available at no cost. If I remember correctly from economics class, this should drive the market value of useless journalism down to zero, forcing other producers of useless journalism to produce something else.

As they say on BuzzFeed: Win!

The WTF Factor: Quantifying JS Library Weirdness

Evaluating JavaScript libraries is hard. It’s fairly easy to tell if a library is popular, but it’s hard to tell if it’s any good. One useful metric to have would be the average number of questions a user of the library has while using it. We can’t measure that directly, but some recently released work allows us to get a pretty good estimate.

Thanks to Julian Shapiro, Thomas Davis and Jesse Chase we now have Libscore, a project that crawls the top one million sites on the web to determine what JavaScript libraries they use. It works by actually executing the JavaScript on each page, so it gives a very accurate picture of what libraries are being used.

The number we’d like to calculate is questions per user, qpu. Libscore gives good numbers for the denominator. It turns out there are at least a couple of ways to get useful numbers for the numerator as well.

Number of questions on Stack Overflow

We can see how many questions are listed under the library’s tag on Stack Overflow. This gives the exact number of questions, but has the problem that it does not count the number of users who had that question – Stack Overflow discourages duplicate questions.

Number of searches on Google

By digging into the data behind Google Trends we can get a number that is proportional to the number of searches for a particular search term on Google. The theory is that most searches involving a library indicate a developer with a question.

There is an undocumented API that gives us the raw data behind the Google Trends charts. It gives search counts normalized to a scale from 0 to 100, such that 100 is the maximum in the given dataset. As long as we’re careful about comparing everything to a common reference, and scaling the numbers appropriately where needed, we can get a set of numbers that are proportional to the real search counts.

[Graph: Google Trends data for the library search terms, with the “js” suffix.]
By summing up the individual datapoints, we get the areas under these graphs, which are proportional to the number of searches.
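Here is a sketch of the bookkeeping involved, assuming every Trends request includes a shared reference term so the batches can be put on one common scale (the structure is illustrative; only the final numbers come from the results table below):

def rescale_to_reference(batches, reference_term):
    # Each batch maps term -> summed datapoints on Trends' 0-100 scale.
    # Dividing by the shared reference term puts all batches on one proportional scale.
    rescaled = {}
    for batch in batches:
        factor = batch[reference_term]
        for term, value in batch.items():
            rescaled[term] = value / factor
    return rescaled

def wtf_factor(questions_or_searches, users):
    # Questions (or searches) per Libscore user of the library.
    return questions_or_searches / users

print(wtf_factor(69133, 4954) / wtf_factor(17006, 7908))  # Angular vs Backbone, Stack Overflow: ~6.5x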

While this fixes the problem of counting cases where multiple people have the same questions, it has problems of its own. One problem is that we’ll end up undercounting certain search terms. Ember, React, Backbone etc. are all normal words that have meanings of their own. To try to deal with this, we can throw in the “js” suffix as shown above. This helps, but it means we’re undercounting searches for these libraries. It helps that this problem occurs for most of the libraries, so at least the bias is spread out somewhat evenly.

Results

I gathered the relevant data for 11 JS libraries and frameworks of various types: jQuery, jQuery UI, Backbone, React, Knockout, AngularJS, Ember, Meteor, Modernizr, Underscore and Lo-Dash. Below, Underscore and Lo-Dash are counted as one, since their counts are conflated on Libscore. The data was collected in mid-December.

Library              Libscore users    Searches (relative numbers)    Stack Overflow questions
jQuery               634872            49558                          562442
jQuery UI            176543            2961                           30353
Modernizr            109076            216                            655
Underscore/Lo-Dash   20183             139                            3367
AngularJS            4954              1259                           69133
React                203               27                             981
Backbone             7908              297                            17006
Ember                185               113                            13119
Knockout             1982              154                            12488

The differences here are pretty dramatic. Using the Stack Overflow measure, Angular has a 6.5x higher WTF factor than Backbone, while Meteor has a 21x higher WTF factor than Angular does. Meteor has a staggering 49143x higher WTF factor than Modernizr.

What is bad about this metric?

There are a few ways in which the numbers used here may be biased in one way or the other.

Problems with the Google based metric:

  • Undercounting searches. The JS library names are words with meanings of their own, so we have to use the “js” suffix to count them.
  • Round-off errors. Because Google Trends only produces numbers from 0 to 100, we have low precision on the numbers.

Problems with the Stack Overflow metric:

  • Since SO discourages duplicate questions, this will undercount the number of people having a question.

Problems with both metrics:

  • Undercounting sites: Sites that have been taken offline are not counted, while the searches and SO questions made during their development are counted. This produces some bias against the older libraries.
  • Sites that are unpublished are not counted, but the searches made during their development are. This produces some bias against the newer libraries.
  • Biased site sample: Only the top million sites are counted. This produces a bias against any library that is used mostly on smaller/less-popular sites.

What is good about this metric?

There are certain things that indicate that this approach is doing what it’s trying to do.

One is that the rankings produced by the Stack Overflow WTF and the Google WTF largely agree with each other. This means that neither measure can be entirely wrong, unless they are wrong in the same way.

A low WTF factor does not mean a framework is better, it just means you will have fewer questions about it. One would expect a project’s WTF factor to correlate with the scope and ambition of the project. One would expect frameworks to have a higher WTF factor than libraries, because they affect more parts of the development. In this view, the fact that Meteor produces the most questions makes sense, given that it is the only full stack framework in consideration: it involves the client side, the server side and even the database. Choosing such an all-encompassing framework comes at a cost: more questions. It is good to be able to quantify just how big that cost is.

AngularJS: The Bad Parts

Hey! If you do front end development, you should know about CatchJS. It’s a JavaScript error logging service that doesn’t suck.

Below is a graph of the number of searches for AngularJS versus a bunch of other Single Page Application frameworks. Despite the flawed methodology, the story seems to be pretty clear: popularity wise, Angular is beating the shit out of the other frameworks. I spent most of last year working on a large project built on AngularJS, and I’ve gotten to know the framework in some depth. Through this work I have learned that Angular is built around some really bad ideas that make it a pain to work with, and that we need to come up with something better. Whatever the reason for Angular’s popularity, it isn’t that it’s a great framework.


The number of searches for various SPA frameworks. (A less charitable interpretation of this data would be that Angular users have to search for answers more often than the others do.)

Bad Idea #1: Dynamic scoping

The scope of a variable is the part of the program where the variable can be legally referenced. If your system has variables, it has some concept of scoping.

Angular has a DSL that is entangled with the HTML and used primarily to express the data-bindings between the UI and application code. This has variables, and thus a concept of scopes. Let’s take a look at it. Consider for example ng-model:

<input type="text" ng-model="obj.prop" />

This creates a two way binding on the property prop of object obj. If you type into the input field, the property prop updates. If you assign to the property prop, the input field updates. Neat.

Now let’s add some simple parts:

<input type="text" ng-model="obj.prop" />
<div ng-if="true">
    <input type="text" ng-model="obj.prop" />
</div>

Question: What does obj.prop refer to in the second input tag? The answer is that it is literally impossible to tell what the meaning of ng-model="obj.prop" is by reading the code. Whether or not the two "obj.prop" names refer to the same thing depends on the runtime state of the program. Try it out here: http://jsfiddle.net/1op3L9yo/. If you type into the first input field first, the two inputs will share the same model. If you type into the second one first, they will have distinct models.

WTF?

What’s going on here? Understanding that requires some knowledge of AngularJS terminology – skip this paragraph if you don’t care. The part that says ng-if is what’s called a directive. It introduces a new scope that is accessible as an object within the program. Let’s call it innerScope. Let’s call the scope of the first input outerScope. Typing “t” into the first input will automatically assign an object to outerScope.obj, and assign the string you typed to the property like so: outerScope.obj.prop = "t".  Typing into the second input will do the same to the innerScope. The complication is that innerScope prototypically inherits from outerScope, so whether or not innerScope inherits the property obj depends on whether or not it is initialized in outerScope, and thus ultimately depends on the order in which the user interacts with the page.

This is insane. It should be an uncontroversial statement that one should be able to understand what a program does by reading its source code. This is not possible with the Angular DSL, because as shown above a variable binding may depend on the order in which a user interacts with a web page. What’s even more insane is that it is not even consistent: Whether or not a new scope is introduced by a directive is up to its implementer. And if a new scope is introduced, it is up to its implementer to decide if it inherits from its parent scope or not. In total there are three ways a directive may change the meaning of the code and markup that uses it, and there’s no way to tell which is in play without reading the directive’s source code. This makes the code-markup mix so spectacularly unreadable that one would think it is deliberately designed for obfuscation.

When variable scope is determined by the program text it is called lexical scoping. When scope is dependent on program state it is called dynamic scoping. Programming language researchers seem to have figured out pretty early that dynamic scoping was a bad idea, as hardly any language uses it by default. Emacs Lisp does, but that is only because Richard Stallman saw it as a necessary evil to get his Lisp interpreter fast enough back in the early 80s.

JavaScript allows for optional dynamic scoping with the with statement. This is dangerous enough to make Douglas Crockford write books telling you not to use it, and it is very rarely seen in practice.  Nevertheless, with-statements are similar to how scoping works in Angular.

Pit of Despair

At this point, I imagine some readers are eager to tell me how to avoid the above problem. Indeed, when you know about the problem, you can avoid it. The problem is that a new Angular user likely does not know about the problem, and the default, easiest thing to do leads to problems.

The idea of the Pit of Success is said to have been a guiding principle in designing platforms at Microsoft.

The Pit of Success: in stark contrast to a summit, a peak, or a journey across a desert to find victory through many trials and surprises, we want our customers to simply fall into winning practices by using our platform and frameworks.  To the extent that we make it easy to get into trouble we fail.

Rico Mariani, MS Research MindSwap Oct 2003. 

Angular tends to not make you fall into the Pit of Success, but rather into the Pit of Despair – the obvious thing to do leads to trouble.

Bad Idea #2: Parameter name based dependency injection

Angular has a built in dependency injector that will pass appropriate objects to your function based on the names of its parameters:

function MyController($scope, $window) {
    // ...
}

Here, the names of the parameters $scope and $window will be matched against a list of known names, and corresponding objects get instantiated and passed to the function. Angular gets the parameter names by calling toString() on the function, and then parsing the function definition.

The problem with this, of course, is that it stops working the moment you minify your code. Since you care about user experience you will be minifying your code, thus using this DI mechanism will break your app. In fact, a common development methodology is to use unminified code in development to ease debugging, and then to minify the code when pushing to production or staging. In that case, this problem won’t rear its ugly head until you’re at the point where it hurts the most.

Even when you’ve banned the use of this DI mechanism in your project, it can continue to screw you over, because there is third party code that relies on it. That isn’t an imaginary risk, I’ve experienced it firsthand.

Since this dependency injection mechanism doesn’t actually work in the general case, Angular also provides a mechanism that does. To be sure, it provides two. You can either pass along an array like so:

module.controller('MyController', ['$scope', '$window', MyController]);

Or you can set the $inject property on your constructor:

MyController.$inject = ['$scope', '$window'];

It’s unclear to me why it is a good idea to have two ways of doing this, and which will win in case you do both. See the section on unnecessary complexity.

To summarize, there are three ways to specify dependencies, one of which doesn’t work in the general case. At the time of writing, the Angular guide to dependency injection starts by introducing the one alternative that doesn’t work. It is also used in the examples on the Angular front page. You will not fall into the Pit of Success when you are actively guided into the Pit of Despair.

At this point, I am obligated to mention ng-min and ng-annotate. These are source code post-processors that intend to rewrite your code so that it uses the DI mechanisms that are compatible with minification. In case you don’t think it is insane to add a framework specific post-processor to your build process, consider this: Statically determining which function definitions will be given to the dependency injector is just as hard as solving the Halting Problem. These tools don’t work in the general case, and Alan Turing proved it in 1936.

Bad Idea #3: The digest loop

Angular supports two-way databinding, and this is how it does it: It scans through everything that has such a binding, and sees if it has changed by comparing its value to a stored copy of its value. If a change is found, it triggers the code listening for such a change. It then scans through everything looking for changes again. This keeps going until no more changes are detected.
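A framework-agnostic sketch of that dirty-checking loop (in Python for brevity, and not Angular’s actual implementation) looks something like this:

def digest(watchers):
    # Each watcher has a 'get' function, the 'last' value it saw, and an 'on_change' callback.
    # Keep rescanning until a full pass over all watchers finds no changes.
    dirty = True
    while dirty:
        dirty = False
        for watcher in watchers:
            new_value = watcher["get"]()
            if new_value != watcher["last"]:
                watcher["on_change"](new_value, watcher["last"])
                watcher["last"] = new_value
                dirty = True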

The problem with this is that it is tremendously expensive. Changing anything in the application becomes an operation that triggers hundreds or thousands of functions looking for changes. This is a fundamental part of what Angular is, and it puts a hard limit on the size of the UI you can build in Angular while remaining performant.

A rule of thumb established by the Angular community is that one should keep the number of such data bindings under 2000. The number of bindings is actually not the whole story: Since each scan through the object graph might trigger new scans, the total cost of any change actually depends on the dependency graph of the application.

It’s not hard to end up with more than 2000 bindings. We had a page listing 30 things, with a “Load More” button below. Clicking the button would load 30 more items into the list. Because the UI for each item was somewhat involved, and because there was more to this page than just this list, the page had more than 2000 bindings before the “Load More” button was even clicked. Clicking it would add about 1000 more bindings. The page was noticeably choppy on a beefy desktop machine. On mobile, the performance was dreadful.

Keep in mind that all this work is done in order to provide two-way bindings. It comes in addition to any real work your application may be doing, and in addition to any work the browser might be doing to reflow and redraw the page.

To avoid this problem, you have to avoid this kind of data binding. There are ways to make bindings happen only once, and as of Angular 1.3 these one-time bindings are built into the framework. It nevertheless requires ditching what is perhaps the most fundamental abstraction in Angular.

If you want to count the number of bindings in your app, you can do so by pasting the following into your console (requires underscore.js). The number may surprise you.

function getScopes(root) {
    var scopes = [];
    // Walk the scope tree via Angular's internal sibling/child pointers.
    function traverse(scope) {
        scopes.push(scope);
        if (scope.$$nextSibling)
            traverse(scope.$$nextSibling);
        if (scope.$$childHead)
            traverse(scope.$$childHead);
    }
    traverse(root);
    return scopes;
}
var rootScope = angular.element(document.querySelectorAll("[ng-app]")).scope();
var scopes = getScopes(rootScope);
// Collect every scope's watcher list and count the unique watchers, i.e. the bindings.
var watcherLists = scopes.map(function(s) { return s.$$watchers; });
_.uniq(_.flatten(watcherLists)).length;

Bad Idea #4: Redefining well-established terminology

A common critique is that Angular is hard to learn. This is partly because of unnecessary complexity in the framework, and partly because it is described in a language where words do not have their usual meanings.

“Constructor functions”

In JavaScript, a constructor function is any function called with new, thus instantiating a new object. This is standard OO-terminology, and it is explicitly in the JavaScript specification. But in the Angular documentation “constructor function” means something else. This is what the page on Controllers used to say:

Angular applies (in the sense of JavaScript’s `Function#apply`) the controller constructor function to a new Angular scope object, which sets up an initial scope state. This means that Angular never creates instances of the controller type (by invoking the `new` operator on the controller constructor). Constructors are always applied to an existing scope object.

https://github.com/angular/angular.js/blob/…

That’s right, these constructors never create new instances. They are “applied” to a scope object (which is new according to the first sentence, and existing according to the last sentence).

“Execution contexts”

This is a quote from the documentation on scopes:

Angular modifies the normal JavaScript flow by providing its own event processing loop. This splits the JavaScript into classical and Angular execution context.

https://docs.angularjs.org/guide/scope

In the JavaScript specification, and in programming languages in general, the execution context is well defined as the symbols reachable from a given point in the code. It’s what variables are in scope. Angular does not have its own execution context any more than every JavaScript function does. Also, Angular does not “modify the normal JavaScript flow”. The program flow in Angular definitely follows the same rules as any other JavaScript.

“Syntactic sugar”

This is a quote from the documentation on providers:

[…] the Provider recipe is the core recipe type and all the other recipe types are just syntactic sugar on top of it. […] The Provider recipe is syntactically defined as a custom type that implements a $get method.

https://docs.angularjs.org/guide/providers

If AngularJS was able to apply syntactic sugar, or any kind of syntax modification to JavaScript, it would imply that they had their own parser for their own programming language. They don’t*, the word they are looking for here is interface. What they’re trying to say is this: “Provide an object with a $get method.”

(*The Angular team does actually seriously intend to create their own compile-to-JS programming language to be used with Angular 2.0.)

Bad Idea #5: Unnecessary complexity

Let’s look at Angular’s dependency injector again. A dependency injector allows you to ask for objects by name, and receive instances in return. It also needs some way to define new dependencies, i.e. assign objects to names. Here’s my proposal for an API that allows for this:

injector.register(name, factoryFn);

Where name is a string, and factoryFn is a function that returns the value to assign to the name. This allows for lazy initialization, and is fully flexible w.r.t. how the object is created.

The API above can be explained in two sentences. Angular’s equivalent API needs more than 2000 words to be explained. It introduces several new concepts, among which are: Providers, services, factories, values and constants. Each of these 5 concepts corresponds to a slightly different way of assigning a name to a value. Each has its own distinct API. And they are all completely unnecessary, as they can be replaced with a single method as shown above.

If that’s not enough, all of these concepts are described by the umbrella term “services”. That’s right, as if “service” wasn’t meaningless enough on its own, there’s a type of service called a service. The Angular team seem to find this hilarious, rather than atrocious:

Note: Yes, we have called one of our service recipes ‘Service’. We regret this and know that we’ll be somehow punished for our misdeed. It’s like we named one of our offspring ‘Child’. Boy, that would mess with the teachers.

https://docs.angularjs.org/guide/providers

Einstein supposedly strived to make things as simple as possible, but no simpler. Angular seems to strive to make things as complicated as possible. In fact it is hard to see how to complicate the idea of assigning a name to a value any more than what Angular has done.

In conclusion

There are deep problems with Angular as it exists today. Yet it is very popular. This is an indication of a problem with how we developers choose frameworks. On the one hand, it’s really hard to evaluate such a project without spending a long time using it. On the other hand, many people like to recommend projects they haven’t used in any depth, because the idea of knowing what the next big thing is feels good. The result is that people choose frameworks largely based on advice from people who don’t know what they’re talking about.

I’m currently hoping Meteor, React or Rivets may help me solve problems. But I don’t know them in any depth, and until I do I’ll keep my mouth shut about whether or not they’re any good.