Big Recsys Redux: Recs at Netflix
I wrote about recommender systems last week, but there is so much discussion around their effects right now in the mainstream tech press that they deserve a second issue.
As a recap, I said that there were two things that made recommender systems super ineffective, and that YouTube, one of the premier tech companies using recommendations, suffers from both a lot of the first and a lot of the second.
Recommender systems today have two huge problems that are leading companies (sometimes under enormous pressure from the public) to rethink how they’re being used: technical bias, and business bias.
The real problem is YouTube’s business model.
YouTube is THIRSTY for advertising money, at all times. Regardless of what users are doing on the platform, as long as it doesn’t impact advertising partnerships, it’s all above board.
Today, let’s talk about Netflix. It’s an example of another company that leans very heavily on surfacing recommended content to users, and has problems that are more benign than YouTube’s, but just as interesting. In fact, Netflix was at the vanguard of bringing collaborative filtering, the most common recsys algorithm, into the mainstream through the Netflix Prize, a competition to improve recommendations for viewers.
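If you haven't seen collaborative filtering up close, here is a minimal sketch of the user-based flavor: guess how much someone will like a title by looking at how similar viewers rated it. Everything in it is an assumption for illustration (the users, the ratings, and the function names `cosine_similarity` and `predict`); it is a toy, not Netflix's actual system.

```python
from math import sqrt

# Toy 1-5 star ratings. Users, titles, and numbers are made up for illustration.
ratings = {
    "ana":   {"Friends": 5, "The Office": 4, "Stranger Things": 1},
    "ben":   {"Friends": 4, "The Office": 5, "Grace and Frankie": 4},
    "chris": {"Friends": 1, "Stranger Things": 5, "Grace and Frankie": 2},
}


def cosine_similarity(a, b):
    """Cosine similarity computed over the titles both users have rated."""
    shared = set(a) & set(b)
    if not shared:
        return 0.0
    dot = sum(a[t] * b[t] for t in shared)
    norm_a = sqrt(sum(a[t] ** 2 for t in shared))
    norm_b = sqrt(sum(b[t] ** 2 for t in shared))
    return dot / (norm_a * norm_b)


def predict(user, title):
    """Similarity-weighted average of other users' ratings for `title`."""
    weighted, total = 0.0, 0.0
    for other, their_ratings in ratings.items():
        if other == user or title not in their_ratings:
            continue
        sim = cosine_similarity(ratings[user], their_ratings)
        weighted += sim * their_ratings[title]
        total += sim
    # None means nobody comparable has rated the title yet.
    return weighted / total if total else None


# ana has not rated "Grace and Frankie"; ben (similar taste) liked it,
# chris (different taste) did not, so the guess leans toward ben's rating.
guess = predict("ana", "Grace and Frankie")
print(round(guess, 2))
```

Because ana's ratings look much more like ben's than chris's, ben's opinion gets most of the weight. Production systems layer a lot on top of this (normalizing for harsh or generous raters, matrix factorization, implicit signals), but the core intuition is the same.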
Here’s a good backgrounder on the prize if you’re not exactly worked up about the idea of reading the Wikipedia summary of tuning RMSE results:
In October 2006, Netflix, then a service peddling discs of every movie and TV show under the sun, announced "The Netflix Prize," a competition that lured Mackey and his contemporaries for the computer programmer equivalent of the Cannonball Run. The mission: Make the company's recommendation engine 10% more accurate -- or die coding.
….
Hobbyists, academics, and professionals weren't just drawn to the contest by the potential payday. The revelations were just as enticing; because the winners would retain ownership of their work, a contestant like Volinsky could also pitch management at AT&T on devoting time and resources to the project. (The rules of the contest only stipulated that the winning team would have to license its work non-exclusively to Netflix.) Most importantly, the data was just plain interesting: an unruly mess of insights into taste, behavior, and pre-streaming viewer psychology. As Chris Volinsky put it, "Everyone likes movies."
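For a sense of what "10% more accurate" means in RMSE terms, here is a small sketch. RMSE (root mean squared error) measures how far predicted star ratings land from the real ones, and lower is better. The ratings below and the helper names `rmse` and `relative_improvement` are invented for illustration and have nothing to do with the actual contest data.

```python
from math import sqrt


def rmse(predicted, actual):
    """Root mean squared error between predicted and actual ratings."""
    squared_errors = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return sqrt(sum(squared_errors) / len(squared_errors))


def relative_improvement(baseline_rmse, new_rmse):
    """Fraction by which the new model lowers RMSE relative to the baseline."""
    return (baseline_rmse - new_rmse) / baseline_rmse


# Made-up star ratings and two made-up sets of predictions.
actual = [4, 3, 5, 2, 1]
baseline = [3.0, 3.0, 4.0, 3.0, 2.0]   # stand-in for an existing engine
candidate = [3.5, 3.0, 4.5, 2.5, 1.5]  # stand-in for a contest entry

gain = relative_improvement(rmse(baseline, actual), rmse(candidate, actual))
print(f"RMSE improvement: {gain:.0%}")
print("Clears a 10% bar:", gain >= 0.10)
```

The toy candidate halves the error, which is far easier on five fake ratings than on a real dataset. The point is only that the target was a relative reduction in error, not a jump in some absolute accuracy percentage.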
Ironically, the winning solution was never implemented:
Netflix awarded a $1 million prize to a developer team in 2009 for an algorithm that increased the accuracy of the company's recommendation engine by 10 percent. But it doesn't use the million-dollar code, and has no plans to implement it in the future, Netflix announced on its blog Friday. The post goes on to explain why: a combination of too much engineering effort for the results, and a shift from movie recommendations to the "next level" of personalization caused by the transition of the business from mailed DVDs to video streaming.
What happened? Netflix’s executives decided (correctly) that video streaming was the next step forward for the company. They raised prices and then split off the DVD service (remember Qwikster? RIP.)
At first, the stock rose to an all-time closing high of $42.68, with investors drawn to the additional revenue per subscriber the move might produce. The stock wouldn’t reach that level again for more than two years. Subscribers swiftly expressed their displeasure about having the DVD rentals that attracted them to Netflix removed from the convenience of on-demand streaming. The dual plans effectively amounted to a 60% price hike.
This required re-architecting how recommendations were served, particularly as the company moved away from DVDs and onto digital-only ways of moving content.
Later, Netflix, frantic to innovate again, changed its UI from displaying recommendation matches as a number of stars to thumbs-up and thumbs-down.
Netflix VP of product Todd Yellin told journalists on Thursday during a press briefing at the company’s headquarters in Los Gatos, Calif., that the company had tested the new thumbs up and down ratings with hundred of thousands of members in 2016. “We are addicted to the methodology of A/B testing,” Yellin said. The result was that thumbs got 200% more ratings than the traditional star-rating feature.
Netflix is also introducing a new percent-match feature that shows how good of a match any given show or movie is for an individual subscriber. For example, a show that should close to perfectly fit a user’s taste may get a 98% match. Shows that have less than a 50% match won’t display a match-rating, however.
Did these actually work better? The underlying recommender system was essentially the same, but the ratings they surfaced to consumers, from anecdotal evidence, got much worse (although, in theory, the recommender system was supposed to gather more data more efficiently). And still, even with the immense amount of data mining that they do, no one can find anything to watch on Netflix.
Why make these changes? Netflix has always been walking a very thin tightrope between being a tech company that serves content and a content company that just happens to be online.
Initially, it was all about the tech. Early on, it came out with things like collaborative filtering and chaos engineering to make sure everyone could stream Netflix all the time.
But, as Netflix gets larger and larger and more of a mainstream player (for example, it was just accepted to the MPAA), it sways more and more in the direction of being a content company, which means its focus is on getting the best deals on shows and keeping them around so more people watch them, instead of purely technical accomplishments that keep the service up and running.
As the company grows and grows, it’s not as important to recommend specific content to niche audiences. What is important:
Buying the largest amount of content that the common denominator of people will watch:
Mr. Sarandos told the group that spending on film and TV projects, particularly big budget movies, needed to be more cost-effective, according to people familiar with the meeting. Netflix has long measured the efficiency of its TV shows and movies using a ratio of their cost to a measure of viewership that gives more weight to new subscribers and those viewed at risk of canceling, say former employees. Mr. Sarandos made clear that in the future big-budget projects should bring in lots of viewers, a shift from the past when they might have gotten a pass if they were expected to get buzz and build industry credibility.
Pushing original content so that Netflix doesn’t lose money and competes against players like HBO
Streamlining both of the above so that content becomes a commodity - aka less of a risk
To this end, the best thing Netflix has done recently is buy (and, soon, lose) the Friends catalog, which is a real shame because it carried me through my last maternity leave and late-night pumping sessions, and I was really relying on it this time around, as well.
It was the second-most-watched show on Netflix in 2018, according to Nielsen, which is probably why Netflix was willing to pay as much as $100 million for worldwide rights for this year. But starting in 2020, it will only be on HBO Max. (Separately, Netflix will also lose “The Office” starting next year when it will be available only on NBCUniversal’s streaming service.)
It’s also moved to offering recommendations in rows instead of specific personalized titles, which makes it easier to offer rows of pre-filtered content that’s decided loosely by a team or algorithm based on what Netflix wants you to see versus what you personally like. (I like to joke that they have a data scientist for each row.)
It’s also pushing much more aggressively on surfacing its own content (just note how many of the visualizations in this post focus on customizing art for Netflix-specific content. And as another side note, the Netflix technical blog is a pretty good read in general.) It has to - it’s working against an entire universe of not-Netflix content that draws away viewers elsewhere.
As it does all of this, the engineering and data sides frequently come in conflict with the business.
The show stars Jane Fonda and Lily Tomlin as two women who find out their husbands are having an affair with one another. According to The Wall Street Journal, which cited anonymous sources close to the discussions, Netflix's tech side found that users were more likely to click on promotion for the show that didn't include Fonda.
The Journal reported that this finding prompted an internal debate within the company between Netflix's content team, which didn't want to anger Fonda and argued that it could be a violation of her contract, and its tech team, which stressed the importance of the data.
Netflix ultimately decided to include images of Fonda, according to the Journal, but it shows how Netflix's Hollywood side and its Silicon Valley side can come into conflict as the streamer leans more into original shows and movies that include top talent.
What happens when tech and content fight at a company that is quickly becoming a dominant force in content purchases and production rather than in the tech that got it there?
Recsys becomes more irrelevant. If the recommender system recommends content that the business doesn’t like, and the business is driving the bus, the business always wins, and the algorithm accuracy so prized during the Netflix Prize years doesn’t matter very much at all.
The niche, discerning customers that initially came to Netflix to seek out custom-tailored recommendations in those famous niche categories (“Steamy British Independent Dramas”, anyone?) become under-served as mainstream content that’s less risky (like Friends and other shows that have already proven themselves, like Stranger Things 3, for example) pushes them out of the way.
It’s a classic case of two of Vicki’s Favorite Internet Rules: The Innovator’s Dilemma (aka moving from tech to content distribution), and Good Things Don’t Scale. Or rather, Netflix has scaled, tremendously, successfully, and that’s why it’s now becoming much more tame, at the behest of the business rather than the algorithms that got it to this point.
Art: New Television Antenna, Norman Rockwell, 1949
What I’m reading lately:
This interview with yours truly about my five favorite books to get started with data science:
My guilty pleasure is buying cookbooks and then only looking through them and never making anything
Palantir’s UI is much less sexy than I would have imagined
Google’s Privacy Policy over the last 20(!!!) years:
Ten rules for reproducible Jupyter notebooks
About the Author and Newsletter
I’m a data scientist in Philadelphia. This newsletter is about tech and everything around tech. Most of my free time is spent kid-wrangling, reading, and writing bad tweets. I also have longer opinions on things. Find out more here or follow me on Twitter.
If you like this newsletter, forward it to friends!