The reign of Big Recsys
If you’ve ever used YouTube’s autoplay feature, Amazon’s “More Like This”, anything on Netflix, or Facebook’s News Feed, you’ve used one of the most common machine learning systems shaping our online worlds today: recommender systems.
Also known as recsys in industry lingo, these systems are machine learning algorithms embedded in software that provide a set of content recommendations based on an individual user’s history of activity, aggregated across the entire user base.
Because there is so much content generated on most web platforms, recommender systems are constantly working to surface content that they think is relevant to you (aka what will make you stay on the platform longer).
The most common recommendation system in use today is collaborative filtering, which works something like this. I read Shoe Dog. You read Shoe Dog. We both say that we liked Shoe Dog. I also read Bad Blood, and I liked it. The recommender system already knows that you like the same stuff that I read, so it recommends Bad Blood to you to read, as well.
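To make that concrete, here’s a minimal sketch of user-based collaborative filtering in Python. The tiny ratings matrix and the cosine-similarity scoring are made up purely for illustration; real systems do the same basic thing across millions of users and items.

```python
# A toy sketch of user-based collaborative filtering.
# (Illustrative only: the ratings and the scoring rule are invented.)
import numpy as np

books = ["Shoe Dog", "Bad Blood", "Dune"]
ratings = np.array([
    [5.0, 4.0, 0.0],   # me: liked Shoe Dog and Bad Blood, never rated Dune
    [5.0, 0.0, 0.0],   # you: liked Shoe Dog, nothing else rated yet
])

def cosine(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

# Our reading histories look similar...
similarity = cosine(ratings[0], ratings[1])

# ...so score the books I rated highly that you haven't read yet.
unseen = ratings[1] == 0
scores = similarity * ratings[0] * unseen
print(books[int(np.argmax(scores))])  # -> Bad Blood
```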
There’s another way to do recommenders, as well: content filtering. Aka, instead of looking at user activity, looking at the content itself. We know that Bad Blood is similar to Shoe Dog because both are best-selling books about American businesses, so a recommendation based on these two books might be something like Walter Isaacson’s biography of Steve Jobs. If you’re interested in diving deeper into the nitty-gritty of how recommender systems work, do a search for matrix factorization.
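Since I just told you to go search for matrix factorization, here’s a rough toy sketch of the idea, assuming a plain squared-error loss and vanilla gradient descent. The sizes, learning rate, and random “ratings” are all invented; the point is just that every user and every item gets a small learned vector, and their dot product predicts a rating.

```python
# A toy matrix factorization sketch (assumptions: squared-error loss,
# vanilla gradient descent, made-up data). Each user and item gets a small
# latent vector; their dot product approximates the observed ratings.
import numpy as np

rng = np.random.default_rng(0)
n_users, n_items, k = 4, 5, 2                     # tiny toy problem
R = rng.integers(1, 6, size=(n_users, n_items)).astype(float)
observed = rng.random((n_users, n_items)) < 0.7   # which ratings we actually saw

U = rng.normal(scale=0.1, size=(n_users, k))      # user factors
V = rng.normal(scale=0.1, size=(n_items, k))      # item factors

for _ in range(2000):
    err = (R - U @ V.T) * observed                # error only on observed cells
    U += 0.01 * err @ V
    V += 0.01 * err.T @ U

# The model's guesses for the cells we never observed are the recommendations.
predictions = U @ V.T
```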
Much of the consumer-facing internet runs on recommendation systems, and on figuring out how to tune them.
As a recent example of their pervasiveness,
80 percent of movies watched on Netflix came from recommendations, 60 percent of video clicks came from home page recommendation in YouTube.
There are whole academic conferences sponsored by companies like Hulu, Amazon, Medium, and Spotify that have a keen interest in improving recommendations.
This all seems like a great system. What’s not to love?
A lot.
Recommender systems today have two huge problems that are leading companies (sometimes at enormous pressure from the public) to rethink how they’re being used: technical bias, and business bias.
A prime example of a company that’s been in the spotlight for recommender misuse is YouTube. Let’s take a look at how these problems play out on the video site.
First is the issue of similar content, from a technical perspective. Recommender systems will feed you more of what you’ve already consumed. In the case of business books, that’s (mostly) fine, unless you care about variety. If you like business books, the system’s best bet to reduce errors is to give you more of the same type of book. But what if you don’t want to continue reading business books? What if you want to read fantasy? Unless you’ve clicked on a fantasy title, the recommender system has no way to know that you like it. This is irritating for books, but a real problem for YouTube.
The threats, Mr. Cain explained, came from right-wing trolls in response to a video he had posted on YouTube a few days earlier. In the video, he told the story of how, as a liberal college dropout struggling to find his place in the world, he had gotten sucked into a vortex of far-right politics on YouTube.
Over years of reporting on internet culture, I’ve heard countless versions of Mr. Cain’s story: an aimless young man — usually white, frequently interested in video games — visits YouTube looking for direction or distraction and is seduced by a community of far-right creators.
The common thread in many of these stories is YouTube and its recommendation algorithm, the software that determines which videos appear on users’ home pages and inside the “Up Next” sidebar next to a video that is playing. The algorithm is responsible for more than 70 percent of all time spent on the site.
The radicalization of young men is driven by a complex stew of emotional, economic and political elements, many having nothing to do with social media. But critics and independent researchers say YouTube has inadvertently created a dangerous on-ramp to extremism by combining two things: a business model that rewards provocative videos with exposure and advertising dollars, and an algorithm that guides users down personalized paths meant to keep them glued to their screens.
Here’s what happened: at some point, Caleb Cain clicked on a video about alternate history or Nazis, and the recommender system picked up on it. Other people who clicked on that same video also watched similar content, so it kept recommending the same type of content, until Caleb became enveloped in an enormous filter bubble that his brain was simply unable to get out of. If you’re constantly told that red is blue, there’s no way you can get back to reasoning that red is red, as long as it looks like millions of other people think like you.
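If you want to see why a couple of clicks can snowball like that, here’s a hand-wavy little simulation of the feedback loop. The topic names and the “every click reinforces the belief” update rule are invented for illustration, not a description of how YouTube actually works.

```python
# A hand-wavy simulation of a recommendation feedback loop (rich-get-richer):
# the system recommends in proportion to what it believes you like, and every
# click strengthens that belief. Topics and update rule are made up.
import random

random.seed(42)
topics = ["gaming", "history", "alt-history", "conspiracy"]
interest = {t: 1.0 for t in topics}        # system's belief about your tastes

for _ in range(200):
    # Recommend a topic in proportion to current beliefs...
    topic = random.choices(topics, weights=[interest[t] for t in topics])[0]
    # ...and treat the resulting click as confirmation.
    interest[topic] += 1.0

print(interest)  # one topic usually ends up dominating: a filter bubble in miniature
```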
This is an even bigger problem for children’s content. In its eagerness to keep young eyes on the screen longer, the algorithm is now working backwards - influencing creators to create terrible, weird content for kids:
To begin: Kid’s YouTube is definitely and markedly weird. I’ve been aware of its weirdness for some time. Last year, there were a number of articles posted about the Surprise Egg craze. Surprise Eggs videos depict, often at excruciating length, the process of unwrapping Kinder and other egg toys. That’s it, but kids are captivated by them. There are thousands and thousands of these videos and thousands and thousands, if not millions, of children watching them.
On-demand video is catnip to both parents and to children, and thus to content creators and advertisers. Small children are mesmerised by these videos, whether it’s familiar characters and songs, or simply bright colours and soothing sounds. The length of many of these videos — one common video tactic is to assemble many nursery rhyme or cartoon episodes into hour+ compilations —and the way that length is marketed as part of the video’s appeal, points to the amount of time some kids are spending with them.
YouTube broadcasters have thus developed a huge number of tactics to draw parents’ and children’s attention to their videos, and the advertising revenues that accompany them.
This problem is so bad that I never allow my daughter to watch YouTube unsupervised, and when she does watch, I usually cast via Bluetooth from my phone to our TV screen, where I can see what’s going on. I’m not alone: Silicon Valley parents don’t let their kids watch, either. In order to fight this crap, I have effectively become my child’s own recsys.
Ok, so algorithms are stupid. What else is new? Garbage in, garbage out. Like most algorithms, recommender systems are driven by two things: the data you put into them and the results you’re hoping to get out. They don’t operate in a vacuum, separate from people. The people who work at YouTube are not bad people. They’re all smart - exceedingly so - and working on problems at scale. They write papers like these. They hold multiple PhDs.
So how, then, did these systems go so wrong? Can’t they just correct for the bias and randomize things a bit more?
The real problem is YouTube’s business model.
The way YouTube makes money is through advertising. It says so in Alphabet’s 10-K.
How we make money
The goal of our advertising business is to deliver relevant ads at just the right time and to give people useful commercial information, regardless of the device they’re using. We also provide advertisers with tools that help them better attribute and measure their advertising campaigns across screens. Our advertising solutions help millions of companies grow their businesses, and we offer a wide range of products across screens and formats. We generate revenues primarily by delivering both performance advertising and brand advertising.
If you browse around looking for YouTube in the document, you’ll find lots of statements like this:
As online advertising evolves, we continue to expand our product offerings which may affect our monetization.
As interactions between users and advertisers change and as online user behavior evolves, we continue to expand and evolve our product offerings to serve their changing needs. Over time, we expect our monetization trends to fluctuate. For example, we have seen an increase in YouTube engagement ads, which monetize at a lower rate than traditional desktop search ads. Additionally, we continue to see a shift to programmatic buying which presents opportunities for advertisers to connect with the right user, in the right moment, in the right context. Programmatic buying has a different monetization profile than traditional advertising buying on Google properties.
(There is a small subset of their revenue that comes from YouTube Premium, their subscription service. Side note: if you can justify it, I extremely recommend it, as it makes YouTube 150% more tolerable.)
What you’ll also find if you read about YouTube’s CEO, Susan Wojcicki, is that she’s from the advertising world, which is an enormous signal of what YouTube values in leadership:
Her tenure as C.E.O. wasn’t supposed to be dominated by pedophilia and attempted mass murder. When she got the job, in 2014, Ms. Wojcicki was hailed straightforwardly as the most powerful woman in advertising, someone who’d helped turn on the cash spigots in her time at Google and would presumably repeat the trick at YouTube. In the five years since, Ms. Wojcicki has introduced new forms of ads as well as subscription offerings for music, original content and the cord-cutting service YouTube TV. But somewhere along the line, her job became less about growth and more about toxic containment.
YouTube is THIRSTY for advertising money, at all times. Regardless of what users are doing on the platform, as long as it doesn’t impact advertising partnerships, it’s all above board. To wit, in 2018, after YouTuber Logan Paul’s string of stunts:
YouTube recently suspended advertising from Paul's YouTube channel after he shocked a rat with a Taser and joked on Twitter about ingesting Tide Pods, which are capsules containing laundry detergent. Weeks earlier, he had filmed himself next to a corpse of a Japanese suicide victim, a move that was widely criticized.
Paul responded with an apology tour, first with a short video, then with a longer video on suicide prevention. But then he returned with the Tide Pods tweet and the rat video.
His infractions count as two strikes, Wojcicki said during her appearance Monday at the CodeMedia industry conference. "We can’t just be pulling people off of our platform," she asserted.
However, when brands pull content from YouTube, YouTube changes its tone:
Following the exodus of some of its high-profile advertisers, Google has publicly apologized and pledged to give brands more control over where their ads appear.
The problem is, though, that since Google and Facebook control 90% of advertising online, advertisers always come back. They have to.
If the storyline is feeling a bit worn, that's because it is. Many major brands fled the Alphabet-owned platform due to content issues in 2017, but they’d largely returned before this latest scandal, drawn by its 1.8 billion users and attractive targeting tools.
In the case of AT&T, the timing of these new revelations is especially on-the-nose: The company had only just announced in January that it would start advertising again on YouTube after a two year hiatus spurred by having its ads play on videos featuring disturbing material like hate speech and violent extremism.
So really, the most important algorithm at YouTube isn’t collaborative filtering: it’s the endless loop between advertising money and cultural norms.
This puts Susan Wojcicki in an interesting position: she has to make advertisers happy, but also kind of not mess up the platform, but also still continue to push the recommendation system since that’s where views come from. So, she has to kind of low-key excuse some of the absolute atrocities on the platform to keep it relatively “open”, while at the same time putting on a happy face for users and an even happier song and dance for advertisers.
No recommendation system, no PhD, and no complete set of data can fix this business problem.
Art: Blam, Roy Lichtenstein, 1962
What I’m reading lately:
How Instagram accidentally started recommending emojis
I just finished reading An Elegant Puzzle. Still mulling over it, but expect a review within the next several weeks.
This thread about open source maintenance:
This summary by Peter Norvig about some points Noam Chomsky made on NLP
SQL EVERYWHERE - a paper.
This essay about Odessa was so good I’m going to have to buy the book.
About the Author and Newsletter
I’m a data scientist in Philadelphia. This newsletter is about tech and everything around tech. Most of my free time is spent kid-wrangling, reading, and writing bad tweets. I also have longer opinions on things. Find out more here or follow me on Twitter.
If you like this newsletter, forward it to friends!