Facebook’s AI Won’t Fix Zuck’s Problems, But a Third Party’s AI Could

In testimony yesterday, Mark Zuckerberg said Facebook was working on AI to solve many of its problems.

How long will it take for Facebook to fix its problems?

Zuckerberg set a timeline of 5 years.

Let that sink in. Fake news, bullying, human trafficking, hate speech: the solutions are 5 years away, and they won’t be transparent. Why? Because language is hard, and Facebook is investing in neural networks, not technologies designed for language (epistemology, language heuristics, and self-generating rubrics).

Why isn’t Facebook just using these other technologies?

Because they hired rather than bought.

In Silicon Valley every database admin is taking Coursera courses about Neural Networks and worshiping at the Altar of Andrew Ng. But Neural Networks are not good at language, and Facebook’s language-based acquisitions (Wit.ai) didn’t include any proprietary technologies that moved its ability to solve big problems forward.

Why won’t Neural Networks work?

When working with a Neural Network, the input is generally simplified first. An image of a few hundred thousand pixels is broken into a vocabulary of a few dozen shapes, each represented a few dozen times, and the Neural Network learns from those.
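To make that concrete, here is a minimal sketch of one such simplification, pooling a large image down to a small feature grid (the article’s “vocabulary of shapes” is another form of the same idea); the numbers are illustrative, not Facebook’s pipeline:

```python
import numpy as np

# A large image is reduced to a much smaller grid of features before
# a network ever sees it. Numbers here are purely illustrative.
image = np.random.rand(400, 600)          # ~240,000 "pixels"

def max_pool(img, block=20):
    """Keep only the strongest signal per block of pixels."""
    h = (img.shape[0] // block) * block
    w = (img.shape[1] // block) * block
    img = img[:h, :w]
    return img.reshape(h // block, block, w // block, block).max(axis=(1, 3))

features = max_pool(image)                # 20x30 grid = 600 values
print(image.size, "->", features.size)    # 240000 -> 600
```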

This doesn’t work with language. There are millions of words (Recognant uses about 2 million), and those words can each have multiple meanings, which works out to about 8 million word senses.
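You can see the sense explosion with an off-the-shelf inventory like NLTK’s WordNet (far smaller than Recognant’s 2 million words, but it makes the point):

```python
# Word senses illustrated with NLTK's WordNet (requires nltk plus a
# one-time nltk.download('wordnet')). One spelling, many meanings.
from nltk.corpus import wordnet as wn

for synset in wn.synsets("bank"):
    print(synset.name(), "-", synset.definition())
# riverbank vs. financial institution vs. tilting an aircraft...
# every one of these senses must be disambiguated in context.
```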

These can’t be reduced. Neural Networks don’t typically discriminate among hundreds of thousands (let alone millions) of outcomes. This means a Neural Network needs to be trained for each of the scenarios you want to prevent. For example, it is easy to say, “we want to prevent bullying,” but what you actually want to prevent is:

Fat shaming
Skinny shaming
Race shaming
Transgender shaming
Cis-gender shaming
Wealth shaming
Poverty shaming
Political shaming
Antisemitism
Islamophobia
Misogyny
Geo-shaming
Collegiate shaming
If you think about the above list, you will quickly come up with a dozen more types of bullying that aren’t on it.

Can’t you just tell if people are being mean?

An epistemology can, but a Neural Network can’t. A Neural Network looks at individual words and can’t tell a positive use from a negative one, which means each type of bullying has to be trained for separately.
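What “trained separately” means in practice is one labeled dataset and one model per category. A minimal sketch, with hypothetical toy data (scikit-learn used only for brevity):

```python
# A hedged sketch of "one trained model per bullying category".
# The training examples are hypothetical toy data.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

def train_category(texts, labels):
    """Each category needs its own labeled corpus and its own model."""
    model = make_pipeline(CountVectorizer(), LogisticRegression())
    model.fit(texts, labels)
    return model

models = {
    "fat_shaming": train_category(
        ["nobody wants to watch you eat", "great game last night"], [1, 0]),
    "wealth_shaming": train_category(
        ["must be nice to buy your way in", "see you at practice"], [1, 0]),
    # ...one entry, one labeled dataset, one training run per type above
}
print(models["fat_shaming"].predict(["nobody wants to watch you eat"]))
```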

Can’t you just set flag words and block them?

Pure word blocking can’t tell the difference between “What sex are your new puppies?” and “he has sex with puppies.”

Users also get around word lists by doing things like $ex or s3x.
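A minimal sketch of both failure modes, assuming a simple lowercase word blocklist:

```python
import re

# A naive word blocklist. It can't tell the benign sentence from the
# abusive one, and trivial obfuscation slips right past it.
BLOCKLIST = {"sex"}

def blocked(text):
    words = re.findall(r"[a-z]+", text.lower())
    return any(w in BLOCKLIST for w in words)

print(blocked("What sex are your new puppies"))   # True  (false positive)
print(blocked("he has s3x with puppies"))         # False (false negative)
print(blocked("he has $ex with puppies"))         # False (false negative)
```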

But a Neural Network isn’t that stupid… Right?

“That is a dope-ass hat” and “That is a dope ass-hat” are the same to a Neural Network. To a Neural Network, if “skinny” is positive in one context, it is positive in all contexts, so “you’re a fucking skinny bitch” scores as super positive, because the network has learned that “fucking” before a positive word is “extra” positive.
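Under a bag-of-words representation, the kind of simplification many text models consume, the two phrases really are indistinguishable:

```python
from collections import Counter

# After tokenization, the two phrases above are literally the same
# input: identical words, identical counts, word order discarded.
a = "that is a dope-ass hat".replace("-", " ")
b = "that is a dope ass-hat".replace("-", " ")

print(Counter(a.split()) == Counter(b.split()))   # True
```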

To make neural networks smarter, they are often fed N-grams (which is how the Autocorrect/Autosuggest on your phone’s keyboard works). “Skinny bitch” can be flagged as negative if it has come up enough times. The challenge is that 2 million words become 4 trillion possible bigrams and 8 quintillion possible trigrams. There generally isn’t enough data to train a network to cover all of those combinations (which is one of the reasons Autosuggest is often so funny).
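The combinatorics are easy to check:

```python
# The n-gram explosion above, computed directly.
vocab = 2_000_000                      # words Recognant tracks

bigrams  = vocab ** 2                  # possible word pairs
trigrams = vocab ** 3                  # possible word triples

print(f"{bigrams:,}")                  # 4,000,000,000,000 (4 trillion)
print(f"{trigrams:,}")                 # 8,000,000,000,000,000,000 (8 quintillion)
# No corpus covers more than a sliver of this space, which is one
# reason autosuggest guesses wrong so often.
```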

But Neural Networks are getting faster; how do you know it will take so long?

While I tell people that comparing Neural Networks to human brains is stupid (and it is)… using the biological analogy, the nuances of sarcasm, taunting, and the coded language of bullying require the equivalent of an 8th grader’s understanding of English. That’s 86 billion neurons times 14 years. Depending on the number of layers, and the complexity of each layer, the cost could vary, but Nvidia is putting 24 trillion operations per second in their self-driving-car computer (roughly the compute of a 6-week-old rat running a maze, simulating 200 million neurons). The GPU cost is exponential, such that each doubling of neurons quadruples the cycle cost. By that math, 430 times as many neurons works out to 215 doublings, or 2^215 times (that’s 5 with 64 zeros after it) more expensive. And that’s for a system that processes content at the speed of a single eighth grader, so one employee replacement would cost 52 vigintillion times as much as the computer in a self-driving car. That’s about 50 Jupiters of mass in GPUs (not possible to build).
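Here is that back-of-the-envelope chain as a script; every figure is this article’s assumption, not a measurement:

```python
# The back-of-the-envelope chain above, reproduced as a script. All
# figures here are the article's own assumptions, not measurements.
human_neurons = 86e9        # ~86 billion neurons in a human brain
rat_neurons   = 200e6       # what the Nvidia car computer simulates

ratio = human_neurons / rat_neurons   # ~430x as many neurons
doublings = 215                       # the article's doubling count
cost_multiplier = 2.0 ** doublings

print(f"{ratio:.0f}x the neurons")                 # 430x
print(f"cost multiplier ~ {cost_multiplier:.2e}")  # ~5.27e64
# 5.27e64 is "5 with 64 zeros", i.e. roughly 52 vigintillion (10**63).
```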

You can pick just about any amount of simplification and this is still an impossible problem for Neural Networks.

Surely Facebook knows this?

Possibly. Facebook doesn’t need to solve these problems; they just need to look like they are trying. As long as it is believed that the problem can’t be solved, they don’t need to solve it, they just need to say they are working on it. That was proven yesterday: as long as Facebook is perceived as the leader, whatever they say is impossible is believed to be so. Because nobody else has the same scale of users, even if a solution exists, Facebook can say “it doesn’t scale,” and nobody can argue.

So why doesn’t Facebook just go buy all the tech that could fix all these things?

Fixing these problems doesn’t actually make Facebook more money. 20 people bullying an individual means 20 people are having fun at the expense of 1. Those 20 people represent more money than the 1 being bullied, so banning them costs Facebook money.

So safety doesn’t make Facebook money?

Nope. Safety is a loss leader. The wilder the platform, the more people will use it and the more money can be made (moderators and AI are expensive). The testimony session went over this multiple times.

“Regulation would be a barrier for small companies to enter the space and stifle competition.” In effect, Facebook is saying it shouldn’t have to solve these problems, because if it did, every competitor would need its own Death Star-sized moderation robot to comply with the regulations. This is of course not true.

One of the simplest examples is that Facebook allowed targeting of ads by race, gender, and age, which is illegal when advertising real estate, housing, and jobs. Enforcing the rules would cost Facebook money, so they didn’t.

Even though Facebook claims to have fixed ad targeting, it is still possible today to exclude large portions of a race or age group by using negative and positive interests as proxies (not many people under 55 are interested in AARP, “grandkids,” or a host of entertainers from the 1960s).
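Detecting that loophole is not hard, which is part of the point. A hedged sketch of a compliance check follows; the ad-spec format and the proxy-interest list are hypothetical illustrations, not Facebook’s actual API:

```python
# A hedged sketch of a compliance check for the loophole above.
# The ad-spec format and the proxy lists are hypothetical.
AGE_PROXIES = {"aarp", "grandkids", "1960s entertainers"}

def flags_age_proxy(ad_spec):
    """Flag regulated ads that include or exclude age-correlated interests."""
    if ad_spec["category"] not in {"housing", "employment", "credit"}:
        return False
    targeted = set(ad_spec["include_interests"]) | set(ad_spec["exclude_interests"])
    return bool(targeted & AGE_PROXIES)

ad = {"category": "housing",
      "include_interests": [],
      "exclude_interests": ["aarp", "grandkids"]}
print(flags_age_proxy(ad))   # True: excluding these interests skews young
```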

But regulations make it harder to compete…

Social Media Platforms are communities, and early platforms have fewer issues, not more. Twitter, for instance, doesn’t have an online marketplace, so it doesn’t have to deal with classifieds, and not dealing with classifieds reduces the burden of monitoring that people aren’t selling children in said classifieds. Twitter’s 140-character limit also reduces the monitoring requirement, as it is much harder to put up a coded message in so few characters. One of the difficulties Facebook faces is that the breadth of its product means more rules to follow, and more places for users to exploit the system.

During testimony it became apparent that Facebook has no way to know who is buying ads. Simply requiring a business to have a presence in the US doesn’t prove the money isn’t coming from Russia. A new social media company likely wouldn’t sell its own ads, so the onus to make sure that ads comply with FTC rules wouldn’t fall on the new startup, but rather on the ad provider.

For direct buys, an early company is more likely to have direct contact with the buyer, and is therefore more likely to know if the buyer is in the US.

Facebook is huge; why can’t they solve all these things quickly?

A small, agile company would be working on maximizing its reach in 1 or 2 markets. Facebook is working on maximizing every market. Beyond what I said earlier, that they are profitable because they aren’t trying to stop bad things to the fullest extent, Facebook has an issue of scale.

They offer almost every feature in almost every language on the planet. They operate in almost every jurisdiction.

Facebook, just by the nature of its scale, has to be breaking laws every day, because it can’t be aware of every law it needs to comply with, or offer a Terms of Service that stays up to date in every language and jurisdiction it operates in. For this reason its ToS has links in it to other documents. For some of those documents the link isn’t even valid, because things have changed along the way. So at any given moment you may or may not have to comply with some part of the ToS, because it may or may not be actively linked at the moment.

So can you really solve these things?

Recognant already has solutions that do these things. Recognant does automated moderation, and finds human traffickers. We have been integrated into solutions for fighting gender and race discrimination, bullying, and copyright violations. We offer our own products for detecting fake news. We have also previously been in the ad-fraud-detection and ad-rules-enforcement business using AI-based solutions. So yes, while we are limited to English, we do solve many if not all of the problems facing Facebook. Porting the solutions to other languages is simpler than building them the first time, and we’ve only spent about $4.5M on development.

We aren’t going to fix the ToS issues. And we aren’t going to fix the fact that Facebook does bad things on purpose. The Cambridge Analytica “breach” didn’t happen because Facebook is missing AI, or doesn’t have a technology to stop bad things; it happened because Facebook cared more about growth, and about owning unified login so they could collect data, than they cared about users or privacy. We can’t fix that.

But the technology hurdles? Yeah, we have that covered.

Disclaimer: Brandon Wirtz is CEO of www.Recognant.com, an AI company that provides solutions for content moderation, preventing human trafficking, and automating research.