Facebook has amassed a list of swear words believed to be the world's largest, after trawling billions of posts for hate speech left it with a compendium of profanities in almost every language.

The social media network - with more than 2.3 billion users - has long struggled to moderate the torrent of new content it publishes every day.

It now deploys a combination of artificial intelligence and 15,000 human reviewers to block anything "that describes or negatively targets people with slurs".

In doing so, however, it has generated an immense list of foul language which its human "reviewers" can refer to in order to enforce what Facebook calls its "community standards".

The list's existence was confirmed by a Facebook source, who added that it would never be made public "to avoid people gaming the system by misspelling the word in a way that it is still recognisable..."

Earlier this year, Mike Schroepfer, the Facebook chief technology officer, told a conference that the network was engaged in "an intensely adversarial game" when it came to moderating content.

"We build a new technique, we deploy it, people work hard to try to figure out ways around this."

Facebook has an ever-expanding glossary of key words it uses to hunt down hate speech. Illustration / Rod Emmerson

Memes, images and slang have presented challenges to its attempts to regulate posts, and led to tortuous definitions about what does and does not constitute "hate speech".

Currently it is defined as a "direct attack on people based on 'protected characteristics' - race, ethnicity, national origin, religious affiliation, sexual orientation, caste, sex, gender, gender identity, and serious disease or disability." The site bans reference or comparison to "insects and animals that are culturally perceived as intellectually or physically inferior".

But it accepts that the concept of offensive material is slippery - for example, in the way some slurs are adopted by their subjects and "used self-referentially or in an empowering way".

It is not just words. The site has developed a large-scale machine learning system named Rosetta to scan text embedded in pictures posted by users.

This works by deploying sophisticated analysis of the picture alongside its text, to detect whether the "meaning" conveyed by the two in combination is offensive or not. Rosetta is currently thought to be evaluating more than 1 billion images each day.

But Facebook concedes that scanning text in videos presents significant challenges. In the first quarter of 2019, the site said it took action on 4 million pieces of content, up from 2.5 million in the same period of 2018.

Of those, it intervened before users reported any offence in 65 per cent of cases, up from 38 per cent in the same period in 2018.

"Violating material" that is not found before users spot it is passed to 15,000 or so moderators - native language speakers who, the source said, "collectively speak almost every language widely used", and work in more than 20 locations, including America, Ireland, Germany, the Philippines and Spain.