Facebook underestimates the issue of languages on its platform

KRAKOW, POLAND - 2018/09/07: The Facebook logo is seen through a magnifying glass on a computer screen. (Photo by Omar Marques/SOPA Images/LightRocket via Getty Images)

As mobile phones bring internet and social media connectivity to every corner of the globe, a growing number of languages is flooding onto Facebook, and the company continues to struggle with hate speech and other types of problematic content on its platform.

Facebook started localizing itself a little over a year and a half ago. The company has used artificial intelligence to improve its machine translation models, allowing it to recognize and translate a greater number of languages.

Here are some interesting figures:

  • “40 percent of Facebook users are not using English.
  • More than 70 percent of Facebook users are outside the United States.
  • It reaches more than 10 percent of the total national population in 26 countries.
  • Facebook is available in 43 languages and is in the process of being translated into another 60 languages. […]
  • 25,000 volunteers helped translate Facebook into Turkish last year, and there are now 9 million Turkish-language users signed up for Facebook.
  • Facebook is working on five Indian languages, including Tamil, Punjabi, and Hindi.”

“Translating more content in more languages also helps us better detect policy-violating content and expand access to the products and services offered on our platforms,” says the company.

What is new is that machine translation (MT) systems have started to be trained on monolingual corpora alone, which can help computers translate more languages, including low-resource ones.

However, on the subject of the digital language divide, Translators Without Borders states: “Machine translation requires vast amounts of data to be effective. At a minimum, 4-5 million strings of data are needed to build a successful machine translation engine, although some professionals recommend at least 100 million strings.”

Size of publicly available language datasets

Facebook is currently the most powerful social media platform. It therefore has a big job on its hands: moderating millions of reported posts each week.

But how can the company monitor billions of posts per day in more than 100 supported languages without disrupting the endless expansion that is fundamental to its business? And what about the content produced in no fewer than 6,900 unsupported languages? Is the company putting profit before user safety?

“Extreme content” on Facebook ranges from videos of violent abuse to hate speech to posts about self-harm or suicide. In its Facebook Files, The Guardian published many examples of the company's moderation policies and highlighted the challenges facing content moderators. In this respect, it is clear that the problem of minority languages is largely ignored.

Many hate speech reports are rejected because Facebook fails to support all of its users' languages and to understand different cultural contexts. Yet the problem of underrepresented languages is being addressed only slowly, even as social networks open up to more and more languages.