User engagement platform Vuukle has tied up with Google to help publishers better combat online trolls and derogatory, hateful comments in their comment sections. Google’s Jigsaw, an organisation which claims to “make technology safer” for users, had earlier launched the Perspective API, an artificial intelligence (AI) based tool that provides a numeric ‘toxicity rating’ (on a scale of 1-100) by analysing users’ text or comments.

For example, if a person types “you are an idiot”, the toxicity rating will be high, above 90%, denoting that the statement is offensive. But if the same person types “you are not polite”, the tool will return a much lower rating, possibly below 10%. Publishers can integrate the tool into their comment sections by default and set a maximum toxicity threshold of their choice (e.g. 75%) to make sure that every comment stays below it. This, according to Vuukle, will not only help combat hate speech but also remove the need for a human moderator.
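To make this concrete, here is a minimal sketch of how a publisher might query the API and enforce such a threshold, based on the Perspective API’s publicly documented request format. The API key is a placeholder, and the 75% cut-off simply reuses the example above:

```python
import requests

# Google's Perspective API (Comment Analyzer) endpoint; a valid API key
# from the Google Cloud console is required.
API_URL = "https://commentanalyzer.googleapis.com/v1alpha1/comments:analyze"
API_KEY = "YOUR_API_KEY"  # placeholder

def toxicity_percent(text: str) -> float:
    """Return the Perspective API toxicity score as a 0-100 percentage."""
    payload = {
        "comment": {"text": text},
        "requestedAttributes": {"TOXICITY": {}},
    }
    resp = requests.post(API_URL, params={"key": API_KEY}, json=payload)
    resp.raise_for_status()
    # summaryScore.value is a probability between 0 and 1
    score = resp.json()["attributeScores"]["TOXICITY"]["summaryScore"]["value"]
    return score * 100

def passes_threshold(text: str, max_toxicity: float = 75.0) -> bool:
    """Accept the comment only if it stays below the publisher's threshold."""
    return toxicity_percent(text) < max_toxicity

# "you are an idiot"   -> high score, rejected
# "you are not polite" -> low score, accepted
```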

Who developed the tool

The Perspective API was developed by Jigsaw using ‘Conversation AI’, an open-source artificial intelligence platform that helps develop products and software to improve real-time conversations. Using the Perspective API, Vuukle partnered with Google to help launch the product for publishers in India. Currently, the ‘toxicity rating’ tool is live with publishers like NBC-2, KEYC, Daily Sun, Ozee, BGR, Asian Age and Deccan Chronicle.

How publishers can use the tool

Vuukle said in a blog post that any publisher can integrate the tool into their comments section without having to edit the backend or make changes to their code. This is because Vuukle already offers a comment box plugin that can be downloaded and integrated into the website of any publisher, blogger or online news company. “The entire number crunching is done on Google’s server and Vuukle provides the integration into its comments widget,” Vuukle added.

How the tool was developed

In short, the Perspective API provides toxicity ratings by analysing millions of comments and texts and then employing AI to make the final decision on whether a comment is ‘polite’ or ‘hateful’. “The Jigsaw project collected millions of annotated comments in editorial discussions from Wikipedia,” Vuukle said on its blog. These comments are taken from Wikipedia’s “talk” pages, where several derogatory and hateful comments can be found, primarily due to the lack of an active moderating system. The same approach was then used to gather comments and texts from other publishers and websites with comment sections:

“More often than not, editors are constantly bickering over what stays and what gets deleted. Add to this millions of user comments on New York Times and other online publishers’ stories. This, along with human annotations, resulted in enough data to train deep learning neural networks,” Vuukle added.
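To illustrate the training idea, below is a deliberately tiny sketch: a simple bag-of-words classifier (a stand-in, not Jigsaw’s actual deep neural networks) trained on a handful of made-up “annotated” comments, showing how human labels become a model that outputs a toxicity probability:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Toy stand-in for the millions of human-annotated Wikipedia "talk" page
# comments described above: 1 = annotators judged the comment toxic.
comments = [
    "you are an idiot",
    "this edit is vandalism and you know it, moron",
    "you are not polite",
    "thanks for reviewing my edit",
]
labels = [1, 1, 0, 0]

# A simple bag-of-words classifier; Jigsaw's production models are deep
# neural networks, but the supervised-training principle is the same.
model = make_pipeline(TfidfVectorizer(), LogisticRegression())
model.fit(comments, labels)

# predict_proba returns P(toxic); scale to a 0-100 "toxicity rating".
rating = model.predict_proba(["you are an idiot"])[0][1] * 100
print(f"toxicity: {rating:.0f}%")
```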

Removing hate content still requires human mods

An AI-based feature cannot be entrusted as the sole method of combating hateful content. If a user is asked to rewrite a comment based on the AI’s judgment, this can amount to censorship whenever the AI misinterprets the text. Secondly, we need to understand that the tool learns what counts as “hateful” speech from labels supplied by human programmers and annotators, so the accuracy of what is “hateful” and what is “not hateful” ultimately depends on human decision-making. Although we agree that the tool can outpace any human moderator, the tool itself will require moderation to make sure that the AI does not “misinterpret” comments as hateful.
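One way to combine the two is to use the AI score only for triage, with humans handling the ambiguous middle band. The sketch below assumes a toxicity percentage is already available (e.g. from the function sketched earlier); the band boundaries are hypothetical, not values published by Vuukle or Jigsaw:

```python
from enum import Enum

class Action(Enum):
    PUBLISH = "publish automatically"
    REVIEW = "queue for a human moderator"
    REJECT = "hold / ask the user to rephrase"

def triage(toxicity_percent: float) -> Action:
    # Hypothetical bands, chosen for illustration only.
    if toxicity_percent < 30:
        return Action.PUBLISH  # clearly civil: no human needed
    if toxicity_percent < 75:
        return Action.REVIEW   # ambiguous: human judgment required
    return Action.REJECT       # clearly toxic: block up front

print(triage(16.0))  # Action.PUBLISH
print(triage(55.0))  # Action.REVIEW
print(triage(92.0))  # Action.REJECT
```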

For example, the screenshot below shows that typing just the word “Trump” returns a 16% toxicity rating. There is no explanation as to why the mere use of the word “Trump” produces a 16% toxicity value; ideally, it should be less than 1%.