Improving the Working Practices of Humans in the Loop of Artificial Intelligence - An Experimental Study with Online Image Curators

Fabian Stephany, Johann Laux

Abstract
AI technologies like ChatGPT or DALL-E require human curation to remove violent, racist, or non-functional content from recognition and recommendation systems. But which factors determine the output of "humans in the loop"? Our study shows that precise rules and better pay significantly improve the performance of human oversight.
Talk
English
Conference

In January 2023, a Time article by Billy Perrigo revealed that OpenAI used Kenyan online workers to make ChatGPT less toxic. The investigation showed that thousands of human curators were needed to erase hateful, racist, and violent content from the AI chatbot, often under questionable working conditions. This journalistic contribution spotlights a practice that has become standard in the globalised supply chain of AI tools: the role of so-called "humans in the loop" (HITL), who curate the text and images that feed the natural-language and image processing tools we all use on a daily basis. Acknowledging that AI tools will continue to rely on human curation in the near future, the investigation also sparked a discussion on the "adequate" working conditions of these micro-task workers, which is the starting point of the study presented here.

Our work investigates the effects of different types of instructions on the accuracy of image classification tasks performed by content moderators from online labour markets. Specifically, we conducted an A/B test in which 300 online workers classified images under rules, incomplete rules, or standards, combined with different payment structures. To account for participants' varying backgrounds and experience levels, we surveyed them on their educational and occupational background and their experience. This provides valuable information for understanding to what extent factors such as skill and experience influence accuracy in image classification tasks, compared to the type of instructions given and the presence or absence of additional monetary incentives.
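The abstract does not report the study's statistical procedure, so as a purely illustrative sketch, a comparison of classification accuracy between two treatment arms (e.g. precise rules vs. vague standards) could be run as a two-proportion z-test; all counts below are hypothetical, not the study's data.

```python
import math

def two_proportion_z(correct_a, n_a, correct_b, n_b):
    """Two-proportion z-test for a difference in classification accuracy
    between two treatment groups (pooled standard error)."""
    p_a = correct_a / n_a          # accuracy in group A
    p_b = correct_b / n_b          # accuracy in group B
    p_pool = (correct_a + correct_b) / (n_a + n_b)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    return (p_a - p_b) / se

# Hypothetical counts: workers given precise rules (A) vs. vague standards (B)
z = two_proportion_z(correct_a=130, n_a=150, correct_b=105, n_b=150)
print(round(z, 2))  # z above 1.96 would indicate significance at the 5% level
```

A regression with controls for skill and experience, as surveyed in the study, would be the natural next step beyond this pairwise comparison.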

Apart from the immensely important normative debate on the working conditions of humans in the loop, we show that differences in instructions and higher wages actually lead to better outcomes in the human curation of AI technologies, such as image recognition software. The findings suggest that it could be in the economic interest of tech companies employing or outsourcing HITL work to provide "click-workers" with better instructions and higher pay for their tasks. These insights will most likely become more relevant in the near future, as AI tools, such as chatbots and image recognition technologies, become increasingly incorporated into the workflow routines of our economy.