Investigating Reliability and Stability of Crowdsourcing and Human Computational Outputs based on Artificial Intelligence


Zeeshan Rasheed, Naeem Ahmed Ibupoto


Crowdsourcing and human computation provide a scalable and convenient way to perform a variety of human-generated tasks, which can be used to evaluate the effectiveness of different crowdsourcing platforms based on artificial intelligence. Crowdsourced datasets depend on multiple factors, including quality, task reward, and accuracy-measuring filters. The present study was designed to evaluate the reliability, stability, and consistency of the outputs of crowdsourcing platforms. We conducted a longitudinal experiment over a specified period on two crowdsourcing platforms, Amazon Mechanical Turk (MTurk) and CrowdFlower, to demonstrate that reliability varies considerably across platforms, whereas repeating tasks on the same platform yields consistent results. Different tasks on three datasets were performed to evaluate the quality of the task interface, the experience level of the workers supplied by the platform, and the accuracy of outcomes as a function of task completion time. The outcomes revealed a significant (p<0.05) advantage of MTurk over CrowdFlower in terms of reliability, accuracy, and task completion time. Tasks replicated on the two platforms showed significant differences in outcome quality. The data quality of the same tasks repeated at different times was stable within a platform, whereas it differed between crowdsourcing platforms over different time spans. It was concluded that, under standard crowdsourcing platform settings, task completion on different platforms can readily vary in order and magnitude, with varying levels of accuracy.
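As an illustration of how accuracy on two platforms might be compared at the p<0.05 level, the following is a minimal sketch. The abstract does not state which statistical test was used, so the two-proportion z-test here is an assumption, and the function names `accuracy` and `two_proportion_z_test` are hypothetical helpers introduced only for this example.

```python
import math


def accuracy(labels, gold):
    """Fraction of crowd labels that match the gold-standard labels."""
    if len(labels) != len(gold) or not gold:
        raise ValueError("labels and gold must be non-empty and of equal length")
    return sum(l == g for l, g in zip(labels, gold)) / len(gold)


def two_proportion_z_test(correct_a, n_a, correct_b, n_b):
    """Two-sided z-test for the equality of two proportions.

    Assumed test (not taken from the article). Returns (z, p_value).
    """
    p_a = correct_a / n_a
    p_b = correct_b / n_b
    # Pooled proportion under the null hypothesis of equal accuracy.
    pooled = (correct_a + correct_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        # Both samples are all-correct or all-wrong: no evidence of a difference.
        return 0.0, 1.0
    z = (p_a - p_b) / se
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value
```

Calling `two_proportion_z_test(correct_a, n_a, correct_b, n_b)` with the number of correct judgments and total judgments collected on each platform returns the z statistic and p-value; a p-value below 0.05 would indicate a significant difference in accuracy under this assumed test.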
