JOURNAL ARTICLE
Keywords: Fairness Audit, AI-generated Text, AI-detectors, ESL pattern bias
Abstract: This study investigated the classification fairness, at the threshold level, of four commercially available online AI detection tools: Copyleaks, ZeroGPT, Scribbr, and Quillbot Premium. Three distinct sets of texts (N=912) of between 400 and 500 words each were submitted for evaluation: fully AI-generated examples (N=307) prompted between 2024 and 2025, published human-written texts (N=302), and ESL graduate student texts (N=303) written before 2021. The texts were analyzed using binary classification thresholds to determine how the three free tools (Copyleaks, ZeroGPT, Scribbr) and the one paid service (Quillbot Premium) performed when checking each writing sample for potentially AI-generated material. The study employed performance metrics to illustrate the problems that arise when such tools apply fixed thresholds. The Chi-square test of independence, along with other inferential statistics, was used to assess inter-detector consistency and potential bias patterns. The results indicated that these tools perform well in identifying AI-generated text; however, significant disparities emerged in the misclassification of human-written texts. In particular, the detectors disproportionately flagged ESL writing with false positives. These findings illustrate the importance of fairness audits in assessing the linguistic sensitivity of such tools, especially in educational settings, where misclassification can have academic or reputational consequences.
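To make the audit design concrete, the following is a minimal sketch, assuming Python with scipy, of how per-group flag counts from one detector could be tabulated and tested with a Chi-square test of independence, and how false-positive rates for the human-written groups could be derived. All counts below are illustrative placeholders, not the study's data.

```python
# Hypothetical sketch of the threshold-level audit described in the abstract:
# each text receives a binary "AI-flagged" verdict from a detector, and a
# chi-square test of independence checks whether flag rates depend on text type.
from scipy.stats import chi2_contingency

# Rows: text source; columns: [flagged as AI, not flagged] for one detector.
# Row totals match the abstract's group sizes; the splits are placeholders.
contingency = [
    [290, 17],   # AI-generated texts (N=307)
    [25, 277],   # published human-written texts (N=302)
    [80, 223],   # ESL graduate student texts (N=303)
]

chi2, p, dof, expected = chi2_contingency(contingency)
print(f"chi2={chi2:.2f}, dof={dof}, p={p:.4g}")

# False-positive rate per human group: flags / group size.
for label, (flagged, clean) in zip(["human-written", "ESL"], contingency[1:]):
    print(f"{label}: FPR = {flagged / (flagged + clean):.3f}")
```

Under this framing, a significant chi-square statistic together with a markedly higher FPR in the ESL row is the kind of pattern the abstract characterizes as disproportionate flagging of ESL writing.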
Article Info: Received: 11 Sep 2025, Received in revised form: 09 Oct 2025, Accepted: 14 Oct 2025, Available online: 18 Oct 2025
DOI: 10.22161/ijtle.4.5.5
Page No: 30-45