Comparative Analysis of Psychometric Profiling of AI-Generated and Teacher-Constructed Multiple-Choice Question Items

Glory Evans; Udoh Evans; John Okon Esin

doi:10.66545/s7vd5d22

Authors

Glory Evans, Akwa Ibom State College of Education
Udoh Evans
John Okon Esin, Department of Nautical Science, Maritime Academy of Nigeria, Oron

Competing Interests

The authors have no conflict of interest with respect to the manuscript.

DOI:

https://doi.org/10.66545/s7vd5d22

Keywords:

Multiple-choice questions, AI-generated items, teacher-made items, achievement test, psychometric profiling

Abstract

The emergence of artificial intelligence (AI) in educational assessment has created new opportunities for automating test item development, yet questions remain about the psychometric soundness of AI-generated instruments compared with teacher-constructed tests. This study compared the psychometric properties of multiple-choice questions (MCQs) generated by an AI language model with those of MCQs designed by experienced teachers. Using a quasi-experimental design, two parallel test forms of 30 items each were administered to 300 senior secondary school students in Akwa Ibom State. Item-level analysis covered the difficulty index and the discrimination index, test reliability was estimated using KR-20, and validity was examined through content alignment and correlation with external achievement scores. Results revealed that both tests achieved acceptable reliability coefficients (α = 0.746 for the AI-generated test and α = 0.949 for the teacher-made test). Teacher-made items showed slightly higher discrimination indices (mean = 0.45) than AI-generated items (mean = 0.41), whereas AI-generated items exhibited a more balanced difficulty level, with 74% falling within the optimal difficulty range compared with 69% of teacher-made items. The findings indicate that AI-generated MCQs can be psychometrically sound and comparable to teacher-made ones, though refinement is needed in discriminative power and distractor plausibility. The study concludes that AI holds promise as a supportive tool for large-scale item generation, but human expertise remains essential for ensuring validity and alignment with pedagogical intent.
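For readers unfamiliar with the indices named in the abstract, the following is a minimal sketch of how they are commonly computed from a matrix of scored responses (1 = correct, 0 = incorrect). It is not the authors' analysis code. The function names, the toy data, the upper/lower 27% grouping used for the discrimination index, and the 0.30 to 0.70 "optimal" difficulty band are all assumptions made for illustration; the abstract does not state which conventions the study used.

```python
from statistics import pvariance


def difficulty_index(responses):
    """Proportion of examinees answering each item correctly (p-value per item)."""
    n = len(responses)
    k = len(responses[0])
    return [sum(row[j] for row in responses) / n for j in range(k)]


def discrimination_index(responses, fraction=0.27):
    """Upper-minus-lower group index per item.

    Assumption: groups are the top and bottom `fraction` of examinees
    ranked by total score (the 27% convention is common, not confirmed
    by the source).
    """
    totals = [sum(row) for row in responses]
    order = sorted(range(len(responses)), key=lambda i: totals[i])
    g = max(1, round(fraction * len(responses)))  # group size
    lower = [responses[i] for i in order[:g]]
    upper = [responses[i] for i in order[-g:]]
    k = len(responses[0])
    return [
        (sum(r[j] for r in upper) - sum(r[j] for r in lower)) / g
        for j in range(k)
    ]


def kr20(responses):
    """Kuder-Richardson formula 20 for dichotomously scored items.

    Uses the population variance of total scores; raises
    ZeroDivisionError if every examinee has the same total.
    """
    k = len(responses[0])
    p = difficulty_index(responses)
    totals = [sum(row) for row in responses]
    sum_pq = sum(pj * (1 - pj) for pj in p)
    return (k / (k - 1)) * (1 - sum_pq / pvariance(totals))


def share_in_range(p_values, low=0.30, high=0.70):
    """Fraction of items whose difficulty lies in [low, high].

    Assumption: the band 0.30 to 0.70 is illustrative only.
    """
    inside = [p for p in p_values if low <= p <= high]
    return len(inside) / len(p_values)


# Hypothetical toy data: 5 examinees (rows) by 4 items (columns).
toy = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]

if __name__ == "__main__":
    p = difficulty_index(toy)
    print("difficulty:", p)
    print("discrimination:", discrimination_index(toy))
    print("KR-20:", kr20(toy))
    print("share in assumed optimal band:", share_in_range(p))
```

With real data, each test form would be scored into its own 300 by 30 matrix and the functions applied to each form separately, so that the two sets of indices can be compared side by side.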

Author Biography

Udoh Evans

Chief Lecturer, Department of Research and Strategic Development, Maritime Academy of Nigeria, Oron
