A Feasibility Study for Developing a Computerized Adaptive Version of Verbal Ability Test for Gulf Student

PP. 44-57

Author(s)

Mohammed Al Ajmi 1,*, Siti Salina Mustakim 2, Samsilah Roslan 2, Rashid Almehrizi 3

1. Faculty of Educational Studies, Universiti Putra Malaysia, Selangor, Malaysia

2. Faculty of Educational Studies, Universiti Putra Malaysia, 43400 Serdang, Selangor, Malaysia

3. College of Education, Sultan Qaboos University, Alhouz, Muscat, Sultanate of Oman

* Corresponding author.

DOI: https://doi.org/10.5815/ijeme.2024.06.04

Received: 13 Feb. 2024 / Revised: 25 Mar. 2024 / Accepted: 10 May 2024 / Published: 8 Dec. 2024

Index Terms

computerized adaptive testing, verbal ability, item response theory, three-parameter logistic model, marginal reliability

Abstract

Employing Computerized Adaptive Testing (CAT) to evaluate verbal ability offers advantages over traditional fixed-length tests, delivering greater measurement precision while reducing the testing burden. The CAT-Verbal Ability was developed from a large sample of 2,689 participants in Gulf countries, with careful item bank development that verified unidimensionality and local independence and investigated differential item functioning (DIF). The resulting item bank shows high content validity, is unidimensional and locally independent, and is free of DIF; these psychometric qualities were confirmed by CAT simulations based on real response data. The simulations showed a high degree of measurement accuracy (r = 0.73) with just 14 items needed. Beyond being psychometrically sound, the proposed CAT-Verbal Ability demonstrated acceptable marginal reliability, criterion-related validity, sensitivity, and specificity, making it an efficient assessment method that saves time and reduces testing burden while preserving measurement information.
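To make the procedure concrete, below is a minimal, hypothetical Python sketch of the kind of CAT simulation the abstract describes: a three-parameter logistic (3PL) item bank, maximum Fisher information item selection, expected a posteriori (EAP) ability estimation, and a fixed 14-item stopping rule. Everything in the sketch is an illustrative assumption: the item parameters are randomly generated rather than calibrated, and studies of this kind typically rely on dedicated tools such as the R package catR rather than hand-rolled code.

```python
# Minimal CAT simulation sketch (illustrative only): 3PL item bank,
# maximum-information item selection, EAP scoring, fixed test length.
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical 3PL item bank: discrimination a, difficulty b, guessing c.
N_ITEMS = 200
a = rng.uniform(0.8, 2.0, N_ITEMS)
b = rng.normal(0.0, 1.0, N_ITEMS)
c = rng.uniform(0.05, 0.25, N_ITEMS)

def p_correct(theta, i):
    """3PL: P(theta) = c + (1 - c) / (1 + exp(-1.7 * a * (theta - b)))."""
    return c[i] + (1 - c[i]) / (1 + np.exp(-1.7 * a[i] * (theta - b[i])))

def item_information(theta, i):
    """Fisher information of item(s) i at ability theta under the 3PL."""
    p = p_correct(theta, i)
    return (1.7 * a[i]) ** 2 * ((1 - p) / p) * ((p - c[i]) / (1 - c[i])) ** 2

def eap_estimate(administered, responses):
    """Expected a posteriori ability estimate with a N(0, 1) prior."""
    grid = np.linspace(-4, 4, 121)
    posterior = np.exp(-0.5 * grid ** 2)          # prior, up to a constant
    for i, u in zip(administered, responses):
        p = p_correct(grid, i)
        posterior *= p if u else 1 - p            # likelihood of response u
    posterior /= posterior.sum()
    return float((grid * posterior).sum())

def run_cat(true_theta, test_length=14):
    """Administer a fixed-length CAT to one simulated examinee."""
    administered, responses = [], []
    theta_hat = 0.0                                # start at the prior mean
    for _ in range(test_length):
        info = item_information(theta_hat, np.arange(N_ITEMS))
        info[administered] = -np.inf               # never reuse an item
        nxt = int(np.argmax(info))                 # maximum-information rule
        u = int(rng.random() < p_correct(true_theta, nxt))  # simulated answer
        administered.append(nxt)
        responses.append(u)
        theta_hat = eap_estimate(administered, responses)
    return theta_hat

# Correlate true and estimated abilities across simulated examinees,
# analogous to the accuracy coefficient (r) reported in the abstract.
true_thetas = rng.normal(0.0, 1.0, 200)
estimates = np.array([run_cat(t) for t in true_thetas])
print(f"r = {np.corrcoef(true_thetas, estimates)[0, 1]:.2f}")
```

In an operational feasibility study, the generated parameters would be replaced by the calibrated 3PL parameters of the verbal ability item bank, and the fixed test length could be swapped for a standard-error stopping rule.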

Cite This Paper

Mohammed Al Ajmi, Siti Salina Mustakim, Samsilah Roslan, Rashid Almehrizi, "A Feasibility Study for Developing a Computerized Adaptive Version of Verbal Ability Test for Gulf Student", International Journal of Education and Management Engineering (IJEME), Vol.14, No.6, pp. 44-57, 2024. DOI:10.5815/ijeme.2024.06.04
