CLASSICAL TEST THEORY ANALYSIS USING ANATES: A STUDY OF MATHEMATICS READINESS TEST FOR ELEMENTARY SCHOOL STUDENTS
DOI: https://doi.org/10.51878/science.v5i1.3863
Keywords: Classical Test Theory, ANATES, Item Analysis, Mathematics Assessment, Psychometric Properties
ABSTRACT
The assessment of student readiness in mathematics demands robust measurement tools built on sound psychometric principles. This study examines the application of Classical Test Theory (CTT) to a mathematics readiness test analyzed with the ANATES software platform. Data were collected from 214 elementary school students who completed a 15-item multiple-choice assessment. The analysis revealed a moderate reliability coefficient (0.68, 95% CI [0.60, 0.76]), with item discrimination indices ranging from 20% to 84.48%. Item difficulty was concentrated in the moderate range (73.3% of items), and distractor analysis indicated strong performance, with 86.7% of options rated "Very Good." These findings suggest that while the test demonstrates acceptable psychometric properties for classroom use, targeted improvements in reliability and difficulty distribution could enhance its effectiveness as an assessment tool.
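ANATES computes these item statistics automatically. As a rough cross-check outside that tool, the sketch below is illustrative only: the function name, the simulated response data, and the upper-lower 27% grouping convention are assumptions rather than the study's procedure. It shows how KR-20 reliability, item difficulty (proportion correct), and item discrimination are typically derived in CTT from a dichotomously scored response matrix.

```python
import numpy as np

def ctt_item_analysis(responses):
    """Basic CTT statistics for a 0/1-scored response matrix
    (rows = examinees, columns = items): KR-20 reliability,
    per-item difficulty, and upper-lower group discrimination."""
    X = np.asarray(responses, dtype=float)
    n_persons, n_items = X.shape
    total = X.sum(axis=1)

    # Item difficulty: proportion of examinees answering each item correctly
    difficulty = X.mean(axis=0)

    # KR-20 reliability (Cronbach's alpha for dichotomous items)
    item_var = (difficulty * (1.0 - difficulty)).sum()
    kr20 = (n_items / (n_items - 1)) * (1.0 - item_var / total.var(ddof=1))

    # Discrimination: proportion-correct difference between the top 27%
    # and bottom 27% of examinees ranked by total score
    k = max(1, int(round(0.27 * n_persons)))
    order = np.argsort(total)
    discrimination = X[order[-k:]].mean(axis=0) - X[order[:k]].mean(axis=0)

    return kr20, difficulty, discrimination

# Hypothetical usage: simulate a 214-examinee, 15-item dataset driven by a
# single latent ability so the items are correlated, then inspect the indices.
rng = np.random.default_rng(0)
ability = rng.normal(size=(214, 1))
item_location = rng.uniform(-1.0, 1.0, size=(1, 15))
prob_correct = 1.0 / (1.0 + np.exp(-(ability - item_location)))
data = (rng.random((214, 15)) < prob_correct).astype(int)

kr20, p, d = ctt_item_analysis(data)
print(f"KR-20 = {kr20:.2f}")
```

With real 214 x 15 response data in place of the simulated matrix, the resulting reliability, difficulty, and discrimination values correspond to the kinds of indices summarized in the abstract.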
License
Copyright (c) 2025 SCIENCE : Jurnal Inovasi Pendidikan Matematika dan IPA

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.