CLASSICAL TEST THEORY ANALYSIS USING ANATES: A STUDY OF MATHEMATICS READINESS TEST FOR ELEMENTARY SCHOOL STUDENTS

Authors

  • RIKY SHEPTIAN Universitas Negeri Jakarta, East Jakarta
  • IVA SARIFAH Universitas Negeri Jakarta, East Jakarta
  • RIYADI RIYADI Universitas Negeri Jakarta, East Jakarta

DOI:

https://doi.org/10.51878/science.v5i1.3863

Keywords:

Classical Test Theory, ANATES, Item Analysis, Mathematics Assessment, Psychometric Properties

Abstract

The assessment of student readiness in mathematics demands robust measurement tools based on sound psychometric principles. This study examines the application of Classical Test Theory (CTT) in analyzing a mathematics readiness test through the ANATES software platform. Data were collected from 214 elementary school students completing a 15-item multiple-choice assessment. The analysis revealed a moderate reliability coefficient (0.68, 95% CI [0.60, 0.76]), with discrimination indices ranging from 20% to 84.48%. Item difficulty levels showed significant concentration in the moderate range (73.3% of items), while distractor analysis indicated exceptional performance with 86.7% of options rated as "Very Good." These findings suggest that while the test demonstrates acceptable psychometric properties for classroom use, targeted improvements in reliability and difficulty distribution could enhance its effectiveness as an assessment tool.
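The CTT statistics summarized in the abstract (item difficulty as proportion correct, upper-lower discrimination, and internal-consistency reliability) can be sketched in a few lines of Python. The response matrix, the 27% group fraction, and the function names below are illustrative assumptions, not the study's data or the ANATES implementation; KR-20 is used here as the dichotomous-item form of Cronbach's alpha.

```python
# Classical Test Theory item statistics on a dichotomous (0/1) response matrix.
# Synthetic data for illustration; ANATES reports comparable statistics
# (difficulty, upper-lower discrimination, KR-20/alpha reliability).

def item_difficulty(responses):
    """Proportion correct per item (p-value); higher = easier item."""
    n = len(responses)
    k = len(responses[0])
    return [sum(row[j] for row in responses) / n for j in range(k)]

def discrimination_upper_lower(responses, frac=0.27):
    """Upper-lower discrimination index per item: p_upper - p_lower."""
    n = len(responses)
    g = max(1, round(frac * n))                      # size of each group
    ranked = sorted(responses, key=sum, reverse=True)  # by total score
    upper, lower = ranked[:g], ranked[-g:]
    k = len(responses[0])
    return [sum(r[j] for r in upper) / g - sum(r[j] for r in lower) / g
            for j in range(k)]

def kr20(responses):
    """KR-20 reliability (Cronbach's alpha for dichotomous items)."""
    n = len(responses)
    k = len(responses[0])
    p = item_difficulty(responses)
    pq = sum(pj * (1 - pj) for pj in p)              # sum of item variances
    totals = [sum(row) for row in responses]
    mean = sum(totals) / n
    var = sum((t - mean) ** 2 for t in totals) / (n - 1)  # total-score variance
    return (k / (k - 1)) * (1 - pq / var)

# Synthetic matrix: 8 examinees (rows) x 5 items (columns), 1 = correct.
data = [
    [1, 1, 1, 1, 1],
    [1, 1, 1, 1, 0],
    [1, 1, 1, 0, 0],
    [1, 1, 0, 1, 0],
    [1, 0, 1, 0, 0],
    [0, 1, 0, 0, 1],
    [1, 0, 0, 0, 0],
    [0, 0, 0, 0, 0],
]

print("difficulty     :", item_difficulty(data))
print("discrimination :", discrimination_upper_lower(data))
print("KR-20          :", round(kr20(data), 3))
```

ANATES derives discrimination from contrasted upper and lower scoring groups; the 27% split used here is the common convention, not necessarily the study's exact setting.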

References

Allen, M. J., & Yen, W. M. (2002). Introduction to measurement theory. Waveland Press.

Anastasi, A., & Urbina, S. (2017). Psychological testing (7th ed.). Pearson.

Anderson, L. W., & Krathwohl, D. R. (2001). A taxonomy for learning, teaching, and assessing: A revision of Bloom's taxonomy of educational objectives. Longman.

Brennan, R. L. (2006). Educational measurement (4th ed.). American Council on Education/Praeger.

Cohen, J. (1988). Statistical power analysis for the behavioral sciences (2nd ed.). Lawrence Erlbaum Associates.

Cronbach, L. J. (1951). Coefficient alpha and the internal structure of tests. Psychometrika, 16(3), 297-334.

DeMars, C. E. (2018). Classical test theory. In The SAGE encyclopedia of educational research, measurement, and evaluation (pp. 277-280). SAGE Publications, Inc.

DeVellis, R. F. (2016). Scale development: Theory and applications (4th ed.). Sage Publications.

DiBattista, D., & Kurzawa, L. (2011). Examination of the quality of multiple-choice items on classroom tests. Canadian Journal for the Scholarship of Teaching and Learning, 2(2), 4.

Dimitrov, D. M. (2015). Statistical methods for validation of assessment scale data in counseling and related fields. John Wiley & Sons.

Ebel, R. L. (1972). Essentials of educational measurement. Prentice-Hall.

Embretson, S. E. (1996). The new rules of measurement. Psychological Assessment, 8(4), 341-349.

Fan, X. (1998). Item response theory and classical test theory: An empirical comparison of their item/person statistics. Educational and Psychological Measurement, 58(3), 357-381.

Haladyna, T. M. (2004). Developing and validating multiple-choice test items (3rd ed.). Lawrence Erlbaum Associates.

Haladyna, T. M., & Rodriguez, M. C. (2013). Developing and validating test items. Routledge.

Hambleton, R. K. (2009). Applications of item response theory to improve educational and psychological measurement. Sage Publications.

Hambleton, R. K., & Jones, R. W. (1993). Comparison of classical test theory and item response theory and their applications to test development. Educational Measurement: Issues and Practice, 12(3), 38-47.

Harvill, L. M. (1991). Standard error of measurement. Educational Measurement: Issues and Practice, 10(2), 33-41.

Hopkins, K. D. (1998). Educational and psychological measurement and evaluation (8th ed.). Allyn & Bacon.

Johnson, R. L., & Smith, K. A. (2019). A meta-analysis of mathematics assessment reliability in classroom settings. Journal of Educational Measurement, 56(2), 223-247.

Lord, F. M. (1952). A theory of test scores. Psychometric Monographs, 7.

Lord, F. M., & Novick, M. R. (1968). Statistical theories of mental test scores. Addison-Wesley.

Magno, C. (2017). Demonstrating the difference between classical test theory and item response theory using derived data. The Journal of Educational Research and Practice, 7(1), 6.

Messick, S. (1989). Validity. In R. L. Linn (Ed.), Educational measurement (3rd ed., pp. 13-103). Macmillan.

Messick, S. (1995). Validity of psychological assessment: Validation of inferences from persons' responses and performances as scientific inquiry into score meaning. American Psychologist, 50(9), 741-749.

Nunnally, J. C. (1978). Psychometric theory (2nd ed.). McGraw-Hill.

Nunnally, J. C., & Bernstein, I. H. (1994). Psychometric theory (3rd ed.). McGraw-Hill.

Rodriguez, M. C. (2011). Item-writing practice and evidence. In S. N. Elliott, R. J. Kettler, P. A. Beddow, & A. Kurz (Eds.), Handbook of accessible achievement tests for all students (pp. 201-216). Springer.

Sadler, P. M. (1998). Psychometric models of student conceptions in science: Reconciling qualitative studies and distractor-driven assessment instruments. Journal of Research in Science Teaching, 35(3), 265-296.

Shepard, L. A. (2000). The role of assessment in a learning culture. Educational Researcher, 29(7), 4-14.

Thorndike, R. L. (1951). Reliability. In E. F. Lindquist (Ed.), Educational measurement (pp. 560-620). American Council on Education.

Tomlinson, C. A. (2014). The differentiated classroom: Responding to the needs of all learners (2nd ed.). ASCD.

Published

2025-02-10

How to Cite

SHEPTIAN, R., SARIFAH, I., & RIYADI, R. (2025). CLASSICAL TEST THEORY ANALYSIS USING ANATES: A STUDY OF MATHEMATICS READINESS TEST FOR ELEMENTARY SCHOOL STUDENTS. SCIENCE : Jurnal Inovasi Pendidikan Matematika Dan IPA, 5(1), 20-29. https://doi.org/10.51878/science.v5i1.3863

Section

Articles
