Validity and reliability of a cognitive test for the Elementary School Curriculum Study and Development course using the Rasch model

An instrument for measuring knowledge in the Elementary School Curriculum Study and Development course was developed to measure prospective elementary school teachers' understanding in applying concepts, implications, and curriculum development at the elementary school level. To produce a reliable and feasible instrument, this study aims to provide empirical evidence of the validity and reliability of the test instrument using Rasch model analysis. The study was conducted by administering 20 items to 142 prospective elementary school teachers at one of the state universities in the city of Mataram, West Nusa Tenggara. The validity and reliability of the instrument were analyzed with the Rasch model using the Winstep program. The unidimensionality test of the 20 items showed a measured variance of 42.7%, exceeding the minimum of 40.0% required by the Rasch model. The person reliability index was 0.65 and the item reliability index was 0.98. In terms of item polarity, all items showed a positive Point Measure Correlation (PMC) value, meaning there is no conflict between the items and the construct being measured. The Outfit Mean Square (MNSQ) values also show that almost all items have values smaller than 1.5, which means the measurements can be considered productive, except for item 13 (3.77) and item 16 (2.01). Both items need to be re-examined because they have validity problems. The results of this study demonstrate that the knowledge test instrument for the Elementary School Curriculum Study and Development course has validity and reliability values that are adequate and empirically feasible for measuring prospective teacher students' knowledge of elementary school curriculum study and development.


INTRODUCTION
Teachers or educators, as the spearhead in advancing the education system, are required not only to teach and educate but also to evaluate the learning process, because the success of learning can be seen from the results of evaluations conducted after learning activities are carried out (Widyaningsih & Yusuf, 2018). Learning evaluation is an organized and sequential process for determining the achievement of student learning objectives (Magdalena et al., 2020). In simple terms, evaluation is a means of measuring the progress students make and the obstacles they face in meeting learning outcomes (Fitriani et al., 2017). Evaluation is very important for teachers to plan before carrying out learning, because evaluation is one of the three important components in improving learning (Hasibuan, 2016). Through evaluation, teachers can assess how far the learning programs in an educational unit have been implemented (Sulistyawati & Guntur, 2019). Furthermore, teachers must have superior competence in planning, implementing, and evaluating learning. Evaluation itself may also serve as a consideration in defining appropriate solutions for improvement (Hapsari et al., 2018). The ability to evaluate learning determines how the next lesson plan will be made. Therefore, educators must master a variety of evaluation techniques based on data collected during the learning process, which will later be used to measure the extent to which the planned learning objectives have been achieved.
In addition to evaluating, teachers or educators must also be able to develop the measurement tools or instruments used in evaluating the learning process in line with the expected learning outcomes, because the evaluation process is preceded by measurement. The quality of a measuring instrument is directly proportional to the quality of the evaluation to be carried out (Prijowuntato, 2020). The measurement process, followed by the assessment process and ending with evaluation, is a series of learning assessment activities that educators must carry out. The development of measuring instruments is therefore an important step in learning evaluation. The measuring or evaluation instrument developed must not only fit the characteristics of the various components involved in the learning process but must also function as an indicator of the achievement of the learning objectives set out in the lesson plan (RPP).
Measurement of the achievement of learning objectives in evaluation activities is generally carried out with test and non-test techniques. A test is a tool that aims to collect information about the achievement of educational or learning objectives (Wahyudi, 2010); it is also a particular method or procedure used for measurement and assessment in the field of education (Kadir, 2015). In simple terms, the test technique presents questions that have right or wrong answers, which allows the teacher to gauge students' understanding. Test techniques come in various forms, including written and oral tests. The types of questions on a written test also vary, namely multiple choice, short answer, and essay. An oral test requires students to answer questions verbally. In addition to test techniques, there are non-test techniques, which assess student learning outcomes without "testing" students: systematic observation, interviews, questionnaire distribution, scale-based document analysis (both attitude scales and rating scales), case studies, and sociometry (Mania, 2008). Apart from observation and questionnaires, another concrete form of non-test technique is the inventory, an instrument that contains reports on student progress during the learning process (Prijowuntato, 2020). Other non-test instruments include project assignments, which teachers commonly use when they want to measure student skills; project assignments may be intended to assess processes, outcomes, or both. In measurement activities, educators are free to choose measurement techniques and to develop assessment instruments for evaluating learning outcomes.
A good instrument or measuring tool in an assessment process must have three main characteristics: it must be valid, reliable, and highly useful (Gronlund et al., 2009). In addition, a good instrument should be economical and practical (Azwar, 2015). Setyosari (2013) argues that the two most important properties an instrument must have are validity and reliability, and this also applies to assessment instruments. An instrument is used to reveal a phenomenon or fact that will later be summarized into data; this is why an instrument must have validity and reliability (Arifin, 2017). Good learning outcomes assessment instruments must meet several criteria, including good item validity and stable items in the sense of good item reliability values. The validity and reliability of items are important because they determine the level of confidence in the outcomes measured when evaluating learning outcomes. The higher the validity and reliability of an instrument, the more accurate the data obtained from a study or measurement (Hayati & Lailatussaadah, 2016). Validity and reliability are also important factors in determining whether a measurement or test meets good criteria (Wahyuningsih, 2015).
An item is said to be valid if it measures what it is supposed to measure; that is, if the desired learning outcome is a change in the aspects of knowledge, skills, and attitudes, then the items developed must cover all three of these aspects (Sumintono & Widhiarso, 2015). Another illustration of the validity of a measurement instrument: when we want to measure length, the instrument we use is a ruler or tape measure. A ruler cannot be used to measure the mass of an object, because for that purpose it is of course invalid and cannot be trusted.
The validity of an instrument can be established from three aspects: content validity, construct validity, and criterion validity (Yusup, 2018). Content validity provides evidence, for each item on the measuring instrument, of whether it reflects all the achievements to be measured, and it is assessed by experts. Several things are usually considered in validating content: whether the items represent the expected achievement indicators; whether the number of questions is appropriate; whether the answer format is clear; whether scoring is correct and clear; whether instructions for completing the instrument are visible; the time allotted for working on the questions; and the layout and grammar used.
In contrast to content validity, construct validity focuses on the extent to which an instrument produces measurement results consistent with the definition of the variable. Establishing construct validity is very important. A simple example: when we measure the length and width of a book, a ruler in centimeters is sufficient, and when measuring body weight, a scale in kilograms can be used. These concrete quantities are easy to measure and their units are clear. It is different when we try to measure creativity, thought processes, mathematical communication skills, or students' critical thinking abilities: these abstract constructs must be reconstructed into something more concrete that can be measured quantitatively. That is the importance of construct validity in an instrument or measuring tool.
Criterion validity focuses on comparing the instrument in question with other instruments considered comparable (Yusup, 2018). As the name implies, it concerns whether the instrument is in accordance with the desired criteria (Arifin, 2017). Criterion validation is needed because the criteria of an instrument are not always ideal: an instrument may have high content and construct validity but be impractical and expensive. Criterion validity therefore compares the instrument with instruments that have ideal criteria so that further development can meet the expected criteria. Criterion validity is divided into concurrent and predictive validity.
The reliability of a test essentially relates to the constancy of a set of test items when repeatedly administered to the same object (Nuswowati et al., 2010). An instrument is constant or consistent if its items, tested several times on the same or nearly the same subjects or respondents, yield similar results (Rosseni et al., 2009). For example, an exam given to a student today should give a score that is not much different if given again the next day, assuming no learning or forgetting occurs in between (Sumintono & Widhiarso, 2015). Two aspects are measured in terms of reliability: internal consistency and stability. Stability means that when measuring the same object twice, the measuring instrument tends to show the same results; both internal consistency and stability are required.
Reliability and validity together determine the credibility of a measuring instrument. A reliable measuring instrument is not necessarily valid, but a measuring instrument that has been declared valid must also be reliable before it can be used in learning evaluation. This is why the first test in determining the credibility of a measuring instrument is the validity test; items declared valid are then tested for reliability. The aspect of validity most closely related to reliability is content validity, while the aspects of reliability most closely related to validity are internal consistency between items, item-total correlation, and split-half reliability.
Analysis of test instruments in educational evaluation can be done through two approaches: the classical test theory (CTT) approach and the modern approach using Rasch modelling. One weakness of classical test theory is that the characteristics of items can be inconsistent, depending on the ability of the respondents or test takers at the time they work on the questions. This inconsistency can be overcome by measurement with Rasch modelling.
Rasch modelling is a measurement model that simultaneously estimates the ability of each respondent who answers the items and the difficulty of each item (Searing, 2008). Analysis with Rasch modelling produces fit statistics, which tell the researcher whether the data ideally show that high-ability respondents give answer patterns consistent with the level of item difficulty (Misbach & Sumintono, 2014). In Rasch modelling, the validity and reliability of a test instrument can be determined through analyses such as item polarity, unidimensionality, item-respondent mapping, item and person reliability, and several other forms of analysis (Bond & Fox, 2007). Therefore, this research was conducted to obtain empirical evidence of the validity and reliability of items developed to measure the knowledge of prospective teacher students, and as an effort to improve the quality of evaluation tools that can measure the ability of prospective elementary school teachers in the Elementary School Curriculum Study and Development course.
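The core of the dichotomous Rasch model is that the probability of a correct response depends only on the difference between person ability and item difficulty, both expressed in logits. A minimal sketch (not part of the study's analysis, which used the Winstep program):

```python
import math

def rasch_probability(ability: float, difficulty: float) -> float:
    """Probability of a correct answer under the dichotomous Rasch model:
    P(X = 1) = exp(theta - delta) / (1 + exp(theta - delta)),
    where theta is person ability and delta is item difficulty (logits)."""
    return 1.0 / (1.0 + math.exp(-(ability - difficulty)))

# A person whose ability equals the item difficulty has a 50% chance:
print(rasch_probability(0.0, 0.0))        # -> 0.5
# Ability above the item difficulty raises the probability above 50%:
print(rasch_probability(1.0, 0.0) > 0.5)  # -> True
```

Because only the difference theta - delta matters, the model places persons and items on a single shared logit scale, which is what makes item-person maps and separation indices possible.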

METHOD
This research is a quantitative survey study conducted on 142 prospective elementary school teacher students as the research sample. The respondents were students who had taken the Elementary School Curriculum Study and Development course. The instrument in this study consisted of 20 multiple-choice items aimed at measuring the learning outcomes of prospective teachers in the knowledge aspect. Each objective question has four answer choices (A, B, C, and D) with only one correct option, and all questions were created and administered as a quiz on Google Forms. The research data were obtained from the respondents' answers to the instrument items. Items were scored dichotomously: a correct answer was scored one (1) and an incorrect answer zero (0). The data were then analyzed with the Winstep program to obtain the validity and reliability of the items according to the Rasch model.
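The dichotomous scoring described above can be sketched as follows; the four-item answer key is hypothetical and for illustration only (the actual instrument has 20 items):

```python
# Hypothetical answer key for a four-item illustration.
ANSWER_KEY = ["A", "C", "B", "D"]

def score_responses(responses, key=ANSWER_KEY):
    """Code each response 1 if it matches the key, otherwise 0."""
    return [1 if r == k else 0 for r, k in zip(responses, key)]

print(score_responses(["A", "B", "B", "D"]))  # -> [1, 0, 1, 1]
```

Stacking one such 0/1 vector per respondent yields the person-by-item score matrix that Rasch software takes as input.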

Construct validity
According to Nurfaizin (2019), construct validity concerns how far the test items are able to measure what they are really intended to measure according to a specific concept or a predetermined conceptual definition. Construct validity is commonly used for instruments intended to measure conceptual variables, both typical-performance instruments, such as those measuring attitudes, interests, self-concept, locus of control, leadership style, and achievement motivation, and maximum-performance instruments, such as aptitude tests, intelligence tests, and emotional intelligence tests (Kintner & Sikorskii, 2008).
To determine the construct validity of an instrument, a theoretical review of the concept of the variable to be measured must be carried out, starting from the formulation of the construct and the determination of dimensions and indicators, through to the elaboration and writing of the instrument items. The formulation of the construct must be based on a synthesis of theories regarding the concept of the variable to be measured, through a logical and careful process of analysis and comparison. Following this theoretical review, the construct validation of an instrument must be carried out through expert review or justification, or through the assessment of a panel of people who master the substance or content of the variables to be measured (Ariffin et al., 2010).
The first analysis carried out on the items was an analysis of construct validity, done by examining item polarity. As shown in Figure 1, all items have a positive Point Measure Correlation (PT-MEASURE CORR, or PMC) value. This shows that there is no conflict between the items and the construct being measured.
Furthermore, looking at the OUTFIT value in the Mean Square column, almost all items have values smaller than 1.5; only three items have Outfit MNSQ values above 1.5, namely item 13 (3.77), item 16 (2.01), and item 17 (1.63). Item 13 has a ZSTD of 3.7, item 16 of 3.2, and item 17 of 2.6. The researchers therefore decided to review these three items before dropping them from the research instrument.
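Outfit MNSQ is the unweighted mean of squared standardized residuals across respondents; values well above 1.5 flag unexpected responses, such as low-ability respondents answering a hard item correctly. A minimal sketch under the dichotomous Rasch model, with hypothetical abilities:

```python
import math

def rasch_p(theta: float, delta: float) -> float:
    """Rasch probability of a correct response."""
    return 1.0 / (1.0 + math.exp(-(theta - delta)))

def outfit_mnsq(item_scores, abilities, difficulty):
    """Outfit mean-square for one item: the mean squared standardized
    residual (x - P)^2 / (P * (1 - P)) over all respondents."""
    total = 0.0
    for x, theta in zip(item_scores, abilities):
        p = rasch_p(theta, difficulty)
        total += (x - p) ** 2 / (p * (1 - p))
    return total / len(item_scores)

abilities = [-1.0, -0.5, 0.5, 1.0]
# Expected pattern (low ability wrong, high ability right) fits well:
print(outfit_mnsq([0, 0, 1, 1], abilities, 0.0) < 1.5)  # -> True
# Reversed pattern produces a large, misfitting outfit value:
print(outfit_mnsq([1, 1, 0, 0], abilities, 0.0) > 1.5)  # -> True
```

This illustrates why items 13, 16, and 17 warrant review: their high outfit values mean many responses contradict the ability ordering the model expects.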
A subsequent analysis examined the unidimensionality of the construct of elementary school curriculum study and development knowledge by inspecting the Principal Component Analysis of Rasch residuals, as shown in Figure 2. The unidimensionality test is one of the tests needed to establish the validity of the instrument (Andrich, 1988). From Figure 2, it is known that the 20 items together have a measured variance of 42.7%, which exceeds the minimum of 40% required by the Rasch model.

Construct Reliability
Reliability is the consistency of measurement (Anjos et al., 2016). Fahruna and Fahmi (2017) state that reliability refers to the degree to which the instruments used in research can be trusted as data collection tools and are able to reveal real information in the field. Azwar (2015) states that reliability is a property of a questionnaire as an indicator of variables or constructs: a questionnaire is said to be reliable if a person's answers to its statements are consistent or stable over time. The reliability of a test refers to its degree of stability, consistency, predictive power, and accuracy. Measurements with high reliability produce reliable data.
Reliability is an index that shows the extent to which a measuring instrument can be trusted (Khaeruman & Saefullah, 2017). If a measuring device is used twice to measure the same phenomenon and the results obtained are relatively consistent, then the device is reliable. In other words, reliability shows the consistency of a measuring device in measuring the same phenomenon. According to Nuswowati et al. (2010), reliability shows the extent to which the measurement results of a tool can be trusted; the results must have a level of consistency and stability.
The summary statistics in Figure 3 show the results of the analysis of the items and the individual respondents. Item reliability is 0.98, which falls in the very good or excellent category (Sumintono & Widhiarso, 2015). In addition, an item separation index of 6.25 was obtained; converted to strata, this separation differentiates the test items into 8.67, rounded up to 9 (nine), levels of difficulty. The greater the separation value, the better the quality of the instrument in terms of both respondents and items, because it can better distinguish groups of respondents and groups of items (Sumintono & Widhiarso, 2015; Erfan et al., 2020). The Cronbach Alpha (KR-20) value shown in Figure 3 is 0.72, above the minimum of 0.70 (Pallant, 2010). This figure shows that even if the items were analyzed using classical test theory, a good or stable reliability result would be obtained (Erfan et al., 2020).
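The KR-20 coefficient is the special case of Cronbach's alpha for dichotomous items: KR-20 = (k / (k - 1)) * (1 - sum(p_j * q_j) / var(total)), where p_j is the proportion answering item j correctly and q_j = 1 - p_j. A self-contained sketch with a small hypothetical 0/1 score matrix:

```python
def kr20(score_matrix):
    """Kuder-Richardson 20 reliability for dichotomous (0/1) item scores.
    score_matrix: one row per respondent, one 0/1 column per item."""
    n_items = len(score_matrix[0])
    n_persons = len(score_matrix)
    # Sum of p*q over items (p = proportion correct on the item).
    pq_sum = 0.0
    for j in range(n_items):
        p = sum(row[j] for row in score_matrix) / n_persons
        pq_sum += p * (1 - p)
    # Population variance of the respondents' total scores.
    totals = [sum(row) for row in score_matrix]
    mean = sum(totals) / n_persons
    variance = sum((t - mean) ** 2 for t in totals) / n_persons
    return (n_items / (n_items - 1)) * (1 - pq_sum / variance)

# Hypothetical matrix: 4 respondents x 3 items.
data = [[1, 1, 1], [1, 1, 0], [1, 0, 0], [0, 0, 0]]
print(kr20(data))  # -> 0.75
```

Applied to the study's full 142 x 20 score matrix, this computation would reproduce the 0.72 reported by Winstep.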
The high item reliability is not accompanied by an equally high person reliability. Figure 3 shows a person reliability of 0.65, which falls in the sufficient category (Sumintono & Widhiarso, 2015). Moreover, the person separation index is only 1.36, which, when entered into the strata equation, gives a value (H) of 2.14, rounded to 2. This indicates that the respondents can in general be divided into only two groups. A condition in which the items are unable to separate individuals or respondents into more than two strata may be caused by item quality that is too low for good individual separation (Jailani, 2011). Nevertheless, the item reliability in the excellent category shows that this instrument is adequate and can be used to measure knowledge of elementary school curriculum study and development among prospective elementary school teachers.
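Both strata values quoted above follow the Wright-Masters formula H = (4G + 1) / 3, where G is the separation index (rounding to two decimals gives 2.15 for the person value; the paper reports the same quantity truncated as 2.14):

```python
def strata(separation: float) -> float:
    """Number of statistically distinct levels (strata) implied by a
    Rasch separation index G: H = (4G + 1) / 3."""
    return (4 * separation + 1) / 3

print(round(strata(6.25), 2))  # item separation 6.25   -> 8.67 (~9 strata)
print(round(strata(1.36), 2))  # person separation 1.36 -> 2.15 (~2 strata)
```

The formula converts a signal-to-noise-style separation index into the number of ability (or difficulty) levels the measurement can reliably distinguish, which is why 6.25 yields nine item strata while 1.36 yields only two person strata.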

CONCLUSION
The conclusions that can be drawn from this study are that, based on the results of the construct validity test of the 20 items, 17 items show no conflict between the items and the measured construct, with Outfit MNSQ values less than or equal to 1.5. In addition to