LSCI ITEM PARAMETERS
Classical test theory (CTT) provides a framework for measuring the item parameters and reliability of the LSCI. A CTT analysis allows us to compute the difficulty and discrimination of each item on the LSCI. We define item difficulty as the fraction of students responding incorrectly to an item. It ranges from 0.0 to 1.0, with larger values indicating more difficult items. This differs from the often-used P-value, which is defined as the fraction of students selecting the correct response (Crocker and Algina 1986); the two are related by difficulty = 1 − P. P-values can be confusing to interpret as measures of item difficulty, since easier items have larger P-values. We use our definition of item difficulty for clarity, so that harder items have larger difficulty values. The conventionally accepted range for item difficulty is 0.2 to 0.8 (Bardar et al. 2007). Item discrimination is defined by the point biserial, which is the correlation between students' scores on an individual item and their total scores on the LSCI as a whole (Lord and Novick 1968). An item's discrimination ranges from −1.0 to +1.0, with a value of zero meaning there is no correlation. A negative point biserial indicates that a student's success on the instrument is anticorrelated with a correct response to the item (an indication of a problem with the item). The greater an item's discrimination, the better the item distinguishes high-performing students from low-performing students. Conventionally accepted values for item discrimination are typically between 0.3 and 0.7 (Bardar et al. 2007; Allen and Yen 1979).
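Both CTT parameters defined above can be computed directly from a scored response matrix. The sketch below is a minimal illustration, not the authors' analysis code; it assumes responses are already scored as 1 (correct) or 0 (incorrect), and the function names `item_difficulty` and `item_discrimination` are hypothetical. Following the definition in the text, the total score used for the point biserial includes the item itself.

```python
import numpy as np


def item_difficulty(responses):
    """Item difficulty as defined here: fraction of students answering each item incorrectly.

    `responses` is a (n_students, n_items) array scored 1 = correct, 0 = incorrect
    (a hypothetical input layout).
    """
    scored = np.asarray(responses, dtype=float)
    p_values = scored.mean(axis=0)  # classical P-value: fraction correct per item
    return 1.0 - p_values           # difficulty = 1 - P, so harder items get larger values


def item_discrimination(responses):
    """Point-biserial discrimination: correlation of each item score with the total score."""
    scored = np.asarray(responses, dtype=float)
    totals = scored.sum(axis=1)  # each student's total score (includes the item itself)
    n_items = scored.shape[1]
    discrimination = np.empty(n_items)
    for j in range(n_items):
        # The Pearson correlation between a 0/1 item and the total score is the point biserial.
        # If every student gives the same answer to item j, the correlation is undefined (NaN).
        discrimination[j] = np.corrcoef(scored[:, j], totals)[0, 1]
    return discrimination


# Usage on a tiny, made-up data set (4 students x 3 items), not LSCI data.
toy = [[1, 1, 0],
       [1, 0, 0],
       [1, 1, 1],
       [0, 0, 0]]
print(item_difficulty(toy))      # fractions incorrect: 0.25, 0.5, 0.75
print(item_discrimination(toy))
```

Some analyses instead correlate each item with the total score excluding that item (a "corrected" point biserial), which gives lower values; the sketch follows the uncorrected definition stated in the text.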
Since all CTT statistics are highly sample dependent (Hambleton and Jones 1993; Thompson 2003), we expect the items' difficulties and discriminations to differ when they are calculated using only pre-instruction responses versus only post-instruction responses. Table 1 shows each item's pre-instruction and post-instruction difficulty and discrimination. As expected, the values of item difficulty and discrimination change from pre-instruction to post-instruction.
We bolded the item parameters in Table 1 that fall outside the conventionally accepted ranges. Pre-instruction, all but 3 of the 26 items on the LSCI are flagged, and all of these flagged items have difficulties greater than 0.80, discriminations less than 0.20, or both. These results make sense if we take into account that the vast majority of students come to Astro 101 with little prior explicit instruction on the nature of light and spectroscopy in the context of astronomy, and hold many common naïve ideas and reasoning difficulties that are elicited by the items on the LSCI (Bardar et al. 2007; Rudolph et al. 2010; Deming and Hufnagel 2001). The high pre-test difficulty values show that students find these items hard to reason through. Additionally, the low pre-test discrimination values indicate that these items challenge high- and low-scoring students alike. The LSCI is challenging for most Astro 101 students prior to instruction; however, as we shall see in Section 4, many Astro 101 students are able to correctly answer a majority of the items on the LSCI post-instruction.
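The flagging described above amounts to a range check on each item's two parameters. The sketch below assumes the conventional ranges quoted earlier (0.2 to 0.8 for difficulty, 0.3 to 0.7 for discrimination) and treats the endpoints as acceptable, which is an assumption; the function name `flag_items` is hypothetical, and the example values are invented rather than taken from Table 1. Because Table 1 reports rounded values, an item listed at a boundary value may still fall just outside a range before rounding.

```python
import numpy as np

# Conventional ranges quoted in the text; endpoints are treated as acceptable (an assumption).
DIFFICULTY_RANGE = (0.2, 0.8)
DISCRIMINATION_RANGE = (0.3, 0.7)


def flag_items(difficulty, discrimination,
               diff_range=DIFFICULTY_RANGE, disc_range=DISCRIMINATION_RANGE):
    """Return 1-based numbers of items with any parameter outside its conventional range."""
    diff = np.asarray(difficulty, dtype=float)
    disc = np.asarray(discrimination, dtype=float)
    out_of_range = (
        (diff < diff_range[0]) | (diff > diff_range[1])
        | (disc < disc_range[0]) | (disc > disc_range[1])
        | np.isnan(disc)  # an undefined point biserial is also worth inspecting
    )
    return [int(i) + 1 for i in np.flatnonzero(out_of_range)]


# Invented parameters for four items (not LSCI values).
difficulty = [0.50, 0.85, 0.10, 0.60]
discrimination = [0.40, 0.35, 0.50, 0.15]
print(flag_items(difficulty, discrimination))  # -> [2, 3, 4]
```

Running the same check once on pre-instruction parameters and once on post-instruction parameters reproduces the two sets of flags discussed in this section.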
The post-instruction item parameters differ substantially from those determined from pre-instruction responses. The majority of items decrease in difficulty and increase in discrimination. This makes sense: as students learn to reason about light and spectroscopy, the LSCI's items become easier, and students who do well on the LSCI overall are also more likely to answer individual items correctly. These patterns can be seen in Figure 1, which plots the post-instruction difficulty values of the LSCI's items as a function of their pre-instruction difficulties, and in Figure 2, which plots the post-instruction discrimination values of the LSCI's items as a function of their pre-instruction discriminations. Figure 3 combines the information in Figures 1 and 2 into one graph by plotting, for each item, the difference between the post-instruction and pre-instruction discrimination values (which is positive for almost all items) versus the difference between the post-instruction and pre-instruction difficulty values (which is negative for almost all items, indicating that items become easier from pre-instruction to post-instruction). Together, Figures 1–3 show that the majority of items decrease in difficulty and increase in discrimination from pre-instruction to post-instruction.
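The quantities plotted in Figure 3 are per-item differences, post-instruction minus pre-instruction, of the two parameters. A minimal sketch of that computation follows, assuming the pre- and post-instruction parameters have already been computed separately from pre- and post-instruction responses and are stored in the same item order. The function names `parameter_shifts` and `improving_items` and the example values are hypothetical; the second invented item shows the opposite pattern, becoming harder and less discriminating.

```python
import numpy as np


def parameter_shifts(pre_difficulty, post_difficulty, pre_discrimination, post_discrimination):
    """Per-item post-minus-pre changes in difficulty and discrimination (Figure 3-style axes)."""
    d_difficulty = np.asarray(post_difficulty, float) - np.asarray(pre_difficulty, float)
    d_discrimination = np.asarray(post_discrimination, float) - np.asarray(pre_discrimination, float)
    return d_difficulty, d_discrimination


def improving_items(d_difficulty, d_discrimination):
    """1-based items that became easier (negative change in difficulty) and more discriminating."""
    mask = (np.asarray(d_difficulty) < 0) & (np.asarray(d_discrimination) > 0)
    return [int(i) + 1 for i in np.flatnonzero(mask)]


# Invented pre/post parameters for three items (not LSCI values).
d_diff, d_disc = parameter_shifts(
    pre_difficulty=[0.85, 0.70, 0.60], post_difficulty=[0.50, 0.75, 0.40],
    pre_discrimination=[0.15, 0.14, 0.25], post_discrimination=[0.40, 0.12, 0.25],
)
print(improving_items(d_diff, d_disc))  # -> [1]
```

The invented third item is excluded because its discrimination is unchanged; whether "no change" should count as improvement is a choice left to the analyst.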
Nevertheless, a few items are flagged as having post-instruction parameters outside their conventionally accepted ranges: Items 3, 14, 21, 25, and 26. Below we briefly discuss each of these items; we discuss the most problematic of them in more detail in Section 5.
Items 3 and 26 are each flagged because of their somewhat low post-instruction discrimination values. However, both items show improved performance from pre-instruction to post-instruction, as each decreases in difficulty and neither decreases in discrimination. These results are illustrated in Figures 1–3. In addition, Item 26 falls within the conventionally accepted range for difficulty both pre-instruction and post-instruction and substantially increases in discrimination (falling very close to the acceptable range for discrimination post-instruction). Because the overall performance of Item 26 is at or near the acceptable ranges for both difficulty and discrimination post-instruction, we argue for keeping Item 26 as is. Further, the most common incorrect choice selected by students for this item, both pre- and post-instruction, elicits students' inability to distinguish between the sizes of objects using the total energy output and peak wavelength provided in their light curves. However, because the discrimination value of Item 3 remains constant, and below the conventionally accepted range, from pre-instruction to post-instruction, we discuss this item in greater detail in Section 5.
Item 14 is flagged for having a low post-instruction difficulty (0.20, which means 80% of students responded correctly). However, even though most students respond correctly to this item, its post-instruction discrimination value (0.33) is good, falling within the conventionally accepted range. Additionally, this item is well matched to the pre-instruction knowledge and reasoning abilities of many Astro 101 students, with 65% of students responding correctly pre-instruction. For these reasons, we argue for keeping Item 14 as is.
Item 21 is the second most challenging item on the LSCI, with a post-instruction difficulty of 0.79. However, it is flagged because its discrimination value is low both pre-instruction and post-instruction (0.14 and 0.12, respectively). Item 21 also stands out in Figures 1–3, since its difficulty increases and its discrimination decreases from pre-instruction to post-instruction. The fact that fewer students responded correctly post-instruction than pre-instruction suggests that there may be an underlying problem with the item. We discuss this further in Section 5.
Item 25 is the most difficult item on the entire instrument, both pre-instruction and post-instruction: only approximately 10% of students respond correctly to it. However, its discrimination value improves from 0.11 pre-instruction to 0.28 post-instruction. This increase suggests that, after instruction, the top-performing students are the ones most likely to reason correctly about the concepts probed by Item 25. We discuss this item further in Section 5.