RESULTS
Table
2 summarizes our results. Table
2 shows, for each item, the average and standard deviation of students' responses, both pre- and post-instruction. While the averages and standard deviations summarize the central tendencies and spreads of the responses, respectively, readers may be interested in details of the pre- and post-instruction response distributions that are not obvious from the numbers in Table
2. Consequently, in the Appendix, we include histograms showing the distribution of pre- and post-instruction responses for each item.
Table
2 also shows the difference in the means (Δ Avg. = post-instruction average – pre-instruction average). To determine whether the differences between the pre- and post-instruction response distributions are statistically significant, we applied the non-directional (two-sided) Mann-Whitney test to the pre- and post-instruction data for each item. We chose the Mann-Whitney test over the t-test because our data fall on an ordinal, not an interval, scale, and because we cannot assume students' responses come from a Gaussian distribution, as evidenced by the histograms in the Appendix. Our null hypothesis was that, for a given item, there is no difference between the distributions of pre- and post-instruction responses. Table
2 reports the
p-values from the Mann-Whitney test. We bolded and italicized all
p-values that indicate a statistically significant (
p < 0.001) difference in the pre- and post-instruction distributions of responses.
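As an illustration of the test described above, the following is a minimal sketch of a two-sided Mann-Whitney test using the normal approximation with a tie correction, which is a common choice for large samples of Likert-type responses. The function name `mann_whitney_two_sided` and the response lists are hypothetical and are not the TSSI data; in practice one would likely rely on a statistics package rather than this hand-written version.

```python
import math
from collections import Counter


def mann_whitney_two_sided(pre, post):
    """Two-sided Mann-Whitney U test via the tie-corrected normal approximation.

    Returns (U for the `pre` sample, two-sided p-value). No continuity
    correction is applied.
    """
    pre, post = list(pre), list(post)
    n1, n2 = len(pre), len(post)
    n = n1 + n2

    # Assign midranks: tied values share the average of the ranks they occupy.
    counts = Counter(pre + post)
    midrank = {}
    next_rank = 1
    for value in sorted(counts):
        t = counts[value]
        midrank[value] = next_rank + (t - 1) / 2
        next_rank += t

    rank_sum_pre = sum(midrank[x] for x in pre)
    u_pre = rank_sum_pre - n1 * (n1 + 1) / 2

    # Mean and tie-corrected variance of U under the null hypothesis.
    mean_u = n1 * n2 / 2
    tie_term = sum(t**3 - t for t in counts.values())
    var_u = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u == 0:
        return u_pre, 1.0  # every response identical: no evidence of a difference

    z = (u_pre - mean_u) / math.sqrt(var_u)
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided tail probability
    return u_pre, p


# Hypothetical 1-5 Likert responses for a single item (not the TSSI data).
pre_responses = [2, 3, 3, 4, 2, 3, 5, 1, 3, 4]
post_responses = [3, 4, 4, 5, 3, 4, 5, 2, 4, 4]
u_stat, p_value = mann_whitney_two_sided(pre_responses, post_responses)
print(f"U = {u_stat:.1f}, p = {p_value:.4f}, significant at 0.001: {p_value < 0.001}")
```

The tie correction matters here because responses on a five-point scale produce many tied ranks, which reduces the variance of U relative to the no-ties formula.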
We also report in Table
2 the effect size for each item. Specifically, we use Cohen's
d as a measure of the effect size. Cohen's
d is defined as
$$d = \frac{\mu_f - \mu_o}{\sigma},$$
where
μf is the average of the post-instruction responses,
μo is the average of the pre-instruction responses, and
σ is the “pooled sample standard deviation” (
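Cohen 1988;
Hartung, Knapp, and Sinha 2008). This “pooled sample standard deviation” combines the standard deviations of the pre- and post-instruction responses, weighted according to the two sample sizes. In its definition,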
nf is the number of students in the post-instruction data (294),
σf is the standard deviation of the post-instruction responses for a given item,
no is the number of students in the pre-instruction data (442), and
σo is the standard deviation of the pre-instruction responses for that item (
Hartung, Knapp, and Sinha 2008). The effect size of each item in Table
2 expresses the difference in means (Δ Avg.) as a fraction of the “pooled sample standard deviation.”
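The effect-size calculation described above can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: it assumes the common form of the pooled standard deviation that weights each variance by n − 1 and divides by n_f + n_o − 2 (some references divide by n_f + n_o instead, which changes the result only negligibly at these sample sizes). The function names and the example means and standard deviations are hypothetical; only the sample sizes 294 and 442 come from the text.

```python
import math


def pooled_sd(n_post, sd_post, n_pre, sd_pre):
    """Pooled sample standard deviation of two independent samples.

    Assumption: variances are weighted by (n - 1) and the sum is divided
    by (n_post + n_pre - 2).
    """
    numerator = (n_post - 1) * sd_post**2 + (n_pre - 1) * sd_pre**2
    return math.sqrt(numerator / (n_post + n_pre - 2))


def cohens_d(mean_post, sd_post, n_post, mean_pre, sd_pre, n_pre):
    """Difference in means expressed in units of the pooled standard deviation."""
    return (mean_post - mean_pre) / pooled_sd(n_post, sd_post, n_pre, sd_pre)


# Sample sizes are those reported in the text; the means and standard
# deviations below are made-up values for illustration only.
d = cohens_d(mean_post=3.60, sd_post=1.00, n_post=294,
             mean_pre=3.30, sd_pre=1.00, n_pre=442)
print(round(d, 2))
```

When both standard deviations equal 1, the pooled standard deviation is also 1, so d reduces to the raw difference in means.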
Cohen (1988) defines effect sizes of
d = 0.20 to be small,
d = 0.50 to be medium, and
d = 0.80 to be large. If we elaborate on these definitions so that
d ≤ 0.20 is considered small, 0.20 <
d ≤ 0.50 is considered medium, and
d > 0.50 is considered large, then we see that none of the effect sizes in Table
2 fall into the “large” category. Most items have small effect sizes, although nineteen items (3, 5, 8, 10, 18, 20, 23, 29, 33, 34, 36, 38, 42, 43, 49, 50, 53, 59, and 60) have medium effect sizes. We bolded and italicized all medium effect sizes in Table
2.
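The cut-offs adopted above translate directly into a small labeling rule. The sketch below is illustrative only; the function name `effect_size_category` is hypothetical, and taking the absolute value of d (so that negative shifts are labeled by magnitude) is an assumption not stated in the text.

```python
def effect_size_category(d):
    """Label an effect size using the cut-offs adopted in the text."""
    magnitude = abs(d)  # assumption: the sign of d is ignored when labeling
    if magnitude <= 0.20:
        return "small"
    if magnitude <= 0.50:
        return "medium"
    return "large"


# Made-up effect sizes, one per category.
for example_d in (0.12, 0.35, 0.75):
    print(example_d, effect_size_category(example_d))
```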
Note that some items have medium effect sizes yet do not have a statistically significant
p-value. How can this be? All of these items would be statistically significant if we adopted a less stringent criterion for statistical significance, such as
p < 0.05 or
p < 0.01. Furthermore, many of the TSSI's items have response distributions, both pre- and post-instruction, which exhibit various levels of departure from Gaussianity. (See the Appendix.) The effect size has a well-defined relationship to the amount of overlap between two distributions when both distributions are Gaussian. However, two non-Gaussian distributions may have a much different amount of overlap than two Gaussian distributions, even if their effect sizes are equal (
Coe 2002). This departure from Gaussianity is why some items have medium effect sizes, but not statistically significant (according to our stringent criterion) differences between their pre- and post-instruction response distributions.
While Table
2, in combination with the histograms in the Appendix, technically contains all the information about students' responses to the TSSI's items, the inferences one can draw from this are not immediately obvious. To provide a context to better understand the significance of the results contained in Table
2, we now group the items according to the categories in Table
1 to which they were assigned by
Cobern (2001). In the following analysis, we look only at post-instruction responses. We do this for two reasons. First, the majority of items do not have significant differences between their pre- and post-instruction averages; this is not surprising, given that all our instruction on worldviews was provided only through lecture, and that research shows that instructors often struggle to positively change students' attitudes and beliefs (e.g.,
Adams et al. 2005;
Perkins et al. 2005;
Redish, Saul, and Steinberg 1998). Second, this analysis is primarily concerned with identifying those items for which students' post-instruction responses fall well outside of the neutral range of scores.
The first category in Table
1, “Epistemology,” is composed of nine items, which are shown in Table
3. Figure
1 plots the average post-instruction response for each of these nine items.
Figure
1 also contains two other representations to help the reader visualize some of the information in Table
2 and the Appendix. First, we used white diamonds to represent items for which there was no statistically significant (
p < 0.001) difference in the pre- and post-instruction response distributions. For items that have a statistically significant difference, we used black circles. Additionally, Figure
1 contains a grey band that extends from 2.5 to 3.5 on the
y-axis. According to
Cobern (2001), a response
r is aligned with the “common image of science” if 3.5 <
r ≤ 5, is neutral with respect to this image if 2.5 <
r ≤ 3.5, and is anti-aligned if 1 ≤
r ≤ 2.5. The grey band in Figure
1 thus marks the region into which neutral responses fall. An average response that falls below the grey band indicates an item for which students tend to disagree with the response one would give if one accepted every facet of the “common image of science.” The more items that fall above the grey band, the more students are aligned, on average, with the claims of the “common image of science.” Figures
2 through 9 also use these representations.
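The three response regions defined above can be expressed as a short classification rule. This is a minimal sketch; the function name `alignment_region` is hypothetical, and the example values are averages quoted elsewhere in this section.

```python
def alignment_region(r):
    """Classify a response (or average response) r on the 1-5 scale."""
    if not 1 <= r <= 5:
        raise ValueError("responses must lie between 1 and 5")
    if r <= 2.5:
        return "anti-aligned"
    if r <= 3.5:
        return "neutral"
    return "aligned"


# Averages reported in the text: Item 19 (2.48), the "Epistemology"
# category (3.29), and the "Science and the Economy" category (3.80).
for average in (2.48, 3.29, 3.80):
    print(average, alignment_region(average))
```

The boundaries follow the inequalities in the text: 2.5 itself counts as anti-aligned and 3.5 itself counts as neutral.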
As Figure
1 shows, none of the average responses to the nine “Epistemology” items exhibit an anti-alignment with the “common image of science.” Five items (Items 2, 27, 29, 44, and 60) fall in the neutral region, while four items (Items 17, 33, 34, and 46) fall in the aligned region. The average post-instruction response across all nine “Epistemology” items is 3.29, which falls in the neutral region of Figure
1. While five items (Items 17, 29, 33, 34, and 60) have medium effect sizes, only Items 34 and 60 have statistically significant differences between the pre- and post-instruction response distributions.
The ten “Science and the Economy” items are shown in Table
4. Figure
2 plots the average post-instruction response for each of these ten items. Figure
2 suggests that students uniformly and strongly agree about the importance of science for the economy, since all but one of the items fall in the aligned region (Item 22 falls in the neutral region). Consequently, the average post-instruction response across all ten of the “Science and the Economy” items is 3.80, which is in the aligned region. All three items that have medium effect sizes (Items 20, 42, and 49) also exhibit statistically significant differences in the pre- and post-instruction distributions of responses.
The four “Science and the Environment” items are shown in Table
5. Figure
3 plots the average post-instruction responses of these items. Three out of the four items have average post-instruction responses that place them in the aligned region of Figure
3 (Item 3 falls in the neutral region). The average post-instruction response across all of the “Science and the Environment” items is 3.90. This suggests that there is strong and uniform positive agreement among many students that science is “necessary for the discovery, development, and conservation and protection of natural resources and the environment in general” (
Cobern 2001;
Cobern and Loving 2002). All four items in this category have medium effect sizes, although only Items 38 and 43 have statistically significant differences between the pre- and post-instruction response distributions.
Table
6 contains the ten items from the “Public Policy and Science” category. Figure
4 plots the average post-instruction responses to these items. Four of the ten items (Items 6, 10, 18, and 57) fall in the aligned region of Figure
4, five items fall in the neutral region (Items 5, 26, 28, 45, and 50), and Item 19 falls in the anti-aligned region (with an average post-instruction response of 2.48). The average post-instruction response across all ten items is 3.44, which falls in the neutral region. Items 5, 10, 18, and 50 have medium effect sizes and all of these items except Item 50 also exhibit statistically significant differences between their pre- and post-instruction response distributions.
The “Science and Public Health” category has four items, which are shown in Table
7. Figure
5 plots the average post-instruction responses of these items. Items 8, 48, and 58 have average post-instruction responses that fall in the aligned region of Figure
5, while Item 9 falls in the neutral region. The average post-instruction response across all four items is 4.00, which falls into the aligned region. While Item 8 has a medium effect size, none of the four items exhibit statistically significant differences between the pre- and post-instruction distributions of responses.
Table
8 shows the seven items that comprise the “Science, Religion, and Morality” category. The average post-instruction responses to these items are plotted in Figure
6. The average post-instruction response to Item 7 places it in the anti-aligned region, while Item 32 falls in the aligned region. The remaining items all lie within the neutral region and, consequently, the average post-instruction response across all seven items is 3.03, also within the neutral region. All of these items have small effect sizes and none exhibit any statistically significant difference in the pre- and post-instruction response distributions.
Table
9 contains the four items in the “Science, Emotions, and Aesthetics” category. Figure
7 plots the average post-instruction responses for these items. Items 12 and 36 have average post-instruction responses that place them in the aligned region, while Item 1 falls in the neutral region and Item 21 falls in the anti-aligned region. The average post-instruction response across all four items is 3.35, which is in the neutral region. Item 36 has a medium effect size and has a statistically significant difference in the distributions of pre- and post-instruction responses.
The four items in the “Science, Race, and Gender” category are shown in Table
10 and their average post-instruction responses are plotted in Figure
8. All four items fall in the aligned region, and the average post-instruction response across these items is 3.94, implying that most students agree that “[r]ace, gender, and other personal factors are irrelevant in science” (
Cobern 2001;
Cobern and Loving 2002). Items 23 and 53 both have medium effect sizes and statistically significant differences between their pre- and post-instruction response distributions.
The final TSSI category, “Science for All,” has eight items. These items are shown in Table
11. Their average post-instruction responses are plotted in Figure
9. The average post-instruction response to Item 37 places it in the neutral region. The other seven items all fall in the aligned region, as does the average post-instruction response across all eight items (3.90). This suggests that many students agree with the idea that everyone should know and learn at least some science. All the items in this category have small effect sizes, and none show statistically significant differences between their pre- and post-instruction distributions of responses.
As a final way to analyze the data, we grouped all items in a given category together and calculated the average and standard deviation (pre- and post-instruction) of the responses to all of these items. We also calculated the differences in these pre- and post-instruction averages and the effect sizes of these differences. Additionally, we used the non-directional Mann-Whitney test to determine whether the distribution of students' total scores within a category was different pre- to post-instruction at a statistically significant level (
p < 0.001). We also performed these calculations for the TSSI as a whole. Table
12 contains our results.
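The category-level grouping described above can be sketched as follows. This is an illustration under assumptions rather than the authors' procedure: the item numbers for “Science and Public Health” come from the text, but the student responses are invented, the function name `category_summary` is hypothetical, and the use of the sample (n − 1) standard deviation is an assumption.

```python
from statistics import mean, stdev

# Item numbers for this category are taken from the text; the responses are
# invented for illustration (one dict per student, item number -> response).
CATEGORY_ITEMS = {"Science and Public Health": [8, 9, 48, 58]}

students = [
    {8: 4, 9: 3, 48: 5, 58: 4},
    {8: 5, 9: 2, 48: 4, 58: 4},
    {8: 3, 9: 3, 48: 4, 58: 5},
]


def category_summary(students, items):
    """Pool all responses to `items`; also return each student's total score."""
    pooled = [s[i] for s in students for i in items]
    totals = [sum(s[i] for i in items) for s in students]
    # Assumption: the sample (n - 1) standard deviation is used.
    return mean(pooled), stdev(pooled), totals


avg, sd, totals = category_summary(students, CATEGORY_ITEMS["Science and Public Health"])
print(f"average = {avg:.2f}, standard deviation = {sd:.2f}, total scores = {totals}")
```

The pooled average and standard deviation describe the category as a whole, while the per-student total scores are the quantities one would pass to a pre- versus post-instruction Mann-Whitney comparison.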
Table
12 shows that the average post-instruction responses to items in the categories “Science and the Economy,” “Science and the Environment,” “Science and Public Health,” “Science, Race, and Gender,” and “Science for All” are all aligned with the “common image of science.” The remaining categories have post-instruction averages that place them in the neutral region. “Epistemology” and “Science and the Environment” are the only categories that show statistically significant differences in the distributions of students' pre- and post-instruction total scores. “Science and the Environment” is the only category with a medium effect size; the rest are small. If we look at the average scores for the TSSI as a whole, we see that the pre-instruction average falls in the neutral region, while the post-instruction average falls in the region aligned with the “common image of science.” The shift in the distribution of students' total scores on the TSSI pre- to post-instruction is statistically significant; however, the effect size is small.