When data are collected from the same participants (i.e., there is only one group) on two occasions (e.g., before and after a treatment, or under two alternating programs, such as formative and summative feedback), the observations are no longer independent, so the independent samples t-test cannot be used. In such situations, we can use the dependent samples t-test, which accounts for the dependency created by measuring the same participants on two occasions. For example, a researcher may be interested in understanding the effect of feedback on students’ achievement scores. The researcher administers a test at the beginning of the course, gives the students feedback during the course, and administers the same test again at the end of the course. The researcher then compares the students’ mean scores before and after the feedback program. As another example, a nursing researcher is interested in understanding the effect of classical music on patients’ mood. The nurse measures the patients’ mood before playing some classical music and measures it again after the patients have listened to the music for some time. Because there is only one group in this study (the same patients) and there are two measurement occasions (before and after playing classical music), the researcher can use a dependent samples t-test.
The dependent samples t-test is also known as the paired samples t-test. The term paired samples t-test also covers situations where the two sets of data are not generated by the same individual or unit, but by two units that are related in some way. For example, a researcher may be interested in comparing the sense of belonging to the community between first-generation and second-generation immigrants (i.e., parents born outside the host country and their children born in the host country). In this situation, the two generations produce separate data, but each parent's score is linked to their own child's score through the parent-child relationship. Therefore, to study whether the sense of belonging to the community differs between first-generation and second-generation immigrants, a dependent (paired) samples t-test would be appropriate.
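When the pairs come from two related units rather than from one person measured twice, each parent's score has to be lined up with their own child's score before testing. The sketch below is a minimal illustration under stated assumptions: `parents` and `children` are hypothetical data frames that share a hypothetical `family_id` column and store the sense-of-belonging score in a hypothetical `belonging` column; the function name is also our own.

```r
# Pair first- and second-generation scores by family before testing.
# `parents`, `children`, `family_id` and `belonging` are hypothetical names.
paired_belonging_test <- function(parents, children) {
  # merge() keeps only families present in both data frames and places
  # each parent's score on the same row as their own child's score
  pairs <- merge(parents, children, by = "family_id",
                 suffixes = c(".parent", ".child"))
  # The paired test is computed on the within-family differences
  t.test(pairs$belonging.parent, pairs$belonging.child, paired = TRUE)
}
```

Matching on an ID, rather than relying on the two data frames being sorted the same way, makes the pairing explicit.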
As with the independent samples t-test, normality is an assumption of the dependent samples t-test. Here, however, the assumption applies to the difference values: when we subtract each participant's occasion one value from their occasion two value, the resulting differences should be approximately normally distributed.
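One way to check this assumption in R is to compute the difference values and inspect them with a normal Q-Q plot and the Shapiro-Wilk test (`qqnorm`, `qqline` and `shapiro.test` are base R functions). The helper name `check_difference_normality` is our own, and the sketch assumes the two vectors list the participants in the same order.

```r
# Check the normality assumption of a dependent samples t-test.
# `before` and `after` hold one value per participant, in the same order.
check_difference_normality <- function(before, after) {
  stopifnot(length(before) == length(after))
  diffs <- after - before                 # occasion two minus occasion one
  qqnorm(diffs, main = "Normal Q-Q plot of difference values")
  qqline(diffs)                           # points close to this line suggest normality
  shapiro.test(diffs)                     # a p value above .05 does not indicate non-normality
}
```

Note that `shapiro.test` needs between 3 and 5000 values, and with small samples the Q-Q plot is usually the more informative of the two checks.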
Does providing feedback to students have an effect on their writing improvement?
A middle-school teacher wants to know whether providing feedback to students has a noticeable effect on their writing scores. The teacher randomly selects 30 students in the program and gives them an essay assignment in the first week of the program. During the program, the teacher gives weekly writing homework and provides feedback on the students' performance. The program lasts for an entire school term. At the end of the program, the teacher administers an essay examination to the students to see whether their writing has improved. In this study, there are two measurement occasions (before and after the program) and one group. Therefore, the teacher uses a dependent samples t-test to compare the means of the same group over the two measurement occasions. Table 1 shows the writing scores of five of these students on both occasions.
Student | Score before the Program | Score after the Program |
---|---|---|
iSaRx9 | 77.00 | 94.00 |
XVSdWW | 77.00 | 80.00 |
tuV527 | 76.00 | 100.00 |
YpnYkK | 88.00 | 69.00 |
y7QuBV | 67.00 | 71.00 |
... | ... | ... |
The teacher enters the data in a spreadsheet program in the school computer lab and saves them in CSV format. The complete data set for this example can be downloaded from here.
Before the data are read into RStudio, they are arranged in the same structure used for the independent samples t-test, known as the long format, in which each row holds a single score. We create three columns (variables) in the spreadsheet: the Student ID, the occasion on which the student's score was recorded (Time: Pre-program = 1 or Post-program = 2), and the Score the student received.
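For illustration, the long format can be built directly in R from the five students shown in Table 1. The data frame name `dfExample` is our own, the `Student` column name follows the header of Table 1, and the rows are only that excerpt, not the complete data set. Each student contributes two rows, one per Time.

```r
# Long-format excerpt built from the five students shown in Table 1
dfExample <- data.frame(
  Student = rep(c("iSaRx9", "XVSdWW", "tuV527", "YpnYkK", "y7QuBV"), times = 2),
  Time    = rep(c(1, 2), each = 5),    # 1 = Pre-program, 2 = Post-program
  Score   = c(77, 77, 76, 88, 67,      # scores before the program
              94, 80, 100, 69, 71)     # scores after the program
)

# The formula version of the paired test pairs rows by their position within
# each Time, so sort by Time and then by Student to keep the pairs aligned
dfExample <- dfExample[order(dfExample$Time, dfExample$Student), ]
print(dfExample)
```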
Once the variables are created, we can read the data file (saved as CSV) into the RStudio environment. First, we produce some descriptive statistics, such as the group's mean writing score on each occasion. Table 2 below shows the descriptive statistics for the students' scores before and after the program, and Figure 2 shows a bar plot of the mean writing scores before and after the feedback program.
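A minimal sketch of how such descriptive statistics and a bar plot could be produced with base R, assuming a long-format data frame with the `Score` and `Time` columns used in this example (such as the one read from the CSV file). The function name `describe_by_time` is our own, and the two bar labels assume exactly two occasions.

```r
# Mean and standard deviation of Score at each Time (1 = before, 2 = after)
describe_by_time <- function(df) {
  out <- data.frame(
    Time = sort(unique(df$Time)),
    Mean = as.numeric(tapply(df$Score, df$Time, mean)),
    SD   = as.numeric(tapply(df$Score, df$Time, sd))
  )
  # Bar plot of the mean score at each occasion, similar to Figure 2
  barplot(out$Mean, names.arg = c("Before feedback", "After feedback"),
          ylab = "Mean writing score")
  out
}
```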
Statistic | Before feedback | After feedback |
---|---|---|
Mean | 79.6 | 87 |
SD | 9.48 | 9.19 |
As Table 2 and Figure 2 show, the mean writing score increased from 79.6 before feedback to 87 after receiving feedback. But is this increase statistically significant? We can perform a paired t-test to answer this question.
To perform a dependent samples t-test on the data, we use R's t.test function with the formula notation y ~ x (here, Score ~ Time) and the option paired = TRUE. The code in Listing 1 shows this formula approach. The var.equal option belongs to the independent samples t-test and is not needed here: a paired test analyzes a single set of difference scores, so there are no two group variances to compare.
```
> dfScores <- read.csv("dsFeedbackTwoTimes.csv")
> t.test(Score ~ Time, data = dfScores, paired = TRUE)

	Paired t-test

data:  Score by Time
t = -3.7074, df = 29, p-value = 0.0008799
alternative hypothesis: true mean difference is not equal to 0
95 percent confidence interval:
 -11.53399  -3.33268
sample estimates:
mean difference 
      -7.433333 
```
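Two practical caveats apply to the formula call in Listing 1. First, with long-format data it pairs the scores by their row order within each Time, so the file must list the students in the same order on both occasions. Second, recent R releases (4.4.0 onward, to our knowledge) no longer accept paired = TRUE in the formula method. The sketch below avoids both issues by matching each student's two scores by ID before testing. It assumes the columns Time (coded 1 and 2) and Score from Listing 1; the ID column name is a parameter because the name in the CSV file is not shown (read.csv turns a header such as "Student ID" into Student.ID), and the function name is our own.

```r
# Paired t-test from long-format data that matches scores by student ID
# instead of by row position. `id` names the student ID column (assumed).
paired_test_long <- function(df, id = "Student") {
  before <- df[df$Time == 1, c(id, "Score")]
  after  <- df[df$Time == 2, c(id, "Score")]
  wide <- merge(before, after, by = id, suffixes = c(".before", ".after"))
  # Guard against students missing from one occasion or duplicated IDs
  stopifnot(nrow(wide) == nrow(before), nrow(wide) == nrow(after))
  # Differences are Before - After, the same direction as in Listing 1
  t.test(wide$Score.before, wide$Score.after, paired = TRUE)
}
```

An equivalent call on the merged data is `t.test(Pair(Score.before, Score.after) ~ 1, data = wide)`, using the `Pair()` formula interface of the stats package. If the rows of the CSV file were already in matching order, `paired_test_long(dfScores, id = "Student.ID")` should give the same mean difference as Listing 1.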
The mean score after the program is 87.00, while the mean score before the program is 79.57, an apparent improvement of 7.43 points. The mean difference in the R output is -7.433, which may look like a decrease; however, its sign only reflects the order of subtraction. R subtracts the second level of Time from the first (Before minus After), so a negative value means that the scores increased. It is therefore always worth checking the descriptive statistics to confirm the direction of the difference. But is the improvement statistically significant? As the output in Listing 1 shows, the t value is -3.707 with 29 degrees of freedom, and the associated p value is 0.00088 (p < .001). Because this p value is already two-tailed, it is compared directly with the 0.05 criterion, and it falls well below it. In addition, the 95% confidence interval for the mean difference (-11.53 to -3.33) does not include the null hypothesis value of zero. Therefore, we conclude that the change in scores (7.43 points on average) is statistically significant. The teacher can report that students' writing scores improved significantly over the feedback program, which is consistent with feedback having a positive effect on their writing.
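To make the link between the difference scores and the quantities in Listing 1 explicit, the sketch below computes a paired t-test by hand: the mean of the differences divided by its standard error gives t, the degrees of freedom are n - 1, the two-sided p value comes from the t distribution, and the confidence interval is the mean difference plus or minus the critical t value times the standard error. The function name `paired_t_by_hand` is our own; `before` and `after` are assumed to be vectors in the same student order.

```r
# Paired t-test computed from the difference scores
paired_t_by_hand <- function(before, after, conf.level = 0.95) {
  d      <- before - after                # same direction as Listing 1 (Before - After)
  n      <- length(d)
  se     <- sd(d) / sqrt(n)               # standard error of the mean difference
  t_stat <- mean(d) / se                  # t statistic
  dof    <- n - 1                         # degrees of freedom
  p      <- 2 * pt(-abs(t_stat), dof)     # two-sided p value
  crit   <- qt(1 - (1 - conf.level) / 2, dof)  # critical t for the CI
  list(mean_diff = mean(d), t = t_stat, df = dof, p = p,
       ci = mean(d) + c(-1, 1) * crit * se)
}
```

Applied to all 30 pairs of scores, this should reproduce the values in Listing 1 (t = -3.71, df = 29, p = 0.00088, 95% CI from -11.53 to -3.33), since `t.test(..., paired = TRUE)` performs the same one-sample computation on the differences.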