EXERCISE 1
In a study, researchers aim to evaluate the possible association between ibuprofen intake and nausea.
One group of 1,500 individuals takes one ibuprofen pill, while another group of 2,000 individuals takes a placebo pill.
15 subjects who took ibuprofen develop nausea, and 90 subjects who took the placebo develop nausea. Answer the following questions:
Is there an association between ibuprofen intake and nausea?
Display the data using an appropriate graph.
What is the estimated OR?
Data organization:
exposed_sick <- 15
exposed_not_sick <- 1500 - 15
non_exposed_sick <- 90
non_exposed_not_sick <- 2000 - 90
table <- matrix(c(exposed_sick, non_exposed_sick, exposed_not_sick, non_exposed_not_sick), 2)
colnames(table) <- c("sick", "not_sick")
rownames(table) <- c("exposed", "non_exposed")
Alternatively, use the following command to label rows and columns:
dimnames(table) <- list(ibuprofen = c("exposed", "non_exposed"), nausea = c("sick", "not_sick"))
Perform the chi-square test:
chisq.test(table)
The p-value is < 0.05, so the chi-square test is statistically significant: there is a statistically significant association between ibuprofen intake and nausea.
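The test says that an association exists but not in which direction. A minimal sketch of how the row proportions could be inspected follows; the name nausea_tab is an assumption introduced here (it avoids masking the base function table) and is not part of the original exercise.

```r
# Same 2x2 table as in the exercise, rebuilt under a hypothetical name
nausea_tab <- matrix(c(15, 90, 1485, 1910), 2,
                     dimnames = list(ibuprofen = c("exposed", "non_exposed"),
                                     nausea = c("sick", "not_sick")))

# Proportion of sick / not sick within each exposure group (rows sum to 1)
row_props <- prop.table(nausea_tab, margin = 1)
print(round(row_props, 3))
```

In these data, 1% of the ibuprofen group and 4.5% of the placebo group developed nausea.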
To visualize the data:
barplot(table, beside = TRUE, col = c("blue", "red"), legend.text = TRUE, main = "Exercise 1", args.legend = list(x = "topleft"))
To calculate the OR:
fisher.test(table)
The OR is 0.214. Since the confidence interval does not include the value 1, the association is statistically significant.
The OR lies between 0 and 1, indicating that in these data the exposure (ibuprofen) appears protective: the odds of developing nausea are lower after taking ibuprofen than after taking the placebo.
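fisher.test reports a conditional maximum-likelihood estimate of the OR, so it can differ slightly from the cross-product ratio computed by hand. As a hedged cross-check, the sketch below computes the sample OR and a Wald 95% confidence interval on the log scale. The variable names are illustrative assumptions, and the Wald interval is a swapped-in approximation, not the exact interval returned by fisher.test.

```r
# Cell counts from the exercise (hypothetical variable names)
n_exp_sick <- 15;  n_exp_ok <- 1485   # ibuprofen group
n_non_sick <- 90;  n_non_ok <- 1910   # placebo group

# Sample (cross-product) odds ratio
or_hat <- (n_exp_sick * n_non_ok) / (n_exp_ok * n_non_sick)

# Wald standard error of log(OR) and the 95% confidence interval
se_log_or <- sqrt(1 / n_exp_sick + 1 / n_exp_ok + 1 / n_non_sick + 1 / n_non_ok)
ci_wald <- exp(log(or_hat) + c(-1, 1) * qnorm(0.975) * se_log_or)

print(or_hat)
print(ci_wald)
```

The upper limit of this interval is below 1, which agrees with the conclusion drawn from fisher.test.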
EXERCISE 2
An experiment is conducted on various herbal teas that promote sleep. Researchers recruit university students and create four groups of five students each. Each group is assigned a different herbal tea to drink an hour before bed: Group 1 drinks chamomile, Group 2 drinks passionflower, Group 3 drinks valerian, and Group 4 drinks lemon balm.
The hours of sleep of the students are recorded:
chamomile <- c(6, 6, 7, 6, 8)
passionflower <- c(5, 6, 5, 7, 9)
valerian <- c(9, 8, 10, 7, 9)
lemon_balm <- c(8, 8, 9, 5, 7)
hours <- c(chamomile, passionflower, valerian, lemon_balm)
groups <- c(rep("cham", length(chamomile)), rep("pass", length(passionflower)), rep("val", length(valerian)), rep("mel", length(lemon_balm)))
Check normality:
shapiro.test(hours)
The Shapiro test is not statistically significant (p-value > 0.05), so there is no evidence that the variable departs from a normal distribution.
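The ANOVA normality assumption concerns the residuals within groups rather than the pooled observations. As a hedged alternative to the check above, the sketch below applies the Shapiro test to the residuals of the one-way model. The names fit and res are introduced here for illustration.

```r
chamomile <- c(6, 6, 7, 6, 8)
passionflower <- c(5, 6, 5, 7, 9)
valerian <- c(9, 8, 10, 7, 9)
lemon_balm <- c(8, 8, 9, 5, 7)
hours <- c(chamomile, passionflower, valerian, lemon_balm)
# Group labels in the same order as the values in hours
groups <- factor(rep(c("cham", "pass", "val", "mel"), each = 5))

# Fit the one-way model and test its residuals for normality
fit <- aov(hours ~ groups)
res <- residuals(fit)
shapiro_res <- shapiro.test(res)
print(shapiro_res)
```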
Check homoscedasticity:
bartlett.test(hours ~ groups)
The Bartlett test is not statistically significant (p-value > 0.05), so there is no evidence against homogeneous variances.
Apply the parametric ANOVA test:
result <- aov(hours ~ groups)
summary(result)
The p-value is > 0.05, so the ANOVA test is not statistically significant: there is no evidence of differences in mean sleep hours among the four groups of students.
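To make the ANOVA output easier to read, the group means and the F statistic can be extracted directly. This is a minimal sketch; group_means, fit and f_value are names assumed here for illustration.

```r
chamomile <- c(6, 6, 7, 6, 8)
passionflower <- c(5, 6, 5, 7, 9)
valerian <- c(9, 8, 10, 7, 9)
lemon_balm <- c(8, 8, 9, 5, 7)
hours <- c(chamomile, passionflower, valerian, lemon_balm)
groups <- factor(rep(c("cham", "pass", "val", "mel"), each = 5))

# Mean hours of sleep in each group
group_means <- tapply(hours, groups, mean)
print(group_means)

# F statistic for the groups term, taken from the ANOVA table
fit <- aov(hours ~ groups)
f_value <- summary(fit)[[1]][["F value"]][1]
print(f_value)
```

The group means range from 6.4 (passionflower) to 8.6 (valerian), and the F statistic is about 2.77 on 3 and 16 degrees of freedom.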
EXERCISE 3
In an orthopedic hospital, researchers evaluate the effectiveness of a new brace during physiotherapy. The number of steps taken by eight patients in 10 minutes is counted before and after using the brace.
Are there differences in patient mobility during physiotherapy before and after using the new brace? Display the data in an appropriate graph.
Data:
without <- c(3, 3, 5, 4, 5, 2, 3, 4)
with <- c(5, 6, 6, 7, 5, 4, 5, 7)
steps <- c(without, with)
group <- c(rep("without", 8), rep("with", 8))
Check normality:
shapiro.test(steps)
The Shapiro test is not statistically significant (p-value > 0.05), so there is no evidence that the steps variable departs from a normal distribution.
Check homoscedasticity:
bartlett.test(steps ~ group)
The Bartlett test is not statistically significant (p-value > 0.05), so there is no evidence against homogeneous variances.
Apply a paired t-test:
t.test(with, without, paired = TRUE)
The t-test is statistically significant (p-value < 0.05). There is a significant difference in the number of steps taken by patients before and after using the brace. On average, patients took two more steps with the brace.
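A paired t-test is equivalent to a one-sample t-test on the within-patient differences, so the normality assumption applies to those differences. The sketch below works on the differences directly. The names before and after are assumptions introduced here to avoid masking the base function with.

```r
# Steps per patient (hypothetical names for the two measurements)
before <- c(3, 3, 5, 4, 5, 2, 3, 4)   # without the brace
after  <- c(5, 6, 6, 7, 5, 4, 5, 7)   # with the brace

# Within-patient differences and their mean
diffs <- after - before
mean_diff <- mean(diffs)

# Normality check on the differences
shapiro_diffs <- shapiro.test(diffs)

# One-sample t-test on the differences (same result as the paired t-test)
paired_fit <- t.test(diffs, mu = 0)
print(mean_diff)
print(paired_fit)
```

The mean difference is 2 steps, with a t statistic of about 5.29 on 7 degrees of freedom.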
Visualize the data:
boxplot(steps ~ group, main = "Exercise 3", col = c("blue", "red"))
EXERCISE 4
A study on risk factors for heart disease examined the relationship between hypertension and coronary artery disease in two different age groups.
Considering the age groups, is there a relationship between hypertension and coronary artery disease?
Data organization:
exercise4 <- array(c(552, 941, 212, 495, 1102, 1018, 87, 106), dim = c(2, 2, 2), dimnames = list(exposure = c("exposed", "non_exposed"), diagnosis = c("sick", "healthy"), age_group = c("young", "old")))
Perform the Mantel-Haenszel test:
mantelhaen.test(exercise4, correct = FALSE)
The test is statistically significant (p-value < 0.05), so we reject the null hypothesis that the common OR equals 1: after stratifying by age group, hypertension and coronary artery disease are associated. Note that the Mantel-Haenszel test does not assess whether the OR differs between the two age groups; that would require a separate homogeneity test.
The common OR is 1.35, and the confidence interval does not include 1. Thus, hypertension is a risk factor that increases the odds of developing coronary artery disease in both age groups.
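The common OR can be compared with the OR of each age group. The sketch below computes the stratum-specific cross-product ORs and the Mantel-Haenszel estimate by hand, as a cross-check under the same data. The names tab, stratum_or, n_k and or_mh are assumptions introduced for illustration.

```r
tab <- array(c(552, 941, 212, 495, 1102, 1018, 87, 106),
             dim = c(2, 2, 2),
             dimnames = list(exposure = c("exposed", "non_exposed"),
                             diagnosis = c("sick", "healthy"),
                             age_group = c("young", "old")))

# Cross-product OR within each age group
stratum_or <- apply(tab, 3, function(m) (m[1, 1] * m[2, 2]) / (m[1, 2] * m[2, 1]))

# Mantel-Haenszel common OR: sum(a*d/n) / sum(b*c/n) over the strata
n_k <- apply(tab, 3, sum)
or_mh <- sum(tab[1, 1, ] * tab[2, 2, ] / n_k) / sum(tab[1, 2, ] * tab[2, 1, ] / n_k)

print(round(stratum_or, 2))
print(round(or_mh, 2))
```

The two stratum ORs (about 1.37 and 1.32) are close to each other and to the common estimate of about 1.35.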
EXERCISE 5
In a study conducted in Italy, 10 patients with hypertriglyceridemia were placed on a low-fat, high-carbohydrate diet. Before the diet, cholesterol and triglyceride levels were recorded for each patient. Is there evidence of a linear relationship between cholesterol and triglyceride levels before the diet? Visualize the data with an appropriate graph.
Data:
cholesterol <- c(5.12, 6.18, 6.77, 6.65, 6.36, 5.90, 5.48, 6.02, 10.34, 8.51)
triglycerides <- c(2.30, 2.54, 2.95, 3.77, 4.18, 5.31, 5.53, 8.83, 9.48, 14.20)
Visualize the data:
scatter.smooth(cholesterol, triglycerides, col = "red", main = "Exercise 5")
Check normality:
shapiro.test(cholesterol)
shapiro.test(triglycerides)
The Shapiro test for cholesterol is statistically significant, indicating that the variable is not normally distributed, so a non-parametric (Spearman) correlation is used.
Perform the correlation test:
cor.test(cholesterol, triglycerides, method = "spearman")
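The correlation test is not statistically significant (p-value > 0.05), so there is no evidence of a significant association between cholesterol and triglyceride levels before the diet. Because Spearman's coefficient is rank-based, it measures monotonic rather than strictly linear association.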
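Spearman's coefficient is the Pearson correlation computed on the ranks of the two variables. The following minimal sketch illustrates this with the exercise data; rho_by_ranks and rho_builtin are names assumed here.

```r
cholesterol <- c(5.12, 6.18, 6.77, 6.65, 6.36, 5.90, 5.48, 6.02, 10.34, 8.51)
triglycerides <- c(2.30, 2.54, 2.95, 3.77, 4.18, 5.31, 5.53, 8.83, 9.48, 14.20)

# Pearson correlation of the ranks
rho_by_ranks <- cor(rank(cholesterol), rank(triglycerides))

# Built-in Spearman correlation, for comparison
rho_builtin <- cor(cholesterol, triglycerides, method = "spearman")

print(c(by_ranks = rho_by_ranks, builtin = rho_builtin))
```

Both values are about 0.42, a moderate positive coefficient that is not significant with only 10 patients.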
EXERCISE 6
In an Italian study, physicians from different specialties were interviewed regarding their recommendations for the surgical treatment of early-stage breast cancer. They were asked whether they would recommend:
A radical surgical treatment regardless of the patient's age (R)
A conservative surgical treatment only for younger patients (CC)
A conservative surgical treatment regardless of the patient's age (C)
Data organization:
med_internal <- c(6, 22, 42)
surgery <- c(23, 61, 127)
radiotherapy <- c(2, 3, 54)
oncology <- c(1, 12, 43)
gynecology <- c(1, 12, 31)
specialization <- c(med_internal, surgery, radiotherapy, oncology, gynecology)
rnames <- c("med_internal", "surgery", "radiotherapy", "oncology", "gynecology")
cnames <- c("R", "CC", "C")
data <- matrix(specialization, nrow = 5, ncol = 3, byrow = TRUE, dimnames = list(rnames, cnames))
Perform the chi-square test:
chisq.test(data)
The chi-square test is statistically significant (p-value < 0.05). Thus, there is a significant association between the physician's specialization and the surgical treatment recommended.
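With a table of this size, it is worth checking the expected counts, since the chi-square approximation is less reliable when some of them are below 5. The sketch below inspects them and, as a hedged alternative, recomputes the p-value by Monte Carlo simulation. The names counts, chi, expected and chi_sim are assumptions introduced here (counts avoids masking the base function data), and the seed and number of replicates are arbitrary choices.

```r
counts <- matrix(c(6, 22, 42,
                   23, 61, 127,
                   2, 3, 54,
                   1, 12, 43,
                   1, 12, 31),
                 nrow = 5, byrow = TRUE,
                 dimnames = list(c("med_internal", "surgery", "radiotherapy",
                                   "oncology", "gynecology"),
                                 c("R", "CC", "C")))

# Expected counts under independence (the approximation warning is silenced)
chi <- suppressWarnings(chisq.test(counts))
expected <- chi$expected
print(round(expected, 2))

# Monte Carlo p-value, which does not rely on the chi-square approximation
set.seed(1)
chi_sim <- chisq.test(counts, simulate.p.value = TRUE, B = 10000)
print(chi_sim$p.value)
```

Three expected counts in the R column are below 5 (radiotherapy, oncology and gynecology), which is why chisq.test issues a warning on these data.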