EXERCISE 1
In a survey, information is collected on the productive sector (Y) and gender (X) from a random sample of employed individuals:
Test, at a significance level of 5%, whether the two variables can be considered independent.
Create a vector with the observed counts, entered row by row (females first, then males):
occupazione <- c(0, 8, 12, 80, 10, 52, 58, 20)
A vector for row names (rnames) and column names (cnames) is also created:
rnames <- c("F", "M")
cnames <- c("agricoltura", "artigianato", "industria", "servizi")
Generate the data matrix and display it:
dati <- matrix(occupazione, nrow = 2, ncol = 4, byrow = TRUE, dimnames = list(rnames, cnames))
View(dati)
Perform a chi-squared test:
chisq.test(dati)
The p-value is far below 0.05, so the hypothesis of independence is rejected: the distribution of workers across the four sectors differs between women and men.
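One expected count in this table is small, which is worth checking before relying on the chi-squared approximation. The sketch below rebuilds the matrix so that it runs on its own, inspects the expected counts, and reruns the test with a Monte Carlo p-value. The names test_chi, attese and test_sim, the seed, and the choice of B = 10000 are illustrative assumptions, not part of the original exercise.

```r
occupazione <- c(0, 8, 12, 80, 10, 52, 58, 20)
dati <- matrix(occupazione, nrow = 2, byrow = TRUE,
               dimnames = list(c("F", "M"),
                               c("agricoltura", "artigianato", "industria", "servizi")))

# Expected counts under independence: row total * column total / grand total
test_chi <- suppressWarnings(chisq.test(dati))
attese <- test_chi$expected
print(round(attese, 2))

# Smallest expected count (F, agricoltura); values below 5 trigger R's warning
print(min(attese))

# Monte Carlo p-value, which does not rely on the asymptotic approximation
set.seed(1)
test_sim <- chisq.test(dati, simulate.p.value = TRUE, B = 10000)
print(test_sim$p.value)
```

If the simulated p-value agrees with the asymptotic one, the conclusion does not depend on the approximation.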
EXERCISE 2
The goal is to verify if a preservative for human consumption has effects on growth factors. For this purpose, a group of 10 adult guinea pigs was subjected to a diet containing the tested substance. Each subject was weighed before and after the new diet to measure variations. Based on the table of weights before and after the diet, the aim is:
To determine whether the substance causes significant weight changes.
To find the actual weight change (δ) caused by the preservative at α = 0.05.
Create vectors for the guinea pigs' weights before and after the treatment, combine them into a single vector (peso), and create a gruppo vector:
prima <- c(180, 175, 150, 158, 174, 187, 172, 157, 164, 165)
dopo <- c(190, 170, 175, 164, 185, 184, 185, 168, 180, 173)
peso <- c(prima, dopo)
gruppo <- c(rep("pre", length(prima)), rep("post", length(dopo)))
Check whether the conditions for a t-test are satisfied, starting with the Shapiro-Wilk test for normality:
shapiro.test(peso)
Since the p-value is > 0.05, normality is not rejected. Test for homoscedasticity:
bartlett.test(peso ~ gruppo)
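Since the p-value is > 0.05, homoscedasticity is not rejected. Perform a paired t-test, passing the two vectors directly (recent R versions reject paired = TRUE in the formula interface):
t.test(dopo, prima, paired = TRUE)
The p-value is < 0.05, so the weight change under the diet containing the preservative is statistically significant. The estimated mean weight change (dopo - prima) is 9.2 grams, and the 95% confidence interval printed by t.test gives the plausible range for δ.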
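Because the design is paired, the quantity that matters is the vector of within-subject differences. As a minimal sketch, the block below checks normality on the differences and reproduces the paired t statistic and the 95% confidence interval for δ by hand. The names differenze, media_d, se_d, t_oss, p_val and ic are introduced here for illustration only.

```r
prima <- c(180, 175, 150, 158, 174, 187, 172, 157, 164, 165)
dopo  <- c(190, 170, 175, 164, 185, 184, 185, 168, 180, 173)

# Within-subject differences (after minus before)
differenze <- dopo - prima
n <- length(differenze)

# Normality check on the differences, the assumption behind the paired test
print(shapiro.test(differenze))

# Paired t statistic: mean difference divided by its standard error
media_d <- mean(differenze)
se_d <- sd(differenze) / sqrt(n)
t_oss <- media_d / se_d

# Two-sided p-value and 95% confidence interval with n - 1 degrees of freedom
p_val <- 2 * pt(-abs(t_oss), df = n - 1)
ic <- media_d + c(-1, 1) * qt(0.975, df = n - 1) * se_d

print(c(media = media_d, t = t_oss, p = p_val))
print(ic)
```

These values should match the output of t.test(dopo, prima, paired = TRUE), since a paired t-test is a one-sample t-test on the differences.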
EXERCISE 3
A laboratory evaluates the accuracy of a diagnostic test for Helicobacter infection before market release. The test is conducted on 200 infected subjects and 300 uninfected ones. Among the infected, 190 test positive, while among the healthy, 25 test positive. Calculate the sensitivity and specificity of the diagnostic test.
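Define the counts given in the problem statement and derive the remaining cells:
malati <- 200                                # infected subjects
sani <- 300                                  # uninfected subjects
veri_positivi <- 190                         # infected who test positive
falsi_positivi <- 25                         # uninfected who test positive
falsi_negativi <- malati - veri_positivi     # 200 - 190 = 10
veri_negativi <- sani - falsi_positivi       # 300 - 25 = 275
Compute sensitivity and specificity:
# sensitivity = true positives / (true positives + false negatives)
sensibilità <- veri_positivi / (veri_positivi + falsi_negativi)   # 190 / 200 = 0.95
# specificity = true negatives / (true negatives + false positives)
specificità <- veri_negativi / (veri_negativi + falsi_positivi)   # 275 / 300, about 0.917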
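The same calculation can be read off a 2 x 2 table of test result against infection status, which makes it easier to see which cells enter each ratio. This is a sketch under the counts stated in the exercise (190 of 200 infected test positive, 25 of 300 uninfected test positive); the names tabella_test, sens and spec are hypothetical.

```r
# Rows: test result; columns: true infection status (matrix is filled by column)
tabella_test <- matrix(c(190, 10, 25, 275), nrow = 2,
                       dimnames = list(esito = c("positivo", "negativo"),
                                       stato = c("infetto", "non_infetto")))
print(tabella_test)

# Sensitivity: share of infected subjects who test positive
sens <- tabella_test["positivo", "infetto"] / sum(tabella_test[, "infetto"])

# Specificity: share of uninfected subjects who test negative
spec <- tabella_test["negativo", "non_infetto"] / sum(tabella_test[, "non_infetto"])

print(c(sensibilita = sens, specificita = spec))
```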
EXERCISE 4
A group of psychologists conducts a study to evaluate whether stress affects the bonus earned as a production incentive. Twenty-four employees of an insurance company are studied. The results are in the table.
The hypothesis is that stress improves employee performance. Is this true?
Define vectors:
stress <- c(100, 101, 103, 105, 109, 110, 111, 109, 99, 95, 92, 93, 96, 93, 98, 93, 102, 83, 91, 107, 71, 79, 83, 109)
premio <- c(1000, 1030, 1000, 1200, 500, 503, 600, 980, 800, 780, 900, 825, 600, 505, 625, 300, 250, 300, 500, 525, 125, 220, 200, 190)
livello <- c(1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 3, 3)
sex <- c(0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0, 1, 1, 1, 1)
sex <- as.factor(sex)
livello <- as.factor(livello)
Run a regression model:
model <- glm(premio ~ stress + livello + sex, family = "gaussian")
summary(model)
The coefficient for stress is not statistically significant (p-value > 0.05), so the data do not support the hypothesis that stress improves performance. The production bonus is, however, significantly associated with professional level and gender.
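A Gaussian glm with the default identity link is an ordinary linear regression, so the same model can be fitted with lm. The sketch below fits both and prints the coefficient table, where the row for stress holds the estimate and p-value discussed above. The names modello_glm and modello_lm are illustrative, and the factors are built with rep as a compact equivalent of the vectors in the exercise.

```r
stress <- c(100, 101, 103, 105, 109, 110, 111, 109, 99, 95, 92, 93,
            96, 93, 98, 93, 102, 83, 91, 107, 71, 79, 83, 109)
premio <- c(1000, 1030, 1000, 1200, 500, 503, 600, 980, 800, 780, 900, 825,
            600, 505, 625, 300, 250, 300, 500, 525, 125, 220, 200, 190)
livello <- factor(rep(1:3, each = 8))              # three levels, 8 employees each
sex <- factor(rep(rep(0:1, each = 4), times = 3))  # 4 coded 0, then 4 coded 1, per level

modello_glm <- glm(premio ~ stress + livello + sex, family = gaussian)
modello_lm  <- lm(premio ~ stress + livello + sex)

# The two fits give the same coefficients
print(all.equal(coef(modello_glm), coef(modello_lm)))

# Estimates, standard errors, t values and p-values, one row per term
print(summary(modello_lm)$coefficients)
```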
EXERCISE 5
A lab wants to evaluate if a drug inhibits cell motility. Ten Boyden chamber chemotaxis experiments are conducted, measuring the number of cells crossing a porous membrane. Treated cells are also evaluated for mortality. Data is in the table.
Define vectors:
trattate <- c(440, 695, 837, 1234, 679, 672, 964, 915, 654, 830)
non_trattate <- c(1000, 1023, 1232, 1563, 1236, 1245, 1236, 1289, 1110, 1239)
migrate <- c(trattate, non_trattate)
trattamento <- c(rep("trat", length(trattate)), rep("non_trat", length(non_trattate)))
trattate_morte <- c(56, 32, 32, 21, 45, 46, 22, 29, 41, 33)
non_trattate_morte <- c(10, 15, 9, 9, 5, 10, 13, 16, 17, 22)
mortalita <- c(trattate_morte, non_trattate_morte)
Run a regression model:
model <- glm(migrate ~ trattamento + mortalita, family = "gaussian")
summary(model)
The relationship between drug treatment and cell motility is not statistically significant after accounting for cell mortality. However, cell motility is significantly associated with cell mortality.
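To read the adjusted coefficient in context, it can help to look at the unadjusted comparison and at how mortality differs between the two groups, since treatment and mortality carry overlapping information here. The block below is a sketch with illustrative names (grezzo, mortalita_media); the Welch two-sample t-test is an assumption chosen for the unadjusted comparison and is not part of the original analysis.

```r
trattate <- c(440, 695, 837, 1234, 679, 672, 964, 915, 654, 830)
non_trattate <- c(1000, 1023, 1232, 1563, 1236, 1245, 1236, 1289, 1110, 1239)
trattate_morte <- c(56, 32, 32, 21, 45, 46, 22, 29, 41, 33)
non_trattate_morte <- c(10, 15, 9, 9, 5, 10, 13, 16, 17, 22)

migrate <- c(trattate, non_trattate)
mortalita <- c(trattate_morte, non_trattate_morte)
trattamento <- c(rep("trat", length(trattate)), rep("non_trat", length(non_trattate)))

# Unadjusted comparison of migrated cells (Welch two-sample t-test)
grezzo <- t.test(trattate, non_trattate)
print(grezzo$estimate)
print(grezzo$p.value)

# Mean mortality by group: treated cells also die more often
mortalita_media <- tapply(mortalita, trattamento, mean)
print(mortalita_media)

# Correlation between migration and mortality across all 20 experiments
print(cor(migrate, mortalita))
```

When the unadjusted difference is clear but the adjusted coefficient is not, part of the apparent drop in migration may be explained by fewer surviving cells rather than by reduced motility.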
EXERCISE 6
Complete the obscured parts of the table:
tabella <- matrix(c(52, 23, 5, 38), 2)
colnames(tabella) <- c("malato", "sano")
rownames(tabella) <- c("esposto", "non_esposto")
fisher.test(tabella)