EXERCISE 1
Two groups of zebrafish received a tumor xenograft. One group is treated with an experimental drug aimed at inhibiting angiogenesis, while the other is left untreated. Newly formed blood vessels around the tumor mass are photographed and quantified by software that counts the pixels occupied by vessels in each image. The following results were obtained for the two study groups:
UNTREATED: 22, 23, 25, 23, 23, 23, 25, 23, 23, 23, 23, 26, 27
TREATED: 15, 12, 13, 14, 18, 11, 12, 10, 9, 8, 6, 9, 12
Does the pharmacological treatment work?
I create the two vectors in R:
UNTREATED <- c(22, 23, 25, 23, 23, 23, 25, 23, 23, 23, 23, 26, 27)
TREATED <- c(15, 12, 13, 14, 18, 11, 12, 10, 9, 8, 6, 9, 12)
I create a vector called "pixelvasi" that holds all the values, first those of the "UNTREATED" group and then those of the "TREATED" group. Concatenating the two existing vectors avoids retyping the numbers:
pixelvasi <- c(UNTREATED, TREATED)
Using the rep function, I create a categorical vector of group labels: "UNTREATED" repeated once for each value in the "UNTREATED" vector, followed by "TREATED" repeated once for each value in the "TREATED" vector.
zebrafish <- c(rep("UNTREATED", length(UNTREATED)), rep("TREATED", length(TREATED)))
I check if the distribution of the variable "pixelvasi" is normal.
shapiro.test(pixelvasi)
Shapiro-Wilk normality test
data: pixelvasi
W = 0.87645, p-value = 0.004849
The p-value is < 0.05, so the hypothesis of normality is rejected and I apply a non-parametric test.
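As a complementary check, normality can also be examined within each group, because pooling two groups with different centers can by itself produce a non-normal distribution. The following is a minimal sketch of that per-group check. The names by_group and p_values are illustrative and not part of the original exercise.

```r
UNTREATED <- c(22, 23, 25, 23, 23, 23, 25, 23, 23, 23, 23, 26, 27)
TREATED   <- c(15, 12, 13, 14, 18, 11, 12, 10, 9, 8, 6, 9, 12)

# Run the Shapiro-Wilk test separately on each group
by_group <- lapply(list(UNTREATED = UNTREATED, TREATED = TREATED), shapiro.test)

# Extract one p-value per group for a compact overview
p_values <- sapply(by_group, function(res) res$p.value)
print(p_values)
```

If either group departs from normality, the non-parametric test remains the appropriate choice.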
I apply the Mann-Whitney test (Wilcoxon rank sum test):
wilcox.test(pixelvasi~zebrafish)
Wilcoxon rank sum test with continuity correction
data: pixelvasi by zebrafish
W = 169, p-value = 1.211e-05
alternative hypothesis: true location shift is not equal to 0
The difference between the "TREATED" and "UNTREATED" groups is significant (p = 1.211e-05), and the treated fish show fewer vessel pixels than the untreated ones, so I can conclude that the pharmacological treatment is effective.
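The test only says that the groups differ, so it helps to report the direction and size of the difference as well. A minimal sketch, reusing the vectors from this exercise (group_medians and fully_separated are illustrative names):

```r
UNTREATED <- c(22, 23, 25, 23, 23, 23, 25, 23, 23, 23, 23, 26, 27)
TREATED   <- c(15, 12, 13, 14, 18, 11, 12, 10, 9, 8, 6, 9, 12)
pixelvasi <- c(UNTREATED, TREATED)
zebrafish <- c(rep("UNTREATED", length(UNTREATED)), rep("TREATED", length(TREATED)))

# Median vessel pixels per group
group_medians <- tapply(pixelvasi, zebrafish, median)
print(group_medians)   # TREATED 12, UNTREATED 23

# TRUE when every treated value lies below every untreated value
fully_separated <- max(TREATED) < min(UNTREATED)
print(fully_separated) # TRUE
```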
EXERCISE 2
We want to evaluate the effectiveness of antibiotic prophylaxis against postoperative infections in patients undergoing two different types of surgical procedure (Procedure A and Procedure B). Before surgery, antibiotics were administered to 303 of the 606 patients undergoing Procedure A, while the remaining 303 received a placebo. For Procedure B, antibiotics were given to 301 of the 612 patients, while the remaining 311 received a placebo. The occurrence of postoperative infections in the examined patients is summarized in the following table. Tip: create the correct tables first and use the array object.
I create two vectors, one containing values related to Procedure A:
vector1 <- c(303,26,303,46)
and one containing values related to Procedure B:
vector2 <- c(301,14,311,25)
I use the array function to build a 2x2x2 array, that is, one 2x2 table for Procedure A and one for Procedure B:
new.array <- array(c(vector1,vector2),dim = c(2,2,2))
print(new.array)
I apply the chi-square test to assess the effectiveness of antibiotic prophylaxis on patients undergoing the two different procedures.
chisq.test(table(new.array))
Chi-squared test for given probabilities
data: table(new.array)
X-squared = 0.75, df = 6, p-value = 0.9933
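The p-value is > 0.05. However, table(new.array) counts how many times each distinct value occurs in the array (seven distinct values, with 303 occurring twice), which is why R runs a goodness-of-fit test "for given probabilities" with df = 6. This result therefore does not test the association between prophylaxis and infection, and it cannot support a conclusion about whether antibiotic prophylaxis reduced postoperative infections. A valid analysis requires, for each procedure, a 2x2 table of infected versus non-infected patients by treatment arm.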
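A stratified analysis of this kind could be sketched as follows. This is a sketch under a stated assumption, not the exercise's official solution: because the table is not reproduced here, the code assumes that 26 and 46 (Procedure A) and 14 and 25 (Procedure B) are the numbers of infected patients in the antibiotic and placebo arms. The names infected, arm_totals, infections, per_procedure and stratified are illustrative.

```r
# ASSUMPTION: these are infected counts per arm (the source table is missing)
infected   <- c(A_antibiotic = 26, A_placebo = 46, B_antibiotic = 14, B_placebo = 25)
arm_totals <- c(303, 303, 301, 311)
not_infected <- arm_totals - infected

# array() fills column by column: each column is one treatment arm
# (infected, not infected), and each 2x2 slice is one procedure
infections <- array(
  c(rbind(infected, not_infected)),
  dim = c(2, 2, 2),
  dimnames = list(
    outcome   = c("infected", "not_infected"),
    treatment = c("antibiotic", "placebo"),
    procedure = c("A", "B")
  )
)
print(infections)

# Chi-square test of treatment versus outcome within each procedure
procedures <- dimnames(infections)$procedure
per_procedure <- lapply(procedures, function(p) chisq.test(infections[, , p]))
names(per_procedure) <- procedures
print(per_procedure)

# Cochran-Mantel-Haenszel test: treatment effect adjusted for procedure
stratified <- mantelhaen.test(infections)
print(stratified)
```

Passing the count matrix itself to chisq.test, rather than table() of its values, is what makes R test the association between treatment and outcome.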
EXERCISE 3
A group of researchers has identified, through linkage studies, that the gene CICC-1 co-segregates in a family of obese individuals. The hypothesis is that this gene, by influencing metabolism, affects the weight of the subjects. To demonstrate the relationship between this gene and metabolism, the expression of the gene and the level of basal metabolism are measured in each subject. The results are shown in the table below.
METAB        2000   1950   2400   3500   3500   4500   900    1000
GEX-cicc-1   21.5   20.7   22.4   31.3   32     39     18.7   19.2
Can we conclude that the expression of this gene causes an increase in metabolism?
I create the two vectors containing the values related to metabolism and gene expression:
METAB<-c(2000,1950,2400,3500,3500,4500,900,1000)
GEXcicc<-c(21.5,20.7,22.4,31.3,32,39,18.7,19.2)
I apply linear regression:
summary(glm(METAB~GEXcicc,family = gaussian))
Call:
glm(formula = METAB ~ GEXcicc, family = gaussian)
Deviance Residuals:
Min 1Q Median 3Q Max
-433.42 -234.11 35.78 226.27 457.78
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -1743.47 468.67 -3.720 0.00985 **
GEXcicc 164.54 17.66 9.318 8.65e-05 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
(Dispersion parameter for gaussian family taken to be 122429.7)
Null deviance: 11364688 on 7 degrees of freedom
Residual deviance: 734578 on 6 degrees of freedom
AIC: 120.12
Number of Fisher Scoring iterations: 2
In this case, the p-value for the GEXcicc coefficient is 8.65e-05, indicating a significant positive association between the two variables: each additional unit of gene expression corresponds, on average, to about 164.5 more units of basal metabolism.
However, I cannot conclude that the expression of GEX-cicc-1 causes an increase in metabolism. Regression quantifies the association between two variables; it does not by itself establish a cause-and-effect relationship.
In this specific case, I cannot rule out the possibility that basal metabolism influences gene expression, rather than the other way around.
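A Gaussian glm with the default identity link is ordinary least-squares regression, so lm gives the same coefficients. The sketch below checks this, derives the share of variance explained from the deviances, and runs a correlation test, which treats the two variables symmetrically and therefore shows that the association has no built-in direction. The names fit_glm, fit_lm and r_squared are illustrative.

```r
METAB   <- c(2000, 1950, 2400, 3500, 3500, 4500, 900, 1000)
GEXcicc <- c(21.5, 20.7, 22.4, 31.3, 32, 39, 18.7, 19.2)

fit_glm <- glm(METAB ~ GEXcicc, family = gaussian)
fit_lm  <- lm(METAB ~ GEXcicc)

# The two fits should agree on intercept and slope
print(all.equal(coef(fit_glm), coef(fit_lm)))

# R squared = 1 - residual deviance / null deviance (about 0.935 here)
r_squared <- 1 - deviance(fit_glm) / fit_glm$null.deviance
print(r_squared)

# Pearson correlation: same result whichever variable is listed first
print(cor.test(METAB, GEXcicc))
```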
EXERCISE 4
The maximum diameter of a tree trunk (Y, measured in inches) is influenced, among other factors, by the rainfall of the region (X, measured in inches). The following data refer to a sample of 15 eucalyptus trees. Depict the relationship between the two variables and quantify the relationship between rainfall and trunk diameter to understand how trunk growth is affected by rainfall.
I create two vectors containing the measurements of the region's rainfall (PR) and of the maximum trunk diameter (TA):
PR<-c(250,115,75,85,100,75,85,225,250,255,175,140,150,170,75)
TA<-c(16.2,16.4,16.6,16.6,16.9,17,17.6,16.5,16.1,16.1,16.5,16.5,16.7,16.6,17.8)
I apply regression:
summary(glm(TA~PR,family = gaussian))
Call:
glm(formula = TA ~ PR, family = gaussian)
Deviance Residuals:
Min 1Q Median 3Q Max
-0.44961 -0.13389 -0.02602 0.04308 0.75039
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 17.434435 0.220190 79.18 < 2e-16 ***
PR -0.005131 0.001354 -3.79 0.00225 **
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
(Dispersion parameter for gaussian family taken to be 0.1223926)
Null deviance: 3.3493 on 14 degrees of freedom
Residual deviance: 1.5911 on 13 degrees of freedom
AIC: 14.914
Number of Fisher Scoring iterations: 2
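The relationship is significant (p = 0.00225), and the negative estimate for PR indicates a negative linear relationship between rainfall and trunk diameter: diameter decreases as rainfall increases. This is not inverse proportionality in the strict sense.
For each one-inch increase in rainfall, trunk diameter decreases on average by about 0.005 inches, that is, by roughly 0.5 inches for every additional 100 inches of rainfall.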
We can visualize this graphically with the function:
scatter.smooth(PR,TA)
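scatter.smooth draws a loess curve, so overlaying the fitted straight line makes the regression itself visible. A minimal sketch, using lm (equivalent to the Gaussian glm above). The axis labels and the names fit, slope and change_per_100 are illustrative.

```r
PR <- c(250, 115, 75, 85, 100, 75, 85, 225, 250, 255, 175, 140, 150, 170, 75)
TA <- c(16.2, 16.4, 16.6, 16.6, 16.9, 17, 17.6, 16.5, 16.1, 16.1, 16.5, 16.5, 16.7, 16.6, 17.8)

fit <- lm(TA ~ PR)
slope <- coef(fit)[["PR"]]

# Expected change in diameter for 100 additional units of rainfall
change_per_100 <- 100 * slope
print(change_per_100)

# Scatter plot with the fitted regression line
plot(PR, TA, xlab = "Rainfall", ylab = "Maximum trunk diameter")
abline(fit)
```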
EXERCISE 5
A study is conducted to understand whether hours of sleep affect the athletic performance of 10 sprinters who run the 100 meters. Other potentially relevant variables known to influence performance are also measured: weight, muscle mass, and type of training (INTENSIVE=2, LIGHT=1). The data obtained are shown in the following table.
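Determine whether sleep hours influence performance while accounting for the other variables. Interpret the result.
I create vectors for the variables in the table (OREsonno = sleep hours, massMUSC = muscle mass, peso = weight, Tipoall = type of training, PERFORM = performance):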
OREsonno<-c(8,10,8,6,5,8,9,10,11,8.5)
massMUSC<-c(15,20,13,8,9,12,16,18,19,8)
peso<-c(70,72,73,75,72,71,74,69,69,80)
Tipoall<-c(1,2,2,1,1,1,2,2,2,1)
PERFORM<-c(11.2,10.1,10.5,13.2,14.2,10.8,10.2,10.0,9.89,13.0)
I recode the variable "Tipoall," specifying that it is categorical:
Tipoall<-as.factor(Tipoall)
I apply regression:
summary(glm(PERFORM~OREsonno+massMUSC+peso+Tipoall,family = gaussian))
Call:
glm(formula = PERFORM ~ OREsonno + massMUSC + peso + Tipoall,
family = gaussian)
Deviance Residuals:
1 2 3 4 5 6 7 8 9 10
-0.35080 0.15899 -0.65463 0.00282 0.90588 -0.94646 -0.52468 0.32525 0.69507 0.38855
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 8.79926 10.65484 0.826 0.446
OREsonno -0.45067 0.32813 -1.373 0.228
massMUSC -0.02915 0.20346 -0.143 0.892
peso 0.10819 0.14305 0.756 0.484
Tipoall -0.77905 0.91629 -0.850 0.434
(Dispersion parameter for gaussian family taken to be 0.6616906)
Null deviance: 22.1373 on 9 degrees of freedom
Residual deviance: 3.3085 on 5 degrees of freedom
AIC: 29.318
Number of Fisher Scoring iterations: 2
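After accounting for weight, muscle mass and type of training, the relationship between sleep hours and performance is NOT found to be significant (estimate -0.45, p = 0.228). With only 10 athletes and four predictors the model has little statistical power, so this result does not demonstrate that sleep has no effect.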
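The adjusted effect of sleep can also be examined by comparing the full model with one that omits sleep hours, and by looking at the confidence interval of the coefficient. The following is a minimal sketch using lm, which is equivalent to the Gaussian glm. The names full, reduced, comparison, p_f, p_t and ci are illustrative.

```r
OREsonno <- c(8, 10, 8, 6, 5, 8, 9, 10, 11, 8.5)
massMUSC <- c(15, 20, 13, 8, 9, 12, 16, 18, 19, 8)
peso     <- c(70, 72, 73, 75, 72, 71, 74, 69, 69, 80)
Tipoall  <- as.factor(c(1, 2, 2, 1, 1, 1, 2, 2, 2, 1))
PERFORM  <- c(11.2, 10.1, 10.5, 13.2, 14.2, 10.8, 10.2, 10.0, 9.89, 13.0)

full    <- lm(PERFORM ~ OREsonno + massMUSC + peso + Tipoall)
reduced <- lm(PERFORM ~ massMUSC + peso + Tipoall)

# F test for adding sleep hours to a model that already has the covariates
comparison <- anova(reduced, full)
p_f <- comparison[2, "Pr(>F)"]

# For a single coefficient, this F test matches the t test in summary()
p_t <- summary(full)$coefficients["OREsonno", "Pr(>|t|)"]
print(c(F_test = p_f, t_test = p_t))

# 95% confidence interval for the adjusted sleep effect
ci <- confint(full, "OREsonno")
print(ci)
```

A wide interval that includes zero is consistent with the non-significant result and shows how imprecisely the effect is estimated with so few observations.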