1-Import the database and reclassify the categorical (factor) variables that R mistakenly read as numeric.
str(DATASET_MOTHER)
DATASET2<-as.data.frame(lapply(DATASET_MOTHER[,c(1,2,4:10,26:35)],as.factor))
DATASET3<-as.data.frame(lapply(DATASET_MOTHER[,c(3,11:25,36:38)],as.numeric))
DATI<-data.frame(DATASET2,DATASET3)
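The reclassification step can be sketched on a toy data frame. Here `toy` and its column names are hypothetical stand-ins for DATASET_MOTHER, and the conversion is done in place so that the column order is preserved. The last lines show a common pitfall: as.numeric() applied to a factor returns the internal level codes, not the stored values.

```r
# Toy data frame standing in for DATASET_MOTHER (hypothetical columns).
toy <- data.frame(
  Case_Ctrl = c(0, 1, 1, 0),                        # categorical, read as numeric
  Smoker    = c(1, 0, 0, 1),                        # categorical, read as numeric
  Weight    = c("61.5", "70.2", "58.0", "66.3"),    # numeric, read as text
  stringsAsFactors = FALSE
)

factor_cols  <- c("Case_Ctrl", "Smoker")
numeric_cols <- "Weight"

# Convert in place: selecting columns by name avoids fragile index ranges.
toy[factor_cols]  <- lapply(toy[factor_cols], as.factor)
toy[numeric_cols] <- lapply(toy[numeric_cols], as.numeric)

str(toy)

# Caution: as.numeric() on a factor returns the level codes;
# go through as.character() to recover the stored values.
f <- factor(c("10", "20", "10"))
codes  <- as.numeric(f)                 # level codes
values <- as.numeric(as.character(f))   # original values
```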
2-Check for the presence of missing data and impute variables containing NA.
dim(na.omit(DATI))
library(mice)
md.pattern(DATI[,30:38]) # recommended: inspect the dataset in several chunks of columns
IMP<- mice(DATI,m=1,method='pmm',seed=500)
dati<- complete(IMP,1)
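Before imputing, it can help to count the missing values column by column with base R. The sketch below uses a small hypothetical data frame (`toy` and its columns are not part of the exercise data):

```r
# Toy data with missing values (hypothetical columns).
toy <- data.frame(
  BMI    = c(22.1, NA, 27.4, 24.0, NA),
  Age    = c(31, 28, NA, 35, 30),
  Smoker = factor(c("no", "yes", "no", NA, "no"))
)

# Number of NA values in each column.
na_per_column <- colSums(is.na(toy))
na_per_column

# Number of rows without any NA (same count as the first value of dim(na.omit(toy))).
complete_rows <- sum(complete.cases(toy))
complete_rows

# Names of the columns that contain at least one NA.
names(na_per_column)[na_per_column > 0]
```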
3-What is the mean birth weight of the pregnant women themselves (PESO_MADRE_NASCITA)?
Compute the mean separately for cases and controls, and use a statistical test to evaluate whether the difference is significant.
tapply(dati$PESO_MADRE_NASCITA,dati$Case_Ctrl, mean)
shapiro.test(dati$PESO_MADRE_NASCITA)
bartlett.test(dati$PESO_MADRE_NASCITA~dati$Case_Ctrl)
t.test(dati$PESO_MADRE_NASCITA~dati$Case_Ctrl)
There is no statistically significant difference in birth weight between cases and controls.
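The sequence of checks above (normality, homogeneity of variances, comparison of means) can be sketched on simulated data. The data frame `toy`, its columns and the 0.05 threshold are assumptions made for illustration. Note that t.test() runs Welch's test by default, so equal variances have to be requested explicitly.

```r
set.seed(1)  # simulated data, for illustration only
toy <- data.frame(
  group  = factor(rep(c("case", "control"), each = 30)),
  weight = c(rnorm(30, mean = 3300, sd = 400),
             rnorm(30, mean = 3350, sd = 400))
)

# Normality checked within each group rather than on the pooled sample.
shapiro_p <- tapply(toy$weight, toy$group,
                    function(x) shapiro.test(x)$p.value)

# Homogeneity of variances between the two groups.
bartlett_p <- bartlett.test(weight ~ group, data = toy)$p.value

# Pooled-variance t-test only if Bartlett's test does not reject;
# otherwise t.test() falls back to Welch's test (var.equal = FALSE).
res <- t.test(weight ~ group, data = toy, var.equal = bartlett_p > 0.05)
res$p.value

# Rank-based alternative if normality is doubtful.
wilcox_p <- wilcox.test(weight ~ group, data = toy)$p.value
```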
4-For each subject, it was possible to infer the lymphocytic component expressed in the columns CD8.naive, CD4.naive, CD8T, CD4T, NK, Bcell, Mono, Gran, and PlasmaBlast. Which of the listed variables are normally distributed?
lapply(dati[,26:34],shapiro.test)
Only the variable 'CD8.naive' is normally distributed.
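Reading nine separate test printouts is error-prone. A compact alternative is to extract only the p-values with sapply(). The sketch below uses simulated columns (the names in `toy` are hypothetical) and the conventional 0.05 threshold:

```r
set.seed(2)  # simulated data, for illustration only
toy <- data.frame(
  normal_like  = rnorm(100),
  skewed       = rexp(100),
  uniform_like = runif(100)
)

# One Shapiro-Wilk p-value per column, returned as a named vector.
p_values <- sapply(toy, function(x) shapiro.test(x)$p.value)
round(p_values, 4)

# Columns for which normality is not rejected at the 0.05 level.
names(p_values)[p_values > 0.05]
```

On the exercise data the same call would be applied to dati[,26:34].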
5-Visually assess the correlation between all the variables listed in the previous exercise.
library(corrplot)
cor_matrix<-cor(dati[,26:34]) # avoid naming the object 'cor', which masks the function
corrplot(cor_matrix)
6-Graphically depict how the values of CD8.naive are distributed in cases and controls.
library(sm)
sm.density.compare(dati$CD8.naive,dati$Case_Ctrl)
7-How many levels does the variable 'Coffee_Assumption' have? Also, assess the association between coffee consumption and the mothers' BMI (Mother's_BMI, which appears in R as Mother.s_BMI).
levels(dati$Coffee_Assumption)
shapiro.test(dati$Mother.s_BMI)
bartlett.test(dati$Mother.s_BMI~dati$Coffee_Assumption)
summary(aov(dati$Mother.s_BMI~dati$Coffee_Assumption))
There is no significant association between coffee consumption and pregnant women's BMI.
8-Select the most appropriate graph to represent the median values of the variable 'Bodyweight_before_Pregnancy' in cases and controls.
boxplot(dati$Bodyweight_before_Pregnancy~dati$Case_Ctrl)
9-Is there a significant relationship between the age of pregnant women (Age_at_Delivery) and total epimutations (EpimutTot)? Apply a test and create a corresponding explanatory graph.
cor.test(dati$Age_at_Delivery,dati$EpimutTot)
scatter.smooth(dati$Age_at_Delivery,dati$EpimutTot)
No, there is no significant relationship between age and epimutations in the study subjects.
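The conclusion rests on the correlation estimate and its p-value, which can be read directly from the object returned by cor.test(). The sketch below uses simulated data (`age` and `epimut` are hypothetical vectors) and also shows the rank-based Spearman variant, which may be preferable when the variables are not normally distributed:

```r
set.seed(3)  # simulated data, for illustration only
age    <- round(runif(50, min = 20, max = 42))
epimut <- rpois(50, lambda = 8)

# Pearson correlation (the default of cor.test).
pearson <- cor.test(age, epimut)
pearson$estimate   # correlation coefficient
pearson$p.value    # p-value of the test of zero correlation
pearson$conf.int   # 95% confidence interval for the coefficient

# Spearman rank correlation; exact = FALSE avoids warnings caused by ties.
spearman <- cor.test(age, epimut, method = "spearman", exact = FALSE)
spearman$estimate
spearman$p.value
```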
10-How does the birth weight of the newborn (Birth_Weight) vary with increasing gestational age (Pregnancy_Time_in_days)?
summary(glm(dati$Birth_Weight~dati$Pregnancy_Time_in_days,family = gaussian))
On average, the birth weight of the newborn increases by 24.95 units for each additional day of gestation.
Birth weight is a continuous measurement rather than a count, so the Gaussian family used here is appropriate; the "Poisson" family would be the choice for a count outcome.
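With family = gaussian and the default identity link, glm() fits the same least-squares model as lm(). A minimal sketch on simulated data (the data frame `toy`, its columns and the simulated coefficients are hypothetical) shows the equivalence and how the slope is read:

```r
set.seed(4)  # simulated data, for illustration only
toy <- data.frame(gest_days = round(rnorm(80, mean = 275, sd = 10)))
toy$birth_weight <- -3500 + 25 * toy$gest_days + rnorm(80, sd = 350)

fit_glm <- glm(birth_weight ~ gest_days, data = toy, family = gaussian)
fit_lm  <- lm(birth_weight ~ gest_days, data = toy)

# Both fits give the same coefficients (up to numerical tolerance).
same_coef <- isTRUE(all.equal(coef(fit_glm), coef(fit_lm)))
same_coef

# The slope is the expected change in birth weight per additional day.
slope <- coef(fit_lm)["gest_days"]
slope

# 95% confidence interval for the slope.
confint(fit_lm)["gest_days", ]
```

Passing the data frame through the `data` argument, instead of writing dati$ inside the formula, also keeps the coefficient names short and makes predict() usable on new data.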
11-Is there a significant association between abortion (Abortion) and stress?
chisq.test(dati$Abortion,dati$Stress)
No, there is no significant association between Abortion and Stress.
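The chi-squared approximation becomes unreliable when expected cell counts are small, so it is worth inspecting them before trusting the p-value. The sketch below uses a made-up 2x2 table (the counts and level names are hypothetical) and falls back to Fisher's exact test when any expected count is below 5, a common rule of thumb:

```r
# Hypothetical contingency table, for illustration only.
toy_table <- matrix(
  c(12, 5, 9, 4), nrow = 2,
  dimnames = list(Abortion = c("no", "yes"), Stress = c("low", "high"))
)

# chisq.test() warns when expected counts are small; silence it here
# because the expected counts are inspected explicitly below.
chi <- suppressWarnings(chisq.test(toy_table))
chi$expected

# Rule of thumb: use Fisher's exact test if any expected count is < 5.
use_fisher <- any(chi$expected < 5)
p_value <- if (use_fisher) fisher.test(toy_table)$p.value else chi$p.value
p_value
```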
12-Among the pregnant women with diseases (Disease_during_pregnancy), how many have diabetes?
table(dati$Disease_during_Pregnancy,dati$Diabetes)
Among the pregnant women with diseases, two have diabetes.
13-Graphically represent the variable 'Infertility_Age', showing the percentage for each of its levels.
class(dati$Infertility_Age)
percentages<-round(prop.table(table(dati$Infertility_Age))*100) # percentage of subjects in each level
pie_labels<-paste0(names(percentages)," (",percentages,"%)") # e.g. "level (25%)"; avoids masking the function 'labels'
pie(table(dati$Infertility_Age),labels=pie_labels)
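Slices of similar size are hard to compare in a pie chart, so a bar chart of the same percentages is a possible alternative. The sketch below uses a made-up factor (`infertility` and its levels are hypothetical):

```r
# Hypothetical factor, for illustration only.
infertility <- factor(c("<30", "30-35", "30-35", ">35",
                        "<30", "30-35", ">35", "30-35"))

counts <- table(infertility)
pct    <- round(100 * prop.table(counts), 1)   # percentage per level

# Bar chart of the percentages; barplot() returns the bar midpoints,
# which are used to place the percentage above each bar.
mid <- barplot(pct, ylab = "Percentage of women", ylim = c(0, 100))
text(mid, as.numeric(pct) + 4, labels = paste0(pct, "%"))
```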