Quantile regression

Quantile Regression (QR) [1] is a methodology that extends regression for the mean to the analysis of the entire conditional distribution of the outcome variable. Rather than modelling only the mean, QR draws inference on the quantiles of the distribution; for example, one can draw inference on the median.
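To make the idea concrete: a quantile can be characterised as the value that minimises the so-called check (or pinball) loss, which is the loss QR is built on. The sketch below is a minimal illustration in plain Python, not the method of any particular package; the function names and the toy data are assumptions made here.

```python
def pinball_loss(residual, tau):
    """Check (pinball) loss: tau * r for r >= 0, (tau - 1) * r for r < 0."""
    return tau * residual if residual >= 0 else (tau - 1) * residual


def total_loss(candidate, values, tau):
    """Sum of check losses when every value is predicted by `candidate`."""
    return sum(pinball_loss(v - candidate, tau) for v in values)


def best_constant(values, tau):
    """Constant minimising the total check loss.

    The loss is convex and piecewise linear in the candidate, so a
    minimiser is always found among the observed values themselves.
    """
    return min(sorted(values), key=lambda c: total_loss(c, values, tau))


values = [1, 2, 3, 4, 100]           # hypothetical data with one extreme value
median_fit = best_constant(values, 0.5)
upper_fit = best_constant(values, 0.9)
mean_value = sum(values) / len(values)

print(median_fit, upper_fit, mean_value)
```

With tau = 0.5 the minimiser is the sample median, which is unaffected by the extreme value, while the mean is pulled strongly towards it.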

The advantages of a QR model are clear:

It does not make assumptions about the distribution of the model's residuals, as ordinary least squares (OLS) linear models do.
It does not require a link function or a distribution family, as generalized linear models (GLMs) do.
There is no need to apply log-transformations or other ad hoc transformations to the dependent variable, which can make the estimated coefficients difficult to interpret. Most importantly, such transformations can lead to incorrect inference, since the null hypothesis on the transformed variable differs from the null hypothesis on the original variable (see [2] for details).
When the outcome variable has a skewed distribution, it is more sensible to draw inference on the quantiles rather than only on the mean.
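The point about transformations can be checked numerically. The sketch below, on a small made-up sample (the values are an assumption, chosen only to be skewed), shows that the median commutes with the log-transformation while the mean does not, which is why a hypothesis about the mean of the log-outcome is not the same hypothesis as one about the mean of the outcome.

```python
import math
import statistics

y = [1.0, 2.0, 4.0, 8.0, 64.0]        # hypothetical skewed outcome
log_y = [math.log(v) for v in y]

# Quantiles are preserved by monotone transformations. With an odd sample
# size the median is an observed value, so the two routes agree.
median_then_log = math.log(statistics.median(y))
log_then_median = statistics.median(log_y)

# The mean is not preserved: the mean of the logs is the log of the
# geometric mean, not the log of the arithmetic mean.
mean_then_log = math.log(statistics.mean(y))
log_then_mean = statistics.mean(log_y)

print(median_then_log, log_then_median)
print(mean_then_log, log_then_mean)
```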

The downside of the methodology is that, when the OLS assumptions hold, it is less efficient than an OLS linear model and thus requires a larger sample size to achieve the same power.
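This loss of efficiency can be illustrated with a small simulation in the simplest possible setting, estimating the centre of normally distributed data with the sample mean versus the sample median. It is only a sketch under assumed settings (sample size, number of replicates and seed are arbitrary choices made here); for large normal samples, theory puts the variance ratio at about pi/2, roughly 1.57.

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is reproducible


def simulate(n_obs=51, n_rep=2000):
    """Return the sampling variances of the mean and of the median."""
    means, medians = [], []
    for _ in range(n_rep):
        sample = [random.gauss(0.0, 1.0) for _ in range(n_obs)]
        means.append(sum(sample) / n_obs)
        medians.append(statistics.median(sample))
    return statistics.variance(means), statistics.variance(medians)


var_mean, var_median = simulate()
ratio = var_median / var_mean  # > 1 means the median is the noisier estimator

print(f"variance of mean:   {var_mean:.4f}")
print(f"variance of median: {var_median:.4f}")
print(f"ratio:              {ratio:.2f}")
```

A ratio above 1 means the median needs more observations than the mean to reach the same precision when the data really are normal; with skewed or heavy-tailed data the comparison can go the other way.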

If the dependent variable is distributed as in the figure above, the mean is not necessarily the best statistic to summarize the distribution, because it is strongly positively skewed. In fact, the mode corresponds to the 25th quantile, and the mean is pulled to the right, towards higher values. One could try to make this distribution normal-shaped by applying transformations and/or removing extreme values classified as outliers, but this is bad practice: it loses information, risks removing observations without justification and, as stated above, can lead to incorrect inference because the data have been transformed.

In situations like these, it is more appropriate to evaluate the effects of the independent variables at different points of the dependent variable's distribution (e.g., the 25th quantile, the median/50th quantile, the 75th quantile, and the 90th quantile) using quantile regression, which retains all the original information.

The model can be implemented easily with the R package lqmm, developed by Geraci and Bottai [3]. The package also handles mixed-effects models, which is very useful for correlated data.

EXAMPLE

Let's take a simple example, for illustrative purposes only. Suppose we want to evaluate the effect of a fixed factor, i.e. treatment vs placebo in a randomized setting, on a highly skewed outcome, for example a biomarker. The treatment works well if it increases the biomarker, so the purpose is to investigate how the treatment affects the biomarker level. We decide to evaluate the difference in biomarker level between treated and placebo subjects not with a linear model, but with a quantile regression at the 10th, 25th, 40th, 50th, 60th, 75th and 95th quantiles, obtaining the following results:

The y-axis shows the difference in biomarker level between subjects who took the treatment and subjects who took placebo. The x-axis shows the quantiles listed above. The dashed line represents the estimate that would have been obtained using a simple linear model (OLS estimate), with dotted lines representing its 95% confidence limits. Finally, for each quantile, we have the QR estimate with its 95% confidence interval.
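A comparison of this kind can be sketched on made-up data. With a single binary covariate (treatment vs placebo), the QR coefficient at a given quantile is simply the difference between the two groups' sample quantiles, and the OLS coefficient is the difference in means, so no fitting routine is needed for this special case. Everything below is an assumption for illustration: the data are constructed, not the data behind the plot, the treated group is built so that the effect shrinks as the biomarker rises, and the code is plain Python rather than the lqmm interface.

```python
import math
from statistics import NormalDist


def sample_quantile(values, tau):
    """Order statistic that minimises the check loss at level tau."""
    ordered = sorted(values)
    index = max(0, math.ceil(tau * len(ordered)) - 1)
    return ordered[index]


n = 200
standard_normal = NormalDist()

# Hypothetical placebo biomarker: an evenly spaced grid of log-normal quantiles.
placebo = [math.exp(standard_normal.inv_cdf((i + 0.5) / n)) for i in range(n)]

# Hypothetical treated biomarker: the same grid plus a shift that gets
# smaller as the biomarker level gets larger.
treated = [z + 1.0 / (1.0 + z) for z in placebo]

taus = [0.10, 0.25, 0.40, 0.50, 0.60, 0.75, 0.95]
qr_effects = [
    sample_quantile(treated, t) - sample_quantile(placebo, t) for t in taus
]
ols_effect = sum(treated) / n - sum(placebo) / n  # difference in means

for tau, effect in zip(taus, qr_effects):
    print(f"tau={tau:.2f}  quantile difference={effect:.3f}")
print(f"difference in means (OLS)={ols_effect:.3f}")
```

In this constructed example the quantile differences fall steadily from the lowest to the highest quantile, while the difference in means reports one number that sits between them. Confidence intervals, which the plot shows, are left out of the sketch.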

INTERPRETATION

Looking at the plot, we can make the following observations:

We can immediately see that the QR treatment effects change along the quantiles of the outcome distribution, which means that the treatment effect is not the same across the biomarker's distribution. This argues against using a simple linear model: if such a model were adequate, we would expect the QR estimates to be about the same as the OLS estimate at all quantiles [4].
The median effect estimate is higher than the OLS estimate. The effect estimates decrease across the quantiles, and not in a linear way. Thus, efficacy is lower at higher biomarker values than at lower biomarker values.
Since higher efficacy at lower biomarker values (and vice versa) could be expected from a biological/clinical point of view, this methodology gives an overall picture of the efficacy of the treatment, which can be used to target the treatment only at subjects with specific biomarker levels.

In summary, with a simple linear model producing an OLS estimate we would simply have concluded that the treatment causes a 2.5 increase in the biomarker level at every quantile of the biomarker distribution, whereas with QR we see that this increase is not actually the same along the quantiles of the biomarker distribution. In particular, the efficacy is higher at low biomarker values and then falls at high biomarker levels.

References:

[1] Quantile Regression and Its Applications: A Primer for Anesthesiologists. Steven J Staffa, Daniel S Kohane, David Zurakowski. Anesth Analg. 2019

[2] Log-transformation and its implications for data analysis. Changyong Feng et al. Shanghai Arch Psychiatry. 2014

[3] Linear quantile mixed models. Marco Geraci, Matteo Bottai. Stat Comput. 2014

[4] Getting Started with Quantile Regression. University of Virginia Library. https://data.library.virginia.edu/getting-started-with-quantile-regression/
