Over the past 12 years, thousands of genomic regions have been associated with complex diseases; however, identifying the true causal variants and genes remains difficult. One challenge is that the majority of the identified genetic variants lie in regions that do not code for proteins, suggesting that these variants have regulatory effects. It has been hypothesized that variants that increase the risk of developing a disease act as switches that alter the regulation of gene expression in a way that leads to the disease. Hence, one of the most commonly used approaches to identify the causal genes and variants is to correlate the expression of nearby or distal genes with the genotype of the risk variant. Because these effects can be measured quantitatively, the associated loci are referred to as expression quantitative trait loci (eQTLs).
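To make the idea of correlating expression with genotype concrete, here is a minimal sketch in Python. It assumes an additive model in which genotype is coded as the number of risk alleles (0, 1, or 2) and uses a simple permutation test in place of the regression frameworks that real eQTL pipelines rely on. The function names and the toy data are hypothetical and serve only as illustration.

```python
import random
from statistics import mean


def eqtl_slope(genotypes, expression):
    """Least-squares slope of expression on genotype dosage (0, 1, or 2 risk alleles)."""
    g_mean, e_mean = mean(genotypes), mean(expression)
    sxy = sum((g - g_mean) * (e - e_mean) for g, e in zip(genotypes, expression))
    sxx = sum((g - g_mean) ** 2 for g in genotypes)
    return sxy / sxx


def permutation_p_value(genotypes, expression, n_perm=2000, seed=0):
    """Share of random shuffles whose |slope| is at least as large as the observed one."""
    rng = random.Random(seed)
    observed = abs(eqtl_slope(genotypes, expression))
    shuffled = list(expression)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # break any genotype-expression link
        if abs(eqtl_slope(genotypes, shuffled)) >= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero
    return (hits + 1) / (n_perm + 1)


# Hypothetical toy data: risk-allele dosage and normalized expression of one gene
genotypes = [0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 2]
expression = [1.0, 1.2, 0.9, 1.1, 1.6, 1.4, 1.5, 1.7, 2.1, 1.9, 2.0, 2.2]

slope = eqtl_slope(genotypes, expression)
p_value = permutation_p_value(genotypes, expression)
print(f"effect per risk allele: {slope:.3f}, permutation p = {p_value:.4f}")
```

The slope estimates how much expression changes per additional risk allele. A real analysis would also adjust for covariates such as ancestry, sex, and batch, and would correct for the very large number of variant-gene pairs tested.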
There are two types of eQTLs: cis-eQTLs and trans-eQTLs. Genetic variants that affect genes located on the same chromosome are called cis-eQTLs, while variants that affect genes on other chromosomes are called trans-eQTLs. Because cis-eQTLs tend to have stronger effects than trans-eQTLs, they are easier to identify and can be analyzed in relatively small samples. This is why the majority of eQTL studies performed so far have looked at the effects of genetic variants on the expression of nearby genes located within a one-megabase region around the variant (cis-eQTLs). By contrast, trans-eQTLs have smaller effects, so larger sample sizes are required to capture them. There are important factors to consider before conducting an eQTL study or using eQTLs to understand disease biology. This article outlines some of the most important considerations.
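As an illustration of how a cis scan restricts the genes it tests, the sketch below selects the genes that lie on the same chromosome as a variant and within a fixed distance of it. Several details are assumptions made for this example: the gene position is taken to be its transcription start site, the window is applied as a maximum distance on either side of the variant, and the `Gene` class, the function name, and the gene names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Gene:
    name: str
    chrom: str
    tss: int  # transcription start site in base pairs (assumed gene position)


def genes_in_cis_window(variant_chrom, variant_pos, genes, max_distance=1_000_000):
    """Names of genes on the variant's chromosome within max_distance base pairs."""
    return [
        gene.name
        for gene in genes
        if gene.chrom == variant_chrom and abs(gene.tss - variant_pos) <= max_distance
    ]


# Hypothetical genes and a hypothetical variant at chr1:1,000,000
genes = [
    Gene("GENE_A", "chr1", 1_200_000),  # same chromosome, 200 kb away
    Gene("GENE_B", "chr1", 5_000_000),  # same chromosome, 4 Mb away
    Gene("GENE_C", "chr2", 1_100_000),  # different chromosome
]

cis_candidates = genes_in_cis_window("chr1", 1_000_000, genes)
print(cis_candidates)
```

Limiting the scan to a window like this keeps the number of tests small, which is part of why cis analyses are feasible in smaller cohorts than genome-wide trans analyses.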
eQTLs are often tissue-specific, meaning a variant can affect gene X in one tissue but not in others. For this reason, it is important to look at eQTLs from disease-relevant cells and tissues when interpreting the results of disease-association analyses, especially for trans-eQTLs, which are more tissue-specific. For instance, if one is studying a liver disease, it is best to focus on eQTLs from hepatocytes rather than from bone or brain tissue, as hepatocytes are the cells affected by the disease and might have different regulatory networks than bone or brain. While most of the initial eQTL studies were performed in tissues that were easily obtained, such as blood, multiple publicly available databases now contain eQTL information from many cell and tissue types. It is even possible to map eQTLs using gene expression data measured at single-cell resolution, enabling investigators to look at more specific cell types. The more samples, cells, and tissues an eQTL analysis includes, the more power it has to detect eQTLs.
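The link between sample size and detection power can be illustrated with a small simulation. The sketch below is only a toy model under stated assumptions: an additive effect of 0.3 expression units per risk allele, a risk-allele frequency of 0.3, standard normal noise, and a Fisher z-transform as an approximate significance test. All parameter values and the function name are hypothetical choices, not values taken from any study.

```python
import math
import random
from statistics import NormalDist


def simulate_power(n_samples, effect=0.3, maf=0.3, n_sim=300, alpha=0.05, seed=1):
    """Fraction of simulated cohorts in which the eQTL reaches significance."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided threshold
    detected = 0
    for _ in range(n_sim):
        # Genotype dosage: two independent draws of the risk allele
        g = [(rng.random() < maf) + (rng.random() < maf) for _ in range(n_samples)]
        # Expression: additive allele effect plus standard normal noise
        e = [effect * gi + rng.gauss(0.0, 1.0) for gi in g]
        g_mean = sum(g) / n_samples
        e_mean = sum(e) / n_samples
        sxx = sum((gi - g_mean) ** 2 for gi in g)
        syy = sum((ei - e_mean) ** 2 for ei in e)
        if sxx == 0 or syy == 0:
            continue  # no variation in this cohort, so nothing can be detected
        sxy = sum((gi - g_mean) * (ei - e_mean) for gi, ei in zip(g, e))
        r = sxy / math.sqrt(sxx * syy)
        r = max(-0.999999, min(0.999999, r))  # keep atanh within its domain
        # Fisher z-transform: approximately normal under no association
        z = math.atanh(r) * math.sqrt(n_samples - 3)
        if abs(z) > z_crit:
            detected += 1
    return detected / n_sim


power_small = simulate_power(20)
power_large = simulate_power(200)
print(f"estimated power with 20 samples: {power_small:.2f}")
print(f"estimated power with 200 samples: {power_large:.2f}")
```

Lowering `effect` in this sketch mimics the weaker effects described for trans-eQTLs and shows why they call for larger cohorts.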
Another characteristic of eQTLs is that they are context-specific: applying a stimulus (e.g., cytokines, enzymes, bacteria, or viruses) to tissues that show eQTLs can diminish or promote their effects. Therefore, eQTLs from healthy individuals should be used with caution if the goal is to understand the disease biology. Until now, most of the data from eQTL studies have come from healthy individuals. A better approach would be to look at patient-specific tissues or cells, as many pathways might be regulated differently as a result of the disease. The creation of biobanks and prospective cohorts has facilitated the collection of patient material even before the development of the disease, which will allow investigators to look at the regulatory networks in both contexts: healthy and diseased.
Correlation vs. causation
Although eQTLs have been broadly employed to identify causal genes, it is important to keep in mind that eQTLs do not necessarily indicate causality. Rather, eQTLs establish a correlation between the genotype and the expression level of a gene. A variant will commonly affect multiple genes, in one tissue or in several, and it is still unclear how these multiple eQTLs translate into disease biology. It could be that all the affected genes act in the same pathway, that only one of the genes is expressed in a disease-relevant tissue, or that investigators have missed the true causal gene because the effect was context-dependent or the study lacked the power to detect the eQTL. The best way to test whether a variant or a gene is causal is to perform functional studies in appropriate models. Nevertheless, eQTLs in disease-relevant tissues are an excellent way to prioritize candidate variants and genes for follow-up studies.
System genetics approaches
After a disease association analysis, the genes prioritized by eQTLs are usually included in a pathway enrichment analysis. In many cases, such analysis has implicated new disease pathways, contributing to a better understanding of the disease pathophysiology. The use of system genetics approaches, including eQTL mapping, has also proven useful in identifying potential treatments. For example, in a study of rheumatoid arthritis, researchers who applied a system genetics approach that included eQTL mapping identified 27 drugs that are already in use to treat the disease, plus other drugs that are prescribed for other diseases but could potentially also benefit rheumatoid arthritis patients.
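The pathway enrichment step mentioned above is often framed as an over-representation test: given a list of prioritized genes, is a pathway represented more often than expected by chance? The sketch below shows one common formulation, a one-sided hypergeometric test, as an illustration only. It is not necessarily the method used in the studies referred to here, and the function name and all gene counts are hypothetical.

```python
from math import comb


def enrichment_p_value(n_background, n_pathway, n_selected, n_overlap):
    """One-sided hypergeometric P(overlap >= n_overlap).

    n_background: genes tested in total
    n_pathway:    background genes annotated to the pathway
    n_selected:   genes prioritized by eQTLs
    n_overlap:    prioritized genes that belong to the pathway
    """
    max_overlap = min(n_pathway, n_selected)
    total = comb(n_background, n_selected)
    tail = sum(
        comb(n_pathway, k) * comb(n_background - n_pathway, n_selected - k)
        for k in range(n_overlap, max_overlap + 1)
    )
    return tail / total


# Hypothetical counts: 5 of 50 prioritized genes fall in a 100-gene pathway,
# out of a background of 20,000 genes
p_enriched = enrichment_p_value(20_000, 100, 50, 5)
print(f"enrichment p-value: {p_enriched:.2e}")
```

In practice many pathways are tested at once, so the resulting p-values would need a multiple-testing correction before any pathway is reported as enriched.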
All in all, combining eQTL mapping with other system genetics tools is an excellent way to identify new pathways, gain more insight into disease biology, and identify new treatments.