A tutorial on epistasis detection using ETMA.

Introduction:

Epistasis Test in Meta-Analysis (ETMA) is a statistical method that uses summary data from genetic association studies to detect gene-gene interaction. The etma package provides a main function for detecting epistasis with ETMA and includes three complete example data sets.

Background:

Conventional genome-wide association studies (GWAS) have proven to be a successful strategy for identifying genetic variants associated with complex human traits. However, there is still a large heritability gap between GWAS and traditional family studies. This ‘missing heritability’ has been suggested to be due to a lack of studies focused on epistasis, also called gene–gene interaction, because individual studies have often had insufficient sample sizes. Meta-analysis is a common method for increasing statistical power, but sufficiently detailed data are difficult to obtain from published studies. A previous study employed a meta-regression-based method to detect epistasis, but it faced the challenge of inconsistent estimates. Here, we describe a Markov chain Monte Carlo-based method, called ‘Epistasis Test in Meta-Analysis’ (ETMA), which uses genotype summary data to obtain consistent estimates of epistasis effects in meta-analysis.

Installation:

Open the main R window and enter the following command to install the etma package (this assumes an internet connection and appropriate access rights on the computer):

install.packages("etma")

After installation, enter the following command to load the etma package:

library(etma)

Datasets:

Use the data command to load each example data set and the head command to view its first rows, as follows. Use the help command (for example, help(data.GST)) to view the detailed definitions of the variables. To analyze your own data, see help(read.table) for details on importing a table into R.
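
For your own studies, the counts can be imported from a plain-text table. The following is a minimal sketch, assuming a whitespace-delimited layout with a header row. The column names mirror data.RAS, and the three rows are the first three studies from head(data.RAS). The object name my.data is hypothetical, and the inline text argument stands in for the path to a real file.

```r
# Inline text stands in for a file; with a real file, pass its path
# as the first argument of read.table() instead of using 'text ='.
raw <- "
Author Year case.ACE.0 case.ACE.1 ctrl.ACE.0 ctrl.ACE.1 case.AGT.0 case.AGT.1 ctrl.AGT.0 ctrl.AGT.1
Su     2014 792 502 859 429 193 1101 230 1058
Shaikh 2014  99 121 107 123  49  171  83  147
Pawlik 2014 126 154 180 194 141  139 179  195
"

# header = TRUE takes the column names from the first non-blank line.
my.data <- read.table(text = raw, header = TRUE, stringsAsFactors = FALSE)

# One row per study; the eight count columns are read as integers.
str(my.data)
```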

GSTs family and cancer

data(data.GST)
head(data.GST)
##                 Study Ethnicity     Country            Cancer case.GSTM1.0
## 1            Yri 2012 Caucasian      Norway  Hodgkin lymphoma          111
## 2 Van Hemelrijck 2012 Caucasian Switzerland   Prostate cancer           98
## 3        Rudolph 2012 Caucasian      German Colorectal cancer          822
## 4     Ramalhinho 2012 Caucasian    Portugal     Breast cancer           35
## 5   Ovsiannikov 2012  Caucasian     Germany    Bladder cancer           94
## 6       Oliveira 2012       Mix      Brazil    Ovarian cancer           84
##   ctrl.GSTM1.0 case.GSTM1.1 ctrl.GSTM1.1 case.GSTT1.0 ctrl.GSTT1.0
## 1          567          110          477          189          965
## 2          172          105          188          168          296
## 3          844          932          923         1433         1459
## 4           76           66           45           54           97
## 5          113          102          122          163          188
## 6           90           48           42           93           98
##   case.GSTT1.1 ctrl.GSTT1.1
## 1           31           50
## 2           35           64
## 3          313          308
## 4           47           24
## 5           33           47
## 6           39           34

PAH metabolism pathway and oral cancer

data(data.PAH)
head(data.PAH)
##     Athour Year Country case.CYP1A1.0 case.CYP1A1.1 ctrl.CYP1A1.0
## 1     Sato 2000   Japan            68            74            90
## 2 Tanimoto 1999   Japan            32            68            62
## 3   Gronau 2003 Germany            55            18            94
## 4   Gatt?s 2006  Brazil            25            13            63
## 5      Cha 2007     USA            20            52            49
## 6 Matthias 1998      UK           110            14           165
##   ctrl.CYP1A1.1 case.GSTM1.0 case.GSTM1.1 ctrl.GSTM1.0 ctrl.GSTM1.1
## 1            52           50           92           78           64
## 2            38           57           43           58           42
## 3            35           32           41           63           66
## 4            39           14           24           63           39
## 5           114           35           37           86          123
## 6            28           51           71           83           95

RAS and chronic kidney disease

data(data.RAS)
head(data.RAS)
##   Author Year      Race                 Tyep case.ACE.0 case.ACE.1
## 1     Su 2014     Asian             combined        792        502
## 2 Shaikh 2014 Caucasian diabetic nephropathy         99        121
## 3 Pawlik 2014 Caucasian   glomerulonephritis        126        154
## 4   Chen 2014     Asian             combined        314        152
## 5   Zsom 2011 Caucasian             combined        266        352
## 6  Huang 2010     Asian   glomerulonephritis         49         45
##   ctrl.ACE.0 ctrl.ACE.1 case.AGT.0 case.AGT.1 ctrl.AGT.0 ctrl.AGT.1
## 1        859        429        193       1101        230       1058
## 2        107        123         49        171         83        147
## 3        180        194        141        139        179        195
## 4        617        281         73        393        150        748
## 5        198        202        328        290        200        200
## 6        168         72         14         80         40        200

Simple example:

The main function of the etma package is ‘ETMA’. It uses an n by 8 matrix (n is the number of studies) containing the genotype counts of SNP1 and SNP2 in the case and control groups of each study to analyze gene-gene interaction. Thus, the inputs of the ETMA function are: (1) the number of wild type of SNP1 in the case group, (2) the number of mutation type of SNP1 in the case group, (3) the number of wild type of SNP1 in the control group, (4) the number of mutation type of SNP1 in the control group, (5) the number of wild type of SNP2 in the case group, (6) the number of mutation type of SNP2 in the case group, (7) the number of wild type of SNP2 in the control group, and (8) the number of mutation type of SNP2 in the control group.
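
To make the eight-column layout concrete, the sketch below rebuilds the first three studies of data.RAS by hand (SNP1 = ACE, SNP2 = AGT) and computes a crude per-study odds ratio for SNP1 as a sanity check on the column order. The crude odds ratio is only an illustration of how the counts are arranged; it is not part of the ETMA algorithm, and the object names counts and crude.or.snp1 are hypothetical.

```r
# Columns follow the input order (1)-(8): SNP1 case wild/mutation,
# SNP1 control wild/mutation, then the same four counts for SNP2.
counts <- data.frame(
  case.ACE.0 = c(792,  99, 126), case.ACE.1 = c(502,  121, 154),
  ctrl.ACE.0 = c(859, 107, 180), ctrl.ACE.1 = c(429,  123, 194),
  case.AGT.0 = c(193,  49, 141), case.AGT.1 = c(1101, 171, 139),
  ctrl.AGT.0 = c(230,  83, 179), ctrl.AGT.1 = c(1058, 147, 195),
  row.names  = c("Su 2014", "Shaikh 2014", "Pawlik 2014")
)

# Crude (marginal) odds ratio of the SNP1 mutation type in each study:
# odds of mutation among cases divided by odds of mutation among controls.
crude.or.snp1 <- (counts$case.ACE.1 / counts$case.ACE.0) /
                 (counts$ctrl.ACE.1 / counts$ctrl.ACE.0)
names(crude.or.snp1) <- rownames(counts)

round(crude.or.snp1, 3)
```

If a crude odds ratio looks implausible, a swapped case/control or wild/mutation column is a likely cause.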

Because ETMA is based on MCMC and a two-step iterative process, the main options of the ETMA function are: (1) the maximum number of iterations (default is 20), (2) the length of the chain used to obtain the study-level parameters in step 1 (iterations.step1; default is 20,000), (3) the length of the chain used to obtain the global-level parameters in step 2 (iterations.step2; default is 200,000), and (4) the starting seed of the algorithm (start.seed; default is a random seed). Users can also choose whether to export MCMC plots at each iteration.
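
The exact argument names and defaults are best checked against the installed version of the package rather than taken from this text. A minimal sketch, assuming only that ETMA is exported by etma (as the library(etma) usage in this tutorial implies); the variable name has.etma is hypothetical:

```r
# Look up the argument names and defaults without running an analysis.
# The check keeps the snippet harmless when etma is not installed.
has.etma <- requireNamespace("etma", quietly = TRUE)

if (has.etma) {
  print(args(etma::ETMA))   # prints the function signature with defaults
} else {
  message("etma is not installed; run install.packages(\"etma\") first.")
}
```

help(ETMA) gives the full description of each option once the package is loaded.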

The main outputs are: (1) the beta values (log ORs) of each SNP and of the interaction term, (2) the variance-covariance matrix of the beta values, and (3) the p matrix from the iteration process. From these outputs, we can calculate the ORs, their confidence intervals, and p values.
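
As an illustration of that calculation, the sketch below takes the beta value and standard error of the interaction term from the toy example output shown later in this tutorial and derives the OR, the 95% confidence interval, the t value, and the p value. It assumes a t distribution with the degrees of freedom reported by print (df = 31); this is an assumption inferred from the printed numbers, not a statement of the package internals.

```r
# Values copied from the 'Interaction' row of print(ggint.toy).
b  <- 0.13528   # beta (log OR)
se <- 0.06773   # standard error of beta
df <- 31        # degrees of freedom reported in the output

or      <- exp(b)                             # odds ratio
t.crit  <- qt(0.975, df)                      # two-sided 95% critical value
ci      <- exp(b + c(-1, 1) * t.crit * se)    # CI on the OR scale
t.value <- b / se
p.value <- 2 * pt(-abs(t.value), df)          # two-sided p value

round(c(OR = or, lower = ci[1], upper = ci[2],
        t = t.value, p = p.value), 4)
```

Under these assumptions the results agree with the printed row (OR 1.145, 95% CI 0.997 to 1.314, p value 0.0546) up to rounding.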

Use the ETMA function to analyze gene–gene interaction and save the results to ggint.toy. This toy example shortens the chains to 100 (step 1) and 300 (step 2) so that it runs quickly (note: the computing time for this example is about 3-5 seconds).

ggint.toy=ETMA(case.ACE.0,case.ACE.1,ctrl.ACE.0,ctrl.ACE.1,
                  case.AGT.0,case.AGT.1,ctrl.AGT.0,ctrl.AGT.1,
                  data=data.RAS,iterations.step1=100,iterations.step2=300,
                  start.seed=1,show.detailed.plot=FALSE,show.final.plot=FALSE)

After the analysis, use the print and summary commands to view the result of gene–gene interaction analysis.

print(ggint.toy)
## Epistasis Test in Meta-Analysis (ETMA)
## A MCMC algorithm for detecting gene-gene interaction in meta-analysis.
## 
## This analysis include 34 studies. (df = 31) 
## 
##                       b      se    OR 95%ci.l 95%ci.u t value p value
## SNP1(mutation) -0.00458 0.04044 0.995   0.917   1.081 -0.1131  0.9106
## SNP2(mutation)  0.08809 0.04787 1.092   0.991   1.204  1.8402  0.0753
## Interaction     0.13528 0.06773 1.145   0.997   1.314  1.9974  0.0546
summary(ggint.toy)
## Epistasis Test in Meta-Analysis (ETMA)
## A MCMC algorithm for detecting gene-gene interaction in meta-analysis.
## 
## This analysis include 34 studies. (df = 31) 
## 
##                       b      se    OR 95%ci.l 95%ci.u t value p value
## SNP1(mutation) -0.00458 0.04044 0.995   0.917   1.081 -0.1131  0.9106
## SNP2(mutation)  0.08809 0.04787 1.092   0.991   1.204  1.8402  0.0753
## Interaction     0.13528 0.06773 1.145   0.997   1.314  1.9974  0.0546
## 
##                                     OR 95%ci.l 95%ci.u t value p value
## SNP1(wild type) & SNP2(mutation) 1.092   0.991   1.204  1.8402  0.0753
## SNP1(mutation) & SNP2(wild type) 0.995   0.917   1.081 -0.1131  0.9106
## SNP1(mutation) & SNP2(mutation)  1.245   1.180   1.313  8.3543 <0.0001
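
The second table of the summary output can be related to the beta values in the first. The sketch below assumes a log-additive model in which the OR for each genotype combination, relative to wild type at both SNPs, is the exponential of the sum of the relevant beta values. This assumption reproduces the printed ORs, but it is inferred from the output rather than taken from the package source.

```r
# Beta values copied from the first table of summary(ggint.toy).
b <- c(SNP1 = -0.00458, SNP2 = 0.08809, Interaction = 0.13528)

# ORs relative to subjects carrying the wild type at both SNPs.
or.snp2.only <- exp(b[["SNP2"]])   # SNP1(wild type) & SNP2(mutation)
or.snp1.only <- exp(b[["SNP1"]])   # SNP1(mutation) & SNP2(wild type)
or.both      <- exp(sum(b))        # SNP1(mutation) & SNP2(mutation)

round(c(snp2.only = or.snp2.only,
        snp1.only = or.snp1.only,
        both      = or.both), 3)
```

The confidence interval for the double-mutation OR cannot be rebuilt from the printed standard errors alone, because it also depends on the covariances in the variance-covariance matrix of the beta values.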

Complete example:

The following examples are complete analyses. They use the default chain lengths of 20,000 in step 1 and 200,000 in step 2. Please note that each takes at least 15 minutes, and one of them takes about 3 hours. The full chain lengths are necessary for real data analysis, so please use the default settings, as in the following examples, to analyze your own data.

GSTs family and cancer (note: the computing time for this example is about 3 h):

ggint1=ETMA(case.GSTM1.0,case.GSTM1.1,ctrl.GSTM1.0,ctrl.GSTM1.1,
           case.GSTT1.0,case.GSTT1.1,ctrl.GSTT1.0,ctrl.GSTT1.1,
           data=data.GST,start.seed=1,show.detailed.plot=TRUE,show.final.plot=TRUE)
print(ggint1)
summary(ggint1)

PAH metabolism pathway and oral cancer (note: the computing time for this example is about 15 min):

ggint2=ETMA(case.CYP1A1.0,case.CYP1A1.1,ctrl.CYP1A1.0,ctrl.CYP1A1.1,
           case.GSTM1.0,case.GSTM1.1,ctrl.GSTM1.0,ctrl.GSTM1.1,
           data=data.PAH,start.seed=1,show.detailed.plot=TRUE,show.final.plot=TRUE)
print(ggint2)
summary(ggint2)

RAS and chronic kidney disease (note: the computing time for this example is about 15 min):

ggint3=ETMA(case.ACE.0,case.ACE.1,ctrl.ACE.0,ctrl.ACE.1,
           case.AGT.0,case.AGT.1,ctrl.AGT.0,ctrl.AGT.1,
           data=data.RAS,start.seed=1,show.detailed.plot=TRUE,show.final.plot=TRUE)
print(ggint3)
summary(ggint3)