The ‘xyz’ Algorithm for Fast Interaction Search in High-Dimensional Data

Gian-Andrea Thanei, Nicolai Meinshausen and Rajen D. Shah

The R-Package xyz contains an algorithm to estimate product interactions between a response vector \(Y\) and predictors \(X\). For example if all data were binary in \(\{-1,1\}\) one would like to find the pair \((l,k)\) maximizing: \[ \begin{equation} \mathbb{P}(Y=X_l X_k). \end{equation} \] A brute force approach would cycle through all possible pairs, which implies a quadratic run-time if there are \(p\) variables. The xyz-Algorithm provably solves this problem in sub-quadratic runt-ime. Its run-time depends on the interaction strength of the strongest pair. xyz can recover pairs in almost linerar run-time if the interaction is strong.

Functions

The xyz package offers two functions for interaction search:

Examples

xyz_search:

## Warning in normalizePath(path.expand(path), winslash, mustWork):
## path[1]="C:/Users/rds37/AppData/Local/Temp/Rtmp6xMSEF/Rbuild39447baa666f/
## xyz/vignettes/../inst/include": The system cannot find the path specified
#set dimensions
n<-100;p<-1000;
#create data
X<-matrix(2*rbinom(n*p,1,0.5)-1,n,p)
Y<-X[,1]*X[,2]

#find top 10 interactions
result<-xyz_search(X,Y,L=5,N=10,binary=TRUE,negative=TRUE)
#the first element contains the interaction pairs the second element contains their strength
print(result)
## intereaction pair: (1,2) strength: 1
## intereaction pair: (217,398) strength: 0.46
## intereaction pair: (524,888) strength: 0.42
## intereaction pair: (685,921) strength: 0.42
## intereaction pair: (18,105) strength: 0.4
## intereaction pair: (360,945) strength: 0.38
## intereaction pair: (23,68) strength: 0.38
## intereaction pair: (405,643) strength: 0.38
## intereaction pair: (415,668) strength: 0.38
## intereaction pair: (123,554) strength: 0.38

These were now just L=10 runs. This means we discovered the interaction in about \(\mathcal{O}(np^{1.005})\) operations instead of \(\mathcal{O}(np^2)\).

xyz_regression:

#set dimensions
n<-100;p<-1000;
#create data
X<-matrix(rnorm(n*p),n,p)

#build model
Y<-3*X[,5]+2*X[,1]*X[,2]-3*X[,7]*X[,4]+rnorm(n)

#find top 10 interactions
result<-xyz_regression(X,Y,L=10,n_lambda=10,alpha=0.9)
#the first element contains the main effects and the third the interaction effects, we look at the fifth lambda
print(result)
## Lambda sequence:
## lambda1=0.57601
##  lambda2=0.44598
##  lambda3=0.34531
##  lambda4=0.26736
##  lambda5=0.20701
##  lambda6=0.16028
##  lambda7=0.1241
##  lambda8=0.09608
##  lambda9=0.07439
##  lambda10=0.0576
## Discovered main effects: 1 Discovered interaction effects: 42
## Model parameters:
## intercept: -4.566117e-18
## Printing effects for lambda10=0.0576
## Main effects:
## Main effect: 5 coefficient: 0.3840953
## Interaction effect:
## Interaction effect: (4,7) coefficient: -0.4812117
## Interaction effect: (1,2) coefficient: 0.3750244
## Interaction effect: (208,479) coefficient: 0.06235549
## Interaction effect: (170,203) coefficient: -0.04201977
## Interaction effect: (891,931) coefficient: -0.03821096
## Interaction effect: (104,271) coefficient: -0.03634871
## Interaction effect: (393,592) coefficient: -0.03041407
## Interaction effect: (38,610) coefficient: 0.02600682
## Interaction effect: (23,847) coefficient: -0.02076678
## Interaction effect: (570,630) coefficient: -0.02047379
## Interaction effect: (423,481) coefficient: 0.01993759
## Interaction effect: (26,450) coefficient: 0.01842155
## Interaction effect: (583,825) coefficient: -0.01752191
## Interaction effect: (7,981) coefficient: -0.01726098
## Interaction effect: (104,453) coefficient: 0.01579584
## Interaction effect: (464,798) coefficient: -0.01543339
## Interaction effect: (449,911) coefficient: -0.01507906
## Interaction effect: (305,484) coefficient: 0.01480733
## Interaction effect: (338,809) coefficient: -0.01444916
## Interaction effect: (584,681) coefficient: -0.01396515
## Interaction effect: (214,615) coefficient: -0.01351377
## Interaction effect: (329,677) coefficient: -0.01337054
## Interaction effect: (562,565) coefficient: 0.01256765
## Interaction effect: (247,392) coefficient: -0.0123604
## Interaction effect: (23,186) coefficient: -0.01199699
## Interaction effect: (20,58) coefficient: -0.01157434
## Interaction effect: (332,712) coefficient: 0.01049054
## Interaction effect: (553,659) coefficient: -0.009991351
## Interaction effect: (596,708) coefficient: -0.009258795
## Interaction effect: (11,237) coefficient: -0.009236588
## Interaction effect: (354,876) coefficient: -0.008922487
## Interaction effect: (467,499) coefficient: -0.008647326
## Interaction effect: (708,812) coefficient: 0.006859055
## Interaction effect: (236,946) coefficient: -0.005231367
## Interaction effect: (531,709) coefficient: -0.004513708
## Interaction effect: (479,822) coefficient: 0.003664453
## Interaction effect: (106,691) coefficient: 0.003388497
## Interaction effect: (77,830) coefficient: -0.002939483
## Interaction effect: (661,694) coefficient: -0.002512046
## Interaction effect: (584,675) coefficient: 0.002174015
## Interaction effect: (114,941) coefficient: -0.00206018
## Interaction effect: (106,831) coefficient: 0.001846254
plot(result)
#predict
predict(result,matrix(rnorm(10*p),10,p))
##              [,1]
##  [1,]  1.05718927
##  [2,]  0.10747538
##  [3,]  0.12871009
##  [4,] -0.66864664
##  [5,]  0.64332971
##  [6,] -0.04412215
##  [7,] -0.05661955
##  [8,] -0.19031556
##  [9,]  0.33553702
## [10,]  1.20085180