Algebraic Methods for Studying Interactions Between Epidemiological Variables
1 Human Genetics Foundation, Turin, Italy
2 Department of Genetics, Biology and Biochemistry, University of Turin, Italy
3 Department of Mathematics, University of Genoa, Italy
4 Department of Mathematics, University of Turin, Italy
5 Imperial College, London, UK
⋆ Corresponding author. E-mail: firstname.lastname@example.org
The study of independence models among variables is one of the most relevant topics in epidemiology, particularly in molecular epidemiology, where it underlies the analysis of gene-gene and gene-environment interactions. Such models have been studied using three main kinds of analysis: regression analysis, data mining approaches and Bayesian model selection. Recently, methods of algebraic statistics have been used extensively in biological applications. In this paper we present a concise but complete description of independence models in algebraic statistics and a new method for analyzing interactions, which is equivalent to a Markov-basis correction of Fisher's exact test.
We identified a suitable algebraic independence model for describing the dependence of two genetic variables on the occurrence of cancer, and we exploited the theory of toric varieties and Gröbner bases to develop an exact independence test based on the Diaconis-Sturmfels algorithm. We implemented the test in a Maple routine and applied it to the study of gene-gene interaction in Gen-Air, a European case-control study. We computed the p-value for each pair of genetic variables interacting with disease status and compared our results with the standard asymptotic chi-square test.
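To make the idea behind a Diaconis-Sturmfels exact test concrete, the following is a minimal Python sketch. It is not the Maple routine described above. It assumes the simplest setting, a two-way contingency table under the independence model, whose Markov basis consists of the moves that add +1 and -1 on the corners of a 2x2 minor. The function names (`chi_square`, `markov_step`, `exact_p_value`), the choice of the chi-square statistic, and the chain length and burn-in are illustrative assumptions.

```python
import random


def chi_square(table):
    """Pearson chi-square statistic of a two-way table (list of rows)."""
    rows = [sum(r) for r in table]
    cols = [sum(c) for c in zip(*table)]
    n = sum(rows)
    stat = 0.0
    for i, row in enumerate(table):
        for j, x in enumerate(row):
            expected = rows[i] * cols[j] / n
            if expected > 0:
                stat += (x - expected) ** 2 / expected
    return stat


def markov_step(table, rng):
    """One Metropolis step using a basic Markov move on a 2x2 minor.

    The move adds 1 to cells (i1, j1), (i2, j2) and subtracts 1 from
    (i1, j2), (i2, j1), so all row and column sums stay fixed.
    The table is modified in place.
    """
    i1, i2 = rng.sample(range(len(table)), 2)
    j1, j2 = rng.sample(range(len(table[0])), 2)
    a, b = table[i1][j1], table[i1][j2]
    c, d = table[i2][j1], table[i2][j2]
    if b == 0 or c == 0:
        return  # move would create a negative count: stay put
    # Ratio of hypergeometric weights 1 / prod(n_ij!) after vs. before.
    ratio = (b * c) / ((a + 1) * (d + 1))
    if rng.random() < ratio:
        table[i1][j1] += 1
        table[i2][j2] += 1
        table[i1][j2] -= 1
        table[i2][j1] -= 1


def exact_p_value(observed, steps=20000, burn_in=2000, seed=0):
    """Monte Carlo estimate of the exact p-value for independence."""
    rng = random.Random(seed)
    table = [list(row) for row in observed]
    t_obs = chi_square(observed)
    hits = 0
    for step in range(burn_in + steps):
        markov_step(table, rng)
        # Count sampled tables at least as extreme as the observed one.
        if step >= burn_in and chi_square(table) >= t_obs - 1e-12:
            hits += 1
    return hits / steps
```

For instance, `exact_p_value([[12, 3], [4, 11]])` returns an estimate of the exact p-value for that hypothetical table. For models involving two genetic variables and disease status, the set of moves would have to be replaced by a Markov basis of the corresponding toric ideal, typically obtained from a Gröbner basis computation.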
We found an association among COMT Val158Met, APE1 Asp148Glu and bladder cancer (p-value: 0.009). We also found an interaction among TP53 Arg72Pro, GSTP1 Ile105Val and lung cancer (p-value: 0.00035). Leukaemia was observed to interact significantly with the pairs ERCC2 Lys751Gln and RAD51 172 G > T (p-value: 0.0072), ERCC2 Lys751Gln and LIG4 Thr9Ile (p-value: 0.0095), and APE1 Asp148Glu and GSTP1 Ala114Val (p-value: 0.0036).
By taking advantage of results from theoretical and computational algebra, the method we propose was more selective than other methods in detecting new interactions, while its results remained consistent with previous epidemiological and functional findings. It also helped us control the multiple comparison problem. In light of our results, we believe that the epidemiological study of interactions can benefit from algebraic methods based on properties of toric varieties and Gröbner bases.
Mathematics Subject Classification: 62P10 / 62F03 / 92B05 / 13P10
Key words: polymorphism / interaction / Markov basis / Diaconis-Sturmfels algorithm / independence model / toric variety
© EDP Sciences, 2012