Integrating biological knowledge related to coexpression when analysing Xomic data

Abstract: Interpreting the results of multivariate exploratory methods (such as Principal Component Analysis) applied to genomic data is almost impossible at the gene level because of the number of genes involved. Integrative approaches, which incorporate biological knowledge, have therefore become unavoidable. De Tayrac et al. (2009) proposed a strategy that uses a priori information, such as Gene Ontology (GO) or KEGG terms, to enhance the results. The idea is to build modules of genes from the a priori information and to use those modules as supplementary information, so that the results can be interpreted on the basis of the genes' functions. However, the composition of those modules may be disconnected from the structure of the genomic data under study, and it does not take into account the different degrees of specificity of the terms, which reflect the existence of different levels of regulation. Hence the natural idea of improving the way modules are constituted. The aim of this talk is to propose a new approach combining Canonical Correspondence Analysis with Hierarchical Multiple Factor Analysis (Francoa et al., 2009) to obtain modules with two main features: 1) they consist of genes that belong to the same biological processes; 2) they consist of genes that are co-expressed in the data set of interest. The interpretation of the biological processes is thus facilitated by the co-expression of the genes within a group, while the method highlights a few key genes whose functions can easily be taken into account to deepen the interpretation. Applying this method to a chicken microarray data set brought out the well-known mechanisms triggered in response to fasting and suggested new leads for investigation.
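A minimal sketch of the baseline idea the abstract starts from: modules of genes are defined a priori and then used only as supplementary information to interpret the axes of a PCA. It assumes a samples × genes expression matrix and summarizes each module by the mean expression of its genes. The module names, the simulated data and the mean-profile summary are illustrative assumptions; this is neither the de Tayrac et al. (2009) implementation nor the Canonical Correspondence Analysis / Hierarchical Multiple Factor Analysis approach proposed in the talk.

```python
import numpy as np


def pca_scores(X, n_components=2):
    """Principal component scores of a samples x genes matrix (SVD of the
    column-centred data)."""
    Xc = X - X.mean(axis=0)
    U, s, _ = np.linalg.svd(Xc, full_matrices=False)
    return U[:, :n_components] * s[:n_components]


def module_correlations(X, scores, modules):
    """Correlate each module's mean expression profile with each component.

    Modules are supplementary: they play no part in building the axes and
    are only projected afterwards to help interpret them.
    """
    result = {}
    for name, gene_idx in modules.items():
        profile = X[:, gene_idx].mean(axis=1)  # one value per sample
        result[name] = [
            float(np.corrcoef(profile, scores[:, k])[0, 1])
            for k in range(scores.shape[1])
        ]
    return result


# Simulated data (hypothetical): 20 samples x 50 genes; genes 0-9 share a
# common latent signal, i.e. they are co-expressed.
rng = np.random.default_rng(0)
latent = rng.normal(size=20)
X = rng.normal(size=(20, 50))
X[:, :10] += 5 * latent[:, None]

# Hypothetical a priori modules, e.g. genes annotated with the same GO term.
modules = {"module_A": list(range(0, 10)), "module_B": list(range(10, 20))}

scores = pca_scores(X)
corrs = module_correlations(X, scores, modules)
for name, r in corrs.items():
    print(name, [round(v, 2) for v in r])
```

In this toy setting, the module whose genes are co-expressed correlates strongly (up to sign) with the first axis, whereas the module of unrelated genes does not. A module built purely from annotations can therefore say little about the structure of the data, which is the limitation the proposed approach aims to address.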
Contributor: Céline Martel
Submitted on : Friday, September 7, 2012 - 3:44:44 PM
Last modification on : Friday, November 16, 2018 - 1:31:18 AM


HAL Id: hal-00729543, version 1


Marie Verbanck, Sébastien Lê. Integrating biological knowledge related to coexpression when analysing Xomic data. 19th International Conference on Computational Statistics COMPSTAT, Aug 2010, Paris (FR), France. ⟨hal-00729543⟩


