Jean Hausser

Improving entropy estimation and inferring genetic regulatory networks

Description

My master's thesis, written under the supervision of Korbinian Strimmer at Ludwig-Maximilians-Universität in Munich, Germany. It explores how entropy and other information-theoretic quantities may be used to reverse-engineer genetic regulatory networks from repeated microarray data. The problem of distinguishing genes that undergo direct co-regulation from genes whose expression is similar only because they belong to the same regulatory pathway is studied from a graphical modeling viewpoint. This leads to the criterion of conditional independence, which can be evaluated by computing the conditional mutual information. The latter can be written entirely as a signed sum of joint entropies, which underlines the need for an entropy estimator that remains accurate even when few samples are available.
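To make the link between conditional mutual information and entropy concrete, here is a minimal Python sketch. It relies on the standard identity I(X;Y|Z) = H(X,Z) + H(Y,Z) - H(X,Y,Z) - H(Z) and, for simplicity, on the naive maximum likelihood (plug-in) entropy rather than any estimator from the thesis. The function names `plugin_entropy` and `conditional_mutual_information` are illustrative choices, not names taken from the thesis.

```python
from collections import Counter
from math import log


def plugin_entropy(samples):
    """Maximum likelihood (plug-in) entropy, in nats, of a list of hashable symbols."""
    n = len(samples)
    counts = Counter(samples)
    return -sum((c / n) * log(c / n) for c in counts.values())


def conditional_mutual_information(x, y, z):
    """Estimate I(X;Y|Z) = H(X,Z) + H(Y,Z) - H(X,Y,Z) - H(Z) from discrete samples.

    x, y and z are equally long sequences of discretized observations.
    Joint variables are represented as tuples of the individual values.
    """
    h_xz = plugin_entropy(list(zip(x, z)))
    h_yz = plugin_entropy(list(zip(y, z)))
    h_xyz = plugin_entropy(list(zip(x, y, z)))
    h_z = plugin_entropy(list(z))
    return h_xz + h_yz - h_xyz - h_z
```

A value close to zero suggests that X and Y are conditionally independent given Z. With few samples the plug-in entropies are biased, so the four terms do not cancel reliably, which is why the quality of the entropy estimator matters here.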

We introduce a new plug-in entropy estimator obtained by shrinking the maximum likelihood estimates of the multinomial proportions toward the maximum entropy target. We derive the closely related ZIPshrink and ZINBshrink entropy estimators, which enhance the shrinkage estimator by first adjusting the shrinkage target according to the fraction of structural zeros in the multinomial model. The fraction of structural zeros is estimated by modeling the histogram of bin counts with a Zero-Inflated Poisson or a Zero-Inflated Negative Binomial distribution. We compare these three new estimators to state-of-the-art estimators. We show that they give acceptable estimates even in the low sampling regime and are as accurate as the best estimator available today while being 100 times faster, which makes them better suited to large-scale computations. We then compare existing approximations of conditional independence networks, such as 0-1 networks and a data processing inequality based approach. In conclusion, we briefly consider limitations of the method as well as issues related to unobserved variables, causal inference, and time series as opposed to steady-state experiments.
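The idea of a shrinkage plug-in estimator can be sketched in a few lines of Python. The abstract does not give the formula for the shrinkage intensity, so the James-Stein-type intensity used below is an assumption, as are the function name `shrinkage_entropy` and its `lam` parameter. The ZIPshrink and ZINBshrink adjustments of the target are not covered.

```python
from math import log


def shrinkage_entropy(counts, lam=None):
    """Plug-in entropy (nats) of cell frequencies shrunk toward the uniform target.

    counts: observed bin counts, including empty bins.
    lam:    shrinkage intensity in [0, 1]; if None, it is estimated with a
            James-Stein-type formula (an assumption, not taken from the abstract).
    """
    n = sum(counts)
    p = len(counts)
    if n == 0 or p == 0:
        raise ValueError("need at least one bin and one observation")

    target = 1.0 / p                      # maximum entropy (uniform) target
    ml = [c / n for c in counts]          # maximum likelihood frequencies

    if lam is None:
        denom = (n - 1) * sum((target - f) ** 2 for f in ml)
        if denom == 0:
            lam = 1.0                     # single sample, or data already uniform
        else:
            lam = (1.0 - sum(f * f for f in ml)) / denom
            lam = min(1.0, max(0.0, lam)) # keep the intensity in [0, 1]

    shrunk = [lam * target + (1.0 - lam) * f for f in ml]
    return -sum(q * log(q) for q in shrunk if q > 0)
```

Setting `lam=0` recovers the maximum likelihood entropy and `lam=1` returns the entropy of the uniform distribution, `log(p)`. Intermediate values give empty bins a small positive probability, which counteracts the downward bias of the maximum likelihood estimate when samples are scarce.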

Part I serves as both an introduction and a motivation. It presents the notion of conditional independence and explains why entropy estimation is critical to genetic regulatory network inference. Part II contains the core results of this report: it reviews existing entropy estimators for the discrete case, introduces a new entropy estimator based on the statistical notion of shrinkage, and compares their performance. Finally, Part III compares the data processing inequality based approach to reverse-engineering genetic regulatory networks with the so-called 0-1 networks approach. It also discusses limitations, pitfalls, and possible extensions of the method.
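As a rough illustration of what a data processing inequality based approach can look like, the Python sketch below prunes, in every fully connected triplet of genes, the edge with the weakest mutual information, on the assumption that it reflects an indirect interaction. This is a generic sketch and not necessarily the procedure evaluated in the thesis. The function name `dpi_prune`, the `tolerance` parameter and the input layout (a symmetric matrix of pairwise mutual information estimates) are all assumptions.

```python
from itertools import combinations


def dpi_prune(mi, tolerance=0.0):
    """Return the set of edges (i, j), i < j, that survive DPI pruning.

    mi:        symmetric square matrix (list of lists) of pairwise mutual
               information estimates; entries <= 0 are treated as "no edge".
    tolerance: fraction in [0, 1); larger values prune fewer edges.
    """
    n = len(mi)
    edges = {(i, j) for i, j in combinations(range(n), 2) if mi[i][j] > 0}
    removed = set()
    for i, j in edges:
        for k in range(n):
            if k == i or k == j:
                continue
            # If i and j both share more information with k than with each
            # other, treat the i-j edge as an indirect interaction through k.
            if mi[i][j] < min(mi[i][k], mi[j][k]) * (1.0 - tolerance):
                removed.add((i, j))
                break
    return edges - removed
```

For a chain where gene 0 regulates gene 1 and gene 1 regulates gene 2, the mutual information between genes 0 and 2 is expected to be the smallest of the three, so the sketch drops that edge and keeps the two direct ones. All triplets are checked against the original matrix, so the result does not depend on the order in which edges are visited.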

File type

PDF document

Language

English


Available versions

Published on: 06/09/2006 @ 14h02
Size: 399.03 KB
Changes: Document published.

This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported License.