Inferring Hidden Structure in Biological Data

SCHEME: CORE

CALL: 2012

DOMAIN: BM - Translational Biomedical Research

FIRST NAME: Jorge

LAST NAME: Gonçalves

INDUSTRY PARTNERSHIP / PPP: No

INDUSTRY / PPP PARTNER:

HOST INSTITUTION: University of Luxembourg

KEYWORDS: Bioinformatics, machine learning, probabilistic graphical models, mixture models, structure learning, stochastic optimal control, optimization, transcription factor binding, regulatory network.

START: 2013-09-01

END: 2015-08-31

WEBSITE: https://www.uni.lu

Submitted Abstract

One of the key challenges in biomedical research is to identify, on a genome-wide level, the components and the topology of regulatory networks, and to provide potential targets for interfering with such networks in the context of new preventive and therapeutic strategies. A fundamental problem related to this is the characterization and in silico prediction of the genomic specificities of transcription factors (TFs). The emerging view of transcription regulation is that each TF molecule possesses an intrinsic binding affinity for each genomic sequence, which gives rise to a binding affinity landscape that modulates the transcription program. In this project we address the problem of modeling the intrinsic binding affinities of TFs and extracting them from high-throughput data. The dominant model in the literature for characterizing the binding affinity of a given TF is the position weight matrix (PWM). The PWM is a Naive Bayes model that assumes that all positions in a binding site are mutually statistically independent given the binding event, an assumption that has recently been challenged. In this project we propose the use of state-of-the-art models and algorithms from the field of machine learning for unravelling the architectural structure of TF binding sites and inferring the intrinsic binding affinity of an input TF. The proposed models, based on probabilistic mixtures and stochastic controllers, are able to capture high-order correlations among nucleotides, as well as the multi-modality of binding observed in some families of TFs.
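To make the independence assumption concrete, the following Python sketch (a toy illustration, not the project's method) fits a PWM to hypothetical binding sites in which the second nucleotide is fully determined by the first. Because the PWM multiplies per-position probabilities, it gives the unobserved sites AT and GC the same likelihood as the observed ones. A two-component mixture of PWMs captures both the dependency and the two binding modes. Here the components are specified by hand rather than learned (for example by EM), and all names and data are assumptions made for illustration.

```python
from collections import Counter

ALPHABET = "ACGT"

def fit_pwm(sites):
    """Estimate a PWM: one independent nucleotide distribution per position
    (maximum likelihood, no pseudocounts, for simplicity)."""
    pwm = []
    for i in range(len(sites[0])):
        counts = Counter(site[i] for site in sites)
        total = sum(counts.values())
        pwm.append({b: counts[b] / total for b in ALPHABET})
    return pwm

def pwm_prob(pwm, seq):
    """Naive Bayes likelihood: product of per-position probabilities."""
    p = 1.0
    for column, base in zip(pwm, seq):
        p *= column[base]
    return p

def mixture_prob(components, seq):
    """Mixture of PWMs: weighted sum of component likelihoods."""
    return sum(w * pwm_prob(pwm, seq) for w, pwm in components)

# Hypothetical toy data: position 2 is fully determined by position 1.
sites = ["AC", "AC", "GT", "GT"]

single = fit_pwm(sites)
# Hand-specified (hypothetical) mixture: one component per binding mode.
mixture = [(0.5, fit_pwm(["AC", "AC"])), (0.5, fit_pwm(["GT", "GT"]))]

for seq in ["AC", "GT", "AT", "GC"]:
    print(seq, pwm_prob(single, seq), mixture_prob(mixture, seq))
# AC 0.25 0.5
# GT 0.25 0.5
# AT 0.25 0.0
# GC 0.25 0.0
```

The single PWM spreads probability evenly over all four combinations, whereas the mixture assigns zero probability to the combinations never seen together, which is the kind of correlation between positions that a single PWM cannot express.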
Unlike related approaches in the literature, the models and algorithms we propose are simple to use and provide theoretical guarantees of optimality. A good mathematical model of TF binding can form the seed for significant advances in biology and biomedicine, as it can help predict the effects of sequence variations in regulatory regions and aid in the engineering of promoters and DNA-binding proteins with desired properties.
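As a rough illustration of how a binding model can predict the effect of a sequence variation, the sketch below scores a site with a log-odds PWM against an assumed uniform background and reports the score change caused by a single-nucleotide substitution. The PWM values, the background frequency, and the function names are hypothetical and are not taken from the project.

```python
import math

# Hypothetical 3-position PWM (per-position nucleotide probabilities), for illustration only.
PWM = [
    {"A": 0.7, "C": 0.1, "G": 0.1, "T": 0.1},
    {"A": 0.1, "C": 0.1, "G": 0.7, "T": 0.1},
    {"A": 0.1, "C": 0.7, "G": 0.1, "T": 0.1},
]
BACKGROUND = 0.25  # assumed uniform background nucleotide frequency

def log_odds_score(pwm, site):
    """Sum of per-position log2 odds of binding vs. background (independence assumed)."""
    return sum(math.log2(col[base] / BACKGROUND) for col, base in zip(pwm, site))

def variant_effect(pwm, site, pos, alt):
    """Change in score when the base at 0-based position `pos` is replaced by `alt`."""
    mutated = site[:pos] + alt + site[pos + 1:]
    return log_odds_score(pwm, mutated) - log_odds_score(pwm, site)

ref = "AGC"
print(log_odds_score(PWM, ref))          # score of the reference site
print(variant_effect(PWM, ref, 1, "T"))  # G->T at the middle position lowers the score
```

Under the independence assumption, the score change depends only on the substituted position, regardless of the flanking bases. Models that capture higher-order correlations, such as the mixtures described above, can make the predicted effect of a variant depend on its sequence context.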
