Predicting functionality of protein-DNA interactions by integrating diverse evidence

Background about TFs. ChIP-chip and ChIP-seq can define binding (with noise), but when binding is functional is less well characterized.

TF binding should be inferred from diverse and complementary data.

Binding doesn’t imply functionality. Binding and gene regulation are context dependent.

Functional vs Non-functional discrimination? Which factors determine functionality of binding?

Yeast data.
Define binding:
ChIP-chip (p-values correspond to strength of binding),
PSSM (200 bp downstream and 800 bp upstream of the start codon) associated with a significance score,
Nucleosome occupancy – significantly lower around binding sites.

Bayesian probability of binding based on the 3 data sets.
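
One way such an integration could work is a naive Bayes combination, sketched below. This is an illustration, not necessarily the model used in the talk: it assumes the three evidence sources are conditionally independent given the binding state, and that each source has already been turned into a likelihood ratio P(evidence | bound) / P(evidence | not bound). The function name, the prior and the ratios are all hypothetical.

```python
import math


def posterior_binding_probability(prior, likelihood_ratios):
    """Combine a prior with per-source likelihood ratios (naive Bayes).

    prior: assumed prior probability that the TF binds the promoter.
    likelihood_ratios: one ratio per evidence source, e.g. ChIP-chip,
        PSSM and nucleosome occupancy (hypothetical values).
    Assumes the sources are conditionally independent given binding.
    """
    if not 0.0 < prior < 1.0:
        raise ValueError("prior must be strictly between 0 and 1")
    if any(lr <= 0.0 for lr in likelihood_ratios):
        raise ValueError("likelihood ratios must be positive")

    # Work in log-odds so that evidence simply adds up.
    log_odds = math.log(prior / (1.0 - prior))
    for lr in likelihood_ratios:
        log_odds += math.log(lr)

    # Convert the posterior log-odds back to a probability.
    return 1.0 / (1.0 + math.exp(-log_odds))


# Illustrative numbers only: three sources, each mildly supporting binding.
p = posterior_binding_probability(prior=0.1, likelihood_ratios=[3.0, 2.0, 1.5])
```

With these made-up numbers the prior odds of 1:9 are multiplied by a combined ratio of 9, so the posterior is even odds.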

ROC curve shows integrated model performs best (known interactions in normal growth conditions with 5-fold cross validation)
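
The evaluation can be sketched roughly as follows, assuming a pooled 5-fold scheme: each known interaction is scored by a model trained on the other folds, and the area under the ROC curve is computed over all held-out scores. `fit_and_score` is a hypothetical callback standing in for whichever model is being compared (single evidence source or integrated); how the talk's folds were actually built is not stated in these notes.

```python
import random


def roc_auc(scores, labels):
    """Area under the ROC curve: the probability that a random positive
    outscores a random negative, counting ties as one half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def k_fold_indices(n, k=5, seed=0):
    """Shuffle indices 0..n-1 and deal them into k disjoint folds."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def cross_validated_auc(fit_and_score, examples, labels, k=5, seed=0):
    """Score every example with a model that never saw it, then pool.

    fit_and_score(train_x, train_y, test_x) is a hypothetical callback
    that trains on the training fold and returns one score per test item.
    """
    held_out = [0.0] * len(examples)
    for fold in k_fold_indices(len(examples), k, seed):
        in_fold = set(fold)
        train_x = [x for i, x in enumerate(examples) if i not in in_fold]
        train_y = [y for i, y in enumerate(labels) if i not in in_fold]
        test_x = [examples[i] for i in fold]
        for i, score in zip(fold, fit_and_score(train_x, train_y, test_x)):
            held_out[i] = score
    return roc_auc(held_out, labels)
```

Comparing models then amounts to calling `cross_validated_auc` once per model on the same examples and labels.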

Assuming binding is functional if it changes gene expression – as measured with arrays.

Integration with gene expression data – find instances where differential binding correlates with differential expression (between normal growth conditions and a stress condition).
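
A minimal sketch of how such calls might be made for one (TF, target gene) pair, assuming binding change is a difference in binding probability between stress and normal conditions and expression change is a log2 fold change from the arrays. The cutoffs and the three-way labelling are assumptions for illustration; the notes do not record the actual criteria.

```python
def classify_binding(delta_binding, log2_fold_change,
                     binding_cutoff=0.5, expression_cutoff=1.0):
    """Label one TF-target pair from its change between two conditions.

    delta_binding: binding probability under stress minus under normal
        growth (assumed definition).
    log2_fold_change: target gene expression, stress vs normal.
    Both cutoffs are hypothetical and would need tuning on real data.
    """
    if abs(delta_binding) < binding_cutoff:
        # Binding did not change, so this pair says nothing either way.
        return None
    if abs(log2_fold_change) >= expression_cutoff:
        # Binding and expression changed together.
        return "functional"
    # Binding changed but expression did not follow.
    return "non-functional"
```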

Looked at functional binding rates in different stress conditions. Functional binding rate is context dependent – differs between stress conditions. Can rank the impact of particular transcription factors in different stress conditions.
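
The rate and the ranking could be computed along these lines, assuming each binding event has already been labelled functional or non-functional. The record layout and the TF and condition names in the usage example are placeholders, not data from the talk.

```python
from collections import defaultdict


def functional_binding_rates(calls):
    """Fraction of binding events that are functional, per (TF, condition).

    calls: iterable of (tf, condition, label) tuples, where label is
        "functional" or "non-functional" (assumed record layout).
    """
    counts = defaultdict(lambda: [0, 0])  # [functional, total]
    for tf, condition, label in calls:
        counts[(tf, condition)][1] += 1
        if label == "functional":
            counts[(tf, condition)][0] += 1
    return {key: functional / total
            for key, (functional, total) in counts.items()}


def rank_tfs(rates, condition):
    """TFs seen in one condition, highest functional binding rate first."""
    tfs = [tf for (tf, cond) in rates if cond == condition]
    return sorted(tfs, key=lambda tf: rates[(tf, condition)], reverse=True)


# Placeholder records for illustration only.
calls = [
    ("TF_A", "heat", "functional"),
    ("TF_A", "heat", "non-functional"),
    ("TF_B", "heat", "functional"),
    ("TF_A", "osmotic", "non-functional"),
]
rates = functional_binding_rates(calls)
heat_ranking = rank_tfs(rates, "heat")
```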

Factors determining functionality (from FB and NFB enriched sets).
Distance from the start codon, orientation wrt direction of transcription. Presence or absence of co-factors on the same promoter.

Feature selection to determine important factors: multivariate random forest classification algorithm. Feature importance score calculated from the change in estimation error before and after permuting the values of a feature. Identify significant factors for each (condition, TF) pair, calculate a p-value.
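
The permutation step can be sketched without the forest itself. The code below works with any already-trained classifier passed in as a `predict` callback (hypothetical name), and measures error on the rows it is given rather than on out-of-bag samples as a random forest would, so it illustrates the importance score only, not the full method or its p-value calculation.

```python
import random


def error_rate(predict, rows, labels):
    """Fraction of rows whose predicted label differs from the true one."""
    wrong = sum(1 for row, y in zip(rows, labels) if predict(row) != y)
    return wrong / len(labels)


def permutation_importance(predict, rows, labels, n_repeats=20, seed=0):
    """Mean increase in error after shuffling each feature column in turn.

    predict: hypothetical callback mapping one feature row to a label;
        in practice this would be the trained random forest.
    rows: list of equal-length feature lists (e.g. distance to start
        codon, orientation, co-factor presence).
    """
    rng = random.Random(seed)
    baseline = error_rate(predict, rows, labels)
    importances = []
    for j in range(len(rows[0])):
        total_increase = 0.0
        for _ in range(n_repeats):
            # Shuffle only column j, breaking its link to the labels.
            column = [row[j] for row in rows]
            rng.shuffle(column)
            permuted = []
            for row, value in zip(rows, column):
                new_row = list(row)
                new_row[j] = value
                permuted.append(new_row)
            total_increase += error_rate(predict, permuted, labels) - baseline
        importances.append(total_increase / n_repeats)
    return importances
```

A feature the classifier never uses gets an importance of exactly zero, since shuffling it cannot change any prediction.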

Discriminatory factors are different in different conditions, but in most cases, co-factors were most important. Can use this data to determine significant TF-TF co-factor interactions.

Now have 2 sets, 1 enriched in functional binding sites and the other enriched in non-functional.


