Dataset 1: ChIPseq-derived transcription factor binding sites. Mouse. Mapped to nearest Ensembl Gene ID.
Dataset 2: Human Illumina Ref6 expression array data (GPL6097, I think) from various cell lines with varying amounts of said transcription factor.
What are the targets of the transcription factor doing in the expression datasets?
As a quick approximation: map the Illumina human IDs to their human Ensembl IDs, grab the Ensembl IDs of the homologous mouse genes, and filter to include only those annotated as nearest to a binding site. Then it’s easy to pull out the expression of the binding targets (listing the myriad reasons why the results could well be biologically meaningless is left as an exercise for the reader…).
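The filter-and-pull-out part of that recipe is just subsetting once the ID mapping exists. Here is a minimal sketch in base R, using toy stand-ins rather than the real data: every ID, value, and object name below (`homologs_toy`, `bound_mouse_genes`, `expr`) is made up for illustration, and the column layout of the mapping table is an assumption.

```r
# Toy stand-ins; every ID and value here is invented for illustration.
# Mapping table: human array probe -> human gene -> homologous mouse gene.
homologs_toy <- data.frame(
  probe_id   = c("probe_A", "probe_B", "probe_C"),
  human_gene = c("human_gene_1", "human_gene_2", "human_gene_3"),
  mouse_gene = c("mouse_gene_1", "mouse_gene_2", "mouse_gene_3"),
  stringsAsFactors = FALSE
)

# Mouse genes annotated as nearest to a ChIP-seq binding site.
bound_mouse_genes <- c("mouse_gene_1", "mouse_gene_3")

# Expression matrix: rows are probes, columns are cell lines.
expr <- matrix(
  c(7.1, 8.3,
    5.0, 5.2,
    9.4, 6.6),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("probe_A", "probe_B", "probe_C"), c("line_1", "line_2"))
)

# Keep only probes whose mouse homolog is nearest to a binding site...
targets <- homologs_toy[homologs_toy$mouse_gene %in% bound_mouse_genes, ]
# ...and that are actually present on the array.
targets <- targets[targets$probe_id %in% rownames(expr), ]

# Expression of the putative binding targets across cell lines.
target_expr <- expr[targets$probe_id, , drop = FALSE]
print(target_expr)
```

With real data a probe can map to several genes (and a gene to several homologs), so the mapping table would probably need de-duplicating before the final subset.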
Reason I ❤ biomaRt
library(biomaRt)
# Connect to the human and mouse Ensembl gene datasets
ensembl.human = useMart("ensembl", dataset="hsapiens_gene_ensembl")
ensembl.mouse = useMart("ensembl", dataset="mmusculus_gene_ensembl")
homologs <- getLDS(
attributes=c('illumina_v1', 'ensembl_gene_id', 'chromosome_name'),