ChIPpeakAnno

ChIPpeakAnno is a Bioconductor package, built on top of all the Biostrings, BSGenome, IRanges stuff, that will let you annotate peaks with nearby features. Some features (eg. mouse and human TSS positions for current genomes) are made available as data with the package, but it also speaks biomaRt to fetch other features if you need them. It’s pretty fast – it annotated 35K peaks with their nearest Ensembl gene in seconds.

Quick Example:

#load the mouse mm9 TSS positions
library(ChIPpeakAnno)
data(TSS.mouse.NCBIM37)

#or you could grab them from biomaRt if you want to:
library(biomaRt)
ensmart<-useMart("ensembl", dataset="mmusculus_genes_ensembl")
TSS.mouse.NCBIM37 <- getAnnotation(mart=ensmart, featureType="TSS")

#The annotation from ChIPpeakAnno has chromosome names like "1", "2", "X", so if you have
#"chr1", remove the "chr"
peaks$Chr = gsub('chr','',peaks$Chr)

#make your peak data into a RangedData object
peak.ranges<-RangedData(
   ranges= IRanges(
        start=peaks$GenomeStart ,
        end=peaks$GenomeEnd  ),
   names = paste("H3K9me3_peak", seq(1:nrow(peaks)), sep=""),
   space = peaks$Chr
)

#and annotate your peaks with the nearest TSS data:
annotatedPeak = annotatePeakInBatch(peak.ranges[1:5,], AnnotationData = TSS.mouse.NCBIM37)
as.data.frame(annotatedPeak)
#  space     start       end width names strand            feature
#1     1 108068801 108070959  2159 peak4      1 ENSMUSG00000044340
#2     1 176431793 176434091  2299 peak5      1 ENSMUSG00000028354
#3     1 180265735 180267970  2236 peak1     -1 ENSMUSG00000039630
#4     1 132611692 132615119  3428 peak2     -1 ENSMUSG00000026409
#5     1 122014653 122018516  3864 peak3     -1 ENSMUSG00000026385
#  start_position end_position insideFeature distancetoFeature
#1      108068581    108290821          TRUE               220
#2      176431956    176752198         FALSE              -163
#3      180258431    180267915          TRUE              2180
#4      132585759    132612391          TRUE               699
#5      122009867    122017594          TRUE              2941

The annotatePeakInBatch function throws a wobbly if your peaks don’t have rownames. Something like the following will fix it:

[diff]

*** annotatePeakInBatch.R.old 2010-01-14 12:36:00.000000000 +0000
— annotatePeakInBatch.R 2010-01-14 12:37:47.000000000 +0000
***************
*** 26,31 ****
— 26,34 —-
"strand")
allChr.Anno = unique(space(TSS.ordered))
numberOfChromosome = length(unique(space(myPeakList)))
+ if(is.null(rownames(myPeakList))){
+ rownames(myPeakList)<-paste("peak", seq(1,nrow(myPeakList)),
sep="")
+ }
z1 = cbind(as.character(rownames(myPeakList)),
as.character(space(myPeakList)),
start(myPeakList), end(myPeakList))
colnames(z1) = c("name", "chr", "peakStart", "peakEnd")
[/diff]

Advertisements

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s