unix sort bed file

Just a reminder to myself:

sort -k1,1 -k2,2g -o out.bed in.bed
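A quick way to see what that command does is to run it on a tiny made-up BED file. This is a minimal sketch: the helper name `sort_bed` and the toy coordinates are hypothetical. The `LC_ALL=C` prefix is an added assumption that pins byte-wise collation, so the chromosome order does not depend on the user's locale.

```shell
#!/bin/sh
set -e

# Hypothetical helper: chromosome column as text, then start column numerically.
# Sorts the files given as arguments, or stdin if there are none.
sort_bed() {
  LC_ALL=C sort -k1,1 -k2,2g "$@"
}

# Toy, header-less BED input (tab-separated chrom, start, end).
printf 'chr2\t500\t600\nchr1\t1000\t1100\nchr1\t20\t80\nchr10\t5\t50\n' | sort_bed
```

This prints the two chr1 lines first (start 20, then 1000), then chr10, then chr2. The chromosome column is compared as text, which is why chr10 lands before chr2.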

Or, if you have a header you need to skip:

head -n1 in.bed > out.bed
tail -n+2 in.bed | sort -k1,1 -k2,2g >> out.bed
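The two-step header version can also be wrapped so that the header and the sorted body share one stdout, which means a single `>` is enough. A sketch, assuming a one-line header; the function name `sort_bed_with_header` and the temporary toy file are hypothetical.

```shell
#!/bin/sh
set -e

# Hypothetical helper: keep line 1 (the header) as-is, then sort the
# remaining lines by chromosome (text) and start (numeric).
# Both commands in the function body write to the same stdout.
sort_bed_with_header() {
  head -n 1 "$1"
  tail -n +2 "$1" | LC_ALL=C sort -k1,1 -k2,2g
}

# Toy input with a one-line header (tab-separated).
tmp=$(mktemp)
trap 'rm -f "$tmp"' EXIT
printf '#chrom\tstart\tend\nchr2\t5\t9\nchr1\t30\t40\nchr1\t7\t8\n' > "$tmp"

sort_bed_with_header "$tmp"
```

With that in place, `sort_bed_with_header in.bed > out.bed` does the same job as the two lines above.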

And to sort, bgzip and tabix index. The header has to end up inside the compressed file too, since -S1 tells tabix to skip the first line:

(head -n1 in.bed; tail -n+2 in.bed | sort -k1,1 -k2,2g) | bgzip > out.bed.gz
tabix -p bed -S1 out.bed.gz
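tabix expects the data lines to be grouped by chromosome and sorted by start, so it can be worth checking the order before compressing. A sketch using `sort -C`, which only checks the order and prints nothing. `bed_is_sorted` is a hypothetical helper name, and a one-line header is assumed as above. The check is stricter than tabix needs (tabix only cares that each chromosome's lines are contiguous and sorted by start), but it matches the sort command used here.

```shell
#!/bin/sh
set -e

# Hypothetical helper: exit 0 if the data lines (everything after a
# one-line header) are already in chromosome/start order, non-zero otherwise.
# -C checks quietly; -s drops sort's whole-line tie-breaker, so lines with
# equal chrom and start are not reported as out of order.
bed_is_sorted() {
  tail -n +2 "$1" | LC_ALL=C sort -C -s -k1,1 -k2,2g
}

tmp=$(mktemp)
trap 'rm -f "$tmp"' EXIT
printf '#chrom\tstart\tend\nchr1\t7\t8\nchr1\t30\t40\nchr2\t5\t9\n' > "$tmp"

if bed_is_sorted "$tmp"; then
  echo "sorted: OK to bgzip and tabix"
else
  echo "not sorted: run the sort first"
fi
```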

5 responses to “unix sort bed file”

  1. OK, I’ve been looking at this for longer than I really should have, and I still can’t work out what the sort command is for.

  2. IIRC, it just sorts the chromosome column in dictionary order, then sorts the start-position column numerically.

    It does mean that your chromosomes end up in dictionary order, though, so chr1, chr11, chr12, etc. all come before chr2. I just had a look at the sort man page, and it has a “version sort” option (V) that orders version numbers embedded in text. So this seems to order it properly:

    sort -k1,1V -k2,2g test.bed

  3. sortBed also works well if you have bedtools installed.

  4. Please take a look at BEDOPS sort-bed: http://code.google.com/p/bedops/wiki/sortBed

    This tool suite also has a ‘bbms’ variant that allows sorting of BED files larger than system memory.

  5. We have written a suite of BED tools called BEDOPS, which includes lexicographical sorting in ‘sort-bed’: http://code.google.com/p/bedops/wiki/sortBed

    We also offer a ‘bbms’ variant of ‘sort-bed’ that allows sorting of BED files that will not fit into system memory.
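To compare the two orders from comment 2 side by side, here is a sketch on five made-up chromosome names. It assumes a `sort` that supports the `V` key modifier (GNU coreutils sort does); the `toy` and `chrom_order` names are hypothetical.

```shell
#!/bin/sh
set -e

# Hypothetical toy input: chromosome names in arbitrary order, start fixed at 0.
toy() {
  printf 'chr2\t0\nchr11\t0\nchr1\t0\nchrX\t0\nchr10\t0\n'
}

# Hypothetical helper: print column 1 after sorting with the given key spec
# for column 1 (e.g. 1,1 for dictionary order, 1,1V for version order).
chrom_order() {
  toy | LC_ALL=C sort -k"$1" -k2,2g | cut -f1 | tr '\n' ' '
}

echo "dictionary: $(chrom_order 1,1)"
echo "version:    $(chrom_order 1,1V)"
```

With plain `-k1,1` this gives chr1 chr10 chr11 chr2 chrX; with `-k1,1V` it gives chr1 chr2 chr10 chr11 chrX. Either order is fine for tabix, as long as each chromosome's lines stay together and sorted by start.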
