At ISMB, the Galaxy guys were talking about using Galaxy as an interface for analysing NGS data. I’m having a go at getting it up and running on EC2. Notes are really for my own reference, but I thought I’d post them in case they were of use to anyone else.
Obviously, you need an AWS EC2 account to get this working. Once you’ve got your account set up, install the AWS command line tools from here (ignore the negative reviews; they work fine on Linux). There are a few environment variables to set too. EC2_HOME is where the tools live, and its bin directory needs adding to your path. You’ll also need to tell the tools where your AWS private key and x509 certificate files are (these should have been generated during AWS registration but, if not, see the AWS x509 docs). Something like the following in your ~/.profile should do the trick:
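A minimal sketch of those settings, assuming the tools were unpacked under ~/ec2-api-tools and the key files live in ~/.ec2. Both locations and the YOURKEYID file names are hypothetical placeholders; EC2_PRIVATE_KEY and EC2_CERT are the variables the EC2 command line tools read.

```shell
# Sketch of ~/.profile additions for the EC2 command line tools.
# All paths and file names below are hypothetical; substitute your own.
export EC2_HOME="$HOME/ec2-api-tools"
export PATH="$PATH:$EC2_HOME/bin"
export EC2_PRIVATE_KEY="$HOME/.ec2/pk-YOURKEYID.pem"
export EC2_CERT="$HOME/.ec2/cert-YOURKEYID.pem"
# The tools are Java-based, so JAVA_HOME needs to point at your Java install too.
```

After editing, run `. ~/.profile` (or log in again) so the current shell picks up the changes.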
You’ll also need to create and register a key-pair to access your EC2 instances. The easiest way to do this is via the AWS management console – it’s fairly self-explanatory. Save the .pem file somewhere on your local machine (I’m using
Create a security group for the Galaxy server, for which we can open the appropriate ports, then run an instance. I’m using the official Ubuntu Intrepid x86 server AMI as a base: ami-5059be39. I’m also using us-east-1b cos that’s where the EBS volume with all my ChIPseq data lives.
ec2-add-group galaxy -d 'Group for Galaxy Server'
ec2-run-instances ami-5059be39 --region us-east-1 --availability-zone us-east-1b --key cassj --group galaxy --instance-type m1.small --instance-count 1
Connect to instance
Open up the ssh port. I’m just opening it to everyone; alternatively, you can restrict access to particular IP addresses by giving the source in CIDR format.
ec2-authorize galaxy -Ptcp -p22 -s 0.0.0.0/0
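To restrict ssh to a single machine rather than everyone, the source can be given as a /32 CIDR block. A small sketch, assuming a hypothetical address of 203.0.113.4 (substitute your own public IP). It only prints the command, so you can check it before running it:

```shell
# Build the ec2-authorize command for a single source address.
MY_IP="203.0.113.4"        # hypothetical address, replace with your own
SSH_SOURCE="$MY_IP/32"     # /32 means exactly this one address
CMD="ec2-authorize galaxy -Ptcp -p22 -s $SSH_SOURCE"
echo "$CMD"
```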
Use ec2din (short for ec2-describe-instances) to check your instance is running and get its address, then ssh in using your keypair, something like:
ssh -i cassj.pem ubuntu@ec2-174-129-166-230.compute-1.amazonaws.com
Install Galaxy on your running instance. The following will grab the latest version from the repository and stick it in
sudo apt-get install mercurial
sudo hg clone http://www.bx.psu.edu/hg/galaxy galaxy
sudo chown -R ubuntu:ubuntu galaxy
cd galaxy
sudo sh setup.sh
Then modify the file universe_wsgi.ini so that the host is set to the appropriate place, e.g.
host = ec2-174-129-166-230.compute-1.amazonaws.com
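If you’d rather script that edit than do it by hand, something like the following sed substitution should work. This is a sketch run against a throwaway copy of the file: the starting value of 127.0.0.1, the GALAXY_HOST variable and the scratch file are assumptions for illustration only.

```shell
# Hostname to serve Galaxy on (the example address used in this post).
GALAXY_HOST="ec2-174-129-166-230.compute-1.amazonaws.com"

# Scratch file standing in for universe_wsgi.ini; contents are assumed.
INI="$(mktemp)"
printf '[server:main]\nport = 8080\nhost = 127.0.0.1\n' > "$INI"

# Rewrite the host line, going via a temp file so it works with any sed.
sed "s/^host = .*/host = $GALAXY_HOST/" "$INI" > "$INI.new" && mv "$INI.new" "$INI"

# Show the result.
grep '^host' "$INI"
```

To use it for real, point INI at your actual universe_wsgi.ini and drop the printf line.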
Well, that was easy. You seem to need to run run.sh as root initially, but after that it seems to be ok if you run it as user ubuntu.
Install Apache for static files
By default Galaxy runs on port 8080. We’ll set up Apache on port 80 and have it serve requests for static files itself, to take the load off the Galaxy process, and hand anything else over to Galaxy to deal with. So, install Apache2 and enable the rewrite and proxy modules:
sudo apt-get install apache2
sudo a2enmod rewrite
sudo a2enmod proxy
sudo a2enmod proxy_http
In /etc/apache2/sites-available/default, this will proxy the requests handed to Apache through to Galaxy:
RewriteRule ^/(.*) http://ec2-174-129-166-230.compute-1.amazonaws.com:8080/$1 [P]
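And this will handle the limited number of static files that we want Apache to deal with. These rules need to sit before the catch-all proxy rule in the file, since mod_rewrite works through the rules in order and the catch-all matches everything: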
RewriteRule ^/static/style/(.*) /galaxy/test/static/june_2007_style/blue/$1 [L]
RewriteRule ^/static/(.*) /galaxy/test/static/$1 [L]
RewriteRule ^/images/(.*) /galaxy/test/static/images/$1 [L]
RewriteRule ^/favicon.ico /galaxy/test/static/favicon.ico [L]
RewriteRule ^/robots.txt /galaxy/test/static/robots.txt [L]
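Assembled in the order Apache needs to see them, the rewrite section might look like the fragment below. It is written to a scratch file via a heredoc so you can inspect it before pasting it into the site config. The RewriteEngine on line and the scratch file are my assumptions; the rules themselves are the ones above.

```shell
# Write the assembled rewrite rules to a scratch file for inspection.
# The quoted 'EOF' stops the shell from expanding $1 in the rules.
FRAGMENT="$(mktemp)"
cat > "$FRAGMENT" <<'EOF'
RewriteEngine on
RewriteRule ^/static/style/(.*) /galaxy/test/static/june_2007_style/blue/$1 [L]
RewriteRule ^/static/(.*) /galaxy/test/static/$1 [L]
RewriteRule ^/images/(.*) /galaxy/test/static/images/$1 [L]
RewriteRule ^/favicon.ico /galaxy/test/static/favicon.ico [L]
RewriteRule ^/robots.txt /galaxy/test/static/robots.txt [L]
RewriteRule ^/(.*) http://ec2-174-129-166-230.compute-1.amazonaws.com:8080/$1 [P]
EOF
cat "$FRAGMENT"
```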
More info on installing Galaxy can be found on the wiki
Restart Apache with sudo /etc/init.d/apache2 restart.
Authorize Apache and Galaxy Ports
ec2-authorize galaxy -Ptcp -p8080 -s 0.0.0.0/0
ec2-authorize galaxy -Ptcp -p80 -s 0.0.0.0/0
Now if you go to http://<Your AWS URL> you should see your Galaxy installation.
It’s not going to be totally functional because we haven’t installed all of the underlying bioinformatics binaries but my plan is to have separate instances doing the actual analysis anyway. That’s tomorrow’s problem though…