Methods for performing the pipeline using Galaxy on the Amazon Cloud.

Here you will find information on setting up your own Galaxy server with RNAmapper in the cloud using Amazon Web Services and the Amazon Elastic Compute Cloud. We have created an Amazon Machine Image that contains the same components as the downloadable version, but lets you use Amazon's computing power to analyze your data.

The prospect of this may be intimidating, especially if you have never done anything like this before. However, rest assured that thanks to the polished interface and some pointers, it will be easy. Amazon charges for cloud use and data storage. Although the charges are reasonable, you may want to limit your use to times of actual data analysis, and stop or even terminate your web server once the analysis is complete.

As with the download version, there is a learning curve to using Galaxy, but there are several good webcasts that will help (Galaxy help, Galaxy screencasts), and there is specific help for implementing an Amazon version (Galaxy Amazon help).

RNAmapper online demo

RNAmapper documentation: source code license


MAKING AN RNAMAPPER GALAXY CLOUD IMAGE

Create an Account / Sign in

Go to http://aws.amazon.com/ec2/ and sign up, or sign in. You will need credit card information.

001.png

Go to the AWS Management Console
Once you have signed in, go to the AWS Management Console. Choose “Amazon EC2.”

003.png

005.png


Setting up an instance

008.png
 

Choose the “Classic Wizard.”

009.png


Choosing the RNAmapper Amazon Machine Image (AMI)
Under “Community AMIs”, search for “RNAmapper.” If there are multiple choices, select the image with the most recent date or version number attached to it (latest version).

011.png

Processing large datasets requires an “extra large” instance. Switch the “Instance Type” to “extra large”, then click “continue” in this and the next screen.

013.png

To ensure compatibility with our settings, choose Kernel ID “aki-427d952b”; then continue.

014.png

Processing large datasets requires a fair amount of storage space. We need to add extra “hard disks” to the instance. In order to do that, we need to “edit” the storage device configuration.

015.png

Configuring the "Root Volume"
The root volume currently has a size of 100 GB. This will allow you to process 3-4 wild-type/mutant pairs. If you need to store more data, simply increase the size of this volume.
016.png


Configuring the Instance Store Volumes
Next, we need to add an EBS volume with a comfortable size, say 500 GB. Expect a processed HiSeq2000 whole genome sequencing dataset to take up ~250 GB before intermediate files are cleaned up. For “Snapshot”, choose “RNAmapper_database” so that the instance properly attaches the database files to the Galaxy root directory.

017.png
 
018.png

Creating the key pair
Amazon will create a key file to authenticate you when you transfer files to your instance. This will be important once you start to move your own sequencing data to the instance. For now, choose a key name and value (webserver name), then download the key file.

019.png
 
020.png


Configuring the Firewall
The internet is not always a safe place. Therefore, the firewall blocks all incoming connections by default. You must open three ports in the firewall so that Galaxy can communicate with the outside world and you can transfer your data onto the server. Create a security group with TCP ports 80 and 8080 and SSH (port 22) open, then continue.

021.png
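
For readers who prefer the command line, the same three firewall rules can be sketched with the AWS command line tools. This is only an illustration, not part of the original walkthrough: the group name "rnamapper-galaxy" and the open-to-everyone address range 0.0.0.0/0 are assumptions, and the function only prints the commands so you can review them before running anything.

```shell
# Dry-run sketch: prints AWS CLI commands instead of running them.
# GROUP_NAME and the 0.0.0.0/0 address range are illustrative assumptions.
GROUP_NAME="rnamapper-galaxy"

print_firewall_commands() {
  echo "aws ec2 create-security-group --group-name $GROUP_NAME --description 'RNAmapper Galaxy server'"
  # 22 = SSH (login and file transfer), 80 and 8080 = web access to Galaxy
  for port in 22 80 8080; do
    echo "aws ec2 authorize-security-group-ingress --group-name $GROUP_NAME --protocol tcp --port $port --cidr 0.0.0.0/0"
  done
}

print_firewall_commands
```

An address range of 0.0.0.0/0 accepts connections from anywhere. Narrowing the SSH rule to your own IP address is a common precaution.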


Launching (Booting) your instance
Now it is time to launch your instance. The machine will take a minute or so to boot once you hit “Launch.”

022.png

While the instance is launching, you can go to your “Instances” page to view it and get some important information.

023.png


Finding your instance's IP address
On the instances page, you should see your instance listed. Select it, scroll down in the info window, find the public DNS of your instance, and copy it. The machine should take only about a minute to boot. Proceed once both status checks pass. If only one of the two passes, you probably chose the wrong kernel in the “Advanced Instance Options” dialogue. In that case, terminate the instance (via “Instance Actions”) and start over.

024.png


Registering your instance's disk volumes
Although we have requested a number of virtual disk drives from Amazon, we should make sure that our Galaxy server can properly access them. Connect to your instance by right-clicking on it in the instance menu, or by checking it and using “Instance Actions” to bring up a menu: from it, select “Connect to an instance.” Replace the user name with “ubuntu”, supply the path to your key file (downloaded in an earlier step) and launch the client.

026.png
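
If you would rather connect from a terminal than from the browser-based client, the step above can be sketched as follows. The two helper functions are hypothetical names introduced for illustration. The user name "ubuntu" comes from the text above, while the key file path and DNS name are whatever you downloaded and copied earlier.

```shell
# Restrict the key file so that only you can read it; ssh refuses
# private keys whose permissions are too open.
prepare_key() {
  chmod 400 "$1"
}

# Print the ssh command for a given key file and public DNS name.
# "ubuntu" is the user name given in the text above.
ssh_command() {
  printf 'ssh -i %s ubuntu@%s\n' "$1" "$2"
}
```

Call prepare_key with the path to your key file once, then run the line printed by ssh_command with your key file and your instance's public DNS.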

The client may ask you if you wish to continue (yes) and will then log you into your server. Type “sudo nano /etc/fstab” and hit enter.

028.png

Your window should now look like this:

029.png

If not, copy the line that begins with “/dev/xvdf…” by marking it with your mouse, then move the cursor to a new line and press the middle mouse button (you can also right-click and copy/paste). Repeat this until you have four “/dev/xvdf” lines, then edit them to match the image below: change the device names to “xvdf”, “xvdg”, “xvdh”, and “xvdi” (each is a different drive), and change the paths (“mount points”) to “/tmp”, “/home/ubuntu/galaxy-dist/database/job_working_directory”, “/media/data”, and “/media/data2”, respectively. Then save with <control>-O and exit with <control>-X.

031.png
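
A typo in /etc/fstab is easy to make, so it can help to check the file before going on. The sketch below uses a hypothetical helper, check_fstab, that only verifies the device and mount point columns named in the text above. It makes no assumption about the remaining columns (file system type and options), which should stay exactly as shown in the screenshot.

```shell
# Check that a file in fstab format maps each device to the mount point
# listed in the text above. Defaults to /etc/fstab.
check_fstab() {
  fstab="${1:-/etc/fstab}"
  status=0
  while read -r device mount_point; do
    # Column 1 of fstab is the device, column 2 the mount point.
    if ! awk -v d="$device" -v m="$mount_point" \
        '$1 == d && $2 == m { found = 1 } END { exit !found }' "$fstab"; then
      echo "missing: $device -> $mount_point"
      status=1
    fi
  done <<EOF
/dev/xvdf /tmp
/dev/xvdg /home/ubuntu/galaxy-dist/database/job_working_directory
/dev/xvdh /media/data
/dev/xvdi /media/data2
EOF
  return $status
}
```

Running check_fstab without arguments inspects /etc/fstab, prints one "missing" line per entry it cannot find, and prints nothing if all four entries are present.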

There are two more things to do. First you need to remove the old job directory by typing:

sudo rm -r ~/galaxy-dist/database/job_working_directory (hit enter),

and then “sudo reboot” (hit enter). Your terminal will disconnect, and your Amazon instance will reboot (~1 min).

Now open your web browser and paste your instance's public DNS into the address bar, followed by “:8080” (Galaxy’s “listen” port). If your instance is fully booted, you should now see the RNAmapper welcome page.

032.png
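
If the welcome page does not appear right away, the server may still be starting. As a rough sketch, the address can be built and polled from a terminal like this. Both function names are hypothetical, and the second one assumes that curl is installed on the machine you run it from.

```shell
# Build the Galaxy address from the instance's public DNS name.
galaxy_url() {
  printf 'http://%s:8080\n' "$1"
}

# Return success once the server answers on that address.
# Assumes curl is installed on the machine you run this from.
galaxy_is_up() {
  curl --silent --fail --max-time 10 --output /dev/null "$(galaxy_url "$1")"
}
```

Calling galaxy_is_up with your public DNS name returns success as soon as Galaxy responds. If it keeps failing after a few minutes, check that port 8080 is open in your security group.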


Registering your Galaxy account on your instance
Before you can use Galaxy, you need to register. To start, use the email address galaxy@hms.harvard.edu with whatever password and public name you like. This will give you administrator privileges. You can later register other accounts and elevate them to administrator privilege if you so choose.

034.png
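
The reason this particular address receives administrator rights is most likely that it is listed as an administrator in the server's configuration. Galaxy releases of that period kept such a list in the "admin_users" setting of the file universe_wsgi.ini. Whether this image uses that file, and where it lives, is an assumption, so treat the following as a sketch for inspecting the setting rather than a description of the image. The helper name list_admin_users is made up for illustration.

```shell
# Print the administrator e-mail addresses configured in a Galaxy ini file,
# one per line. Commented-out settings (lines starting with "#") are ignored.
list_admin_users() {
  sed -n 's/^[[:space:]]*admin_users[[:space:]]*=[[:space:]]*//p' "$1" | tr ',' '\n'
}
```

On the instance, this might be called as list_admin_users ~/galaxy-dist/universe_wsgi.ini (assumed path).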

Congratulations! You are now ready to use your RNAmapper Galaxy instance on Amazon!

Stopping / Deleting your Instance when Done

Don’t forget to “STOP” (shut down) or even “Terminate” (delete) your instance once you have finished processing your data, have downloaded the results, and your project is complete. You will also need to delete your “Snapshots” and “Volumes” at that point if you do not want Amazon to keep charging you for storing your data (~$0.13 per GB per month).

054.png
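
To get a feeling for what leftover storage costs, here is a small back-of-the-envelope calculation. It uses the approximate rate quoted above and the volume sizes from this walkthrough (100 GB root volume plus 500 GB data volume). The function name is hypothetical, and Amazon's actual prices may differ from this estimate.

```shell
# Estimate the monthly storage charge for a given number of gigabytes.
# The default rate of 0.13 USD per GB per month is the approximate
# figure quoted in the text; pass a second argument to override it.
monthly_storage_cost() {
  awk -v gb="$1" -v rate="${2:-0.13}" 'BEGIN { printf "%.2f\n", gb * rate }'
}

# 100 GB root volume + 500 GB data volume
monthly_storage_cost 600
```

At that rate, 600 GB left in place comes to roughly $78 per month, which is why deleting unused volumes and snapshots is worth the extra minute.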

Now that you have a running RNAmapper, it's time to start mapping your mutants. -- Galaxy 101

If you have any questions, comments or suggestions, please contact RNAmapper at gmail dot com

GPL by Nikolaus Obholzer, 2012