Methods for performing the pipeline using Galaxy on the Amazon Cloud.
Here you will find information on setting up your own Galaxy server with RNAmapper in the cloud, using Amazon Web Services and the Amazon Elastic Compute Cloud (EC2). We have created an Amazon Machine Image (AMI) that contains the same components as the downloadable version but runs on Amazon's computing power, so you can analyze your data there. The prospect of this may be intimidating, especially if you have never done anything like this before. However, rest assured that with the polished interface and a few pointers, it will be easy. Amazon charges for cloud use and data storage. Although the charges are reasonable, you may want to limit your use to times of actual data analysis, and stop or even terminate your server once the analysis is complete. As with the downloadable version, there is a learning curve to using Galaxy, but there are
several good webcasts that will help (Galaxy help, Galaxy screencasts), and there is specific help for running Galaxy on Amazon (Galaxy Amazon help).

RNAmapper online demo

MAKING AN RNAMAPPER GALAXY CLOUD IMAGE

Create an Account / Sign in
Go to http://aws.amazon.com/ec2/ and sign up, or sign in. You will need credit card information.

Go to the AWS Management Console
Once you have signed in, go to the AWS Management Console. Choose “Amazon EC2,” then choose the “Classic Wizard.”

Choosing the RNAmapper Amazon Machine Image (AMI)
Processing large datasets requires an “extra large” instance. Switch the “Instance Type” to “extra large”, then click “Continue” on this and the next screen. To ensure compatibility with our settings, choose Kernel ID “aki-427d952b”, then continue. Processing large datasets also requires a fair amount of storage space, so we need to add extra “hard disks” to the instance. To do that, “edit” the storage device configuration.

Configuring the “Root Volume”
The root volume currently has a size of 100 GB, which is enough to process three to four wild-type/mutant pairs. If you need to store more data, simply increase the size of this volume.

Configuring the Instance Store Volumes
Next, add an EBS volume of comfortable size, for example 500 GB. Expect a processed HiSeq2000 whole-genome sequencing dataset to take up about 250 GB before intermediate files are cleaned up. For “Snapshot”, choose “RNAmapper_database” so that the instance properly attaches the database files to the Galaxy root directory.

Creating the key pair

Configuring the Firewall

Launching (Booting) your instance
Now it is time to launch your instance. Once you click “Launch,” the machine will take a minute or so to boot. While it is launching, you can go to your “Instances” page to view it and get some important information.

Finding your instance's IP address
On the instances page, you should see your instance listed.
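You may prefer to look up this information from a script instead of the console. The EC2 API returns the same data. The following Python sketch is our own illustration and is not part of RNAmapper. It assumes a response shaped like the output of `describe_instances()` from boto3, the AWS SDK for Python. The function names `running_public_dns` and `galaxy_url` are hypothetical, and the instance IDs and DNS name in the example are made up. The sketch lists the public DNS name of each running instance and builds the Galaxy address on port 8080 from it.

```python
"""Sketch: list running EC2 instances with their public DNS names and the
matching Galaxy address, from a DescribeInstances-style response."""


def running_public_dns(response):
    """Return (instance_id, public_dns) pairs for running instances.

    `response` is assumed to have the shape returned by boto3's
    describe_instances(): Reservations -> Instances -> instance details.
    """
    found = []
    for reservation in response.get("Reservations", []):
        for instance in reservation.get("Instances", []):
            state = instance.get("State", {}).get("Name")
            dns = instance.get("PublicDnsName", "")
            # Pending or stopped instances have no usable public DNS name yet.
            if state == "running" and dns:
                found.append((instance["InstanceId"], dns))
    return found


def galaxy_url(public_dns, port=8080):
    """Build the Galaxy address; the RNAmapper image serves Galaxy on 8080."""
    return "http://%s:%d/" % (public_dns, port)


if __name__ == "__main__":
    # Made-up example response. With boto3 installed and AWS credentials
    # and a region configured, you could use instead:
    #   import boto3
    #   response = boto3.client("ec2").describe_instances()
    response = {
        "Reservations": [
            {
                "Instances": [
                    {
                        "InstanceId": "i-example1",
                        "State": {"Name": "running"},
                        "PublicDnsName": "ec2-203-0-113-10.compute-1.amazonaws.com",
                    },
                    {
                        "InstanceId": "i-example2",
                        "State": {"Name": "stopped"},
                        "PublicDnsName": "",
                    },
                ]
            }
        ]
    }
    for instance_id, dns in running_public_dns(response):
        print(instance_id, galaxy_url(dns))
```

The function skips stopped and pending instances on purpose. Those instances have no reachable public DNS name, so a Galaxy address built for them would not work.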
Select your instance, scroll down in the info window, and copy its public DNS. The machine should take only a minute to boot; proceed once both status checks pass. If only one of the two passes, you probably used the wrong kernel in the “Advanced Instance Options” dialog. In that case, terminate this instance (via “Instance Actions”) and start over.

Registering your instance's disk volumes
Although we have requested several virtual disk drives from Amazon, we still need to make sure that our Galaxy server can access them. Connect to your instance by right-clicking it on the instances page, or by selecting it and opening the “Instance Actions” menu; from that menu, choose “Connect to an instance.” Replace the user name with “ubuntu”, supply the path to your key file (downloaded when you created the key pair) and launch the client. The client may ask whether you want to continue (answer “yes”); it will then log you into your server.

Type “sudo nano /etc/fstab” and press Enter. The file should contain four drive entries, beginning with “/dev/xvdf”, “/dev/xvdg”, “/dev/xvdh” and “/dev/xvdi”. If it does not, copy the line that begins with “/dev/xvdf…” by selecting it with your mouse, then move the cursor to a new line and press the middle mouse button (you can also right-click and copy/paste). Repeat until you have four “/dev/xvdf” lines, then edit them. Change the device names so that they read “xvdf”, “xvdg”, “xvdh” and “xvdi” (each is a different drive). Change the paths (“mount points”) to “/tmp”, “/home/ubuntu/galaxy-dist/database/job_working_directory”, “/media/data” and “/media/data2”, respectively. Save with <control>-O (press Enter to confirm the file name) and exit with <control>-X.

Two more steps remain. First, remove the old job directory by typing “sudo rm -r ~/galaxy-dist/database/job_working_directory” and pressing Enter. Then type “sudo reboot” and press Enter. Your terminal will disconnect, and your Amazon instance will reboot (this takes about a minute).
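Editing /etc/fstab by hand in nano is the intended route. If you would rather prepare the four entries in advance, the following Python 3 sketch performs the same copy-and-edit procedure. It is our own illustration and is not shipped with the AMI; the names `MOUNTS` and `rnamapper_fstab` are hypothetical. It assumes that the filesystem type and options on the existing “/dev/xvdf” line are correct for all four drives, because the manual procedure also copies that line unchanged except for the device and mount point. The script only prints the result and does not write to /etc/fstab.

```python
"""Sketch: build the four RNAmapper drive entries for /etc/fstab by copying
the existing /dev/xvdf line and changing only device and mount point."""

# Device -> mount point, in the order given in the instructions.
MOUNTS = [
    ("/dev/xvdf", "/tmp"),
    ("/dev/xvdg", "/home/ubuntu/galaxy-dist/database/job_working_directory"),
    ("/dev/xvdh", "/media/data"),
    ("/dev/xvdi", "/media/data2"),
]


def first_field(line):
    """Return the device field of an fstab line ('' for blank lines)."""
    parts = line.split()
    return parts[0] if parts else ""


def rnamapper_fstab(fstab_text):
    """Return fstab_text with the /dev/xvdf line expanded into four entries.

    Filesystem type, options and the dump/pass fields are copied unchanged
    from the first /dev/xvdf line, as copy-and-paste in nano would do.
    """
    lines = fstab_text.splitlines()
    template = [i for i, line in enumerate(lines) if first_field(line) == "/dev/xvdf"]
    if not template:
        raise ValueError("no line starting with /dev/xvdf found")
    fields = lines[template[0]].split()
    if len(fields) < 4:
        raise ValueError("the /dev/xvdf line is missing its type or options field")
    rest = fields[2:]  # type, options and (optionally) dump and pass
    new_entries = ["\t".join([device, mount] + rest) for device, mount in MOUNTS]

    devices = set(device for device, _ in MOUNTS)
    result = []
    for i, line in enumerate(lines):
        if i == template[0]:
            result.extend(new_entries)
        elif first_field(line) in devices:
            continue  # an older copy of one of the four entries; replaced above
        else:
            result.append(line)
    return "\n".join(result) + "\n"


if __name__ == "__main__":
    # Print only; review the output before putting it into /etc/fstab.
    with open("/etc/fstab") as handle:
        print(rnamapper_fstab(handle.read()), end="")
```

Running the function a second time on its own output changes nothing. Entries for the four drives that already exist are replaced instead of duplicated, so the sketch is safe to re-run after a partial manual edit.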
Now open your web browser and paste your instance's public DNS into the address bar, followed by “:8080” (the port Galaxy listens on). If your instance has finished booting, you should see the RNAmapper welcome page.

Registering your Galaxy account on your instance
Before you can use Galaxy, you need to register. Register first with the email address galaxy@hms.harvard.edu and any password and public name you like; this account has administrator privileges. You can later register other accounts and grant them administrator privileges if you wish.

Congratulations! You are now ready to use your RNAmapper Galaxy instance on Amazon!

Stopping / Deleting your Instance when Done
Don't forget to “STOP” (shut down) your instance when you are done processing your data. Once you have downloaded your results and your project is complete, you can also “Terminate” (delete) it. After the project is done, delete your “Snapshots” and “Volumes” as well, or Amazon will keep charging you for storing your data (about $0.13 per GB per month). Volumes are billed even while the instance is stopped.

If you have any questions, comments or suggestions, please contact RNAmapper at gmail dot com.

GPL by Nikolaus Obholzer, 2012