This is the home of the bioinformatics pipeline RNAmapper -- Miller, Obholzer, Shah, Megason, and Moens (Genome Research, 2013).
    pubmed -- http://www.ncbi.nlm.nih.gov/pubmed/23299976
    paper -- pdf
RNAmapper uses RNA-seq data to identify both the region of the genome linked to a mutation and candidate mutations that may cause the phenotype of interest. We have shown that the method can identify mutations that cause nonsense or missense changes to codons, alter transcript splicing, or alter gene expression levels. Here you will find information on how to map your mutations using RNA-seq data.

First, you will need to prepare your samples and perform the RNA-seq experiment -- in vivo methods.
Second, you will need to use the RNA-seq data to map your mutations. We provide two RNAmapper implementations:
    RNAmapper from the command line -- command line 101.
    RNAmapper using Galaxy -- galaxy download, galaxy online, galaxy 101.

To resolve common issues, find more downloadable data (SNPs, code), or contact us, see the FAQ+ page.

In parallel, our colleagues at Utah also developed an RNA-seq-based mapping approach. We recommend using both methods to map your mutants. You can find more information on their software at their website -- http://yost.genetics.utah.edu/software.php

Support was provided by NINDS R21NS076950 to Cecilia Moens, NRSA fellowship F32NS074839 to Adam Miller, and NIDCD R21DC012097 to Sean Megason.

Now go clone your genes!


Fish carrying heterozygous mutations are crossed (A), and pools of wildtype sibling (+/+ and +/-) and mutant (-/-) embryos are collected (B). RNA is then extracted from each pool. The first step of the mapping pipeline is to find quality SNPs in the wildtype pool (C - left side) to use as mapping markers. Good quality markers have high coverage and a high percentage of heterozygosity -- both of these measures ensure that the marker will provide useful mapping information. The allele frequency of the SNP markers is then analyzed in the mutant RNA-seq data (C - right side). Any positions that are unlinked to the mutation (whether elsewhere on the same chromosome or on other chromosomes) will be heterozygous (SNP1), while linked regions will approach homozygosity (SNP2). The mutant allele frequency is then plotted against the genome (D and below), chromosome by chromosome, with regions of linkage appearing as peaks approaching homozygosity.
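The marker selection and allele frequency steps described above can be sketched in a few lines of Python. This is only an illustration of the idea, not the RNAmapper code itself: the `AlleleCounts` record, the function names, and the thresholds (`min_depth`, `min_minor_fraction`) are all assumptions made for the example, and the mutant allele frequency is taken here to be the fraction of reads carrying the more common allele.

```python
from dataclasses import dataclass


@dataclass
class AlleleCounts:
    """Read counts for the two alleles at one SNP position (hypothetical record)."""
    chrom: str
    pos: int
    ref: int
    alt: int

    @property
    def depth(self) -> int:
        return self.ref + self.alt


def is_good_marker(wt, min_depth=25, min_minor_fraction=0.25):
    """Keep wildtype SNPs with high coverage and clear heterozygosity.

    Both thresholds are illustrative assumptions, not RNAmapper defaults.
    """
    if wt.depth < min_depth:
        return False
    minor_fraction = min(wt.ref, wt.alt) / wt.depth
    return minor_fraction >= min_minor_fraction


def mutant_allele_frequency(mut):
    """Fraction of mutant-pool reads carrying the more common allele (0.5 to 1.0)."""
    if mut.depth == 0:
        return None
    return max(mut.ref, mut.alt) / mut.depth


def score_markers(wt_counts, mut_counts):
    """Return (chrom, pos, frequency) for every good wildtype marker seen in mutants."""
    mut_by_site = {(m.chrom, m.pos): m for m in mut_counts}
    scored = []
    for wt in wt_counts:
        if not is_good_marker(wt):
            continue
        mut = mut_by_site.get((wt.chrom, wt.pos))
        if mut is None:
            continue
        freq = mutant_allele_frequency(mut)
        if freq is not None:
            scored.append((wt.chrom, wt.pos, freq))
    return scored
```

With this sketch, an unlinked marker scores near 0.5 and a marker close to the mutation scores near 1.0, which matches the SNP1 and SNP2 cases in the figure.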

Genome-Wide Mutant Allele Frequency

The black marks represent average allele frequencies of mutant markers at chromosomal positions. Each chromosome is noted at the bottom. The y-axis is the allele frequency from 0.5 to 1.0 (homozygous). In this example, the mutation maps to the middle of chromosome 5.
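One way to produce averaged frequencies like those in the plot is a sliding window over neighboring markers on each chromosome. The sketch below assumes the input is a list of `(chromosome, position, frequency)` tuples and uses a hypothetical window size. The window size and the function names are assumptions for illustration, and the averaging used by RNAmapper may differ.

```python
from collections import defaultdict


def windowed_average(scored, window=5):
    """Average allele frequency over runs of `window` neighboring markers.

    `scored` is a list of (chrom, pos, freq) tuples. Returns a dict mapping
    each chromosome to a list of (center_position, mean_frequency) points.
    """
    by_chrom = defaultdict(list)
    for chrom, pos, freq in scored:
        by_chrom[chrom].append((pos, freq))

    averaged = {}
    for chrom, sites in by_chrom.items():
        sites.sort()  # order markers by position along the chromosome
        points = []
        for i in range(len(sites) - window + 1):
            chunk = sites[i:i + window]
            center_pos = chunk[window // 2][0]
            mean_freq = sum(freq for _, freq in chunk) / window
            points.append((center_pos, mean_freq))
        averaged[chrom] = points
    return averaged


def strongest_linkage(averaged):
    """Return (chrom, pos, freq) for the window closest to homozygosity."""
    best = None
    for chrom, points in averaged.items():
        for pos, freq in points:
            if best is None or freq > best[2]:
                best = (chrom, pos, freq)
    return best
```

Plotting each chromosome's points with the y-axis running from 0.5 to 1.0 gives a view like the one described above, and `strongest_linkage` reports where the tallest peak sits.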

Candidate Identification
The RNA-seq data allow not only for mapping, but also for identifying candidate mutations within the region of linkage. The next steps in the bioinformatic pipeline use the RNA-seq data within the region of linkage to identify candidate mutations that cause coding sequence changes, affect gene expression levels, or alter splicing.
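As a small illustration of the coding sequence case, the sketch below classifies a single codon change using the standard genetic code. It is not part of the RNAmapper pipeline: the function name and the category labels are assumptions chosen for the example, and real candidate calling also needs the transcript annotation and reading frame.

```python
from itertools import product

# Standard genetic code, with codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_codon_change(ref_codon, alt_codon):
    """Label the effect of replacing ref_codon with alt_codon (DNA, 3 bases each)."""
    ref_aa = CODON_TABLE[ref_codon.upper()]
    alt_aa = CODON_TABLE[alt_codon.upper()]
    if ref_aa == alt_aa:
        return "synonymous"
    if alt_aa == "*":
        return "nonsense"   # new premature stop codon
    if ref_aa == "*":
        return "stop-lost"  # original stop codon now encodes an amino acid
    return "missense"
```

For example, `classify_codon_change("TGG", "TGA")` reports a nonsense change, because tryptophan is replaced by a stop codon.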

Examples of nonsense mutations identified in RNA-seq mapping experiments

Two separate experiments each identifying nonsense mutations within the RNA-seq data from the mapped region.

Example of a mutation affecting a gene's expression level in mutants

The downregulation of egr2b is apparent when comparing the wildtype and mutant reads covering the locus.

Example of a mutation affecting transcript splicing

The improper splicing and inclusion of intronic sequence can be seen by looking at the alignment of the mutant reads against the genome.