Jump to content
Toggle sidebar
Neurobiology.Dev
Search
Create account
Personal tools
Create account
Log in
Pages for logged out editors
learn more
Talk
Contributions
Navigation
Main page
Records
Recent changes
Random page
Tools
What links here
Related changes
Special pages
Page information
Editing
Nanopore RNA Sequencing Protocol
(section)
Page
Discussion
English
Read
Edit
View history
More
Read
Edit
View history
Warning:
You are not logged in. Your IP address will be publicly visible if you make any edits. If you
log in
or
create an account
, your edits will be attributed to your username, along with other benefits.
Anti-spam check. Do
not
fill this in!
===m6ANet=== {{warning|Before we get started, I highly recommend that you take a look at your output from the previous steps in the ''Integrative Genomics Viewer (IGV)'', which is [https://software.broadinstitute.org/software/igv/download available here]. This will allow you to visualize a gene of interest and ensure that you actually have mapped reads at this stage in the pipeline. Otherwise, you’ll be just wasting your time trying to extract methylation status on 0 reads! <b>Note:</b> I had issues running the setup install for ''m6ANet'' over an existing ''Conda'' installation. Likely there are some package version incompatibilities. I highly recommend running the setup and executing ''m6ANet'' from its own ''Conda'' stack, after installing only <b>Python 3.8</b>. This is covered in the Installations section above.}} Now that we have basecalled, mapped, and sorted RNA reads, we now want to try and detect m6A modifications. In the past, we have used various other methods such as ''EpiNano'' and ''Tombo'', but a more recent and powerful method has emerged called '''m6ANet'''. The advantage of ''m6ANet'' is that it uses very few software requirements out of the box (''python, Nanopolish, Pytorch,'' and ''Samtools''), does not require a specific basecaller version (like ''EpiNano-SVM'' does), evaluates <code>DRACH</code> motifs instead of just <code>RRACH</code> motifs, and has much greater accuracy than most other contemporarily available models. Therefore, we will use ''m6ANet'' for our detection. ====Data Preparation==== Before we can begin, however, we need to extract some data from our raw <code>.fast5</code> files using ''Nanopolish'' and do some data formatting. First up, we need to index our raw reads using ''Nanopolish'', which can be done via the following: <code>Bash</code> <syntaxhighlight lang="Bash"> cd $NBASE nanopolish index -d $NRAW/${SAMPLEID} ${SAMPLEID}_master.fastq </syntaxhighlight> Note that this should be called from the same directory as your .fastq master file. You can include the ''Guppy'' sequencing summary output by adding in the line <code>--sequencing-summary=sequencing_summary.txt</code> after <code>$NRAW/${SAMPLEID}</code> to speed up the indexing process, but I recommend against it if you get an error. In my experience, ''Guppy'' tells ''Nanopolish'' where files are located through this process, which may or may not be accurate. Next, we need to generate the <code>eventalign</code> output from ''Nanopolish'', which aligns events from the raw data to the transcriptome for you. To do this, you need to indicate the location of your reference transcriptome file, your sorted and aligned <code>.bam</code> file, and your basecalled master <code>.fastq</code> file. The output of <code>eventalign</code> is '''''very large''''' with a good nanopore run (~23+ GB per chromosome of data), so I highly recommend considering mapping your reads to a transcriptome “subset” first and then running <code>eventalign</code> on that instead. <code>Bash</code> <syntaxhighlight lang="Bash"> nanopolish eventalign \ -r ${SAMPLEID}_master.fastq \ -b $NALI/${SAMPLEID}_aligned.bam \ -g $GREF/${GREFERENCEID} \ --scale-events \ --signal-index \ --threads 50 > ${SAMPLEID}_eventalign.txt </syntaxhighlight> Now we can do some ''m6ANet'' data prep and generate a few files: <code>Bash</code> <syntaxhighlight lang="Bash"> conda activate m6anet m6anet-dataprep \ --eventalign $NBASE/${SAMPLEID}_eventalign.txt \ --out_dir $M6AO/${SAMPLEID} \ --n_processes 4 </syntaxhighlight> The output of this will generate the following: * <code>data.index</code>: Indexing of <code>data.json</code> to allow faster access to the file. * <code>data.json</code>: json file containing the features to feed into ''m6ANet'' model for prediction. * <code>data.log</code>: Log file containing all the transcripts that have been successfully pre-processed. * <code>data.readcount</code>: File containing the number of reads for each <code>DRACH</code> positions in <code>eventalign.txt</code> document. * <code>eventalign.index</code>: Index file created during <code>dataprep</code> to allow faster access of the ''Nanopolish'' <code>eventalign.txt</code> data during this step. ====Run Inference==== {{tip|M6ANet may run into issues if your shell open file limit is too low. The max number of open files can be viewed with the <code>ulimit -n</code> command. To change this, simply add a number following the command (e.g., <code>ulimit -n 100000</code>).}} Finally, we can execute the final bit of code to generate the final output: <code>Bash</code> <syntaxhighlight lang="Bash"> m6anet-run_inference \ --input_dir $M6AO \ --out_dir $M6AO \ --infer_mod_rate \ --n_processes 4 </syntaxhighlight> Note that the <code>--input_dir</code> here is the output folder from the previous step that contains your <code>eventalign.index</code> file, <code>data.index</code> file, etc. The <code>--out_dir</code> can be specified as something else, but I wanted to put the output from this last step back into the final folder. The output is a <code>.gz</code> file, which might need to be unzipped via the command line if you have trouble using the GUI to do so: <code>Bash</code> <syntaxhighlight lang="Bash"> cd $M6AO gunzip data.result.csv.gz mv data.result.csv ${SAMPLEID}_m6anet_result.csv </syntaxhighlight> ====R Analysis and Plots==== Then in <code>R</code>, we can import these results using the <code>read.csv</code> function like so: <code>R</code> <syntaxhighlight lang="Bash"> # Change name based upon what you saved it as in the previous step, if you renamed it based upon sample name. m6ANet_result <- read.csv(file=paste0(M6ANET_PATH, "/data.result.csv")) </syntaxhighlight>
Summary:
Please note that all contributions to Neurobiology.Dev may be edited, altered, or removed by other contributors. If you do not want your writing to be edited mercilessly, then do not submit it here.
You are also promising us that you wrote this yourself, or copied it from a public domain or similar free resource (see
Neurobiology.Dev:Copyrights
for details).
Do not submit copyrighted work without permission!
Cancel
Editing help
(opens in new window)