Nanopore RNA Sequencing Protocol
====Data Preparation====
Before we can begin, we need to extract some data from our raw <code>.fast5</code> files using ''Nanopolish'' and do some data formatting. First, we need to index our raw reads with ''Nanopolish'':

<code>Bash</code>
<syntaxhighlight lang="Bash">
cd $NBASE
nanopolish index -d $NRAW/${SAMPLEID} ${SAMPLEID}_master.fastq
</syntaxhighlight>

Note that this should be run from the same directory as your master <code>.fastq</code> file. To speed up indexing, you can pass the ''Guppy'' sequencing summary by adding the option <code>--sequencing-summary=sequencing_summary.txt</code> after <code>$NRAW/${SAMPLEID}</code>, but I recommend leaving it out if you get an error. In my experience, the summary is how ''Guppy'' tells ''Nanopolish'' where each read's file is located, and that information may or may not be accurate.

Next, we need to generate the <code>eventalign</code> output from ''Nanopolish'', which aligns events from the raw signal data to the transcriptome. To do this, you need to provide the location of your reference transcriptome file, your sorted and aligned <code>.bam</code> file, and your basecalled master <code>.fastq</code> file. The output of <code>eventalign</code> is '''''very large''''' with a good nanopore run (~23+ GB per chromosome of data), so I highly recommend mapping your reads to a transcriptome "subset" first and then running <code>eventalign</code> on that instead.
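As a rough sketch of the subsetting idea, one way to cut a reference transcriptome down to the transcripts of interest is a small <code>awk</code> filter. The helper name <code>subset_fasta</code> and the one-ID-per-line list are assumptions made for illustration, not part of ''Nanopolish'' or ''m6ANet'', and each ID must match the first word of the FASTA header exactly:

<code>Bash</code>
<syntaxhighlight lang="Bash">
# Hypothetical helper (not part of Nanopolish): print only the FASTA records
# whose ID (first word of the header line, without ">") is listed in ids_file.
subset_fasta() {
    local ids_file="$1" fasta_file="$2"
    awk '
        # First file: load the wanted transcript IDs, one per line.
        FILENAME == ARGV[1] { if ($1 != "") keep[$1] = 1; next }
        # Header line: decide whether this record should be printed.
        /^>/ { id = substr($1, 2); printing = (id in keep) }
        # Print the header and sequence lines of kept records.
        printing { print }
    ' "$ids_file" "$fasta_file"
}
</syntaxhighlight>

Called as <code>subset_fasta ids.txt transcriptome.fa > subset.fa</code> (file names are placeholders), this writes the reduced reference. The reads would then need to be re-aligned against the subset, and that same FASTA passed as the reference to <code>eventalign</code>.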
<code>Bash</code>
<syntaxhighlight lang="Bash">
nanopolish eventalign \
    -r ${SAMPLEID}_master.fastq \
    -b $NALI/${SAMPLEID}_aligned.bam \
    -g $GREF/${GREFERENCEID} \
    --scale-events \
    --signal-index \
    --threads 50 > ${SAMPLEID}_eventalign.txt
</syntaxhighlight>

Now we can run the ''m6ANet'' data prep step to generate a few files:

<code>Bash</code>
<syntaxhighlight lang="Bash">
conda activate m6anet
m6anet-dataprep \
    --eventalign $NBASE/${SAMPLEID}_eventalign.txt \
    --out_dir $M6AO/${SAMPLEID} \
    --n_processes 4
</syntaxhighlight>

This step generates the following files:
* <code>data.index</code>: Index of <code>data.json</code> to allow faster access to the file.
* <code>data.json</code>: JSON file containing the features to feed into the ''m6ANet'' model for prediction.
* <code>data.log</code>: Log file listing all the transcripts that have been successfully pre-processed.
* <code>data.readcount</code>: File containing the number of reads for each <code>DRACH</code> position in the <code>eventalign.txt</code> file.
* <code>eventalign.index</code>: Index file created during <code>dataprep</code> to allow faster access to the ''Nanopolish'' <code>eventalign.txt</code> data during this step.
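
Before moving on, it can be worth confirming that the data prep step actually wrote everything listed above. The following is a minimal sketch, assuming all five files land in the <code>--out_dir</code> directory under exactly those names (other ''m6ANet'' versions may name them differently). The function <code>check_dataprep_outputs</code> is a hypothetical helper, not part of ''m6ANet'':

<code>Bash</code>
<syntaxhighlight lang="Bash">
# Hypothetical helper (not part of m6ANet): report any expected dataprep
# output that is missing or empty in the given output directory.
# Returns 0 if all files are present and non-empty, 1 otherwise.
check_dataprep_outputs() {
    local out_dir="$1" missing=0 f
    for f in data.index data.json data.log data.readcount eventalign.index; do
        if [ ! -s "$out_dir/$f" ]; then
            echo "missing or empty: $out_dir/$f" >&2
            missing=1
        fi
    done
    return "$missing"
}
</syntaxhighlight>

For the run above, this would be called as <code>check_dataprep_outputs $M6AO/${SAMPLEID}</code>. A non-zero exit status means at least one file needs a closer look.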