Nanopore RNA Sequencing Protocol
=Requirements=
{{note|If you already have a working system with the necessary software and hardware, you may skip ahead to the Running Samples or Sequencing sections to process data. I have kept the [https://github.com/novoalab/EpiNano EpiNano] requirements listed here for reference, but you can largely ignore them if you are using [https://github.com/GoekeLab/m6anet m6ANet] instead, which I highly recommend.}}

To acquire ''Nanopore'' data and analyze it properly, all of the necessary hardware and software must be in place before we begin running samples through the pipeline. Here is what you will need to proceed:

===Hardware===
* A computer with a '''strong CPU/GPU running a Unix-based system''' (''Linux'' Ubuntu 18+, ''macOS'' 10.11+ El Capitan, or ''WSL''). This guide assumes you are running Linux with a CUDA-enabled GPU, but you should be able to follow along on ''Mac'' as well (just know that some file paths may differ, you may need to download slightly different packages, and your system may process files more slowly). You can also use Windows via WSL, but I will not cover how to set that up in this guide. If you wish to do that, [https://learn.microsoft.com/en-us/windows/wsl/install install WSL on Windows as described here] first and then follow the guide as if you were on a ''Linux'' machine.
* A minimum of 30 GB of RAM (ideally 64+ GB, since basecalling with ''Guppy'' needs 30 GB ''on top of the RAM used by the OS''). You can go lower, but I generally don’t recommend it.
* A [https://nanoporetech.com/products/minion MinION Mk1B Nanopore].
* A [https://store.nanoporetech.com/us/flow-cells.html flow cell].
* An [https://store.nanoporetech.com/us/flow-cell-priming-kit.html EXP-FLP002 Flow Cell Priming Kit].
* A [https://store.nanoporetech.com/us/direct-rna-sequencing-kit.html SQK-RNA002 Direct RNA Sequencing Kit].

Additional reagents may be required for the wet lab portion (RNA extraction and RNA library preparation), which I cover in more detail in the Running Samples section below.

===Software===
* '''Biopython'''. Requires a specific version for ''EpiNano''.
* [https://docs.conda.io/en/latest/ Conda].
* '''Dask'''. Requires a specific version for ''EpiNano''.
* [https://community.nanoporetech.com/protocols/Guppy-protocol/ Guppy 6.4.2† for Linux].
* '''h5py'''. Requires a specific version for ''EpiNano''.
* [https://java.com/en/download/ Java Runtime Version 8+].
* [https://github.com/lh3/minimap2 Minimap2].
* '''Nanopolish'''. Requires a specific version for ''EpiNano''.
* '''Numpy'''. Requires a specific version for ''EpiNano''.
* '''Pandas'''. Requires a specific version for ''EpiNano''. Installed automatically with ''m6ANet''.
* [https://github.com/broadinstitute/picard/releases/tag/2.27.4 Picard].
* [https://pypi.org/project/pip/ Pip]. Python package installer. Installed automatically with ''m6ANet''.
* [https://cran.r-project.org/src/base/R-4/ R 4.1.2 (Bird Hippie)].
* [https://www.rstudio.com/products/rstudio/download/#download RStudio Desktop 2021.09.2+382].
* [http://www.htslib.org/ Samtools].
* '''Scikit-learn'''. Requires a specific version for ''EpiNano''. The most recent version is used for ''m6ANet''.
* [https://pytorch.org/ Torch 1.6.0].

''†Guppy 6.4.2 is used here since it is the most recent version at the time of writing. The pretrained EpiNano-SVM models were trained on data basecalled with Guppy 3.1.5, but that version is [https://community.nanoporetech.com/posts/proposed-changes-to-the-fa no longer compatible with newly acquired .fast5 data]. Additionally, Guppy 3.1.5 does not have GPU support, which makes processing data take much longer. I will keep the EpiNano-SVM code in here, as well as how to install Guppy 3.1.5, but you should use Guppy 6.4.2 and m6ANet instead.''

''Linux Base''
* '''Bash 3.2+'''.
* '''Cmake'''.
* '''Gfortran'''.
* '''Libidn11''' 32-bit libraries (if using old ''Guppy'' versions on a 64-bit machine outside of a 32-bit container).
* '''pyfaidx'''.

''R Packages''
* [https://www.rdocumentation.org/packages/car/versions/3.1-1 car]. Required for ''EpiNano''.
* [https://www.rdocumentation.org/packages/dplyr/versions/1.0.10 dplyr]. Required for ''EpiNano''.
* [https://www.rdocumentation.org/packages/forcats/versions/0.5.2 forcats]. Required for ''EpiNano''.
* [https://www.rdocumentation.org/packages/ggplot2/versions/3.4.0 ggplot2].
* [https://www.rdocumentation.org/packages/ggrepel/versions/0.9.2 ggrepel].
* [https://www.rdocumentation.org/packages/gridExtra/versions/2.3 gridExtra].
* [https://www.rdocumentation.org/packages/optparse/versions/1.7.3 optparse]. Required for ''EpiNano''.
* [https://www.rdocumentation.org/packages/outliers/versions/0.15 outliers]. Required for ''EpiNano''.
* [https://www.rdocumentation.org/packages/purrr/versions/0.3.5 purrr]. Required for ''EpiNano''.
* [https://www.rdocumentation.org/packages/readr/versions/2.1.3 readr]. Required for ''EpiNano''.
* [https://www.rdocumentation.org/packages/reshape2/versions/1.4.4 reshape2]. Required for ''EpiNano''.
* [https://www.rdocumentation.org/packages/stringr/versions/1.5.0 stringr]. Required for ''EpiNano''.
* [https://www.rdocumentation.org/packages/tibble/versions/3.1.8 tibble]. Required for ''EpiNano''.
* [https://www.rdocumentation.org/packages/tidyr/versions/1.2.1 tidyr]. Required for ''EpiNano''.
* [https://www.rdocumentation.org/packages/tidyverse/versions/1.3.2 tidyverse].

Note that we will install all of these below if you do not have them. Packages that require a specific version for ''EpiNano'' are covered in the section dedicated to ''EpiNano'' below.
If you are detecting m6A modifications using ''m6ANet'', you don’t need to worry about the ''EpiNano'' requirements. The ''m6ANet'' requirements should be installed automatically.

====Getting Reference Transcriptome====
{{warning|A reference '''transcriptome''' is required for using ''m6ANet'', but you may choose to use the genome if you are using ''EpiNano''.}}

We will also need to download an appropriate reference transcriptome to map our reads to. I prefer the <code>GRCh38.p14</code> version, but you can use any you like. Just keep in mind that different reference assemblies may refer to transcripts or chromosomes using different IDs, which changes what you need to enter in the <code>${CHR}</code> (genome reference only) definition at the start of the protocol. You can also choose your own directory for the reference transcriptome, but this path must then be updated in the <code>${REF}</code> shortcut in the Defining Files section above.

{{note|If the <code>wget</code> command is not already available on your system, you can install it with <code>sudo apt-get install wget</code>.}}

<code>Bash</code>
<syntaxhighlight lang="Bash">
# Move to the reference directory (adjust the path to your own setup)
cd "/home/$USER/Research/Ref"
# Download the GRCh38.p14 transcriptome from NCBI
wget "https://ftp.ncbi.nlm.nih.gov/genomes/all/annotation_releases/9606/110/GCF_000001405.40_GRCh38.p14/GCF_000001405.40_GRCh38.p14_rna.fna.gz"
# Decompress and give the file a shorter name
gzip -d GCF_000001405.40_GRCh38.p14_rna.fna.gz
mv GCF_000001405.40_GRCh38.p14_rna.fna GRCh38.p14.rna.fna
</syntaxhighlight>

Note that I also renamed the transcriptome after extracting it, for simplicity and ease of reference. A full list of all available genomes and transcriptomes, including older or more recent versions, can be accessed using the [https://ftp.ncbi.nlm.nih.gov/genomes/ NIH FTP]. You may also consider getting a reference genome / transcriptome from [https://useast.ensembl.org/index.html Ensembl] or [https://www.gencodegenes.org Gencode].
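
Because reference assemblies label transcripts and chromosomes differently, it can help to look at the sequence IDs in the downloaded file before setting <code>${CHR}</code> or <code>${REF}</code>. The following is a minimal sketch, assuming an uncompressed FASTA file; the function names <code>ref_ids</code> and <code>ref_count</code> are hypothetical helpers and are not part of any tool used in this protocol.

<code>Bash</code>
<syntaxhighlight lang="Bash">
# ref_ids FASTA: print the ID of every sequence, i.e. the first word of
# each header line with the leading '>' removed.
# (Hypothetical helper name, for illustration only.)
ref_ids() {
    awk '/^>/ { sub(/^>/, "", $1); print $1 }' "$1"
}

# ref_count FASTA: print the number of sequences (header lines) in the file.
ref_count() {
    grep -c '^>' "$1"
}

# Example usage (the file name assumes the rename step shown earlier):
#   ref_count GRCh38.p14.rna.fna
#   ref_ids GRCh38.p14.rna.fna | head -n 5
</syntaxhighlight>

If the IDs printed here do not match the ones you expect, the definitions at the start of the protocol likely need to be adjusted for your assembly.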
===Installations===
====Conda====
Most of the remaining software (like '''samtools''') is more easily installed using '''Conda''' than directly from GitHub. Therefore, before we do anything else, we’re going to make sure Conda is installed on our workstation. If you suspect Conda is already installed on the system, run <code>conda -V</code> in the shell to see if it returns a version number. You could, in theory, install all of the required software without ''Conda'', but I '''highly discourage''' it given how much we need to install and how much ''Conda'' simplifies the process.

To install, first go to the [https://docs.conda.io/en/latest/miniconda.html Miniconda documentation page] and download the shell script appropriate for your system (note the ''Python'' version requirements). Then, drop the script in your home folder and execute the following commands in ''Terminal'' (in sequential order):

{{note|Make sure you are in your home folder and that you edit the name of the shell script below to match yours before executing the bash command! Additionally, if you do not know how to access ''Terminal'', on Linux it can be pulled up by pressing <code>ctrl</code> + <code>alt</code> + <code>T</code> simultaneously.}}

<code>Bash</code>
<syntaxhighlight lang="Bash">
# Run the Miniconda installer (edit the file name to match your download)
bash Miniconda3-py39_4.12.0-Linux-x86_64.sh
# Add the package channels used throughout this protocol
conda config --add channels defaults
conda config --add channels bioconda
conda config --add channels conda-forge
</syntaxhighlight>

In case you are unfamiliar with it, ''Terminal'' is an application on both ''Linux'' and ''Mac'' machines that gives you access to the shell. You can run commands in it, usually to edit and move files, change permissions, install software, or execute scripts.
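
The check described above (running <code>conda -V</code> to see whether ''Conda'' is already present) can be generalized to any tool in this protocol. The following is a minimal sketch; the function names <code>have_cmd</code> and <code>report_tool</code> are hypothetical helpers, not part of ''Conda''.

<code>Bash</code>
<syntaxhighlight lang="Bash">
# have_cmd NAME: succeed (exit status 0) if NAME is found on the PATH.
# (Hypothetical helper name, for illustration only.)
have_cmd() {
    command -v "$1" >/dev/null 2>&1
}

# report_tool NAME: print whether NAME appears to be installed.
report_tool() {
    if have_cmd "$1"; then
        echo "$1: found"
    else
        echo "$1: missing"
    fi
}

# Example usage:
#   report_tool conda
#   report_tool wget
</syntaxhighlight>

This only tells you that a command exists on your <code>PATH</code>, not which version it is, so it complements rather than replaces <code>conda -V</code>.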
Now you can install software via ''Conda'' using the <code>conda install</code> command in ''Terminal''.

====EpiNano====
''EpiNano'' is an algorithm written in Python that identifies RNA modifications in direct RNA sequencing reads, using the <code>.bam</code> files produced from ''Guppy'' and ''Minimap2''. It extracts a set of ‘features’ from the direct RNA sequencing reads, which are used to predict whether an ‘error’ is caused by the presence of an RNA modification. ''EpiNano'' can compare two samples (one containing a methylase knockdown or knockout) in a pairwise fashion, which the authors refer to as ''EpiNano-Error'', or compare a single sample to a pre-trained model, referred to as ''EpiNano-SVM''. Unfortunately, the models for ''EpiNano-SVM'' were trained on an old version of ''Guppy'' that is no longer supported, so we will use ''m6ANet'' for our m6A detection. I have left in all the code required to get ''EpiNano'' up and running in case you have a reason to use it. '''If you plan on using only m6ANet, please skip this section.'''

=====Pre-requisites=====
''EpiNano'', unfortunately, relies upon a series of tools and software packages, both inside and outside of ''R''. Therefore, we need to check that all of these are installed before getting ''EpiNano'', and install any that are missing using either ''Conda'' or <code>install.packages()</code> (depending on the package).

Many of the required packages are older versions of commonly used ones, and although some newer versions do work with ''EpiNano'', others do not. Therefore, I am going to make a ''Conda'' environment specifically for ''EpiNano'' and switch to that environment prior to the install. That way, the older packages live in an ''EpiNano''-specific environment that can be loaded and used separately from our base ''Conda'' environment.
<code>Bash</code>
<syntaxhighlight lang="Bash">
# Create a dedicated environment for EpiNano and switch to it
conda create --name epinano
conda activate --stack epinano
</syntaxhighlight>

The text in parentheses next to your name in ''Terminal'' should now swap to <code>(epinano)</code>. To return to your base environment, simply type <code>conda activate --stack base</code> and execute. Note that at any time, you can see a list of installed ''Conda'' packages along with their versions via the following command:

<code>Bash</code>
<syntaxhighlight lang="Bash">
conda list
</syntaxhighlight>

If you just made a new environment named <code>epinano</code>, it should be empty. Let’s go ahead and start installing packages. I like to start with <code>scikit-learn</code>, since it will install a lot of the correct dependencies for the other packages. To install the right version, add a double equals sign after the name to specify the version number, such as <code>conda install scikit-learn==0.20.2</code>. It may take a while to find the package this way.

{{warning|''EpiNano'' 1.2 does '''NOT''' work with the latest versions of python, scikit-learn, etc. Therefore, you will want to install the version of each package as you see it here, pinning the version with <code>conda install</code> as described above. As packages install, pay attention to which packages might be upgraded or downgraded as you go along. If you do not, you might run into errors when trying to run ''EpiNano'' scripts if a package was changed without your knowledge!}}

{| class="wikitable"
|+EpiNano Conda Packages
|-
! scope="col"| Package
! scope="col"| Version
! scope="col"| Notes
|-
|biopython
|1.76
|
|-
|dask
|2.5.2
|
|-
|h5py
|2.8.0
|
|-
|java openjdk
|1.8.0
|This should already be installed via a previous step.
|-
|minimap2
|2.14-r886
|This should already be installed via a previous step.
|-
|nanopolish
|0.12.4
|
|-
|numpy
|1.15.4
|
|-
|pandas
|0.23.4
|
|-
|pysam
|0.15.3+
|
|-
|python
|3.6.7
|<b>Note:</b> Latest version of python doesn't work with scikit-learn 0.20.2!
|-
|sam2tsv
|
|Included with the EpiNano repo at <code>EpiNano/misc/</code>.
|-
|samtools
|0.1.19
|This should already be installed via a previous step.
|-
|scikit-learn
|0.20.2
|<b>Note:</b> <code>Epinano_Predict.py</code> does not work with the latest version. You ''must'' install this one for EpiNano.
|}

Now let’s check the ''R'' packages. Boot up ''R'' in the terminal by typing <code>R</code>, and execute the following in the console to generate a list of the currently installed packages. If you do not have R, you can install it [https://anaconda.org/r/r using Conda], or by following the instructions provided [https://www.r-project.org/ here].

<code>R</code>
<syntaxhighlight lang="R">
# List installed packages (name, version, priority)
as.data.frame(installed.packages()[ , c(1, 3:4)])
</syntaxhighlight>

Cross-reference the list that prints with the following table (again, sorted in alphabetical order), and install any ''R'' packages that you do not have using the <code>install.packages()</code> command. I recommend starting with the <code>car</code> package, since it will install a lot of the others in the list automatically. I also didn’t have an issue installing the latest version of each package listed here, so you likely don’t need to install a specific version for ''EpiNano'' to run.

{{warning|Some of these packages may have issues installing on ''Linux'' if certain commands are unavailable to ''R''. For example, you may need to also install <code>curl</code> or <code>gfortran</code> in order to get the packages <code>tidyverse</code> and <code>car</code> to install. Pay attention to the output in the console and read the directions for the next steps if any fail (which will be apparent if you see the line <code>Installation of package had non-zero exit status</code> printed anywhere). After all the installations are done, you should recheck the list of installed packages (see the note following the table below).}}

{| class="wikitable"
|+EpiNano R Packages
|-
! scope="col"| R Package
! scope="col"| Version
! scope="col"| Notes
|-
|car
|3.0-3
|
|-
|dplyr
|1.0.1
|
|-
|forcats
|0.4.0
|
|-
|ggplot2
|3.1.1
|
|-
|ggrepel
|0.8.1
|
|-
|optparse
|1.6.6
|
|-
|outliers
|0.14
|
|-
|purrr
|0.3.2
|
|-
|readr
|1.3.1
|
|-
|reshape2
|1.4.3
|
|-
|stringr
|1.4.0
|
|-
|tibble
|3.0.3
|
|-
|tidyr
|0.8.3
|
|-
|tidyverse
|1.2.1
|
|}

After finishing, check that these actually installed by executing the <code>as.data.frame(installed.packages()[ , c(1, 3:4)])</code> command once again.

=====Installing EpiNano=====
Now that all the pre-requisites are checked, we can go ahead and install ''EpiNano'' itself. For simplicity, I will keep mine in the <code>~/Research</code> directory, so that everything I use for the nanopore is kept in one place. You can place it wherever you like, since we will define (or have defined) the file path to it in the console.

<code>Bash</code>
<syntaxhighlight lang="Bash">
cd ~/Research
git clone "https://github.com/novoalab/EpiNano.git"
</syntaxhighlight>

''EpiNano'' should now be installed. Note that a '''ReadMe''' for ''EpiNano'' is available [https://github.com/novoalab/EpiNano here], which goes over its requirements and how to execute it from the shell (Note: this readme should also be present as a file in your ''EpiNano'' folder after you clone the repo). It also describes some of the arguments that you will include when you execute it later.

=====Preparing Reference for EpiNano=====
One final thing: we need to generate the files required to run ''EpiNano'' later. This can be done by navigating to the folder containing your genome assembly (mine is at <code>~/Research/Ref</code>) and running a few lines of code (if you already have files ending in <code>.fa.fai</code> and <code>.fa.dict</code> in that folder for your assembly, you can skip this step).
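
Whether this step can be skipped comes down to whether the index and dictionary files already sit next to the assembly. The following is a minimal sketch of that check; <code>ref_is_indexed</code> is a hypothetical helper name, and the file names simply follow the <code>.fa.fai</code> / <code>.fa.dict</code> pattern mentioned above.

<code>Bash</code>
<syntaxhighlight lang="Bash">
# ref_is_indexed FASTA: succeed only if both FASTA.fai and FASTA.dict
# exist and are non-empty. (Hypothetical helper, for illustration only.)
ref_is_indexed() {
    [ -s "$1.fai" ] && [ -s "$1.dict" ]
}

# Example usage (the path is an assumption; adjust it to your own assembly):
#   if ref_is_indexed ~/Research/Ref/GRCh38.p13.genome.fa; then
#       echo "Reference already prepared, skipping."
#   fi
</syntaxhighlight>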
If you do not have a genome reference assembly yet, you may grab one from the [https://www.ncbi.nlm.nih.gov/projects/genome/guide/human/index.shtml NIH available here].

{{warning|The commands <code>faidx</code> and <code>picard</code> may not run if you don’t have the appropriate software installed. You’ll know whether each is installed when you try to execute it. If <code>faidx</code> is missing, you can install it with the command <code>sudo apt install python3-pyfaidx</code> (''Linux''). <code>Picard</code> will need to be downloaded from the [https://github.com/broadinstitute/picard/releases/tag/2.27.4 Picard github here] as well (select the <code>picard.jar</code> file and place it into your root folder).}}

<code>Bash</code>
<syntaxhighlight lang="Bash">
sudo apt install python3-pyfaidx
</syntaxhighlight>

Next, we need to make the appropriate sequence libraries. This can be done as follows. Keep in mind that your paths may be different from the ones listed here!

<code>Bash</code>
<syntaxhighlight lang="Bash">
# Index the assembly (creates GRCh38.p13.genome.fa.fai)
cd ~/Research/Ref
faidx GRCh38.p13.genome.fa
# Build the sequence dictionary next to the assembly
# (run from the folder that contains picard.jar)
cd ~/Research/
java -jar picard.jar CreateSequenceDictionary \
    -R ~/Research/Ref/GRCh38.p13.genome.fa \
    -O ~/Research/Ref/GRCh38.p13.genome.fa.dict
</syntaxhighlight>

{{tip|The <code>picard.jar</code> file needs to be in the current working directory for the above code to work. If it isn’t found, place it in your ''Terminal'' working directory before attempting the code again, or switch your ''Terminal'' directory to the location of your <code>picard.jar</code> file with the <code>cd</code> command. Alternatively, you can make a shell shortcut to <code>picard.jar</code> by specifying the path.}}

====Guppy====
We’re going to install ''Guppy'' directly from ''Oxford Nanopore Technologies'', hereafter simply ''ONT''.
Since the ''EpiNano'' models were built from data basecalled with ''Guppy 3.1.5'', you will want to install and use the same version for ''EpiNano-SVM'', but note that this version has some requirements and limitations (see the warning below). Additionally, ''Guppy 3.1.5'' does not utilize a CUDA GPU, so your analyses will be much slower with it. For this protocol, I will install the latest ''Guppy (6.4.2)'' and use ''m6ANet'' for m6A detection.

{{warning|'''CRITICALLY IMPORTANT:''' ''Guppy 3.1.5'', the version on which the default models for ''EpiNano-SVM'' were based, utilizes old 32-bit libraries which are unavailable and no longer supported on 64-bit Linux. Therefore, if you plan to use it, you need to also copy these libraries to your <code>/usr/lib</code> folder for it to work. These library files are available [https://ftp5.gwdg.de/pub/linux/archlinux/community/os/x86_64/libidn11-1.33-2-x86_64.pkg.tar.zst here]. Additionally, ''Guppy 3.1.5'' is no longer compatible with newly acquired <code>.fast5</code> data from MinKNOW 21.10+. I will use a newer version of ''Guppy'' and ''m6ANet'' as a result, but I kept the code on how to use ''EpiNano-SVM'' (and ''EpiNano-Error'' for that matter) below in case you’re running older data through ''Guppy 3.1.5'', have downgraded your ''MinKNOW'' software, or have found newer RNA models.}}

<code>Bash</code>
<syntaxhighlight lang="Bash">
# Version of Guppy to download
GUPPYVER="6.4.2"
# Download and unpack the Linux 64-bit build from ONT
wget "https://mirror.oxfordnanoportal.com/software/analysis/ont-guppy_${GUPPYVER}_linux64.tar.gz"
tar -xf "ont-guppy_${GUPPYVER}_linux64.tar.gz"
</syntaxhighlight>

====Minimap2====
Next up is ''Minimap2'', a pairwise sequence alignment program that can align your processed reads to the transcriptome (or genome).
Note that I will install it here using ''Conda'':

<code>Bash</code>
<syntaxhighlight lang="Bash">
conda install minimap2
</syntaxhighlight>

You can also build it from GitHub like so, but you will then need to do some additional work (such as adding it to your <code>PATH</code>) to call ''Minimap2'' from any directory without ''Conda''.

<code>Bash</code>
<syntaxhighlight lang="Bash">
git clone https://github.com/lh3/minimap2
cd minimap2 && make
</syntaxhighlight>

====Samtools====
'''Samtools''' is a suite of programs that can be used on Linux and Mac to further process high-throughput DNA / RNA sequencing data. Its name is a reference to the <code>.SAM</code> file, which stands for “'''S'''equence '''A'''lignment '''M'''ap format”, the text precursor to the binary version of this file (the <code>.BAM</code> file). We will use ''Samtools'' to modify our aligned sequencing data prior to m6A modification detection, so you should have it installed before trying to analyze any of your data. ''Samtools'' can be installed via ''Conda'':

<code>Bash</code>
<syntaxhighlight lang="Bash">
conda install -c bioconda samtools
</syntaxhighlight>

====m6ANet====
'''m6ANet''' is a new m6A detection method that uses the variance in electrical signal across multiple transcript reads to determine whether a base modification is present at a given locus. It requires far fewer pre-requisite software packages than other methods (such as ''EpiNano''), does not require a specific basecaller version, and is much more accurate than other currently available detection methods (such as ''Tombo'').

To install, we first want to make a ''Conda'' environment that is specific to ''m6ANet''. Installing it automatically pulls in a few ''Conda'' packages (like ''pytorch''), and you don’t want any conflicts with your existing packages.
<code>Bash</code>
<syntaxhighlight lang="Bash">
# Create a dedicated environment for m6ANet and switch to it
conda create --name m6anet
conda activate --stack m6anet
</syntaxhighlight>

Next, move into the directory where you want to store the ''m6ANet'' folder. Then, pull ''m6ANet'' from GitHub and install it via the following (alternatively, you can do this using <code>pip</code> by executing <code>pip install m6anet</code>):

<code>Bash</code>
<syntaxhighlight lang="Bash">
git clone "https://github.com/GoekeLab/m6anet.git"
cd m6anet
python setup.py install
</syntaxhighlight>

You should now have ''m6ANet'' installed and ready to go. As with all of the other installations, '''''make sure you pay close attention''''' to the console to see if it throws an error during the installation. You may need to manually install or update other packages, like <code>openssl</code>, if it returns an error.
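
Errors are easy to miss in long installation output, so one option is to wrap each install command so that a failure is reported explicitly. The following is a minimal sketch; <code>run_checked</code> is a hypothetical helper name and not part of ''m6ANet'', ''Conda'', or <code>pip</code>.

<code>Bash</code>
<syntaxhighlight lang="Bash">
# run_checked CMD [ARGS...]: run a command, and if it exits with a non-zero
# status, print a clear message to stderr and return that same status.
# (Hypothetical helper name, for illustration only.)
run_checked() {
    if "$@"; then
        return 0
    else
        _rc=$?
        echo "FAILED (exit $_rc): $*" >&2
        return "$_rc"
    fi
}

# Example usage with the install commands shown above:
#   run_checked python setup.py install
#   run_checked pip install m6anet
</syntaxhighlight>

A non-zero exit status only tells you that something went wrong; you will still need to read the console output to find out which package has to be installed or updated.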