Jump to content
Toggle sidebar
Neurobiology.Dev
Search
Create account
Personal tools
Create account
Log in
Pages for logged out editors
learn more
Talk
Contributions
Navigation
Main page
Records
Recent changes
Random page
Tools
What links here
Related changes
Special pages
Page information
Editing
Nanopore RNA Sequencing Protocol
(section)
Page
Discussion
English
Read
Edit
View history
More
Read
Edit
View history
Warning:
You are not logged in. Your IP address will be publicly visible if you make any edits. If you
log in
or
create an account
, your edits will be attributed to your username, along with other benefits.
Anti-spam check. Do
not
fill this in!
====EpiNano-SVM (Single Sample)==== {{warning|'''EpiNano-SVM''' uses a pretrained <code>RRACH</code> model from ''Guppy 3.1.5''. Therefore, if you use this model, you will need to also have basecalled your data using ''Guppy 3.1.5''. If you are using a different basecaller algorithm, or ''Guppy'' of a different version, please do a pairwise comparison with EpiNano-Error as described above, insert m6A controls, or consider using ''m6ANet''.}} If we would like to extract the likelihood of m6A modifications using the pretrained models, we can do so by running the script <code>Epinano_Predict.py</code> and referencing our <code>RRACH</code> model. To do this, though, we must first divide our reads into kmers of 5 and isolate just the <code>RRACH</code> motifs. Isolating into 5 kmers: <code>Bash</code> <syntaxhighlight lang="Bash"> python $EPI/misc/Slide_Variants.py $NALI/${SAMPLEID}/${SAMPLEID}_${CHR}.plus_strand.per.site.csv 5 python $EPI/misc/Slide_Variants.py $NALI/${SAMPLEID}/${SAMPLEID}_${CHR}.minus_strand.per.site.csv 5 </syntaxhighlight> Filter for ''only'' <code>RRACH</code> <code>Bash</code> <syntaxhighlight lang="Bash"> cd $NALI/${SAMPLEID}/ awk -F"," 'NR==1; NR > 1{if($1=="AAACA" || $1=="AAACT" || $1=="GAACT" || $1=="GGACA" || $1=="AGACT" || $1=="GGACT" || \ $1=="AAACC" || $1=="AGACC" || $1=="GAACC" || $1=="GAACA" || $1=="AGACA" || $1=="GGACC") print}' \ ${SAMPLEID}_${CHR}.plus_strand.per.site.5mer.csv > ${SAMPLEID}_${CHR}_plus_RRACH.csv awk -F"," 'NR==1; NR > 1{if($1=="AAACA" || $1=="AAACT" || $1=="GAACT" || $1=="GGACA" || $1=="AGACT" || $1=="GGACT" || \ $1=="AAACC" || $1=="AGACC" || $1=="GAACC" || $1=="GAACA" || $1=="AGACA" || $1=="GGACC") print}' \ ${SAMPLEID}_${CHR}.minus_strand.per.site.5mer.csv > ${SAMPLEID}_${CHR}_minus_RRACH.csv </syntaxhighlight> Now that we have that, we can go ahead and run <code>Epinano_Predict.py</code>. Note that the code below is duplicated, because we will run it also on the data table that contains mRNA reads mapped to the reverse strand. <code>Bash</code> <syntaxhighlight lang="Bash"> python $EPI/Epinano_Predict.py \ --model $EPI/models/rrach.q3.mis3.del3.linear.dump \ --predict ${SAMPLEID}_${CHR}_plus_RRACH.csv \ --out_prefix ${SAMPLEID}_${CHR}_plus \ --columns 8,13,23 python $EPI/Epinano_Predict.py \ --model $EPI/models/rrach.q3.mis3.del3.linear.dump \ --predict ${SAMPLEID}_${CHR}_minus_RRACH.csv \ --out_prefix ${SAMPLEID}_${CHR}_minus \ --columns 8,13,23 </syntaxhighlight> So let’s break down what the above is doing. First, we are telling the console that we want to run the python script <code>Epinano_Predict.py</code> with the pre-trained model specified by <code>--model</code>, which will be used to detect m6A modifications. <code>--predict</code> specifies the table file that will be used for making predictions (i.e., your processed data), while <code>--out_prefix</code> specifies the prefix output file name. Finally, <code>--columns</code> specify the columns within the dataset that contain the features that will be used for the prediction of m6A sites. The final output will be in the same folder as you ran the command (in this case, in the <code>sorted_bam</code> subfolder), and will be saved as your specified <code>--out_prefix</code> + <code>.q3.mis3.del3.MODEL.rrach.q3.mis3.del3.linear.dump.csv</code> if you used the same model as I did. The next thing we want to do is import this data into ''R'' for data analysis and further visualization. <code>Bash</code> <syntaxhighlight lang="Bash"> # Load a few packages needed. suppressMessages (library(tidyverse)) suppressMessages (library(stringr)) # Specify the input file in the R working directory. You can check the working directory with getwd(). # Note: You could also specify the path to the output file location and point directly to it there too. plusstrandfile = paste(paste0(OUTPUT_PATH, SAMPLEID), CHR, "plus.q3.mis3.del3.MODEL.rrach.q3.mis3.del3.linear.dump.csv", sep = "_") minusstrandfile = paste(paste0(OUTPUT_PATH, SAMPLEID), CHR, "minus.q3.mis3.del3.MODEL.rrach.q3.mis3.del3.linear.dump.csv", sep = "_") # Read the csv input file and separate necessary columns. epi_res_plus <- read.csv (plusstrandfile, header=T)[,c("Window","Ref","Strand","ProbM","prediction")] epi_res_minus <- read.csv (minusstrandfile, header=T)[,c("Window","Ref","Strand","ProbM","prediction")] epi_res <- rbind(epi_res_plus, epi_res_minus) # Split the "Window" into two integers. split <- data.frame (str_split_fixed (epi_res$Window, '-', 2)) # Grab the split integer columns. split$first <- as.integer (as.character(split$X1)) + 2 split$middle <- as.integer (as.character(split$X2)) + 2 # Add the integer columns to the data frame. epi_res$Start <- split$first epi_res$End <- split$middle # Remove "Window" column after split. Left with "Ref" (chromosome), "Position" (start), "Strand", "ProbM", and "prediction". epi_res<- epi_res[,c("Ref", "Start", "End", "Strand", "ProbM", "prediction")] colnames(epi_res) <- c("Ref", "Start", "End", "Strand", "m6A_Prob", "m6A_Pred") # Assign the epinano output a name based on sample ID and chromosome. assign(paste(SAMPLEID, CHR, "epinano_result", sep = "_"), epi_res) </syntaxhighlight> This will import the data and isolate the columns for ''chromosome'', Start and End positions on the chromosome (with regard to GRCh38, our reference genome used), the ''Strand'', the ''probability of an m6A modification'', and the ''m6A EpiNano Prediction'': either listed as “mod” (for “modified”) or “unm” (for “unmodified”). =====Probing Genes===== Now we want to be able to visualize the data, preferably between two samples. This next code block can be executed directly in ''R'', all you have to do is change the first set of values to indicate which gene you want to probe and your samples you want to include. <code>R</code> <syntaxhighlight lang="R"> # Define your parameters. Note that your sample IDs are the same as above. GENE="ATF5" SAMPLE1_ID="FAR91556" # Usually WT SAMPLE2_ID="FAM95931" # Usually Mut # Code to run </syntaxhighlight> =====Plotting Results (ATF5)===== Here we can plot the results for the ATF5 gene. This will be modified in a future iteration. <code>R</code> <syntaxhighlight lang="R"> suppressMessages (library(reshape2)) suppressMessages (library(ggplot2)) suppressMessages (library(ggrepel)) suppressMessages (library(tidyverse)) suppressMessages (library(stringr)) # Isolate ATF5 WT122021_nano_ATF5 <- subset(FAR91556_chr19_epinano_result, Start >= 49928403) WT122021_nano_ATF5 <- subset(WT122021_nano_ATF5, End <= 49934438) IDHMUT_nano_ATF5 <- subset(FAM95931_chr19_epinano_result, Start >= 49928403) IDHMUT_nano_ATF5 <- subset(IDHMUT_nano_ATF5, End <= 49934438) #WT_mod <- subset(WT122021_nano_ATF5, m6A_Prob >= 0.05) #WT_unm <- subset(WT122021_nano_ATF5, m6A_Prob < 0.05) # Note: Some of these data were generated view the MotifFinder script. #49928702-49933935 # ATF5 Gene, Chr19:49929205-49933935 merge <- ggplot (WT122021_nano_ATF5) + ggtitle ("Merge") + # Underlying Strand geom_segment(aes(x=49928403, xend=49934438, y=0.125, yend=0.125), color="#383838", size=4) + # Exon 1 49,928,890 - 49,929,022 geom_segment(aes(x=49928890, xend=49929022, y=0.125, yend=0.125), color="#858585", size=6) + # Exon 2 49,929,223 - 49,929,421 geom_segment(aes(x=49929223, xend=49929421, y=0.125, yend=0.125), color="#858585", size=6) + # Exon 3 49,930,681 - 49,931,079 geom_segment(aes(x=49930681, xend=49931079, y=0.125, yend=0.125), color="#858585", size=6) + # Exon 4 49,932,271 - 49,934,086 geom_segment(aes(x=49932271, xend=49934086, y=0.125, yend=0.125), color="#858585", size=6) + # Location of RRACH sites in ATF5. This was ascertained using the Motif Finder function. #geom_linerange(data=ATF5_RRACH, aes(x=Hg38Position, ymin=0, ymax=1), color="#d9fbcb", size=0.4) + #geom_linerange(data=ATF5_MAZF, aes(x=Hg38Position, ymin=0, ymax=1), color="#476ad0", size=0.4) + # WT Dots geom_point(data=WT122021_nano_ATF5, aes(x=Start, y=m6A_Prob),shape = 21, colour = "black", fill = "#959595", size=2, stroke = 1.0) + # IDH Mut Dots geom_point(data=IDHMUT_nano_ATF5, aes(x=Start, y=m6A_Prob),shape = 21, colour = "black", fill = "#dd1818", size=2, stroke = 1.0) + # Instead of entering in the chromosome, paste it to the label from the data. xlab (paste0("Position on ", WT122021_nano_ATF5[1,1])) + ylab ("m6A Prob.") + theme_bw ()+ theme (axis.text.x = element_text(face="bold", color="black",size=11), axis.text.y = element_text(face="bold", color="black", size=11), plot.title = element_text(color="black", size=24, face="bold.italic", hjust = 0.5), axis.title.x = element_text(color="black", size=15, face="bold"), axis.title.y = element_text(color="black", size=15, face="bold"), panel.background = element_blank(), legend.position = "none", axis.line = element_line(colour = "black", size=0.5)) #geom_text_repel (data=mod, aes(x=Position, y=ProbM, label=Position), color="red", box.padding=3, point.padding=3, segment.size = 0.6, segment.color = "green") </syntaxhighlight> Final export on one graph: <code>R</code> <syntaxhighlight lang="R"> suppressMessages(library(gridExtra)) grid.arrange(wt, idh, merge, nrow = 2) </syntaxhighlight>
Summary:
Please note that all contributions to Neurobiology.Dev may be edited, altered, or removed by other contributors. If you do not want your writing to be edited mercilessly, then do not submit it here.
You are also promising us that you wrote this yourself, or copied it from a public domain or similar free resource (see
Neurobiology.Dev:Copyrights
for details).
Do not submit copyrighted work without permission!
Cancel
Editing help
(opens in new window)