# A practical introduction to bioinformatics and RNA-seq using Galaxy - 2024 (**History**)

## General information

**Date**: 10-09-2024 - 13-09-2024
**Time**: 09:00 - 13:00 (CEST)
**Location**: Online ([Zoom](https://epfl.zoom.us/j/64829799035?pwd=DYKW7EWKUzurBebej4hNaJEIW9F2xn.1))
**Code of Conduct**: [Galaxy Project Code of Conduct](https://galaxyproject.org/community/coc/)

:::info
### Code of Conduct
Participants are expected to follow these guidelines:
* Use welcoming and inclusive language
* Be respectful of different viewpoints and experiences
* Gracefully accept constructive criticism and give constructive feedback
* Focus on what is best for the community
* Show courtesy and respect towards other community members
:::

### Schedule for the workshop

| Day | Tutorial | Instructor |
| -------- | -------- | -------- |
| 1 - Tuesday 10th | [From peaks to genes](https://training.galaxyproject.org/training-material/topics/introduction/tutorials/galaxy-intro-peaks2genes/tutorial.html) | Teresa |
| 2 - Wednesday 11th | [QC](https://training.galaxyproject.org/training-material/topics/sequence-analysis/tutorials/quality-control/tutorial.html) | Diana |
| 2 - Wednesday 11th | [Mapping](https://training.galaxyproject.org/training-material/topics/sequence-analysis/tutorials/mapping/tutorial.html) | Engy |
| 3 - Thursday 12th | [Reference-based RNA-seq](https://training.galaxyproject.org/training-material/topics/transcriptomics/tutorials/ref-based/tutorial.html) - Part I | Teresa |
| 4 - Friday 13th | [Reference-based RNA-seq](https://training.galaxyproject.org/training-material/topics/transcriptomics/tutorials/ref-based/tutorial.html) - Part II | Pavan |

[TOC]

### Before Introduction: who is who?
Introduction to BioNT --> Silvia Di Giorgio

### Recommendations

:::info
To follow the workshop more efficiently, we recommend a two-screen setup: for example, one screen to display the instructor’s shared screen and the collaborative pad, and another one for your own work
:::

### How will the workshop be run?

Why a specific setup for this workshop?
- We want to welcome participants from different areas (e.g. SMEs, job seekers, academia)
- Importance of privacy
- We would like to create an interactive atmosphere

How will we do it?
- Zoom with panel view
    - Panelists: instructors & helpers
    - Only trainers will be visible
    - No personal data will be displayed
- This [HedgeDoc](https://biont.biobyte.de/AZptJADHQBusn0t6tpBL5w?both#) document in Markdown for interactions
    - Markdown: lightweight markup language
    - [Documentation](https://biont.biobyte.de/features#Edit)

### How to participate?

**Ask your questions, raise issues, interact with us in this document**

In addition, to help you navigate this document, we followed the structure of the tutorial and included:

- Each Hands-on section (✏️ - where you will have to work) of the tutorial, including a part to ask questions or post issues you might face

:::warning
✏️ Hands-on: Topic
##### ❓ Are you finished with this section? Add a '+' below
- Yes: ++
- Waiting for the job to be done:
- Need help:

##### Your questions
Q: I need help with ...
A: here is the answer
I still do not ...

##### Do you need help? Please describe your issue
- Please zoom in, I do not see your screen + +
:::
A helper will help you

- Question sections (❓ - where we ask you something) for answering

:::success
❓ We have some question:
-
:::

:::warning
##### ✏️ Hands-on: Set yourself up
- Short link: rb.gy/1c9pc

##### ❓ Are you on this [HedgeDoc](rb.gy/1c9pc)? (Add a + when done)
- Yes: +++++++++++++++++++++++++++++++
- +
-
- No: please contact Silvia Di Giorgio <silviadg87@gmail.com>

##### ❓ Do you need help?
Please describe your issue+++ - - It is irrelevant but Idid not sign in here. Is it OK?You don't have to sign in. I will move this into the questions if that's fine with you . OK then - - ::: :::success ##### ❓ Have you ever used Markdown? (Add a +) - Yes: ++++++++++no - No +no++++++=++ - + - ++++*++++++++ - + - + - What is Markdown?++++++++++++++ ::: :::success ##### ❓ Have you ever used Galaxy? (Add a +)yes - Yes: +++++++++++++++++ - No+++++++++++++++++ - - - What is Galaxy?++ ::: :::success ##### ❓ We plan to adjust the break times according to the workshop's progression. Does this arrangement work well for everyone? - Yes +++++++++++++++++++++++++++++++++ - No ::: :::success ##### ❓ Those that answered no, could you clarify your needs? Do you need a specific break time? - - - ::: ### Icebreaker :::success ##### ❓ Tell us about a recent *First* in your life. visiting Portugal for the first time This could be big or small, perhaps you bought a house for the first time or you tried a new restaurant in your city. Recent can be any time in the past year. - I tried roller skates the first time during the weekend - i tried hourse riding - I participated in marathon event, running in 10 km race - Today, I run 5km - I tried hotpot - Visiting scandanavia in Christmas - I did Race for Life - I made gene knock-out - I wrote my first R script - I Travelled to greece to do some diving in spring! - - - I played guitar for a wedding - I ran 35km during the weekned - I sang with a stranger on a stage last weekend - - I took care of a stray cat for the first time - I competed in a the Tall Ships Races. First time sailing in competition - I travelled to London - I prepared for a job interview - Writing R script - I built muscles - gym? yes - I travelled to Villingen for the fist time. - I travelled to Croatia - me too! Cool DubrWhere did you visit?? To Opatija, you? Dubrovnik! loved it Me too, I'll visit it again definetly-same! 
great beaches The most beautiful sea I saw so far
- I was in MasterChef! Great!!
- I travelled to Croatia too
- I travelled to Belgium for the first time+
- I visited Frankfurt for the first time
- Started meditating last week
- Made Alfredo sauce for the first time
-
:::

## Day 1 - Tuesday

### Table of Contents
1. [Galaxy Introduction](#Galaxy-Introduction)
2. [Tutorial: From peaks to genes](#Tutorial-From-peaks-to-genes-httpstraininggalaxyprojectorgtraining-materialtopicsintroductiontutorialsgalaxyintropeaks2genestutorialhtml)
    - [Pretreatments](#Pretreatments)
    - [Part 1 Naive approach](#Part-1-Naive-approach)
    - [Part 2 More sophisticated approach](#Part-2-More-sophisticated-approach)
    - [Share your work](#Share-your-work)
3. [Summary](#Summary)
4. [Feedback](#Feedback)

:::success
##### ❓ Before we start: Is the screen clearly visible (add '+')
- Yes ++++++++++++++++++yes big enough, yes++
- No
:::

:::success
##### ❓ Do you need help? Please describe your issue
-
-
-
-
:::

We will use Galaxy **Europe** and its training infrastructure (called TIaaS)

:::warning
##### ✏️ Hands-on: Register and log in on Galaxy Europe
- Register on [Galaxy Europe](https://usegalaxy.eu/)
- Connect to Galaxy Europe using the [TIaaS link](https://usegalaxy.eu/join-training/biont-rnaseq)

##### ❓ Did you register on [Galaxy Europe](https://usegalaxy.eu/)? (Add a + when done)
- Yes ++++++
- yes+++++++++++++++++++++
-
- No

##### ❓ Did you connect using the [TIaaS link](https://usegalaxy.eu/join-training/biont-rnaseq)? (Add a + when done)
- Yes ++++++++++++++++++++
- yes++
-
- No
:::

Feel free to post **any questions** that come up during the workshop, either in the dedicated section of the workshop or in the general section at the end of the document. Each question will be **answered below the question**. Please feel free to **ask back** below the answer if something is still unclear.
### Galaxy Introduction
- [Slides](https://training.galaxyproject.org/training-material/topics/introduction/tutorials/introduction/slides.html#1)

:::success
##### ❓ Any questions regarding this introduction?
Q: Can I register on several Galaxy servers?
A: Yes, you can register on different public Galaxy servers

Q: Is there an overview of all tools available?
A: For the European Galaxy server, you can find an overview here: https://usegalaxy-eu.github.io/tools.html

Q: If my RNA-seq data is of a different type than the data in the history from which I made the workflow (let's say it is paired), do I need to change the workflow too?
A: Most of our workflows have different parameters/inputs to select. For most data our workflows should work fine.

Q:
A:
:::

### Tutorial: [From peaks to genes](https://training.galaxyproject.org/training-material/topics/introduction/tutorials/galaxy-intro-peaks2genes/tutorial.html)

**Disclaimer**: We will not go through the *Part 2: More sophisticated approach* section of the tutorial today

#### Pretreatments

:::warning
##### ✏️ Hands-on: Open Galaxy
##### ❓ Are you finished with this section? Add a '+' below
- Yes: +++++++++++++++++ye++s+++++++++, Yes+++yes
- Waiting for the job to be done:
- Need help:

##### Your questions
Q1:
A:
Q2:
A:
Q3:
A:
Q4:
A:

##### Do you need help? Please describe your issue
-
-
-
-
:::

:::warning
##### ✏️ Hands-on: Open the tutorial
1. Go to [Galaxy](https://usegalaxy.eu/) (done in previous Hands-on)
2. Click on the hat (🎓) in the top bar
3. Navigate to the topic "Introduction to Galaxy Analyses"
4. Click on the tutorial "From peaks to genes"

##### ❓ Have you found the tutorial? (Add a + when done)
- Yes: +++++++++++++++, yes++
- No:

##### Your questions
Q1:
A:

##### Do you need help? Please describe your issue
-
-
-
-
:::

:::warning
##### ✏️ Hands-on: Create history
##### ❓ Are you finished with this section?
Add a '+' below - Yes: +++yes +++++++++++++++++++++++++++++++++ - Need help: + (write your qus below, we are here to help you) ##### Your questions Q1:Please repeat the whole steps again for data uploading. A:please follow this line [Download the list of peak regions (the file GSE37268_mof3.out.hpeak.txt.gz) from GEO to your computer] (https://training.galaxyproject.org/training-material/topics/introduction/tutorials/galaxy-intro-peaks2genes/tutorial.html#hands-on-data-upload) , if you just click on this "GSE37268_mof3.out.hpeak.txt.gz" on webpage , it will show you the save this tfile and you can upload this file locally as well. Let me know if you need help more Q2:The training section is freezing when I click on it so I can't get the link to download? A: please find the link here: https://www.ncbi.nlm.nih.gov/geo/download/?acc=GSE37268&format=file&file=GSE37268%5Fmof3%2Eout%2Ehpeak%2Etxt%2Egz Q3:I get content not available A: can you elobrate a bit more.. what you want to know ? Is it solved for you ? yes thank you! i re-uploaded thanks to the repetition Q4: Why do we need to choose "interval"? A: It is a Galaxy specific format we need. We know from the publication, the data is in txt format. To manipulate data we need interval instead of txt. https://usegalaxy.org/static/formatHelp.html#interval thanks! ##### Do you need help? Please describe your issue I am stil facing the issue to data upload. -> can you try as Teresa explains? yes i can, but im not getting the link to copy. -> Can you use this link: https://www.ncbi.nlm.nih.gov/geo/download/?acc=GSE37268&format=file&file=GSE37268%5Fmof3%2Eout%2Ehpeak%2Etxt%2Egz - what to choose in auto detect? -> Auto detect is used when you are not sure about the format. If you know the data format, I would recommend to specify. - Still unable to upload file-could you go a bit slower please? 
-you can click on this link :https://www.ncbi.nlm.nih.gov/geo/download/?acc=GSE37268&format=file&file=GSE37268%5Fmof3%2Eout%2Ehpeak%2Etxt%2Egz and download the file. after that, click on upload button ( on thetop left side) Here are the steps : Press Choose local files and search for your file on your computer Select interval as Type Press Start Press Close Wait for the upload to finish. Galaxy will automatically unpack the file. - thank you i've tried this, waiting for job to run now-> Cool. just a little behind on the current w: - No worries :)it's green now, thanks for your help! - what are we supposed to do after changing to mm9? - Ans - Click on the galaxy-pencil (pencil) icon (Edit attributes) in your dataset in the history - A form to edit dataset attributes is displayed in the central panel - Search for mm9 in Database/Build attribute and select Mouse July 2007 (NCBI37/mm9) (the paper tells us the peaks are from mm9) ::: :::warning ##### ✏️ Hands-on: Data upload ##### ❓ Are you finished with this section? Add a '+' below - Can you repeat this data upload step - Yes ++++++++++++++++ - Still waiting ++++++++++ - Need help: ##### Your questions Q1: What is the interval data format? A: It is a Galaxy specific format we need. We know from the publication, the data is in txt format. To manipulate data we need interval instead of txt. https://usegalaxy.org/static/formatHelp.html#interval --> That was very helpful, thank you! Welcome :) There is always additional information in the GTN tutorial itself, too. Q2: What is the difference between the first and the second upload? A: There are different ways to get data into Galaxy. Teresa showed 2 options Q3: Please where do you paste #peaks, the last step and what reason?> THANKS A: go to the dataset in your history, click on "add tags" and just type #peaks (enter). 
By tagging datasets, you can find them more easily when your history gets long, and the tag will be propagated to all following steps whenever this dataset is taken as an input. More information is in the GTN tutorial: https://training.galaxyproject.org/training-material/topics/introduction/tutorials/galaxy-intro-peaks2genes/tutorial.html#tip-adding-a-tag

Q4: Is it possible to analyze fq.gz files (raw reads)?
A: Yes, you can. There are many other tutorials that analyse .fq.gz files directly (without uncompressing them)

Q5: Will analysing the .fq.gz file also be covered in this course?
A:

Q6:
A:

##### Do you need help? Please describe your issue
- The columns appear in a different order -> for your GEO input?
-
-
:::

:::warning
##### ✏️ Hands-on: Inspect and edit attributes of a file
##### ❓ Are you finished with this section? Add a '+' below
- Yes: ++++++++++++++++++++++++++yes
- Need help:

##### Your questions
Q1: How can I delete an uploaded file? I accidentally uploaded it twice and want to remove the first one while renaming the latest file to '1: xxxx'
A: Click on the delete button of the uploaded dataset you want to delete. Don't worry about the numbering of the files. Galaxy history take

Q2: I am getting human instead of mouse, why?
A: Select mouse from the drop-down option. (Human is the default)

Q3: What should I choose for the auto-detect and unspecified options when pasting the link in the upload tool?
A: Click on the upload button, select paste/fetch data, paste your link in the box and click start.

Q4: That's all? No need to choose any option for auto-detect and unspecified? How can we know which option is best for our data? OK, catch you soon. I did not see any option with the name datatype. --> Type (set all):
A: It depends on the dataset. In the current tutorial you can set the type in the "Type (set all)" box. It gives you more clarity about the data format. Regarding tool requirements: in general, your data should be in the format the tool expects.
::: :::warning ##### ✏️ Hands-on: Data upload from UCSC ##### ❓ Are you finished with this section? Add a '+' below - Yes: +++++++++++++++++++++++++++ - Waiting for the job to be done: + - Need help: (please specify below) ##### Your questions Q1: Help, I'm lost. The job don't get back i,to galaxy history A: on which step you are in. Where excatly you are in. On UCSC brower ? I oened the ucsc bowser in a new tab --> Look for teresa's instruction now OK Thanks I've missed the send to galaxy button ;-), --> Perfect :) Q2: My columns are not in the same order as teresa ones A: Make sure you select the same datasets and format options Done , Thanks !! Q3:I can't go back to the tutorial. when i press hat, it gives me only a picture of UCSC main A:https://usegalaxy.eu/training-material/topics/introduction/tutorials/galaxy-intro-peaks2genes/tutorial.html#part-1-naive-approach Or you can refresh you webpage if something goes wrong, your history will not destroyed anyways Thanks! I had the same issue in firefox; it was OK after switching to MS edge browser. Q4:hi sorry i had to step away one second-what step are we on so I can catch up? thanks! A: we are here: Hands-on: Data upload from UCSC (https://training.galaxyproject.org/training-material/topics/introduction/tutorials/galaxy-intro-peaks2genes/tutorial.html#hands-on-data-upload-from-ucsc) and imported just the UCSC genes from mouse to Galaxy--all good then thanks! ::: :::success ##### ❓ Are you back? - Yes +++++++++++++++++++++++++++ - No ##### ❓ Any questions regarding what we did until now? Q1: A: Q2: A: ##### ❓ Is the speed fine - Yes: +++++++++++++++++++++++ - Too slow: +++ - Too fast: + ::: #### Part 1 Naive approach :::warning ##### ✏️ Hands-on: View file content ##### ❓ Are you finished with this section? Add a '+' below - Yes: + - Waiting for the job to be done: + - Need help: ##### Your questions Q:How did you open them side by side? 
A:On top, click on window manager (make sure it's enabled), next to scholar icon, and then click on the "eye" icon on you data sets Q:Can you repeat this? A: What step? (https://usegalaxy.eu/training-material/topics/introduction/tutorials/galaxy-intro-peaks2genes/tutorial.html#hands-on-view-end-of-file) Search for Select last lines from a dataset (tail) ( Galaxy version 9.3+galaxy1) tool and run with the following settings: “Text file”: our peak file GSE37268_mof3.out.hpeak.txt.gz “Operation”: Keep last lines “Number of lines”: Choose a value, e.g. 100 Q:When we clik on tool link A: Just go to the embedded GTN tutorial (from usegalaxy.eu) and click on the blue linked tool ( Select last lines from a dataset (tail) (Galaxy Version 9.3+galaxy1). It works only from the embedded tutorial when you open it in Galaxy -> is it clear? :) Q: A: ::: :::success ##### ❓ While the file from UCSC has labels for the columns, the peak file does not. Can you guess what the columns stand for? - Could be the chromosome number + - column 1 is chromosome, 2nd is starting position and 3rd is end position - Col1: Chromosome, Col2: start, Col3: end - ::: :::warning ##### ✏️ Hands-on: View end of file ##### ❓ Are you finished with this section? Add a '+' below - Yes: - Waiting for the job to be done: - Need help: ##### Your questions Q1:can you redo it the last step, i didn't catch the options to select on the tool parameters A: To follow, all steps are explained in the GTN tutorial in detail. Teresa just follows the training material (https://training.galaxyproject.org/training-material/topics/introduction/tutorials/galaxy-intro-peaks2genes/tutorial.html#hands-on-view-end-of-file) :). Do you need help with a specific step? Q2: I don=t have this button there A: which one? the round arrow, back; you mean the re-run button? 
yes -> okay, go to your history, click on your dataset with the step you want to repeat, then it appears above the dataset preview as 4th icon thank you, but I have lost now -> we are here: https://training.galaxyproject.org/training-material/topics/introduction/tutorials/galaxy-intro-peaks2genes/tutorial.html#hands-on-adjust-chromosome-names Q3:What does the & mean? A: & is a placeholder for the find result of the pattern search Q4:The filter done to the data peak and genes, and selecting specific rows. How was this performed. It is not by clicking the pencil symbol? A: which filter do you mean? I am not sure.:: I meant the "select last on data" -> select last is a tool which was run: https://training.galaxyproject.org/training-material/topics/introduction/tutorials/galaxy-intro-peaks2genes/tutorial.html#hands-on-view-end-of-file. This was just a check point step. You can proceed also without this step :) Thank you I was able to solve. Super, happy to help! ##### Do you need help? Please describe your issue - I get an error occurring with the dataset - What kind of error is this ? i can't copy the text >> on which step you are in right now ?i'm trying to add chr with the replace text tool - the error message contains words like "undertermined and unmatched" - Make sure you are adding the parameter in right boxes Below is the step. - - Replace Text ( Galaxy version 1.1.3) : Let’s rerun the tool with two more replacements “File to process”: the output from the last run, chr prefix added “in column”: 1 param-repeat Replacement “Find pattern”: chr20 “Replace with”: chrX param-repeat Insert Replacement “Find pattern”: chr21 “Replace with”: chrY Are you on this step or one step earlier ? 
I am still on chromosome i mean adding the chr prefix File to process”: our peak file GSE37268_mof3.out.hpeak.txt.gz “in column”: 1 “Find pattern”: [0-9]+ This will look for numerical digits “Replace with”: chr& & is a placeholder for the find result of the pattern search and then Rename your output file 'chr prefix added' i've done it 3 times and it's still not working - Okay, Are you using same versionn : Replace Text ( Galaxy versiyeaon 1.1.3) ? ( lets' do it one by one ) yes so i clicked on the tool via the Okay, then slect the first file (with #peaks as input)yes done swlect column 1 ? in the parameter write (Find pattern”:): [0-9]+ done Replace with chr& (this will replace prefix )done - click on run now. if still you are facing an issue. please useupload image option in this docuemnt and show us your error , oh cool it's worked now! but i did all the same things ahahah how?!? Cool, may be unnoticed errr or parameter . thank you!! i will try catching up now with the rest --> Sure, :+1: - ::: :::success ##### ❓ How are the chromosomes named? - Numbers only - - - ##### ❓ How are the chromosomes X and Y named? - 20 and 21 - - - - ::: :::warning ##### ✏️ Hands-on: Adjust chromosome names ##### ❓ Are you finished with this section? Add a '+' below - Yes: ++++++ - Waiting for the job to be done: ++++++ - Need help: I run but the chr is not shown (please specifiy below in which step replace text in specific column (I move your question a bit below to the other questions) ##### Your questions Q1:what is the meaning of [0-9]+? A: Find pattern”: [0-9]+ means it looks for all numerical digits in this column Q2:What is the meaning of & symbol? 
A: & is a placeholder for the text matched by the find pattern. With “Replace with”: chr&, a matched 7 becomes chr7.

Q3: I ran it, but the chr is not shown (replace text in a specific column)
A: A detailed explanation can be found here: https://training.galaxyproject.org/training-material/topics/introduction/tutorials/galaxy-intro-peaks2genes/tutorial.html#hands-on-adjust-chromosome-names Be careful to use the full file with 48,000 regions as input and not the one with 100 lines

Q4:
A:

##### Do you need help? Please describe your issue
- Can you please repeat how to change to chrX and Y? I did everything like in the tutorial, but still have chr1 only
    -> chrX and chrY appear only at the end of the file. Did you check the end of your file?
    Yes, there is only chr1.
    -> How long is your file? 48,000 regions or 100 lines? Click on the dataset to see this info (in your history)
    I have 48,647 regions
    -> Okay, this should be the right file. Did you check the end of your file? By following these steps, you should get the right names because you replace them: https://training.galaxyproject.org/training-material/topics/introduction/tutorials/galaxy-intro-peaks2genes/tutorial.html#hands-on-adjust-chromosome-names. There it says "replace all digits 0-9 and add chr there".
-
:::

:::success
##### ❓ How many regions are in our output file? You can click the name of the output to expand it and see the number.
- 48647 same, same, same, Yes! +
-
-
-
-
-
:::

#### Analysis

:::warning
##### ✏️ Hands-on: Add promoter region to gene records
##### ❓ Are you finished with this section? Add a '+' below
- yes, thanks
- Yes: +++++++++++++++++++
- Waiting for the job to be done: +
- Need help:

##### Your questions
Q1: Does the rule of 2000 bp upstream of the start work for bacteria as well?
A: We want to be sure to cover the whole promoter region. Therefore we choose 2000 bp. In principle, you can lower the number when you know this region is shorter

Q2: Should the number of regions be reduced after performing this action?
A: You can inspect this by click on "eye" button and check for number of rows and columns in principle. Following question: I've done that, the number of columns is the same but there's one region less and I'm not sure if that's expected (and why). Could you tell me on which step you have this ? After finding the promoter regions by flanking regions. In the ouptut of Promoter region you should have 47,246 regions and 12 columns. Is that correct in your case ? Yes, the number's correct. , So May be check the header if is everything is okay. if you are doing all correct. Thanks! Q3: Why 2000 and not 1200 ? A: We want to be sure to cover all promotor region. Therefore we choose 2000bp. In principle, you can lower the number when you know this region is shorter Q4: I also have the same as the participant of q2, it skipped 1 invalid line #47193L in the last step get flanks A: q2 participant here, thanks for the explanation! ##### Do you need help? Please describe your issue - my peak file is already bed but the genes one is gff-should convert that one? -> no gff should stay a gffcoolthanks! The peak region file was interval before, this needs to be changed - ::: :::warning ##### ✏️ Hands-on: Change format and database ##### ❓ Are you finished with this section? Add a '+' below - Yes: +++++++++++++++ - Waiting for the job to be done: ++ - Need help: ##### Your questions Q1: didnt catch hox to get the promoter region job done Thanks !!! A: Is all clear? Q2: Should I delete the interval format peaks file? A: No, you can keep all files in your history but be sure to take the right input file in later steps. Thats why (re)naming your files in a good way is super important. Q3:she needs to change file name A: She will check everything and realizing it :P , thanks for noticing Q4: A: ##### Do you need help? 
Please describe your issue - When I try the replace text , after running it shows chr1 in all of the column1 instead of chr20 and chr21 - ::: :::warning ##### ✏️ Hands-on: Find Overlaps ##### ❓ Are you finished with this section? Add a '+' below - Yes: ++++""+++ - Waiting for the job to be done: - Need help: ##### Your questions Q1:The converted promoter region doesnt show up when i try to select in *of* box? A: it does not appear as an input? correct. Usually then is has not the right format. Could you check please if your promotor region file is BED? Promotor file is BED - however it says for *of* accepted formats are giff and interval? I see what you mean. BED is accedpted as well and very similar to interval (I will note it down for correction, thanks!) It does not appear for me to select though. Did you rename it in the right way? Maybe the file has another name? Promotor regions - bed is the type - it has 47246 regions and 12 collums. I tried to find out what the problem may be. I fear I can not solve it now. Teresa will share with all of you the right file to continue. Sorry for this. To troubleshoot, I can just recommend to compare your and her history and go through all details in the GTN. Still stuck Q2: unfortunately, i selected human genome intead of mouse on the very start. can i repeat the same process with mouse? sure please. ok thank you A: Yes, You have to repeat that as the dataset is used here is for mouse. - https://usegalaxy.eu/u/teresa-m/h/from-peaks-to-genes ##### Do you need help? Please describe your issue - my itersect file looks different (no thick start-end etc) - Make sure you use correct input file for this step ( click on data set ( output) and click on re-run icon next to 'i' icon and check your input and parameters) - No view after clicking the eye-icon, after the intersect tools analysis - click on dataset,it will expand then you can see re-run icon ( im talking about 'i' icon not 'eye' icon :). 
--> check teresa's instruction now it's still not working, now I get 0 regions and 0 columns >> So in that case you are missing some option during your tool's run or you did not run the tool with correct input. check the steps and follow again :) ::: :::warning ##### ✏️ Hands-on: Count genes on different chromosomes ##### ❓ Are you finished with this section? Add a '+' below - Yes: ++++++++++ - Waiting for the job to be done: + - Need help: ++ ##### Your questions Q1: I got an error message, non float value chr1 was found... A: Check you add coorect option at the time of run the tool. ok, it worked the second time I tried Cool, Perfect. Q2:Hi, still stuck at the last step, where it wont allow me to select Promotor regions as an input file? A: is your file (which you are using as input) is in green color ? yes So you can drag this file in the input. check if you can do that. I can, it worked, the drop menu just didnt show it. It could happen, hopefully now it's working for you. Teresa will share her history now and you can compare yours and use her file. She will show how to do. Q3: A: ##### Do you need help? Please describe your issue - how can i be online in hedgedoc - If you have page open that means you are alreday online for the content. is that answer of your question ? - OK. thank you - Perfect i had to catch up a bit, where are we now? - we did hands- -on: Find ovec cool i'm at the right bit thanks - but also did count genes ::: :::success ##### ❓ Are you back? - Yes: +++++++++++++++ - No ##### ❓ Any questions regarding what we did until now? Q1: how to import your history? A: Go to the history you want to import and just click import. The history needs to be shared with or needs to be public to access it. Teresa will explain it in a minute Q2:why we are just sticked to only first gene? A: Just to explain and show you that we are getting a desired outputs and not disturbing the format. 
##### ❓ Is the speed fine
- Yes: +++++++++++++
- Too slow: ++
- Too fast:
:::

:::warning
##### ✏️ Hands-on: Count genes on different chromosomes
##### ❓ Are you finished with this section? Add a '+' below
- Yes:
- Waiting for the job to be done:
- Need help:

##### Your questions
Q1: Can we continue to work on our own history? Do we need Teresa's history for any step?
A: Yes, sure, if you have the right files available! Some participants had issues with some datasets, and then it is safer to use Teresa's history. If all is fine in your history, please use yours :) Her history has no additional value (only error correction)

Q2: What does this group mean?
A: This was the step after intersecting the 2 datasets: https://training.galaxyproject.org/training-material/topics/introduction/tutorials/galaxy-intro-peaks2genes/tutorial.html#hands-on-count-genes-on-different-chromosomes
Count genes on chromosomes? thanks;
yes! :) "Group" was the tool name

Q3:
A:

##### Do you need help? Please describe your issue
- My work is showing only one tag, but yours is showing both tags. Why?
    - Did you use a "#" before the tag?
    - yes;
    - Did you use the right inputs?
-
:::

:::success
##### ❓ Which chromosome contained the highest number of target genes?
- chr11++++
- chr11
- chr11+
- chr 11
- yee
:::

#### Visualization

:::warning
##### ✏️ Hands-on: Fix sort order of gene counts table
##### ❓ Are you finished with this section? Add a '+' below
- Yes: ++++++++++++
- Waiting for the job to be done:
- Need help:

##### Your questions
Q1:
A:
:::

:::warning
##### ✏️ Hands-on: Draw barchart
##### ❓ Are you finished with this section? Add a '+' below
- Yes:
- Waiting for the job to be done:
- Need help:

##### Your questions
Q1: What does this bar chart actually represent?
A: The gene count for each chromosome

Q2: Count means genes, right?
A: Yes, the gene count per chromosome. To see the actual integer values on the Y axis, you can select 'integer' from the drop-down menu for the Y-axis value type.
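To make "gene count per chromosome" concrete, here is a minimal Python sketch of the kind of tally the "Group" step produces. It is only an illustration: the rows are made-up example data (not the tutorial's real output), the function name `count_per_chromosome` is hypothetical, and it assumes the chromosome name sits in column 1 of a tab-separated file. In the workshop this is done through the Galaxy tool form, not with code.

```python
from collections import Counter

# Made-up tab-separated rows standing in for the intersect output.
# Assumption: column 1 holds the chromosome name; other columns are ignored.
rows = [
    "chr11\t100\t2100\tgeneA",
    "chr11\t5000\t7000\tgeneB",
    "chr2\t300\t2300\tgeneC",
    "chrX\t900\t2900\tgeneD",
    "chr11\t9000\t11000\tgeneE",
]


def count_per_chromosome(lines):
    """Count how many rows fall on each chromosome (value of column 1)."""
    return Counter(line.split("\t")[0] for line in lines if line.strip())


counts = count_per_chromosome(rows)

# Rank by count (highest first), breaking ties by chromosome name,
# to see which chromosome carries the most genes.
ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))

for chromosome, n_genes in ranked:
    print(chromosome, n_genes)
```

Each bar in the chart corresponds to one `(chromosome, n_genes)` pair of this kind; the first entry of `ranked` is the chromosome with the most genes.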
Q3: I accidently clicked something in the Galaxy training material panel. How do I return to the tutorial? A: Just refresh or Go to https://usegalaxy.eu/. this will redirect you to your current history. Q3.2: It didn't work :( It still shows just an image. Are you able to go to Galaxy main page with https://usegalaxy.eu/ link? You can also click on very top left corner on Galaxy icon to go to your home page. Right, I refreshed. I then click the hat. It is still stuck on the workflow example image. . You can seach for 'From peaks to genes' in search option to go to your tutorial . btw [here](https://usegalaxy.eu/training-material/topics/introduction/tutorials/galaxy-intro-peaks2genes/tutorial.html#file-preparation) is the link for tutorial. I have it open on another window for now then. The little tutorial hat thing is still broken. Q4: can we make heatmaps too? A: yes, we can make heatmaps. It will be explained during the [RNA-Seq data analysis tutorial](https://training.galaxyproject.org/training-material/topics/transcriptomics/tutorials/ref-based/tutorial.html#hands-on-plot-the-heatmap-of-the-normalized-counts-of-these-genes-for-the-samples) - some preprocessing necessary - it will be explained on Friday ::: #### Extracting workflow :::warning ##### ✏️ Hands-on: Extract workflow ##### ❓ Are you finished with this section? Add a '+' below - Yes: - Waiting for the job to be done: - Need help: ##### Your questions Q1:please give a quick rewiew too of the workflow. A: Here is the GTN description of this part https://training.galaxyproject.org/training-material/topics/introduction/tutorials/galaxy-intro-peaks2genes/tutorial.html#hands-on-extract-workflow The workflow can be extracted from your history. First clean up your history and the follow the instruction. Q2: When looking at the workflow tutorial, I accidently clicked the example image. I'm stuck at the image view and can't go back. 
A: Click on the Galaxy icon in the top left corner, or click anywhere outside the GTN box. If you are stuck on the Galaxy workflow page, save the workflow and go back to the main page.

Q3: Should we share our history today?
A: Not today. We will let you know the details on Friday.

Q4: How can I leave the workflow editing window?
A: Just save your workflow.
- Can we have the email address to share them with?
    -> If this question is still open: muellert@informatik.uni-freiburg.de
##### Do you need help? Please describe your issue
- A break? Is it not the end?
- Dragging the arrow pointing outwards on the right of its box in the workflow is not working for me.
-
-
:::

### Summary
- first analysis in Galaxy
- got to know a first data format
- got to know data manipulation tools
- created a workflow
- shared your results and methods with others

### Feedback
The course is great and helpful.
:::success
##### ❓ One thing that was good about today
- Workflows are funny!
- Worked on Galaxy for the first time! Learned new things. The pace was good and easy to follow. Please don't go faster tomorrow.
- New knowledge
- It was easy to follow and well explained, with new tips
- Good exercise for novices as an introduction; the idea of sharing the tutor's history in case someone got stuck so that they can proceed further
- Truly a step-by-step tutorial
- Got new knowledge
- Easy to follow
- Good tips and tricks to make analysis more streamlined and track data
-
- Really good. Got a bit stuck sometimes, so really appreciated the help on here, very quick responses
-
##### ❓ One thing to improve
-
-
##### ❓ Any other comments?
- Should we share our history today?
    - Yes, today or at the latest by the end of this course
- I just could not
- Will the lessons be recorded?
    - Yes
- Thank you!
- Thank you very much!
- Thank you! Really good
- Thank you very much!
- Thank you very much.
- My hat function is broken :C I can't see tutorials, just a workflow image.
    -> Can you refresh the page and open the embedded GTN again?
- Where can I get the workflow diagram?
    -> Not really sure what you mean. Maybe this? https://training.galaxyproject.org/training-material/topics/introduction/tutorials/galaxy-intro-peaks2genes/workflows/
:::

## Day 2 - Wednesday
### Table of Contents
1. [Welcome](#Welcome)
2. [Repetition of the day before](#Repetition-of-the-day-before)
3. [Slides: Quality Control](#Slides-Quality-Control)
4. [Tutorial: Quality Control](#Tutorial-Quality-Control)
    - [Inspect a raw sequence file](#Inspect-a-raw-sequence-file)
    - [Assess quality with FASTQE 🧬😎 - short reads only](#Assess-quality-with-FASTQE:dna::sunglasses:-short-reads-only)
    - [Assess quality with FastQC - short & long reads](#Assess-quality-with-FastQC---short-&-long-reads)
    - [Trim and filter - short reads](#Trim-and-filter-short-reads)
    - [Processing multiple datasets](#Processing-multiple-datasets)
    - [Assess quality with Nanoplot - Long reads only](#Assess-quality-with-Nanoplot---Long-reads-only)
    - [Assess quality with PycoQC - Nanopore only](#Assess-quality-with-PycoQC---Nanopore-only)
5. [Slides: Mapping](#Slides-Mapping)
6. [Tutorial: Mapping](#Tutorial:Mapping)
    - [Prepare the data](#Prepare-the-data)
    - [Map reads on a reference genome](#Map-reads-on-a-reference-genome)
    - [Inspection of a BAM file](#Inspection-of-a-BAM-file)
    - [Visualization using a Genome Browser (IGV)](#Visualization-using-a-Genome-Browser-(IGV))
    - [Visualization using a Genome Browser (JBrowse)](#Visualization-using-a-Genome-Browser-(JBrowse))
7. [Summary](#Summary)
8. [Feedback](#Feedback)

### Welcome
- Today is about quality control and mapping (the foundation of HTS analysis)

:::success
##### ❓ Before we start: Is the screen clearly visible (add '+')
- Yes ++++
- Please zoom in

Where is the Zoom link? Never mind, got it.
-> The Zoom link is the same as before: [Zoom](https://epfl.zoom.us/j/64829799035?pwd=DYKW7EWKUzurBebej4hNaJEIW9F2xn.1)

Where can I find the slides?
[Quality Control](https://training.galaxyproject.org/training-material/topics/sequence-analysis/tutorials/quality-control/slides.html#1)
:::

Where can I find the link to the data? Please guide me to the tutorial. How do I reach it?

:::success
##### ❓ We plan to adjust the break times according to the workshop's progression. Does this arrangement work well for everyone?
- Yes ++
- No
:::

:::success
##### ❓ Those that answered no, could you clarify your needs? Do you need a specific break time?
- I need a break at about 11:00 (+-15 min is fine), because it is lunch time at work.
-
-
:::

### Repetition of the day before
:::success
❓ We have some questions:
-
:::

### Slides: [Quality Control](https://training.galaxyproject.org/training-material/topics/sequence-analysis/tutorials/quality-control/slides.html#1)
**Disclaimer**: We will not go through the full slide deck

:::success
##### ❓ Any questions regarding this introduction to Quality Control?
Q1: Doesn't the sequencing facility/company do a QC of the data they have sequenced?
A: Sometimes they pass the data through an internal QC pipeline. But for better transparency of the QC output, we recommend running FastQC (the tool for QC in Galaxy) yourself, so that you know the parameters used etc. It does not take much time :)

Q2: What's the difference between FASTA and FASTQ?
A: In a FASTA file, one entry has an ID line starting with '>' followed by the sequence on the next line(s). It is usually used for e.g. genomes. In a FASTQ file, one entry has 4 lines: first a header starting with '@', then the sequence, then a line starting with '+', and finally the quality string. https://training.galaxyproject.org/training-material/topics/sequence-analysis/tutorials/quality-control/tutorial.html#hands-on-inspect-the-fastq-file
Most important: FASTQ has quality information for each base, FASTA does not.
- OK.

Q3: Where do I find the tutorial instructions?
Within the hat button?
A: Here it is: https://training.galaxyproject.org/training-material/topics/sequence-analysis/tutorials/quality-control/tutorial.html

Q4:
A:
Q5:
A:
:::

### Tutorial: [Quality Control](https://training.galaxyproject.org/training-material/topics/sequence-analysis/tutorials/quality-control/tutorial.html)
**Disclaimer**: We will not go through the part *Assess quality with Nanoplot - Long reads only* of the tutorial

:::warning
##### ✏️ Hands-on: Open the tutorial
1. Go to [Galaxy](https://usegalaxy.eu/) (done in previous Hands-on)
2. Click on the hat in the top bar
3. Navigate to the topic "Sequence analysis"
4. Click on the tutorial "Quality Control"
##### ❓ Have you found the tutorial? (Add a + when done)
- Yes: +++++
- No: ++
##### Your questions
Q1: What is the name of the tutorial?
A: It is called Quality Control: [Quality Control](https://training.galaxyproject.org/training-material/topics/sequence-analysis/tutorials/quality-control/tutorial.html)
If you are on the GTN home page, you find it in the section 'Methodologies': select Sequence analysis there and then select Quality Control.

Q2: Which is the link? -> see Q1
A: Can you open the upload tab again? -> Copy the link into the upload tool under "Paste/Fetch data": https://zenodo.org/record/3977236/files/female_oral2.fastq-4143.gz

Q3:
A:
Q4:
A:
##### Do you need help? Please describe your issue
- WAIT, MY FILE IS STILL LOADING. PLEASE WAIT.
    - No worries, you can look at Zoom and follow Diana's explanations for now.
-
- Please slow down+++++++++++++++++++++++++++++++++++
- I don't have data yet
-
- Please wait, I am still behind

Where do we download the file from?
From here: https://zenodo.org/record/3977236/files/female_oral2.fastq-4143.gz
Copy the link into the upload tool under "Paste/Fetch data".

Please, what format is the data? -> It is FASTQ
:::

I don't see the result, only the headline

#### Inspect a raw sequence file
:::warning
##### ✏️ Hands-on: Data upload
##### ❓ Are you finished with this section? Add a '+' below
- Yes: ++
- Waiting for the job to be done:
- Need help:
##### Your questions
Q1: I am not getting the link, please paste it here.
A: Link to the data: https://zenodo.org/record/3977236/files/female_oral2.fastq-4143.gz

Q2: Do we specify the file type during upload? For example, do I have to select fastq from the drop-down?
A: It is recognized automatically. If you click on the uploaded dataset, it should show you "fastq".

Q3:
A:
Q4:
A:
##### Do you need help? Please describe your issue
- Can you slow down?++++
- I don't see the result, only the headline
    - Your file could be wrong. You should copy the link
    - I copied the link
    - https://zenodo.org/record/3977236/files/female_oral2.fastq-4143.gz - this link
    - OK, I will try again. And the format is fastq, please? While loading?
    - I am a participant. Maybe you should ask again below
    - Is that solved for you? [helper here]
:::

:::warning
##### ✏️ Hands-on: Inspect the FASTQ file
##### ❓ Are you finished with this section? Add a '+' below
- Yes: +++++++
- Waiting for the job to be done: ++
- Need help:
    - Can you specify which line we write in? In this document
##### Your questions
Q1:
A:
Q2:+
A:
Q3:
A:
Q4:
A:
##### Do you need help? Please describe your issue
- It's a bit fast right now
- Too fast, please go slower
    - We will go slower now. Thanks for the feedback!
-
:::

#### Assess quality with FASTQE 🧬😎 - short reads only
:::warning
##### ✏️ Hands-on: Quality check (FASTQE)
##### ❓ Are you finished with this section? Add a '+' below
- Yes: +++++
- Waiting for the job to be done: +
- Need help:
##### Your questions
Q1: fastqce?
or fastqe
A: FASTQE

Q2: Can you tell Diana to check the HedgeDoc? Like yesterday.
A: We will do that

Q3: -
A:

Q4: Where can I find the PDF file the tutorial is following today? Like the previous day, we just clicked on the hat sign in Galaxy. I am clicking on the hat, but it shows the previous day.
A: The tutorial is here at GTN: https://training.galaxyproject.org/training-material/topics/sequence-analysis/tutorials/quality-control/tutorial.html
If you are in usegalaxy.eu, just click on the hat in the top panel and the embedded GTN will open. Does it answer your question?
- Yes, great
##### Do you need help? Please describe your issue
- Please slow down+++++
- I don't see the result, only the headline -> For which step? FASTQE?
- Please slow down
- Which link? Can you repeat this?
    - The input data is here: https://zenodo.org/record/3977236/files/female_oral2.fastq-4143.gz. In the tutorial we are here: https://training.galaxyproject.org/training-material/topics/sequence-analysis/tutorials/quality-control/tutorial.html#hands-on-quality-check
-
-
:::

#### Assess quality with FastQC - short & long reads
:::warning
##### ✏️ Hands-on: Quality check
##### ❓ Are you finished with this section? Add a '+' below
- Yes: +++++++++++++++++++++
- Waiting for the job to be done: ++
- Need help:
    - Shortcut? How to read that heatmap?
    - Can you tell us which line we have to write in the document? Because we have issues finding it
    - Usually at the end of the doc. Put questions in line 410. Sorry, the shortcut/plot question was posted twice (here and in line 476). We try to avoid this.
##### Your questions
A:
Q2:
A:
Q3:
A:
Q4:
A:
##### Do you need help? Please describe your issue
- Can I have the link for the tutorial we used yesterday?
    - Yeah sure, https://training.galaxyproject.org/training-material/topics/introduction/tutorials/galaxy-intro-peaks2genes/tutorial.html
    - <3
-
-
-
:::

:::success
##### ❓ Do you want to know more about the FASTQC plot?
- yes++++++++++++++
- no ++++
:::

:::success
##### ❓ Do you have questions about the FASTQC plot?
- If there is no adapter, what will the adapter plot look like?
    - There is no line, or a line at 0%. You can see it here in the adapter plot for the SOLiD small RNA adapters (purple line at 0% on the y axis): https://training.galaxyproject.org/training-material/topics/sequence-analysis/tutorials/quality-control/tutorial.html#adapter-content
-
-
-
-
-
-
:::

#### Trim and filter - short reads
:::warning
##### ✏️ Hands-on: Improvement of sequence quality
##### ❓ Are you finished with this section? Add a '+' below
- Yes (previous feedback): +++++++++++++++++++
- Yes, job done: ++++++++++++++
- Job RE-run?: +++++++
- Waiting for the job to be done: +++++++++
- Need help:
##### Your questions
Q1: Tell people in what line, friends are getting lost.
A: You mean telling the line where people should add their "+" or "-"?
- Yes! Yes please !!!
- We will try to make this clearer. Thank you for your feedback!

Q2: Is it okay to skip trimming if there is no adapter shown after QC?
A: We always recommend trimming, because the probability is very high that you have a few bad-quality bases. Trimming is not only about removing adapters. Cutadapt is very quick to run, and I would always use it to be sure.

Q3: If the reads are paired, will the quality cut-off and minimum length change?
A: We will have an example on this later :). Read 1 and read 2 need to be considered together. If one read gets too short, its mate will be discarded as well.

Q4: Please repeat the Cutadapt step
- Yes

A:
Q5:
A:
##### Do you need help?
Please describe your issue
- **Can you please repeat this step?**
    - You can find the steps for this Hands-on part here: https://training.galaxyproject.org/training-material/topics/sequence-analysis/tutorials/quality-control/tutorial.html#hands-on-improvement-of-sequence-quality -> You can run the tool Cutadapt by setting the parameters as described in the tutorial (please let us know if it worked)
-
-
-
-
-
:::

:::success
##### ❓ What % reads contain adapter?
- 461 (56.8%) +
- 56.8
- 56.8%
-
-
##### ❓ What % reads have been trimmed because of bad quality?
- (35.1%) +
- 35.1%
-
-
-
##### ❓ What % reads have been removed because they were too short?
- 0 (0.0%) +
-
-
-
-
:::

:::warning
##### ✏️ Hands-on: Checking quality after trimming
##### ❓ Are you finished with this section? Add a '+' below
- Yes: +++++++++++++++++++++++
- Waiting for the job to be done: +++++
- Need help:
##### Your questions
Q1: Is there going to be a break soon?
A:
##### Do you need help? Please describe your issue
-
-
- +
:::

:::success
##### ❓ Compare the FASTQE output to the previous one before trimming above. Has sequence quality been improved?
- yes
- yes
-
-
:::

:::warning
##### ✏️ Hands-on: Checking quality after trimming (FastQC)
##### ❓ Are you finished with this section? Add a '+' below
- Yes: ++++++++++++++
- Waiting for the job to be done: +++
- Need help:
##### Your questions
Q1:
A:
##### Do you need help? Please describe your issue
-
-
-
:::

#### Processing multiple datasets
:::warning
##### ✏️ Hands-on: Assessing the quality of paired-end reads
##### ❓ Are you finished with this section? Add a '+' below
Here is the part for paired-end QC: https://training.galaxyproject.org/training-material/topics/sequence-analysis/tutorials/quality-control/tutorial.html#processing-multiple-datasets
- Yes: +++++++++++++++++++++++++++
- Waiting for the job to be done: ++++++
- Need help:
##### Your questions
Q1: How do I know if I have paired-end data?
A: You know it because you ordered it, or your sequencing facility will let you know. You can also see it when you have 2 read files per sample (forward and reverse). They belong together and show that you have PE data.

Q2: Will there be a break soon? :) Haha great!
A: We will do a longer break before we start with mapping.

Q3: Is it possible to pair both datasets before the QC?
A: You could, but this is usually not done, because R1 may have much better quality than R2 and you would like to have this information. We always inspect the quality before we decide whether the data quality is good enough and whether further processing (trimming) should be done. The quality could also be so low that you should first talk to the sequencing facility and find out whether there was an issue during sequencing.

Q4: Could you explain the duplication content parameter?
A: Here is more information: https://training.galaxyproject.org/training-material/topics/sequence-analysis/tutorials/quality-control/tutorial.html#sequence-duplication-levels and https://training.galaxyproject.org/training-material/topics/sequence-analysis/tutorials/quality-control/tutorial.html#details-more-details-about-duplication

Q5: What data do we select, the webpage or the raw data?
A: In which step?
- The current one, for the MultiQC analysis. We have up to four output datasets (2 for subset 1 and 2 for subset 2)
- Here are the parameters: https://training.galaxyproject.org/training-material/topics/sequence-analysis/tutorials/quality-control/tutorial.html#hands-on-assessing-the-quality-of-paired-end-reads (raw data)
- Good. Which input files should we select?
- For MultiQC the input is the Raw data output from the earlier FastQC runs (see tutorial)

Q6: What output files did you get from this analysis, and how many?
A: For FastQC you get multiple output files. For each file that you ran it on, you get a report and a webpage.
MultiQC takes the report files of the FastQC runs as input and aggregates these reports into one report. Here you again get a web page showing you the aggregated plots.
- Please go slow
##### Do you need help? Please describe your issue
- Can you please repeat the last step?
- "This dataset is large and only the first megabyte is shown below." Is that OK?
    - Can you select the eye icon and see the full dataset? (I did move your question up)
-
:::

:::warning
##### ✏️ Hands-on: Improving the quality of paired-end data
##### ❓ Are you finished with this section? Add a '+' below
- Yes: +++++++
- Waiting for the job to be done: ++
- Need help:
##### ❓ Follow-up activity on FastQC, are you done?
- Yes:+++++
- Waiting:+
- Need help:
##### ❓ Follow-up activity on Cutadapt, are you done?
- Yes:++++++++++
- Waiting:+++
- Need help:
##### Your questions
Q1: Why are we running Cutadapt if there are no adapters detected?
A: Cutadapt also removes bad-quality bases, and you always have some. Always run FastQC and then Cutadapt. Standard procedure :). It is fast and really worth the time to be sure you have clean data for your analysis. Good-quality data saves you a lot of time later during your data interpretation.
- Thanks! :)

Q2: Thank you
A:
Q3: Thanks! Thanks!
A:
Q4: Thank you
A:
##### Do you need help? Please describe your issue
-
-
-
:::

---

### Slides: [Mapping](https://training.galaxyproject.org/training-material/topics/sequence-analysis/tutorials/mapping/slides.html#1)
**Disclaimer**: We will not go through the full slide deck

:::success
##### ❓ Any questions regarding this introduction to Mapping?
Q1: Should we merge the files for mapping? Yes, the forward and reverse.
A: You do not merge the files, but you can put the forward and reverse file into one collection and run tools on collections. This is especially useful when you have a lot of samples. We will use collections tomorrow.

Q2: What should we click now? I missed a step. Oh, okay, thanks!
A: Engy showed how to build a collection (dataset pair) as an answer to Q1. You don't need to do this. [Here](https://training.galaxyproject.org/training-material/faqs/galaxy/collections_build_list.html) is information on how to build a collection in Galaxy.

Q3: Do we need to check whether our RNA-seq data is stranded or not before mapping?
A: You don't need this information for most mapping tools. But for the counting you will need to know the strandedness. We will learn this tomorrow.
:::

### Tutorial: [Mapping](https://training.galaxyproject.org/training-material/topics/sequence-analysis/tutorials/mapping/tutorial.html)
https://zenodo.org/record/61771/files/GSM461178_untreat_paired_subset_1.fastq
https://zenodo.org/record/61771/files/GSM461178_untreat_paired_subset_2.fastq

:::success
##### ❓ Before we start: Is the screen clearly visible (add '+')
+
- Yes ++++++++++++++ +yes, we can hear you+++++
- Please zoom in
:::

:::warning
##### ✏️ Hands-on: Open the tutorial
1. Go to [Galaxy](https://usegalaxy.eu/) (done in previous Hands-on)
2. Click on the hat in the top bar
3. Navigate to the topic "Sequence analysis"
4. Click on the tutorial "Mapping"
##### ❓ Have you found the tutorial? (Add a + when done)
+
- Yes: ++++++++++++++++++++++
- No: -+
- Link: https://training.galaxyproject.org/training-material/topics/sequence-analysis/tutorials/mapping/tutorial.html
##### Your questions
Q1:
A:
##### Do you need help? Please describe your issue
-
-
-
-
:::

#### Prepare the data
:::warning
##### ✏️ Hands-on: Data upload
##### ❓ Are you finished with this section? Add a '+' below
- Yes: ++++++++++++++++++++++
- Waiting for the job to be done: ++++++
    - While waiting you can read our news and our comics appearing in the middle panel :D
    - What middle panel? -> In usegalaxy.eu, where you see your tool when selecting it. While a tool is running, comics or other funny stuff sometimes appear there.
    - Cool!
- Need help:
##### Your questions
Q1:
A:
##### Do you need help? Please describe your issue
-
-
-
:::

:::success
##### ❓ What is a reference genome?
- Mouse
- Assembled nucleic acid sequence representing the genome of a species.
- "Premade" genome constructed of bits and pieces of other confirmed genomes ++++
- Assembly of a mix of human genomes
- Assembled genome before
-
##### ❓ For each model organism, several possible reference genomes may be available (e.g. hg19 and hg38 for human). What do they correspond to?
-
- No idea !
-
- The version of it?
    - Great! The higher the number, the newer the version (new, more annotations etc.)
-
-
##### ❓ Which reference genome should we use?
-
-
-
-
:::

:::warning
##### ✏️ Hands-on: Mapping with Bowtie2
##### ❓ Are you finished with this section? Add a '+' below
- Yes: ++++++++++++++++++++++++++++
- Waiting for the job to be done: ++++++ mine is still running :(
- Need help:
##### Your questions
Q1: Does a FASTA file need to pass all QC sections with a green check mark in order to be successfully mapped to the genome?
A: We consider a sample to be of good quality when more than 70% of the reads pass the QC. With such samples you can continue with mapping. Usually most newly sequenced samples have a very good quality score. But if you e.g. want to compare them to older samples, be careful and check the raw data (run FastQC).

Q2: There are a lot of alignment tools in Galaxy. Which one is worth using?
A: It depends on the final aim of your work. There are many tools available, so we recommend that you first know what you want to achieve and then select the tool accordingly.
- Thank you

Q3: If there was a collection, would they be mapped to each other?
A: You mean a collection of paired-end data? If you have a collection of paired-end data, you would still select paired-end when you are asked what your input data is, and then you add the collection. The information from both reads of a pair is always used together while mapping to improve the mapping quality.
- Should a collection be related to the species?
- A collection should contain data that will be treated together, e.g. the paired-end R1 and R2 files from one sequencing run.
- Thanks a lot!

Q4: Could Bowtie2 also be used for miRNA reads?
A: Yes. If you are only interested in miRNA, there are some specific mappers only for small RNAs. You should also check which reference is preferable if you only want to investigate miRNAs.
##### Do you need help? Please describe your issue
-
-
-
:::

:::success
##### ❓ What information is provided here?
- Reads that were mapped to the ref genome
- Numbers of reads (quality)
- Statistics to show how well the alignment has worked
- Ways that they have been aligned+
- Reads that aligned once or more
- Alignment rate.
-
-
##### ❓ How many reads have been mapped exactly 1 time?
- 42761 (85.52%)
- 44731 (89.46%) aligned concordantly exactly 1 time
- 85.52%
-
-
-
-
-
##### ❓ How many reads have been mapped more than 1 time? How is it possible? What should we do with them?
-
- Duplex regions, repetitive regions ++
- 10.72%
-
##### ❓ How many pair of reads have not been mapped? What are the causes?
-
-
- 3.76%; bad quality? Too short sequences?
-
- Contaminations
-
- +
:::

#### Inspection of a BAM file
:::warning
##### ✏️ Hands-on: Inspect a BAM/SAM file
##### ❓ Are you finished with this section? Add a '+' below
- Yes:
- Waiting for the job to be done:
- Need help:
##### Your questions
Q1:
A:
##### Do you need help? Please describe your issue
-
-
-
:::

:::success
##### ❓ Which information do you find in a SAM/BAM file?
- Position of the read on the genome?
- Quality info
- Location of nucleotides?
- Sequence
-
-
-
##### ❓ What is the additional information compared to a FASTQ file?
- Location information
- Information on mapping
-
-
-
-
-
-
-
:::

:::warning
##### ✏️ Hands-on: Summary of mapping quality
##### ❓ Are you finished with this section? Add a '+' below
- Yes: ++++++++++++++++++++++++++++++++
- Waiting for the job to be done: +++++
- Need help:
##### Your questions
Q1: Please show the reference genome selection again
A: Can you see it now?
Engy showed it again, and you can follow the tutorial [here](https://training.galaxyproject.org/training-material/topics/sequence-analysis/tutorials/mapping/tutorial.html#hands-on-summary-of-mapping-quality)

Q2: Sometimes it says locally... cache: I chose this and I was able to select the mus10
A: The option you can choose is called `Locally cached/Use a built-in genome`, and then you can select mm10 full.

Q3: I can't see this in my history.
A: Is it solved?
- No
- Which step are you in now? Make sure you are following all the steps mentioned here (https://training.galaxyproject.org/training-material/topics/sequence-analysis/tutorials/mapping/tutorial.html#hands-on-summary-of-mapping-quality)

Q4: OK, it appears under the "locally cached" option
A: Yes, select "Use reference sequence": Locally cached/Use a built-in genome
##### Do you need help? Please describe your issue
- Just for understanding, for every 5 million, is it 32000 mismatches or 32?
    - https://training.galaxyproject.org/training-material/topics/sequence-analysis/tutorials/mapping/tutorial.html#solution-3
-
- [helper here] If the reference genome you are working with is missing on usegalaxy.eu, please contact the European Galaxy team. We can add it :)
- Can you show the settings for dataset 8??? ++++
    - It worked! Thank you
    - -> Please press the "re-run" button of the dataset you want to know the settings for. It shows you all parameters, inputs etc. that were used. This metadata is saved as long as the dataset exists. This is what we call transparency and reproducibility :)
:::

:::success
##### ❓ What is the proportion of mismatches in the mapped reads when aligned to the reference genome?
- 32663 same
- 32663
- mismatches: 32663
- 32663
- mismatches: 32663
-
-
-
##### ❓ What does the error rate represent?
- Probability of a mismatch on one particular nucleotide?
- Mismatches over (divided by) bases mapped
-
-
-
-
-
##### ❓ What is the average quality? How is it represented?
- 35.8, related to the Phred score?
- average quality: 35.8
- 35.8
-
-
-
-
##### ❓ What is the insert size average?
- 201.9
- 201.9
- 201.9
- insert size average: 201.9
-
-
##### ❓ How many reads have a mapping quality score below 20?
Did you follow how to answer this question?
- Yes: +++++++++++
- Not yet

**Questions**:
- +Can you repeat the Samtools step after Filter BAM?
    - Yes
-
- Will we have a break?
    - We only have 30 min left. Can you still concentrate?
    - OK, yes
-

**Answer to the Question:**
- reads mapped and paired: 91354
-
-
-
-
:::

#### Visualization using a Genome Browser (IGV)
:::warning
##### ✏️ Hands-on: Visualization of the reads in IGV
##### ❓ Are you finished with this section? Add a '+' below
- Please follow on Engy's screen. If you want to run IGV locally, you will need to download it and run it locally.
##### Your questions
Q1: Where did you get this information, i.e. the filtering results?
A: Do you mean information about IGV?
- No, the last question of the previous step
- From the tool `Samtools Stats` you get the information on how many reads map. First run it on the unfiltered reads, then filter the BAM file according to the mapping quality. Then run `Samtools Stats` again and you find out how many reads are still mapped. The difference tells you how many were filtered out.
- Thanks !!

Q2: I can't reach this page...
A: You mean IGV or the download (you can use this link to download: https://igv.org/doc/desktop/#DownloadPage/#laest-release-of-igv-desktop-2182 )? You will be able to reach that page once you have IGV installed. I recommend that you just watch what Engy is showing.
- When I click on local

Q3: I
A: Engy invited you to look at her screen. The plan now is not to follow along with what she is doing.

Q4: Why are some arrows in different colors, not gray?
A: In IGV, the arrows represent reads, and the different colors show different mapping qualities/strands.
##### Do you need help?
Please describe your issue
-
-
-
:::

:::success
##### ❓ What could it mean if a bar in the coverage view is colored?
-
-
-
-
-
-
-
-
##### ❓ What could be the reason why a read is white instead of grey?
-
-
-
-
-
-
:::

#### Visualization using a Genome Browser (JBrowse)
:::warning
##### ✏️ Hands-on: Visualization of the reads in JBrowse
##### ❓ Are you finished with this section? Add a '+' below
- Please follow on Engy's screen. If you want to run IGV locally, you will need to download it and run it locally.
##### Your questions
Q1: Why are some arrows not in gray/white?
A: The arrows represent reads, and the different colors show different mapping qualities/strands.

Q2: Can you show the teardrop in your browser? Thank you!
A: You find the teardrop at the bottom when, in this column, most of the aligned nucleotides are switched to a different nucleotide. It highlights that you have a SNP (single nucleotide polymorphism) at this position.

Q3: How do you select the particular part of the chromosome? I mean, how do you decide to look at this part? Thank you!
A: In IGV you can simply select a specific chromosome (drop-down menu) or copy/paste a location in the genome, e.g. chr2:98,666,236-98,667,473 (chromosome 2, position x-y). We chose this part because it contains a lot of information about read mapping. We know the example dataset from earlier, so it was easy to select this location and show you the information in it. In a real project, it will depend on your task, e.g. which specific region you want to explore.

Q4: Can you please repeat how to open JBrowse?
A: Engy is showing it again now. You can find instructions [in the tutorial](https://training.galaxyproject.org/training-material/topics/sequence-analysis/tutorials/mapping/tutorial.html#jbrowse).
- Why did we not analyse the filtered data in JBrowse?
- OK

Q: Could you show again how to follow the tutorial for JBrowse?
I'm stuck after: "Track Type": BAM Pileups
A: Engy will do it again for you now.

Q: Could Bowtie2 also be used for miRNA reads?
A: Yes, it can be used, as it handles short reads well. But of course you can also explore more tool options for your real project.
##### Do you need help? Please describe your issue
-
-
-
:::

:::warning
#### General questions about this section
Q1:
A:
Q2:
A:
Q3:
A:
Q4:
A:
:::

### Summary
#### Quality Control
1. **Importance of Quality Control in NGS Data**:
    - Before proceeding with any downstream bioinformatics analysis, it’s crucial to perform quality control on sequencing data. Tools like FastQC and FASTQE provide valuable insights into sequence quality, identifying issues such as low-quality bases, adapter contamination, or sequence duplication that could bias results.
2. **Trimming and Filtering for Improved Data Quality**:
    - Utilizing tools like Cutadapt, you can trim low-quality bases and remove adapter sequences from your reads. This process enhances the overall quality of the dataset, ensuring that the subsequent analysis is based on accurate and reliable data. Filtering out poor-quality reads helps to maintain the integrity of the analysis.
3. **Handling Paired-End Data**:
    - When working with paired-end sequencing data, it’s essential to process the forward and reverse reads together. This synchronized treatment ensures that both sets of reads remain aligned and properly ordered, which is crucial for accurate mapping and downstream analysis.

#### Mapping
1. **Definition**:
    - Read mapping is the process of aligning short DNA or RNA sequences (reads) obtained from sequencing technologies to a reference genome or transcriptome.
    - thank you both
2. **Purpose**:
    - It helps to determine the origin of each read by finding its corresponding position on the reference genome, which is crucial for understanding gene expression, genetic variations, and mutations.
3. **Steps**:
    - Preprocessing: Reads are typically filtered and cleaned (e.g., removing low-quality reads).
    - Alignment: Specialized algorithms (e.g., Bowtie, BWA) map the reads to the most likely location on the reference genome.
    - Post-processing: Mapped reads are analyzed to identify features like gene expression levels, variants (SNPs, indels), or structural changes.
4. **Challenges**:
    - Dealing with sequencing errors or repetitive regions in the genome.
    - Handling large amounts of data efficiently, especially for high-throughput sequencing.
5. **Applications**:
    - Used in genomics for tasks like variant calling, genome assembly, and RNA-Seq for gene expression analysis.

### Feedback
:::success
##### ❓ One thing that was good about today
- Both sections were quite important and easy to follow +
- Mapping part was super well taught! Thanks Engy!
- Mapping part was explained very well
- Thank you both!
- Easy to follow and also provides explanations and tips
- Well presented in an easy to understand way
- Learnt new things, helpful for future research. Thank you
- Thank you for both sessions, second presentation was explained very well
- Really important sections, will definitely use them when I perform this kind of analysis
- Great mapping presentation, pace and enthusiasm!
##### ❓ One thing to improve
- The quality part was very fast at the beginning
- Quality part was a little bit fast, but that's fine, I caught all xd +
-
- The HedgeDoc was a bit confusing today. Questions did not fit the lines etc. +
    - Tomorrow we will check more carefully that the question line numbers are reported correctly. Thank you!
- I suggest always giving the line number (maybe two times) when referring to the HedgeDoc. In general, I like the concept of using it for interaction, but it can definitely be improved.
- Refer to the line number, as the HedgeDoc can get confusing about where we are.
  - Thanks, we will give line numbers tomorrow
- interactions in the HedgeDoc, addressing the issues

##### ❓ Any other comments?
- Random question: is it possible to get a link to the tutorial explained by Teresa yesterday?
  - Thanks !!
  - https://training.galaxyproject.org/training-material/topics/introduction/tutorials/galaxy-intro-peaks2genes/tutorial.html
  - Great !
  - Also consider that this document's history is available at: https://biont.biobyte.de/SOKvbVkPSGew6Y9S9Bb-mA# (so you can also check responses to questions that you may have missed)
  - Thanks again !
- Thank you!
- Thanks !
- thanks :)
- Thanks!
- Thanks
- thank you.
- thanks!
- thank you!
- Do we need to do anything with today's workflow in order to send it to you?
:::

## Day 3 - Thursday

### Table of Contents
1. [Welcome](#Welcome)
2. [Repetition of the day before](#Repetition-of-the-day-before)
3. [Slides](#Slides-Transcriptomics)
4. [Tutorial](#Tutorial-Reference-based-RNA-Seq-data-analysis)
    - [Data upload](#Data-upload)
    - [Quality control](#Quality-control)
    - [Mapping](#Mapping)
    - [Counting the number of reads per annotated gene](#Counting-the-number-of-reads-per-annotated-gene)
5. [Summary](#Summary)
6. [Feedback](#Feedback)

### Welcome

:::success
##### ❓ Before we start: Is the screen clearly visible (add '+')
- Yes +++++++++++++++++++++
- Please zoom in
- The panel on Zoom is different from yesterday, can you please hide the other four panelists? Thank you!
:::

:::success
##### ❓ We plan to adjust the break times according to the workshop's progression. Does this arrangement work well for everyone?
- Yes ++++++++++++++
:::

:::success
##### ❓ Those that answered no, could you clarify your needs? Do you need a specific break time?
- I need a break at about 11:00, because it is lunch time at work.
:::

### Repetition of the day before

:::success
##### ❓ What do you remember from yesterday?
- Mapping and alignment to reference genomes; QC; IGV viewer ++++++
- How to QC and map data!
- Adapter cutting
- Use of JBrowse, IGV, interpretation of data
- How to read QC plots
- Genome Browser, visualization
- Approaches for improving the quality
- Using emojis to assess quality

##### ❓ Do you have a question from the day before?
- Q: How do we send the work from yesterday?
  - A: We will let you know on Friday how to share your histories with us. Until then, enjoy the sessions.
- How to get today's PDF?
- link please
  - Tutorials are all linked at the top of this document.
  - Tutorial for today and tomorrow is: https://training.galaxyproject.org/training-material/topics/transcriptomics/tutorials/ref-based/tutorial.html
  - Slides: https://training.galaxyproject.org/training-material/topics/transcriptomics/tutorials/introduction/slides.html#1
:::

:::success
##### ❓ Any questions regarding this introduction to Transcriptomics?
Q1: Tomorrow I won't be able to attend the lesson. Where can I find the recorded lesson?
A: We will share the recordings with you after the workshop. All GTN material is freely available on https://training.galaxyproject.org. You are welcome to run the RNA-seq tutorial and all other tutorials by yourself if you want :)
Thank you very much!

Q2:
A:
:::

:::warning
##### ✏️ Hands-on: Open the tutorial
1. Go to [Galaxy](https://usegalaxy.eu/) (done in previous Hands-on)
2. Click on the hat in the top bar
3. Navigate to the topic "Transcriptomics"
4. Click on the tutorial "Reference-based RNA-Seq data analysis"

##### ❓ Have you found the tutorial? (Add a + when done)
- Yes: ++++++++++++++++++++++
- No:

##### Your questions
Q1: Today, I have to leave a bit earlier.
A: We will share the recordings with you after the workshop. All GTN material is freely available on https://training.galaxyproject.org.
Thank you, but I am afraid I will not be able to catch up tomorrow.
For tomorrow's session we will create a new history, so missing part of today's session won't be an issue.
Ok, thank you very much

Q2: I can't find the tutorial; it's not letting me click the wee hat. Thanks
A: The embedded one should be accessible within usegalaxy.eu. If not, can you try to refresh usegalaxy.eu? In principle, you can also have the GTN open in a separate tab. The only disadvantage is that you cannot click the tools inside the tutorial if it is not the embedded one, so you need to search for the tools manually (no problem at all to find them, but a bit slower). By the way, which browser are you using?
Edge; I got it from the link, but it may be my browser then?

##### Do you need help? Please describe your issue
- Teresa is so helpful, thanks! >>> She will be very happy to see this message :)
:::

#### Data upload

:::warning
##### ✏️ Hands-on: Data upload

##### ❓ Are you finished with this section? Add a '+' below
- Yes: ++
- ++++++++++++++++++++++++++
- Waiting for the job to be done: +++
- Need help: **Can you show again how to rename the files in the collection?** (see below)

##### Your questions
Q1: What is the difference between the collection and dataset options when adding to the history? :)
yes clear!
A:

Q2: Can you show again how to rename the files in the collection?
A: Just click into the (middle) field and rename while creating the collection.

Q3: What is the difference between fastqsanger and fastqc?
A: FastQC output is the quality report generated by the FastQC tool. If you meant to ask about FASTQ vs FASTQSANGER, there is a technical difference. A FASTQ file stores the quality scores in every 4th line, and there are different variations of how they are encoded. The FASTQSANGER encoding is one of them. For Illumina reads, you normally use FASTQSANGER as the file type.
Thanks!

Q4: In the article there were 7 samples, I guess (treated and untreated). Could all of them be 1 collection?
A: Article? Do you mean the data libraries webpage?
No.
In the tutorial's reference, Brooks.
There are 4 samples in total. Import the FASTQ file pairs from Zenodo or a data library:
GSM461177 (untreated): GSM461177_1 and GSM461177_2
GSM461180 (treated): GSM461180_1 and GSM461180_2
No, I am asking something different. In the reference study there are more samples. So could they all be 1 collection?
Yes, you can import as many samples as you need into a data collection, according to the samples you are working with. It makes further analyses easier, as Teresa will show you in a bit.
So in my study all treatments should be 1 collection?
That makes sense. In general, collections simplify analysis. Instead of running a tool on each dataset separately, you can run it once on a collection, which internally runs it on every dataset in the collection. So a collection can be any set of files that need the same set of parameters for processing. For example, if all your treatment files are paired-end and have the same read length, they most likely need the same set of parameters for QC and mapping. So it makes sense to create a collection.
Thanks a lot !!

Q5: How can we import data as a collection from a local computer?
A5: Just click on the upload button; after that you can follow the same steps Teresa showed. These are just different ways of uploading the data.

Please repeat this.
Repeat pleaseee
(what step?)
Fastqc upload
- please follow Teresa

##### Do you need help? Please describe your issue
- Can you please repeat how to make a collection? I chose the files that are needed, but what do I do next?
  - Please follow Teresa again.
  - Thanks a lot!!
- Is it just me or is the microphone going insane?
  - It's working fine for us. + +
  - It's solved now
:::

:::success
##### ❓ Do you have the data now as a collection?
- yes ++++++++++++++++++++
- No
- Q: I named them treated and untreated. The GSMxxxxx doesn't matter, right? >>>> Great, thanks :)
  - No, all good. You just need to know what is what.
:::

:::success
##### ❓ How are the DNA sequences stored?
- fastq
- in fastq files +++++++++
- Fastq files
- FASTQ file
- fastq
- FASTA +
  - Note: The FASTA format does not contain sequence quality information. That's the main difference to FASTQ.
- seq id, quality and nucleotide info
- fastQ

##### ❓ What are the other entries of the file?
- quality of nucleotides, info about machine, and sequence itself
- don't know
- read quality +
- information about sequence and quality
- header
- ++++
:::

#### Quality control

:::warning
##### ✏️ Hands-on: Quality control

##### ❓ Are you finished with this section? Add a '+' below
- Yes: ++++++
- Waiting for the job to be done: +
- Need help:
  - It says "No compatible datasets available". But I have flattened.
  - Same for me! No compatible datasets available in FastQC!!
    - Please check whether your datatype is **fastqsanger** and not **fastqcsanger**
    - No, it is at the FastQC level
  - We get the "No compatible dataset available" alert
    - --> I solved the problem: click on the folder icon (Dataset Collection) and you will be able to select your flattened object
    - THANKS! >>> I hope it works though
    - It works! >>> Awesome!

##### Your questions
Q1: Please repeat the two previous steps. ++
A: You can follow this: https://training.galaxyproject.org/training-material/topics/transcriptomics/tutorials/ref-based/tutorial.html#hands-on-quality-control

Q2: I still don't know how to start Flatten... aaaa, just click, haha, thank you
A: https://training.galaxyproject.org/training-material/topics/transcriptomics/tutorials/ref-based/tutorial.html#hands-on-quality-control - is it all clear now?

Q3: Please perform MultiQC
A: Soon; check step 4 here: https://training.galaxyproject.org/training-material/topics/transcriptomics/tutorials/ref-based/tutorial.html#hands-on-quality-control

##### Do you need help?
Please describe your issue
Q: Can you please repeat this step from the very beginning?
A: Please follow the tutorial: https://training.galaxyproject.org/training-material/topics/transcriptomics/tutorials/ref-based/tutorial.html; which step are you at?

Q: It worked only for one of my files. Should I run it separately for each?
A: Did you click on the folder icon? After doing that, it should select all the files in your collection.

Q: Can you wait a minute please?
A: Yes

Q: For me the files are all separate. Can I go forward anyway?
A: If you don't create a data collection, you have to run every further tool on each file separately. We highly recommend creating a data collection to make this easier.
- Well, I thought I had made a collection. How do I change it?
- Change to what?
- still loading for me
- It is taking a long time to load.

-> **Tip:** Get the history from Teresa to drag and drop the dataset you are missing. The history from Teresa is called: BioNT Workshop RNA-seq Part 1 (you can import it from the public histories)

Q: Why do we need to do this step? Is it necessary? ++
A: Not if you managed to run the previous step. We are doing this for people who got stuck with the collections.

Q: Is there a way to go back one step in Galaxy (middle workspace)?
A: You can re-run this job.

Q: I mean just one step back, like if I search for something, click on it, and then want to go back to my search?
A: Just click the back button in your browser or click on the Galaxy logo. But this will take you to the Galaxy home page, not your previous page.
Ohh okay, I was afraid to do that. Yep, that's why I am asking :(

Q: So there is no way to do that?
A: Unfortunately not. Clicking the back button in your browser might sometimes take you to the previous page, but not always.
:::

:::success
##### ❓ What is the read length?
- 37bp ++
- 37 bp
- 37 for both
- Q: Is it normal to have the same read length for all the samples in one run?
  - A: In one sequencing run, the reads normally have the same length.
    Sequencing length is something we choose. The alternative terminology for this is the "number of cycles" of synthesis. The length you choose depends on the average fragment length that you observe in your samples.
- 37
:::

:::success
##### ❓ What do you think of the quality of the sequences?
- the 4th dataset is not that good!
- treated_reverse has very few duplicate reads, the highest quality decrease, and no Phred score reached by more than 1 million sequences, while the rest have peaks around 3 million.

##### ❓ What should we do?
- can you please repeat
- can you repeat the MultiQC step?
  - Follow step 4 here: https://training.galaxyproject.org/training-material/topics/transcriptomics/tutorials/ref-based/tutorial.html#hands-on-quality-control
- trimming and filtering :)
:::

:::success
##### ❓ What is the relation between GSM461177_untreat_paired_forward and GSM461177_untreat_paired_reverse ?
-
:::

:::warning
##### ✏️ Hands-on: Trimming FASTQs

##### ❓ Are you finished with this section? Add a '+' below
- Yes: +++
- Waiting for the job to be done:
- Need help:

##### Your questions
Q1: Why don't we specify settings for R2?
A: We are working on a paired data collection, so there is no need to set different settings for each file separately.

##### Do you need help? Please describe your issue
- Q: I need a longer break please
  - A: We will try to take another break in one hour
- Q: For me the percentage is different? GSM461177_untreated_2 0.9% GSM461180_treated_2 1.3%
  - A: Please check the quality cut-off and minimum length cut-off that you provided when running the tool. Click on the re-run button on the dataset to check which parameters you used in the previous run.
:::

:::success
##### ❓ Why do we run the trimming tool only once on a paired-end dataset and not twice, once for each dataset?
-
:::

:::success
##### ❓ How many sequence pairs have been removed because at least one read was shorter than the length cutoff?
-

##### ❓ How many basepairs have been removed from the forward reads because of bad quality? And from the reverse reads?
-
:::

#### Mapping

:::success
##### ❓ What is a reference genome?
- assembled set of nucleotides of a species

##### ❓ For each model organism, several possible reference genomes may be available (e.g. hg19 and hg38 for human). What do they correspond to?
- updated versions
- completely sequenced genome?
- different versions

##### ❓ Which reference genome should we use?
- D. melanogaster
- The latest version
:::

:::warning
##### ✏️ Hands-on: Spliced mapping

##### ❓ Are you finished with this section? Add a '+' below
- Yes: ++
- Waiting for the job to be done: +
- Need help: I need help -> post your question below

Q: Did you add it to the history as a dataset or a collection?
A: Just as a dataset, because it is a single file (reference GTF)

Spliced mapping part in the tutorial: https://training.galaxyproject.org/training-material/topics/transcriptomics/tutorials/ref-based/tutorial.html#hands-on-quality-control

##### ❓ After running STAR: Are you finished with this section? Add a '+' below
- Yes: +++++++
- Waiting for the job to be done: +++++++++++++++++++
- Need help: FYI: we are working on a different part of the doc, so a lot is getting lost. Can we get the line numbers of where things are being shared/updated/asked please? Thanks!
- Do you have MultiQC results to look at: +++++++++++++
  - No, I can't, because I don't see my log file under the collection drop-down options
  - I've done everything but am still waiting for results

##### Tool to run MultiQC
- Yes:
- Waiting (if you are still waiting for the previous step, please import the history including the STAR step): on what collection please?
##### Your questions
Q1: Could I also use splice-aware mappers to map RNA to bacterial genomes?
A: You don't need a splice-aware mapper if there is no splicing happening in the genome. You can use the tool that you used yesterday. Do you remember that?
Ah! Yes, thanks!
But wait: the mapper we will use today is splice-aware, but it can also map to unspliced genomes equally well.
Q: Yes, but do I need to use another tool then?
A: Other candidates are Bowtie2 and BWA-MEM2.
Thanks!

Q2: Where do I get GTF files from?
A: You can download GTF files for different organisms from e.g. UCSC or Ensembl. We recommend Ensembl: https://www.ensembl.org/info/data/ftp/index.html
If you want an older reference GTF, you can browse the whole FTP site here: https://ftp.ensembl.org/pub/

Q3: I am not getting the reference-based RNA-seq tutorial after Transcriptomics.
A: Make sure you are following all the steps. Which step are you stuck on?

Q4: What is the gene model for splice junctions, in comparison to the reference genome?
A: The reference genome is the FASTA file containing the sequences of all chromosomes of your organism of interest. For human it is around 3GB in size. The gene model is the GTF file (a tabular file) containing information about which genes are present at which places on the reference genome.
Follow-up: Why is the GTF file needed for mapping? Shouldn't it be possible to map without such information?
Your intuition is correct. It is possible without a GTF. If you are working with non-annotated organisms, you don't have a GTF. In that case, STAR will map to the reference genome and try to find where the splice junctions are. This is a bit tricky for the tool, and it might find some false positive splice junctions. If you already have an annotation in GTF format, it is good to provide it, so that you simplify the mapper's work and guide it with the information that you already have.

Q: I am not getting the Cutadapt trimmed file as a collection. It says they're not FASTA. How do I make the Cutadapt files FASTA format?
A: We don't need to convert the files to FASTA. In which step are you getting this error? -- Is it solved for you?
So I am on the RNA STAR tool parameters. I go to paired-end as collection, but the only option I have is the initial 2 PE FASTQ collection, so the trimmed one is not showing up.
Check whether you followed all the previous steps and are running RNA STAR with the current settings: RNA STAR (Galaxy version 2.7.11a+galaxy0) with the following parameters to map your reads on the reference genome:
```
“Single-end or paired-end reads”: Paired-end (as collection)
    “RNA-Seq FASTQ/FASTA paired reads”: the Cutadapt on collection N: Reads (output of Cutadapt tool)
“Custom or built-in reference genome”: Use a built-in index
    “Reference genome with or without an annotation”: use genome reference without builtin gene-model but provide a gtf
        “Select reference genome”: Fly (Drosophila melanogaster): dm6 Full
        “Gene model (gff3,gtf) file for splice junctions”: the imported Drosophila_melanogaster.BDGP6.32.109_UCSC.gtf.gz
        “Length of the genomic sequence around annotated junctions”: 36
            This parameter should be length of reads - 1
“Per gene/transcript output”: Per gene read counts (GeneCounts)
```
So I can see the collections when I go to (as individual datasets), but they won't come up when I press (as collection), and I've checked and they're in the right format.
Check that you selected the folder icon; only then will you be able to see that file.
But when I click on collection, it doesn't give me the file icon.
You can drag the file into the box instead of using the drop-down option.
That still doesn't work :(
Did you get the trimmed results? You need to use the Cutadapt results for RNA STAR.
So my output from that is "cutadapt from collection14: readout 1 output" and it is a "list with 4 fastqsanger datasets"
![](https://biont.biobyte.de/uploads/bda88047-0917-45b5-b69b-568a36a73dfc.png)
Check if the settings are the same for you.
The first part yes, but then I cannot see "cutadapt on collection", only the initial 2 PE FASTQ file. Maybe I'll try re-running Cutadapt in this break!
- Sure.
Thanks for your help so far :)

Q: Were we supposed to run Cutadapt on the initial collection or on the flattened one?
A: On the collection.
I think that's where I went wrong :') Thanks!
Glad you figured it out!

Q: Why do we have to upload the GFF file? Is it not already included if Drosophila is selected as reference?
A: The GTF (or annotation, or gene model) is different from the reference genome. A reference genome is the FASTA file containing the sequences of all chromosomes of your organism of interest. For human it is around 3GB in size. The gene model is the GTF file (a tabular file) containing information about which genes are present at which places on the reference genome.

Q: Are there some specific read mappers for bacterial RNA in Galaxy, or do we just use a regular one like Bowtie or BWA?
A: These tools are quite versatile and work well with bacterial reads. You can use RNA STAR or HISAT2 as well. It depends on your choice and the needs of your project.

- still waiting for the job to be done.
  - We are trying to speed up the RNA STAR jobs :)

##### Do you need help? Please describe your issue
- Galaxy being suuuuper slow ++
- I meant because of the hat button not working
- It seems like Galaxy itself is slow, but it is not! It is because RNA STAR uses several CPUs in parallel and ~45GB of memory per run, and we are all running the same kind of job at the same time. Let's say we are all running STAR at once: we need 2 terabytes of memory!! That's probably double the size of the storage (not memory) on your laptop! Whenever you run mapping jobs, please be patient. Often it takes some time for the job to start.
  Also make sure to double/triple check all the parameters, because this step is time and resource consuming :)
- It is still loading +
- still loading for me as well
- for me the tutorial is also no longer working (me too)
- tutorials won't come up for me either
  - [Helper here:] I reported the issue to our admin team :)
  - **Please use this link until the GTN is back:**
- Same ++++++
- my hat icon is not working
  - Yes, the Training Network as a whole is not working (see other people reporting the same above). Please use the link provided above if you need it.
- still waiting for the QC to finish running
:::

:::success
##### ❓ What percentage
- 83.1 and 79
- 83.1 and 79
- 83.1 and 79
- 83.1, 79
- 83.1%, 79%
- 83.1% and 79
- 83.1 and 79

##### ❓ What are the other available statistics?
- mapped to multiple loci, too many loci, too short
  - Q: What is too many? How much, is this specified?
- Number of splices: Annotated; Average mapped length;
- Number of Reads.
- Alignment Scores
:::

Go here: https://training.galaxyproject.org

Can you show your settings once again please?

My Galaxy crashed!
- Try to refresh the page

GitHub is also not working, it is crashing :( https://galaxyproject.github.io/training-material/topics/transcriptomics/tutorials/ref-based/tutorial.html

We are sorry for this. Try to copy and paste this link when you see this crash, or click on the back icon in your browser.

Same again ++

:::warning
##### ✏️ Hands-on: IGV inspection
Done by observing Teresa's screen and commenting on it together
:::

#### Counting the number of reads per annotated gene

:::success
##### ❓ Look at Fig.19. How many reads are found for the different exons?
-

##### ❓ Look at Fig.19. How many reads are found for the different genes?
-
:::

:::warning
##### ✏️ Hands-on: Determining the library strandness using Infer Experiment

##### ❓ Are you finished with this section?
Add a '+' below
- Yes: +
- Waiting for the job to be done:
- Need help:
  - Q: I could not proceed with these steps because my Galaxy is crashing. I am watching for now, and hopefully in the evening I will try on my own history. Would that be fine?
    - A: Yeah, completely fine.
  - I'm not able to proceed: the BAM file from STAR is unavailable when I try to select it in Infer Experiment (but it appears green in the history). Even the one that I copied from Teresa's history is unavailable. I will try again later today. Solved

##### Your questions
Q1:
A:

##### Do you need help? Please describe your issue
-
:::

:::success
##### ❓ What are the “Fraction of the reads explained by” results for GSM461177_untreat_paired?
- "1++,1--,2+-,2-+": 0.4626
- "1+-,1-+,2++,2--": 0.4360

##### ❓ Do you think the library type of the 2 samples is stranded or unstranded?
- unstranded
:::

(Questions and comments here were moved to the feedback section)

### Summary

- How to process RNA-seq data step by step
    1. QC including trimming
        - Use tools like `FastQC` to inspect the quality of your data
        - After the inspection you can trim bad-quality bases and adapters, if there are any, using a tool like `Cutadapt`
    2. Mapping
        - When mapping RNA-seq data to a genome you need to pick a splice-aware mapper like `RNA STAR`
        - The output of a mapping tool is in BAM format.
        - You can visualize your mapped reads using a genome browser like `IGV`
    3. Counting, including assessing the strandness of the data
        - Before you can count reads you should know the strandness of your sequencing library
        - If you don't know the strandness of the data you can use a tool like `Infer Experiment`
        - Next you can count your mapped reads (will be done tomorrow!)
- Datatypes you have learned so far
    - FASTA
    - FASTQ
    - BAM
    - GTF

:::success
##### ❓ General Questions
Q1: What is the difference between FASTA and FASTQ?
A: In a FASTA file, each entry consists of a header line and the sequence. In a FASTQ file, there are two additional lines for each entry: a separator line starting with `+` and a line with the quality information of the sequence. So a FASTQ file includes not only the sequences themselves but also their quality information.

Question: How can we zoom in on the mapping to find where our reads are matching or are more present, e.g. for sRNA-seq if I want to find the location of the small RNAs on the genome?
A: If you view the mappings in IGV, you can search for your gene of interest directly at the top. But normally, to find out which genes are expressed more, we have to count/quantify. We will perform this step tomorrow.
- Thank you.

Q: More on this: what if, for example, I don't know where my small RNAs are coming from and I want to find out?
A: If you are unsure which types of sRNAs are present in your genome, you can use tools like BioMart or specific ncRNA databases to identify these sRNAs and their locations. Once you have the locations, you can check their coverage in IGV. However, to determine which sRNAs are represented in your data, you should perform a quantification analysis (this will be done tomorrow).
:::

### Feedback

Is the current history enough for the certificate, or should we complete the tutorial before we send you today's history?
- Yes, if the history contains all the steps we did during the day, this is enough for the certificate. If you want to try the steps we skipped on your own, we encourage you to do so. However, it is not necessary :)

We started and continued too slowly, but at the end it was too fast. This is not ideal, I think ++++++
- Tomorrow we will repeat the steps you performed today, to help you catch up

:::success
##### ❓ One thing that was good about today
- Everything in the first two hours was amazing. But we understand it was a technical issue and can happen anytime. +
- Thank you for this best tutorial.
- the backup plans ;) +
- the explanations for the analysis and reads ++
- Nice and clear tutorial
- You guys are really focused on explaining it all, patient and clear

##### ❓ One thing to improve
- Too much focus on technical things sometimes dilutes the main reason why we're doing certain things and a deeper understanding +++
- The summary of yesterday was super good, with information on what to use etc. Maybe this could be done in this much detail for every day?

##### ❓ Any other comments?
- thank you very much! see you tomorrow.
- Thank youuu and see you tomorrow!
- Thank you very much for today!
- Thank you.
- Thank you
- Thank you
- Thank you!
- Thanks!

(Questions here were moved up to the general questions section)
:::

## Day 4 - Friday

### Table of Contents
1. [Welcome](#Welcome)
2. [Repetition of the day before](#Repetition-of-the-day-before)
3. [Tutorial](#Tutorial:Reference-based-RNA-Seq-data-analysis)
    - [Analysis of the differential gene expression](#Analysis-of-the-differential-gene-expression)
    - [Functional enrichment analysis of the DE genes](#Functional-enrichment-analysis-of-the-DE-genes)
4. [Summary](#Summary)
5. [Feedback](#Feedback)

### Welcome

:::success
##### ❓ Before we start: Is the screen clearly visible (add '+')
- Yes, the sound is OK ++++++++++
- Please zoom in
:::

:::success
##### ❓ We plan to adjust the break times according to the workshop's progression. Does this arrangement work well for everyone?
- Yes ++++++++++++
- No
:::

:::success
##### ❓ Those that answered no, could you clarify your needs? Do you need a specific break time?
-
:::

### Repetition of the day before

:::success
##### ❓ What do you remember from yesterday?
- Uploading RNA data, quality check, mapping to a reference, estimation of the strandness +++++
- Analysis via visualization

##### ❓ Do you have a question from the day before?
- Why did we use the Infer Experiment tool?
  - The tool will tell you if your sequencing library is stranded or unstranded.
  - What is the meaning of "sample stranded or unstranded"? Very clear, thank you! Is that why we upload the RNA file then?
  - Pavan will explain it now. You can also read up on it in the [tutorial](https://training.galaxyproject.org/training-material/topics/transcriptomics/tutorials/ref-based/tutorial.html#estimation-of-the-strandness)
- Hello, I had to leave yesterday at 12 for a medical appointment. I have imported Teresa’s history. Is it correct if I continue today starting with the last task (#100) of her history?
  - We ended yesterday in the tutorial [here](https://training.galaxyproject.org/training-material/topics/transcriptomics/tutorials/ref-based/tutorial.html#hands-on-counting-the-number-of-reads-per-annotated-gene). The shared history has Infer Experiment as the last step, and in addition, below it, the next step `featureCounts`. But sure, you can use this history to follow the next steps.
  - Thanks !!
- Yesterday I couldn't follow much towards the end. Can you remind me if we will start with read counting today?
  - Yes absolutely, today's session starts with read counting. >> thank you!
- we don't see the screen
  - It should be visible now! +
  - Yes, perfect !!
:::

### Tutorial: [Reference-based RNA-Seq data analysis](https://training.galaxyproject.org/training-material/topics/transcriptomics/tutorials/ref-based/tutorial.html)

Part 2: Counting and Analysis of the differential gene expression

:::warning
##### ✏️ Hands-on: Open the tutorial
1. Go to [Galaxy](https://usegalaxy.eu/) (done in previous Hands-on)
2. Click on the hat in the top bar
3. Navigate to the topic "Transcriptomics"
4. Click on the tutorial "Reference-based RNA-Seq data analysis"

##### ❓ Have you found the tutorial? (Add a + when done)
- Yes: ++++++++++++
- No:

##### Your questions
Q1: After the STAR step, the other parts of my history are in the one copied from Teresa, not in my personal one.
Will that be a problem for certification, or should I just copy it real quick? A: You can copy them now or after today's training. Only a few steps are missing from yesterday, and then Pavan will start with a fresh history and new data. Q2: What exactly do we need to copy from yesterday? And how? A: Just try to get a clean history following the steps of yesterday. If you could not perform the steps after `RNA STAR`, then just copy these steps to your original history. Is this clearer now? Yes, thank you Q3: How can I copy from my history of yesterday to a new one for today? A: Create a new history --> Click on show History side by side --> And drag the required files (RNA STAR BAM and Drosophila GTF file) from yesterday's history to the current (new) history. You can now follow Pavan to see how to do it. Is this resolved for you? Q4: What is the difference between the GTF file (gene annotation) and the reference genomes we used in previous days (mm9 or dm6)? A: A reference genome contains the sequence of the whole genome in FASTA format, i.e. the whole DNA sequence. It does not contain additional information about gene features. This additional information can be found in the annotation, the GTF file. There you do not have the DNA sequence, but you have positional information such as where a gene starts and stops, which exons this gene has, etc. Overall, this file holds the gene annotations, mapping genomic features like exons and transcripts to specific coordinates on the reference genome. Amazing, thank you! ##### Do you need help? Please describe your issue - - Please slow down, I am searching for the right history - Pavan just started a new history and copied the RNA STAR BAM file to this new history. To copy the BAM file, open the `history side by side` view, which you can find in the history menu (upper right corner button) or on the left side by clicking on history.
Once you are in the multiple history view, you can check whether your history from yesterday is shown. If not, you need to select it under `select history`. Then you can drag and drop the file from yesterday's history to your current history. - ::: ##### Using FeatureCounts :::warning ##### ✏️ Hands-on: Counting the number of reads per annotated gene ##### ❓ Are you finished with this section? Add a '+' below - Yes: ++++++++++ - Waiting for the job to be done:+ - Need help: ##### Your questions Q1: Do we use the BED12 or the GTF D. melanogaster genes? A: For featureCounts you need to use the GTF file Q2: Can this tool also be used for bacterial RNA-seq? Then adjusting the exon with consensus or something? Or is this not applicable? A: Q3: For the second time I am getting an error message: Fatal error: Exit code 255 () A: The second time running featureCounts? - I got the error the first time - Did you import the BAM and GTF files from yesterday's shared history? Please try to use these files to make sure the input is correct. - I am currently in that process 👍 - Works now, probably something was wrong with my mapped BAM file; I used Teresa's shared one now and it worked - Perfect! Q4: What is the difference between a GFF and a GTF file? Can I convert it? Is there a tool? Or can I also use GFF? A: GTF (Gene Transfer Format) and GFF (General Feature Format) are both file formats used to store genomic annotations. The main difference is that GTF is a stricter version of GFF (GFF2), while GFF (typically GFF3) supports more flexible and hierarchical annotations. Okay thanks! Can I also use GFF for this purpose, or is it strictly GTF? Because people always talk about "gff/gtf". We generally advise using a GTF file if available, but if not, you can convert your GFF file to a GTF file using the `gffread` tool. Great, thanks! Q5: A: ##### Do you need help? Please describe your issue - I did everything for featureCounts based on the tutorial, but the work does not start.
No errors. - Can you follow Pavan today for this step and see if it works now? If not, please let us know. - I copied the entire work history. Copying the files one by one to another history doesn't work. - It's too fast today, I got lost, what is the tool, please? - We are at the featureCounts step [here](https://training.galaxyproject.org/training-material/topics/transcriptomics/tutorials/ref-based/tutorial.html#hands-on-counting-the-number-of-reads-per-annotated-gene). You can check the parameter settings in the tutorial to run the tool. - - - I noticed that Pavan has updated tools embedded in his hat tutorial, while my hat shows older versions. Is it going to affect my analysis? - Please press CTRL+F5 >>> Thank you Pavan, it worked. I can run this tool but it ends every time as an error with a red background - what should I do in this case? It's still running but probably my internet is too slow to run this tool - It still doesn't work, I tried 4 times, always red at the end - Please check this parameter “Does the input have read pairs”: Yes, paired-end and count them as 1 single fragment - “Gene annotation file”: Drosophila_melanogaster.BDGP6.32.109_UCSC.**gtf.gz** Q: Can you please show your Galaxy? I don't know if I have the right output - For featureCounts you will get 3 output files: (1) count, (2) length, (3) summary. In the next step, MultiQC, you will get two output files - I have only two, I don't have length A: Can you double check if you used the latest version of featureCounts? I have only one, how can I check? I refreshed everything and now it is OK, thank you - Are you still running featureCounts? + - Now Pavan is discussing its results using the MultiQC plots Q: What happens when we get less than 60%? A: You should go back and find out what the reason could be. Did you use the correct annotation? How was the mapping quality before? How was the quality of your sequenced reads? Is it explainable by the quality from before?
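
The assignment rate discussed in the answer above can also be recomputed by hand from the featureCounts summary output. The following is only a minimal sketch in Python, assuming the summary is a tab-separated table with a `Status` header row followed by one row per category. Only the `Assigned` value (8249329, the untreated sample) comes from this pad; the other counts, the sample name and the 50% warning threshold are invented for illustration.

```python
import io

# Hypothetical featureCounts summary (tab-separated).
# Only the "Assigned" value is taken from the workshop answers;
# all other counts and the sample name are invented for illustration.
SUMMARY = (
    "Status\tuntreated.bam\n"
    "Assigned\t8249329\n"
    "Unassigned_Unmapped\t0\n"
    "Unassigned_MultiMapping\t3100000\n"
    "Unassigned_NoFeatures\t1400000\n"
    "Unassigned_Ambiguity\t350000\n"
)


def assignment_rate(summary_text):
    """Return the fraction of counted fragments that were assigned to a feature."""
    counts = {}
    for line in io.StringIO(summary_text):
        status, value = line.rstrip("\n").split("\t")
        if status == "Status":  # skip the header row
            continue
        counts[status] = int(value)
    total = sum(counts.values())
    return counts["Assigned"] / total


def needs_attention(rate, threshold=0.5):
    """Flag a low rate; the 0.5 default is an assumption, not an official cutoff."""
    return rate < threshold


rate = assignment_rate(SUMMARY)
print(f"Assigned: {rate:.1%}")
if needs_attention(rate):
    print("Low rate: check annotation, mapping quality and read quality.")
```

A low value is a prompt to go back through the checks listed above (annotation, mapping quality, read quality), not a verdict on its own.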
Q: A: ::: :::success ##### ❓ How many reads have been assigned to a gene? - 8249329 and 8434375 - 63% - 8249329 and 8434375 - Untreated: 8249329; treated: 8434375 - Untreated: 8249329; treated: 8434375 ++ - - 8249329 untreated and for treated: 8434375 - - 63% - - - - - - - - ##### ❓ When should we be worried about the assignment rate? What should we do? - below 50%?? - - check if your annotation fits - could also be contamination then - - Check in IGV if reads are actually being mapped to genes. Maybe the wrong reference was chosen or something. - - - - - - - - ::: :::success ##### ❓ Which feature has the most counts for both samples? (Hint: Use the Sort tool) - FBgn0284245 (128739 and 127416) ++ =eEF1α1 - FBgn0284245 128739 and 127416 - the genes with the 3rd and 4th highest counts are Drosophila ribosomal genes... - - - - - - - - - - - - - ::: #### Identification of the differentially expressed features :::warning ##### ✏️ Hands-on: Import all count files ##### ❓ Are you finished with this section? Add a '+' below - Yes: ++++++++++++++++++++++++++++++++++ - Waiting for the job to be done: + - Need help: ##### Your questions Q1: According to the featureCounts results, there is no significant difference between the samples' gene counts. So, can we say the treatment did not make an important difference to the samples' transcriptome? So maybe the treatment did not work well? A: How did you conclude that there is no difference? - From the results according to the MultiQC assignments (63% and 60%). Is that wrong? - No, this is the correct result from featureCounts. Here we do the counting for treated and control separately. In the next step we will do the differential expression analysis, where you can see which genes are up- and down-regulated by comparing treated against control. - Thanks a lot Q2: I need to leave earlier today, so I will finish the analysis by myself, but I have a question: can you please maybe send an email about how to share my history with you, so I can get a certificate?
:) A: In the tutorial of the first day you can find a description of how to [share your history](https://training.galaxyproject.org/training-material/topics/introduction/tutorials/galaxy-intro-peaks2genes/tutorial.html#share-your-work). Additionally, I will send out an email to all of you on what to do to get a certificate. Q2.1 Ok, thanks!! Q3: So for these datasets featureCounts has already been run? A: Yes, we prepared the count tables for you for all of the 7 files (control and treatment). - Thanks! Q4: A: ##### Do you need help? Please describe your issue - My computer shut down, so I lost the last part of adding the count files - Don't worry, you can just follow the tutorial as everything is mentioned there - Should I add the count files to the history as a collection or as datasets? - (We prepared the count tables for you for all of the 7 files (control and treatment); Pavan will explain in a bit) - https://training.galaxyproject.org/training-material/topics/transcriptomics/tutorials/ref-based/tutorial.html#hands-on-counting-the-number-of-reads-per-annotated-gene - - ::: :::success ##### ❓ Any questions regarding normalization? Q1: Does it recognize it as a difference between single-end and paired-end, or does it just sort of adjust if the difference exists? Point being, can we, if needed, treat those factors as anything we could consider not a biological difference? - understood 👍 A: You can tell the tool which samples are paired-end and which are single-end (as a factor level parameter), and the tool will then be able to handle this. Q2: A: Q3: A: Q4: A: ::: ##### Choose Your Own Tutorial: Basic :::warning ##### ✏️ Hands-on: Determine differentially expressed features ##### ❓ Are you finished with this section? Add a '+' below - Yes: +++++++++++++++++++++++ - Waiting for the job to be done: ++++++++ - Need help: ##### Your questions Q1: A: Q2: A: Q3: A: Q4: A: ##### Do you need help?
Please describe your issue - Repeat please: Determine differentially expressed features - Can you please refresh the step? - Pavan will do this now: thanks - Help! I got the error "No size factor was used" after running - Ok it worked now, a file was selected in the option Tabular file... - ::: :::success ##### ❓ How many genes are there in Drosophila dm6? - 23932? Correct+++ - 23932+ Q: How do you find the number of genes? A: Please click on any of the count output files; the number of lines corresponds to the number of genes. Do you find it? No. Can you, Pavan, please show us? I couldn't find it. -> Click on any of the count output files in your history. It is shown under the dataset title then. Do you find it? -> What do you see when doing what we suggested? A different number? Or nothing? Or can't you find the dataset in your history? We need more details to help ![](https://biont.biobyte.de/uploads/136720a1-479b-445d-9ffc-2525f4c46f6e.png) If you click on any of your featureCounts outputs you will find the number of lines, 23933. Subtract 1 for the header line to get 23932 genes.++ Q: Are these trainings recorded? Can we see these videos somewhere? A: Yes, we will provide them after the workshop. Great, thanks! Q: A: ##### ❓ What is the first dimension (PC1) separating? - treated from untreated - treated from untreated - - - - - - - - - - ##### ❓ And the second dimension (PC2)? - single vs paired - single vs paired - - - - - - - - - - - - ##### ❓ What can we conclude about the DESeq design (factors, levels) we chose? - - factors are correctly separated? - samples are correctly categorized by factors - - - - - - ::: :::success #### ❓ Any Questions? Q1: I have a question about factors. If I conduct an experiment where I apply a treatment to both wildtype and knockout samples, and the RNA is extracted by two different people, would I have three factors: treatment (treated & non-treated), genotype (WT & KO), and extraction (person 1 & person 2)? Is this correct?
And factor treatment is PC1, genotype is PC2, and extraction is PC3? A: Exactly. Give the most important influence as the first factor (usually it is the treatment, yes) Q: Can you add more than two factors? What would happen if we added more factors, like 5? Then you wouldn't have enough axes to plot the variance A: You can add more factors, but it only shows the first ones. Give the most important factor first. Cool, thanks! A: If you want to compare all factors together, put all factor levels in the first factor and then under "Output options" choose "Output all levels vs all levels of primary factor (use when you have >2 levels for primary factor)" Q: **Will we have a break?** A: Now :-) Q: 15 min +++++++++++++++++++ A: Q: A: ::: #### Add this box after each break Let's come back at 11:35 (CEST) :::success ##### ❓ Are you back? - Yes+++ - No ##### ❓ Any questions regarding what we did until now? Q1: If we had changed the order of the factor levels Treated/Untreated, would the "normalized count" for each condition be different? Or is it only the ratio between the two that will be computed in the reverse order? A: The normalized counts won't change. Only the fold changes will be flipped. Thanks! Q2: A: ##### ❓ Is the speed fine - Yes:++++++++++++++++++++++++++++++ - Too slow:+++ - Too fast: ##### ❓ What should we do at the end (add a +) - Visualization:++++ - GO-term analysis:+++++++++++++++++++++++++++++++++ - Gene Annotation: Helper here: But all topics are in the GTN material! You can run the full tutorial by yourselves later to finish and cover all topics Q: What do you mean by visualization? A: Creating a volcano plot and a heatmap, i.e. visualizing gene expression changes across samples and the significant DEGs. I see, thank you! You're welcome. Q: Can you please describe step by step how to get to quality control and mapping for day 2, following the hat....
Find mapping and QC under https://training.galaxyproject.org/ -> Methodologies -> Sequence analysis. Thanks Q: Is it possible to use the generated figures for e.g. publications when referring to Galaxy? Or are there any issues? A: Some journals accept Galaxy histories (links) as the methods part of a publication because a Galaxy history captures all parameters set. Nice!++++ Q: Just a comment: I found it very funny and true when he said that this file roams around your lab for months after the analysis!!! (But this is true, right?) Yes, absolutely! :) I know, helper 3 was a wet lab scientist for 10 years :) Nothing is offensive hahahahah, it's funny A: We will tell this to Pavan :) Q: What should the format of the header be? A: tabular I have the same value for fold change as you Pavan ++ same as Pavan ::: :::warning ##### ✏️ Hands-on: Annotation of the DESeq2 results ##### ❓ Are you finished with this section? Add a '+' below - Yes: ++++++++++++ - Waiting for the job to be done: +++++++++++++ - Need help: ##### Your questions Q1: A: Q2: A: Q3: A: Q4: A: ##### Do you need help? Please describe your issue - - - - ::: :::warning ##### ✏️ Hands-on: Add column names ##### ❓ Are you finished with this section? Add a '+' below - Yes: ++++++++ - Waiting for the job to be done: - Need help: ##### Your questions Q1: A: Q2: A: Q3: A: Q4: A: ##### Do you need help? Please describe your issue - - - - ::: :::warning ##### ✏️ Hands-on: Extract the most differentially expressed genes ##### ❓ Are you finished with this section? Add a '+' below - Yes: +++++++++++++ - Waiting for the job to be done: - Need help: ##### Your questions Q1: I cannot select the annotated DESeq2 results in the filter option A: Change the datatype to tabular. Does this work?
Q2: **Repeat the filter please** A: If you want to do it by yourself you can follow the tutorial [here](https://training.galaxyproject.org/training-material/topics/transcriptomics/tutorials/ref-based/tutorial.html#hands-on-extract-the-most-differentially-expressed-genes) Q3: Sorry, I did not get why "c7"? Is it a default or a pattern that I can use there for the filter condition? A: Column 7 (c7) contains the adjusted p-value, which should be below 5% (0.05) to get the significant changes Q4: A: ##### Do you need help? Please describe your issue - I cannot select the annotated DESeq2 results in the filter option - Change the datatype to tabular. Does this work? - Yes! Thank you 👍 - Can you share your history with us? - - - Need to add the header as tabular, had a similar issue - Pavan missed a step to rename the file. - On which dataset should we do this filtering step? - For the first filtering step you should use the `Annotated DESeq2 results` file. In the second filtering step you use the output of the first filtering step. And don't forget to rename the file - - - - - - ::: :::success ##### ❓ How many genes have a significant change in gene expression between these conditions? - - 955+ - - - - - - - - - - - ::: :::success ##### ❓ How many genes have been conserved? **We will update the answers in the tutorial next week** - 205 ++++++ - 204 !? - 204 + - 205 - 205 - 205 - 204 - 208 - Why are we answering this as the number of genes conserved?? I guess it might mean that the genes passed filtering? -> Yes, conserved in terms of "kept after filtering" - - - - - Q: We didn't specify whether the first row contains the column names? In that file the first line is NOT column names so it's OK - A: Yes, it is OK. The first row is header by default. - - ##### ❓ Can the Pasilla gene (ps, FBgn0261552) be found in this table? - yes - - - - - - - - - - - ::: :::success ### ❓ Any Questions Q: Can you please explain more about abs(c3)>1?
A: abs(c3)>1 means return True if the absolute value of the number in column 3 is more than 1. [abs(2)=2, abs(-2)=2] Q: I got 103. I just put c3>1 instead of abs(c3)>1. A: If you do not add abs() you only filter for values greater than 1 but not for the ones that are smaller than -1. Therefore your file has fewer entries. Q: My Compute output is turning red again and again. I have the same issue as one of the lines can't be computed; the rest seem to be fine. A: Can you share the error or share your history with us? Failed to convert some of the columns in line #8651 to their expected types. The error was: "could not convert string to float: 'NA'" for the line: "FBti0060344 0 NA NA NA NA NA" It's published - Can you give us the name of your history? OK, I found the issue: autodetection of column types wasn't turned off - You used the wrong input; you should use the annotated DESeq2 output (dataset 34 in your history). Another issue found, thank you a lot! ::: **Can you share your history with us? :)** #### Gene Ontology analysis :::warning ##### ✏️ Hands-on: Prepare the first dataset for goseq ##### ❓ Are you finished with this section? Add a '+' below - Yes: ++++++ - Waiting for the job to be done: - Need help: ##### Your questions Q1: When should we share the history, and with which email address? A: Q2: A: Q3: A: Q4: A: ##### Do you need help? Please describe your issue - My plot came out differently, super super dense, you can't make out any of the words. Could Pavan please say how many lines he got in his tabular results? - - - ::: :::warning ##### ✏️ Hands-on: Prepare the gene length file ##### ❓ Are you finished with this section? Add a '+' below - Yes: ++++ - Waiting for the job to be done: - Need help: ##### Your questions Q1: What is "TI" A: Not sure if I understand what you mean. Where did you get the 'TI' from? Q2: FBti A: You mean "FBti0060344" for example? It means this feature is a transposable element. Q3: A: Q4: A: ##### Do you need help?
Please describe your issue - Was the sample-specific gene background provided somehow at any step? Maybe the gene IDs provided in the datasets already contain the full gene sample list. But I am not entirely sure. - The genes come from the FlyBase database ("FB"). >>> That's the reference. I mean the gene sample background. - - ::: :::warning ##### ✏️ Hands-on: Perform GO analysis ##### ❓ Are you finished with this section? Add a '+' below - Yes: +++++ - Waiting for the job to be done: +++ - Need help: ##### Your questions Q1: A: Q2: A: Q3: A: Q4: A: ##### Do you need help? Please describe your issue - - - - - - - - ::: :::success ##### ❓ How many GO terms are over-represented with an adjusted P-value < 0.05? How many are under-represented? - - - - - - - - - - - - - ##### ❓ How are the over-represented GO terms divided into MF, CC and BP? And for under-represented GO terms? - - - - - - - - - - - - ##### ❓ What is the x-axis? How is it computed? - - - - - - - - - - - - ::: #### KEGG pathways analysis :::warning ##### ✏️ Hands-on: Perform KEGG pathway analysis ##### ❓ Are you finished with this section? Add a '+' below - Yes: + - Waiting for the job to be done: ++ - Need help: ##### Your questions Q1: A: Q2: A: Q3: A: Q4: A: ##### Do you need help? Please describe your issue - - - - ::: :::success ##### ❓ How many KEGG pathways terms have been identified? - - 125 - - - - - - - - - - - ##### ❓ How many KEGG pathways terms are over-represented with an adjusted P value < 0.05? - - - - - - - - - - - - - ##### ❓ What are the over-represented KEGG pathways terms? - - - - - - - - - - - - - ##### ❓ How many KEGG pathways terms are under-represented with an adjusted P value < 0.05? - - - - - - - - - - - - ::: :::warning ##### ✏️ Hands-on: Overlay log2FC on KEGG pathway ##### ❓ Are you finished with this section? Add a '+' below - Yes: - Waiting for the job to be done: - Need help: ##### Your questions Q1: A: Q2: A: Q3: A: Q4: A: ##### Do you need help?
Please describe your issue - - - - ::: :::success ##### ❓ What are the colored boxes? - - - - - - - - - - - - - ##### ❓ What is the color code? - - - - - - - - - - - - - ::: #### General questions How to send the results? Is it enough to share the history? Can we have recordings from this course, please? Where can we find them? :::success ##### ❓ General questions Q2: Was the sample-specific gene background provided somehow at any step? Maybe the gene IDs provided in the datasets already contain the full gene sample list. But I am not entirely sure. A: If you're referring to the goseq analysis or any similar functional enrichment analysis, we normally provide the full gene list, but with additional annotation of padj or fold changes. Q3: Can you add this history to the public histories? I can't find it A: You can either make the history public or share it directly with the email address `muellert@informatik.uni-freiburg.de` Q4: Do we need to share the history today? +++ A: Until Monday 16th of September. Q5: When do we need to share our history? I would like to repeat these exercises, so it would be great if it could be done next week+ A: You can share the history until Monday the 16th. If you have difficulties with this, please email us. ::: ### Summary byeee thanks ### Feedback :::success ##### ❓ One thing that was good about today - Explanation was good, learned a lot! Thank you for your patience. All panelists, instructors, and helpers were great. - - Have wanted to learn about the DEG... and I found it very useful - cool methods! - - Good explanations of why steps are used - - you were really patient - nice work - the pace can be frustrating but it's part of making sure everyone is on the same page and accommodating different abilities. Maybe if the workshop were a little longer, then we wouldn't have to rush in the latter parts (which tend to be the harder/more intricate stages) - - feel a lot more confident in talking about RNA-seq methods and taking on new experiments, thank you!
- great work today ##### ❓ One thing to improve - pacing - faster or slower? the ending was very rushed - The workshop is very good in general and I would recommend it to my friends and colleagues. I have feedback to improve the pacing. I realised that we did most of the important things at a very fast pace and spent a lot of time on simple steps. I believe this is because of the mixed population of participants. So what could be done is an additional half day where people can get used to the software itself, for example using the GTN section, and making, copying and switching histories. In this way, we can spend most of the time learning the important analysis. The breaks can also be assigned with a strategy to let people catch up.+ - 100% I completely agree. We will consider your valuable suggestion. It makes sense to get comfortable with Galaxy in general before such a long and complex analysis. - I enjoyed the workshop, it was very good and the documentation is excellent for catching up when we are lost. I learned a lot of new features that will be useful. Probably one more day could be useful to do every step. - - - - - - - - - - ##### ❓ Any other comments? - Zooming in on the Galaxy page: while it makes it possible to see details, you can't see much, and it would be useful to glance at the tutor's dashboard to check if we set the parameters as they should be - Thank you all very much - Thank you! - Thanks to all 4 of you folks! And thanks for your patience ;-) - Thank you very much! - Is there a link for the Python course? - [Python workshop](https://www.cecam.org/workshop-details/from-zero-to-hero-with-python-1362) ::: :::info #### Learn more about data types and databases You can follow these two tutorials by yourself next week: 1. [Learning about one gene across biological resources and formats](https://training.galaxyproject.org/training-material/topics/data-science/tutorials/online-resources-gene/tutorial.html) 2.
[One protein along the UniProt page](https://training.galaxyproject.org/training-material/topics/data-science/tutorials/online-resources-protein/tutorial.html) If you have any questions regarding this or other tutorials of the workshop, you can ask them in this [questions document](https://biont.biobyte.de/TJc95XlcQjebe0NMnSpOmg#), which is open until **September 20th** ::: Please repeat the process of sharing histories once more. Please, where can we find the recordings? :::info #### Please fill in the post-workshop survey Please fill in the [post-workshop survey](https://survey.bio-it.embl.de/831859?lang=en) ::: :::info #### How to get a certificate 1. Complete the post-workshop survey 2. Share your Galaxy histories (having your name or unique identifier) with `muellert@informatik.uni-freiburg.de` 3. To finally request the certificate, send an email to `muellert@informatik.uni-freiburg.de` with your unique personal identifier and a note on which histories you shared :::
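
#### Repeating the filter steps outside Galaxy

For those repeating the "Extract the most differentially expressed genes" hands-on on their own, the two filter conditions discussed in the questions (`c7<0.05`, then `abs(c3)>1`) can be mimicked in a few lines of Python. This is only a rough sketch, not how the Galaxy Filter tool is implemented. It assumes, as stated in the answers, that column 3 holds the log2 fold change and column 7 the adjusted p-value. All rows are invented, except the `FBti0060344` line with `NA` values, which is quoted from the error message in this pad.

```python
import csv
import io

# Hypothetical excerpt of an annotated DESeq2 table (tab-separated, no header).
# Assumed layout: c1 = gene ID, c3 = log2 fold change, c7 = adjusted p-value.
# All values are invented, except the FBti0060344 row quoted from the pad.
TABLE = (
    "FBgn0000001\t1200.5\t2.10\t0.20\t10.5\t1e-20\t1e-18\n"
    "FBgn0000002\t800.0\t-1.50\t0.25\t-6.0\t1e-8\t1e-6\n"
    "FBgn0000003\t500.0\t0.40\t0.10\t4.0\t1e-4\t1e-3\n"
    "FBgn0000004\t50.0\t3.00\t1.50\t2.0\t0.04\t0.20\n"
    "FBti0060344\t0\tNA\tNA\tNA\tNA\tNA\n"
)


def to_float(text):
    """Convert a field to float; 'NA' becomes None instead of raising an error."""
    return None if text == "NA" else float(text)


def passes(row, column, condition):
    """Apply a condition to a 1-based column, treating NA as not passing."""
    value = to_float(row[column - 1])
    return value is not None and condition(value)


rows = list(csv.reader(io.StringIO(TABLE), delimiter="\t"))

# Step 1 (c7<0.05): keep genes with a significant adjusted p-value.
significant = [r for r in rows if passes(r, 7, lambda p: p < 0.05)]

# Step 2 (abs(c3)>1): keep strong changes in either direction.
strong = [r for r in significant if passes(r, 3, lambda fc: abs(fc) > 1)]

# For comparison (c3>1): without abs(), down-regulated genes are lost.
up_only = [r for r in significant if passes(r, 3, lambda fc: fc > 1)]

print("significant:", [r[0] for r in significant])
print("abs(c3)>1:  ", [r[0] for r in strong])
print("c3>1:       ", [r[0] for r in up_only])
```

The comparison at the end illustrates why leaving out `abs()` gives fewer genes, and the `NA` handling shows why a row such as `FBti0060344` cannot be converted to a number.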