Commit

Update README.md
Jeeel-03 authored Aug 28, 2021
1 parent 0fab908 commit 6f75b91
Showing 1 changed file with 1 addition and 1 deletion.
2 changes: 1 addition & 1 deletion README.md
@@ -1,6 +1,6 @@
# Team-Collins-Hackbio-2021
#### In Stage 1 of HackBio Internship_2021, our task was to analyse a genomic dataset using bioinformatics tools of our choice. Malaria is a disease caused by parasitic protozoans of the genus Plasmodium, which are transmitted between vertebrate hosts by female mosquitoes of the genus Anopheles. Plasmodium falciparum (P. falciparum) is one of the predominant malaria parasite species infecting humans (Neafsey et al., 2021), and our team agreed to work on it.
-The objective of our analysis was to identify variations between the sample genome sequence and the reference by performing a variant calling analysis.
+*The objective of our analysis was to identify variations between the sample genome sequence and the reference by performing a variant calling analysis.*
![This is an image of the workflow.](https://github.com/Jeeel-03/Team-Collins-Hackbio-2021/blob/main/Workflow%20for%20ppt.png)
### **Analysis**
###### We started our analysis with a quality assessment of the dataset, generating a *FastQC report*, then *filtered out poor-quality reads* and re-assessed quality to obtain a *consolidated MultiQC report* for all the samples. Next, we indexed the reference genome, which we retrieved from the PlasmoDB database; indexing lets the aligner rapidly locate possible alignment sites for query sequences, saving time during alignment. The *reads were then mapped to the reference genome*, producing a SAM file. This SAM file was sorted and written out as a BAM file, which was passed to the AddOrReplaceReadGroups command to assign all the reads to a single new read group. We then validated the format of the BAM file to ensure that no errors would appear later due to improper formatting. Subsequently, duplicate reads, that is, reads derived from the same original DNA fragment, were identified and marked. After base recalibration, variant calling was run with HaplotypeCaller to obtain a raw VCF file for our sample. Finally, variant annotation was performed on the VCF file to identify genetic variation relative to the reference genome of P. falciparum.
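
The steps described above could be laid out roughly as the following command sequence. This is only a sketch under stated assumptions, not the team's actual script: the README does not name the aligner or the file layout, so BWA, samtools, the GATK4 wrapper, all file names, the read-group values, and the `build_pipeline` helper itself are hypothetical choices made for illustration. The read-filtering and variant-annotation steps are left out because the tools used for them are not named.

```python
import shlex


def build_pipeline(sample, ref, r1, r2, known_sites):
    """Return the ordered commands (argv lists) for one paired-end sample.

    Assumptions (not stated in the README): BWA as the aligner, samtools for
    sorting, GATK4 for the Picard/GATK steps, and the file names used below.
    """
    sam = f"{sample}.sam"
    sorted_bam = f"{sample}.sorted.bam"
    rg_bam = f"{sample}.rg.bam"
    dedup_bam = f"{sample}.dedup.bam"
    recal_table = f"{sample}.recal.table"
    recal_bam = f"{sample}.recal.bam"
    raw_vcf = f"{sample}.raw.vcf.gz"

    return [
        # Quality assessment, then a consolidated report (output dirs must exist).
        ["fastqc", r1, r2, "-o", "qc"],
        ["multiqc", "qc", "-o", "multiqc"],
        # Index the reference, then map reads to it (SAM output).
        ["bwa", "index", ref],
        ["bwa", "mem", "-o", sam, ref, r1, r2],
        # Sort the alignments and write them as BAM.
        ["samtools", "sort", "-o", sorted_bam, sam],
        # Assign every read to a single read group (placeholder values).
        ["gatk", "AddOrReplaceReadGroups", "-I", sorted_bam, "-O", rg_bam,
         "--RGID", sample, "--RGLB", "lib1", "--RGPL", "ILLUMINA",
         "--RGPU", "unit1", "--RGSM", sample],
        # Check the BAM for formatting problems before going further.
        ["gatk", "ValidateSamFile", "-I", rg_bam, "-M", "SUMMARY"],
        # Mark reads that derive from the same DNA fragment.
        ["gatk", "MarkDuplicates", "-I", rg_bam, "-O", dedup_bam,
         "-M", f"{sample}.dup_metrics.txt"],
        # Base quality score recalibration (needs a known-sites VCF).
        ["gatk", "BaseRecalibrator", "-R", ref, "-I", dedup_bam,
         "--known-sites", known_sites, "-O", recal_table],
        ["gatk", "ApplyBQSR", "-R", ref, "-I", dedup_bam,
         "--bqsr-recal-file", recal_table, "-O", recal_bam],
        # Call variants to obtain the raw VCF.
        ["gatk", "HaplotypeCaller", "-R", ref, "-I", recal_bam, "-O", raw_vcf],
    ]


if __name__ == "__main__":
    # Print the commands instead of running them; all paths are placeholders.
    for cmd in build_pipeline("sample1", "Pfalciparum.fasta",
                              "sample1_R1.fastq.gz", "sample1_R2.fastq.gz",
                              "known_sites.vcf.gz"):
        print(shlex.join(cmd))
```

The sketch only prints the commands so the order of the steps can be checked without any tool installed. In practice the GATK steps would also need a `.fai` index and a sequence dictionary for the reference, and the intermediate BAM files would need to be indexed.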