-
Notifications
You must be signed in to change notification settings - Fork 2
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
merging kraken profile functionality started
- Loading branch information
Showing
12 changed files
with
864 additions
and
17 deletions.
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,36 @@ | ||
command i used to make the sample sheet | ||
|
||
python3 workflow/scripts/prep_sample_sheet.py --delimiter="_" --r1_common="_1" --r2_common="_2" --path="../cat_reads" | ||
reads in ../cat_reads and ended up working for wverything hehe | ||
|
||
noticed that NTM00399 did assemble, which i noted in my rotation report that it didnt... maybe there were just too few assemblies | ||
on the same note, two samples are taking quite a long time: NTM00262, NTM00067 | ||
|
||
had to delete snakemamba conda and reinstall w code in documentation. i think it became too bloated w all my mlms but not sure | ||
|
||
with that new install, i also had to update snakemake cause kept getting a snakemake error. 7.3.8>7.18 | ||
|
||
20221116 | ||
|
||
noticed snakemake stalls at dropping short contigs script + no python gives Biopython not recocgnized error -> biopython not installed? | ||
|
||
found which conda env it is w/ "snakemake -n --list-conda-envs" | ||
activated it with minipython and tested parse_contigs function --> it worked, so prob not the install | ||
means prob a problem with snakemake?? maybe path? | ||
i think it may be tht there are issues in calling the correct python environment for the "script" execute in snakemake... | ||
...maybe try installing in snakemamba | ||
did conda install -c bioconda biopython=1.78 in snakemamba env | ||
WOO this seemed to have worked...but why didnt snakemake work og? | ||
|
||
noticed map2contigs wasnt working after dropping short contigs started working | ||
in the rule description, "mem=<>" was blank since god knows when... | ||
set it to be 25G | ||
|
||
also, deleting NTM00263, NTM00067 from sample sheet | ||
|
||
20221201 | ||
turns out i accidentally deleted NTM01067 instead of NTM00067; its been corrected and rerun | ||
|
||
also commited the changes from last time RE debugging and made some new changes to the report which now seems kinda fixed | ||
|
||
NTM01464_k99_1692 only has 1 contig > 1kb in megabit and none in spades therefore removing |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,48 @@ | ||
20221204 | ||
|
||
starting analysis on shotgun bal data | ||
|
||
theres one mistake in sample log -- B22_NaNa was misnames B22_NaDNase so both | ||
reads got put into one. THANKFULLY not a frameshift mistake -- need better | ||
naming conventions to avoid this mistake in the future | ||
|
||
will omit NaDNase samples--will rerun bcl convert with updated sample sheet | ||
and add them in toorrow. | ||
|
||
B22_NaDNase MGX_DATA /projects/p31648/bmo_shotgun_fastq/reads/B22_NaDNase_S12_L001_R1_001.fastq.gz /projects/p31648/bmo_shotgun_fastq/reads/B22_NaDNase_S12_L001_R2_001.fastq.gz | ||
|
||
|
||
20221205 | ||
|
||
restarting analysis with updated data | ||
|
||
python3 workflow/scripts/prep_sample_sheet.py --delimiter="_S" --path="/projects/p31648/bmo_shotgun_fastq" --subdirectories=Truefdsa | ||
reads in ../cat_reads and ended up working for wverything hehe | ||
|
||
20221213 | ||
|
||
last time i added a bunch of functionality including bowtie2 alignment etc. | ||
this led to improved efficiency and showed greater host-depletion and taxonomic depth from metaphlan | ||
this led to the conclusion that duplicate reads from adapter dimers are contributing to artificial low alignment in some samples | ||
thus, a read deduplication step is necessary to more robustly | ||
|
||
today, im updating fastp to v0.23.2 in order to use its deduplication functionality | ||
activated conda environment with fastp | ||
conda update fastp | ||
|
||
fastp needs to be optimized with downstream stuff | ||
fastp -i /projects/p31648/bmo_shotgun_fastq/ds.d2b2785572ed457391fae708169f4be5/B21_LyNa_S77_L001_R1_001.fastq.gz -I /projects/p31648/bmo_shotgun_fastq/ds.d2b2785572ed457391fae708169f4be5/B21_LyNa_S77_L001_R2_001.fastq.gz --out1 B21_LyNA.r1.a.fastq.gz --out2 B21_LyNA.r2.a.fastq.gz --detect_adapter_for_pe --dedup --thread 23 --length_required 50 -j ./B21_LyNa_fastp.json -h ./B21_LyNa_fastp.html -V --overrepresentation_analysis -g --merge --merged_out B21_LyNA.merged.a.fastq.gz --low_complexity_filter --include_unmerged --correction | ||
|
||
and then include -U merged + unpaired in the bowtie mapping | ||
|
||
|
||
|
||
20221214 | ||
|
||
starting to test the other components | ||
|
||
fastp > bowtie2 etc. finished up yesterday wit dedup except B21_LyDNase needs more memory | ||
|
||
one liner for flagstat | ||
cd into bowtie_out directory | ||
for i in $(ls -d *) ; do sed -e "s/^/$i\t/" ${i}/*.flagstat.tsv >> tmp.txt ; done |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,31 @@ | ||
|
||
# 20230125 | ||
|
||
Starting analysis of Hauser 2023 mouse microbiome competition analysis project | ||
|
||
Using mlm pipeline | ||
|
||
Transfered data to globus last week | ||
|
||
Generate sample sheet | ||
python3 workflow/scripts/prep_sample_sheet.py --delimiter="_S" --path="/projects/b1042/HartmannLab/jack/Hauser2023_InVivoCompetition/Hauser11_10.21.2022" | ||
|
||
temporarily moved sample sheet for bmo project to have suffix "_bmo" | ||
|
||
Downlaoding mouse bowtie2 index from sourcefource | ||
https://bowtie-bio.sourceforge.net/bowtie2/index.shtml # info | ||
https://benlangmead.github.io/aws-indexes/bowtie # download/wget links | ||
M. musculus GRCm39 | ||
wget https://genome-idx.s3.amazonaws.com/bt/GRCm39.zip | ||
|
||
There are two CZ1D-5 | ||
BC-19_CZ1D-5 | ||
**BC-20_CZ1D-5** | ||
BC-21_CZ3D-5 | ||
assume that BC-20_CZ1D-5 is CZ2D-5 | ||
chaned in mapping file and in metaphlan species, etc. | ||
|
||
Need to add in nonpareil and other features | ||
|
||
20230127 | ||
|
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,46 @@ | ||
|
||
# 20230131 | ||
Processing LyPMA fragmentation optimization | ||
|
||
Changed alignment genome back to human from mouse | ||
|
||
Updated prep_sample_sheet.py script so it sorts sample sheet alphabetically (easier navigation) | ||
|
||
Command line code for making sample sheet | ||
python3 workflow/scripts/prep_sample_sheet.py --delimiter="_S" --path="/projects/p31648/bmoe008_LibraryFragmentLength/" --subdirectories=True | ||
|
||
Would like to clean up bowtie2 mapping and implement a compression step on fastq files | ||
|
||
Worked! Got much cleaner version of bowtie2 that cuts out all intermediate bam/sam files except final, mapped, sorted bam | ||
also unmapped reads now in fastq.gz | ||
|
||
20230201 | ||
|
||
AHHHHHHHHHH WHYYYYYYYY | ||
cluster deleted all conda envs and older mlm files that haven't been modified in awhile | ||
forced to start a new mlm repo, fastest solution | ||
|
||
AHHHHH -- snakemamba directory now isnt working, have to reinstall too... | ||
deleted and tried reinstalling from yaml, but snakemake has error like reported in 20221103_notes.md | ||
|
||
solution | ||
make sure to check ~/.condarc | ||
apparently conda-forge doesnt play nice with defaults channel anymore(?)/with mamba | ||
deleted biocore, defaults, and biobakery from channels list | ||
added back defaults | ||
|
||
ok so this solution took hours to get around and, not gonna lie, was really rough | ||
- deleted all conda environments (i.e., rm -rf .conda) | ||
- from now on, we're fully switching to mamba module | ||
- did mamba init bash and then restarted shell | ||
- upon relogin, has "(base)" active since mamba was initiated | ||
- mamba create -c conda-forge -c bioconda -n snakemake snakemake | ||
- started working again | ||
|
||
**Need** to containerize conda envs to prevent this from happening agains... | ||
|
||
also had to manually "conda update fastp" to change fastp from 0.20.1 > 0.22.0 | ||
changes in the conda yaml in case | ||
older version, which somehow was being updated in other mlm, doesn't support dedup | ||
|
||
|
Binary file not shown.
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Oops, something went wrong.