DIA-NN Phospho search issues #1032

Open
JHKC12 opened this issue Jun 7, 2024 · 12 comments

@JHKC12

JHKC12 commented Jun 7, 2024

Hi there,

I'm trying to run 4x phospho-enriched and 4x global samples in DIA-NN (1.8.2 beta 39), but I am not sure whether DIA-NN has hit a bug or whether it is still searching and just taking a long time.

From what I have read in the forums, the workflow for phospho DIA in DIA-NN is, briefly, as follows:

  • input phospho and global raw files

  • add fasta (in this case, mouse)

  • check both boxes under "precursor ion generation" ('FASTA digest for lib free' and 'deep learning based spectra')

  • missed cleavages 1

  • max no. variable mods 3

  • phospho checked

  • precursor charge range 2-4

  • library generation as "IDs, RT and IM profiling"

  • MBR checked

  • other settings as default

  • when this has finished, perform a second DIA-NN search with the spectral library generated with similar settings
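
The settings listed above correspond to command-line options that the DIA-NN GUI prints at the top of its log (the full command appears later in this thread). A minimal Python sketch of that mapping, using only options that appear in that logged command. The helper name and the placeholder file names are hypothetical, and the comments pairing GUI checkboxes with flags are an assumption read from the log, not official documentation:

```python
def library_generation_args(fasta, out_lib, threads=32):
    """Assemble a diann.exe argument list for the first (library-building) pass.

    Hypothetical helper: every flag below is copied from the command line
    logged later in this thread; nothing here is an official DIA-NN API.
    """
    return [
        "diann.exe",
        "--threads", str(threads),
        "--fasta", fasta,
        "--fasta-search",                        # FASTA digest for library-free search
        "--predictor",                           # deep learning-based spectra prediction
        "--gen-spec-lib",
        "--out-lib", out_lib,
        "--cut", "K*,R*",
        "--missed-cleavages", "1",
        "--var-mods", "3",
        "--var-mod", "UniMod:21,79.966331,STY",  # phospho on S/T/Y
        "--min-pr-charge", "2",
        "--max-pr-charge", "4",
        "--reanalyse",                           # MBR
        "--rt-profiling",
    ]

# Placeholder file names, for illustration only.
args = library_generation_args("mouse.fasta", "mouse_phos-lib.tsv")
print(args)
```

Keeping the arguments as a list rather than one string avoids quoting problems with paths that contain spaces, if the list is later passed to something like `subprocess.run`.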

However, I am running the first search, and the current log is below:
[0:00] Loading FASTA C:\Users\mproteomics\Desktop\Analysis Data\Sequences Databases\uniprot-proteome_UP000000589_MOUSE__55,086 (Dec 2023).fasta
[0:27] Processing FASTA
[4:13] Assembling elution groups
[7:39] 50147387 precursors generated
[7:39] Gene names missing for some isoforms
[7:39] Library contains 54858 proteins, and 22143 genes
[8:56] Encoding peptides for spectra and RTs prediction
[13:04] Predicting spectra and IMs

I have been running this for about 2 days now, and I am not sure whether this is a DIA-NN bug or whether I am doing something wrong.

Is there a workflow that explains how to process phospho samples?

Thanks in advance.

@vdemichev
Owner

What's the amount of RAM in the system? Not enough RAM could be the only reason why it's slow in this case.

@JHKC12
Author

JHKC12 commented Jun 11, 2024

Hi Vadim,

We have about 120 GB of RAM. I am in the process of getting the PC upgraded to more processing power.

But is the workflow above the correct way to go with phospho DIA?

@vdemichev
Owner

We describe exact recommendations for phospho here now:
https://github.com/vdemichev/DiaNN?tab=readme-ov-file#ptms-and-peptidoforms

@JHKC12
Author

JHKC12 commented Jun 25, 2024

Hi Vadim,

We upgraded our PC to one with more processing power and used the latest version of DIA-NN with the recommended settings (with 'ultrafast' enabled as well). However, the search always seems to get stuck at the step "[16:54] Predicting spectra and IMs".

Is it common for this step to take a long time when generating a spectral library with 3 variable modifications?

@vdemichev
Owner

On a Ryzen 7950X (16 cores) or 10980XE (18 cores) it takes about 3 minutes per million precursors. That is, a rough time to generate a 50-million precursor library on such a PC would be ~150 minutes - with DIA-NN 1.9. So no, it should not take much longer. Can you please share the full log generated so far using the 'Save log' button?
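
The estimate above is simple arithmetic. A minimal Python sketch, assuming the quoted rate of about 3 minutes per million precursors scales linearly with library size (the function name is illustrative and not part of DIA-NN):

```python
def estimate_prediction_minutes(n_precursors, minutes_per_million=3.0):
    """Rough library-generation time, assuming linear scaling with precursor count."""
    return n_precursors / 1_000_000 * minutes_per_million

# 50,147,387 is the precursor count reported in the log earlier in this thread.
estimate = estimate_prediction_minutes(50_147_387)
print(f"about {estimate:.0f} minutes")
```

The rate itself depends on the CPU and the DIA-NN version, so this is only an order-of-magnitude check.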

@JHKC12
Author

JHKC12 commented Jun 26, 2024

Here's the log saved so far:


Skyline not found
MSFileReader found: MSFileReader Core 31

diann.exe --lib "" --threads 32 --verbose 1 --out "C:\Users\mproteomics\Desktop\Analysis Data\Sequences Databases\SpecLib_FREE\Mouse_Phos.tsv" --qvalue 0.01 --matrices --out-lib "C:\Users\mproteomics\Desktop\Analysis Data\Sequences Databases\SpecLib_FREE\Mouse_Phos-lib.tsv" --gen-spec-lib --predictor --fasta "C:\Users\mproteomics\Desktop\Analysis Data\Sequences Databases\uniprot-proteome_UP000000589_MOUSE__55,086 (Dec 2023).fasta" --fasta-search --min-fr-mz 200 --max-fr-mz 1800 --met-excision --min-pep-len 7 --max-pep-len 30 --min-pr-mz 300 --max-pr-mz 1800 --min-pr-charge 2 --max-pr-charge 4 --cut K*,R* --missed-cleavages 1 --unimod4 --var-mods 3 --var-mod UniMod:21,79.966331,STY --peptidoforms --reanalyse --relaxed-prot-inf --rt-profiling
DIA-NN 1.9 (Data-Independent Acquisition by Neural Networks)
Compiled on Jun 8 2024 20:00:31
Current date and time: Mon Jun 24 13:14:29 2024
CPU: GenuineIntel Intel(R) Xeon(R) Gold 6130 CPU @ 2.10GHz
SIMD instructions: AVX AVX2 AVX512CD AVX512F FMA SSE4.1 SSE4.2
Logical CPU cores: 64
Thread number set to 32
Output will be filtered at 0.01 FDR
Precursor/protein x samples expression level matrices will be saved along with the main report
A spectral library will be generated
Deep learning will be used to generate a new in silico spectral library from peptides provided
Library-free search enabled
Min fragment m/z set to 200
Max fragment m/z set to 1800
N-terminal methionine excision enabled
Min peptide length set to 7
Max peptide length set to 30
Min precursor m/z set to 300
Max precursor m/z set to 1800
Min precursor charge set to 2
Max precursor charge set to 4
In silico digest will involve cuts at K*,R*
Maximum number of missed cleavages set to 1
Cysteine carbamidomethylation enabled as a fixed modification
Maximum number of variable modifications set to 3
Modification UniMod:21 with mass delta 79.9663 at STY will be considered as variable
Peptidoform scoring enabled
A spectral library will be created from the DIA runs and used to reanalyse them; .quant files will only be saved to disk during the first step
Heuristic protein grouping will be used, to reduce the number of protein groups obtained; this mode is recommended for benchmarking protein ID numbers, GO/pathway and system-scale analyses
The spectral library (if generated) will retain the original spectra but will include empirically-aligned RTs
Exclusion of fragments shared between heavy and light peptides from quantification is not supported in FASTA digest mode - disabled; to enable, generate an in silico predicted spectral library and analyse with this library
The following variable modifications will be scored: UniMod:21
WARNING: MBR turned off, two or more raw files are required

0 files will be processed
[0:00] Loading FASTA C:\Users\mproteomics\Desktop\Analysis Data\Sequences Databases\uniprot-proteome_UP000000589_MOUSE__55,086 (Dec 2023).fasta
[0:34] Processing FASTA
[5:39] Assembling elution groups
[10:38] 50147387 precursors generated
[10:38] Gene names missing for some isoforms
[10:38] Library contains 54858 proteins, and 22143 genes
[11:22] Encoding peptides for spectra and RTs prediction
[16:54] Predicting spectra and IMs


@vdemichev
Owner

Thanks! Still, could it be that the RAM is all full? What is the physical RAM occupied amount shown by the Task Manager? This would be an explanation for it taking a very long time.

@JHKC12
Author

JHKC12 commented Jun 26, 2024

It says it's only using 5-10% of memory.

@vdemichev
Owner

A 50-million-precursor database will be tens of gigabytes, so that is strange. I would try restarting DIA-NN and, if it takes long again, check the reported RAM consumption.

@JHKC12
Author

JHKC12 commented Jun 27, 2024

I restarted DIA-NN and changed a few settings:

  • precursor charge range from 2-4 to 2-3
  • max number of variable mods from 3 to 2
  • increased thread count to 50

There has been further progress, and the log now says:

0 files will be processed
[0:00] Loading FASTA C:\Users\mproteomics\Desktop\Analysis Data\Sequences Databases\uniprot-proteome_UP000000589_MOUSE__55,086 (Dec 2023).fasta
[0:21] Processing FASTA
[2:19] Assembling elution groups
[3:32] 20805637 precursors generated
[3:32] Gene names missing for some isoforms
[3:32] Library contains 54858 proteins, and 22143 genes
[3:43] Encoding peptides for spectra and RTs prediction
[4:50] Predicting spectra and IMs
[926:24] Predicting RTs

This has been running for about 15 hours now, so I am not sure whether it is still taking too long. The memory usage is now at around 30%.
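
The time spent in each stage can be read off the log timestamps. A minimal Python sketch, assuming the bracketed stamps are elapsed minutes:seconds since the start of the run (consistent with 926 minutes being roughly 15 hours); the function names are illustrative:

```python
import re

# Timestamped lines copied from the log above.
LOG = """\
[3:32] 20805637 precursors generated
[3:43] Encoding peptides for spectra and RTs prediction
[4:50] Predicting spectra and IMs
[926:24] Predicting RTs
"""

# Assumption: "[M:SS]" means elapsed minutes and seconds since the run started.
STAMP = re.compile(r"^\[(\d+):(\d{2})\]\s+(.*)$")

def parse_steps(log_text):
    """Return (elapsed_seconds, message) for each timestamped line."""
    steps = []
    for line in log_text.splitlines():
        match = STAMP.match(line.strip())
        if match:
            minutes, seconds, message = match.groups()
            steps.append((int(minutes) * 60 + int(seconds), message))
    return steps

def step_durations(steps):
    """Duration in minutes of each step, measured up to the next timestamp."""
    return [
        (message, (next_t - t) / 60)
        for (t, message), (next_t, _) in zip(steps, steps[1:])
    ]

durations = dict(step_durations(parse_steps(LOG)))
spectra_minutes = durations["Predicting spectra and IMs"]
per_million = spectra_minutes / (20_805_637 / 1_000_000)
print(f"{spectra_minutes:.0f} min total, {per_million:.1f} min per million precursors")
```

Under that assumption, the spectra and IM prediction step took roughly 922 minutes for about 20.8 million precursors, or about 44 minutes per million, compared with the roughly 3 minutes per million quoted earlier in the thread for a 16-18 core desktop CPU.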

@vdemichev
Owner

This is indeed quite strange. Does the CPU load in Task Manager correspond to what you'd expect based on the number of threads set, and are there no other high-CPU tasks running on the machine at the same time?

@JHKC12
Author

JHKC12 commented Jul 5, 2024

Not that I was able to see. We managed to get through the data, but it seems to take a lot longer than what others have experienced.
