DIA-NN Phospho search issues #1032

JHKC12 · 2024-06-07T01:36:56Z

Hi there,

I'm trying to run 4x phospho-enriched and 4x global samples in DIA-NN (1.8.2 beta 39), but I am not sure whether DIA-NN is bugged or whether it is still searching and just taking a long time.

From what I have read in the forums, the workflow for phospho DIA in DIA-NN is briefly as follows:

input phospho and global raw files
add fasta (in this case, mouse)
check both boxes under "precursor ion generation" ('FASTA digest for lib free' and 'deep learning based spectra')
missed cleavages 1
max no. variable mods 3
phospho checked
precursor charge range 2-4
library generation as "IDs, RT and IM profiling"
MBR checked
other settings as default
when this has finished, perform a second DIA-NN search with the spectral library generated with similar settings

However, I am running the first search and the current log is below:
[0:00] Loading FASTA C:\Users\mproteomics\Desktop\Analysis Data\Sequences Databases\uniprot-proteome_UP000000589_MOUSE__55,086 (Dec 2023).fasta
[0:27] Processing FASTA
[4:13] Assembling elution groups
[7:39] 50147387 precursors generated
[7:39] Gene names missing for some isoforms
[7:39] Library contains 54858 proteins, and 22143 genes
[8:56] Encoding peptides for spectra and RTs prediction
[13:04] Predicting spectra and IMs

I have been running this for about 2 days now and I am not sure if this is a DIA-NN bug or if I am doing something wrong.

Is there a workflow that explains how to analyse phospho samples?

Thanks in advance

vdemichev · 2024-06-09T19:14:46Z

What's the amount of RAM in the system? Not enough RAM could be the only reason why it's slow in this case.

JHKC12 · 2024-06-11T00:17:33Z

Hi Vadim,

We have about 120 GB of RAM. I am in the process of getting the PC upgraded to more processing power.

But is the workflow above the correct way to go with phospho DIA?

vdemichev · 2024-06-11T06:58:56Z

We describe exact recommendations for phospho here now:
https://github.com/vdemichev/DiaNN?tab=readme-ov-file#ptms-and-peptidoforms

JHKC12 · 2024-06-25T00:55:54Z

Hi Vadim,

We upgraded our PC to one with higher processing power and also used the latest version of DIA-NN with the recommended settings (using 'ultrafast' as well). However, the search always seems to get stuck at the step "[16:54] Predicting spectra and IMs".

Is it common for this step to take a long time when generating the spectral library with 3 variable modifications?

vdemichev · 2024-06-25T05:17:16Z

On a Ryzen 7950X (16 cores) or 10980XE (18 cores) it takes about 3 minutes per million precursors. That is, a rough time to generate a 50-million precursor library on such a PC would be ~150 minutes - with DIA-NN 1.9. So no, it should not take much longer. Can you please share the full log generated so far using the 'Save log' button?
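
The estimate above is simple arithmetic and can be sketched as follows. This is only an illustration: the function name `estimate_prediction_minutes` is made up here, and the default rate of 3 minutes per million precursors is the figure quoted for a Ryzen 7950X or 10980XE with DIA-NN 1.9, so other hardware will differ.

```python
def estimate_prediction_minutes(n_precursors, minutes_per_million=3.0):
    """Rough library-generation time from the per-million rate quoted above."""
    return n_precursors / 1_000_000 * minutes_per_million

# Precursor count reported in the log for this search
print(round(estimate_prediction_minutes(50_147_387)))  # 150
```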

JHKC12 · 2024-06-26T03:21:35Z

Here's the log saved so far:

Skyline not found
MSFileReader found: MSFileReader Core 31

diann.exe --lib "" --threads 32 --verbose 1 --out "C:\Users\mproteomics\Desktop\Analysis Data\Sequences Databases\SpecLib_FREE\Mouse_Phos.tsv" --qvalue 0.01 --matrices --out-lib "C:\Users\mproteomics\Desktop\Analysis Data\Sequences Databases\SpecLib_FREE\Mouse_Phos-lib.tsv" --gen-spec-lib --predictor --fasta "C:\Users\mproteomics\Desktop\Analysis Data\Sequences Databases\uniprot-proteome_UP000000589_MOUSE__55,086 (Dec 2023).fasta" --fasta-search --min-fr-mz 200 --max-fr-mz 1800 --met-excision --min-pep-len 7 --max-pep-len 30 --min-pr-mz 300 --max-pr-mz 1800 --min-pr-charge 2 --max-pr-charge 4 --cut K*,R* --missed-cleavages 1 --unimod4 --var-mods 3 --var-mod UniMod:21,79.966331,STY --peptidoforms --reanalyse --relaxed-prot-inf --rt-profiling
DIA-NN 1.9 (Data-Independent Acquisition by Neural Networks)
Compiled on Jun 8 2024 20:00:31
Current date and time: Mon Jun 24 13:14:29 2024
CPU: GenuineIntel Intel(R) Xeon(R) Gold 6130 CPU @ 2.10GHz
SIMD instructions: AVX AVX2 AVX512CD AVX512F FMA SSE4.1 SSE4.2
Logical CPU cores: 64
Thread number set to 32
Output will be filtered at 0.01 FDR
Precursor/protein x samples expression level matrices will be saved along with the main report
A spectral library will be generated
Deep learning will be used to generate a new in silico spectral library from peptides provided
Library-free search enabled
Min fragment m/z set to 200
Max fragment m/z set to 1800
N-terminal methionine excision enabled
Min peptide length set to 7
Max peptide length set to 30
Min precursor m/z set to 300
Max precursor m/z set to 1800
Min precursor charge set to 2
Max precursor charge set to 4
In silico digest will involve cuts at K*,R*
Maximum number of missed cleavages set to 1
Cysteine carbamidomethylation enabled as a fixed modification
Maximum number of variable modifications set to 3
Modification UniMod:21 with mass delta 79.9663 at STY will be considered as variable
Peptidoform scoring enabled
A spectral library will be created from the DIA runs and used to reanalyse them; .quant files will only be saved to disk during the first step
Heuristic protein grouping will be used, to reduce the number of protein groups obtained; this mode is recommended for benchmarking protein ID numbers, GO/pathway and system-scale analyses
The spectral library (if generated) will retain the original spectra but will include empirically-aligned RTs
Exclusion of fragments shared between heavy and light peptides from quantification is not supported in FASTA digest mode - disabled; to enable, generate an in silico predicted spectral library and analyse with this library
The following variable modifications will be scored: UniMod:21
WARNING: MBR turned off, two or more raw files are required

0 files will be processed
[0:00] Loading FASTA C:\Users\mproteomics\Desktop\Analysis Data\Sequences Databases\uniprot-proteome_UP000000589_MOUSE__55,086 (Dec 2023).fasta
[0:34] Processing FASTA
[5:39] Assembling elution groups
[10:38] 50147387 precursors generated
[10:38] Gene names missing for some isoforms
[10:38] Library contains 54858 proteins, and 22143 genes
[11:22] Encoding peptides for spectra and RTs prediction
[16:54] Predicting spectra and IMs

vdemichev · 2024-06-26T05:44:51Z

Thanks! Still, could it be that the RAM is all full? What is the physical RAM occupied amount shown by the Task Manager? This would be an explanation for it taking a very long time.

JHKC12 · 2024-06-26T06:07:13Z

It says it's only using 5-10% of memory.

vdemichev · 2024-06-26T06:11:09Z

A 50-million-precursor database will take tens of gigabytes, so that is strange. I would try restarting DIA-NN and, if it takes long again, check the reported RAM consumption.

JHKC12 · 2024-06-27T00:31:31Z

I restarted DIA-NN and changed a few settings:

precursor charge range from 2-4 to 2-3
max number of variable mods from 3 to 2
increased thread count to 50

There has been further progress and the log now says:

0 files will be processed
[0:00] Loading FASTA C:\Users\mproteomics\Desktop\Analysis Data\Sequences Databases\uniprot-proteome_UP000000589_MOUSE__55,086 (Dec 2023).fasta
[0:21] Processing FASTA
[2:19] Assembling elution groups
[3:32] 20805637 precursors generated
[3:32] Gene names missing for some isoforms
[3:32] Library contains 54858 proteins, and 22143 genes
[3:43] Encoding peptides for spectra and RTs prediction
[4:50] Predicting spectra and IMs
[926:24] Predicting RTs

This has been running for about 15 hours now, so I am not sure if this is still taking too long. The memory usage is now at around 30%.
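
For comparison with the roughly 3 minutes per million precursors quoted earlier in the thread, the log timestamps above can be turned into an observed rate. A minimal sketch, assuming the bracketed stamps are minutes:seconds since start (which is how they appear to be formatted) and that the "Predicting spectra and IMs" step ended when "Predicting RTs" was logged. The helper names are hypothetical.

```python
def stamp_to_minutes(stamp):
    """Convert a log stamp such as '926:24' (assumed minutes:seconds) to minutes."""
    minutes, seconds = stamp.split(":")
    return int(minutes) + int(seconds) / 60

def minutes_per_million(start, end, n_precursors):
    """Duration between two stamps, normalised per million precursors."""
    elapsed = stamp_to_minutes(end) - stamp_to_minutes(start)
    return elapsed / (n_precursors / 1_000_000)

# Values copied from the log: step start, next step start, precursors generated
rate = minutes_per_million("4:50", "926:24", 20_805_637)
print(f"{rate:.1f} min per million precursors")
```

Under those assumptions the spectra prediction step alone works out to about 44 minutes per million precursors, roughly 15 times the quoted rate for the whole library generation.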

vdemichev · 2024-06-27T05:35:13Z

This is indeed quite strange. Does the CPU load in Task Manager correspond to what you'd expect based on the number of threads set, and are no other high-CPU tasks running on the machine at the same time?

JHKC12 · 2024-07-05T01:17:28Z

Not that I was able to see. We managed to get through the data, but it appears to take a lot longer than what others have experienced.
