FASTA format #1029

TANIAKMONS · 2024-06-05T09:40:06Z

Hello,

I have an issue with the FASTA format. Our FASTA was built from Illumina sequencing data and annotated with KREGG. We first tried it without Uniprot annotation and it did not work.
Will it work if the FASTA combines different annotations, including the Uniprot one? It seems that we can't just use the Uniprot FASTA format.

Thanks in advance
TK

vdemichev · 2024-06-05T17:10:10Z

Hi TK,

Protein sequence IDs should be read correctly from any FASTA. All other information you can always pull out of the FASTA using some FASTA-reading R package, to annotate DIA-NN's output report.

"We have tried a first time without Uniprot annotation and it did not."

How did it manifest?

Best,
Vadim
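
For readers who prefer a script to an R package, the annotation step described above could be sketched in Python as follows. This is only an illustration under stated assumptions: the sequence ID is taken to be the text between `>` and the first whitespace, the report is assumed to be tab-separated with a `Protein.Ids` column holding semicolon-separated IDs, and `Fasta.Description` is a made-up name for the added column. The FASTA and report contents are tiny in-memory stand-ins, not real data.

```python
import csv
import io


def read_fasta_headers(handle):
    """Map each sequence ID to the rest of its header line.

    Assumption: the ID is the text between '>' and the first whitespace.
    """
    headers = {}
    for line in handle:
        if not line.startswith(">"):
            continue  # sequence line, not needed for annotation
        parts = line[1:].strip().split(None, 1)
        if not parts:
            continue  # empty header, nothing to index
        headers[parts[0]] = parts[1] if len(parts) > 1 else ""
    return headers


def annotate_report(report_handle, headers, id_column="Protein.Ids"):
    """Yield report rows with an extra 'Fasta.Description' column.

    Assumption: the report is tab-separated and `id_column` holds
    semicolon-separated sequence IDs.
    """
    for row in csv.DictReader(report_handle, delimiter="\t"):
        ids = row[id_column].split(";")
        row["Fasta.Description"] = " | ".join(headers.get(i, "") for i in ids)
        yield row


# Tiny in-memory stand-ins for a real FASTA file and a real report.
fasta = io.StringIO(
    ">prot1 GN=Gnb1 OS=Mouse\nMSELDQLRQE\n"
    ">prot2 GN=Actb OS=Mouse\nMDDDIAALVV\n"
)
report = io.StringIO(
    "Protein.Ids\tPrecursor.Id\n"
    "prot1\tPEPTIDEK2\n"
    "prot1;prot2\tSAMPLER2\n"
)

annotated = list(annotate_report(report, read_fasta_headers(fasta)))
for row in annotated:
    print(row["Protein.Ids"], "->", row["Fasta.Description"])
```

With real files, the two `io.StringIO` objects would be replaced by `open(...)` handles; the column names should be checked against the report produced by your DIA-NN version.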

saradufour · 2024-06-06T10:24:20Z

Hi,

I'm having the same issue in the library-free search. A FASTA header looks like this, for example:

>P62874,Q3TQ70|TX=10090 OS=Mouse GN=ENSMUSG00000029064.16,Gnb1 TA=NM_001160016.1,ENSMUST00000105616.10,XM_017319977.2,NM_001160017.1,ENSMUST00000030940.14,ENSMUST00000176637.2,ENSMUST00000165335.8,NM_008142.4 PA=ENSMUSP00000030940.8,NP_032168.1,ENSMUSP00000135091.2,XP_017175466.1,ENSMUSP00000101241.4,NP_001153488.1,ENSMUSP00000130123.2,NP_001153489.1,P62874,Q3TQ70
(FASTA file from OpenProt (microprotein identification) with > 500000 entries)
and the output in the log is the following:

[0:48] Processing FASTA
[1:35] Assembling elution groups
[2:47] 23495123 precursors generated
[2:47] Gene names missing for some isoforms
[2:47] Library contains 1 proteins, and 1 genes
[2:51] Encoding peptides for spectra and RTs prediction

Any idea how to fix this issue?

Thanks !
Best,
Sara

vdemichev · 2024-06-20T15:52:36Z

Hi Sara,

DIA-NN will not correctly extract protein names from this. It should get the IDs OK though, i.e. you can annotate DIA-NN output using some FASTA-reading R package.

Best,
Vadim
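
As an illustration of pulling the extra information out of a header like the one above, here is a minimal Python sketch. It assumes the layout seen in the example: comma-separated accessions before the `|`, followed by space-separated `KEY=value` fields with two-letter upper-case keys and comma-separated values. The function name `parse_openprot_header` is hypothetical, the header is shortened from the one posted above, and other OpenProt entries may not follow this layout exactly.

```python
import re

# Assumption: fields look like KEY=value with a two-letter upper-case key,
# and each value is a comma-separated list without spaces.
FIELD_RE = re.compile(r"\b([A-Z]{2})=(\S+)")


def parse_openprot_header(header):
    """Split an OpenProt-style header into accessions and KEY=value fields."""
    text = header.lstrip(">").strip()
    id_part, _, rest = text.partition("|")
    record = {"accessions": id_part.split(",")}
    for key, value in FIELD_RE.findall(rest):
        record[key] = value.split(",")
    return record


# Shortened version of the header posted above.
header = (
    ">P62874,Q3TQ70|TX=10090 OS=Mouse GN=ENSMUSG00000029064.16,Gnb1 "
    "TA=NM_001160016.1 PA=NP_032168.1,P62874,Q3TQ70"
)
record = parse_openprot_header(header)
print(record["accessions"])
print(record["GN"])
```

The resulting dictionary can then be joined to the DIA-NN report on whatever ID DIA-NN wrote for each entry.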

TANIAKMONS · 2024-07-01T09:01:07Z

Hi Vadim,

I had the same result as Sara (Library contains 1 proteins, and 1 genes).
We wrote a script to incorporate Uniprot annotations into the FASTA, and we now use DIA-NN 1.9.
This is the result we have:

10 files will be processed
[0:00] Loading FASTA C:\Tania\output_proteinpilot2.fasta
[2:07] Processing FASTA
[4:11] Assembling elution groups
[6:57] 59894740 precursors generated
[6:58] Gene names missing for some isoforms
[6:58] Library contains 717220 proteins, and 1 genes
[7:09] Encoding peptides for spectra and RTs prediction
[9:53] Predicting spectra and IMs
[370:52] Predicting RTs
[409:47] Decoding predicted spectra and IMs
[411:19] Decoding RTs
[412:01] Saving the library to C:\Tania\DIA-NN\1.9\report.predicted.speclib
[415:57] Initialising library

First pass: generating a spectral library from DIA data

[418:51] File #1/10
[418:51] Loading run C:\Tania\PSF21h.wiff
[421:59] 59872940 library precursors are potentially detectable
[423:20] Processing.

Since it takes very long to process, we will run it on a more powerful server, which runs Linux. Is the command line the same as with DIA-NN 1.8?

Thanks,
Kind Regards,

TK

vdemichev · 2024-07-01T09:13:42Z

Hi TK,

I would suggest trying the recommended settings first, which should result in a much smaller predicted library and search space.

No, I don't recommend using 1.8.1. If you do, please make sure to use the predicted library generated by 1.9.

Best,
Vadim

TANIAKMONS · 2024-07-15T14:16:35Z

Hi Sara,

We both seem to have a large number of precursors. Can you tell me what kind of computer or server you use for your analysis once the library is generated?
Our computer is able to generate the library as a first step, but does not seem to make much progress in the second step with the raw data.

Best,
Tania

vdemichev · 2024-07-15T14:20:11Z

Hi Tania,

What is the amount of RAM? If you wish, I can take a look at the log.

Best,
Vadim

TANIAKMONS · 2024-07-15T14:25:37Z

This was the log of the first step, the library generation:
report.log.txt

TANIAKMONS · 2024-07-15T14:27:21Z

This is the 2nd step:

vdemichev · 2024-07-15T14:31:36Z

Metaproteomics I guess? Yes, can take a very long time. I would also suggest using Peptidoform scoring in this case.
You can look up in Task Manager if there's enough free physical RAM.
Mass accuracies are better fixed to 20 ppm (MS2) and 12 ppm (MS1). You can also use --mass-acc-cal 20 if the instrument is properly calibrated. All this will speed things up a bit. You can also try running Search and RAM usage: Ultra-fast mode at first (several-fold faster, but noticeably fewer IDs), to get a preliminary feeling for the data.

TANIAKMONS · 2024-07-15T14:39:10Z

Yes, Metaproteomics indeed.

vdemichev · 2024-07-15T14:41:06Z

Seems fine. I would run first with the settings I suggested and ultra-fast mode. In fact, I would also regenerate the library with precursor charges restricted to 2-3, and then run ultra-fast mode. After this works, you can explore slower (and potentially more thorough) analysis methods.
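
For running on a server, the settings suggested in this and the previous comment could be collected into a command line roughly as sketched below in Python. Only `--mass-acc-cal 20` is quoted from this thread; the other flag names (`--f`, `--lib`, `--out`, `--mass-acc`, `--mass-acc-ms1`, `--peptidoforms`, `--min-pr-charge`, `--max-pr-charge`) are assumptions to verify against the DIA-NN documentation for your version. The executable name `diann`, the helper `build_diann_args` and the file names are placeholders. Ultra-fast mode is deliberately left out, since its command-line equivalent is not given here.

```python
def build_diann_args(raw_files, library, out_path):
    """Return an argument list for a search with the settings suggested above.

    Flag names are assumptions to check against the DIA-NN documentation;
    only --mass-acc-cal is quoted from this thread.
    """
    args = ["diann"]  # placeholder executable name; adjust to your install
    for path in raw_files:
        args += ["--f", path]
    args += [
        "--lib", library,
        "--out", out_path,
        "--mass-acc", "20",      # MS2 mass accuracy, ppm
        "--mass-acc-ms1", "12",  # MS1 mass accuracy, ppm
        "--mass-acc-cal", "20",  # only if the instrument is properly calibrated
        "--peptidoforms",        # peptidoform scoring
    ]
    return args


# Extra arguments for the library regeneration step (charges 2-3 only).
library_charge_args = ["--min-pr-charge", "2", "--max-pr-charge", "3"]

# Placeholder file names, not real paths.
args = build_diann_args(
    ["run1.wiff", "run2.wiff"], "report.predicted.speclib", "report.tsv"
)
print(" ".join(args))
```

The list can be passed to `subprocess.run(args)` once the flags and paths have been checked for your setup.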

TANIAKMONS · 2024-07-15T15:21:30Z

Thank you Vadim, I will definitely try.

Best,
Tania

TANIAKMONS · 2024-07-17T10:20:37Z

Hi,

Would it also speed up the process if I convert all the Sciex .wiff files to .dia prior to the analysis?

Thank you again for your help

Tania

vdemichev · 2024-07-17T11:24:29Z

Hi Tania,

Will save ~2min/file, based on the screenshot, i.e. not worth it in this case.

Best,
Vadim
