Question about the filter steps between the main report and the matrix in DIANN 1.9 #1056

momo-0521 · 2024-06-20T08:12:57Z

Hi Vadim

Thanks for your work in DiaNN 1.9.
When analyzing the results from version 1.9, I noticed a discrepancy between the number of Protein.Group entries I get by filtering the main report in R and the number reported in report.pg_matrix. Are there additional filtering steps being applied? I suspect that the "Additional 5% run-specific protein-level FDR filter applied to the protein matrices, use --matrix-spec-q to adjust it" message is relevant here. However, I'm unsure how to account for it.

library(diann)  # provides diann_load()
library(arrow)  # provides read_parquet()
report_pg <- diann_load("report.pg_matrix.tsv")
length(unique(report_pg$Protein.Group))
[1] 13121
df <- read_parquet("report.parquet")
length(unique(df$Protein.Group[df$Lib.Q.Value <= 0.01 & df$Lib.PG.Q.Value <= 0.01]))
[1] 14126

Thank you in advance

vdemichev · 2024-06-20T08:16:58Z

Hi,

Please try:
df<-read_parquet("report.parquet")
length(unique(df$Protein.Group[df$Lib.Q.Value <= 0.01 & df$Lib.PG.Q.Value <= 0.01 & df$PG.Q.Value <= 0.05]))

Best,
Vadim

momo-0521 · 2024-06-20T08:37:17Z

Thank you for your advice.

I have tried this, but it does not work. It affected the number of precursors but had no effect on the number of Protein.Group entries.

df<-read_parquet("report.parquet")
length(unique(df$Protein.Group[df$Lib.Q.Value <= 0.01 & df$Lib.PG.Q.Value <= 0.01 & df$PG.Q.Value <= 0.05]))
[1] 14126

Thank you again!

vdemichev · 2024-06-20T08:38:29Z

Is this MBR output?

momo-0521 · 2024-06-20T08:40:03Z

Yes, it is MBR output.

vdemichev · 2024-06-20T08:42:02Z

Can you please share both the .parquet and pg_matrix?
A quick check: do the timestamps (date modified) on those files match?

Best,
Vadim

momo-0521 · 2024-06-20T09:08:50Z

Thank you!
Please find the file in Google Cloud.
https://drive.google.com/file/d/1TAU2fQ1pnf4PXOqAlVVFMu4zM3Vg4L-Q/view?usp=sharing
https://drive.google.com/file/d/1jd-vLFXjsfTy4_dgd-ztEwd8RzqsXoD_/view?usp=sharing

vdemichev · 2024-06-20T09:58:07Z

length(unique(df$Protein.Group[df$Lib.PG.Q.Value <= 0.01 & df$PG.Q.Value <= 0.05 & df$PG.MaxLFQ > 0]))
[1] 13121

It works if you also filter for non-zero quantities :)
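
For readers following along, here is a minimal sketch of how each of the three conditions above narrows the protein group count. It runs on a toy data frame rather than the real report.parquet: the column names are the ones used in this thread, but every value is invented, and count_groups is a hypothetical helper, not part of any package.

```r
# Toy stand-in for report.parquet. Column names follow the thread;
# all values are invented for illustration only.
df <- data.frame(
  Protein.Group  = c("P1", "P1", "P2", "P3", "P4"),
  Lib.PG.Q.Value = c(0.001, 0.001, 0.005, 0.02, 0.003),
  PG.Q.Value     = c(0.01, 0.01, 0.08, 0.01, 0.02),
  PG.MaxLFQ      = c(1500, 1500, 900, 700, 0)
)

# Hypothetical helper: count distinct protein groups among the kept rows.
# which() drops NA conditions, so rows with missing values are excluded
# instead of showing up as an extra NA group.
count_groups <- function(keep) length(unique(df$Protein.Group[which(keep)]))

lib_ok   <- df$Lib.PG.Q.Value <= 0.01  # library-level protein group q-value
run_ok   <- df$PG.Q.Value <= 0.05      # run-specific protein group q-value
quant_ok <- df$PG.MaxLFQ > 0           # non-zero quantity

n_lib   <- count_groups(lib_ok)
n_run   <- count_groups(lib_ok & run_ok)
n_final <- count_groups(lib_ok & run_ok & quant_ok)

print(c(lib = n_lib, run = n_run, final = n_final))
```

On the real report, the same three logical vectors can be built from the data frame returned by read_parquet; comparing the intermediate counts shows which condition accounts for the difference.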

momo-0521 · 2024-06-21T01:13:40Z

Thank you very much for your great help.

Best wishes!

momo-0521 · 2024-06-21T08:33:51Z

Hi, Vadim

Thanks for your help yesterday. I have a new question. When I use 'diann_maxlfq' to estimate protein group quantities, the results differ significantly from both 'pg_matrix' and the 'PG.MaxLFQ' column. Below is the code I used; it worked correctly with DIANN 1.8 but raises some concerns with DIANN 1.9. Do you have any suggestions or advice on this issue?
protein.groups <- diann_maxlfq(
  df[df$Lib.PG.Q.Value <= 0.01 & df$PG.Q.Value <= 0.05 & df$PG.MaxLFQ > 0, ],
  sample.header = "Run",
  group.header = "Protein.Group",
  id.header = "Precursor.Id",
  quantity.header = "Precursor.Normalised")

Thank you in advance!

vdemichev · 2024-06-21T08:41:53Z

diann_maxlfq implements a simple MaxLFQ algorithm, different from what DIA-NN uses internally. The results will therefore always differ.

momo-0521 · 2024-06-21T09:34:27Z

Thank you. I understand.

Another question is about species-specific precursors. Our samples contain a mixture of human and mouse proteins. When running DIANN 1.9, we used both human and mouse FASTA files and added the options '--species-genes' and '--species-ids'. We would like to exclude precursors that are specific to mouse or shared between both species, and quantify proteins using only human-specific precursors. Under these parameter settings, is the 'PG.MaxLFQ' value calculated from both human-specific and mouse-specific precursors?

Best wishes!

vdemichev · 2024-06-21T11:27:17Z

It's calculated using all precursors matched to the protein group (Protein.Group column). So in this case you'd want to just discard all entries in the .parquet report with Protein.Ids column string containing 'MOUSE'.
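
A minimal sketch of that filter, assuming the Protein.Ids strings in the report contain a literal 'MOUSE' for mouse entries, as described above. The data frame is a toy stand-in for the .parquet report, and the ID strings and precursor names are invented placeholders.

```r
# Toy stand-in for the .parquet report; IDs and precursors are invented.
df <- data.frame(
  Precursor.Id = c("PEPTIDEA2", "PEPTIDEB2", "PEPTIDEC3"),
  Protein.Ids  = c("ID1_HUMAN", "ID2_MOUSE", "ID1_HUMAN;ID2_MOUSE"),
  stringsAsFactors = FALSE
)

# Keep only rows whose Protein.Ids string does not contain "MOUSE".
# fixed = TRUE matches the literal substring instead of a regular expression.
# This drops both mouse-specific rows and rows shared between the species.
human_only <- df[!grepl("MOUSE", df$Protein.Ids, fixed = TRUE), ]

print(human_only$Precursor.Id)
```

Note that subsetting rows this way does not recompute the PG.MaxLFQ values already stored in the remaining rows.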

wangrui85 mentioned this issue Aug 19, 2024

IP experiment with DIA quntification #1136

Open
