Calculating Q.Value #876
Does DIA-NN use the standard definition of a q-value?
Yes, the definition is standard. What DIA-NN does is indeed calculate the #decoys/#targets ratio, and then, for some IDs that appear very confident, it has a special algorithm that allows assigning them lower q-values, below the 1/#targets that would otherwise be the minimum. This is why you see those q-values below 10^-4.

This is necessary because the global q-value calculation performed later on needs to detect IDs with very high confidence, and also for situations when fewer than 100 precursors are identified at 1% FDR.
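For reference, here is a minimal sketch in Python of the plain decoy-counting q-value computation described above; the function name and details are illustrative, and DIA-NN's special extrapolation below 1/#targets is not included:

```python
import numpy as np

def decoy_qvalues(scores, is_decoy):
    """Plain target-decoy q-values: at each score threshold, the FDR
    estimate is #decoys / #targets among IDs scoring at or above it;
    the q-value is the minimum FDR over this and all more permissive
    thresholds."""
    scores = np.asarray(scores, dtype=float)
    decoy = np.asarray(is_decoy, dtype=bool)
    order = np.argsort(-scores)            # rank best score first
    decoy_sorted = decoy[order]
    n_decoys = np.cumsum(decoy_sorted)     # decoys at or above each rank
    n_targets = np.cumsum(~decoy_sorted)   # targets at or above each rank
    fdr = n_decoys / np.maximum(n_targets, 1)
    qvals = np.minimum.accumulate(fdr[::-1])[::-1]  # enforce monotonicity
    out = np.empty_like(qvals)
    out[order] = qvals                     # restore input order
    return out
```

Note that with this plain formulation every ID ranked above the first decoy gets q = 0, and the smallest non-zero q-value is about 1/#targets, which is exactly the limitation the special handling mentioned above works around.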
The FDR algorithm implemented in Spectronaut was developed by John D. Storey from the Department of Biostatistics at the University of Washington, Seattle. It was published in 2003 and has over 10,000 citations. It is a well-recognized approach to FDR estimation that was originally introduced for DNA microarray data analysis. We use this q-value/FDR definition in Spectronaut because it makes fewer assumptions, especially with regard to the search-space composition (decoy to false-target ratios), and is therefore very universal.

However, I also have to side with Vadim here that extrapolating the CScore is fairly normal, as nobody likes to have a pure 0 for a q-value (people tend to think that a q-value of 0 means something is wrong). Otherwise, all q-values before the first decoy observation would necessarily be 0, so there would be no further qualitative distinction for precursors that scored higher. So to answer your question (and Vadim, correct me if I am wrong in the context of DIA-NN):
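For concreteness, here is a compact sketch of a Storey-style q-value computation from p-values. For brevity, pi0 is estimated at a single fixed lambda, whereas the published method smooths the estimate over a range of lambda values:

```python
import numpy as np

def storey_qvalues(pvals, lam=0.5):
    """Storey-style q-values from p-values. pi0, the estimated
    fraction of true nulls, is read off the flat right tail of the
    p-value distribution (p > lam)."""
    p = np.asarray(pvals, dtype=float)
    m = p.size
    pi0 = min(1.0, np.mean(p > lam) / (1.0 - lam))   # null-fraction estimate
    order = np.argsort(p)
    ranked = p[order]
    # BH-style FDR estimates scaled by pi0, capped at 1
    fdr = np.minimum(pi0 * m * ranked / np.arange(1, m + 1), 1.0)
    qvals = np.minimum.accumulate(fdr[::-1])[::-1]   # make monotone
    out = np.empty(m)
    out[order] = qvals
    return out
```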
Furthermore, the Storey FDR strategy is also used by Skyline and OpenSWATH. All of the main peptide-centric search tools (except DIA-NN) I know of use the Storey method. So calling it non-standard is doing it a disservice.
If I understand it correctly, the PNAS paper you posted is about "converting" BH FDR to q-values. This method has been used not only in DIA but also in many DDA tools. But it is not about how to calculate the p-value, which is the input of the BH procedure, from other test statistics or scores in the proteomics community; that paper assumes the p-value is already available. I think what this thread is about is how to calculate the FDR from the CScore, or whatever score that is not a p-value. As you also know, proteomics tools normally do not calculate any p-value, and the FDR is not from the BH procedure.

P.S. I think proteomics people normally do not distinguish FDR and q-value in some literature. Most people assume the FDR has already been "monotonized", which is actually the q-value.

Best, Fengchao
Fully agree, Oliver, it's standard in the field; what I meant is that it is not the pure decoys/targets ratio.
@fcyu
Hi @oliver-bernhardt, thanks for the correction! I guess I don't know DIA tools as well as you do. Strictly speaking, it should be CScore -> p-value -> BH FDR -> Storey q-value, right? Since the BH procedure and Storey's method have no issues, the real question here is how to calculate the p-value from the CScore (for the tools you mentioned), or how to calculate the target-decoy-based FDR from the scores (for the tools that don't calculate p-values). Do those DIA tools use the "standard" procedure for this step? And what is the "standard", BTW?

Best, Fengchao
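One common way to get p-values from a discriminant score such as the CScore is to treat the decoy score distribution as an empirical null. A hypothetical sketch of that step (none of the tools discussed here necessarily implement it exactly this way):

```python
import numpy as np

def decoy_pvalues(target_scores, decoy_scores):
    """Empirical p-values for target IDs, using the decoy score
    distribution as the null: p = fraction of decoys scoring at least
    as high as the target, with a +1 pseudo-count so that no p-value
    is exactly zero."""
    decoys = np.sort(np.asarray(decoy_scores, dtype=float))
    n = decoys.size
    # count decoys strictly below each target score via binary search
    below = np.searchsorted(decoys,
                            np.asarray(target_scores, dtype=float),
                            side="left")
    return (n - below + 1) / (n + 1)
```

The resulting p-values could then be fed into a Storey-style q-value computation like the one sketched earlier in this thread.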
Hi Oliver, thanks for chiming in. I am well acquainted with BH and how FDR is performed in DDA analysis. I would like to second Fengchao's question: it would be great to know how to calculate p-values from the CScore. The root of my question is to gain more insight into how target-decoy FDR works in DIA-NN (and other peptide-centric tools).
Hi,
How is the Q.Value in the report.tsv calculated?
I noticed across multiple experiments that the Q.Values have a strange distribution. For instance, I searched the "RD139_Narrow_UPS1_25fmol_inj1" file from MSV000087597 (Gotti et al.) with 1.8.2 beta 27 using a predicted spectral library, and the Q.Value distribution is as follows:
Aren't q-values typically calculated by counting # of decoys / # of targets? Shouldn't the distribution appear exponential?
Searching the same file with MSFragger-DIA yields a distribution more in line with what I would expect:
As a consequence, the distribution of PEP values also seems off (not cumulatively increasing).
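For context, PEPs and q-values are linked quantities: the q-value at a given score threshold equals the mean PEP of all IDs at or above that threshold (see Käll et al., "Posterior error probabilities and false discovery rates: two sides of the same coin"), which is why q-values, unlike raw PEPs, increase cumulatively down the ranked list. A small sketch of that relationship, assuming PEPs are already estimated and roughly non-decreasing with decreasing score:

```python
import numpy as np

def qvalues_from_pep(pep, scores):
    """q-value at each score threshold computed as the running mean of
    the PEPs of all IDs at or above it."""
    scores = np.asarray(scores, dtype=float)
    order = np.argsort(-scores)              # best score first
    pep_sorted = np.asarray(pep, dtype=float)[order]
    q = np.cumsum(pep_sorted) / np.arange(1, pep_sorted.size + 1)
    out = np.empty_like(q)
    out[order] = q                           # restore input order
    return out
```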