A package to estimate FDR in mass-spectrometry searching results using decoy-free approach

shawn-peng/DecoyFree-MSFDR

Decoy-Free FDR Estimation for Mass-Spectrometry

This is a package to estimate the false discovery rate (FDR) in mass-spectrometry search results using a decoy-free approach. The software analyzes the matching scores output by the search engine.

Install

Make sure Python is installed; it is available from the official Python website.

Install pip, the Python package management tool:

python -m ensurepip --upgrade

Then install our package:

pip install decoyfree-msfdr
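To confirm that the installation succeeded, a quick check from Python can help. The sketch below uses only the standard library; it assumes the distribution name matches the `pip install` command above and that the two command-line tools named under Usage are placed on the `PATH`. The helper name `installed_version` is made up for this example.

```python
import shutil
from importlib import metadata


def installed_version(dist_name="decoyfree-msfdr"):
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


# Report the package version and where each command-line tool was found.
print("package version:", installed_version())
for tool in ("decoyfree-msfdr", "decoyfree-xlmsfdr"):
    print(tool, "->", shutil.which(tool))
```

A value of `None` for the version or for a tool path suggests that the package was installed into a different Python environment than the one running the check.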

Usage

Regular MS/MS search FDR estimation

decoyfree-msfdr [-h] [-f INPUT_FILE] [-d INPUT_DIR] [-t INPUT_TYPE] [-m EVAL_MODEL]
                [--score-field SCORE_FIELD] [--threads THREADS] [-c CONSTRAINTS] [-s MODEL_SAMPLES]
                [-r RANDOM_SIZE] [--tolerance TOLERANCE] [--show_plotting] [--out_dir OUT_DIR]

Cross-linked peptide MS/MS search FDR estimation

decoyfree-xlmsfdr [-h] [-f INPUT_FILE] [-d INPUT_DIR] [-t INPUT_TYPE] [-m EVAL_MODEL]
                  [--score-field SCORE_FIELD] [--threads THREADS] [-c CONSTRAINTS] [-s MODEL_SAMPLES]
                  [-r RANDOM_SIZE] [--tolerance TOLERANCE] [--show_plotting] [--out_dir OUT_DIR]
options:
  -h, --help            show this help message and exit
  -f INPUT_FILE, --input-file INPUT_FILE
  -d INPUT_DIR, --input-dir INPUT_DIR
  -t INPUT_TYPE, --input-type INPUT_TYPE
                        format of input files, supported formats are csv,tsv,idXML
  -m EVAL_MODEL, --eval-model EVAL_MODEL
                        existing model to be evaluated
  --score-field SCORE_FIELD
                        the field name holding PSM scores
  --threads THREADS     number of threads
  -c CONSTRAINTS, --constraints CONSTRAINTS
                        Choices of constraints to be used
  -s MODEL_SAMPLES, --model_samples MODEL_SAMPLES
                        Number of samples/top scores to be used in modeling
  -r RANDOM_SIZE, --random_size RANDOM_SIZE
                        Number of random starts per skewness setting
  --tolerance TOLERANCE
                        Threshold of the change of the point-wise log-likelihood for the EM algorithm to determine the
                        convergence
  --show_plotting       Show plotting while fitting the model
  --out_dir OUT_DIR     The place to save results

Options

| Option | Argument | Default | Description |
|---|---|---|---|
| -f, --input-file | Path | N/A | Path to the search result file |
| -d, --input-dir | Path | N/A | Path to the directory holding search result files |
| -t, --input-type | csv, tsv, idXML | idXML | Search result format |
| -m, --eval-model | Path | N/A | Path to an existing model to be evaluated |
| -c, --constraints | no_constraint, unweighted_pdf_mode | unweighted_pdf_mode | Choices of constraints to be used |
| -s, --model_samples | 1, 2 | 2 | Number of samples/top scores to be used in modeling |
| --threads | Integer | 1 | Number of threads |
| -r, --random_size | Integer | 2 | Number of random starts per skewness setting |
| --tolerance | Float | 1e-8 | Threshold of the change of the point-wise log-likelihood for the EM algorithm to determine the convergence |
| --show_plotting | Bool | False | Show plotting while fitting the model |
| --out_dir | Path | ./results | The place to save results |
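The `--tolerance` option describes a stopping rule for EM: iteration ends once the log-likelihood changes by less than the threshold. The sketch below illustrates that rule on a toy two-component Gaussian mixture. It is not the package's model (which, judging by the option descriptions, involves skewness settings and constraints), and reading "point-wise log-likelihood" as the mean log-likelihood per data point is an assumption. The function names are made up for this example.

```python
import math


def normal_pdf(x, mu, sigma):
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def fit_two_gaussians(xs, tolerance=1e-8, max_iter=1000):
    """Toy EM for a 1-D two-component Gaussian mixture (illustration only)."""
    n = len(xs)
    ordered = sorted(xs)
    # Crude initialization: quartiles as means, overall spread as both sigmas.
    mu = [ordered[n // 4], ordered[(3 * n) // 4]]
    mean = sum(xs) / n
    spread = math.sqrt(sum((x - mean) ** 2 for x in xs) / n) or 1.0
    sigma = [spread, spread]
    weight = [0.5, 0.5]
    previous = float("-inf")
    for iteration in range(1, max_iter + 1):
        # E-step: responsibilities and the current log-likelihood.
        resp = []
        log_lik = 0.0
        for x in xs:
            p = [weight[k] * normal_pdf(x, mu[k], sigma[k]) for k in range(2)]
            total = p[0] + p[1]
            log_lik += math.log(total)
            resp.append((p[0] / total, p[1] / total))
        mean_log_lik = log_lik / n  # assumed meaning of "point-wise"
        # Stopping rule: change in mean log-likelihood below the tolerance.
        if abs(mean_log_lik - previous) < tolerance:
            break
        previous = mean_log_lik
        # M-step: re-estimate weights, means, and standard deviations.
        for k in range(2):
            nk = sum(r[k] for r in resp)
            weight[k] = nk / n
            mu[k] = sum(r[k] * x for r, x in zip(resp, xs)) / nk
            var = sum(r[k] * (x - mu[k]) ** 2 for r, x in zip(resp, xs)) / nk
            sigma[k] = max(math.sqrt(var), 1e-6)
    return weight, mu, sigma, iteration
```

With this rule, a looser tolerance never needs more iterations than a tighter one on the same data, so raising `--tolerance` trades precision of the fit for run time.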

Examples

Regular MS/MS search with MSGF+ engine

Suppose the MS/MS search result from MSGF+ is saved in .tsv format as data/sample.tsv. You can run the FDR estimation algorithm with the following command:

decoyfree-msfdr -f data/sample.tsv -t tsv --out_dir results/sample --threads 10
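The same command can be launched from a Python script, which is convenient when looping over many inputs. This is a minimal sketch using only the flags shown in the command above; the helper `build_command` is a made-up name, and the `subprocess.run` call is left commented out because it requires the package to be installed.

```python
import shlex
import subprocess


def build_command(input_file, input_type="tsv", out_dir="results/sample", threads=10):
    """Assemble the argument list for the decoyfree-msfdr command shown above."""
    return [
        "decoyfree-msfdr",
        "-f", str(input_file),
        "-t", input_type,
        "--out_dir", str(out_dir),
        "--threads", str(threads),
    ]


cmd = build_command("data/sample.tsv")
print(shlex.join(cmd))  # show the exact command line before running it

# Uncomment once the package is installed:
# subprocess.run(cmd, check=True)
```

Passing the arguments as a list avoids shell quoting problems with paths that contain spaces.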

Multiple search results

If you have multiple search results from MSGF+ saved in the data/sample/ directory and want to use them together to build a single model, run:

decoyfree-msfdr -d data/sample/ -t tsv --out_dir results/sample --threads 10

Note: this searches the directory for all files with the .tsv extension. If you specify another format, it searches for files with the corresponding extension.
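Before running on a directory, it can be useful to preview which files would match. The sketch below mirrors the behavior described in the note, under the assumptions that the search is not recursive and that the extension is matched literally; the package's actual lookup may differ. `find_inputs` is a made-up helper name.

```python
from pathlib import Path


def find_inputs(input_dir, input_type="tsv"):
    """List files in input_dir whose extension matches the input type."""
    # Assumption: top-level files only, extension matched literally.
    return sorted(Path(input_dir).glob(f"*.{input_type}"))


for path in find_inputs("data/sample/", "tsv"):
    print(path)
```

If the list is empty, check that the value passed to `-t` matches the extension of the files in the directory.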

MS/MS search results with other engines

If you are using another search engine, please specify the following options:

| Option | Default | Description |
|---|---|---|
| --score-field | EValue | The score field used to model the data |
| --log_scale | True | Whether to model on the log scale of the data |
| --neg_score | True | Whether to negate the score. The model assumes that a higher score is better. On the log scale, negation is applied after taking the log |
| --spec_ref_fields | "#SpecFile,SpecID" | Comma-separated fields that identify a spectrum uniquely |
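The options above determine how a raw score becomes a modeled value and how rows are attributed to spectra. The following sketch shows the order of operations the table describes (log first, then negation) and a grouping by the spectrum reference fields. The natural logarithm is an assumption, as is keeping the best `model_samples` scores per spectrum; the function names are made up for this example.

```python
import math
from collections import defaultdict


def transform_score(raw, log_scale=True, neg_score=True):
    """Apply the documented order: take the log first, then negate."""
    value = float(raw)
    if log_scale:
        value = math.log(value)  # assumption: natural log
    if neg_score:
        value = -value  # afterwards, a higher value means a better match
    return value


def top_scores_per_spectrum(rows, score_field="EValue",
                            spec_ref_fields="#SpecFile,SpecID",
                            model_samples=2, log_scale=True, neg_score=True):
    """Group rows by spectrum and keep the best transformed scores of each."""
    key_fields = spec_ref_fields.split(",")
    grouped = defaultdict(list)
    for row in rows:
        key = tuple(row[name] for name in key_fields)
        grouped[key].append(transform_score(row[score_field], log_scale, neg_score))
    return {key: sorted(scores, reverse=True)[:model_samples]
            for key, scores in grouped.items()}


rows = [
    {"#SpecFile": "run1.mzML", "SpecID": "1", "EValue": "1e-10"},
    {"#SpecFile": "run1.mzML", "SpecID": "1", "EValue": "1e-3"},
    {"#SpecFile": "run1.mzML", "SpecID": "2", "EValue": "1e-5"},
]
print(top_scores_per_spectrum(rows))
```

For an E-value, where smaller is better, both defaults apply: the log compresses the range and the negation makes larger values correspond to better matches.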

XL-MS/MS search with OpenPepXLLF engine

Suppose the XL-MS/MS search result from OpenPepXLLF is saved in .idXML format as data/sample.idXML. You can run the FDR estimation algorithm with the following command:

decoyfree-xlmsfdr -f data/sample.idXML -t idXML --out_dir results/sample --threads 10

Note: Currently, idXML is the only format we support for XL-MS/MS search results. If you need another format, please open an issue and we will add support for it.

XL-MS/MS search results with other engines

If you are using another search engine, please specify the following options:

| Option | Default | Description |
|---|---|---|
| --score-field | "OpenPepXL:score" | The score field used to model the data |
| --log_scale | False | Whether to model on the log scale of the data |
| --neg_score | False | Whether to negate the score. On the log scale, negation is applied after taking the log |
