We modified the original ProLuCID search engine to be compatible with ComPIL metaproteomics data analysis.
ProLuCIDComPIL can be downloaded here: ProLuCIDComPIL.jar.
MS2 and SQT are plaintext file formats detailed in the following publication:
MS2 files can be generated from instrument RAW files using a tool such as RawConverter.
ProLuCIDComPIL takes an MS2 file as input. MS2 files contain MS/MS precursor ion, charge, and fragment information:
MS2 Format
S 000040 000040 960.22797
I RetTime 0.25
I PrecursorInt 6606.3
I IonInjectionTime 150.000
I ActivationType HCD
I PrecursorFile MSMS_sample.ms1
I PrecursorScan 34
I InstrumentType FTMS
Z 4 3837.89004
109.4537 168.2 0
111.1992 175.5 0
112.6070 188.2 0
136.0749 575.7 0
143.1249 190.1 0
152.1059 178.3 0
...
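As an illustration of the layout above, here is a minimal Python sketch of reading an MS2 file. It assumes the usual MS2 convention: an S line starts a scan (first scan, last scan, precursor m/z), I lines carry key/value metadata, a Z line gives a charge and the corresponding [M+H]+ mass, and the remaining lines are fragment peaks. The function name `parse_ms2` and the dictionary keys are hypothetical choices for this example and are not part of ProLuCIDComPIL.

```python
import io


def parse_ms2(handle):
    """Yield one dict per scan from an iterable of MS2 text lines."""
    scan = None
    for raw in handle:
        fields = raw.split()
        if not fields or fields[0] == "H":
            continue  # skip blank lines and file-level header lines
        tag = fields[0]
        if tag == "S":
            if scan is not None:
                yield scan
            scan = {
                "first_scan": int(fields[1]),
                "last_scan": int(fields[2]),
                "precursor_mz": float(fields[3]),
                "info": {},
                "charge_states": [],
                "peaks": [],
            }
        elif scan is None:
            continue  # ignore anything that appears before the first S line
        elif tag == "I" and len(fields) >= 2:
            scan["info"][fields[1]] = " ".join(fields[2:])
        elif tag == "Z":
            scan["charge_states"].append((int(fields[1]), float(fields[2])))
        else:
            # peak line: all columns are kept as floats, in file order
            scan["peaks"].append(tuple(float(x) for x in fields))
    if scan is not None:
        yield scan


# A shortened copy of the example scan shown above.
EXAMPLE = """S 000040 000040 960.22797
I RetTime 0.25
I ActivationType HCD
Z 4 3837.89004
109.4537 168.2 0
111.1992 175.5 0
"""

scans = list(parse_ms2(io.StringIO(EXAMPLE)))
print(scans[0]["precursor_mz"], scans[0]["charge_states"], len(scans[0]["peaks"]))
```

For a real file, the same function can be given an open file handle, for example `with open("example.ms2") as fh: scans = list(parse_ms2(fh))`.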
ProLuCIDComPIL outputs search results in the SQT file format, which contains unfiltered proteomic scoring information, including the best scoring peptide matches for each scan, parent proteins for each matched peptide, and other search-related information.
SQT Format
S 10210 [information for scan #10210]
M 1 [best scoring peptide match]
L [parent protein for peptide match 1]
L [parent protein for peptide match 1]
L [parent protein for peptide match 1]
M 2 [second-best scoring peptide match]
L [parent protein for peptide match 2]
L [parent protein for peptide match 2]
...
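The S / M / L nesting shown above can be rebuilt with a few lines of Python. This is only a sketch: it groups lines by their leading tag and keeps the remaining columns as raw strings, without interpreting individual score columns. The function name `group_sqt` and the sample lines are made up for the example and are not real ProLuCIDComPIL output.

```python
def group_sqt(lines):
    """Group SQT lines into scans -> peptide matches -> parent proteins."""
    scans = []
    for raw in lines:
        fields = raw.split()
        if not fields:
            continue
        tag = fields[0]
        if tag == "S":
            scans.append({"fields": fields[1:], "matches": []})
        elif tag == "M" and scans:
            scans[-1]["matches"].append({"fields": fields[1:], "loci": []})
        elif tag == "L" and scans and scans[-1]["matches"]:
            scans[-1]["matches"][-1]["loci"].append(" ".join(fields[1:]))
        # H (header) lines and unrecognised lines are ignored
    return scans


# Hypothetical, heavily simplified lines; real SQT lines carry more columns.
sample = [
    "S\t10210\t10210\t2",
    "M\t1\tPEPTIDEA",
    "L\tprotein_1",
    "L\tprotein_2",
    "M\t2\tPEPTIDEB",
    "L\tprotein_3",
]

grouped = group_sqt(sample)
for match in grouped[0]["matches"]:
    print(match["fields"], match["loci"])
```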
*tested on CentOS 7
Requirements
- Java 1.8 (Oracle or OpenJDK)
- MongoDB 3.0+
  - MongoDB databases can be running locally (localhost), remotely as a single node (typically using TCP port 27017), or sharded behind a mongos process (typically port 27018)
- Databases
  - See the metaproteomics repository for build_compil
  - Download ComPIL 2.0 here: ftp://massive.ucsd.edu/MSV000082943/updates/2018-12-26_titusj_78e282d8/sequence/combined_reverse.fasta
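Before building or searching a database, it can help to confirm that the MongoDB host and port from the requirements above are reachable from the search node. The sketch below uses only the Python standard library and checks that the TCP port accepts a connection. It does not verify that the service behind the port is actually MongoDB. The helper name `port_is_open` is hypothetical.

```python
import socket


def port_is_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    # localhost and 27017 are the example values used in this README
    print(port_is_open("localhost", 27017))
```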
Download build_compil
- Go to the directory where build_compil is downloaded
- Edit ex/python/multiprocess_JSON_import.py
  - Change the "HOST" and "PORT" variables to match your MongoDB configuration
- Edit create_compil
  - Change the variable "FASTADB" so that it is assigned to "${ORIGFASTADB%.*}"_renumbered."${ORIGFASTADB##*.}"
  - Change "MONGO_HOST" and "MONGO_PORT" to match your MongoDB configuration
    - Examples
      - "HOST = localhost"
      - "PORT = 27017"
- Edit blazmass.params
  - Change the "mongoDB_URI" parameter to match your MongoDB configuration
    - Example: "mongodb://localhost:27017"
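The FASTADB assignment in create_compil appears intended to insert `_renumbered` before the file extension of the original FASTA path, using the shell patterns `%.*` (strip the extension) and `##*.` (keep only the extension). That reading is an assumption based on the `_renumbered.` text between the two expansions. The Python sketch below mirrors that string manipulation so you can check which file name to expect. `renumbered_name` is a hypothetical helper and is not part of build_compil.

```python
def renumbered_name(orig_fasta):
    """Mirror "${ORIGFASTADB%.*}"_renumbered."${ORIGFASTADB##*.}" in Python."""
    base, dot, ext = orig_fasta.rpartition(".")
    if not dot:
        # With no "." in the path, both shell expansions return the whole string.
        base = ext = orig_fasta
    return f"{base}_renumbered.{ext}"


print(renumbered_name("~/testFasta/test.fasta"))
```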
- create_compil is currently located in build_compil
- Go to the directory where build_compil is installed
- Update blazmass.params if needed
- Run "create_compil path/to/fasta/file database_name database_name"
  - The FASTA file should not contain any reverse protein sequences
  - Example:
    - "create_compil ~/testFasta/test.fasta testDB testDB"
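Since the FASTA file passed to create_compil should not contain reverse proteins, a quick check of the headers before uploading may save a rebuild. The sketch below assumes that reverse entries are marked with a `Reverse_` prefix at the start of the header. That prefix is an assumption and should be adjusted to whatever convention your FASTA file uses. `find_reverse_headers` is a hypothetical helper, and the sample entries are made up.

```python
def find_reverse_headers(lines, prefix="Reverse_"):
    """Return FASTA header lines whose identifier starts with the given prefix."""
    hits = []
    for line in lines:
        if line.startswith(">") and line[1:].lstrip().startswith(prefix):
            hits.append(line.strip())
    return hits


# Made-up entries for illustration only.
sample_fasta = [
    ">protein_A example protein\n",
    "MKTAYIAK\n",
    ">Reverse_protein_A\n",
    "KAIYATKM\n",
]

reverse_hits = find_reverse_headers(sample_fasta)
print(len(reverse_hits), "reverse entries found")
```

For a real file, pass an open handle instead of a list, for example `with open("test.fasta") as fh: hits = find_reverse_headers(fh)`.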
ComPIL/MongoDB integration by Sandip Chatterjee & Greg Stupp
Download sample search.xml here.
- Edit the "<database_name>[database path]</database_name>" line and replace "[database path]" with the path to the FASTA file
  - Example:
    - <database_name>/home/yateslab/project_data/prolucid_compil/2610search/example.fasta</database_name>
- Edit the "<mongo_db_name>[insert database_name]</mongo_db_name>" line and replace "[insert database_name]" with the database name
  - Example:
    - <mongo_db_name>testDB</mongo_db_name>
  - The database name should be the same as the "database_name" used in step 3 of the "Upload Fasta File to MongoDB" process
- Edit the "<mongo_uri>[insert database url]</mongo_uri>" line and replace "[insert database url]" with the MongoDB URL
  - Example:
    - <mongo_uri>mongodb://localhost:27017</mongo_uri>
- Edit other parameters as necessary. The other parameters match those of a regular ProLuCID search.xml
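If many search.xml files need the same three edits, they can be scripted. The sketch below uses Python's standard `xml.etree.ElementTree` to set the text of `database_name`, `mongo_db_name`, and `mongo_uri` wherever they occur in the document. The `<parameters>` root and the three-line template are assumptions made for the example, since the real sample search.xml contains many more elements. `set_search_params` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET


def set_search_params(xml_text, values):
    """Return xml_text with the text of each named element replaced."""
    root = ET.fromstring(xml_text)
    for tag, text in values.items():
        elems = list(root.iter(tag))
        if not elems:
            raise KeyError(f"<{tag}> not found in search.xml")
        for elem in elems:
            elem.text = text
    return ET.tostring(root, encoding="unicode")


# Minimal stand-in for search.xml; the real file has many more parameters.
TEMPLATE = """<parameters>
  <database_name>[database path]</database_name>
  <mongo_db_name>[insert database_name]</mongo_db_name>
  <mongo_uri>[insert database url]</mongo_uri>
</parameters>"""

edited = set_search_params(
    TEMPLATE,
    {
        "database_name": "/home/yateslab/project_data/prolucid_compil/2610search/example.fasta",
        "mongo_db_name": "testDB",
        "mongo_uri": "mongodb://localhost:27017",
    },
)
print(edited)
```

ElementTree drops XML comments and the XML declaration when it re-serialises a document this way, so editing by hand may be preferable if you want to keep the sample file's comments.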
Download ProLuCIDComPIL here.
- Run "java -Xmx10G -jar prolucid_compil.jar example.ms2 search.xml [num_threads]"
  - search.xml - edit search.xml as described in "Configure Search Parameters"
  - [num_threads] - number of threads to assign to the search. In general, assigning more threads increases performance but also increases the strain on the MongoDB server and the memory usage on the local node. The optimum number of threads per node depends heavily on node specification, network configuration, and MongoDB sharding configuration. For our 8-shard MongoDB setup, I assigned 2 threads per node and had no more than 60 threads accessing the MongoDB server.
  - Example:
    - java -Xmx10G -jar prolucid_compil.jar example.ms2 search.xml 4
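When searching many MS2 files, the command above can be assembled programmatically. The sketch below only builds the argument list and does not launch Java. The helper name `build_search_command` and its defaults (`prolucid_compil.jar`, a `10G` heap) are taken from the example command or invented for illustration.

```python
import shlex


def build_search_command(ms2_path, search_xml, num_threads,
                         jar="prolucid_compil.jar", heap="10G"):
    """Build the argument list for one ProLuCIDComPIL search."""
    if num_threads < 1:
        raise ValueError("num_threads must be at least 1")
    return [
        "java", f"-Xmx{heap}", "-jar", jar,
        ms2_path, search_xml, str(num_threads),
    ]


cmd = build_search_command("example.ms2", "search.xml", 4)
print(shlex.join(cmd))
# To actually run the search, pass the list to subprocess.run(cmd, check=True).
```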
- Download DTASelect here
- When running DTASelect, add the "-noDB" option to DTASelect.params or to the command-line arguments. When the FASTA file is larger than 1 GB, DTASelect runs very slowly while it attempts to load the file. "-noDB" stops DTASelect from reading database files and allows the program to run the rest of the analysis without issue.