PGN_MS2 is a computational tool that generates a customizable peptidoglycan (PGN) database from user-defined parameters. It can also simulate MS/MS spectra for each PGN and compile these predicted spectra into a spectral library in the NIST format (.msp). The spectral library is compatible with open-access and vendor software, such as MS-DIAL, for automated matching and scoring of experimental MS/MS peaks, which facilitates automated PGN identification. Read the open access paper here.
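For orientation, an .msp library is a plain-text file in which each entry consists of `Key: value` header lines, a `Num Peaks:` line, and then one m/z and intensity pair per line. The sketch below reads such a file into Python dictionaries. `parse_msp` is a hypothetical helper, not part of PGN_MS2, and the header names in the sample (`Name`, `PrecursorMZ`, `Precursortype`) are assumptions based on common MSP usage rather than on PGN_MS2's actual output. It also assumes entries are separated by blank lines and that each peak sits on its own line.

```python
from io import StringIO

# Made-up sample data for illustration only; not real PGN_MS2 output.
SAMPLE = """
Name: example_PGN
PrecursorMZ: 500.25
Precursortype: [M+H]+
Num Peaks: 2
100.0 50
200.5 999

Name: example_PGN
PrecursorMZ: 522.23
Precursortype: [M+Na]+
Num Peaks: 1
150.0 10
"""


def parse_msp(lines):
    """Yield one dict per entry: header fields plus a 'peaks' list of (m/z, intensity)."""
    entry, reading_peaks = None, False
    for raw in lines:
        line = raw.strip()
        if not line:
            # A blank line closes the current entry (assumed separator).
            if entry is not None:
                yield entry
            entry, reading_peaks = None, False
            continue
        if entry is None:
            entry = {"peaks": []}
        if reading_peaks:
            # Peak lines: m/z and intensity separated by whitespace.
            mz, intensity = line.split()[:2]
            entry["peaks"].append((float(mz), float(intensity)))
            continue
        key, _, value = line.partition(":")
        entry[key.strip()] = value.strip()
        if key.strip().lower() == "num peaks":
            reading_peaks = True
    if entry is not None:
        # The file may end without a trailing blank line.
        yield entry


for item in parse_msp(StringIO(SAMPLE)):
    print(item["Name"], item["Precursortype"], len(item["peaks"]), "peaks")
```

To read a real library, pass an open file handle instead of `StringIO(SAMPLE)`. Since different adduct forms are stored as separate entries, the same compound name can appear more than once.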
PGN_MS2 is written in Python 3.9 and uses RDKit to manipulate molecules. A graphical user interface (built with easygui) is available. The following Python packages are required:
rdkit
pandas
numpy
yaml
joblib
easygui
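Before launching the tool, it can help to confirm that the packages listed above are importable in the active environment. The following is a minimal sketch using only the standard library; `missing_modules` is a hypothetical helper, and the names in `REQUIRED` are taken as import names directly from the list above.

```python
from importlib.util import find_spec

# Import names taken from the package list above.
REQUIRED = ["rdkit", "pandas", "numpy", "yaml", "joblib", "easygui"]


def missing_modules(names):
    """Return the subset of top-level module names that cannot be found."""
    return [name for name in names if find_spec(name) is None]


missing = missing_modules(REQUIRED)
if missing:
    print("Missing packages:", ", ".join(missing))
else:
    print("All required packages were found.")
```

`find_spec` only locates a module without importing it, so the check is fast and does not trigger side effects from the packages themselves.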
The Python environment can be created with Conda using the following command:
conda env create -f /PATH/OF/PGN_MS2/environment.yaml
The GUI of PGN_MS2 can be run from command line with the following command:
python /PATH/OF/PGN_MS2/UserInterface.py
A more detailed user guide for the GUI can be found here.
Alternatively, PGN_MS2 can be run from an IDE.* Sample code is provided in ManualRun.py.
*Spyder 3.9 is not compatible with RDKit and must be run from a separate environment. See Spyder's FAQ for more information.
Output is stored in /output. Each file is named with a prefix comprising the starting datetime and a user-given name (e.g. 20240605_Ecoli). The outputs are divided among three subfolders as follows:
Subfolder | Filename | Description |
---|---|---|
compounds | [prefix].xlsx | MS1 database in spreadsheet format. Monomers, dimers and trimers are shown on separate sheets. |
compounds | [prefix].pickle | MS1 database in pickle format. |
compounds | [prefix].yaml | User-defined settings saved in yaml format. |
compounds | [prefix]_graphical_summary.svg | Graphical summary of settings used to generate the PGN library. |
msp | [prefix].msp | MS2 database. Different adduct forms are given as separate entries. |
msp | [prefix]_[number].pickle | MS2 database in pickle format. Saved in batches of 5,000 compounds, which is indicated by [number]. |
peaklists | [prefix]_spectradata.xlsx | MS2 database in spreadsheet format. Each batch has its own sheet. Each compound is presented as its own table containing the top 200 most intense ions. |
peaklists | [prefix]_iondata.xlsx | All ions and their respective structures are tabulated in this file. |
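The naming scheme in the table above can be sketched as follows, for example to locate the outputs of a run from a script. This is an illustration, not PGN_MS2's own code: `make_prefix`, `expected_outputs` and `batch_count` are hypothetical helpers, and the `%Y%m%d` date format is an assumption inferred from the example prefix 20240605_Ecoli. The numbering scheme of the batch files is not specified here, so only the number of batches is computed.

```python
from datetime import datetime
from math import ceil
from pathlib import Path

BATCH_SIZE = 5000  # batch size stated in the table above


def make_prefix(name, start=None):
    """Build a prefix such as 20240605_Ecoli (date format is an assumption)."""
    start = start or datetime.now()
    return f"{start:%Y%m%d}_{name}"


def expected_outputs(prefix, root="output"):
    """Map a short label to the path each single-file output should have."""
    root = Path(root)
    return {
        "ms1_xlsx": root / "compounds" / f"{prefix}.xlsx",
        "ms1_pickle": root / "compounds" / f"{prefix}.pickle",
        "settings": root / "compounds" / f"{prefix}.yaml",
        "summary": root / "compounds" / f"{prefix}_graphical_summary.svg",
        "msp": root / "msp" / f"{prefix}.msp",
        "spectra_xlsx": root / "peaklists" / f"{prefix}_spectradata.xlsx",
        "ions_xlsx": root / "peaklists" / f"{prefix}_iondata.xlsx",
    }


def batch_count(n_compounds):
    """Number of MS2 pickle batches needed for n_compounds."""
    return ceil(n_compounds / BATCH_SIZE)


prefix = make_prefix("Ecoli", datetime(2024, 6, 5))
for label, path in expected_outputs(prefix).items():
    print(f"{label:>12}: {path}")
print("Batches for 12,001 compounds:", batch_count(12001))
```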
PGN_MS2 imports chemical information from an internal library located at:
data/PGN.xlsx
PGN_MS2 was designed to accommodate most PGN chemotypes. It can generate PGN with:
- modified glycans: acetylation (increase/decrease), glycolylation (anMurNGlyc) and dehydration (anMurNAc).
- stem peptide sequences up to eight amino acids long. Supported amino acids include the canonical amino acids as well as non-canonical amino acids commonly found in PGN (mDAP, Orn, γ-isoGln).
- bridge peptides (i.e. branch peptides, side chains) that are attached to either diamino or dicarboxy amino acids in the stem peptide.
- a wide variety of modifications, such as lactamization and endopeptidase digestion.
- two different polymerisation modes: either through glycosidic bonds or peptide bonds.
This tool was built by members of Qiao Lab. MS/MS spectra for all identified PGN from the aforementioned paper are also available for download on MoNA. PGN_MS2 was used in combination with MS-DIAL, an open-source MS analysis software, available here.
The following can be found in the Supplementary Information of our paper:
- Nomenclature (Table S1)
- Overview of GUI (Table S2)