- Python v3.9.13
- Pandas v1.4.4
- Numpy v1.22.0
- Numba v0.56.0
- PyQt5 v5.15.7
- PyYAML v6.0
This program cannot be run under the Windows Subsystem for Linux (WSL).
- Clone or download the Spectrum_Processing directory to your PC.
- Start the processing with the following command:
python SPeDE.py <Path\to\ppmc_interval_index.csv> <Project_directory> <Output_directory> [-d <density>] [-c <cluster>] [-l <local>] [-m <cutoff>] [-o <output_format>] [-p <peak-count>] [-n <name>] [-q <validation_name>] [-v] [-e] [-k]
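For scripted runs, the same command can be assembled and launched from Python. The sketch below is only an illustration: `build_spede_command` and `run_spede` are hypothetical helpers, the paths are placeholders, and the default values are the ones documented in this README.

```python
import subprocess
import sys


def build_spede_command(intervals, project_dir, output_dir, *,
                        density=700, cluster=75, local=50, cutoff=30,
                        peak_count=5, validate=False, copy_refs=False,
                        krona=False):
    """Assemble the SPeDE.py command line as a list of arguments.

    Hypothetical helper: it only mirrors the options documented in this README.
    """
    cmd = [
        sys.executable, "SPeDE.py",
        str(intervals), str(project_dir), str(output_dir),
        "-d", str(density),
        "-c", str(cluster),
        "-l", str(local),
        "-m", str(cutoff),
        "-p", str(peak_count),
    ]
    # Boolean switches are appended only when requested.
    for enabled, flag in ((validate, "-v"), (copy_refs, "-e"), (krona, "-k")):
        if enabled:
            cmd.append(flag)
    return cmd


def run_spede(cmd):
    """Run the command; raises CalledProcessError on a non-zero exit code."""
    return subprocess.run(cmd, check=True)


if __name__ == "__main__":
    command = build_spede_command(
        "ppmc_interval_index.csv",  # placeholder path to the interval file
        "my_project",               # placeholder project directory
        "my_output",                # placeholder output directory
        validate=True,
    )
    print(" ".join(command))
    # run_spede(command)  # uncomment to start processing from the Spectrum_Processing directory
```

Passing the arguments as a list avoids shell quoting problems with paths that contain spaces.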
- Clone or download the Spectrum_Processing and GUI directories to your PC, into the same folder.
- Start the graphical interface by opening GSPeDE.py.
- Select the correct parameters.
- Start processing by clicking Start.
- The results will be put in the specified output folder.
SPeDE is a program for dereplicating large sets of MALDI-TOF MS spectra. The analysis consists of screening the dataset for spectra with unique spectral features and outputs the reduced set of selected reference spectra. Spectra not selected as a reference are assigned to their matching reference spectra.
The program allows you to dereplicate MALDI-TOF MS spectra "on the fly" and summarizes them into unique references and matching spectra.
The output will be written to a specified folder.
The program takes a file containing peak interval boundaries, a directory containing all data and an output directory in which it places its output. The output always consists of a reference list and optionally also includes a file containing the uniqueness matrix, a krona output file and copies the extracted references to a subfolder.
First an overview of the command line program.
intervals
: A path to the ppmc_interval_index.csv file.
project_directory
: A path to the directory containing all PKL and FMS files.
These files all have to be in the same folder.
output_directory
: The path to a folder to which the program writes its output.
-d density
: The PPM threshold, default 700.
-c cluster
: The PPVM cluster threshold in percentage, default 75.
-l local
: The PPMC local threshold in percentage, default 50.
-m cutoff
: The S/N cutoff in M/Z, default 30.
-o output format
: Output format of the reference list output file, default csv.
CSV is currently also the only option.
-v output validate
: Print the validation data to an output file data_validation.csv, default false.
-p peak count threshold
: Peaks with an S/N value >30 are counted. If the number of such peaks in one spectrum is greater than or equal to the peak count threshold, the spectrum is eligible to be a reference spectrum, default 5.
-n output name
: The name of the reference list output file. Extension must match output format, default <current_time_>SPeDE_output.csv.
-q validation name
: The name of the data validation matrix. This must be a .csv file, default <current_time_>data_validation.csv.
-e copy files
: Copy the resulting unique reference files to a subfolder, default false.
-k krona output
: Generate a krona txt file, ready to be processed by the krona software.
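The peak count rule described under -p can be sketched as follows. This is a minimal illustration, not SPeDE's implementation: the function name and the plain list of S/N values are assumptions, while the thresholds (S/N above 30, at least 5 such peaks) are the documented defaults.

```python
def is_reference_candidate(sn_values, sn_cutoff=30, peak_count_threshold=5):
    """Return True if enough peaks exceed the S/N cutoff.

    sn_values: iterable with one S/N value per peak (assumed input format).
    """
    strong_peaks = sum(1 for sn in sn_values if sn > sn_cutoff)
    return strong_peaks >= peak_count_threshold


# Five peaks lie above 30, so this spectrum is eligible.
print(is_reference_candidate([45, 31, 80, 12, 33, 60]))
# Only three peaks lie above 30 (a value of exactly 30 does not count).
print(is_reference_candidate([45, 31, 80, 12, 29, 30]))
```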
The program places all of its output into the folder specified by output_directory. Every run produces a reference list named <current_time_>SPeDE_output.csv, or the name given with -n whenever this option is used.
Optionally, when the -v flag is set, the program will also output a validation matrix named validation_matrix.csv, or the name given with -q whenever this option is used.
This matrix is the uniqueness matrix of the spectra in the project directory.
When the -e flag is set, all spectra files that are marked as references will be copied to a subfolder References in the output folder.
When the -k flag is set, a txt file will be generated which is ready to be processed by the krona software.
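Because the default reference list name starts with a timestamp, a script that post-processes the results first has to locate the newest file. The sketch below assumes the default naming (no -n option) and the References subfolder created by -e; both helper names are hypothetical.

```python
from pathlib import Path


def latest_reference_list(output_dir):
    """Return the most recently modified *SPeDE_output.csv, or None if there is none."""
    candidates = sorted(
        Path(output_dir).glob("*SPeDE_output.csv"),
        key=lambda p: p.stat().st_mtime,
    )
    return candidates[-1] if candidates else None


def copied_references(output_dir):
    """List the files in the References subfolder (empty if -e was not used)."""
    refs = Path(output_dir) / "References"
    if not refs.is_dir():
        return []
    return sorted(p for p in refs.iterdir() if p.is_file())
```

Selecting by modification time rather than by file name avoids relying on the exact timestamp format in the name.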
Now, the GUI will be covered.
The GUI consists of two major parts: an upper part, which holds the variable inputs, and a lower part, which holds the default values and optional inputs.
In the lower part you can distinguish the following sections: configuration IO, default values, key buttons and additional processing options.
See Setup for first time use.
All inputs in the upper part of the screen are mandatory.
Project directory
: Location of your project with PKL and FMS files. The files should be directly contained in the project directory.
Output directory
: Folder for the output files.
Both folders can be selected using their respective picker buttons.
Output type
: Currently only CSV is supported.
Density
: The PPM threshold, default 700.
Cluster
: The PPVM cluster threshold in percentage, default 75.
Local
: The PPMC local threshold in percentage, default 50.
Cutoff
: The S/N cutoff in M/Z, default 30.
Intervals
: Path to the ppmc_interval_index.csv file. This file can also be selected with the picker button to the right of the input field.
These values (except for output type) can only be edited when the Default values checkbox is unchecked. Checking this box will also reset the default inputs.
The Load config and Store config buttons allow you to load and store all required and default values (except output type) from and to a .yaml file. A config file can be loaded at any time, and its values will be filled into the GUI.
At startup, the file default_config.yaml is always loaded if present.
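A config file of this kind can also be inspected or prepared outside the GUI with PyYAML, which is already among the requirements. The key names below are hypothetical; open a file written by Store config to see the names GSPeDE actually uses. The sketch only round-trips the values in memory and does not touch default_config.yaml.

```python
import yaml  # PyYAML, listed in the requirements

# Hypothetical key names, filled with the documented default values.
config = {
    "intervals": "path/to/ppmc_interval_index.csv",  # placeholder path
    "density": 700,
    "cluster": 75,
    "local": 50,
    "cutoff": 30,
}

# Serialize to YAML text, keeping the key order as written above.
text = yaml.safe_dump(config, sort_keys=False)
print(text)

# Parse the text back into a dictionary.
loaded = yaml.safe_load(text)
print(loaded == config)
```

`safe_load` is used instead of `load` so that only plain data types are constructed from the file.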
Pressing the Start button will initiate the processing of the spectra. All input values will be checked for validity. A pop-up window will inform you about the progress.
You are able to abort the processing in the pop-up window, but beware that any progress will be lost.
The Exit button will exit the program after a confirmation prompt.
When installing SPeDE, you have to point the program towards the ppmc_interval_index.csv file.
- Uncheck the Default values checkbox.
- Use the file picker right of the Intervals input field to select the ppmc file.
It is located in <installation_location>/pkgs/GUI/ppmc_interval_index.csv
- Leave Project directory and Output directory blank.
- Click on Store Config.
- Continue to save when given a warning.
- Overwrite the default_config.yaml file.
- Now, when you start GSPeDE, the intervals will be loaded automatically.
The Default values checkbox resets the values in the input fields when checked.
The Validation matrix checkbox enables the output of a uniqueness matrix. This matrix
is also written into the output directory under the name validation_matrix.csv.
The Copy unique references checkbox defines whether or not the resulting unique references should be copied to
a subfolder in the output folder.
The Krona output checkbox defines whether or not you want to generate a krona output txt file.
Made using Sphinx. Sphinx can be set up with sphinx-quickstart.
All documentation files are located in the Documentation folder.
Sphinx uses .rst files to feed to its autodoc software. These files can be generated with sphinx-apidoc.
The functions to be documented can be edited inside an rst file.
Gather all rst files in the source folder and execute the sphinx-build software to generate documentation.
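The two documentation steps can be scripted. The sketch below assumes the rst files live in Documentation/source, the HTML goes to Documentation/build, and the code to document is in Spectrum_Processing; these paths and the helper names are assumptions, while `sphinx-apidoc -o` and `sphinx-build -b html` are the standard Sphinx command-line tools.

```python
import subprocess


def sphinx_commands(package_dir, source_dir, build_dir):
    """Return the two Sphinx command lines: generate rst stubs, then build HTML."""
    apidoc = ["sphinx-apidoc", "-o", str(source_dir), str(package_dir)]
    build = ["sphinx-build", "-b", "html", str(source_dir), str(build_dir)]
    return [apidoc, build]


def build_docs(package_dir, source_dir, build_dir):
    """Run both commands in order; stops at the first failing command."""
    for cmd in sphinx_commands(package_dir, source_dir, build_dir):
        subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Assumed layout; adjust the paths to the actual project structure.
    for cmd in sphinx_commands("Spectrum_Processing",
                               "Documentation/source",
                               "Documentation/build"):
        print(" ".join(cmd))
```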
Made using pynsist.
The main file is installer.cfg, located in the main project folder. Note that this file has to be at the top level of the project to function correctly, since it must be able to access every package from the top level.
Be sure to remove all files that don't need to be included in the installer.
Additional information about the config file can be found on the pynsist website.
Beware that all dependencies of the program have to be listed in the installer.cfg file, not only the ones listed by pipreqs.
Any unknown error will be written to err.txt, which is located in the directory of the GSPeDE file. Consult this file for more information about errors.