FLASHDECONV 2.0 BETA+, FINALLY WITH A GUI!

A GUI is finally available. The GUI executable, FLASHDeconvWizard, is in the [OpenMS path]/bin folder: go to [OpenMS path]/bin and run FLASHDeconvWizard. FLASHDeconv 2.0 beta+ supports MS1 and MS2 spectral deconvolution as well as feature deconvolution, and it writes several output formats (e.g., *.tsv, *.mzML, *.msalign, and *.feature). The stable version of FLASHDeconv 2.0 will be officially integrated into OpenMS 2.7.0, to be released in the near future. FLASHDeconv 2.0 beta+ also supports TopPIC identification better than the previous version, because it generates all msalign and feature files that TopPIC takes as inputs. We also added a spectral merging function to support the analysis of QTOF and NativeMS datasets.

Changes:

  • FLASHDeconvWizard (GUI) is added!
  • FLASHIda support (-in_log option)
  • We no longer recommend profile mode spectra. Peak-picked spectra (using the vendor-provided peak picking in MSConvert) are recommended as inputs.
  • merging_method option is introduced to merge or average MS2 spectra.
  • use_ensemble_spectrum option has been removed (replaced by -merging_method).
  • target_mass option is added to perform targeted deconvolution (deconvolution quality control is relaxed for target masses) – a target_sequence or proteoform option will be added soon.
  • min_precursor_snr option is introduced that (currently) only affects msalign and feature files for TopPIC.
  • out_topFD_feature option is introduced that outputs a feature file for TopPIC. With this feature file as input, TopPIC does not need the -x option.
  • A quality measure score (QScore) is added for each deconvolved mass in spectral deconvolution results. QScore is the probability that a mass is identified, learned by logistic regression (related publication will be added here). Note that it is the probability that the mass is “identified,” not that it is “correct.”
  • Both MS1 and MS2 deconvolution have been extensively improved (tested by proteoform ID sensitivity, coupled with TopPIC).
  • Works well for both centroid and profile spectra. In particular for MS2, centroid spectra should be used.
  • Supports negative charges (set by -Algorithm:min_charge and -Algorithm:max_charge parameters; see below).
  • Parameter set is redefined (see below).
  • Batch execution is not supported for FLASHDeconv binary. Separate batch files will be prepared soon.
  • Deconvolved spectra may be output in mzML format (-out_mzml [mzml file]).
  • Deconvolved MS1 spectra may be output in Promex format (-out_promex [ms1ft file]).
  • Deconvolved MS1/2 spectra may be output in TopFD format (-out_topFD [msalign file per MS level]).
  • Deconvolved MS1/2 features may be output in TopFD format (-out_topFD_feature [feature file per MS level]).
  • Harmonic artifact elimination in the mass dimension effectively reduces false positives while keeping true positives.

Under development

  • Proforma 2.0 support (-target_seq option)
  • Deep learning based deconvolution quality measure
  • QScore training interface
  • Parameter set for different protocols (e.g., Native-MS, HighRes TDP, …)
  • Merge into OpenMS 3.0

Installation

FLASHDeconv installation files (OpenMS-2.x.0-HEAD-, *.exe for Windows, *.dmg for Mac, and .deb for Linux) and source code (-src.tar.gz) are available here. For the latest version, go to the bottom of the page and select the most recent installation file.

Parameters

The basic FLASHDeconv parameters are listed by simply running FLASHDeconv without arguments. Only -in and -out are mandatory. The advanced parameters are listed by running FLASHDeconv -helphelp. FLASHDeconv parameters fall into six categories: FLASHDeconv tool parameters, FLASHDeconv algorithm parameters, Spectral deconvolution parameters, Feature tracing parameters, Isobaric quantification parameters, and Tagger parameters. For each category, the basic parameters are described first, followed by the advanced ones.

FLASHDeconv tool parameters:

Parameter Description
in Input file (mzML) (valid formats: ‘mzML’)
out Default output tsv file containing deconvolved features (valid formats: ‘tsv’)
out_spec1 Output tsv file containing deconvolved MS1 spectra. Likewise, use -out_spec2, …, -out_spec4 to specify tsv files for MS2, …, MS4. (valid formats: ‘tsv’)
out_spec2 Output tsv file containing deconvolved MS2 spectra. (valid formats: ‘tsv’)
out_spec3 Output tsv file containing deconvolved MS3 spectra. (valid formats: ‘tsv’)
out_spec4 Output tsv file containing deconvolved MS4 spectra. (valid formats: ‘tsv’)
out_mzml Output mzml file containing deconvolved spectra (of all MS levels) (valid formats: ‘mzML’)
out_quant Output tsv file containing isobaric quantification results for MS2 only (valid formats: ‘tsv’)
out_annotated_mzml Output mzml file containing annotated spectra. For each annotated peak, monoisotopic mass, charge, and isotope index are stored as meta data. Unannotated peaks are also copied, without meta data. (valid formats: ‘mzML’)
out_msalign1 Output msalign (topFD and ProMex compatible) file containing MS1 deconvolved spectra. Likewise, use -out_msalign2 for MS2 spectra. The file names for MS1 and MS2 should end with ms1.msalign and ms2.msalign, respectively, to be recognized by TopPIC GUI. (valid formats: ‘msalign’)
out_msalign2 Output msalign (topFD and ProMex compatible) file containing MS2 deconvolved spectra. The file name should end with ms2.msalign to be able to be recognized by TopPIC GUI. (valid formats: ‘msalign’)
out_feature1 Output feature (topFD compatible) file containing MS1 deconvolved features. Likewise, use -out_feature2 for MS2 features. The MS1 and MS2 feature files are necessary for TopPIC feature intensity output. (valid formats: ‘feature’)
out_feature2 Output feature (topFD compatible) file containing MS2 deconvolved features. The MS1 and MS2 feature files are necessary for TopPIC feature intensity output. (valid formats: ‘feature’)
keep_empty_out If set, empty output files (e.g., *.tsv file when no feature was generated) are kept.
mzml_mass_charge <0: uncharged 1: +1 charged -1: -1 charged> Charge state of deconvolved masses in mzml output (specified by out_mzml) (default: ‘0’) (min: ‘-1’ max: ‘1’)
write_detail Whether to write detailed peak information per deconvolved mass in the tsv files for deconvolved spectra. If set to 1, all peak information (m/z, intensity, charge and isotope index) per mass is reported.
precursor_snr Precursor SNR threshold for TopFD MS2 msalign tsv files. (default: ‘1.0’)
min_mz <m/z value> If set to positive value, minimum m/z to deconvolve. (default: ‘-1.0’)
max_mz <m/z value> If set to positive value, maximum m/z to deconvolve. (default: ‘-1.0’)
min_rt If set to positive value, minimum RT (in seconds) to deconvolve. (default: ‘-1.0’)
max_rt If set to positive value, maximum RT (in seconds) to deconvolve. (default: ‘-1.0’)
ini Use the given TOPP INI file
log Name of log file (created only when specified)
instance Instance number for the TOPP INI file (default: ‘1’)
debug Sets the debug level (default: ‘0’)
threads Sets the number of threads allowed to be used by the TOPP tool (default: ‘1’)
write_ini Writes the default configuration file
write_ctd <out_dir> Writes the common tool description file(s) (Toolname(s).ctd) to <out_dir>
write_nested_cwl <out_dir> Writes the Common Workflow Language file(s) (Toolname(s).cwl) to <out_dir>
write_cwl <out_dir> Writes the Common Workflow Language file(s) (Toolname(s).cwl) to <out_dir>, but enforce a flat parameter hierarchy
write_nested_json <out_dir> Writes the default configuration file
write_json <out_dir> Writes the default configuration file, but compatible to the flat hierarchy
no_progress Disables progress logging to command line
force Overrides tool-specific checks
test Enables the test mode (needed for internal use only)
help Shows options
helphelp Shows all options (including advanced)
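
The write_ini and ini parameters above can be combined so that a parameter set is edited once and reused across runs. The following is a minimal sketch, not an official workflow: the INI file name is hypothetical, the input and output paths are placeholders, and it assumes that -write_ini takes the file to write as its argument, as other OpenMS TOPP tools do.

```shell
# Run from [OpenMS path]/bin, where the FLASHDeconv executable lives.
ini_file="flashdeconv_params.ini"   # hypothetical file name

if [ ! -x ./FLASHDeconv ]; then
  # Nothing to do without the executable in the current directory.
  status="missing-binary"
elif [ ! -f "$ini_file" ]; then
  # First pass: write the default configuration, then edit it by hand.
  ./FLASHDeconv -write_ini "$ini_file"
  status="ini-written"
else
  # Later passes: reuse the edited configuration (placeholder paths).
  ./FLASHDeconv -ini "$ini_file" \
    -in /User/me/data/infile.mzml \
    -out /User/me/out/outfilefeature.tsv \
    -threads 4
  status="ran"
fi

echo "status: $status"
```

Running the script twice, with an edit of the INI file in between, first creates the configuration and then applies it.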


FLASHDeconv algorithm parameters (with prefix FD: )

parameter Description
FD:ida_log Log file generated by FLASHIda (IDA*.log). Only needed for coupling with FLASHIda acquisition
FD:report_FDR Report qvalues (roughly, point-wise FDR) for deconvolved masses. Decoy masses to calculate qvalues and FDR are also reported. Beta version.
FD:allowed_isotope_error Allowed isotope index error for decoy and FDR report. If it is set to 2, for example, +-2 isotope errors are not counted as false. Beta version. (default: ‘0’)
FD:use_RNA_averagine If set, RNA averagine model is used.
FD:preceding_MS1_count Specifies the number of preceding MS1 spectra for MS2 precursor determination. In TDP, the precursor peak of a MS2 spectrum may not belong to any deconvolved masses in the MS1 spectrum immediately preceding the MS2 spectrum. Increasing this parameter to N allows for the search for the deconvolved masses in the N preceding MS1 spectra from the MS2 spectrum, increasing the chance that its precursor is deconvolved. (default: ‘3’) (min: ‘1’)
FD:isolation_window Default isolation window width. If the input mzML file does not contain isolation window width information, this width will be used. (default: ‘5.0’)
FD:forced_MS_level If set to an integer N, MS level of all spectra will be set to N regardless of original MS level. Useful when deconvolving datasets containing only MS2 spectra. (default: ‘0’) (min: ‘0’)
FD:merging_method Method for spectra merging before deconvolution. 0: No merging. 1: Average gaussian method, which performs moving gaussian averaging of spectra per MS level. Effective for increasing proteoform ID sensitivity (in particular for Q-TOF datasets). 2: Block method, which merges all spectra into a single one per MS level (e.g., for NativeMS datasets). (default: ‘0’) (min: ‘0’, max: ‘2’)


Spectral deconvolution parameters: (with prefix SD: )

parameter Description
SD:tol Ppm tolerance for MS1, 2, … (e.g., -tol 10.0 5.0 to specify 10.0 and 5.0ppm for MS1 and MS2, respectively) (default: ‘[10.0 10.0]')
SD:min_mass Minimum mass (Da) (default: ‘50.0’)
SD:max_mass Maximum mass (Da) (default: ‘1.0e05’)
SD:min_charge Minimum charge state for MS1 spectra (can be negative for negative mode) (default: ‘1’)
SD:max_charge Maximum charge state for MS1 spectra (can be negative for negative mode) (default: ‘100’)
SD:precursor_charge Charge state of the target precursor. All precursor charges are fixed to this value. This parameter is useful for targeted studies where MS2 spectra are generated from a fixed precursor (e.g., Native-MS). (default: ‘0’) (min: ‘0’)
SD:precursor_mz Target precursor m/z value. This option must be used with the -precursor_charge option. Otherwise, it will be ignored. If the -precursor_charge option is used but this option is not, the precursor m/z value written in MS2 spectra will be used by default. (default: ‘0.0’) (min: ‘0.0’)
SD:min_cos Cosine similarity thresholds between avg. and observed isotope pattern for MS1, 2, … (e.g., -min_cos 0.3 0.6 to specify 0.3 and 0.6 for MS1 and MS2, respectively) (default: ‘[0.85 0.85]')
SD:min_snr Minimum charge SNR (the SNR of the isotope pattern of a specific charge) thresholds for MS1, 2, … (e.g., -min_snr 1.0 0.6 to specify 1.0 and 0.6 for MS1 and MS2, respectively) (default: ‘[1.0 1.0]')
SD:max_qvalue Qvalue thresholds for MS1, 2, … Effective only when FDR estimation is active. (e.g., -max_qvalue 0.1 0.2 to specify 0.1 and 0.2 for MS1 and MS2, respectively) (default: ‘[1.0 1.0]')


Feature tracing parameters: (with prefix ft: )

parameter Description
ft:mass_error_ppm Feature tracing mass ppm tolerance. When negative, MS1 tolerance for mass deconvolution will be used (e.g., 16 ppm is used when -SD:tol 16). (default: ‘-1.0’)
ft:quant_method Method of quantification for mass traces. For LC data ‘area’ is recommended, ‘median’ for direct injection data. ‘max_height’ simply uses the most intense peak in the trace. (default: ‘area’) (valid: ‘area’, ‘median’, ‘max_height’)
ft:min_sample_rate Minimum fraction of scans along the feature trace that must contain a peak. To raise feature detection sensitivity, lower this value close to 0. (default: ‘0.1’)
ft:min_trace_length Minimum expected length of a mass trace (in seconds). Only for MS1 (or minimum MS level in the dataset) feature tracing. For MSn, all traces are kept regardless of this value. (default: ‘10.0’)
ft:max_trace_length Maximum expected length of a mass trace (in seconds). Set to a negative value to disable maximal length check during mass trace detection. (default: ‘-1.0’)
ft:min_cos Cosine similarity threshold between avg. and observed isotope pattern. When negative, MS1 cosine threshold for mass deconvolution will be used (default: ‘-1.0’)
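
As an illustration of how the FD:, SD:, and ft: prefixes appear on the command line, the sketch below combines a few of the parameters described above for a hypothetical Q-TOF run. The paths are placeholders and the numeric values are arbitrary choices for illustration, not recommended settings; it also assumes that FD: and ft: parameters are passed in the same way as the -SD:tol example given above.

```shell
# Collect the arguments first so the full command can be reviewed before it runs.
set -- -in /User/me/data/infile.mzml \
       -out /User/me/out/outfilefeature.tsv \
       -FD:merging_method 1 \
       -SD:tol 10.0 5.0 \
       -SD:min_mass 500 -SD:max_mass 50000 \
       -ft:min_trace_length 5.0

cmd="./FLASHDeconv $*"
echo "$cmd"

# Run only if the executable is present in the current directory.
if [ -x ./FLASHDeconv ]; then
  ./FLASHDeconv "$@"
fi
```

Here -FD:merging_method 1 selects the gaussian averaging described above, and -SD:tol takes one value per MS level (MS1 first, then MS2).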


Isobaric quantification parameters: (with prefix iq: )

parameter Description
iq:type Isobaric Quantitation method used in the experiment. (default: ‘none’) (valid: ‘none’, ‘itraq4plex’, ‘itraq8plex’, ‘tmt10plex’, ‘tmt11plex’, ‘tmt16plex’, ‘tmt18plex’, ‘tmt6plex’)
iq:isotope_correction Enable isotope correction (highly recommended). Note that you need to provide a correct isotope correction matrix otherwise the tool will fail or produce invalid results. (default: ‘true’) (valid: ‘true’, ‘false’)
iq:reporter_mz_tol M/z tolerance in Th from the expected position of reporter ion m/zs. (default: ‘2.0e-03’)
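
A run with isobaric quantification might look like the following sketch. It only uses options listed above (out_quant from the tool parameters and the iq: parameters); the paths are placeholders, and the choice of tmt10plex is an example rather than a default. The isotope correction matrix mentioned under iq:isotope_correction is not shown here, because the way to supply it is not described above.

```shell
# Placeholder paths; out_quant receives the MS2 isobaric quantification results.
set -- -in /User/me/data/infile.mzml \
       -out /User/me/out/outfilefeature.tsv \
       -out_quant /User/me/out/outfile_quant.tsv \
       -iq:type tmt10plex \
       -iq:reporter_mz_tol 0.002

cmd="./FLASHDeconv $*"
echo "$cmd"

# Run only if the executable is present in the current directory.
if [ -x ./FLASHDeconv ]; then
  ./FLASHDeconv "$@"
fi
```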


Tagger parameters: (with prefix tagger: )

parameter Description
tagger:max_tag_count Maximum number of the tags per length (lengths set by -min_length and -max_length options). The tags with different amino acid combinations are all treated separately. E.g., TII, TIL, TLI, and TLL have the same mass differences but are counted as four different tags. (default: ‘0’) (min: ‘0’)
tagger:min_length Minimum length of a tag. Each mass gap contributes to a single length (even if a mass gap is represented by multiple amino acids). (default: ‘4’) (min: ‘3’ max: ‘30’)
tagger:max_length Maximum length of a tag. Each mass gap contributes to a single length (even if a mass gap is represented by multiple amino acids). (default: ‘10’) (min: ‘3’ max: ‘30’)
tagger:flanking_mass_tol Flanking mass tolerance in Da. (default: ‘200.0’)
tagger:max_iso_error_count Maximum isotope error count per tag. (default: ‘0’) (min: ‘0’ max: ‘2’)
tagger:min_matched_aa Minimum number of amino acids in matched proteins, covered by tags. (default: ‘5’)
tagger:fasta Target protein sequence database against which tags will be matched.
tagger:out Tagger output file.
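
The tagger parameters can be combined with a deconvolution run as sketched below. The FASTA and output file names are hypothetical, the .tsv extension of the tagger output is an assumption, and the tag count of 100 is an arbitrary illustrative value (the text above does not state what the default of 0 means in practice).

```shell
# Placeholder paths; the FASTA file is the target protein database for tag matching.
set -- -in /User/me/data/infile.mzml \
       -out /User/me/out/outfilefeature.tsv \
       -tagger:fasta /User/me/db/proteins.fasta \
       -tagger:out /User/me/out/outfile_tags.tsv \
       -tagger:max_tag_count 100 \
       -tagger:min_length 4 \
       -tagger:max_length 10

cmd="./FLASHDeconv $*"
echo "$cmd"

# Run only if the executable is present in the current directory.
if [ -x ./FLASHDeconv ]; then
  ./FLASHDeconv "$@"
fi
```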


Running FLASHDeconv with GUI

The GUI executable is in the [OpenMS path]/bin directory. From the bin directory, type

./FLASHDeconvWizard

The FLASHDeconvWizard window then opens.

From the “LC-MS files” menu you can select (possibly multiple) mzML files to analyze. The selected files are analyzed with the same parameter set.

Then if you go to the “Run FLASHDeconv” menu, you can control all the parameters and output options.

You can see the progress in the log window.

The default output folder is [home directory]/FLASHDeconvOut. You may change this with the Browse button on the right side. Below it are four toggle buttons for the outputs.

If “masses per spectrum” is selected (selected by default), spectrum level deconvolution results (per MS level) are generated (in tsv format). On the command line, this is controlled with the -out_spec option, and users must specify a file name per MS level. In the GUI, simply activating the “masses per spectrum” button sets the output spectrum file name per MS level automatically.

If “mzML” is selected, the deconvolved spectra are generated in mzML format.

“Promex (.ms1ft)” triggers the Promex format output generation (only for MS1), and “TopFD (.msalign,*.feature)” triggers the TopFD format output generation (both msalign and feature formats). Again, these buttons override -out_promex, -out_topFD, and -out_topFD_feature in the command line and automatically set the output file names.

The box below the toggle buttons controls the parameters. By default it shows only basic parameters. If the “Show advanced parameters” toggle button is activated, the advanced parameters will appear.

Lastly, the “Log” menu shows the log from FLASHDeconv, which can be checked during or after a FLASHDeconv run. The command line corresponding to the current parameter selection in the GUI also appears here for reference and future use.

Running FLASHDeconv on command line

The FLASHDeconv executable is in the [OpenMS path]/bin directory.

The mandatory options are -in and -out. FLASHDeconv 2.0 only takes mzML files as input. Basic parameters can be adjusted by the user according to the instrumental setup. To convert raw files to mzML, we recommend using MSConvert with the vendor-provided peak picking methods.

For example, to deconvolve /User/me/data/infile.mzml and write the result to /User/me/out/outfilefeature.tsv, one could run FLASHDeconv by typing the following in the directory where FLASHDeconv is installed.

./FLASHDeconv -in /User/me/data/infile.mzml -out /User/me/out/outfilefeature.tsv
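
Because the FLASHDeconv binary handles one input file per call, several mzML files can be processed with a plain shell loop. The following is a minimal sketch under stated assumptions: the directories are placeholders, the helper out_name and the "_feature.tsv" suffix are naming choices made for this example, and every file is run with default parameters.

```shell
# Map an input mzML path ($1) to a feature tsv path inside the output directory ($2).
out_name() {
  base=$(basename "$1")
  base=${base%.*}               # strip the file extension
  printf '%s/%s_feature.tsv\n' "$2" "$base"
}

in_dir="/User/me/data"          # placeholder input directory
out_dir="/User/me/out"          # placeholder output directory

# Adjust the pattern if your files use a different extension case (e.g., *.mzml).
for f in "$in_dir"/*.mzML; do
  [ -e "$f" ] || continue       # skip the pattern itself when nothing matches
  out=$(out_name "$f" "$out_dir")
  echo "./FLASHDeconv -in $f -out $out"
  # Run only if the executable is present in the current directory.
  if [ -x ./FLASHDeconv ]; then
    ./FLASHDeconv -in "$f" -out "$out"
  fi
done
```

For an input named sample1.mzML, the helper produces sample1_feature.tsv in the output directory, so results from different inputs do not overwrite each other.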

Output files

  • Deconvolved feature file (*.tsv) specified by -out
  • (optional) Deconvolved MSn spectra files (*.tsv) specified by -out_spec
  • (optional) Deconvolved mzML spectra file (*.mzML) specified by -out_mzml
  • (optional) Deconvolved MS1 in promex output format (*.ms1ft) specified by -out_promex
  • (optional) Deconvolved MSn spectra files in topfd output format (*.msalign) specified by -out_topFD
  • (optional) Deconvolved MSn feature files in topfd output format (*.feature) specified by -out_topFD_feature
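
Several optional outputs can be requested in a single run. Note that the option names in the list above differ from those in the tool parameter table (for example -out_spec versus out_spec1, and -out_topFD versus out_msalign1), which may reflect different versions; the sketch below assumes the names from the parameter table, so check the output of FLASHDeconv -helphelp for your installation. The base path is a placeholder.

```shell
base="/User/me/out/outfile"     # placeholder prefix for all output files

# Mandatory options plus the mzML output covering all MS levels.
set -- -in /User/me/data/infile.mzml \
       -out "${base}_feature.tsv" \
       -out_mzml "${base}_deconv.mzML"

# Append the per-MS-level outputs for MS1 and MS2.
# The msalign names end with ms1.msalign / ms2.msalign, as TopPIC GUI expects.
for level in 1 2; do
  set -- "$@" \
    "-out_spec$level" "${base}_ms$level.tsv" \
    "-out_msalign$level" "${base}_ms$level.msalign" \
    "-out_feature$level" "${base}_ms$level.feature"
done

cmd="./FLASHDeconv $*"
echo "$cmd"

# Run only if the executable is present in the current directory.
if [ -x ./FLASHDeconv ]; then
  ./FLASHDeconv "$@"
fi
```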

Example datasets

Mass spectrometry datasets (*.raw and *.mzML) and corresponding results have been uploaded to MassIVE (https://massive.ucsd.edu) and are available under accession number MSV000084001.