PIONEER: Pipeline for Generating High‐Quality Spectral Libraries for DIA‐MS Data

Data‐independent‐acquisition mass spectrometry (DIA‐MS) is a state‐of‐the‐art proteomic technique for high‐throughput identification and quantification of peptides and proteins. Interpretation of DIA‐MS data relies on the use of a spectral library, which is optimally created from data acquired from the same samples in data‐dependent acquisition (DDA) mode. As DIA‐MS quantification relies on the spectral libraries, having a high‐quality, non‐redundant, and comprehensive spectral library is essential. This article describes the major steps for creating a high‐quality spectral library using a combination of multiple complementary search engines. We discuss appropriate strategies to control the false discovery rate for the final spectral library as a result of merging multiple searches. © 2021 The Authors Current Protocols © 2021 Wiley Periodicals LLC.


INTRODUCTION
There has been an exponential increase in the use of mass spectrometry (MS)-based proteomics techniques in the last two decades. Data-dependent acquisition (DDA) and data-independent acquisition (DIA) are the most common MS data acquisition techniques. The DDA mode is generally used in discovery studies with the aim of identifying the maximal number of proteins from a limited number of complex biological samples. By contrast, DIA is more frequently used to quantify proteins by combining the merits of both DDA and targeted acquisition methods such as selected reaction monitoring (SRM; Ludwig et al., 2018; Peterson, Russell, Bailey, Westphall, & Coon, 2012), enabling large-scale and consistent protein quantification. Sequential windowed acquisition of all theoretical fragment ion spectra (SWATH)-MS operates in DIA mode and can accurately quantify thousands of proteins in a reproducible manner (Collins et al., 2017; Poulos et al., 2020). Peptide identification in DIA-MS data requires a spectral library, which is a curated, searchable, and non-redundant collection of peptide tandem mass spectra. These spectra are usually generated by pooling and fractionating cohort samples and running the fractions in DDA mode. The acquired spectra are searched against theoretical spectra generated by in silico digestion of a protein database. The spectral library thus serves as a template, providing information about the underlying protein, peptide sequences, mass-to-charge ratios (m/z) of precursor and fragment ions, precursor and fragment charges, fragment ion types, relative fragment ion intensities, and normalized retention time. By comparing the tandem mass spectra generated in DIA mode with the information in the spectral library, peptides can be reliably identified and accurately quantified (Ludwig et al., 2018).
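To make the contents of a library entry concrete, the fields listed above can be sketched as a small Python data structure. This is only an illustration: the class and field names (LibraryEntry, FragmentIon, top_fragments) are hypothetical and do not correspond to the file format of any particular tool, and all numeric values are invented placeholders.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class FragmentIon:
    ion_type: str              # fragment ion type, e.g. "b" or "y"
    ordinal: int               # position in the ion series (y5 -> 5)
    charge: int                # fragment charge
    mz: float                  # fragment mass-to-charge ratio
    relative_intensity: float  # scaled so the strongest fragment is 1.0


@dataclass
class LibraryEntry:
    protein: str               # underlying protein accession
    peptide: str               # peptide sequence
    precursor_charge: int
    precursor_mz: float
    normalized_rt: float       # normalized retention time
    fragments: List[FragmentIon] = field(default_factory=list)

    def top_fragments(self, n: int) -> List[FragmentIon]:
        """Return the n most intense fragments, strongest first."""
        ranked = sorted(self.fragments, key=lambda f: f.relative_intensity, reverse=True)
        return ranked[:n]


# All values below are invented placeholders, not measured data.
entry = LibraryEntry(
    protein="PROT_A",
    peptide="PEPTIDEK",
    precursor_charge=2,
    precursor_mz=500.0,
    normalized_rt=42.0,
    fragments=[
        FragmentIon("y", 5, 1, 600.0, 0.4),
        FragmentIon("y", 6, 1, 700.0, 1.0),
        FragmentIon("b", 3, 1, 300.0, 0.7),
    ],
)
print([f"{f.ion_type}{f.ordinal}" for f in entry.top_fragments(2)])
```

A DIA analysis tool typically extracts only the most intense fragments of each precursor, which is why the sketch includes a helper that ranks fragments by relative intensity.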
The protocols in this article provide a step-by-step guide to the generation of a high-quality spectral library using a combination of search engines to increase the protein coverage and to control the false discovery rates (FDR) in order to minimize incorrect identifications (Fig. 1). Basic Protocol 1 describes how to perform searches using three complementary open-source search engines, namely X!Tandem (Craig & Beavis, 2004), Comet (Eng, Jahan, & Hoopmann, 2013), and MSGF+ (Kim & Pevzner, 2014). Basic Protocol 2 illustrates how to merge the results from different search engines using PeptideShaker (Vaudel et al., 2015). Basic Protocol 3 presents the final step of spectral library generation, which uses Skyline (MacLean et al., 2010) to create the final library from the merged results. The Alternate Protocol depicts a command-line version for Basic Protocols 1 and 2, which can be used to automate large-scale jobs consisting of multiple fractionated samples. Also, a Support Protocol demonstrates the creation of a concatenated FASTA database containing decoy sequences, and the merging of multiple spectral libraries with retention time differences using iSwathX (Noor et al., 2019). Basic Protocols 1 and 2 and the Support Protocol can be implemented in either Windows or Linux environments, and Basic Protocol 3 and the Alternate Protocol require Windows 10 or later. The final library is compatible with OpenSWATH (Rost et al., 2014) and other common DIA-MS analysis tools such as PeakView®, Skyline (MacLean et al., 2010), Spectronaut (Bruderer et al., 2015), and DIA-NN (Demichev, Messner, Vernardis, Lilley, & Ralser, 2020) when formatted accordingly. All files described in these protocols can be downloaded from the link provided in Internet Resources.

STRATEGIC PLANNING
Peptide identification is the most time-consuming step in Basic Protocol 1 and the Alternate Protocol. There is a range of search engines available for this task, and each one has its advantages and disadvantages. Many studies have reported increased identifications with the use of multiple search engines (Cho et al., 2015; Matthiesen, Prieto, & Beck, 2020; Paulo, 2013; Shteynberg, Nesvizhskii, Moritz, & Deutsch, 2013). While Basic Protocol 1 describes the use of three search engines, namely X!Tandem, Comet, and MSGF+, Basic Protocol 2 shows the merged results of a different combination of three search engines consisting of X!Tandem, Mascot (Perkins, Pappin, Creasy, & Cottrell, 1999), and MSGF+. These two different sets of search engines were used to illustrate the versatility of the protocols. The four search engines used in the two sets were chosen based on complementarity, compute resource requirements, and run time, weighted by the requirement for a commercial license for Mascot. Researchers without a commercial license for Mascot can utilize the other three search engines, which are open source. If computer resources are limited, researchers are encouraged to use either X!Tandem or Comet only for faster computation, whereas stand-alone MSGF+ can be used for more thorough searches. It is advised to use an odd number of search engines, which allows consensus identifications by majority voting. The protein databases should be in FASTA format, and the decoy sequences (preferably reverse sequences) should have a suffix of _Reversed appended to the FASTA header. Also, retention time (RT) peptides (Searle et al., 2018) should be added to the same database before initializing any search.
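The reason an odd number of search engines is convenient for majority voting can be illustrated with a minimal Python sketch. It is not the merging procedure used later in these protocols (PeptideShaker combines engines through posterior error probabilities); the function name majority_vote and the peptide sequences are hypothetical and serve only to show the voting idea for a single spectrum.

```python
from collections import Counter
from typing import Dict, Optional


def majority_vote(calls: Dict[str, Optional[str]]) -> Optional[str]:
    """Return the peptide reported by a strict majority of engines.

    `calls` maps a search engine name to the peptide it assigned to one
    spectrum, or None if that engine made no identification.
    """
    votes = Counter(p for p in calls.values() if p is not None)
    if not votes:
        return None
    peptide, count = votes.most_common(1)[0]
    # A strict majority is counted over all engines, including silent ones.
    return peptide if count > len(calls) / 2 else None


# Invented example calls for one spectrum.
agree = {"X!Tandem": "AAK", "Comet": "AAK", "MSGF+": "AGK"}
split = {"X!Tandem": "AAK", "Comet": "AGK", "MSGF+": None}
print(majority_vote(agree), majority_vote(split))
```

With three engines, two matching calls are enough for a consensus, and a two-way tie between all engines cannot occur, which is the practical benefit of an odd count.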

SEARCHING DDA-MS FILES WITH MULTIPLE SEARCH ENGINES
Raw data are first converted to the Mascot generic format (MGF), which can be generated from proprietary instrument files of various MS vendors such as SCIEX, ThermoFisher, Bruker, and Agilent. The resulting MGF file will be searched against the respective protein database of interest. Sample data (Supp. Data) are provided from HEK293 cell line fractions acquired in DDA mode on a SCIEX TripleTOF 6600 instrument with a 90-min high-performance liquid chromatography (LC) gradient. The data files are in SCIEX wiff format. In this protocol, we use the SearchGUI (Barsnes & Vaudel, 2018) tool, which provides an easy-to-use graphical user interface (GUI) for searching using multiple search engines. It supports the following search engines: X!Tandem, MyriMatch (Tabb, Fernando, & Chambers, 2007), MS Amanda (Dorfer et al., 2014), MS-GF+ (Kim & Pevzner, 2014), Comet, Tide (Diament & Noble, 2011), and Andromeda (Cox et al., 2011). Here, X!Tandem, MS-GF+, and Comet are used as the three default search engines, and others can be selected if required. The protein database used in this study consists of UniProt (UniProt, 2019) canonical protein sequences appended with decoys and RT peptides.

Necessary Resources
Hardware
A computer with Windows 10 or later, or Ubuntu, preferably a workstation
A minimum of 16 GB RAM

Software (download the latest versions from the links provided in Internet Resources)
MSConvert (ProteoWizard)
SearchGUI
Java version 8.0 or higher

Input files
Spectrum raw files such as wiff, raw, etc.
Protein database in FASTA format (with decoys appended)
Parameter file (par)

Manda et al.

Figure 2
MSConvert main interface to add the input wiff files, set the parameters and convert to MGF format.
Converting raw files to MGF format
1. Open the MSConvertGUI and change the default settings to vendor-specific as displayed in Figure 2. Browse and locate the folder with the 10 HEK wiff files (../HEK_suppdata/RAW_files/). The default output directory will be the same directory. Change it if you want a different location.
The MGF format is a generic format accepted by almost all search engines. Because conversion is a time-consuming step, users are advised to convert all files beforehand.
Click on the "Spectrum Matching" tab to verify that the settings and the location of the FASTA database (../HEK_suppdata/FASTA_database/..) are correct (Fig. 4). Click "Ok" to return to the main screen.
7. Choose an output folder for the result files and click on "Start the Search." The search should be completed in about an hour or more depending on the system's memory and available cores. Mascot outputs a dat format, X!Tandem outputs XML, Comet outputs pepXML, and MSGF+ outputs mzIdentML (mzid). With the 10 input files, three search engines yield 30 output files.

MERGING RESULTS FROM MULTIPLE SEARCH ENGINES
Here, we describe the merging of results from different searches into a single mzid file. This can be achieved by using PeptideShaker software (Vaudel et al., 2015). PeptideShaker reanalyzes the results and converts scores of different search engines to posterior error probability values. These values are used to combine different libraries internally. It also handles the FDR at various levels of interest using the target-decoy approach. In the current approach, we combine the results from a different set of three search engines, namely X!Tandem, Mascot, and MSGF+, after applying 1% FDR at the peptide-spectrum match (PSM), peptide, and protein levels for the final results. Although the tool is available with both a GUI and a command-line interface (CLI), we describe only the GUI here. Procedures for the CLI can be found in the Alternate Protocol.
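The target-decoy idea behind a 1% FDR threshold can be sketched in a few lines of Python. This is a generic textbook estimate (decoy hits divided by target hits above a score threshold, converted to q-values), not PeptideShaker's own implementation, which works on posterior error probabilities. The function names q_values and accept, and the example scores, are hypothetical.

```python
from typing import List, Tuple


def q_values(psms: List[Tuple[float, bool]]) -> List[Tuple[float, bool, float]]:
    """Compute q-values for PSMs given as (score, is_decoy); higher score is better.

    Returns (score, is_decoy, q_value) tuples sorted by descending score.
    """
    ranked = sorted(psms, key=lambda p: p[0], reverse=True)
    fdrs = []
    targets = decoys = 0
    for _, is_decoy in ranked:
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        # Estimated FDR when accepting everything down to this score.
        fdrs.append(decoys / max(targets, 1))
    # q-value: the smallest FDR at this threshold or any more permissive one.
    qs = []
    running_min = float("inf")
    for fdr in reversed(fdrs):
        running_min = min(running_min, fdr)
        qs.append(running_min)
    qs.reverse()
    return [(score, is_decoy, q) for (score, is_decoy), q in zip(ranked, qs)]


def accept(psms: List[Tuple[float, bool]], threshold: float = 0.01) -> List[float]:
    """Return scores of target PSMs whose q-value is within the threshold."""
    return [s for s, is_decoy, q in q_values(psms) if not is_decoy and q <= threshold]


# Invented scores: four target hits and one decoy hit.
example = [(10.0, False), (9.0, False), (8.0, False), (7.0, True), (6.0, False)]
print(accept(example))
```

In this toy example the first decoy appears at rank four, so only the three higher-scoring target PSMs pass a 1% cut-off. The same calculation can be repeated at the peptide and protein levels by first collapsing PSMs to their best-scoring representative.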

Necessary Resources
Hardware
A computer with Windows 10 or later or Ubuntu, preferably a workstation
A minimum of 16 GB RAM

2. Fill the required fields in the "PeptideShaker-New Project" module of the software (Fig. 6).
3. Specify a project name in the line marked "Project Reference" (Fig. 6).
4. Specify a sample name in the line marked "Sample Name" (Fig. 6).
5. Under the "Input Files" box, browse and locate the folder with the identification files (../HEK_suppdata/SearchEngineResults/..) (Fig. 6) in the line marked "Identification File(s)." These are the search result files from the different search engines. In this case, we have 30 result files from three search engines.


Figure 6
PeptideShaker "New Project" module to provide project details, input files, and search parameters.
7. Click on Browse in the line marked "Database File" to set the identification parameters used during the searching/identification (Basic Protocol 1, step 6). This will lead to another module, "Identification Settings." The parameters can be saved as par format for future use.
To change or set any of the individual parameters, follow steps 9-13. The settings are the same as described earlier in Basic Protocol 1 for SearchGUI (Fig. 4).
Database: Select the same FASTA database as in Basic Protocol 1.
Modifications: Select the same "Fixed Modifications" and "Variable Modifications" as in Basic Protocol 1, step 6.
Protease & Fragmentation: Select the same as in Basic Protocol 1, step 6.
11. In addition to the "Spectrum Matching" settings, click "Show Advanced Settings" to further specify spectrum and precursor/fragment settings (Fig. 7).


Figure 7
PeptideShaker "Identification Settings" module to provide "Spectrum Matching" parameters and "Advanced Settings." The parameter file can be imported by "Import from File."
14. Go back to the "PeptideShaker-New Project" module and click "Load Data!". This will load the input files, start the data processing, and perform the merging and filtering of the data.
Output
15. Once completed, PeptideShaker will show the list of identified proteins and peptides, their precursor and fragment level spectra, and other spectral information in different tabs in the GUI (Fig. 9).
16. Save the PeptideShaker project in Compomics Peptide Shaker Format (cpsx). The already saved results can be found at (../HEK_suppdata/PeptideShakerResults/). These files can be reloaded in PeptideShaker to visualize the results at any later time (Fig. 10).
17. In PeptideShaker, click "Export Project" to export the results in mzid format (../HEK_suppdata/PeptideShakerResults/..) (Fig. 10). This file will be used as an input in Basic Protocol 3 to generate the final spectral library.

Figure 8
PeptideShaker "Advanced Settings" in "Identification Settings" to set (A) "Import Filters," which allow setting the minimum and maximum peptide length, missed cleavages, and isotopes of the peptide to include in the library, and (B) "Validation Levels," which allow setting the False Discovery Rate (FDR) at all protein, peptide, and PSM levels.

Figure 9
PeptideShaker results interface showing detailed results at protein, peptide, and PSM level. Protein coverage, peptide confidence, and fragment level spectra can be visualized in the main interface.

Figure 10
PeptideShaker interface to save and export the results. This module allows saving the project in cpsx and zipped format. The merged results can be exported in mzid format, which is compatible with the PRIDE repository.

CREATING SPECTRAL LIBRARIES FROM MERGED RESULTS
The final step of the procedure consists of generating the spectral library from the merged mzid file from Basic Protocol 2. Here, we use the Skyline (MacLean et al., 2010) interface to generate the library from the output of PeptideShaker. Skyline provides a detailed set of parameters for precursor and fragment ions, together with the charges and modifications to be included in the library. Moreover, it has a module to calibrate the retention time using standard RT peptides, either pre-defined or set by the user. The final spectral library with the calibrated retention time can be exported from Skyline to different formats, and can be directly incorporated into a range of DIA-MS data analysis tools.

Necessary Resources
Hardware
A computer with Windows 10 or later, preferably a workstation
A minimum of 16 GB RAM

Software
Skyline (download the latest version from the link provided in Internet Resources)

Input files
MGF files and location
Merged result file from PeptideShaker (Basic Protocol 2)
1. Open Skyline and create a new document from the "File" menu.
2. Before importing and building the library from the mzid file from Basic Protocol 2, "Peptide" and "Transition" settings need to be set using the "Settings" menu, which defines what precursor and fragment ions should be included in the library.
The peptide settings specified below are specific to the example provided in this study. Based on these settings, researchers are advised to adjust the settings for their projects accordingly.
The transition settings specified below are specific to acquisition settings for the SCIEX instrument in this study. Researchers are advised to adjust the settings according to the acquisition method in their experiment.
Internal standard type None

Figure 11
Skyline module for building the library from PeptideShaker results. The confidence score cut-off and standard RT peptides can be selected in this module.
5. The next step is to build the library. For this, go to the "Settings" > "Peptide Settings" > "Build Library" module (Fig. 11) and fill it as follows:
"Name": Provide the name for the library.
"Output Path": Set the output path where the library files are saved.
"Data source": Set the data source as "Files."
"Cut-off score": Set the cut-off score as 0.99.
Uncheck "Keep redundant library."

6. After filling these settings, press "Next." In the "Build Library" module, click "Add Files" and import the mzid file from the PeptideShaker output files (../HEK_suppdata/PeptideShakerResults/..). Then, click "Finish."
7. Skyline will start reading and importing the mzid file, and the status can be seen at the bottom left of the Skyline interface.
8. After it finishes reading the file, the "Spectral Library Explorer" module will appear. This module can also be accessed from "View" > "Spectral Libraries." "Spectral Library Explorer" also shows the list of those modifications that are found in the peptides in addition to those already defined in Table 1 using a separate module called "Add Modifications." In "Spectral Library Explorer," each peptide and its corresponding spectrum can be visualized (Fig. 12). It will generate the spectral library in blib format (../HEK_suppdata/Skyline/..).

For simplicity, we have selected unmodified peptides only [excluding Carbamidomethylation (C)]. Researchers can select the modifications of their interest based on the experimental design and biological question of interest.

Figure 12
Skyline module for spectral library explorer. Using this explorer, fragment spectra for each peptide in each library can be visualized.
9. To export this library from Skyline, these peptides would need to be added to the target list in Skyline. To add the peptides to the target list, click "Associate Proteins" and "Add All" in "Spectral Library Explorer" module. During the process of adding peptides to the target list, Skyline will notify if any peptides belong to more than one protein. Click "Do not Add" and click "OK" to add all the unique peptides and associated proteins in the target list. The number of proteins, peptides, precursors, and transitions can be visualized at the bottom right of the Skyline interface (Fig. 13).
10. To perform the retention time calibration and generate indexed retention time (iRT) peptides, go to the "Settings" > "Peptide Settings" > "Prediction" tab. In the "Retention Time Predictor" module, click on the small calculator symbol and click "Add" to add a new calculator to predict the retention time of the peptides (Fig. 14A). In the "Edit iRT Calculator" module, fill as follows:
Name: Provide the name for the calculator.
iRT database: Provide the path where the iRT database with predicted retention time values is saved.
iRT standards: Either select from the given set of standard peptides or click on "Add" to add the list of your standard peptides.
Other iRT values: Add all the library peptides in the calculator to retrieve their iRTs by clicking "Add" and "Add Spectral Library." After adding all the peptides, a dialog box will appear. It shows that the peptides have been added successfully along with the regression model. A plot of actual and predicted retention times can be visualized by clicking "Success" in this dialog box. Click "Ok" to finish creating the calculator. The resulting calculator in irtdb format will also be saved in the same folder (../HEK_suppdata/Skyline/..).
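The regression model mentioned in this step can be pictured as a straight-line fit between the measured retention times of the standard peptides and their reference iRT values, which is then applied to every other library peptide. The Python sketch below illustrates that idea only; it is not Skyline's implementation (which may, for example, handle outliers differently), and the function names, peptide labels, and numbers are all hypothetical.

```python
from typing import Dict, Sequence, Tuple


def fit_line(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need at least two paired points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def to_irt(measured_rt: Dict[str, float], slope: float, intercept: float) -> Dict[str, float]:
    """Convert measured retention times (min) to the iRT scale."""
    return {pep: slope * rt + intercept for pep, rt in measured_rt.items()}


# Invented standards: measured RT in minutes and their reference iRT values.
standard_rt = [10.0, 20.0, 30.0]
standard_irt = [0.0, 50.0, 100.0]
slope, intercept = fit_line(standard_rt, standard_irt)

# Invented library peptides with measured retention times.
library_irt = to_irt({"PEPTIDE_A": 25.0, "PEPTIDE_B": 12.0}, slope, intercept)
print(slope, intercept, library_irt)
```

Because the iRT scale is anchored to the standards rather than to a particular gradient, libraries acquired under different LC conditions can be compared once each has been calibrated in this way.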


Figure 13
Skyline module for the target list. Proteins, peptides, and transitions added from the library can be visualized here along with the spectra.
11. To retrieve these iRTs, the last step is to add a predictor. For this, go to the "Settings" > "Peptide Settings" > "Prediction" tab. In the "Retention Time Predictor" module (Fig. 14B), click on the drop-down menu and click "Add" and fill as follows:
Name: Provide the name for the predictor.
Calculator: Select the calculator created in the previous step.
13. This library can now be used with OpenSWATH or converted and used in any DDA-MS library-based DIA-MS data analysis software, including but not limited to PeakView, Spectronaut, Skyline, and DIA-NN (Demichev et al., 2020).

USING COMMAND-LINE INTERFACE (CLI) FOR AUTOMATING TASKS
Users can perform all of the steps in Basic Protocols 1 and 2 using the CLI of MSConvert, SearchGUI, and PeptideShaker, which requires some basic knowledge of shell scripting. This protocol is recommended when researchers have a large number of DDA-MS raw files and aim to automate the process of library generation. Once all the parameters and settings are optimized by the users, they can be used in the CLI.

Necessary Resources
Hardware
Same as in Basic Protocols 1 and 2

Figure 15
Preview of the columns to be included in the library file while exporting from Skyline using the "Export Report" module of the software.

Access to CLI on Windows or Linux
PeptideShaker project file (cpsx)
Converting raw files to MGF format
1. On a Windows machine, open "Windows" > "Command Prompt" and navigate to the folder containing the installation of MSConvert, usually at C:\Program Files (x86)\ProteoWizard 3.0.18351 64-bit\ on a 64-bit machine.
$ cd C:\Program Files (x86)\ProteoWizard 3.0.18351 64-bit\
2. Run the following command in the folder (on a Windows machine), assuming the wiff files are in the location HEK_suppdata/RAW_files/ and the desired output folder is HEK_suppdata/RAW_files/.

Figure 16
Preview of the concatenated target/decoy database used for performing searches, displaying the type and version of database and number of sequences stored in the database.

Searching using SearchGUI
3. SearchGUI contains a built-in CLI called SearchCLI. To run SearchCLI, navigate to the folder containing the SearchGUI installation. The following commands assume that all converted MGF files are in the location HEK_suppdata/RAW_files/. The parameter file is the same as used in Basic Protocols 1 and 2.

Run the command without any arguments to access additional parameters or to choose different search engines. These commands can be run on either a Windows or Linux installation of SearchGUI.
Merging search results using PeptideShakerCLI
4. Assuming the MGF files are located in HEK_suppdata/RAW_files/ and search results from an earlier step are in HEK_suppdata/PeptideShakerResults/, navigate to the folder containing the PeptideShaker installation and type the following:
$ java -cp PeptideShaker-X.Y.Z.jar eu.isas.peptideshaker.cmd.PeptideShakerCLI -experiment HEK -sample HEK_samples -replicate 0 -identification_files HEK_suppdata/PeptideShakerResults/ -spectrum_files HEK_suppdata/RAW_files/ -out HEK_suppdata/PeptideShakerResults/ThreeSearchEnginePepShaker.cpsx -log HEK_suppdata/srllog -db HEK_suppdata/FASTA_database/20180608_Uniprot_Canonical_RT_concat_target_decoy.Fasta -id_params HEK_suppdata/threesearchengine_50ppm.par
The -experiment, -sample, and -replicate arguments are free-text fields, which can be set according to the user's experiment. -log <dest> is recommended, as it generates a log file of all commands executed and helps in debugging. This command generates a PeptideShaker project file with a cpsx extension, which can be loaded into the GUI for any further analysis. These commands can be run on either a Windows or Linux installation of PeptideShaker.
Exporting as mzid
5. The cpsx project file can be used with the PeptideShaker MzidCLI to export the results as mzid, which will contain results from all the searches conducted. This file can be further used to create the final spectral library. Navigate to the folder with the PeptideShaker installation and run the following command:
$ java -cp PeptideShaker-X.Y.Z.jar eu.isas.peptideshaker.cmd.MzidCLI -in HEK_suppdata/PeptideShakerResults/ThreeSearchEnginePepShaker.cpsx -output_file HEK_suppdata/PeptideShakerResults/ThreeSearchEnginePepShaker.mzid -contact_first_name YourFirstName -contact_last_name YourLastName -contact_email yourname@university.edu -contact_address "Your address" -organization_name OrganizationName -organization_email "yourname@university.edu" -organization_address "Your Address"
This will generate a single mzid file from all the search results. Replace the personal details with ones pertaining to your experiment. The mzid file can then be used to follow Basic Protocol 3 as described earlier.
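When many sample batches have to be merged, the PeptideShakerCLI call from step 4 can be assembled programmatically instead of being typed by hand. The Python sketch below only builds the argument list, using the flags exactly as they appear in step 4. The function name peptideshaker_merge_command is hypothetical, the jar name keeps the X.Y.Z placeholder, and the paths are the example locations used in this protocol; adjust all of them to the local installation.

```python
import shlex
from typing import List


def peptideshaker_merge_command(
    jar: str,
    identification_dir: str,
    spectrum_dir: str,
    out_cpsx: str,
    fasta: str,
    id_params: str,
    log_dir: str,
    experiment: str = "HEK",
    sample: str = "HEK_samples",
    replicate: int = 0,
) -> List[str]:
    """Build the PeptideShakerCLI argument list (flags as given in step 4)."""
    return [
        "java", "-cp", jar, "eu.isas.peptideshaker.cmd.PeptideShakerCLI",
        "-experiment", experiment,
        "-sample", sample,
        "-replicate", str(replicate),
        "-identification_files", identification_dir,
        "-spectrum_files", spectrum_dir,
        "-out", out_cpsx,
        "-log", log_dir,
        "-db", fasta,
        "-id_params", id_params,
    ]


cmd = peptideshaker_merge_command(
    jar="PeptideShaker-X.Y.Z.jar",  # placeholder version, as in the protocol
    identification_dir="HEK_suppdata/PeptideShakerResults/",
    spectrum_dir="HEK_suppdata/RAW_files/",
    out_cpsx="HEK_suppdata/PeptideShakerResults/ThreeSearchEnginePepShaker.cpsx",
    fasta="HEK_suppdata/FASTA_database/20180608_Uniprot_Canonical_RT_concat_target_decoy.Fasta",
    id_params="HEK_suppdata/threesearchengine_50ppm.par",
    log_dir="HEK_suppdata/srllog",
)
# Print the command for inspection; nothing is executed here.
print(shlex.join(cmd))
```

Once the printed command looks correct, the same list can be passed to subprocess.run(cmd, check=True) inside a loop over batches. Keeping the arguments as a list avoids shell quoting problems with paths that contain spaces.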

CREATING CONCATENATED FASTA FILES
SearchGUI provides a quick way to create a concatenated target/decoy database, either through the GUI or on the command line with FastaCLI.

Necessary Resources
Hardware
Same as in Basic Protocols 1 and 2

Software (Converting)
SearchGUI
Database (FASTA format)
1. To create a combined FASTA file using the GUI, open the SearchGUI "Spectrum Matching" settings as explained in Basic Protocol 1 (Fig. 4).
2. Click "Edit" on the "Database (FASTA)" field and select the FASTA database file of interest. A prompt will appear: "The selected FASTA file does not seem to contain decoy sequences. Add decoys?" Click "Yes."
3. Decoys will be appended, and database details such as "Name," "Species," "Type(s)," and "Version" are displayed as shown in Figure 16.
To create the combined FASTA in CLI, navigate to the SearchGUI installation folder and execute:
$ java -cp SearchGUI-X.Y.Z.jar eu.isas.searchgui.cmd.FastaCLI -in NameofFastafile -decoy
The output will be NameofFastafile_concatenated_target_decoy.fasta. This file can be used for all analyses in the Basic Protocols and the Alternate Protocol.
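For readers who want to see what a concatenated target/decoy database contains, the Python sketch below appends a reversed decoy after every target sequence. It is an illustration of the concept, not the code SearchGUI runs. The function names are hypothetical, and placing the _Reversed suffix on the first whitespace-delimited token of the header is an assumption; check the header layout of a database produced by SearchGUI before relying on it.

```python
from typing import Iterable, Iterator, List, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA-formatted lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def with_reversed_decoys(
    records: Iterable[Tuple[str, str]], suffix: str = "_Reversed"
) -> List[Tuple[str, str]]:
    """Return targets interleaved with reversed-sequence decoys.

    Assumption: the suffix is attached to the accession, taken here as the
    first whitespace-delimited token of the header.
    """
    out = []
    for header, seq in records:
        out.append((header, seq))
        accession, _, rest = header.partition(" ")
        decoy_header = accession + suffix + (" " + rest if rest else "")
        out.append((decoy_header, seq[::-1]))
    return out


# Invented two-protein database for illustration.
fasta_lines = [">P1 test protein", "MKT", "AAR", ">P2", "GGK"]
for header, seq in with_reversed_decoys(read_fasta(fasta_lines)):
    print(">" + header)
    print(seq)
```

Reversing each sequence keeps the amino acid composition and length of the target, so decoy matches give a reasonable estimate of how often random matches occur in the target database.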

Background Information
DDA is a method where a fixed number of precursor ions are selected on the basis of abundance and analyzed by tandem MS, while DIA is an alternative approach that continuously acquires fragment-ion spectra in an unbiased fashion. SWATH-MS is a state-of-the-art DIA method, which allows fast mass spectrometric conversion of small amounts of tissue into a single, permanent digital file representing the quantitative proteome of a biological sample (Guo et al., 2015; Ludwig et al., 2018). This technique uses peptide-centric scoring for large-scale identification and quantification of peptides and proteins on the basis of robustness, quantitative characteristics, and a high degree of reproducibility (Gillet et al., 2012; Ludwig et al., 2018).
A variety of strategies have been developed to analyze SWATH-MS data, including both spectrum-centric and peptide-centric methods (Ting et al., 2015). The peptide-centric approach relies on a high-quality spectral library. The spectral library contains information on the m/z and LC retention times for all representative peptide features in the samples (Ludwig et al., 2018). The generation of these libraries usually requires acquisition of MS data in DDA mode under the same LC conditions as in DIA mode. The libraries are generated from pooled and fractionated samples or synthetic analogs of peptides of interest. A wide range of chromatographic chemistries are available for fractionation of pooled samples (Yeung et al., 2020). Spectral libraries can be sample-specific or generated using publicly available resources. The SWATHAtlas (http://www.swathatlas.org) is a publicly available resource that contains published spectral libraries for several species including human, E. coli, and yeast. Since the instrument and LC conditions differ, the ideal practice is to generate a sample-specific library (Ludwig et al., 2018). A drawback of the spectral library approach is that peptides can only be identified when they are present in the library. As alternatives, spectral library-free or spectrum-centric approaches have been developed, which can generate libraries from the DIA-MS data without the need for any sample-specific libraries (Demichev et al., 2020; Tiwary et al., 2019; Tsou et al., 2015; Yang et al., 2020).
A typical procedure of spectral library generation can be broadly classified into four main steps: (1) searching the raw spectra against a database of interest, (2) merging of results from different searches and statistical validation, (3) retrieving confidently identified spectra and creating a consensus library, and (4) further quality filtering on the library. For large-scale studies, libraries generated on different mass spectrometers under different LC conditions, and from different biological samples, tend to have differences in their retention times. Such libraries can be merged by iSwathX (Noor, Mohamedali, & Ranganathan, 2020), which creates a single unified library with retention time alignment. Over the past years, several software tools have been developed for the generation of spectral libraries, such as SpectraST (Lam et al., 2007), X!Hunter (Craig, Cortens, Fenyo, & Beavis, 2006), Bibliospec (Frewen, Merrihew, Wu, Noble, & MacCoss, 2006), Pepitome (Dasari et al., 2012), and MSPepSearch (https://chemdata.nist.gov/). These are mostly built for DDA-MS data analysis. Similar tools for DIA-MS were lacking before Schubert et al. (2015) provided detailed steps to generate spectral libraries using open-source tools, including those of Deutsch et al. (2010), ProteoWizard, and OpenMS (Sturm et al., 2008). Some of these tools and steps are tedious and hard to execute without programming knowledge. Although commercial tools are user-friendly, they provide fewer controls over the workflow and require licenses. ProteinPilot-PeakView (Shilov et al., 2007) and Pulsar-Spectronaut (Bruderer et al., 2015) are two popular commercial products. We present here an easy-to-use procedure consisting of three protocols, which generates a high-quality spectral library from multiple searches using a variety of open-source software packages.
These protocols have been fully automated at the ACRF International Centre for the Proteome of Human Cancer (ProCan®), which is capable of processing 10,000 tumor samples per year with six mass spectrometers operating in concert (Poulos et al., 2020; Tully et al., 2019) to generate sample-specific high-quality spectral libraries.

Critical Parameters
Parameters that have a significant impact on the overall performance and run time include the selection of candidate search engines, search parameters, and the number of fractions used for search. The selection of search engines is crucial because each search engine provides its own unique set of identification features. Studies in the past have compared different search engines (Cho et al., 2015; Matthiesen et al., 2020; Searle et al., 2018; Shao & Lam, 2017) to identify the optimal combinations. Our experience and published studies suggest that adding one more search engine leads to an increase in identifications (5%-10%), albeit with an increased accumulation of false positives (Barkovits et al., 2020; Jones, Siepen, Hubbard, & Paton, 2009; Tu et al., 2015). Researchers are thus advised to use multiple search engines with caution. The search parameters are important because they specify parent and fragment ion mass tolerance, enzymes, number of missed cleavages, and the size of the protein database. A larger number of fractions facilitates a deeper coverage of the proteome (Mertins et al., 2018; Yeung et al., 2020).

Troubleshooting
See Table 3 for problems that may arise with these protocols, along with the possible causes and solutions.

Time Considerations
The most time-consuming step is the conversion from raw files to the MGF format, which can take around 12-15 min per file. The search time depends on different parameters such as modifications selected, size of the database, and size of the acquired data file. Ranked from fastest to slowest, the search engines are Comet, X!Tandem, Mascot, and MSGF+. A typical search of a raw file against the human proteome database of about 20,000 proteins with decoys takes about 10 min. The merging step of Basic Protocol 2 usually takes about 20 min to 1 hr depending on the number of files. The final spectral library generation takes around 30 min.