Peptide microarrays (peptide arrays) have increasingly become an important research tool for studying protein detection, profiling, and protein–protein interactions, and they have the potential to foster high-throughput protein analysis as DNA arrays did for genomics research a decade ago. Recently, technologies have emerged that allow flexible synthesis of high-density peptide arrays based on specific application needs (e.g., phosphopeptide microarrays). To fully unleash the power of this promising research tool, significant efforts are required to develop computational and informatics resources that facilitate the experimental design and data analysis for a wide range of peptide array-based applications. The design of peptide arrays is inherently more complex than that of DNA arrays. We herein introduce µPepArray Pro, a Web-based general-purpose peptide array design program. µPepArray Pro features strong content design capabilities and maximized user control. The program suits a diversity of design tasks, works with a variety of peptide array configurations, and is highly expandable: new functionalities can be developed and added to µPepArray Pro with relative ease.
µPepArray Pro is accessible at
The design of a peptide array is the process of determining a proper way to fill in – using a set of peptides – the spatially arranged spots (or features) in a pre-defined array configuration, in order to accomplish the objectives of a peptide array based study. In this application note, we describe the use of µPepArray Pro, the first general-purpose peptide array design tool. µPepArray Pro is a Web-based program capable of guiding the researcher through the entire peptide array design process – from content design (task-based seed peptide selection, derived peptide generation, and peptide sequence evaluation) through array layout writing. µPepArray Pro is developed to suit a diversity of design needs and work with a variety of peptide array configurations, and it is highly expandable – new functionalities can be developed and added into each element of the array design process with relative ease.
Peptide Array Design Process
The aim of an array design process is to design a set of probes, and devise a proper way to fill in the spatially arranged spots (or features) using these probes in a pre-defined array configuration, to accomplish the objectives of an array-based study.
Studies involving peptide array experiments are often “hypothesis-driven” in nature. In these studies, researchers are intensely interested in probe-level details: what the peptide sequence is and how it was generated (is it part of a natural protein or is it an artificial peptide, and in the latter case, do similar peptides occur in any natural proteins?). Therefore, in peptide array design, the researcher needs to sit in the “driver’s seat” and be given much more control over the content design process than in oligonucleotide array design. A proficient peptide array design program must allow ample flexibility for the user to compose the lists of peptides freely as he or she desires; meanwhile, it should provide adequate assistance with the functions he or she needs to carry out the array design efficiently (e.g., protein database lookup and derived peptide writing, i.e., generating sequences related to a given peptide as specified by the user). Additionally, a proficient peptide array design program should provide an undesired sequence pattern filter to help the researcher remove peptides that are problematic during array synthesis.
µPepArray Pro Features
µPepArray Pro is a fully functional peptide array design program developed to suit a diversity of design needs and work with a variety of peptide array configurations, although it was developed with the µParaflo® microfluidic chip platform [1] as the primary target platform. Conceptually, a µPepArray Pro-based peptide array design process consists of two components – the content design component and the array layout component. The content design component further comprises two elements: (i) constructing task-based peptide groups (TBPGs) and (ii) assembling the array-level peptide list (ALPL).
Figure 1 – A flow chart of peptide array design process using µPepArray Pro
In a peptide array study, the researcher often desires to carry out multiple, relatively independent tasks on a single array experiment. For example, in a B-cell epitope screening study, the researcher might want to examine (with a tiling design paradigm) two different segments of the antigen protein. As another example, in a kinase substrate specificity study, the researcher might want to include the reported substrate peptides (and some peptide sequences derived from them) of several different kinases in a single peptide array, and perform binding experiments using each kinase to check its interactions with its reported substrates as well as its cross-reactions with the reported substrates of other kinases. In µPepArray Pro, the term task-based peptide group (TBPG) is used to refer to the group of peptides designed for a single task. In the example of B-cell epitope screening, the tiling peptides for each of the two segments constitute a TBPG. Similarly, in the kinase substrate specificity study example, the known substrates for an individual kinase and the peptides derived from these known substrates constitute a TBPG.
With one or more TBPGs constructed, the user would assemble an array-level peptide list (ALPL) – the list of all peptides (with replicate information) to be included in a peptide array. The number of peptides included in an ALPL is constrained by the capacity of the selected array configuration, which defines the length and width of the array (in numbers of spots, e.g., 128 × 31) as well as the locations of spots reserved for quality control purposes.
Finally, the array layout component of µPepArray Pro is responsible for making the layout of the array, conforming to the specified array configuration according to specifications provided by the user.
Constructing a TBPG
The TBPG is a basic unit of the content design component, and TBPG construction is the most involved element of the peptide array design process with µPepArray Pro. The construction of a TBPG consists of three steps: (1) seed peptide selection; (2) derived peptide generation; and (3) peptide assessment and filtering. Considering the diversity of content design needs researchers have, each of these steps is developed to be easily expandable, i.e., new functionalities can be developed and easily added into each of these steps to address new requirements raised by users.
After logging in, the user can choose the option Make/View TBPGs to arrive at a page where he or she can start to construct a new TBPG, or look up a previously constructed TBPG. The user then chooses the Construct a new TBPG option, and he or she is guided through the three-step TBPG construction process.
(1) Selecting Seed Peptides
The user first formulates a list of the peptides that are most essential to the design task; these are termed seed peptides. For example, in a B-cell epitope screening task, the user might choose to use a tiling design to cover a segment of the antigen protein using a number of tiling peptides. These tiling peptides would be considered seed peptides for this design task. In a kinase substrate specificity study, the user might choose to include all known substrate sequences of the kinase in the list of seed peptides.
Currently, four modes of creating seed peptide lists are implemented in µPepArray Pro: (i) unguided mode, in which the user prepares the seed peptide list manually; (ii) site picking mode, in which the user picks a number of peptides truncated from a protein sequence; (iii) tiling mode, in which tiling peptides are made along a protein sequence in a region specified by the user; and (iv) peptide database query mode, in which the user obtains a list of peptides by querying a peptide database. At present, querying of one specific peptide database, PepCyber:P~Pep [2], has been implemented.
If the user chooses the Unguided mode for selecting seed peptides, he or she is prompted to provide a name and (optionally) a textual description of the seed peptide list. The list of seed peptides can either be uploaded as a comma-separated value (CSV) file or be pasted into the text area provided. The data file (or pasted text) should contain no header row and include two columns: (a) the sequence of the seed peptide; and (b) a brief description of the seed peptide.
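The two-column, header-less input described above can be read with a few lines of Python. This is only a minimal sketch of the file format, not µPepArray Pro's own parser; the function name `parse_seed_peptides` and the example peptides are made up for illustration, and upper-casing the sequences is an assumption.

```python
import csv
import io


def parse_seed_peptides(text):
    """Parse header-less CSV text with columns (sequence, description)."""
    peptides = []
    for row in csv.reader(io.StringIO(text)):
        if not row or not row[0].strip():
            continue  # skip blank lines
        sequence = row[0].strip().upper()  # assumption: one-letter codes, upper case
        description = row[1].strip() if len(row) > 1 else ""
        peptides.append((sequence, description))
    return peptides


# Illustrative pasted text; the sequences and descriptions are hypothetical.
pasted = "RRASVA,hypothetical substrate A\nkkalrrqetvdal,hypothetical substrate B\n"
seeds = parse_seed_peptides(pasted)
```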
If the user chooses the Site picking mode for selecting seed peptides, he or she is prompted to paste a protein sequence and (optionally) provide a description of the protein sequence pasted. Alternatively, the user can provide the Swiss-Prot accession of a protein, and the sequence and description of the protein are automatically retrieved from the Swiss-Prot database. Next, the user can select a number of peptides truncated from the protein sequence as seed peptides.
If the user chooses the Tiling mode for selecting seed peptides, he or she is prompted to provide a protein sequence, or provide the Swiss-Prot accession of a protein, as in the Site picking mode. Then he or she can specify the parameters for the tiling design. These parameters include the starting and ending positions, the peptide length, and the shift between neighboring peptides. A list of tiling peptides would be generated, and the user can choose to include any number of peptides in the list as seed peptides.
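The tiling parameters listed above (start, end, peptide length, shift) map naturally onto a sliding window. The sketch below assumes 1-based inclusive positions and drops any trailing window that would run past the end position; how µPepArray Pro handles that edge case is not stated in this note. The function name and the protein sequence are hypothetical.

```python
def tile_peptides(protein, start, end, length, shift):
    """Return (position, peptide) pairs tiling protein[start..end].

    Positions are 1-based and inclusive. Windows that would extend past
    `end` are not emitted (an assumption of this sketch).
    """
    if length <= 0 or shift <= 0:
        raise ValueError("length and shift must be positive")
    if not 1 <= start <= end <= len(protein):
        raise ValueError("start/end must lie within the protein sequence")
    tiles = []
    pos = start
    while pos + length - 1 <= end:
        tiles.append((pos, protein[pos - 1:pos - 1 + length]))
        pos += shift
    return tiles


protein = "MKTAYIAKQRQISFVKSHFSRQ"  # made-up sequence for illustration
tiles = tile_peptides(protein, start=1, end=12, length=6, shift=3)
```

With a shift smaller than the peptide length, neighboring tiles overlap, which is the usual choice for epitope screening.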
In the Peptide database query mode for seed peptide selection, the user can set query parameters and submit a query to the PepCyber:P~Pep database. The user can examine the list of phosphopeptides returned, and choose to include any number of the phosphopeptides as seed peptides.
The detailed information about how each seed peptide was selected is documented automatically, and is displayed when the Detailed info button is clicked in the seed peptide table. During seed peptide selection, a summary of the number and type of the seed peptides is displayed in the “information box” in the right panel. After the seed peptide list is finalized, the user enters the second step of TBPG construction: derived peptide generation.
(2) Generating Derived Peptides
The user can choose to include a number of derived peptides – peptide sequences that are related to selected seed peptides – in the TBPG. These derived peptides are added either for exploration purposes or as control peptides for the design task. Currently, four protocols for generating derived peptides are implemented in µPepArray Pro: (i) truncation, in which shorter peptide sequences are generated by truncating either or both ends of an existing peptide sequence; (ii) mutation, in which new peptides are generated by mutating one or more amino acids in an existing peptide sequence to different amino acids; (iii) alanine-scan, in which each amino acid in an existing peptide sequence is replaced by alanine, one at a time, to generate a list of new sequences; and (iv) task-specific controls – at present one task-specific control peptide generating function has been implemented, which replaces one or more phosphorylatable amino acids (serine, threonine, and tyrosine) with a non-phosphorylatable amino acid as specified by the user.
Starting from the seed peptide list, the user can choose a subset of the peptides and apply any of these four protocols to them, multiple times and in an iterative manner, to generate a list of derived peptides. To achieve this, µPepArray Pro maintains a current peptide list (CPL) at all times during derived peptide generation. The CPL includes both the seed peptides and the derived peptides that have been generated thus far. The user chooses a subset of the peptides in the CPL, and chooses one of the four derived peptide generating protocols to apply to the selected peptides to generate a number of new derived peptides. The CPL is then updated to include these newly generated derived peptides. This process continues iteratively until the user decides that all derived peptides he or she wishes to include in the task have been generated.
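The iterative CPL update can be pictured as a list of records that grows each time a protocol is applied to a selected subset. The following is a minimal sketch under stated assumptions: the class and field names are hypothetical, a protocol is modeled as any function mapping one peptide to a list of new sequences, and every newly generated peptide is added automatically (in µPepArray Pro the user may keep only some of them).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PeptideRecord:
    sequence: str
    origin: str                   # "seed" or the name of the generating protocol
    parent: Optional[str] = None  # sequence this peptide was derived from


class CurrentPeptideList:
    def __init__(self, seeds):
        self.records = [PeptideRecord(s, "seed") for s in seeds]

    def sequences(self):
        return [r.sequence for r in self.records]

    def apply(self, protocol_name, protocol, selected):
        """Apply `protocol` to each selected peptide and append new results."""
        known = set(self.sequences())
        for parent in selected:
            if parent not in known:
                raise ValueError(f"{parent} is not in the CPL")
        added = []
        for parent in selected:
            for derived in protocol(parent):
                if derived not in known:  # avoid duplicate entries
                    known.add(derived)
                    record = PeptideRecord(derived, protocol_name, parent)
                    self.records.append(record)
                    added.append(record)
        return added


# Toy protocol: remove one N-terminal residue (illustrative only).
drop_first = lambda peptide: [peptide[1:]] if len(peptide) > 1 else []

cpl = CurrentPeptideList(["RRASVA"])
cpl.apply("n_truncation", drop_first, ["RRASVA"])  # first round
cpl.apply("n_truncation", drop_first, ["RASVA"])   # second round, on a derived peptide
```

Storing the parent and protocol name with each record is one simple way to keep the kind of per-peptide generation history the program displays.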
When the Truncation protocol is selected, the user is prompted to select a subset of peptides from the CPL and choose from three options: (a) truncating from the N-terminal side; (b) truncating from the C-terminal side; and (c) truncating from both sides. After the truncated peptides are generated, the user can include some or all of these new peptides in the CPL. The Mutation protocol can be applied only to individual peptides, i.e., the user can choose only a single peptide from the CPL when the Mutation protocol is specified. The user is prompted to specify which amino acids in the selected peptide he or she would like to be mutated, and which amino acids they should be mutated to. The user can choose to include some or all of the newly generated mutated peptides in the CPL.
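As an illustration of these two protocols, the sketch below generates truncated and mutated variants of a single peptide. It is not the program's implementation: the function names, the `min_length` parameter, the choice to remove the same number of residues from each end in the "both" option, and the one-substitution-per-variant behavior of `mutate` are all assumptions.

```python
def truncate(peptide, mode, min_length=1):
    """Return progressively shorter peptides.

    mode: "N" (remove N-terminal residues), "C" (remove C-terminal
    residues), or "both" (remove k residues from each end).
    """
    if mode not in ("N", "C", "both"):
        raise ValueError("mode must be 'N', 'C', or 'both'")
    n = len(peptide)
    results = []
    for k in range(1, n - min_length + 1):
        if mode == "N":
            results.append(peptide[k:])
        elif mode == "C":
            results.append(peptide[:n - k])
        elif n - 2 * k >= min_length:
            results.append(peptide[k:n - k])
    return results


def mutate(peptide, substitutions):
    """Return single-substitution variants.

    substitutions maps a 1-based position to the replacement residues
    to try at that position; each variant carries exactly one change.
    """
    results = []
    for position, replacements in sorted(substitutions.items()):
        if not 1 <= position <= len(peptide):
            raise ValueError(f"position {position} is outside the peptide")
        for residue in replacements:
            if residue != peptide[position - 1]:
                results.append(peptide[:position - 1] + residue + peptide[position:])
    return results


n_trunc = truncate("RRASVA", "N", min_length=4)
c_trunc = truncate("RRASVA", "C", min_length=4)
both_trunc = truncate("RRASVA", "both", min_length=4)
mutants = mutate("RRASVA", {4: "AD"})
```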
When the alanine-scan protocol is applied, a list of new peptides is generated by mutating each amino acid in the selected peptides to alanine, one at a time.
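An alanine scan is straightforward to express in code. In this minimal sketch (the function name is hypothetical), positions that already hold alanine are skipped because substituting them would only reproduce the parent sequence; whether µPepArray Pro does the same is an assumption.

```python
def alanine_scan(peptide):
    """Return (position, variant) pairs, replacing one residue at a time with Ala."""
    variants = []
    for index, residue in enumerate(peptide):
        if residue == "A":
            continue  # assumption: skip positions that are already alanine
        variants.append((index + 1, peptide[:index] + "A" + peptide[index + 1:]))
    return variants


scan = alanine_scan("RRASVA")
```

Each variant differs from the parent at exactly one position, so a loss of signal for a given variant points to the residue at that position.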
The Task-specific controls protocols allow the generation of control peptides in a task-specific manner. Currently, one task-specific controls protocol – non-phosphorylatable amino acid substitution – is implemented. When this protocol is selected, the user is prompted to choose a number of peptides containing phosphorylatable amino acids (serine, threonine, or tyrosine), and specify which amino acid(s) they should be replaced with.
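A non-phosphorylatable control can be sketched as a targeted substitution at serine, threonine, and tyrosine positions. The code below is illustrative only: it assumes peptides are written in plain one-letter code, and the function name, the `positions` argument, and the example replacement mapping (Ser/Thr to Ala, Tyr to Phe) are choices made for this example rather than defaults documented for µPepArray Pro.

```python
PHOSPHORYLATABLE = {"S", "T", "Y"}


def substitute_phosphosites(peptide, replacement, positions=None):
    """Replace phosphorylatable residues with user-specified residues.

    replacement: dict mapping "S"/"T"/"Y" to the residue to use instead.
    positions: optional 1-based positions to change; default is every
    phosphorylatable position in the peptide.
    """
    sites = [i for i, aa in enumerate(peptide, start=1) if aa in PHOSPHORYLATABLE]
    targets = sites if positions is None else list(positions)
    residues = list(peptide)
    for pos in targets:
        if pos not in sites:
            raise ValueError(f"position {pos} is not S, T, or Y")
        residues[pos - 1] = replacement[peptide[pos - 1]]
    return "".join(residues)


# Example mapping chosen by the user (an assumption of this sketch).
mapping = {"S": "A", "T": "A", "Y": "F"}
all_sites_control = substitute_phosphosites("RRASVYA", mapping)
single_site_control = substitute_phosphosites("RRASVYA", mapping, positions=[4])
```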
The “history” of how each derived peptide was generated is automatically documented, and is displayed when the Detailed info button is clicked in the CPL table. During derived peptide generation, a summary of the peptides in the CPL is displayed in the “information box” on the right panel. After the derived peptide list is finalized, the user enters the third step of TBPG construction: peptide re-evaluation and filtering.
(3) Re-evaluating Peptides
At this step, the peptide sequences generated at the previous steps are re-evaluated. Currently, two peptide re-evaluation functions are included in µPepArray Pro: (i) occurred sequence checking, where the peptide sequences are queried against the Swiss-Prot protein database, and the peptides that occur in any protein in the database are highlighted for the user to examine; and (ii) undesired sequence pattern checking, where all peptides are checked for undesired sequence characteristics that may lead to synthesis efficiency problems. At present, the undesired sequence pattern checking function is tuned for the µParaflo® microfluidic on-chip peptide synthesis platform. Similar functions for other peptide array platforms can be easily added in the future.
The occurred sequence checking function sends a query to the Swiss-Prot database for each peptide sequence. All peptide sequences that occur in any protein in Swiss-Prot are highlighted for the user to examine. The user can choose to remove any peptides that he or she considers problematic.
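Conceptually, occurred sequence checking is an exact substring search of each peptide against a set of protein sequences. The sketch below uses a small in-memory dictionary as a stand-in for Swiss-Prot; the accessions and sequences are invented, and the actual query mechanism used by µPepArray Pro is not described in this note.

```python
def find_occurrences(peptides, proteins):
    """Map each peptide to (accession, 1-based position) of its first exact match per protein."""
    hits = {}
    for peptide in peptides:
        hits[peptide] = [
            (accession, sequence.find(peptide) + 1)
            for accession, sequence in proteins.items()
            if peptide in sequence
        ]
    return hits


# Hypothetical stand-in for a protein database.
proteins = {
    "HYPO_1": "MKTAYIAKQRQISFVKSHFSRQ",
    "HYPO_2": "MRRASVAGG",
}
hits = find_occurrences(["RRASVA", "AKQRQI", "WWWWWW"], proteins)
occurring = [p for p, matches in hits.items() if matches]  # peptides to highlight
```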
The undesired sequence pattern checking function checks the peptides for sequence patterns that may lead to synthesis efficiency problems. Currently, a collection of sequence patterns known to cause difficulty in on-chip peptide synthesis on the µParaflo® microfluidic chip platform is checked. The peptides carrying these undesired sequence characteristics are tagged. The user can choose to remove some or all of the tagged peptides as he or she deems proper.
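A pattern filter of this kind can be sketched as a table of named regular expressions applied to every peptide. The two patterns below are placeholders invented for this example; they are not the µParaflo® rule set, which is not listed in this note. The function name is likewise hypothetical.

```python
import re

# Placeholder patterns for illustration only, NOT the platform's actual rules.
HYPOTHETICAL_PATTERNS = {
    "long_homopolymer": re.compile(r"(.)\1{3,}"),         # same residue 4+ times in a row
    "hydrophobic_stretch": re.compile(r"[VILFWM]{5,}"),   # 5+ consecutive hydrophobic residues
}


def tag_peptides(peptides, patterns):
    """Return {peptide: [names of matching patterns]}; an empty list means no tag."""
    return {
        peptide: [name for name, regex in patterns.items() if regex.search(peptide)]
        for peptide in peptides
    }


tags = tag_peptides(["RRASVA", "GGGGKT", "AVILFWK"], HYPOTHETICAL_PATTERNS)
tagged = [p for p, names in tags.items() if names]  # candidates for removal
```

Keeping the patterns in a separate table means a rule set for another synthesis platform could be swapped in without changing the checking function.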
Following peptide re-evaluation, if the user is satisfied with the TBPG constructed, he or she can click the Submit the TBPG button. The user is prompted to provide a name and (optionally) a description of the TBPG. Following a routine error-checking procedure, the TBPG is stored in the supporting database of µPepArray Pro.
Assembling the ALPL
In µPepArray Pro, the user can choose the option Make/View ALPLs, then select Assemble a new ALPL to start assembling an ALPL. The user is first prompted to select an array configuration: he or she can either choose one from the list of existing array configurations, or follow the instructions to define a new array configuration. After the array configuration is selected, the user is guided to add one or more TBPGs to the ALPL. For each TBPG added, the user also needs to specify the number of replicates. The “information box” on the right panel displays the total number of spots in the array configuration chosen, the number of filled spots, and the number of spots remaining to be filled. Once the user is satisfied with the ALPL assembled, he or she can click the Submit the ALPL button. After the user fills in a name and an optional description, the ALPL is stored in the supporting database of µPepArray Pro.
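The spot bookkeeping shown in the information box amounts to simple arithmetic: capacity is the number of spots minus the reserved quality-control spots, and each peptide consumes one spot per replicate. The sketch below illustrates this; the class and function names, the reading of 128 × 31 as rows × columns, and the eight reserved spots are assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ArrayConfig:
    rows: int
    cols: int
    reserved: frozenset = frozenset()  # (row, col) spots kept for quality control

    @property
    def capacity(self):
        return self.rows * self.cols - len(self.reserved)


def assemble_alpl(config, groups):
    """Expand (tbpg_name, peptides, replicates) groups into an ALPL.

    Returns the list of (tbpg_name, peptide, replicate_number) entries
    and the number of spots still unfilled.
    """
    alpl = []
    for name, peptides, replicates in groups:
        if replicates < 1:
            raise ValueError("replicates must be at least 1")
        for peptide in peptides:
            for replicate in range(1, replicates + 1):
                alpl.append((name, peptide, replicate))
    remaining = config.capacity - len(alpl)
    if remaining < 0:
        raise ValueError("ALPL exceeds the capacity of the array configuration")
    return alpl, remaining


# Hypothetical configuration: 128 x 31 spots with 8 reserved QC spots.
config = ArrayConfig(rows=128, cols=31, reserved=frozenset((0, c) for c in range(8)))
groups = [
    ("tiling", ["MKTAYI", "AYIAKQ", "AKQRQI"], 3),
    ("kinase", ["RRASVA", "RRAAVA"], 2),
]
alpl, remaining = assemble_alpl(config, groups)
```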
Making Array Layout
After assembling an ALPL, the user can choose the option Make/View Layouts, and then select Make a new layout. The user is prompted to select an ALPL to use, and choose from three layout options: (1) randomized layout; (2) sequential layout (by rows); and (3) sequential layout (by columns).
If the randomized layout option is chosen, the peptides in the chosen ALPL are assigned to the available spots in the chosen array configuration in a completely randomized manner. This option is suitable for most common array design purposes.
If the sequential layout (by rows) or the sequential layout (by columns) option is chosen, the peptides in the chosen ALPL are used to fill in the available spots sequentially, row by row or column by column, respectively.
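The three layout options differ only in the order in which available spots are paired with ALPL entries. The following minimal sketch makes that explicit; the function names, the zero-based (row, column) coordinates, and the optional `seed` argument for reproducible shuffling are assumptions of this example.

```python
import random


def available_spots(rows, cols, reserved, by="rows"):
    """List non-reserved (row, col) spots in row-major or column-major order."""
    if by == "rows":
        order = [(r, c) for r in range(rows) for c in range(cols)]
    else:
        order = [(r, c) for c in range(cols) for r in range(rows)]
    return [spot for spot in order if spot not in reserved]


def make_layout(alpl, rows, cols, reserved=frozenset(), mode="random", seed=None):
    """Assign ALPL entries to spots; mode is "random", "rows", or "columns"."""
    if mode not in ("random", "rows", "columns"):
        raise ValueError("mode must be 'random', 'rows', or 'columns'")
    spots = available_spots(rows, cols, reserved, by="columns" if mode == "columns" else "rows")
    if len(alpl) > len(spots):
        raise ValueError("more peptides than available spots")
    if mode == "random":
        random.Random(seed).shuffle(spots)  # randomize which spot each entry gets
    return dict(zip(spots, alpl))


entries = ["P1", "P2", "P3", "P4"]
reserved = frozenset({(0, 0)})  # hypothetical quality-control spot
by_rows = make_layout(entries, 2, 3, reserved, mode="rows")
by_cols = make_layout(entries, 2, 3, reserved, mode="columns")
shuffled = make_layout(entries, 2, 3, reserved, mode="random", seed=7)
```

Shuffling the spots rather than the entries lets unfilled spots fall anywhere on the array instead of always at the end.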
After the array layout is made, the user can submit it to the supporting database of µPepArray Pro. A stored array layout (with rich information about all peptides) can be retrieved from the database and output to a proper file format to direct the array production at a later time.
1. Zhu Q, Hong A, Sheng N, Zhang X, Matejko A, Jun KY, Srivannavit O, Gulari E, Gao X, Zhou X. (2007) Microparaflo biochip for nucleic acid and protein analysis. Methods Mol Biol 382, 287–312.
2. Gong W, Zhou D, Ren Y, Wang Y, Zuo Z, Shen Y, Xiao F, Zhu Q, Hong A, Zhou X, et al. (2008) PepCyber:P~PEP: a database of human protein–protein interactions mediated by phosphoprotein binding domains. Nucleic Acids Res 36, D679–683.